# Complete Pipeline

With Python, we can collect the individual processing components into one place and use them to recreate our entire pipeline. First, let's move all of the individual functions that we put together into a single file called `project.py`. In this file we place `process_co2`, `process_income`, `process_population`, `process_continents`, `merge_data` (and its supporting function `match`), and `plot_emissions_gdp`. At the top of the file we also include the imports those functions need, including pandas and the `difflib` tools we used to match country names.

Now we can import this `project.py` file and access any of those functions through the module's namespace. For example, once we run `import project`, we can call `process_income` by running `project.process_income()` with its input and output file paths. Let's use this to rerun our entire project in fewer than a dozen lines of code. First, we'll list the paths to our input data sources and the locations where the intermediate and final data products will be stored:

In [1]:
# Source data files
raw_co2_file = "data/co2_emissions_tonnes_per_person.csv"
raw_income_file = "data/income_per_person_gdppercapita_ppp_inflation_adjusted.csv"
raw_population_file = "data/population_total.csv"
raw_continents_file = "data/united_nations_continents.csv"

# Intermediate files
co2_file = "data/intermediate/co2.csv"
income_file = "data/intermediate/income.csv"
population_file = "data/intermediate/pop.csv"
continents_file = "data/intermediate/continent.csv"

# Final files
merged_data_file = "data/intermediate/data.csv"
plot_file = "img/finished-product.png"
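
The output paths listed above point into the `data/intermediate/` and `img/` folders. Whether the functions in `project.py` create those folders themselves is not shown here, so the sketch below assumes they do not and creates any missing parent folders up front. The helper name `ensure_parent_dirs` is hypothetical and not part of `project.py`.

```python
from pathlib import Path


def ensure_parent_dirs(*file_paths):
    """Create the parent folder of each given file path if it does not exist yet."""
    for file_path in file_paths:
        # parents=True builds nested folders; exist_ok=True makes reruns harmless
        Path(file_path).parent.mkdir(parents=True, exist_ok=True)
```

With the variables defined above, this could be called as `ensure_parent_dirs(co2_file, income_file, population_file, continents_file, merged_data_file, plot_file)` before running the pipeline. Only folders are created; the files themselves are left for the processing functions to write.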

In [None]:
import project

project.process_co2(raw_co2_file, co2_file)
project.process_income(raw_income_file, income_file)
project.process_population(raw_population_file, population_file)
project.process_continents(raw_continents_file, continents_file)
project.merge_data(co2_file, income_file, population_file, continents_file, merged_data_file)
project.plot_emissions_gdp(merged_data_file, plot_file)

Let's show the resulting plot from this process:

![](img/finished-product.png)

For this approach to work, `project.py` needs to be somewhere Python looks for modules, most simply your current working directory. You can check what your current working directory is by executing the following code:

In [None]:
import os
print(os.getcwd())

The directory printed above should contain your `project.py` file so that Python can find it when you run `import project`.
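
If you would rather ask Python directly whether it can find the module, the standard library's `importlib.util.find_spec` reports whether a module name can be located without importing it. The following is a minimal sketch; the helper name `module_is_importable` is hypothetical, and the commented-out folder is a placeholder, not a real location.

```python
import importlib.util
import os
import sys


def module_is_importable(module_name):
    """Return True if Python can locate module_name on its module search path."""
    return importlib.util.find_spec(module_name) is not None


print("Working directory:", os.getcwd())
print("project importable:", module_is_importable("project"))

# If project.py lives in a different folder, that folder can be added to the
# search path before importing. "path/to/folder" is a placeholder.
# sys.path.append("path/to/folder")
```

If the second line prints `False`, either move `project.py` into the working directory or add its folder to `sys.path` as shown in the comment.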

This approach of collecting the components of a project for reuse is exceptionally valuable. It lets you rerun the analysis with ease, and it makes the analysis easier to share with others and to develop collaboratively.

There are often multiple ways to solve a problem, and what was presented here is but one. For example, you may choose not to save each intermediate product (CO2, income, population, and so on) to file as we did here. Instead, you may pass the data in memory directly from one stage of the analysis to the next. Whichever approach you choose, I can't stress enough how important it is to inspect and check your data at every step of the process. We all make mistakes, and instituting tests and checks to ensure the quality of each step will greatly enhance your ability to be productive with programming tools for data-intensive applications.
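
To illustrate both ideas, here is a minimal sketch that passes pandas DataFrames in memory from one stage to the next and checks each one along the way. Everything in it is an assumption for illustration: the helper `check_table` is hypothetical, the column names (`country`, `year`, `co2`, `income`) may not match those used in `project.py`, and the two tiny tables are made-up stand-ins rather than real data.

```python
import pandas as pd


def check_table(df, required_columns, key_columns):
    """Return df unchanged if it passes basic checks; raise ValueError otherwise."""
    missing = set(required_columns) - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    if df.empty:
        raise ValueError("table is empty")
    if df.duplicated(subset=key_columns).any():
        raise ValueError(f"duplicate rows for keys {key_columns}")
    return df


# Made-up stand-ins for two processed data sets (not real values)
co2 = pd.DataFrame({"country": ["A", "B"], "year": [2000, 2000], "co2": [1.5, 3.0]})
income = pd.DataFrame({"country": ["A", "B"], "year": [2000, 2000], "income": [900, 4000]})

keys = ["country", "year"]

# Check each stage's output before handing it to the next stage
co2 = check_table(co2, keys + ["co2"], keys)
income = check_table(income, keys + ["income"], keys)

# Merge in memory, then check the merged result as well
merged = check_table(co2.merge(income, on=keys), keys + ["co2", "income"], keys)
print(merged)
```

Because `check_table` returns the table it was given, it can be wrapped around any stage without changing how the stages connect. The same function could also be applied to an intermediate CSV file right after reading it back in.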