# The Emissions Story

Emissions and their impact on the environment have been getting a lot of attention lately, and the culprit named most often in the news is CO2. I live in a province where fossil fuel production accounts for a large share of economic activity, so I am examining the emissions data to see whether CO2 really should be our focus.

In [1]:
# Import the necessary libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()  # call the function to apply seaborn's default theme; a bare `sns.set` only references it

## Exploring the data

In [2]:
# Read data files

ghg_data = pd.read_csv('greenhouse-gas-emissions-by-gas.csv')
ghg_pot = pd.read_csv('global-warming-potential-of-greenhouse-gases-over-100-year-timescale-gwp.csv')

In [5]:
# Get some basic information about the data we have

print(ghg_data.info())
ghg_data.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13291 entries, 0 to 13290
Data columns (total 9 columns):
Entity                           13291 non-null object
Code                             10805 non-null object
Year                             13291 non-null int64
SF₆ gases (tonnes)               709 non-null float64
PFC gases (tonnes)               704 non-null float64
HFC gases (tonnes)               703 non-null float64
Nitrous oxide (N₂O) (tonnes)     10757 non-null float64
Methane (CH₄) (tonnes)           10714 non-null float64
Carbon Dioxide (CO₂) (tonnes)    12255 non-null float64
dtypes: float64(6), int64(1), object(2)
memory usage: 934.6+ KB
None


Unnamed: 0,Entity,Code,Year,SF₆ gases (tonnes),PFC gases (tonnes),HFC gases (tonnes),Nitrous oxide (N₂O) (tonnes),Methane (CH₄) (tonnes),Carbon Dioxide (CO₂) (tonnes)
0,Afghanistan,AFG,1960,,,,,,414371.0
1,Afghanistan,AFG,1961,,,,,,491378.0
2,Afghanistan,AFG,1962,,,,,,689396.0
3,Afghanistan,AFG,1963,,,,,,707731.0
4,Afghanistan,AFG,1964,,,,,,839743.0


In [6]:
ghg_data.describe()

Unnamed: 0,Year,SF₆ gases (tonnes),PFC gases (tonnes),HFC gases (tonnes),Nitrous oxide (N₂O) (tonnes),Methane (CH₄) (tonnes),Carbon Dioxide (CO₂) (tonnes)
count,13291.0,709.0,704.0,703.0,10757.0,10714.0,12255.0
mean,1987.936875,2888988.0,1560128.0,11646050.0,111428700.0,264307500.0,735993800.0
std,15.536907,13512290.0,6400114.0,60749580.0,323747500.0,803446500.0,2617749000.0
min,1960.0,0.0,0.0,0.0,0.0,0.0,-80674.0
25%,1975.0,0.0,0.0,0.0,525512.0,1629290.0,964421.0
50%,1988.0,0.0,0.0,22300.0,4349734.0,9848605.0,11463040.0
75%,2001.0,386000.0,316625.0,768500.0,28490360.0,69944720.0,143107100.0
max,2014.0,174905400.0,78622310.0,834345600.0,3260053000.0,8014067000.0,36138280000.0


In [14]:
# Rename the gas columns to short names (SF6, PFC, HFC, N2O, CH4, CO2)
ghg_data.rename(columns = {'SF₆ gases (tonnes)':'SF6', 'PFC gases (tonnes)':'PFC', 'HFC gases (tonnes)':'HFC', 
                           'Nitrous oxide (N₂O) (tonnes)':'N2O', 'Methane (CH₄) (tonnes)':'CH4', 
                           'Carbon Dioxide (CO₂) (tonnes)':'CO2'}, inplace = True)

In [8]:
# Count the NaN values in each column
ghg_data.isna().sum()

Entity                               0
Code                              2486
Year                                 0
SF₆ gases (tonnes)               12582
PFC gases (tonnes)               12587
HFC gases (tonnes)               12588
Nitrous oxide (N₂O) (tonnes)      2534
Methane (CH₄) (tonnes)            2577
Carbon Dioxide (CO₂) (tonnes)     1036
dtype: int64
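
Raw counts are easier to compare once they are expressed as a share of all rows. The sketch below shows one way to do that; it runs on a small made-up frame named `toy` (a stand-in, not the real dataset), so the numbers it prints say nothing about the actual emissions data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for ghg_data (made-up values, not the real dataset)
toy = pd.DataFrame({
    "Entity": ["A", "A", "B", "B"],
    "Code": ["AAA", "AAA", None, None],
    "CO2": [1.0, np.nan, 3.0, 4.0],
})

# isna() gives a boolean frame; its column mean is the fraction of missing values.
# Multiply by 100 for a percentage and sort so the sparsest columns come first.
missing_pct = toy.isna().mean().mul(100).sort_values(ascending=False)
print(missing_pct)
```

Applied to `ghg_data`, the same expression would rank the columns by how sparse they are.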

Let's examine the rows where the Code column is missing to understand what they are and decide how to handle them.

In [12]:
code_isna = ghg_data[ghg_data['Code'].isna()]

# Get unique values of the entities to understand the nature of the values
code_isna['Entity'].unique()

array(['Arab World', 'Caribbean small states',
       'Central Europe and the Baltics', 'Early-demographic dividend',
       'East Asia & Pacific', 'East Asia & Pacific (IDA & IBRD)',
       'East Asia & Pacific (excluding high income)', 'Euro area',
       'Europe & Central Asia', 'Europe & Central Asia (IDA & IBRD)',
       'Europe & Central Asia (excluding high income)', 'European Union',
       'Fragile and conflict affected situations',
       'Heavily indebted poor countries (HIPC)', 'High income',
       'IBRD only', 'IDA & IBRD total', 'IDA blend', 'IDA only',
       'IDA total', 'Late-demographic dividend',
       'Latin America & Caribbean',
       'Latin America & Caribbean (IDA & IBRD)',
       'Latin America & Caribbean (excluding high income)',
       'Least developed countries: UN classification',
       'Low & middle income', 'Low income', 'Lower middle income',
       'Middle East & North Africa',
       'Middle East & North Africa (IDA & IBRD)',
       'Middle East & 

These entities are aggregates of the total, grouped by categories such as region, income level, and level of development. Since we don't need them right now, we drop them from the main GHG data.

In [13]:
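# Drop the rows without a country code (the regional and income-group aggregates)
ghg_data.dropna(subset=['Code'], inplace=True)
# Optional check: the line below should return an empty DataFrame
#ghg_data[ghg_data['Code'].isna()]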

Entity                           0.0
Code                             0.0
Year                             0.0
SF₆ gases (tonnes)               0.0
PFC gases (tonnes)               0.0
HFC gases (tonnes)               0.0
Nitrous oxide (N₂O) (tonnes)     0.0
Methane (CH₄) (tonnes)           0.0
Carbon Dioxide (CO₂) (tonnes)    0.0
dtype: float64
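
A quick sanity check after the drop could look like the sketch below. It assumes that real country codes follow the three-letter ISO 3166-1 alpha-3 pattern and flags everything else for manual inspection. The frame `toy` and the code `XYZ_AGG` are hypothetical, made up only to show the idea; whether the real file contains any such codes is not known from the output above.

```python
import pandas as pd

# Toy stand-in for ghg_data after the drop
# (made-up rows; "XYZ_AGG" is a hypothetical non-ISO code)
toy = pd.DataFrame({
    "Entity": ["Afghanistan", "Zimbabwe", "Some aggregate"],
    "Code": ["AFG", "ZWE", "XYZ_AGG"],
})

# After dropna there should be no missing codes left
remaining_missing = int(toy["Code"].isna().sum())

# Assumption: a country code is exactly three uppercase letters.
# na=False treats any leftover NaN as "not a match".
is_iso3 = toy["Code"].str.match(r"^[A-Z]{3}$", na=False)

# Entities whose code does not fit the pattern deserve a closer look
suspects = toy.loc[~is_iso3, "Entity"].unique().tolist()
print(remaining_missing, suspects)
```

If `suspects` comes back empty on the real data, every remaining row carries a three-letter code.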

Only countries and their GHG contributions should be left now. Next, the gases SF6, PFC and HFC have a lot of missing data. Let's check whether there is a particular reason.

In [16]:
ghg_data[ghg_data['SF6'].isna()]

Unnamed: 0,Entity,Code,Year,SF6,PFC,HFC,N2O,CH4,CO2
0,Afghanistan,AFG,1960,,,,,,414371.0
1,Afghanistan,AFG,1961,,,,,,491378.0
2,Afghanistan,AFG,1962,,,,,,689396.0
3,Afghanistan,AFG,1963,,,,,,707731.0
4,Afghanistan,AFG,1964,,,,,,839743.0
...,...,...,...,...,...,...,...,...,...
13285,Zimbabwe,ZWE,2009,,,,4192006.00,8377390.0,5603176.0
13287,Zimbabwe,ZWE,2011,,,,4228947.77,8504705.0,9563536.0
13288,Zimbabwe,ZWE,2012,,,,4270818.54,8588910.0,7792375.0
13289,Zimbabwe,ZWE,2013,,,,,,11675728.0
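
Scrolling through the rows only goes so far. One way to look for a pattern is to count the reported (non-missing) SF6 values per year and per entity, which shows whether the gaps cluster in certain years or in certain countries. The sketch below uses a made-up frame named `toy`, so its output illustrates the technique and not the real data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for ghg_data (made-up values, not the real dataset)
toy = pd.DataFrame({
    "Entity": ["A", "A", "A", "B", "B", "B"],
    "Year": [1989, 1990, 1991, 1989, 1990, 1991],
    "SF6": [np.nan, 10.0, 12.0, np.nan, np.nan, np.nan],
})

# count() ignores NaN, so this is the number of reported SF6 values per year
reported_by_year = toy.groupby("Year")["SF6"].count()

# Same idea per entity, keeping only entities that report at least once
reported_by_entity = toy.groupby("Entity")["SF6"].count()
reporting_entities = reported_by_entity[reported_by_entity > 0]

print(reported_by_year)
print(reporting_entities)
```

On `ghg_data`, the same two group-bys can be repeated for the PFC and HFC columns.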
