# ghg_analytics
## Insights into trends with global greenhouse gases and climate change action.

EPA1333 – Computer Engineering for Scientific Computing Final Project

### Authors
Group 6:
* Aashna Mittal
* Gamze Ünlü
* Jason R Wang

## Executive Summary

In this analysis, we examined the implications of United Nations Framework Convention on Climate Change (UNFCCC) member states' Nationally Determined Contributions (NDCs). We compared their NDCs, firstly, with the targets set out by the 2015 Paris Agreement, and secondly, with their current and historical greenhouse gas emissions. We also examined countries' contributions to the Green Climate Fund.

_Brief analysis of what we found: which countries are doing well? Which countries need to do more?_

## Introduction

Anthropogenic climate change was first introduced into the global political arena through the United Nations Framework Convention on Climate Change (UNFCCC) in 1992. Since then, further international agreements have continued to refine mitigation action. [United Nations Sustainable Development Goal 13](https://sustainabledevelopment.un.org/sdg13), 'Take urgent action to combat climate change and its impacts', specifically targets this global issue.

At the 19th Conference of the Parties (COP19) to the UNFCCC, held in Warsaw in 2013, UNFCCC members agreed to submit "Intended Nationally Determined Contributions" (INDCs) to signal each country's planned greenhouse gas emission targets. At the 21st Conference (COP21) in 2015, the Paris Agreement turned these _intended_ contributions into "Nationally Determined Contributions" (NDCs).

Furthermore, the signatories to the Paris Agreement (which include all UNFCCC signatories, and therefore all UN member nations) have agreed to limit global warming to well below 2ºC, and preferably to 1.5ºC, above pre-industrial levels. This notebook analyzes the NDCs to estimate their potential to reach these temperature goals.

For some nations, these NDCs set a net reduction. For industrializing nations, they are simply lower than a calculated 'business-as-usual' (BAU) scenario.

![](https://i.imgur.com/mHjPRPo.png)

## Methodology

Our analysis follows these steps:

- Gathering data sources
- Data cleaning
- Selecting the data to use for analysis
- Visual inference
- Calculating a guilt index for the recommendations

In [None]:
# Library imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from numpy import nan as NaN  # the NaN alias was removed in NumPy 2.0; nan works in all versions
plt.style.use('ggplot')

## 1. NDCs and Temperature Targets

To determine the impact of NDCs, we need to first understand their context by answering the following questions:
1. What do global emissions look like today? (and what datasets can we rely on?)
2. If nothing changes, and the world continues doing _business as usual_ (BAU), what will the world look like in 2030?

Then, we can examine how NDCs compare:
3. If all NDCs are met, what will the total amount of emissions be?
4. What emission amounts are required to meet temperature targets?

Finally, all of this is compared in Section 1.4.

_Note: Emissions are quantified in units of 'megatons of carbon dioxide-equivalent per year' [MtCO2e/yr]. They are expressed as CO2-equivalent because the warming effect of every greenhouse gas is measured relative to that of carbon dioxide, and per year because the carbon cycle is a continuous process of emission and uptake. Global climate targets assume that natural GHG uptake will continue steadily, so once countries cut their emission rates below that uptake, the concentration of GHGs in the atmosphere will decrease._
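As an illustration of CO2-equivalence, the sketch below converts per-gas emissions into MtCO2e by weighting each gas's mass by its 100-year global warming potential (GWP). The GWP values used here (CH4 = 25, N2O = 298, from the IPCC Fourth Assessment Report) are an assumption for illustration only; the datasets below use their own GWP sets (the World Bank data, for example, uses AR2 values). The helper `to_mtco2e` and the example country are hypothetical.

```python
# Minimal sketch: CO2-equivalent emissions as a GWP-weighted sum.
# Assumption: 100-year GWPs from IPCC AR4 (illustrative only).
GWP100_AR4 = {"CO2": 1.0, "CH4": 25.0, "N2O": 298.0}


def to_mtco2e(emissions_kt: dict, gwp: dict = GWP100_AR4) -> float:
    """Convert per-gas emissions in kilotonnes into MtCO2e.

    Each gas's mass is multiplied by its GWP and summed (giving ktCO2e),
    then divided by 1000 to go from kilotonnes to megatonnes.
    """
    total_kt_co2e = sum(mass * gwp[gas] for gas, mass in emissions_kt.items())
    return total_kt_co2e / 1000.0


# Hypothetical country: 1000 kt CO2, 10 kt CH4 and 1 kt N2O
example = {"CO2": 1000.0, "CH4": 10.0, "N2O": 1.0}
print(round(to_mtco2e(example), 3))  # 1.548
```

Here 10 kt of CH4 counts as 250 ktCO2e and 1 kt of N2O as 298 ktCO2e, so small masses of potent gases can matter as much as much larger masses of CO2.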

### 1.0 Import and Clean Greenhouse Gas Emission Data

Many organizations maintain databases of current (and historical) GHG emissions. The primary data source for most of them is the UNFCCC's reporting process: UNFCCC members submit 'GHG Inventories' (annually, in the case of developed Annex I countries), which track national emissions with a two-year delay.

The World Bank, the World Resources Institute, and the Potsdam Institute for Climate Impact Research have all published datasets based on differing methodologies. Below, we explain how we chose a valid dataset for further analysis.

We first examine the World Bank (WB) data, since it is the most conveniently accessible and matches other easily comparable datasets. The WB set records GHG emissions in _ktCO2e_, includes natural emissions, covers 1970-2012, and uses global warming potentials from the IPCC Second Assessment Report (AR2).
    
Source: https://data.worldbank.org/indicator/EN.ATM.GHGT.KT.CE?view=chart

In [None]:
# Import World Bank data on GHGs
ghgDf_WB = pd.read_csv("data/GreenhouseGasData.csv", sep=',', skipinitialspace=True, skiprows=4, index_col=1) 

# Drop the indicator name and indicator code as the values are same across the whole dataframe
ghgDf_WB = ghgDf_WB.drop(["Indicator Code", "Indicator Name"], axis = 1)

# Drop all the columns that contain only null values
ghgDf_WB.dropna(axis = 1, how="all", inplace=True)  

# Drop all the rows that contain only null values, starting from column 2
ghgDf_WB.dropna(axis = 0, how="all", subset = ghgDf_WB.columns[2:], inplace= True)  

# Interpolate missing values and then use backfill to fill starting NA values of a row
ghgDf_WB.iloc[:,2:] = ghgDf_WB.iloc[:,2:].interpolate(axis = 1).bfill(axis=1)

# Convert all emissions data into MtCO2e
ghgDf_WB.iloc[:,1:] = ghgDf_WB.iloc[:,1:].divide(1000)
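
The row-wise gap filling used in the cell above can be illustrated on a tiny hypothetical table. With its default settings, `interpolate(axis=1)` fills gaps between known years linearly and repeats the last known value for trailing gaps, but leaves leading gaps as NaN; the following `bfill(axis=1)` then copies the first known value backwards. The country codes and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy emissions table (hypothetical values): one row per country, one column per year.
toy = pd.DataFrame(
    {
        "1970": [np.nan, 10.0],
        "1971": [1.0, np.nan],
        "1972": [np.nan, 30.0],
        "1973": [3.0, np.nan],
        "1974": [np.nan, 50.0],
    },
    index=["AAA", "BBB"],
)

# Linear interpolation along each row, then backfill any leading NaNs.
filled = toy.interpolate(axis=1).bfill(axis=1)
print(filled)
```

Note that the trailing 1974 gap for `AAA` is filled by repeating the 1973 value rather than by extrapolating a trend, which is worth keeping in mind when reading the end of any filled series.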

In [None]:
# View the cleaned WB GHG DataFrame
ghgDf_WB.head()

For convenience, we create a dictionary mapping country codes (keys) to country names (values). Since the World Bank's country codes largely follow the [ISO 3166-1](https://en.wikipedia.org/wiki/ISO_3166-1) alpha-3 standard, this list will be used as the reference.

In [None]:
countryDictionary = ghgDf_WB['Country Name'].to_dict()  # {country code: country name}
list(countryDictionary.items())[:5]

The World Bank's data spans 1970 to 2012, while the CAIT greenhouse gas data spans 1990 to 2014. Ideally, we would cover as large a temporal range as possible. Note that the UNFCCC only started recording emissions data in 1990.

The CAIT Excel workbook also contains another sheet with total CO2 emissions from 1850 to 2014. This may also be interesting for analysis.

In [None]:
# For both sets, use the ISO code as the index because it follows the ISO-3166 standard, unlike the country names!
# Note: read_excel takes no `sep` or `skipinitialspace` arguments (those belong to read_csv).

# All GHG emissions, 1990-2014 (only 2013 and 2014 are used below)
ghgDf_CAITghg = pd.read_excel("data/wri/CW_CAIT_GHG_Emissions_31102017_ISO.xlsx",
                              sheet_name='GHG Emissions', skiprows=1, index_col=1)

# CO2 emissions from 1850
ghgDf_CAITco2 = pd.read_excel("data/wri/CW_CAIT_GHG_Emissions_31102017_ISO.xlsx",
                              sheet_name='CO2 Total Emissions', index_col=1).dropna()

In [None]:
ghgDf_CAITghg.head()

Since we already have a large set of World Bank greenhouse gas data, we first attempt to append the CAIT greenhouse gas data to it. To do so, we must select the 2013 and 2014 values of the total-emissions column and restructure them into the format the World Bank uses.

In [None]:
ghgDf_CAITghg1314 = ghgDf_CAITghg[ (ghgDf_CAITghg['Year'] == 2013) | (ghgDf_CAITghg['Year'] == 2014) ]\
    .loc[:,['Year','Total GHG Emissions Including Land-Use Change and Forestry (MtCO₂e‍)']]

# Pivot the table to be in the same format as the World Bank data, which is in a nicer format
# (since we are only looking at total emissions).
ghgDf_CAITghg1314 = ghgDf_CAITghg1314.pivot(columns='Year',
                  values='Total GHG Emissions Including Land-Use Change and Forestry (MtCO₂e‍)')

ghgDf_CAITghg1314.head()

Before we merge the datasets, it is important to see how the two sets of data might align. Do the countries match? Are the greenhouse gases quantified in the same way?

In [None]:
# These countries are in CAIT data but not the World Bank's:
ghgDf_CAITghg.loc[ ghgDf_CAITghg1314.index[~ghgDf_CAITghg1314.index.isin(ghgDf_WB.index)], 'Country' ].unique()

In [None]:
# These countries are in World Bank data but not CAIT's:
[countryDictionary[i] for i in ghgDf_WB.index[~ghgDf_WB.index.isin(ghgDf_CAITghg1314.index)] ]

Clearly, there are some discrepancies. CAIT includes data for some smaller states that do not appear in the World Bank's data, while the World Bank includes many regional aggregates that CAIT does not. 'World' appears in both, but under different codes (`WORLD` in CAIT, `WLD` in the World Bank), since it is not a country and therefore has no ISO-3166 code.

But, for the rest of the ISO-3166 countries, we can join the datasets.
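A toy version of the comparison and join, using hypothetical frames with made-up values, shows what `Index.difference` reports and what `DataFrame.join` keeps. `join` defaults to a left join on the index, so every World Bank row survives, CAIT-only codes are dropped, and World Bank rows without CAIT data receive NaN.

```python
import pandas as pd

# Hypothetical, simplified frames indexed by ISO 3166-1 alpha-3 codes.
wb = pd.DataFrame(
    {"Country Name": ["Aruba", "World", "Chile"], "2012": [0.5, 50000.0, 100.0]},
    index=pd.Index(["ABW", "WLD", "CHL"], name="Country Code"),
)
cait = pd.DataFrame(
    {2013: [101.0, 5.0], 2014: [102.0, 6.0]},
    index=pd.Index(["CHL", "TUV"], name="ISO"),
)

# Codes present in one source but not in the other
only_in_cait = cait.index.difference(wb.index)
only_in_wb = wb.index.difference(cait.index)

# Left join on the index: WB rows are kept in order, TUV is dropped,
# and ABW/WLD get NaN for 2013 and 2014.
merged = wb.join(cait)
print(list(only_in_cait), list(only_in_wb))
print(merged)
```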

In [None]:
# Join by matching index. Recall that we pivoted ghgDf_CAITghg1314 to be in the same format as the World Bank data.
ghgDf_merged = ghgDf_WB.join(ghgDf_CAITghg1314)

While we are working with this set, we should also drop the regions that appear only in the World Bank's dataset.

In [None]:
#Import the dataframe which contains the codes of country group aggregates
CountryGroupCodes = pd.read_excel("data/CountryGroups.xls", sheet_name = "List of economies", skiprows=226, header = None)
CountryGroupCodes.dropna(how="all", inplace=True, axis=1)
CountryGroupCodes.drop(columns=0,inplace=True)
CountryGroupCodes.dropna(how="all", inplace=True, axis=0)
CountryGroupCodes.columns = ["Aggregate Name", "Aggregate Code"]
CountryGroupCodes.head()

In [None]:
# Drop the rows corresponding to aggregate country codes from existing dataframe to create a new country dataframe 
ghgDf_merged = ghgDf_merged.drop(CountryGroupCodes["Aggregate Code"].values)

# And let's fill in missing data using `interpolate`.
ghgDf_merged.iloc[:,2:] = ghgDf_merged.iloc[:,2:].interpolate(axis = 1).bfill(axis=1)

# Convert all emissions data into MtCO2e
ghgDf_merged.iloc[:,1:] = ghgDf_merged.iloc[:,1:].divide(1000)

ghgDf_merged.head()

Lastly, let's add the PRIMAP-hist data from the Potsdam Institute for Climate Impact Research (PIK), which, like our own cleaning above, interpolates years in which a country's data is missing.

>Gütschow, Johannes; Jeffery, Louise; Gieseke, Robert; Gebel, Ronja (2018): The PRIMAP-hist national historical emissions time series (1850-2015). V. 1.2. GFZ Data Services. http://doi.org/10.5880/PIK.2018.003

In [None]:
ghgDf_PIK = pd.read_csv('data/primap-hist_v1/PRIMAP-hist_v1.2_14-Dec-2017.csv')
ghgDf_PIK = ghgDf_PIK.rename(columns = {'country': 'Country Code'}).drop(columns=['scenario'])

This dataset only has country codes, not country names. Fortunately, since it uses standard ISO 3166-1 alpha-3 codes, the names can be looked up in the `countryDictionary` built from the World Bank data.

In [None]:
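# Keep only the rows for countries in the WB database, and drop aggregate groups
# that share codes with the WB list ('LDC', least developed countries, is listed
# explicitly in case it is missing from CountryGroupCodes).
aggregateCodes = list(CountryGroupCodes['Aggregate Code']) + ['LDC']
ghgDf_PIK = ghgDf_PIK[ ghgDf_PIK['Country Code'].isin(countryDictionary.keys())
                       & ~ghgDf_PIK['Country Code'].isin(aggregateCodes) ].copy()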

In [None]:
# The PIK data doesn't come with names, so let's add them.
ghgDf_PIK['Country Name'] = [ countryDictionary[i] for i in ghgDf_PIK['Country Code'] ]

In [None]:
# From the user guide file included with this database, we only want:
# scenario = 'HISTORY', category = 'CAT0' (all emissions including LULUCF),
# and entity = 'KYOTOGHG' (the basket of Kyoto greenhouse gases).
# Luckily, the country codes are in ISO format and the format is otherwise similar
# to the World Bank's dataset. 'Country Code' is used as the index to match WB.

ghgDf_PIK = ghgDf_PIK.set_index(ghgDf_PIK['Country Code'])\
                .query("category == 'CAT0'").query("entity == 'KYOTOGHG'")\
                .drop(columns=['Country Code','category','entity','unit'])


In [None]:
# Convert from GgCO2e (same as KtCO2e) to MtCO2e
ghgDf_PIK.iloc[:,:-1] = ghgDf_PIK.iloc[:,:-1].divide(1000)

In [None]:
ghgDf_PIK.head()

### 1.1 Data Selection

Now, let's see how all the data compare, taking the simple case of world emissions from the World Bank, the two CAIT datasets, and PIK.

In [None]:
from ipywidgets import interact

@interact( x1=(1850,2014), x2=(1851,2015) )
def plot_world_emissions( x1=1970, x2=2015 ):
    # Create the figure first so that the plots below are drawn on it
    plt.figure(figsize=(10, 10), dpi=80)

    CAIT_world = ghgDf_CAITghg\
        .loc['WORLD',['Year','Total GHG Emissions Including Land-Use Change and Forestry (MtCO₂e‍)']]
    plt.plot(CAIT_world['Year'].values,CAIT_world.iloc[:,1])

    WB_world = ghgDf_WB.loc[['WLD']].melt(id_vars='Country Name',var_name='Year')
    plt.plot(WB_world.iloc[:,1].astype(int).values,WB_world.iloc[:,2])

    CAIT_world_co2 = ghgDf_CAITco2.loc['WORLD']
    plt.plot(CAIT_world_co2['Year'].values,CAIT_world_co2.iloc[:,-1])

    # Sum over all countries in the PIK set (the last column holds the country names)
    PIK_world = ghgDf_PIK.iloc[:,:-1].sum()
    plt.plot( PIK_world.index.astype(int) ,PIK_world.values)

    plt.xlim([x1,x2])
    plt.xlabel('Year')
    # All series are in MtCO2e/yr (or MtCO2/yr for the CAIT CO2 series) after the conversions above
    plt.ylabel('Emissions [MtCO2e/yr]')
    plt.title('Emissions [MtCO2e/yr] from All GHG Data Sources')
    plt.legend(['CAIT - All GHGs','World Bank - All GHGs','CAIT - CO2','PIK - All GHGs'])
    plt.show()

Unfortunately, the datasets disagree wherever they overlap. Note the large jump between the World Bank's 2012 value and CAIT's 2013 value. Merging them is therefore not a good idea.
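One way to put a number on such discrepancies is to compare two sources year by year over the years they share. The sketch below uses two hypothetical series with made-up values, not the actual datasets; with the real data, the series would be the world totals plotted above.

```python
import pandas as pd

# Hypothetical world totals in MtCO2e/yr from two sources (illustrative numbers only).
source_a = pd.Series({1990: 33000.0, 2000: 36000.0, 2010: 44000.0, 2012: 46000.0})
source_b = pd.Series({2000: 34200.0, 2010: 46200.0, 2012: 48300.0, 2014: 49000.0})

# Restrict to the years covered by both sources, then express B's deviation from A in percent.
overlap = source_a.index.intersection(source_b.index)
rel_diff_pct = (source_b[overlap] - source_a[overlap]) / source_a[overlap] * 100
print(rel_diff_pct.round(1))
```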

Upon further investigation, these discrepancies are rooted in the different methodologies. Both the World Bank and CAIT take CO2 emissions from the International Energy Agency (IEA), but each supplements them with different additional sources:

* World Bank: Uses IEA data together with the EDGAR database of the European Commission's Joint Research Centre ([EDGAR Methodology](http://edgar.jrc.ec.europa.eu/methodology.php))
* CAIT: Uses the "IEA source for CO₂ emissions from fossil fuel combustion from 1971 to 2011, and draws the remaining CO₂ and non-CO₂ emissions data from a variety of other sources including CDIAC, U.S. EPA, and FAO." ([CAIT Methodology](http://cait2.wri.org/faq.html))
* PIK: Consolidates many published datasets similar to the above (see section 3.1 of Nabel et al.). ([PIK Methodology](http://dataservices.gfz-potsdam.de/pik/showshort.php?id=escidoc:2959897) | [Nabel et al. (2011)](https://doi.org/10.1016/j.envsoft.2011.08.004))

Typical discrepancies relate to:
* Accounting for biomass emissions (some forms of biomass are treated as 'biogenic' and counted as zero)
* Natural fires and other land-based occurrences, which are very difficult to quantify.

**Therefore, going forward, we use only the PIK GHG data, which runs from 1850 to 2015. It is the most comprehensive, lies within the range of the others (by visual inspection), and is as valid as the others in that it is used by authorities and decision-makers around the world.**

### 1.2 BAU Forecasts

The Climate Watch platform at https://climatewatchdata.org, which is related to the CAIT data and is maintained by the World Resources Institute with support from other organizations, includes pathways from the Global Change Assessment Model (GCAM), among them a 'no policy' scenario for global emissions.

Only the last sheet of `GCAM.xlsx`, `GCAM_Timeseries data`, is needed, so it is read directly by name below.

In [None]:
# Use index_col=2, the region, as the index. Drop the Model column, since it is the same across the whole Df.
ghgForecast_GCAM = pd.read_excel("data/wri/Pathways/GCAM.xlsx",
                                 sheet_name = "GCAM_Timeseries data",index_col=2)\
                    .drop(['Model'],axis=1)
ghgForecast_GCAM.info()

In [None]:
# For sake of consistency, we should match these names up with the ISO codes.
# Since this data is from CAIT, hopefully we can match names and ISO codes with `ghgDf_CAITghg`
ghgForecast_GCAM[ghgForecast_GCAM.index.isin(ghgDf_CAITghg['Country'])].index.unique()

In [None]:
# And for completeness, check in with the World Bank country names:
GCAMinWB = ghgForecast_GCAM[ghgForecast_GCAM.index.isin(countryDictionary.values())].index.unique()
GCAMinWB

In [None]:
# Invert our earlier dictionary of country names and codes
countryDictionaryInv = {v: k for k, v in countryDictionary.items()}

In [None]:
# The United States is in the World Bank list. We do not need South Asia, but it will filter itself out later.
ghgForecast_GCAM.loc[ GCAMinWB, 'Country Code'] = \
[ countryDictionaryInv[i] for i in ghgForecast_GCAM.loc[ GCAMinWB ].index ]

# Drop everything else
ghgForecast_GCAM.dropna(axis='rows',subset=['Country Code'],inplace=True)
ghgForecast_GCAM = ghgForecast_GCAM.set_index('Country Code')

In [None]:
ghgForecast_GCAM.head()

Filter the dataset for just the information we're looking for: the 'No policy' scenario and for total GHG emissions. Note that all the emissions are in [MtCO2e/yr] format already.

In [None]:
ghgForecast_GCAM_BAU_all = \
    ghgForecast_GCAM[ (ghgForecast_GCAM['Scenario'] == 'No policy') &
                    (ghgForecast_GCAM['ESP Indicator Name'].str.startswith('Emissions|GHG')) ] \
                    .drop(columns=['Scenario','ESP Indicator Name','Unit of Entry'])

ghgForecast_GCAM_BAU_all.head()

In [None]:
# This dataset gives projections for each type of greenhouse gas, all in CO2e. These need to be summed for each country.
ghgForecast_GCAM_BAU_all = ghgForecast_GCAM_BAU_all.reset_index()

ghgForecast_GCAM_BAU = {}

for i in ghgForecast_GCAM_BAU_all['Country Code'].unique():
    ghgForecast_GCAM_BAU[i] = ghgForecast_GCAM_BAU_all[ghgForecast_GCAM_BAU_all['Country Code']==i].sum()['2005':]

ghgForecast_GCAM_BAU = pd.DataFrame(ghgForecast_GCAM_BAU).T
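
The per-country loop above sums every column with `.sum()`, including the text column `Country Code`, before slicing it away. A more idiomatic alternative is `groupby`, sketched here on hypothetical data as an option rather than as the notebook's method:

```python
import pandas as pd

# Hypothetical long-format projections: one row per (country, gas), one column per year.
per_gas = pd.DataFrame({
    "Country Code": ["AAA", "AAA", "BBB"],
    "2005": [10.0, 2.0, 5.0],
    "2030": [12.0, 3.0, 4.0],
})

# Sum all gases per country in one step; the country code becomes the index
# and only the numeric year columns are summed.
totals = per_gas.groupby("Country Code")[["2005", "2030"]].sum()
print(totals)
```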

In [None]:
ghgForecast_GCAM_BAU.head()

In [None]:
print('The total projected GHG emissions for the no policy scenario in 2030 is: {:.2f} MtCO2e/yr.'.format(
    ghgForecast_GCAM_BAU.loc['WLD','2030'] ) )

### 1.3 NDCs in 2030

The same ClimateWatch source contains NDCs in the format:

    ISO Country Code, Country Name, Goal Year, Value (in MtCO2e/yr), if goal is a range, and the type of goal.

In [None]:
NDCsDf_raw = pd.read_csv('data/wri/CW_NDC_quantification_April30.csv')
NDCsDf = NDCsDf_raw.dropna(axis=0).drop(328) #328 is a mis-entry, as determined through inspection

# Check data input
NDCsDf.head(5)

Some countries' NDCs are given as a range. For simplicity, this analysis will only examine the mean of that range.
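The pairwise averaging can also be written in vectorized form. The sketch below uses a hypothetical frame and assumes, like the loop that follows, that the lower and upper bounds of each ranged goal sit in adjacent rows flagged with `Range == 'Yes'`.

```python
import pandas as pd

# Hypothetical NDC rows: AAA's goal is a range stored as two consecutive rows.
ndcs = pd.DataFrame({
    "ISO": ["AAA", "AAA", "BBB"],
    "Value": [100.0, 140.0, 50.0],
    "Range": ["Yes", "Yes", "No"],
})

ranged = ndcs[ndcs["Range"] == "Yes"]
# Bounds of each pair: even positions are lower bounds, odd positions upper bounds.
lower = ranged["Value"].iloc[0::2].to_numpy()
upper = ranged["Value"].iloc[1::2].to_numpy()
# Store the midpoint in the first row of each pair ...
ndcs.loc[ranged.index[0::2], "Value"] = (lower + upper) / 2
# ... then drop the second row of each pair and the no-longer-needed flag.
ndcs = ndcs.drop(index=ranged.index[1::2]).drop(columns="Range")
print(ndcs)
```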

In [None]:
rangedIndices = NDCsDf[NDCsDf['Range'] == 'Yes'].index

# Note that each range is a pair
for i in range(0,len(rangedIndices)-1,2):    
    NDCsDf.loc[rangedIndices[i],'Value'] = (
        (NDCsDf['Value'][rangedIndices[i]] + NDCsDf['Value'][rangedIndices[i+1]])/2
    )
    
# Drop the column 'Range', since it is not really needed anymore,
# and drop the EU-28 (since they have been disaggregated by country).
NDCsDf = NDCsDf.drop(labels=rangedIndices[1::2], axis=0).drop(labels='Range', axis=1)
NDCsDf = NDCsDf.drop(index=NDCsDf.loc[NDCsDf['ISO'] == 'EU28'].index.values, axis=0)

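From here on, the situation in 2030 is the primary focus. Where countries have not submitted data for 2030, the furthest-out value is used. Furthermore, we take the best case: where a country lists several goals (e.g. both an unconditional and a conditional target), the lowest emissions value, i.e. the most ambitious goal, is used. In the code below, each country's goal is simply the minimum of all its entries, which matches this best case and, for declining targets, the furthest-out year.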

Note that the EU, which is collectively a large emitter, has only submitted NDCs for 2020.
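Selecting the best case per country amounts to a per-group minimum. The sketch below shows the same idea as the loop that follows, using `groupby` on hypothetical values; one difference is that `groupby` sorts the country codes, whereas the loop keeps their original order.

```python
import pandas as pd

# Hypothetical NDC entries in MtCO2e/yr: several goals per country.
ndcs = pd.DataFrame({
    "ISO": ["AAA", "AAA", "BBB", "CCC", "CCC"],
    "Value": [120.0, 95.0, 50.0, 300.0, 310.0],
})

# Best case per country: the lowest (most ambitious) emissions value.
best_case = ndcs.groupby("ISO")["Value"].min().rename("Goal").to_frame()
print(best_case)
print(best_case["Goal"].sum())  # best-case total: 445.0
```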

In [None]:
NDC_byCountry = []

for i in NDCsDf['ISO'].unique():
    NDC_byCountry.append(NDCsDf[NDCsDf['ISO']==i]['Value'].min() )

In [None]:
NDCs_clean = pd.DataFrame({'Country': NDCsDf['ISO'].unique(),'Goal':    NDC_byCountry})
NDCs_clean = NDCs_clean.set_index('Country')

In [None]:
print('In this best case, where all NDCs are met, 2030 emissions will be {:.2f} MtCO2e/yr.' \
     .format( NDCs_clean.values.sum() ))

### 1.4 Comparison of NDCs with Required Temperature Targets

Before we can directly compare NDCs with global emissions and targets, we have to filter some data. Not every country has submitted an NDC: as of 2018-10-23, only 177 of 195 UNFCCC members had. Countries that have not yet submitted NDCs are given the benefit of the doubt, so the global pathway projections must also be adjusted to exclude them.
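The notebook makes this adjustment with a ratio computed over reference years (2005 and 2010 in the code further down): the modelled world total divided by the historical total of the countries that have NDCs, averaged across those years. A minimal sketch of that ratio on hypothetical numbers:

```python
import pandas as pd

# Hypothetical inputs (MtCO2e/yr): a world total from a projection model and
# historical emissions of the countries that have submitted NDCs.
world_model = pd.Series({"2005": 40000.0, "2010": 44000.0})
covered = pd.DataFrame(
    {"2005": [20000.0, 12000.0], "2010": [22000.0, 14000.0]},
    index=["AAA", "BBB"],
)

# Per-year ratio of the modelled world total to the covered countries' total,
# averaged over the reference years.
ratio = world_model / covered.sum()
conv_factor = ratio.mean()
print(round(conv_factor, 4))
```

Dividing a global total by this factor approximates the emissions of the covered countries, under the assumption that their share of the total stays roughly constant over time.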

In 2018, the Intergovernmental Panel on Climate Change (IPCC) released a special report on the emissions pathways required to limit warming to 1.5ºC. It notes that "all but one" of the models require emissions to fall to *at most 35 GtCO2e/yr by 2030*. For 2.0ºC of warming, most pathways require *at most 50 GtCO2e/yr* in 2030.

>IPCC. (2018). IPCC special report on the impacts of global warming of 1.5 °C - Summary for policy makers. Retrieved from http://www.ipcc.ch/report/sr15/

In [None]:
ghgDf_CAITghg.pivot(columns='Year',
                  values='Total GHG Emissions Including Land-Use Change and Forestry (MtCO₂e‍)').head()

In [None]:
countriesWithNDCs = NDCsDf['ISO'].unique()

# Divide the 'World' values used in the GCAM projection by filtered actual GHG emissions in 2005 and 2010
convFactor = ghgForecast_GCAM_BAU.loc['WLD',['2005','2010']] \
            / ghgDf_PIK.reindex(countriesWithNDCs).loc[:,['2005','2010']].sum().values

# Take the average conversion factor between 2005 and 2010
convFactor = convFactor.mean()
convFactor

In [None]:
# Estimated 2015 emissions of the countries with NDCs: the common start point of all pathways
start2015 = ghgDf_PIK['2015'].sum()/convFactor

x1 = [2015,2030]
y1 = [start2015, ghgForecast_GCAM_BAU.loc['WLD','2030']/convFactor]  # no-policy (GCAM)
y2 = [start2015, 50000]                                               # 2.0ºC requirement
y3 = [start2015, NDCs_clean.values.sum()]                             # all NDCs met
y4 = [start2015, 35000]                                               # 1.5ºC requirement

# A degree-1 fit through two points is the straight line between them
LineNoPolicy = np.poly1d(np.polyfit(x1,y1,1))
LineLowTarget = np.poly1d(np.polyfit(x1,y2,1))
LineNDCs = np.poly1d(np.polyfit(x1,y3,1))
LineHighTarget = np.poly1d(np.polyfit(x1,y4,1))

plt.plot(x1, LineNoPolicy(x1), "-o", color = "black", label = "No policy (GCAM)")
plt.plot(x1, LineLowTarget(x1), "-o", color = "grey", label = "2.0ºC requirement (50 GtCO2e/yr)")
plt.plot(x1, LineNDCs(x1), "-o", color = "orange", label = "All NDCs met")
plt.plot(x1, LineHighTarget(x1), "-o", color = "green", label = "1.5ºC requirement (35 GtCO2e/yr)")

# Shade the gaps between consecutive pathways
plt.fill_between(x1, LineNoPolicy(x1), LineLowTarget(x1), color = "grey", alpha = 0.3)
plt.fill_between(x1, LineLowTarget(x1), LineNDCs(x1), color = "orange", alpha = 0.3)
plt.fill_between(x1, LineNDCs(x1), LineHighTarget(x1), color = "green", alpha = 0.3)

plt.xticks(range(2015, 2031, 5))
plt.xlabel('Year')
plt.ylabel('GHG Emissions per Year [MtCO2e/yr]')
plt.title('Comparison of GHG Emissions per Year, NDCs,\n No-policy GCAM Pathway, and Temperature Requirements')
plt.legend(loc = "lower left")
plt.show()
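Fitting a degree-1 polynomial through two points, as the plotting cell does with `np.polyfit`, simply yields the straight line between them. The sketch below makes that explicit with `np.interp` for every year from 2015 to 2030, using a hypothetical 2015 starting value, and checks that both approaches agree.

```python
import numpy as np

# Straight-line pathways between 2015 and 2030 (anchor values are hypothetical, MtCO2e/yr).
years = np.arange(2015, 2031)
start_2015 = 49000.0
targets_2030 = {"1.5C (35 Gt)": 35000.0, "2.0C (50 Gt)": 50000.0}

# np.interp evaluates the line through (2015, start) and (2030, end) at each year.
pathways = {
    name: np.interp(years, [2015, 2030], [start_2015, end])
    for name, end in targets_2030.items()
}

# A degree-1 polyfit through the same two points gives the same line.
slope, intercept = np.polyfit([2015, 2030], [start_2015, 35000.0], 1)
assert np.allclose(pathways["1.5C (35 Gt)"], slope * years + intercept)
print(pathways["1.5C (35 Gt)"][[0, 5, 15]])
```

The annual pathway makes it easy to read off intermediate years, such as the emissions level a straight-line 1.5ºC path would imply for 2020.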

## 2. Which countries are polluting more?

### 2.1 Top 10 greenhouse gas emitters

In [None]:
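# Sorted bar chart for 2015 greenhouse gas emissions.
# Exclude aggregate groups such as 'LDC' (least developed countries), which are not single countries.
aggregateCodes = list(CountryGroupCodes["Aggregate Code"]) + ["LDC"]
ghgDf_PIK_countries = ghgDf_PIK.drop(index=aggregateCodes, errors="ignore")

GHGTop10 = ghgDf_PIK_countries.sort_values(by = "2015", ascending = False).iloc[:10,:]
GHGTop10["2015"].plot(kind="bar")

plt.ylabel("GHG Emissions [MtCO2e/yr]")
plt.title('Top 10 Emitters in 2015')
plt.show()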

China, the USA and India are the largest emitters, followed by Russia, Indonesia, Brazil, Japan, Iran and Germany.

### 2.2 Top 10 Polluters by pledged NDCs

In [None]:
# See who the top 10 polluters are going to be once NDCs are met. Sort the NDC values. 

NDCsTop10 = NDCs_clean.sort_values(by="Goal", ascending=False)[:10]

NDCsTop10.plot(kind="bar")
plt.ylabel("NDC [MtCO2e/yr]")
plt.title("Top 10 Emitters if Pledged Nationally Determined Contributions Are Met")
plt.show()

Note: IND is India and IDN is Indonesia.

This graph shows the greenhouse gas emissions in 2030 if countries meet their currently pledged NDCs. In this case, the major emitters do not change, and the top three remain China, the USA and India. Pakistan and Malaysia, however, join the list: although they are not among the top 10 emitters in the first graph, their pledged NDC targets would place them among the top 10 emitters in 2030. This may be because other countries pledge larger reductions, or because Pakistan and Malaysia pledge smaller reductions than the others.

### 2.3 2030 Projected Emissions vs. Emissions with NDCs Achieved

It would be interesting to see how projected emissions compare with NDCs, and then to see who has to reduce the most.
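The reduction each country needs is the gap between its projected (BAU) emissions and its NDC, expressed as a share of the projection: (BAU - NDC) / BAU × 100. A sketch on hypothetical values, relying on pandas index alignment so that each projection is paired with the NDC of the same country:

```python
import pandas as pd

# Hypothetical 2030 values in MtCO2e/yr: business-as-usual projection vs. NDC goal.
bau_2030 = pd.Series({"AAA": 2000.0, "BBB": 800.0, "CCC": 500.0})
ndc_2030 = pd.Series({"AAA": 1500.0, "BBB": 760.0})

# Share of projected emissions a country must cut to meet its NDC.
# Subtraction aligns on country codes; CCC has no NDC, yields NaN, and is dropped.
reduction_pct = ((bau_2030 - ndc_2030) / bau_2030 * 100).dropna().round(1)
print(reduction_pct.sort_values(ascending=False))
```

Aligning on the index, rather than on the position of values in two separately sorted arrays, guarantees that each bar pair and percentage belongs to a single country.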

In [None]:
GHGTop10.index.values

In [None]:
import matplotlib.patches as mpatches

# Not every country has a GCAM projection for 2030, so keep only those of the
# current top 10 that have both a projection and an NDC, aligned by country code.
comparison = pd.DataFrame({
    "Projection": ghgForecast_GCAM_BAU["2030"].astype(float),
    "NDC": NDCs_clean["Goal"],
}).reindex(GHGTop10.index.values).dropna().sort_values("Projection", ascending=False)

# Values to use in the graph (same country order for both series)
y1 = comparison["Projection"].values
y2 = comparison["NDC"].values
x = np.arange(len(y1))

# Plot the projections and NDCs side by side for each country
bar_width = 0.35
plt.bar(x,y1,width=bar_width,color="green")
plt.bar(x+bar_width,y2,width=bar_width,color="purple")

# Label each pair of bars with the country name taken from the index
plt.xticks(x+bar_width/2,[countryDictionary[c] for c in comparison.index], rotation=45)
plt.ylabel("GHG Emissions [MtCO2e/yr]")
plt.title("2030 Projections vs. NDCs")

# Patches are used to build the legend for the two bar series
green_patch=mpatches.Patch(color="green",label="2030 Projection")
purple_patch=mpatches.Patch(color="purple",label="2030 NDC")
plt.legend(handles=[green_patch,purple_patch])

# Percentage that each country needs to reduce to achieve its target,
# written above its projection bar with one decimal place
for i in range(len(y1)):
    percentageDifference = (y1[i]-y2[i])*100/y1[i]
    plt.text(x=x[i], y=y1[i], s="{:.1f}%".format(percentageDifference))

plt.show()

This graph compares, for 2030, the business-as-usual (BAU) projection with the NDC of each country, and labels each pair with the percentage by which the country must reduce its projected emissions to meet its target. We observe that the USA has to do more than China, the top emitter. Brazil has to reduce its emissions the most to reach its target. Later, we plan to compare these percentage reductions with countries' historical debts to see how their targets relate to their historical emissions (how much they aim to do vs. how much they are responsible for).

## 3. Historical responsibility for climate change

#### Is it fair to put the same burden of greenhouse reduction on developing countries considering the historical emissions produced by developed countries?

Developing countries and international advocacy organizations have argued that, owing to their historical emissions, developed countries owe a "climate debt" to poor countries (Pickering & Barry, 2012). Developed countries enjoyed the fruits of industrial development long before developing countries did, and have used more than their fair share of Earth's capacity to absorb greenhouse gases. Now, the call to reduce global emissions to combat climate change constrains the development of developing countries. Therefore, developed countries should repay this climate debt by rapidly reducing their emissions and providing financial support to developing countries to upgrade their technologies (Pickering & Barry, 2012).

The UNFCCC acknowledges this point of contention through the principle of Common but Differentiated Responsibilities and Respective Capabilities (CBDR–RC) stating that the countries should *"protect the climate system for the benefit of present and future generations of humankind, on the basis of equity and in accordance with their common but differentiated responsibilities and respective capabilities"* thereby urging the developed countries to take the lead on climate action (UNFCCC, 1992). The Paris Agreement also reaffirmed this obligation of developed countries.

However, developed countries have argued for revising the crude 1992 definition of developing countries, under which 6 of the 10 richest nations in the world count as 'developing'. They stress that countries in a position to contribute financially should do so.

This section analyzes the cumulative historical emissions of the world's top emitters and sheds light on their NDC reductions in relation to their cumulative emissions and GDP per capita. Further, countries' contributions to the Green Climate Fund are analyzed to understand whether historical emitters are doing their part to support climate mitigation in developing countries.

References:
UNFCCC. (1992). United Nations Framework Convention on Climate Change. Retrieved from http://unfccc.int/files/essential_background/convention/background/application/pdf/convention_text_with_annexes_english_for_posting.pdf

Pickering, J., & Barry, C. (2012). On the concept of climate debt: Its moral and political value. Critical Review of International Social and Political Philosophy, 15(5), 667–685. https://doi.org/10.1080/13698230.2012.727311

The Telegraph. (2018). What is the Paris Agreement on climate change? Everything you need to know. Retrieved from https://www.telegraph.co.uk/business/0/paris-agreement-climate-change-everything-need-know/

### 3.1 Carbon Debt

Which countries are the major emitters of greenhouse gases considering the emissions from the year 1850?
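Historical responsibility is measured here as cumulative emissions: each country's annual emissions summed over every recorded year since 1850. A sketch of that calculation on hypothetical long-format records, mirroring the `pivot_table` step used below:

```python
import pandas as pd

# Hypothetical long-format CO2 records (MtCO2/yr), one row per country-year.
records = pd.DataFrame({
    "Country Code": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    "Year": [1850, 1851, 1852, 1850, 1851, 1852],
    "CO2": [1.0, 2.0, 3.0, 0.5, 4.0, 4.0],
})

# Wide table (countries x years), then sum across years for the cumulative total.
wide = records.pivot_table(values="CO2", index="Country Code", columns="Year", aggfunc="sum")
cumulative = wide.sum(axis=1).sort_values(ascending=False)
print(cumulative)
```

Ranking by the cumulative total rather than by the latest year is what lets early industrializers rank highly even if their current annual emissions are modest.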

In [None]:
CO2From1850_CAIT = pd.read_excel("data/wri/CW_CAIT_GHG_Emissions_31102017_ISO.xlsx", sheet_name="CO2 Total Emissions",\
                                 index_col = "ISO")
CO2From1850_CAIT.index.rename("Country Code", inplace=True)
CO2From1850_CAIT.fillna(value = 0, inplace=True)
CO2From1850_CAIT.drop(columns = "Country", inplace = True)
CO2From1850_CAIT.head()

In [None]:
CO2From1850_CAIT = CO2From1850_CAIT.pivot_table(values = "Total CO2 Emissions Excluding Land-Use Change and Forestry (MtCO2)",\
                                                index = "Country Code", columns = "Year")
CO2From1850_CAIT["Cummulative Emissions"] = CO2From1850_CAIT.sum(axis=1)
CountriesHighHistDebt = CO2From1850_CAIT.sort_values(by = "Cummulative Emissions", ascending=False).iloc[:15, :]
CountriesHighHistDebt = CountriesHighHistDebt[["Cummulative Emissions"]]
CountriesHighHistDebt.drop(index= ["WORLD", "EU28"], inplace=True) 

CountriesHighHistDebt.head()

In [None]:
CountriesHighHistDebt.plot(kind="bar", title = "Cumulative CO2 emissions since 1850", legend = False)
plt.ylabel("MtCO2")
plt.show()

The above plot shows that the USA has been the largest historical emitter of CO2 (excluding land-use change and forestry), followed by China, Russia, Germany and the United Kingdom.

### 3.2 Time series of greenhouse gas emissions for the top 10 present-day emitters (1990 to 2014)

In [None]:
GHGTop10.columns

In [None]:
# Time series of GHG emissions for the top 10 present-day emitters (GHGTop10, section 2.1)

plt.figure(figsize=(12, 10), dpi=80)

for code in GHGTop10.index:
    # Select 1990-2014 by column label and plot it against integer years
    row = GHGTop10.loc[code, '1990':'2014']
    plt.plot(row.index.astype(int), row.values.astype(float), label=countryDictionary[code])

plt.xlabel("Year")
plt.ylabel("GHG Emissions [MtCO2e/yr]")
plt.title("GHG Emissions of the Top 10 Emitters, 1990-2014")
# Place the legend outside the plot area, to the right
plt.legend(loc=(1.05, 0.3))

plt.show()

This graph shows the emission time series of the current top 10 emitters. China, the world's top emitter, shows increasing emissions from the 1990s until 2012. The USA, the second-largest emitter, shows a comparatively stable curve over the years; until 2004, it was the world's top emitter. India........

### 3.3 Relation between historical debt, % reduction in greenhouse gases promised and GDP per capita
How does the historical debt relate to countries' present GDP per capita (2017 values from the [World Bank](https://data.worldbank.org/indicator/NY.GDP.PCAP.CD)) and to the % reduction in greenhouse gases promised for 2030?

In [None]:
# Placeholder: random % reduction values, used only to lay out the scatter plot until the real reductions are computed
CountriesHighHistDebt["% reduction"] = np.random.uniform(size = len(CountriesHighHistDebt.index))

GDPperCapitaWB = pd.read_csv("data/world bank/GDP_per_capita.csv", skiprows=4, index_col = "Country Code", header = 0)
GDPperCapitaWB.dropna(how="all", axis=0, inplace=True)
GDPperCapitaWB.dropna(how="all", axis=1, inplace=True)
GDPperCapitaWB.drop(columns = ["Indicator Name","Indicator Code"], inplace = True)
GDPperCapitaWB.iloc[:,1:] = GDPperCapitaWB.iloc[:,1:].interpolate(axis = 1)
GDPperCapitaWB.head()
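The cleaning step uses `interpolate(axis=1)`, which fills missing years linearly from the neighbouring years in the same row, i.e. separately for each country. The default `method='linear'` treats the columns as equally spaced, which fits consecutive yearly columns. A small illustration with made-up values; note that a leading gap (before the first reported year) stays NaN.

```python
import numpy as np
import pandas as pd

# Made-up GDP-per-capita rows with gaps; columns are years
gdp = pd.DataFrame(
    {"2015": [np.nan, 100.0], "2016": [10.0, np.nan],
     "2017": [np.nan, 300.0], "2018": [30.0, 400.0]},
    index=["AAA", "BBB"],
)

# Interpolate along each row (across years), not down the columns
filled = gdp.interpolate(axis=1)
print(filled)
```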

In [None]:
GDPperCapita2017 = GDPperCapitaWB[["2017"]]
GDPperCapita2017.head()

In [None]:
# Join the two datasets on their index; GDPperCapita2017 is indexed by 'Country Code', so the emissions table must use the same codes
HistoricalDebtMergedDf = CountriesHighHistDebt.join(GDPperCapita2017)
HistoricalDebtMergedDf
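`DataFrame.join` aligns rows on the index and by default keeps every row of the left table (`how='left'`), so countries without a GDP entry end up with NaN and are silently left out of the scatter plot. A small sketch, with made-up codes and values, showing how to list the unmatched rows before plotting; it assumes both tables are indexed by the same kind of country code.

```python
import pandas as pd

# Made-up stand-ins for the emissions and GDP tables
emissions = pd.DataFrame({"Cummulative Emissions": [400000.0, 30000.0, 5000.0]},
                         index=["USA", "NLD", "XXX"])
gdp = pd.DataFrame({"2017": [60000.0, 50000.0]}, index=["USA", "NLD"])

merged = emissions.join(gdp)  # left join on the index
unmatched = merged.index[merged["2017"].isna()]
print(list(unmatched))  # ['XXX']
```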

In [None]:
# Bubble chart: cumulative emissions (x) vs. promised reduction (y);
# marker size scales with 2017 GDP per capita
plt.scatter(x=HistoricalDebtMergedDf["Cummulative Emissions"],
            y=HistoricalDebtMergedDf["% reduction"],
            s=HistoricalDebtMergedDf["2017"] / 100)
plt.xscale("log")
plt.xlabel("Cumulative emissions (MtCO2)")
plt.ylabel("% reduction in GHGs by 2030")

# Label each point with its country (.ix was removed in pandas 1.0)
for country, row in HistoricalDebtMergedDf.iterrows():
    plt.text(x=row["Cummulative Emissions"], y=row["% reduction"], s=country)

plt.grid(True)
plt.xticks([10000, 100000, 1000000], ["10k", "100k", "1000k"])

# Divide the plot into four quadrants. axhline/axvline are used because
# plotting hand-built divider lines (an earlier attempt) shifted the graph.
plt.axhline(y=0.5, color='b')     # horizontal divider
plt.axvline(x=100000, color='b')  # vertical divider

# Label the four quadrants
plt.text(x=150000, y=0.95, s="Much guilt, much effort!", color='purple')
plt.text(x=15000, y=0.95, s="Less guilt, much effort!", color='purple')
plt.text(x=15000, y=0.02, s="Less guilt, less effort!", color='purple')
plt.text(x=150000, y=0.02, s="Much guilt, less effort!", color='purple')

plt.show()

# TODO: replace the random % reductions with the difference between the 2030
# projections and the NDCs (from the previous graph), then interpret the graph.
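The four labelled regions can also be computed directly, which makes the interpretation easier to check than reading it off the plot. A minimal sketch using the same thresholds as the dividing lines (100,000 on the emissions axis, 0.5 on the reduction axis); the function name `classify_quadrant` is hypothetical, and points lying exactly on a threshold are assigned to the "much" side as an arbitrary choice.

```python
def classify_quadrant(cumulative_emissions: float, reduction: float,
                      emissions_threshold: float = 100_000,
                      reduction_threshold: float = 0.5) -> str:
    """Map a country to one of the four quadrant labels used in the plot."""
    guilt = "Much guilt" if cumulative_emissions >= emissions_threshold else "Less guilt"
    effort = "much effort!" if reduction >= reduction_threshold else "less effort!"
    return f"{guilt}, {effort}"


print(classify_quadrant(400_000, 0.2))  # Much guilt, less effort!
```

Applied row by row, e.g. `HistoricalDebtMergedDf.apply(lambda r: classify_quadrant(r["Cummulative Emissions"], r["% reduction"]), axis=1)`, this would give one label per country.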

### 3.4 Green Climate Fund Pledges

The Green Climate Fund (GCF) is an international fund set up through the UNFCCC to help developing nations build projects that align with global emissions reduction goals. Projects include those for renewable electricity, reducing ...

All pledges made by countries are [listed online](https://www.greenclimate.fund/how-we-work/resource-mobilization) in an interactive table, but the page does not provide the data in a clean format, so the table is scraped with BeautifulSoup.

In [None]:
import requests
from bs4 import BeautifulSoup

page_name = 'https://www.greenclimate.fund/how-we-work/resource-mobilization'
page = requests.get(page_name)
page.raise_for_status()  # stop early if the page could not be fetched

soup = BeautifulSoup(page.text, 'html.parser')

# The pledges are the rows of the first <tbody> in the table with class 'res-table'
contrib = soup.find(class_='res-table')
contribItems = contrib.find('tbody').find_all('tr')

country, announced = [], []

for item in contribItems:
    scrapedInfo = item.find_all('td')
    country.append(scrapedInfo[0].get_text(strip=True))
    # Amounts contain '$', ',', '<' and a trailing 'M' (millions): strip them, convert to US$
    amount = (scrapedInfo[1].get_text(strip=True)
              .replace('$', '').replace(',', '').replace('<', '').rstrip('M').strip())
    announced.append(float(amount) * 1000000)
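The chained `replace`/`strip` calls are easier to test as a separate function. A minimal sketch, assuming the amounts on the page look like "$1,000.5M" or "<$0.1M" (millions of US dollars); the name `parse_pledge_usd` is hypothetical, and values marked with "<" are treated as equal to the stated bound, as in the scraping loop.

```python
def parse_pledge_usd(text: str) -> float:
    """Convert a pledge string such as '$1,000.5M' to US dollars."""
    cleaned = (text.strip()
               .replace("$", "")
               .replace(",", "")
               .replace("<", "")
               .strip()
               .rstrip("M")   # drop the trailing 'millions' marker
               .strip())
    return float(cleaned) * 1_000_000


print(parse_pledge_usd("$250M"))  # 250000000.0
```

Keeping the parsing in one function means a change in the page's number format only needs fixing (and re-testing) in one place.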



In [None]:
gcfBS = pd.DataFrame({'Pledges':announced},index=country)

In [None]:
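# Add each country's code where its name is found in the lookup dictionary
for i in gcfBS.index:
    if i in countryDictionaryInv:
        gcfBS.loc[i, 'Country Code'] = countryDictionaryInv[i]
    # Countries missing from the dictionary are skipped and keep NaN as their code
gcfBS.index.name = 'Country'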
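The code lookup can also be written without an explicit loop: `Index.map` with a dictionary returns NaN for names that are not in the dictionary, which matches the loop's behaviour of leaving unmatched countries without a code. A sketch with a made-up two-entry dictionary standing in for `countryDictionaryInv` and made-up pledge values.

```python
import pandas as pd

# Made-up stand-ins for the scraped pledges and the name -> code dictionary
pledges = pd.DataFrame({"Pledges": [2.0e9, 1.0e9, 1.0e6]},
                       index=["United States", "Japan", "Atlantis"])
name_to_code = {"United States": "USA", "Japan": "JPN"}

# Vectorized lookup: unknown names map to NaN
pledges["Country Code"] = pledges.index.map(name_to_code)
print(pledges)
```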

In [None]:
# gcfBS.reset_index().set_index('Country Code')

In [None]:
# From visual inspection, the column 'Year' and the index 'World' are not needed
gcfDf = pd.read_csv('data/green-climate-gcf-fund-pledges.csv',index_col=0).drop(columns='Year').drop('World')

# Rename the columns to be more readable and index name to be 'Country' instead of 'Entity'
gcfDf = gcfDf.rename(columns={'Code':'Country Code',
                      'Signed pledges (GCF) (US$ per year)':'Pledges'})
gcfDf.index.name = 'Country'

# Check the data
gcfDf.head()

In [None]:
gcfBS.plot.bar(y='Pledges')  # bars use the index (country names); plot.bar has no 'by' argument
plt.show()

In [None]:
gcfBS['Pledges'].divide(1e9).sort_values(ascending=False).plot.bar()
plt.ylabel('Signed Pledge in billions $US/year')
plt.show()