# Import/Export-Adjusted Greenhouse Gas Emissions
## Final Assignment for EPA1333 Computer Engineering for Scientific Computing

Authors:

Patrick Steinmann #4623991

Stefan Wigman     #4016246


## Abstract

High-pollutant industrial processes often take place in developing countries, and the resulting products are often exported to developed countries. We analyze this "offshoring" of greenhouse gas (GHG) emissions by considering country-to-country import/export balances and national GHG emissions. We attempt to assign each country its "true" GHG emissions by determining which emissions that country causes in other countries and then attributing these "offshored" emissions accordingly. We find that #TODO

## Introduction

In partial fulfillment of the course requirements of EPA1333, we were tasked with conducting an original and non-trivial data analysis related to climate change.

We chose to investigate the phenomenon of "outsourcing" greenhouse gas (GHG) emissions. Many emission-intensive activities take place in countries with poor emissions records. However, these countries often export the products of these activities to countries with much better emissions records. In essence, the emissions are being outsourced. A simple example is the import of electrical energy. Highly polluting coal is burned in a power plant in poorly developed country A, and the generated energy is exported to highly developed country B. Country B can claim low GHG emissions, since the coal is burned in A, which, as a poorly developed country, has much more leeway regarding pollution. However, the resulting emissions should really be attributed to country B, since that is where the energy is consumed.

Our research question therefore is as follows:

*When considering a country's import/export-adjusted emissions, do these differ significantly from its claimed emissions records, and how has this developed over time?*

## Methodology

### Approach

We tackled our research question by first finding, importing, and cleaning import/export data between countries. Specifically, we were interested in total imports and exports (that is, goods and services) from and to each country over our time range of interest. We defined this time range as 1995 to 2015. This 20-year span is long enough to reveal trends while staying largely within the period for which useful data is available.

We then obtained data on every country's GDP and GHG emissions. These emissions are reported as total GHG emitted over a year in a country, irrespective of use/destination.

$$\text{emissions}_{\text{nominal}} = \text{emissions}_{\text{self}} + \text{emissions}_{\text{export}}$$

By comparing export volume with GDP (i.e. exports as a share of GDP), we determine what percentage of a country's GHG emissions is "self-caused" and what percentage is "offshored". Offshored emissions are those created by producing goods and services destined to be exported. In essence, these emissions are the responsibility of the country importing those products, not of the emitter.

We could then assign each country, based on its imports, a share of its import partners' export-related GHG emissions, thus arriving at each country's import/export-adjusted (or "true") emissions.

$$\text{emissions}_{\text{true}} = \text{emissions}_{\text{self}} + \text{emissions}_{\text{import}}$$
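The two equations can be made concrete with a small worked example. This is a minimal sketch under stated assumptions, not the project's actual code. The three countries A, B, C and all numbers are hypothetical. Exports as a fraction of GDP are used as the fraction of emissions that is export-related. Each exporter's export emissions are then split across its partners in proportion to trade volume, in line with the "similar product palette" assumption listed below.

```python
import pandas as pd

# Hypothetical toy inputs for three countries A, B, C (illustrative, not real data)
emissions_nominal = pd.Series({"A": 100.0, "B": 50.0, "C": 20.0})  # kt CO2e
export_share = pd.Series({"A": 0.4, "B": 0.2, "C": 0.1})          # exports as a fraction of GDP

# exports.loc[exporter, importer] = value exported from exporter to importer
exports = pd.DataFrame(
    [[0.0, 30.0, 10.0],
     [10.0, 0.0, 10.0],
     [5.0, 5.0, 0.0]],
    index=["A", "B", "C"], columns=["A", "B", "C"],
)

# Split nominal emissions into a domestic part and an export part
emissions_export = emissions_nominal * export_share
emissions_self = emissions_nominal - emissions_export

# Share of each exporter's total exports going to each importer (rows sum to 1)
partner_share = exports.div(exports.sum(axis=1), axis=0).fillna(0.0)

# Each importer takes on its share of every exporter's export emissions
emissions_import = partner_share.mul(emissions_export, axis=0).sum(axis=0)

emissions_true = emissions_self + emissions_import
print(emissions_true)
```

Each row of `partner_share` sums to 1, so the adjustment only moves emissions between countries. The global total stays unchanged: 170 kt of CO2e in this toy example.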

### Assumptions & Simplifications

* countries export a broadly similar product palette to every export partner
* no re-export or re-import

## Results/Work

### Setup

In a first step, we import all packages used throughout this notebook. Most of them ship with Anaconda by default. The exception is wbdata, a package that provides access to the World Bank's World Development Indicators (WDI) data in an efficient, pandas-integrated fashion.

We also import two custom .py files, DataFunctions and ProjectFunctions. DataFunctions consists of a specialized set of functions

#TODO difference between DataFunctions and ProjectFunctions?

#TODO explain package-dependent functions...

In [11]:
import requests
import pandas as pd
from pathlib import Path
import numpy as np
import os
from DataFunctions import *
from ProjectFunctions import *
import datetime
import wbdata

We override a default pandas option to suppress the SettingWithCopyWarning that chained assignments would otherwise raise.

In [4]:
pd.options.mode.chained_assignment = None  # default='warn'
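This option only silences the warning. It does not make a chained assignment write into the original frame. Under pandas' Copy-on-Write mode, chained assignment never updates the original at all. A minimal illustration of the single-step alternative, using a hypothetical toy frame:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["r0", "r1"])

# The chained form, df.loc["r1"]["A"] = 99, may write into a temporary copy,
# which is what triggers SettingWithCopyWarning. A single .loc call with both
# the row and the column label writes into df itself.
df.loc["r1", "A"] = 99
print(df)
```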

As we intend to use a pandas multi-index dataframe, we create an IndexSlice object to make multi-index slicing syntax more natural. This is optional.

In [5]:
idx = pd.IndexSlice
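A short sketch of what `idx` enables. The toy frame is hypothetical and only mimics the layout used later: rows are (year, partner), columns are reporters. It selects one partner across all years.

```python
import pandas as pd

idx = pd.IndexSlice

# Hypothetical toy frame: rows are (year, partner), columns are reporters
index = pd.MultiIndex.from_product([[1995, 1996], ["NLD", "DEU"]], names=["Year", "Partner"])
df = pd.DataFrame({"NLD": [0, 1, 2, 3], "DEU": [4, 5, 6, 7]}, index=index).sort_index()

# All years, partner "DEU" only; slicing across index levels needs a sorted index
deu_rows = df.loc[idx[:, "DEU"], :]
print(deu_rows)
```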

### Data Import & Cleaning

#### Country to Country Trade Data

We first import the raw country-to-country trade data from a CSV file, using suitable encoding.

In [6]:
trade_data = pd.read_csv("raw_data/DataJobID-1257172_1257172_TestQuery.csv", encoding="ISO-8859-1")

A thorny aspect of dealing with country-level data is the wildly differing labelling standards. Databases variously use full country names (in various spellings), two-letter ISO codes, three-letter ISO codes, three-letter IOC (International Olympic Committee) codes, or other identifiers, which makes aligning datasets difficult. We therefore use ISO3 as our common identifier and create a dictionary to convert between country names and ISO3 codes.

In [8]:
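# Map each reporter's country name to its ISO3 code, and build the inverse mapping
dic_cols = ['ReporterISO3', 'ReporterName']
dic_df = trade_data[dic_cols].drop_duplicates()
country_dic = dic_df.set_index('ReporterName')['ReporterISO3'].to_dict()  # name -> ISO3
inv_country_dic = {v: k for k, v in country_dic.items()}                  # ISO3 -> name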
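Inverting a dictionary silently keeps only one name per code if two spellings share an ISO3 code. The following sanity check is a hypothetical addition, run here on a toy stand-in for `trade_data`. It confirms that the mapping is one-to-one.

```python
import pandas as pd

# Hypothetical toy stand-in for trade_data (only the two columns used here)
trade_data = pd.DataFrame({
    "ReporterISO3": ["NLD", "NLD", "DEU", "CHE"],
    "ReporterName": ["Netherlands", "Netherlands", "Germany", "Switzerland"],
})
dic_df = trade_data[["ReporterISO3", "ReporterName"]].drop_duplicates()
country_dic = dic_df.set_index("ReporterName")["ReporterISO3"].to_dict()
inv_country_dic = {v: k for k, v in country_dic.items()}

# A one-to-one mapping has the same number of entries in both directions
assert len(country_dic) == len(inv_country_dic) == len(dic_df)
```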

We intend to build a multi-index dataframe to hold trade data between countries over a range of years. A multi-index (hierarchical index) lets a two-dimensional dataframe represent higher-dimensional data. In our case, the data has three dimensions. For each year (time being the third dimension), a two-dimensional block holds the country-to-country trade data.

To build the multi-index, we first define its levels: the years and the countries.

In [9]:
years = list(range(1995,2016))
countries=list(trade_data['ReporterName'].unique())

We can then build the structure of the multi-index dataframe.

In [12]:
data = build_multi_index_df(years,countries)
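`build_multi_index_df` is a custom function whose code is not shown in this notebook. The following is one plausible way to build such a structure, and it is an assumption. It presumes rows indexed by (year, partner country), reporter countries as columns, and NaN as the initial value, a layout consistent with the later `index_col=[0,1]` re-import. The function name is hypothetical.

```python
import numpy as np
import pandas as pd

def build_multi_index_df_sketch(years, countries):
    """Hypothetical stand-in for build_multi_index_df: one country-by-country block per year."""
    index = pd.MultiIndex.from_product([years, countries], names=["Year", "Partner"])
    return pd.DataFrame(np.nan, index=index, columns=countries)

data = build_multi_index_df_sketch([1995, 1996], ["NLD", "DEU"])
print(data.loc[1995])  # the 2x2 block for 1995
```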

Next, we fill the structure with values from the trade data. This iterative approach is quite slow, so we use the IPython `%%timeit` magic to measure its execution time. In our runs, it took around 6-8 minutes.

In [14]:
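%%timeit -n1 -r1

# Caution: takes roughly 6-8 minutes!
for index, row in trade_data.iterrows():
    reporter, partner = row['ReporterName'], row['PartnerName']
    if partner not in countries:
        continue  # skip partners that never appear as reporters, so .loc does not add new rows
    for year in years:
        year_key = str(year) + " in 1000 USD "
        # Rows are (year, partner), columns are reporters; one .loc call writes into data itself
        data.loc[(year, partner), reporter] = row[year_key]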

1 loop, best of 1: 7min 13s per loop
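A vectorized reshape would likely avoid the row-by-row loop. The following is a minimal sketch, not the notebook's method. It runs on a hypothetical two-row stand-in for `trade_data` whose year columns follow the same `"<year> in 1000 USD "` naming. It unpivots the year columns with `melt` and pivots back into the (year, partner) by reporter layout. Duplicate reporter/partner rows, if any, keep the last value, as the loop does.

```python
import pandas as pd

years = [1995, 1996]
# Hypothetical toy stand-in for trade_data
trade_data = pd.DataFrame({
    "ReporterName": ["Netherlands", "Germany"],
    "PartnerName": ["Germany", "Netherlands"],
    "1995 in 1000 USD ": [10.0, 20.0],
    "1996 in 1000 USD ": [11.0, 21.0],
})
year_cols = {f"{y} in 1000 USD ": y for y in years}

# Long format: one row per (reporter, partner, year)
long = trade_data.melt(id_vars=["ReporterName", "PartnerName"],
                       value_vars=list(year_cols), var_name="Year", value_name="Value")
long["Year"] = long["Year"].map(year_cols)

# Rows: (Year, PartnerName); columns: ReporterName
wide = long.pivot_table(index=["Year", "PartnerName"], columns="ReporterName",
                        values="Value", aggfunc="last")

# Reindex so every (year, country) row and every reporter column exists, NaN where missing
countries = sorted(trade_data["ReporterName"].unique())
full_index = pd.MultiIndex.from_product([years, countries], names=["Year", "PartnerName"])
data = wide.reindex(index=full_index, columns=countries)
print(data)
```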


This data contains many NaN (Not a Number) values, which we fill with 0.

In [15]:
data_filled=data.fillna(0)

To make data handling easier, we write the created multi-index dataframe to a TSV (tab-separated values) file.

In [16]:
data_filled.to_csv('trade_data.tsv', sep='\t')

We then re-import that TSV file. This makes working with the data much easier. Instead of recreating it every time we run the notebook, we can simply load it from the TSV file.

In [17]:
imported_data = pd.read_table('trade_data.tsv', index_col=[0,1])

To ensure the data has not been re-shaped during the write/read, we compare it to the original.

In [18]:
all(imported_data == data_filled)

True
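A caveat on this check: iterating over a DataFrame yields its column labels. As a result, `all(df1 == df2)` only tests that the labels are truthy, and it returns `True` even when values differ. The following sketch uses hypothetical one-column frames to contrast it with element-wise alternatives. `DataFrame.equals` additionally requires matching dtypes.

```python
import pandas as pd

a = pd.DataFrame({"x": [1.0, 2.0]})
b = pd.DataFrame({"x": [1.0, 99.0]})

print(all(a == b))           # True, because only the column label "x" is checked
print((a == b).all().all())  # False: element-wise check reduced over both axes
print(a.equals(b))           # False: also compares dtypes and shape
```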

#### World Bank: World Development Indicators Data

In an external Excel sheet, we first define which WDI indicators we would like to import through the wbdata API.

In [19]:
indicator_dataframe, indicators, tabnames=GetIndicatorsWB(file='Selected_Indicators.xlsx', sheet='Indicators')

Next, we import income-group and region data for every country.

In [20]:
countries1=GetRegionIncomeDataWB()

We then import WDI data for the selected indicators based on 2015 numbers. Our custom function for this attempts to fill in missing values using older data where possible, going back to 2010 at the earliest.

In [21]:
wb_indicator_data = GetDataWB(indicators, 2010, 2015)  # not named wbdata, which would shadow the imported wbdata module

We add the indicator data to the countries' income and region data.

In [22]:
wb_data_countries = countries1.join(wb_indicator_data, how='inner')
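The year fallback inside `GetDataWB` is not shown in this notebook. The following sketches the idea, taking the most recent non-missing value per country between 2010 and 2015. It is an assumption about the logic, not the function's actual code, and it uses a hypothetical long-format extract with made-up `gdp` and `ghg` columns.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format WDI extract: one row per (country, year)
raw = pd.DataFrame({
    "country": ["NLD", "NLD", "NLD", "CHE", "CHE"],
    "year": [2013, 2014, 2015, 2010, 2015],
    "gdp": [1.0, 2.0, np.nan, 5.0, 6.0],
    "ghg": [10.0, np.nan, np.nan, 50.0, np.nan],
})

in_window = raw[raw["year"].between(2010, 2015)].sort_values("year")
# GroupBy.last() skips NaN, so it returns the most recent available value per column
latest = in_window.groupby("country")[["gdp", "ghg"]].last()
print(latest)
```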

To account for missing indicator data, we use two functions. The first identifies which countries are missing data and then fills the gaps using countries in the same region with a comparable income level. We do this because we assume that similarly developed countries in the same region will have comparable WDI indicator statistics.

As this does not cover all countries, we then run a simplified version of this method, matching only on region. This guarantees that there will be data for every country, but the data is less accurate.
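The two-stage fill can be sketched with `groupby(...).transform`. In this hypothetical version, each gap is filled with the mean of its (Region, IncomeGroup) peers, and any remaining gaps with the regional mean. The choice of the mean is an assumption; `FillByRegionAndIncomeWB` and `FillByRegionWB` may aggregate differently, and the `ghg` column is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps
df = pd.DataFrame({
    "Region": ["Europe", "Europe", "Europe", "Africa", "Africa"],
    "IncomeGroup": ["High", "High", "Low", "Low", "Low"],
    "ghg": [10.0, np.nan, np.nan, 4.0, np.nan],
})

# Stage 1: fill from countries sharing both region and income group
df["ghg"] = df["ghg"].fillna(df.groupby(["Region", "IncomeGroup"])["ghg"].transform("mean"))
# Stage 2: fill what is left (groups with no data at all) from the whole region
df["ghg"] = df["ghg"].fillna(df.groupby("Region")["ghg"].transform("mean"))
print(df)
```

The first stage cannot fill the Europe/Low row, because that group has no data. The second stage then fills it from the European average, which mirrors the coverage-versus-accuracy trade-off described above.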

In [23]:
region_income_data=FillByRegionAndIncomeWB(wb_data_countries)
region_income_data=FillByRegionWB(region_income_data)

We verify that we have a complete data set using another custom function.

In [24]:
DataCompleteness(region_income_data)

Country Data                                                    100.0
Region                                                          100.0
IncomeGroup                                                     100.0
Exports of goods and services (% of GDP)                        100.0
GDP (current US$)                                               100.0
Total greenhouse gas emissions (kt of CO2 equivalent)           100.0
Exports of goods and services (% of GDP) source                 100.0
GDP (current US$) source                                        100.0
Total greenhouse gas emissions (kt of CO2 equivalent) source    100.0
dtype: float64
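The output above reports, per column, the percentage of rows with data. A hypothetical one-line equivalent of `DataCompleteness` (the custom function itself is not shown) could look like this:

```python
import pandas as pd

def data_completeness_sketch(df):
    """Hypothetical stand-in for DataCompleteness: percent of non-missing values per column."""
    return df.notna().mean() * 100

sample = pd.DataFrame({"Region": ["Europe", None], "GDP": [1.0, 2.0]})
completeness = data_completeness_sketch(sample)
print(completeness)
```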


### Shaping Data

### Connecting Data

### Visualizations

## Analysis

## Conclusion

## Reflection

## References & Sources