# Importing the Raw Datasets

In [1]:
import pandas as pd

wdi_df = pd.read_csv("./data/WDIData.csv")

# Output the first 100 records as a sample set
wdi_df.head(100).to_csv("./data/WDIData_sample.csv", index=False)

wdi_df.head()


,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2013,2014,2015,2016,2017,2018,2019,2020,2021,Unnamed: 66
0,Africa Eastern and Southern,AFE,Access to clean fuels and technologies for coo...,EG.CFT.ACCS.ZS,,,,,,,...,16.936004,17.337896,17.687093,18.140971,18.491344,18.82552,19.272212,19.628009,,
1,Africa Eastern and Southern,AFE,Access to clean fuels and technologies for coo...,EG.CFT.ACCS.RU.ZS,,,,,,,...,6.499471,6.680066,6.85911,7.016238,7.180364,7.322294,7.517191,7.651598,,
2,Africa Eastern and Southern,AFE,Access to clean fuels and technologies for coo...,EG.CFT.ACCS.UR.ZS,,,,,,,...,37.855399,38.046781,38.326255,38.468426,38.670044,38.722783,38.927016,39.042839,,
3,Africa Eastern and Southern,AFE,Access to electricity (% of population),EG.ELC.ACCS.ZS,,,,,,,...,31.79416,32.001027,33.87191,38.880173,40.261358,43.061877,44.27086,45.803485,,
4,Africa Eastern and Southern,AFE,"Access to electricity, rural (% of rural popul...",EG.ELC.ACCS.RU.ZS,,,,,,,...,18.663502,17.633986,16.464681,24.531436,25.345111,27.449908,29.64176,30.404935,,


In [2]:
climate_df = pd.read_csv("./data/climate_watch_data.csv")

climate_df.head()


,Country,Data source,Sector,Gas,Unit,2019,2018,2017,2016,2015,...,1999,1998,1997,1996,1995,1994,1993,1992,1991,1990
0,World,CAIT,Total including LUCF,All GHG,MtCO₂e,49758.23,49368.04,48251.88,47531.68,46871.77,...,35101.9,35099.21,35537.18,34179.33,33805.61,33015.04,32729.06,32588.09,32670.51,32523.58
1,World,CAIT,Total excluding LUCF,All GHG,MtCO₂e,48116.56,47980.47,47031.82,46264.07,46085.31,...,33282.05,33088.07,32855.85,32467.96,31890.42,31105.68,30819.73,30678.77,30761.18,30614.25
2,World,CAIT,Energy,All GHG,MtCO₂e,37636.1,37603.22,36777.63,36188.89,36173.7,...,25389.87,25257.77,25075.28,24656.11,24150.88,23501.77,23395.58,23265.41,23365.41,23244.24
3,World,CAIT,Total including LUCF,CO2,MtCO₂e,36874.11,36669.4,35736.22,35223.63,34559.71,...,25064.85,25003.46,25396.41,24289.64,23964.26,23334.16,23191.71,23049.37,23098.3,22943.37
4,World,CAIT,Total excluding LUCF,CO2,MtCO₂e,35512.86,35476.59,34689.96,34144.04,34076.29,...,23506.53,23348.62,23185.61,22769.14,22289.91,21665.65,21523.23,21380.89,21429.81,21274.89


In [3]:
un_sds_df = pd.read_excel("./data/un_sds_poverty_data.xlsx")

un_sds_df.head()


,Goal,Target,Indicator,SeriesCode,SeriesDescription,GeoAreaCode,GeoAreaName,TimePeriod,Value,Time_Detail,...,FootNote,Age,Hazard type,Location,Nature,Observation Status,Quantile,Reporting Type,Sex,Units
0,1,1.1,1.1.1,SI_POV_DAY1,Proportion of population below international p...,1,World,1981,42.6,1981,...,"Accessed April 8, 2022",ALLAGE,,ALLAREA,G,,,G,BOTHSEX,PERCENT
1,1,1.1,1.1.1,SI_POV_DAY1,Proportion of population below international p...,1,World,1982,42.1,1982,...,"Accessed April 8, 2022",ALLAGE,,ALLAREA,G,,,G,BOTHSEX,PERCENT
2,1,1.1,1.1.1,SI_POV_DAY1,Proportion of population below international p...,1,World,1983,41.3,1983,...,"Accessed April 8, 2022",ALLAGE,,ALLAREA,G,,,G,BOTHSEX,PERCENT
3,1,1.1,1.1.1,SI_POV_DAY1,Proportion of population below international p...,1,World,1984,39.6,1984,...,"Accessed April 8, 2022",ALLAGE,,ALLAREA,G,,,G,BOTHSEX,PERCENT
4,1,1.1,1.1.1,SI_POV_DAY1,Proportion of population below international p...,1,World,1985,38.0,1985,...,"Accessed April 8, 2022",ALLAGE,,ALLAREA,G,,,G,BOTHSEX,PERCENT


# Matching Country Names

Some countries are named differently across the datasets. The following code lists the unique country names in the GHG (Climate Watch) data that have no exact match in each of the other two datasets.

Since the GHG data contained the fewest distinct country names, it was chosen as the “source of truth” for country labels.
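The choice of reference dataset can be sketched as follows. This is a minimal illustration, assuming hypothetical toy name lists in place of the real `Country`, `Country Name` and `GeoAreaName` columns: count the distinct labels in each dataset and pick the smallest set.

```python
# Hypothetical toy stand-ins for the country columns of the three datasets
names_by_dataset = {
    "climate": ["World", "Kyrgyzstan", "Russia", "Russia"],
    "wdi": ["World", "Kyrgyz Republic", "Russian Federation", "Arab World", "Euro area"],
    "un_sds": ["World", "Kyrgyzstan", "Russian Federation", "Niue", "Europe"],
}

# Number of distinct country labels in each dataset
unique_counts = {name: len(set(labels)) for name, labels in names_by_dataset.items()}

# The dataset with the fewest distinct labels becomes the reference
reference = min(unique_counts, key=unique_counts.get)
print(unique_counts, reference)
```

Using the smallest set as the reference keeps the mapping work bounded: every other dataset only has to be matched against those names, and extra aggregates such as regional groupings are simply filtered out later.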


In [4]:
# Unique country labels in each dataset
climate_countries = climate_df["Country"].unique()
wdi_countries = wdi_df["Country Name"].unique()
un_sds_countries = un_sds_df["GeoAreaName"].unique()

# GHG names with no exact match in the other two datasets
in_climate_not_wdi = sorted(set(climate_countries) - set(wdi_countries))
in_climate_not_un = sorted(set(climate_countries) - set(un_sds_countries))


Mapping the GHG country names onto the other datasets did not lend itself to automation. For pairs like `Kyrgyz Republic/Kyrgyzstan`, partial matching might have worked, but it would have been difficult to make reliable. With only 34 country names to reconcile, we matched them manually, checking each case. In the table below, bold entries differ from the GHG name and _N/A_ marks a country that is absent from that dataset.


| Greenhouse Gas Emissions         | World Development Indicators (GDP Data) | Global Poverty Data                                      |
| -------------------------------- | --------------------------------------- | -------------------------------------------------------- |
| Bahamas                          | **Bahamas, The**                        | Bahamas                                                  |
| Bolivia                          | Bolivia                                 | **Bolivia (Plurinational State of)**                     |
| Brunei                           | **Brunei Darussalam**                   | **Brunei Darussalam**                                    |
| Cape Verde                       | **Cabo Verde**                          | **Cabo Verde**                                           |
| Cook Islands                     | _N/A_                                   | Cook Islands                                             |
| Côte d'Ivoire                    | **Cote d'Ivoire**                       | Côte d'Ivoire                                            |
| Czech Republic                   | Czech Republic                          | **Czechia**                                              |
| Democratic Republic of the Congo | **Congo, Dem. Rep.**                    | Democratic Republic of the Congo                         |
| Egypt                            | **Egypt, Arab Rep.**                    | Egypt                                                    |
| European Union (27)              | **European Union**                      | _N/A_                                                    |
| Gambia                           | **Gambia, The**                         | Gambia                                                   |
| Iran                             | **Iran, Islamic Rep.**                  | **Iran (Islamic Republic of)**                           |
| Kyrgyzstan                       | **Kyrgyz Republic**                     | Kyrgyzstan                                               |
| Laos                             | **Lao PDR**                             | **Lao People's Democratic Republic**                     |
| Macedonia                        | **North Macedonia**                     | **North Macedonia**                                      |
| Micronesia                       | **Micronesia, Fed. Sts.**               | Micronesia                                               |
| Moldova                          | Moldova                                 | **Republic of Moldova**                                  |
| Niue                             | _N/A_                                   | Niue                                                     |
| North Korea                      | **Korea, Dem. People's Rep.**           | **Democratic People's Republic of Korea**                |
| Republic of Congo                | **Congo, Rep.**                         | **Congo**                                                |
| Russia                           | **Russian Federation**                  | **Russian Federation**                                   |
| Saint Kitts and Nevis            | **St. Kitts and Nevis**                 | Saint Kitts and Nevis                                    |
| Saint Lucia                      | **St. Lucia**                           | Saint Lucia                                              |
| Saint Vincent and the Grenadines | **St. Vincent and the Grenadines**      | Saint Vincent and the Grenadines                         |
| Slovakia                         | **Slovak Republic**                     | Slovakia                                                 |
| South Korea                      | **Korea, Rep.**                         | **Republic of Korea**                                    |
| Syria                            | **Syrian Arab Republic**                | **Syrian Arab Republic**                                 |
| Tanzania                         | Tanzania                                | **United Republic of Tanzania**                          |
| Turkey                           | **Turkiye**                             | **Türkiye**                                              |
| United Kingdom                   | United Kingdom                          | **United Kingdom of Great Britain and Northern Ireland** |
| United States                    | United States                           | **United States of America**                             |
| Venezuela                        | **Venezuela, RB**                       | **Venezuela (Bolivarian Republic of)**                   |
| Vietnam                          | Vietnam                                 | **Viet Nam**                                             |
| Yemen                            | **Yemen, Rep.**                         | Yemen                                                    |
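To illustrate why partial matching was not relied on, here is a sketch using Python's standard-library `difflib.get_close_matches`. This is our own choice of fuzzy matcher, not something the original analysis used, and the name lists are a hypothetical excerpt. With difflib's default cutoff of 0.6 it finds `Slovak Republic` for `Slovakia`, but returns nothing for `Kyrgyzstan` or `Russia`, so every suggestion would still need manual review.

```python
from difflib import get_close_matches

# Hypothetical excerpt of GHG names and WDI candidate names
ghg_names = ["Kyrgyzstan", "Slovakia", "South Korea", "Russia"]
wdi_names = ["Kyrgyz Republic", "Slovak Republic", "Korea, Rep.", "Russian Federation"]


def suggest_matches(names, candidates, cutoff=0.6):
    """Return up to three fuzzy candidates per name, best match first."""
    return {
        name: get_close_matches(name, candidates, n=3, cutoff=cutoff)
        for name in names
    }


suggestions = suggest_matches(ghg_names, wdi_names)
for name, matches in suggestions.items():
    print(f"{name!r}: {matches}")
```

Lowering `cutoff` would recover more true pairs, but it also admits more false ones (for example, the two Korea entries), which is the trade-off that made a manual mapping attractive for a list this short.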


In [5]:
climate_rename_list = [
    "Bahamas",
    "Bolivia",
    "Brunei",
    "Cape Verde",
    "Cook Islands",
    "Côte d'Ivoire",
    "Czech Republic",
    "Democratic Republic of the Congo",
    "Egypt",
    "European Union (27)",
    "Gambia",
    "Iran",
    "Kyrgyzstan",
    "Laos",
    "Macedonia",
    "Micronesia",
    "Moldova",
    "Niue",
    "North Korea",
    "Republic of Congo",
    "Russia",
    "Saint Kitts and Nevis",
    "Saint Lucia",
    "Saint Vincent and the Grenadines",
    "Slovakia",
    "South Korea",
    "Syria",
    "Tanzania",
    "Turkey",
    "United Kingdom",
    "United States",
    "Venezuela",
    "Vietnam",
    "Yemen",
]

wdi_rename_list = [
    "Bahamas, The",
    "Bolivia",
    "Brunei Darussalam",
    "Cabo Verde",
    "",
    "Cote d'Ivoire",
    "Czech Republic",
    "Congo, Dem. Rep.",
    "Egypt, Arab Rep.",
    "European Union",
    "Gambia, The",
    "Iran, Islamic Rep.",
    "Kyrgyz Republic",
    "Lao PDR",
    "North Macedonia",
    "Micronesia, Fed. Sts.",
    "Moldova",
    "",
    "Korea, Dem. People's Rep.",
    "Congo, Rep.",
    "Russian Federation",
    "St. Kitts and Nevis",
    "St. Lucia",
    "St. Vincent and the Grenadines",
    "Slovak Republic",
    "Korea, Rep.",
    "Syrian Arab Republic",
    "Tanzania",
    "Turkiye",
    "United Kingdom",
    "United States",
    "Venezuela, RB",
    "Vietnam",
    "Yemen, Rep.",
]

un_sds_rename_list = [
    "Bahamas",
    "Bolivia (Plurinational State of)",
    "Brunei Darussalam",
    "Cabo Verde",
    "Cook Islands",
    "Côte d'Ivoire",
    "Czechia",
    "Democratic Republic of the Congo",
    "Egypt",
    "",
    "Gambia",
    "Iran (Islamic Republic of)",
    "Kyrgyzstan",
    "Lao People's Democratic Republic",
    "North Macedonia",
    "Micronesia",
    "Republic of Moldova",
    "Niue",
    "Democratic People's Republic of Korea",
    "Congo",
    "Russian Federation",
    "Saint Kitts and Nevis",
    "Saint Lucia",
    "Saint Vincent and the Grenadines",
    "Slovakia",
    "Republic of Korea",
    "Syrian Arab Republic",
    "United Republic of Tanzania",
    "Türkiye",
    "United Kingdom of Great Britain and Northern Ireland",
    "United States of America",
    "Venezuela (Bolivarian Republic of)",
    "Viet Nam",
    "Yemen",
]
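The three lists above are positional: entry *i* of each list refers to the same country, and `""` marks a country missing from that dataset. A minimal sketch, using a hypothetical excerpt of the lists and a hypothetical helper `build_rename_dict`, shows why blank entries need care when the lists are zipped into a dictionary. `dict(zip(...))` collapses every `""` into one key (the last one wins), so the empty string would be mapped to a real country name.

```python
def build_rename_dict(source_names, target_names):
    """Map dataset-specific names to GHG names, skipping blank (missing) entries."""
    if len(source_names) != len(target_names):
        raise ValueError("rename lists must be the same length")
    return {src: dst for src, dst in zip(source_names, target_names) if src}


# Hypothetical excerpt of the parallel lists ("" marks a country missing from WDI)
climate_names = ["Bahamas", "Cook Islands", "Niue", "Russia"]
wdi_names = ["Bahamas, The", "", "", "Russian Federation"]

naive = dict(zip(wdi_names, climate_names))   # "" ends up mapped to "Niue"
safe = build_rename_dict(wdi_names, climate_names)
print(naive)
print(safe)
```

The length check also catches a list that has drifted out of alignment, which would otherwise silently pair countries with the wrong names.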


# Generating the Clean Output

Generate the list of years our analysis focuses on, 1990-2019. The years are stored as strings because the WDI year columns read from the CSV header are string labels.

In [6]:
years = list(map(str, range(1990, 2020)))


In [7]:
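# Keep the identifying columns plus the 1990-2019 year columns
wdi_df_clean = wdi_df[
    [
        "Country Name",
        "Country Code",
        "Indicator Name",
        "Indicator Code",
        *years,
    ]
]

# Map WDI names to GHG names, skipping blank entries for countries missing from WDI
wdi_rename_dict = {
    wdi: ghg for wdi, ghg in zip(wdi_rename_list, climate_rename_list) if wdi
}
# Rename countries only in "Country Name", then drop rows with no data in any year
wdi_df_clean = wdi_df_clean.replace({"Country Name": wdi_rename_dict}).dropna(
    subset=years, how="all"
)
# Keep only countries that appear in the GHG data
wdi_df_clean = wdi_df_clean[wdi_df_clean["Country Name"].isin(climate_countries)]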

# Uncomment to overwrite clean output
# wdi_df_clean.to_csv("./data/wdi_clean.csv", index=False)

# Output the first 100 records as a sample set
wdi_df_clean.head(100).to_csv("./data/wdi_clean_sample.csv", index=False)

wdi_df_clean.head()


,Country Name,Country Code,Indicator Name,Indicator Code,1990,1991,1992,1993,1994,1995,...,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019
18746,European Union (27),EUU,Access to clean fuels and technologies for coo...,EG.CFT.ACCS.ZS,,,,,,,...,98.995793,99.060575,99.098573,99.134294,99.169027,99.197882,99.221345,99.424531,99.428916,99.458748
18747,European Union (27),EUU,Access to clean fuels and technologies for coo...,EG.CFT.ACCS.RU.ZS,,,,,,,...,97.497321,97.638137,97.732486,97.857445,97.96716,98.018315,98.10391,98.208828,98.274355,98.321068
18748,European Union (27),EUU,Access to clean fuels and technologies for coo...,EG.CFT.ACCS.UR.ZS,,,,,,,...,99.869291,99.869739,99.874219,99.878765,99.879905,99.881077,99.882307,99.883422,99.881291,99.888532
18749,European Union (27),EUU,Access to electricity (% of population),EG.ELC.ACCS.ZS,100.0,100.0,100.0,100.0,100.0,100.0,...,100.0,100.0,100.0,100.0,100.0,100.0,99.960196,100.0,100.0,100.0
18750,European Union (27),EUU,"Access to electricity, rural (% of rural popul...",EG.ELC.ACCS.RU.ZS,100.0,100.0,100.0,100.0,100.0,100.0,...,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0
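The WDI cleaning steps can be traced on a tiny hypothetical frame. This sketch assumes made-up indicator codes and values, and passes a nested dictionary to `DataFrame.replace` so that only the `Country Name` column is renamed. It then drops rows with no data in any selected year and keeps only countries in a stand-in reference list.

```python
import numpy as np
import pandas as pd

years = ["1990", "1991"]
reference_names = ["Russia", "Bahamas"]  # hypothetical stand-in for climate_countries
rename = {"Russian Federation": "Russia", "Bahamas, The": "Bahamas"}

# Hypothetical WDI-style wide frame: one row per country and indicator
toy = pd.DataFrame({
    "Country Name": ["Russian Federation", "Bahamas, The", "Arab World", "Russian Federation"],
    "Indicator Code": ["A", "A", "A", "B"],
    "1990": [1.0, 2.0, 3.0, np.nan],
    "1991": [1.5, np.nan, 3.5, np.nan],
})

clean = (
    toy.replace({"Country Name": rename})                      # rename in one column only
    .dropna(subset=years, how="all")                           # drop rows empty in every year
    .loc[lambda df: df["Country Name"].isin(reference_names)]  # keep reference countries
)
print(clean)
```

The `Arab World` aggregate is removed by the final filter, and the second Russia row is removed because it has no values in any of the selected years.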


The UN SDS poverty data needed extra processing. We kept observations from 1990-2019, excluded rows reported for the `Q1` quantile, dropped metadata columns, and pivoted the data from long to wide format (one column per year) to match the other two datasets.
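The long-to-wide reshape can be sketched on hypothetical toy data (made-up area names and values) with `DataFrame.pivot`. Each distinct `TimePeriod` becomes a column. `pivot` does not aggregate, so a repeated (index, column) pair raises a `ValueError`; filtering out rows that would duplicate a series, as the `Q1` exclusion in the next cell appears intended to do, is what lets the pivot succeed.

```python
import pandas as pd

# Hypothetical long-format rows: one observation per series, area and year
long_df = pd.DataFrame({
    "SeriesCode": ["SI_POV_DAY1"] * 4,
    "GeoAreaName": ["A", "A", "B", "B"],
    "TimePeriod": [1990, 1991, 1990, 1991],
    "Value": [10.0, 9.5, 20.0, 19.0],
})

# Wide format: one row per (SeriesCode, GeoAreaName), one column per year
wide_df = long_df.pivot(index=["SeriesCode", "GeoAreaName"], columns="TimePeriod", values="Value")
print(wide_df)

# pivot refuses to aggregate: a duplicated (index, column) pair raises ValueError
pivot_error = None
duplicated = pd.concat([long_df, long_df.head(1)])
try:
    duplicated.pivot(index=["SeriesCode", "GeoAreaName"], columns="TimePeriod", values="Value")
except ValueError as err:
    pivot_error = err
    print("pivot failed:", err)
```

If duplicates were expected and should be combined, `pivot_table` with an explicit `aggfunc` would be the alternative, at the cost of silently merging rows.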


In [8]:
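# Keep 1990-2019 observations, excluding rows reported for the "Q1" quantile
in_years = un_sds_df["TimePeriod"].between(1990, 2019)
not_q1 = ~un_sds_df["Quantile"].eq("Q1")
# Metadata columns that are not needed for the analysis
drop_cols = [
    "Goal", "Target", "Indicator", "GeoAreaCode", "Time_Detail", "TimeCoverage",
    "UpperBound", "LowerBound", "BasePeriod", "Source", "GeoInfoUrl", "FootNote",
    "Hazard type", "Quantile", "Nature", "Observation Status", "Reporting Type",
]
un_sds_clean = un_sds_df[in_years & not_q1].drop(columns=drop_cols)

# Map UN names to GHG names (skipping blanks) and rename only "GeoAreaName"
un_sds_rename_dict = {un: ghg for un, ghg in zip(un_sds_rename_list, climate_rename_list) if un}
un_sds_clean = un_sds_clean.replace({"GeoAreaName": un_sds_rename_dict})
# Keep only countries that appear in the GHG data
un_sds_clean = un_sds_clean[un_sds_clean["GeoAreaName"].isin(climate_countries)]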


In [9]:
un_sds_clean_pivot = un_sds_clean.pivot(
    index=[
        "SeriesCode",
        "SeriesDescription",
        "GeoAreaName",
        "Age",
        "Location",
        "Sex",
        "Units",
    ],
    columns="TimePeriod",
    values="Value",
)

# Uncomment to overwrite clean output
# un_sds_clean_pivot.to_csv("./data/un_sds_clean.csv")

un_sds_clean_pivot.head()


SeriesCode,SeriesDescription,GeoAreaName,Age,Location,Sex,Units,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,...,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019
DC_ODA_POVDLG,"Official development assistance grants for poverty reduction, by donor countries (percentage of GNI)",Australia,,,,PERCENT,,,,,,,,,,,...,0.0329,0.0376,0.0383,0.0321,0.0307,0.0325,0.023,0.0235,0.0296,0.022
DC_ODA_POVDLG,"Official development assistance grants for poverty reduction, by donor countries (percentage of GNI)",Austria,,,,PERCENT,,,,,,,,,,,...,0.0071,0.0066,0.0073,0.0062,0.0086,0.0057,0.0053,0.0045,0.0056,0.0046
DC_ODA_POVDLG,"Official development assistance grants for poverty reduction, by donor countries (percentage of GNI)",Azerbaijan,,,,PERCENT,,,,,,,,,,,...,,,,,0.0013,,0.0004,,0.0002,0.0047
DC_ODA_POVDLG,"Official development assistance grants for poverty reduction, by donor countries (percentage of GNI)",Belgium,,,,PERCENT,,,,,,,,,,,...,0.0345,0.0315,0.0274,0.0234,0.0265,0.0215,0.0233,0.0199,0.0187,0.0191
DC_ODA_POVDLG,"Official development assistance grants for poverty reduction, by donor countries (percentage of GNI)",Canada,,,,PERCENT,,,,,,,,,,,...,0.0364,0.041,0.0409,0.0422,0.0317,0.0301,0.0303,0.0277,0.0261,0.0217


In [10]:
%reload_ext watermark

%watermark -iv -v -m

Python implementation: CPython
Python version       : 3.10.6
IPython version      : 8.5.0

Compiler    : Clang 13.1.6 (clang-1316.0.21.2.5)
OS          : Darwin
Release     : 21.5.0
Machine     : x86_64
Processor   : i386
CPU cores   : 8
Architecture: 64bit

sys   : 3.10.6 (main, Aug 30 2022, 05:12:36) [Clang 13.1.6 (clang-1316.0.21.2.5)]
pandas: 1.5.0

