### Assumptions
- All answers to our research question are given from an observational statistics perspective.
- Pollution is measured by CO2 and GHG emissions.
- Wealth of a country is measured by its GDP per capita. While GDP per capita is not a perfect indicator of a country’s wealth, it does measure how much economic activity has occurred during a timespan and is commonly used as a metric for a healthy economy.  
- Severe weather is a subset of natural disasters, e.g., floods, storms, and droughts.  
- Collection of data is consistent over the years in the datasets.  Any variation is not due to enhancements in tracking mechanisms and/or policies.  
- Each country has unique variations in land mass, GDP, population, industries, and other metrics that may result in bias toward pollution, prosperity, or increased likelihood of disasters.  There will not be a need to do robust normalization based on the context of countries because analysis will be executed on the full set of all countries.   


# A Descriptive Analysis of Climate Change Accountability
### *By Sophie Yeh, Ben Bluhm, Yeonsoo Kim*

**GitHub repository: Project2_Bluhm_Kim_Yeh**

**Data Sci W200 Section Th 630**

12 December, 2021




# Analysis to answer the following
**How does the experience of severe weather compare between high- and low-polluting countries?**
 - Are there more severe weather events over time? 
 - Are rich countries polluting more than poorer countries? 
 - Should rich nations compensate poorer nations for severe climate change? 


In [830]:
import pandas as pd

## read clean disaster data
co2_disaster_df = pd.read_pickle('../data/co2-disaster-data.pkl')
co2_disaster_df

Unnamed: 0,iso_code,country,year,co2,consumption_co2,co2_growth_prct,co2_growth_abs,trade_co2,co2_per_capita,consumption_co2_per_capita,share_global_co2,cumulative_co2,share_global_cumulative_co2,co2_per_gdp,consumption_co2_per_gdp,population,gdp,total_ghg,ghg_per_capita,disaster_count
0,AFG,Afghanistan,1956,0.18,,19.05,0.03,,0.02,,0.00,0.83,0.00,0.02,,8.40e+06,1.16e+10,,,0
1,AFG,Afghanistan,1957,0.29,,60.00,0.11,,0.03,,0.00,1.12,0.00,0.03,,8.54e+06,1.16e+10,,,0
2,AFG,Afghanistan,1958,0.33,,12.50,0.04,,0.04,,0.00,1.46,0.00,0.03,,8.68e+06,1.22e+10,,,0
3,AFG,Afghanistan,1959,0.39,,16.62,0.06,,0.04,,0.00,1.84,0.00,0.03,,8.83e+06,1.26e+10,,,0
4,AFG,Afghanistan,1960,0.41,,7.62,0.03,,0.05,,0.00,2.25,0.00,0.03,,9.00e+06,1.30e+10,,,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13320,ZWE,Zimbabwe,2016,10.74,12.15,-12.17,-1.49,1.42,0.77,0.87,0.03,736.47,0.05,0.51,0.58,1.40e+07,2.10e+10,65.98,4.7,1
13321,ZWE,Zimbabwe,2017,9.58,11.25,-10.77,-1.16,1.67,0.67,0.79,0.03,746.05,0.05,0.44,0.51,1.42e+07,2.19e+10,,,2
13322,ZWE,Zimbabwe,2018,11.85,13.16,23.72,2.27,1.31,0.82,0.91,0.03,757.90,0.05,0.52,0.58,1.44e+07,2.27e+10,,,1
13323,ZWE,Zimbabwe,2019,10.95,12.42,-7.64,-0.91,1.47,0.75,0.85,0.03,768.85,0.05,,,1.46e+07,,,,2


## Create chart for emissions and natural disasters
Let's explore the relationship between the increase in emissions and the increase in disasters since 1956.

In [831]:
# select only what we want for temp df for the chart
df = co2_disaster_df[['country','year','co2','disaster_count']].copy()
df['disaster_count']=df['disaster_count'].fillna(0).astype(int)
df = pd.DataFrame(df.groupby('year')[['co2','disaster_count']].sum().reset_index())
df.tail(10)

Unnamed: 0,year,co2,disaster_count
55,2011,33350.4,355
56,2012,33866.45,368
57,2013,34159.21,353
58,2014,34399.73,346
59,2015,34318.08,396
60,2016,34260.67,349
61,2017,34687.41,363
62,2018,35375.09,334
63,2019,35438.72,441
64,2020,33797.0,395


### Create dual axis line chart
See if there's a relationship between emissions and disaster events over time

In [832]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Create figure with secondary y-axis
fig = make_subplots(specs=[[{"secondary_y": True}]])

# Add traces
fig.add_trace(
    go.Scatter(x=df.year, y=df.co2, name="CO2 Data"),
    secondary_y=False
)

fig.add_trace(
    go.Scatter(x=df.year, y=df.disaster_count, name="Disaster Count"),
    secondary_y=True
)

# Add figure title
fig.update_layout(
    title_text="Emissions vs Disaster Count"
)

# Set template to be consistent with others
fig.update_layout(template='plotly_white')

# Set x-axis title
fig.update_xaxes(title_text="Year")

# Set y-axes titles
fig.update_yaxes(title_text="CO2 Emission", secondary_y=False)
fig.update_yaxes(title_text="Disaster Count", secondary_y=True)

fig.show()


**Observations**
There appears to be a correlation between the increase in emissions and the increase in disasters.  Let's dive deeper into the types of disasters.
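
One way to put a number on that visual impression is a Pearson correlation. The sketch below is a minimal example that assumes only the ten yearly totals printed in the table above (2011-2020); `sample`, `r_levels`, and `r_changes` are illustrative names, not notebook variables. Because two series that both rise over time can correlate without being related, it also correlates the year-over-year changes.

```python
import pandas as pd

# Yearly totals copied from the df.tail(10) output above (2011-2020 only).
sample = pd.DataFrame({
    "year": list(range(2011, 2021)),
    "co2": [33350.4, 33866.45, 34159.21, 34399.73, 34318.08,
            34260.67, 34687.41, 35375.09, 35438.72, 33797.0],
    "disaster_count": [355, 368, 353, 346, 396, 349, 363, 334, 441, 395],
})

# Pearson correlation of the raw yearly levels.
r_levels = sample["co2"].corr(sample["disaster_count"])

# Correlation of year-over-year changes, which removes a shared trend.
changes = sample[["co2", "disaster_count"]].diff().dropna()
r_changes = changes["co2"].corr(changes["disaster_count"])

print(f"levels: {r_levels:.2f}, changes: {r_changes:.2f}")
```

Ten years is too short to conclude much; applying the same calls to the full 1956-2020 yearly frame would be the more meaningful check.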

## Let's look at the types of natural disasters since 1956

In [833]:
disaster_df = pd.read_pickle("../data/disaster-data.pkl")
disaster_df

Unnamed: 0,year,start_year,start_month,start_day,disaster_subgroup,disaster_type,country,iso_code,total_deaths,no_injured,no_affected,no_homeless,total_affected,total_damages,total_damages_adjusted,cpi
0,1957,1957,12.0,,Hydrological,Flood,Brazil,BRA,112.0,,,,,,,10.85
1,1957,1957,7.0,,Hydrological,Flood,China,CHN,560.0,,,,,,,10.85
2,1957,1957,10.0,,Hydrological,Flood,Spain,ESP,77.0,,,,,,,10.85
3,1957,1957,9.0,20.0,Meteorological,Storm,Hong Kong,HKG,8.0,111.0,,,111.0,,,10.85
4,1958,1958,7.0,,Hydrological,Flood,Argentina,ARG,360.0,,,,,,,11.15
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15364,2021,2021,7.0,16.0,Hydrological,Flood,Yemen,YEM,11.0,,67980.0,,67980.0,,,
15365,2021,2021,2.0,1.0,Hydrological,Flood,South Africa,ZAF,31.0,,400.0,,400.0,75000.0,,
15366,2021,2021,9.0,7.0,Biological,Epidemic,Democratic Republic of Congo,COD,131.0,301.0,,,301.0,,,
15367,2021,2021,1.0,11.0,Hydrological,Flood,Serbia,SRB,,,22.0,,22.0,,,


In [834]:
# create slim working df for plotting
df = disaster_df[['year', 'disaster_type']].groupby(by=['year','disaster_type']).size().reset_index()
df.rename(columns={0:'disaster_count'}, inplace=True)
df

Unnamed: 0,year,disaster_type,disaster_count
0,1957,Earthquake,7
1,1957,Flood,11
2,1957,Storm,9
3,1957,Volcanic activity,1
4,1958,Earthquake,5
...,...,...,...
592,2021,Glacial lake outburst,2
593,2021,Landslide,11
594,2021,Storm,90
595,2021,Volcanic activity,8


In [835]:
import plotly.express as px

fig = px.area(df, x="year", y="disaster_count", color="disaster_type",
              line_group="disaster_type", title='Disasters Over The Years', template='plotly_white',
              labels={"year": "Year", "disaster_count": "Number of Disasters", "disaster_type": "Disaster Type"})
fig.show()

**Observations**
This answers the first subquestion: *Are there more disasters over time?* **Yes there are.**
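
The area chart supports this visually. As a rough numeric check, one could fit a straight line to the yearly totals and look at the sign of the slope. The sketch below is a minimal example under stated assumptions: `yearly_trend` is a hypothetical helper, the toy data is invented, and the input only needs a `year` column with one row per event, as in `disaster_df`.

```python
import numpy as np
import pandas as pd

def yearly_trend(events: pd.DataFrame) -> float:
    """Return the least-squares slope (events per year) of yearly event totals."""
    totals = events.groupby("year").size()
    slope, _intercept = np.polyfit(totals.index.to_numpy(dtype=float),
                                   totals.to_numpy(dtype=float), deg=1)
    return float(slope)

# Invented toy data: one row per event, with 1, 2, 3 and 4 events per year.
toy = pd.DataFrame({
    "year": [2000, 2001, 2001, 2002, 2002, 2002, 2003, 2003, 2003, 2003],
    "disaster_type": ["Flood", "Flood", "Storm", "Flood", "Storm",
                      "Drought", "Flood", "Storm", "Storm", "Drought"],
})
print(round(yearly_trend(toy), 2))
```

In the notebook this would be called as `yearly_trend(disaster_df)`. A positive slope is consistent with more recorded disasters over time, subject to the assumption that data collection is consistent across years.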

## Let's look at countries

We will analyze if certain countries are feeling more disaster impact.

In [836]:
# create slim working df for plotting
df = pd.DataFrame(disaster_df.groupby(by=['country']).size()).reset_index()
df.rename(columns={0:'count'},inplace=True)
df.sort_values(by="count", ascending=False, inplace=True)
df

Unnamed: 0,country,count
213,United States of America,999
40,China,917
89,India,713
154,Philippines,641
90,Indonesia,552
...,...,...
14,Bahrain,1
60,Equatorial Guinea,1
27,Brunei Darussalam,1
164,"Saint Helena, Ascension and Tristan da Cunha",1


### See where disasters are occurring geographically

In [837]:
fig = px.choropleth(df, title='Natural Disasters', locations='country',
                    locationmode="country names", color='count', scope="world",template='plotly_white',range_color=[0,500],color_continuous_scale=px.colors.sequential.YlOrBr)
fig.show()

**Observations** 
Doesn't seem to be a geographic relationship between the countries with high disaster counts.

### Let's look at disaster data in relation to GDP, which could be an indicator of production and wealth

In [838]:
df = co2_disaster_df[['country','gdp','population','disaster_count']].copy()
# aggregate the gdp by mean and the sum of disasters
df = pd.DataFrame(df.groupby('country')[['gdp','disaster_count']].agg({'gdp':'mean','disaster_count':'sum'}).reset_index())
df = df.sort_values('disaster_count', ascending=False)
df.head(10)

Unnamed: 0,country,gdp,disaster_count
208,United States,9280000000000.0,969
41,China,5140000000000.0,905
91,India,2380000000000.0,696
156,Philippines,283000000000.0,628
92,Indonesia,919000000000.0,532
16,Bangladesh,179000000000.0,338
99,Japan,2950000000000.0,306
127,Mexico,883000000000.0,270
213,Vietnam,179000000000.0,241
28,Brazil,1330000000000.0,240


In [839]:
fig = px.bar(df.head(20), x="country", y="disaster_count", color='gdp', template="plotly_white",
             title="Disasters by Country with GDP", color_continuous_scale=px.colors.sequential.YlGn,
             labels={"country": "Country", "disaster_count": "Number of Disasters", "gdp": "GDP($)"})
fig.show()

**Observations**
Surprisingly it looks like the countries with high GDP are also getting the most disaster exposure.
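
A rank correlation gives a quick, hedged check of that impression. The sketch below assumes only the ten rows printed in the table above, so it describes the top-10 countries by disaster count, not all countries. `top10` and `rho` are illustrative names. Spearman's rho is computed as the Pearson correlation of the ranks, which avoids any dependency beyond pandas.

```python
import pandas as pd

# Values copied from the df.head(10) output above (mean GDP, total disasters).
top10 = pd.DataFrame({
    "country": ["United States", "China", "India", "Philippines", "Indonesia",
                "Bangladesh", "Japan", "Mexico", "Vietnam", "Brazil"],
    "gdp": [9.28e12, 5.14e12, 2.38e12, 2.83e11, 9.19e11,
            1.79e11, 2.95e12, 8.83e11, 1.79e11, 1.33e12],
    "disaster_count": [969, 905, 696, 628, 532, 338, 306, 270, 241, 240],
})

# Spearman correlation = Pearson correlation of the ranks (ties get average ranks).
rho = top10["gdp"].rank().corr(top10["disaster_count"].rank())
print(f"Spearman rho (top 10 only): {rho:.2f}")
```

Running the same two lines on the full per-country frame would avoid the selection effect of looking only at the most disaster-prone countries.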

In [840]:
fig = px.choropleth(df, title='Disasters by Country Mapped with GDP', locations='country',
                    locationmode="country names", color='gdp', scope="world",template='plotly_white',color_continuous_scale=px.colors.sequential.YlGn)
fig.show()

**Observations** 
Map isn't overly informative for answering our question.

In [841]:
trace = go.Heatmap(
   x = df.head(10).country,
   y = df.head(10).disaster_count,
   z = df.head(10).gdp,
   type = 'heatmap',
   colorscale = 'Viridis'
)
data = [trace]

fig = go.Figure(data = data)
fig.show()


**Observations** 
Not that useful.  Best chart was the first bar chart to show the disasters and GDP.  Takeaway: the countries with high GDP are feeling the impact of disasters. 



## Bar graph for country emissions
- Question 2: Are rich countries polluting more than poorer countries? 
- Columns used: CO2.country, CO2.co2, CO2.total_ghg


To understand the countries with highest emissions, we'll analyze the CO2 and GHG emission data.

In [843]:
#%run CO2_Cleansing_Validation.ipynb # runs data cleaning file for CO2 data
co2_data = pd.read_pickle("../data/co2-data.pkl")

In [844]:
# computing sum of co2, ghg emissions based on country and sorting by co2
co2_by_country = co2_data[['country','co2','total_ghg']].groupby(['country']).sum().sort_values(by='co2', ascending=False)
pd.set_option('display.max_rows', co2_by_country.shape[0]+1) #showing all countries
print(co2_by_country)

                                        co2  total_ghg
country                                               
United States                     311737.12  165734.57
China                             232977.94  179061.84
Russia                            106056.13   69251.04
Japan                              60824.13   31811.53
Germany                            60250.35   25108.81
India                              52063.32   53209.94
United Kingdom                     36257.64   17487.48
Canada                             28255.38   21782.21
Ukraine                            27611.60   12224.65
France                             25707.71   11467.98
Italy                              22753.92   12031.34
Poland                             21606.35    9652.69
South Africa                       19307.50   11130.26
Mexico                             18644.60   15538.02
South Korea                        18297.08   13062.92
Iran                               17921.87   15349.74
Australia 

### Experimenting with graph modules and how to represent the data

In [845]:
# using plotly.express module for improved visuals
import plotly.express as px

fig = px.bar(co2_by_country, title='Emission rates by country', template='plotly_white', barmode="group")
fig.show()

### To make the data clearer on the countries with highest emissions, we'll focus on the top 10 sorted by CO2 emissions.


In [846]:
import numpy as np
def _color_red_or_green(val):
    color = 'red' if val > 30000 else 'green'
    return 'color: %s' % color

top10_co2_countries_df = co2_by_country.head(10) # find top 10 countries for co2 pollutions
top10_co2_countries_df.style.applymap(_color_red_or_green)


country,co2,total_ghg
United States,311737.12,165734.57
China,232977.94,179061.84
Russia,106056.13,69251.04
Japan,60824.13,31811.53
Germany,60250.35,25108.81
India,52063.32,53209.94
United Kingdom,36257.64,17487.48
Canada,28255.38,21782.21
Ukraine,27611.6,12224.65
France,25707.71,11467.98


In [847]:
# creating bar graph to represent top 10 countries
fig = px.bar(top10_co2_countries_df.reset_index(), title='Emission rates by country (top 10)', 
             x='country', y=['co2','total_ghg'],template='plotly_white', barmode="group",
             labels = {'country':"Country",'variable':'Emissions'})
fig.update_yaxes(title_text='Emission Rates (million tonnes)')
fig.show()

**Observations**
Can clearly see that United States, China, and Russia dominate emissions.  

## Table comparing CO2/GHG emissions with GDP

We'll first take a look at the most recent GDP data for countries

In [848]:
'''
Finding most recent gdp data for countries
'''
# find most recent year for gdp
most_recent_year_for_gdp_df = co2_data.copy(deep=True) # deep copy co2_data to not change values
most_recent_year_for_gdp_df.dropna(subset=['gdp'],inplace=True) # drop rows with empty gdp values

# create new data frame with gdp based on most recent year
most_recent_gdp_per_country_df = most_recent_year_for_gdp_df.loc[most_recent_year_for_gdp_df['year']==most_recent_year_for_gdp_df['year'].max()]
most_recent_gdp_per_country_df = most_recent_gdp_per_country_df[['country', 'gdp']]
most_recent_gdp_per_country_df = most_recent_gdp_per_country_df.sort_values(by='gdp',ascending=False)
most_recent_gdp_per_country_df.reset_index(drop=True,inplace=True)
most_recent_gdp_per_country_df.head(10)

Unnamed: 0,country,gdp
0,China,18200000000000.0
1,United States,18100000000000.0
2,India,8840000000000.0
3,Japan,4870000000000.0
4,Germany,3890000000000.0
5,Russia,3610000000000.0
6,Indonesia,3080000000000.0
7,Brazil,2970000000000.0
8,France,2580000000000.0
9,United Kingdom,2540000000000.0


We'll then join this with the existing table on the top countries with CO2, GHG emissions.

In [849]:
'''
Creating table that joins CO2, GHG emission and GDP data to compare the two, then looking at top 10 GDP countries
'''
# Join GDP data frame with CO2 emissions by country
co2_emissions_gdp_df = pd.merge(co2_by_country, most_recent_gdp_per_country_df, on="country")
co2_emissions_gdp_df = co2_emissions_gdp_df.sort_values(by='gdp',ascending=False) # sort by gdp
# rename columns
co2_emissions_gdp_df.rename(columns={"country": "Country", "co2": "CO2 Emissions", "total_ghg":"GHG Emissions"},inplace=True)
co2_emissions_gdp_df.reset_index(drop=True,inplace=True) # reset index
co2_emissions_gdp_df.index += 1 # increase index to start with 1
co2_emissions_top10_gdp = co2_emissions_gdp_df.head(10).iloc[: , :-1] # drop gdp column
pd.set_option('display.precision', 2) # format to two decimal places
co2_emissions_top10_gdp

Unnamed: 0,Country,CO2 Emissions,GHG Emissions
1,China,232977.94,179061.84
2,United States,311737.12,165734.57
3,India,52063.32,53209.94
4,Japan,60824.13,31811.53
5,Germany,60250.35,25108.81
6,Russia,106056.13,69251.04
7,Indonesia,13746.35,53385.09
8,Brazil,15847.89,42729.29
9,France,25707.71,11467.98
10,United Kingdom,36257.64,17487.48
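
The join above combines two tables derived from the same CO2 dataset, so country names line up. Across sources that is not guaranteed: the disaster output earlier prints "United States of America" while the CO2 output prints "United States". The sketch below shows one way unmatched names could be surfaced, using miniature tables with values copied from those outputs; `emissions`, `disasters`, and `check` are illustrative names, not notebook variables.

```python
import pandas as pd

# Miniature tables; only the differing spelling of the United States
# mirrors what the disaster and CO2 outputs above show.
emissions = pd.DataFrame({"country": ["United States", "China"],
                          "co2": [311737.12, 232977.94]})
disasters = pd.DataFrame({"country": ["United States of America", "China"],
                          "count": [999, 917]})

# An outer merge with indicator=True adds a `_merge` column that says
# whether each row came from the left table, the right table, or both.
check = pd.merge(emissions, disasters, on="country", how="outer", indicator=True)
unmatched = check.loc[check["_merge"] != "both", "country"].tolist()
print(unmatched)
```

An inner merge would silently drop both spellings, so a check like this is worth running before joining tables from different providers.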


We'll compare this with the bottom 10 countries in terms of GDP.

In [850]:
'''
Creating table on CO2, GHG emissions for bottom 10 GDP countries
'''
# find bottom gdp countries and remove gdp column
co2_emissions_bottom10_gdp = co2_emissions_gdp_df.sort_values(by='gdp').head(10).iloc[: , :-1] 
co2_emissions_bottom10_gdp.reset_index(drop=True,inplace=True) # reset index
co2_emissions_bottom10_gdp.index += 1 # increase index to start with 1
co2_emissions_bottom10_gdp

Unnamed: 0,Country,CO2 Emissions,GHG Emissions
1,Dominica,4.79,8.1
2,Sao Tome and Principe,3.35,4.82
3,Comoros,5.3,11.92
4,Saint Lucia,14.31,21.28
5,Guinea-Bissau,10.05,101.98
6,Seychelles,13.38,15.94
7,Djibouti,18.75,35.75
8,Barbados,54.21,97.61
9,Central African Republic,10.55,1671.03
10,Cape Verde,13.01,8.83
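
To make the contrast between the two tables concrete, the sketch below totals the CO2 column of each and prints the ratio. It assumes only the values printed above, which are sums over the dataset's years in million tonnes; the variable names are illustrative.

```python
# CO2 totals (million tonnes) copied from the two tables above.
top10_co2 = [232977.94, 311737.12, 52063.32, 60824.13, 60250.35,
             106056.13, 13746.35, 15847.89, 25707.71, 36257.64]
bottom10_co2 = [4.79, 3.35, 5.30, 14.31, 10.05, 13.38, 18.75, 54.21, 10.55, 13.01]

top_total = sum(top10_co2)
bottom_total = sum(bottom10_co2)
print(f"top 10: {top_total:,.0f} Mt, bottom 10: {bottom_total:,.0f} Mt, "
      f"ratio: {top_total / bottom_total:,.0f}x")
```

These are absolute totals, so the gap reflects population and economy size as well as emission intensity; a per-capita comparison would tell a different part of the story.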


## Time series graph based on emissions

- Question 1: Are there more severe weather events over time? 
- Columns used: CO2.year, CO2.co2, CO2.total_ghg

To understand how CO2 and GHG emissions change over time, we'll examine time series data from 1956-2020.

In [851]:
# computing sum of co2, ghg emissions based on year
co2_by_year = co2_data[['year','co2','total_ghg']].groupby('year').sum() # sum numeric columns only
co2_by_year.reset_index(inplace=True)
co2_by_year.replace(0.00,np.nan, inplace=True) #replace zero values with NaN for graph
co2_by_year.tail(30)

Unnamed: 0,year,co2,total_ghg
35,1991,22206.84,33971.29
36,1992,22006.92,33838.75
37,1993,22237.99,33929.3
38,1994,22371.86,34113.24
39,1995,22827.77,34801.62
40,1996,23477.19,34821.11
41,1997,23587.0,36088.77
42,1998,23450.24,35662.27
43,1999,23738.13,35449.63
44,2000,24442.4,36009.12


In [852]:
# creating line graph based on co2, ghg emissions
fig = px.line(co2_by_year,title='Time series graph of emission rates', 
              x="year", y=['co2','total_ghg'],template='plotly_white',
              labels={'year':"Year",'variable':'Emissions','value':'Emission rates (million tonnes)'})
fig.show()

**Observations** Based on the continued increase we see, we'll take a look at the countries contributing the most to CO2, GHG increases.
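
The "continued increase" can also be expressed as year-over-year growth. The sketch below is a minimal example assuming only the ten CO2 totals printed in the table above (1991-2000); `co2_sample` and `growth_pct` are illustrative names.

```python
import pandas as pd

# Global CO2 totals (million tonnes) copied from the rows printed above.
co2_sample = pd.Series(
    [22206.84, 22006.92, 22237.99, 22371.86, 22827.77,
     23477.19, 23587.0, 23450.24, 23738.13, 24442.4],
    index=range(1991, 2001), name="co2",
)

# Year-over-year growth in percent; the first year has no predecessor.
growth_pct = co2_sample.pct_change() * 100
print(growth_pct.round(2))
```

In the notebook, the same call on `co2_by_year.set_index('year')['co2']` would cover the full timespan.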

### Time series graph for top 10 countries for CO2 emissions

To understand the trends of the leading 10 countries with top CO2 emissions, we'll examine patterns in the 1956-2020 timespan.

In [853]:
# gets list of top 10 countries with co2 pollution based on prior analysis
top_10_co2_countries = co2_by_country.reset_index().head(10)['country'].tolist() 

co2_data_scoped = co2_data[['year', 'co2','country']] # reduces data frame to relevant cols

co2_by_year_country = co2_data_scoped.loc[co2_data_scoped['country'].isin(top_10_co2_countries)].\
                        groupby(['year','co2']).sum()
co2_by_year_country.reset_index(inplace=True)
co2_by_year_country = co2_by_year_country.sort_values(['year','co2'],ascending = (True, False))
co2_by_year_country

Unnamed: 0,year,co2,country
9,1956,2859.82,United States
8,1956,766.19,Germany
7,1956,690.55,Russia
6,1956,574.32,United Kingdom
5,1956,268.77,France
...,...,...,...
644,2020,644.31,Germany
643,2020,535.82,Canada
642,2020,329.58,United Kingdom
641,2020,276.63,France


In [854]:
# creating a line graph to plot CO2 emission trends per country

#defining colors to emphasize leading countries in CO2 emissions
light_grey_rgb = 'rgb(231,231,231)'
color_map_by_country = {'Saudi Arabia':light_grey_rgb, 'South Korea': light_grey_rgb, 'Indonesia': light_grey_rgb, 
                        'Iran':light_grey_rgb, 'Brazil':light_grey_rgb, 'Mexico':light_grey_rgb,'Australia':light_grey_rgb,
                        'Italy':light_grey_rgb,'South Africa':light_grey_rgb, 'United Kingdom':light_grey_rgb,
                        'Poland':light_grey_rgb, 'Canada':light_grey_rgb,'Ukraine':light_grey_rgb, 'France':light_grey_rgb,
                        'Germany':light_grey_rgb, 'United States':'blue', 'China': 'red', 'India':'green', 
                        'Russia':'purple', 'Japan':'orange'}

# using plotly.express library to display graph
fig = px.line(co2_by_year_country, x="year", y="co2", color='country',title='CO2 emissions over years per country', template='plotly_white',
              labels={'co2':'CO2 Emissions (million tonnes)','year':"Year"},
              color_discrete_map=color_map_by_country)

fig.update_layout(showlegend=False) # hide legend to avoid clutter
# creating text to label countries
china_txt = dict(xref='paper', yref='paper', x=0.9, y=0.94,
                              xanchor='center', yanchor='top',
                              text='China',
                              font=dict(family='Arial',
                                        size=12,
                                        color='red'),
                              showarrow=False)
russia_txt = dict(xref='paper', yref='paper', x=0.95, y=0.24,
                              xanchor='center', yanchor='top',
                              text='Russia',
                              font=dict(family='Arial',
                                        size=12,
                                        color='purple'),
                              showarrow=False)
india_txt = dict(xref='paper', yref='paper', x=0.9, y=0.29,
                              xanchor='center', yanchor='top',
                              text='India',
                              font=dict(family='Arial',
                                        size=12,
                                        color='green'),
                              showarrow=False)

us_txt = dict(xref='paper', yref='paper', x=0.9, y=0.57,
                              xanchor='center', yanchor='top',
                              text='United States',
                              font=dict(family='Arial',
                                        size=12,
                                        color='Blue'),
                              showarrow=False)


japan_txt = dict(xref='paper', yref='paper', x=0.95, y=0.15,
                              xanchor='center', yanchor='top',
                              text='Japan',
                              font=dict(family='Arial',
                                        size=12,
                                        color='orange'),
                              showarrow=False)

txt_annotations = [us_txt, india_txt, russia_txt, china_txt, japan_txt]
fig.update_layout(annotations=txt_annotations)
fig.show()

**Observations**  
Very interesting growth of emissions by China.  Others are starting to taper a little.

### Time series graph for top 10 countries for GHG emissions 

In [855]:
# gets list of top 10 countries with GHG pollution based on prior analysis
top_10_ghg_countries = co2_by_country.reset_index().sort_values(by='total_ghg',ascending=False).head(10)['country'].tolist() 

co2_data_scoped = co2_data[['year', 'total_ghg','country']] # reduces data frame to relevant cols

# keep only the countries with the top 10 GHG emissions
ghg_by_year_country = co2_data_scoped.loc[co2_data_scoped['country'].isin(top_10_ghg_countries)].\
                        groupby(['year','total_ghg']).sum()
ghg_by_year_country.reset_index(inplace=True)
ghg_by_year_country

Unnamed: 0,year,total_ghg,country
0,1990,662.91,Canada
1,1990,758.03,United Kingdom
2,1990,1092.00,Japan
3,1990,1107.23,Germany
4,1990,1183.05,India
...,...,...,...
265,2016,2229.00,Indonesia
266,2016,2391.38,Russia
267,2016,3235.66,India
268,2016,5833.49,United States


In [856]:
# creating a line graph to plot GHG emission trends per country

#defining colors to emphasize leading countries in GHG emissions
light_grey_rgb = 'rgb(231,231,231)'
color_map_by_country = {'Saudi Arabia':light_grey_rgb, 'South Korea': light_grey_rgb, 'Indonesia': 'purple', 
                        'Iran':light_grey_rgb, 'Brazil':light_grey_rgb, 'Mexico':light_grey_rgb,'Australia':light_grey_rgb,
                        'Italy':light_grey_rgb,'South Africa':light_grey_rgb, 'United Kingdom':light_grey_rgb,
                        'Poland':light_grey_rgb, 'Canada':light_grey_rgb,'Ukraine':light_grey_rgb, 'France':light_grey_rgb,
                        'Germany':light_grey_rgb, 'United States':'blue', 'China': 'red', 'India':'green', 
                        'Russia':light_grey_rgb, 'Japan':light_grey_rgb}

# using plotly.express library for visuals in graph
fig = px.line(ghg_by_year_country, x="year", y="total_ghg", color='country',title='GHG emissions over years per country', template='plotly_white',
              labels={'total_ghg':'GHG Emissions (million tonnes)','year':"Year"},
              color_discrete_map=color_map_by_country)

fig.update_layout(showlegend=False) # hide legend as we'll include annotations

# annotations to label countries in graph
china_txt = dict(xref='paper', yref='paper', x=0.9, y=1,
                              xanchor='center', yanchor='top',
                              text='China',
                              font=dict(family='Arial',
                                        size=12,
                                        color='red'),
                              showarrow=False)
indonesia_txt = dict(xref='paper', yref='paper', x=0.95, y=0.28,
                              xanchor='center', yanchor='top',
                              text='Indonesia',
                              font=dict(family='Arial',
                                        size=12,
                                        color='purple'),
                              showarrow=False)
india_txt = dict(xref='paper', yref='paper', x=0.9, y=0.33,
                              xanchor='center', yanchor='top',
                              text='India',
                              font=dict(family='Arial',
                                        size=12,
                                        color='green'),
                              showarrow=False)

us_txt = dict(xref='paper', yref='paper', x=0.9, y=0.57,
                              xanchor='center', yanchor='top',
                              text='United States',
                              font=dict(family='Arial',
                                        size=12,
                                        color='Blue'),
                              showarrow=False)


txt_annotations = [us_txt, india_txt, indonesia_txt, china_txt]
fig.update_layout(annotations=txt_annotations)
fig.show()

**Observations**
Similar trends to CO2.

## Let's explore emission data further

CO2 per capita: Annual production-based emissions of carbon dioxide (CO2), measured in tonnes per person. This is based on territorial emissions, which do not account for emissions embedded in traded goods.
 
Consumption CO2 per capita: Annual consumption-based emissions of carbon dioxide (CO2), measured in tonnes per person. Consumption-based emissions are national or regional emissions which have been adjusted for trade (i.e. territorial/production emissions minus emissions embedded in exports, plus emissions embedded in imports). If a country's consumption-based emissions are higher than its production emissions it is a net importer of carbon dioxide. Data has been converted by Our World in Data from million tonnes of carbon to million tonnes of CO2 using a conversion factor of 3.664.
 
Total GHG per capita: Total greenhouse gas emissions including land use change and forestry, measured in tonnes of carbon dioxide-equivalents per capita.
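
As a worked check of these definitions, the sketch below reproduces the per-capita and consumption-based figures for one row printed earlier (Zimbabwe, 2016). It assumes that `co2` is in million tonnes and that `trade_co2` is net CO2 imported through trade; the variable names are illustrative.

```python
# Zimbabwe, 2016, as printed in the co2_disaster_df output earlier.
co2_mt = 10.74              # production-based CO2, million tonnes
trade_co2_mt = 1.42         # net CO2 embedded in trade, million tonnes (assumed net imports)
consumption_co2_mt = 12.15  # consumption-based CO2, million tonnes
population = 1.40e7

# Million tonnes -> tonnes, then divide by population for tonnes per person.
co2_per_capita = co2_mt * 1e6 / population
print(round(co2_per_capita, 2))

# Consumption-based = production-based + net imports (trade_co2).
# Any small gap comes from rounding in the printed values.
print(round(co2_mt + trade_co2_mt, 2), consumption_co2_mt)
```

The per-capita result matches the `co2_per_capita` value shown for that row, which suggests the unit reading is right.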

In [857]:
co2_df = pd.read_pickle("../data/co2-data.pkl")
co2_df

Unnamed: 0,iso_code,country,year,co2,consumption_co2,co2_growth_prct,co2_growth_abs,trade_co2,co2_per_capita,consumption_co2_per_capita,share_global_co2,cumulative_co2,share_global_cumulative_co2,co2_per_gdp,consumption_co2_per_gdp,population,gdp,total_ghg,ghg_per_capita
7,AFG,Afghanistan,1956,0.18,,19.05,0.03,,0.02,,0.00,0.83,0.00,0.02,,8.40e+06,1.16e+10,,
8,AFG,Afghanistan,1957,0.29,,60.00,0.11,,0.03,,0.00,1.12,0.00,0.03,,8.54e+06,1.16e+10,,
9,AFG,Afghanistan,1958,0.33,,12.50,0.04,,0.04,,0.00,1.46,0.00,0.03,,8.68e+06,1.22e+10,,
10,AFG,Afghanistan,1959,0.39,,16.62,0.06,,0.04,,0.00,1.84,0.00,0.03,,8.83e+06,1.26e+10,,
11,AFG,Afghanistan,1960,0.41,,7.62,0.03,,0.05,,0.00,2.25,0.00,0.03,,9.00e+06,1.30e+10,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
25199,ZWE,Zimbabwe,2016,10.74,12.15,-12.17,-1.49,1.42,0.77,0.87,0.03,736.47,0.05,0.51,0.58,1.40e+07,2.10e+10,65.98,4.7
25200,ZWE,Zimbabwe,2017,9.58,11.25,-10.77,-1.16,1.67,0.67,0.79,0.03,746.05,0.05,0.44,0.51,1.42e+07,2.19e+10,,
25201,ZWE,Zimbabwe,2018,11.85,13.16,23.72,2.27,1.31,0.82,0.91,0.03,757.90,0.05,0.52,0.58,1.44e+07,2.27e+10,,
25202,ZWE,Zimbabwe,2019,10.95,12.42,-7.64,-0.91,1.47,0.75,0.85,0.03,768.85,0.05,,,1.46e+07,,,


In [858]:

import pycountry_convert as pc
import pycountry
def get_continent(iso_code):
    """Map an ISO 3166-1 alpha-3 code to a continent name ('NaN' if unknown)."""
    # codes handled manually when the library lookup fails
    fallback = {'ATA': 'Antarctica', 'SXM': 'North America', 'TLS': 'Asia', 'OWID_KOS': 'Europe'}
    try:
        country_alpha2 = pycountry.countries.get(alpha_3=iso_code).alpha_2
        country_continent_code = pc.country_alpha2_to_continent_code(country_alpha2)
        return pc.convert_continent_code_to_continent_name(country_continent_code)
    except Exception:
        return fallback.get(iso_code, 'NaN')
    
co2_df = co2_data.loc[co2_data['year'] >= 1990,
                      ['country', 'year', 'co2_per_capita', 'ghg_per_capita', 'population']].copy()
# pandas aligns on the index, so only the rows kept in co2_df receive values
co2_df['continent'] = co2_data['iso_code'].apply(get_continent)
co2_df['gdp_per_capita'] = co2_data['gdp'].div(co2_data['population'])
co2_df.head()

# marker sizes cannot be NaN, so treat missing population as 0
co2_df['population'] = co2_df['population'].fillna(0)

## Random Smoke scatter for Cover Art

In [859]:


ghg_fig1 = px.scatter(co2_df, x='gdp_per_capita', y='ghg_per_capita', 
                 hover_data = ['country', 'year','gdp_per_capita', 'ghg_per_capita'],
                 labels= {'gdp_per_capita': 'GDP per capita (USD)', 
                          'ghg_per_capita': 'Annual GHG emissions (per capita)', 
                          'country': 'Country', 'year': 'Year', 'continent': 'Continent'},
               color_continuous_scale=px.colors.sequential.Greys, # grey color scale
               color='year',
               size='population', size_max=100)
ghg_fig1.update_layout(template="plotly_white",
                   title='',
                   title_font_color='darkgreen',
                   xaxis_title= '',
                   yaxis_title= '',
                   coloraxis_showscale=False,
                  )
ghg_fig1.update_xaxes(type="log",showticklabels=False,showgrid=False)
ghg_fig1.update_yaxes(range=[-10,40],showticklabels=False,showgrid=False, zeroline=False)
config = {
  'toImageButtonOptions': {
    'format': 'jpeg', # one of png, svg, jpeg, webp
    'filename': 'co2_gdp',
    'scale': 2 # Multiply title/legend/axis/canvas sizes by this factor
  }
}
ghg_fig1.show(config=config)

No smoking.

### Are rich countries polluting more than poor countries?
## Top & Bottom GDP by Country
To compare rich and poor countries, the first step is to assess our measure: GDP. GDP is often presented with GDP per capita, so let's see if using one or the other will affect our data. For this plot, we assume that 0 GDP means that there was no data for that country and year.

In [860]:
default_year = 2016
gdp_df = co2_data[['country', 'year','gdp', 'population']].groupby(['country','year'], as_index=False).sum()
gdp_df.dropna(axis=0, how='any', subset=['country','year','gdp'], inplace=True)
gdp_df = gdp_df[gdp_df['gdp']>0]
gdp_df['gdp_per_capita'] = gdp_df['gdp'].div(gdp_df['population'])
gdp_df

Unnamed: 0,country,year,gdp,population,gdp_per_capita
0,Afghanistan,1956,1.16e+10,8.40e+06,1378.90
1,Afghanistan,1957,1.16e+10,8.54e+06,1356.62
2,Afghanistan,1958,1.22e+10,8.68e+06,1409.99
3,Afghanistan,1959,1.26e+10,8.83e+06,1424.04
4,Afghanistan,1960,1.30e+10,9.00e+06,1448.63
...,...,...,...,...,...
13318,Zimbabwe,2014,2.12e+10,1.36e+07,1562.00
13319,Zimbabwe,2015,2.10e+10,1.38e+07,1522.11
13320,Zimbabwe,2016,2.10e+10,1.40e+07,1494.03
13321,Zimbabwe,2017,2.19e+10,1.42e+07,1541.65


In [861]:
# 2016
ranked_gdp_df = gdp_df[gdp_df['year']==default_year].sort_values('gdp', ascending=True)
gdp_fig = make_subplots(rows=1, cols=2, 
                    subplot_titles=("GDP", "GDP per capita"), 
                    shared_yaxes=True,
                    horizontal_spacing=0)
gdp_fig.add_trace(go.Bar(
                        x=ranked_gdp_df['gdp'].tail(10),
                        y=ranked_gdp_df['country'].tail(10),
                        orientation='h',
                        name='2016 Top 10 GDP'),
                        row=1,col=1)

gdp_fig.add_trace(go.Bar(
                        x=ranked_gdp_df['gdp_per_capita'].tail(10),
                        y=ranked_gdp_df['country'].tail(10),
                        orientation='h',
                        name='2016 Top 10 GDP per capita'),
                        row=1,col=2)
## Bottom countries data
gdp_fig.add_trace(go.Bar(
                        x=ranked_gdp_df['gdp'].head(10),
                        y=ranked_gdp_df['country'].head(10),
                        orientation='h',
                        name='2016 Bottom 10 GDP',
                        visible='legendonly'),
                        row=1,col=1)

gdp_fig.add_trace(go.Bar(
                        x=ranked_gdp_df['gdp_per_capita'].head(10),
                        y=ranked_gdp_df['country'].head(10),
                        orientation='h',
                        name='2016 Bottom 10 GDP per capita',
                        visible='legendonly'),
                        row=1,col=2)
gdp_fig.update_xaxes(title_text="$ USD")
gdp_fig.update_layout(title_text = "2016 GDPs in the World",
                    template='plotly_white',
                    showlegend=False)
# Add dropdown
gdp_fig.update_layout(
    updatemenus=[go.layout.Updatemenu(
        yanchor='top',
        y=1.12,
        x=-0.06,
        active=0,
        buttons=list(
            [dict(label = 'Top 10 GDP',
                  method = 'update',
                  args = [{'visible': [True, True, False, False]},
                          ]),
             dict(label = 'Bottom 10 GDP',
                  method = 'update',
                  args = [{'visible': [False, False, True, True]},
                          ]),
            ])
        )
    ])
gdp_fig.show()


# 1990
ranked_gdp_df = gdp_df[gdp_df['year']==1990].sort_values('gdp', ascending=True)
gdp_fig = make_subplots(rows=1, cols=2, 
                    subplot_titles=("GDP", "GDP per capita"), 
                    shared_yaxes=True,
                    horizontal_spacing=0)
gdp_fig.add_trace(go.Bar(
                        x=ranked_gdp_df['gdp'].tail(10),
                        y=ranked_gdp_df['country'].tail(10),
                        orientation='h',
                        name='1990 Top 10 GDP'),
                        row=1,col=1)
gdp_fig.add_trace(go.Bar(
                        x=ranked_gdp_df['gdp_per_capita'].tail(10),
                        y=ranked_gdp_df['country'].tail(10),
                        orientation='h',
                        name='1990 Top 10 GDP per capita'),
                        row=1,col=2)
## Bottom countries data
gdp_fig.add_trace(go.Bar(
                        x=ranked_gdp_df['gdp'].head(10),
                        y=ranked_gdp_df['country'].head(10),
                        orientation='h',
                        name='1990 Bottom 10 GDP',
                        visible='legendonly'),
                        row=1,col=1)

gdp_fig.add_trace(go.Bar(
                        x=ranked_gdp_df['gdp_per_capita'].head(10),
                        y=ranked_gdp_df['country'].head(10),
                        orientation='h',
                        name='1990 Bottom 10 GDP per capita',
                        visible='legendonly'),
                        row=1,col=2)
gdp_fig.update_xaxes(title_text="$ USD")
gdp_fig.update_layout(title_text = "1990 GDPs in the World",
                    template='plotly_white',
                    showlegend=False)
# Add dropdown
gdp_fig.update_layout(
    updatemenus=[go.layout.Updatemenu(
        yanchor='top',
        y=1.12,
        x=-0.06,
        active=0,
        buttons=list(
            [dict(label = 'Top 10 GDP',
                  method = 'update',
                  args = [{'visible': [True, True, False, False]},
                        ]),
             dict(label = 'Bottom 10 GDP',
                  method = 'update',
                  args = [{'visible': [False, False, True, True]},
                          ]),
            ])
        )
    ])
gdp_fig.show()

**Observations**

These graphs compare GDP and GDP per capita for the top 10 and bottom 10 countries ranked by GDP. When comparing 1990 with 2016, GDP and GDP per capita have increased for all countries. Let's discuss the difference between GDP and GDP per capita. GDP is a measure of a country's wealth from production and of its economic health. GDP per capita is a measure of people's standard of living, calculated as the amount each person would receive if GDP were shared equally. Thus, a large country such as China may have a high GDP, surpassing the United States. However, due to its large population, its GDP per capita is comparatively low. When defining wealth in the analysis that follows, it is therefore important to remember that countries with a high GDP do not necessarily have a high GDP per capita.
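To make the point about rankings concrete, here is a minimal sketch with three hypothetical countries and made-up figures (none of these values come from our dataset). It shows how ordering by total GDP can be the reverse of ordering by GDP per capita.

```python
import pandas as pd

# Hypothetical countries with made-up figures, for illustration only.
toy = pd.DataFrame({
    "country": ["Large", "Medium", "Small"],
    "gdp": [10.0e12, 2.0e12, 0.2e12],
    "population": [1.0e9, 50.0e6, 2.0e6],
})
toy["gdp_per_capita"] = toy["gdp"] / toy["population"]

# Rank 1 = highest value in each column.
toy["gdp_rank"] = toy["gdp"].rank(ascending=False).astype(int)
toy["gdp_per_capita_rank"] = toy["gdp_per_capita"].rank(ascending=False).astype(int)

print(toy[["country", "gdp_rank", "gdp_per_capita_rank"]])
```

In this toy example the country with the largest economy ranks last once its population is taken into account.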

### GDP vs CO2 Emissions
To understand whether a country's wealth is correlated with its pollution levels, we explore the relationship between GDP per capita and CO2 emissions per capita. A continent column is also added to give a general sense of where the wealthy countries are. Below is the table we will be using for this section. The following is a description of the main variables:

* CO2 per capita (tCO2): Annual production-based emissions of carbon dioxide (CO2), measured in tonnes per person. This is based on territorial emissions, which do not account for emissions embedded in traded goods.
  
* Total GHG per capita (tCO2-eq): Total greenhouse gas emissions including land use change and forestry, measured in tonnes of carbon dioxide-equivalents per capita.

* GDP per capita (\\$ USD): Gross domestic product per person, with GDP measured in international-$ using 2011 prices to adjust for price changes over time (inflation) and price differences between countries. Calculated by dividing GDP by population.
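The GDP per capita definition above can be sketched as a small helper. The function name `add_gdp_per_capita` and the zero-population guard are our own assumptions for illustration; the notebook itself divides the two columns directly.

```python
import numpy as np
import pandas as pd

def add_gdp_per_capita(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with gdp_per_capita = gdp / population (hypothetical helper)."""
    out = df.copy()
    # Treat a zero population as missing so the ratio is NaN instead of inf.
    population = out["population"].replace(0, np.nan)
    out["gdp_per_capita"] = out["gdp"] / population
    return out

# Made-up rows: the second one has no usable population figure.
toy = pd.DataFrame({"gdp": [1.0e10, 3.0e9], "population": [8.0e6, 0.0]})
result = add_gdp_per_capita(toy)
print(result)
```

The guard matters only if populations have already been filled with 0, in which case a plain division would produce infinite values.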

In [862]:
co2_fig1 = px.scatter(co2_df[co2_df['year']==1990], x='gdp_per_capita', y='co2_per_capita', 
                 hover_data = ['country', 'year','gdp_per_capita', 'co2_per_capita'],
                 labels= {'gdp_per_capita': 'GDP per capita (USD)', 
                          'co2_per_capita': 'tCO2', 
                          'country': 'Country', 'year': 'Year', 'continent': 'Continent'},
               color_discrete_sequence=['grey'],
               size='population', size_max=50)
co2_fig1.update_traces(opacity = 0.3, name = '1990', showlegend=True)

co2_fig2 = px.scatter(co2_df[co2_df['year']==2016], x='gdp_per_capita', y='co2_per_capita', 
                 hover_data = ['country', 'year','gdp_per_capita', 'co2_per_capita'],
                 labels= {'gdp_per_capita': 'GDP per capita (USD)', 
                          'co2_per_capita': 'tCO2', 
                          'country': 'Country', 'year': 'Year', 'continent': 'Continent'},
                 color_discrete_sequence=px.colors.qualitative.Bold,
                 color='continent',
                 size='population', size_max=40,
                 )
co2_fig2.update_traces(marker=dict(sizemin=3))
co2_fig3 = go.Figure(data=co2_fig1.data+co2_fig2.data)
co2_fig3.update_layout(template="plotly_white",
                   title='2016 CO2 Emissions per capita vs GDP per capita \t (Semi-Log)',
                   xaxis_title= 'GDP per capita (USD)',
                   yaxis_title= 'Annual CO2 emissions per capita (tCO2)'
                  )
co2_fig3.update_xaxes(type="log")
co2_fig3.update_yaxes(range=[-10,40])
co2_fig3.add_annotation(text="United States",
                  xref="x", yref="y",
                  x=4.9, y=18.3, showarrow=False,
                   font_color='rgb(242,183,1)')
co2_fig3.add_annotation(text="China",
                  xref="x", yref="y",
                  x=4.18, y=12, showarrow=False,
                   font_color='rgb(127,60,141)')
co2_fig3.add_annotation(text="Qatar",
                  xref="x", yref="y",
                  x=5.1, y=36., showarrow=False,
                   font_color='rgb(127,60,141)')
co2_fig3.add_annotation(text="India",
                  xref="x", yref="y",
                  x=3.86, y=-3, showarrow=False,
                   font_color='rgb(127,60,141)')
co2_fig3.add_annotation(text="Libya",
                  xref="x", yref="y",
                  x=3.5, y=.35, showarrow=False,
                   font_color='rgb(57,105,172)')
co2_fig3.add_annotation(text="Ethiopia",
                  xref="x", yref="y",
                  x=3.28, y=2.5, showarrow=False,
                   font_color='rgb(57,105,172)')
co2_fig3.add_annotation(text="Russia",
                  xref="x", yref="y",
                  x=4.47, y=12, showarrow=False,
                   font_color='rgb(17,165,121)')
config = {
  'toImageButtonOptions': {
    'format': 'jpeg', # one of png, svg, jpeg, webp
    'filename': 'emissions_gdp',
    'scale': 2 # Multiply title/legend/axis/canvas sizes by this factor
  }
}
co2_fig3.show(config=config)
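A note on the annotation coordinates in the cell above: because the x-axis is set to `type="log"`, Plotly expects annotation x-positions as the base-10 logarithm of the data value, which is why the labels sit at values such as 4.9 rather than at a dollar amount. The helper below is a hypothetical convenience, not part of the notebook, that makes the conversion explicit.

```python
import math

def log_axis_coord(value: float) -> float:
    """Convert a data value to its position on a log10 axis (hypothetical helper)."""
    if value <= 0:
        raise ValueError("log axes only accept positive values")
    return math.log10(value)

# A GDP per capita of 10,000 USD sits at x = 4.0 on the log axis.
x_pos = log_axis_coord(10_000)
print(x_pos)
```

Read the other way, an annotation at x=4.9 corresponds to a GDP per capita of roughly 79,000 USD.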

### Observations
Using GDP per capita to measure the wealth of a nation, we can see that there is indeed a positive correlation between the wealth of a nation and its CO2 emissions per capita.

In the graph, the size of each dot represents the size of the population. The majority of countries are clustered around 10~25k GDP per capita and emit around 5~10 tCO2 per capita. Compared to 1990 (in grey), many countries in Asia became wealthier as they became the world’s factory, but their emissions also increased. Europe maintained emissions around 5~10 tCO2. Most of Africa has both low emissions and low GDP per capita.

However, it becomes apparent that "rich" and "poor" are subjective terms and that GDP can be very misleading. UN summit debates direct this argument at major economies such as China, Europe, and the USA. However, China has a high GDP, yet its GDP per capita is very low due to its large population. India also has one of the highest GDPs, but its GDP per capita is very low. An interesting case is Qatar, which sits at the extreme end with the highest emissions and GDP per capita because oil and gas are its main economic sector. Given its small population, however, its total GDP is not as high as that of the largest economies. Second, there is a clear nonlinear trend between CO2 emissions per capita and GDP per capita.
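Since the trend is nonlinear, a rank-based measure may describe it better than a plain linear correlation. The sketch below uses made-up values (not our dataset) that rise steeply and then flatten, and compares Pearson correlation with Spearman correlation, computed here as the Pearson correlation of the ranks. This is an optional check under those assumptions, not something the analysis above relies on.

```python
import pandas as pd

# Made-up values that rise steeply and then flatten, purely for illustration.
toy = pd.DataFrame({
    "gdp_per_capita": [1_000, 3_000, 10_000, 30_000, 100_000],
    "co2_per_capita": [0.2, 1.0, 5.0, 9.0, 10.0],
})

# Pearson assumes a straight-line relationship.
pearson = toy["gdp_per_capita"].corr(toy["co2_per_capita"])

# Spearman is the Pearson correlation of the ranks; it only assumes the
# relationship is monotonic, so a curved but rising trend still scores highly.
spearman = toy["gdp_per_capita"].rank().corr(toy["co2_per_capita"].rank())

print(f"Pearson:  {pearson:.2f}")
print(f"Spearman: {spearman:.2f}")
```

On the real data, the same two lines could be applied to the 2016 slice of `co2_df` after dropping rows with missing values.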

Let's take a look at Total GHG emissions, which accounts for other polluting gases in addition to CO2. 

In [863]:
ghg_fig1 = px.scatter(co2_df[co2_df['year']==1990], x='gdp_per_capita', y='ghg_per_capita', 
                 hover_data = ['country', 'year','gdp_per_capita', 'ghg_per_capita'],
                 labels= {'gdp_per_capita': 'GDP per capita (USD)', 
                          'ghg_per_capita': 'tCO2-eq', 
                          'country': 'Country', 'year': 'Year', 'continent': 'Continent'},
               color_discrete_sequence=['grey'],
               size='population', size_max=50)
ghg_fig1.update_traces(opacity = 0.3, name = '1990', showlegend=True)

ghg_fig2 = px.scatter(co2_df[co2_df['year']==2016], x='gdp_per_capita', y='ghg_per_capita', 
                 hover_data = ['country', 'year','gdp_per_capita', 'ghg_per_capita'],
                 labels= {'gdp_per_capita': 'GDP per capita (USD)', 
                          'ghg_per_capita': 'tCO2-eq', 
                          'country': 'Country', 'year': 'Year', 'continent': 'Continent'},
                 color_discrete_sequence=px.colors.qualitative.Bold,
                 color='continent',
                 size='population', size_max=40,
                 )
ghg_fig2.update_traces(marker=dict(sizemin=3))
ghg_fig3 = go.Figure(data=ghg_fig1.data+ghg_fig2.data)
ghg_fig3.update_layout(template="plotly_white",
                   title='2016 GHG Emissions per capita vs GDP per capita \t (Semi-Log)',
                   xaxis_title= 'GDP per capita (USD)',
                   yaxis_title= 'Annual GHG emissions per capita (tCO2-eq)'
                  )
ghg_fig3.update_xaxes(type="log")
ghg_fig3.update_yaxes(range=[-10,40])
ghg_fig3.add_annotation(text="United States",
                  xref="x", yref="y",
                  x=4.99, y=19.9, showarrow=False,
                   font_color='rgb(242,183,1)')
ghg_fig3.add_annotation(text="China",
                  xref="x", yref="y",
                  x=4.18, y=12.8, showarrow=False,
                   font_color='rgb(127,60,141)')
ghg_fig3.add_annotation(text="Qatar",
                  xref="x", yref="y",
                  x=5.1, y=36., showarrow=False,
                   font_color='rgb(127,60,141)')
ghg_fig3.add_annotation(text="India",
                  xref="x", yref="y",
                  x=3.86, y=-3, showarrow=False,
                   font_color='rgb(127,60,141)')
ghg_fig3.add_annotation(text="Zambia",
                  xref="x", yref="y",
                  x=3.5, y=32, showarrow=False,
                   font_color='rgb(57,105,172)')
ghg_fig3.add_annotation(text="Ethiopia",
                  xref="x", yref="y",
                  x=3.28, y=4, showarrow=False,
                   font_color='rgb(57,105,172)')
ghg_fig3.add_annotation(text="Russia",
                  xref="x", yref="y",
                  x=4.53, y=17, showarrow=False,
                   font_color='rgb(17,165,121)')

ghg_fig3.show(config=config)

### Observations 
Plotting greenhouse gas emissions in the same way as CO2 emissions, the conclusions are slightly different. The correlation between emissions and wealth is not as strong as in the CO2 emissions graph, which we have included in our appendix. There are countries such as Zambia that have extremely high emissions yet do not have a high GDP per capita. The reason is that many other factors independently influence GDP and emissions. Having all high polluters take monetary responsibility for their pollution may not be practical for every country, and charging only rich countries would not be very reasonable either.

## (2) Are rich countries polluting more than poorer countries? 
In terms of wealth, the range of GDPs has broadened over time. Countries in Asia have seen a rapid increase in GDP, most likely due to growth in their manufacturing sectors. Over the years, countries all around the world have seen an increase in both GDP and CO2 emissions, except for a handful of African countries. There is also a notable number of countries (across various continents) with extremely high emissions that are not among the top in GDP.

There is a loose positive correlation between GDP per capita and CO2 emissions per capita based on the above graph. GHG emissions have an even looser correlation. Therefore, based on CO2 emissions vs GDP per capita, richer countries are indeed polluting more than poorer countries. However, the graphs also show that any policy would first need to define 'the rich' and 'the poor', as well as 'high emissions'.