# Course project guidelines

Your assignment for the course project is to formulate and answer a question of your choosing based on one of the following datasets:

1. ClimateWatch historical emissions data: greenhouse gas emissions by U.S. state 1990-present
2. World Happiness Report 2023: indices related to happiness and wellbeing by country 2008-present
3. Any dataset from the class assignments or mini projects

A good question is one that you want to answer. It should be a question with contextual meaning, not a purely technical matter. It should be clear enough to answer, but not so specific or narrow that your analysis is a single line of code. It should require you to do some nontrivial exploratory analysis, descriptive analysis, and possibly some statistical modeling. You aren't required to use any specific methods, but it should take a bit of work to answer the question. There may be multiple answers or approaches to contrast based on different ways of interpreting the question or different ways of analyzing the data. If your question is answerable in under 15 minutes, or your answer only takes a few sentences to explain, the question probably isn't nuanced enough.

## Deliverable

Prepare and submit a Jupyter notebook that summarizes your work. Your notebook should contain the following sections/contents:

* **Data description**: write up a short summary of the dataset you chose to work with following the conventions introduced in previous assignments. Cover the sampling if applicable and data semantics, but focus on providing high-level context and not technical details; don't report preprocessing steps or describe tabular layouts, etc.
* **Question of interest**: motivate and formulate your question; explain what a satisfactory answer might look like.
* **Data analysis**: provide a walkthrough with commentary of the steps you took to investigate and answer the question. This section can and should include code cells and text cells, but you should try to focus on presenting the analysis clearly by organizing cells according to the high-level steps in your analysis so that it is easy to skim. For example, if you fit a regression model, include formulating the explanatory variable matrix and response, fitting the model, extracting coefficients, and perhaps even visualization all in one cell; don't separate these into 5-6 substeps.
* **Summary of findings**: answer your question by interpreting the results of your analysis, referring back as appropriate. This can be a short paragraph or a bulleted list.

## Evaluation

Your work will be evaluated on the following criteria:

1. Thoughtfulness: does your question reflect some thoughtful consideration of the dataset and its nuances, or is it more superficial?
2. Thoroughness: is your analysis an end-to-end exploration, or are there a lot of loose ends or unexplained choices?
3. Mistakes or oversights: is your work free from obvious errors or omissions, or are there mistakes and things you've overlooked?
4. Clarity of write-up: is your report well-organized with commented code and clear writing, or does it require substantial effort to follow?

## Data Description 

The data in this analysis cover greenhouse gas emissions from 194 countries and are sourced from Climate Watch. Under the Paris Agreement, every participating nation is required to submit a report every 5 years detailing its greenhouse gas emissions, which is how Climate Watch acquired the data. Emissions are reported in metric tons of CO2.

Interestingly, some countries have negative emissions: Montenegro, Georgia, Fiji, Latvia, Micronesia, Cape Verde, Finland, Bhutan, Solomon Islands, and Romania. These countries remove more carbon than they emit, either because of measures in place to reduce emissions or because of a high volume of greenery and vegetation.
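
One way to list such countries is to check whether any yearly value falls below zero. The sketch below uses a tiny made-up table in the same wide layout as the dataset (a `Country` column plus one column per year); the country names and numbers are placeholders, not values from Climate Watch.

```python
import pandas as pd

# Hypothetical toy table in the dataset's wide layout; values are illustrative only.
toy = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "2015": [10.0, -2.5, 4.0],
    "2016": [11.0, 3.0, -0.1],
})

year_cols = [c for c in toy.columns if c != "Country"]

# Flag a country if at least one year has net emissions below zero.
has_negative = (toy[year_cols] < 0).any(axis=1)
negative_countries = toy.loc[has_negative, "Country"].tolist()
print(negative_countries)
```

Checking every year, rather than only the most recent one, also catches countries whose net emissions were negative for only part of the period.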

## Data Cleanup
To clean this dataset, we dropped the columns that had no variation across rows (Data source, Sector, Gas, and Unit), as well as the rows with missing data (Namibia was missing a few points). Afterwards, we used an external library (pycountry_convert) to assign each country to a continent; this is useful for seeing which countries contribute the most CO2. Finally, we removed any rows that this function could not place in a continent and grouped the rest by continent.
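
As a rough illustration of these cleanup steps, constant columns can be detected programmatically instead of being listed by hand, and the rows with missing values can be recorded before they are dropped. The table below is a made-up stand-in for the real file; only the column names `Gas` and `Unit` come from the dataset.

```python
import pandas as pd

# Hypothetical toy table; the values are placeholders, not real data.
raw = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Gas": ["CO2", "CO2", "CO2"],
    "Unit": ["unit", "unit", "unit"],
    "2019": [1.0, None, 3.0],
})

# Columns with a single distinct value carry no information for comparisons.
constant_cols = [c for c in raw.columns if raw[c].nunique(dropna=False) == 1]
cleaned = raw.drop(columns=constant_cols)

# Record which countries are lost to missing data before dropping them.
dropped = cleaned.loc[cleaned.isna().any(axis=1), "Country"].tolist()
cleaned = cleaned.dropna()
print(constant_cols, dropped)
```

Keeping the list of dropped countries makes it easy to state in the write-up exactly which observations were excluded.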

## Data Analysis
To answer the question of how different regions contribute to CO2 emissions, we assigned each country to a continent. We then plotted the mean emissions per country for each continent. Once we factor out the outliers, overall emissions appear relatively similar across continents. From here we decided to look at trends in individual countries.
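
The effect of outliers on a continent-level mean can be seen by putting the mean next to the median. This is a minimal sketch with invented numbers, where continent `X` contains one very large emitter.

```python
import pandas as pd

# Hypothetical per-country mean emissions; the numbers are made up for illustration.
toy = pd.DataFrame({
    "Continent": ["X", "X", "X", "Y", "Y", "Y"],
    "Emissions": [5.0, 7.0, 900.0, 6.0, 8.0, 10.0],
})

# The mean is pulled up by the single large emitter in X; the median is not.
summary = toy.groupby("Continent")["Emissions"].agg(["mean", "median"])
print(summary)
```

In this toy case the two continents look very different by mean but nearly identical by median, which is the pattern the boxplots are meant to reveal.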

For Asia, we plotted the mean emissions per year, found the countries with the largest change, and then plotted those. Next we plotted the mean emissions per year for North America. There was a spike in North America around the time of the Great Recession (2008-2009), so we looked at that more closely. Then we looked at Africa: we plotted emissions per year and then looked at the countries with the biggest change. We did the same for Europe. For Oceania and South America we only plotted the means, as we did not find many big trends within them.
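
A per-year mean for each continent can be computed by reshaping the wide table to long form and grouping. The sketch below assumes a table with `Country` and `Continent` columns and one column per year; the values are invented.

```python
import pandas as pd

# Hypothetical toy table in the wide layout; values are illustrative only.
wide = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Continent": ["X", "X", "Y"],
    "1990": [1.0, 3.0, 10.0],
    "1991": [2.0, 6.0, 12.0],
})

# Reshape to one row per (country, year), then average within continent and year.
long = wide.melt(id_vars=["Country", "Continent"], var_name="Year", value_name="Emissions")
yearly_means = long.groupby(["Continent", "Year"], as_index=False)["Emissions"].mean()
print(yearly_means)
```

The resulting long table has one row per continent and year, which is the shape that Altair line charts expect.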

### Output
#### World
Based on the means alone, Asia is the largest producer of greenhouse gases, followed by North America, South America, Europe, Africa, and Oceania. Given that Asia houses two of the most populous countries in the world, it makes sense that it has such high emissions. Furthermore, countries in Asia manufacture many of the world's goods, which increases their emissions. North America is also high; the United States is a giant producer of goods and contributes a lot to emissions.
The pattern is similar elsewhere: countries that manufacture goods or produce oil tend to have much higher emissions, which raises their continent's overall emissions. If we factor out these outliers, the continents are much closer in terms of overall means, so we break the analysis down within each continent.
#### Asia
Asia houses two of the world's largest countries, and as expected, when we plot the overall emissions per year, India and China are at the top. When we break it down by means, the biggest means are also in India and China, with Indonesia, Iran, and North Korea also ranking high. These countries also have the biggest change, which makes sense: as the world relies more on manufacturing and oil production, countries with large manufacturing plants and oil reserves will continue to see their emissions rise.
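
Absolute change tends to favor countries that were already large emitters. A complementary view, sketched here with made-up numbers, is the percent change relative to the first year. This is an illustration only, not part of the analysis reported above.

```python
import pandas as pd

# Hypothetical toy table; values are illustrative only.
toy = pd.DataFrame({
    "Country": ["A", "B"],
    "1990": [100.0, 10.0],
    "2019": [150.0, 40.0],
})

# Absolute change between the first and last year.
toy["Change"] = toy["2019"] - toy["1990"]
# Change relative to the starting level, in percent.
toy["Percent change"] = 100 * toy["Change"] / toy["1990"]

top_absolute = toy.sort_values("Change", ascending=False)["Country"].iloc[0]
top_relative = toy.sort_values("Percent change", ascending=False)["Country"].iloc[0]
print(top_absolute, top_relative)
```

In this toy table the two rankings disagree: `A` has the larger absolute increase, while `B` grew the most relative to where it started.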

#### North and Central America
In North America, the major countries of the United States, Canada, and Mexico have the highest emissions. That these countries are mostly developed helps explain this. What is interesting, however, is the graph between 2008 and 2009, the time of the Great Recession. Many countries show a spike in their emissions during that time. When we analyzed this, we found that countries like the Bahamas, Barbados, and Guatemala, among others, had large emissions gains over those years. Possibly, as a result of financial struggles, these countries produced more goods to make up for it. Also interesting is that some countries, like the United States and Canada, actually emitted less, possibly due to financial concerns as well. *Maybe mention the fact that the slope of the lines appears to be far less in NA vs Asia*
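
Rather than comparing a single pair of years, the year-over-year change can be computed for every consecutive pair to see where each country's largest jump falls. The sketch below uses invented values with years ordered oldest to newest; in the real table the year columns run from newest to oldest, so they would need to be reversed first.

```python
import pandas as pd

# Hypothetical toy table; years run oldest to newest here, and values are made up.
toy = pd.DataFrame(
    {"2007": [5.0, 100.0], "2008": [5.5, 98.0], "2009": [9.0, 90.0], "2010": [6.0, 93.0]},
    index=["Small island", "Large economy"],
)

# Difference between each year and the previous one, per country.
yoy = toy.diff(axis=1)

# Year with the largest single-year increase for each country.
# The first column is all NaN after diff, so it is skipped before idxmax.
largest_jump_year = yoy.iloc[:, 1:].idxmax(axis=1)
print(largest_jump_year)
```

Comparing these years across countries shows whether a spike is shared, which would point to a common event, or isolated to one country.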

#### Africa
Looking at Africa, the graph of yearly emissions is quite interesting. The Democratic Republic of the Congo is at the top, with a big jump in 2010, while South Africa had a much slower rise. Meanwhile, Nigeria appears to have gone down a little before coming back up. It also looks like something happened in Côte d'Ivoire between 2015 and 2016 that caused its emissions to fall. Part of the reason the DRC had such a big jump could be deforestation, as it holds the second biggest rainforest in the world. South Africa's emissions could also have been rising because it is a big miner of coal, which causes a lot of pollution when mined and burned. Côte d'Ivoire introduced a plan to tackle emissions, which would explain the drop.

#### Europe
Europe is very interesting: most of the countries look to have a downward trend as time goes on. This is especially true of a country like Romania, which has lowered its emissions by a lot. Similar to Côte d'Ivoire, Romania put more legislation in place to reduce the country's emissions. In fact, some countries, like Finland and Romania, also have negative emissions values. This is due to legislation and other rules that use the excess carbon or reduce it, which lowers emissions overall. So while these countries are not emissions-free, they have net negative emissions because they are finding other uses for the carbon. Europe as a continent has been focusing a lot on becoming more sustainable, which is why we see the downward trend.

#### Oceania
In Oceania there are very few heavily populated cities, which is why Australia has the highest emissions; it is a very big nation with a much larger population than any other country in the region. What is interesting is that countries like Fiji also have negative emissions. This could be because the country is so green that its plants absorb more carbon dioxide than is emitted. Fiji may also have been affected by storms and higher sea levels (it is a group of islands) and by increased tourism, which also caused a spike. Outside of Fiji, the rest of Oceania seems relatively stable, and most countries are rather low in global emissions.

#### South America
South America has a lot of rainforest, and deforestation is a big problem that leads to more emissions. Brazil is at the top: as the biggest country and exporter in South America, its emissions are high, and there have also been many attempts to deforest the Amazon. However, in 2012 the deforestation rate dropped, and a drought also reduced agriculture-based emissions. Venezuela saw an increase in crime and political unrest, which could have lowered gas and other exports and, with them, its emissions. Interestingly, Chile saw a drop and then an increase in emissions. This could also be due to geopolitical issues that arose in South America.


## Summary of Findings
While some continents do have more carbon emissions than others, once we factor out outliers the continents have mostly similar carbon emissions. More importantly, different continents show different trends, and individual countries play big roles in carbon emissions. The largest emitters in Asia are big manufacturers (India and China) or big oil producers (Saudi Arabia), which is why Asia's carbon emissions appear to be trending upward. The same trend is present in Africa. In North and Central America, the big three countries of Canada, the United States, and Mexico are the biggest emitters; however, the trend is much more stable. Europe, on the other hand, seems to be lowering its carbon footprint, with some countries like Finland and Romania even hitting negative emissions. South America and Oceania tend to be stable but with more fluctuations.

What is interesting is that we can explain a lot of these emissions based on a country's status (especially compared to the rest of its region): bigger countries tend to emit more carbon, and countries that manufacture and mine also have huge emissions. Furthermore, many anomalies, like the bump in 2008-2009 for North and Central America, can be explained by world events like the Great Recession. The Democratic Republic of the Congo had a huge increase, possibly due to deforestation in the area, while other countries, like Romania, put legislation in place to reduce their overall emissions. The graphs suggest that world events have a big impact on carbon emissions and that more developed countries tend to have higher emissions.

## Conclusion
While it is hard to determine the impact each continent has on CO2 emissions, we can clearly see that different countries play their own role in the release of greenhouse gases. Big countries in Asia manufacture goods used around the world, which increases their emissions. Countries like the US and Canada have big populations and manufacturing of their own, which pushes their emissions higher. We can also see the effect of events like the recession and of deforestation. As a future possibility, we could dive into certain continents and subset the time frame to see what effect an event like the Great Recession or the Cuban political crises had. We could also look at the effect these events had on the world as a whole. Another possibility would be to find data on countries' emissions plans and compare their effectiveness, to better determine what we can do to protect our planet.




## Question

How have different regions contributed to CO2 emissions around the world over the past thirty years, and do certain countries display anything interesting?

In [1]:
%%capture --no-display
!pip install pycountry_convert
import pandas as pd 
import numpy as np
import altair as alt
import pycountry_convert as pc


In [2]:
data = pd.read_csv('data/historical_emissions/historical_emissions.csv')

# get rid of the columns with no variance
data_subset = data.drop(columns = ['Data source', 'Sector', 'Gas', 'Unit'])


# looking at univariate data
# let's look at the year
# we can make a vertical boxplot of all of the years
# talk about the before and after of missing data. Add this part in. Justify why excluding the observation with missing data is ok.
data_subset.dropna(inplace = True)


# this function uses an external library to convert each country name into its continent
# we are doing this to bin the countries into continents to directly answer the question
# write about this 

def country_to_continent(country_name):
    try:
        country_alpha2 = pc.country_name_to_country_alpha2(country_name)
        country_continent_code = pc.country_alpha2_to_continent_code(country_alpha2)
        country_continent_name = pc.convert_continent_code_to_continent_name(country_continent_code)
        return country_continent_name
    except Exception:  # names the library cannot map (e.g. World, European Union) become None
        return None
    
# this calls the function    
data_subset['Continent'] = data_subset['Country'].apply(country_to_continent)

# Filter out rows where the continent is None
# explain that some of the rows were things like World and European Union.
# we take them out since we are not interested in them
df_filtered = data_subset.dropna(subset=['Continent'])
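# index rows by (Continent, original row label) and keep only the year columns
# same layout as groupby('Continent').apply(lambda x: x), without depending on how apply treats the grouping column
continent_df = df_filtered.drop(columns = ['Country']).set_index('Continent', append = True).swaplevel()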

In [3]:
# we take the mean of each continent and compare them
# this is the plot on the left
# we see that Asia, North America, and South America are a lot higher than the others
chart = alt.Chart(continent_df.mean(axis = 1).groupby(level = 0).mean().reset_index().rename(columns = {0:'Mean'})).mark_bar().encode(
    x='Continent',
    y=alt.Y('Mean:Q', title='Mean Emissions'),
    color='Continent'
).properties(
    title='Mean Emissions by Continent',
    width = 500,
    height = 500
    
)

chart


# we plot the distributions using boxplots
# talk about how we are skeptical about the means since they could be skewed by outliers
# this is the reason we also create the plot on the right
# the right plot reveals that our intuition was correct
# the medians by continent are a lot closer
# we notice several outliers that are much higher for Asia and North America
# also just talk about the graph on the right in general; this answers the first part of the question
chart2 = alt.Chart(continent_df.mean(axis = 1).reset_index().drop(columns=['level_1']).rename(columns={0:'Emissions'})).mark_boxplot().encode(
    x='Continent',
    y=alt.Y('Emissions:Q', title='Mean Emissions', scale = alt.Scale(type = 'sqrt')),
    color='Continent'
).properties(
    title='Distributions of Emissions by Continent',
    width = 500,
    height = 500
    
)

alt.hconcat(chart,chart2)

In [4]:
# we will now focus on countries and their differences
# explain that looking at the regions by continent did not explain much of the variety in our data based on the visuals
# so now we will look at the individual countries
# the below chunk is for Asia

asia_df = df_filtered.loc[df_filtered['Continent'] == 'Asia', :].drop(columns = ['Continent'])
fig_1 = alt.Chart(asia_df.melt(id_vars='Country', var_name='Year', value_name='Value')).encode(
    x = alt.X('Year', title = None),
    y = alt.Y('Value', scale = alt.Scale(type = 'sqrt')),
    color = alt.Color('Country')
).mark_line(point = True)

# display
fig_1

In [5]:
# the above plot is sloppy
# there are too many countries
# we compute country means and sort
# we want to rank the following 
asia_df['Country means'] = asia_df.iloc[:,1:].mean(axis = 1)
asia_df.sort_values(by = ['Country means'], ascending = False).head()

Unnamed: 0,Country,2019,2018,2017,2016,2015,2014,2013,2012,2011,...,1998,1997,1996,1995,1994,1993,1992,1991,1990,Country means
1,China,12055.41,11821.66,11385.48,11151.31,11108.86,11228.48,11168.26,10675.66,10388.48,...,4095.97,3977.65,3982.11,3960.71,3557.37,3397.8,3168.05,3039.15,2891.73,7074.232
3,India,3363.6,3360.56,3215.07,3076.48,3003.07,2984.52,2804.34,2740.4,2584.75,...,1362.33,1331.88,1272.74,1223.65,1158.48,1114.22,1081.28,1056.25,1002.56,2041.631
5,Indonesia,1959.71,1692.36,1447.22,1434.46,2067.75,2015.5,1638.39,1702.3,1683.13,...,1366.9,2134.8,1164.23,1339.1,1302.7,1282.35,1266.98,1246.27,1226.82,1445.131667
8,Japan,1134.45,1172.32,1214.59,1229.82,1220.73,1256.16,1298.56,1286.53,1243.96,...,1157.78,1204.17,1216.15,1201.33,1185.98,1128.24,1134.81,1121.71,1106.26,1184.871
9,Iran,893.78,925.58,912.77,881.05,844.14,844.13,815.31,793.95,793.62,...,441.72,439.31,420.32,405.33,394.82,362.41,354.38,332.58,304.22,615.616667


In [6]:
# find the change in emissions
# we also compute the overall change and sort
asia_df["Change"] = asia_df['2019'] - asia_df['1990']
asia_df.sort_values(by = ['Change'], ascending = False).head()

Unnamed: 0,Country,2019,2018,2017,2016,2015,2014,2013,2012,2011,...,1997,1996,1995,1994,1993,1992,1991,1990,Country means,Change
1,China,12055.41,11821.66,11385.48,11151.31,11108.86,11228.48,11168.26,10675.66,10388.48,...,3977.65,3982.11,3960.71,3557.37,3397.8,3168.05,3039.15,2891.73,7074.232,9163.68
3,India,3363.6,3360.56,3215.07,3076.48,3003.07,2984.52,2804.34,2740.4,2584.75,...,1331.88,1272.74,1223.65,1158.48,1114.22,1081.28,1056.25,1002.56,2041.631,2361.04
5,Indonesia,1959.71,1692.36,1447.22,1434.46,2067.75,2015.5,1638.39,1702.3,1683.13,...,2134.8,1164.23,1339.1,1302.7,1282.35,1266.98,1246.27,1226.82,1445.131667,732.89
9,Iran,893.78,925.58,912.77,881.05,844.14,844.13,815.31,793.95,793.62,...,439.31,420.32,405.33,394.82,362.41,354.38,332.58,304.22,615.616667,589.56
11,Saudi Arabia,723.15,715.23,729.31,739.82,731.89,698.29,654.85,638.88,601.75,...,310.73,304.34,288.63,289.3,289.62,291.77,274.22,241.01,466.897,482.14


In [7]:
# now we replot the interesting countries with each other
# we use the above tables to select interesting countries
# super interesting ones: China grew a lot
# India also grew a lot
subset_asia = asia_df.loc[asia_df['Country'].isin(['China', 'India', 'Indonesia', 'Japan', 'Iran', 'Saudi Arabia', 'North Korea'])].drop(columns = ['Country means', 'Change'])
fig_2 = alt.Chart(subset_asia.melt(id_vars='Country', var_name='Year', value_name='Value')).encode(
    x = alt.X('Year', title = None),
    y = alt.Y('Value', scale = alt.Scale(type = 'sqrt'), title = 'Emissions'),
    color = alt.Color('Country')
).mark_line(point = True)

# display
fig_2

In [8]:
# we perform similar steps to what we did above
# this time though notice that some countries exhibit the same pattern from 2009-2010
# be careful about the scale 
# just see if you can find a potential reason of why the countries exhibit this similar trend

north_df = df_filtered.loc[df_filtered['Continent'] == 'North America', :].drop(columns = ['Continent'])
fig_4 = alt.Chart(north_df.melt(id_vars='Country', var_name='Year', value_name='Value')).encode(
    x = alt.X('Year', title = None),
    y = alt.Y('Value', scale = alt.Scale(type = 'log'), title = 'Emissions'),
    color = alt.Color('Country')
).mark_line(point = True)

# display
fig_4

In [9]:
# looking at the countries with the greatest means
# we do the same above
north_df['Country Means'] = north_df.iloc[:,1:].mean(axis = 1)
north_df.sort_values(by = ['Country Means'], ascending = False).head()

Unnamed: 0,Country,2019,2018,2017,2016,2015,2014,2013,2012,2011,...,1998,1997,1996,1995,1994,1993,1992,1991,1990,Country Means
2,United States,5771.0,5892.37,5689.61,5743.85,5665.21,5779.54,5734.28,5593.25,5811.96,...,6208.83,6160.86,5901.0,5729.69,5661.57,5567.55,5456.12,5372.08,5417.32,5926.039333
10,Canada,774.29,776.5,757.38,740.67,841.22,842.14,834.47,822.51,821.99,...,696.18,679.96,668.42,646.48,631.24,613.16,617.15,602.81,606.91,804.708
14,Mexico,670.84,669.63,688.06,689.71,681.94,663.46,674.64,687.03,679.06,...,546.58,513.22,490.07,471.42,491.88,465.29,447.97,445.04,426.29,559.809667
102,Guatemala,38.49,37.05,35.21,35.81,38.64,35.66,34.63,33.39,32.71,...,35.52,31.85,31.59,31.31,30.28,30.08,29.44,28.24,27.88,34.382
103,Nicaragua,38.41,37.87,37.8,38.05,29.28,28.36,27.42,27.16,26.8,...,30.87,29.91,29.08,29.06,29.68,29.37,28.7,27.8,29.16,33.110667


In [10]:
# want to see the countries with that big spike in 2009
# notice the Recession column; this tries to identify where the peaks are occurring
north_df['Recession'] = north_df['2009'] - north_df['2008']
north_df.sort_values(by = ['Recession'], ascending = False).head(8)

Unnamed: 0,Country,2019,2018,2017,2016,2015,2014,2013,2012,2011,...,1997,1996,1995,1994,1993,1992,1991,1990,Country Means,Recession
162,Bahamas,3.18,3.13,2.57,2.39,2.53,2.83,3.17,3.85,2.92,...,2.15,2.35,2.43,2.29,2.32,2.79,2.26,2.18,2.73,3.95
161,Barbados,3.79,3.76,3.69,3.78,3.74,3.73,3.89,4.55,4.17,...,3.48,3.37,3.36,3.3,3.23,3.38,3.14,3.11,3.785333,2.85
102,Guatemala,38.49,37.05,35.21,35.81,38.64,35.66,34.63,33.39,32.71,...,31.85,31.59,31.31,30.28,30.08,29.44,28.24,27.88,34.382,1.64
119,Panama,25.3,21.9,21.74,22.29,22.06,22.07,21.64,21.66,20.96,...,15.69,15.75,15.76,15.56,15.27,15.01,14.34,13.85,17.993667,1.59
155,Belize,6.85,6.74,6.81,6.75,6.99,6.76,6.87,6.89,7.22,...,10.54,11.53,12.69,13.78,13.68,13.61,13.39,13.18,7.791,1.03
171,Antigua and Barbuda,1.22,1.19,1.15,1.12,1.09,1.06,1.05,1.26,1.09,...,0.51,0.48,0.45,0.43,0.42,0.45,0.37,0.35,0.823667,0.96
176,Saint Lucia,0.74,0.72,0.72,0.69,0.68,0.68,0.67,0.91,0.74,...,0.77,0.85,0.91,0.96,0.91,0.9,0.79,0.76,0.741667,0.88
104,Cuba,38.19,39.24,38.78,40.1,32.51,30.52,32.51,32.14,31.13,...,32.73,30.5,27.81,26.66,25.23,28.66,34.83,44.02,32.702333,0.79


In [11]:
north_df.sort_values(by = ['Recession'], ascending = False).tail()

Unnamed: 0,Country,2019,2018,2017,2016,2015,2014,2013,2012,2011,...,1997,1996,1995,1994,1993,1992,1991,1990,Country Means,Recession
115,Trinidad and Tobago,28.47,28.81,28.87,28.65,32.23,33.14,33.11,32.13,32.57,...,13.62,13.15,12.07,12.46,15.88,20.02,20.09,20.17,23.531,-0.81
144,Jamaica,10.15,10.34,8.92,9.27,8.83,8.94,9.13,8.67,9.19,...,11.24,11.1,10.76,10.42,10.38,10.18,9.37,9.37,9.834333,-1.75
14,Mexico,670.84,669.63,688.06,689.71,681.94,663.46,674.64,687.03,679.06,...,513.22,490.07,471.42,491.88,465.29,447.97,445.04,426.29,559.809667,-9.59
10,Canada,774.29,776.5,757.38,740.67,841.22,842.14,834.47,822.51,821.99,...,679.96,668.42,646.48,631.24,613.16,617.15,602.81,606.91,804.708,-34.98
2,United States,5771.0,5892.37,5689.61,5743.85,5665.21,5779.54,5734.28,5593.25,5811.96,...,6160.86,5901.0,5729.69,5661.57,5567.55,5456.12,5372.08,5417.32,5926.039333,-426.48


In [12]:
# this gives you some of the countries; talk about the greatest emissions from countries
# look at some of the countries with the peaks
# maybe these specific countries can provide you a clue about what happened in 08.

subset_north = north_df.loc[north_df['Country'].isin(['Bahamas', 'Barbados', 'Guatemala', 'Panama', 'Belize', 'United States', 'Canada',
                                                     'Mexico', 'Antigua and Barbuda', 'Dominica'])].drop(columns = ['Recession', 'Country Means'])
fig_2 = alt.Chart(subset_north.melt(id_vars='Country', var_name='Year', value_name='Value')).encode(
    x = alt.X('Year', title = None),
    y = alt.Y('Value', scale = alt.Scale(type = 'log'), title = 'Emissions'),
    color = alt.Color('Country')
).mark_line(point = True)

# display
fig_2

In [13]:
# we do the same process for africa
africa_df = df_filtered.loc[df_filtered['Continent'] == 'Africa', :].drop(columns = ['Continent'])

fig_4 = alt.Chart(africa_df.melt(id_vars='Country', var_name='Year', value_name='Value')).encode(
    x = alt.X('Year', title = None),
    y = alt.Y('Value', scale = alt.Scale(type = 'sqrt'), title = 'Emissions'),
    color = alt.Color('Country')
).mark_line(point = True)

# display
fig_4

In [14]:
# sort by the mean and comment
# grab the top countries and biggest change
africa_df['Mean'] = africa_df.iloc[:,1:].mean(axis = 1)
africa_df['Difference'] = africa_df['2019'] - africa_df['1990']
africa_df.sort_values(by = ['Mean'], ascending=False).head()

Unnamed: 0,Country,2019,2018,2017,2016,2015,2014,2013,2012,2011,...,1997,1996,1995,1994,1993,1992,1991,1990,Mean,Difference
13,Democratic Republic of the Congo,679.57,676.86,682.01,680.03,679.91,677.77,677.01,673.04,676.09,...,435.55,428.89,432.58,432.52,431.04,429.54,429.39,430.77,507.072667,248.8
17,South Africa,562.19,556.72,557.46,542.89,542.51,567.33,554.77,544.98,524.74,...,388.88,372.46,361.11,348.23,341.31,330.74,334.52,338.43,459.168,223.76
26,Nigeria,354.33,345.7,340.02,335.84,325.43,331.28,321.55,307.78,304.25,...,310.65,320.13,294.62,290.32,295.34,298.58,283.9,267.92,301.754333,86.41
28,Egypt,351.96,345.01,338.61,325.66,318.29,312.08,306.32,309.38,298.66,...,170.01,161.15,153.38,143.76,147.48,141.98,139.23,133.65,236.907333,218.31
33,Algeria,282.23,278.23,269.38,265.6,262.52,252.91,239.46,234.99,215.21,...,146.88,146.97,143.9,136.96,135.01,126.53,125.57,121.4,190.458,160.83


In [15]:
# look at the last few
africa_df.sort_values(by = ['Difference'], ascending=False).tail()

Unnamed: 0,Country,2019,2018,2017,2016,2015,2014,2013,2012,2011,...,1997,1996,1995,1994,1993,1992,1991,1990,Mean,Difference
87,Botswana,52.34,54.01,56.22,52.43,53.52,54.09,55.38,54.43,67.14,...,55.66,56.11,54.76,53.86,53.63,53.97,53.74,54.33,54.993667,-1.99
163,Gambia,2.86,2.66,2.8,2.77,2.88,2.72,2.58,2.51,2.43,...,3.83,4.15,4.49,4.96,5.43,5.31,5.2,5.04,3.214333,-2.18
140,Ghana,12.75,10.94,9.3,7.24,61.02,59.55,60.02,57.97,55.38,...,25.69,24.38,23.49,23.14,22.8,22.58,22.05,21.97,42.168667,-9.22
99,Madagascar,40.22,39.67,39.59,40.4,39.48,38.7,38.84,39.93,39.22,...,55.93,56.55,55.92,55.42,55.66,55.27,55.01,54.98,48.665,-14.76
88,Côte d'Ivoire,51.51,50.74,51.0,50.65,49.29,48.52,48.61,47.15,45.5,...,84.08,82.32,81.5,80.58,77.75,75.07,72.44,69.72,60.477333,-18.21


In [16]:
# look at the top one.
# try to figure out what happened to this country
# same with Ghana
# just comment on what's interesting

subset_africa = africa_df[africa_df['Country'].isin(['Democratic Republic of the Congo', 'South Africa', 'Nigeria', 'Egypt', 'Algeria', 'Ethiopia', 'Botswana',
                                                     'Gambia', 'Ghana', 'Madagascar'])].drop(columns = ['Difference', 'Mean'])
fig_2 = alt.Chart(subset_africa.melt(id_vars='Country', var_name='Year', value_name='Value')).encode(
    x = alt.X('Year', title = None),
    y = alt.Y('Value', scale = alt.Scale(type = 'sqrt'), title = 'Emissions'),
    color = alt.Color('Country')
).mark_line(point = True)

# display
fig_2

In [17]:
# comment whatever you want something 
# that orange dip seems super interesting

europe_df = df_filtered.loc[df_filtered['Continent'] == 'Europe', :].drop(columns = ['Continent'])

fig_4 = alt.Chart(europe_df.melt(id_vars='Country', var_name='Year', value_name='Value')).encode(
    x = alt.X('Year', title = None),
    y = alt.Y('Value', scale = alt.Scale(type = 'sqrt'), title = 'Emissions'),
    color = alt.Color('Country')
).mark_line(point = True)

# display
fig_4

In [18]:
europe_df['Mean'] = europe_df.iloc[:,1:].mean(axis = 1)
europe_df['Difference'] = europe_df['2015'] - europe_df['2010']
europe_df.sort_values(by = ['2015'], ascending=True).head()

Unnamed: 0,Country,2019,2018,2017,2016,2015,2014,2013,2012,2011,...,1997,1996,1995,1994,1993,1992,1991,1990,Mean,Difference
67,Romania,78.36,79.91,79.25,76.46,-184.27,-186.55,-185.86,-176.58,-174.27,...,149.46,161.04,157.38,151.83,155.93,166.73,186.41,224.74,77.691333,-287.81
82,Finland,58.42,62.43,60.86,63.65,-0.28,3.2,7.24,6.46,12.1,...,43.79,45.69,39.23,43.21,36.56,35.32,38.43,38.54,43.998333,-60.26
188,Liechtenstein,0.16,0.16,0.17,0.16,0.17,0.18,0.21,0.2,0.19,...,0.25,0.23,0.23,0.23,0.24,0.23,0.23,0.22,0.226667,-0.06
178,Andorra,0.63,0.62,0.59,0.59,0.58,0.57,0.59,0.6,0.6,...,0.51,0.49,0.46,0.44,0.44,0.43,0.43,0.43,0.565667,-0.05
169,Malta,2.13,2.04,2.02,1.88,2.18,2.86,2.84,3.17,2.99,...,2.73,2.58,2.63,2.76,3.06,2.41,2.43,2.53,2.68,-0.82


In [20]:
# when looking at the graphs below there are several countries that share a similar pattern to Romania
# look up why this could be
subset_europe = europe_df[europe_df['Country'].isin(['Russia', 'Germany', 'Poland', 'United Kingdom', 'Italy', 'France', 'Serbia',
                                                     'Ukraine', 'Bulgaria', 'Spain','Romania'])].drop(columns = ['Difference', 'Mean'])
fig_2 = alt.Chart(subset_europe.melt(id_vars='Country', var_name='Year', value_name='Value')).encode(
    x = alt.X('Year', title = None),
    y = alt.Y('Value', scale = alt.Scale(type = 'sqrt'), title = 'Emissions'),
    color = alt.Color('Country')
).mark_line(point = True)

# display
fig_2

In [21]:
ocean_df = df_filtered.loc[df_filtered['Continent'] == 'Oceania'].drop(columns = ['Continent'])
fig_4 = alt.Chart(ocean_df.melt(id_vars='Country', var_name='Year', value_name='Value')).encode(
    x = alt.X('Year', title = None),
    y = alt.Y('Value', scale = alt.Scale(type = 'sqrt')),
    color = alt.Color('Country')
).mark_line(point = True)

# display
fig_4

In [22]:
south_df = df_filtered.loc[df_filtered['Continent'] == 'South America', :].drop(columns = ['Continent'])
fig_4 = alt.Chart(south_df.melt(id_vars='Country', var_name='Year', value_name='Value')).encode(
    x = alt.X('Year', title = None),
    y = alt.Y('Value', scale = alt.Scale(type = 'log')),
    color = alt.Color('Country')
).mark_line(point = True)

# display
fig_4

In [23]:
# Chile also exhibited this
# so did one other country in the graph above
# complete what I did for the graphs 
# sort through the means
# the max difference m