In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
df_emissions = pd.read_csv('../input/co2-ghg-emissionsdata/co2_emission.csv')

This dataset contains CO2 and GHG emissions for countries from 1750 to 2017.
The source is OurWorldInData (https://ourworldindata.org/co2-and-other-greenhouse-gas-emissions).

In [None]:
df_emissions.info()

In [None]:
df_emissions.describe()

In [None]:
df_emissions.head()

Let's try to dig a little further into the dataset

In [None]:
df_emissions['Entity'].value_counts()

The dataset also contains rows for macro-areas such as Middle East, Africa, etc. It is useful to isolate these areas so that their values do not overlap with those of the individual countries.

In [None]:
df_emissions[df_emissions['Code'].isnull()]['Entity'].value_counts()

In [None]:
macro_areas = 'Africa,Europe (other),EU-28,Americas (other),Asia and Pacific (other),Middle East,International transport,Statistical differences,World'.split(',')
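
The list above is typed by hand. As a rough sketch, the same names could be derived from the rows whose `Code` is missing, assuming (as the previous cell suggests) that aggregates are the entries without a country code. The small frame below is made-up sample data, and `find_aggregates` is a hypothetical helper, not part of the original notebook. Entities that do carry a code (this may be the case for `World`) would still have to be added by hand through `extra`.

```python
import pandas as pd

def find_aggregates(df, extra=()):
    """Return sorted entity names with no country code, plus any `extra` names."""
    no_code = df.loc[df['Code'].isnull(), 'Entity'].unique()
    return sorted(set(no_code) | set(extra))

# Made-up sample rows, only to illustrate the idea
sample = pd.DataFrame({
    'Entity': ['Africa', 'Italy', 'EU-28', 'World', 'Africa'],
    'Code': [None, 'ITA', None, 'OWID_WRL', None],
    'Year': [2016, 2016, 2016, 2016, 2017],
})

aggregates = find_aggregates(sample, extra=['World'])
print(aggregates)
```

The result can then be passed to `isin` exactly like the hand-written list.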

I'll change the name of the last column for easier typing

In [None]:
df_emissions.columns = ['Entity', 'Code', 'Year', 'Annual CO2 emissions (tonnes)']

In [None]:
df_emissions_countries = df_emissions[~df_emissions['Entity'].isin(macro_areas)]

In [None]:
df_emissions_areas = df_emissions[df_emissions['Entity'].isin(macro_areas[:7])]

Let's get a general view of the CO2 emissions recorded worldwide

In [None]:
df_emissions[df_emissions['Entity'] == 'World'].plot.line(x='Year')

Now let's pause for a moment on the macro-area data

In [None]:
sns.lineplot(x='Year', y='Annual CO2 emissions (tonnes)', hue='Entity', data=df_emissions_areas)
plt.title('Macro-region emissions')
plt.tight_layout()  # called last so the title is taken into account

There is an interesting pattern for the EU countries, mainly after the 1950s, when the yearly emissions start to decrease. Let's try to extract more meaningful insights.

In [None]:
df_Europe1950 = df_emissions_areas[
    (df_emissions_areas['Entity'].isin('EU-28,Europe (other)'.split(',')))
    & (df_emissions_areas['Year'] > 1950)]

In [None]:
sns.lineplot(x='Year', y='Annual CO2 emissions (tonnes)', hue='Entity', data=df_Europe1950)
plt.title('EU Emissions from 1950')

We see a significant drop in EU-28 emissions after the mid-1970s, with a notable further acceleration around 2005.
This should be read in light of the member countries' agenda, mainly considering the entry into force of the Kyoto Protocol on 16 February 2005. The second big step towards reducing emissions was the Paris Agreement in 2015, which led the EU to commit to reducing greenhouse gas emissions by at least 40% by 2030 compared to 1990. With the European Green Deal, voted in January 2020, the targets became even more ambitious, aiming at a 2030 reduction of at least 50%, and towards 55%, compared with 1990 levels. The final goal of the program is to make Europe the first climate-neutral continent by 2050. For further details about this exciting program, please refer to https://ec.europa.eu/info/strategy/priorities-2019-2024/european-green-deal_en
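
Targets of this kind are stated as a percentage change relative to a 1990 baseline. A minimal sketch of that arithmetic follows; the function names and the numbers are illustrative assumptions, not values from the dataset. Note also that the targets refer to greenhouse gases as a whole, while the column used in this notebook is CO2.

```python
def change_vs_baseline(baseline, current):
    """Relative change of `current` with respect to `baseline` (negative = reduction)."""
    return (current - baseline) / baseline

def meets_target(baseline, current, target_reduction):
    """True if emissions fell by at least `target_reduction` (e.g. 0.40 for 40%)."""
    return change_vs_baseline(baseline, current) <= -target_reduction

# Illustrative numbers only (not taken from the dataset)
baseline_1990 = 100.0
emissions_now = 75.0

change = change_vs_baseline(baseline_1990, emissions_now)
print(f'{change:.0%}')                                      # -25%
print(meets_target(baseline_1990, emissions_now, 0.40))    # False
print(meets_target(baseline_1990, emissions_now, 0.20))    # True
```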

So let's see which of the seven largest EU-27 economies (by nominal GDP) is leading the transformation, and whether any member is falling short of expectations.

In [None]:
df_EU27_7 = df_emissions_countries[
    (df_emissions_countries['Entity'].isin('Italy Netherlands Poland France Germany Spain Sweden'.split(' ')))
    & (df_emissions_countries['Year'] > 1989)]

plt.figure(figsize=(10,8))  # create the figure first so the plot is drawn on it
sns.lineplot(x='Year', y='Annual CO2 emissions (tonnes)', hue='Entity', data=df_EU27_7)
plt.title('Top 7 EU-27 Economies (1990 - 2017)')
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.tight_layout()


In [None]:
def EmissionsReduction(listOfCountries, frame):
    # Store in `frame` the relative change in emissions between 1990 and 2017
    # for each country (negative values mean a reduction). Reads from df_EU27_7.
    col = 'Annual CO2 emissions (tonnes)'
    for i in listOfCountries:
        E = df_EU27_7[(df_EU27_7['Entity'] == i) & (df_EU27_7['Year'] == 2017)][col].iloc[0]
        e = df_EU27_7[(df_EU27_7['Entity'] == i) & (df_EU27_7['Year'] == 1990)][col].iloc[0]
        frame[i] = (E - e)/e


In [None]:
frame = {}
EmissionsReduction('Italy Netherlands Poland France Germany Spain Sweden'.split(' '), frame)
frame
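
The same 1990 to 2017 comparison can also be written without an explicit loop. The sketch below is an alternative formulation, not the notebook's own method: `relative_change` is a hypothetical helper, the toy frame holds made-up numbers, and it assumes each (Entity, Year) pair appears only once, which `pivot` requires.

```python
import pandas as pd

COL = 'Annual CO2 emissions (tonnes)'

def relative_change(df, start_year, end_year, value_col=COL):
    """Relative change per entity between two years, as a Series sorted ascending."""
    wide = df.pivot(index='Entity', columns='Year', values=value_col)
    return ((wide[end_year] - wide[start_year]) / wide[start_year]).sort_values()

# Made-up numbers for two fictional entities
toy = pd.DataFrame({
    'Entity': ['A', 'A', 'B', 'B'],
    'Year': [1990, 2017, 1990, 2017],
    COL: [200.0, 150.0, 100.0, 120.0],
})

changes = relative_change(toy, 1990, 2017)
print(changes)
```

Returning a sorted Series makes it easy to see at a glance which entity reduced the most and which increased.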

Even though we see an emissions increase for Spain (21%), we should take into account that this country still ranks quite low in this category among the seven largest EU-27 economies.

Let's move now to the top 10 economies in the world.

In [None]:
largestEconomies = ['China', 'United States', 'India', 'Japan', 'Germany', 'Russia', 'Indonesia', 'Brazil', 'United Kingdom', 'France']

In [None]:
df_emissions_topEconomies = df_emissions[
    (df_emissions['Entity'].isin(largestEconomies))
    & (df_emissions['Year'] > 1900)]

In [None]:
plt.figure(figsize=(10,8))  # create the figure first so the plot is drawn on it
sns.lineplot(x='Year', y='Annual CO2 emissions (tonnes)', hue='Entity', data=df_emissions_topEconomies)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.tight_layout()

Among all the data, the curve for China's emissions is the most striking. It comes as no surprise, however, if we consider the overwhelming growth of its economy since the 1980s, thanks to the policies of Deng Xiaoping and the establishment of the SEZs (Special Economic Zones), the first large-scale experiment with capitalism in the Asian giant.
We need to bear in mind that in the "Macro-region emissions" graph, "Asia and Pacific (other)" ranks highest for CO2 emissions.
The growth of Asian economies needs to be taken into consideration. Recent market demand and the very nature of the global economy require the newly rising powers to undergo stressful changes, with limited room for system-wide evaluations.
The current data, however, cannot be isolated from a broader historical context.

In [None]:
# Select the emissions column before summing so that only numeric data is aggregated
df_emissions_topEconomies.groupby('Entity')['Annual CO2 emissions (tonnes)'].sum().sort_values(ascending=False).plot.bar(color='#C87807')
plt.title('Historical Total Emissions (from 1900)')

This graph shows exactly what was mentioned briefly above. As devastating as China's emissions (as well as those of the Asian area) might be at present, it would be unfair not to look at them from a historical perspective.
The dramatic growth of Asian economies is relatively recent. Other big economies such as the US, the EU and Russia have an older history of growth and pollution. Most of these countries have decreased their emissions during the last decades, mainly because their economies have already reached maturity from a capitalistic perspective. What is happening now in China and in Asia is what happened in Europe and in the States after the Industrial Revolution. While regulations are needed, mainly because of the dramatic situation into which we are leading our planet, we need to take into account this major historical lag that persists between different areas of the world.
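
The difference between present-day and historical emissions can be made explicit with a running total per entity. The sketch below is an illustration under stated assumptions: `add_cumulative` is a hypothetical helper and the toy numbers are invented to show a late-growing emitter next to an older, steady one.

```python
import pandas as pd

COL = 'Annual CO2 emissions (tonnes)'

def add_cumulative(df, value_col=COL):
    """Return a copy with a running total of emissions per entity, ordered by year."""
    out = df.sort_values(['Entity', 'Year']).copy()
    out['Cumulative emissions'] = out.groupby('Entity')[value_col].cumsum()
    return out

# Invented numbers: 'Old' emits steadily, 'New' grows late but fast
toy = pd.DataFrame({
    'Entity': ['Old', 'Old', 'Old', 'New', 'New', 'New'],
    'Year': [1900, 1950, 2000, 1900, 1950, 2000],
    COL: [5.0, 5.0, 5.0, 0.0, 1.0, 9.0],
})

cumulative = add_cumulative(toy)
latest = cumulative[cumulative['Year'] == 2000].set_index('Entity')
print(latest[[COL, 'Cumulative emissions']])
```

In the last year `New` emits more than `Old`, yet its cumulative total is still lower, which is the pattern discussed above.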

Another big point of discussion is how to read the raw data shown above. There have been attempts to measure those values in relation to other variables such as population, country area, etc.
Let's evaluate some stats from this perspective, keeping our focus on the biggest world economies by nominal GDP and narrowing our field to the year 2017 (the last available in the dataset).

"Latest official GDP figures published by the World Bank. Population figures based on United Nations data.
World's GDP is $80,934,771,028,340 (nominal, 2017)"
https://www.worldometers.info/gdp/gdp-by-country/

In [None]:
def Emissions2017(entity):
    return df_emissions[(df_emissions['Entity'] == entity) & (df_emissions['Year'] == 2017)]['Annual CO2 emissions (tonnes)'].iloc[0]

gdp_population2017 = {
    1 : ['United States', 19485394000000, 325084756, 9525067, Emissions2017('United States')],
    2 : ['China', 12237700479375, 1421021791, 9596961, Emissions2017('China')],
    3 : ['Japan', 4872415104315, 127502725, 377975, Emissions2017('Japan')],
    4 : ['Russia', 1578417211937, 145530082, 17098246, Emissions2017('Russia')],
    5 : ['India', 2650725335364, 1338676785, 3287263, Emissions2017('India')],
    6 : ['Germany', 3693204332230, 82658409, 357114, Emissions2017('Germany')]

}

In [None]:
df_gdp_population = pd.DataFrame.from_dict(gdp_population2017, orient='index', columns = ['Entity', 'GDP nominal 2017', 'Population', 'Total Area (km2)', 'CO2 Emissions 2017'])

In [None]:
df_gdp_population

In [None]:
df_gdp_population['CO2 / GDP'] = df_gdp_population['CO2 Emissions 2017'] /df_gdp_population['GDP nominal 2017']
df_gdp_population['CO2 / Population'] = df_gdp_population['CO2 Emissions 2017'] /df_gdp_population['Population']
df_gdp_population['CO2 / Area'] = df_gdp_population['CO2 Emissions 2017'] /df_gdp_population['Total Area (km2)']

In [None]:
sns.barplot(x='Entity', y='CO2 Emissions 2017', data=df_gdp_population, palette='autumn', order=df_gdp_population.sort_values('CO2 Emissions 2017', ascending=False)['Entity'])
plt.title('CO2 emissions 2017')
plt.tight_layout()

Let's now relate these values to total GDP, population and total area

In [None]:
sns.barplot(x='Entity', y='CO2 / GDP', data=df_gdp_population, palette='summer', order=df_gdp_population.sort_values('CO2 / GDP', ascending=False)['Entity'])
plt.title('CO2 / GDP 2017')
plt.tight_layout()

In [None]:
sns.barplot(x='Entity', y='CO2 / Population', data=df_gdp_population, palette='bone', order=df_gdp_population.sort_values('CO2 / Population', ascending=False)['Entity'])
plt.title('CO2 / Population 2017 ')
plt.tight_layout()

In [None]:
sns.barplot(x='Entity', y='CO2 / Area', data=df_gdp_population, palette='copper', order=df_gdp_population.sort_values('CO2 / Area', ascending=False)['Entity'], alpha=.7)
plt.title('CO2 / Area 2017 ')
plt.tight_layout()

No surprise so far.

It would be interesting, however, to compare CO2 emissions with all these variables at once.
For this purpose, let's try to come up with a formula that might help us.

G = Total GDP |
P = Population |
A = Total Area

formula: CO2 / (P+G+A)


In [None]:
G = df_gdp_population['GDP nominal 2017']
P = df_gdp_population['Population']
A = df_gdp_population['Total Area (km2)']
CO2= df_gdp_population['CO2 Emissions 2017']
df_gdp_population['Absolute value'] = CO2 / (G+P+A)
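
Since G, P and A have very different magnitudes, it is worth checking how much each contributes to the sum. The sketch below reuses the GDP, population and area figures entered earlier in the notebook and computes the share of the denominator that comes from GDP alone; the variable names `figures` and `gdp_share` are introduced here only for illustration.

```python
import pandas as pd

# GDP (USD), population and area (km2) copied from the table built earlier
figures = pd.DataFrame({
    'Entity': ['United States', 'China', 'Japan', 'Russia', 'India', 'Germany'],
    'G': [19485394000000, 12237700479375, 4872415104315,
          1578417211937, 2650725335364, 3693204332230],
    'P': [325084756, 1421021791, 127502725, 145530082, 1338676785, 82658409],
    'A': [9525067, 9596961, 377975, 17098246, 3287263, 357114],
}).set_index('Entity')

# Fraction of G + P + A that is due to GDP
gdp_share = figures['G'] / (figures['G'] + figures['P'] + figures['A'])
print(gdp_share.round(5))
```

For every country in the table the share is above 0.999, so the combined ratio behaves almost exactly like CO2 / GDP.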

In [None]:
df_gdp_population

In [None]:
sns.barplot(x='Entity', y='Absolute value', data=df_gdp_population, palette='copper', order=df_gdp_population.sort_values('Absolute value', ascending=False)['Entity'])
plt.title('Absolute value')
plt.tight_layout()

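In fact, this formula cannot be taken as a true estimate, mainly because the relation between emissions and the three criteria analyzed is not so direct and unambiguous. In addition, G, P and A are expressed in different units, and since GDP is several orders of magnitude larger than the other two terms, the sum is dominated by G, so the result stays very close to CO2 / GDP. I'll therefore leave this graph as a suggestion, without drawing conclusions.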
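
One possible way to combine indicators with different units is to rescale each of them to the range 0 to 1 and then average them. This is a different technique from the formula used above, shown only as a sketch: `composite_index` is a hypothetical helper, the equal weighting is an arbitrary assumption, and the numbers are invented for three fictional entities.

```python
import pandas as pd

def composite_index(df, columns):
    """Average of min-max scaled columns; 0 = lowest on every indicator, 1 = highest."""
    scaled = (df[columns] - df[columns].min()) / (df[columns].max() - df[columns].min())
    return scaled.mean(axis=1)

# Hypothetical intensities for three made-up entities (not real data)
toy = pd.DataFrame({
    'CO2 / GDP': [1.0, 2.0, 3.0],
    'CO2 / Population': [10.0, 30.0, 20.0],
    'CO2 / Area': [100.0, 100.0, 300.0],
}, index=['X', 'Y', 'Z'])

scores = composite_index(toy, list(toy.columns))
print(scores.round(3))
```

Each indicator now carries the same weight regardless of its unit. A column whose values are all equal would produce a division by zero, so it should be dropped beforehand.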

Finally, we can say that emissions control still has a long way to go. While we can see a positive trend for Western countries, the biggest challenge will come from the policies of the emerging and growing economies. Once those countries implement regulations and strict rules, we might be able to see a true shift in the overall trend.
A new way of thinking about the economy, development and energy will be crucial for the years to come.