<a href="https://colab.research.google.com/github/francescajiova/europe-emissions-report-2020-altair/blob/main/Europe's_Emissions_Report_2020.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Europe's Emissions Report 2020
## DTSA 5304: Fundamentals of Data Visualization at CU Boulder

To preface this project, please note that this is my first time using Python for an assignment, and I have virtually no experience wrangling data in this language. I have cited sources throughout to close the gaps left in class, as the course focused on visualization techniques in Altair rather than on how to code in Python.

This project's focus is to build a tool that helps environmental and climate experts gain insight into Europe's emissions in 2020 by region and country. As I am a beginner on this platform, I chose a relatively straightforward dataset to work with -- credit goes to user jonesm198699 for the tidy data I load here.

I held off on analyzing the data or drawing conclusions from it myself, as that is outside the scope of the course.

In [24]:
# Import pandas and Altair
import pandas as pd
import altair as alt

# Load the data from a URL so the work is reproducible
emissions = pd.read_csv("https://raw.githubusercontent.com/jonesm198699/emissions_by_country/refs/heads/main/data/emissions_data.csv")

There are 31 years' worth of data per country, almost triple the cells Altair can handle. Since I'm only interested in one year for my analysis, I will reduce the data to the year 2020.

Source: https://www.geeksforgeeks.org/pandas/ways-to-filter-pandas-dataframe-by-column-values/

In [25]:
emissions = emissions[emissions['Year'] == 2020].copy()  # .copy() so a column can be added later without a SettingWithCopyWarning
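
A minimal sketch of the same kind of filter on a made-up three-row frame (the values are illustrative, not taken from the dataset). Calling `.copy()` on the result is an assumption about intent: it detaches the filtered frame from the original, so adding a column later does not raise pandas' `SettingWithCopyWarning`.

```python
import pandas as pd

# Toy data with the same column names as the emissions file (values are made up)
toy = pd.DataFrame({
    'Country': ['A', 'A', 'B'],
    'Year': [2019, 2020, 2020],
    'Total': [1.0, 2.0, 3.0],
})

# The boolean mask keeps only the 2020 rows; .copy() detaches the result from `toy`
toy_2020 = toy[toy['Year'] == 2020].copy()

# Adding a column to the copy leaves the original frame untouched
toy_2020['Region'] = 'Example'
print(len(toy_2020), 'Region' in toy.columns)
```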

Now I'll add a Region column that assigns each country to a region of the world.

Source for adding column using .loc: https://www.geeksforgeeks.org/pandas/adding-new-column-to-existing-dataframe-in-pandas/#

In [26]:
# Define regions
regions = {
    'E Asia' : ['China', 'Taiwan', 'Japan', 'South Korea'],
    'C Asia' : ['Kazakhstan', 'Turkey'],
    'C and S America' : ['Brazil'],
    'N America' : ['Canada', 'USA'],
    'W Europe' : ['Austria', 'Belarus', 'Belgium', 'Denmark', 'Finland', 'France', 'Germany', 'Iceland', 'Ireland', 'Liechtenstein', 'Luxembourg',
                  'Netherlands', 'Norway', 'Portugal', 'Spain', 'Sweden', 'Switzerland', 'United Kingdom'],
    'E and S Europe' : ['Bulgaria', 'Croatia', 'Cyprus', 'Czech Republic', 'Estonia', 'Greece', 'Hungary', 'Italy', 'Latvia', 'Lithuania', 'Malta',
                        'Poland', 'Romania', 'Russia', 'Slovakia', 'Slovenia', 'Ukraine'],
    'Oceania' : ['Australia', 'New Zealand']
}

# Assign countries to their rightful regions
for region, countries in regions.items():
    emissions.loc[emissions['Country'].isin(countries), 'Region'] = region
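
As an alternative sketch, the same assignment can be done in one pass by inverting the dictionary and using `Series.map`. The shortened `regions` table, the toy frame, and the made-up country name below are assumptions for illustration only. A side benefit is that any country missing from the dictionary shows up as NaN, which makes gaps easy to spot.

```python
import pandas as pd

# Hypothetical, shortened region table for illustration
regions = {
    'N America': ['Canada', 'USA'],
    'Oceania': ['Australia', 'New Zealand'],
}

# Invert {region: [countries]} into {country: region}
country_to_region = {
    country: region
    for region, countries in regions.items()
    for country in countries
}

# 'Atlantis' is a made-up name that is deliberately absent from the dictionary
toy = pd.DataFrame({'Country': ['Canada', 'Australia', 'Atlantis']})

# map() looks up each country; names missing from the dictionary become NaN
toy['Region'] = toy['Country'].map(country_to_region)

# Countries that did not receive a region
unassigned = toy.loc[toy['Region'].isna(), 'Country'].tolist()
print(unassigned)
```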

Here, I noticed this dataset is heavily Eurocentric. No Middle Eastern, African, or South Asian countries appear at all. The only country listed in all of Meso/South America is Brazil. About half of Eastern Asia's countries, albeit the much smaller ones, are omitted.

Because of this, I'm deciding to compare only Eastern/Southern Europe to Western Europe, because those lists are far more comprehensive than the rest. I believe it would be unfair to compare regions in which so many nations are missing. Frankly, even Eastern/Southern Europe is selectively represented -- I don't see Moldova, Bosnia & Herzegovina, Albania, North Macedonia, Serbia, or Montenegro. I'm sure I'm missing some, but I'll proceed noting at least that much.
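
The coverage gap can be made concrete by counting the names in each region list. This sketch repeats the region dictionary from the cell above so it runs on its own. It counts dictionary entries only and does not check that every name actually appears in the CSV.

```python
# The region dictionary from the cell above, repeated so this snippet runs alone
regions = {
    'E Asia': ['China', 'Taiwan', 'Japan', 'South Korea'],
    'C Asia': ['Kazakhstan', 'Turkey'],
    'C and S America': ['Brazil'],
    'N America': ['Canada', 'USA'],
    'W Europe': ['Austria', 'Belarus', 'Belgium', 'Denmark', 'Finland', 'France',
                 'Germany', 'Iceland', 'Ireland', 'Liechtenstein', 'Luxembourg',
                 'Netherlands', 'Norway', 'Portugal', 'Spain', 'Sweden',
                 'Switzerland', 'United Kingdom'],
    'E and S Europe': ['Bulgaria', 'Croatia', 'Cyprus', 'Czech Republic', 'Estonia',
                       'Greece', 'Hungary', 'Italy', 'Latvia', 'Lithuania', 'Malta',
                       'Poland', 'Romania', 'Russia', 'Slovakia', 'Slovenia', 'Ukraine'],
    'Oceania': ['Australia', 'New Zealand'],
}

# Number of listed countries per region
counts = {region: len(countries) for region, countries in regions.items()}

# Print the regions from most to fewest listed countries
for region, n in sorted(counts.items(), key=lambda item: item[1], reverse=True):
    print(f'{region}: {n}')
```

The two European lists hold 18 and 17 names, while no other region has more than 4.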

Source for selecting rows (and the use of isin() from the last code chunk): https://pandas.pydata.org/docs/getting_started/intro_tutorials/03_subset_data.html

In [27]:
# Isolate only E and S Europe / W Europe for further analysis
eu_emissions = emissions[emissions['Region'].isin(['E and S Europe', 'W Europe'])]
eu_emissions

| | Country | iso_code | Year | Total | Coal | Oil | Gas | Cement | Flaring | Other | Region |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 62 | Austria | AUT | 2020 | 62.037447 | 11.393343 | 30.712898 | 16.876937 | 1.821123 | 0.10925 | 1.123897 | W Europe |
| 94 | Belarus | BLR | 2020 | 58.592041 | 3.777468 | 17.900204 | 33.632754 | 2.302394 | 0.006232 | 0.972988 | W Europe |
| 126 | Belgium | BEL | 2020 | 90.368008 | 10.651276 | 39.498344 | 35.856444 | 2.634278 | 0.101449 | 1.626217 | W Europe |
| 190 | Bulgaria | BGR | 2020 | 36.967113 | 16.597579 | 12.141451 | 5.36171 | 1.065905 | 0.684223 | 1.116245 | E and S Europe |
| 286 | Croatia | HRV | 2020 | 16.870528 | 1.423842 | 8.273147 | 5.519125 | 1.212949 | 0.288184 | 0.153282 | E and S Europe |
| 318 | Cyprus | CYP | 2020 | 7.269578 | 0.068052 | 6.299155 | 0.00022 | 0.882317 | 0.0 | 0.019833 | E and S Europe |
| 350 | Czech Republic | CZE | 2020 | 91.853879 | 49.846815 | 21.129351 | 17.430786 | 1.891029 | 0.052561 | 1.503336 | E and S Europe |
| 382 | Denmark | DNK | 2020 | 28.282029 | 3.142834 | 18.591156 | 4.792077 | 1.226954 | 0.126262 | 0.402746 | W Europe |
| 414 | Estonia | EST | 2020 | 9.34301 | 5.335555 | 3.111814 | 0.809566 | 0.020121 | 2.8e-05 | 0.065924 | E and S Europe |
| 446 | Finland | FIN | 2020 | 37.595927 | 11.871713 | 20.139138 | 4.361261 | 0.569667 | 0.07634 | 0.577809 | W Europe |


Now, I'll start with the pie chart of total emissions by type in Europe, including tooltip interaction.

Source for creating new df: https://www.geeksforgeeks.org/pandas/python-pandas-dataframe/

Source for pie chart: https://altair-viz.github.io/altair-viz-v4/gallery/pie_chart.html#gallery-pie-chart

In [28]:
# First I need to create a new df with the totals of all the types of emissions

data = {'Emission Type': ['Coal', 'Oil', 'Gas', 'Cement', 'Flaring', 'Other'],
        'Total CO2 t': [eu_emissions['Coal'].sum(),
                        eu_emissions['Oil'].sum(),
                        eu_emissions['Gas'].sum(),
                        eu_emissions['Cement'].sum(),
                        eu_emissions['Flaring'].sum(),
                        eu_emissions['Other'].sum()]}

df_totals = pd.DataFrame(data)

# Create Europe's pie chart
europe_pie_chart = alt.Chart(df_totals).mark_arc().encode(
    theta=alt.Theta(field='Total CO2 t', type='quantitative'),
    color=alt.Color(field='Emission Type', type='nominal'),
    order=alt.Order('Total CO2 t', sort='descending'),
    tooltip=['Emission Type', 'Total CO2 t']
).properties(
    title="Europe's Total CO2 Emissions in Metric Tonnes (t) 2020"
)

europe_pie_chart
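
The totals frame can also be built without listing each `.sum()` call by hand. This is a sketch on a made-up two-row stand-in for `eu_emissions`. The column names match the notebook, but the numbers are invented.

```python
import pandas as pd

emission_types = ['Coal', 'Oil', 'Gas', 'Cement', 'Flaring', 'Other']

# Toy stand-in for eu_emissions (values are made up)
toy = pd.DataFrame({
    'Country': ['A', 'B'],
    'Coal': [1.0, 2.0], 'Oil': [3.0, 4.0], 'Gas': [5.0, 6.0],
    'Cement': [0.5, 0.5], 'Flaring': [0.1, 0.2], 'Other': [0.0, 1.0],
})

# Sum every emission column at once, then turn the resulting Series
# into a two-column frame with the same headers the pie chart expects
df_totals = (
    toy[emission_types]
    .sum()
    .rename_axis('Emission Type')
    .reset_index(name='Total CO2 t')
)
print(df_totals)
```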

Next comes a bar graph comparing Eastern and Western Europe, including tooltip interaction. First I need to pivot the data frame from wide to long format; in pandas this is called "melt".

Source for melting: https://www.geeksforgeeks.org/python/python-pandas-melt/#

Source for bar chart: https://altair-viz.github.io/altair-viz-v4/gallery/horizontal_stacked_bar_chart.html

In [29]:
# Melt the eu_emissions to long format
eu_emissions_long = pd.melt(eu_emissions, id_vars=['Country', 'iso_code', 'Year', 'Total', 'Region'],
                                      value_vars=['Coal', 'Oil', 'Gas', 'Cement', 'Flaring', 'Other'],
                                      var_name='Emission Type',
                                      value_name='CO2 t')
eu_emissions_long.head()

| | Country | iso_code | Year | Total | Region | Emission Type | CO2 t |
|---|---|---|---|---|---|---|---|
| 0 | Austria | AUT | 2020 | 62.037447 | W Europe | Coal | 11.393343 |
| 1 | Belarus | BLR | 2020 | 58.592041 | W Europe | Coal | 3.777468 |
| 2 | Belgium | BEL | 2020 | 90.368008 | W Europe | Coal | 10.651276 |
| 3 | Bulgaria | BGR | 2020 | 36.967113 | E and S Europe | Coal | 16.597579 |
| 4 | Croatia | HRV | 2020 | 16.870528 | E and S Europe | Coal | 1.423842 |
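
To see what melting does to the shape of the data, here is a toy example with two countries and two emission columns (made-up values). Each wide row becomes one long row per emission type, so 2 rows by 2 value columns turn into 4 rows.

```python
import pandas as pd

# Wide format: one row per country, one column per emission type (made-up values)
wide = pd.DataFrame({
    'Country': ['A', 'B'],
    'Coal': [1.0, 2.0],
    'Oil': [3.0, 4.0],
})

# Long format: one row per (country, emission type) pair
long = pd.melt(
    wide,
    id_vars=['Country'],
    value_vars=['Coal', 'Oil'],
    var_name='Emission Type',
    value_name='CO2 t',
)
print(long.shape)
print(long)
```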


In [30]:
# Create regional bar chart
regional_bar_chart = alt.Chart(eu_emissions_long).mark_bar().encode(
    x='sum(CO2 t)',
    y='Region',
    color='Emission Type',
    order=alt.Order(field='Emission Type'),
    tooltip=['Region', 'Emission Type',
             alt.Tooltip('CO2 t', aggregate='sum', title='CO2 t'),
             alt.Tooltip('Total', aggregate='sum', title='Total CO2 Emissions')]
).properties(
    title="European Region CO2 Emissions by Metric Tonne (t) in 2020"
)

regional_bar_chart
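
The aggregation Altair performs for each bar segment can be reproduced in pandas as a sanity check. This sketch uses a made-up long-format frame. Because `Total` repeats on every melted row, summing it within one (Region, Emission Type) segment counts each country exactly once, so the tooltip value equals the regional total.

```python
import pandas as pd

# Toy long-format data: three countries, two emission types (made-up values)
toy_long = pd.DataFrame({
    'Country': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Region': ['W Europe'] * 4 + ['E and S Europe'] * 2,
    'Total': [4.0, 4.0, 6.0, 6.0, 3.0, 3.0],
    'Emission Type': ['Coal', 'Oil'] * 3,
    'CO2 t': [1.0, 3.0, 2.0, 4.0, 1.0, 2.0],
})

# One row per bar segment, mirroring sum(CO2 t) and the summed Total tooltip
segments = (
    toy_long
    .groupby(['Region', 'Emission Type'], as_index=False)[['CO2 t', 'Total']]
    .sum()
)
print(segments)
```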

Now I want to create similar bar charts broken down by country. I'll need two bar charts for this, so I'll split my dataset in two.

Source for bar chart: https://altair-viz.github.io/altair-viz-v4/gallery/grouped_bar_chart_horizontal.html

In [31]:
e_s_europe = eu_emissions_long[eu_emissions_long['Region'] == 'E and S Europe']
w_europe = eu_emissions_long[eu_emissions_long['Region'] == 'W Europe']

e_s_europe.head()

| | Country | iso_code | Year | Total | Region | Emission Type | CO2 t |
|---|---|---|---|---|---|---|---|
| 3 | Bulgaria | BGR | 2020 | 36.967113 | E and S Europe | Coal | 16.597579 |
| 4 | Croatia | HRV | 2020 | 16.870528 | E and S Europe | Coal | 1.423842 |
| 5 | Cyprus | CYP | 2020 | 7.269578 | E and S Europe | Coal | 0.068052 |
| 6 | Czech Republic | CZE | 2020 | 91.853879 | E and S Europe | Coal | 49.846815 |
| 8 | Estonia | EST | 2020 | 9.34301 | E and S Europe | Coal | 5.335555 |


Now to create each region's respective interactive bar chart, including tooltip interaction.

Source for brushing and other dynamic queries: https://idl.uw.edu/visualization-curriculum/altair_interaction.html

In [32]:
# After running only the original bar chart, it looks like Russia really dominates the chart
# I want to play with brushing to essentially zoom in on the smaller bars

brush = alt.selection_interval(encodings=['y'])

# Original bar chart of S/E Europe's emissions
e_s_og = alt.Chart(e_s_europe).mark_bar().add_params( # Apparently add_selection has been renamed add_params
    brush
    ).encode(
    x='sum(CO2 t)',
    y='Country',
    color='Emission Type',
    tooltip=['Country', 'Emission Type', 'CO2 t',
             alt.Tooltip('Total', aggregate='sum', title = 'Total CO2 Emissions')],
).properties(
    title="Eastern and Southern European Country CO2 Emissions by Metric Tonne (t) in 2020")

# Zoomed-in (baby) bar chart based on selected area
e_s_bby = alt.Chart(e_s_europe).mark_bar().encode(
    x='sum(CO2 t)',
    y='Country',
    color='Emission Type',
    tooltip=['Country', 'Emission Type', 'CO2 t',
             alt.Tooltip('Total', aggregate='sum', title = 'Total CO2 Emissions')],
).transform_filter(brush).properties(title="Selected Countries Zoomed-In")

s_e_europe_bar_chart = alt.vconcat(e_s_og, e_s_bby).properties(spacing=5)

s_e_europe_bar_chart

Replicating the chart with Western Europe's dataset:

In [33]:
# Implement brush
brush = alt.selection_interval(encodings=['y'])

# Original bar chart of W Europe's emissions
w_og = alt.Chart(w_europe).mark_bar().add_params( # Apparently add_selection has been renamed add_params
    brush
    ).encode(
    x='sum(CO2 t)',
    y='Country',
    color='Emission Type',
    tooltip=['Country', 'Emission Type', 'CO2 t',
             alt.Tooltip('Total', aggregate='sum', title = 'Total CO2 Emissions')],
).properties(
    title="Western European Country CO2 Emissions by Metric Tonne (t) in 2020")

# Zoomed-in (baby) bar chart based on selected area
w_bby = alt.Chart(w_europe).mark_bar().encode(
    x='sum(CO2 t)',
    y='Country',
    color='Emission Type',
    tooltip=['Country', 'Emission Type', 'CO2 t',
             alt.Tooltip('Total', aggregate='sum', title = 'Total CO2 Emissions')],
).transform_filter(brush).properties(title="Selected Countries Zoomed-In")

w_europe_bar_chart = alt.vconcat(w_og, w_bby).properties(spacing=5)

w_europe_bar_chart

Next, I'll create a linked scatter plot matrix (SPLOM) of all of Europe's data split by type of emissions, color-coded by region.

Source for slider filter: https://altair-viz.github.io/altair-viz-v4/gallery/multiple_interactions.html

In [34]:
# Implement brush & sliders
brush = alt.selection_interval()
emission_slider = alt.binding_range(min=0, max=eu_emissions['Total'].max(), step=10)
emission_selection = alt.param(bind=emission_slider, name="Emissions_Maximum", value=eu_emissions['Total'].max())
# selection_single is deprecated; use alt.param instead

# Build the SPLOM
europe_splom = alt.Chart(eu_emissions).mark_circle().encode(
    alt.X(alt.repeat("column"), type="quantitative"),
    alt.Y(alt.repeat("row"), type="quantitative"),
    color=alt.condition(brush, 'Region:N', alt.value('grey')),
    tooltip=["Country"]
).add_params(
    brush,
    emission_selection
).transform_filter('datum.Total <= Emissions_Maximum'
).properties(
    width=125,
    height=125
).repeat(
    row=['Coal', 'Oil', 'Gas', 'Cement', 'Flaring', 'Other'],
    column=['Coal', 'Oil', 'Gas', 'Cement', 'Flaring', 'Other']
)

europe_splom
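
What the slider does can be mimicked in pandas: the filter expression `datum.Total <= Emissions_Maximum` keeps only countries whose total is at or below the slider value. The helper name `visible_countries` and the toy numbers below are hypothetical.

```python
import pandas as pd

# Made-up totals: one very large emitter and two smaller ones
toy = pd.DataFrame({
    'Country': ['A', 'B', 'C'],
    'Total': [5.0, 40.0, 1500.0],
})


def visible_countries(data: pd.DataFrame, emissions_maximum: float) -> list:
    """Hypothetical helper mirroring the filter 'datum.Total <= Emissions_Maximum'."""
    return data.loc[data['Total'] <= emissions_maximum, 'Country'].tolist()


# Slider at its maximum: every country is shown
print(visible_countries(toy, 1500.0))

# Slider lowered to 100: the largest emitter drops out of every panel
print(visible_countries(toy, 100.0))
```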

Finally, I'll save all the visuals to HTML to preserve the interactive capabilities.

Source: https://github.com/altair-viz/altair_saver/issues/82

In [35]:
# First I need to install the altair_saver package
!pip install altair_saver vega fusion



In [36]:
from altair_saver import save

save(europe_pie_chart,'europe_pie_chart.html')
save(regional_bar_chart,'regional_bar_chart.html')
save(s_e_europe_bar_chart, 's_e_europe_bar_chart.html')
save(w_europe_bar_chart, 'w_europe_bar_chart.html')
save(europe_splom, 'europe_splom.html')