## Data Story
20-06-23
- Maurits van der Does Willebois
- Luc Buijs
- Lucas Woudstra
- Finn Govers


## Table of contents ##

- ### [Introduction](#Introduction)
- ### [Dataset and preprocessing](#dataset-and-preprocessing) 
- ### [Link to Github Online Repository](#link-to-github-online-repository)
- ### [Load Data](#load-data)
- ### [Reflection](#reflection)
- ### [Work distribution](#work-distribution)
- ### [First Perspective](#first-perspective)
    - ##### [Argument 1](#argument-11)
    - ##### [Argument 2](#argument-12)
    - ##### [Argument 3](#argument-13)
- ### [Second Perspective](#second-perspective)
    - ##### [Argument 1](#argument-21)
    - ##### [Argument 2](#argument-22)
    - ##### [Argument 3](#argument-23)
- ### [Use of generative AI](#use-of-generative-ai)

<a id="Introduction"></a>

## Introduction

Climate change is one of the most pressing debates in today's society. Most scientists have concluded that humanity is in serious danger from the consequences if we don't act quickly. We decided to look for the root of this problem, more specifically, which industrial sector is currently the biggest threat in the context of climate change. There are a few obvious contenders: many may think the biggest threat is fossil energy, or more generally the energy sector. We will try to see whether this is accurate by comparing the CO2 emissions of different
industries throughout the years. We have found two different perspectives:

Our first perspective on the topic is that it should be our priority to focus on the switch from fossil energy to
renewable energy sources. Fossil energy has a massive impact on climate change due to its large
CO2 emissions. Immediate action is required to find greener energy sources
with lower CO2 emissions. This would help slow down climate change drastically.

Our second perspective is that our priority should not be the energy sector, but something
entirely different. For instance, a while ago it came to light how big an impact the meat
industry has on CO2 emissions across the world, because of how many acres of land
cows especially need to be fed and kept. Examples like this lead us to believe that the energy sector should not be our main focus in the endeavour to reduce CO2 emissions and fight the rapid climate change
happening right now.

<a id="dataset-and-preprocessing"></a>

## Dataset and Preprocessing

The dataset "Energy data 1990 - 2020.csv" contains information about different aspects of the energy production and consumption for different countries from 1990 to 2020, such as the share of wind/solar energy in the total energy production of a country in a certain year.

The dataset "historical_emissions.csv" contains information about the CO2 emissions of different industry sectors from different countries from 1990 to 2019, such as the CO2 emissions of the energy sector of a country in a certain year.

We did not preprocess the dataset "Energy data 1990 - 2020.csv" in any way.

We did preprocess the dataset "historical_emissions.csv" and saved the result as "historical_emissions (cleanest).csv". The preprocessing of this dataset went as follows:

- We removed all columns that only had one unique value.
- We merged all rows that contained data about the same year but about different countries by summing up the CO2 emissions, so that the data now describes global CO2 emissions instead of the emissions of individual countries.
- We reorganised the data in a way that resembles a transposition. The original data had one column for every year and a single sector column, whose value named the sector that a row described. After our transformation, the dataset has one column for every sector and a single year column, whose value is the year that a row describes.
- Lastly, we merged the sectors "Energy" and "Electricity/Heat" by summing up their CO2 emissions, because both belong to the energy production sector, which is the sector that we are interested in.

We did all this preprocessing with the code in the file "clean.ipynb", which can be found in the table of contents to the left of this page or our GitHub online repository.
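
The steps above can be sketched with pandas on a tiny made-up table. This is only an illustration under stated assumptions, not the code from "clean.ipynb": the column names `Country`, `Sector` and `Unit` and all numbers are invented for the example.

```python
import pandas as pd

# Toy stand-in for the raw data: one row per country and sector, one column per year.
# Column names and values are made up for illustration.
raw = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "B"],
    "Sector": ["Energy", "Electricity/Heat", "Agriculture"] * 2,
    "Unit": ["MtCO₂e"] * 6,
    "1990": [10.0, 4.0, 3.0, 20.0, 6.0, 5.0],
    "1991": [11.0, 5.0, 3.0, 21.0, 7.0, 5.0],
})

# 1. Drop columns that hold a single unique value (here: "Unit").
raw = raw.loc[:, raw.nunique() > 1]

# 2. Sum over countries to get global emissions per sector.
global_wide = raw.drop(columns="Country").groupby("Sector").sum()

# 3. Transpose so that each row is a year and each column is a sector.
clean = global_wide.T.rename_axis("Year").reset_index()
clean.columns.name = None
clean["Year"] = clean["Year"].astype(int)

# 4. Merge the two energy-related sectors into one "Energy" column.
clean["Energy"] = clean["Energy"] + clean["Electricity/Heat"]
clean = clean.drop(columns="Electricity/Heat")

print(clean)
```

The result has one row per year and one column per sector, which is the shape that the plotting code in this notebook expects.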

<a id="link-to-github-online-repository"></a>

## Link to GitHub Online Repository
https://github.com/finn-uva/IV

<a id="load-data"></a>

## Load Data

In [12]:

import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
import matplotlib.pyplot as plt
import numpy as np

df = pd.read_csv('historical_emissions (cleanest).csv')
df1 = pd.read_csv('Energy data 1990 - 2020.csv')

<a id="reflection"></a>

## Reflection

<a id="work-distribution"></a>

## Work Distribution

### Maurits ###


### Luc ###
Luc worked together with Maurits on all three visualisations and arguments of the first perspective. He also wrote the reflection on the feedback on the draft version together with Maurits.

### Finn ###


### Lucas ###
Lucas wrote most of the introduction and helped with finding the correct datasets. He worked out argument 1 of the second perspective, both the visualisation and the argument itself. After this, he made sure the table of contents is functional.

<a id="first-perspective"></a>

## First Perspective

In [13]:
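# Calculate the average MtCO₂e over all sectors per year.
# 'Year' is excluded so the year number is not averaged in with the emissions.
df['Average MtCO₂e'] = df.drop(columns='Year').mean(axis=1, numeric_only=True)

# Create the plot using Plotly (px is already imported in the Load Data cell)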

fig = px.line(df, x='Year', y='Average MtCO₂e', title='Average MtCO₂e for All Sectors per Year')

# Add a caption annotation
caption_text = "Fig.1. This graph shows the MtCO₂e averaged over all sectors per year"
fig.add_annotation(
    x=0.5,  # x-coordinate for the annotation (0.5 = center of x-axis)
    y=-0.24,  # y-coordinate for the annotation (negative value to position it below the graph)
    xref="paper",
    yref="paper",
    text=caption_text,
    showarrow=False,
    font=dict(
        size=13,
        color="black"))

#fig.write_image("figure1.png")
fig.show()

<a id="argument-11"></a>

### Argument 1
Figure 1 shows an increasing trend in the CO2 emissions per year of all countries combined. The most shocking aspect of this graph is that the trend stays the same. In other words, the total emission of CO2 keeps rising at the same rate, with no sign of slowing down. This shows the urgency of developing a more efficient, more renewable and greener energy source to reduce the total emissions.
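
One way to check a claim like "rising at the same rate" is to fit a straight line through the yearly values and look at how far the data deviates from it. The sketch below uses made-up numbers, not values from our dataset, and the variable names are only illustrative.

```python
import numpy as np

# Made-up yearly totals (MtCO₂e), not values from the dataset.
years = np.array([1990, 1995, 2000, 2005, 2010, 2015])
totals = np.array([100.0, 110.0, 121.0, 130.0, 141.0, 150.0])

# Least-squares straight line: totals ≈ slope * year + intercept.
slope, intercept = np.polyfit(years, totals, deg=1)

# Residuals show how far each year deviates from a constant growth rate.
residuals = totals - (slope * years + intercept)

print(f"Average increase per year: {slope:.2f} MtCO₂e")
print(f"Largest deviation from the line: {np.abs(residuals).max():.2f} MtCO₂e")
```

Small residuals relative to the totals suggest a roughly constant rate of increase. Residuals that grow or change sign systematically would suggest that the rate itself is changing.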


In [14]:
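# Sector columns are assumed to start at the fourth column.
# Skip the derived 'Average MtCO₂e' column added for Fig.1, which is not a sector.
sectors = [col for col in df.columns[3:] if col != 'Average MtCO₂e']

fig = go.Figure()

# One line per sector, showing its emissions for every year
for sector in sectors:
    fig.add_trace(go.Scatter(x=df['Year'], y=df[sector], mode='markers+lines', name=sector))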

fig.update_layout(title='Average MtCO₂e per Year for each Sector',
                  xaxis_title='Year',
                  yaxis_title='MtCO₂e')

# Add a caption annotation
caption_text = "Fig.2. This graph shows the average MtCO₂e for each Sector per year"
fig.add_annotation(
    x=0.5,  # x-coordinate for the annotation (0.5 = center of x-axis)
    y=-0.24,  # y-coordinate for the annotation (negative value to position it below the graph)
    xref="paper",
    yref="paper",
    text=caption_text,
    showarrow=False,
    font=dict(
        size=13,
        color="black"
    )
)


#fig.write_image("figure2.png")
fig.show()

<a id="argument-12"></a>

### Argument 2
Figure 2 shows the average emission of each sector per year, with every sector given a different color. The energy sector has the highest average CO2 emissions per year, followed by the electricity/heat sector. This graph alone already implies that the energy sector is the biggest cause of CO2 emissions. (Furthermore, when the energy sector is compared with the electricity/heat sector, their similar growth trends suggest that the two sectors could be correlated.)


In [15]:
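# Sector columns are assumed to start at the fourth column.
# Skip the derived 'Average MtCO₂e' column added for Fig.1, which is not a sector.
sectors = [col for col in df.columns[3:] if col != 'Average MtCO₂e']
average_values_total = df[sectors].mean().values  # Use .values to convert to a numpy array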

fig = go.Figure(data=[go.Pie(labels=sectors, values=average_values_total, hole=0.5)])

fig.update_layout(title='Average MtCO₂e of Each Sector')

# Add a caption annotation
caption_text = "Fig.3. This pie chart shows the percentage of MtCO₂e for each Sector from 1990 till 2019"
fig.add_annotation(
    x=0.5,  # x-coordinate for the annotation (0.5 = center of x-axis)
    y=-0.24,  # y-coordinate for the annotation (negative value to position it below the graph)
    xref="paper",
    yref="paper",
    text=caption_text,
    showarrow=False,
    font=dict(
        size=13,
        color="black"
    )
)


#fig.write_image("figure3.png")
fig.show()

<a id="argument-13"></a>

### Argument 3
Figure 3 displays a donut chart that shows each sector's contribution to CO2 emissions. Again, energy is the biggest polluter, being responsible for almost half of the total CO2 emissions from all sectors combined. Based on this chart, if the world wants to emit less CO2, everyone should start by searching for greener options in the energy sector.
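
The size of each slice in such a chart is the sector's value divided by the sum over all sectors. A minimal sketch of that calculation, using made-up sector names and values rather than our actual data:

```python
import pandas as pd

# Made-up average emissions per sector (MtCO₂e), for illustration only.
averages = pd.Series({
    "Energy": 48.0,
    "Agriculture": 22.0,
    "Industry": 18.0,
    "Waste": 12.0,
})

# Each slice is the sector's value as a percentage of the total.
shares = averages / averages.sum() * 100

print(shares.round(1))
print("Largest contributor:", shares.idxmax())
```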



<a id="second-perspective"></a>

## Second Perspective

For our second perspective, we explored the data and found evidence that contradicts our first perspective. We visualized this evidence below and used it to construct three arguments against our first perspective.

In [16]:

renewable_electricity = df1['Share of renewables in electricity production (%)']

# The non-renewable share is whatever remains of 100%
df1['Production that is not green (%)'] = 100 - renewable_electricity
# numeric_only=True skips any non-numeric columns when averaging per year
average_df = df1.groupby('Year').mean(numeric_only=True).reset_index()

fig = px.area(average_df, x='Year', y=['Share of renewables in electricity production (%)', 'Production that is not green (%)'],
              labels={'value': '%', 'variable': 'Energy Type'},
              title='Percentages of non-renewable vs renewable energy being produced')


fig.update_layout(legend_title_text='Energy Type', width=800)


# Add a caption annotation
caption_text = "Fig.4. This area chart shows that the percentage of renewable energy being produced is already growing"
fig.add_annotation(
    x=0,  # x-coordinate for the annotation (0 = left edge of the plot area)
    y=-0.24,  # y-coordinate for the annotation (negative value to position it below the graph)
    xref="paper",
    yref="paper",
    text=caption_text,
    showarrow=False,
    font=dict(
        size=13,
        color="black"
    )
)
fig.show()


In [17]:
X = np.array(range(1990, 2021))

Y_prod = np.array(
    [df1[df1['Year'] == x]['Total energy production (Mtoe)'].sum()
    for x in X]
)
Y_cons = np.array(
    [df1[df1['Year'] == x]['Total energy consumption (Mtoe)'].sum()
    for x in X]
)

# Express consumption and surplus as shares of production,
# so that the two stacked areas add up to 100%
cons = Y_cons / Y_prod * 100
diff = (Y_prod - Y_cons) / Y_prod * 100
    
fig = go.Figure()
fig.add_trace(go.Scatter(
    name='Energy Need',
    x=X, y=cons,
    hoverinfo='x+y',
    mode='lines',
    line=dict(width=0, color='green'),
    stackgroup='one' # define stack group
))
fig.add_trace(go.Scatter(
    name='Energy Surplus',
    x=X, y=diff,
    hoverinfo='x+y',
    mode='lines',
    line=dict(width=0, color='red'),
    stackgroup='one'
))

fig.update_xaxes(title_text="Year")
fig.update_yaxes(title_text="Percentage of Produced Energy")
fig.update_layout(yaxis_range=(0, 100), height=500, width=800,
                  title='Share of Needed and Unutilized Energy of Total Energy Production')

caption_text = 'Fig.5. This area plot shows that we need almost all of the energy that we produce.'
fig.add_annotation(
    x=0.5,  # x-coordinate for the annotation (0.5 = center of x-axis)
    y=-0.24,  # y-coordinate for the annotation (negative value to position it below the graph)
    xref="paper",
    yref="paper",
    text=caption_text,
    showarrow=False,
    font=dict(
        size=13,
        color="black"
    )
)

#fig.write_image('figure5.png')
fig.show()

In [18]:
last_years = 10

x = np.array(df['Year'])[-last_years:]
sectors = list(df.columns)

# Make dict
energy_rest_dict = {
    'Year': x,
    'Energy': np.array(df['Energy'])[-last_years:],
    'Rest': np.zeros(len(x[-last_years:]))
}

# Fill 'Rest' category ('Average MtCO₂e' is a derived column from Fig.1, not a sector)
for sector in sectors:
    if sector not in ('Year', 'Energy', 'Average MtCO₂e'):
        energy_rest_dict['Rest'] += np.array(df[sector])[-last_years:]
        
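# Find derivatives (year-over-year growth of the normalized emissions)
for key in energy_rest_dict:
    if key != 'Year':
        # Work on a float copy so the division below also works for integer data
        y = np.array(energy_rest_dict[key], dtype=float)
        
        # Normalize by the maximum so both series are on the same scale
        y = y / y.max()

        # Forward differences; the last year reuses the final difference
        dy = np.zeros(y.shape)
        dy[0:-1] = np.diff(y)/np.diff(x)
        dy[-1] = (y[-1] - y[-2])/(x[-1] - x[-2])
        
        energy_rest_dict[key] = dy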
        
frame = []

for i in range(len(x)):
    temp_lst = []
    
    for key in energy_rest_dict:
        temp_lst.append(energy_rest_dict[key][i])
    
    frame.append(temp_lst)
    
df_vis_4 = pd.DataFrame(
             frame,
             columns = ['Year', 'Energy', 'Average of All Other Sectors'])

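frame = []

# Linearly interpolate 99 points between consecutive years, so that the
# colored fill below can switch close to where the two lines cross
for i in range(last_years):
    frame.append(np.array(df_vis_4.loc[i]))
    
    if i == last_years - 1:
        break
    
    for j in range(1, 100):
        lower_fraction = (100 - j) / 100 * np.array(df_vis_4.loc[i])
        upper_fraction = j / 100 * np.array(df_vis_4.loc[i+1])
        
        frame.append(lower_fraction + upper_fraction)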
        
df_vis_5 = pd.DataFrame(
             frame,
             columns = ['Year', 'Energy', 'Average of All Other Sectors'])

df_5 = df_vis_5.copy()

df_vis_5['label'] = np.where(df_vis_5['Energy'] > df_vis_5['Average of All Other Sectors'], 1, 0)

df_vis_5['group'] = df_vis_5['label'].ne(df_vis_5['label'].shift()).cumsum()
df_vis_5 = df_vis_5.groupby('group')
dfs = []
for name, data in df_vis_5:
    dfs.append(data)
    
# custom function to set fill color
def fillcol(label):
    if label >= 1:
        return 'rgba(250,0,0,0.23)'
    else:
        return 'rgba(0,250,0,0.23)'

fig = go.Figure()

# Use a loop variable other than df, so the emissions DataFrame is not overwritten
for segment in dfs:
    fig.add_traces(go.Scatter(x=segment['Year'], y = segment['Energy'], showlegend=False,
                              line = dict(color='rgba(0,0,0,0)')))
    
    fig.add_traces(go.Scatter(x=segment['Year'], y = segment['Average of All Other Sectors'], showlegend=False,
                              line = dict(color='rgba(0,0,0,0)'),
                              fill='tonexty', 
                              fillcolor = fillcol(segment['label'].iloc[0])))

fig.add_traces(go.Scatter(x=df_5['Year'], y = df_5['Average of All Other Sectors'],
                          line = dict(color='rgba(75,75,75,1)', width=2),
                          name='Average of All Other Sectors'))
    
fig.add_traces(go.Scatter(x=df_5['Year'], y = df_5['Energy'],
                          line = dict(color='blue', width=2),
                          name='Energy Sector'))

fig.update_xaxes(title_text="Year")
fig.update_yaxes(title_text="Growth of CO2 Emissions")
fig.update_layout(height=500, width=800, showlegend=True,
                  title='Growth of CO2 Emissions for Different Sectors in the Last 10 Years')

caption_text = "Fig.6. This line chart shows that the energy sector is \
already slowing its CO2 emissions faster than other sectors."
fig.add_annotation(
    x=0.5,  # x-coordinate for the annotation (0.5 = center of x-axis)
    y=-0.2,  # y-coordinate for the annotation (negative value to position it below the graph)
    xref="paper",
    yref="paper",
    text=caption_text,
    showarrow=False,
    font=dict(
        size=13,
        color="black"
    )
)

#fig.write_image('figure6.png')
fig.show()

<a id="argument-21"></a>

### Argument 1
Figure 4 displays an area chart that gives us a good look at how the energy sector is already improving its use of renewable energy sources: in the last 10 years, almost 10% of the energy produced has become renewable. This may not seem like a lot, but the trend is also rising, meaning that the growth gets larger each year, so one can be positive-minded about the future of humanity. This shows that the energy sector is already working hard at finding greener sources, so to really make a difference we might need to look at sectors where this kind of development is not yet in full motion.

<a id="argument-22"></a>

### Argument 2
There is a reason that the energy sector is so big: we need energy for a lot of things and we can't easily use less of it, which means that we should be careful about limiting how much energy we can produce. The only CO2 emissions from energy that we should try to decrease are the emissions that result from unnecessary energy production. However, there is only a small amount of energy that we produce but don't use, as seen in figure 5. This means that only a small amount of unnecessary CO2 emissions is caused by the energy sector: much less than one would initially expect.
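
The split between needed and unused energy is simple arithmetic on production and consumption totals. A minimal sketch with made-up numbers (not values from our dataset), where both shares are taken relative to production:

```python
import numpy as np

# Made-up global totals in Mtoe for three years (illustration only).
production = np.array([1000.0, 1100.0, 1200.0])
consumption = np.array([950.0, 1056.0, 1140.0])

# Both shares are relative to production, so together they add up to 100%.
need_share = consumption / production * 100
surplus_share = (production - consumption) / production * 100

for need, surplus in zip(need_share, surplus_share):
    print(f"needed: {need:.1f}%  unused: {surplus:.1f}%")
```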

<a id="argument-23"></a>

### Argument 3
The energy sector may be responsible for most CO2 emissions because of the sheer size of the sector, but it is already making big steps in reducing its CO2 emissions. In fact, the energy sector is reducing its CO2 emissions faster than most other sectors, as seen in figure 6. In the last ten years, it has been rare for the energy sector to have faster growing CO2 emissions than the average of all other sectors. The energy sector even shows a downward trend in its CO2 emission growth, whereas the average of all other sectors shows an upward trend. It seems that the energy sector is already making great progress on its CO2 emissions, and that we should in fact focus on the other sectors to stimulate a reduction in their CO2 emission growth.
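
The growth values in figure 6 are year-over-year differences of emissions that were first scaled by their maximum. A minimal sketch of that calculation for a single sector, using made-up numbers:

```python
import numpy as np

# Made-up emissions for one sector over five years (illustration only).
years = np.array([2015, 2016, 2017, 2018, 2019])
emissions = np.array([80.0, 90.0, 96.0, 99.0, 100.0])

# Normalize by the maximum so sectors of different size can be compared.
normalized = emissions / emissions.max()

# Year-over-year change of the normalized series (forward differences).
growth = np.diff(normalized) / np.diff(years)

print(growth)
```

In this example every growth value is positive, so emissions still rise each year. The falling growth values only mean that the rise is slowing down, which is worth keeping in mind when reading a chart of growth rather than of emissions.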

<a id="use-of-generative-ai"></a>

## Use of Generative AI
We used ChatGPT for the following things:

- Generating code that produced figure 4 when given a self-made figure made with matplotlib.pyplot.
- Generating code that produced figure 3 when given the relevant data from our datasets.
- Solving an issue with GitHub that prevented us from displaying figures.