# Overview of worldwide carbon emissions and electricity sources

#### Analysis of worldwide CO2 emissions, the impact of different electricity sources on a country's carbon footprint, and how the energy mix can help smooth sudden movements in energy prices.

### 1.0 Purpose of this analysis
This analysis is divided into two parts.
In the first part, the goal is to assess the status of ___worldwide CO2 emissions___ as of 2023, the impact of ___various electricity production sources___ on a country's carbon footprint, and how these trends have evolved over time. The focus is on CO2 emissions from combustion processes (in million tons) rather than countries' total emissions, as the former are more relevant to electricity production.    
In the second part, the findings from the first part are used to assess how a country's energy mix can help smooth the impact of sudden movements in energy market prices.

### 2.0 Overview of the data  
The dataset used in this analysis comes from the Energy Institute website and includes information on carbon emissions and electricity production by source: mainly fossil fuels, nuclear energy, renewables and other sources.    
The data is a panel dataset covering countries across the globe over multiple years. This analysis focuses on 2023, the latest year available, to give a quick overview of the most recent information.   
An overview of the most relevant attributes is provided below, after selecting the features to include in the analysis.


In [2]:
%load_ext autoreload
%autoreload 2
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import requests
import os
import warnings
import seaborn as sns
import polars
from scipy.stats import pearsonr
from scipy.stats import ks_2samp
from sklearn.preprocessing import PowerTransformer
from utils import DownloadSave

warnings.filterwarnings('ignore')



In [3]:
# Download and save data
panelUrl = "https://www.energyinst.org/__data/assets/excel_doc/0007/1055752/merged_panel.xlsx"
panelFile = "C:/Users/Acer/Documents/carbon-footprint/Data/Panel Data.xlsx"

glossaryUrl = "https://www.energyinst.org/__data/assets/excel_doc/0020/1540505/Consolidated-Dataset-Narrow-Format-Glossary.xlsx"
glossaryFile = "C:/Users/Acer/Documents/carbon-footprint/Data/Glossary.xlsx"

panelDataDownloader = DownloadSave(panelUrl, panelFile)
panelData = panelDataDownloader.downloadSave()

glossaryDownloader = DownloadSave(glossaryUrl, glossaryFile)
glossaryData = glossaryDownloader.downloadSave()

panelData.head()
glossaryData.head()

Unnamed: 0,Code,Variable,Units
0,biodiesel_cons_kboed,biodiesel consumption,thousand barrels of oil equivalent per day
1,biodiesel_cons_pj,biodiesel consumption,Petajoules
2,biodiesel_prod_kboed,biodiesel production,thousand barrels of oil equivalent per day
3,biodiesel_prod_pj,biodiesel production,Petajoules
4,biofuels_cons_ej,biofuels consumption,Exajoules
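
The `DownloadSave` helper comes from a local `utils` module that is not shown in this notebook. As a rough idea of what a download-and-cache step can look like, here is a minimal sketch using only the standard library. The function name `download_if_missing` and its caching behaviour are assumptions made for illustration, not the actual implementation of `DownloadSave`.

```python
import urllib.request
from pathlib import Path


def download_if_missing(url: str, dest: str) -> Path:
    """Hypothetical helper: save `url` to `dest` unless a copy already exists."""
    dest_path = Path(dest)
    if dest_path.exists():
        # Reuse the cached file and skip the network call entirely.
        return dest_path
    # Create the target folder if needed, then write the raw bytes to disk.
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response:
        dest_path.write_bytes(response.read())
    return dest_path
```

The returned path could then be passed to `pd.read_excel` to obtain a DataFrame such as `panelData`. Caching on disk avoids downloading the workbook again on every run of the notebook.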


### 3.0 Data Cleaning and Feature Engineering

#### 3.1 Data Cleaning

Most of the data cleaning involves removing values from the 'Country' column that represent aggregates of other countries (for example, names starting with 'Total' or 'Other', and 'Rest of World'). These rows provide no additional information for our purposes and have therefore been removed from the dataset.  
A small number of missing values has also been observed in the carbon emissions and primary energy consumption columns. All missing values, including those in the electricity production features, are preserved here, since they may reflect a country's specific energy policy or its transition towards greener sources. In other words, missing information on coal electricity production for a specific country does not necessarily imply a defect in the data; it can be interpreted as the country transitioning towards other energy sources.
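
The two cleaning decisions described above can be illustrated on a toy table. This is only a sketch: the country names and numbers below are placeholders invented for the example, and combining the 'Total' and 'Other' prefixes into a single pattern is a convenience chosen here, not necessarily how the notebook does it.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the panel data; names and values are placeholders.
toy = pd.DataFrame({
    "Country": ["Country A", "Total Europe", "Other Africa", "Rest of World", "Country B"],
    "co2_combust_mtco2": [300.0, 3500.0, 120.0, 90.0, 40.0],
    "electbyfuel_coal": [12.0, 400.0, 5.0, 3.0, np.nan],
})

# Flag aggregate rows: names starting with "Total" or "Other", or "Rest of World".
is_aggregate = (
    toy["Country"].str.contains(r"^(?:Total|Other)", na=False)
    | toy["Country"].eq("Rest of World")
)
countries = toy[~is_aggregate]

# Count missing values per column without dropping or imputing them.
missing_per_column = countries.isna().sum()
print(countries["Country"].tolist())
print(missing_per_column)
```

Counting the gaps instead of filling them keeps the missing coal value for "Country B" visible, so it can later be read as a possible signal about that country's energy mix.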

In [7]:
# keep only relevant columns
selectedColumns = ['Country', 'Year', 'Region', 'OPEC', 'EU', 'OECD', 'CIS',
                   'co2_combust_mtco2', 'co2_combust_pc', 'co2_combust_per_ej', 'co2_mtco2', 'elect_twh']
electByFuel = panelData.filter(like = "electbyfuel")
primaryEnergyCons = panelData.filter(like = "primary_")
panelDataFiltered = pd.concat([panelData[selectedColumns], electByFuel, primaryEnergyCons], axis = 1)

glossaryData = glossaryData[glossaryData['Code'].isin(panelDataFiltered.columns.tolist())]
newRows = [
    {'Code':'Country', 'Variable':'Name of each country, for 107 total nations', 'Units':'-'},
    {'Code':'Region', 'Variable':'Region to which each country belongs', 'Units':'-'},
    {'Code':'OPEC', 'Variable':'1 if it is an OPEC country, 0 otherwise', 'Units':'-'},
    {'Code':'EU', 'Variable':'1 if it is an EU country, 0 otherwise', 'Units':'-'},
    {'Code':'OECD', 'Variable':'1 if it is an OECD country, 0 otherwise', 'Units':'-'},
    {'Code':'CIS', 'Variable':'1 if it is a CIS* country, 0 otherwise', 'Units':'-'}
]
newRows = pd.DataFrame(newRows)
glossaryData = pd.concat([glossaryData, newRows], ignore_index = True)
# print the list of features used in the analysis
glossaryData.style.set_table_attributes('style="width:100%; display:block; overflow:auto;"').set_table_styles([
    {'selector': 'thead th', 'props': [('text-align', 'center')]},  # Center align headers
    {'selector': 'tbody td', 'props': [('text-align', 'center')]},   # Center align data
])

print(glossaryData)
print("*Note - CIS: Commonwealth of Independent States")

                     Code                                           Variable  \
0       co2_combust_mtco2                      CO2 emissions from combustion   
1          co2_combust_pc                      CO2 emissions from combustion   
2      co2_combust_per_ej                                   Carbon intensity   
3               co2_mtco2                               Total CO2 emissions    
4               elect_twh                                        Electricity   
5        electbyfuel_coal                   Electricity generation from coal   
6         electbyfuel_gas                    Electricity generation from gas   
7       electbyfuel_hydro                  Electricity generation from hydro   
8     electbyfuel_nuclear                Electricity generation from nuclear   
9         electbyfuel_oil                    Electricity generation from oil   
10      electbyfuel_other                  Electricity generation from other   
11  electbyfuel_ren_power             El

In [10]:
panelDataFiltered = panelDataFiltered[~panelDataFiltered['Country'].str.contains('^Total', na = False)]
panelDataFiltered = panelDataFiltered[~panelDataFiltered['Country'].str.contains('^Other', na = False)]
panelDataFiltered = panelDataFiltered[panelDataFiltered['Country'] != 'Rest of World']

# Extract data as of 2023
energyData = panelDataFiltered.copy()
energyData = energyData[energyData['Year'] == 2023]

# Skewness of data
numCols = energyData.select_dtypes(include = ['float64']).columns.tolist()
print(energyData[numCols].skew().sort_values(ascending = False))

co2_combust_mtco2        6.701821
elect_twh                6.590418
co2_mtco2                6.298515
primary_ej               6.136746
electbyfuel_coal         4.996785
electbyfuel_oil          4.698540
electbyfuel_gas          4.642247
electbyfuel_ren_power    4.267929
electbyfuel_total        4.193242
electbyfuel_hydro        3.943784
CIS                      3.914198
electbyfuel_other        3.275147
co2_combust_pc           3.251153
electbyfuel_nuclear      2.701736
primary_eintensity       2.690109
primary_ej_pc            2.354356
OPEC                     2.350201
EU                       1.215595
OECD                     0.657687
co2_combust_per_ej      -0.531039
dtype: float64
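
The output above also lists the membership flags (`OPEC`, `EU`, `OECD`, `CIS`), presumably because they are stored as `float64`. For a 0/1 flag, skewness mostly reflects how unbalanced the two classes are. The sketch below, on synthetic data, shows one way to restrict the check to continuous columns and to see how a log transform reduces right skew. The column names, the `nunique() > 2` rule and the use of `np.log1p` are assumptions made for illustration, not steps taken in this notebook.

```python
import numpy as np
import pandas as pd

# Synthetic data: one right-skewed continuous column and one 0/1 flag.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "emissions": rng.lognormal(mean=3.0, sigma=1.5, size=500),
    "is_member": (rng.random(500) < 0.1).astype(float),
})

# Keep only columns with more than two distinct values (drops binary flags).
continuous = [col for col in toy.columns if toy[col].nunique() > 2]

# Compare skewness before and after a log(1 + x) transform.
raw_skew = toy[continuous].skew()
log_skew = np.log1p(toy[continuous]).skew()
print(pd.DataFrame({"raw": raw_skew, "log1p": log_skew}))
```

A log transform is only one option for non-negative data; `PowerTransformer`, which is imported at the top of the notebook, is another way to reduce skew.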
