## Dataset and Preprocessing
The dataset "Energy data 1990 - 2020.csv" describes the energy production and consumption of individual countries for the years 1990 to 2020. For example, it records the share of wind/solar energy in a country's total energy production in a given year.

The dataset "historical_emissions.csv" contains the CO2 emissions per industry sector and country for the years 1990 to 2019, for example the CO2 emissions of a country's energy sector in a given year.

We did not preprocess the dataset "Energy data 1990 - 2020.csv" in any way.

We preprocessed the dataset "historical_emissions.csv" and saved the result as "historical_emissions (cleanest).csv". The preprocessing went as follows:

First, we removed all columns that had only one unique value. Second, for each sector and year, we summed the CO2 emissions over all countries, so that the data describes global CO2 emissions rather than the emissions of individual countries. Third, we reorganised the data in a way that resembles a transposition. The original data had one column per year and a single "Sector" column whose value named the sector that a row describes. After our transformation, the dataset has one column per sector and a single "Year" column whose value gives the year that a row describes. Lastly, we merged the sectors "Energy" and "Electricity/Heat" by summing their CO2 emissions, because both belong to energy production, which is the sector that we are interested in.

We did all this preprocessing with the code below:

In [1]:
import pandas as pd

df2 = pd.read_csv('historical_emissions.csv')

# The year columns in the source file are named "1990" ... "2019".
X = range(1990, 2020)
sectors = list(df2['Sector'].unique())

# Maps each sector to its list of global yearly totals, in the order of X.
sector_dict = {}

for sector in sectors:
    # 'Electricity/Heat' gets no entry of its own; it is folded into 'Energy' below.
    if sector != 'Electricity/Heat':
        sector_dict[sector] = []

        for i in X:
            # Sum over all rows of this sector to get the global total for year i.
            total_sector_emission = df2[df2['Sector'] == sector][f"{i}"].sum()

            if sector == 'Energy':
                # Merge 'Electricity/Heat' into 'Energy' by adding its total.
                total_sector_emission += df2[df2['Sector'] == 'Electricity/Heat'][f"{i}"].sum()

            sector_dict[sector].append(total_sector_emission)

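# Build one row per year: [year, total of first sector, total of second sector, ...].
frame = []

for i in range(len(X)):
    temp_lst = [X[i]]

    for key in sector_dict:
        temp_lst.append(sector_dict[key][i])

    frame.append(temp_lst)

# Drop the merged sector so the column names line up with the keys of sector_dict.
sectors.remove('Electricity/Heat')

df_vis_2 = pd.DataFrame(
    frame,
    columns=['Year'] + sectors)

# Note: the default row index is written to the file as an unnamed first column.
df_vis_2.to_csv('historical_emissions (cleanest).csv')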
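
For comparison, the same four steps can also be sketched with pandas' built-in `groupby` and transpose operations. This is only an illustration on made-up toy data, not the code we ran: the `Country` and `Source` columns and all values are assumptions, chosen to mimic a wide layout with one column per year.

```python
import pandas as pd

# Made-up toy data in a wide layout (one column per year); the values are not real.
toy = pd.DataFrame({
    'Country': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Source': ['X'] * 6,  # hypothetical column with a single unique value
    'Sector': ['Energy', 'Electricity/Heat', 'Waste'] * 2,
    '1990': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    '1991': [1.5, 2.5, 3.5, 4.5, 5.5, 6.5],
})

# Step 1: drop columns that hold only one unique value.
toy = toy.loc[:, toy.nunique() > 1]

# Step 2: sum over countries, giving one row per sector and one column per year.
# sort=False keeps the sectors in order of first appearance.
year_cols = [c for c in toy.columns if c.isdigit()]
global_totals = toy.groupby('Sector', sort=False)[year_cols].sum()

# Step 3: transpose so that years become rows and sectors become columns.
by_year = global_totals.T
by_year.index = by_year.index.astype(int)
by_year.index.name = 'Year'
by_year.columns.name = None

# Step 4: fold 'Electricity/Heat' into 'Energy', then turn the year index into a column.
by_year['Energy'] = by_year['Energy'] + by_year['Electricity/Heat']
by_year = by_year.drop(columns='Electricity/Heat').reset_index()

print(by_year)
```

The result has one `Year` column followed by one column per remaining sector, which is the same shape as `df_vis_2` above.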