In [1]:
!pip install pandas

Collecting pandas
  Using cached pandas-2.2.3-cp312-cp312-macosx_11_0_arm64.whl.metadata (89 kB)
Collecting numpy>=1.26.0 (from pandas)
  Downloading numpy-2.2.3-cp312-cp312-macosx_14_0_arm64.whl.metadata (62 kB)
Collecting pytz>=2020.1 (from pandas)
  Using cached pytz-2025.1-py2.py3-none-any.whl.metadata (22 kB)
Collecting tzdata>=2022.7 (from pandas)
  Downloading tzdata-2025.1-py2.py3-none-any.whl.metadata (1.4 kB)
Using cached pandas-2.2.3-cp312-cp312-macosx_11_0_arm64.whl (11.4 MB)
Downloading numpy-2.2.3-cp312-cp312-macosx_14_0_arm64.whl (5.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.1/5.1 MB 20.6 MB/s eta 0:00:00
Using cached pytz-2025.1-py2.py3-none-any.whl (507 kB)
Downloading tzdata-2025.1-py2.py3-none-any.whl (346 kB)
Installing collected packages: pytz, tzdata, numpy, pandas
Successfully installed numpy-2.2.3 pandas-2.2.3 pytz-2025.1 tzdata-2025.1


In [2]:
import pandas as pd

In [32]:
energy_consumption = pd.read_csv('DataSets/change-energy-consumption.csv')
renewable_energy = pd.read_csv('DataSets/modern-renewable-prod.csv')
per_capita_energy_use = pd.read_csv('DataSets/per-capita-energy-use.csv')
primary_energy = pd.read_csv('DataSets/primary-energy-cons.csv')

In [33]:
energy_consumption.head()

Unnamed: 0,Entity,Code,Year,Annual change in primary energy consumption (%)
0,Afghanistan,AFG,1981,12.663031
1,Afghanistan,AFG,1982,6.505477
2,Afghanistan,AFG,1983,22.33379
3,Afghanistan,AFG,1984,0.462401
4,Afghanistan,AFG,1985,-2.365375


In the modern-renewable-prod dataset, the data for Africa only runs to 2022. To keep the datasets aligned, we remove the 2023 rows from all of them.

In [24]:
# Year is parsed as an integer (see head() above), so compare with the integer 2023;
# comparing with the string '2023' would not drop any rows
energy_consumption = energy_consumption[energy_consumption['Year'] != 2023]
primary_energy = primary_energy[primary_energy['Year'] != 2023]
renewable_energy = renewable_energy[renewable_energy['Year'] != 2023]
per_capita_energy_use = per_capita_energy_use[per_capita_energy_use['Year'] != 2023]
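
A quick way to confirm that the filter actually removes rows is to check the dtype of Year first. The sketch below uses a small made-up frame (not the real CSVs) and assumes Year is stored as an integer, as the head() output suggests. In that case a comparison with the string '2023' matches nothing and the frame comes back unchanged.

```python
import pandas as pd

# Toy frame with invented rows; the real datasets are assumed to store Year as integers
df = pd.DataFrame({'Entity': ['A', 'A', 'A'], 'Year': [2021, 2022, 2023]})

print(df['Year'].dtype)  # an integer dtype

# A string never equals an integer, so this keeps every row
unchanged = df[df['Year'] != '2023']

# Comparing with the integer removes the 2023 row as intended
filtered = df[df['Year'] != 2023]

print(len(unchanged), len(filtered))
```

If a dataset did store Year as text, converting it with `astype(int)` before filtering would make the comparison behave the same way across all four frames.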

The next cell inner-joins the four energy datasets on Entity, Code, and Year, so the combined DataFrame keeps only the entity-year combinations present in all of them.

In [35]:
energy_analysis1 = pd.merge(energy_consumption, primary_energy, on=['Entity', 'Code', 'Year'], how='inner')
energy_analysis2 = pd.merge(energy_analysis1, renewable_energy, on=['Entity', 'Code', 'Year'], how='inner')
energy_analysis = pd.merge(energy_analysis2, per_capita_energy_use, on=['Entity', 'Code', 'Year'], how='inner')
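
The chained merges can also be written as a fold over a list of DataFrames. The sketch below is one possible way to do it, using small made-up frames rather than the real CSVs. The helper name `merge_all`, the toy column names, and the `validate='one_to_one'` check are additions for illustration. The check makes pandas raise an error if any (Entity, Code, Year) key appears more than once in a frame.

```python
from functools import reduce

import pandas as pd

KEYS = ['Entity', 'Code', 'Year']

# Toy stand-ins for the real datasets (all values invented for illustration)
change = pd.DataFrame({'Entity': ['A', 'A', 'B'], 'Code': ['AAA', 'AAA', 'BBB'],
                       'Year': [2000, 2001, 2000], 'change_pct': [1.0, 2.0, 3.0]})
primary = pd.DataFrame({'Entity': ['A', 'A'], 'Code': ['AAA', 'AAA'],
                        'Year': [2000, 2001], 'primary_twh': [10.0, 11.0]})
per_capita = pd.DataFrame({'Entity': ['A', 'B'], 'Code': ['AAA', 'BBB'],
                           'Year': [2000, 2000], 'kwh_per_person': [100.0, 200.0]})


def merge_all(frames, keys=KEYS):
    """Inner-join every frame on the shared keys, left to right."""
    return reduce(
        lambda left, right: pd.merge(left, right, on=keys, how='inner',
                                     validate='one_to_one'),
        frames,
    )


merged = merge_all([change, primary, per_capita])
print(merged)
```

Only the key (A, AAA, 2000) appears in all three toy frames, so the result has a single row. This shows how an inner join can shrink the data when one source covers fewer years or entities than the others.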

In [36]:
energy_analysis

Unnamed: 0,Entity,Code,Year,Annual change in primary energy consumption (%),Primary energy consumption (TWh),Electricity from wind - TWh,Electricity from hydro - TWh,Electricity from solar - TWh,Other renewables including bioenergy - TWh,Primary energy consumption per capita (kWh/person)
0,Afghanistan,AFG,2000,-12.373829,5.913606,0.0,0.31,0.00,0.00,302.59482
1,Afghanistan,AFG,2001,-21.129734,4.664077,0.0,0.50,0.00,0.00,236.89185
2,Afghanistan,AFG,2002,-5.058175,4.428160,0.0,0.56,0.00,0.00,210.86215
3,Afghanistan,AFG,2003,17.603290,5.207662,0.0,0.63,0.00,0.00,229.96822
4,Afghanistan,AFG,2004,-7.628947,4.810372,0.0,0.56,0.00,0.00,204.23125
...,...,...,...,...,...,...,...,...,...,...
7843,Zimbabwe,ZWE,2017,-2.984351,45.256546,0.0,3.97,0.01,0.15,3068.01150
7844,Zimbabwe,ZWE,2018,14.479410,51.809430,0.0,5.05,0.02,0.19,3441.98580
7845,Zimbabwe,ZWE,2019,-10.981565,46.119940,0.0,4.17,0.03,0.19,3003.65530
7846,Zimbabwe,ZWE,2020,-8.940124,41.996760,0.0,3.81,0.02,0.10,2680.13180


**Dropping rows with missing Code values**

This step ensures that every entry has a valid country code.

Rows without a Code correspond to continents rather than individual countries.

Removing these rows avoids mixing aggregates with their member countries and keeps the country-level analysis consistent.

The code 'OWID_WRL' represents a global entity that aggregates data from all countries. Since the analysis focuses on individual countries, we also exclude these rows so that the dataset contains only country-specific data and the world totals do not distort statistical calculations.

In [37]:
energy_analysis = energy_analysis.dropna(subset=['Code'])

In [39]:
energy_analysis = energy_analysis[energy_analysis['Code'] != 'OWID_WRL']
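
Before dropping rows, it can help to list which entities lack a Code and confirm that they are aggregates. The sketch below uses a made-up frame with the same Entity/Code/Year layout; the rows are invented for illustration. It also shows the two filters combined into one boolean mask.

```python
import pandas as pd

# Toy frame mimicking the Entity/Code/Year layout (rows invented for illustration)
df = pd.DataFrame({
    'Entity': ['Africa', 'Kenya', 'World', 'Kenya'],
    'Code': [None, 'KEN', 'OWID_WRL', 'KEN'],
    'Year': [2000, 2000, 2000, 2001],
})

# Entities that would be removed for lacking a code
no_code = df.loc[df['Code'].isna(), 'Entity'].unique()
print(no_code)

# Keep rows that have a code and are not the world aggregate
countries = df[df['Code'].notna() & (df['Code'] != 'OWID_WRL')]
print(countries)
```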

In [None]:
# Export the merged dataset to a CSV file

energy_analysis.to_csv('Energy_Analysis.csv', index=False)

In [None]:
# Checking for missing values to identify columns with incomplete data

energy_analysis.isna().sum()

Entity                                                  0
Code                                                    0
Year                                                    0
Annual change in primary energy consumption (%)         0
Primary energy consumption (TWh)                        0
Electricity from wind - TWh                           811
Electricity from hydro - TWh                           99
Electricity from solar - TWh                          843
Other renewables including bioenergy - TWh            629
Primary energy consumption per capita (kWh/person)      0
dtype: int64

The missing values correspond to renewable energy data for the years 1966 to 1989. This likely indicates that renewable energy sources were not widely used or reported during this period.
To ensure consistency in the dataset and avoid misinterpretation of missing values, we will replace these NA values with 0, assuming that renewable energy usage was negligible or nonexistent at that time.
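
A minimal sketch of that replacement, assuming the four renewable columns keep the names shown in the merged output. The toy frame and its values are invented for illustration. The sketch prints the range of years that contain gaps before filling, so the 1966 to 1989 assumption can be checked against the real data.

```python
import numpy as np
import pandas as pd

RENEWABLE_COLS = [
    'Electricity from wind - TWh',
    'Electricity from hydro - TWh',
    'Electricity from solar - TWh',
    'Other renewables including bioenergy - TWh',
]

# Toy frame with invented values; the 1970 row has no renewable data
df = pd.DataFrame({
    'Entity': ['A', 'A'],
    'Year': [1970, 2000],
    'Electricity from wind - TWh': [np.nan, 0.5],
    'Electricity from hydro - TWh': [np.nan, 1.2],
    'Electricity from solar - TWh': [np.nan, 0.1],
    'Other renewables including bioenergy - TWh': [np.nan, 0.0],
})

# Years in which at least one renewable column is missing
missing_years = df.loc[df[RENEWABLE_COLS].isna().any(axis=1), 'Year']
print(missing_years.min(), missing_years.max())

# Fill only the renewable columns, leaving every other column untouched
df[RENEWABLE_COLS] = df[RENEWABLE_COLS].fillna(0)
```

Restricting `fillna` to the renewable columns prevents a later gap in another column from being silently turned into 0.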