<a href="https://colab.research.google.com/github/pravin-nawghare/Clean-Energy-Global-Stats/blob/main/Data_cleaning_file.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Global Data on Sustainable Energy (2000-2020)**

🌏⚡Explore 20-year Insights on Sustainable Energy⚡🌏

**Description**

This dataset covers sustainable energy indicators and related factors across all countries from 2000 to 2020. It includes electricity access, renewable energy, carbon emissions, energy intensity, financial flows, and economic growth. Use it to compare nations, track progress towards Sustainable Development Goal 7, and study global energy consumption patterns over time.

**Key Features**
- Entity: The name of the country or region for which the data is reported.
- Year: The year for which the data is reported, ranging from 2000 to 2020.
- Access to electricity (% of population): The percentage of the population with access to electricity.
- Access to clean fuels for cooking (% of population): The percentage of the population with primary reliance on clean fuels.
- Renewable-electricity-generating-capacity-per-capita: Installed renewable energy capacity per person.
- Financial flows to developing countries (USD): Aid and assistance from developed countries for clean energy projects.
- Renewable energy share in total final energy consumption (%): Percentage of renewable energy in final energy consumption.
- Electricity from fossil fuels (TWh): Electricity generated from fossil fuels (coal, oil, gas) in terawatt-hours.
- Electricity from nuclear (TWh): Electricity generated from nuclear power in terawatt-hours.
- Electricity from renewables (TWh): Electricity generated from renewable sources (hydro, solar, wind, etc.) in terawatt-hours.
- Low-carbon electricity (% electricity): Percentage of electricity from low-carbon sources (nuclear and renewables).
- Primary energy consumption per capita (kWh/person): Energy consumption per person in kilowatt-hours.
- Energy intensity level of primary energy (MJ/$2017 PPP GDP): Energy use per unit of GDP at purchasing power parity.
- Value_co2_emissions_kt_by_country: Carbon dioxide emissions per country in kilotons.
- Renewables (% equivalent primary energy): Equivalent primary energy that is derived from renewable sources.
- GDP growth (annual %): Annual GDP growth rate based on constant local currency.
- GDP per capita: Gross domestic product per person.
- Density (P/Km2): Population density in persons per square kilometer.
- Land Area (Km2): Total land area in square kilometers.
- Latitude: Latitude of the country's centroid in decimal degrees.
- Longitude: Longitude of the country's centroid in decimal degrees.
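
The low-carbon share described above can be recomputed from the three generation columns. The following is a minimal sketch, assuming the share is nuclear plus renewable generation divided by total generation; the function name `low_carbon_share` is hypothetical and not part of the dataset or the notebook. The sample inputs are the Afghanistan 2000 values that appear in the first row of the data.

```python
def low_carbon_share(fossil_twh, nuclear_twh, renewables_twh):
    """Percentage of electricity that comes from nuclear and renewable sources."""
    total = fossil_twh + nuclear_twh + renewables_twh
    if total == 0:
        # no generation reported, so the share is undefined
        return float("nan")
    return 100.0 * (nuclear_twh + renewables_twh) / total


# Afghanistan, 2000: 0.16 TWh fossil, 0.0 TWh nuclear, 0.31 TWh renewables
share = low_carbon_share(0.16, 0.0, 0.31)
print(round(share, 3))
```

The result is about 65.96, which agrees with the `Low-carbon electricity (% electricity)` value reported for that row.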

**Potential Use Cases**
- Energy Consumption Prediction: Predict future energy usage, aid planning, and track SDG 7 progress.
- Carbon Emission Forecasting: Forecast CO2 emissions, support climate strategies.
- Energy Access Classification: Categorize regions for infrastructure development, understand sustainable energy's role.
- Sustainable Development Goal Tracking: Monitor progress towards Goal 7, evaluate policy impact.
- Energy Equity Analysis: Analyze access, density, and growth for equitable distribution.
- Energy Efficiency Optimization: Identify intensive areas for environmental impact reduction.
- Renewable Energy Potential Assessment: Identify regions for green investments based on capacity.
- Renewable Energy Investment Strategies: Guide investors towards sustainable opportunities.

In [1]:
!kaggle datasets download -d anshtanwar/global-data-on-sustainable-energy

Dataset URL: https://www.kaggle.com/datasets/anshtanwar/global-data-on-sustainable-energy
License(s): Attribution 4.0 International (CC BY 4.0)
global-data-on-sustainable-energy.zip: Skipping, found more recently modified local copy (use --force to force download)


In [2]:
# Unpack the zip file

from zipfile import ZipFile

# path to the downloaded archive
file_name = "/content/global-data-on-sustainable-energy.zip"

# open the archive in read mode; the handle is named zf
# so that it does not shadow the built-in zip()
with ZipFile(file_name, 'r') as zf:
    # uncomment to list the archive contents
    # zf.printdir()

    # extract all files into the current working directory
    print('Extracting all the files now...')
    zf.extractall()
    print('Done!')



Extracting all the files now...
Done!
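
Since the extracted file name can differ from what one expects, it may help to ask the archive which CSV files it contains before hard-coding a path. The following is a minimal sketch; the helper name `csv_members` is hypothetical and is not used elsewhere in this notebook.

```python
from zipfile import ZipFile


def csv_members(zip_path):
    """Return the names of all .csv files stored in the archive."""
    with ZipFile(zip_path, "r") as zf:
        return [name for name in zf.namelist() if name.lower().endswith(".csv")]
```

For example, `csv_members("/content/global-data-on-sustainable-energy.zip")` would list the CSV files inside the archive downloaded above.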


In [3]:
# import libraries
import pandas as pd
import numpy as np

In [4]:
# to display all columns
pd.set_option('display.max_columns', None)

In [5]:
path = "/content/global-data-on-sustainable-energy (1).csv"

In [6]:
# read the file
df = pd.read_csv(path)
df.head()

Unnamed: 0,Entity,Year,Access to electricity (% of population),Access to clean fuels for cooking,Renewable-electricity-generating-capacity-per-capita,Financial flows to developing countries (US $),Renewable energy share in the total final energy consumption (%),Electricity from fossil fuels (TWh),Electricity from nuclear (TWh),Electricity from renewables (TWh),Low-carbon electricity (% electricity),Primary energy consumption per capita (kWh/person),Energy intensity level of primary energy (MJ/$2017 PPP GDP),Value_co2_emissions_kt_by_country,Renewables (% equivalent primary energy),gdp_growth,gdp_per_capita,Density\n(P/Km2),Land Area(Km2),Latitude,Longitude
0,Afghanistan,2000,1.613591,6.2,9.22,20000.0,44.99,0.16,0.0,0.31,65.95744,302.59482,1.64,760.0,,,,60,652230.0,33.93911,67.709953
1,Afghanistan,2001,4.074574,7.2,8.86,130000.0,45.6,0.09,0.0,0.5,84.745766,236.89185,1.74,730.0,,,,60,652230.0,33.93911,67.709953
2,Afghanistan,2002,9.409158,8.2,8.47,3950000.0,37.83,0.13,0.0,0.56,81.159424,210.86215,1.4,1029.999971,,,179.426579,60,652230.0,33.93911,67.709953
3,Afghanistan,2003,14.738506,9.5,8.09,25970000.0,36.66,0.31,0.0,0.63,67.02128,229.96822,1.4,1220.000029,,8.832278,190.683814,60,652230.0,33.93911,67.709953
4,Afghanistan,2004,20.064968,10.9,7.75,,44.24,0.33,0.0,0.56,62.92135,204.23125,1.2,1029.999971,,1.414118,211.382074,60,652230.0,33.93911,67.709953


In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3649 entries, 0 to 3648
Data columns (total 21 columns):
 #   Column                                                            Non-Null Count  Dtype  
---  ------                                                            --------------  -----  
 0   Entity                                                            3649 non-null   object 
 1   Year                                                              3649 non-null   int64  
 2   Access to electricity (% of population)                           3639 non-null   float64
 3   Access to clean fuels for cooking                                 3480 non-null   float64
 4   Renewable-electricity-generating-capacity-per-capita              2718 non-null   float64
 5   Financial flows to developing countries (US $)                    1560 non-null   float64
 6   Renewable energy share in the total final energy consumption (%)  3455 non-null   float64
 7   Electricity from fossil fuels (TW

In [8]:
# check null values
df.isna().sum()

Unnamed: 0,0
Entity,0
Year,0
Access to electricity (% of population),10
Access to clean fuels for cooking,169
Renewable-electricity-generating-capacity-per-capita,931
Financial flows to developing countries (US $),2089
Renewable energy share in the total final energy consumption (%),194
Electricity from fossil fuels (TWh),21
Electricity from nuclear (TWh),126
Electricity from renewables (TWh),21
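
Raw counts are easier to judge as a share of the rows. For instance, 2089 missing values out of 3649 rows is roughly 57% of that column. The following is a minimal sketch of such a report; the helper name `missing_report` and the output column names are hypothetical.

```python
import pandas as pd


def missing_report(frame):
    """Missing-value count and percentage per column, most affected first."""
    counts = frame.isna().sum()
    report = pd.DataFrame({
        "missing": counts,
        "missing_pct": (100 * counts / len(frame)).round(2),
    })
    return report.sort_values("missing", ascending=False)


# small demonstration frame (made-up values, not taken from the dataset)
demo = pd.DataFrame({"a": [1.0, None, None, 4.0], "b": [1, 2, 3, 4]})
print(missing_report(demo))
```

Calling `missing_report(df)` on the loaded data would give the same counts as `df.isna().sum()` plus a percentage column.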


In [16]:
# shape before removing columns with a lot of null values
df.shape

(3649, 21)

In [9]:
# Fill missing values with mean
columns_to_fill_mean = ['Access to clean fuels for cooking', 'Renewable energy share in the total final energy consumption (%)',
                        'Electricity from nuclear (TWh)', 'Energy intensity level of primary energy (MJ/$2017 PPP GDP)',
                        'Value_co2_emissions_kt_by_country', 'gdp_growth', 'gdp_per_capita']
df[columns_to_fill_mean] = df[columns_to_fill_mean].apply(lambda x: x.fillna(x.mean()))
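
The cell above fills each gap with the mean of the whole column, so a missing value for one country is replaced by an average over all countries and years. One possible alternative, shown here only as a sketch and not as the method used in this notebook, is to fill with the mean of the same country first and fall back to the overall mean when a country has no values at all. The helper name `fill_mean_by_group` is hypothetical.

```python
import pandas as pd


def fill_mean_by_group(frame, columns, group_col="Entity"):
    """Fill NaNs with the per-group mean, then with the overall column mean."""
    out = frame.copy()
    for col in columns:
        overall_mean = out[col].mean()
        # mean of each group, aligned to the original rows
        group_mean = out.groupby(group_col)[col].transform("mean")
        out[col] = out[col].fillna(group_mean).fillna(overall_mean)
    return out


# small demonstration frame (made-up values, not taken from the dataset)
demo = pd.DataFrame({
    "Entity": ["A", "A", "A", "B", "B", "C"],
    "x": [1.0, None, 3.0, 10.0, None, None],
})
filled = fill_mean_by_group(demo, ["x"])
print(filled)
```

In the demonstration, the gap in group A is filled with 2.0, the gap in group B with 10.0, and group C, which has no values of its own, receives the overall mean.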

In [10]:
# re-check null values after filling the selected columns with the mean
df.isna().sum()

Unnamed: 0,0
Entity,0
Year,0
Access to electricity (% of population),10
Access to clean fuels for cooking,0
Renewable-electricity-generating-capacity-per-capita,931
Financial flows to developing countries (US $),2089
Renewable energy share in the total final energy consumption (%),0
Electricity from fossil fuels (TWh),21
Electricity from nuclear (TWh),0
Electricity from renewables (TWh),21


In [12]:
# drop columns with a lot of null values
df.drop(columns=['Financial flows to developing countries (US $)',
                 'Renewables (% equivalent primary energy)',
                 'Renewable-electricity-generating-capacity-per-capita'],
        inplace=True)
df.isna().sum()

Unnamed: 0,0
Entity,0
Year,0
Access to electricity (% of population),10
Access to clean fuels for cooking,0
Renewable energy share in the total final energy consumption (%),0
Electricity from fossil fuels (TWh),21
Electricity from nuclear (TWh),0
Electricity from renewables (TWh),21
Low-carbon electricity (% electricity),42
Primary energy consumption per capita (kWh/person),0
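
The columns dropped here were chosen by hand. A rule-based variant is sketched below, assuming a column is dropped when its fraction of missing values exceeds a threshold. The helper name `drop_sparse_columns` and the default of 0.25 are hypothetical and are not values taken from this notebook.

```python
import pandas as pd


def drop_sparse_columns(frame, max_missing_fraction=0.25):
    """Return a copy without the columns whose missing fraction exceeds the threshold."""
    missing_fraction = frame.isna().mean()
    to_drop = missing_fraction[missing_fraction > max_missing_fraction].index
    return frame.drop(columns=to_drop)


# small demonstration frame (made-up values, not taken from the dataset)
demo = pd.DataFrame({
    "a": [None, None, None, 4.0],  # 75% missing, dropped
    "b": [1, 2, 3, 4],             # complete, kept
    "c": [1.0, None, 3.0, 4.0],    # 25% missing, kept (not above the threshold)
})
kept = drop_sparse_columns(demo)
print(list(kept.columns))
```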


In [14]:
# drop rows that still contain null values
df.dropna(inplace=True)
df.shape

(3597, 18)

In [16]:
df.to_csv('/content/global-data-on-sustainable-energy-.csv')
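
By default `to_csv` also writes the DataFrame index as an unnamed first column, and after `dropna` that index has gaps. If the extra column is not wanted downstream, passing `index=False` leaves it out. The following is a minimal sketch on a small made-up frame, written to a temporary directory; the file names are hypothetical.

```python
import os
import tempfile

import pandas as pd

demo = pd.DataFrame({"Entity": ["A", "B", "C"], "Year": [2000, 2001, 2002]})
demo = demo.drop(index=1)  # leaves a gap in the index, as dropna() can

tmp_dir = tempfile.mkdtemp()
with_index = os.path.join(tmp_dir, "with_index.csv")
without_index = os.path.join(tmp_dir, "without_index.csv")

demo.to_csv(with_index)                   # index written as an extra column
demo.to_csv(without_index, index=False)   # only the data columns

cols_with = list(pd.read_csv(with_index).columns)
cols_without = list(pd.read_csv(without_index).columns)
print(cols_with)
print(cols_without)
```

When the first file is read back, the index shows up as a column named `Unnamed: 0`; the second file contains only `Entity` and `Year`.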

**Next: the analysis continues in Power BI**