# GLOBAL POWER PLANT ANALYSIS

## Introduction


This project analyzes the Global Power Plant Database to build a comprehensive understanding of power plant ownership, fuel types, capacity, generation levels, and the growing role of renewable energy sources. Through exploratory analysis and predictive modeling, it aims to describe the past, present, and likely future of global power plants and to give stakeholders insights that support the move toward sustainable energy.

# Project overview

## a) BUSINESS UNDERSTANDING

As energy transitions and sustainability concerns reshape the sector, the Global Power Plant Analysis project examines the worldwide power plant landscape. By investigating ownership patterns, fuel types, and renewable energy contributions, it seeks to provide a clear snapshot of the global power ecosystem. The resulting insights are intended to give stakeholders, policymakers, and investors a deeper understanding of the forces shaping the energy industry and to support informed decision-making.

This Global Power Plant Analysis is intended to benefit the following stakeholders:

- Power generation companies: strategic planning
- Environmental agencies: environmental impact assessment
- Researchers and academia: advancements in energy studies
- Local communities: informed community engagement
- Technology providers: market identification and growth opportunities
- Government and regulatory bodies: informed decision-making
- Investors and financial institutions: risk mitigation and informed investments


### Problem statement

The lack of a comprehensive analysis of the Global Power Plant Database hampers stakeholders' ability to make informed decisions in the evolving energy landscape. A systematic exploration of power plant ownership, fuel types, capacity trends, and renewable energy contributions worldwide is needed. This project addresses that gap, which currently hinders effective decision-making by governments, investors, and energy companies, and aims to provide a clear, data-driven picture of the global energy landscape.

### Objectives

General Objective

- To develop a predictive model, built on the analysis of the Global Power Plant Database, that forecasts the power generation of power plants.

Specific Objectives

- To investigate the distribution of power plants to identify gaps in the market.

## b) DATA UNDERSTANDING 

The Global Power Plant Database is a comprehensive, open source database of power plants around the world. It centralizes power plant data to make it easier to navigate, compare and draw insights for one’s own analysis. The database covers approximately 35,000 power plants from 167 countries and includes thermal plants (e.g. coal, gas, oil, nuclear, biomass, waste, geothermal) and renewables (e.g. hydro, wind, solar). Each power plant is geolocated and entries contain information on plant capacity, generation, ownership, and fuel type.
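
The thermal/renewable split described above can be expressed as a simple lookup. The following is a minimal sketch and not part of the original analysis: the `FUEL_CATEGORY` mapping and the `capacity_share_by_category` helper are hypothetical names, the grouping simply follows the examples in the paragraph above, and the toy rows stand in for the real data. Fuel labels missing from the mapping are reported as `Other`.

```python
import pandas as pd

# Hypothetical mapping that follows the grouping given in the text above.
FUEL_CATEGORY = {
    "Coal": "Thermal", "Gas": "Thermal", "Oil": "Thermal",
    "Nuclear": "Thermal", "Biomass": "Thermal", "Waste": "Thermal",
    "Geothermal": "Thermal",
    "Hydro": "Renewable", "Wind": "Renewable", "Solar": "Renewable",
}


def capacity_share_by_category(df: pd.DataFrame) -> pd.Series:
    """Return each category's share of total capacity (fractions summing to 1)."""
    category = df["primary_fuel"].map(FUEL_CATEGORY).fillna("Other")
    totals = df.groupby(category)["capacity_mw"].sum()
    return totals / totals.sum()


# Toy rows for illustration only.
toy = pd.DataFrame({
    "primary_fuel": ["Hydro", "Solar", "Oil", "Coal"],
    "capacity_mw": [33.0, 10.0, 50.0, 920.0],
})
shares = capacity_share_by_category(toy)
print(shares)
```

Applied to the full dataset, the same helper would take the loaded DataFrame in place of `toy`, assuming it exposes `primary_fuel` and `capacity_mw` columns.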

The dataset contains the following columns with their descriptions:

- `country` (text): 3 character country code corresponding to the ISO 3166-1 alpha-3 specification [4]
- `country_long` (text): longer form of the country designation
- `name` (text): name or title of the power plant, generally in Romanized form
- `gppd_idnr` (text): 10 or 12 character identifier for the power plant
- `capacity_mw` (number): electrical generating capacity in megawatts
- `latitude` (number): geolocation in decimal degrees; WGS84 (EPSG:4326)
- `longitude` (number): geolocation in decimal degrees; WGS84 (EPSG:4326)
- `fuel` (text): energy source used in electricity generation or export (appears as `primary_fuel` in the loaded file)
- `commissioning_year` (number): year of plant operation, weighted by unit-capacity when data is available
- `owner` (text): majority shareholder of the power plant, generally in Romanized form
- `source` (text): entity reporting the data; could be an organization, report, or document, generally in Romanized form
- `url` (text): web document corresponding to the `source` field
- `geolocation_source` (text): attribution for geolocation information
- `year_of_capacity_data` (number): year the capacity information was reported
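- `generation_gwh_<year>` (number): reported electricity generation in gigawatt-hours for the given year (the loaded file has `generation_gwh_2013` through `generation_gwh_2019`)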
- `estimated_generation_gwh` (number): estimated annual electricity generation in gigawatt-hours

The database is available for immediate download and use at http://datasets.wri.org/dataset/globalpowerplantdatabase
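
Before starting the analysis, it may help to confirm that the loaded file actually contains the columns described above. The sketch below is an illustration under assumptions: `DOCUMENTED_COLUMNS` and `missing_columns` are hypothetical names, and the toy frame only mimics the header of the real CSV. It shows how a naming mismatch, such as the documented `fuel` versus the loaded `primary_fuel`, would surface.

```python
import pandas as pd

# Column names taken from the description list above.
DOCUMENTED_COLUMNS = [
    "country", "country_long", "name", "gppd_idnr", "capacity_mw",
    "latitude", "longitude", "fuel", "commissioning_year", "owner",
    "source", "url", "geolocation_source", "year_of_capacity_data",
]


def missing_columns(df: pd.DataFrame, expected: list) -> list:
    """Return the expected column names that are absent from df."""
    return [col for col in expected if col not in df.columns]


# Toy frame standing in for the loaded CSV header (illustration only).
toy = pd.DataFrame(columns=[
    "country", "country_long", "name", "gppd_idnr", "capacity_mw",
    "latitude", "longitude", "primary_fuel", "commissioning_year", "owner",
    "source", "url", "geolocation_source", "year_of_capacity_data",
])
print(missing_columns(toy, DOCUMENTED_COLUMNS))  # ['fuel']
```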

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
df = pd.read_csv('global_power_plant_database.csv')
df



Unnamed: 0,country,country_long,name,gppd_idnr,capacity_mw,latitude,longitude,primary_fuel,other_fuel1,other_fuel2,...,estimated_generation_gwh_2013,estimated_generation_gwh_2014,estimated_generation_gwh_2015,estimated_generation_gwh_2016,estimated_generation_gwh_2017,estimated_generation_note_2013,estimated_generation_note_2014,estimated_generation_note_2015,estimated_generation_note_2016,estimated_generation_note_2017
0,AFG,Afghanistan,Kajaki Hydroelectric Power Plant Afghanistan,GEODB0040538,33.0,32.3220,65.1190,Hydro,,,...,123.77,162.90,97.39,137.76,119.50,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1
1,AFG,Afghanistan,Kandahar DOG,WKS0070144,10.0,31.6700,65.7950,Solar,,,...,18.43,17.48,18.25,17.70,18.29,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE
2,AFG,Afghanistan,Kandahar JOL,WKS0071196,10.0,31.6230,65.7920,Solar,,,...,18.64,17.58,19.10,17.62,18.72,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE
3,AFG,Afghanistan,Mahipar Hydroelectric Power Plant Afghanistan,GEODB0040541,66.0,34.5560,69.4787,Hydro,,,...,225.06,203.55,146.90,230.18,174.91,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1
4,AFG,Afghanistan,Naghlu Dam Hydroelectric Power Plant Afghanistan,GEODB0040534,100.0,34.6410,69.7170,Hydro,,,...,406.16,357.22,270.99,395.38,350.80,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
34931,ZMB,Zambia,Ndola,WRI1022386,50.0,-12.9667,28.6333,Oil,,,...,,,,,183.79,NO-ESTIMATION,NO-ESTIMATION,NO-ESTIMATION,NO-ESTIMATION,CAPACITY-FACTOR-V1
34932,ZMB,Zambia,Nkana,WRI1022384,20.0,-12.8167,28.2000,Oil,,,...,,,,,73.51,NO-ESTIMATION,NO-ESTIMATION,NO-ESTIMATION,NO-ESTIMATION,CAPACITY-FACTOR-V1
34933,ZMB,Zambia,Victoria Falls,WRI1022380,108.0,-17.9167,25.8500,Hydro,,,...,575.78,575.78,548.94,579.90,578.32,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1
34934,ZWE,Zimbabwe,Hwange Coal Power Plant Zimbabwe,GEODB0040404,920.0,-18.3835,26.4700,Coal,,,...,,,,,2785.10,NO-ESTIMATION,NO-ESTIMATION,NO-ESTIMATION,NO-ESTIMATION,CAPACITY-FACTOR-V1


In [3]:
def check_dataframe_basics(df):
    # Step 1: Check the shape of the DataFrame
    print("DataFrame Shape:")
    print(df.shape)
    print("\n")

    # Step 2: Check the first few rows of the DataFrame
    print("First few rows:")
    print(df.head())
    print("\n")

    # Step 3: Check the data types of the columns
    print("Data Types of Columns:")
    print(df.dtypes)
    print("\n")

    # Step 4: Check for missing values
    print("Missing Values in Each Column:")
    print(df.isnull().sum())
    print("\n")

    # Step 5: Get a concise summary of the DataFrame
    print("Concise Summary of DataFrame:")
    df.info()  # info() prints its summary directly and returns None
    print("\n")

    # Step 6: Get basic statistical details
    print("Basic Statistical Details:")
    print(df.describe(include='all'))  # Describe all columns including non-numeric
    print("\n")

    # Step 7: Check for duplicates
    print("Number of Duplicate Rows:")
    print(df.duplicated().sum())
    print("\n")

In [4]:
check_dataframe_basics(df)

DataFrame Shape:
(34936, 36)


First few rows:
  country country_long                                              name  \
0     AFG  Afghanistan      Kajaki Hydroelectric Power Plant Afghanistan   
1     AFG  Afghanistan                                      Kandahar DOG   
2     AFG  Afghanistan                                      Kandahar JOL   
3     AFG  Afghanistan     Mahipar Hydroelectric Power Plant Afghanistan   
4     AFG  Afghanistan  Naghlu Dam Hydroelectric Power Plant Afghanistan   

      gppd_idnr  capacity_mw  latitude  longitude primary_fuel other_fuel1  \
0  GEODB0040538         33.0    32.322    65.1190        Hydro         NaN   
1    WKS0070144         10.0    31.670    65.7950        Solar         NaN   
2    WKS0071196         10.0    31.623    65.7920        Solar         NaN   
3  GEODB0040541         66.0    34.556    69.4787        Hydro         NaN   
4  GEODB0040534        100.0    34.641    69.7170        Hydro         NaN   

  other_fuel2  ... estimate

       country              country_long           name     gppd_idnr  \
count    34936                     34936          34936         34936   
unique     167                       167          34528         34936   
top        USA  United States of America  Santo Antônio  GEODB0040538   
freq      9833                      9833              6             1   
mean       NaN                       NaN            NaN           NaN   
std        NaN                       NaN            NaN           NaN   
min        NaN                       NaN            NaN           NaN   
25%        NaN                       NaN            NaN           NaN   
50%        NaN                       NaN            NaN           NaN   
75%        NaN                       NaN            NaN           NaN   
max        NaN                       NaN            NaN           NaN   

         capacity_mw      latitude     longitude primary_fuel other_fuel1  \
count   34936.000000  34936.000000  34936.0000

In [5]:
# df['generation_gwh_2019'] = df['generation_gwh_2019'].interpolate()
# df['generation_gwh_2019']

In [6]:
df.isna().sum()

country                               0
country_long                          0
name                                  0
gppd_idnr                             0
capacity_mw                           0
latitude                              0
longitude                             0
primary_fuel                          0
other_fuel1                       32992
other_fuel2                       34660
other_fuel3                       34844
commissioning_year                17489
owner                             14068
source                               15
url                                  18
geolocation_source                  419
wepp_id                           18702
year_of_capacity_data             20049
generation_gwh_2013               28519
generation_gwh_2014               27710
generation_gwh_2015               26733
generation_gwh_2016               25792
generation_gwh_2017               25436
generation_gwh_2018               25299
generation_gwh_2019               25277
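
Raw counts are easier to compare as a share of all rows; for example, 32,992 missing `other_fuel1` values out of 34,936 rows is roughly 94%. The following sketch shows one way to tabulate this. The `missing_report` helper is a hypothetical name and the toy frame is for illustration only.

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize missing values per column as a count and a percentage."""
    report = pd.DataFrame({
        "missing": df.isna().sum(),
        "percent": (df.isna().mean() * 100).round(1),
    })
    # Keep only columns with gaps, most incomplete first.
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


# Toy frame for illustration only.
toy = pd.DataFrame({
    "capacity_mw": [33.0, 10.0, 10.0, 66.0],
    "owner": [None, "A", None, None],
    "commissioning_year": [None, 2015.0, 2016.0, 2017.0],
})
report = missing_report(toy)
print(report)
```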


In [7]:
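# Interpolate numeric columns only; text columns are left unchanged
num_cols = df.select_dtypes(include='number').columns
df[num_cols] = df[num_cols].interpolate()
df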

Unnamed: 0,country,country_long,name,gppd_idnr,capacity_mw,latitude,longitude,primary_fuel,other_fuel1,other_fuel2,...,estimated_generation_gwh_2013,estimated_generation_gwh_2014,estimated_generation_gwh_2015,estimated_generation_gwh_2016,estimated_generation_gwh_2017,estimated_generation_note_2013,estimated_generation_note_2014,estimated_generation_note_2015,estimated_generation_note_2016,estimated_generation_note_2017
0,AFG,Afghanistan,Kajaki Hydroelectric Power Plant Afghanistan,GEODB0040538,33.0,32.3220,65.1190,Hydro,,,...,123.770,162.900,97.390000,137.760,119.50,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1
1,AFG,Afghanistan,Kandahar DOG,WKS0070144,10.0,31.6700,65.7950,Solar,,,...,18.430,17.480,18.250000,17.700,18.29,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE
2,AFG,Afghanistan,Kandahar JOL,WKS0071196,10.0,31.6230,65.7920,Solar,,,...,18.640,17.580,19.100000,17.620,18.72,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE
3,AFG,Afghanistan,Mahipar Hydroelectric Power Plant Afghanistan,GEODB0040541,66.0,34.5560,69.4787,Hydro,,,...,225.060,203.550,146.900000,230.180,174.91,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1
4,AFG,Afghanistan,Naghlu Dam Hydroelectric Power Plant Afghanistan,GEODB0040534,100.0,34.6410,69.7170,Hydro,,,...,406.160,357.220,270.990000,395.380,350.80,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
34931,ZMB,Zambia,Ndola,WRI1022386,50.0,-12.9667,28.6333,Oil,,,...,392.680,392.680,374.786667,395.450,183.79,NO-ESTIMATION,NO-ESTIMATION,NO-ESTIMATION,NO-ESTIMATION,CAPACITY-FACTOR-V1
34932,ZMB,Zambia,Nkana,WRI1022384,20.0,-12.8167,28.2000,Oil,,,...,484.230,484.230,461.863333,487.675,73.51,NO-ESTIMATION,NO-ESTIMATION,NO-ESTIMATION,NO-ESTIMATION,CAPACITY-FACTOR-V1
34933,ZMB,Zambia,Victoria Falls,WRI1022380,108.0,-17.9167,25.8500,Hydro,,,...,575.780,575.780,548.940000,579.900,578.32,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1
34934,ZWE,Zimbabwe,Hwange Coal Power Plant Zimbabwe,GEODB0040404,920.0,-18.3835,26.4700,Coal,,,...,2287.765,2287.765,2146.980000,2004.275,2785.10,NO-ESTIMATION,NO-ESTIMATION,NO-ESTIMATION,NO-ESTIMATION,CAPACITY-FACTOR-V1


In [8]:
df.isna().sum()

country                               0
country_long                          0
name                                  0
gppd_idnr                             0
capacity_mw                           0
latitude                              0
longitude                             0
primary_fuel                          0
other_fuel1                       32992
other_fuel2                       34660
other_fuel3                       34844
commissioning_year                    9
owner                             14068
source                               15
url                                  18
geolocation_source                  419
wepp_id                           18702
year_of_capacity_data                 0
generation_gwh_2013                 337
generation_gwh_2014                 337
generation_gwh_2015                 337
generation_gwh_2016                 337
generation_gwh_2017                 337
generation_gwh_2018                 337
generation_gwh_2019               24710
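
The gaps that remain are consistent with the default behaviour of `interpolate()`: it fills forward from the first valid value in a column, so anything before that first value stays missing. This is one plausible reason `generation_gwh_2019` still shows 24,710 missing values. The toy series below is an assumption used only to illustrate the behaviour. Note also that interpolation runs down the rows, so a filled value is derived from neighbouring rows, which here belong to different plants.

```python
import numpy as np
import pandas as pd

# Toy column: a leading gap, an interior gap, and a trailing gap.
s = pd.Series([np.nan, np.nan, 10.0, np.nan, 30.0, np.nan])

# Default settings: linear method, filling in the forward direction.
filled = s.interpolate()
print(filled.tolist())
```

The interior gap becomes 20.0 and the trailing gap repeats 30.0, while the two leading values remain `NaN`.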


In [9]:
df = df.ffill()
df

Unnamed: 0,country,country_long,name,gppd_idnr,capacity_mw,latitude,longitude,primary_fuel,other_fuel1,other_fuel2,...,estimated_generation_gwh_2013,estimated_generation_gwh_2014,estimated_generation_gwh_2015,estimated_generation_gwh_2016,estimated_generation_gwh_2017,estimated_generation_note_2013,estimated_generation_note_2014,estimated_generation_note_2015,estimated_generation_note_2016,estimated_generation_note_2017
0,AFG,Afghanistan,Kajaki Hydroelectric Power Plant Afghanistan,GEODB0040538,33.0,32.3220,65.1190,Hydro,,,...,123.770,162.900,97.390000,137.760,119.50,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1
1,AFG,Afghanistan,Kandahar DOG,WKS0070144,10.0,31.6700,65.7950,Solar,,,...,18.430,17.480,18.250000,17.700,18.29,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE
2,AFG,Afghanistan,Kandahar JOL,WKS0071196,10.0,31.6230,65.7920,Solar,,,...,18.640,17.580,19.100000,17.620,18.72,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE
3,AFG,Afghanistan,Mahipar Hydroelectric Power Plant Afghanistan,GEODB0040541,66.0,34.5560,69.4787,Hydro,,,...,225.060,203.550,146.900000,230.180,174.91,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1
4,AFG,Afghanistan,Naghlu Dam Hydroelectric Power Plant Afghanistan,GEODB0040534,100.0,34.6410,69.7170,Hydro,,,...,406.160,357.220,270.990000,395.380,350.80,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
34931,ZMB,Zambia,Ndola,WRI1022386,50.0,-12.9667,28.6333,Oil,Oil,Solar,...,392.680,392.680,374.786667,395.450,183.79,NO-ESTIMATION,NO-ESTIMATION,NO-ESTIMATION,NO-ESTIMATION,CAPACITY-FACTOR-V1
34932,ZMB,Zambia,Nkana,WRI1022384,20.0,-12.8167,28.2000,Oil,Oil,Solar,...,484.230,484.230,461.863333,487.675,73.51,NO-ESTIMATION,NO-ESTIMATION,NO-ESTIMATION,NO-ESTIMATION,CAPACITY-FACTOR-V1
34933,ZMB,Zambia,Victoria Falls,WRI1022380,108.0,-17.9167,25.8500,Hydro,Oil,Solar,...,575.780,575.780,548.940000,579.900,578.32,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1
34934,ZWE,Zimbabwe,Hwange Coal Power Plant Zimbabwe,GEODB0040404,920.0,-18.3835,26.4700,Coal,Oil,Solar,...,2287.765,2287.765,2146.980000,2004.275,2785.10,NO-ESTIMATION,NO-ESTIMATION,NO-ESTIMATION,NO-ESTIMATION,CAPACITY-FACTOR-V1


In [10]:
df.isna().sum()

country                               0
country_long                          0
name                                  0
gppd_idnr                             0
capacity_mw                           0
latitude                              0
longitude                             0
primary_fuel                          0
other_fuel1                          19
other_fuel2                         141
other_fuel3                       24887
commissioning_year                    9
owner                                19
source                                0
url                                   0
geolocation_source                    0
wepp_id                               0
year_of_capacity_data                 0
generation_gwh_2013                 337
generation_gwh_2014                 337
generation_gwh_2015                 337
generation_gwh_2016                 337
generation_gwh_2017                 337
generation_gwh_2018                 337
generation_gwh_2019               24710
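
Forward fill copies the value from the previous row, so a plant with no secondary fuel inherits one from the plant listed before it (in the output above, Victoria Falls, a Hydro plant, ends up with `other_fuel1` set to Oil). Rows before the first valid value stay missing, which would explain the counts that remain. The sketch below illustrates both effects on a toy frame; the names and values are made up for illustration.

```python
import pandas as pd

# Toy frame for illustration only; the values are made up.
toy = pd.DataFrame({
    "name": ["Plant A", "Plant B", "Plant C", "Plant D"],
    "primary_fuel": ["Hydro", "Oil", "Hydro", "Coal"],
    "other_fuel1": [None, "Gas", None, None],
})

# Each missing value takes the last valid value above it, if any.
filled = toy.ffill()
print(filled)
```

Here Plant C and Plant D both receive `Gas` from Plant B, and Plant A remains empty because nothing precedes it.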
