# Power plants `14 points`

Source: https://datasets.wri.org/dataset/globalpowerplantdatabase

Description: A comprehensive, global, open source database of power plants

**Topics:**

* Filtering
* Aggregation
* Making maps

In [39]:
import pandas as pd
import numpy as np
pd.set_option("display.max_columns", None)
pd.set_option("display.float_format", '{:,.2f}'.format)

## Basic analysis `2 points`

In [6]:
df1 = pd.read_csv("global_power_plant_database.csv")
df1.head()

Unnamed: 0,country,country_long,name,gppd_idnr,capacity_mw,latitude,longitude,primary_fuel,other_fuel1,other_fuel2,other_fuel3,commissioning_year,owner,source,url,geolocation_source,wepp_id,year_of_capacity_data,generation_gwh_2013,generation_gwh_2014,generation_gwh_2015,generation_gwh_2016,generation_gwh_2017,generation_gwh_2018,generation_gwh_2019,generation_data_source,estimated_generation_gwh_2013,estimated_generation_gwh_2014,estimated_generation_gwh_2015,estimated_generation_gwh_2016,estimated_generation_gwh_2017,estimated_generation_note_2013,estimated_generation_note_2014,estimated_generation_note_2015,estimated_generation_note_2016,estimated_generation_note_2017
0,AFG,Afghanistan,Kajaki Hydroelectric Power Plant Afghanistan,GEODB0040538,33.0,32.32,65.12,Hydro,,,,,,GEODB,http://globalenergyobservatory.org,GEODB,1009793.0,2017.0,,,,,,,,,123.77,162.9,97.39,137.76,119.5,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1
1,AFG,Afghanistan,Kandahar DOG,WKS0070144,10.0,31.67,65.8,Solar,,,,,,Wiki-Solar,https://www.wiki-solar.org,Wiki-Solar,,,,,,,,,,,18.43,17.48,18.25,17.7,18.29,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE
2,AFG,Afghanistan,Kandahar JOL,WKS0071196,10.0,31.62,65.79,Solar,,,,,,Wiki-Solar,https://www.wiki-solar.org,Wiki-Solar,,,,,,,,,,,18.64,17.58,19.1,17.62,18.72,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE
3,AFG,Afghanistan,Mahipar Hydroelectric Power Plant Afghanistan,GEODB0040541,66.0,34.56,69.48,Hydro,,,,,,GEODB,http://globalenergyobservatory.org,GEODB,1009795.0,2017.0,,,,,,,,,225.06,203.55,146.9,230.18,174.91,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1
4,AFG,Afghanistan,Naghlu Dam Hydroelectric Power Plant Afghanistan,GEODB0040534,100.0,34.64,69.72,Hydro,,,,,,GEODB,http://globalenergyobservatory.org,GEODB,1009797.0,2017.0,,,,,,,,,406.16,357.22,270.99,395.38,350.8,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1


### According to this dataset, what ten countries have the most power plants?

In [16]:
df1.country_long.value_counts(normalize=False).head(10)

United States of America    9833
China                       4235
United Kingdom              2751
Brazil                      2360
France                      2155
India                       1589
Germany                     1309
Canada                      1159
Spain                        829
Russia                       545
Name: country_long, dtype: int64

### According to this dataset, what ten countries are producing (or can produce) the most energy?

In [23]:
df1.groupby('country_long').capacity_mw.sum().sort_values(ascending=False).head(10)

country_long
China                      1,415,067.38
United States of America   1,204,638.05
India                        316,088.55
Russia                       228,220.05
Japan                        215,365.85
Brazil                       147,589.27
Canada                       143,578.70
Germany                      112,040.37
France                       110,615.93
South Korea                   99,472.68
Name: capacity_mw, dtype: float64

### What is the most common kind of power plant?

In [29]:
df1.primary_fuel.value_counts(normalize=True).head(10)*100

Solar        30.53
Hydro        20.48
Wind         15.30
Gas          11.44
Coal          6.67
Oil           6.64
Biomass       4.09
Waste         3.06
Nuclear       0.56
Geothermal    0.54
Name: primary_fuel, dtype: float64

### What kinds of power plants produce the most energy, in total?

In [51]:
df1.groupby('primary_fuel').capacity_mw.sum().sort_values(ascending=False)

primary_fuel
Coal             1,965,541.00
Gas              1,493,050.60
Hydro            1,053,159.62
Nuclear            407,911.76
Wind               263,053.73
Oil                261,878.71
Solar              188,312.32
Biomass             34,281.30
Waste               14,748.71
Geothermal          12,687.75
Cogeneration         4,048.00
Other                3,612.86
Petcoke              2,424.58
Storage              1,712.30
Wave and Tidal         552.20
Name: capacity_mw, dtype: float64
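The two answers above differ (Solar is the most common plant type, Coal has the most capacity) because plant counts and total capacity measure different things. A minimal sketch of putting both side by side with named aggregation follows. The `toy` frame and its numbers are made up for illustration and are not taken from the database; only the column names `primary_fuel` and `capacity_mw` come from the dataset.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from the real database
toy = pd.DataFrame({
    "primary_fuel": ["Solar", "Solar", "Solar", "Coal", "Hydro", "Hydro"],
    "capacity_mw": [10.0, 20.0, 30.0, 600.0, 100.0, 200.0],
})

# Named aggregation: one row per fuel with plant count, total and mean capacity
fuel_summary = (
    toy.groupby("primary_fuel")
       .agg(plants=("capacity_mw", "size"),
            total_mw=("capacity_mw", "sum"),
            mean_mw=("capacity_mw", "mean"))
       .sort_values("total_mw", ascending=False)
)
print(fuel_summary)
```

Swapping `toy` for `df1` should give the same kind of table for the real data, assuming the columns are named as in the `head()` output above.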

## Green energy `2 points`

### Create a new column for each power plant, `is_green`

Include... whatever you feel is actually green. Maybe Hydro, nuclear, wind, solar, geothermal, wave/tidal. `0` means it isn't green, `1` means it is green.

In [99]:
# Fuels treated as green in this analysis. Waste and Biomass are included;
# Nuclear is left out.
green_fuels = ['Hydro', 'Solar', 'Wind', 'Geothermal', 'Waste', 'Biomass', 'Wave and Tidal']

# 1 if the plant's primary fuel is in the list, 0 otherwise
df1["is_green"] = df1['primary_fuel'].isin(green_fuels).astype(int)
df1.head()

Unnamed: 0,country,country_long,name,gppd_idnr,capacity_mw,latitude,longitude,primary_fuel,other_fuel1,other_fuel2,other_fuel3,commissioning_year,owner,source,url,geolocation_source,wepp_id,year_of_capacity_data,generation_gwh_2013,generation_gwh_2014,generation_gwh_2015,generation_gwh_2016,generation_gwh_2017,generation_gwh_2018,generation_gwh_2019,generation_data_source,estimated_generation_gwh_2013,estimated_generation_gwh_2014,estimated_generation_gwh_2015,estimated_generation_gwh_2016,estimated_generation_gwh_2017,estimated_generation_note_2013,estimated_generation_note_2014,estimated_generation_note_2015,estimated_generation_note_2016,estimated_generation_note_2017,is_green
0,AFG,Afghanistan,Kajaki Hydroelectric Power Plant Afghanistan,GEODB0040538,33.0,32.32,65.12,Hydro,0,0,0,0.0,0,GEODB,http://globalenergyobservatory.org,GEODB,1009793,2017.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,123.77,162.9,97.39,137.76,119.5,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1,1
1,AFG,Afghanistan,Kandahar DOG,WKS0070144,10.0,31.67,65.8,Solar,0,0,0,0.0,0,Wiki-Solar,https://www.wiki-solar.org,Wiki-Solar,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,18.43,17.48,18.25,17.7,18.29,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE,1
2,AFG,Afghanistan,Kandahar JOL,WKS0071196,10.0,31.62,65.79,Solar,0,0,0,0.0,0,Wiki-Solar,https://www.wiki-solar.org,Wiki-Solar,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,18.64,17.58,19.1,17.62,18.72,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE,SOLAR-V1-NO-AGE,1
3,AFG,Afghanistan,Mahipar Hydroelectric Power Plant Afghanistan,GEODB0040541,66.0,34.56,69.48,Hydro,0,0,0,0.0,0,GEODB,http://globalenergyobservatory.org,GEODB,1009795,2017.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,225.06,203.55,146.9,230.18,174.91,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1,1
4,AFG,Afghanistan,Naghlu Dam Hydroelectric Power Plant Afghanistan,GEODB0040534,100.0,34.64,69.72,Hydro,0,0,0,0.0,0,GEODB,http://globalenergyobservatory.org,GEODB,1009797,2017.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,406.16,357.22,270.99,395.38,350.8,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1,HYDRO-V1,1


### What percent of the world's energy is green?

In [61]:
(df1[df1['is_green'] ==1].capacity_mw.sum()/df1.capacity_mw.sum()).round(2)*100

27.0

### What countries use the highest percentage of green energy? `2 points`

This one will probably be tricky, and might take a few steps.

In [90]:
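# Note: this replaces every NaN in every column with 0, not just in capacity_mw
df1 = df1.fillna(0)

# Green capacity and total capacity per country
green_capacity = df1[df1['is_green'] == 1].groupby('country_long').capacity_mw.sum()
total_capacity = df1.groupby('country_long').capacity_mw.sum()

# Parenthesize the whole ratio so .round(2) applies to the percentage,
# not just to the denominator
countries_by_green = (green_capacity / total_capacity * 100).round(2)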

In [96]:
countries_by_green.sort_values(ascending = False).head(20)

country_long
Lesotho                            100.00
Bhutan                             100.00
Ethiopia                           100.00
Iceland                            100.00
Swaziland                          100.00
Suriname                           100.00
Saint Lucia                        100.00
Paraguay                           100.00
Palestine                          100.00
Burundi                            100.00
Mozambique                         100.00
Mali                               100.00
Nepal                              100.00
Democratic Republic of the Congo    98.75
Malawi                              95.80
Norway                              95.53
Albania                             93.59
Uganda                              93.53
Tajikistan                          88.48
Afghanistan                         86.03
Name: capacity_mw, dtype: float64
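When one grouped sum is divided by another, pandas aligns the two results on the country index, so a country with no green plants is missing from the numerator and comes out as `NaN` rather than 0. A minimal sketch of an alternative that does everything in one `groupby` and yields 0 for such countries is below. The `toy` frame, its country labels, and the helper column `green_mw` are hypothetical and only there for illustration.

```python
import pandas as pd

# Made-up rows for illustration only; country "C" has no green plants
toy = pd.DataFrame({
    "country_long": ["A", "A", "B", "B", "C"],
    "capacity_mw": [100.0, 300.0, 50.0, 50.0, 80.0],
    "is_green": [1, 0, 1, 1, 0],
})

# Capacity that counts as green: capacity_mw where is_green == 1, else 0
toy["green_mw"] = toy["capacity_mw"] * toy["is_green"]

# One groupby gives both sums on the same index, so no country drops out
per_country = toy.groupby("country_long")[["green_mw", "capacity_mw"]].sum()
green_share = (per_country["green_mw"] / per_country["capacity_mw"] * 100).round(2)
green_share = green_share.sort_values(ascending=False)
print(green_share)
```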

### Do you feel like you might need to add some more requirements to be considered a country with 100% green energy? If so, filter and re-calculate.

In [None]:
#No, I am happy with my results.
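
If a stricter definition were ever wanted, one option would be to require a minimum number of plants per country before ranking, so that a country with a single small green plant does not count as 100% green. A minimal sketch under that assumption follows. The `toy` data, the `MIN_PLANTS` cutoff, and the column names `plants`, `total_mw`, `green_mw` and `green_pct` are all hypothetical choices, not part of the assignment.

```python
import pandas as pd

# Made-up rows for illustration only; country "A" has one tiny green plant
toy = pd.DataFrame({
    "country_long": ["A", "B", "B", "B", "C", "C", "C"],
    "capacity_mw": [5.0, 100.0, 200.0, 100.0, 50.0, 50.0, 100.0],
    "is_green": [1, 1, 1, 0, 1, 1, 1],
})
toy["green_mw"] = toy["capacity_mw"] * toy["is_green"]

# Per-country plant count, total capacity and green capacity
stats = toy.groupby("country_long").agg(
    plants=("capacity_mw", "size"),
    total_mw=("capacity_mw", "sum"),
    green_mw=("green_mw", "sum"),
)
stats["green_pct"] = (stats["green_mw"] / stats["total_mw"] * 100).round(2)

MIN_PLANTS = 3  # hypothetical cutoff; pick whatever suits the analysis
robust = stats[stats["plants"] >= MIN_PLANTS].sort_values("green_pct", ascending=False)
print(robust)
```

A cutoff on `total_mw` instead of `plants` would work the same way; which one is more meaningful is a judgment call.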

## Making maps

### Point map `2 points`

Using the tool of your choice, make a point map of power plants in **one specific country**.

In [118]:
df2 = df1[df1['country_long'] =='Angola']
df3 = df2[['name','capacity_mw','latitude','longitude','primary_fuel']]
df3.to_csv('Angolan_Power_Plants.csv')

https://www.datawrapper.de/_/dK9QH/

### Bubble map `2 points`

Make the point map above more exciting by coloring according to fuel type and sizing bubbles according to power production capacity.

https://www.datawrapper.de/_/Xa6w2/

### Hex bin map `4 points`

In the country of your choice, make a hex bin map of hydroelectric power plants (or if there aren't any: something else!).
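
One possible approach, sketched here with matplotlib's `hexbin` rather than the Datawrapper workflow used for the maps above, is to bin plant coordinates into hexagons and color each hexagon by the number of plants inside it. The coordinates below are synthetic random points, not real plant locations, and the `gridsize` value is an arbitrary assumption. Longitude and latitude are plotted directly as x and y, which is only an unprojected approximation of a map.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic coordinates for illustration only; not real plant locations
rng = np.random.default_rng(0)
plants = pd.DataFrame({
    "longitude": rng.uniform(12.0, 24.0, size=200),
    "latitude": rng.uniform(-18.0, -5.0, size=200),
})

fig, ax = plt.subplots(figsize=(6, 6))
# Each hexagon is colored by how many plants fall inside it;
# mincnt=1 leaves empty hexagons blank
hb = ax.hexbin(plants["longitude"], plants["latitude"],
               gridsize=15, mincnt=1, cmap="viridis")
fig.colorbar(hb, ax=ax, label="Number of plants")
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.set_title("Hex bin map (synthetic data)")
```

With the real data, `plants` could be replaced by a filtered frame such as the hydro plants of one country, assuming the `latitude`, `longitude` and `primary_fuel` columns shown earlier. Calling `fig.savefig(...)` afterwards writes the figure to disk.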