# CPA01 - Data Analysis of Global Power Plants
This notebook uses the Global Power Plant Database, which contains data on power plants around the world. The database covers about 35,000 power plants in 167 countries.
* [Global Power Plant Database](https://datasets.wri.org/dataset/globalpowerplantdatabase) (Source: World Resources Institute)

Here are some questions I'm trying to answer using the dataset:
* How many power plants does the US have?
* Which countries have the most power plants? What kinds of primary fuels do they use?
* Which countries have the highest total electrical generating capacity?
* What are the most common primary fuels around the world?
* Which power plants are in the high latitudes?

##### Note: The answers (analysis) will be marked in bold after each plot/chart/pivot table.

&emsp;

In [None]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

## Read the Data Into a Data Frame

In [None]:
# set low_memory to False to deal with DtypeWarning
df = pd.read_csv("data/global_power_plant_database.csv", low_memory = False)
df
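
An alternative to `low_memory = False` is to tell pandas the type of any mixed-type column up front with the `dtype` argument. The sketch below is only an illustration: the CSV text and the column name `other_fuel` are made up, and which columns of the real file trigger the `DtypeWarning` would need to be checked against the warning message itself.

```python
import io

import pandas as pd

# Made-up CSV text standing in for a file with a mixed-type column.
# "other_fuel" is a hypothetical column name used only for illustration.
csv_text = "plant_id,other_fuel\n1,Oil\n2,\n3,7\n"

# Reading the column as str keeps "Oil" and "7" as text instead of
# letting pandas guess a type for each chunk of the file.
frame = pd.read_csv(io.StringIO(csv_text), dtype={"other_fuel": str})
print(frame.dtypes)
```

Declaring the type explicitly makes the result independent of how the file happens to be chunked, at the cost of having to know the problem columns in advance.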

## Data Exploration

#### Rough Overview

In [None]:
df.describe()

#### Columns in the Dataset

In [None]:
df.columns

##### Above are the variable names (column names) of the dataset.

#### Rows in the Dataset

In [None]:
len(df.index)

##### The dataset includes 34,936 power plants worldwide.

#### Countries Included in the Dataset

List of countries:

In [None]:
country_long = df["country_long"].unique()
country_long

##### Above are all countries included in the dataset.

In [None]:
len(country_long)

##### There are 167 countries included in the dataset.

#### Types of Primary Fuels

In [None]:
primary_fuel = df["primary_fuel"].unique()
primary_fuel

##### Above are the types of primary fuels of power plants around the world.

## Answers to the Questions
* How many power plants does the US have?

In [None]:
df_usa = df[df["country"] == "USA"]
df_usa

##### Above is the subset of the data containing only power plants in the US.

In [None]:
len(df_usa.index)

##### There are 9833 power plants in the US.

* Which countries have the most power plants? What kinds of primary fuels do they use?

In [None]:
country_pp = df.groupby("country_long").size()
country_pp.sort_values().plot.barh(figsize = (15,35), color = "steelblue")
plt.grid()
plt.title("Number of Power Plants by Country")
plt.xlabel("Number")
plt.ylabel("Country")

In [None]:
country_pp.sort_values(ascending = False).head(5)

##### Above are the 5 countries with the most power plants. The US has 9,833, China has 4,235, the UK has 2,751, Brazil has 2,360, and France has 2,155 power plants.

In [None]:
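# keep only the rows for the top 5 countries found above
top5_countries = ["United States of America", "China", "United Kingdom", "Brazil", "France"]
df_top5 = df[df["country_long"].isin(top5_countries)].copy()
# count plants per fuel and country; fill_value = 0 replaces the NaNs
# that appear where a country has no plant for a given fuel
pivot = pd.pivot_table(df_top5, values = ["gppd_idnr"], index = ["primary_fuel"],
                       columns = ["country_long"], aggfunc = "count", fill_value = 0)
pivot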

In [None]:
pivot.plot.bar(figsize = (15,10), stacked = True)
plt.title("Primary Fuels")
# in a vertical bar chart the fuels are on the x-axis and the counts on the y-axis
plt.xlabel("Kinds of Primary Fuels")
plt.ylabel("Number of Power Plants Using that Primary Fuel")
plt.legend(["Brazil", "China", "France", "United Kingdom", "United States of America"])

##### Above are the pivot table and the stacked bar chart showing the primary fuels used for electricity generation in the top 5 countries. Solar, gas, hydro, wind, and oil are among the most used fuels in these countries.
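
The country list in the pivot step is typed by hand. As a sketch of how it could be derived from the data instead, the example below picks the countries with the most rows and cross-tabulates fuel against country with `pd.crosstab`. The frame `toy` and the helper `fuel_counts_for_top_countries` are hypothetical, and the rows are made up purely for illustration.

```python
import pandas as pd

# Toy stand-in for the notebook's df; these rows are invented for illustration.
toy = pd.DataFrame({
    "country_long": ["A", "A", "A", "B", "B", "C"],
    "primary_fuel": ["Solar", "Hydro", "Solar", "Gas", "Gas", "Wind"],
})

def fuel_counts_for_top_countries(frame, n=5):
    """Count plants per primary fuel for the n countries with the most plants."""
    # value_counts sorts countries by number of rows, largest first
    top = frame["country_long"].value_counts().head(n).index
    subset = frame[frame["country_long"].isin(top)]
    # crosstab fills missing fuel/country combinations with 0
    return pd.crosstab(subset["primary_fuel"], subset["country_long"])

fuel_table = fuel_counts_for_top_countries(toy, n=2)
print(fuel_table)
```

With the real data, `fuel_counts_for_top_countries(df)` should give the same counts as the pivot table, without the extra `gppd_idnr` column level.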

* Which countries have the highest total electrical generating capacity?

In [None]:
# sum the capacity (in megawatts) of all plants in each country
country_ca = df.groupby("country_long")["capacity_mw"].sum()
country_ca.sort_values().plot.barh(figsize = (15,35), color = "teal")
plt.grid()
plt.title("Total Electrical Generating Capacity by Country")
plt.xlabel("Capacity (MW)")
plt.ylabel("Country")

In [None]:
country_ca.sort_values(ascending = False).head(5)

##### Above are the 5 countries with the highest total electrical generating capacity. China and the US still top the chart. Although India, Russia, and Japan are not among the 5 countries with the most power plants, they have fairly high total generating capacities.
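
A country can rank high on capacity with few plants if its plants are large. One way to see both measures side by side is a grouped summary with named aggregation, sketched below. The frame `toy` is a made-up stand-in for `df`, and the output column names (`plants`, `total_capacity_mw`, `mean_capacity_mw`) are chosen here for illustration.

```python
import pandas as pd

# Toy stand-in for the notebook's df; these rows are invented for illustration.
toy = pd.DataFrame({
    "country_long": ["A", "A", "A", "B", "C", "C"],
    "capacity_mw": [10.0, 20.0, 5.0, 500.0, 100.0, 150.0],
})

# One row per country: number of plants (non-missing capacities) and total capacity
summary = toy.groupby("country_long").agg(
    plants=("capacity_mw", "count"),
    total_capacity_mw=("capacity_mw", "sum"),
)
summary["mean_capacity_mw"] = summary["total_capacity_mw"] / summary["plants"]

# nlargest returns the rows with the highest totals, already sorted
top_by_capacity = summary.nlargest(2, "total_capacity_mw")
print(top_by_capacity)
```

In this toy data, country A has the most plants but the lowest total capacity, which is the same kind of gap the two rankings above show.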

* What are the most common primary fuels around the world?

In [None]:
fuels = df.groupby("primary_fuel").size()
fuels.plot.bar(figsize = (10,5), color = "palevioletred")
plt.title("Primary Fuels")
plt.xlabel("")
plt.ylabel("Number of Power Plants")

In [None]:
fuels.sort_values(ascending = False)

##### According to the results, the 2 most common primary fuels are solar and hydro, which are used by 10,665 and 7,156 power plants around the world, respectively.
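
Raw counts can be easier to compare as shares of the total. A minimal sketch, assuming a made-up frame `toy` in place of `df`, uses `value_counts(normalize = True)` to get each fuel's percentage of all plants:

```python
import pandas as pd

# Toy stand-in for the notebook's df; these rows are invented for illustration.
toy = pd.DataFrame({
    "primary_fuel": ["Solar", "Solar", "Hydro", "Wind"],
})

# normalize=True returns fractions that sum to 1; scale to percent and round
fuel_share = toy["primary_fuel"].value_counts(normalize=True).mul(100).round(1)
print(fuel_share)
```

The result is sorted from most to least common, so the first entries are the dominant fuels.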

* Which power plants are in the high latitudes?

In [None]:
lat = df[["country_long", "name", "latitude"]]
lat.sort_values(ascending = False, by = "latitude", key = abs).head(10)

##### Above are the 10 power plants at the highest absolute latitudes, that is, farthest from the equator in either hemisphere. They are all in Antarctica, Norway, and the US.
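
Taking the top 10 answers "which plants are farthest from the equator", but it does not say how many plants count as high-latitude. A sketch of a threshold-based filter follows. The cutoff of 60 degrees is an assumption made here for illustration, not a definition taken from the dataset, and `toy` is a made-up stand-in for `df`.

```python
import pandas as pd

# Toy stand-in for the notebook's df; these rows are invented for illustration.
toy = pd.DataFrame({
    "name": ["P1", "P2", "P3", "P4"],
    "latitude": [70.5, -77.8, 35.0, -12.3],
})

# Assumed cutoff for "high latitude"; change it to suit the analysis.
HIGH_LATITUDE_DEG = 60.0

# abs() treats northern and southern latitudes the same way
high = toy[toy["latitude"].abs() >= HIGH_LATITUDE_DEG]
print(high)
```

Applied to the real data, `len(high)` would give the number of plants beyond the chosen cutoff, and `high["country_long"].value_counts()` would show which countries they are in.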