# Pandas Extension - Wind Power By Country

In this notebook, we will grab a table from Wikipedia and store the data in a `pandas` dataframe.

We will then use the dataframe to derive some additional columns, make some projections and plot some graphs. The plotting uses the `Matplotlib` library, which we will look at in detail later in the course.

Run the code in the code cells in order. There are 4 tasks (highlighted in green) for you to do.

## Grab Some Data from the Web
The next few code cells grab a page from the web (Wikipedia) and store the contents of the particular table we want to analyze in a pandas dataframe.
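
As a minimal offline sketch of what `pd.read_html()` returns, the snippet below parses a small made-up HTML string instead of a live page. The table contents and the names `html`, `tables` and `demo` are assumptions for illustration only, and the call needs an HTML parser such as `lxml` to be installed.

```python
from io import StringIO

import pandas as pd

# A tiny, made-up HTML document with two tables (not the real Wikipedia page).
html = """
<table>
  <tr><th>Country</th><th>Gen (TWh)</th></tr>
  <tr><td>A</td><td>10</td></tr>
  <tr><td>B</td><td>5</td></tr>
</table>
<table>
  <tr><th>Year</th><th>Note</th></tr>
  <tr><td>2024</td><td>x</td></tr>
</table>
"""

# read_html() returns a list with one dataframe per <table> element it finds.
tables = pd.read_html(StringIO(html))
print(len(tables))

# The match argument keeps only tables whose text matches the given pattern,
# which can be less fragile than relying on a fixed position in the list.
demo = pd.read_html(StringIO(html), match="Gen")[0]
print(demo)
```

Selecting a table by its text rather than by its position is one option if the page layout changes; the notebook itself selects by position.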

In [None]:
# Import the requests library for making http requests.
import requests as r

# Create a header that says the request is coming from a browser-like agent (this is to prevent the website blocking our request).
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
}

# Make an http request to get the webpage with the following url.
url = "https://en.wikipedia.org/wiki/Wind_power_by_country"
page = r.get(url, headers = headers)

# Check that the request was successful. If so, the status code should be 200.
page.status_code

In [None]:
# Print the first 200 characters of the webpage.
# The actual content of the page, which is HTML, is stored as bytes in the 'content' attribute of the page object.
print(page.content[0:200])

In [None]:
# Import the pandas and numpy libraries.
import pandas as pd, numpy as np

# Use pd.read_html() to find all the tables in the webpage and put them in a list.
tables = pd.read_html(page.content)

# There should be 13 tables in this web page. We want the fifth one (index == 4).
# If the page has changed since this was written, check len(tables) and adjust the index.
# We will store a copy of this table in a dataframe called wp (for wind power).
wp = tables[4].copy()
wp

## Columns Explained

- Country: the name of the country
- Gen (TWh): the total energy generated by wind power in 2024 for that country, measured in Terawatt-hours.
- % gen.: the percentage of electricity generation that came from wind power in 2024, for that country.
- Cap. (GW): the total generating capacity for the country in 2024, measured in Gigawatts.
- % cap. growth: the growth in generating capacity for that country, measured as a percentage.
- Cap. fac.: capacity factor. This says how much energy was generated by wind in 2024 as a percentage of the theoretical maximum (if it was really windy all the time!).

Some of these column names are not very clear, so we will rename them below.
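
The capacity factor can be sketched as generation divided by the energy the installed capacity would produce if it ran at full power all year. The figures below are made up, the function name `capacity_factor` is hypothetical, and the source table may use a slightly different convention (for example, averaging capacity over the year).

```python
# 2024 was a leap year, so it had 366 days.
HOURS_IN_2024 = 366 * 24

def capacity_factor(generation_twh, capacity_gw, hours=HOURS_IN_2024):
    """Return generation as a percentage of the theoretical maximum."""
    # GW multiplied by hours gives GWh; dividing by 1000 converts to TWh.
    max_generation_twh = capacity_gw * hours / 1000
    return generation_twh / max_generation_twh * 100

# Made-up example: 100 TWh generated from 40 GW of capacity.
cf = capacity_factor(100.0, 40.0)
print(f"Capacity factor: {cf:.1f}%")
```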

## Some Data Processing
Now that we've grabbed the data from a webpage and stored it in a dataframe called `wp`, we can start doing some data processing.
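
The sort, drop and reset steps used in this section can be tried on a small made-up dataframe first. This sketch assumes, as in the real table, that the world totals sit at index label 0 before sorting; the data and the name `demo` are illustrative only.

```python
import pandas as pd

# Made-up data with a "World" totals row at index label 0.
demo = pd.DataFrame({
    "Country": ["World", "A", "B", "C"],
    "Gen (TWh)": [100.0, 20.0, 50.0, 30.0],
})

# Sorting reorders the rows but each row keeps its original index label.
demo = demo.sort_values("Gen (TWh)", ascending=False)
print(demo.index.tolist())

# drop(0) removes the row whose index LABEL is 0, not the row at position 0.
# Here they are the same row, because the totals row is also the largest.
demo = demo.drop(0)

# Renumber the rows 0, 1, 2, ... so labels and positions agree again.
demo = demo.reset_index(drop=True)
print(demo)
```

This version reassigns the result instead of using `inplace=True`; both styles leave `demo` in the same state.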

In [None]:
# Sort the rows on the "Gen (TWh)" column from largest to smallest.
wp.sort_values('Gen (TWh)', ascending=False, inplace=True)
wp

In [None]:
# Delete the row with index label 0, which holds the totals for the whole world (it is the first row after sorting).
wp.drop(0, inplace=True)

# Note: when we do the above, we can also reset the index to start at 0 and be consecutive integers. This makes future operations easier.
wp.reset_index(drop=True, inplace=True)

# Drop the last column, which we won't use.
wp.drop(columns = ['Cap. fac.'], inplace=True)
wp

In [None]:
# Rename some columns to make them more readable.
# Note: inplace must be the boolean True, not the string 'True'.
wp.rename(columns={"Gen (TWh)": "Generation 2024 (TWh)", "% gen.": "% of Total Generation 2024",
                   "Cap. (GW)": "Capacity 2024 (GW)", "% cap. growth": "% Growth in Capacity"}, inplace=True)
wp

## Data Analysis

We will now do some analysis on the data. This will include making projections for future generation and capacity based on the information available in the dataframe.
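
A common first step in this kind of analysis is finding the row where a column reaches its maximum. One possible approach is `Series.idxmax()`, which returns the index label of the largest value and so works even when the index is not 0, 1, 2, and so on. The data and column names below are made up for illustration.

```python
import pandas as pd

# Made-up data with a deliberately non-consecutive index.
demo = pd.DataFrame(
    {"Country": ["A", "B", "C"], "Share (%)": [12.5, 40.0, 7.0]},
    index=[10, 20, 30],
)

# idxmax() returns the index label of the row holding the largest value.
top_label = demo["Share (%)"].idxmax()

# .loc looks rows up by label, so it pairs naturally with idxmax().
top_country = demo.loc[top_label, "Country"]
top_share = demo.loc[top_label, "Share (%)"]
print(f"{top_country} has the largest share: {top_share:.1f}%")
```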

In [None]:
# Find the country where wind is the highest % of total generation.
max_pc = wp["% of Total Generation 2024"].max()
country_index = np.where(wp["% of Total Generation 2024"] == max_pc)[0][0]
country = wp.loc[country_index, 'Country']
print(f"{country} generates {max_pc:.1f}% of its electricity from wind power, more than any other country.")

<div style="background-color: #66CC00; padding: 10px;">

## Task 1

Copy and modify the code above to find the country with the greatest percentage growth in capacity.

</div>

In [None]:
# Type your code here


In [None]:
# Work out each country's generation as a percentage of the global total.
total_gen = wp['Generation 2024 (TWh)'].sum()
wp['% of Global Generation 2024']=(wp['Generation 2024 (TWh)'] / total_gen)*100
wp

<div style="background-color: #66CC00; padding: 10px;">

## Task 2

Copy the code above and modify it to add a column called "% of Global Capacity" that gives each country's share of global wind power capacity.
</div>

In [None]:
# Type your code here


In [None]:
# Project future generation based on current capacity and capacity growth.
# We will assume that future generation will grow in proportion to % capacity growth.
wp['Projected Generation 2025 (TWh)'] = wp['Generation 2024 (TWh)'] * (1 + wp['% Growth in Capacity']/100)
wp

<div style="background-color: #66CC00; padding: 10px;">

## Task 3

Copy the code above and modify it to add a column called "Projected Capacity 2025 (GW)" that gives a projection of each country's capacity, based on its current capacity and percentage growth in capacity.
</div>

In [None]:
# Type your code here


## Using Matplotlib to Visualize Our Work

We will import and use the `matplotlib.pyplot` library to create some visualizations of our data. 
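
As a small sketch of how a horizontal bar chart orders its bars, the example below plots made-up values. `barh` draws the first item at the bottom, so reversing a list sorted from largest to smallest puts the largest bar at the top. The `Agg` backend line is an assumption that lets the sketch run without a display; in a notebook it can be left out.

```python
import matplotlib
matplotlib.use("Agg")  # Non-interactive backend; optional inside a notebook.
import matplotlib.pyplot as plt

# Made-up labels and values, already sorted from largest to smallest.
labels = ["A", "B", "C"]
values = [30, 20, 10]

fig, ax = plt.subplots()

# Reverse both lists so the largest value is drawn last, at the top of the chart.
bars = ax.barh(labels[::-1], values[::-1])
ax.set_xlabel("Value (made-up units)")
ax.set_title("Largest value at the top")
```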

In [None]:
# Import the relevant plotting library.
import matplotlib.pyplot as plt

# Create a bar graph showing generation for the top ten countries.
x = wp['Country'][0:10].tolist()
y = wp['Generation 2024 (TWh)'][0:10].tolist()

# Reverse the lists so that the largest bar is drawn at the top of the chart.
x = x[::-1]
y = y[::-1]
plt.barh(x, y)
plt.title("Wind Power Generation 2024 (Top Ten Countries)")
plt.xlabel("Generation (TWh)")
plt.show()

In [None]:
# Create a pie chart showing share of global generation for 2024.

countries = wp['Country'][0:5].tolist()
data = wp['% of Global Generation 2024'][0:5].tolist()
# Group all remaining countries into a single "Others" slice so the shares add up to 100%.
countries.append("Others")
data.append(100 - sum(data))
plt.figure(figsize=(10, 8))
plt.title("Share of Global Wind Generation 2024")
plt.pie(data, labels=countries, autopct='%1.1f%%')
plt.show()

<div style="background-color: #66CC00; padding: 10px;">

## Task 4

1. Use the data to create a bar chart similar to that above, but which shows the projected generation for 2025 for the top 10 countries.
2. Use the data to create a pie chart similar to that above, but which shows the projected share of global generation for 2025.
</div>

In [None]:
# Type your code here
