# Solution `pandas`

### Power Plants Data

In this exercise, we will use the [powerplants.csv](https://raw.githubusercontent.com/PyPSA/powerplantmatching/master/powerplants.csv) dataset from the [powerplantmatching](https://github.com/PyPSA/powerplantmatching) project. This dataset contains information about various power plants, including their names, countries, fuel types, capacities, and more.

URL: `https://raw.githubusercontent.com/PyPSA/powerplantmatching/master/powerplants.csv`

**Task 1:** Load the dataset into a pandas DataFrame.

In [None]:
import pandas as pd
url = "https://raw.githubusercontent.com/PyPSA/powerplantmatching/master/powerplants.csv"
df = pd.read_csv(url, index_col=0)
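
As the `index_col=0` argument suggests, the first CSV column is assumed to hold a row identifier. The sketch below uses a small invented CSV string (not rows from the real file) to show what that argument changes.

```python
import io

import pandas as pd

# Invented two-row excerpt: an "id" column followed by named data columns.
csv_text = """id,Name,Fueltype,Country,Capacity
0,Plant A,Hard Coal,Germany,1000.0
1,Plant B,Hydro,Norway,250.0
"""

# index_col=0 turns the first column into the row index ...
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)
# ... whereas without it the identifier stays an ordinary data column.
without_index = pd.read_csv(io.StringIO(csv_text))

print(with_index.columns.tolist())     # ['Name', 'Fueltype', 'Country', 'Capacity']
print(without_index.columns.tolist())  # ['id', 'Name', 'Fueltype', 'Country', 'Capacity']
```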

**Task 2:** Run the method `.describe()` on the DataFrame.

In [None]:
df.describe()
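
`.describe()` summarizes only numeric columns by default. A minimal sketch on an invented table; calling `.describe()` on a single text column instead reports the count, the number of unique values, the most frequent value (`top`) and its frequency (`freq`).

```python
import pandas as pd

# Small invented table mixing a text column and a numeric column.
toy = pd.DataFrame({
    "Fueltype": ["Hard Coal", "Hydro", "Hydro"],
    "Capacity": [1000.0, 250.0, 50.0],
})

# Default: only the numeric column appears (count, mean, std, min, quartiles, max).
numeric_summary = toy.describe()

# On a text column: count, unique, top, freq.
text_summary = toy["Fueltype"].describe()

print(numeric_summary)
print(text_summary)
```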

**Task 3:** Provide a list of unique fuel types and technologies included in the dataset.

:::{note}
Look in the `pandas` documentation for functions that might be useful to solve these tasks.
:::

In [None]:
df.Fueltype.unique()

In [None]:
df.Technology.unique()
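
`.unique()` also returns a missing value as its own entry, which may matter for columns with gaps. The sketch uses an invented Series to contrast it with `.nunique()` and `.value_counts()`, which skip missing values by default.

```python
import pandas as pd

# Invented technology labels with one missing entry.
tech = pd.Series(["Steam Turbine", "Run-Of-River", None, "Steam Turbine"])

print(tech.unique())           # includes the missing value as an entry
print(tech.dropna().unique())  # only actual technology labels
print(tech.nunique())          # 2
print(tech.value_counts())     # counts per label, most frequent first
```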

**Task 4:** Filter the dataset to power plants with the fuel type "Hard Coal".

In [None]:
coal = df.loc[df.Fueltype == "Hard Coal"]
coal
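
The boolean mask above can also be written with `DataFrame.query`, and `.isin()` extends it to several fuel types at once. A sketch on an invented table; the fuel-type labels are only examples.

```python
import pandas as pd

toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Fueltype": ["Hard Coal", "Lignite", "Hard Coal", "Hydro"],
    "Capacity": [800.0, 1200.0, 400.0, 90.0],
})

# Boolean mask, as in the solution above.
coal = toy.loc[toy.Fueltype == "Hard Coal"]

# The same selection expressed as a query string.
coal_q = toy.query('Fueltype == "Hard Coal"')

# isin() keeps rows whose fuel type is any of the listed labels.
all_coal = toy.loc[toy.Fueltype.isin(["Hard Coal", "Lignite"])]

print(coal.Name.tolist())      # ['A', 'C']
print(all_coal.Name.tolist())  # ['A', 'B', 'C']
```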

**Task 5:** Identify the 5 largest coal power plants. In which countries are they located? When were they built?

In [None]:
selection = coal.Capacity.nlargest(5).index
selection

In [None]:
coal.loc[selection, ["Name", "Country", "Capacity", "DateIn"]]
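
The two cells above can be collapsed into one call with `DataFrame.nlargest`, which takes the number of rows and the column to rank by. A sketch on an invented table of coal plants:

```python
import pandas as pd

# Invented plants; only the column names follow the dataset.
coal = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F"],
    "Country": ["DE", "PL", "DE", "CZ", "PL", "DE"],
    "Capacity": [800.0, 1200.0, 400.0, 1500.0, 300.0, 950.0],
    "DateIn": [1975, 1988, 1969, 2012, 1980, 2014],
})

# Rank by Capacity, keep the top 5 rows, then pick the columns of interest.
top = coal.nlargest(5, "Capacity")[["Name", "Country", "Capacity", "DateIn"]]
print(top)
```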

**Task 6:** Identify the power plant with the longest name.

In [None]:
i = df.Name.str.len().idxmax()  # .str.len() gives NaN for missing names instead of raising
df.loc[i]
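
`.argmax()` returns a *position*, which pairs with `.iloc`, while `.idxmax()` returns an index *label*, which pairs with `.loc`. Mixing the two selects the wrong row whenever the index is not a plain 0-based range. A sketch with invented names and a non-default index:

```python
import pandas as pd

# Invented names with a non-default index, as can happen after filtering.
names = pd.Series(
    ["Short", "A much longer plant name", None, "Mid name"],
    index=[10, 20, 30, 40],
)

lengths = names.str.len()  # missing name -> NaN, skipped by argmax/idxmax

pos = lengths.argmax()     # position of the maximum: 1
label = lengths.idxmax()   # index label of the maximum: 20

print(names.iloc[pos])     # 'A much longer plant name'
print(names.loc[label])    # same row, looked up by label
```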

**Task 7:** Identify the 10 northernmost power plants. What type of power plants are they?

In [None]:
index = df.lat.nlargest(10).index
df.loc[index]
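
Counting fuel types among the selected rows answers the second question directly. A sketch on invented coordinates, using `DataFrame.nlargest` on the `lat` column and keeping 3 rows instead of 10 to stay short:

```python
import pandas as pd

# Invented plants and latitudes.
plants = pd.DataFrame({
    "Name": ["P1", "P2", "P3", "P4", "P5"],
    "Fueltype": ["Hydro", "Hydro", "Wind", "Natural Gas", "Hydro"],
    "lat": [69.6, 70.1, 55.0, 60.4, 68.2],
})

# Highest latitudes first, then a count per fuel type.
north = plants.nlargest(3, "lat")
print(north[["Name", "Fueltype", "lat"]])
print(north.Fueltype.value_counts())
```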

**Task 8:** What is the average start year of each fuel type? Sort the fuel types by their average start year in ascending order and round to the nearest integer.

In [None]:
df.groupby("Fueltype").DateIn.mean().round().sort_values()
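
`round()` keeps a float dtype, and a fuel type without any known `DateIn` yields `NaN`, so `astype(int)` would fail. A sketch on invented data that converts to pandas' nullable `Int64` dtype instead and reports how many plants each average rests on:

```python
import numpy as np
import pandas as pd

# Invented start years; the "Nuclear" group has no known value.
plants = pd.DataFrame({
    "Fueltype": ["Hydro", "Hydro", "Solar", "Solar", "Nuclear"],
    "DateIn": [1960.0, 1970.0, 2015.0, 2017.0, np.nan],
})

# Same chain as the solution; NaN groups are sorted to the end.
mean_year = plants.groupby("Fueltype").DateIn.mean().round().sort_values()

# Nullable integers: whole years, with <NA> where no start year is known.
int_years = mean_year.astype("Int64")
print(int_years)

# "count" counts non-missing DateIn values per fuel type.
print(plants.groupby("Fueltype").DateIn.agg(["mean", "count"]))
```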

### Wind and Solar Capacity Factors

In this exercise, we will work with a time series dataset containing hourly wind and solar capacity factors for Ireland, taken from [model.energy](https://model.energy).

**Task 1:** Use `pd.read_csv` to load the dataset from the following URL into a pandas DataFrame. Ensure that the timestamps are parsed into a `pd.DatetimeIndex`.

In [None]:
url = "https://model.energy/data/time-series-2b42655fa0b49b73fb15871dba2f7000.csv"
df = pd.read_csv(url, index_col=0, parse_dates=True)
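
A quick way to confirm that `parse_dates=True` took effect is to check the index type. The sketch below assumes a layout of a timestamp column followed by one column per technology; the three rows and the column names `wind` and `solar` are invented placeholders, not taken from the model.energy file.

```python
import io

import pandas as pd

# Invented excerpt in the assumed layout.
csv_text = """time,wind,solar
2011-01-01 00:00:00,0.61,0.0
2011-01-01 01:00:00,0.58,0.0
2011-01-01 02:00:00,0.55,0.0
"""

parsed = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)
raw = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(type(parsed.index).__name__)  # DatetimeIndex
print(type(raw.index).__name__)     # Index (timestamps kept as strings)
```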

**Task 2:** Calculate the mean capacity factor for wind and solar over the entire time period.

In [None]:
df.mean()
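
For hourly data, the mean capacity factor equals the full-load hours divided by the number of hours in the series, so a full non-leap year (8760 hours) of data gives `mean * 8760` full-load hours. A sketch on one invented day of values, with `wind` and `solar` as placeholder column names:

```python
import pandas as pd

# One invented day of hourly capacity factors (24 rows).
cf = pd.DataFrame({
    "wind": [0.5] * 12 + [0.25] * 12,
    "solar": [0.0] * 8 + [0.6] * 8 + [0.0] * 8,
})

mean_cf = cf.mean()         # average fraction of rated power delivered
full_load_hours = cf.sum()  # hours at rated power that would yield the same energy

# With one row per hour: full-load hours / number of hours = mean capacity factor.
print(mean_cf)
print(full_load_hours / len(cf))
```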

**Task 3:** Calculate the correlation between wind and solar capacity factors.

:::{note}
Go to the [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/index.html) for functions that might be useful to solve these tasks.
:::

In [None]:
df.corr()
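
`df.corr()` returns the full Pearson correlation matrix; for two columns, the wind-solar value is the off-diagonal entry. `Series.corr` returns that single number directly, and `method="spearman"` measures monotonic rather than strictly linear association. A sketch on invented values with placeholder column names:

```python
import pandas as pd

# Invented hourly values.
cf = pd.DataFrame({
    "wind": [0.8, 0.6, 0.4, 0.3, 0.5, 0.7],
    "solar": [0.0, 0.2, 0.5, 0.6, 0.3, 0.1],
})

matrix = cf.corr()                   # 2x2 Pearson correlation matrix
r = cf["wind"].corr(cf["solar"])     # the off-diagonal entry as a scalar
rho = cf["wind"].corr(cf["solar"], method="spearman")  # rank correlation

print(matrix)
print(r, rho)
```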

**Task 4:** Plot the wind and solar capacity factors for the month of May.

In [None]:
df.loc["2011-05"].plot(ylabel="capacity factor")
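
Partial string indexing on a `DatetimeIndex` accepts ISO-style strings: a year-month string selects the whole month, and a slice of dates includes both end days in full. A sketch on an invented hourly series:

```python
import pandas as pd

# Invented constant values on an hourly index from April to June 2011.
index = pd.date_range("2011-04-01", "2011-06-30 23:00", freq=pd.offsets.Hour())
cf = pd.DataFrame({"wind": 0.3, "solar": 0.1}, index=index)

may = cf.loc["2011-05"]                            # every hour of May
first_week = cf.loc["2011-05-01":"2011-05-07"]     # 1 to 7 May, inclusive

print(len(may), len(first_week))  # 744 168
```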

**Task 5:** Plot the weekly average capacity factors for wind and solar over the entire time period.

In [None]:
df.resample("W").mean().plot()
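
`resample("W")` produces one value per calendar week (by default weeks ending on Sunday, labelled with that Sunday), whereas `rolling(168)` gives a 168-hour moving average that keeps the hourly resolution. A sketch on two invented weeks of hourly data starting on Monday, 3 January 2011:

```python
import pandas as pd

# Two invented weeks: capacity factor 0.2 in the first, 0.6 in the second.
index = pd.date_range("2011-01-03", periods=14 * 24, freq=pd.offsets.Hour())
cf = pd.DataFrame({"wind": [0.2] * (7 * 24) + [0.6] * (7 * 24)}, index=index)

weekly = cf.resample("W").mean()     # one row per week, labelled by its Sunday
rolling = cf.rolling(7 * 24).mean()  # hourly rows; first 167 are NaN

print(weekly)
print(rolling.dropna().head())
```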

**Task 6:** Go to [model.energy](https://model.energy) and retrieve the time series for another region of your choice. Recreate the analysis above and compare the results.

:::{note}
Look for "Download Comma-Separated-Variable (CSV) file of data" in Step 2.
:::
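
One way to compare regions is to collect the same statistics for each into one table. The helper `summarize` below is hypothetical, the two regions are invented stand-ins, and `wind`/`solar` are placeholder column names that may differ from those in the downloaded files.

```python
import pandas as pd


def summarize(cf: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: key statistics of one region's capacity factors."""
    return pd.Series({
        "wind mean": cf["wind"].mean(),
        "solar mean": cf["solar"].mean(),
        "wind-solar corr": cf["wind"].corr(cf["solar"]),
    })


# Invented stand-ins for Ireland and a second region.
region_a = pd.DataFrame({"wind": [0.6, 0.5, 0.4, 0.7], "solar": [0.0, 0.1, 0.3, 0.0]})
region_b = pd.DataFrame({"wind": [0.2, 0.3, 0.2, 0.1], "solar": [0.2, 0.5, 0.6, 0.1]})

# Concatenating a dict of Series along axis=1 gives one column per region.
comparison = pd.concat(
    {"Region A": summarize(region_a), "Region B": summarize(region_b)}, axis=1
)
print(comparison.round(3))
```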