# Validation of the PyPSA-Earth stats

## Description
This task aims to develop a notebook that:
- takes as input the `results/{scenarios}/stats.csv` files produced by pypsa-earth (see PR "Create statistics" #579); until these are available, data is loaded from `notebooks/validation/temp_stats_csv/stats_merged.csv`
- loads open data on power systems across the world
- creates plots to perform the validation

Plots and tables shall support different aggregation levels (e.g. demand per continent).

Create statistics for:
- demand
- installed capacity by technology
- renewable sources
- network characteristics (e.g. total line length)

Plots:
- Compare the statistics of the PyPSA-Earth model with open data
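One possible way to quantify this comparison is the relative deviation of the model value from the open-data value, per country and technology. The sketch below assumes both datasets are already aligned as DataFrames (countries as index, technologies as columns); the function name `relative_deviation` and all numbers are hypothetical.

```python
import numpy as np
import pandas as pd


def relative_deviation(model: pd.DataFrame, reference: pd.DataFrame) -> pd.DataFrame:
    """Return (model - reference) / reference on the shared countries and technologies.

    Cells where the reference is zero or missing are returned as NaN,
    because a relative error is undefined there.
    """
    # Keep only countries (index) and technologies (columns) present in both datasets
    model, reference = model.align(reference, join="inner")
    ref = reference.replace(0, np.nan)
    return (model - ref) / ref


# Toy example with made-up numbers (MW), for illustration only
model = pd.DataFrame({"solar": [110.0, 50.0], "gas": [900.0, 0.0]}, index=["ZA", "NG"])
irena = pd.DataFrame({"solar": [100.0, 40.0], "gas": [1000.0, 0.0]}, index=["ZA", "NG"])
print(relative_deviation(model, irena))
```

A signed relative value keeps the direction of the error (over- or underestimation), which an absolute difference in MW would hide when comparing small and large countries.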

## Public data sources collection
These sources could be helpful:
- [ENTSO-E](https://transparency.entsoe.eu/generation/r2/installedGenerationCapacityAggregation/show)
- [IRENA](https://www.irena.org/data-and-statistics) (not working at the time of writing)
- [IEA](https://www.iea.org/data-and-statistics)
- [WEC](https://www.worldenergy.org/statistics/) (not working at the time of writing)
- [WRI](https://www.wri.org/resources/data-sets)
- [UN](https://unstats.un.org/unsd/snaama/)
- [WBG](https://datacatalog.worldbank.org/dataset/world-development-indicators)
- [OECD](https://data.oecd.org/)
- [Eurostat](https://ec.europa.eu/eurostat/data/database)
- [EIA](https://www.eia.gov/outlooks/aeo/data/browser/)
- [Enerdata](https://www.enerdata.net/research/)
- [BP](https://www.bp.com/en/global/corporate/energy-economics/statistical-review-of-world-energy.html)
- [USAID](https://www.usaid.gov/what-we-do/energy/global-energy-database) (single countries only?)

https://www.usaid.gov/powerafrica/nigeria


## TODO
- Include continent analysis with country converter coco
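Until `coco` is wired in, continent-level aggregation can be sketched with a plain country-to-continent mapping. The hand-written `CONTINENT` dict below is a hypothetical stand-in for what `coco` would provide, `aggregate_by_continent` is an illustrative name, and the capacities are made up.

```python
import pandas as pd

# Hypothetical, hand-written mapping; in the notebook this would come from coco
CONTINENT = {"ZA": "Africa", "NG": "Africa", "DE": "Europe", "FR": "Europe"}


def aggregate_by_continent(df: pd.DataFrame, mapping: dict) -> pd.DataFrame:
    """Sum country-level rows (two-letter codes as index) per continent."""
    continents = df.index.map(mapping)
    # Countries without a mapping become NaN and are dropped by groupby (dropna=True by default)
    return df.groupby(continents).sum()


capacity = pd.DataFrame(
    {"solar": [6.0, 0.5, 60.0], "gas": [3.0, 12.0, 30.0]},
    index=["ZA", "NG", "DE"],
)
print(aggregate_by_continent(capacity, CONTINENT))
```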

## Preparation

### Import packages

In [None]:
import logging
import os
import sys

import pypsa
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

logger = logging.getLogger(__name__)

pd.set_option("display.max_columns", None)
pd.set_option("display.max_colwidth", 70)

### Set main directory to root folder

In [None]:
# Make the pypsa-earth scripts folder importable (needed for _helpers)
module_path = os.path.abspath(os.path.join("..", "..", ".."))
scripts_path = os.path.join(module_path, "pypsa-earth", "scripts")

if scripts_path not in sys.path:
    sys.path.append(scripts_path)

from _helpers import sets_path_to_root, country_name_2_two_digits, two_digits_2_name_country

# Set the working directory to the root folder ("documentation")
sets_path_to_root("documentation")

### Load stats data (obtained from pypsa-earth)

In [None]:
# Read it with multilevel column names. #TODO are multilevel column names necessary?
stats = pd.read_csv("notebooks/validation/temp_stats_csv/stats_merged.csv", index_col=0, header=[0,1])
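Regarding the TODO on multilevel column names: with `header=[0, 1]`, pandas builds a two-level `MultiIndex` for the columns, so `stats["add_electricity"]` selects every column under that first-level label. The toy table below only mimics the assumed layout of `stats_merged.csv` (rule name on the first level, statistic on the second); the values are made up and the real file may differ. It also shows a flat-name alternative.

```python
import io

import pandas as pd

# Toy stats table with two column levels: (rule name, statistic); values are made up
columns = pd.MultiIndex.from_tuples(
    [("add_electricity", "solar"), ("add_electricity", "CCGT"), ("base_network", "lines_length")]
)
toy = pd.DataFrame(
    [[1000.0, 3000.0, 12000.0], [50.0, None, 8000.0]], index=["ZA", "NG"], columns=columns
)

# Write and read back the same way the notebook reads stats_merged.csv
buffer = io.StringIO()
toy.to_csv(buffer)
buffer.seek(0)
toy_stats = pd.read_csv(buffer, index_col=0, header=[0, 1])

# Selecting a first-level label returns a single-level DataFrame
capacities = toy_stats["add_electricity"]
print(capacities)

# Alternative with flat column names such as "add_electricity|solar"
flat = toy_stats.copy()
flat.columns = ["|".join(col) for col in flat.columns]
print(flat.columns.tolist())
```

The two levels make it easy to pick all statistics of one rule at once; flat names are simpler to write but need string parsing to recover that grouping.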

In [None]:
stats.head()

### Load public data

In [None]:
EXAMPLE_URL="https://pxweb.irena.org/pxweb/en/IRENASTAT/IRENASTAT__Power%20Capacity%20and%20Generation/ELECCAP_2022_cycle2.px/"

In [None]:
# Read the data
irena_eleccap = pd.read_csv("notebooks/validation/temp_irena/ELECCAP_20230314-165057.csv", encoding="latin-1", skiprows=2)

# Replace ".." in the dataframe with NaN
irena_eleccap = irena_eleccap.replace("..", np.nan)

# Change dtype of column "Installed electricity capacity by country/area (MW)" to float
irena_eleccap["Installed electricity capacity by country/area (MW)"] = irena_eleccap["Installed electricity capacity by country/area (MW)"].astype(float)
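The IRENASTAT export marks missing values with `..`, which is why the column is cleaned before the cast to float. A slightly more defensive variant is `pd.to_numeric(..., errors="coerce")`, which turns `..` and any other non-numeric entry into `NaN` in one step. The rows below are made up; the column names follow the ones used above.

```python
import pandas as pd

CAP_COL = "Installed electricity capacity by country/area (MW)"

# Made-up rows; ".." marks a missing value as in the IRENASTAT export
toy_irena = pd.DataFrame(
    {
        "Country/area": ["South Africa", "South Africa", "Nigeria"],
        "Technology": ["Solar photovoltaic", "Solar photovoltaic", "Natural gas"],
        CAP_COL: ["5990.0", "..", "n/a"],
    }
)

# errors="coerce" converts every non-numeric entry (not only "..") to NaN
toy_irena[CAP_COL] = pd.to_numeric(toy_irena[CAP_COL], errors="coerce")
print(toy_irena)
```

Alternatively, passing `na_values=".."` to `pd.read_csv` treats the placeholder as missing already at load time.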

In [None]:
# Combine on-grid and off-grid capacity
irena_eleccap = irena_eleccap.groupby(["Country/area", "Year", "Technology"]).sum(numeric_only=True).reset_index()

# Drop the "Year" column (the export is assumed to contain a single year)
irena_eleccap = irena_eleccap.drop(columns=["Year"])
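The grouping above adds up the on-grid and off-grid rows of each country, year and technology. The toy version below sketches this; the `Grid connection` column name and all values are assumptions. It also illustrates why dropping `Year` is only safe for a single-year export: with several years, multiple rows per country and technology would remain, so the sketch filters one year first (2021 is an arbitrary example).

```python
import pandas as pd

CAP_COL = "Installed electricity capacity by country/area (MW)"

# Made-up rows: two years, on-grid and off-grid for the same country and technology
toy = pd.DataFrame(
    {
        "Country/area": ["South Africa"] * 4,
        "Year": [2020, 2020, 2021, 2021],
        "Technology": ["Solar photovoltaic"] * 4,
        "Grid connection": ["On-grid", "Off-grid", "On-grid", "Off-grid"],
        CAP_COL: [5000.0, 100.0, 6000.0, 150.0],
    }
)

# Keep a single reference year before the "Year" column is dropped
toy = toy[toy["Year"] == 2021]

# Sum on-grid and off-grid capacity; non-numeric columns such as "Grid connection" are dropped
combined = (
    toy.groupby(["Country/area", "Year", "Technology"])
    .sum(numeric_only=True)
    .reset_index()
    .drop(columns=["Year"])
)
print(combined)
```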

In [None]:
# Check data for a single country
irena_eleccap[irena_eleccap["Country/area"] == "Germany"].head(5)

## Validation

### Installed capacity by technology

In [None]:
# Define the technologies which should be compared
techs = ["CCGT", "OCGT", "nuclear", "oil", "onwind", "ror", "solar", "hydro"]

In [None]:
# Select the "add_electricity" rule and the technologies to compare
stats_capacities = stats["add_electricity"].loc[:, techs]

In [None]:
# Replace NaN with zeros
stats_capacities = stats_capacities.fillna(0)

In [None]:
# Combine CCGT and OCGT into "gas" and drop the original columns
stats_capacities["gas"] = stats_capacities["CCGT"] + stats_capacities["OCGT"]
stats_capacities = stats_capacities.drop(columns=["CCGT", "OCGT"])

In [None]:
stats_capacities.head()
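If more carriers need to be merged later, a mapping-based variant avoids writing one line per pair. This is a sketch: the `CARRIER_GROUPS` dict is hypothetical (only `CCGT`/`OCGT` to `gas` comes from the cell above) and the capacities are made up.

```python
import pandas as pd

# Hypothetical mapping from model carriers to comparison categories;
# carriers that are not listed keep their own name
CARRIER_GROUPS = {"CCGT": "gas", "OCGT": "gas"}

toy_capacities = pd.DataFrame(
    {"CCGT": [2000.0, 0.0], "OCGT": [500.0, 300.0], "solar": [1000.0, 50.0]},
    index=["ZA", "NG"],
)

# Rename columns to their group, then sum columns that share a name.
# Transposing avoids DataFrame.groupby(axis=1), which is deprecated in recent pandas.
grouped = toy_capacities.rename(columns=CARRIER_GROUPS).T.groupby(level=0).sum().T
print(grouped)
```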

#### Harmonize technology names and combine datasets

In [None]:
# Show names of IRENA technologies
#irena_eleccap["Technology"].unique()

In [None]:
# Dict to match the technology names of irena_eleccap to those of stats_capacities
names = {
    "Solar photovoltaic": "solar",
    "Onshore wind energy": "onwind",
    # "Offshore wind energy": "offwind",
    "Renewable hydropower": "hydro",
    "Nuclear": "nuclear",
    "Oil": "oil",
    "Natural gas": "gas",
    "Mixed Hydro Plants": "ror",  # TODO Is this correct? Check IRENA
}

In [None]:
# Rename the technologies in irena_eleccap to match the names in stats_capacities using the dict names
irena_eleccap["Technology"] = irena_eleccap["Technology"].replace(names)
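`Series.replace` with a dict only changes values that appear as keys; every other IRENA technology name passes through unchanged. A quick check for names that stay outside the PyPSA-Earth set can be sketched as follows; the technology list is illustrative, and the target set mirrors `techs` after CCGT and OCGT are merged into gas.

```python
import pandas as pd

toy_names = {"Solar photovoltaic": "solar", "Natural gas": "gas", "Mixed Hydro Plants": "ror"}
target = {"gas", "hydro", "nuclear", "oil", "onwind", "ror", "solar"}

# Illustrative IRENA technology names
toy_tech = pd.Series(["Solar photovoltaic", "Natural gas", "Geothermal energy", "Bioenergy"])
renamed = toy_tech.replace(toy_names)

# IRENA technologies that were not mapped to a PyPSA-Earth name
unmapped = sorted(set(renamed) - target)
print(unmapped)
```

Unmapped technologies are harmless for the comparison itself, since only the shared columns are plotted, but listing them shows what the validation currently ignores.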

In [None]:
# Transform technologies to columns and have the countries as index
irena_eleccap = irena_eleccap.pivot_table(index=["Country/area"], columns="Technology", values="Installed electricity capacity by country/area (MW)")
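`pivot_table` turns the long table (one row per country and technology) into a wide one. Its default `aggfunc` is the mean, so duplicate country/technology rows, for example if several years were left in the export, would be silently averaged. Checking for duplicates first and passing an explicit `aggfunc` makes this visible; the sketch uses made-up values.

```python
import pandas as pd

CAP_COL = "Installed electricity capacity by country/area (MW)"

toy_long = pd.DataFrame(
    {
        "Country/area": ["South Africa", "South Africa", "Nigeria"],
        "Technology": ["solar", "gas", "gas"],
        CAP_COL: [6150.0, 3400.0, 12000.0],
    }
)

# Fail early if a country/technology pair occurs more than once
assert not toy_long.duplicated(["Country/area", "Technology"]).any()

wide = toy_long.pivot_table(index="Country/area", columns="Technology", values=CAP_COL, aggfunc="sum")
print(wide)
```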

In [None]:
irena_eleccap.head(10)

In [None]:
stats_capacities.head()

In [None]:
# Change the index of irena_eleccap to two digit country name using the function country_name_2_two_digits()
irena_eleccap.index = irena_eleccap.index.map(country_name_2_two_digits)


In [None]:
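# Merge the two dataframes on the country code (inner join) and label the origin of each column
merged = pd.merge(stats_capacities, irena_eleccap, left_index=True, right_index=True, suffixes=("_pypsa_earth", "_irena"))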

In [None]:
"DE" in irena_eleccap.index

In [None]:
stats_capacities.index

In [None]:
merged.head()
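The inner merge keeps only countries present in both datasets, so a country disappears silently if, for example, `country_name_2_two_digits` does not return a matching code for its IRENA name. An outer merge with `indicator=True` makes such gaps visible; the sketch uses toy frames and made-up values.

```python
import pandas as pd

toy_stats = pd.DataFrame({"solar": [1000.0, 50.0]}, index=["ZA", "NG"])
toy_irena = pd.DataFrame({"solar": [500.0, 200.0]}, index=["ZA", "EG"])

check = pd.merge(
    toy_stats,
    toy_irena,
    left_index=True,
    right_index=True,
    how="outer",
    suffixes=("_pypsa_earth", "_irena"),
    indicator=True,
)

# "left_only": only in the PyPSA-Earth stats, "right_only": only in IRENA
print(check.loc[check["_merge"] != "both", "_merge"])
```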

#### Plot

In [None]:
stats_capacities.loc["ZA"]

In [None]:
irena_eleccap.head()

In [None]:
stats_capacities.loc["ZA"].index

In [None]:
irena_eleccap.loc["ZA"][stats_capacities.loc["ZA"].index]

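In [None]:
# Select South Africa ("ZA") and align the IRENA values with the PyPSA-Earth technologies
# (technologies missing in IRENA become NaN instead of raising a KeyError)
data_stats = stats_capacities.loc["ZA"]
data_irena = irena_eleccap.loc["ZA"].reindex(data_stats.index)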

In [None]:
# Compare the installed capacity per technology for South Africa ("ZA"):
# PyPSA-Earth (solid bars) next to IRENA (transparent bars)
bar_width = 0.4
r = np.arange(len(data_stats))  # one bar position per technology
r_right = r + bar_width  # shift the IRENA bars to the right

plt.figure(figsize=(6, 4))
plt.bar(r, data_stats, color="g", alpha=1, edgecolor="white", width=bar_width)
plt.bar(r_right, data_irena, color="g", alpha=0.3, edgecolor="white", width=bar_width)

# Place the technology labels in the middle of each pair of bars
plt.xticks(r + bar_width / 2, data_stats.index)
plt.ylabel("Capacity in MW")

# Add a legend and a grid on the y-axis
plt.legend(["PyPSA-Earth", "IRENA"], loc="upper left", bbox_to_anchor=(1, 1), ncol=1)
plt.grid(axis="y", alpha=0.5)

# Show graphic
#plt.savefig("file", bbox_inches="tight")
plt.show()

In [None]:
merged.loc["ZA"].plot.bar(figsize=(10,5))
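As an alternative to placing the bars by hand, putting both series into one DataFrame lets pandas draw one group of bars per technology, so the number of bars follows the data automatically. This is a sketch with made-up values; in the notebook, `data_stats` and `data_irena` from the cells above would take the place of the toy series.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up capacities (MW) standing in for data_stats and data_irena
toy_stats = pd.Series({"solar": 100.0, "onwind": 200.0, "gas": 300.0})
toy_irena = pd.Series({"solar": 150.0, "onwind": 180.0, "gas": 320.0})

comparison = pd.DataFrame({"PyPSA-Earth": toy_stats, "IRENA": toy_irena})

# One group of bars per technology, one bar per data source
ax = comparison.plot.bar(figsize=(6, 4), rot=0)
ax.set_ylabel("Capacity in MW")
ax.grid(axis="y", alpha=0.5)
ax.legend(loc="upper left", bbox_to_anchor=(1, 1))
plt.tight_layout()
plt.show()
```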





### Demand