# Create statistics for a given scenario run

This notebook explains how to create statistics on the overall workflow by running the ```make_statistics``` Snakemake rule.

1. Start by building the PyPSA-Earth [tutorial model](https://pypsa-earth.readthedocs.io/en/latest/short_tutorial.html). 
2. A test case for Nigeria ("NG") has been set up in the ```config.NG.yaml``` file in the ```configs/scenarios``` [folder](https://github.com/javier-cp6/pypsa-earth/tree/main/configs/scenarios).
3. Run the command ```snakemake -j 1 make_statistics```, which triggers the ```make_statistics.py``` script. The script writes a ```stats.csv``` file to the results folder (for the NG scenario run, the notebook below reads it from ```results/NG/stats.csv```), which is displayed in the table below. The file contains information such as:

    - For download_osm_data and clean_osm_data: the number of elements, the length of the lines and the length of the DC lines.
    - For build_shapes: the surface area, total GDP, total population and number of shapes.
    - For build_renewable_profiles: the total available potential and the average production.
    - For the network rules (base_network, add_electricity, simplify_network and solve_network): the length of the lines, the number of buses and the total installed capacity by generation technology. For further details, see the PyPSA documentation on [lines](https://pypsa.readthedocs.io/en/latest/components.html#line) and [generators](https://pypsa.readthedocs.io/en/latest/components.html#generator).
    - Computational statistics for the rules that have a benchmark: running time, mean CPU load and maximum memory. They are measured with Snakemake's benchmark rules; for further details, see the benchmark [documentation](https://snakemake.readthedocs.io/en/v7.24.1/snakefiles/rules.html#benchmark-rules) and [script](https://github.com/snakemake/snakemake/blob/main/snakemake/benchmark.py).
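Snakemake's `benchmark` directive writes a tab-separated file for each benchmarked rule run. The sketch below shows how such a file could be condensed into the three computational statistics. It assumes the benchmark columns `s`, `mean_load` and `max_vms` (maximum virtual memory in MB); the mapping onto `total_time`, `mean_load` and `max_memory`, the averaging over repeated runs, and the benchmark values themselves are assumptions for illustration, not the logic of `make_statistics.py`.

```python
import io

import pandas as pd

# Invented content of a Snakemake benchmark file (tab-separated, one row per run)
BENCHMARK_TSV = (
    "s\th:m:s\tmax_rss\tmax_vms\tmax_uss\tmax_pss\tio_in\tio_out\tmean_load\tcpu_time\n"
    "12.50\t0:00:12\t210.3\t480.0\t200.1\t205.7\t0.0\t1.2\t65.0\t8.1\n"
)


def computational_stats(benchmark: pd.DataFrame) -> dict:
    """Condense a benchmark table into computational statistics (assumed mapping)."""
    return {
        # assumed: total_time <- wall-clock seconds, averaged over repeated runs
        "total_time": float(benchmark["s"].mean()),
        # assumed: mean_load <- mean CPU load in percent
        "mean_load": float(benchmark["mean_load"].mean()),
        # assumed: max_memory <- peak virtual memory size (max_vms) in MB
        "max_memory": float(benchmark["max_vms"].max()),
    }


bench = pd.read_csv(io.StringIO(BENCHMARK_TSV), sep="\t")
stats = computational_stats(bench)
print(stats)  # {'total_time': 12.5, 'mean_load': 65.0, 'max_memory': 480.0}
```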


In [1]:
import os
import sys

# add the pypsa-earth scripts folder to the import path to load _helpers
module_path = os.path.abspath("../../")
scripts_path = os.path.join(module_path, "pypsa-earth", "scripts")
if scripts_path not in sys.path:
    sys.path.append(scripts_path)

from _helpers import sets_path_to_root, read_csv_nafix

# change the working directory to the root folder ("studio-lab-user") where pypsa-earth is installed
sets_path_to_root("studio-lab-user")

This is the repository path:  /home/studio-lab-user
Had to go 2 folder(s) up.
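Judging from its printed output, `sets_path_to_root` climbs up from the current directory until it reaches the folder with the given name and makes it the working directory. A minimal sketch of that search with `pathlib`, using a hypothetical `find_root_dir` function and a made-up folder layout; unlike the helper, it only returns the folder and does not call `os.chdir`.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_root_dir(start: Path, root_name: str, max_levels: int = 8) -> Optional[Path]:
    """Return `start` or its nearest ancestor named `root_name` (hypothetical helper)."""
    current = start.resolve()
    for _ in range(max_levels + 1):
        if current.name == root_name:
            return current
        if current.parent == current:  # reached the filesystem root
            break
        current = current.parent
    return None


# Made-up layout: the notebook folder sits two levels below the root folder
with tempfile.TemporaryDirectory() as tmp:
    notebook_dir = Path(tmp) / "studio-lab-user" / "notebooks" / "statistics"
    notebook_dir.mkdir(parents=True)
    root = find_root_dir(notebook_dir, "studio-lab-user")
    levels_up = len(notebook_dir.resolve().relative_to(root).parts)
    print(f"root: {root.name}, {levels_up} folder(s) up")  # root: studio-lab-user, 2 folder(s) up
```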


In [2]:
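# path to the statistics file of the NG scenario run
stats_path = os.path.join(os.path.realpath("pypsa-earth"), "results", "NG", "stats.csv")
# read the two header rows (rule, key) as column levels and the first column as index
df = read_csv_nafix(stats_path, header=[0, 1], index_col=0)
# transpose so that each row is one (rule, key) statistic
df = df.T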
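To see what the transpose does, here is a sketch on a small hand-built frame with the shape `stats.csv` is assumed to have (one row per country, column levels `rule` and `key`), using values taken from the table below. After `.T`, the statistics of a single rule can be selected with `xs`.

```python
import pandas as pd

# Hand-built stand-in for stats.csv: one row per country, two column levels
columns = pd.MultiIndex.from_tuples(
    [
        ("download_osm_data", "lines-size"),
        ("download_osm_data", "total_time"),
        ("clean_osm_data", "lines-size"),
    ],
    names=["rule", "key"],
)
wide = pd.DataFrame([[645, 2.67, 449]], index=["NG"], columns=columns)

# Transpose: rows become (rule, key) pairs, columns become countries
long = wide.T
print(list(long.index.names))  # ['rule', 'key']

# Select all statistics of one rule via the 'rule' index level
download_stats = long.xs("download_osm_data", level="rule")
print(download_stats["NG"].to_dict())
```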

In [3]:
# convert 'lines-length' values from m to km
mask = df.index.get_level_values(1).str.contains('lines-length')
df.loc[mask] = df.loc[mask].div(1e3).astype(float).round(0)

# convert 'area' values from m2 to km2
mask = df.index.get_level_values(1).str.contains('area')
df.loc[mask] = df.loc[mask].div(1e6).astype(float).round(0)
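The same conversion can be wrapped in a reusable function. This is a sketch with a hypothetical `convert_rows` helper and illustrative values; matching keys literally (`regex=False`) and coercing non-numeric entries to NaN before dividing are assumptions, not what the cell above does.

```python
import pandas as pd


def convert_rows(df: pd.DataFrame, key_substring: str, factor: float) -> pd.DataFrame:
    """Divide rows whose 'key' level contains `key_substring` by `factor` (hypothetical helper)."""
    out = df.copy()
    # regex=False: match the substring literally instead of as a regular expression
    mask = out.index.get_level_values("key").str.contains(key_substring, regex=False)
    # coerce to numbers first so that non-numeric entries become NaN instead of raising
    values = out.loc[mask].apply(pd.to_numeric, errors="coerce")
    out.loc[mask] = values.div(factor).round(0)
    return out


# Illustrative values in metres / square metres
index = pd.MultiIndex.from_tuples(
    [
        ("download_osm_data", "lines-length"),
        ("build_shapes", "area"),
        ("download_osm_data", "lines-size"),
    ],
    names=["rule", "key"],
)
stats = pd.DataFrame({"NG": [1_234_567.0, 9.1e11, 645.0]}, index=index)
stats = convert_rows(stats, "lines-length", 1e3)  # m -> km
stats = convert_rows(stats, "area", 1e6)  # m^2 -> km^2
print(stats["NG"].tolist())  # [1235.0, 910000.0, 645.0]
```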

In [4]:
# add 'unit' column
unit_dict = {
    # size and length
    'size': 'EA',
    'lines-length': 'km',
    # computational units
    'total_time': 's',
    'mean_load': '%',
    'max_memory': 'MB',
    # build_shapes
    'area': 'km^2',
    'country_matching': '%',
    'pop': 'inhabitants',
    'gdp': 'USD',
    # build_renewable_profiles
    'potential': 'MW',
    'avg_production_pu': 'MWh',
    # network units
    'buses_number': 'EA',
    'lines_length': 'km',
    'lines_capacity': 'MVA',
    'CCGT': 'MW',
    'OCGT': 'MW',
    'oil': 'MW',
    'onwind': 'MW',
    'solar': 'MW',
    'hydro': 'MW',
}

df['unit'] = ""

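# assign a unit to every statistic whose key contains a dictionary key
for key, value in unit_dict.items():
    mask = df.index.get_level_values(1).str.contains(key)
    df.loc[mask, 'unit'] = value

# the snakemake_status entries have no unit
mask = df.index.get_level_values(0) == "snakemake_status"
df.loc[mask, 'unit'] = ""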


# add 'description' column
descr_dict = {
    # computational
    'total_time': 'Running time in seconds.',
    'mean_load': 'CPU usage percentage of the total running time.',
    'max_memory': 'Maximal Virtual Memory Size (VMS) in MB.',
    # build_renewable_profiles
    'potential': 'Technical installable power potential.',
    'avg_production_pu': 'Average production by plant (hydro) or bus (other RES).',
}

df['description'] = ""

for key, value in descr_dict.items():
    mask = df.index.get_level_values(1).str.contains(key)
    df.loc[mask, 'description'] = value


# set 'unit' and 'description' columns as indexes
df = df.set_index(['description', 'unit'], append=True)
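The unit lookup matches each dictionary key as a substring of the statistic name. One key can therefore label several statistics (`size` covers both `lines-size` and `lines-size_dc`), a statistic that matches no key keeps an empty unit, and when several keys match, the entry that comes later in the dictionary wins. A minimal sketch of that step on a subset of `unit_dict`; it uses literal matching (`regex=False`) as an assumption, whereas the notebook uses the default regex matching, which gives the same result for these keys.

```python
import pandas as pd

# Subset of the notebook's unit_dict (insertion order matters: later matches win)
unit_dict = {"size": "EA", "lines-length": "km", "total_time": "s", "area": "km^2"}

# Statistic names as found in the 'key' index level (country_matching has no entry here)
keys = pd.Index(["lines-size", "lines-size_dc", "lines-length", "total_time", "country_matching"])
units = pd.Series("", index=keys)

for key, unit in unit_dict.items():
    # literal substring match of the dictionary key in each statistic name
    units[keys.str.contains(key, regex=False)] = unit

print(units.to_dict())
```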

In [5]:
# format lengths, areas and population as whole numbers with thousands separators
unit_list = ['km', 'km^2', 'inhabitants']
mask = df.index.get_level_values('unit').isin(unit_list)
df.loc[mask, :] = df.loc[mask, :].applymap('{:,.0f}'.format)
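The format spec `'{:,.0f}'` rounds to whole numbers and inserts thousands separators. Because the formatted cells become strings, the Styler's `precision=2` in the next cell leaves them untouched. A small sketch with illustrative values; it also assumes you may want to avoid the `DataFrame.applymap` deprecation in pandas 2.1 and later, where the method was renamed `DataFrame.map`.

```python
import pandas as pd

# Format spec used above: thousands separator, no decimals
fmt = "{:,.0f}".format
print(fmt(1234567.8))  # 1,234,568

frame = pd.DataFrame({"NG": [1234567.8, 923768.0]})
# DataFrame.map replaces applymap from pandas 2.1 on; fall back for older versions
formatted = frame.map(fmt) if hasattr(frame, "map") else frame.applymap(fmt)
print(formatted["NG"].tolist())  # ['1,234,568', '923,768']
```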

In [6]:
# display the table: 2 decimals, thousands separators, centered headings and cells (description left-aligned)
styled_df = df.style.format(precision=2, thousands=",").set_table_styles([
    {'selector': '.index_name', 'props': [('text-align', 'center')]},
    {'selector': 'th.row_heading', 'props': [('text-align', 'center')]},
    {'selector': 'th.row_heading.level2', 'props': [('text-align', 'left')]},
    {'selector': 'th.col_heading', 'props': [('text-align', 'center')]},
    {'selector': 'td', 'props': [('text-align', 'center')]},
])
styled_df

| rule | key | description | unit | NG |
|---|---|---|---|---|
| download_osm_data | cables-size | | EA | 1 |
| download_osm_data | generators-size | | EA | 425 |
| download_osm_data | lines-size | | EA | 645 |
| download_osm_data | substations-size | | EA | 188 |
| download_osm_data | total_time | Running time in seconds. | s | 2.67 |
| download_osm_data | mean_load | CPU usage percentage of the total running time. | % | 70.84 |
| download_osm_data | max_memory | Maximal Virtual Memory Size (VMS) in MB. | MB | 479.27 |
| clean_osm_data | generators-size | | EA | 115 |
| clean_osm_data | lines-size | | EA | 449 |
| clean_osm_data | lines-size_dc | | EA | 0 |
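As a worked example of reading the table, the ratio of the element counts after `clean_osm_data` to those after `download_osm_data` can be computed from the values above. This only compares counts; whether the difference comes from filtering or from merging elements is not shown by these statistics.

```python
# Element counts from the table above (NG run)
downloaded = {"generators": 425, "lines": 645}
cleaned = {"generators": 115, "lines": 449}

# Share of elements remaining after cleaning, per element type
ratios = {element: cleaned[element] / n for element, n in downloaded.items()}

for element, ratio in ratios.items():
    print(f"{element}: {cleaned[element]} / {downloaded[element]} = {ratio:.1%}")
# generators: 115 / 425 = 27.1%
# lines: 449 / 645 = 69.6%
```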
