# Analysis of the generated daily averaged profiles

This notebook shows how to plot the averaged daily profiles generated with RAMP and how to analyse the clustering provided by the ML model output. It also computes the profiles per household, using the household numbers estimated by the ML model.

In [None]:
import os
import re
import ast
import pandas as pd
import matplotlib.pyplot as plt
country_iso = ""

if country_iso == "":
    print("! --- Please provide a country iso3 --- !")

## Convert ML output from geojson file to csv

This step is only needed if the ML model output is provided in geojson format; otherwise you can skip it.

In [None]:
from generate_demand_profiles import prepare_appliance_count
prepare_appliance_count(country_iso)


## Load the ML model data 

Here the user should load the output file from the ML model, which provides the average number of appliances per household in each "admin2" region for the following appliances: Air conditioner, Air cooler, Electric cooker, Electric room heater, Electric water heater, Fan, Fridge, Home mechanical appliances (e.g., Mixer, Blender), Home thermal appliances (e.g., Kettle, Iron), Laptop / Computer, Light bulb, Mobile phone charger, Radio, Rice cooker, Sewing machine, Television and Washing machine. It also contains two columns labeled "cluster" and "num_hh". The "cluster" column contains an integer between 0 and 2 assigning the region to one of the clusters identified by the ML model. The "num_hh" column contains the estimated number of households in the "admin2" region.
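Before running the rest of the notebook, it can help to check that the loaded file has the structure described above. The following is a minimal sketch on toy data; the helper `check_ml_output` and the example values are hypothetical and not part of the project code.

```python
import pandas as pd

# Toy stand-in for the ML output; a real file has one column per appliance.
df_example = pd.DataFrame(
    {
        "shapeName": ["Region A", "Region B"],
        "Fan": [1.2, 0.4],
        "Light bulb": [3.5, 2.1],
        "cluster": [0, 2],
        "num_hh": [1500, 320],
    }
)


def check_ml_output(df, required=("cluster", "num_hh"), n_clusters=3):
    """Return a list of problems found in the ML output (hypothetical helper)."""
    problems = []
    missing = [col for col in required if col not in df.columns]
    if missing:
        # Without the required columns the other checks cannot run
        problems.append(f"missing columns: {missing}")
        return problems
    # Cluster labels are expected to be integers in [0, n_clusters - 1]
    allowed = set(range(n_clusters))
    bad_clusters = sorted(set(df["cluster"].unique().tolist()) - allowed)
    if bad_clusters:
        problems.append(f"unexpected cluster values: {bad_clusters}")
    # A household count of zero would break the per-household division later on
    if (df["num_hh"] <= 0).any():
        problems.append("num_hh contains non-positive values")
    return problems


problems = check_ml_output(df_example)
print(problems or "ML output looks consistent")
```

In the notebook, the same check could be applied to `df_ai` once it has been loaded.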

In [None]:
ml_appliance_count_file = f"{country_iso}_appliance_count.csv"  # This file contains the output of the ML model
adm1_col_name = "adm1"
adm2_col_name = "shapeName"

simulation_files_prefix = f"{country_iso}_all_intermediate"
simulation_files_prefix_avg = f"{country_iso}_all_intermediate_avg" # daily load profile averaged over one year
simulation_result_folder =  f"simulation_data_{country_iso}"

df_ai = pd.read_csv(ml_appliance_count_file)
df_ai

# Generate the demand load profiles

The values in the `unified_names` column of `ramp_config/Household_template.csv` should match the headers of all appliance count columns in the file `ml_appliance_count_file` defined in the cell above. The RAMP code should be the version located at https://github.com/RAMP-project/RAMP/tree/feature/loaddata_frame. The file `ramp_config/Household_template.csv` contains the times of use of the appliances used during the PeopleSun project (https://www.peoplesun.org/).

If you want to modify the time-of-use window of a given appliance you can use your own template as long as the `unified_names` column remains unmodified.
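The matching requirement between `unified_names` and the appliance count columns can be verified with a set comparison. This is a minimal sketch on toy data: the two small DataFrames stand in for the template and the ML output, and the list of non-appliance columns is an assumption made for illustration.

```python
import pandas as pd

# Toy stand-ins: in the notebook these would come from
# ramp_config/Household_template.csv and the ML appliance count file.
template = pd.DataFrame({"unified_names": ["Fan", "Fridge", "Light bulb"]})
appliance_count = pd.DataFrame(
    {
        "shapeName": ["Region A"],
        "Fan": [1.2],
        "Light bulb": [3.5],
        "cluster": [0],
        "num_hh": [1500],
    }
)

# Columns that do not hold appliance counts (assumed list for this sketch)
non_appliance_cols = {"shapeName", "adm1", "cluster", "num_hh"}

template_names = set(template["unified_names"])
appliance_cols = set(appliance_count.columns) - non_appliance_cols

# Names present on one side only point to a mismatch to fix before simulating
missing_in_ml = sorted(template_names - appliance_cols)
missing_in_template = sorted(appliance_cols - template_names)
print("In template but not in ML output:", missing_in_ml)
print("In ML output but not in template:", missing_in_template)
```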

*NB*: It is recommended to generate the demand profiles with a script rather than within the notebook as it can be a lengthy process depending on the number of admin2 regions. You can simply run 

In [None]:
from generate_demand_profiles import process_household_data

if adm1_col_name not in df_ai.columns:
    df_ai[adm1_col_name] = "dummy"



process_household_data(
    df_ai,
    adm1_col=adm1_col_name,
    adm2_col=adm2_col_name,
    ramp_template_path="ramp_config/Household_template.csv",
    output_prefix=simulation_files_prefix,
    output_dir=simulation_result_folder,
)

## Collect the simulation output files

Search the provided folder for all files corresponding to the daily averaged profiles.

In [None]:
from analyse_demand_profiles import collect_profiles_path
sorted_files = collect_profiles_path(folder=simulation_result_folder, output_prefix=simulation_files_prefix_avg)
sorted_files

## Merge the simulated profiles in one dataframe

In [None]:
# Read and merge DataFrames on index
merged_df = pd.DataFrame()

for file_path in sorted_files:
    df = pd.read_csv(file_path, index_col=0)  # use index from file
    if merged_df.empty:
        merged_df = df
    else:
        merged_df = merged_df.join(df, how='outer')  # or 'inner' if strict alignment needed

print(merged_df.head())
daily_averaged_profiles = merged_df

## Figure of daily averaged profiles for each region

In [None]:
ax = daily_averaged_profiles.plot(legend=False, ylabel="Power (W)", xlabel="Hours of the day", title=f"All RAMP daily averaged profiles for {country_iso}")
ax.set_xticks([0, 240, 480, (60 * 12), (60 * 16), (60 * 20), (60 * 24)])
ax.set_xticklabels([0, 4, 8, 12, 16, 20, 24])

## Divide each admin2 region's profile by the estimated number of households within the area
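The normalization in this section can be illustrated on toy data. The sketch below assumes, as the notebook code suggests, that each profile column is labeled with a stringified `(adm2, adm1)` tuple; the region names and values are made up for illustration.

```python
import ast

import pandas as pd

# Toy profiles: column labels are stringified (adm2, adm1) tuples
profiles = pd.DataFrame(
    {
        "('Region A', 'North')": [100.0, 200.0],
        "('Region B', 'South')": [30.0, 60.0],
    }
)
households = pd.DataFrame(
    {
        "shapeName": ["Region A", "Region B"],
        "adm1": ["North", "South"],
        "num_hh": [50, 10],
    }
)

# Look-up of household numbers keyed by (adm2, adm1)
num_hh = households.set_index(["shapeName", "adm1"])["num_hh"]

# Build a divisor aligned on the profile columns, then divide column-wise
divisor = pd.Series(
    [num_hh.loc[ast.literal_eval(col)] for col in profiles.columns],
    index=profiles.columns,
)
per_household = profiles.div(divisor, axis="columns")
print(per_household)
```

Each column of `per_household` then holds the average power of a single household in that region.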

In [None]:
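normalized_profiles = daily_averaged_profiles.copy()
for region in normalized_profiles.columns:
    # Column labels are stringified (adm2, adm1) tuples
    adm2, adm1 = ast.literal_eval(region)
    # Estimated number of households of the matching admin2 region
    num_hh = df_ai.loc[(df_ai[adm2_col_name] == adm2) & (df_ai[adm1_col_name] == adm1), "num_hh"].values[0]
    normalized_profiles[region] = daily_averaged_profiles[region] / num_hh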

## Figure of daily averaged profiles per household for each region

In [None]:
ax = normalized_profiles.plot(legend=False, ylabel="Power (W)", xlabel="Hours of the day", title=f"All RAMP daily averaged profiles per household for {country_iso}")
ax.set_xticks([0, 240, 480, (60 * 12), (60 * 16), (60 * 20), (60 * 24)])
ax.set_xticklabels([0, 4, 8, 12, 16, 20, 24])

In [None]:
fig = ax.get_figure()
fig.savefig(f"All_profiles_{country_iso}.png")

## Figure grouping the profiles per cluster_num

The ML algorithm has clustered the regions into up to 3 clusters; this figure shows the per-household profiles of each cluster in a separate panel.

In [None]:
fig,axes = plt.subplots(1,3,figsize=(14, 7), sharey=True)

colors = ['#56B4E9', '#E69F00', '#009E73']


for cluster_num, color in zip([0,1,2],colors):
    temp = df_ai.loc[df_ai.cluster == cluster_num,[adm2_col_name,adm1_col_name]]
    print(f"{len(temp)} regions with cluster num = {cluster_num}")
    if not temp.empty:
        keys = []
        for adm2, adm1 in zip(temp[adm2_col_name], temp[adm1_col_name]):
            keys.append(str((adm2, adm1)))
        normalized_profiles[keys].plot(
            ax=axes[cluster_num], 
            legend=False,
            color=color,
            ylabel="Power (W)",
            title=f"Cluster num={cluster_num}"
        )


for ax in axes:
    ax.set_xticks([0, 240, 480, (60 * 12), (60 * 16), (60 * 20), (60 * 24)])
    ax.set_xticklabels([0, 4, 8, 12, 16, 20, 24])
axes[1].set_xlabel("Hours of the day")




In [None]:
fig.savefig(f"cluster_comparison_{country_iso}.png")

## Compute indicators

Here the `adm1_col` and `adm2_col` arguments should match the names of the columns holding the admin 1 and admin 2 level names in the ML output file, which is expected to be named "{country_iso}_appliance_count.csv". As the operation can take some time for countries with a large number of regions, the indicator results are stored in the file "{country_iso}_demand_profile_stats.csv" for convenience.
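The indicator columns used later in this notebook are named `mean`, `max` and `sum`, together with their per-household counterparts `hh_mean`, `hh_max` and `hh_sum`. The sketch below shows one plausible way to compute such indicators on toy data; it is an assumption made for illustration, and the definitions used inside `post_process` may differ.

```python
import pandas as pd

# Toy profiles (one column per region) and household numbers
profiles = pd.DataFrame(
    {"Region A": [100.0, 200.0, 300.0], "Region B": [30.0, 60.0, 90.0]}
)
num_hh = pd.Series({"Region A": 50, "Region B": 10})

# One row per region, one column per indicator
stats = profiles.agg(["mean", "max", "sum"]).T

# Per-household variants: divide each region's indicators by its household number
per_hh = stats.div(num_hh, axis="index").add_prefix("hh_")

stats = stats.join(per_hh)
stats["num_hh"] = num_hh
print(stats)
```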

In [None]:
from analyse_demand_profiles import post_process, bind_geometry

stats_df = post_process(
    simulation_result_folder,
    output_prefix=simulation_files_prefix,
    adm1_col=adm1_col_name,
    adm2_col=adm2_col_name,
    output_fname=f"{country_iso}_demand_profile_stats.csv",
    ml_output_path=ml_appliance_count_file
)

## Assign geometry shapes to each region from external source

If the user would like to display the results on a map, they can provide a geojson or shape file. The demand profiles will be matched to the regions. At this step, there can be mismatches between profiles and geometries, caused by different numbers of regions or different name formats for the regions (FR vs EN, for example).

In case of different names, one should provide a mapping between the admin 2 region names provided within the output of ML model and the user provided geojson or shape files.
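Such a mapping can be expressed as a plain dictionary and applied with pandas before matching the geometries. This is a minimal sketch with made-up region names; the column `geom_name` and the dictionary `name_mapping` are hypothetical and only serve the illustration.

```python
import pandas as pd

# Toy indicator table using the region names from the ML output
stats_example = pd.DataFrame(
    {"shapeName": ["North Region", "Centre"], "hh_mean": [120.0, 80.0]}
)
# Region names as they appear in the user-provided geometry file
geometry_names = ["Région Nord", "Centre"]

# ML output name -> name used in the geometry file (only differing names are listed)
name_mapping = {"North Region": "Région Nord"}

# Names absent from the mapping are left unchanged by replace()
stats_example["geom_name"] = stats_example["shapeName"].replace(name_mapping)

# Regions that still have no counterpart in the geometry file
unmatched = sorted(set(stats_example["geom_name"]) - set(geometry_names))
print("Unmatched regions:", unmatched)
```

An empty `unmatched` list indicates that every region of the ML output has a counterpart in the geometry file.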

In [None]:
stats_gdf = bind_geometry(
    stats_df, 
    geojson_shapes="geoBoundaries-NER-ADM2.geojson", 
    adm1_col=None, 
    adm2_col=adm2_col_name, # default admin 2 level region names from ML model (geoBoundaries)
    adm1_col_geom=None, 
    adm2_col_geom="shapeName"  # Here provide the name of the column containing admin 2 level region names from your source
)

# Exporting data for further use in a webmap

The code for a webmap is available at https://github.com/rl-institut/AI4EA/tree/webmap. This code requires some data to visualize, which can be generated in the next two cells.

## Save indicators to a geojson 

In [None]:
stats_gdf.to_file(f"{country_iso}_stats.geojson", driver="GeoJSON")

## Save timeseries to a netcdf file

**Caution**: for a large number of time series, this operation can exceed your available RAM.
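A rough lower bound of the memory needed can be estimated before building the dataset, assuming one 8-byte float per location and time step. The helper `estimate_size_mb` and the number of regions below are hypothetical; intermediate objects such as the stacked dataframe need additional memory on top of this estimate.

```python
def estimate_size_mb(n_locations, n_timesteps, bytes_per_value=8):
    """Lower-bound size in MB of a dense (location, time) array of floats."""
    return n_locations * n_timesteps * bytes_per_value / 1e6


n_locations = 500              # hypothetical number of admin2 regions
minutes_per_day = 24 * 60      # one value per minute for a daily profile
minutes_per_leap_year = 366 * minutes_per_day

print(f"Daily profiles: {estimate_size_mb(n_locations, minutes_per_day):.2f} MB")
print(f"Full leap year: {estimate_size_mb(n_locations, minutes_per_leap_year):.2f} MB")
```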

In [None]:
# Single-day index for the daily averaged profile (2024-01-01 is used as a placeholder date)
start = '2024-01-01 00:00'
end = '2024-01-01 23:59'

# Create datetime index at 1-minute frequency
time = pd.date_range(start=start, end=end, freq='min')

merged_df.index = time
ds = merged_df.stack().reset_index()
ds.columns = ['time', 'location', 'value']
xr_ds = ds.set_index(['location', 'time']).to_xarray()
print(f"Will occupy {xr_ds.nbytes/1e6} MB")
xr_ds.to_netcdf(f"{country_iso}_timeseries_daily_avg.nc")

    


## Analyse RAMP profile indicators on a map


In [None]:
def highlight_outliers(df, gdf, col,num_outliers=10,ascending=False):
    outliers = df.reset_index().sort_values(by=col, ascending=ascending).iloc[:num_outliers]
    print(outliers[[adm2_col_name, adm1_col_name, col]])
    outliers = outliers[[adm2_col_name, adm1_col_name]]
    # Match on (adm2, adm1) pairs so that only the exact outlier regions are selected
    single_location = gdf.merge(outliers, on=[adm2_col_name, adm1_col_name])
    ax = gdf.plot()
    ax = single_location.plot(ax=ax, color='none', edgecolor='red', linewidth=2)
    return ax

def highlight_upper_outliers(df, gdf, col,num_outliers=10):
    return highlight_outliers(df, gdf, col, num_outliers=num_outliers, ascending=False)

def highlight_lower_outliers(df, gdf, col,num_outliers=10):
    return highlight_outliers(df, gdf, col, num_outliers=num_outliers, ascending=True)

gdf = stats_gdf.to_crs(epsg=3857)


def make_chloropleth_map(gdf, col):
    fig, ax = plt.subplots(figsize=(10, 10))
    gdf.plot(
        column=col,
        cmap='OrRd',
        linewidth=0.8,
        edgecolor='0.8',
        legend=True,
        ax=ax
    )
    
    # Style
    plt.title(f"Choropleth Map of {col}")
    plt.axis('off')


## Look at the estimated household numbers

In [None]:
make_chloropleth_map(gdf, "num_hh")

## Look at the clusters from the ML model

In [None]:
col = "cluster"
fig, ax = plt.subplots(figsize=(10, 10))
gdf.plot(
    column=col,
    categorical=True,
    linewidth=0.8,
    edgecolor='0.8',
    legend=True,
    ax=ax
)

plt.title(f"Map of {col}")
plt.axis('off')

## Look at the indicators on a map

In [None]:
make_chloropleth_map(gdf, "hh_mean")

In [None]:
make_chloropleth_map(gdf, "mean")

In [None]:
highlight_upper_outliers(stats_df,gdf.reset_index(),"hh_mean")

In [None]:
highlight_lower_outliers(stats_df,gdf.reset_index(),"hh_mean")

In [None]:
make_chloropleth_map(gdf, "hh_max")

In [None]:
make_chloropleth_map(gdf, "max")

In [None]:
highlight_upper_outliers(stats_df,gdf.reset_index(),"hh_max")

In [None]:
highlight_upper_outliers(stats_df,gdf.reset_index(),"max")

In [None]:
make_chloropleth_map(gdf, "hh_sum")

In [None]:
make_chloropleth_map(gdf, "sum")

In [None]:
make_chloropleth_map(gdf, "cluster")