# Setting up the empirically based household load profiles

Goal: Set up 500 household profiles based on empirical or semi-empirical data for the year 2019.

## Components

**Heat pumps**: Water-to-water heat pump load profiles, based on Schlemminger et al. (2022)'s real-world profiles. Enriched to 500 profiles through the approach of Semmelmann et al. (2023).

**Household data**: Real-world household load profiles from Schlemminger et al. (2022), matched to the observed days that underlie each heat pump profile.

**Electric vehicles**: Real-world Norwegian electric vehicle charging profiles from Sørensen et al. (2021), preprocessed in `00 Preprocessing_EV_data.ipynb`. To obtain 500 profiles, each household is assigned a randomly chosen EV profile that is shifted by a random whole number of weeks in [-4, 4]. Profiles are shifted by full weeks only, so that the time of day and the day of the week of every charging session are preserved.
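
The week-wise shift can be illustrated with a minimal sketch on toy data. The helper name `shift_profile_by_weeks` and the three-week toy series are assumptions made for illustration; the notebook's own cell additionally zeroes out a charging session that would be split by the shift, which is omitted here.

```python
import numpy as np
import pandas as pd

HOURS_PER_WEEK = 7 * 24

def shift_profile_by_weeks(profile: pd.Series, weeks: int) -> pd.Series:
    """Circularly shift an hourly profile by a whole number of weeks (hypothetical helper)."""
    # np.roll wraps values around the end of the series, so no energy is lost
    shifted = np.roll(profile.to_numpy(), weeks * HOURS_PER_WEEK)
    return pd.Series(shifted, index=profile.index, name=profile.name)

# Toy hourly series covering three weeks; each value is its own position
index = pd.date_range("2019-01-07", periods=3 * HOURS_PER_WEEK, freq=pd.Timedelta(hours=1))
profile = pd.Series(np.arange(len(index)), index=index)

shifted = shift_profile_by_weeks(profile, weeks=1)
```

Because the shift is a multiple of 168 hours, every value keeps its hour of the week, which is the property the full-week restriction is meant to guarantee.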

**BESS sizes**: Power ratings and capacities of household battery energy storage systems (BESS) are taken from Semmelmann et al. (2024), based on a realistic sample of German residential households.

**PV load data**: From renewables.ninja for Hamelin in 2019, the city from which the heat pump and household load profiles are obtained, based on Pfenninger and Staffell (2016). PV rated power is set equal to BESS capacity / 1 h, corresponding to average German households as depicted in Truong et al. (2016).
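
As a minimal sketch of this sizing rule, the normalized PV profile (per 1 kW rated power) can be scaled by a household's PV power. The function names `size_pv_kw` and `scale_pv_profile` and the toy values are assumptions for illustration, not part of the notebook.

```python
import pandas as pd

def size_pv_kw(bess_capacity_kwh: float) -> float:
    """PV rated power in kW, sized as BESS capacity / 1 h (hypothetical helper)."""
    return bess_capacity_kwh / 1.0  # kWh / 1 h = kW

def scale_pv_profile(normalized_profile: pd.Series, pv_power_kw: float) -> pd.Series:
    """Scale a per-1-kW generation profile to the household's rated PV power."""
    return normalized_profile * pv_power_kw

# Toy normalized generation values (output per kW of rated power)
normalized = pd.Series([0.0, 0.25, 0.5])
scaled = scale_pv_profile(normalized, size_pv_kw(7.5))
```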

## Output

The following output files are generated:

- HP profiles
- HH profiles
- PV profile normalized to 1 kW rated power
- EV profiles
- Household config, containing for each of the 500 households:
  - A) HP profile column
  - B) HH profile column
  - C) BESS capacity (drawn based on distribution)
  - D) BESS rated power (derived from capacity)
  - E) PV power (derived from capacity)
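
How columns C) to E) come about can be sketched on a toy configuration of five households. The seed, the toy consumption values, and the use of `numpy.random.Generator.choice` are assumptions for illustration (the notebook itself draws with `random.choices`); the capacity shares and the C-rate of 0.41 are the values used further below.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # assumed seed, only to make the sketch repeatable

capacities_kwh = np.array([2.5, 5.0, 7.5, 10.0])
shares_percent = np.array([6.7, 37.2, 31.5, 24.6])
average_c_rate = 0.41

# Toy households with made-up yearly consumption values
toy_config = pd.DataFrame(
    {"total_yearly_consumption": [4200.0, 9100.0, 6300.0, 3000.0, 7800.0]}
)

# Draw one capacity per household from the empirical distribution
drawn = rng.choice(
    capacities_kwh, size=len(toy_config), p=shares_percent / shares_percent.sum()
)

# Rank matching: the largest consumer receives the largest drawn capacity
toy_config = toy_config.sort_values("total_yearly_consumption", ascending=False)
toy_config["bess_capacity"] = np.sort(drawn)[::-1]
toy_config["bess_power"] = toy_config["bess_capacity"] * average_c_rate
toy_config["pv_power"] = toy_config["bess_capacity"]  # kWh / 1 h = kW
```

Sorting both sides in descending order before assigning is what ties system size to consumption; the draw itself stays random.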


## Sources

- Semmelmann, L., Jaquart, P., & Weinhardt, C. (2023). Generating synthetic load profiles of residential heat pumps: a k-means clustering approach. Energy Informatics, 6(Suppl 1), 37.
- Schlemminger, M., Ohrdes, T., Schneider, E., & Knoop, M. (2022). Dataset on electrical single-family house and heat pump load profiles in Germany. Scientific data, 9(1), 56.
- Semmelmann, L., Konermann, M., Dietze, D., & Staudt, P. (2024). Empirical field evaluation of self-consumption promoting regulation of household battery energy storage systems. Energy Policy, 194, 114343.
- Sørensen, Å. L., Lindberg, K. B., Sartori, I., & Andresen, I. (2021). Residential electric vehicle charging datasets from apartment buildings. Data in Brief, 36, 107105.
- Pfenninger, S., & Staffell, I. (2016). Long-term patterns of European PV output using 30 years of validated hourly reanalysis and satellite data. Energy, 114, 1251-1265.
- Truong, C. N., Naumann, M., Karl, R. C., Müller, M., Jossen, A., & Hesse, H. C. (2016). Economics of residential photovoltaic battery systems in Germany: The case of Tesla’s Powerwall. Batteries, 2(2), 14.


In [1]:
# IMPORTS
import datetime
import random
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:
# GENERAL SETUP
amount_households = 500
households = [] # empty list of dictionaries for household information

In [3]:
# HEAT PUMP LOADS: Iterating over given synthetic household profiles for Hamelin in 2019
df_heat_pumps = pd.read_pickle("./input/2019 Hamelin 500 HP.pkl") # has been previously generated with https://heatpump.ninja/ 
for i in range(amount_households):
    households.append({"heat_pump_profile":df_heat_pumps.columns[i],"hp_yearly_consumption":df_heat_pumps[df_heat_pumps.columns[i]].sum()/1000,"household_profile":df_heat_pumps.columns[i]})

In [None]:
# HOUSEHOLD LOADS
'''
The underlying heat pump loads are based on Schlemminger et al. (2022). 
However, they are randomly shuffled according to observed temperatures, following the methodology from Semmelmann et al. (2023).
Hence, in this cell, the corresponding household loads from the observed days are matched and inserted in their own dataframe.
'''

df_households_source = pd.read_pickle("./input/1920 Final Data w. Additional Features HH Hourly Agg.pkl")
df_heat_pumps_source = pd.read_pickle("./input/1920 Final Data w. Additional Features HP Hourly Agg.pkl")
df_heat_pumps_source_24th_rows = df_heat_pumps_source.iloc[::24, :]

heat_pump_loads_target = df_heat_pumps
heat_pump_loads_target_24th_rows = heat_pump_loads_target.iloc[::24, :]

df_households_target = df_heat_pumps.copy() # copy heat pump dataframe as target

for total_col in df_households_target.columns: # iterating over the heat pump loads, which are associated with specific households
    print(total_col)
    heat_pump_loads_target_24th_rows_spec = heat_pump_loads_target_24th_rows[total_col]
    col = "SFH"+total_col.split("SFH")[1]
    for i in range(0, len(heat_pump_loads_target_24th_rows_spec)): # look up on which day the heat pump loads were observed
        first_hp_load = heat_pump_loads_target_24th_rows_spec.iloc[i]
        for idx,val in enumerate(df_heat_pumps_source_24th_rows[col].values):
            if first_hp_load == val:
                target_values = df_households_source[col].iloc[(idx*24):(idx+1)*24] # find the corresponding household loads
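                df_households_target.iloc[i*24:(i+1)*24, df_households_target.columns.get_loc(total_col)] = target_values.values # insert them positionally: iloc excludes the end point, so exactly 24 hourly rows are written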

df_households_target.to_pickle("./input/preprocessed/2019 Hamelin 500 HH.pkl")
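
The day-matching idea of the cell above can also be sketched with a dictionary lookup instead of the inner loop. This is a minimal sketch on toy arrays, not the notebook's implementation: the function name `match_household_days` and the toy data are assumptions, and unlike the loop above it raises a `KeyError` when a day has no match instead of leaving the heat pump values in place.

```python
import numpy as np

def match_household_days(hp_target, hp_source, hh_source):
    """Copy the household load of the source day whose first hourly HP value matches.

    All inputs are 1-D arrays of hourly values whose length is a multiple of 24.
    """
    # First hourly HP value of each source day -> position of that day.
    # If several days share the same first value, the last one wins.
    first_value_to_day = {value: day for day, value in enumerate(hp_source[::24])}

    hh_target = np.empty(len(hp_target), dtype=float)
    for day, first_value in enumerate(hp_target[::24]):
        source_day = first_value_to_day[first_value]  # exact match on the first hour
        hh_target[day * 24:(day + 1) * 24] = hh_source[source_day * 24:(source_day + 1) * 24]
    return hh_target

# Toy source data: three days of hourly heat pump and household loads
hp_source = np.arange(72, dtype=float)
hh_source = np.arange(72, dtype=float) * 10

# Toy target: the same heat pump days in shuffled order (day 2, day 0, day 1)
day_order = [2, 0, 1]
hp_target = np.concatenate([hp_source[d * 24:(d + 1) * 24] for d in day_order])
expected = np.concatenate([hh_source[d * 24:(d + 1) * 24] for d in day_order])

hh_target = match_household_days(hp_target, hp_source, hh_source)
```

Like the loop above, the lookup relies on exact equality of the first hourly value of a day, so it only works when the target profiles are unmodified copies of observed days.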

In [None]:
# Enriching the household characteristics dictionary with information: calculate households consumption and aggregate consumption:
for household in households:
    yearly_sum_hh = df_households_target[household["household_profile"]].sum()/1000
    aggregate_consumption = household["hp_yearly_consumption"]+yearly_sum_hh
    household.update({"hh_yearly_consumption":yearly_sum_hh,"total_yearly_consumption":aggregate_consumption}) # dict.update works in place and returns None

In [None]:
# ELECTRIC VEHICLE LOADS: Matching EV profiles to households and saving the shifted profiles
# Read EV data with multi-column index
ev_df = pd.read_csv("./input/preprocessed/Hourly_EV_Charging.csv", sep=";", header=[0,1], index_col=0)
ev_df.index = pd.to_datetime(ev_df.index)
print("Number of rows in ev_df: ", len(ev_df))

# Determine unique users
ev_users = ev_df.columns.get_level_values(0).unique()
print("EV user IDs: ", list(ev_users))

In [None]:
# Match an EV profile to each household and determine by how many weeks the EV profile is shifted
for household in households:
    shift_weeks = random.randint(-4, 4)
    household.update({"ev_col": random.choice(ev_users), "ev_shift_weeks": shift_weeks}) # dict.update works in place and returns None

In [None]:
# Create a new dataframe with the shifted EV profiles
assert (ev_df.index == df_heat_pumps.index).all(), "Indices of EV and heat pump dataframes do not match"
level_0 = df_heat_pumps.columns
level_1 = ev_df.columns.get_level_values(1).unique()
df_ev_target = pd.DataFrame(index=ev_df.index, columns=pd.MultiIndex.from_product([level_0, level_1], names=["household", "ev_info"]))

# Iterate over all households
for idx, col in enumerate(level_0):
    ev_col = households[idx]["ev_col"]
    time_steps_shift = households[idx]["ev_shift_weeks"] * 7 * 24  # integer number of time steps to shift the EV profile
    idx_shift = ev_df.index[-time_steps_shift]  # corresponding time stamp at which the EV profile is split

    # EV profile of the matched user
    ev_df_u = ev_df.loc[:, ev_col].copy()  # copy the EV profile to avoid modifying the original dataframe

    # Check whether the split due to shifting is in the middle of a charging session;
    # this charging session will be neglected;
    # reasoning: if we split it, there may be a high charging peak in time step 0 in the uncontrolled case (all EVs for which charging session has been split are charged at the same time)
    if ev_df_u.loc[idx_shift, "Wh"] > 0 and ev_df_u.loc[idx_shift, "start"] == 0:
        # Find the start and end of the charging session
        idx_start = ev_df_u.loc[:idx_shift][ev_df_u.loc[:idx_shift, "start"] == 1].index[-1]
        idx_end = idx_start + datetime.timedelta(hours=float(ev_df_u.loc[idx_start, "hours_until_end"]))  # cast to float, since timedelta rejects NumPy integer types
        # Set the charging session to zero
        ev_df_u.loc[idx_start:idx_end] = 0  # idx_end is included

    shifted_ev_df = pd.concat([ev_df_u[idx_shift:], ev_df_u[:idx_shift - datetime.timedelta(hours=1)]]).reset_index(drop=True)
    df_ev_target[col] = shifted_ev_df.values

In [None]:
# Save the shifted EV profiles
df_ev_target = df_ev_target.astype(np.float32) # reduce memory usage
df_ev_target.to_pickle("./input/preprocessed/2019 Hamelin 500 EV.pkl")

In [None]:
# Turn household characteristics into a dataframe
config = pd.DataFrame(households)
config

In [None]:
# BESS CAPACITY AND PV POWER: Draw from distribution BESS capacity, PV sized at BESS Capacity / 1h

capacities = [2.5, 5, 7.5, 10] # in kWh
probabilities = [6.7, 37.2, 31.5, 24.6] # in percent, distributions from field study in Semmelmann et al. (2024)


def draw_bess_capacity(capacities, probabilities):
    total = sum(probabilities)
    normalized_probabilities = [p / total for p in probabilities]

    return random.choices(capacities, weights=normalized_probabilities, k=1)[0]

drawn_capacities = np.zeros(amount_households) # float array of drawn capacities, so that 2.5 and 7.5 kWh are not truncated to integers

for i in range(amount_households):
    capacity = draw_bess_capacity(capacities, probabilities)
    drawn_capacities[i] = capacity

drawn_capacities = np.sort(drawn_capacities)[::-1] # sort randomly drawn BESS capacities in descending order

In [None]:
# To realistically match pv and bess sizes to households, we sort the households by their total hh + hp consumption and then match pv sizes
config.sort_values("total_yearly_consumption",inplace=True,ascending=False)

average_c_rate = 0.41

config["bess_capacity"] = drawn_capacities
config["bess_power"] = config["bess_capacity"]*average_c_rate
config["pv_power"] = drawn_capacities

config

In [None]:
# Save final household configuration
config.to_pickle("./input/preprocessed/2019 Hamelin Household Configuration.pkl")

# Visualization of Case Setup

In [None]:
# AGGREGATED DATA OVER ONE YEAR
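df_ev = pd.read_pickle("./input/preprocessed/2019 Hamelin 500 EV.pkl") # written to ./input/preprocessed/ by the cells above
df_hh = pd.read_pickle("./input/preprocessed/2019 Hamelin 500 HH.pkl") # written to ./input/preprocessed/ by the cells above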
df_hp = pd.read_pickle("./input/2019 Hamelin 500 HP.pkl")

In [None]:
# Transforming multi level ev data
wh_columns = [col for col in df_ev.columns if 'Wh' in col[1]]  
wh_data = df_ev[wh_columns]
wh_data_sum = wh_data.sum(axis=1)

In [None]:
# Weekly aggregation; weekly bins labelled in January 2020 are dropped so that the plot ends in 2019
plt.rcParams.update({'font.size': 17})
plt.rcParams.update({'axes.grid': False})  # Disable the grid

consumption_df = pd.DataFrame()
consumption_df.index = pd.to_datetime(df_hh.index)
consumption_df["Households"] = df_hh.sum(axis=1).values
consumption_df["Electric vehicles"] = wh_data_sum
consumption_df["Heat pumps"] = df_hp.sum(axis=1).values
consumption_df = consumption_df / 1_000_000  # Wh -> MWh
weekly_consumption = consumption_df.resample("W").sum()
weekly_consumption = weekly_consumption[weekly_consumption.index <= "2019-12-31"]
weekly_consumption = weekly_consumption[weekly_consumption.index >= "2019-01-01"]

ax = weekly_consumption.plot.area(color=["dimgray", "limegreen", "dodgerblue"], figsize=(12, 5))
ax.set_ylabel("Weekly Energy Demand [MWh]", fontsize=17)
ax.set_xlabel("Time", fontsize=17)
ax.set_xlim([weekly_consumption.index.min(), weekly_consumption.index.max()])
ax.tick_params(axis='both', which='major', labelsize=17)
ax.legend(fontsize=17)

plt.show()
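
Why the weekly series has to be cut at the end of 2019 can be seen in a small sketch: `resample("W")` labels each bin with the Sunday that closes the week, so the last two days of 2019 end up in a bin labelled in January 2020. The toy series of ones is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Toy hourly series for all of 2019 with a constant value of 1 per hour
index = pd.date_range("2019-01-01", "2019-12-31 23:00", freq=pd.Timedelta(hours=1))
hourly = pd.Series(np.ones(len(index)), index=index)

# Weekly bins are labelled with the Sunday that ends the week
weekly = hourly.resample("W").sum()

# Keep only bins whose label lies in 2019
weekly_2019 = weekly[weekly.index <= "2019-12-31"]
```

Here `weekly.index[-1]` falls in 2020 although the bin only holds the hours of 30 and 31 December 2019; dropping it removes a visibly lower final week from the plot.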

In [None]:
# ORIGINAL CONSUMPTION ON THE FOUR TYPE DAYS
plt.rcParams.update({'font.size': 17})
dates = ['2019-03-03', '2019-11-18', '2019-12-01', '2019-05-13']
titles = ['Peak inflexible load day', 'Peak electric vehicle day', 'Peak heat pump day', 'Peak feed-in day']

fig, axes = plt.subplots(2, 2, figsize=(20, 12))  # 2x2 grid of subplots

for i, (date, title) in enumerate(zip(dates, titles)):
    day_start = pd.to_datetime(date).tz_localize('UTC')
    day_end = day_start + pd.Timedelta(days=1) - pd.Timedelta(seconds=1)
    
    day_data = consumption_df.loc[day_start:day_end]
    ax = axes[i // 2, i % 2] 
    day_data.plot.area(ax=ax, color=["dimgray", "limegreen", "dodgerblue"], alpha=0.8)

    ax.set_title(title, fontsize=17)
    ax.set_ylabel("Hourly Energy Demand [MWh]", fontsize=17)
    ax.set_xlabel("Time", fontsize=17)
    ax.set_ylim([0, 1.5])
    ax.grid(False)  

plt.tight_layout()
plt.show()