# Twomes interactive inverse grey-box analysis pipeline

This JupyterLab notebook can be used to interactively test the Twomes inverse grey-box analysis pipeline, accessing data from a Twomes database (see also [more information on how to set up a Twomes server](https://github.com/energietransitie/twomes-backoffice-configuration#jupyterlab)).
Don't forget to install the requirements listed in [requirements.txt](../requirements.txt) first!



## Setting the stage

First, several imports and variables need to be defined.


### Imports and generic settings

In [None]:
from datetime import datetime, timedelta
import pytz
import math
import pylab as plt

import pandas as pd
import numpy as np

import sys
sys.path.append('../data/')
sys.path.append('../view/')
sys.path.append('../analysis/')

%load_ext autoreload

%matplotlib widget
from plotter import Plot
from filewriter import ExcelWriter as ex

from extractor import WeatherExtractor, Extractor, Period

from inversegreyboxmodel import Learner

import logging
logger = logging.getLogger('Twomes data extraction')
logger.setLevel(logging.NOTSET)

### Analysis settings

- which `moving_horizon_duration_d` should be used for the analysis
- and various other global parameters

In [None]:
n_std_outliers = 3.0 # default multiplier of the standard deviation; values further out than this many standard deviations are removed as outliers during preprocessing
up_intv = '5min' # the default upsampling interval that is used before interpolation is done
gap_n_intv = 11 # the default maximum number of consecutive NaNs to fill (one for each upsampling interval), i.e. valid measurement values up to (11+1) * 5 min = 1 hour apart will be bridged by interpolation, but not more
sampling_interval = '15min' # the default interval on which interpolation will be done during preprocessing
moving_horizon_duration_d = 7
required_columns_for_sanity = ['home_id', 'T_out_e_avg_C', 'irradiation_hor_avg_W_p_m2', 'T_in_avg_C', 'gas_sup_avg_W', 'e_remaining_heat_avg_W', 'interval_s']
sanity_threshold = 0.9
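
To make the interplay of these settings concrete, here is a minimal pandas-only sketch on synthetic data. It is not the Twomes preprocessing code: the made-up temperature series, the helper names (`run_id`, `run_len`, `too_long`) and the choice to leave over-long gaps entirely unfilled are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

n_std_outliers = 3.0
up_intv = '5min'
gap_n_intv = 11
sampling_interval = '15min'

# Synthetic indoor temperature on a 5-minute grid (made-up values, 00:00 to 04:00)
idx = pd.date_range('2022-01-03 00:00', periods=49, freq='5min')
s = pd.Series(20.0 + 0.01 * np.arange(49), index=idx, name='T_in_avg_C')
s.iloc[5] = 100.0            # one implausible spike
s = s.drop(idx[10:21])       # short gap: 11 missing values (neighbours 1 hour apart)
s = s.drop(idx[25:45])       # long gap: 20 missing values

# 1. Remove values further than n_std_outliers standard deviations from the mean
z = (s - s.mean()).abs() / s.std()
s = s[z <= n_std_outliers]

# 2. Upsample to the regular up_intv grid; missing measurements become NaN
up = s.resample(up_intv).mean()

# 3. Measure the length of each run of consecutive NaNs
is_nan = up.isna()
run_id = is_nan.astype(int).diff().ne(0).cumsum()
run_len = is_nan.astype(int).groupby(run_id).transform('sum')
too_long = is_nan & (run_len > gap_n_intv)

# 4. Interpolate linearly in time, but keep gaps longer than gap_n_intv as NaN
filled = up.interpolate(method='time').where(~too_long)

# 5. Downsample to the sampling interval used for the analysis
resampled = filled.resample(sampling_interval).mean()

print(filled.isna().sum(), 'values left unfilled at', up_intv)
print(resampled.isna().sum(), 'intervals left empty at', sampling_interval)
```

Note that calling `interpolate(limit=gap_n_intv)` on its own would partially fill longer gaps, which is why this sketch masks them explicitly.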

### Defining which homes, which period 

- which `homes` should be analysed
- what the location and timezone is of those homes (currently, we only support one location and timezone for a batch of homes) 
- from which `first_day` to which `last_day` the analysis should run
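
As a small illustration of how the period boundaries are made timezone-aware, the sketch below localizes naive dates with `pytz` and converts them to UTC. The variable names `tz_homes` and `summer_day` are only illustrative. The pytz documentation recommends `localize()` over passing a pytz timezone as `tzinfo=` to the `datetime` constructor, because the latter does not work for many timezones.

```python
from datetime import datetime, timedelta
import pytz

tz_homes = pytz.timezone('Europe/Amsterdam')

# localize() attaches the offset that is valid on that particular date
first_day = tz_homes.localize(datetime(2022, 1, 3))   # winter time (CET)
summer_day = tz_homes.localize(datetime(2022, 5, 8))  # summer time (CEST)

print(first_day.isoformat())                        # 2022-01-03T00:00:00+01:00
print(summer_day.isoformat())                       # 2022-05-08T00:00:00+02:00

# The same instant expressed in the timezone of the database
print(first_day.astimezone(pytz.utc).isoformat())   # 2022-01-02T23:00:00+00:00
```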

In [None]:
#location: center of Assendorp neighbourhood in Zwolle
lat, lon = 52.50655, 6.09961

#timezone: 
timezone_database = 'UTC'
timezone_homes = 'Europe/Amsterdam'

# # # Below, the maximum period for data collection
# first_day = pytz.timezone(timezone_homes).localize(datetime(2021, 10, 25))
# last_day = pytz.timezone(timezone_homes).localize(datetime(2022, 5, 8))

# Alternatively, you may want to test things on a period of only a few weeks. This is a period with suitable weather and lots of homes with measurements.
first_day = pytz.timezone(timezone_homes).localize(datetime(2022, 1, 3))
last_day = pytz.timezone(timezone_homes).localize(datetime(2022, 1, 31))

# The full set of homes
# homes = [803422, 805164, 809743, 811308, 815925, 817341, 822479, 829947, 830088, 831062, 839440, 845966, 845997, 846697, 857477, 864296, 873985, 879481, 881611, 886307, 895671, 897349, 899510]

# # A subset of homes
# homes = [803422, 805164, 809743]

# single home for virtual homes
homes = [886307]


## Loading and geospatial interpolation of Dutch weather data

Using an external library installed via [requirements.txt](../requirements.txt), load and geospatially interpolate Dutch weather data.


In [None]:
%%time 
%autoreload 2
# get geospatially interpolated weather from KNMI
# for Twomes, the weather for all homes studied can be approximated by a single location
# get the dataframe only once for all homes to save time
tz_knmi='Europe/Amsterdam'

df_weather = WeatherExtractor.get_interpolated_weather_nl(first_day, last_day, lat, lon, tz_knmi, timezone_homes, sampling_interval)

### Check descriptive statistics about the weather data

In [None]:
df_weather.describe(include='all')

### Plot weather data

N.B. The resulting figure below can be manipulated interactively; hover with mouse for tips & tricks

In [None]:
logger.setLevel(logging.NOTSET)
Plot.temperature_and_power_one_home_plot('Weather in Assendorp, Zwolle',
                                df_weather,
                                temp_plot_dict = {'T_out_avg_C': 'orange', 'wind_avg_m_p_s': 'c', 'T_out_e_avg_C': 'b'},
                                temp_plot_2nd_list = ['wind_avg_m_p_s'],
                                power_plot_dict = {'irradiation_hor_avg_W_p_m2': 'y'},
                                power_plot_2nd_list = ['irradiation_hor_avg_W_p_m2']
                               )

## Getting time-interpolated home data from the Twomes database and combining it with weather data

In [None]:
%%time 


logger.setLevel(logging.INFO)

df_data_homes = Extractor.get_preprocessed_homes_data(homes, first_day, last_day, timezone_database, timezone_homes,
                                                      up_intv, gap_n_intv, sampling_interval, 
                                                      df_weather)
logger.setLevel(logging.NOTSET)


### Optional block to write interpolated data to a file

In [None]:
# filename_prefix = datetime.now().astimezone(pytz.timezone('Europe/Amsterdam')).replace(microsecond=0).isoformat().replace(":","")
# ex.write(df_data_homes, str('{0}-data_homes-{1}-{2}.xlsx'.format(filename_prefix, first_day.isoformat(),last_day.isoformat())))
# Extractor.write_home_data_to_csv(df_data_homes, str('{0}-data_homes-{1}-{2}.csv'.format(filename_prefix, first_day.isoformat(),last_day.isoformat())))

### Optional block to get interpolated data from virtual homes in CSV files and combine with weather data already obtained


In [None]:
# %%time 
# %autoreload 2
# logger.setLevel(logging.INFO)

# homes = [
#     60200, 
#     120100, 
#     150080, 
#     150100, 
#     200060, 
#     300040, 
#     400030, 
#     600020 
# ]

# # For virtual homes, only the following period is valid:
# first_day = pytz.timezone(timezone_homes).localize(datetime(2022, 1, 3))
# last_day = pytz.timezone(timezone_homes).localize(datetime(2022, 1, 24))

# df_data_homes = pd.DataFrame()
# for home_id in homes:
#     df_data_homes = pd.concat([df_data_homes, Extractor.get_virtual_home_data_csv(str('../data/virtualhome_P{0}.csv'.format(home_id)), timezone_homes)], axis=0)

# logger.setLevel(logging.NOTSET)


In [None]:
df_data_homes

### Present sanity metrics for the extracted data
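
One possible way to compute such a metric is sketched below on a toy dataframe. It assumes that "sanity" means the fraction of rows per home in which all required columns hold a valid (non-NaN) value. That definition, the shortened column list and the made-up values are assumptions for illustration, not necessarily what the pipeline itself does.

```python
import numpy as np
import pandas as pd

# Shortened, illustrative list; the notebook defines the full list in the analysis settings
required_columns_for_sanity = ['T_out_e_avg_C', 'T_in_avg_C', 'gas_sup_avg_W']
sanity_threshold = 0.9

# Toy data: two hypothetical homes with 10 rows each (made-up values)
df = pd.DataFrame({
    'home_id': [1] * 10 + [2] * 10,
    'T_out_e_avg_C': 5.0,
    'T_in_avg_C': 19.0,
    'gas_sup_avg_W': 1000.0,
})
df.loc[df.index[12:15], 'T_in_avg_C'] = np.nan   # home 2 misses 3 of its 10 rows

# A row counts as complete only if every required column is present
complete = df[required_columns_for_sanity].notna().all(axis=1)

# Fraction of complete rows per home, compared against the threshold
sanity_fraction = complete.groupby(df['home_id']).mean()
is_sane = sanity_fraction >= sanity_threshold

print(pd.DataFrame({'sanity_fraction': sanity_fraction, 'is_sane': is_sane}))
```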

### Optional block to write the extracted data to a CSV file

N.B. In a future version, we may switch to the Apache Parquet format.

In [None]:
# %%time 
# from tqdm import tqdm_notebook

# %autoreload 2
# filename_prefix = datetime.now().astimezone(pytz.timezone('Europe/Amsterdam')).replace(microsecond=0).isoformat().replace(":","")

# first_day = pytz.timezone(timezone_homes).localize(datetime(2021, 10, 25))
# last_day = pytz.timezone(timezone_homes).localize(datetime(2022, 5, 8))

# df_rawdata = pd.DataFrame()
# home_iterator = tqdm_notebook(homes)

# for home_id in home_iterator:
#     # print('Processing ', home_id)
#     extractor = Extractor(home_id, Period(first_day, last_day))
#     df_rawdata = extractor.get_rawdata()
#     df_rawdata.describe(include='all')
#     Extractor.write_raw_data_to_csv(df_rawdata, str('{0}-rawdata_P{1}-{2}-{3}.csv'.format(filename_prefix, home_id, first_day.isoformat(),last_day.isoformat())))



## Learn parameters using inverse grey-box analysis

Most of the heavy lifting is done by the `learn_home_parameter_moving_horizon()` function, which in turn uses the [GEKKO Python](https://machinelearning.byu.edu/) dynamic optimization toolkit.
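
To give a feel for what "learning parameters over a moving horizon" means, here is a deliberately simplified sketch. It does not use GEKKO and is not the `Learner` implementation: it assumes a first-order model `C * dT_in/dt = H * (T_out - T_in) + P` with a single heat input, generates synthetic data, and fits each 7-day window with ordinary least squares via NumPy. The function names `simulate_T_in` and `fit_window` and all numeric values are hypothetical.

```python
import numpy as np

def simulate_T_in(H_W_p_K, tau_h, T_out_C, P_W, T0_C, dt_h):
    """Forward-Euler simulation of the assumed first-order model."""
    C_Wh_p_K = H_W_p_K * tau_h
    T_in = np.empty(len(T_out_C))
    T_in[0] = T0_C
    for k in range(len(T_out_C) - 1):
        heat_W = H_W_p_K * (T_out_C[k] - T_in[k]) + P_W[k]
        T_in[k + 1] = T_in[k] + dt_h * heat_W / C_Wh_p_K
    return T_in

def fit_window(T_in_C, T_out_C, P_W, dt_h):
    """Least-squares fit of dT = a * (T_out - T_in) + b * P within one window."""
    dT = np.diff(T_in_C)
    X = np.column_stack([(T_out_C - T_in_C)[:-1], P_W[:-1]])
    (a, b), *_ = np.linalg.lstsq(X, dT, rcond=None)
    tau_h = dt_h / a             # a = dt / tau
    C_Wh_p_K = dt_h / b          # b = dt / C
    return C_Wh_p_K / tau_h, tau_h, C_Wh_p_K   # H = C / tau

# Synthetic two-week data set on a 15-minute grid (made-up values)
dt_h = 0.25
t_h = np.arange(0, 14 * 24, dt_h)
T_out_C = 5.0 + 5.0 * np.sin(2 * np.pi * t_h / 24)
P_W = np.where((t_h % 24 >= 6) & (t_h % 24 < 22), 5000.0, 0.0)
T_in_C = simulate_T_in(250.0, 100.0, T_out_C, P_W, 18.0, dt_h)

# Moving horizon: fit every 7-day window separately
steps = int(7 * 24 / dt_h)
results = []
for start in range(0, len(t_h), steps):
    w = slice(start, start + steps)
    results.append(fit_window(T_in_C[w], T_out_C[w], P_W[w], dt_h))

for H, tau, C in results:
    print(f'H = {H:.1f} W/K, tau = {tau:.1f} h, C = {C:.0f} Wh/K')
```

Because the synthetic data are noise-free and generated by the same discretized model, each window recovers the parameters used in the simulation. Real measurements are noisy and involve more heat flows (for example solar gains), which is one reason a dynamic optimization toolkit is the more appropriate tool in practice.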

In [None]:
%%time 
%autoreload 2



# Use one of the lines below to set the moving horizon duration used for analysis 
# moving_horizon_duration_d_analysis = 14
moving_horizon_duration_d_analysis = moving_horizon_duration_d


# learn the model parameters and write results and intermediate results to Excel files
df_results_model_parameters, df_results_tempsim = Learner.learn_home_parameter_moving_horizon(df_data_homes, 
                                                         n_std_outliers, up_intv, gap_n_intv, sampling_interval, 
                                                         moving_horizon_duration_d_analysis, 
                                                         req_col = required_columns_for_sanity, sanity_threshold = sanity_threshold,
                                                         hint_A_m2=None, ev_type=2)



## Show the results

### Show learned model parameters

#### Show table of all learned model parameters of all homes

In [None]:
df_results_model_parameters

#### Visualize results of all learned model parameters of all homes in one plot

In [None]:
(df_results_model_parameters
 ['H_W_p_K']
 .reorder_levels(['start_horizon', 'home_id'])
 .unstack()
 .plot(kind='box', 
       rot=90, 
       title='H_W_p_K')
)
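
The `reorder_levels(...).unstack()` chain used for these box plots can be tried out on a small stand-in dataframe. The sketch below assumes that the results are indexed by a (`home_id`, `start_horizon`) MultiIndex; the parameter values are made up.

```python
import pandas as pd

# Stand-in for df_results_model_parameters: two homes, three weekly horizons
index = pd.MultiIndex.from_product(
    [[803422, 805164],
     pd.to_datetime(['2022-01-03', '2022-01-10', '2022-01-17'])],
    names=['home_id', 'start_horizon'])
df = pd.DataFrame({'H_W_p_K': [210.0, 220.0, 215.0, 310.0, 305.0, 300.0]},
                  index=index)

# Make home_id the innermost level, then pivot it into columns:
# one row per start_horizon, one column per home
wide = (df['H_W_p_K']
        .reorder_levels(['start_horizon', 'home_id'])
        .unstack())

print(wide)
# wide.plot(kind='box') would then draw one box per column, i.e. per home
```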

In [None]:
(df_results_model_parameters
 ['tau_h']
 .reorder_levels(['start_horizon', 'home_id'])
 .unstack()
 .plot(kind='box', 
       rot=90,
       title='tau_h')
)

In [None]:
# df_results_model_parameters['C_Wh_p_K'] = df_results_model_parameters['H_W_p_K'] * df_results_model_parameters['tau_h']

In [None]:
(df_results_model_parameters
 ['C_Wh_p_K']
 .reorder_levels(['start_horizon', 'home_id'])
 .unstack()
 .plot(kind='box', 
       rot=90, 
       title='C_Wh_p_K')
)

#### Visualize results of all learned model parameters per week for each home in multiple plots

In [None]:
%autoreload 2

Plot.learned_parameters_plot(df_results_model_parameters)

### Show best fitting simulated temperatures and power flows

#### Show table of best fitting simulated temperatures


In [None]:
df_results_tempsim

#### Show a plot with the best fitting simulated temperatures and power flows

In [None]:
%autoreload 2

Plot.temperature_and_power_plot(df_results_tempsim,
                                temp_plot_dict = {'T_out_avg_C': 'orange', 'wind_avg_m_p_s': 'c', 'T_out_e_avg_C': 'b', 'T_in_avg_C': 'red', 'T_set_first_C': 'pink', 'T_in_sim_avg_C': 'green'},
                                temp_plot_2nd_list = ['wind_avg_m_p_s'],
                                power_plot_dict = {'irradiation_hor_avg_W_p_m2': 'y', 'gas_sup_CH_avg_W': 'brown'},
                                power_plot_2nd_list = ['irradiation_hor_avg_W_p_m2']
                               )

#### Plot a series of weeks for a single home

In [None]:
%autoreload 2
home_id = 886307

Plot.temperature_and_power_one_home_weekly_plot(home_id,
                                                df_results_tempsim.loc[home_id],
                                                sanity_threshold = sanity_threshold,
                                                temp_plot_dict = {'T_out_avg_C': 'orange', 'wind_avg_m_p_s': 'c', 'T_out_e_avg_C': 'b', 'T_in_avg_C': 'red', 'T_set_first_C': 'pink', 'T_in_sim_avg_C': 'green'},
                                                temp_plot_2nd_list = ['wind_avg_m_p_s'],
                                                power_plot_dict = {'irradiation_hor_avg_W_p_m2': 'y', 'gas_sup_CH_avg_W': 'brown'},
                                                power_plot_2nd_list = ['irradiation_hor_avg_W_p_m2']
                                               )   