# Brains4Buildings interactive inverse grey-box analysis pipeline

This JupyterLab notebook can be used to interactively test the Brains4Buildings inverse grey-box analysis pipeline.
Don't forget to install the requirements listed in [requirements.txt](../requirements.txt) first!



## Setting the stage

First, several imports and variables need to be defined.


### Imports and generic settings

In [None]:
import numpy as np
import pandas as pd
from datetime import datetime, timedelta
# import matplotlib.pyplot as plt

# usually, two decimals suffice for displaying DataFrames (NB internally, precision may be higher)
pd.options.display.precision = 2

from tqdm.notebook import tqdm
from gekko import GEKKO

import sys
sys.path.append('../data/')
sys.path.append('../view/')
sys.path.append('../analysis/')

from preprocessor import Preprocessor
from inversegreyboxmodel import Learner
from plotter import Plot


%load_ext autoreload
%matplotlib inline
%matplotlib widget

import logging
logging.basicConfig(level=logging.ERROR, 
                    format='%(asctime)s %(levelname)-8s %(message)s',
                    datefmt='%Y-%m-%d %H:%M:%S',
                    filename='log_b4b.txt',
                   )

### Load Measured Data from parquet file

In [None]:
%%time
# Prerequisite: this example needs the file b4b_raw_properties.parquet in the ../data/ folder.
# One way to get it is to run B4BExtractionBackup.ipynb first, but that notebook has to be run on the energietransitiewindesheim.nl server.

df_prop = pd.read_parquet('../data/b4b_raw_properties.parquet', engine='pyarrow')

# Sorting the DataFrame index is needed to get good performance on certain filters.
# This guard checks whether the DataFrame is properly sorted and sorts it if it is not.
if not df_prop.index.is_monotonic_increasing:
    print('df needed index sorting')
    df_prop = df_prop.sort_index()  
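
The sorting guard above can be tried out on a toy DataFrame. This is a minimal sketch: the index level names `id` and `source` appear in this notebook, but the third level name `timestamp` and all values are assumptions made up for illustration.

```python
import pandas as pd

# Toy MultiIndex DataFrame, deliberately built in unsorted order.
# The 'timestamp' level name and all values are illustrative assumptions.
idx = pd.MultiIndex.from_tuples(
    [
        (999169, 'bms', pd.Timestamp('2022-10-10 00:15')),
        (925038, 'CO2-meter-SCD4x', pd.Timestamp('2022-10-10 00:00')),
        (925038, 'bms', pd.Timestamp('2022-10-10 00:00')),
    ],
    names=['id', 'source', 'timestamp'],
)
df_demo = pd.DataFrame({'co2__ppm': [650.0, 480.0, 470.0]}, index=idx)

# Same guard as used for df_prop
was_sorted = df_demo.index.is_monotonic_increasing
if not was_sorted:
    df_demo = df_demo.sort_index()

# Label-based selection on the first index level; pandas can do this
# efficiently once the index is lexicographically sorted
subset = df_demo.loc[925038]
```

On an unsorted MultiIndex, pandas may emit a `PerformanceWarning` for this kind of selection, and slicing on index levels can fail outright, which is why the check runs right after loading.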

In [None]:
df_prop.index.unique(level='id').values

In [None]:
df_prop.index.unique(level='source').values

In [None]:
df_prop

In [None]:
df_prop.info()

## Preprocessing

In [None]:
#if this plot does not show up at initial run, run the cell again (something fishy with interactive plotting of DataFrame.plot.hist())
%matplotlib widget
df_prop.co2__ppm.plot.hist(bins=200, alpha=0.5)

In [None]:
df_prep = Preprocessor.preprocess_room_data(df_prop)

In [None]:
df_prep.info()

In [None]:
df_prep.describe()

In [None]:
#if this plot does not show up at initial run, run the cell again (something fishy with interactive plotting of DataFrame.plot.hist())
%matplotlib widget
df_prep['CO2-meter-SCD4x_co2__ppm'].plot.hist(bins=200, alpha=0.5)

## Learn parameters using inverse grey-box analysis

Most of the heavy lifting is done by the `learn_room_parameters()` function, which in turn uses the [GEKKO Python](https://machinelearning.byu.edu/) dynamic optimization toolkit.
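
To give a feel for what "inverse grey-box analysis" means, here is a minimal NumPy sketch. It is not the implementation in `inversegreyboxmodel.py` and it does not use GEKKO: it assumes a simple single-zone CO2 mass balance, assumed constants for outdoor CO2 and CO2 production per person, synthetic data, and a plain grid search in place of dynamic optimization. The function `simulate_co2()` and all values are hypothetical.

```python
import numpy as np

# Assumed constants, for illustration only (not taken from the pipeline)
CO2_OUTDOOR__PPM = 415.0        # assumed outdoor CO2 concentration
CO2_PER_PERSON__M3_H_1 = 0.02   # assumed CO2 exhaled per person (about 20 L/h)


def simulate_co2(vent_max__m3_h_1, valve_frac__0, occupancy__p,
                 room__m3, step__h, co2_start__ppm):
    """Forward-Euler simulation of an assumed single-zone CO2 mass balance."""
    co2__ppm = np.empty(len(occupancy__p))
    co2__ppm[0] = co2_start__ppm
    for i in range(1, len(co2__ppm)):
        vent__m3_h_1 = vent_max__m3_h_1 * valve_frac__0[i - 1]
        # CO2 added by occupants, in ppm per hour
        gain = occupancy__p[i - 1] * CO2_PER_PERSON__M3_H_1 * 1e6 / room__m3
        # CO2 removed by ventilation, in ppm per hour
        loss = vent__m3_h_1 / room__m3 * (co2__ppm[i - 1] - CO2_OUTDOOR__PPM)
        co2__ppm[i] = co2__ppm[i - 1] + step__h * (gain - loss)
    return co2__ppm


# Synthetic "measurements": two days at 15-minute resolution
step__h = 0.25
hours = (np.arange(192) * step__h) % 24
occupied = (hours >= 8) & (hours < 17)
occupancy__p = np.where(occupied, 10.0, 0.0)
valve_frac__0 = np.where(occupied, 0.5, 0.1)
room__m3 = 150.0
true_vent_max__m3_h_1 = 300.0

rng = np.random.default_rng(42)
measured_co2__ppm = simulate_co2(true_vent_max__m3_h_1, valve_frac__0, occupancy__p,
                                 room__m3, step__h, CO2_OUTDOOR__PPM)
measured_co2__ppm = measured_co2__ppm + rng.normal(0.0, 5.0, size=measured_co2__ppm.shape)

# Inverse step: find the parameter value whose simulation best fits the measurements
candidates = np.arange(100.0, 501.0, 10.0)
rmse = np.array([
    np.sqrt(np.mean((simulate_co2(c, valve_frac__0, occupancy__p,
                                  room__m3, step__h, CO2_OUTDOOR__PPM)
                     - measured_co2__ppm) ** 2))
    for c in candidates
])
learned_vent_max__m3_h_1 = candidates[np.argmin(rmse)]
```

The "grey box" is the physical model structure with unknown parameters. The "inverse" part is recovering those parameters from measured time series by minimizing the gap between simulated and measured CO2.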

In [None]:
%%time 
%autoreload 2

# learn the model parameters for only a subset of the room ids and write results to a dataframe
filename = '../data/b4b-room-metadata.zip'
df_room_metadata = pd.read_csv(filename, usecols=['id', 'room__m3', 'vent_max__m3_h_1']).set_index(['id'])
col_co2__ppm = 'CO2-meter-SCD4x_co2__ppm'
# col_co2__ppm = 'bms_co2__ppm'
col_occupancy__p = 'CO2-meter-SCD4x_occupancy__p'
# col_occupancy__p = 'xovis_occupancy__p'
col_valve_frac__0 = 'bms_valve_frac__0'

# df_results, df_prep = Learner.learn_room_parameters(df_prep,
df_results, df_prep = Learner.learn_room_parameters(df_prep.loc[[925038,999169]],
                                                    col_co2__ppm = col_co2__ppm, 
                                                    col_occupancy__p = col_occupancy__p, 
                                                    col_valve_frac__0 = col_valve_frac__0,
                                                    df_room_metadata = df_room_metadata,
                                                    learn_period__d = 7, 
                                                    req_col = [col_co2__ppm, col_occupancy__p, col_valve_frac__0],
                                                    sanity_threshold_timedelta = timedelta(hours=24),
                                                    learn_infilt__m2 = True,
                                                    learn_valve_frac__0 = True,
                                                    learn_occupancy__p = False,
                                                    ev_type=2)

### Result Visualization

In [None]:
df_results

In [None]:
df_prep

In [None]:
%autoreload 2
units_to_mathtext = property_types = {
    'degC' : r'$°C$',
    'ppm' : r'$ppm$',
    '0' : r'$[-]$',
    'bool': r'$0 = False; 1 = True$',
    'p' : r'$persons$'
}
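
The column names in this notebook end in a double underscore followed by a unit (for example `CO2-meter-SCD4x_co2__ppm` or `bms_valve_frac__0`), and the keys of the mapping above match those unit suffixes. The sketch below shows one way such a lookup could work. The helper `unit_label()` and the name `demo_units` are hypothetical, and how `Plot.dataframe_preprocessed_plot()` actually uses the mapping is not shown here.

```python
# Copy of the unit-to-mathtext mapping, under a separate name for this demo
demo_units = {
    'degC': r'$°C$',
    'ppm': r'$ppm$',
    '0': r'$[-]$',
    'bool': r'$0 = False; 1 = True$',
    'p': r'$persons$',
}


def unit_label(column_name, units=demo_units):
    """Hypothetical helper: map the unit suffix of a column name to a mathtext label."""
    # The unit is assumed to be everything after the last double underscore
    unit = column_name.rsplit('__', 1)[-1]
    # Fall back to the raw suffix when the unit has no mathtext entry
    return units.get(unit, unit)


labels = {col: unit_label(col) for col in
          ['CO2-meter-SCD4x_co2__ppm', 'bms_valve_frac__0', 'CO2-meter-SCD4x_occupancy__p']}
```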

In [None]:
#Plot all properties from all sources for all ids
input_props = [col_co2__ppm , col_occupancy__p, col_valve_frac__0]
learned_props_frac = input_props + ['sim_co2__ppm', 'learned_valve_frac__0']
learned_props_occupancy = input_props + ['sim_co2__ppm', 'learned_occupancy__p']

In [None]:
df_plot = df_prep[learned_props_frac]

In [None]:
df_plot

In [None]:
Plot.dataframe_preprocessed_plot(df_plot, units_to_mathtext)