# Brains4Buildings data extraction and backup

This JupyterLab notebook can be used to download raw data from a Twomes database (see also [more information on how to set up a Twomes server](https://github.com/energietransitie/twomes-backoffice-configuration#jupyterlab)).

In particular, it has been set up to get data from the [Brains4Buildings data collection](https://www.energietransitiewindesheim.nl/brains4buildings2022/privacy/index.html).

Don't forget to install the requirements listed in [requirements.txt](../requirements.txt) first!



## Setting the stage

First, several imports and variables need to be defined.


### Imports and generic settings

In [None]:
from datetime import datetime, timedelta
import pytz
import math
import matplotlib.pyplot as plt

import pandas as pd
import numpy as np

# usually, two decimals suffice for displaying DataFrames (NB internally, precision may be higher)
pd.options.display.precision = 2

import sys
sys.path.append('../data/')
sys.path.append('../view/')
sys.path.append('../analysis/')

%load_ext autoreload

%matplotlib widget
from plotter import Plot

from measurements import Measurements
from preprocessor import Preprocessor

from tqdm.notebook import tqdm


import logging
logging.basicConfig(level=logging.INFO, 
                    format='%(asctime)s %(levelname)-8s %(message)s',
                    datefmt='%Y-%m-%d %H:%M:%S',
                    filename='log_b4b.txt',
                   )

### Defining which account, which period

- which account was used to provision the measurements?
- what are the location and the timezones of the database and the buildings?
- from which `first_day` to which `last_day`?
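
The period boundaries are defined as local midnight in the building's timezone, while the database is assumed to store timestamps in UTC. The sketch below illustrates how the two relate for the period used in this notebook. It uses the standard-library `zoneinfo` module instead of the `pytz` calls in the notebook, purely to keep the example free of third-party dependencies; the variable names with a `_utc` suffix are illustrative and do not occur elsewhere in the notebook.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

tz_buildings = ZoneInfo('Europe/Amsterdam')

# Local midnight at the start and at the end of the collection period
first_day = datetime(2022, 10, 1, tzinfo=tz_buildings)
last_day = datetime(2022, 11, 2, tzinfo=tz_buildings)

# The same instants expressed in UTC
first_day_utc = first_day.astimezone(timezone.utc)
last_day_utc = last_day.astimezone(timezone.utc)

print(first_day_utc.isoformat())  # 2022-09-30T22:00:00+00:00 (summer time, UTC+2)
print(last_day_utc.isoformat())   # 2022-11-01T23:00:00+00:00 (winter time, UTC+1)
```

Because daylight saving time ends within this period, the UTC offset of the two boundaries differs by one hour.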

In [None]:
#location: T-building, Windesheim, in Zwolle
lat, lon = 52.4350486, 5.4040816

# timezones of the database and of the buildings
timezone_database = 'UTC'
timezone_buildings = 'Europe/Amsterdam'

# Below, the maximum period for data collection
first_day = pytz.timezone(timezone_buildings).localize(datetime(2022, 10, 1))
last_day = pytz.timezone(timezone_buildings).localize(datetime(2022, 11, 2))

# all devices were provisioned by a single account
account = [820921]

b4b_db_properties = [
    'roomTemp',
    'CO2concentration',
    'relativeHumidity',
    'countPresence'
]

device_mapping = {
    'TWOMES-979368': 999169,
    'TWOMES-9799B8': 900846,
    'TWOMES-ACDEF0': 948634,
    'TWOMES-ACEB08': 917810,
    'TWOMES-ACEB4C': 925038
}

property_rename = {
    'CO2concentration': 'co2_ppm',
    'countPresence': 'occupancy_p',
    'relativeHumidity': 'rel_humidity_0',
    'roomTemp': 'temp_in_degC'
}

property_types = {
    'temp_in_degC' : 'float32',
    'co2_ppm' : 'float32',
    'rel_humidity_0' : 'float32',
    'valve_frac_0' : 'float32',
    'door_open_bool': 'Int8',
    'window_open_bool': 'Int8',
    'occupancy_bool': 'Int8',
    'occupancy_p' : 'Int8'
}


## Getting accounts

In [None]:
%%time 
%autoreload 2
df = Measurements.get_accounts_devices(first_day, last_day,
                                       timezone_database, timezone_buildings)

In [None]:
df

## Getting measurements from sources

### Getting measurements from the database

In [None]:
%%time 
%autoreload 2
df_db_meas = (Measurements.get_raw_measurements(
    account,
    first_day, last_day,
    b4b_db_properties,
    timezone_database, timezone_buildings)
           .loc[account[0]]
           .rename(index=device_mapping)
           .rename(index=property_rename)
           .sort_index()
          )

df_db_meas.index.names = ['id', 'source', 'timestamp', 'property']
# keep only the devices listed in device_mapping
df_db_meas = df_db_meas.loc[list(device_mapping.values())]
del df_db_meas['unit']
df_db_meas = df_db_meas.astype('float')

In [None]:
df_db_meas.info()

In [None]:
df_db_meas

### Get other measurements

N.B. You need to download [b4b-rawdata.zip from the source](https://liveadminwindesheim.sharepoint.com/:u:/r/sites/O365-Brains4Buildings/Gedeelde%20documenten/General/Windesheim%20as%20Living%20Lab/data-raw-anon/b4b-rawdata.zip?csf=1&web=1&e=M0NX1r) first and save it in the `../data/` folder.

In [None]:
%%time 
df = pd.read_csv('../data/b4b-rawdata.zip', parse_dates=['timestamp'], index_col=['timezone', 'timestamp']).sort_index(level='timestamp')


df_other_meas = pd.DataFrame()
for tz in df.index.unique(level='timezone'):
    df_other_meas = pd.concat([df_other_meas, df.loc[tz].tz_localize(tz, ambiguous='NaT')])


df_other_meas = df_other_meas.sort_index()

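# drop rows whose timestamp could not be localized (NaT); a boolean mask avoids
# the repeated rows that label-based .loc selection yields on a non-unique index
df_other_meas = df_other_meas[df_other_meas.index.notna()]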
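
The cell above localizes naive timestamps with `ambiguous='NaT'`. As a minimal sketch of what that option does, the toy example below (made-up values, hypothetical names `df_toy`, `df_local` and `df_clean`) localizes three wall-clock times from the night on which daylight saving time ended in 2022. The time 02:30 occurred twice that night, so it cannot be mapped to a single instant and is replaced by `NaT`.

```python
import pandas as pd

# Naive local timestamps around the end of daylight saving time in 2022
naive = pd.DatetimeIndex([
    '2022-10-30 01:30:00',  # unambiguous, still summer time (CEST)
    '2022-10-30 02:30:00',  # occurs twice that night: ambiguous
    '2022-10-30 03:30:00',  # unambiguous, winter time (CET)
])
df_toy = pd.DataFrame({'value': [20.1, 20.2, 20.3]}, index=naive)

# Ambiguous wall-clock times become NaT instead of raising an error
df_local = df_toy.tz_localize('Europe/Amsterdam', ambiguous='NaT')

# Keep only the rows whose timestamp could be localized
df_clean = df_local[df_local.index.notna()]
print(df_clean)
```

The trade-off is that measurements taken during the ambiguous hour are discarded rather than guessed.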


In [None]:
df_other_meas.info()

In [None]:
df_other_meas

### Merge database and other measurements

In [None]:
df_measurements = (pd.concat([
    df_db_meas.reset_index(), 
    df_other_meas.reset_index()[['id', 'source', 'timestamp', 'property', 'value']]])
                   .drop_duplicates()
                   .set_index(['id', 'source', 'timestamp', 'property'])
                   .sort_index()
                  )
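
To illustrate the merge step above, here is a small sketch with made-up data (the frames `df_a` and `df_b` and the source labels `'db'` and `'other'` are hypothetical). Rows that are identical in every column are collapsed by `drop_duplicates()`, while measurements of the same property at the same moment from different sources are both kept.

```python
import pandas as pd

cols = ['id', 'source', 'timestamp', 'property', 'value']
t0 = pd.Timestamp('2022-10-01 00:00', tz='UTC')

df_a = pd.DataFrame([[1, 'db', t0, 'co2_ppm', 450.0]], columns=cols)
df_b = pd.DataFrame([
    [1, 'db', t0, 'co2_ppm', 450.0],     # exact duplicate of the row in df_a
    [1, 'other', t0, 'co2_ppm', 455.0],  # same moment, different source
], columns=cols)

df_merged = (pd.concat([df_a, df_b])
             .drop_duplicates()
             .set_index(['id', 'source', 'timestamp', 'property'])
             .sort_index()
            )
print(df_merged)
```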

In [None]:
df_measurements.info()

In [None]:
df_measurements

### Writing raw measurements to a parquet file

In [None]:
%%time 
df_measurements.to_parquet('b4b_raw_measurements.parquet', index=True, engine='pyarrow')

## Unstack properties into separate columns and apply types

In [None]:
df_prop = df_measurements.copy()
if property_types is not None:
    logging.info("Unstacking properties...")
    df_prop = df_prop.unstack()
    df_prop.columns = df_prop.columns.droplevel()
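
The unstacking step can be sketched on a toy frame with the same index layout (the values and the source label `'sensor_a'` are made up). `unstack()` moves the innermost index level, `property`, into the columns; `droplevel()` then removes the outer `value` level that remains on the columns.

```python
import pandas as pd

t0 = pd.Timestamp('2022-10-01 00:00', tz='UTC')
t1 = pd.Timestamp('2022-10-01 00:05', tz='UTC')

# Toy long-format data: one row per (id, source, timestamp, property)
index = pd.MultiIndex.from_tuples(
    [
        (1, 'sensor_a', t0, 'co2_ppm'),
        (1, 'sensor_a', t0, 'temp_in_degC'),
        (1, 'sensor_a', t1, 'co2_ppm'),
    ],
    names=['id', 'source', 'timestamp', 'property'],
)
df_long = pd.DataFrame({'value': [450.0, 20.5, 460.0]}, index=index)

# Move the innermost index level ('property') into the columns...
df_wide = df_long.unstack()
# ...and drop the outer 'value' column level
df_wide.columns = df_wide.columns.droplevel()
print(df_wide)
```

Combinations that were never measured, such as the temperature at the second timestamp here, show up as `NaN` in the wide frame.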



In [None]:
df_prop.info()

In [None]:
logging.info("Changing column types...")
# only convert the properties that are actually present as columns
df_prop = df_prop.astype({k: v for k, v in property_types.items() if k in df_prop.columns})
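
A likely reason for the capitalized `'Int8'` entries in `property_types` is that unstacking leaves missing values, which NumPy's plain `int8` cannot represent. The sketch below, on made-up data with hypothetical names (`df_toy`, `types`, `df_typed`), shows the same filtered `astype` call and contrasts the nullable pandas dtype with the plain one.

```python
import numpy as np
import pandas as pd

df_toy = pd.DataFrame({
    'temp_in_degC': [20.5, np.nan, 21.0],
    'occupancy_p': [3.0, np.nan, 0.0],
})

# 'co2_ppm' is listed but absent from df_toy, so it is filtered out
types = {'temp_in_degC': 'float32', 'occupancy_p': 'Int8', 'co2_ppm': 'float32'}
df_typed = df_toy.astype({k: v for k, v in types.items() if k in df_toy.columns})
print(df_typed.dtypes)

# The plain NumPy integer dtype cannot hold missing values
try:
    df_toy['occupancy_p'].astype('int8')
    plain_int_failed = False
except ValueError:
    plain_int_failed = True
print(plain_int_failed)
```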


In [None]:
df_prop.info()

In [None]:
df_prop

### Writing raw properties to a parquet file

In [None]:
%%time 
df_prop.to_parquet('b4b_raw_properties.parquet', index=True, engine='pyarrow')

## Plotting data

In [None]:
df_plot = df_prop

In [None]:
# df_plot.columns =df_plot.columns.to_flat_index().tolist()

In [None]:
df_plot

In [None]:
# This cell can be used to plot one or more properties in one or more rooms for one or more sources, not fully working yet

# iterate only over the (id, source) combinations that actually occur in the data;
# looking up a missing combination with .loc would raise a KeyError
for (room_id, source), df_room in df_plot.groupby(level=['id', 'source']):
    df_room.droplevel(['id', 'source']).plot(
        # subplots=
        # [
        #     # ('co2_ppm'),
        #      # ('occupancy_p'),
        #      # ('valve_frac_0', 'rel_humidity_0'),
        #      # ('window_open_bool', 'door_open_bool'),
        #      ('temp_in_degC')
        # ],
        style='.--',
        title=f'room: {room_id}, source: {source}'
    )