<center>
    <font size=5>Home and Cabin 
        Power Consumption 2021</font>
</center>

##### Sources:

# General
Notebook to analyse electricity consumption in my parents house and in the cottage.

<u><font size=4>Motivation / Objective:</font></u>
* Investigate the dataset using interactive plotting tools in python
* Catch trends seasonal and cyclic patterns in data
* Forecast power consumption based on time-series data

<u><font size=4>Data:</font></u>
* **Data type:** Tabular Data
    Hourly Power Consumption Dataset:
    * Data source: Eesti Energia AS, Estonian main electricity prowider company
    * Data download date: 25.01.2021
    * Data range: 01.01.2021 00:00 - 01.01.2022 00:00
    * Data given: hourly consumption rate in **kwh** - kilotwatt-hours

    Monthly Power Consumption Dataset:
    * Data source: Eesti Energia AS, 
    * Data download date: 25.01.2021
    * Data range 2020-2022
    * Monthly consumption summary statistics for years 2020 & 2021:
        * Daily 
        * Nightly
        * Total
* **Problem Type:** Predict Power consumption Supervised Time-Series Regresion

# Imports

In [48]:
import pandas as pd
import numpy as np
import re

import pandas_bokeh 

from bokeh.io import curdoc
from bokeh.plotting import figure, show

from bokeh_plots import density_hist

from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.utils.validation import check_is_fitted

In [16]:
pandas_bokeh.output_notebook()

# default plot theme for bokeh
curdoc().theme = 'dark_minimal'

# Data

## Classes, functions

In [17]:
## CLASSES ##
class DFDtypeMapper(BaseEstimator, TransformerMixin):
    """Remap pandas dataframe dtypes.
    Parameters
    ----------
    dtype_dict : dict, {'dtype':[col_name]}
        Dictionary of dtypes as keys and values as list of column names. 
    
    Returns
    -------
    DataFrame : pd.DataFrame"""
    def __init__(self, dtype_dict : dict):
        self.dtype_dict = dtype_dict
        self.transformed_column_names = None 
    
    def fit(self, X, y=None):
        self.all_columns_ = X.columns
        return self
    
    def get_feature_names_out(self, input_features=None) -> np.ndarray:
        check_is_fitted(self)
        return self.all_columns_
    
    def transform(self, X, y=None) -> pd.DataFrame:
        X_ = X.copy()
        # remove columns that are not in X
        _dtype_dict = {}
        for dtype, val in self.dtype_dict.items():
            if isinstance(val, str):
                if val in X_.columns: 
                    _dtype_dict[dtype] = val
            elif type(val) not in [tuple, list, np.ndarray]:
                raise ValueError(f'Wrong type for {self.dtype_dict} value.')
            else:
                _dtype_dict[dtype] = [col for col in val 
                                           if col in X_.columns]
        
        for dtype in _dtype_dict:
            X_[_dtype_dict[dtype]] = X_[_dtype_dict[dtype]].astype(dtype)
        
        return X_

class DFValueMapper(BaseEstimator, TransformerMixin):
    """Rename values in column based on dictionary.
    Parameters
    ----------
    map_dict : dict 
        Dictionary of old mappings to new.
    cat_only : bool, default True
        - If True: consider category dtype columns only
        - If False: apply to all columns. Computationally more expensve.
    
    Returns
    -------
    DataFrame : pd.DataFrame
        Remapped pandas DataFrame."""
    def __init__(self, map_dict : dict, cat_only=True):
        self.cat_only = cat_only
        self.map_dict = map_dict
    def fit(self, X, y=None):
        self.all_columns_ = X.columns
        return self
    def get_feature_names_out(self, input_features=None) -> np.ndarray:
        check_is_fitted(self)
        return self.all_columns_
    def transform(self, X, y=None) -> pd.DataFrame:
        X_ = X.copy()
        # categorical features
        if self.cat_only:
            cat_cols = X_.columns[(X_.dtypes == 'category').values]
            X_[cat_cols] = X_[cat_cols].apply(
                lambda x: x.cat.rename_categories(self.map_dict))
            return X_
        else:
            return X_.replace(self.map_dict)

## FUNCTIONS ##
def datetime_gaps(df : pd.DataFrame, column : str, freq='D'):
    """Display time series frequencies and gaps.
    
    Parameters
    ----------
    column : str, DataFrame column or index name.
    freq : str, default 'D'
        Predominant frequency of the datetime column/index."""
    
    df = df.reset_index()
    date_range = pd.date_range(df[column][0], df[column].iloc[-1], freq=freq)
    df[column] = df[column].astype(f"period[{freq}]")
    
    # find frequencies
    temp = df.groupby([column]).sum().reset_index()
    freqs = (temp.loc[:,column]# frequencies
        .diff()
        .value_counts(dropna=False)
        .to_frame())
    print(f"Frequencies")
    display(freqs)
    
    # find gaps
    gaps = date_range.difference(df[column])
    if len(gaps) == 0:
        print(f"No gaps in {column}.")
    else:
        print(f"{len(gaps)} gaps in datetime:")
        return gaps



## Hourly Usage

In [18]:
# load the data
hourly = pd.read_csv(
    'data/tarbimine_tund.csv', 
    header=4, sep=';',
    index_col=False,
    names=['start', 'end', 'cabin', 'home'],
    parse_dates=['start', 'end'],
    decimal=',')

hourly.head(3)

Unnamed: 0,start,end,cabin,home
0,2021-01-01 00:00:00,2021-01-01 01:00:00,0.16,0.86
1,2021-01-01 01:00:00,2021-01-01 02:00:00,0.12,0.737
2,2021-01-01 02:00:00,2021-01-01 03:00:00,0.12,1.377


In [19]:
hourly.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8760 entries, 0 to 8759
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   start   8760 non-null   datetime64[ns]
 1   end     8760 non-null   datetime64[ns]
 2   cabin   8759 non-null   float64       
 3   home    8759 non-null   float64       
dtypes: datetime64[ns](2), float64(2)
memory usage: 273.9 KB


## Monthly Usage
Data in summarized tabular form:
* 3 location chunks:
    * cabin
    * home
    * both summarized
* 3 year chuncks per location
    * 2020 fully
    * 2021 fully
    * 2022 partially
* Each year with monthly index and overall summary in the end.

In [20]:
# read in the data with location headings
monthly_use = pd.read_csv(
    'data/tarbimine.csv', 
    header=None, sep=';',
    skiprows=4,
    index_col=False,
    decimal=',',
    names=['month', 'day', 'night', 'total'])

monthly_use.head()

Unnamed: 0,month,day,night,total
0,Mõõtepunkti aadress: Sutu,,,
1,Tarbimine 2020. aastal,,,
2,,Päev (kWh),Öö (kWh),Kokku (kWh)
3,Jaanuar,39153,40772,79925
4,Veebruar,31930,38141,70071


### Clean & Reshape

In [21]:
temp = monthly_use.copy()

# references where to break the temp
locs = ['Sutu', 'Kuressaare', 'summeeritud']

# extract location
temp['locale'] = temp.month.str.extract(fr"({locs[0]}$|{locs[1]}$|{locs[2]})")
temp['locale'] = temp.locale.fillna(method='ffill')
temp['locale'] = temp.locale.rename({'Sutu':'cabin',
                                 'Kuressaare':'home',
                                 'summeeritud': 'total'})
# extract year
temp['year'] = temp.month.str.extract(r"^Tarbimine\s(\d{4})\.\saastal")
temp['year'] = temp.year.fillna(method='ffill')

# drop unneccessary rows
temp.dropna(inplace=True) # get rid of references

# convert comma decimals to points
temp.loc[:, 'day':'total'] = (
    temp.loc[:, 'day':'total']
    .apply(lambda x: x.str.replace(',', '.')))

# conversion dictionaries
dtype_dct = {'category':['month', 'locale', 'year'],
             'float':['day', 'night', 'total']}
map_dct = {  'Jaanuar':'Jan',
             'Veebruar': 'Feb',
             'Märts': 'Mar',
             'Aprill': 'Apr',
             'Mai': 'May',
             'Juuni': 'Jun',
             'Juuli': 'Jul',
             'August': 'Aug',
             'September': 'Sep',
             'Oktoober': 'Oct',
             'November': 'Nov',
             'Detsember': 'Dec',
             'Aasta kokku': 'Yearly',
             'Sutu': 'cabin',
             'Kuressaare': 'home',
             'summeeritud': 'total'}

# convert dtypes & remap values
temp = DFDtypeMapper(dtype_dct).fit_transform(temp)
temp = DFValueMapper(map_dct).fit_transform(temp)
monthly_stacked = temp.reset_index(drop=True)
monthly_stacked.head(3)

Unnamed: 0,month,day,night,total,locale,year
0,Jan,39.153,40.772,79.925,cabin,2020
1,Feb,31.93,38.141,70.071,cabin,2020
2,Mar,27.137,31.569,58.706,cabin,2020


In [22]:
# write cleaned df to csv
# monthly_stacked.to_csv('data/monthly_useage_clean.csv', index=False)

In [23]:
monthly = (
    monthly_stacked
    .query("year in ['2020','2021']")
    .set_index(['year', 'locale', 'month'])
    .sort_index())
monthly

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,day,night,total
year,locale,month,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
2020,home,Yearly,2598.340,2000.977,4599.317
2020,home,Apr,183.016,140.080,323.096
2020,home,Aug,207.633,164.666,372.299
2020,home,Dec,289.374,176.477,465.851
2020,home,Jan,211.679,162.732,374.411
...,...,...,...,...,...
2021,total,Mar,315.632,228.950,544.582
2021,total,Nov,310.882,187.765,498.647
2021,total,Oct,202.362,196.871,399.233
2021,total,Sep,200.281,163.210,363.491


## Holidays

In [24]:
holidays = pd.read_csv('data/holidays_estonia_2021.csv',
                       parse_dates=['date'])
holidays.head(3)

Unnamed: 0,date,description,type
0,2021-01-01,Uusaasta,"Riigipüha, puhkepäev"
1,2021-02-24,"Iseseisvuspäev, Eesti Vabariigi aastapäev","Rahvuspüha, puhkepäev"
2,2021-04-02,Suur reede,"Riigipüha, puhkepäev"


In [25]:
holidays.dtypes

date           datetime64[ns]
description            object
type                   object
dtype: object

# EDA

## Validate
##### Summary Stats
Validate summary statistics in <code>monthly</code> df.

In [26]:
hourly[['cabin', 'home']].sum()

cabin    1382.036
home     4663.473
dtype: float64

In [27]:
monthly.loc['2021',['cabin', 'home'],'Yearly'].total

year  locale  month 
2021  cabin   Yearly    1382.036
      home    Yearly    4663.473
Name: total, dtype: float64

## NaN-s

In [28]:
hourly_eda = hourly.copy()
hourly_eda[hourly_eda.isna().any(axis='columns')]

Unnamed: 0,start,end,cabin,home
2067,2021-03-28 03:00:00,2021-03-28 04:00:00,,


That NaN correspond to the switch from wintertime to daylight saving time in Estonia.

In [29]:
temp = hourly.copy()
temp = temp.set_index('end')[['cabin', 'home']]

# check duplicates
print(f"Has duplicates: {temp.index.has_duplicates}")

# potential hours missing in the data
print(f"n_rows: {temp.shape[0]}")
temp.index = temp.index.to_period(freq='H')
temp = temp.reset_index().groupby(['end']).cabin.sum().reset_index()
temp.end.diff().value_counts(dropna=False)

Has duplicates: False
n_rows: 8760


<Hour>    8759
NaT          1
Name: end, dtype: int64

No hours missing from the data. Check time series around when switching to wintertime at 4:00, last sunday in October. The the clock is turned back an hour.

In [30]:
temp = hourly.copy().set_index('start')
temp = temp[(temp.index.month==10) & # october
            (temp.index.weekday==6) & # sunday
            ((temp.index.hour>0) & (temp.index.hour<8))] # between 1am-8am
temp[temp.index.day == temp.index.day.max()].reset_index()

Unnamed: 0,start,end,cabin,home
0,2021-10-31 01:00:00,2021-10-31 02:00:00,0.607,0.171
1,2021-10-31 02:00:00,2021-10-31 03:00:00,0.095,0.138
2,2021-10-31 03:00:00,2021-10-31 04:00:00,0.138,0.958
3,2021-10-31 04:00:00,2021-10-31 05:00:00,0.153,0.174
4,2021-10-31 05:00:00,2021-10-31 06:00:00,0.062,0.137
5,2021-10-31 06:00:00,2021-10-31 07:00:00,0.056,0.153
6,2021-10-31 07:00:00,2021-10-31 08:00:00,0.106,0.864


Since there are no duplicated entries in the index we can assume that from 4am-5am holds summed data for 1 hour of sumemrtime and 1 hour of wintertime.

## Feature Engineering
In order to inspect the Time Series data we'are going to add some basic time-related information.

In [31]:
hourly_eda = hourly.copy()
hourly_eda = (
    hourly_eda
    .fillna(0)
    .rename({'end':'time'}, axis='columns')
    .set_index('time') # time as index
    .loc[:, 'cabin': 'home']) # drop start time

# add features
hourly_eda['month'] = hourly_eda.index.month
hourly_eda['day'] = hourly_eda.index.day
hourly_eda['hour'] = hourly_eda.index.hour
hourly_eda['day_of_week'] = hourly_eda.index.weekday
hourly_eda['is_weekend'] = hourly_eda.day_of_week > 4
hourly_eda['is_winter'] = (hourly_eda.month > 11) | (hourly_eda.month < 4)
hourly_eda['is_summer'] = (hourly_eda.month > 5) | (hourly_eda.month < 9)
hourly_eda.head(1)

Unnamed: 0_level_0,cabin,home,month,day,hour,day_of_week,is_weekend,is_winter,is_summer
time,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
2021-01-01 01:00:00,0.16,0.86,1,1,1,4,False,True,True


##### Summer/Winter Time
* Transition to summertime (DST) : Last Sunday in March at 3:00
* Transition to wintertime : Last Sunday in October at 4:00

In [32]:
# last sunday in march at 3am
to_summer_time = (
    hourly_eda
    .query("month == 3 & day_of_week == 6 & hour == 4")
    .index.max())

# last sunday in october at 4am
to_winter_time = (
    hourly_eda
    .query("month == 10 & day_of_week == 6 & hour == 4")
    .index.max())

hourly_eda['is_dst'] = False
hourly_eda.loc[to_summer_time:to_winter_time, 'is_dst'] = True
hourly_eda.head(1)

Unnamed: 0_level_0,cabin,home,month,day,hour,day_of_week,is_weekend,is_winter,is_summer,is_dst
time,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
2021-01-01 01:00:00,0.16,0.86,1,1,1,4,False,True,True,False


##### Day & Night Rate
**Day rate:** 
* 7-23 during wintertime
* 8-24 during daylight saving time (DST)

**Night rate:**
* 23-7 during wintertime
* 24-8 during summertime
* during national holidays if it does not land on weekday

In [33]:
temp = hourly_eda.copy()

temp['rate'] = 'day'
temp.loc[
    temp.query("(hour <= 7 | hour > 23) & (is_dst == False)").index,
    'rate'] = 'night'
temp.loc[temp.query("hour <= 8 & is_dst == True").index, 'rate'] = 'night'
temp.loc[temp.query("is_weekend == True").index, 'rate'] = 'night'

hourly_eda = temp
hourly_eda.head(1)

Unnamed: 0_level_0,cabin,home,month,day,hour,day_of_week,is_weekend,is_winter,is_summer,is_dst,rate
time,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2021-01-01 01:00:00,0.16,0.86,1,1,1,4,False,True,True,False,night


##### Holidays

In [34]:
holidays.head()

Unnamed: 0,date,description,type
0,2021-01-01,Uusaasta,"Riigipüha, puhkepäev"
1,2021-02-24,"Iseseisvuspäev, Eesti Vabariigi aastapäev","Rahvuspüha, puhkepäev"
2,2021-04-02,Suur reede,"Riigipüha, puhkepäev"
3,2021-04-04,Ülestõusmispühade 1. püha,"Riigipüha, puhkepäev"
4,2021-05-01,Kevadpüha,"Riigipüha, puhkepäev"


In [35]:
temp = hourly_eda.copy()
hol = holidays.copy()
hol = hol.set_index('date')

temp['dummy_index'] = pd.to_datetime(temp.index.date)
temp = temp.reset_index().set_index('dummy_index')

# merge holidays to hourly
temp = temp.merge(hol['description'], how='left', 
                  left_index=True, right_index=True)

temp['is_holiday'] = temp.description.notna()
temp['description'] = temp.description.fillna('normal')

hourly_eda = temp.set_index('time')
hourly_eda.head(1)

Unnamed: 0_level_0,cabin,home,month,day,hour,day_of_week,is_weekend,is_winter,is_summer,is_dst,rate,description,is_holiday
time,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
2021-01-01 01:00:00,0.16,0.86,1,1,1,4,False,True,True,False,night,Uusaasta,True


## Consumption

### Distributions
##### Hourly

In [60]:
a = temp.plot_bokeh(
    y=['cabin', 'home'],
    kind='hist', 
    bins=40,
    histogram_type='sidebyside',
    hovertool=True,
    title="Hourly Power Consumption Distributions",
    line_color='white',
    line_width=1,
    ylabel='Counts',
    use_index=False,
    xticks=np.arange(0, 5.5, 0.5),
    xlabel='Consumption [kWh]',
    colormap=['blue', 'green', 'red'])

In [62]:
cabin_density = density_hist(temp, 'cabin', 40)
home_density = density_hist(temp, 'home', 40)

grid = pandas_bokeh.plot_grid([[cabin_density],[home_density]], 
                              width=700, height=300)

##### Daily

In [75]:
# daily energy consumption distributions
temp_1D = temp.resample("1D").sum()[['cabin', 'home']]

In [74]:
a = temp_1D.plot_bokeh(
    kind='hist', 
    bins=40,
    histogram_type='sidebyside',
    hovertool=True,
    title="Daily Power Consumption Distributions",
    line_color='white',
    line_width=1,
    ylabel='Counts',
    use_index=False,
    xticks=np.arange(0, 36, 5),
    xlabel='Consumption [kWh]',
    colormap=['blue', 'green', 'red'])

In [76]:
cabin_density = density_hist(temp_1D, 'cabin', 40)
home_density = density_hist(temp_1D, 'home', 40)

grid = pandas_bokeh.plot_grid([[cabin_density],[home_density]], 
                              width=700, height=300)

### Time Series

In [90]:
temp_time = temp.copy()
temp_time[['cabin_1D_ma', 'home_1D_ma']] = \
    temp_time[['cabin', 'home']].rolling('1D').mean()

a = temp_time.plot_bokeh.line(
    y=['home', 'cabin', 'cabin_1D_ma', 'home_1D_ma'],
    title="Hourly Power Consumption Time Series",
    xlabel='',
    ylabel='Consumption [kWh]',
    hovertool=True,
    rangetool=True
)

##### Holidays

In [181]:
a_1D = temp_1D.plot_bokeh.line(
    y=['home', 'cabin'],
    title="Daily Power Consumption with Holidays",
    ylabel='Consumption [kWh]',
    hovertool=True,
    rangetool=True)



In [213]:
temp_date = temp.resample('1D').first().drop(['cabin', 'home'], axis='columns')
df = temp_1D.merge(temp_date, how='left', left_index=True, right_index=True)
df = df[['cabin', 'home','is_weekend','description','is_holiday']]
df.dtypes

cabin          float64
home           float64
is_weekend        bool
description     object
is_holiday        bool
dtype: object

In [340]:
from bokeh.models import ColumnDataSource, RangeTool, HoverTool, \
    LegendItem, Legend, DatetimeTickFormatter
from bokeh.plotting import figure
from bokeh.embed import file_html
from bokeh.models.ranges import Range1d
from bokeh.layouts import column, row
from bokeh.palettes import Category10, Category20

In [359]:
def calendar(
    df : pd.DataFrame,
    ys : [str],
    exclude_values_dict : dict={}, 
    include_values_dict : dict={},
    ylabel : str=None,
    **fig_kwargs):
    """Plot (multi)line time series with other informative features. Bool dtypes
    are included by default.
    Parameters
    ----------
    df : pd.DataFrame
        DataFrame with all data to be plotted with datetime index dtype.
    ys : list, [str]
        List of column names to be plotted as time series y values. First 
        column name in the list is used for y_values for other columns.
    exclude_values_dict : dict, default None
        Dictionary of the form {column:[value_1, ... ,value_n] to be excluded
        from plotting.
    include_values_dict : dict, default None
        Dictionary of the form {column:[value_1, ... ,value_n] to be included
        to the resulting plot.
    ylabel : str, default None
        Y-label for time series y axis.
    fig_kwargs : key-word arguments
        Key-word arguments for main figure. E.g. width, height, title.
        
    Returns
    -------
    fig : bokeh.figure"""
    
    # dictionaries to contain different column names
    len_exclude_dict = len(exclude_values_dict)
    len_include_dict = len(include_values_dict)
    if len_exclude_dict > 0 and len_include_dict > 0:
        if len(set(exclude_values_dict.keys()) & 
               set(include_values_dict.keys())) != 0:
            raise ValueError(f"Dictionaries can't contain same columns!")
    
    x = df.index.name                           # index name
    source = ColumnDataSource(df.reset_index()) # create bokeh source
    
    if ylabel == None: ylabel=ys[0]             # y-axis label
        
    # MAIN FIGURE #
    fig = figure(
        y_axis_label=ylabel,
        x_axis_type='datetime',
        **fig_kwargs)
    
    start_index = int(0.75 * len(source.data[x])) # explicitly set initial
    start = source.data[x][start_index]           # range for the figure
    end = source.data[x][-1]
    fig.x_range = Range1d(start, end)
    
    # RANGETOOL FIGURE #
    fig_rangetool = figure(
        title='Range Tool',
        height=130, 
        width=fig.width, 
        y_range=fig.y_range,
        x_axis_type='datetime',
        y_axis_type=None,
        tools="",
        toolbar_location=None,
    )
    
    range_tool = RangeTool(x_range=fig.x_range)
    range_tool.overlay.fill_color = "navy"
    range_tool.overlay.fill_alpha = 0.2

    fig_rangetool.ygrid.grid_line_color = None
    fig_rangetool.add_tools(range_tool)
    fig_rangetool.toolbar.active_multi = range_tool
    
    # x-axis tick format
    x_range_delta = df.index[-1] - df.index[0]  # data range in days
    dt_formatter = DatetimeTickFormatter(
        milliseconds=["%H:%M:%S.%f"],
        seconds=["%H:%M:%S"],
        minutes=["%H:%M:%S"],
        hours=["%H:%M:%S"],
        days=["%d %B %Y"],
        months=["%d %B %Y"],
        years=["%d %B %Y"],
    )
    if x_range_delta <= pd.Timedelta(366, unit='D'):
        dt_formatter.days = ["%b"]
        dt_formatter.months = ["%b"]
        dt_formatter.years = ["%Y"]
        dt_hover_format = "%d %b"
    fig.xaxis.formatter = dt_formatter
    fig_rangetool.xaxis.formatter = dt_formatter
    
    # DATA #
    bools = df.select_dtypes(bool).columns
    features = [*ys, *bools, *exclude_values_dict.keys(), 
                *include_values_dict.keys()]
    palette = Category10 if len(features) < 11 else Category20
    
    legend_items = []
    all_renderers = []
    for name, color in zip(features, palette[len(features)]):
        # init hovertool for each glyph on main
        hover = HoverTool( 
            tooltips=[(x, f"@{x}{{{dt_hover_format}}}")],
            formatters={f"@{x}":'datetime'},
            mode='vline')
        
        glyph_main = None
        glyph_range = None
        if name in ys:
            # draw glyphs on mian and on range tool
            glyph_main = fig.line(x=x, y=name, source=source, 
                                  color=color, line_width=2)
            glyph_range = fig_rangetool.line(x=x, y=name, source=source, 
                                             color=color)
            # add glyphs to renderers
            renderers = [glyph_main, glyph_range]
            all_renderers += renderers
            
            hover.tooltips.append((name, f"@{name}"))
            legend_items.append(LegendItem(
                label=f" {name}", renderers=renderers))
            
        else: 
            cds = None
            if name in bools: 
                # query True values only 
                true_idx = df[name][df[name] == True].index
                temp_df = df[ys[0]].loc[true_idx].reset_index()
                cds = ColumnDataSource(temp_df)
            
            elif name in exclude_values_dict.keys(): # values to exclude
                idx = df[name][~df[name].isin(exclude_values_dict[name])] \
                    .index
                cds = ColumnDataSource(df.loc[idx,[ys[0],name]].reset_index())
            
            else: # values to include
                idx = df[name][df[name].isin(include_values_dict[name])] \
                    .index
                cds = ColumnDataSource(df.loc[idx,[ys[0],name]].reset_index())
            
            # create glyphs
            glyph_main = fig.circle(x=x, y=ys[0], source=cds, 
                color=color, size=8, alpha=0.5)
            glyph_range = fig_rangetool.circle(x=x, y=ys[0], 
                source=cds, color=color, size=4, alpha=0.5)
            
            # add glyps to rednerers list
            renderers = [glyph_main, glyph_range]
            all_renderers += renderers
            
            # hover and legend
            hover_string = "True" if name in bools else f"@{name}"
            hover.tooltips.append((name, hover_string))
            legend_items.append(LegendItem(
                label=f" {name}", 
                renderers=renderers))
        
        hover.renderers = [glyph_main] # for HoverTool
        all_renderers += renderers # for legend
        
        fig.add_tools(hover)

    # Dummy fig for legend
    fig_legend = figure(width=130, height=fig.height + 130, 
                        outline_line_alpha=0,toolbar_location=None,
                        border_fill_color='#ffffff')
    
    # set the components of the figure invisible
    for fig_component in [fig_legend.grid[0], fig_legend.ygrid[0],
                          fig_legend.xaxis[0], fig_legend.yaxis[0]]:
        fig_component.visible = False
    
    # set the figure range outside of the range of all glyphs
    fig_legend.renderers += all_renderers
    fig_legend.x_range.end = fig.x_range.end + pd.Timedelta(365, unit='D')
    fig_legend.x_range.start = fig.x_range.start + pd.Timedelta(360, unit='D')
    fig_legend.add_layout(Legend(click_policy = "hide", location='center', 
                                 items=legend_items, border_line_width=2))
    
    return show(row(column(fig,fig_rangetool), fig_legend))
    
calendar(df, ys=['home', 'cabin'], height=300, 
               title='Daily Power Consumption', 
               ylabel=r'Consumption  [kWh/Day]',
               exclude_values_dict={'description': ['normal']})

## Observations:
**Feature Engineering:**
* is_winter : 4 months - Dec, Jan, Feb, Mar. Selected by cold weather rather than winter months per se.
* is_winter : Jun, Jul, Aug
* is_dst : Daylight Saving Time (Mar 28 3am - Oct 31 4am)
* rate : daily or nightly price rate
* description : type of day, if ordinary day == "normal"
* is_holiday : if national holiday that day or not

**Distributions**
* hourly cabin & home : heavily skewed, maybe use <code>PowerTrasnformer</code> to standardize and unskew.
* daily cabin & home : home has less skewness
    