<center>
    <font size=5>Home and Cabin 
        Power Consumption 2021</font>
</center>

##### Sources:

# General
Notebook to analyse electricity consumption in my parents house and in the cottage.

<u><font size=4>Motivation / Objective:</font></u>
* Investigate the dataset using interactive plotting tools in python
* Catch trends seasonal and cyclic patterns in data
* Forecast power consumption based on time-series data

<u><font size=4>Data:</font></u>
* **Data type:** Tabular Data
    Hourly Power Consumption Dataset:
    * Data source: Eesti Energia AS, Estonian main electricity prowider company
    * Data download date: 25.01.2021
    * Data range: 01.01.2021 00:00 - 01.01.2022 00:00
    * Data given: hourly consumption rate in **kwh** - kilotwatt-hours

    Monthly Power Consumption Dataset:
    * Data source: Eesti Energia AS, 
    * Data download date: 25.01.2021
    * Data range 2020-2022
    * Monthly consumption summary statistics for years 2020 & 2021:
        * Daily 
        * Nightly
        * Total
* **Problem Type:** Predict Power consumption Supervised Time-Series Regresion

# Imports

In [154]:
import pandas as pd
import numpy as np

import pandas_bokeh
pandas_bokeh.output_notebook()

from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.utils.validation import check_is_fitted

# Data

## Classes, functions

In [None]:
# sklearn wrapped trasnformer
class DFDtypeMapper(BaseEstimator, TransformerMixin):
    """Remap pandas dataframe dtypes.
    Parameters
    ----------
    dtype_dict : dict, {'dtype':[col_name]}
        Dictionary of dtypes as keys and values as list of column names. 
    
    Returns
    -------
    DataFrame : pd.DataFrame"""
    def __init__(self, dtype_dict : dict):
        self.dtype_dict = dtype_dict
        self.transformed_column_names = None 
    
    def fit(self, X, y=None):
        self.all_columns_ = X.columns
        return self
    
    def get_feature_names_out(self, input_features=None) -> np.ndarray:
        check_is_fitted(self)
        return self.all_columns_
    
    def transform(self, X, y=None) -> pd.DataFrame:
        X_ = X.copy()
        # remove columns that are not in X
        _dtype_dict = {}
        for dtype, val in self.dtype_dict.items():
            if isinstance(val, str):
                if val in X_.columns: 
                    _dtype_dict[dtype] = val
            elif type(val) not in [tuple, list, np.ndarray]:
                raise ValueError(f'Wrong type for {self.dtype_dict} value.')
            else:
                _dtype_dict[dtype] = [col for col in val 
                                           if col in X_.columns]
        
        for dtype in _dtype_dict:
            X_[_dtype_dict[dtype]] = X_[_dtype_dict[dtype]].astype(dtype)
        
        return X_

class DFValueMapper(BaseEstimator, TransformerMixin):
    """Rename values in column based on dictionary.
    Parameters
    ----------
    map_dict : dict 
        Dictionary of old mappings to new.
    cat_only : bool, default True
        - If True: consider category dtype columns only
        - If False: apply to all columns. Computationally more expensve.
    
    Returns
    -------
    DataFrame : pd.DataFrame
        Remapped pandas DataFrame."""
    def __init__(self, map_dict : dict, cat_only=True):
        self.cat_only = cat_only
        self.map_dict = map_dict
    def fit(self, X, y=None):
        self.all_columns_ = X.columns
        return self
    def get_feature_names_out(self, input_features=None) -> np.ndarray:
        check_is_fitted(self)
        return self.all_columns_
    def transform(self, X, y=None) -> pd.DataFrame:
        X_ = X.copy()
        # categorical features
        if self.cat_only:
            cat_cols = X_.columns[(X_.dtypes == 'category').values]
            X_[cat_cols] = X_[cat_cols].apply(
                lambda x: x.cat.rename_categories(self.map_dict))
            return X_
        else:
            return X_.replace(self.map_dict)

## Hourly Usage

In [14]:
# load the data
hourly_use = pd.read_csv(
    'data/tarbimine_tund.csv', 
    header=6, sep=';',
    index_col=False,
    names=['start', 'end', 'cabin', 'home'],
    parse_dates=['start', 'end'],
    decimal=',')

hourly_use.head(3)

Unnamed: 0,start,end,cabin,home
0,2021-01-01 02:00:00,2021-01-01 03:00:00,0.12,1.377
1,2021-01-01 03:00:00,2021-01-01 04:00:00,0.16,0.17
2,2021-01-01 04:00:00,2021-01-01 05:00:00,0.12,0.252


## Monthly Usage
Data in summarized tabular form:
* 3 location chunks:
    * cabin
    * home
    * both summarized
* 3 year chuncks per location
    * 2020 fully
    * 2021 fully
    * 2022 partially
* Each year with monthly index and overall summary in the end.

In [179]:
# read in the data with location headings
monthly_use = pd.read_csv(
    'data/tarbimine.csv', 
    header=None, sep=';',
    skiprows=4,
    index_col=False,
    decimal=',',
    names=['month', 'day', 'night', 'total'])

monthly_use.head()

Unnamed: 0,month,day,night,total
0,Mõõtepunkti aadress: Sutu,,,
1,Tarbimine 2020. aastal,,,
2,,Päev (kWh),Öö (kWh),Kokku (kWh)
3,Jaanuar,39153,40772,79925
4,Veebruar,31930,38141,70071


### Clean & Reshape

In [233]:
temp = monthly_use.copy()

# references where to break the temp
locs = ['Sutu', 'Kuressaare', 'summeeritud']

# extract location
temp['locale'] = temp.month.str.extract(fr"({locs[0]}$|{locs[1]}$|{locs[2]})")
temp['locale'] = temp.locale.fillna(method='ffill')
temp['locale'] = temp.locale.rename({'Sutu':'cabin',
                                 'Kuressaare':'home',
                                 'summeeritud': 'total'})
# extract year
temp['year'] = temp.month.str.extract(r"^Tarbimine\s(\d{4})\.\saastal")
temp['year'] = temp.year.fillna(method='ffill')

temp.dropna(inplace=True) # get rid of references
temp.dtypes

month     object
day       object
night     object
total     object
locale    object
year      object
dtype: object

In [234]:
# convert comma decimals to points
temp.loc[:, 'day':'total'] = (
    temp.loc[:, 'day':'total']
    .apply(lambda x: x.str.replace(',', '.')))

# convert dtypes & remap values
dtype_dct = {'category':['month', 'locale', 'year'],
             'float':['day', 'night', 'total']}
map_dct = {  'Jaanuar':'Jan',
             'Veebruar': 'Feb',
             'Märts': 'Mar',
             'Aprill': 'Apr',
             'Mai': 'May',
             'Juuni': 'Jun',
             'Juuli': 'Jul',
             'August': 'Aug',
             'September': 'Sep',
             'Oktoober': 'Oct',
             'November': 'Nov',
             'Detsember': 'Det',
             'Aasta kokku': 'Yearly',
             'Sutu': 'cabin',
             'Kuressaare': 'home',
             'summeeritud': 'total'}

temp = DFDtypeMapper(dtype_dct).fit_transform(temp)
temp = DFValueMapper(map_dct).fit_transform(temp)
temp.head()

Unnamed: 0,month,day,night,total,locale,year
3,Jan,39.153,40.772,79.925,cabin,2020
4,Feb,31.93,38.141,70.071,cabin,2020
5,Mar,27.137,31.569,58.706,cabin,2020
6,Apr,20.018,23.972,43.99,cabin,2020
7,May,26.575,28.716,55.291,cabin,2020


In [244]:
temp.copy()

Unnamed: 0,month,day,night,total,locale,year
3,Jan,39.153,40.772,79.925,cabin,2020
4,Feb,31.930,38.141,70.071,cabin,2020
5,Mar,27.137,31.569,58.706,cabin,2020
6,Apr,20.018,23.972,43.990,cabin,2020
7,May,26.575,28.716,55.291,cabin,2020
...,...,...,...,...,...,...
120,Nov,310.882,187.765,498.647,total,2021
121,Det,415.889,260.094,675.983,total,2021
122,Yearly,3398.795,2646.714,6045.509,total,2021
125,Jan,296.273,283.045,579.318,total,2022


In [243]:
(temp.copy()
.query("year in ['2020','2021']")
.set_index(['year', 'locale', 'month'])
.sort_index())

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,day,night,total
year,locale,month,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
2020,home,Yearly,2598.340,2000.977,4599.317
2020,home,Apr,183.016,140.080,323.096
2020,home,Aug,207.633,164.666,372.299
2020,home,Det,289.374,176.477,465.851
2020,home,Jan,211.679,162.732,374.411
...,...,...,...,...,...
2021,total,Mar,315.632,228.950,544.582
2021,total,Nov,310.882,187.765,498.647
2021,total,Oct,202.362,196.871,399.233
2021,total,Sep,200.281,163.210,363.491
