## Generate Datasets 

**Description of the notebook:**

This notebook provides the function that creates and saves the features and the target values. All the functions that generate the features and the target values are in the `helpers.py` file. The houses with patterns (6, 40, 59, 72, 87) and a house with no significant pattern (60) are analysed. They correspond to houses 1, 2, 3, 4, 5 and 6 in the paper, respectively.

The modeled features are:
- 'EHW_1W' : Consumption at the same time the week before [L/h] 
- 'EHW_1D' : Consumption at the same time the day before [L/h] 
- 'EHW_12H' : Consumption 12 hours before [L/h] 
- 'EHW_2H' : Consumption 2 hours before [L/h] 
- 'EHW_1H' : Consumption 1 hour before [L/h] 
- 'WORKDAY' : 1 if the day is a workday, otherwise 0
- 'HOLIDAYS' : 1 if the day is a holiday, otherwise 0
- 'DAY0' : 1 if the day is weekday 0, otherwise 0
- 'DAY1' : 1 if the day is weekday 1, otherwise 0
- 'DAY2' : 1 if the day is weekday 2, otherwise 0
- 'DAY3' : 1 if the day is weekday 3, otherwise 0
- 'DAY4' : 1 if the day is weekday 4, otherwise 0
- 'DAY5' : 1 if the day is weekday 5 or 6, otherwise 0
- 'HOURLY_ROLLING_MEAN' : moving average of the consumption for that specific hour [L/h] 
- 'DAILY_HOURLY_ROLLING_MEAN' : moving average of the consumption for that specific hour of the day [L/h] 
- 'IS_CONSUMPTION_LAST24' : 1 if there was consumption during the past 24 hours, otherwise 0
- 'IS_CONSUMPTION_LAST12H' : 1 if there was consumption during the past 12 hours, otherwise 0
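
The lag features and the two consumption indicators listed above can be illustrated on a synthetic hourly series. This is only a minimal sketch: the actual implementation lives in `helpers.py`, which is not shown here, so the use of `shift` and `rolling`, the random data, and the choice to exclude the current hour from "the past N hours" are all assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic hourly consumption [L/h] for ten days; stands in for one house column.
index = pd.date_range("2017-09-01", periods=10 * 24, freq=pd.Timedelta(hours=1))
rng = np.random.default_rng(0)
demand = pd.Series(rng.integers(0, 50, size=len(index)).astype(float), index=index)

# Lagged consumption: each feature is the series shifted by a number of hourly steps.
lags = {"EHW_1W": 7 * 24, "EHW_1D": 24, "EHW_12H": 12, "EHW_2H": 2, "EHW_1H": 1}
features = pd.DataFrame({name: demand.shift(steps) for name, steps in lags.items()})

# Assumption: "the past N hours" excludes the current hour, hence the shift(1).
previous = demand.shift(1)
features["IS_CONSUMPTION_LAST24"] = (previous.rolling(24).sum() > 0).astype(int)
features["IS_CONSUMPTION_LAST12H"] = (previous.rolling(12).sum() > 0).astype(int)

# The one-week lag is undefined for the first 7 * 24 rows, so they are dropped.
features = features.iloc[7 * 24:]
```

Dropping the first week mirrors the `df.index[7*24:]` slicing used by `export_to_csv` in this notebook.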

Target values:
- Demand [L/h] 
- Binary value indicating whether there is consumption (0, 1)
- Class of the EWH consumption (0, 1, 2), based on the percentage or the quantile
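
To make the three kinds of target concrete, here is a small sketch on toy values. The function `to_classes` and its cut points (one third and two thirds) are hypothetical; the thresholds actually used by `binary_consumption` and `prepare_output` in `helpers.py` may differ.

```python
import numpy as np
import pandas as pd

demand = pd.Series([0.0, 0.0, 5.0, 12.0, 30.0, 60.0])  # toy demand values [L/h]

# Binary target: 1 when any hot water is drawn during the hour.
binary = (demand > 0).astype(int)


def to_classes(values, method="percentage", cuts=(1 / 3, 2 / 3)):
    """Map demand to classes 0, 1, 2. The cut points are hypothetical."""
    if method == "percentage":
        # Thresholds expressed as fractions of the maximum observed demand.
        low, high = (c * values.max() for c in cuts)
    elif method == "quantile":
        # Thresholds taken at the empirical quantiles of the demand.
        low, high = (values.quantile(c) for c in cuts)
    else:
        raise ValueError(f"unknown method: {method}")
    # digitize returns 0 below low, 1 in [low, high), and 2 from high upwards.
    return pd.Series(np.digitize(values, [low, high]), index=values.index)


by_percentage = to_classes(demand, method="percentage")
by_quantile = to_classes(demand, method="quantile")
```

With skewed demand, the percentage rule tends to put most hours in class 0, while the quantile rule spreads the hours more evenly across the classes.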

The features and targets are generated for the whole dataset; they are not yet split into train and test sets.

Import the required libraries:

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

from helpers import *

In [1]:
def export_to_csv(house_number):
    """
    Export all the features and the target values for one specific house.
    The EWHDataset.csv file containing all the data must be placed in the same folder
    as this notebook, and the Data output folder must already exist.

    Inputs :
        - house_number (int) between 1 and 98
    Outputs :
        - Two csv files stored in the Data folder under the names
          house_{house_number}_Features.csv and house_{house_number}_Target.csv
    """
    # Load the raw data and turn the first column into a datetime index
    df = pd.read_csv('EWHDataset.csv', index_col=0)
    df.index = pd.to_datetime(df.index)

    # Calendar columns derived from the timestamps
    df['month'] = df.index.month
    df['weekday'] = df.index.dayofweek
    df['hour'] = df.index.hour

    # Weekdays 0 to 4 are workdays
    df['workday'] = (df['weekday'] < 5).astype(int)

    # Flag the holiday periods (date ranges are inclusive)
    df['holidays'] = 0
    df.loc['2017-09-30':'2017-10-08', 'holidays'] = 1
    df.loc['2017-12-07':'2018-01-16', 'holidays'] = 1
    df.loc['2018-03-29':'2018-04-09', 'holidays'] = 1
    df.loc['2017-09-24':'2017-09-25', 'holidays'] = 1
    df.loc['2018-03-21', 'holidays'] = 1

    # One indicator column per weekday (day_0, day_1, ...)
    df = pd.concat([df, pd.get_dummies(df['weekday'], prefix='day')], axis=1)

    X, Y, features = prepare_dataset(df, house_number)

    # The first 7*24 rows (one week of hourly data) are skipped so that the index matches X and Y
    start = 7 * 24
    index = df.index[start:]

    Features = pd.DataFrame(data=X, columns=features, index=index)

    Target = pd.concat([pd.Series(Y, index=index),
          pd.Series(binary_consumption(df, house_number)[start:].values, index=index),
          pd.Series(prepare_output(df, house_number, method='percentage')[start:], index=index),
          pd.Series(prepare_output(df, house_number, method='quantile')[start:], index=index)], axis=1)

    Target.columns = ['Demand', 'Binary_Consumption', 'Percentage_Consumption', 'Quantile_Consumption']

    Features.to_csv(f'Data/house_{house_number}_Features.csv')
    Target.to_csv(f'Data/house_{house_number}_Target.csv')

**Prepare Dataset house 6**

In [3]:
export_to_csv(6)

**Prepare Dataset house 40**

In [4]:
export_to_csv(40)

**Prepare Dataset house 59**

In [5]:
export_to_csv(59)

**Prepare Dataset house 72**

In [6]:
export_to_csv(72)

**Prepare Dataset house 87**

In [7]:
export_to_csv(87)

**Prepare Dataset house 60**

In [8]:
export_to_csv(60)
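
Once the cells above have run, the exported files can be read back with their timestamps restored as a datetime index. The sketch below is only an illustration: `load_house` is a hypothetical helper that is not part of `helpers.py`, and the round trip uses toy data in a temporary folder so that it runs without `EWHDataset.csv`.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_house(house_number, data_dir="Data"):
    """Hypothetical helper: read back the two files written by export_to_csv."""
    data_dir = Path(data_dir)
    features = pd.read_csv(data_dir / f"house_{house_number}_Features.csv",
                           index_col=0, parse_dates=True)
    target = pd.read_csv(data_dir / f"house_{house_number}_Target.csv",
                         index_col=0, parse_dates=True)
    # Both files are expected to share the same timestamps, row for row.
    if not features.index.equals(target.index):
        raise ValueError("Features and Target indexes do not match")
    return features, target


# Round trip on toy data written to a temporary folder.
index = pd.date_range("2017-09-08", periods=3, freq=pd.Timedelta(hours=1))
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"EHW_1H": [0.0, 4.0, 2.5]}, index=index).to_csv(
        Path(tmp) / "house_6_Features.csv")
    pd.DataFrame({"Demand": [4.0, 2.5, 0.0]}, index=index).to_csv(
        Path(tmp) / "house_6_Target.csv")
    features, target = load_house(6, data_dir=tmp)
```

For the real files, `load_house(6)` would read from the `Data` folder used by `export_to_csv`.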