# Exercise 12

The purpose of this exercise is to develop an LSTM model that predicts the electricity consumption of a property as accurately as possible.

Prediction tasks:
1. Predict the electricity consumption of the next hour
2. Predict the electricity consumption of the next day (24 h)
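
The two tasks differ only in how many future values each input window is paired with. A minimal sketch, assuming a 72-hour lookback and toy data; the helper `make_windows` and its parameters are hypothetical names, not part of the exercise material:

```python
import numpy as np

def make_windows(features, target, lookback, horizon):
    """Pair each `lookback`-hour feature window with the next `horizon` target values."""
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float).ravel()
    n_samples = len(features) - lookback - horizon + 1
    # Window i covers rows i .. i + lookback - 1
    X = np.stack([features[i:i + lookback] for i in range(n_samples)])
    # Its targets are the `horizon` values that follow the window
    y = np.stack([target[i + lookback:i + lookback + horizon] for i in range(n_samples)])
    return X, y

# Toy data: 200 hours, 3 features, target equal to the hour index
hours = 200
toy_features = np.random.default_rng(0).random((hours, 3))
toy_target = np.arange(hours, dtype=float)

X_1h, y_1h = make_windows(toy_features, toy_target, lookback=72, horizon=1)
X_24h, y_24h = make_windows(toy_features, toy_target, lookback=72, horizon=24)
print(X_1h.shape, y_1h.shape)    # (128, 72, 3) (128, 1)
print(X_24h.shape, y_24h.shape)  # (105, 72, 3) (105, 24)
```

With `horizon=24` the output layer of the network would need 24 units instead of 1.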

The data set contains the hourly consumption of the property from 1 January 2017 to 28 February 2022. The files are named year-month.csv, e.g. 2022-2.csv contains the consumption for February 2022. Note: the timestamp 1.1.2017 00:00 denotes the electricity consumed between 00:00 and 01:00 on 1.1.2017.

The zipped dataset can be found in Moodle: Electricity_consumption.zip
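
Because the month in the file name is not zero-padded, a plain alphabetical sort would place 2017-10.csv before 2017-2.csv. A minimal sketch of loading the monthly files in chronological order follows. The helpers `month_key` and `load_monthly_files` are hypothetical, and the column names `Datetime` and `Power` in the demo are assumptions; the real files may use a different separator or layout.

```python
import glob
import os
import tempfile

import pandas as pd

def month_key(path):
    """Turn '.../2022-2.csv' into (2022, 2) so that files sort chronologically."""
    year, month = os.path.splitext(os.path.basename(path))[0].split("-")
    return int(year), int(month)

def load_monthly_files(folder):
    """Read every year-month.csv in `folder` and stack them in time order."""
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")), key=month_key)
    frames = [pd.read_csv(path) for path in paths]
    return pd.concat(frames, ignore_index=True)

# Demo with three tiny stand-in files written to a temporary folder
with tempfile.TemporaryDirectory() as folder:
    for name, value in [("2017-10", 10.0), ("2017-2", 2.0), ("2017-1", 1.0)]:
        pd.DataFrame({"Datetime": [f"{name}-01 00:00"], "Power": [value]}).to_csv(
            os.path.join(folder, f"{name}.csv"), index=False
        )
    combined = load_monthly_files(folder)

print(combined["Power"].tolist())  # [1.0, 2.0, 10.0]
```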



Dataset splitting:
1. Training dataset: 1.1.2017-30.9.2020
2. Test dataset: 1.10.2020-30.9.2021
3. "Another test dataset", which is used once your LSTM is ready: 1.10.2021-28.2.2022
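
The split above can be sketched with date-string slicing on a `DatetimeIndex`. The frame below is a placeholder with a complete hourly index, so the row counts are the ideal ones; the real data may have gaps or duplicates.

```python
import numpy as np
import pandas as pd

# Placeholder frame covering the whole period at hourly resolution
index = pd.date_range("2017-01-01 00:00", "2022-02-28 23:00", freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"Power": np.zeros(len(index))}, index=index)
df.index.name = "Datetime"

# Slicing with date strings includes the whole end day
train = df.loc["2017-01-01":"2020-09-30"]
test = df.loc["2020-10-01":"2021-09-30"]
test2 = df.loc["2021-10-01":"2022-02-28"]

print(len(train), len(test), len(test2))  # 32856 8760 3624
```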


Enrich the data using open data from the [Finnish Meteorological Institute](https://en.ilmatieteenlaitos.fi/).
- The weather station is Jyväskylä lentoasema (137208), latitude: 62.39, longitude: 25.69
- The enriched dataset has to contain AT LEAST the temperature, but you can also use other information such as wind speed, humidity etc.
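
One way to enrich the consumption data is to join the weather observations on the hourly timestamp and fill short gaps by interpolation. This is a sketch under assumptions: both frames are tiny stand-ins, and the column name `Air temperature (degC)` is taken from the table shown later in this notebook. It is also worth checking that both sources use the same time zone before joining.

```python
import pandas as pd

# Stand-in consumption data, one row per hour
consumption = pd.DataFrame(
    {"Power": [1.39, 3.08, 0.84]},
    index=pd.to_datetime(["2017-01-01 00:00", "2017-01-01 01:00", "2017-01-01 02:00"]),
)

# Stand-in weather data with the 01:00 observation missing
weather = pd.DataFrame(
    {"Air temperature (degC)": [2.2, 1.7]},
    index=pd.to_datetime(["2017-01-01 00:00", "2017-01-01 02:00"]),
)

# Left join keeps every consumption hour, even without a weather observation
enriched = consumption.join(weather, how="left")
missing_before = int(enriched["Air temperature (degC)"].isna().sum())

# Fill the gap linearly in time between the neighbouring observations
enriched["Air temperature (degC)"] = enriched["Air temperature (degC)"].interpolate(method="time")
print(missing_before)
print(enriched)
```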


In [1]:
import glob
import os
import pandas as pd 
import numpy as np
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
tf.keras.backend.set_floatx('float64')

from sklearn.preprocessing import MinMaxScaler

from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout
from sklearn.metrics import mean_squared_error


In [2]:
df_train = pd.read_csv('data/df_11_train.csv', parse_dates = ['Datetime'], index_col = ['Datetime'])
df_test = pd.read_csv('data/df_11_test.csv', parse_dates = ['Datetime'], index_col = ['Datetime'])
df_test2 = pd.read_csv('data/df_11_test2.csv', parse_dates = ['Datetime'], index_col = ['Datetime'])

df_train

Datetime,Power,Precipitation amount (mm),Air temperature (degC),Wind speed (m/s)
2017-01-01 00:00:00,1.39,0.0,2.2,4.9
2017-01-01 01:00:00,3.08,0.0,1.9,4.1
2017-01-01 02:00:00,0.84,0.0,1.7,4.1
2017-01-01 03:00:00,1.66,0.0,1.3,4.8
2017-01-01 04:00:00,0.76,0.0,0.7,4.4
...,...,...,...,...
2020-09-30 19:00:00,1.74,0.0,11.0,1.5
2020-09-30 20:00:00,2.80,0.0,10.9,1.0
2020-09-30 21:00:00,1.77,0.0,10.7,1.1
2020-09-30 22:00:00,1.11,0.0,10.1,0.7


In [3]:
# Sanity check: actual shape and expected number of hourly rows

print(f'{df_train.shape} and {1369 * 24}')
print(f'{df_test.shape} and {365 * 24}')
print(f'{df_test2.shape} and {151 * 24}')

(32855, 4) and 32856
(8760, 4) and 8760
(3625, 4) and 3624
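
The training set is one row short and the second test set has one row too many. A minimal sketch for locating such rows compares the index against a complete hourly range. The helper `report_index_gaps` is hypothetical, and the toy index is made up. The pattern would be consistent with daylight saving time changes, but that has not been verified here.

```python
import pandas as pd

def report_index_gaps(index, start, end):
    """Return the hourly timestamps missing from `index` and those that occur twice."""
    expected = pd.date_range(start, end, freq=pd.Timedelta(hours=1))
    missing = expected.difference(index)
    duplicated = index[index.duplicated()]
    return missing, duplicated

# Toy index: 02:00 is missing and 03:00 appears twice
toy_index = pd.to_datetime([
    "2021-10-31 00:00", "2021-10-31 01:00", "2021-10-31 03:00",
    "2021-10-31 03:00", "2021-10-31 04:00",
])
missing, duplicated = report_index_gaps(toy_index, "2021-10-31 00:00", "2021-10-31 04:00")
print(list(missing))
print(list(duplicated))
```

On the real data the call would be, for example, `report_index_gaps(df_train.index, "2017-01-01 00:00", "2020-09-30 23:00")`.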


In [4]:
def split_to_features_and_target(data):
    y = data[['Power']].copy()
    X = data.drop(['Power'], axis = 1).copy()
    return X, y

X_train, y_train = split_to_features_and_target(df_train)
X_test, y_test = split_to_features_and_target(df_test)
X_test2, y_test2 = split_to_features_and_target(df_test2)

# Scaling

scaler = MinMaxScaler()
scaler.fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
X_test2_scaled = scaler.transform(X_test2)

scalero = MinMaxScaler()
scalero.fit(y_train)
y_train_scaled = scalero.transform(y_train)
y_test_scaled = scalero.transform(y_test)
y_test2_scaled = scalero.transform(y_test2)
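
Since the target is scaled to the range seen in training, model predictions come out in scaled units too. A minimal sketch of converting them back before computing an error metric; `target_scaler` plays the role of `scalero` above, and all numbers are made up for illustration:

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler

# Fit the target scaler on (toy) training targets only
y_train_toy = np.array([[0.0], [2.0], [4.0]])
target_scaler = MinMaxScaler().fit(y_train_toy)

# Toy true values in original units and stand-in predictions in scaled units
y_true_original = np.array([[1.0], [3.0]])
pred_scaled = np.array([[0.5], [0.5]])

# Map predictions back to original units, then compute the RMSE there
pred_original = target_scaler.inverse_transform(pred_scaled)
rmse = np.sqrt(mean_squared_error(y_true_original, pred_original))
print(pred_original.ravel(), rmse)
```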

In [5]:
# stepify from Satu Satu! Slightly modified.

def stepify(data, lookback = 72):
    """Cut a 2D array (rows, columns) into overlapping windows of `lookback` rows."""
    X = []

    # Window i covers rows i .. i + lookback - 1
    for i in range(len(data) - lookback - 1):
        X.append(data[i:i + lookback, :])

    # Resulting shape: (windows, lookback, number of features/targets)
    X = np.array(X)
    print(X.shape)
    return X

X_train_scaled_steps = stepify(X_train_scaled)
X_test_scaled_steps = stepify(X_test_scaled)
X_test2_scaled_steps = stepify(X_test2_scaled)

y_train_scaled_steps = stepify(y_train_scaled)
y_test_scaled_steps = stepify(y_test_scaled)
y_test2_scaled_steps = stepify(y_test2_scaled)

(32782, 72, 3)
(8687, 72, 3)
(3552, 72, 3)
(32782, 72, 1)
(8687, 72, 1)
(3552, 72, 1)
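
The same windows can be produced without a Python loop. The sketch below uses `numpy.lib.stride_tricks.sliding_window_view` (NumPy 1.20 or newer) and keeps the same number of windows as `stepify`; the name `stepify_fast` is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def stepify_fast(data, lookback=72):
    """Vectorised windowing: returns shape (windows, lookback, columns)."""
    data = np.asarray(data)
    # sliding_window_view puts the window axis last: (len - lookback + 1, columns, lookback)
    windows = sliding_window_view(data, window_shape=lookback, axis=0)
    # Move the window axis to position 1: (len - lookback + 1, lookback, columns)
    windows = np.moveaxis(windows, -1, 1)
    # Keep len - lookback - 1 windows, the same count the loop version produces
    return windows[: len(data) - lookback - 1]

toy = np.arange(100 * 3, dtype=float).reshape(100, 3)
toy_windows = stepify_fast(toy)
print(toy_windows.shape)  # (27, 72, 3)
```

The result is a read-only view on the input, so call `.copy()` on it if the windows need to be modified.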


In [6]:
# Initialising the LSTM
regressor = Sequential()

# Adding the first LSTM layer and some Dropout regularisation
# NOTE: With stacked LSTM layers, every layer except the last one needs return_sequences = True
regressor.add(LSTM(units = 50, return_sequences = True, input_shape = (72, 3)))
regressor.add(Dropout(0.2))

# Adding a second LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))

# Adding a third LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))

# Adding a fourth LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50))
regressor.add(Dropout(0.2))

# Adding the output layer
regressor.add(Dense(units = 1))

# Compiling the LSTM: optimizer = adam and loss = mean_squared_error 
regressor.compile(optimizer = 'adam', loss = 'mean_squared_error')

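# Fitting the LSTM to the Training set
# The network outputs one value per window, so each window is paired with the Power value
# of the hour right after it (y_train_scaled[72:-1]) instead of a window of targets
regressor.fit(X_train_scaled_steps, y_train_scaled[72:-1], epochs = 100, batch_size = 32)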

Epoch 1/100
Epoch 2/100
Epoch 3/100

KeyboardInterrupt: ignored
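
Rather than interrupting a long run by hand, training can be stopped automatically once the validation loss stops improving. A minimal sketch with the Keras `EarlyStopping` callback on a deliberately tiny model and random toy data; the layer size, patience and epoch count are illustrative assumptions, not tuned values:

```python
import numpy as np
import tensorflow as tf

# Random toy data with the same window shape as above: (samples, 72, 3)
rng = np.random.default_rng(0)
X_toy = rng.random((64, 72, 3)).astype("float32")
y_toy = rng.random((64, 1)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(72, 3)),
    tf.keras.layers.LSTM(8),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mean_squared_error")

# Stop when val_loss has not improved for 2 epochs and restore the best weights
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=2, restore_best_weights=True
)

history = model.fit(
    X_toy, y_toy,
    validation_split=0.25,  # Keras takes the last 25 % of the samples
    epochs=5, batch_size=16,
    callbacks=[early_stop], verbose=0,
)
epochs_run = len(history.history["loss"])
print(epochs_run)
```

Because `validation_split` takes the samples from the end of the arrays, time-ordered windows are validated on the most recent part of the training period.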

# Conclusion

Write a short conclusion about the results, the development process etc.

I ran out of time, so this model was not developed further.

