**Room Occupancy Prediction using Deep Neural Networks**


In this project, we explore a machine learning approach to predict room occupancy using a dataset that includes features such as ```Temperature```, ```Humidity```, ```Light```, and ```Carbon Dioxide (CO2)``` levels. The target variable is binary, indicating the presence ```1``` or absence ```0``` of room occupancy.

**Dataset Overview**

The dataset provides the following features:

* Temperature
* Humidity
* Light
* Carbon Dioxide (CO2)

The target variable is:

* 1 - Indicates room occupancy.
* 0 - Indicates no room occupancy.

The dataset offers a unique opportunity to explore binary classification through various exploratory data analysis (EDA) techniques and predictive modeling.

**Project Highlights**

In this notebook, we focus on leveraging **Deep Neural Networks (DNN)**, specifically:

* Long Short-Term Memory (LSTM) networks to capture temporal dependencies in the data.
* Deep Feedforward Neural Networks (DFNN) for feature representation and classification.

**Key Achievements:**

Achieved accuracy between 95% and 100% in predicting room occupancy.
High F1 scores for both classes, demonstrating balanced performance.

Reduced the number of variables used compared to the original dataset, highlighting the efficiency of the modeling approach.



> **Dataset link:** https://www.kaggle.com/datasets/sachinsharma1123/room-occupancy/data



# Modules installation

In [10]:
!pip install pyts
!pip install keras-tuner
#!pip install keras-nlp



# Modules

In [11]:
import sys
import os
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import KFold
from scipy.stats import norm
from pyts.approximation import SymbolicAggregateApproximation
#from pyts.approximation import DiscreteFourierTransform
#from pyts.approximation import PiecewiseAggregateApproximation
from scipy.interpolate import interp1d
from sklearn.preprocessing import LabelEncoder
import keras
from keras.metrics import AUC
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.layers import Conv1D
from keras.layers import AveragePooling1D
from keras.layers import MaxPooling1D
from keras.models import Model
from keras.layers import Average
from keras.layers import Concatenate
from keras.layers import Add
from keras.layers import Multiply
from keras.layers import LayerNormalization
#from keras_nlp.layers import TransformerEncoder
from keras.losses import BinaryCrossentropy
from keras.regularizers import l2
from sklearn.model_selection import train_test_split
from keras.optimizers import Adam
from sklearn.metrics import accuracy_score
from sklearn.metrics import f1_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from keras.layers import Input
import keras_tuner as kt
from sklearn.metrics import confusion_matrix
from google.colab import drive
from sklearn.preprocessing import OneHotEncoder
import string

# Classes

The ```Preprocessing``` class processes both training and test set by deleting 'Unnamed: 0' column and adding the hour column derived from the 'Date' column.

Furthermore, with ```.get_info()``` it is  possible to check information on both sets and it has a method to plot specific features with  ```.plot_ts(column, train)```.

It also possible to get these preprocessed sets with the methods  ``` .get_train() ``` and  ``` .get_test() ```





In [12]:
class Preprocessing:

  # the building methods allows to import datasets. Then it will drop unnecessary default column
  # and adding the hours feature through the date column.
  def __init__(self, path, filename_train, filename_test):

    # preparing working directory
    drive.mount('/content/drive')
    os.chdir(path)

    # loading train and test sets
    self.df_train = pd.read_csv(path + filename_train)
    self.df_test = pd.read_csv(path + filename_test)

    # deleting "unnamed" column
    self.df_train.drop('Unnamed: 0', axis = 1, inplace = True)
    self.df_test.drop('Unnamed: 0', axis = 1, inplace = True)

    # adding hour feature
    self.df_train['date'] = pd.to_datetime(self.df_train['date'])
    self.df_test['date'] = pd.to_datetime(self.df_test['date'])

    self.df_train['hour'] = self.df_train['date'].dt.hour
    self.df_test['hour'] = self.df_test['date'].dt.hour


  def get_info(self):
    print('--------------------- Train ---------------------')
    print(self.df_train.info())
    print('------------------------ Test --------------------')
    print(self.df_test.info())


  def plot_ts(self, col, train = True):
    plt.figure(figsize=(12, 6))

    if (train == True):
      plt.plot(self.df_train[col], label=col)
    elif (train == False):
      plt.plot(self.df_test[col], label=col)
    else:
      return 'define as a parameter if \'train\' or \'test\''

    plt.xlabel('Time')
    plt.ylabel(col)
    plt.title(f'{col} Time Series Plot')
    plt.show()


  def get_train(self):
    return self.df_train


  def get_test(self):
    return self.df_test

The main porpouse of ```Shrinker``` is to prepare the data in a format suitable for the model and for the learning task.


The idea behind is that the smart application, which is able to identify if a room is occuped or not, is that it receves data from sensors through a window *W* and it predicts the occupancy state (i.e. class/label) at the last timestamp of this window.

Formally, the raw data are in the following form:


\begin{align}
\mathbf{((x_1, y_1),(x_2, y_2), (x_3, y_3), ... , (x_T, y_T) )}
\end{align}

where each $\mathbf{x_t}$ is the value of an attribute at time $\mathbf{t}$ , $\mathbf{T}$ is the last timestamp of the series and $\mathbf{y_t}$ is the label at time $\mathbf{t}$. This class transforms the previous raw format in the following form:

\begin{align}
\mathbf{((x_1, x_2, x_3, ... , x_w ), y_w), ((x_2, x_3, ... , x_{w+1}), y_{w+1}), ... , ((x_{T-w}, x_{T-w+1},...,x_T), y_T)}
\end{align}

Where $\mathbf{w}$ represents the size of the window.


Moreover, this transformation is able to add another feature, called **logRatio** through the method ```get_logRatio(x, index_numerator, index_denominator)``` which is the logarithmical ratio between the specified features.


Below there is an example of how to use this class

```
S = Shrinker(window_size = 60)
x, y = S.fit_transform(
      df[['input_column_1','input_column_2']],
      df['label']
  )
x = S.get_logRatio(x, 1, 0)
```



In [13]:
class Shrinker():
  def __init__(self, window_size):

    self.window_size = window_size


  def fit(self):
    return self


  def transform(self, X, Y):

    X_transformed = []
    Y_transformed = []

    # lenght of the time series
    n_samples = len(X.axes[0])

    # creation of the sequences
    for start_idx in range(n_samples - self.window_size + 1):
      end_idx = start_idx + self.window_size
      sequence = X.iloc[start_idx:end_idx].to_numpy()  # extract the window
      label = Y.iloc[end_idx - 1]       # the corresponding label


      X_transformed.append(sequence)
      Y_transformed.append(label)


    X = np.asarray(X_transformed)
    Y = np.asarray(Y_transformed)


    return X, Y.reshape(-1, 1)  # in this way it is reshaped from (x, ) to (x, 1)


  def fit_transform(self, X, Y):
    return self.fit().transform(X, Y)



  def get_logRatio(self, x, idx_num, idx_denom):
    logRatio = np.log( np.divide(x[:,:,idx_num],
                            x[:,:,idx_denom],
                            out=np.full(x[:,:,idx_num].shape, 1e-8),
                            where = x[:,:,idx_denom] != 0  ))   # fill with 0 where division is not valid (i.e denominator is equal to 0)

    logRatio = np.expand_dims(logRatio, axis=-1)
    return np.concatenate((x, logRatio), axis = 2)




```OutlierEliminator()``` identifies possible outliers through the interquantile range, based on the first and last quantile from training data (i.e. this is done in the fit method). Once all possible outliers are detected, the outlier values are replaced through interpolation. It is possible to avoid this transformation on specific feature, to do so, there is ```ignored_indices``` parameter, which allows to take off them temporarlly and add them again at the end of the trasfromation. Example below:

```
O = OutlierEliminator(ignored_indices = [0, 1]) # 0-th and 1-st columns are unchanged
x, _ = O.fit_trasform(x)
```

In [14]:
# outlier elimination:
# nel builder specifico l'indice della colonna delle ore e la threshold (1.5 di default)
# nel fit ottengo i delta poi calcolo i quantili (0.25 e 0.75) e l'IRQ dei delta per ciascuna feature
# nel transform rilevo per ciascuna serie e feature, l'indice dove ce un possibile outlier e sostituisco con l'interpolazione
# (metodo fit_transform per completezza)

class OutlierEliminator():
  def __init__(self, ignored_indices = None, thr = 1.5):
    self.ignored_indices = ignored_indices
    self.thr = thr



  def remove_hour_col(self, x):

    self.ignored_features = list()

    for i, idx in enumerate(self.ignored_indices):

      self.ignored_features.append( x[:,:, idx] )
      self.ignored_features[i] = np.expand_dims(self.ignored_features[i], axis=-1)  # reshaped from (n_series, window_size) to (n_series, window_size, 1)

    x = np.delete(x, self.ignored_indices, axis=2)

    return x



  def add_hour_col(self, x):

    for feature in self.ignored_features:
      x = np.concatenate((x, feature), axis=2)

    return x



  def get_deltas(self, x_train):

    numerator = x_train[:, 1:, :] - x_train[:, :-1, :]
    denominator = x_train[:, :-1, :]

    # here deltas will have shape (x, y-1, z)
    deltas =  np.divide(
                numerator,
                denominator,
                out=np.zeros_like(numerator),  where = denominator != 0 # fill with 0 where division is not valid
                              )

    zero_row = np.zeros((x_train.shape[0], 1, x_train.shape[2]))    # create an array of zeros with shape (x, 1, z)
    deltas =  np.concatenate((zero_row, deltas), axis=1)   # now deltas has shape (x,y,z)


    return deltas



  def fit(self, x_train, y_train = None):

    def unfold_ts(x):
      ts = x[0, :, :]       # ts stands for time series. It starts taking the first window

      for i in range(1, x.shape[0]):

        # appending last value of every series to ts variable
        conc_value = x[i, -1, :].reshape(1, -1)   # reshaped because in this way it takes shape (1, n_features) instead of (n_features, )
        ts = np.concatenate((ts, conc_value), axis = 0)

      return ts

    if self.ignored_indices is not None:
      x_train = self.remove_hour_col(x_train)

    deltas = self.get_deltas(x_train)
    deltas = unfold_ts(deltas)

    self.distrib = dict()

    self.distrib['Q1'] = np.quantile(deltas, 0.25, axis = 0)
    self.distrib['Q3'] = np.quantile(deltas, 0.75, axis = 0)
    self.distrib['IQR'] = self.distrib['Q3'] - self.distrib['Q1']

    return self



  def transform(self, x, y= None):

    if self.ignored_indices is not None:
      x = self.remove_hour_col(x)


    deltas = self.get_deltas(x)
    high_thr =  self.distrib['Q3'] + self.thr * self.distrib['IQR']
    low_thr =   self.distrib['Q1'] - self.thr * self.distrib['IQR']


    for series in range(0, x.shape[0]):
      for feature in range(0, x.shape[2]):


        # put NaN values where the deltas are above (or below) the interquantile range
        idx_high_outliers = np.where( deltas[series, :, feature] > high_thr[feature] )[0]
        idx_low_outliers = np.where( (deltas[series, :, feature] < low_thr[feature] ) & (deltas[series, :, feature] != 0) )[0]

        if idx_high_outliers.size != 0:
          x[series, idx_high_outliers, feature] = np.nan

        if idx_low_outliers.size != 0:
          x[series, idx_low_outliers, feature] = np.nan


        # get values and indices of not NaN data, these are usefull for the interpolation function
        X_not_nan = x[series, :, feature][~np.isnan( x[series, :, feature] )]
        indices = np.where(~np.isnan( x[series, :, feature] ))[0]

        # replacing NaN value through interpolation
        x[series,:, feature] = np.interp( np.arange( 0, x.shape[1] ), indices, X_not_nan  )


    if self.ignored_indices is not None:
      x = self.add_hour_col(x)

    return x, y


  def fit_transform(self, x, y = None):
    return self.fit(x).transform(x, y)
    #T = self.fit(x)

    #return T.transform(x, y)

```SAX_Transformer()``` enables the SAX transformation and it also do a one hot encoding on the trasformed features. The parameters for the SAX transformer are the same of the SAX transformer from pyts, for more details: https://pyts.readthedocs.io/en/latest/generated/pyts.approximation.SymbolicAggregateApproximation.html

Likewise ```OutlierEliminator()```, ```SAX_Transformer()``` allows to ignore user-specified features through selecting indicies of those features.

Example below:

```
T = SAX_Transformer(n_bins = 10, strategy = )
```

In [15]:
# SAX transformation with OH-Encoding
class SAX_Transformer():
  def __init__(self, n_bins, ignored_indices = None, strategy = 'normal', ohe = True):
    self.n_bins = n_bins
    self.ignored_indices = ignored_indices
    self.strategy = strategy
    self.ohe = ohe


  def remove_hour_col(self, x):

    self.ignored_features = list()

    for i, idx in enumerate(self.ignored_indices):
      self.ignored_features.append( x[:,:, idx] )
      self.ignored_features[i] = np.expand_dims(self.ignored_features[i], axis=-1)  # reshaped from (n_series, window_size) to (n_series, window_size, 1)

    x = np.delete(x, self.ignored_indices, axis=2)

    return x


  def add_hour_col(self, x):

    for feature in self.ignored_features:
      x = np.concatenate((x, feature), axis=2)

    return x


  def fit(self, x, y=None):

    if self.ignored_indices:
      x = self.remove_hour_col(x)

    self.sax_transformers = list()
    n_features = x.shape[2]

    # SAX transfromer initialization for every feature in x
    self.sax_transformers = [
          SymbolicAggregateApproximation(n_bins=self.n_bins, strategy=self.strategy).fit(x[:,:,idx_feature])
            for idx_feature in range(n_features) ]
    return self


  def oh_encoding(self, data):

    n_series, window_size, n_features = data.shape
    one_hot_encoded_all_features = []

    for i in range(n_features):
      # applying one-hot encoding separatly on each feature
      label_encoder = LabelEncoder()
      flattened_data = data[:, :, i].ravel()
      encoded_data = label_encoder.fit_transform(flattened_data)

      # number of different letters (n_bins)
      one_hot_encoded = np.eye(self.n_bins)[encoded_data]
      one_hot_encoded = one_hot_encoded.reshape(n_series, window_size, self.n_bins)

      one_hot_encoded_all_features.append(one_hot_encoded)

    # concatenate over the feature dimension (so it will get a shape (n_series, window_size, n_bins * n_features))
    one_hot_encoded_all_features = np.concatenate(one_hot_encoded_all_features, axis=-1)

    return one_hot_encoded_all_features



  def transform(self, x, y= None):

    if self.ignored_indices is not None:
      x = self.remove_hour_col(x)

    n_features = x.shape[2]
    sax_transformed = []
    # Applying SAX to every feature
    for i in range(n_features):
        sax_feature = self.sax_transformers[i].transform(x[:, :, i])
        sax_transformed.append(sax_feature)

    # Stack per ottenere una nuova shape (n_timeseries, n_timestamp, n_features)
    sax_transformed = np.stack(sax_transformed, axis=-1)

    if self.ohe:
      sax_transformed = self.oh_encoding(sax_transformed)

    if self.ignored_indices is not None:
      sax_transformed = self.add_hour_col(sax_transformed)

    return sax_transformed, y


  def fit_transform(self, x, y = None):
    return self.fit(x, y).transform(x, y)

The class Report simply gives a report containing the information about the predictive performance of the model through some metrics like:


*   Accuracy
*   Precision
*   Recall
*   F1-Score

and a confusion matrix

In [16]:
class Report():
  def __init__(self, y_true, y_pred):
    self.y_true = y_true
    self.y_pred = y_pred
    #compute metrics
    self.accuracy = accuracy_score(self.y_true, self.y_pred)
    self.f1_class_0 = f1_score(self.y_true, self.y_pred, pos_label=0)
    self.f1_class_1 = f1_score(self.y_true, self.y_pred, pos_label=1)
    self.precision_0 = precision_score(self.y_true, self.y_pred, pos_label=0)
    self.precision_1 = precision_score(self.y_true, self.y_pred, pos_label=1)
    self.recall_0 = recall_score(self.y_true, self.y_pred, pos_label=0)
    self.recall_1 = recall_score(self.y_true, self.y_pred, pos_label=1)


  def get_accuracy(self):
    return self.accuracy

  def get_f1_class_0(self):
    return self.f1_class_0

  def get_f1_class_1(self):
    return self.f1_class_1

  def get_precision_0(self):
    return self.precision_0

  def get_precision_1(self):
    return self.precision_1

  def get_recall_0(self):
    return self.recall_0

  def get_recall_1(self):
    return self.recall_1

  def show_report(self):
    # print metrics
    print(f"Accuracy: {self.accuracy}")
    print(f"F1-score for class 0: {self.f1_class_0}")
    print(f"F1-score for class 1: {self.f1_class_1}")
    print(f"Precision for class 0: {self.precision_0}")
    print(f"Precision for class 1: {self.precision_1}")
    print(f"Recall for class 0: {self.recall_0}")
    print(f"Recall for class 1: {self.recall_1}")
    print(confusion_matrix(self.y_true, self.y_pred))

# Preprocessing

In [17]:
PATH= '/content/drive/MyDrive/data_mining/DM/dm2_project/'

In [18]:
P = Preprocessing(PATH, 'training_ts.csv', 'test_ts.csv')
train = P.get_train()
test = P.get_test()

Mounted at /content/drive


# Shrinking

In [19]:
S = Shrinker(window_size = 60)

x_train_shrinked, y_train_shrinked = S.fit_transform(train[['Light', 'CO2', 'hour']], train['Occupancy'])
x_test_shrinked, y_test_shrinked = S.transform(test[['Light', 'CO2', 'hour']], test['Occupancy'])

x_train_shrinked = S.get_logRatio(x_train_shrinked, 1, 0)
x_test_shrinked = S.get_logRatio(x_test_shrinked, 1, 0)

# Outlier elimination

In [20]:
O = OutlierEliminator(ignored_indices = [2,3])

x_train_shrinked, _ = O.fit_transform(x_train_shrinked)
x_test_shrinked, _ = O.transform(x_test_shrinked)

# SAX

In [21]:
SAX = SAX_Transformer(n_bins = 8, ignored_indices = [2,3])

x_train_sax, _ = SAX.fit_transform(x_train_shrinked)
x_test_sax, _ = SAX.transform(x_test_shrinked)

# Model selection

The fitting process for all the models described below follows the same approach, utilizing the classes ```Shrinker()```, ```OutlierEliminator()```, and ```SAX_Transformer()```. These classes have their parameters fine-tuned by the tuner to determine the optimal input shape that maximizes the model's performance. Initially, the dataset is divided using a standard hold-out method, allocating 15% of the training data for validation. After identifying the best configuration, the model undergoes a thorough evaluation using k-fold cross-validation (as detailed in the Cross-Validation section) to ensure robustness, and its final performance is assessed on the designated test set (refer to the Model Evaluation section).

## Best model

This model uses 4 features, namely Light, CO2, hour and logRatio. Then, it combines each feature with the hour feature, in this way it is possible for the model to learn the relationship between these features (i.e. light, CO2 and this log ratio of light and CO2) and the time component (i.e. the hour in that day). This is done by just applying a concatenation.

This process involves passing each feature through its own LSTM layers that are specifically designed to model the interaction between that feature and the hour feature. There are four distinct sets of LSTM layers: one for Light combined with Hour, one for CO2 combined with Hour, one dedicated solely to the log ratio, and one for the interaction between the log ratio and Hour. Each LSTM set focuses on learning the temporal patterns and dependencies unique to the combination of the respective feature and the hour of the day.

After this sets of LSTMs, the model concatenates the results of the different LSTMs and condensate them with a Deep Farward Neural Network. The last layer of this DFNN it does a sigmoid, so it can do binary classification.

In [22]:
class Model_wHours_ratio(kt.HyperModel):

  def build(self, hp):

      # it defines the widows size and the number of bin for the SAX
      window = hp.Int('window', 30, 120, step=10)
      n_bins = hp.Int('n_bins', 5, 10, step=1)


      input_shape = (window, n_bins)

      # inputs layer
      feature1 = Input(shape=input_shape, name="feature_1")
      feature2 = Input(shape=input_shape, name="feature_2")
      hours = Input(shape=(window, 1), name="hours")
      ratio = Input(shape=(window, 1), name="feature3")

      ratio = LayerNormalization()(ratio)

      # concatenate each feature with hours
      feature1_with_hours = Concatenate()([feature1, hours])
      feature2_with_hours = Concatenate()([feature2, hours])
      ratio_with_hours = Concatenate()([ratio, hours])



      # LSTM layers
      lstm_first = True
      n_lstm_layers = hp.Int('num_lstm_layers', 1, 10)
      for lstm_layer in range(n_lstm_layers):

        # it checks if it is on the last layer of LSTM
        if lstm_layer + 1 == n_lstm_layers:
          flag_sequence = False
        else:
          flag_sequence = True


        if lstm_first:
          lstmFeature1 = LSTM(hp.Int(f'{lstm_layer + 1} lstm layer feature 1 units', 5, 65, step=5), return_sequences = flag_sequence,
                              kernel_regularizer= l2(hp.Float(f'{lstm_layer + 1} lstm layer feature 1 l2_reg', min_value=1e-6, max_value=1e-1, sampling='log')))(feature1_with_hours)

          lstmFeature2 = LSTM(hp.Int(f'{lstm_layer + 1} lstm layer feature 2 units', 5, 65, step=5), return_sequences = flag_sequence,
                              kernel_regularizer= l2(hp.Float(f'{lstm_layer + 1} lstm layer feature 2 l2_reg', min_value=1e-6, max_value=1e-1, sampling='log')))(feature2_with_hours)

          lstmFeature3 = LSTM(hp.Int(f'{lstm_layer + 1} lstm layer feature 3 units', 5, 65, step=5), return_sequences = flag_sequence,
                              kernel_regularizer= l2(hp.Float(f'{lstm_layer + 1} lstm layer feature 3 l2_reg', min_value=1e-6, max_value=1e-1, sampling='log')))(ratio)

          lstmFeature3_hours = LSTM(hp.Int(f'{lstm_layer + 1} lstm layer feature 3(hours) units', 5, 65, step=5), return_sequences = flag_sequence,
                              kernel_regularizer= l2(hp.Float(f'{lstm_layer + 1} lstm layer feature 3(hours) l2_reg', min_value=1e-6, max_value=1e-1, sampling='log')))(ratio_with_hours)

        else:
          lstmFeature1 = LSTM(hp.Int(f'{lstm_layer + 1} lstm layer feature 1 units', 5, 65, step=5), return_sequences = flag_sequence,
                              kernel_regularizer= l2(hp.Float(f'{lstm_layer + 1} lstm layer feature 1 l2_reg', min_value=1e-6, max_value=1e-1, sampling='log')))(lstmFeature1)

          lstmFeature2 = LSTM(hp.Int(f'{lstm_layer + 1} lstm layer feature 2 units', 5, 65, step=5), return_sequences = flag_sequence,
                              kernel_regularizer= l2(hp.Float(f'{lstm_layer + 1} lstm layer feature 2 l2_reg', min_value=1e-6, max_value=1e-1, sampling='log')))(lstmFeature2)

          lstmFeature3 = LSTM(hp.Int(f'{lstm_layer + 1} lstm layer feature 3 units', 5, 65, step=5), return_sequences = flag_sequence,
                              kernel_regularizer= l2(hp.Float(f'{lstm_layer + 1} lstm layer feature 3 l2_reg', min_value=1e-6, max_value=1e-1, sampling='log')))(lstmFeature3)

          lstmFeature3_hours = LSTM(hp.Int(f'{lstm_layer + 1} lstm layer feature 3(hours) units', 5, 65, step=5), return_sequences = flag_sequence,
                              kernel_regularizer= l2(hp.Float(f'{lstm_layer + 1} lstm layer feature 3(hours) l2_reg', min_value=1e-6, max_value=1e-1, sampling='log')))(lstmFeature3_hours)


        # Dropout for each LSTM layer
        if hp.Boolean(f'{lstm_layer + 1} layer lstm dropout'):
          lstmFeature1 = Dropout(hp.Float(f'{lstm_layer + 1} lstm layer feature 1 dropout_rate', 0.1, 0.5, step=0.1))(lstmFeature1)
          lstmFeature2 = Dropout(hp.Float(f'{lstm_layer + 1} lstm layer feature 2 dropout_rate', 0.1, 0.5, step=0.1))(lstmFeature2)
          lstmFeature3 = Dropout(hp.Float(f'{lstm_layer + 1} lstm layer feature 3 dropout_rate', 0.1, 0.5, step=0.1))(lstmFeature3)
          lstmFeature3_hours = Dropout(hp.Float(f'{lstm_layer + 1} lstm layer feature 3(hours) dropout_rate', 0.1, 0.5, step=0.1))(lstmFeature3_hours)

      # merging the two LSTM layers
      x = Concatenate()([lstmFeature1, lstmFeature2, lstmFeature3,  lstmFeature3_hours])
      x = Dense(hp.Int('dim_reduction_layer__units', 1, x.shape[1]//2, step=1), activation='relu',
                kernel_regularizer = l2(hp.Float('dense dim reduction l2_reg', min_value=1e-6, max_value=1e-1, sampling='log')))(x)  # it helps to reduce the dimensionality


      # perceptron layers
      num_hidden_layers = hp.Int('num_hidden_layers', 1, 5)
      for layer in range(num_hidden_layers):

          x = Dense(hp.Int(f'{layer + 1} dense layer units', 4, 64, step=4), activation='relu',
                     kernel_regularizer = l2(hp.Float(f'{lstm_layer + 1} dense layer l2_reg', min_value=1e-6, max_value=1e-1, sampling='log')))(x)
          x = Dropout(hp.Float(f'{layer + 1} dropout_rate', 0, 0.5, step=0.1))(x)

      # output layer
      output = Dense(1, activation='sigmoid', kernel_regularizer= l2(hp.Float('output layer l2_reg', min_value=1e-5, max_value=1e-2, sampling='log')))(x)


      # compiling
      model = Model(inputs=[feature1, feature2, hours, ratio], outputs=output)
      model.compile(optimizer=keras.optimizers.Adam(
          learning_rate=hp.Float('lr', 1e-4, 1e-2, sampling='LOG'), ema_momentum=hp.Float('momentum', 1e-4, 1e-2, sampling='LOG')),
          loss=BinaryCrossentropy(),
          metrics=[AUC()])
      return model


                                  # as perc float    # as tuple
  def fit(self, hp, model, x, y, val_split = None, n_bins = None, window_size = None, **kwargs):

    if window_size is None:
      window_size = hp.get('window')
    if n_bins is None:
      n_bins= hp.get('n_bins')

    S = Shrinker(window_size)
    O = OutlierEliminator(ignored_indices = [2,3])
    SAX_trans = SAX_Transformer(n_bins, ignored_indices = [2,3])


    x_train, x_val, y_train, y_val = train_test_split(x, y, test_size= val_split, shuffle= False)


    x_train, y_train = S.fit_transform(x_train, y_train)
    x_train = S.get_logRatio(x_train, 1,0)

    O.fit(x_train)
    x_train, _ = O.transform(x_train, y_train)

    SAX_trans.fit(x_train)
    x_train, _ = SAX_trans.transform(x_train, y_train)


    x_val, y_val = S.transform(x_val, y_val)
    x_val = S.get_logRatio(x_val, 1,0)
    x_val, _ = O.transform(x_val, y_val)
    x_val, _ = SAX_trans.transform(x_val, y_val)


    input_vars_train = [x_train[:, :, 0: n_bins], x_train[:, :, n_bins:n_bins*2], x_train[:, :, n_bins*2], x_train[:, :, n_bins*2 +1]]
    input_vars_val = [x_val[:, :, 0: n_bins], x_val[:, :, n_bins:n_bins*2], x_val[:, :, n_bins*2], x_val[:, :, n_bins*2 +1]]


    return model.fit(x = input_vars_train, y = y_train, validation_data=(input_vars_val, y_val), **kwargs)

In this trial the input feature are ```Light```, ```CO2```, the ```hour``` and their logarithmic ratio

In [23]:
tuner_best_model = kt.Hyperband(
    Model_wHours_ratio(),
    objective=kt.Objective('val_loss', direction="min"),
    max_epochs=10,
    factor=3,
    seed= 42,
    directory='model selction',
    project_name='light-co2-hours_ratio_hb_new'
)


tuner_best_model.search(x =train[['Light', 'CO2', 'hour']], y = train['Occupancy'], epochs=5, val_split=0.15)

Reloading Tuner from model selction/light-co2-hours_ratio_hb_new/tuner0.json


In this trial the input feature are ```Light```, ```Humidity```, the ```hour``` and their logarithmic ratio

In [None]:
tuner_best_model = kt.Hyperband(
    Model_wHours_ratio(),
    objective=kt.Objective('val_loss', direction="min"),
    max_epochs=10,
    factor=3,
    seed= 42,
    directory='model selction',
    project_name='light-humidity-hours_ratio_lstm'
)


tuner_best_model.search(x =train[['Light', 'Humidity', 'hour']], y = train['Occupancy'], epochs=5, val_split=0.15)

Trial 26 Complete [00h 07m 19s]
val_loss: 0.07181131839752197

Best val_loss So Far: 0.04450763016939163
Total elapsed time: 01h 07m 28s

Search: Running Trial #27

Value             |Best Value So Far |Hyperparameter
110               |40                |window
7                 |6                 |n_bins
2                 |3                 |num_lstm_layers
10                |55                |1 lstm layer feature 1 units
0.00028213        |0.0001294         |1 lstm layer feature 1 l2_reg
25                |55                |1 lstm layer feature 2 units
0.0025305         |1.8533e-05        |1 lstm layer feature 2 l2_reg
45                |65                |1 lstm layer feature 3 units
0.058597          |4.1017e-05        |1 lstm layer feature 3 l2_reg
35                |45                |1 lstm layer feature 3(hours) units
0.0001191         |1.3543e-06        |1 lstm layer feature 3(hours) l2_reg
False             |True              |1 layer lstm dropout
3                 |4     

In [26]:
best_hp = tuner_best_model.get_best_hyperparameters()[0].values
window_size = best_hp['window']
n_bins = best_hp['n_bins']    # these will be usefull next


best_model = tuner_best_model.get_best_models(num_models=1)[0]
best_model.summary()

  saveable.load_own_variables(weights_store.get(inner_path))


## Other models

In this section, we explore two other different models to predict room occupancy, each designed with varying architectures and features. The goal is to evaluate the impact of specific features and network layers on the overall model performance.

**Models Explored:**

**Model with CNN Layers:** Incorporates Convolutional Neural Network (CNN) layers.

**Model with Hour Feature Only:** Focuses on leveraging the hour feature without additional feature transformations.

**Best Model (LSTM with Logarithmic Ratio):** Combines Long Short-Term Memory (LSTM) layers with the logarithmic ratio of features (Light and CO2) and the hour feature, achieving the highest performance.

**Key Findings:**

The first two models demonstrated lower performance compared to the best model.

Incorporating the logarithmic ratio of features alongside the hour feature in the best model significantly enhanced accuracy and F1 scores.

Through this comparison, we demonstrate how thoughtful feature engineering and the appropriate choice of model architecture can improve predictive performance.

### CNN-LSTM Model

In [None]:
class CNN_LSTM_Model(kt.HyperModel):
  def build(self, hp):
      # it defines the widows size and the number of bin for the SAX
      window_size = hp.Int('window', 30, 120, step=10)
      n_bins = hp.Int('n_bins', 5, 10, step=1)

      # defining cnn paramters here because otherwise they won't be tweaked idk why
      cnn_filters_feature_1 = hp.Int('cnn filters feature 1', 1, 5, step=1)
      cnn_filters_feature_2 = hp.Int('cnn filters feature 2', 1, 5, step=1)
      kernel_size_feature_1 = hp.Int('kernel_size feature 1', 2, 5, step=1)
      kernel_size_feature_2 = hp.Int('kernel_size feature 2', 2, 5, step=1)
      pool_size_feature_1 = hp.Int('pool_size feature 1', 2, 5, step=1)
      pool_size_feature_2 = hp.Int('pool_size feature 2', 2, 5, step=1)


      input_shape = (window_size, n_bins)

      # inputs layer
      feature1 = Input(shape=input_shape, name="feature_1")
      feature2 = Input(shape=input_shape, name="feature_2")


      # CNN as an embedding layer
      if hp.Boolean('cnn'):

        cnn1 = Conv1D(filters = cnn_filters_feature_1, kernel_size = kernel_size_feature_1, activation='relu',
                      kernel_regularizer= l2(hp.Float('cnn feature 1 l2_reg', min_value=1e-5, max_value=1e-2, sampling='log')))(feature1)

        cnn2 = Conv1D(filters = cnn_filters_feature_2, kernel_size = kernel_size_feature_2, activation='relu',
                      kernel_regularizer= l2(hp.Float('cnn feature 1 l2_reg', min_value=1e-5, max_value=1e-2, sampling='log')))(feature2)

        # pooling layers
        if hp.Boolean('avg_pool'):
          cnn1 = AveragePooling1D(pool_size_feature_1)(cnn1)
          cnn2 = AveragePooling1D(pool_size_feature_2)(cnn2)
        else:
          cnn1 = MaxPooling1D(pool_size_feature_1)(cnn1)
          cnn2 = MaxPooling1D(pool_size_feature_2)(cnn2)


      # LSTM layers
      lstm_first = True
      n_lstm_layers = hp.Int('num_lstm_layers', 1, 10)
      for lstm_layer in range(n_lstm_layers):

        # it checks if it is on the last layer of LSTM
        if lstm_layer + 1 == n_lstm_layers:
          flag_sequence = False
        else:
          flag_sequence = True


        if lstm_first:
          lstmFeature1 = LSTM(hp.Int(f'{lstm_layer + 1} lstm layer feature 1 units', 5, 65, step=5), return_sequences = flag_sequence,
                              kernel_regularizer= l2(hp.Float(f'{lstm_layer + 1} lstm layer feature 1 l2_reg', min_value=1e-6, max_value=1e-1, sampling='log')))(cnn1 if hp.Boolean('cnn') else feature1)

          lstmFeature2 = LSTM(hp.Int(f'{lstm_layer + 1} lstm layer feature 2 units', 5, 65, step=5), return_sequences = flag_sequence,
                              kernel_regularizer= l2(hp.Float(f'{lstm_layer + 1} lstm layer feature 2 l2_reg', min_value=1e-6, max_value=1e-1, sampling='log')))(cnn2 if hp.Boolean('cnn') else feature2)


        else:
          lstmFeature1 = LSTM(hp.Int(f'{lstm_layer + 1} lstm layer feature 1 units', 5, 65, step=5), return_sequences = flag_sequence,
                              kernel_regularizer= l2(hp.Float(f'{lstm_layer + 1} lstm layer feature 1 l2_reg', min_value=1e-6, max_value=1e-1, sampling='log')))(lstmFeature1)

          lstmFeature2 = LSTM(hp.Int(f'{lstm_layer + 1} lstm layer feature 2 units', 5, 65, step=5), return_sequences = flag_sequence,
                              kernel_regularizer= l2(hp.Float(f'{lstm_layer + 1} lstm layer feature 2 l2_reg', min_value=1e-6, max_value=1e-1, sampling='log')))(lstmFeature2)



        # Dropout for each LSTM layer
        if hp.Boolean(f'{lstm_layer + 1} layer lstm dropout'):
          lstmFeature1 = Dropout(hp.Float(f'{lstm_layer + 1} lstm layer feature 1 dropout_rate', 0.1, 0.5, step=0.1))(lstmFeature1)
          lstmFeature2 = Dropout(hp.Float(f'{lstm_layer + 1} lstm layer feature 2 dropout_rate', 0.1, 0.5, step=0.1))(lstmFeature2)

      # merging the two LSTM layers
      x = Concatenate()([lstmFeature1, lstmFeature2])
      x = Dense(hp.Int('dim_reduction_layer__units', 1, x.shape[1]//2, step=1), activation='relu',
                kernel_regularizer = l2(hp.Float('dense dim reduction l2_reg', min_value=1e-6, max_value=1e-1, sampling='log')))(x)  # it helps to reduce the dimensionality


      # perceptron layers
      num_hidden_layers = hp.Int('num_hidden_layers', 1, 5)
      for layer in range(num_hidden_layers):

          x = Dense(hp.Int(f'{layer + 1} dense layer units', 4, 64, step=4), activation='relu',
                     kernel_regularizer = l2(hp.Float(f'{lstm_layer + 1} dense layer l2_reg', min_value=1e-6, max_value=1e-1, sampling='log')))(x)
          x = Dropout(hp.Float(f'{layer + 1} dropout_rate', 0, 0.5, step=0.1))(x)

      # output layer
      output = Dense(1, activation='sigmoid', kernel_regularizer= l2(hp.Float('output layer l2_reg', min_value=1e-5, max_value=1e-2, sampling='log')))(x)


      # compiling
      model = Model(inputs=[feature1, feature2], outputs=output)
      model.compile(optimizer=keras.optimizers.Adam(
          learning_rate=hp.Float('lr', 1e-4, 1e-2, sampling='LOG'), ema_momentum=hp.Float('momentum', 1e-4, 1e-2, sampling='LOG')),
          loss=BinaryCrossentropy(),
          metrics=[AUC()])
      return model

  def fit(self, hp, model, x, y, val_split = None, n_bins = None, window_size = None, **kwargs):

    if window_size is None:
      window_size = hp.get('window')
    if n_bins is None:
      n_bins= hp.get('n_bins')

    S = Shrinker(window_size)
    O = OutlierEliminator()
    SAX_trans = SAX_Transformer(n_bins)


    x_train, x_val, y_train, y_val = train_test_split(x, y, test_size= val_split, shuffle= False)


    x_train, y_train = S.fit_transform(x_train, y_train)

    O.fit(x_train)
    x_train, _ = O.transform(x_train, y_train)

    SAX_trans.fit(x_train)
    x_train, _ = SAX_trans.transform(x_train, y_train)


    x_val, y_val = S.transform(x_val, y_val)
    x_val, _ = O.transform(x_val, y_val)
    x_val, _ = SAX_trans.transform(x_val, y_val)


    input_vars_train = [x_train[:, :, 0: n_bins], x_train[:, :, n_bins:n_bins*2]]
    input_vars_val = [x_val[:, :, 0: n_bins], x_val[:, :, n_bins:n_bins*2]]


    return model.fit(x = input_vars_train, y = y_train, validation_data=(input_vars_val, y_val), **kwargs)

In [None]:
tuner_cnn_lstm_model = kt.Hyperband(
    CNN_LSTM_Model(),
    objective=kt.Objective('val_loss', direction="min"),
    max_epochs=10,
    factor=3,
    seed= 42,
    directory='model selction',
    project_name='cnn_lstm_light_co2_model'
)


tuner_cnn_lstm_model.search(x =train[['Light', 'CO2']], y = train['Occupancy'], epochs=5, val_split=0.15)

Trial 16 Complete [00h 01m 24s]
val_loss: 0.3785426914691925

Best val_loss So Far: 0.36312299966812134
Total elapsed time: 00h 17m 15s

Search: Running Trial #17

Value             |Best Value So Far |Hyperparameter
100               |110               |window
7                 |9                 |n_bins
3                 |2                 |cnn filters feature 1
3                 |1                 |cnn filters feature 2
3                 |3                 |kernel_size feature 1
3                 |5                 |kernel_size feature 2
4                 |5                 |pool_size feature 1
3                 |3                 |pool_size feature 2
True              |False             |cnn
3                 |3                 |num_lstm_layers
20                |60                |1 lstm layer feature 1 units
0.0042735         |0.012788          |1 lstm layer feature 1 l2_reg
40                |20                |1 lstm layer feature 2 units
0.00068457        |0.0035885         |1

KeyboardInterrupt: 

### LSTM Model with hours

In [None]:
class Model_wHours(kt.HyperModel):

  def build(self, hp):

      # it defines the widows size and the number of bin for the SAX
      window_size = hp.Int('window', 30, 120, step=10)
      n_bins = hp.Int('n_bins', 5, 10, step=1)


      input_shape = (window_size, n_bins)

      # inputs layer
      feature1 = Input(shape=input_shape, name="feature_1")
      feature2 = Input(shape=input_shape, name="feature_2")
      hours = Input(shape=(window_size, 1), name="hours")


      # concatenate each feature with hours
      feature1_with_hours = Concatenate()([feature1, hours])
      feature2_with_hours = Concatenate()([feature2, hours])



      # LSTM layers
      lstm_first = True
      n_lstm_layers = hp.Int('num_lstm_layers', 1, 10)
      for lstm_layer in range(n_lstm_layers):

        # it checks if it is on the last layer of LSTM
        if lstm_layer + 1 == n_lstm_layers:
          flag_sequence = False
        else:
          flag_sequence = True


        if lstm_first:
          lstmFeature1 = LSTM(hp.Int(f'{lstm_layer + 1} lstm layer feature 1 units', 5, 65, step=5), return_sequences = flag_sequence,
                              kernel_regularizer= l2(hp.Float(f'{lstm_layer + 1} lstm layer feature 1 l2_reg', min_value=1e-6, max_value=1e-1, sampling='log')))(feature1_with_hours)

          lstmFeature2 = LSTM(hp.Int(f'{lstm_layer + 1} lstm layer feature 2 units', 5, 65, step=5), return_sequences = flag_sequence,
                              kernel_regularizer= l2(hp.Float(f'{lstm_layer + 1} lstm layer feature 2 l2_reg', min_value=1e-6, max_value=1e-1, sampling='log')))(feature2_with_hours)


        else:
          lstmFeature1 = LSTM(hp.Int(f'{lstm_layer + 1} lstm layer feature 1 units', 5, 65, step=5), return_sequences = flag_sequence,
                              kernel_regularizer= l2(hp.Float(f'{lstm_layer + 1} lstm layer feature 1 l2_reg', min_value=1e-6, max_value=1e-1, sampling='log')))(lstmFeature1)

          lstmFeature2 = LSTM(hp.Int(f'{lstm_layer + 1} lstm layer feature 2 units', 5, 65, step=5), return_sequences = flag_sequence,
                              kernel_regularizer= l2(hp.Float(f'{lstm_layer + 1} lstm layer feature 2 l2_reg', min_value=1e-6, max_value=1e-1, sampling='log')))(lstmFeature2)


        # Dropout for each LSTM layer
        if hp.Boolean(f'{lstm_layer + 1} layer lstm dropout'):
          lstmFeature1 = Dropout(hp.Float(f'{lstm_layer + 1} lstm layer feature 1 dropout_rate', 0.1, 0.5, step=0.1))(lstmFeature1)
          lstmFeature2 = Dropout(hp.Float(f'{lstm_layer + 1} lstm layer feature 2 dropout_rate', 0.1, 0.5, step=0.1))(lstmFeature2)


      # merging the two LSTM layers
      x = Concatenate()([lstmFeature1, lstmFeature2])
      x = Dense(hp.Int('dim_reduction_layer__units', 1, x.shape[1]//2, step=1), activation='relu',
                kernel_regularizer = l2(hp.Float('dense dim reduction l2_reg', min_value=1e-6, max_value=1e-1, sampling='log')))(x)  # it helps to reduce the dimensionality


      # perceptron layers
      num_hidden_layers = hp.Int('num_hidden_layers', 1, 5)
      for layer in range(num_hidden_layers):

          x = Dense(hp.Int(f'{layer + 1} dense layer units', 4, 64, step=4), activation='relu',
                     kernel_regularizer = l2(hp.Float(f'{lstm_layer + 1} dense layer l2_reg', min_value=1e-6, max_value=1e-1, sampling='log')))(x)
          x = Dropout(hp.Float(f'{layer + 1} dropout_rate', 0, 0.5, step=0.1))(x)

      # output layer
      output = Dense(1, activation='sigmoid', kernel_regularizer= l2(hp.Float('output layer l2_reg', min_value=1e-5, max_value=1e-2, sampling='log')))(x)


      # compiling
      model = Model(inputs=[feature1, feature2, hours], outputs=output)
      model.compile(optimizer=keras.optimizers.Adam(
          learning_rate=hp.Float('lr', 1e-4, 1e-2, sampling='LOG'), ema_momentum=hp.Float('momentum', 1e-4, 1e-2, sampling='LOG')),
          loss=BinaryCrossentropy(),
          metrics=[AUC()])
      return model


                                  # as perc float    # as tuple
  def fit(self, hp, model, x, y, val_split = None, n_bins = None, window_size = None, **kwargs):

    if window_size is None:
      window_size = hp.get('window')
    if n_bins is None:
      n_bins= hp.get('n_bins')

    S = Shrinker(window_size)
    O = OutlierEliminator(ignored_indices = [2])
    SAX_trans = SAX_Transformer(n_bins, ignored_indices = [2])


    x_train, x_val, y_train, y_val = train_test_split(x, y, test_size= val_split, shuffle= False)


    x_train, y_train = S.fit_transform(x_train, y_train)

    O.fit(x_train)
    x_train, _ = O.transform(x_train, y_train)

    SAX_trans.fit(x_train)
    x_train, _ = SAX_trans.transform(x_train, y_train)


    x_val, y_val = S.transform(x_val, y_val)
    x_val, _ = O.transform(x_val, y_val)
    x_val, _ = SAX_trans.transform(x_val, y_val)


    input_vars_train = [x_train[:, :, 0: n_bins], x_train[:, :, n_bins:n_bins*2], x_train[:, :, n_bins*2]]
    input_vars_val = [x_val[:, :, 0: n_bins], x_val[:, :, n_bins:n_bins*2], x_val[:, :, n_bins*2]]


    return model.fit(x = input_vars_train, y = y_train, validation_data=(input_vars_val, y_val), **kwargs)

In [None]:
tuner_lstm_hour_model = kt.Hyperband(
    Model_wHours(),
    objective=kt.Objective('val_loss', direction="min"),
    max_epochs=10,
    factor=3,
    seed= 42,
    directory='model selction',
    project_name='lstm_light_co2_hour_hb'
)


tuner_lstm_hour_model.search(x =train[['Light', 'CO2', 'hour']], y = train['Occupancy'], epochs=5, val_split=0.15)

Trial 3 Complete [00h 02m 12s]
val_loss: 0.374039888381958

Best val_loss So Far: 0.374039888381958
Total elapsed time: 00h 04m 55s

Search: Running Trial #4

Value             |Best Value So Far |Hyperparameter
120               |110               |window
6                 |9                 |n_bins
6                 |3                 |num_lstm_layers
35                |10                |1 lstm layer feature 1 units
0.0011434         |2.4999e-05        |1 lstm layer feature 1 l2_reg
45                |65                |1 lstm layer feature 2 units
9.3886e-05        |0.013673          |1 lstm layer feature 2 l2_reg
False             |False             |1 layer lstm dropout
3                 |1                 |dim_reduction_layer__units
1.8558e-06        |1.6021e-05        |dense dim reduction l2_reg
3                 |5                 |num_hidden_layers
8                 |56                |1 dense layer units
1.2137e-06        |1.5868e-05        |1 dense layer l2_reg
0           

KeyboardInterrupt: 

# Cross-Validation

In [27]:
S = Shrinker(window_size)
O = OutlierEliminator(ignored_indices = [2,3])
SAX_trans = SAX_Transformer(n_bins, ignored_indices = [2,3])

data = train[['Light', 'CO2', 'hour','Occupancy']].copy()
n_fold = 10
reports = list()


for i, (train_index, val_index) in enumerate(KFold(n_splits=n_fold).split(data)):
  print(f'{i+1} fold')


  train = data.iloc[train_index]
  validation = data.iloc[val_index]



  x_train, y_train = S.fit_transform(train[['Light', 'CO2', 'hour']], train['Occupancy'])
  x_val, y_val = S.transform(validation[['Light', 'CO2', 'hour']], validation['Occupancy'])
  x_train = S.get_logRatio(x_train, 1,0)
  x_val = S.get_logRatio(x_val, 1,0)

  x_train, _ = O.fit_transform(x_train, y_train)
  x_val, _ = O.transform(x_val, y_val)

  x_train, _ = SAX_trans.fit_transform(x_train, y_train)
  x_val, _ = SAX_trans.transform(x_val, y_val)

  input_vars_train = [x_train[:, :, 0: n_bins], x_train[:, :, n_bins:n_bins*2], x_train[:, :, n_bins*2], x_train[:, :, n_bins*2 +1]]
  input_vars_val = [x_val[:, :, 0: n_bins], x_val[:, :, n_bins:n_bins*2], x_val[:, :, n_bins*2], x_val[:, :, n_bins*2 +1]]


  best_model = tuner_best_model.get_best_models(num_models=1)[0]    # in this way the model resets itself through each fold
  best_model.fit(input_vars_train ,y_train, epochs = 10)
  y_pred = best_model.predict(input_vars_val)
  y_pred = np.where(y_pred > 0.5, 1,0).astype(int)

  reports.append(Report(y_val, y_pred))


reports[0].show_report()

1 fold


  saveable.load_own_variables(weights_store.get(inner_path))


Epoch 1/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m43s[0m 84ms/step - auc: 0.9373 - loss: 0.2136
Epoch 2/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m39s[0m 79ms/step - auc: 0.9706 - loss: 0.1564
Epoch 3/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m28s[0m 79ms/step - auc: 0.9803 - loss: 0.1208
Epoch 4/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m28s[0m 80ms/step - auc: 0.9877 - loss: 0.1023
Epoch 5/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m41s[0m 80ms/step - auc: 0.9873 - loss: 0.1012
Epoch 6/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m39s[0m 75ms/step - auc: 0.9852 - loss: 0.1126
Epoch 7/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m44s[0m 83ms/step - auc: 0.9685 - loss: 0.1653
Epoch 8/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m43s[0m 90ms/step - auc: 0.9814 - loss: 0.1351
Epoch 9/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0

  saveable.load_own_variables(weights_store.get(inner_path))


Epoch 1/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m39s[0m 80ms/step - auc: 0.9453 - loss: 0.2068
Epoch 2/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m40s[0m 78ms/step - auc: 0.9751 - loss: 0.1459
Epoch 3/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m39s[0m 74ms/step - auc: 0.9644 - loss: 0.1664
Epoch 4/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m42s[0m 78ms/step - auc: 0.9712 - loss: 0.1526
Epoch 5/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m41s[0m 78ms/step - auc: 0.9753 - loss: 0.1448
Epoch 6/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m41s[0m 78ms/step - auc: 0.9876 - loss: 0.1188
Epoch 7/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m43s[0m 83ms/step - auc: 0.9827 - loss: 0.1306
Epoch 8/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m26s[0m 73ms/step - auc: 0.9928 - loss: 0.0932
Epoch 9/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0

  saveable.load_own_variables(weights_store.get(inner_path))


Epoch 1/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m39s[0m 81ms/step - auc: 0.8122 - loss: 0.4203
Epoch 2/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m28s[0m 80ms/step - auc: 0.9745 - loss: 0.1415
Epoch 3/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m42s[0m 84ms/step - auc: 0.9831 - loss: 0.1264
Epoch 4/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m38s[0m 76ms/step - auc: 0.9618 - loss: 0.1742
Epoch 5/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m43s[0m 83ms/step - auc: 0.9733 - loss: 0.1429
Epoch 6/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m30s[0m 87ms/step - auc: 0.9875 - loss: 0.1052
Epoch 7/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m39s[0m 80ms/step - auc: 0.9887 - loss: 0.1038
Epoch 8/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m41s[0m 79ms/step - auc: 0.9913 - loss: 0.0959
Epoch 9/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0

  saveable.load_own_variables(weights_store.get(inner_path))


Epoch 1/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m40s[0m 82ms/step - auc: 0.9541 - loss: 0.1807
Epoch 2/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m40s[0m 79ms/step - auc: 0.9858 - loss: 0.1000
Epoch 3/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m42s[0m 81ms/step - auc: 0.9883 - loss: 0.0979
Epoch 4/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m38s[0m 74ms/step - auc: 0.9896 - loss: 0.0831
Epoch 5/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m45s[0m 87ms/step - auc: 0.9898 - loss: 0.0906
Epoch 6/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m28s[0m 79ms/step - auc: 0.9921 - loss: 0.0806
Epoch 7/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m41s[0m 78ms/step - auc: 0.9940 - loss: 0.0692
Epoch 8/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m42s[0m 82ms/step - auc: 0.9958 - loss: 0.0635
Epoch 9/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0

  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


5 fold


  saveable.load_own_variables(weights_store.get(inner_path))


Epoch 1/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m41s[0m 83ms/step - auc: 0.9441 - loss: 0.2158
Epoch 2/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m41s[0m 82ms/step - auc: 0.9672 - loss: 0.1555
Epoch 3/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m40s[0m 79ms/step - auc: 0.9714 - loss: 0.1512
Epoch 4/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m41s[0m 79ms/step - auc: 0.9673 - loss: 0.1622
Epoch 5/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m27s[0m 78ms/step - auc: 0.9692 - loss: 0.1569
Epoch 6/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m40s[0m 74ms/step - auc: 0.9703 - loss: 0.1538
Epoch 7/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m41s[0m 74ms/step - auc: 0.9818 - loss: 0.1235
Epoch 8/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m42s[0m 76ms/step - auc: 0.9855 - loss: 0.1154
Epoch 9/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0

  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
  saveable.load_own_variables(weights_store.get(inner_path))


Epoch 1/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m39s[0m 79ms/step - auc: 0.9535 - loss: 0.1886
Epoch 2/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m28s[0m 79ms/step - auc: 0.9797 - loss: 0.1171
Epoch 3/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m43s[0m 84ms/step - auc: 0.9802 - loss: 0.1264
Epoch 4/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m37s[0m 73ms/step - auc: 0.9871 - loss: 0.1076
Epoch 5/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m43s[0m 78ms/step - auc: 0.9923 - loss: 0.0843
Epoch 6/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m43s[0m 85ms/step - auc: 0.9958 - loss: 0.0655
Epoch 7/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m41s[0m 84ms/step - auc: 0.9939 - loss: 0.0687
Epoch 8/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m39s[0m 78ms/step - auc: 0.9954 - loss: 0.0610
Epoch 9/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0

  saveable.load_own_variables(weights_store.get(inner_path))


Epoch 1/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m41s[0m 83ms/step - auc: 0.9488 - loss: 0.2038
Epoch 2/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m41s[0m 85ms/step - auc: 0.9624 - loss: 0.1902
Epoch 3/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m30s[0m 85ms/step - auc: 0.9725 - loss: 0.1633
Epoch 4/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m29s[0m 83ms/step - auc: 0.9749 - loss: 0.1423
Epoch 5/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m27s[0m 75ms/step - auc: 0.9806 - loss: 0.1310
Epoch 6/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m42s[0m 78ms/step - auc: 0.9733 - loss: 0.1569
Epoch 7/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m28s[0m 81ms/step - auc: 0.9812 - loss: 0.1186
Epoch 8/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m28s[0m 80ms/step - auc: 0.9878 - loss: 0.1105
Epoch 9/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0

  saveable.load_own_variables(weights_store.get(inner_path))


Epoch 1/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m47s[0m 93ms/step - auc: 0.9421 - loss: 0.2104
Epoch 2/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m29s[0m 84ms/step - auc: 0.9697 - loss: 0.1482
Epoch 3/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m38s[0m 76ms/step - auc: 0.9660 - loss: 0.1819
Epoch 4/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m43s[0m 81ms/step - auc: 0.9673 - loss: 0.1452
Epoch 5/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m42s[0m 84ms/step - auc: 0.9635 - loss: 0.1562
Epoch 6/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m39s[0m 79ms/step - auc: 0.9635 - loss: 0.1494
Epoch 7/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m41s[0m 78ms/step - auc: 0.9693 - loss: 0.1390
Epoch 8/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m40s[0m 76ms/step - auc: 0.9697 - loss: 0.1390
Epoch 9/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0

  saveable.load_own_variables(weights_store.get(inner_path))


Epoch 1/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m39s[0m 79ms/step - auc: 0.9425 - loss: 0.2008
Epoch 2/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m40s[0m 77ms/step - auc: 0.9753 - loss: 0.1372
Epoch 3/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m40s[0m 73ms/step - auc: 0.9724 - loss: 0.1481
Epoch 4/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m43s[0m 80ms/step - auc: 0.9784 - loss: 0.1305
Epoch 5/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m43s[0m 85ms/step - auc: 0.9752 - loss: 0.1313
Epoch 6/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m39s[0m 80ms/step - auc: 0.9761 - loss: 0.1286
Epoch 7/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m42s[0m 82ms/step - auc: 0.9830 - loss: 0.1189
Epoch 8/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m41s[0m 81ms/step - auc: 0.9944 - loss: 0.0781
Epoch 9/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0

  saveable.load_own_variables(weights_store.get(inner_path))


Epoch 1/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m40s[0m 82ms/step - auc: 0.9454 - loss: 0.2140
Epoch 2/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m42s[0m 85ms/step - auc: 0.9812 - loss: 0.1250
Epoch 3/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m40s[0m 83ms/step - auc: 0.9867 - loss: 0.1064
Epoch 4/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m38s[0m 75ms/step - auc: 0.9847 - loss: 0.1168
Epoch 5/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m42s[0m 78ms/step - auc: 0.9752 - loss: 0.1485
Epoch 6/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m28s[0m 81ms/step - auc: 0.9871 - loss: 0.1170
Epoch 7/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m41s[0m 80ms/step - auc: 0.9921 - loss: 0.0930
Epoch 8/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m41s[0m 81ms/step - auc: 0.9818 - loss: 0.1416
Epoch 9/10
[1m351/351[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0

  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


In [28]:
avg_acc = 0
avg_f1_0 = 0
avg_f1_1 = 0
avg_precision_0 = 0
avg_precision_1 = 0
avg_recall_0 = 0
avg_recall_1 = 0

for fold, report in enumerate(reports):
  print(f'Fold {fold+1}')

  avg_acc += report.get_accuracy()
  avg_f1_0 += report.get_f1_class_0()
  avg_f1_1 += report.get_f1_class_1()
  avg_precision_0 += report.get_precision_0()
  avg_precision_1 += report.get_precision_1()
  avg_recall_0 += report.get_recall_0()
  avg_recall_1 += report.get_recall_1()

  report.show_report()
  print()

print('average metrics: \n')
print(f'Average accuracy: {avg_acc/n_fold}')
print(f'Average F1-score for class 0: {avg_f1_0/n_fold}')
print(f'Average F1-score for class 1: {avg_f1_1/n_fold}')
print(f'Average Precision for class 0: {avg_precision_0/n_fold}')
print(f'Average Precision for class 1: {avg_precision_1/n_fold}')
print(f'Average Recall for class 0: {avg_recall_0/n_fold}')
print(f'Average Recall for class 1: {avg_recall_1/n_fold}')

Fold 1
Accuracy: 0.9430485762144054
F1-score for class 0: 0.9592326139088729
F1-score for class 1: 0.9055555555555556
Precision for class 0: 0.9852216748768473
Precision for class 1: 0.8534031413612565
Recall for class 0: 0.9345794392523364
Recall for class 1: 0.9644970414201184
[[800  56]
 [ 12 326]]

Fold 2
Accuracy: 0.990787269681742
F1-score for class 0: 0.9933049300060864
F1-score for class 1: 0.9852348993288591
Precision for class 0: 1.0
Precision for class 1: 0.9708994708994709
Recall for class 0: 0.9866989117291415
Recall for class 1: 1.0
[[816  11]
 [  0 367]]

Fold 3
Accuracy: 0.9932998324958124
F1-score for class 0: 0.9955056179775281
F1-score for class 1: 0.9868421052631579
Precision for class 0: 0.9977477477477478
Precision for class 1: 0.9803921568627451
Recall for class 0: 0.9932735426008968
Recall for class 1: 0.9933774834437086
[[886   6]
 [  2 300]]

Fold 4
Accuracy: 0.8542713567839196
F1-score for class 0: 0.9214092140921409
F1-score for class 1: 0.0
Precision for cl

# Model evaluation

## Preprocessing

In [29]:
# Load data
P = Preprocessing(PATH, 'training_ts.csv', 'test_ts.csv')
train = P.get_train()
test = P.get_test()


# Shrinking
S = Shrinker(window_size = window_size)
x_train_shrinked, y_train = S.fit_transform(train[['Light', 'CO2', 'hour']], train['Occupancy'])
x_test_shrinked, y_test = S.transform(test[['Light', 'CO2', 'hour']], test['Occupancy'])

# Getting logRatio feature
x_train_shrinked = S.get_logRatio(x_train_shrinked, 1, 0)
x_test_shrinked = S.get_logRatio(x_test_shrinked, 1, 0)


# Outlier elimination
O = OutlierEliminator(ignored_indices = [2,3])
x_train_shrinked, _ = O.fit_transform(x_train_shrinked)
x_test_shrinked, _ = O.transform(x_test_shrinked)


# SAX transformation
SAX = SAX_Transformer(n_bins = n_bins, ignored_indices = [2,3])
x_train_sax, _ = SAX.fit_transform(x_train_shrinked)
x_test_sax, _ = SAX.transform(x_test_shrinked)

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Training

In [30]:
input_vars_train = [x_train_sax[:, :, 0: n_bins], x_train_sax[:, :, n_bins:n_bins*2],
                    x_train_sax[:, :, n_bins*2], x_train_sax[:, :, n_bins*2 +1]]

best_model = tuner_best_model.get_best_models(num_models=1)[0]  # model reset
best_model.fit(input_vars_train ,y_train, epochs = 10)

  saveable.load_own_variables(weights_store.get(inner_path))


Epoch 1/10
[1m390/390[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m42s[0m 78ms/step - auc: 0.9480 - loss: 0.2027
Epoch 2/10
[1m390/390[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m30s[0m 78ms/step - auc: 0.9745 - loss: 0.1350
Epoch 3/10
[1m390/390[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m42s[0m 79ms/step - auc: 0.9790 - loss: 0.1257
Epoch 4/10
[1m390/390[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m41s[0m 78ms/step - auc: 0.9864 - loss: 0.1116
Epoch 5/10
[1m390/390[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m41s[0m 80ms/step - auc: 0.9902 - loss: 0.0961
Epoch 6/10
[1m390/390[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m43s[0m 84ms/step - auc: 0.9864 - loss: 0.1108
Epoch 7/10
[1m390/390[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m42s[0m 87ms/step - auc: 0.9895 - loss: 0.0978
Epoch 8/10
[1m390/390[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m38s[0m 79ms/step - auc: 0.9909 - loss: 0.0898
Epoch 9/10
[1m390/390[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0

<keras.src.callbacks.history.History at 0x7e23d763f250>

## Prediction

In [31]:
input_vars_test = [x_test_sax[:, :, 0: n_bins], x_test_sax[:, :, n_bins:n_bins*2],
                    x_test_sax[:, :, n_bins*2], x_test_sax[:, :, n_bins*2 +1]]

y_pred = best_model.predict(input_vars_test)
y_pred = np.where(y_pred > 0.5, 1,0).astype(int)

[1m166/166[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 25ms/step


## Results

In [32]:
Report(y_test, y_pred).show_report()

Accuracy: 0.98474001507159
F1-score for class 0: 0.9904492394764768
F1-score for class 1: 0.9620608899297424
Precision for class 0: 0.9836065573770492
Precision for class 1: 0.9894026974951831
Recall for class 0: 0.9973877938731893
Recall for class 1: 0.9361896080218779
[[4200   11]
 [  70 1027]]
