# Autoencoder

A good way to detect drift in time series data is to use an autoencoder. An autoencoder is a machine learning model that tries to reproduce its inputs. It first compresses the inputs into a smaller representation through an encoder. A decoder then reconstructs the input from this smaller representation. An autoencoder trained on a certain dataset can therefore reproduce inputs from that dataset well, but once the data distribution has drifted it can no longer reproduce its inputs accurately. The resulting increase in reconstruction error is how we detect drift.

<img src="images/autoencoder.png"   />
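
One way to make this idea concrete (the notation here is an illustrative assumption, not part of the exercise material): write $f$ for the encoder, $g$ for the decoder and $x \in \mathbb{R}^n$ for one input window. The reconstruction and its mean absolute error are then

$$
\hat{x} = g(f(x)), \qquad e(x) = \frac{1}{n} \sum_{i=1}^{n} \left| x_i - \hat{x}_i \right|
$$

A window can be flagged as drifted when $e(x)$ exceeds a threshold $\tau$ chosen from the errors observed on the training distribution.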

In [None]:
import torch.nn as nn
import torch
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from utils.data_utils_ae import split_in_sequences, create_dataloader, split_train_test_dataset

## Model definition
You can create your autoencoder model like any other PyTorch model. The only difference is that the layer size first shrinks and then grows back to its original size. Many people, however, also split the autoencoder into an encoder and a decoder part. So first we define our encoder. Replace each `[fill_in]` with a layer size of your choice.

In [None]:
class Encoder(nn.Module):
    def __init__(self, seq_len):
        
        super(Encoder, self).__init__()
        self.seq_len = seq_len
        
        self.layer1 = nn.Linear(self.seq_len,[fill_in], bias=True)
        nn.init.kaiming_normal_(self.layer1.weight, mode="fan_in", nonlinearity="relu")
        self.act1 = nn.ReLU()
        self.layer2 = nn.Linear([fill_in],[fill_in], bias=True)
        nn.init.kaiming_normal_(self.layer2.weight, mode="fan_in", nonlinearity="relu")
        self.act2 = nn.ReLU()
        self.layer3 = nn.Linear([fill_in],[fill_in], bias=True)
        nn.init.kaiming_normal_(self.layer3.weight, mode="fan_in", nonlinearity="relu")
        self.act3 = nn.ReLU()
        self.layer4 = nn.Linear([fill_in],[fill_in], bias=True)
        nn.init.kaiming_normal_(self.layer4.weight, mode="fan_in", nonlinearity="relu")
        self.act4 = nn.ReLU()


    def forward(self, input_tensor):
        x = self.act1(self.layer1(input_tensor))
        x = self.act2(self.layer2(x))
        x = self.act3(self.layer3(x))
        x = self.act4(self.layer4(x))
        return x

Then we define our decoder. Make sure the sizes of the hidden layers are the sizes of the encoder layers in reverse order. The first layer of the decoder operates on the embedding: it is a linear layer whose input and output size both equal the embedding dimension.

In [None]:
class Decoder(nn.Module):
    def __init__(self, seq_len):
        super(Decoder, self).__init__()
        self.seq_len = seq_len
        self.layer0 = nn.Linear([fill_in],[fill_in], bias=True)
        nn.init.kaiming_normal_(self.layer0.weight, mode="fan_in", nonlinearity="relu")
        self.act0 = nn.ReLU()
        self.layer1 = nn.Linear([fill_in],[fill_in], bias=True)
        nn.init.kaiming_normal_(self.layer1.weight, mode="fan_in", nonlinearity="relu")
        self.act1 = nn.ReLU()
        self.layer2 = nn.Linear([fill_in],[fill_in], bias=True)
        nn.init.kaiming_normal_(self.layer2.weight, mode="fan_in", nonlinearity="relu")
        self.act2 = nn.ReLU()
        self.layer3 = nn.Linear([fill_in],[fill_in], bias=True)
        nn.init.kaiming_normal_(self.layer3.weight, mode="fan_in", nonlinearity="relu")
        self.act3 = nn.ReLU()
        self.layer4 = nn.Linear([fill_in],self.seq_len, bias=True)
        nn.init.kaiming_normal_(self.layer4.weight, mode="fan_in", nonlinearity="relu")
        self.act4 = nn.ReLU()

    def forward(self, input_tensor):
        x = self.act0(self.layer0(input_tensor))
        x = self.act1(self.layer1(x))
        x = self.act2(self.layer2(x))
        x = self.act3(self.layer3(x))
        x = self.act4(self.layer4(x))
        return x
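
The mirrored layout described above can be checked quickly before filling in the classes. The following is only a sketch: the layer sizes (64, 48, 32, 16, 8) and the helper `layer_shapes` are hypothetical choices for illustration, not the intended solution.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration: a window of 64 values
# is squeezed down to an 8-dimensional embedding.
seq_len = 64
encoder_sizes = [seq_len, 48, 32, 16, 8]
decoder_sizes = encoder_sizes[::-1]  # the encoder sizes in reverse order


def layer_shapes(sizes):
    """Return the (in_features, out_features) pair of each linear layer."""
    return list(zip(sizes[:-1], sizes[1:]))


encoder_shapes = layer_shapes(encoder_sizes)
# The decoder starts with an extra embedding-to-embedding layer (layer0).
decoder_shapes = [(decoder_sizes[0], decoder_sizes[0])] + layer_shapes(decoder_sizes)

encoder = nn.Sequential(*[m for i, o in encoder_shapes for m in (nn.Linear(i, o), nn.ReLU())])
decoder = nn.Sequential(*[m for i, o in decoder_shapes for m in (nn.Linear(i, o), nn.ReLU())])

batch = torch.rand(4, seq_len)       # 4 windows of length seq_len
embedding = encoder(batch)           # compressed representation
reconstruction = decoder(embedding)  # back to the original window length
print(embedding.shape, reconstruction.shape)
```

If the reconstruction does not have the same shape as the input batch, one of the layer sizes is not mirrored correctly.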

Lastly we bring the two together in one class, the autoencoder. Here we also define the loss function. Choose a loss function that suits an autoencoder; this [page](https://www.geeksforgeeks.org/loss-functions-in-deep-learning/) can help. Once you have chosen your loss function, replace `[fill_in_loss]` with the corresponding `torch.nn` loss ([documentation](https://docs.pytorch.org/docs/stable/nn.html#loss-functions)).

In [None]:
    
class EncoderDecoder(nn.Module):
    def __init__(self, train_params, seq_len):

        super(EncoderDecoder, self).__init__()

        self.n_epochs = train_params["epochs"]
        self.batch_size = train_params["batch_size"]

        self.seq_len = seq_len
        
        self.criterion = [fill_in_loss]

        self.encoder = Encoder(seq_len)
        self.decoder = Decoder(seq_len)


    def forward(self, input_tensor):
        
        x = self.encoder(input_tensor)
        x = self.decoder(x)
    
        return x
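
To see how a `torch.nn` loss behaves when the input is also the target, here is a minimal sketch. It assumes `nn.MSELoss` purely as an example; the tensors are made-up values and other losses are called in the same way.

```python
import torch
import torch.nn as nn

# Assumption: MSE is used here only as an example of a reconstruction loss.
criterion = nn.MSELoss()

seq_true = torch.tensor([[1.0, 2.0, 3.0, 4.0]])  # original input window
seq_pred = torch.tensor([[1.0, 2.0, 3.0, 2.0]])  # imperfect reconstruction

# The input itself is the target: criterion(prediction, original input).
loss = criterion(seq_pred, seq_true)
print(loss.item())  # mean of the squared errors: (0 + 0 + 0 + 4) / 4 = 1.0
```

The loss is zero only for a perfect reconstruction and grows as the output moves away from the input.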

Next we define our train function. This is a standard PyTorch training loop. The only difference is that our inputs are also the true labels, both in the loss function and in the evaluation metric. Choose a good evaluation metric to use alongside your loss function to evaluate your predictions. Again, [here](https://www.geeksforgeeks.org/regression-metrics/) is a page that can help you. Once you have chosen the metric, import it from `sklearn.metrics` and add it to the `train_model` function.

In [None]:
from sklearn.metrics import 
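
Inside the training loop the model output is a tensor that still tracks gradients, while sklearn metrics expect NumPy arrays. A minimal sketch of the conversion, assuming `mean_absolute_error` as the metric (any other regression metric from `sklearn.metrics` is called the same way) and made-up tensors:

```python
import torch
from sklearn.metrics import mean_absolute_error

seq_true = torch.tensor([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]])
# requires_grad=True mimics a model output that is part of the autograd graph.
seq_pred = torch.tensor([[0.5, 1.0, 2.0], [3.0, 4.0, 4.0]], requires_grad=True)

# sklearn works on NumPy arrays, so detach the tensors from the autograd
# graph before converting them.
metric = mean_absolute_error(seq_true.detach().numpy(), seq_pred.detach().numpy())
print(metric)  # (0.5 + 1.0) / 6 = 0.25
```

Calling `.numpy()` directly on a tensor that requires gradients raises an error, which is why the `.detach()` call comes first.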

In [None]:
import copy

def train_model(model, train_dataset, val_dataset, train_params):
    """Trains the machine learning model. Returns the loss and evaluation metric history of the training 
    and the model with the best validation loss."""
    best_loss = np.inf
    best_model = None
    history = dict(train_loss=[], val_loss=[], train_metric=[], val_metric=[])

    optimizer = torch.optim.Adam(model.parameters(), lr=train_params["learning_rate"], weight_decay=1e-8)
    criterion = model.criterion
    # train the model: run it on the inputs, calculate the losses, do backpropagation
    # (the end of range is exclusive, so add 1 to run all configured epochs)
    for epoch in range(1, train_params["epochs"] + 1):
        model = model.train()
        train_losses = []
        train_metrics = []
    
        for (seq_true,) in train_dataset:
            optimizer.zero_grad()
            seq_pred = model(seq_true)

            # the input is also the target of the reconstruction
            loss = criterion(seq_pred, seq_true)
            loss.backward()
            metric = [choose_a_metric]

            optimizer.step()
            train_losses.append(loss.item())
            train_metrics.append(metric)

        model = model.eval()
        val_losses = []
        val_metrics = []
        # run the model and the loss on the validation set
        with torch.no_grad():
            for (seq_true,) in val_dataset:
                seq_pred = model(seq_true)

                loss = criterion(seq_pred, seq_true)
                metric = [choose_a_metric]

                val_losses.append(loss.item())
                val_metrics.append(metric)
                
        # get the mean loss and metric of the epoch
        train_loss = np.mean(train_losses)
        val_loss = np.mean(val_losses)
        train_metric = np.mean(train_metrics)
        val_metric = np.mean(val_metrics)

        # the keys must match the ones created in the history dict above
        history['train_loss'].append(train_loss)
        history['val_loss'].append(val_loss)
        history['train_metric'].append(train_metric)
        history['val_metric'].append(val_metric)

        # keep a copy of the model with the lowest validation loss so far;
        # a plain reference would be overwritten by later epochs
        if val_loss < best_loss:
            best_loss = val_loss
            best_model = copy.deepcopy(model)

        text = f'Epoch = {epoch}, train loss = {train_loss}, val loss = {val_loss}'
        print(text)

    return history, best_model

Lastly, before we start with our dataset, we will also define a predict function which will input our data and return the predictions of the model.

In [None]:
def predict(model, dataset):
    """Runs the given dataset on the given model. It returns the predictions, losses, 
    evaluation metrics and the original input values."""
    predictions, input_values, losses, eval_metrics = [], [], [], []
    criterion = model.criterion
    model = model.eval()
    with torch.no_grad():
        for (seq_true,) in dataset:
            # seq_true = torch.Tensor(seq_true)
            seq_pred = model(seq_true)
            loss = criterion(seq_pred, seq_true)
                
            # mean absolute difference between input and reconstruction for this batch
            difference = torch.mean(torch.abs(seq_true - seq_pred)).item()

            # flatten all data to be able to use as an array
            predictions.append(seq_pred.numpy().flatten())
            input_values.append(seq_true.numpy().flatten())
            losses.append(loss.item())
            eval_metrics.append(difference)

    return predictions, losses, eval_metrics, input_values

## Data 

Now we can train our autoencoder on a dataset. The dataset we use consists of "noise" data which can have different variances and different means. We are going to train our machine learning model on data with 1 specific mean. Then we will evaluate our machine learning model on a time series which contains data with all the different means and variances.

Before we can do this, however, we need to load and prepare our dataset. Using the functions in the `data_utils_ae.py` file, create a `create_data_train` function that takes the length of the input samples and the batch size as input and returns a train, validation and test dataloader.

In [None]:
def create_data_train(seq_len, batch_size):
    [create_function]
    return dataloader_train, dataloader_val, dataloader_test
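
The helper functions in `data_utils_ae.py` are not shown here, so their exact signatures are unknown. As a rough picture of what such a pipeline produces, the sketch below uses plain PyTorch instead. The helper `windows_to_dataloader`, the synthetic series and the 70/15/15 split are all assumptions made for illustration.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def windows_to_dataloader(series, seq_len, batch_size, shuffle=False):
    """Cut a 1-D series into non-overlapping windows and wrap them in a DataLoader."""
    n_windows = len(series) // seq_len
    windows = np.asarray(series[: n_windows * seq_len], dtype=np.float32)
    windows = windows.reshape(n_windows, seq_len)
    # TensorDataset yields one-element tuples, matching `for (seq_true,) in dataset`.
    dataset = TensorDataset(torch.from_numpy(windows))
    return DataLoader(dataset, batch_size=batch_size, shuffle=shuffle)


# Synthetic stand-in for the noise data: 1000 samples with mean 0 and std 1.
rng = np.random.default_rng(0)
series = rng.normal(loc=0.0, scale=1.0, size=1000)

# Chronological 70/15/15 split (hypothetical proportions).
train_end, val_end = int(0.7 * len(series)), int(0.85 * len(series))
loader_train = windows_to_dataloader(series[:train_end], seq_len=20, batch_size=8, shuffle=True)
loader_val = windows_to_dataloader(series[train_end:val_end], seq_len=20, batch_size=8)
loader_test = windows_to_dataloader(series[val_end:], seq_len=20, batch_size=8)

(first_batch,) = next(iter(loader_train))
print(first_batch.shape)  # batch_size windows of seq_len values each
```

Each batch has the shape `(batch_size, seq_len)`, which is what the first linear layer of the encoder expects.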


## Train the model
Next, choose your hyperparameters for the autoencoder.

In [None]:
def create_parameters():
    learning_rate = [fill_in]
    epochs = [fill_in]
    batch_size = [fill_in]
    seq_len = [fill_in]

    train_params = {
        "epochs": epochs,
        "learning_rate": learning_rate,
        "batch_size": batch_size
    }
    return seq_len, train_params

Now you can use the created functions to create a model, train it and run predictions on the test set. Play around with your model size and hyperparameters until you get a model that can reproduce its input data well. You can check this using the `loss_graph` and `autoencoder_graph` functions defined below.

In [None]:
def loss_graph(history):
    """Plots the training history of the machine learning model"""
    train_loss = history['train_loss']
    print("train loss: " + str(train_loss))
    val_loss = history['val_loss']
    train_metric = history['train_metric']
    val_metric = history['val_metric']

    fig, axs = plt.subplots(2,1)

    axs[0].plot(train_loss, label='train')
    axs[0].plot(val_loss, label='val')
    axs[1].plot(train_metric, label='train')
    axs[1].plot(val_metric, label='val')

    axs[0].set_xlabel("epochs")
    axs[1].set_xlabel("epochs")

    axs[0].set_ylabel("Loss")
    axs[1].set_ylabel("Metric")

    axs[0].set_title("Train vs. validation loss")
    axs[1].set_title("Train vs. validation metric")

    axs[0].legend()
    axs[1].legend()
    fig.set_size_inches(10, 10, forward=True)
    fig.tight_layout()
    fig.show()

In [None]:
def autoencoder_graph(predictions, inputs, eval_metrics):
    """Plots the inputs of the model vs. the predictions and
    the value of the evaluation metric of the "predict" function."""
    predictions = np.concatenate(predictions).tolist()
    inputs = np.concatenate(inputs).tolist()
    fig, axs = plt.subplots(2,1)

    axs[0].plot(inputs, label='inputs')
    axs[0].plot(predictions, label='predictions')
    axs[1].plot(eval_metrics)

    axs[0].set_xlabel("number of samples")
    axs[1].set_xlabel("number of samples")

    axs[0].set_ylabel("value")
    axs[1].set_ylabel("Evaluation metric")

    axs[0].set_title("Test predictions vs inputs curve")
    axs[1].set_title("Evaluation metric")

    axs[0].legend()
    fig.set_size_inches(16.5, 16.5, forward=True)
    fig.tight_layout()
    fig.show()

## The model on a new dataset
Create a `create_data_drift` function that prepares the new dataset in the same way as the training data, but without splitting it into a train, validation and test set.
Then run the model on this new dataset and look at the results.

In [None]:
def create_data_drift(seq_len, batch_size):
    [create_function]
    return dataloader_drift

## Drift detection
Adapt the `predict` function and the `autoencoder_graph` function so that drift is detected based on the difference between input and output, and show the detected drift in the graph.
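
One possible starting point is a simple threshold on the per-batch evaluation metric returned by `predict`. The sketch below is an assumption, not the intended solution: the function `detect_drift`, the three-sigma rule and the error values are all hypothetical.

```python
import numpy as np


def detect_drift(eval_metrics, reference_metrics, n_sigma=3.0):
    """Flag every batch whose reconstruction error exceeds
    mean + n_sigma * std of the errors measured on drift-free reference data."""
    reference = np.asarray(reference_metrics, dtype=float)
    threshold = reference.mean() + n_sigma * reference.std()
    flags = np.asarray(eval_metrics, dtype=float) > threshold
    return flags, threshold


# Hypothetical per-batch errors: reference errors from the held-out test set,
# new errors from a series whose last three batches have drifted.
reference_errors = [0.10, 0.12, 0.09, 0.11, 0.10, 0.08]
new_errors = [0.10, 0.11, 0.09, 0.45, 0.50, 0.48]

flags, threshold = detect_drift(new_errors, reference_errors)
print(flags.tolist())  # [False, False, False, True, True, True]
```

In the graph, the threshold could be drawn on the metric plot with `axs[1].axhline(threshold)`, so that the batches above the line are visible as drift.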