# **Autoencoders tutorial**
In today's tutorial you will learn how to use autoencoders to solve the following tasks:
- dimensionality reduction;
- image denoising;
- anomaly detection.

We will use the [**TensorFlow**](https://ekababisong.org/gcp-ml-seminar/tensorflow/) framework and the open-source [**Keras**](https://keras.io/) library to rapidly prototype deep neural networks.

# **Preliminary operations**
The following code downloads all the necessary material into the remote machine. At the end of the execution select the **File** tab to verify that everything has been correctly downloaded.

In [None]:
!wget https://biolab.csr.unibo.it/ferrara/Courses/DL/Tutorials/Autoencoders/creditcard.zip

!unzip creditcard.zip

!rm creditcard.zip

# **Useful modules import**
First of all, it is necessary to import useful modules used during the tutorial.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import random
import pandas as pd
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix

# **Utility functions**
Execute the following code to define some utility functions used in the tutorial:
- **plot_2d_data** plots 2D labeled data;
- **plot_history** plots the loss over epochs on both training and validation sets. If a metric is provided, its trend is drawn in the same graph;
- **show_confusion_matrix** visualizes a 2D confusion matrix as a color-coded image.

In [None]:
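def plot_2d_data(data_2d,y,titles=None,figsize=(7,7)):
  # One scatter plot per dataset, side by side
  _,axs=plt.subplots(1,len(data_2d),figsize=figsize)
  axs=np.atleast_1d(axs)  # plt.subplots returns a single Axes when only one dataset is given

  for i in range(len(data_2d)):
    if titles is not None:
      axs[i].set_title(titles[i])
    # Points are colored according to their label
    scatter=axs[i].scatter(data_2d[i][:,0],data_2d[i][:,1],s=1,c=y[i],cmap=plt.cm.Paired)
    axs[i].legend(*scatter.legend_elements())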

def plot_history(history,metric=None):
  fig, ax1 = plt.subplots(figsize=(10, 8))

  epoch_count=len(history.history['loss'])

  line1,=ax1.plot(range(1,epoch_count+1),history.history['loss'],label='train_loss',color='orange')
  ax1.plot(range(1,epoch_count+1),history.history['val_loss'],label='val_loss',color = line1.get_color(), linestyle = '--')
  ax1.set_xlim([1,epoch_count])
  ax1.set_ylim([0, max(max(history.history['loss']),max(history.history['val_loss']))])
  ax1.set_ylabel('loss',color = line1.get_color())
  ax1.tick_params(axis='y', labelcolor=line1.get_color())
  ax1.set_xlabel('Epochs')
  _=ax1.legend(loc='lower left')

  if metric is not None:
    ax2 = ax1.twinx()
    line2,=ax2.plot(range(1,epoch_count+1),history.history[metric],label='train_'+metric)
    ax2.plot(range(1,epoch_count+1),history.history['val_'+metric],label='val_'+metric,color = line2.get_color(), linestyle = '--')
    ax2.set_ylim([0, max(max(history.history[metric]),max(history.history['val_'+metric]))])
    ax2.set_ylabel(metric,color=line2.get_color())
    ax2.tick_params(axis='y', labelcolor=line2.get_color())
    _=ax2.legend(loc='upper right')

def show_confusion_matrix(conf_matrix,class_names,figsize=(10,10)):
  fig, ax = plt.subplots(figsize=figsize)
  img=ax.matshow(conf_matrix)
  tick_marks = np.arange(len(class_names))
  _=plt.xticks(tick_marks, class_names,rotation=45)
  _=plt.yticks(tick_marks, class_names)
  _=plt.ylabel('Real')
  _=plt.xlabel('Predicted')
  
  for i in range(len(class_names)):
    for j in range(len(class_names)):
        text = ax.text(j, i, '{0:.1%}'.format(conf_matrix[i, j]),
                       ha='center', va='center', color='w')

# **Dimensionality reduction**
In this section a concrete example on how autoencoders can be used for dimensionality reduction is provided.

## **Dataset**
The [**digits MNIST**](http://yann.lecun.com/exdb/mnist/) dataset, containing 28x28 grayscale images of the 10 digits, will be used.

The goal is to reduce the dimensions, from 784 (28x28) to 2, by including as much information as possible.

The following code loads the dataset into memory.

In [None]:
(train_x, train_y), (test_x, test_y) = keras.datasets.mnist.load_data()

print('Train shape: ',train_x.shape)
print('Test shape: ',test_x.shape)

### **Visualization**
Randomly selected images can be shown by executing the following code.

In [None]:
image_count=10

_, axs = plt.subplots(1, image_count,figsize=(15, 10))
for i in range(image_count):
  random_idx=random.randint(0,train_x.shape[0]-1)  # randint includes both endpoints
  axs[i].imshow(train_x[random_idx],cmap='gray')
  axs[i].axis('off')
  axs[i].set_title(train_y[random_idx])

### **Intensity range normalization**
Pixel intensity is usually represented as discrete values in the range [0;255]. 

In [None]:
print('Min value: ',train_x.min())
print('Max value: ',train_x.max())

Such values could produce math range errors with the activation function or make training unstable. To overcome these issues, a simple normalization step can be applied by dividing all values by 255 to get continuous values in the range [0;1].

In [None]:
train_x = train_x/255.0
test_x = test_x/255.0

print('Min value: ',train_x.min())
print('Max value: ',train_x.max())

### **Image linearization**
The images need to be converted from 2D matrices to vectors before they can be used as input of dimensionality reduction algorithms (e.g., PCA).

The following code uses the NumPy function [**reshape**](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html) to flatten the data.

In [None]:
train_x_flatten=np.reshape(train_x,(train_x.shape[0],-1))
test_x_flatten=np.reshape(test_x,(test_x.shape[0],-1))

print('Train flatten shape: ',train_x_flatten.shape)
print('Test flatten shape: ',test_x_flatten.shape)
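
As a small illustration of the reshape call above, the following sketch flattens a toy batch and then restores it. The array *toy_images* and its 3x2x2 shape are made up for the example; the value -1 asks NumPy to infer the remaining dimension.

```python
import numpy as np

# Hypothetical toy batch: 3 "images" of 2x2 pixels (not MNIST data)
toy_images = np.arange(12, dtype=np.float32).reshape(3, 2, 2)

# Keep the sample axis and let NumPy infer the flattened size (2*2 = 4)
toy_flat = np.reshape(toy_images, (toy_images.shape[0], -1))

# Reshaping with the original shape restores the 2D images unchanged
toy_restored = np.reshape(toy_flat, toy_images.shape)

print(toy_flat.shape)                            # (3, 4)
print(np.array_equal(toy_images, toy_restored))  # True
```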

## **Principal component analysis**
*Principal Component Analysis* (PCA) is a method widely used to apply a linear dimensionality reduction to large datasets.

This algorithm can be easily applied to a dataset using the class [**PCA**](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html) provided by the Scikit-learn library.

The *n_components* parameter is used to set the number of dimensions of the reduced space.

In [None]:
pca = PCA(n_components=2)

The [**fit**](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#sklearn.decomposition.PCA.fit) method can be used to fit the PCA model to the data passed as input.

In [None]:
pca.fit(train_x_flatten)

Once the PCA model has been created, the dimensionality reduction can be applied to a dataset using the [**transform**](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#sklearn.decomposition.PCA.transform) method.   

In [None]:
pca_encoded_train_x=pca.transform(train_x_flatten)
pca_encoded_test_x=pca.transform(test_x_flatten)

### **Reduced space visualization**
The following code visualizes the reduced training and test sets.

In [None]:
plot_2d_data([pca_encoded_train_x,pca_encoded_test_x],[train_y,test_y],['Train','Test'],(15,7))

### **Reconstructed images**
Reduced images can be reconstructed using the [**inverse_transform**](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#sklearn.decomposition.PCA.inverse_transform) method.

In [None]:
pca_decoded_test_x_flatten=pca.inverse_transform(pca_encoded_test_x)

print('PCA decoded test flatten shape: ',pca_decoded_test_x_flatten.shape)
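
To make the three PCA operations used above less of a black box, here is a minimal NumPy sketch of what **fit**, **transform** and **inverse_transform** compute for a linear PCA. It is not the Scikit-learn implementation: the toy data, the variable names and the plain SVD are assumptions made for illustration, and the sign of the components may differ from the one returned by Scikit-learn.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 samples, 5 features, most variance in 2 directions
latent = rng.normal(size=(200, 2)) * np.array([5.0, 2.0])
mixing = rng.normal(size=(2, 5))
x = latent @ mixing + 0.1 * rng.normal(size=(200, 5))

# "fit": estimate the mean and the top-2 principal directions
mean = x.mean(axis=0)
_, _, vt = np.linalg.svd(x - mean, full_matrices=False)
components = vt[:2]

# "transform": center the data and project it onto the components
encoded = (x - mean) @ components.T

# "inverse_transform": map back to the original space and add the mean
decoded = encoded @ components + mean

print('Encoded shape:', encoded.shape)
print('Reconstruction MSE: {:.4f}'.format(np.mean((x - decoded) ** 2)))
```

Both steps are linear maps, which is the main difference from the autoencoder: its encoder and decoder can also learn non-linear mappings.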

Before visualizing the reconstructed images, it is necessary to return to their original 2D shape.

In [None]:
pca_decoded_test_x=np.reshape(pca_decoded_test_x_flatten,(test_x.shape[0],test_x.shape[1],test_x.shape[2]))

print('PCA decoded test shape: ',pca_decoded_test_x.shape)

Randomly selected images and the corresponding reconstructed versions can be shown by executing the following code.

In [None]:
n=5

fig, axs = plt.subplots(n, 2,figsize=(4,6))
axs[0,0].set_title('Original image')
axs[0,1].set_title('PCA')
for i in range(n):
  rnd_idx=random.randint(0,test_x.shape[0]-1)

  axs[i,0].axis('off')
  axs[i,0].imshow(test_x[rnd_idx], cmap='gray')

  axs[i,1].axis('off')
  axs[i,1].imshow(pca_decoded_test_x[rnd_idx], cmap='gray')
  
plt.show()

## **Undercomplete autoencoder**
In this section an undercomplete autoencoder is implemented to compare its reduction ability with PCA.

### **Model definition**
The following function creates an undercomplete autoencoder given:
- the number of input features (*input_count*);
- the number of neurons for each hidden layer (*neuron_count_per_hidden_layer*);
- the dimension of the latent space (*encoded_dim*);
- the string identifier of the activation function of the hidden layers (*hidden_activation*);
- the string identifier of the activation function of the output layer (*output_activation*).

In Keras, a sequential model is a stack of layers where each layer has exactly one input and one output. It can be created by passing a list of layers to the [**keras.Sequential**](https://keras.io/guides/sequential_model/) constructor.

[**Keras layers API**](https://keras.io/api/layers/) offers a wide range of built-in layers ready for use, including:
- [**Input**](https://keras.io/api/layers/core_layers/input/) - the input of the model. Note that you can also omit the **Input** layer. In that case the model doesn't have any weights until the first call to a training/evaluation method (since it is not yet built);
- [**Dense**](https://keras.io/api/layers/core_layers/dense/) - a fully-connected layer.

To combine the encoder and the decoder into a single autoencoder, the [**Model**](https://keras.io/api/models/model/) class provided by Keras is used. The input and output layers are passed to its constructor, which groups the layers into an object with training and inference features.

<u>Note that the **build_autoencoder** function returns the encoder and the decoder models as well as the whole autoencoder.</u>

In [None]:
def build_autoencoder(input_count,neuron_count_per_hidden_layer,encoded_dim,hidden_activation,output_activation):
  #Encoder
  encoder = keras.Sequential(name='encoder')
  input_layer=layers.Input(shape=(input_count,),name='encoder_input')  # shape is given as a tuple
  encoder.add(input_layer)
    
  for neuron_count in neuron_count_per_hidden_layer:
    hidden_layer=layers.Dense(neuron_count,activation=hidden_activation)
    encoder.add(hidden_layer)
      
  latent_layer=layers.Dense(encoded_dim,activation=hidden_activation)
  encoder.add(latent_layer)
    
  #Decoder
  decoder = keras.Sequential(name='decoder')
  decoder.add(layers.Input(shape=(encoded_dim,)))  # shape is given as a tuple
  
  for neuron_count in reversed(neuron_count_per_hidden_layer):
    hidden_layer=layers.Dense(neuron_count,activation=hidden_activation)
    decoder.add(hidden_layer)
      
  output_layer=layers.Dense(input_count,activation=output_activation)
  decoder.add(output_layer)
  
  autoencoder=keras.Model(encoder.input,decoder(encoder.output),name='autoencoder')
    
  return autoencoder,encoder,decoder

### **Model creation**
The following code creates an undercomplete autoencoder by calling the **build_autoencoder** function defined above.

In [None]:
autoencoder,encoder,decoder=build_autoencoder(train_x_flatten.shape[1],[512,256,128],2,'elu','sigmoid')

### **Model visualization**
A string summary of the network can be printed using the [**summary**](https://keras.io/api/models/model/#summary-method) method.

In [None]:
autoencoder.summary()

The summary is useful for simple models, but can be confusing for complex models.

The [**keras.utils.plot_model**](https://keras.io/api/utils/model_plotting_utils/) function creates a plot of the neural network graph that can make more complex models easier to understand.

In [None]:
keras.utils.plot_model(autoencoder,show_shapes=True, show_layer_names=False,expand_nested=True)
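
As a cross-check of the totals reported by **summary**, the number of trainable parameters can be computed by hand. The sketch below assumes that every layer is a fully-connected layer with a bias term, as in the **build_autoencoder** function, and uses the layer sizes chosen above (784 inputs, hidden layers of 512, 256 and 128 neurons, latent dimension 2). The helper *dense_param_count* is introduced here only for illustration.

```python
def dense_param_count(layer_sizes):
    # Each Dense layer has n_in * n_out weights plus n_out biases
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

hidden = [512, 256, 128]
encoder_sizes = [784] + hidden + [2]
decoder_sizes = [2] + hidden[::-1] + [784]

encoder_params = dense_param_count(encoder_sizes)
decoder_params = dense_param_count(decoder_sizes)

print(encoder_params, decoder_params, encoder_params + decoder_params)
# 566402 567184 1133586
```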

### **Model compilation**
The compilation is the final step in configuring the model for training. 

The following code uses the [**compile**](https://keras.io/api/models/model_training_apis/#compile-method) method to compile the model.
The important arguments are:
- the optimization algorithm (*optimizer*);
- the loss function (*loss*);
- the metrics used to evaluate the performance of the model (*metrics*).

The most common [optimization algorithms](https://keras.io/api/optimizers/#available-optimizers), [loss functions](https://keras.io/api/losses/#available-losses) and [metrics](https://keras.io/api/metrics/#available-metrics) are already available in Keras. You can either pass them to **compile** as an instance or by the corresponding string identifier. In the latter case, the default parameters will be used.

In [None]:
autoencoder.compile(loss='mse', optimizer='adam')
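
The *mse* identifier selects the mean squared error between the input and its reconstruction. The following NumPy sketch shows the computation on made-up values (the arrays *x* and *x_hat* are hypothetical, not produced by the model): the squared differences are averaged over the features of each sample and then over the samples.

```python
import numpy as np

# Hypothetical inputs and reconstructions (2 samples, 2 features)
x = np.array([[0.0, 1.0], [1.0, 1.0]])
x_hat = np.array([[0.0, 0.5], [0.5, 1.0]])

per_sample_mse = np.mean((x - x_hat) ** 2, axis=1)  # one error per sample
batch_mse = per_sample_mse.mean()                   # average over the samples

print(per_sample_mse)  # [0.125 0.125]
print(batch_mse)       # 0.125
```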

### **Split data into training and validation sets**
In order to avoid overfitting during training, it is necessary to have a separate dataset (called validation set), in addition to the training and test datasets, to choose the optimal value for the hyperparameters.

For this reason, *train_x* and *train_y* are divided into training and validation sets using the [**train_test_split**](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) function provided by Scikit-learn.

The *val_size* variable represents the fraction (if a float between 0 and 1) or the absolute number (if an integer) of patterns to include in the validation set.

In [None]:
val_size=10000

train_x_flatten, val_x_flatten, train_y, val_y = train_test_split(train_x_flatten, train_y, test_size = val_size,random_state = 1,shuffle=True)

print('Train data flatten shape: ',train_x_flatten.shape)
print('Train label shape: ',train_y.shape)
print('Validation data flatten shape: ',val_x_flatten.shape)
print('Validation label shape: ',val_y.shape)

### **Training**
Now we are ready to train our model by calling the [**fit**](https://keras.io/api/models/model_training_apis/#fit-method) method.

It trains the model for a fixed number of epochs (*epoch_count*) using the training set (*train_x_flatten*) divided into mini-batches of *batch_size* elements. During the training process, the performance is evaluated on both training and validation (*val_x_flatten*) sets.

Stopping the training when a metric or the loss has stopped improving on the validation set helps to avoid overfitting.

For this purpose, Keras provides a class called [**EarlyStopping**](https://keras.io/api/callbacks/early_stopping/). Important class parameters are:
- *monitor* - the name of the metric or the loss to be observed; 
- *patience* - the number of epochs with no improvement after which training will be stopped;
- *restore_best_weights* - whether to restore model weights from the epoch with the best value of the monitored quantity.

Once an instance of the **EarlyStopping** class has been created, it can be passed to the **fit** method in the *callbacks* parameter.

<u>Note that in this case the target data correspond to the input data, because the objective of the autoencoder is to reconstruct the input as well as possible.</u>

In [None]:
epoch_count = 100
batch_size=128
patience=5

early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=patience, restore_best_weights=True)

history = autoencoder.fit(train_x_flatten,train_x_flatten,validation_data=(val_x_flatten,val_x_flatten),epochs=epoch_count,batch_size=batch_size,callbacks=[early_stop])
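
The stopping rule controlled by *patience* can be sketched in plain Python. This is only an illustration of the idea, not the Keras implementation: the function *simulate_early_stopping* and the list of validation losses are made up for the example.

```python
def simulate_early_stopping(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-based, for a list of validation losses."""
    best_loss = float('inf')
    best_epoch = 0
    wait = 0  # epochs elapsed since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # training would stop here
    return len(val_losses), best_epoch    # patience never exhausted

# Made-up validation losses, one per epoch
val_losses = [0.9, 0.7, 0.6, 0.61, 0.59, 0.62, 0.63, 0.64, 0.5]
stop_epoch, best_epoch = simulate_early_stopping(val_losses, patience=3)
print(stop_epoch, best_epoch)  # 8 5
```

In this example the training stops at epoch 8, after three epochs without improvement, and the weights of epoch 5 are the ones that *restore_best_weights* would bring back. The lower loss at epoch 9 is never reached, which shows the trade-off behind a small *patience* value.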

We can learn a lot about our model by observing the graph of its performance over time during training.

The **fit** method returns an object (*history*) containing loss and metrics values at successive epochs for both training and validation sets.

The following code calls the **plot_history** function defined above to plot the loss over epochs on both training and validation sets.

In [None]:
plot_history(history)

### **Performance evaluation on the test set**
The performance on the test set can be easily measured by calling the **evaluate** method of the autoencoder.

In [None]:
test_loss = autoencoder.evaluate(test_x_flatten, test_x_flatten, batch_size=batch_size,verbose=0)
print('Test loss: {:.3f}'.format(test_loss))

### **Reduced space visualization**
The [**predict**](https://keras.io/api/models/model_training_apis/#predict-method) method of the *encoder* can be used to reduce training, validation and test sets.

In [None]:
encoded_train_x = encoder.predict(train_x_flatten)
encoded_val_x = encoder.predict(val_x_flatten)
encoded_test_x = encoder.predict(test_x_flatten)

The following code visualizes the reduced training, validation and test sets.

In [None]:
plot_2d_data([encoded_train_x,encoded_val_x,encoded_test_x],[train_y,val_y,test_y],['Train','Validation','Test'],(18,6))

### **Reconstructed images**
The **predict** method of the *decoder* can be used to reconstruct the original images from the encoded space. 

In [None]:
decoded_test_x_flatten=decoder.predict(encoded_test_x)

print('Decoded test flatten shape: ',decoded_test_x_flatten.shape)

Before visualizing the reconstructed images, it is necessary to return to their original 2D shape.

In [None]:
decoded_test_x=np.reshape(decoded_test_x_flatten,(test_x.shape[0],test_x.shape[1],test_x.shape[2]))

print('Decoded test shape: ',decoded_test_x.shape)

Randomly selected images and the corresponding reconstructed versions can be shown by executing the following code.

In [None]:
n=5

fig, axs = plt.subplots(n, 2,figsize=(4,6))
axs[0,0].set_title('Original image')
axs[0,1].set_title('Autoencoder')
for i in range(n):
  rnd_idx=random.randint(0,test_x.shape[0]-1)

  axs[i,0].axis('off')
  axs[i,0].imshow(test_x[rnd_idx], cmap='gray')

  axs[i,1].axis('off')
  axs[i,1].imshow(decoded_test_x[rnd_idx], cmap='gray')
  
plt.show()

## **Comparison between PCA and autoencoder**
The following code visualizes the reconstructed images returned by both PCA and autoencoder starting from images randomly selected from the test set.

In [None]:
n=5

fig, axs = plt.subplots(n, 3,figsize=(5,8))
axs[0,0].set_title('Original image')
axs[0,1].set_title('PCA')
axs[0,2].set_title('Autoencoder')
for i in range(n):
  rnd_idx=random.randint(0,test_x.shape[0]-1)

  axs[i,0].axis('off')
  axs[i,0].imshow(test_x[rnd_idx], cmap='gray')

  axs[i,1].axis('off')
  axs[i,1].imshow(pca_decoded_test_x[rnd_idx], cmap='gray')

  axs[i,2].axis('off')
  axs[i,2].imshow(decoded_test_x[rnd_idx], cmap='gray')
  
plt.show()

The images reconstructed by the autoencoder are far better than those obtained with PCA.

# **Image denoising**
In this section a convolutional autoencoder is used to solve an image denoising problem.

## **Dataset**
The [**fashion MNIST**](https://github.com/zalandoresearch/fashion-mnist) dataset, containing 28x28 grayscale images of 10 fashion categories, will be used.

The goal is to train the autoencoder to map noisy images to clean ones.

The following code loads the dataset into memory.

In [None]:
(train_x, _), (test_x, _) = keras.datasets.fashion_mnist.load_data()

### **Visualization**
Randomly selected images can be shown by executing the following code.

In [None]:
image_count=10

_, axs = plt.subplots(1, image_count,figsize=(15, 10))
for i in range(image_count):
  random_idx=random.randint(0,train_x.shape[0]-1)  # randint includes both endpoints
  axs[i].imshow(train_x[random_idx],cmap='gray')
  axs[i].axis('off')

### **Intensity range normalization**
As in the previous section, a simple normalization step is applied to map values from the range [0;255] to the range [0;1].

In [None]:
print('Min value before normalization: ',train_x.min())
print('Max value before normalization: ',train_x.max())

train_x = train_x/255.0
test_x = test_x/255.0

print('Min value after normalization: ',train_x.min())
print('Max value after normalization: ',train_x.max())

### **Image shape**
To use grayscale images as input of a convolutional autoencoder, it is necessary to add a new unit axis to explicitly represent single channel images.

By executing the following code, the shape of the images is updated from WxH to WxHx1.

In [None]:
train_x=np.expand_dims(train_x,axis=3)
test_x=np.expand_dims(test_x,axis=3)

print('Train shape: ',train_x.shape)
print('Test shape: ',test_x.shape)
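
The effect of **expand_dims** can be checked on a small array. In the sketch below, *toy_batch* is a made-up batch of four empty 28x28 images; the indexing form with *np.newaxis* is shown only as an equivalent alternative.

```python
import numpy as np

toy_batch = np.zeros((4, 28, 28))           # 4 hypothetical grayscale images

with_channel = np.expand_dims(toy_batch, axis=3)
same_result = toy_batch[..., np.newaxis]    # equivalent indexing form

print(with_channel.shape)  # (4, 28, 28, 1)
```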

### **Split data into training and validation sets**
In order to avoid overfitting during training, it is necessary to have a separate dataset (called validation set), in addition to the training and test datasets, to choose the optimal value for the hyperparameters.

For this reason, *train_x* is divided into training and validation sets using the **train_test_split** function provided by Scikit-learn.

The *val_size* variable represents the fraction (if a float between 0 and 1) or the absolute number (if an integer) of patterns to include in the validation set.

In [None]:
val_size=10000

train_x, val_x = train_test_split(train_x, test_size = val_size,random_state = 1,shuffle=True)

print('Train shape: ',train_x.shape)
print('Validation shape: ',val_x.shape)

### **Synthetic generation of noisy images**
To synthetically generate noisy images, a Gaussian noise matrix is applied to the original images. After that, the resulting images are clipped in the range [0;1].

The NumPy library provides the function [**random.normal**](https://numpy.org/doc/stable/reference/random/generated/numpy.random.normal.html) to draw random samples from a normal (Gaussian) distribution with mean *loc* and standard deviation *scale*. The parameter *size* represents the shape of the output samples.

The strength of the noise applied is represented by the *noise_factor* variable.

In [None]:
noise_factor = 0.5

noisy_train_x = train_x + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=train_x.shape) 
noisy_val_x = val_x + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=val_x.shape) 
noisy_test_x = test_x + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=test_x.shape) 

noisy_train_x = np.clip(noisy_train_x, 0., 1.)
noisy_val_x = np.clip(noisy_val_x, 0., 1.)
noisy_test_x = np.clip(noisy_test_x, 0., 1.)
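
The following self-contained sketch repeats the same two steps (additive Gaussian noise, then clipping) on stand-in data and measures how many pixels are saturated by the clipping. The uniform random *toy_images* and the seeded generator are assumptions made for reproducibility; the tutorial code above uses the global **np.random.normal** function instead.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator (an assumption of this sketch)
toy_images = rng.uniform(0.0, 1.0, size=(100, 28, 28, 1))  # stand-in data

noise_factor = 0.5
noisy = toy_images + noise_factor * rng.normal(loc=0.0, scale=1.0, size=toy_images.shape)
clipped = np.clip(noisy, 0.0, 1.0)

# Fraction of pixels pushed outside [0;1] by the noise and saturated by the clipping
saturated = np.mean((noisy < 0.0) | (noisy > 1.0))

print(clipped.min() >= 0.0, clipped.max() <= 1.0)
print('Saturated pixel fraction: {:.2f}'.format(saturated))
```

A larger *noise_factor* saturates more pixels, so part of the original information is lost before the autoencoder even sees the image.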

### **Visualization of noisy images**
The following code shows some randomly selected original images and the corresponding noisy ones.

In [None]:
image_count=10

fig, axs = plt.subplots(2, image_count,figsize=(15,3))
for i in range(image_count):
  rnd_idx=random.randint(0,train_x.shape[0]-1)

  axs[0,i].axis('off')
  axs[0,i].imshow(train_x[rnd_idx].reshape(train_x[rnd_idx].shape[0], train_x[rnd_idx].shape[1]), cmap='gray')

  axs[1,i].axis('off')
  axs[1,i].imshow(noisy_train_x[rnd_idx].reshape(noisy_train_x[rnd_idx].shape[0], noisy_train_x[rnd_idx].shape[1]), cmap='gray')
plt.show()

## **Denoising autoencoder**
In this section a convolutional autoencoder is implemented to recover noisy **fashion MNIST** images.

### **Model definition**
The following function creates a convolutional autoencoder given:
- the shape of the input images (*input_shape*).

[**Keras layers API**](https://keras.io/api/layers/) offers a wide range of built-in layers ready for use, including:
- [**Input**](https://keras.io/api/layers/core_layers/input/) - the input of the model. Note that you can also omit the **Input** layer. In that case the model doesn't have any weights until the first call to a training/evaluation method (since it is not yet built);
- [**Conv2D**](https://keras.io/api/layers/convolution_layers/convolution2d/) - a 2D convolution layer;
- [**MaxPooling2D**](https://keras.io/api/layers/pooling_layers/max_pooling2d/) - a 2D max pooling layer;
- [**UpSampling2D**](https://keras.io/api/layers/reshaping_layers/up_sampling2d/) - a 2D upsampling layer. It is an unpooling layer with no trainable parameters useful to upsample 3D volumes previously reduced by convolutional or pooling layers.

In [None]:
def build_denoising_autoencoder(input_shape=(28, 28, 1)):
    autoencoder=keras.Sequential(
            [
              layers.Input(shape=input_shape),
              layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
              layers.MaxPooling2D((2, 2), padding='same'),
              layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
              layers.MaxPooling2D((2, 2), padding='same'),
              layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
              layers.UpSampling2D((2, 2)),
              layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
              layers.UpSampling2D((2, 2)),
              layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')
            ]
          )
    return autoencoder
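
To see why the output of this network has the same size as its input, the side length of the feature maps can be tracked layer by layer. The sketch below assumes the settings used in the function above: convolutions with stride 1 and *same* padding keep the size, 2x2 max pooling with *same* padding halves it (rounding up) and 2x2 upsampling doubles it. The helper *spatial_sizes* is introduced only for this illustration.

```python
import math

def spatial_sizes(size, layer_kinds):
    """Track the side length of the feature maps through 'conv', 'pool' and 'up' layers."""
    sizes = []
    for kind in layer_kinds:
        if kind == 'pool':    # MaxPooling2D((2, 2), padding='same')
            size = math.ceil(size / 2)
        elif kind == 'up':    # UpSampling2D((2, 2))
            size = size * 2
        # 'conv' with stride 1 and padding='same' keeps the size unchanged
        sizes.append(size)
    return sizes

layer_kinds = ['conv', 'pool', 'conv', 'pool', 'conv', 'up', 'conv', 'up', 'conv']
print(spatial_sizes(28, layer_kinds))
# [28, 14, 14, 7, 7, 14, 14, 28, 28]
```

The round trip works because 28 is divisible by 4. With a side of 30, for example, the same layers would give 30, 15, 8, 16, 32, so the output would no longer match the input.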

### **Model creation**
The following code creates the denoising autoencoder by calling the **build_denoising_autoencoder** function defined above.

In [None]:
denoising_autoencoder=build_denoising_autoencoder()

### **Model visualization**
A string summary of the network can be printed by executing the following code.

In [None]:
denoising_autoencoder.summary()

Alternatively, a plot of the neural network graph can be visualized.

In [None]:
keras.utils.plot_model(denoising_autoencoder,show_shapes=True, show_layer_names=False)

### **Model compilation**
The following code compiles the model as already done for the undercomplete autoencoder.

In [None]:
denoising_autoencoder.compile(loss='mse',optimizer='adam')

### **Training**
Now we are ready to train our model by calling the **fit** method.

In [None]:
epoch_count = 100
batch_size=128
patience=5

early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=patience, restore_best_weights=True)

history=denoising_autoencoder.fit(noisy_train_x,train_x,validation_data=(noisy_val_x, val_x),epochs=epoch_count,batch_size=batch_size,callbacks=[early_stop])

The following code calls the **plot_history** function defined above to plot the loss over epochs on both training and validation sets.

In [None]:
plot_history(history)

## **Performance evaluation on the test set**
The performance on the test set can be easily measured by calling the **evaluate** method of the autoencoder.

In [None]:
test_loss = denoising_autoencoder.evaluate(noisy_test_x, test_x, batch_size=batch_size,verbose=0)
print('Test loss: {:.3f}'.format(test_loss))

## **Denoised images**
The **predict** method can be used to recover the original images from the noisy test set. 

In [None]:
denoised_test_x=denoising_autoencoder.predict(noisy_test_x)

Randomly selected noisy images and the corresponding denoised versions can be shown by executing the following code.

In [None]:
image_count = 10

fig, axs = plt.subplots(3, image_count,figsize=(15,5))
for i in range(image_count):
  rnd_idx=random.randint(0,test_x.shape[0]-1)

  axs[0,i].axis('off')
  axs[0,i].imshow(test_x[rnd_idx].reshape(test_x[rnd_idx].shape[0], test_x[rnd_idx].shape[1]), cmap='gray')
  
  axs[1,i].axis('off')
  axs[1,i].imshow(noisy_test_x[rnd_idx].reshape(noisy_test_x[rnd_idx].shape[0], noisy_test_x[rnd_idx].shape[1]), cmap='gray')

  axs[2,i].axis('off')
  axs[2,i].imshow(denoised_test_x[rnd_idx].reshape(denoised_test_x[rnd_idx].shape[0], denoised_test_x[rnd_idx].shape[1]), cmap='gray')
plt.show()

# **Anomaly detection: credit card fraud**
In this section, an autoencoder is used to detect fraudulent credit/debit card transactions.

## **How to use autoencoders to detect anomalies?**
An *anomaly* can be defined as an illegitimate data point generated by a different process than whatever generated the rest of the data.

By learning to replicate the most salient features in the training data, an autoencoder is encouraged to reproduce precisely the most frequently observed characteristics. When facing anomalies, its reconstruction performance should get worse.

Usually, only data with *normal* instances are used to train the model. After training, the autoencoder will accurately reconstruct *normal* data, while failing to do so with unfamiliar anomalous data. 

Reconstruction error (the error between the original data and its reconstructed version) is used as an anomaly score to detect anomalies.
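
The idea can be sketched on synthetic 2D data. Everything in this example is an assumption made for illustration: the data are generated at random, a linear projection fitted on the *normal* points stands in for the trained autoencoder, and the threshold is set to the 99th percentile of the errors on *normal* data, which is one possible choice among many.

```python
import numpy as np

rng = np.random.default_rng(0)
direction = np.array([1.0, 2.0]) / np.sqrt(5.0)       # normal data lie along this line
perpendicular = np.array([2.0, -1.0]) / np.sqrt(5.0)  # anomalies are shifted this way

def make_points(n, offset):
    # Points along the line, shifted by `offset` perpendicular to it, plus small noise
    t = rng.normal(size=(n, 1))
    return t * direction + offset * perpendicular + 0.05 * rng.normal(size=(n, 2))

normal_x = make_points(500, offset=0.0)
anomalous_x = make_points(20, offset=3.0)

# Stand-in "autoencoder": projection onto the main direction of the normal data
mean = normal_x.mean(axis=0)
_, _, vt = np.linalg.svd(normal_x - mean, full_matrices=False)
main_axis = vt[:1]

def reconstruction_error(x):
    x_hat = (x - mean) @ main_axis.T @ main_axis + mean
    return np.mean((x - x_hat) ** 2, axis=1)  # one anomaly score per sample

normal_err = reconstruction_error(normal_x)
anomalous_err = reconstruction_error(anomalous_x)

# Assumed rule: flag every sample above the 99th percentile of the normal errors
threshold = np.percentile(normal_err, 99)
print('Flagged anomalies: {:.0%}'.format(np.mean(anomalous_err > threshold)))
print('Flagged normal points: {:.0%}'.format(np.mean(normal_err > threshold)))
```

The threshold controls the trade-off between missed anomalies and false alarms: a lower percentile flags more anomalies but also more *normal* points.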

## **Dataset**
The [Credit Card Fraud Detection](https://www.kaggle.com/mlg-ulb/creditcardfraud/) dataset from Kaggle contains 284807 European credit card transactions, 492 of which are fraudulent (0.172% of all transactions). All features except the time and the amount have been transformed with PCA for privacy reasons.

The dataset is stored in a CSV file and can be easily loaded in memory using [**pandas**](https://pandas.pydata.org/), a software library for data manipulation and analysis.

In [None]:
dataframe = pd.read_csv('creditcard.csv')

The variable *dataframe* is an instance of the pandas class [**DataFrame**](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html), a 2-dimensional labeled data structure with columns of potentially different types.

### **Visualization**
*row_count* randomly selected rows can be shown by executing the following code.

In [None]:
row_count=5

dataframe.sample(row_count)

### **Statistics**
The [**info**](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.info.html) method can be used to print a brief summary of a **DataFrame** including the index and the type of each column, the non-null values and the memory usage.

In [None]:
dataframe.info()

The [**describe**](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html) method can be used to show the overall statistics of the dataset.

In [None]:
dataframe.describe().transpose()

The method [**hist**](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.hist.html) draws a histogram for each column in the **DataFrame**.

In [None]:
dataframe.hist(bins=50, figsize=(20,15))
plt.show()

From the statistics and the histograms it is clear that the features have very different distributions.

### **Split features from target values**
The following code separates the features from the target values (clean/fraudulent transactions).

In [None]:
dataframe_x=dataframe.drop(['Class'],axis=1)
dataframe_y=dataframe['Class']

The NumPy representation of a **DataFrame** can be obtained using the [**values**](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.values.html) property.

In [None]:
x=dataframe_x.values
y=dataframe_y.values

print('Feature shape: ',x.shape)
print('Target shape: ',y.shape)

### **Split clean and fraudulent transactions**
The following code separates clean and fraudulent transactions.

In [None]:
cleans=y==0
frauds=y==1

clean_x=x[cleans]
clean_y=y[cleans]

fraud_x=x[frauds]
fraud_y=y[frauds]

print('Clean feature shape: ',clean_x.shape)
print('Clean target shape: ',clean_y.shape)
print('Fraudulent feature shape: ',fraud_x.shape)
print('Fraudulent target shape: ',fraud_y.shape)

### **Split data into training, validation and test sets**
In order to avoid overfitting during training and to evaluate the generalization capabilities of the models, it is necessary to divide the data into three disjoint datasets: training, validation and test sets.

For this reason, the data are divided using the **train_test_split** function provided by Scikit-learn.

The *test_size* and *val_size* variables represent the fraction (or the absolute number) of patterns to include in the test and validation sets, respectively. Note that *val_size* is applied to the data that remain after the test set has been extracted.

<u>Note that the autoencoder will be trained using only clean transactions, while the test set will contain both clean and fraudulent transactions.</u>

In [None]:
test_size=0.25
val_size=0.33

train_x, test_x, train_y, test_y = train_test_split(clean_x, clean_y, test_size = test_size,random_state = 1,shuffle=True)

train_x, val_x, train_y, val_y = train_test_split(train_x, train_y, test_size = val_size,random_state = 1,shuffle=True)

test_x=np.concatenate((test_x, fraud_x))
test_y=np.concatenate((test_y, fraud_y))

print('Train feature shape: ',train_x.shape)
print('Train target shape: ',train_y.shape)
print('Validation feature shape: ',val_x.shape)
print('Validation target shape: ',val_y.shape)
print('Test feature shape: ',test_x.shape)
print('Test target shape: ',test_y.shape)

### **Data normalization**
It is good practice to normalize features that use different scales and ranges.

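The Scikit-learn library provides the class [**StandardScaler**](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html) to normalize features by removing the mean and scaling to unit variance. The scaler is fitted on the training set only, and the same statistics are then applied to the validation and test sets.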

In [None]:
scaler = StandardScaler().fit(train_x)
train_x = scaler.transform(train_x)
val_x = scaler.transform(val_x)
test_x = scaler.transform(test_x)
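
As an illustration of what the scaler computes, the same standardization can be sketched by hand with NumPy. The arrays *toy_train* and *toy_test* are hypothetical random data, not the credit card features. The key point is that the mean and standard deviation are estimated on the training data and reused unchanged on the other set.

```python
import numpy as np

# Hypothetical toy data standing in for the real feature matrices.
rng = np.random.default_rng(0)
toy_train = rng.normal(loc=[0.0, 100.0], scale=[1.0, 25.0], size=(500, 2))
toy_test = rng.normal(loc=[0.0, 100.0], scale=[1.0, 25.0], size=(100, 2))

# Statistics are estimated on the training data only.
mean = toy_train.mean(axis=0)
std = toy_train.std(axis=0)  # population standard deviation (ddof=0)

# The same statistics are applied to both sets.
toy_train_scaled = (toy_train - mean) / std
toy_test_scaled = (toy_test - mean) / std

print('Scaled train mean:', toy_train_scaled.mean(axis=0))
print('Scaled train std: ', toy_train_scaled.std(axis=0))
print('Scaled test mean: ', toy_test_scaled.mean(axis=0))
```

The scaled training features have zero mean and unit standard deviation by construction, while the test features are only approximately standardized because they were not used to estimate the statistics.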

## **The autoencoder**
In this section an autoencoder is trained to detect anomalies in credit card transactions.

### **Model creation**
The following code creates the autoencoder by calling the **build_autoencoder** function defined above.

In [None]:
autoencoder,encoder,_=build_autoencoder(train_x.shape[1],[24,16,8,4],2,'elu',None)

### **Model visualization**
A string summary of the network can be printed by executing the following code.

In [None]:
autoencoder.summary()

Alternatively, a plot of the neural network graph can be visualized.

In [None]:
keras.utils.plot_model(autoencoder,show_shapes=True, show_layer_names=False,expand_nested=True)

### **Model compilation**
The following code compiles the model.

In [None]:
autoencoder.compile(loss='mse',optimizer='adam')

### **Training**
Now we are ready to train our model by calling the **fit** method.

In [None]:
epoch_count = 200
batch_size=256
patience=5

early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=patience, restore_best_weights=True)

history=autoencoder.fit(train_x,train_x,validation_data=(val_x,val_x),epochs=epoch_count,batch_size=batch_size,callbacks=[early_stop])

The following code calls the **plot_history** function defined above to draw in a graph the loss over epochs on both training and validation sets.

In [None]:
plot_history(history)

## **Latent space visualization**
It is always interesting to look at the compressed representation obtained by the autoencoder.

The **predict** method of the *encoder* can be used to map the training, validation and test sets into the latent space.

In [None]:
encoded_train_x = encoder.predict(train_x)
encoded_val_x = encoder.predict(val_x)
encoded_test_x = encoder.predict(test_x)

The following code visualizes the training, validation and test sets mapped into the latent space.

In [None]:
plot_2d_data([encoded_train_x,encoded_val_x,encoded_test_x],[train_y,val_y,test_y],['Train','Validation','Test'],(15,7))

## **Fraud detection**
To evaluate the fraud detection capabilities of the model, the MSE between each transaction and its reconstructed version will be used as the anomaly score.

The following code calls the **predict** method to generate the reconstructed transactions (*reconstructed_train_x*, *reconstructed_val_x* and *reconstructed_test_x*) of the training, validation and test sets (*train_x*, *val_x* and *test_x*).

In [None]:
reconstructed_train_x=autoencoder.predict(train_x)
reconstructed_val_x=autoencoder.predict(val_x)
reconstructed_test_x=autoencoder.predict(test_x)

print('Reconstructed train shape: ',reconstructed_train_x.shape)
print('Reconstructed validation shape: ',reconstructed_val_x.shape)
print('Reconstructed test shape: ',reconstructed_test_x.shape)

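The Scikit-learn library provides the function [**mean_squared_error**](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html) to compute the MSE metric. The function averages over samples (rows) and, with *multioutput='raw_values'*, returns one value per output (column). For this reason the arrays are transposed, so that each transaction becomes a column and gets its own MSE.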

In [None]:
train_mse=mean_squared_error(train_x.transpose(),reconstructed_train_x.transpose(),multioutput='raw_values')
val_mse=mean_squared_error(val_x.transpose(),reconstructed_val_x.transpose(),multioutput='raw_values')
test_mse=mean_squared_error(test_x.transpose(),reconstructed_test_x.transpose(),multioutput='raw_values')

print('Train MSE shape: ',train_mse.shape)
print('Validation MSE shape: ',val_mse.shape)
print('Test MSE shape: ',test_mse.shape)
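
The per-transaction MSE can also be written directly in NumPy by averaging the squared error over the feature axis. The sketch below uses two small hypothetical arrays (*toy_x* and *toy_rec*) in place of the real data and its reconstruction. It produces one score per row, the same result the transposed **mean_squared_error** call gives.

```python
import numpy as np

# Hypothetical toy inputs and reconstructions: 4 samples, 3 features.
toy_x = np.array([[0.0, 1.0, 2.0],
                  [1.0, 1.0, 1.0],
                  [2.0, 0.0, 4.0],
                  [0.0, 0.0, 0.0]])
toy_rec = np.array([[0.0, 1.0, 2.0],
                    [1.0, 2.0, 1.0],
                    [0.0, 0.0, 1.0],
                    [3.0, 3.0, 3.0]])

# Average the squared error over the feature axis: one score per sample.
toy_mse = np.mean((toy_x - toy_rec) ** 2, axis=1)

print('Per-sample MSE shape:', toy_mse.shape)
print('Per-sample MSE:', toy_mse)
```

The first sample is reconstructed perfectly and scores 0, while the last one, whose reconstruction is far from the input, receives the highest score.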

### **Distribution of mean squared error**
The following code draws the MSE distributions of training, validation and test sets.

In [None]:
_, axs = plt.subplots(1,3,figsize=(15,5))

axs[0].hist(train_mse, bins=100, density=True, label="clean", alpha=.6, color="green")
axs[0].set_title('Train')

axs[1].hist(val_mse, bins=100, density=True, label="clean", alpha=.6, color="green")
axs[1].set_title('Validation')

axs[2].hist(test_mse[(test_y==0).squeeze()], bins=100, density=True, label="clean", alpha=.6, color="green")
axs[2].hist(test_mse[(test_y==1).squeeze()], bins=100, density=True, label="fraudulent", alpha=.6, color="red")
axs[2].set_title('Test')

plt.legend()
plt.show()

Looking at the test distribution, some fraudulent transactions have a low MSE, very similar to that of clean transactions. In general, however, fraudulent transactions are reconstructed with a noticeably higher error, which sets them apart from clean ones.

### **Detection accuracy**
To detect fraudulent transactions, a threshold on the MSE value can be used.

It must be chosen to limit as much as possible the number of clean transactions classified as fraudulent (i.e., false positives) while still capturing the most anomalous ones.

Here the threshold is the MSE value that yields a specific percentage (*clean_acceptance_rate*) of true negatives (i.e., clean transactions correctly classified) on the validation set.

In [None]:
clean_acceptance_rate=0.99

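sorted_val_mse=np.sort(val_mse)

# Index of the validation score at the requested acceptance rate
# (clamped so that a rate of 1.0 stays within bounds)
idx=min(int(clean_acceptance_rate*len(sorted_val_mse)),len(sorted_val_mse)-1)

thr=sorted_val_mse[idx]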

print('Anomaly detection threshold: {:.3f}'.format(thr))
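
To check that a threshold chosen this way behaves as intended, one can measure the fraction of validation scores that do not exceed it. The sketch below is illustrative only: *toy_val_mse* is a hypothetical set of random scores, and all the *toy_* names are assumptions, not variables of the notebook.

```python
import numpy as np

# Hypothetical validation scores; the real ones come from val_mse.
rng = np.random.default_rng(0)
toy_val_mse = rng.exponential(scale=1.0, size=1000)

toy_rate = 0.99
toy_sorted = np.sort(toy_val_mse)

# Clamp the index so that a rate of 1.0 does not run past the last element.
toy_idx = min(int(toy_rate * len(toy_sorted)), len(toy_sorted) - 1)
toy_thr = toy_sorted[toy_idx]

# Fraction of validation samples accepted as clean (score not above threshold).
accepted = np.mean(toy_val_mse <= toy_thr)

print('Threshold: {:.3f}'.format(toy_thr))
print('Accepted fraction: {:.3f}'.format(accepted))
```

NumPy's `np.quantile` offers a similar result in one call, although by default it interpolates between neighbouring scores instead of picking an observed one.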

The accuracy can be easily measured by calling the [**accuracy_score**](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html) function provided by the Scikit-learn library.

In [None]:
train_y_pred=train_mse>thr
val_y_pred=val_mse>thr
test_y_pred=test_mse>thr

# normalize expects a boolean: True returns the fraction of correct predictions
train_accuracy=accuracy_score(train_y,train_y_pred,normalize=True)
val_accuracy=accuracy_score(val_y,val_y_pred,normalize=True)
test_accuracy=accuracy_score(test_y,test_y_pred,normalize=True)

print('Train accuracy: {:.3f}'.format(train_accuracy))
print('Validation accuracy: {:.3f}'.format(val_accuracy))
print('Test accuracy: {:.3f}'.format(test_accuracy))

### **Confusion matrix**
To evaluate the classification accuracy in the presence of an unbalanced dataset, it is useful to compute the [confusion matrix](https://en.wikipedia.org/wiki/Confusion_matrix).

The Scikit-learn library provides the function [**confusion_matrix**](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html) to compute the confusion matrix given the ground truth (*test_y*) and the predicted classes (*test_y_pred*) as input.

In [None]:
conf_matrix=confusion_matrix(test_y, test_y_pred, normalize='true')
print(conf_matrix)
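
With *normalize='true'* each row of the matrix is divided by the number of samples of that true class, so the diagonal reports the per-class recall. The following sketch reproduces this by hand on a small hypothetical example (*toy_y* and *toy_pred* are made-up labels) and shows why plain accuracy can be misleading when one class dominates.

```python
import numpy as np

# Hypothetical labels: 8 clean (0) and 2 fraudulent (1) transactions.
toy_y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
toy_pred = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 0])

# Count matrix: rows are true classes, columns are predicted classes.
toy_counts = np.zeros((2, 2), dtype=int)
for t, p in zip(toy_y, toy_pred):
  toy_counts[t, p] += 1

# Row normalization: each row sums to 1.
toy_norm = toy_counts / toy_counts.sum(axis=1, keepdims=True)

toy_accuracy = np.mean(toy_y == toy_pred)

print(toy_counts)
print(toy_norm)
print('Accuracy: {:.2f}'.format(toy_accuracy))
```

In this toy case the accuracy is 0.80 even though only half of the fraudulent transactions are detected, which the second row of the normalized matrix makes explicit.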

The following code visualizes the 2D confusion matrix as a color-coded image.

In [None]:
show_confusion_matrix(conf_matrix,('clean','fraud'),figsize=(6,6))

# **Exercise**
Solve another anomaly detection problem chosen from:
- [Unsupervised Anomaly Detection Benchmark](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/OPQMVF);
- [Outlier Detection DataSets](http://odds.cs.stonybrook.edu/).