# **Tensorflow, Tensorboard and Keras Tutorial**
### Sofia Begonha Morgado - nº 62141
### 27/04/2022

## **Introduction**


Deep learning enables computational models composed of multiple processing layers to learn representations of large datasets at multiple levels of abstraction. These models have brought significant improvements in many fields. In this tutorial, I will explain how to implement deep neural networks using TensorFlow and Keras, with both the Sequential and the Functional API. First, TensorFlow will be used to solve a simple XOR problem; then, the Keras Sequential API will be used to predict the classes of objects from the fashion_MNIST dataset; finally, the Keras Functional API will be used to implement an autoencoder for the MNIST dataset.

## **Implementation of a simple neural network to solve the XOR problem using Tensorflow**



To begin, a basic network will be created manually to solve the XOR problem, which entails classifying four points in a two-dimensional space (two features). The two classes are not linearly separable: the points (0,1) and (1,0) belong to one class and the points (0,0) and (1,1) to the other, so no single straight line can separate them. For this reason, we must use a neural network with a hidden layer instead of a single neuron.
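
To make the limitation of a single neuron concrete, the sketch below searches a coarse grid of weights and biases for a single threshold neuron. The helper `best_linear_score` and the grid values are assumptions made for this illustration only; they are not part of the tutorial's network.

```python
from itertools import product

points = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_labels = [0, 1, 1, 0]
and_labels = [0, 0, 0, 1]


def best_linear_score(labels):
    """Return the most points a single threshold neuron classifies correctly,
    searching a coarse grid of weights and biases."""
    grid = [i * 0.5 for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
    best = 0
    for w1, w2, b in product(grid, repeat=3):
        predictions = [1 if w1 * x1 + w2 * x2 + b > 0 else 0 for x1, x2 in points]
        correct = sum(p == y for p, y in zip(predictions, labels))
        best = max(best, correct)
    return best


best_xor = best_linear_score(xor_labels)
best_and = best_linear_score(and_labels)
print(best_xor, best_and)  # prints: 3 4
```

The AND labels can be separated by one line, so all four points are classified correctly, whereas for XOR the best a single neuron can do is three out of four.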

Firstly, we load the tensorflow library, the numpy data management module and the libraries for plotting the results. 

In [None]:
import tensorflow as tf
import numpy as np
from matplotlib import pyplot as plt
import pandas as pd
import seaborn as sns

### **Layers and neurons**

One must specify the number of layers and neurons that our network must include. Then, the neuron's starting parameters must be defined, which include its weight and bias.

#### **How to choose the initial parameters?**
The initial weights of the neurons must not all be zero, as this would impede the network's training. They must also differ from each other; otherwise the neurons would be optimized in the same way. Finally, we must consider the scale of the weights: larger initial weights spread the neurons' responses more widely from the beginning, but weights that are too large may lead to instability.

For these reasons, we use tf.random.normal() to create random initial weights drawn from a normal distribution with a mean of 0 and a standard deviation of 1.

The bias is used to shift the activation function by adding a constant. 
It can be started as 0. 
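
One reason to keep the initial weights moderate is that the sigmoid saturates for large inputs, where its derivative is almost zero and learning becomes very slow. The following sketch is only an illustration in plain Python; the function names are assumptions and the input values are arbitrary.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z):
    """Derivative of the sigmoid: s(z) * (1 - s(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


# Net input of a neuron with moderate weights versus very large weights.
slope_moderate = sigmoid_derivative(0.0)   # largest possible slope, 0.25
slope_saturated = sigmoid_derivative(10.0)  # almost flat region of the sigmoid
```

With a net input of 10 the slope is several thousand times smaller than at 0, so the gradient that reaches the weights is correspondingly small.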

#### **Output layer**
The output layer is composed of one neuron.
We create the weight tensor for the output layer with tf.Variable, which marks it as a trainable value that the optimizer can update during training.
For this exercise, the weight tensor has two values, one for the output of each hidden neuron.


#### **Hidden layer**
Two neurons are defined for the hidden layer. As a result, weights_hidden is a 2x2 array with one weight for each feature in each neuron, and bias_hidden holds one bias for each neuron.



In [None]:
#Define weights and bias for the output layer
weights = tf.Variable(tf.random.normal((2,1)), name="weights")
bias = tf.Variable(0.0, name="bias")

#Define weights and bias for the hidden layer (distinct names make the TensorBoard graph easier to read)
weights_hidden = tf.Variable(tf.random.normal((2,2)), name="weights_hidden")
bias_hidden = tf.Variable(tf.zeros((2,)), name="bias_hidden")

### **Computing the network output**

It is necessary to define the functions that compute the output of each layer; the weights are updated through these computations as the network is trained.

The hidden layer is given the matrix X, corresponding to the features of our dataset. To begin, we must convert this matrix, which is a NumPy array, into a tensor (the function first checks whether it received a NumPy array, because it is also used in the TensorBoard part of this tutorial, where it already receives a tensor). Then, we use tf.matmul() to compute the product of this tensor and the weights, add the bias, and apply the sigmoid activation.

The hidden layer's outputs are then passed to the prediction() function, which defines the output layer and computes the network's final output.

The output neuron also uses the sigmoid activation function offered in TensorFlow, which turns it into a logistic regression classifier on the hidden layer's outputs, because the idea is to predict which of two classes each example belongs to.

In [None]:
def predict_hidden(X):
    """Compute the hidden layer activations and return the network output."""
    if isinstance(X, np.ndarray):  # TensorBoard tracing already passes tensors
        X = tf.constant(X.astype(np.float32))
    net_h = tf.add(tf.matmul(X, weights_hidden), bias_hidden, name="net_hidden")
    o_h = tf.sigmoid(net_h)
    return prediction(o_h)


def prediction(t_X):
    """Compute the output layer activation (one value in [0, 1] per example)."""
    net = tf.add(tf.matmul(t_X, weights), bias, name="net")
    # Sigmoid activation from TensorFlow, flattened to a 1D tensor
    return tf.reshape(tf.nn.sigmoid(net, name="output"), [-1])
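
As a sanity check that a network of this shape (two hidden sigmoid neurons, one sigmoid output) can represent XOR at all, the sketch below runs the same forward pass in NumPy with hand-picked weights. These values are assumptions chosen for illustration, not the weights that training will find.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Hand-picked (not trained) parameters, chosen only for illustration.
W_hidden = np.array([[20.0, -20.0],
                     [20.0, -20.0]])   # column 0 behaves like OR, column 1 like NAND
b_hidden = np.array([-10.0, 30.0])
W_out = np.array([[20.0], [20.0]])     # output neuron behaves like AND of the hidden outputs
b_out = -30.0

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
hidden = sigmoid(X @ W_hidden + b_hidden)              # shape (4, 2)
output = sigmoid(hidden @ W_out + b_out).reshape(-1)   # shape (4,)
labels = (output > 0.5).astype(int)
print(labels)  # prints: [0 1 1 0]
```

The hidden layer turns the four points into a representation in which the output neuron only needs a single linear threshold.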

### **The loss function and its gradient**
Now, to train the network we must define what function we want to minimize. 
There are many loss functions we can choose depending on the problem we want to solve.
In this classification problem we chose the logistic loss function, used in problems of logistic regression. 

The gradient function, grad(), is responsible for calculating the gradients of the logistic loss function. The function tf.GradientTape() is a context manager used to trace all computations, and its method gradient() is used to compute the gradient.

In [None]:
def logistic_loss(predicted, y):
    """Mean logistic (binary cross-entropy) loss."""
    if isinstance(y, np.ndarray):  # TensorBoard tracing already passes tensors
        y = tf.constant(y.astype(np.float32))
    cost = -tf.reduce_mean(y * tf.math.log(predicted) + (1 - y) * tf.math.log(1 - predicted))
    return cost


def grad(X, y):
    """Return the gradients of the loss and the variables they refer to."""
    variables = [weights, bias, weights_hidden, bias_hidden]
    with tf.GradientTape() as tape:  # GradientTape records the operations needed to compute the derivatives
        predicted = predict_hidden(X)
        loss_val = logistic_loss(predicted, y)
    return tape.gradient(loss_val, variables), variables
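
To give an intuition for what the gradient computation returns, the sketch below takes a single sigmoid neuron with one weight and compares the analytic gradient of the logistic loss, (p - y) * x, with a finite-difference estimate. The numbers and the helper `logistic_loss_single` are assumptions made for this example; the tutorial itself relies on tf.GradientTape instead.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_loss_single(w, b, x, y):
    """Logistic loss of one sigmoid neuron on a single example."""
    p = sigmoid(w * x + b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


w, b, x, y = 0.3, -0.1, 2.0, 1.0

# Analytic gradient: dL/dw = (p - y) * x for a sigmoid output with logistic loss.
p = sigmoid(w * x + b)
analytic = (p - y) * x

# Numerical gradient by central finite differences.
eps = 1e-6
numeric = (logistic_loss_single(w + eps, b, x, y)
           - logistic_loss_single(w - eps, b, x, y)) / (2 * eps)
```

Both values agree to several decimal places, and the gradient is negative here, meaning that increasing the weight would reduce the loss for this example.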

### **Training the network**

The run() function is responsible for creating the loop which enables the training of the network.
Each epoch is one pass of the whole dataset through the loop. In each epoch, the data is shuffled so that the network is fed the examples in a different order.

Within each epoch, the gradients are computed for each batch and applied by the optimizer. At the end of the epoch, the classes of all examples are predicted and this prediction is fed to the logistic loss.
We then return two lists with the evolution of the logistic loss during training.

In [None]:
def run():
    epoch_list = []
    loss_list = []

    for epoch in range(epochs):
        shuffled = np.arange(len(Ys))
        np.random.shuffle(shuffled)
        for batch_num in range(batches_per_epoch):
            start = batch_num*batch_size
            batch_xs = Xs[shuffled[start:start+batch_size],:]
            batch_ys = Ys[shuffled[start:start+batch_size]]
            gradients,variables = grad(batch_xs, batch_ys)
            optimizer.apply_gradients(zip(gradients, variables))
        y_pred = predict_hidden(Xs)
        loss = logistic_loss(y_pred,Ys)
        
        epoch_list += [epoch]
        loss_list += [float(loss)]
    return epoch_list, loss_list

### **Solving the XOR problem**

We start by defining the dataset. In this case, we define Xs and Ys, where Xs corresponds to the coordinates and Ys to the labels.

We define the parameters of our training loop.
This includes the optimizer; in this case we chose SGD (stochastic gradient descent), for which we must define the learning rate and the momentum. The learning rate must be large enough that the network does not take too long to train, but small enough not to cause convergence problems.
The momentum is used to improve the performance of stochastic gradient descent. It accumulates the previous gradient directions, which increases the effective step size when we are descending in the same direction repeatedly.

Additionally, the batch size is set to one, which means that only one sample is sent to the network at a time.
The number of epochs has been set to 1000, which means that the dataset will be fed to the network 1000 times during training.

In [None]:
#Dataset 
Xs = np.array([(0,0),(0,1),(1,0),(1,1)])
Ys = np.array([0,1,1,0])

#Training loop
optimizer = tf.optimizers.SGD(learning_rate=0.1, momentum = 0.9) 
batch_size = 1 
batches_per_epoch = Xs.shape[0]//batch_size
epochs=1000
epoch_list_tf, loss_list_tf = run()
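
The effect of momentum can be seen on a toy problem. The sketch below minimises f(w) = w**2 with the update rule documented for Keras SGD (velocity = momentum * velocity - learning_rate * gradient, then w = w + velocity). The function `minimise`, the starting point and the small learning rate are assumptions chosen to make the difference visible.

```python
def minimise(learning_rate, momentum, steps=100):
    """Minimise f(w) = w**2 from w = 1.0 with SGD plus (optional) momentum."""
    w, velocity = 1.0, 0.0
    for _ in range(steps):
        gradient = 2.0 * w  # df/dw
        velocity = momentum * velocity - learning_rate * gradient
        w += velocity
    return w


w_plain = minimise(learning_rate=0.01, momentum=0.0)
w_momentum = minimise(learning_rate=0.01, momentum=0.9)
```

After the same number of steps, the run with momentum ends much closer to the minimum at 0, because consecutive gradients point in the same direction and their contributions accumulate.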

### **Results**

We can assess the performance of our model by plotting the value of the loss function in each epoch. If training succeeds, the plot shows the loss decreasing as stochastic gradient descent progresses.

In [None]:
#Plot the loss function for each epoch
data = {"epoch_list_tf": epoch_list_tf,
        "loss_list_tf": loss_list_tf}

#We create a dataframe so we can appreciate the evolution of the loss function through the epochs
df_tf = pd.DataFrame(data = data)

fig, ax = plt.subplots(figsize=(15, 10))
sns.scatterplot(data=df_tf, x='epoch_list_tf', y='loss_list_tf')
plt.title('Value of the loss function for each epoch')
plt.xlabel('Epoch')
plt.ylabel('Loss function value')
plt.show()

## **Tensorboard**

TensorBoard is a visualization tool that allows us to easily monitor our models' training.

To begin, we must create a directory in which to record the information about the computation. This directory must have a different name each time we run and save the computation, so we include the current date and time in its name.

In [None]:
from datetime import datetime

now = datetime.utcnow().strftime("%Y%m%d%H%M%S")
root_logdir = "logs"
log_dir = "{}/model-{}/".format(root_logdir, now)

We start by creating a function which is responsible for creating the graph, and use the decorator @tf.function, which compiles the Python function into a TensorFlow graph. Inside it, we call the grad() function, which triggers all the computation for the exercise.

We also define a write_graph() function, which uses tf.summary.trace_on() to start recording a trace and tf.summary.trace_export() to write the result to the log directory using the writer.

In [None]:
@tf.function
def create_graph(X,Y):
  _=grad(X,Y)


def write_graph(X,Y,writer):
  tf.summary.trace_on(graph=True) #the summary is going to write to the log the graph of the computation
  create_graph(tf.constant(X.astype(np.float32)),
               tf.constant(Y.astype(np.float32))) #Create the graph in tensorflow
  with writer.as_default():
    tf.summary.trace_export(name="trace",step=0)


writer = tf.summary.create_file_writer(log_dir)
write_graph(Xs,Ys,writer)

## **Implementation of a convolutional neural network using Keras Sequential API**

A neural network will be created using Keras Sequential API in order to classify images from the fashion_mnist dataset. This dataset includes images of clothes of 10 different "classes", for example, trousers and skirts.

To begin, we load the tensorflow and the keras library.

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization,Conv2D,MaxPooling2D
from tensorflow.keras.layers import Activation, Flatten, Dropout, Dense
from datetime import datetime

### **Load and convert the dataset**

First, the data is loaded already divided into a training set and a test set, so that we can obtain training and test error measurements.

#### **Convert features**
The data must be reshaped into the format expected by the network using the reshape() function. Each image in the dataset is a 28x28 pixel greyscale image, which means each image has only one channel and becomes a tensor of shape 28x28x1. The pixel values are also converted to floating point and scaled to the range [0,1].

#### **One-hot encoding of the labels**
For the class of each image, we one-hot encode the labels, converting each of them into a vector of 10 values. This matches the output of the 10 neurons in our output layer.

In [None]:
#LOAD DATA
((trainX, trainY), (testX, testY)) = keras.datasets.fashion_mnist.load_data()

#Reshape into 28x28x1
trainX = trainX.reshape((trainX.shape[0], 28, 28, 1)) #28x28 pixels, 1 color (black and white)
testX = testX.reshape((testX.shape[0], 28, 28, 1))

#Convert to float32 and normalize the data to the range [0, 1]
trainX = trainX.astype("float32") / 255.0
testX = testX.astype("float32") / 255.0

#one-hot encode the training and testing labels
trainY = keras.utils.to_categorical(trainY, 10) #convert the integer labels to one-hot vectors
testY = keras.utils.to_categorical(testY, 10)
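
To show what the one-hot encoding produces, here is a minimal NumPy sketch. The helper `one_hot` is an assumption written for this example; it is not how keras.utils.to_categorical is implemented, only an illustration of the resulting format.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    return np.eye(num_classes, dtype=np.float32)[np.asarray(labels)]


encoded = one_hot([0, 3, 9], num_classes=10)
print(encoded[1])  # prints: [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```

Each row has a 1 in the position of its class, so taking the argmax of a row recovers the original integer label.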

### **Neural network definition**

#### **Create the neural network**
To define the neural network we define a function called create_model().
First, we create model as an object of the Sequential class by calling Sequential().

#### **(1.) - convolutional layers with 32 filters**
In (1.), 2 convolutional layers with 32 filters each are defined. In the first one, input_shape defines the shape according to our dataset (in this case, 28x28x1). In the remaining ones, there is no need for this argument, as the input shape depends on the output of the previous layer. The ReLU activation function is used to mitigate the vanishing gradients problem, and a BatchNormalization() layer, which normalizes its input, is inserted after each activation to prevent numerical problems.

The MaxPooling2D() layer is used to downsample the input along its spatial dimensions (height and width) by taking the maximum value over an input window, defined by pool_size. In this case, we define this input window with a size of 2x2. As a result, each time we use this layer, the height and the width are halved, so the number of values per channel is reduced by a factor of four.
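
The pooling operation itself is simple enough to sketch in NumPy for a single channel. The helper `max_pool_2x2` and the 4x4 example image are assumptions for illustration; the Keras layer additionally handles batches and channels.

```python
import numpy as np


def max_pool_2x2(image):
    """2x2 max pooling with stride 2 on a 2D array with even height and width."""
    h, w = image.shape
    blocks = image.reshape(h // 2, 2, w // 2, 2)  # split into 2x2 windows
    return blocks.max(axis=(1, 3))                # maximum of each window


image = np.arange(16).reshape(4, 4)
pooled = max_pool_2x2(image)
print(pooled)
# prints:
# [[ 5  7]
#  [13 15]]
```

The 4x4 input with 16 values becomes a 2x2 output with 4 values, which is the factor of four mentioned above.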

#### **(2.) - convolutional layers with 64 filters**
In (2.), 2 convolutional layers with 64 filters each are defined.

#### **Dense layers**
After this, the last two dense layers are defined. First, we use the Flatten() layer to convert an n-dimensional tensor into a 1D tensor. A dense layer with 512 neurons is created, and the ReLU activation function and BatchNormalization() are used yet again.

The Dropout() layer is used for regularization. Dropout is a technique where randomly selected neurons are temporarily ignored during training, so they do not contribute to the output or to the weight updates in that step.
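
A minimal NumPy sketch of the idea, assuming the common "inverted dropout" formulation in which the surviving activations are scaled by 1 / (1 - rate) during training. The function name and the array of ones are assumptions for this example.

```python
import numpy as np


def dropout(activations, rate, rng):
    """Zero a random fraction `rate` of the activations and rescale the rest."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)


rng = np.random.default_rng(0)
activations = np.ones(1000)
dropped = dropout(activations, rate=0.25, rng=rng)
fraction_zeroed = float((dropped == 0).mean())
```

Roughly a quarter of the values are set to zero and the others are scaled up, so the expected sum of the activations stays the same. At prediction time no neurons are dropped.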

The output layer is composed of 10 neurons, one for each class of the dataset. The softmax activation is used because it is the usual choice for multi-class classification problems, where there are more than two mutually exclusive class labels.

In [None]:
#DEFINE THE NETWORK
def create_model():
    model = Sequential() #object of the Sequential class
    
    # (1.)
    #32 filters with a 3x3 receptive field; padding="same" pads the input with zeros to preserve its dimensions
    #input_shape is only required in the first layer (28x28 images, 1 channel because they are greyscale)
    #so the output will be 28x28x32
    model.add(Conv2D(32, (3, 3), padding="same", input_shape=(28,28,1)))
    model.add(Activation("relu")) 
    model.add(BatchNormalization())
    
    model.add(Conv2D(32, (3, 3), padding="same")) #same settings; the input shape is inferred from the previous layer
    model.add(Activation("relu")) 
    model.add(BatchNormalization())
    
    model.add(MaxPooling2D(pool_size=(2, 2))) #Pooling: 28x28x32 -> 14x14x32
    
    # (2.)
    model.add(Conv2D(64, (3, 3), padding="same")) #add convolutions, do not specify the input shape because it depends on the previous layer
    model.add(Activation("relu")) 
    model.add(BatchNormalization())
    
    model.add(Conv2D(64, (3, 3), padding="same")) 
    model.add(Activation("relu")) 
    model.add(BatchNormalization())
    
    model.add(MaxPooling2D(pool_size=(2, 2))) #Pooling: 14x14x64 -> 7x7x64
    
    #dense
    model.add(Flatten()) #convert an n-dimensional tensor to a 1D tensor
    model.add(Dense(512)) #512 neurons
    model.add(Activation("relu"))
    model.add(BatchNormalization()) #batch normalization goes before dropout, not after: the rescaling done by dropout would disturb it
    model.add(Dropout(0.25)) #Reduce overfitting; dropout is used on the dense part, not the convolutional part
    
    #output layer
    model.add(Dense(10)) #10 image classes, 10 neurons
    model.add(Activation("softmax")) #softmax is used for multi-class classification problems with more than two class labels
    
    return model

### **Define model parameters**

The number of epochs and batch size are set once more.
In addition, the initial learning rate is set to 0.005. Remember that increasing the learning rate too much may cause convergence issues, such as rapid convergence to a suboptimal solution. A very low learning rate, on the other hand, will make it impossible to train the network in a reasonable time.

The optimizer can affect both training time and performance.
We use a stochastic gradient descent optimizer with an initial learning rate of 0.005, but we add momentum, which accumulates previous gradient directions and speeds up training when we are descending in the same direction continuously. The learning rate is also decayed gradually over the training steps.

Then, we create and compile the model, using categorical crossentropy as the loss function because this is a multi-class classification problem.

Accuracy is the fraction of examples whose predicted class matches the true class. This metric will be used to assess the performance of this model, but many others may be used as well.

In [None]:
#Define parameters
NUM_EPOCHS = 10
INIT_LR  = 0.005
BS = 16 #batch size

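#Older Keras versions accepted decay=INIT_LR / NUM_EPOCHS directly in SGD; recent versions reject that argument,
#so the same inverse-time decay is expressed here as a learning rate schedule
lr_schedule = keras.optimizers.schedules.InverseTimeDecay(INIT_LR, decay_steps=1, decay_rate=INIT_LR / NUM_EPOCHS)
optimizer = SGD(learning_rate=lr_schedule, momentum=0.9)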
model = create_model()
model.compile(loss="categorical_crossentropy", optimizer=optimizer, metrics=["accuracy"])
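
To make the loss and the metric passed to compile() more tangible, the sketch below computes softmax probabilities, categorical cross-entropy and accuracy by hand in NumPy. The scores and labels are made-up values for three examples and three classes, and the function names are assumptions for this illustration.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax; subtracting the row maximum avoids overflow."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def categorical_crossentropy(y_true, y_prob):
    """Mean of -sum(y_true * log(y_prob)) over the examples."""
    return float(-np.mean(np.sum(y_true * np.log(y_prob), axis=1)))


def accuracy(y_true, y_prob):
    """Fraction of examples whose most probable class is the true class."""
    return float(np.mean(y_true.argmax(axis=1) == y_prob.argmax(axis=1)))


logits = np.array([[2.0, 0.5, 0.1],    # made-up scores for 3 classes
                   [0.2, 0.1, 3.0],
                   [1.5, 1.0, 0.3]])
y_true = np.array([[1, 0, 0],
                   [0, 0, 1],
                   [0, 1, 0]], dtype=np.float32)

probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)
acc = accuracy(y_true, probs)
```

Here the third example is misclassified, so the accuracy is 2/3, while the loss also reflects how confident each prediction was.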

### **Call tensorboard and train the model**

Here, we set up the TensorBoard callback so that we can visualize the training graph and the performance measurements.

Then, we train our model with the fit() function, using the previously defined parameters.

In [None]:
#Call tensorboard to save the steps
tb_callback = keras.callbacks.TensorBoard(log_dir='./logs', write_graph=True) #to keep the logs of several runs, give each run its own log directory, e.g. by adding the date and time to the name


#Train the model with the data
history = model.fit(trainX, trainY, validation_data=(testX, testY),
                    batch_size=BS, epochs=NUM_EPOCHS,
                    callbacks = [tb_callback])

### **Network summary**

The function summary() may be used to visualize the neural network information, including the output shape of each layer and the number of parameters. 

In [None]:
model.summary()
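
The parameter counts reported in the summary for the convolutional and dense layers can be reproduced with simple arithmetic. The sketch below is a hand calculation under the assumption of 3x3 kernels, "same" padding and two 2x2 poolings, as in create_model(); the helper names are assumptions for this example.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel Conv2D layer."""
    return kernel_size * kernel_size * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus one bias per unit for a Dense layer."""
    return inputs * units + units


first_conv = conv2d_params(3, 1, 32)     # 3*3*1*32 + 32
second_conv = conv2d_params(3, 32, 32)   # 3*3*32*32 + 32
flattened = 7 * 7 * 64                   # 28 -> 14 -> 7 after two 2x2 poolings, 64 filters
first_dense = dense_params(flattened, 512)
output_dense = dense_params(512, 10)
print(first_conv, second_conv, flattened, first_dense, output_dense)
# prints: 320 9248 3136 1606144 5130
```

Most of the parameters sit in the first dense layer, which is one reason why dropout is applied there.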

### **Results**

Finally, we can plot the evolution of the loss function value in each epoch. 

In [None]:
#Plot the loss function for each epoch
data_ks = {"epoch_list_ks": np.arange(1, NUM_EPOCHS + 1, 1),
           "loss_function_ks": history.history["loss"],
           "validation_loss_function_ks": history.history["val_loss"],
           "accuracy_ks": history.history["accuracy"],
           "validation_accuracy_ks": history.history["val_accuracy"]}

#We create a dataframe so we can appreciate the evolution of the loss function through the epochs
df_ks = pd.DataFrame(data = data_ks)
print(df_ks)


#Create a plot
fig, ax = plt.subplots(figsize=(10, 7))
sns.scatterplot(data=df_ks, x='epoch_list_ks', y='loss_function_ks')
sns.lineplot(data=df_ks, x='epoch_list_ks', y='loss_function_ks')
sns.scatterplot(data=df_ks, x='epoch_list_ks', y='validation_loss_function_ks')
sns.lineplot(data=df_ks, x='epoch_list_ks', y='validation_loss_function_ks')
plt.title('Value of the loss function for each epoch')
plt.xlabel('Epoch')
plt.ylabel('Loss function value')
plt.show()

In [None]:
#Plot the accuracy for each epoch
fig, ax = plt.subplots(figsize=(10, 7))
sns.scatterplot(data=df_ks, x='epoch_list_ks', y='accuracy_ks')
sns.lineplot(data=df_ks, x='epoch_list_ks', y='accuracy_ks')
sns.scatterplot(data=df_ks, x='epoch_list_ks', y='validation_accuracy_ks')
sns.lineplot(data=df_ks, x='epoch_list_ks', y='validation_accuracy_ks')
plt.title('Value of accuracy for each epoch')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.show()

## **Autoencoder and Keras Functional API**

Instead of building a neural network capable of classifying the images in the MNIST dataset, we build an autoencoder in this exercise. An autoencoder is an unsupervised neural network that learns useful data representations and produces output that is similar to the input. In this exercise, we train an autoencoder and then use the encoder's output to represent datapoints in a two-dimensional plot.
Again, we import useful libraries.

In [None]:
from tensorflow import keras
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.models import Sequential
from tensorflow.keras.models import Model
from tensorflow.keras.layers import UpSampling2D,Reshape,Input
from tensorflow.keras.layers import BatchNormalization,Conv2D,MaxPooling2D
from tensorflow.keras.layers import Activation, Flatten, Dropout, Dense
from matplotlib import pyplot as plt
import numpy as np

We import the data set and normalize it.

In [None]:
#Import MNIST dataset
((trainX, trainY), (testX, testY)) = keras.datasets.mnist.load_data()
trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
testX = testX.reshape((testX.shape[0], 28, 28, 1))

#Normalize data
trainX = trainX.astype("float32") / 255.0
testX = testX.astype("float32") / 255.0

### **Define the autoencoder**

To define the neural network we define a function. We called it autoencoder().

First, we define the input format as 28x28x1, corresponding to the 28x28 pixels and one channel, as the images are greyscale (comment 1.).

In the first convolutional layer (2.), we create a representation of the data with shape 28x28x32, with 32 being the number of filters. Again, we use batch normalization. Next, we use the MaxPooling2D() layer in order to reduce the dimensions of our data to 14x14x32.

We add another convolutional layer (3.) with the same number of filters, which transforms the data without reducing its size, given the large reduction already obtained in the previous step.

The next layers, 4., 5. and 6., are used to shrink the dimensions of our representation of the data. Then we flatten the data (7.), which means we convert the data that was previously in a 7x7x8 format (resulting from the 8 filters in layer 6.) into a vector of 392 values.

Layer 8. is composed of 2 neurons, which means we can represent each data point in a plane: the value of one neuron gives the x coordinate and the value of the other neuron gives the y coordinate.

The layers in 9. through 14. are responsible for reconstructing a representation of the data similar to the input. Since we fed the network normalized inputs, it is reasonable to use the sigmoid activation function in the last layer, whose values also lie in the range [0,1].

Notice that with the Functional API each layer is called on the output of the previous layer.

The function returns two models: autoencoder and encoder. The autoencoder maps the input to the output of the last layer. The encoder comprises only the encoding part and maps the input to its representation in the two middle neurons.


In [None]:
#Define autoencoder function

def autoencoder():
  # 1. Input layer: image format 28x28x1
  inputs = Input(shape=(28,28,1),name='inputs')

  # 2. Conv 1: from 28x28 to 14x14 (Maxpooling)
  layer = Conv2D(32, (3, 3), padding="same")(inputs)
  layer = Activation("relu")(layer)
  layer = BatchNormalization(axis=-1)(layer)
  layer = MaxPooling2D(pool_size=(2, 2))(layer)

  # 3. Conv
  layer = Conv2D(32, (3, 3), padding="same")(layer)
  layer = Activation("relu")(layer)
  layer = BatchNormalization(axis=-1)(layer)
  
  # 4. Conv:from 14x14 to 7x7 (Maxpooling)
  layer = Conv2D(16, (3, 3), padding="same")(layer)
  layer = Activation("relu")(layer)
  layer = BatchNormalization(axis=-1)(layer)
  layer = MaxPooling2D(pool_size=(2, 2))(layer)

  # 5. Conv 
  layer = Conv2D(16, (3, 3), padding="same")(layer)
  layer = Activation("relu")(layer)
  layer = BatchNormalization(axis=-1)(layer)

  # 6. Conv: 8 filters: from 7x7x16 to 7x7x8
  layer = Conv2D(8, (3, 3), padding="same")(layer)
  layer = Activation("relu")(layer)
  layer = BatchNormalization(axis=-1)(layer)

  # 7. Flattens the output from 7x7x8 to 392
  layer = Flatten()(layer) 

  # 8. Middle layer: Layer with two neurons that will define the representation coordinates (no activation function for representation)
  features = Dense(2,name='features')(layer)
  layer = BatchNormalization()(features)

  # 9. Converts the output from 2 to 8*7*7 = 392 and reshapes to the shape of the convolutional layers
  layer = Dense(8*7*7,activation="relu")(layer) #8*7*7: 7 is 28/2/2 and 8 corresponds to the 8 filters; fed with the normalized features
  layer = Reshape((7,7,8))(layer) 
  
  layer = Conv2D(8, (3, 3), padding="same")(layer)
  layer = Activation("relu")(layer)
  layer = BatchNormalization(axis=-1)(layer)
  
  # 10. Conv: From 7 to 14 (UpSampling2D)
  layer = Conv2D(16, (3, 3), padding="same")(layer)
  layer = Activation("relu")(layer)
  layer = BatchNormalization(axis=-1)(layer)
  layer = UpSampling2D(size=(2,2))(layer) #opposite of MaxPooling

  # 11. Conv
  layer = Conv2D(16, (3, 3), padding="same")(layer)
  layer = Activation("relu")(layer)
  layer = BatchNormalization(axis=-1)(layer)

  # 12. Conv: From 14 to 28 (UpSampling2D)
  layer = Conv2D(32, (3, 3), padding="same")(layer)
  layer = Activation("relu")(layer)
  layer = BatchNormalization(axis=-1)(layer)
  layer = UpSampling2D(size=(2,2))(layer) #opposite of MaxPooling

  # 13. Conv
  layer = Conv2D(32, (3, 3), padding="same")(layer)
  layer = Activation("relu")(layer)
  layer = BatchNormalization(axis=-1)(layer)

  # 14. Last Conv: 1 filter, back to original format 28*28*1
  layer = Conv2D(1, (3, 3), padding="same")(layer)
  layer = Activation("sigmoid")(layer) #the images are all between 0 and 1, so the output matches them and is easier to visualize

  # 15. Create two models: complete autoencoder and encoder
  autoencoder = Model(inputs= inputs, outputs= layer)
  encoder = Model(inputs= inputs, outputs= features) #only the encoding part

  return autoencoder, encoder
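
The shapes produced by the encoder half can be followed with a few lines of plain Python. This is only a bookkeeping sketch under the assumptions used in the code above ("same" padding keeps height and width, each 2x2 max pooling halves them); the function `trace_encoder_shapes` is hypothetical and does not build any Keras layers.

```python
def trace_encoder_shapes(height=28, width=28):
    """Follow (height, width, channels) through the encoder half."""
    shapes = [("input", (height, width, 1))]
    # (name, filters, pool_after) mirrors steps 2. to 6. of the autoencoder code.
    conv_steps = [("conv 2.", 32, True), ("conv 3.", 32, False),
                  ("conv 4.", 16, True), ("conv 5.", 16, False),
                  ("conv 6.", 8, False)]
    for name, filters, pool_after in conv_steps:
        if pool_after:  # 2x2 max pooling halves height and width
            height, width = height // 2, width // 2
        shapes.append((name, (height, width, filters)))
    flat = height * width * shapes[-1][1][2]
    shapes.append(("flatten 7.", (flat,)))
    shapes.append(("features 8.", (2,)))
    return shapes


shapes = trace_encoder_shapes()
for name, shape in shapes:
    print(name, shape)
```

The flattened size works out to 7 * 7 * 8 = 392 values, which the two-neuron layer then compresses to a point in the plane.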

Next, we call the autoencoder function and assign a name to each of the two models it returns. The network parameters are set. We select binary cross-entropy as the loss function. The mean squared error (MSE) and the binary cross-entropy (BCE) are two commonly used loss functions for training autoencoders. When training autoencoders on image data, BCE is often the loss function chosen because pixel values can be normalized to take values in [0,1] and the decoder can be designed to generate outputs with values in [0,1], as happens in this case.
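
One detail worth seeing in numbers is how the two losses behave for a grey pixel, whose target is neither 0 nor 1. The sketch below compares them for a single pixel; the function names, the clipping constant and the pixel values are assumptions made for this illustration.

```python
import math


def binary_crossentropy(target, prediction, eps=1e-7):
    """Per-pixel BCE; predictions are clipped away from 0 and 1 to keep log finite."""
    p = min(max(prediction, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def squared_error(target, prediction):
    return (target - prediction) ** 2


grey = 0.5
bce_at_target = binary_crossentropy(grey, 0.5)   # perfect reconstruction
bce_off_target = binary_crossentropy(grey, 0.8)  # wrong reconstruction
mse_at_target = squared_error(grey, 0.5)
```

Both losses are smallest when the prediction equals the target, but for a grey pixel the BCE does not reach zero (here it equals log 2), so the absolute loss values of an autoencoder trained with BCE should not be read as "distance from perfect".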

Finally, the Adam optimizer was chosen. Similarly to momentum, it improves performance by using the previous gradients, in this case through an exponentially decaying average of the previous gradients and of their squares, which often leads to faster convergence in practice.


In [None]:
#Call the autoencoder function 
ae,enc = autoencoder()

#Define optimizer and compile autoencoder
NUM_EPOCHS = 25
BS = 128

#ae.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])
ae.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

#Model summary
ae.summary()

#Fit model to dataset
ENC = ae.fit(trainX, trainX, validation_data=(testX, testX), batch_size=BS, epochs=NUM_EPOCHS)
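
For readers curious about what the "adam" optimizer does at each step, here is a sketch of the textbook Adam update on the toy function f(w) = w**2. The default values mirror those documented for Keras (learning rate 0.001, beta_1 0.9, beta_2 0.999, epsilon 1e-7), but the function `adam_minimise` is an assumption for illustration and the Keras implementation may differ in small details.

```python
import math


def adam_minimise(steps, learning_rate=0.001, beta_1=0.9, beta_2=0.999, eps=1e-7):
    """Minimise f(w) = w**2 from w = 1.0 with the Adam update rule."""
    w, m, v = 1.0, 0.0, 0.0
    for t in range(1, steps + 1):
        gradient = 2.0 * w
        m = beta_1 * m + (1.0 - beta_1) * gradient        # decaying average of gradients
        v = beta_2 * v + (1.0 - beta_2) * gradient ** 2   # decaying average of squared gradients
        m_hat = m / (1.0 - beta_1 ** t)                   # bias correction for the first steps
        v_hat = v / (1.0 - beta_2 ** t)
        w -= learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return w


w_after_one = adam_minimise(steps=1)
w_after_many = adam_minimise(steps=500)
```

Because the step is divided by the square root of the averaged squared gradient, the first update has a size close to the learning rate regardless of how large the gradient is.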

We may use the values obtained to make a plot that helps us visualize the evolution of the loss function and accuracy in each epoch.

In [None]:

#Plot the loss function for each epoch
data_kf = {"epoch_list_kf": np.arange(1, NUM_EPOCHS + 1, 1),
           "loss_function_kf": ENC.history["loss"],
           "validation_loss_function_kf":  ENC.history["val_loss"],
           "accuracy_kf": ENC.history["accuracy"],
           "validation_accuracy_kf": ENC.history["val_accuracy"]}

#We create a dataframe so we can appreciate the evolution of the loss function through the epochs
df_kf = pd.DataFrame(data = data_kf)


#Create a plot
fig, ax = plt.subplots(figsize=(10, 7))
sns.scatterplot(data=df_kf, x='epoch_list_kf', y='loss_function_kf')
sns.lineplot(data=df_kf, x='epoch_list_kf', y='loss_function_kf')
sns.scatterplot(data=df_kf, x='epoch_list_kf', y='validation_loss_function_kf')
sns.lineplot(data=df_kf, x='epoch_list_kf', y='validation_loss_function_kf')
plt.title('Value of the loss function for each epoch')
plt.xlabel('Epoch')
plt.ylabel('Loss function value')
plt.show()

In [None]:
#Plot the accuracy for each epoch
fig, ax = plt.subplots(figsize=(10, 7))
sns.scatterplot(data=df_kf, x='epoch_list_kf', y='accuracy_kf')
sns.lineplot(data=df_kf, x='epoch_list_kf', y='accuracy_kf')
sns.scatterplot(data=df_kf, x='epoch_list_kf', y='validation_accuracy_kf')
sns.lineplot(data=df_kf, x='epoch_list_kf', y='validation_accuracy_kf')
plt.title('Value of accuracy for each epoch')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.show()

Finally, we may take the output of the two neurons of the last layer of the encoder to plot a representation of the data on two axes.

In [None]:
#Plot a representation of the data expressed by the two neurons in the middle of the network 
def plot_representation(enc):
  #enc is the trained encoder; calling autoencoder() again here would create a new, untrained model
  encoding = enc.predict(testX)
  plt.figure(figsize=(8,8))
  for cl in np.unique(testY):
    mask = testY == cl
    plt.plot(encoding[mask,0],encoding[mask,1],'.',label=str(cl)) #values of the 1st neuron on x, values of the 2nd on y
  plt.legend()
  plt.show()

plot_representation(enc)