In [155]:
from pandas import read_csv
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras.utils import to_categorical
import time

In [163]:
def load_data(split):

    # Load train data
    data = read_csv('ds1-06-%d-nn-tr.csv' % split)
    X_train = data.iloc[:, :-1].to_numpy()
    y_train = data.iloc[:, -1].to_numpy()

    # Load test data
    data = read_csv('ds1-06-%d-nn-te.csv' % split)
    X_test = data.iloc[:, :-1].to_numpy()
    y_test = data.iloc[:, -1].to_numpy()
    
    # One-hot encoding
    y_train_OHE = to_categorical(y_train, num_classes=5)
    y_test_OHE = to_categorical(y_test, num_classes=5)

    return X_train, y_train, X_test, y_test, y_train_OHE, y_test_OHE

The output layer has as many neurons as categories, so we use softmax and train with categorical cross-entropy (pg44).
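As a minimal sketch of what a softmax output layer computes (plain Python, independent of Keras; the function name `softmax` and the sample scores are illustrative assumptions): each output neuron produces a raw score, and softmax turns the scores into probabilities that sum to 1.

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores from five output neurons (one per class).
logits = [2.0, 1.0, 0.1, -1.0, 0.5]
probs = softmax(logits)
predicted_class = probs.index(max(probs))  # index of the most probable class
print(probs, predicted_class)
```

The predicted class is simply the position of the largest probability, which is what the accuracy metric compares against the true label.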

In [164]:
def create_model(layer_neurons, activation_function, learning_rate=0.1, decay_steps=1000, decay_rate=0.96, loss_function='categorical_crossentropy', output_neurons=5, dynamic = False, optimizer = tf.keras.optimizers.SGD):
    model = tf.keras.models.Sequential()
    
    # Add hidden layers with specified number of neurons and activation function
    for neurons in layer_neurons:
        model.add(tf.keras.layers.Dense(neurons, activation=activation_function))
    
    if dynamic:
        lr = tf.keras.optimizers.schedules.ExponentialDecay(
            initial_learning_rate=learning_rate,
            decay_steps=decay_steps,
            decay_rate=decay_rate)
    else:  # Fixed learning rate, used in the first tasks
        lr = learning_rate

    # Output layer with softmax activation for multi-class classification
    model.add(tf.keras.layers.Dense(output_neurons, activation='softmax'))

    # Compile the model with the specified optimizer and loss function
    model.compile(optimizer =  optimizer(learning_rate=lr), 
                  loss = loss_function, 
                  metrics = ['accuracy'])

    return model

### T1. Start with a basic configuration: a single hidden layer network and the most basic activation function. Try with three different amounts of neurons, and keep the best configuration for the next task.
### For training, set a fixed learning rate, the most basic optimizer, a reasonable batch size, the most direct loss function and choose enough epochs to let the training converge. Tune the learning rate until you achieve convergence and a reasonable performance. To measure performance, use the accuracy for the test set.


First of all, we check whether our classes are encoded as integers or one-hot encoded, in order to use the appropriate `loss_function`.

In [165]:
X_train, y_train, X_test, y_test, y_train_OHE, y_test_OHE = load_data(split=1)
print(len(X_train[0]))  # number of input features per sample

125
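The encoding check can be sketched as follows. This is a plain-Python illustration under assumptions: the helper names `label_encoding` and `matching_loss` are hypothetical, and the labels are toy lists rather than the arrays returned by `load_data`.

```python
def label_encoding(labels):
    """Return 'one-hot' if every label is a 0/1 vector with a single 1, else 'integer'."""
    first = labels[0]
    if isinstance(first, (list, tuple)):
        if all(set(row) <= {0, 1} and sum(row) == 1 for row in labels):
            return "one-hot"
        raise ValueError("vector labels are not one-hot encoded")
    return "integer"

def matching_loss(encoding):
    """Keras loss name that matches each label encoding."""
    return {"integer": "sparse_categorical_crossentropy",
            "one-hot": "categorical_crossentropy"}[encoding]

# Hypothetical labels for a 5-class problem.
y_int = [0, 3, 4, 1]
y_ohe = [[1, 0, 0, 0, 0], [0, 0, 0, 1, 0]]
print(matching_loss(label_encoding(y_int)))
print(matching_loss(label_encoding(y_ohe)))
```

With NumPy arrays the same decision can be made from the number of dimensions of the label array (one for integer labels, two for one-hot rows).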


Now we can create a one-layer network and try it with different numbers of neurons.

In [148]:
# Load data
X_train, y_train, X_test, y_test, y_train_OHE, y_test_OHE = load_data(1)

neurons_amount = [10, 50, 100]

for neurons in neurons_amount:
    accuracies = []
    model = create_model([neurons], activation_function='relu')
    # Note: the model is built once per neuron count, so each of the 5 repetitions
    # continues training from the previous weights instead of starting fresh
    for i in range(5):
        model.fit(X_train, y_train_OHE, epochs=50, batch_size=32, verbose=0)  # Remove verbose=0 to see each epoch
        performance = model.evaluate(X_test, y_test_OHE, verbose=0)
        accuracies.append(performance[1])
    print(f"Neurons: {neurons}, Test Accuracy (mean): {np.mean(accuracies)}")

Neurons: 10, Test Accuracy (mean): 0.9101796388626099
Neurons: 50, Test Accuracy (mean): 0.9233533143997192
Neurons: 100, Test Accuracy (mean): 0.9329341292381287


In this task we used the `relu` activation function and `categorical_crossentropy` as our loss function, because we are facing a multi-class classification problem with one-hot encoded labels. As can be seen, the best performance is obtained with the largest number of neurons, which reaches an accuracy of 93%, higher than the other two configurations. We therefore take `100` neurons as the best configuration and use it in the following task.
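Because a single training run depends on the random weight initialisation, one way to compare configurations more robustly is to average several independent runs, each starting from a newly built model. The sketch below shows only the aggregation step; `repeated_accuracy` and `fake_run` are hypothetical names, and the scores are random placeholders, not results from this notebook.

```python
import random
import statistics

def repeated_accuracy(train_and_evaluate, repetitions=5):
    """Run a fresh training several times and summarise the test accuracy."""
    scores = [train_and_evaluate() for _ in range(repetitions)]
    return statistics.mean(scores), statistics.stdev(scores)

# Stand-in for "build a new model, fit it, return the test accuracy".
rng = random.Random(0)

def fake_run():
    return 0.90 + rng.uniform(-0.02, 0.02)

mean_acc, std_acc = repeated_accuracy(fake_run)
print(f"mean={mean_acc:.3f}, std={std_acc:.3f}")
```

Reporting the standard deviation alongside the mean helps judge whether a difference between two configurations is larger than the run-to-run variation.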

### T2. Next, check whether a change in the activation function of the hidden layer neurons improves the classification performance. If that is the case, continue with the alternative activation function.

To check whether a different activation function works better than the one used in T1, we compare several of the most commonly used activation functions and keep the best one.

In [107]:
activation_functions = ['relu', 'sigmoid', 'tanh', 'leaky_relu', 'elu', 'selu']
neurons = 100  # best configuration from T1

for activation in activation_functions:
    model = create_model([neurons], activation)
    model.fit(X_train, y_train_OHE, epochs=50, batch_size=32, verbose=0)
    performance = model.evaluate(X_test, y_test_OHE, verbose=0)
    print(f"Neurons: {neurons}, Activation: {activation}, Test Accuracy: {performance[1]}")

Neurons: 100, Activation: relu, Test Accuracy: 0.9221556782722473
Neurons: 100, Activation: sigmoid, Test Accuracy: 0.7784430980682373
Neurons: 100, Activation: tanh, Test Accuracy: 0.8982036113739014
Neurons: 100, Activation: leaky_relu, Test Accuracy: 0.8982036113739014
Neurons: 100, Activation: elu, Test Accuracy: 0.8622754216194153
Neurons: 100, Activation: selu, Test Accuracy: 0.8982036113739014


As shown, `relu` gives the best performance among the tested activation functions, reaching 92% accuracy. None of the alternatives improves on it, so we keep `relu`, the activation function already used in T1.
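For reference, the six activation functions compared above can be written out for a single scalar input. This is a sketch of the textbook definitions, not the Keras source; in particular the `negative_slope` default for `leaky_relu` is an assumption, since the default differs between Keras versions.

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def leaky_relu(x, negative_slope=0.2):  # slope value is an assumption
    return x if x > 0 else negative_slope * x

def elu(x, alpha=1.0):
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

# Standard SELU constants.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(x):
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))

for x in (-2.0, 0.0, 2.0):
    print(x, relu(x), sigmoid(x), tanh(x), leaky_relu(x), elu(x), selu(x))
```

The functions differ mainly in how they treat negative inputs: `relu` zeroes them, `leaky_relu`, `elu` and `selu` let a reduced signal through, and `sigmoid` and `tanh` saturate at both ends.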

### T3. Try with a dynamic learning rate and use it from now on if the performance does not get worse.

In [124]:
model = create_model(
    layer_neurons=[100],
    activation_function='relu',
    decay_steps=1000,
    decay_rate=0.96,
    dynamic=True
)

model.fit(X_train, y_train_OHE, epochs=50, batch_size=32, verbose=0)
performance = model.evaluate(X_test, y_test_OHE, verbose=0)
print(f"Accuracy using Dynamic Learning Rate: : {performance[1]}")

Using dynamic learning rate
Accuracy using Dynamic Learning Rate: : 0.9191616773605347
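The schedule used here, `ExponentialDecay`, follows the documented rule `initial_learning_rate * decay_rate ** (step / decay_steps)`, where `step` counts optimizer updates (batches), not epochs. A plain-Python sketch of that formula, with the function name `exponential_decay` chosen for illustration:

```python
def exponential_decay(step, initial_learning_rate=0.1, decay_steps=1000,
                      decay_rate=0.96, staircase=False):
    """Learning rate after `step` optimizer updates."""
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial_learning_rate * decay_rate ** exponent

for step in (0, 500, 1000, 5000):
    print(step, exponential_decay(step))
```

With the values used above (0.1, 1000, 0.96), the learning rate falls by 4% every 1000 updates, so the decay is gentle unless the run performs many thousands of updates.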


### T4. Change to an alternative optimizer and keep it if the performance gets better.

In [128]:
optimizers = [tf.keras.optimizers.SGD,
              tf.keras.optimizers.Adam, 
              tf.keras.optimizers.Adadelta, 
              tf.keras.optimizers.RMSprop]

for optimizer in optimizers:
    model = create_model(
        layer_neurons=[100],
        activation_function='relu',
        decay_steps=1000,
        decay_rate=0.96,
        dynamic = True,
        optimizer= optimizer
    )

    model.fit(X_train, y_train_OHE, epochs=50, batch_size=32, verbose=0)
    performance = model.evaluate(X_test, y_test_OHE, verbose=0)
    print(f"Accuracy using {optimizer}: {performance[1]}")

Using dynamic learning rate
Accuracy using <class 'keras.src.optimizers.sgd.SGD'>: 0.9281437397003174
Using dynamic learning rate
Accuracy using <class 'keras.src.optimizers.adam.Adam'>: 0.7934131622314453
Using dynamic learning rate
Accuracy using <class 'keras.src.optimizers.adadelta.Adadelta'>: 0.8233532905578613
Using dynamic learning rate
Accuracy using <class 'keras.src.optimizers.rmsprop.RMSprop'>: 0.8443113565444946
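All four optimizers above are created with the same initial learning rate of 0.1, the default of `create_model`. One possible factor behind the differences, not verified here, is that the same learning rate means very different step sizes for different update rules. The sketch below compares the first update of plain SGD with the textbook Adam update for a single parameter; it is not Keras's exact implementation, and the gradient value is a made-up example.

```python
import math

def sgd_step(grad, learning_rate):
    """Size of one plain SGD update."""
    return learning_rate * grad

def adam_first_step(grad, learning_rate, beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """Size of the first Adam update (t = 1), textbook form with bias correction."""
    m = (1 - beta_1) * grad
    v = (1 - beta_2) * grad ** 2
    m_hat = m / (1 - beta_1)
    v_hat = v / (1 - beta_2)
    return learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)

grad = 0.001  # hypothetical small gradient
print(sgd_step(grad, 0.1))         # proportional to the gradient
print(adam_first_step(grad, 0.1))  # close to the learning rate itself
```

Because Adam normalises the gradient, its first step is roughly as large as the learning rate regardless of the gradient magnitude, which is why adaptive optimizers are usually run with much smaller learning rates than SGD.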


### T5. Switch to another loss function to check whether the performance level increases.

In [173]:
loss_functions = ['categorical_crossentropy','sparse_categorical_crossentropy','Poisson', 'KLDivergence','mean_squared_error']
neurons = [100]
for loss_function in loss_functions:
    model = create_model(
        layer_neurons=neurons,
        activation_function='relu',
        decay_steps=1000,
        decay_rate=0.96,
        dynamic = True,
        optimizer=tf.keras.optimizers.SGD,  # Best optimizer in T4, so we keep it
        loss_function=loss_function
    )
    # The sparse variant expects integer labels; the other losses take one-hot targets
    if loss_function == 'sparse_categorical_crossentropy':
        model.fit(X_train, y_train, epochs=50, batch_size=32, verbose=0)
        performance = model.evaluate(X_test, y_test, verbose=0)
    else:
        model.fit(X_train, y_train_OHE, epochs=50, batch_size=32, verbose=0)
        performance = model.evaluate(X_test, y_test_OHE, verbose=0)
    print(f"Accuracy using {loss_function}: {performance[1]}")

Accuracy using categorical_crossentropy: 0.9131736755371094
Accuracy using sparse_categorical_crossentropy: 0.910179615020752
Accuracy using Poisson: 0.826347291469574
Accuracy using KLDivergence: 0.9221556782722473
Accuracy using mean_squared_error: 0.817365288734436


### T6. Add a second hidden layer with a reasonable number of neurons and check whether a performance gain is obtained. If that is the case, keep the second layer.

In [154]:
neurons = [[100,50],[100,100], [100,150]]
for neuron in neurons:
    model = create_model(
        layer_neurons=neuron,
        activation_function='relu',
        decay_steps=1000,
        decay_rate=0.96,
        dynamic = True,
        optimizer=tf.keras.optimizers.SGD,  # Best optimizer in T4, so we keep it
        loss_function='categorical_crossentropy'
    )

    model.fit(X_train, y_train_OHE, epochs=50, batch_size=32, verbose=0)
    performance = model.evaluate(X_test, y_test_OHE, verbose=0)
    print(f"Accuracy using {neuron[0]} neurons for first layer and {neuron[1]} for second layer: {performance[1]}")

Accuracy using 100 neurons for first layer and 50 for second layer: 0.9371257424354553
Accuracy using 100 neurons for first layer and 100 for second layer: 0.92514967918396
Accuracy using 100 neurons for first layer and 150 for second layer: 0.9341317415237427
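Adding a second layer also increases the number of trainable parameters. As a rough sketch, the count for a fully connected network can be computed from the layer sizes, assuming the 125 input features printed earlier and the 5 output neurons; `dense_parameter_count` is a hypothetical helper name.

```python
def dense_parameter_count(n_inputs, hidden_layers, n_outputs):
    """Weights plus biases of a fully connected network."""
    sizes = [n_inputs, *hidden_layers, n_outputs]
    return sum((fan_in + 1) * fan_out for fan_in, fan_out in zip(sizes, sizes[1:]))

for hidden in ([100], [100, 50], [100, 100], [100, 150]):
    print(hidden, dense_parameter_count(125, hidden, 5))
```

This gives 13105 parameters for the single-layer network and 17905, 23205 and 28505 for the three two-layer variants tested above.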


### T7. Try with larger and smaller batch sizes (one of each). You should observe that the training time also changes. Adopt the size leading to highest performance.

In [158]:
batchs = [2,8,16,32,54,128]
for batch in batchs:
    model = create_model(
        layer_neurons=[100,150],
        activation_function='relu',
        decay_steps=1000,
        decay_rate=0.96,
        dynamic = True,
        optimizer=tf.keras.optimizers.SGD,  # Best optimizer in T4, so we keep it
        loss_function='categorical_crossentropy'
    )
    start_time = time.time()
    model.fit(X_train, y_train_OHE, epochs=50, batch_size=batch, verbose=0)
    performance = model.evaluate(X_test, y_test_OHE, verbose=0)
    print(f"Accuracy using batch of {batch}: {performance[1]}. Execution time = {time.time() - start_time}")

Accuracy using batch of 2: 0.946107804775238. Execution time = 9.706257343292236
Accuracy using batch of 8: 0.9491018056869507. Execution time = 2.8214809894561768
Accuracy using batch of 16: 0.9281437397003174. Execution time = 1.6462347507476807
Accuracy using batch of 32: 0.9281437397003174. Execution time = 1.065023422241211
Accuracy using batch of 54: 0.8772454857826233. Execution time = 0.8220798969268799
Accuracy using batch of 128: 0.8832335472106934. Execution time = 0.647169828414917


### T8. For the best configuration found, determine the performance of the network using the accuracy, precision, recall and f1 metrics[2]. To this purpose, you have to:

#### a) Train the network for every one of the splits (using the corresponding training set). Show that the training has achieved convergence in each case by means of an appropriate plot.
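Besides a loss-versus-epoch plot, convergence can be checked numerically on the per-epoch loss values, which Keras returns in the `history` dictionary of the object produced by `model.fit`. The sketch below is one simple heuristic, not a standard definition: `has_converged`, its `window` and `tolerance` are assumptions, and the loss curve is synthetic.

```python
def has_converged(losses, window=5, tolerance=1e-3):
    """True if the loss changed by less than `tolerance` over the last `window` epochs."""
    if len(losses) <= window:
        return False
    return abs(losses[-window - 1] - losses[-1]) < tolerance

# Synthetic loss curve that decays towards 0.2 (not data from this notebook).
losses = [1.6 * 0.8 ** epoch + 0.2 for epoch in range(50)]
print(has_converged(losses[:10]))  # still improving after 10 epochs
print(has_converged(losses))       # flat after 50 epochs
```

A flat tail in the plotted loss curve corresponds to this function returning `True` for the full history.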