# Basic dense (fully connected) neural network models



### Basic simple neural network model

In the previous notebooks, we saw how to implement simple neural network models with **just the output layer** for logistic, softmax (multiclass) and linear regression problems. For logistic and linear regression, this output layer had **only one node**, which performed both the linear combination of the input variables (plus bias) and the sigmoid/linear activation:

<img src="https://drive.google.com/uc?export=view&id=1PRc719uT1kOUuCMbpHML2sEk7qp6UJnm">

(Softmax regression is slightly different: the single output layer has as many nodes as there are classes, each calculating the linear combination of input variables and the softmax activation).
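
As a rough illustration of what such a single output node computes, the NumPy sketch below applies a linear combination plus bias followed by a sigmoid. The input values, weights and bias are made-up numbers chosen for the example, not values from a trained model.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical values: 4 input features, 4 weights, 1 bias
x = np.array([5.1, 3.5, 1.4, 0.2])
w = np.array([0.1, -0.2, 0.3, 0.4])
b = -0.5

z = np.dot(w, x) + b   # linear combination of inputs + bias
p = sigmoid(z)         # sigmoid activation, read as P(y=1|x)
print(z, p)
```

For linear regression the same node would simply return `z` (linear activation) instead of `sigmoid(z)`.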


### Basic dense neural network model

We are now building a **neural network model**, by adding **one hidden layer** (not deep) with **u nodes** (units):

<img src="https://drive.google.com/uc?export=view&id=1QROz9pFnMoqTeqrFbele8pFz8qXDSckq">

There are a number of `hyperparameters`:

- the **number of hidden nodes** (number of units in the hidden layer)
- the **type of activation function** in the hidden layer
- the **output activation function**
- the **loss function** (for backpropagation)
- the **optimizer** (for gradient descent)

By stacking together more than one hidden/intermediate layer (additional hyperparameter), we can then build **deep neural networks**.
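
To make the data flow through a hidden layer concrete, here is a minimal NumPy sketch of a forward pass with one hidden layer of `u` units followed by a single sigmoid output node. The random weights, the choice of ReLU and the three made-up samples are assumptions for illustration only; Keras handles all of this internally.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features, u = 4, 8                     # 4 inputs, u = 8 hidden units (illustrative)
X = rng.normal(size=(3, n_features))     # 3 made-up samples

# hidden layer: one weight column and one bias per hidden unit
W1 = rng.normal(size=(n_features, u))
b1 = np.zeros(u)
# output layer: a single node that combines the u hidden activations
W2 = rng.normal(size=(u, 1))
b2 = np.zeros(1)

hidden = np.maximum(0.0, X @ W1 + b1)                # ReLU activation
output = 1.0 / (1.0 + np.exp(-(hidden @ W2 + b2)))   # sigmoid activation

print(hidden.shape, output.shape)
```

Each sample is turned into `u` "new features" by the hidden layer, and the output node treats those as its inputs.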

## Loading libraries and setting the random seed

First, we load the necessary libraries; then we set the random seed to ensure reproducibility of results. Since TensorFlow uses its own internal random generator, we need to fix both the general seed (NumPy `seed()`) and the TensorFlow seed (`set_seed()`). The Keras helper `tf.keras.utils.set_random_seed()` used below sets both, together with the Python `random` seed:

In [None]:
import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt

# Set the seed using keras.utils.set_random_seed. This will set:
# 1) `numpy` seed
# 2) `tensorflow` random seed
# 3) `python` random seed
tf.keras.utils.set_random_seed(15)

# This will make TensorFlow ops as deterministic as possible, but it will
# affect the overall performance, so it's not enabled by default.
# `enable_op_determinism()` was introduced in TensorFlow 2.9.
tf.config.experimental.enable_op_determinism()
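
The effect of fixing a seed can be checked with a small NumPy-only sketch: the same seed reproduces the same "random" numbers, while a different seed does not. This only illustrates the NumPy part of the seeding; the seed values are arbitrary.

```python
import numpy as np

np.random.seed(15)
first_draw = np.random.rand(3)

np.random.seed(15)           # same seed again: same sequence
second_draw = np.random.rand(3)

np.random.seed(16)           # different seed: different sequence
third_draw = np.random.rand(3)

print(np.array_equal(first_draw, second_draw))
print(np.array_equal(first_draw, third_draw))
```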

## Get the data

We get the usual `iris` dataset:

In [None]:
import sklearn.datasets

(features, target) = sklearn.datasets.load_iris(return_X_y = True) ## feature names are not returned
print(features.shape)
print(target.shape)

This is a three-class problem, and for this binary classification example we need to binarise it. First, we count the samples in each class:

In [None]:
unique, counts = np.unique(target, return_counts=True)
print(np.asarray((unique, counts)).T)

In [None]:
# Updating class labels. To make things difficult, we put together old classes 0 and 1
# in a new class 0 (non-virginica) and keep old class 2 (virginica) as new class 1.
# For an easier problem, put together versicolor and virginica and keep setosa by itself.
# Note: slicing by position works because the iris samples are sorted by class (50 each).
j = 100 ## split: 50 for setosa vs versicolor+virginica, 100 for setosa+versicolor vs virginica
binary_target = np.copy(target)
binary_target[0:j] = 0
binary_target[j:150] = 1

In [None]:
unique, counts = np.unique(binary_target, return_counts=True)
print(np.asarray((unique, counts)).T)
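
The relabelling above relies on the samples being sorted by class. A possible alternative, sketched here on a stand-in array (`toy_target`, a hypothetical name) rather than on the real data, is to relabel by class value, which also works on shuffled data.

```python
import numpy as np

# stand-in for the iris target: 50 samples each of classes 0, 1, 2, in order
toy_target = np.repeat([0, 1, 2], 50)

# position-based relabelling (correct only because samples are sorted by class)
by_position = np.copy(toy_target)
by_position[0:100] = 0
by_position[100:150] = 1

# label-based relabelling: class 2 (virginica) becomes 1, everything else 0
by_label = (toy_target == 2).astype(int)
print(np.array_equal(by_position, by_label))

# on shuffled labels, the label-based version still gives 100 zeros and 50 ones
shuffled = np.random.default_rng(0).permutation(toy_target)
print(np.unique((shuffled == 2).astype(int), return_counts=True))
```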

## Training and validation sets

In [None]:
#we want to have the same proportion of classes in both train and validation sets
from sklearn.model_selection import StratifiedShuffleSplit

#building a StratifiedShuffleSplit object (sss among friends) with 20% data
#assigned to validation set (here called "test")
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)

#the .split() method returns (an iterable over) pairs of index arrays which
#can be used to select the samples that go into train and validation sets
for train_index, val_index in sss.split(features, binary_target):
    features_train = features[train_index, :]
    features_val   = features[val_index, :]
    target_train   = binary_target[train_index]
    target_val     = binary_target[val_index]

#let's print some shapes to get an idea of the resulting data structure
print(features_train.shape)
print(features_val.shape)
print(target_train.shape)
print(target_val.shape)

In [None]:
from collections import Counter

print(Counter(target_train))
print(Counter(target_val))
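
To show what "stratified" means in practice, the sketch below implements a simplified stratified split by hand with NumPy: each class is shuffled separately and 20% of it goes to the validation set. This is not how scikit-learn implements `StratifiedShuffleSplit`; the function name `stratified_split_indices` and the toy labels are made up for the example.

```python
import numpy as np

def stratified_split_indices(labels, test_size=0.2, seed=0):
    """Return (train_idx, val_idx) keeping class proportions roughly equal."""
    rng = np.random.default_rng(seed)
    train_idx, val_idx = [], []
    for cls in np.unique(labels):
        # shuffle the positions of this class, then cut off the validation share
        cls_idx = rng.permutation(np.flatnonzero(labels == cls))
        n_val = int(round(test_size * len(cls_idx)))
        val_idx.extend(cls_idx[:n_val])
        train_idx.extend(cls_idx[n_val:])
    return np.array(train_idx), np.array(val_idx)

# toy labels with the same 100/50 imbalance as the binarised iris target
toy_labels = np.array([0] * 100 + [1] * 50)
train_idx, val_idx = stratified_split_indices(toy_labels)

print(np.bincount(toy_labels[train_idx]))   # class counts in the training set
print(np.bincount(toy_labels[val_idx]))     # class counts in the validation set
```

Both subsets keep the 2:1 ratio between the two classes, which is the property we want from the split.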

In [None]:
target_train

## Build the neural network model

We now build our neural network for binary classification: it comprises one intermediate (hidden) layer and one output layer that performs the final classification (actually, the calculation of the probability of belonging to class `1` given the input features: $P(y=1|x)$).

The necessary steps are:

- model set-up (define the hyperparameters)
- model architecture
- compiling (putting together the configuration -model set-up- and the architecture)

In [None]:
# Configuration options
input_shape = (features.shape[1],) ## tuple that specifies the number of features
hidden_nodes = 8
hidden_activation = 'relu'
output_activation = 'sigmoid'
loss_function = 'binary_crossentropy'
optimizer_used = 'SGD' ##stochastic gradient descent
num_epochs = 100
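
The `binary_crossentropy` loss named in the configuration is the mean of $-[y \log p + (1-y) \log(1-p)]$ over the samples, where $y$ is the 0/1 label and $p$ the predicted probability. Below is a minimal NumPy sketch of that formula; the clipping constant `eps` and the example values are assumptions for illustration, and this is not the Keras implementation itself.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # clip probabilities to avoid taking log(0)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(losses))

y_true = [1, 0, 1, 0]
mostly_right = [0.9, 0.1, 0.8, 0.2]   # probabilities close to the true labels
mostly_wrong = [0.1, 0.9, 0.2, 0.8]   # probabilities far from the true labels

print(binary_crossentropy(y_true, mostly_right))
print(binary_crossentropy(y_true, mostly_wrong))
```

The loss is small when predicted probabilities agree with the labels and grows quickly when the model is confidently wrong.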

In [None]:
#we are building a "sequential" model, meaning that the data will
#flow like INPUT -> ELABORATION -> OUTPUT. In particular, we will
#not have any loops, i.e. our output will never be recycled as
#input for the first layer
from keras.models import Sequential

#a "dense" layer is a layer where all the data coming in are connected
#to all nodes (fully connected).
from keras.layers import Dense, Input

# binary classification neural network (one hidden layer) in Keras
model = Sequential()
model.add(Input(input_shape))
model.add(Dense(units=hidden_nodes, activation=hidden_activation))
model.add(Dense(units=1, activation=output_activation))

#the model is declared, but we still need to compile it to actually
#build all the data structures
model.compile(optimizer=optimizer_used, loss=loss_function)

In [None]:
model.summary()  # summary() prints the table itself; wrapping it in print() would add a stray "None"

The `summary()` method of the Keras model tells us that there are 49 parameters to train:
- w1, w2, w3, w4, b (weights for the 4 features + bias term) for each of the 8 nodes in the hidden layer ($\rightarrow$ (4+1) x 8 = 40 parameters);
- w1 - w8 + b: weights for the results from the 8 intermediate nodes ("new features") + bias term, for the output layer ($\rightarrow$ 8 + 1 = 9 parameters)
- layer 1 (40 parameters) + layer 2 (9 parameters) = 49 total parameters
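
The same count can be reproduced with a few lines of plain Python. The helper name `count_dense_params` is made up for this sketch; it assumes every layer is dense with one bias per unit.

```python
def count_dense_params(layer_sizes):
    """Total trainable parameters of a stack of dense layers.

    layer_sizes lists the number of inputs followed by the number of
    units in each dense layer, e.g. [4, 8, 1].
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += (n_in + 1) * n_out   # n_in weights + 1 bias for each unit
    return total

print(count_dense_params([4, 8, 1]))   # 4 inputs -> 8 hidden units -> 1 output: 49
```

The helper accepts any list of layer sizes, so it can also be used to check deeper architectures.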

## Train the neural network

In [None]:
import time

start = time.time()
history = model.fit(features_train, target_train, epochs=num_epochs, validation_data=(features_val, target_val), verbose=0)
end = time.time()
print(end - start)

In [None]:
#function to take a look at losses evolution
def plot_loss_history(h, title):
    plt.plot(h.history['loss'], label = "Train loss")
    plt.plot(h.history['val_loss'], label = "Validation loss")
    plt.xlabel('Epochs')
    plt.title(title)
    plt.legend()
    plt.show()

In [None]:
plot_loss_history(history, 'Logistic (' + str(num_epochs) + ' epochs)')

## Model evaluation

A model is only useful if it can predict new, unseen data. This is why the validation set was set aside and not used to fit the model weights.

Here we look at the following ways to evaluate our neural network model:

- error-rate / accuracy
- confusion matrix

To calculate the accuracy of the trained neural network model for binary classification, we first need to get the **predictions made on the validation set**.
Luckily, it's very easy to apply a trained model to new data (the validation set) via the [predict() method](https://keras.io/api/models/model_training_apis/#predict-method).

We can thus get our prediction for the iris flowers (see below the first 5 predictions):

In [None]:
predictions = model.predict(features_val)
print(predictions[0:5])

We plot the histogram of predictions (in the interval [0,1]), together with the **0.5 classification threshold**:

In [None]:
plt.hist(predictions)
plt.axvline(0.5, color='red', linestyle='dashed', linewidth=1)
plt.show()

#### Error rate / accuracy

In [None]:
predicted_class = np.where(predictions > 0.5, "virginica", "non-virginica")
target_class = np.where(target_val == 1, "virginica", "non-virginica")
target_class = target_class.reshape(len(target_class),1)

results = target_class == predicted_class

In [None]:
errors = np.invert(results).sum()
correct_predictions = results.sum()
total_n_predictions = len(results)

print("Error rate:", round(errors/total_n_predictions, 3))
print("Accuracy:", round(correct_predictions/total_n_predictions, 3))
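
The same quantities can also be computed directly on numeric 0/1 labels. The sketch below uses made-up predictions with shape `(n, 1)`, the shape returned by `predict()` for a single output node. Flattening them first matters: comparing an `(n, 1)` array with an `(n,)` array would broadcast to an `(n, n)` matrix, which is why the cell above reshapes `target_class`.

```python
import numpy as np

# made-up sigmoid outputs with shape (n, 1), and the matching true labels
toy_predictions = np.array([[0.91], [0.08], [0.45], [0.77], [0.30]])
toy_target = np.array([1, 0, 1, 1, 0])

# flatten to shape (n,) and apply the 0.5 threshold
toy_predicted = (toy_predictions.ravel() > 0.5).astype(int)

accuracy = np.mean(toy_predicted == toy_target)
error_rate = 1.0 - accuracy
print("Accuracy:", round(accuracy, 3))
print("Error rate:", round(error_rate, 3))
```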

#### Confusion matrix

In [None]:
from sklearn.metrics import confusion_matrix

labels = ['non-virginica','virginica']
con_mat_df = confusion_matrix( y_true = target_class, y_pred = predicted_class, labels=labels) #true are rows, predicted are columns
pd.DataFrame(
    con_mat_df,
    index = ['true:'+x for x in labels],
    columns = ['pred:'+x for x in labels])
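
For intuition on how the table is filled, here is a hand-rolled NumPy sketch that counts each (true, predicted) pair on made-up 0/1 labels. The function name `confusion_counts` is hypothetical, and the sketch is not meant to replace `sklearn.metrics.confusion_matrix`.

```python
import numpy as np

def confusion_counts(y_true, y_pred, n_classes=2):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(matrix, (y_true, y_pred), 1)   # add 1 for every (true, predicted) pair
    return matrix

toy_true = np.array([0, 0, 0, 1, 1, 1, 1])
toy_pred = np.array([0, 0, 1, 1, 1, 0, 1])

cm = confusion_counts(toy_true, toy_pred)
print(cm)
print("accuracy:", np.trace(cm) / cm.sum())   # correct predictions sit on the diagonal
```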

## What if we want to add more layers?

Adding layers is simple: you just specify one (or more) additional layer(s), as in the example below.

For each additional layer, you also need to specify the number of units and the activation function. These are additional hyperparameters to tune, and the number of layers is itself a hyperparameter.

In [None]:
input_shape = (features_train.shape[1],) ## tuple that specifies the number of features
hidden_nodes_1 = 8
hidden_nodes_2 = 5
hidden_activation_1 = 'relu'
hidden_activation_2 = 'relu'
output_activation = 'sigmoid'
loss_function = 'binary_crossentropy'
optimizer_used = 'rmsprop' ## Root Mean Square Propagation
num_epochs = 100
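
This configuration swaps plain SGD for RMSprop. As a rough sketch of the difference, the NumPy code below applies a single update step of each rule to two weights whose gradients have very different scales. The learning rates, `rho`, `eps` and the gradient values are illustrative assumptions; the exact defaults and implementation details should be checked in the Keras documentation.

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    """Plain gradient descent: the same learning rate for every weight."""
    return w - lr * grad

def rmsprop_step(w, grad, avg_sq, lr=0.001, rho=0.9, eps=1e-7):
    """RMSprop: scale each weight's step by a running average of squared gradients."""
    avg_sq = rho * avg_sq + (1.0 - rho) * grad ** 2
    w = w - lr * grad / (np.sqrt(avg_sq) + eps)
    return w, avg_sq

w = np.array([1.0, 1.0])
grad = np.array([10.0, 0.1])   # made-up gradients with very different scales

w_sgd = sgd_step(w, grad)
w_rms, avg_sq = rmsprop_step(w, grad, avg_sq=np.zeros_like(w))

print("SGD step sizes:    ", w - w_sgd)
print("RMSprop step sizes:", w - w_rms)
```

With SGD the step is proportional to the gradient, so the two weights move by very different amounts; RMSprop normalises by the gradient magnitude, so in this example both weights move by almost the same amount.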

In [None]:
# Reset the seed: a new model is being built, so the seed must be set again
def reset_random_seeds(nseed, enable_determinism=False):
    tf.keras.utils.set_random_seed(nseed)  # sets the Python, NumPy and TensorFlow seeds
    if enable_determinism:
        tf.config.experimental.enable_op_determinism()

reset_random_seeds(19)

# binary classification neural network with two hidden layers in Keras
model = Sequential()
model.add(tf.keras.Input(input_shape))
model.add(Dense(units=hidden_nodes_1, activation=hidden_activation_1))
model.add(Dense(units=hidden_nodes_2, activation=hidden_activation_2))
model.add(Dense(1, activation=output_activation))

#the model is declared, but we still need to compile it to actually
#build all the data structures
model.compile(optimizer=optimizer_used, loss=loss_function)

In [None]:
model.summary()  # summary() prints the table itself; wrapping it in print() would add a stray "None"

<u>Parameters breakdown</u>:
- layer 1: 8 x (4 + 1) = 40
- layer 2: 5 x (8 + 1) = 45
- layer 3: 1 x (5 + 1) = 6
- 40 + 45 + 6 = 91 total parameters

In [None]:
import time

start = time.time()
history = model.fit(features_train, target_train, epochs=num_epochs, validation_data=(features_val, target_val), verbose=0)
end = time.time()
print(end - start)

It usually takes longer to train a larger neural network, but this does not necessarily translate into better model performance.

In [None]:
predictions = model.predict(features_val)

predicted_class = np.where(predictions > 0.5, "virginica", "non-virginica")
target_class = np.where(target_val == 1, "virginica", "non-virginica")
target_class = target_class.reshape(len(target_class),1)

labels = ['non-virginica','virginica']
con_mat_df = confusion_matrix( y_true = target_class, y_pred = predicted_class, labels=labels) #true are rows, predicted are columns
pd.DataFrame(
    con_mat_df,
    index = ['true:'+x for x in labels],
    columns = ['pred:'+x for x in labels])