In [None]:
250306

In [66]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Lab 8 - Multi-layer Perceptron Forward Pass

## Part I
For this exercise you will implement the forward pass of a simple 2-layer perceptron.

For the first part you'll write a function that computes the forward pass of a 2-layer perceptron that predicts the prices of houses, using the usual Boston housing dataset.

In [67]:
boston = pd.read_csv('data/housing.csv')

# I switched to "housing.csv" because the original file had an error: it was missing the
# columns' labels, so I used the version we worked with in lab 2

As usual, consider the MEDV as your target variable. 
* Split the data into training, validation and testing sets (70%, 15%, 15%). You will need this split for the next lab, which builds on this one.

In [68]:
X_train, X_to_split, y_train, y_to_split = train_test_split(boston.drop(['MEDV'], axis=1), boston['MEDV'], train_size=.7)
X_test, X_validation, y_test, y_validation = train_test_split(X_to_split, y_to_split, train_size=.5)

X_train = X_train.to_numpy()
X_test = X_test.to_numpy()
X_validation = X_validation.to_numpy()

y_train = y_train.to_numpy().reshape(-1, 1)
y_test = y_test.to_numpy().reshape(-1, 1)
y_validation = y_validation.to_numpy().reshape(-1, 1)

Now you will write the function that computes the forward pass. 
* I provide here a structure that you can follow for your function, but again, feel free to modify it as you see fit.
* Use the sigmoid function as the activation of the hidden layer.
* Don't forget about the biases!
* *It is up to you to think what should be the activation for the output layer.*
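The bullets above can be summarised as the following forward pass. This is a sketch: the symbols $W_1, b_1, W_2, b_2$ (weights and biases), $H$ (hidden activations) and $g$ (output activation) are notation assumed here, not taken from the lab handout.

```latex
H = \sigma(X W_1 + b_1), \qquad
\hat{Y} = g(H W_2 + b_2), \qquad
\sigma(z) = \frac{1}{1 + e^{-z}}
```

With $X$ of shape $(n, d_{\text{in}})$, $W_1$ has shape $(d_{\text{in}}, d_{\text{hidden}})$ and $W_2$ has shape $(d_{\text{hidden}}, d_{\text{out}})$. The biases are row vectors that are broadcast across the $n$ samples.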

In [69]:
def sigmoid_activation(z: np.ndarray | pd.Series) -> np.ndarray:
    """Sigmoid Function

    Args:
        z (np.ndarray | pd.Series): The values to which the sigmoid function is applied element-wise
    """
    return 1 / (1 + np.exp(-z))
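For very negative inputs, `np.exp(-z)` in the direct formula can overflow and emit a runtime warning. The sketch below is one possible way around that, assuming a NumPy array as input; `stable_sigmoid` is a hypothetical name and not part of the lab.

```python
import numpy as np

def stable_sigmoid(z):
    """Hypothetical overflow-free variant of the sigmoid (illustration only)."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    positive = z >= 0
    # For z >= 0, exp(-z) <= 1, so the usual form cannot overflow.
    out[positive] = 1.0 / (1.0 + np.exp(-z[positive]))
    # For z < 0, use the equivalent form exp(z) / (1 + exp(z)), where exp(z) < 1.
    exp_z = np.exp(z[~positive])
    out[~positive] = exp_z / (1.0 + exp_z)
    return out

sigmoid_values = stable_sigmoid(np.array([-1000.0, 0.0, 1000.0]))
print(sigmoid_values)
```

Both branches compute the same mathematical function; the split only decides which exponential is evaluated.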

In [70]:
np.random.seed(0)

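def two_layer_perceptron(X, activation, dim_input, dim_hidden, dim_output):
    """
    Implements the forward pass of a two-layer fully connected perceptron.
    
    Parameters
    ----------
    X : a 2-dimensional array
        the input data
    activation : function
        the activation function to be used for the hidden layer
    dim_input : int
        the dimensionality of the input layer
    dim_hidden : int
        the dimensionality of the hidden layer
    dim_output : int
        the dimensionality of the output layer
    Returns
    -------
    y_pred : a 2-dimensional array of shape (n_samples, dim_output)
        the output of the computation of the forward pass of the network
    """
    # Randomly initialised weights and biases (nothing is learned in this lab)
    W_1 = np.random.random_sample((dim_input, dim_hidden))
    b_1 = np.random.random_sample((1, dim_hidden))
    W_2 = np.random.random_sample((dim_hidden, dim_output))
    b_2 = np.random.random_sample((1, dim_output))

    # Hidden layer: affine map followed by the activation
    H = activation(X @ W_1 + b_1)
    # Output layer: linear (identity activation), since this is a regression
    Y = H @ W_2 + b_2

    return Y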

Calculate the RMSE of the forward pass. 

**Personal note**: This algorithm is called a _multi-layer perceptron_, but it has nothing to do with the original perceptron used for classification. The name is a "historical residue".

In [71]:
# Because we are not learning the weights, I'll just use the training set for this one
predicted_y_train = two_layer_perceptron(X_train, sigmoid_activation, X_train.shape[1], 8, 1)

# Personal note: output_dim = 1 because we want to predict a single value, so I don't apply any
# activation on the output layer; it is just a regular linear combination of the hidden units

difference = predicted_y_train - y_train
# RMSE: square root of the mean of the squared differences
RMSE = np.sqrt(np.mean(difference ** 2))
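As a sanity check on the RMSE definition (square root of the mean squared difference), here is a small worked example with numbers that can be verified by hand. The helper name `rmse_sketch` and the toy values are assumptions made for illustration; they are not part of the lab data.

```python
import numpy as np

def rmse_sketch(y_pred, y_true):
    """Hypothetical helper: root of the mean squared difference."""
    difference = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return np.sqrt(np.mean(difference ** 2))

# Differences are 1, 0 and -2, so the mean squared error is (1 + 0 + 4) / 3
example_rmse = rmse_sketch([[2.0], [4.0], [6.0]], [[1.0], [4.0], [8.0]])
print(example_rmse)
```

The result equals the square root of 5/3, which is what the hand calculation gives.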

## Part II 

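For this exercise you will write a function that calculates the forward pass of a 2-layer perceptron that predicts the exact digit from a hand-written image, using the MNIST dataset. (The code below loads scikit-learn's smaller 8x8 `load_digits` dataset rather than the full MNIST.)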

In [72]:
from sklearn.datasets import load_digits

In [73]:
digits = load_digits()

In [74]:
X = digits.data
y = digits.target.reshape(-1, 1)

In [75]:
X.shape

(1797, 64)

Again, you will split the data into training, validation and testing.

In [77]:
X_train, X_to_split, y_train, y_to_split = train_test_split(X, y, train_size=.7)
X_test, X_validation, y_test, y_validation = train_test_split(X_to_split, y_to_split, train_size=.5)

Write a function that calculates the forward pass for this multi-class classification problem.
* You will use the sigmoid activation function for the hidden layer.
* For the output layer you will have to write the softmax activation function (you can check the slides)
* __Note:__ you can easily re-use the function that you coded for Part I if you do a simple modification and also include an input argument for the activation of the output layer.
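For reference, softmax applied to one sample's score vector $z$ with $K$ classes can be written as below. The symbols $z$, $K$ and $c$ are notation assumed here; the slides may use different ones.

```latex
\operatorname{softmax}(z)_j
  = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}
  = \frac{e^{z_j - c}}{\sum_{k=1}^{K} e^{z_k - c}}
  \quad \text{for any constant } c
```

The second equality holds because the factor $e^{-c}$ cancels between numerator and denominator. Choosing $c = \max_k z_k$ is a common way to keep the exponentials from overflowing.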

In [78]:
def softmax_activation(Z: np.ndarray) -> np.ndarray:
    """Softmax activation function. It receives a matrix in which each row is a sample and each
    column represents a class, and calculates the probability of each class for every sample.

    Args:
        Z (np.ndarray): 2-dimensional matrix (it can't be a flattened array) to which softmax is applied

    Returns:
        np.ndarray: Matrix of the same shape with softmax applied row by row (each row sums to 1)
    """
    # Subtracting the row-wise maximum leaves the result unchanged but avoids overflow in np.exp
    exp_Z = np.exp(Z - np.max(Z, axis=1, keepdims=True))

    return exp_Z / np.sum(exp_Z, axis=1, keepdims=True)

In [106]:
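def multiclass_classification(X, activation, dim_input, dim_hidden, dim_output):
    """
    Forward pass of a two-layer neural network for multiclass classification.
    
    Parameters
    ----------
    X : a 2-dimensional array
        the input data
    activation : function
        the activation function to be used for the hidden layer
    dim_input : int
        the dimensionality of the input layer
    dim_hidden : int
        the dimensionality of the hidden layer
    dim_output : int
        the dimensionality of the output layer (number of classes)
    Returns
    -------
    y_pred : a 2-dimensional array of shape (n_samples, dim_output)
        the class probabilities computed by the forward pass of the network
    """
    # Randomly initialised weights and biases (nothing is learned in this lab)
    W_1 = np.random.random_sample((dim_input, dim_hidden))
    b_1 = np.random.random_sample((1, dim_hidden))
    W_2 = np.random.random_sample((dim_hidden, dim_output))
    b_2 = np.random.random_sample((1, dim_output))

    # Hidden layer followed by the linear output scores
    A = activation(X @ W_1 + b_1) @ W_2 + b_2
    # Softmax turns the scores of each sample into class probabilities
    Y = softmax_activation(A)

    return Y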

Lastly, calculate the error of this forward pass using the cross-entropy loss.
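With one-hot targets, the cross-entropy sum keeps only the log-probability that the model assigns to the true class of each sample. The sketch below illustrates this on toy numbers; `cross_entropy_sketch` and the example values are hypothetical and are not taken from the lab or the slides.

```python
import numpy as np

def cross_entropy_sketch(probabilities, labels):
    """Hypothetical helper: summed cross-entropy for integer class labels."""
    probabilities = np.asarray(probabilities, dtype=float)
    labels = np.asarray(labels).ravel()
    n_classes = probabilities.shape[1]
    # One-hot encode: row i has a 1 in column labels[i] and 0 elsewhere
    one_hot = np.eye(n_classes)[labels]
    # Only the log-probability of the true class survives the product
    return -np.sum(one_hot * np.log(probabilities))

# Two samples, three classes; the true classes are 0 and 1
example_probabilities = [[0.7, 0.2, 0.1],
                         [0.1, 0.8, 0.1]]
example_labels = [[0], [1]]
example_loss = cross_entropy_sketch(example_probabilities, example_labels)
print(example_loss)
```

Here the loss reduces to `-(log(0.7) + log(0.8))`. The sketch assumes every probability is strictly positive, which holds for softmax outputs unless an exponential underflows to zero.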

In [136]:
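# Again, using only the training set because we are not training the model yet
prediction_matrix = multiclass_classification(X_train, sigmoid_activation, X_train.shape[1], 5, 10)

# Probability that the network assigns to the true class of each training sample
true_class_probability = prediction_matrix[np.arange(y_train.shape[0]), y_train.ravel()]

# Cross-entropy summed over the samples (with one-hot targets only the true class contributes)
CE_ERROR = -np.sum(np.log(true_class_probability))

CE_ERROR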