## Group - AdAnSo

### Members:
- Adarsh Anand (2003101)
- Aniket Chaudhri (2003104)
- Somesh Agrawal (2003326)

---

## Q1
#### Implement the following operations (forward and backward pass)

  (a) Matrix multiplication layer W X

  (b) Bias addition layer

  (c) Mean squared loss layer

  (d) Soft max layer

  (e) Sigmoid layer

  (f) Cross entropy loss layer

## Import the required libraries

In [2]:
import numpy as np
import matplotlib.pyplot as plt
import math
import random
import time
import sklearn

## Prerequisites

  - **Matrix multiplication layer (WX)**: This layer performs matrix multiplication between the input data and a weight matrix, resulting in a new feature space. The backward pass involves computing the gradients of the weights and input data.

In [3]:
# forward multiplication of input and weights: X @ W (samples in rows)
def forwardMultiplication(W, X):
    return np.dot(X, W) 
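
The backward pass described above is not shown in the notebook. A minimal sketch, assuming the same `X W` convention (samples in rows) and a hypothetical helper `backward_multiplication` that receives the upstream gradient `dOut`:

```python
import numpy as np

def forward_multiplication(W, X):
    # Same convention as forwardMultiplication: samples in rows, output is X @ W
    return np.dot(X, W)

def backward_multiplication(W, X, dOut):
    # Hypothetical helper: gradients of a scalar loss with respect to W and X,
    # given dOut = dLoss/d(XW) with shape (m, k)
    dW = np.dot(X.T, dOut)   # shape (n, k), same as W
    dX = np.dot(dOut, W.T)   # shape (m, n), same as X
    return dW, dX

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 2))
dOut = np.ones((4, 2))       # upstream gradient when loss = sum(XW)
dW, dX = backward_multiplication(W, X, dOut)
print(dW.shape, dX.shape)
```

Each gradient has the shape of the quantity it belongs to, which is a quick sanity check when wiring layers together.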



  - **Bias addition layer**: This layer adds a bias term to the input data, allowing the model to shift the activation function. The backward pass involves computing the gradients of the bias and input data.

In [4]:

# add bias b
def add_bias(X, b):
    return X+b
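
The backward pass of the bias layer could look like the sketch below. The helper name `add_bias_backward` is hypothetical, and the sketch assumes a bias of shape `(1, k)` broadcast over the rows of the input.

```python
import numpy as np

def add_bias(X, b):
    return X + b

def add_bias_backward(dOut):
    # Hypothetical helper: addition passes the gradient through to X unchanged,
    # while the bias gradient sums over the batch dimension because b is
    # broadcast over the rows
    dX = dOut
    db = np.sum(dOut, axis=0, keepdims=True)
    return dX, db

dOut = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
dX, db = add_bias_backward(dOut)
print(db.shape)
```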


  - **Sigmoid layer**: This layer applies the sigmoid function to the input data, resulting in a probability value between 0 and 1. The backward pass involves computing the gradients of the input data and the predicted output.

In [5]:

# sigmoid function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# derivative of sigmoid function
def sigmoid_prime(z):
    return sigmoid(z) * (1 - sigmoid(z))
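
A finite-difference comparison is one way to sanity-check an analytic derivative such as `sigmoid_prime`. The sketch below repeats the two functions so that it runs on its own; the step size `eps` is an arbitrary choice.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1 - s)

z = np.linspace(-5, 5, 11)
eps = 1e-6
# Central difference approximation of the derivative
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
max_abs_diff = np.max(np.abs(numeric - sigmoid_prime(z)))
print(max_abs_diff)
```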




  - **Mean squared loss layer**: This layer calculates the squared error between the predicted output and the true output. The backward pass computes the gradient of the loss with respect to the predicted output.

In [6]:

# squared error summed over all entries (no averaging, despite the name)
def mean_squared_error(y, y_hat):
    return np.sum((y-y_hat)**2)

# derivative of the summed squared error with respect to y_hat
def mean_squared_error_prime(y, y_hat):
    return 2*(y_hat-y)



  - **Cross entropy loss layer**: This layer calculates the cross-entropy loss between the predicted probabilities and the one-hot true output. The backward pass computes the gradient of the loss with respect to the layer's input.


In [7]:

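# cross entropy error, summed over all samples (y is one-hot, y_hat holds probabilities)
def cross_entropy_error(y, y_hat):
    return -np.sum(y*np.log(y_hat))

# gradient of the cross entropy error with respect to the pre-softmax scores,
# valid when y_hat = softmax(scores); it is not the derivative with respect to y_hat
def cross_entropy_error_prime(y, y_hat):
    return y_hat-y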
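
`np.log(y_hat)` evaluates to `-inf` when a predicted probability is exactly 0. One common guard, sketched here with a hypothetical function name and an arbitrary `eps`, is to clip the predictions before taking the logarithm:

```python
import numpy as np

def cross_entropy_error_clipped(y, y_hat, eps=1e-12):
    # Clip predictions away from 0 so that np.log never sees an exact zero
    y_hat = np.clip(y_hat, eps, 1.0)
    return -np.sum(y * np.log(y_hat))

y = np.array([[0.0, 1.0, 0.0]])
y_hat = np.array([[1.0, 0.0, 0.0]])   # confidently wrong prediction
loss = cross_entropy_error_clipped(y, y_hat)
print(loss)
```

The loss stays finite, at the cost of capping the penalty for a confidently wrong prediction at `-log(eps)`.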



  - **Softmax layer**: This layer applies the softmax function to the input data, resulting in a probability distribution. The backward pass involves computing the gradients of the input data and the predicted output.

In [8]:
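# softmax function, applied row-wise (one sample per row)
def softmax(z):
    # subtract the row maximum before exponentiating for numerical stability
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

# diagonal of the softmax Jacobian (ignores the off-diagonal terms)
def softmax_prime(z):
    s = softmax(z)
    return s * (1 - s)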

# forward propagation
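
The expression `softmax(z) * (1 - softmax(z))` gives only the diagonal of the softmax Jacobian. As a sketch for a single score vector (the helpers `softmax_vector` and `softmax_jacobian` are hypothetical), the full Jacobian can be chained with the cross-entropy gradient to confirm that the result reduces to `s - y`:

```python
import numpy as np

def softmax_vector(z):
    # Softmax of a single 1-D score vector, shifted by max(z) for numerical stability
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def softmax_jacobian(z):
    # Full Jacobian: J[i, j] = s_i * (delta_ij - s_j)
    s = softmax_vector(z)
    return np.diag(s) - np.outer(s, s)

z = np.array([1.0, 2.0, 0.5])
y = np.array([0.0, 1.0, 0.0])            # one-hot target
s = softmax_vector(z)
dL_ds = -y / s                           # gradient of -sum(y * log(s)) w.r.t. s
dL_dz = softmax_jacobian(z).T @ dL_ds    # chain rule through softmax
print(np.allclose(dL_dz, s - y))
```

This is why a softmax layer followed by cross-entropy is usually differentiated as one unit rather than as two separate layers.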


# Initialise Weights and Biases
The `init_weights_biases` function initializes the parameters of the regression model. It takes a single argument `dim`, the number of input features, and returns a tuple containing a weight matrix of shape `(dim, 1)` and a scalar bias initialized to 0.
 
The `classifier_init_weights_biases` function initializes the parameters of the classifier. It takes two arguments, `X` and `Y`, the input data and the one-hot output data. It returns a tuple containing a weight matrix of shape `(X.shape[1], Y.shape[1])` and a bias of shape `(1, Y.shape[1])`.


In [9]:
# initialise weights and biases
def init_weights_biases(dim):
    W = np.random.randn(dim, 1) * 0.01
    b = 0
    return W, b

# initialise weights and biases for classifier
def classifier_init_weights_biases(X, Y):
    dim = X.shape[1]
    output_dim = Y.shape[1]
    W = np.random.randn(dim, output_dim) * 0.01
    b = np.zeros((1, output_dim))
    return W, b
    


# Forward Propagation
The `forward_propagation` function performs the forward pass of a model, it takes 4 arguments:
- `w`: the weight matrix 
- `b`: the bias term
- `X`: the input data 
- `Y`: the true output data 

It computes the product of the input data and the weight matrix, adds the bias term, and then computes the squared error between the true and predicted outputs. It returns the predictions and the loss.

The `classifier_forward_propagation` function performs the forward pass of a classifier, it takes 4 arguments:
- `w`: the weight matrix
- `b`: the bias term
- `X`: the input data 
- `Y`: the true output data 

It computes the product of the input data and the weight matrix, adds the bias term, applies softmax, and then computes the cross-entropy loss between the true output and the predicted probabilities. It returns the probabilities and the loss.


In [10]:

def forward_propagation(w, b, X, Y):
    # print("W received", w)
    m = X.shape[0]
    N = forwardMultiplication(w, X)
    P = add_bias(N, b)
    # A = sigmoid(Z)
    loss = mean_squared_error(Y, P)
    # loss = cross_entropy_error(Y, P)
    # print("Loss", loss)
    # cost = -1/m * np.sum(Y*np.log(A) + (1-Y)*np.log(1-A))
    return P, loss

def classifier_forward_propagation(w, b, X, Y):
    # print("W received", w)
    m = X.shape[0]
    N = forwardMultiplication(w, X)
    P = add_bias(N, b)
    Q = softmax(P)
    loss = cross_entropy_error(Y, Q)

    return Q, loss


# Backward Propagation
The `backward_propagation` function performs the backward pass of a model, it takes 5 arguments:
- `w`: the weight matrix
- `b`: the bias term
- `X`: the input data
- `Y`: the true output data
- `P`: the predicted output

It first calculates the derivative of the loss function with respect to the predicted output, then it calculates the gradient of the weight matrix and bias term with respect to the loss function.

The `classifier_backward_propagation` function performs the backward pass of a classifier, it takes 5 arguments:
- `w`: the weight matrix
- `b`: the bias term
- `X`: the input data
- `Y`: the true output data
- `P`: the predicted output

It first calculates the gradient of the loss with respect to the pre-softmax scores. Because `cross_entropy_error_prime` already returns this combined softmax and cross-entropy gradient (`P - Y`), no separate softmax derivative is applied. It then calculates the gradients of the weight matrix and bias term.


In [11]:
# backward propagation
def backward_propagation(w, b, X, Y, P):
    m = X.shape[0]
    # dZ = A - Y
    dP = mean_squared_error_prime(Y, P)
    # dP = cross_entropy_error_prime(Y, P)
    # dw = 1/m * np.dot(X, dP.T)
    dw = np.dot(X.T, dP)
    db = np.sum(dP)
    # db = 1/m * np.sum(dP)
    return dw, db

def classifier_backward_propagation(w, b, X, Y, P):
    # P holds the softmax probabilities returned by classifier_forward_propagation
    # gradient w.r.t. the pre-softmax scores: softmax + cross entropy gives P - Y
    dP = cross_entropy_error_prime(Y, P)
    dw = np.dot(X.T, dP)
    # sum over samples only, so db keeps the (1, output_dim) shape of b
    db = np.sum(dP, axis=0, keepdims=True)
    return dw, db
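
One way to validate gradients like these is a finite-difference check. The sketch below is self-contained and uses hypothetical helpers (`softmax_rows`, `loss_fn`, `analytic_grads`) on random data. It assumes a row-wise softmax and a summed cross-entropy loss, for which the gradient with respect to the pre-softmax scores is `Q - Y`.

```python
import numpy as np

def softmax_rows(z):
    # Row-wise softmax: each row (one sample) sums to 1
    e = np.exp(z - np.max(z, axis=1, keepdims=True))
    return e / np.sum(e, axis=1, keepdims=True)

def loss_fn(W, b, X, Y):
    # Summed cross-entropy of a linear softmax classifier
    Q = softmax_rows(X @ W + b)
    return -np.sum(Y * np.log(Q))

def analytic_grads(W, b, X, Y):
    # For softmax followed by cross-entropy, dLoss/dscores = Q - Y
    dP = softmax_rows(X @ W + b) - Y
    return X.T @ dP, np.sum(dP, axis=0, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
Y = np.eye(3)[rng.integers(0, 3, size=6)]
W = rng.normal(size=(4, 3)) * 0.1
b = np.zeros((1, 3))

dW, db = analytic_grads(W, b, X, Y)

# Central finite differences on every entry of W
eps = 1e-6
dW_num = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[i, j] += eps
        W_minus[i, j] -= eps
        dW_num[i, j] = (loss_fn(W_plus, b, X, Y) - loss_fn(W_minus, b, X, Y)) / (2 * eps)

max_err = np.max(np.abs(dW - dW_num))
print(max_err)
```

A small `max_err` indicates that the analytic and numerical gradients agree; a large one usually points to a wrong axis or a missing term in the chain rule.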



# Update Weights and Biases
The `update_weights_biases` function updates the weights and biases of the model based on the gradients calculated during the backward pass, it takes 5 arguments:
- `w`: the weight matrix 
- `b`: the bias term
- `dw`: the gradient of the weight matrix with respect to the loss function
- `db`: the gradient of the bias term with respect to the loss function
- `learning_rate`: the learning rate used for the optimization algorithm

It subtracts the product of the learning rate and the gradients from the weights and biases, respectively, and returns the updated weights and biases.


In [12]:

# update weights and biases
def update_weights_biases(w, b, dw, db, learning_rate):
    w = w - learning_rate * dw
    b = b - learning_rate * db
    return w, b


# Predict Function
The `predict` function generates predictions for new data based on the trained weights and biases of a model. It takes 3 arguments:
- `w`: the weight matrix
- `b`: the bias term
- `X`: the input data

It first calculates the dot product of the input data and the weight matrix, which gives the pre-activation values, then adds the bias term and returns the result as the predicted output.

The `classifier_predict` function generates predictions for new data based on the trained weights and biases of a classifier. It takes 3 arguments:
- `w`: the weight matrix
- `b`: the bias term
- `X`: the input data

It first computes the product of the input data and the weight matrix, adds the bias term to obtain the pre-activation values, applies softmax to get the predicted probabilities, and finally one-hot encodes the most probable class of each sample.


In [13]:

# predict function
def predict(w, b, X):
    m = X.shape[1]
    N = forwardMultiplication(w, X)
    P = add_bias(N, b)
    # A = sigmoid(Z)
    return P

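def classifier_predict(w, b, X):
    N = forwardMultiplication(w, X)
    P = add_bias(N, b)
    P_softmax = softmax(P)
    # one hot encode the predictions; the number of classes is taken from the
    # width of the output instead of being hard-coded
    num_classes = P_softmax.shape[1]
    P_softmax = np.eye(num_classes)[P_softmax.argmax(1)]
    
    return P_softmax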
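
One-hot predictions of this form can be scored against one-hot labels by comparing the position of the 1 in each row. A minimal sketch, with a hypothetical helper name and made-up labels:

```python
import numpy as np

def one_hot_accuracy(Y_true, Y_pred):
    # Hypothetical helper: fraction of rows whose predicted class matches the true class
    return np.mean(Y_true.argmax(axis=1) == Y_pred.argmax(axis=1))

Y_true = np.eye(3)[[0, 1, 2, 1]]   # made-up true classes
Y_pred = np.eye(3)[[0, 1, 1, 1]]   # made-up predictions, one mistake
accuracy = one_hot_accuracy(Y_true, Y_pred)
print(accuracy)
```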


# Plot Loss
The `plot_loss` function is used to visualize the loss over iterations. It takes a single argument `losses` which is a list or an array that contains the loss values after each iteration.
It uses the matplotlib library to create a line plot of the loss values over iterations, with the x-axis representing the iteration number and the y-axis representing the loss value. It also includes labels for the x and y-axis and a title for the plot.


In [14]:

# plot loss
def plot_loss(losses):
    plt.plot(losses)
    plt.ylabel('loss')
    plt.xlabel('iterations (per hundreds)')
    plt.title('Loss vs iterations')
    plt.show()



# Model
The `model` function is the main function that coordinates the training process of the model. It takes 7 arguments:
- `X_train`: the training input data
- `Y_train`: the training output data
- `X_test`: the test input data
- `Y_test`: the test output data
- `num_iterations`: the number of iterations for the training process
- `learning_rate`: the learning rate used for the optimization algorithm
- `print_cost`: a flag indicating whether to print the loss value every 100 iterations

It first initializes the weights and biases using the `init_weights_biases` function, then performs forward propagation, backward propagation, and the parameter update for the specified number of iterations. Every 100 iterations it appends the loss to a list, which is plotted after training. Finally, it returns the trained weights and bias.


In [15]:

def model(X_train, Y_train, X_test, Y_test, num_iterations, learning_rate, print_cost):
    w, b = init_weights_biases(X_train.shape[1])  # number of features (columns of X_train)
    losses = []
    loss = 0
    for i in range(num_iterations):
        P, loss = forward_propagation(w, b, X_train, Y_train)
        dw, db = backward_propagation(w, b, X_train, Y_train, P)
        w, b = update_weights_biases(w, b, dw, db, learning_rate)
        if i % 100 == 0:
            losses.append(loss)
            if print_cost:
                print('Cost after iteration %i: %f' %(i, loss))
    if print_cost:
        print('Cost after iteration %i: %f' %(num_iterations, loss))
    plot_loss(losses)
    return w, b

# Boston Housing Dataset
The code is importing the Boston Housing Dataset from a URL and using pandas to read it in as a Dataframe.
The dataset contains information about various properties in Boston, including the median value of owner-occupied homes.

The read_csv function is used with the following parameters:
- data_url: URL of the dataset
- sep: separator is set to any whitespace
- skiprows: skipping the first 22 rows 
- header: no header is provided

The dataframe values are then rearranged into NumPy arrays. In this file each record spans two physical lines, so:
- `data` horizontally stacks every even-numbered row (the first 11 features) with the first 2 columns of the following odd-numbered row (the remaining 2 features)
- `target` takes the 3rd column of every odd-numbered row, the median home value
- The shape of `data` is printed


In [16]:
import pandas as pd
data_url = "http://lib.stat.cmu.edu/datasets/boston"


raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
target = raw_df.values[1::2, 2]

# print(data)
print(data.shape)

(506, 13)
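
`target` comes out of this slicing as a 1-D array of shape `(506,)`, whereas a weight matrix of shape `(dim, 1)` yields predictions of shape `(m, 1)`. Under NumPy broadcasting, subtracting the two produces an `(m, m)` matrix rather than `m` residuals, so a summed squared error over it covers `m * m` terms. The sketch below uses synthetic data (an assumption, not the Boston values) to show the effect and one possible remedy, reshaping the targets into a column vector.

```python
import numpy as np

rng = np.random.default_rng(0)
X_batch = rng.normal(size=(5, 13))   # synthetic stand-in for five rows of `data`
y_batch = rng.normal(size=5)         # 1-D targets, like a slice of `target`
W = np.zeros((13, 1))

P = X_batch @ W                      # predictions, shape (5, 1)
print((y_batch - P).shape)           # broadcast to (5, 5), not (5, 1)

y_col = y_batch.reshape(-1, 1)       # column vector, shape (5, 1)
print((y_col - P).shape)             # elementwise residuals, shape (5, 1)
```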


# Stochastic Gradient Descent
The `run_sgd` function is used to train a model using the stochastic gradient descent optimization algorithm. It takes 8 arguments:
- `X`: the training input data
- `Y`: the training output data
- `X_test`: the test input data
- `Y_test`: the test output data
- `num_iterations`: the number of iterations for the training process
- `learning_rate`: the learning rate used for the optimization algorithm
- `print_cost`: a flag intended to control printing of the loss (the function body does not consult it)
- `batch_size`: the size of each batch of data used in each iteration of the training process

It first initializes the weights and biases using the `init_weights_biases` function, then performs forward propagation, backward propagation, and the parameter update once per batch, sweeping over the whole dataset in batches of `batch_size` on each of the `num_iterations` passes. It appends the loss of every batch to a list.
At the end of each pass whose index is not a multiple of 1000 (the condition `i%1000` is truthy for those), it prints the average batch loss of that pass.
It does not return any value, so the trained weights and biases remain local to the function.


In [17]:

def run_sgd(X, Y, X_test, Y_test, num_iterations, learning_rate, print_cost, batch_size=1):
    losses = []
    loss = 0
    w, b = init_weights_biases(X.shape[1])
    for i in range(num_iterations):
        curr_iter_loss = []
        for batch in range(0, X.shape[0], batch_size):
            X_batch = X[batch:batch+batch_size]
            Y_batch = Y[batch:batch+batch_size]
            
            
            P, loss = forward_propagation(w, b, X_batch, Y_batch)
            dw, db = backward_propagation(w, b, X_batch, Y_batch, P)
            w, b = update_weights_biases(w, b, dw, db, learning_rate)
            losses.append(loss)
            curr_iter_loss.append(loss)
        if i%1000:
            print(np.average(curr_iter_loss))
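
For comparison, the sketch below is a hypothetical variant of this loop, not the notebook's method. It shuffles the samples on every pass, averages the gradient over the batch, reshapes the targets into a column vector, and returns the trained parameters. It runs on noiseless synthetic data, and the learning rate and batch size are arbitrary choices.

```python
import numpy as np

def sgd_linear_regression(X, y, num_epochs, learning_rate, batch_size=1, seed=0):
    # Hypothetical variant: shuffles each epoch, averages the gradient over the
    # batch, and returns the parameters together with the per-epoch losses
    rng = np.random.default_rng(seed)
    y = y.reshape(-1, 1)                      # column vector of targets
    w = np.zeros((X.shape[1], 1))
    b = 0.0
    epoch_losses = []
    for _ in range(num_epochs):
        order = rng.permutation(X.shape[0])
        batch_losses = []
        for start in range(0, X.shape[0], batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            residual = Xb @ w + b - yb        # shape (batch, 1)
            batch_losses.append(np.mean(residual ** 2))
            w -= learning_rate * 2 * (Xb.T @ residual) / len(idx)
            b -= learning_rate * 2 * np.mean(residual)
        epoch_losses.append(np.mean(batch_losses))
    return w, b, epoch_losses

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
true_w = np.array([[2.0], [-1.0], [0.5]])
y = (X @ true_w).ravel() + 3.0               # noiseless synthetic targets
w, b, epoch_losses = sgd_linear_regression(
    X, y, num_epochs=50, learning_rate=0.05, batch_size=10)
print(epoch_losses[0], epoch_losses[-1])
```

Returning `w` and `b` makes it possible to evaluate the fitted model after training.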


This code snippet is calling the run_sgd function with the following parameters:

- data: the training input data
- target: the training output data
- data: the test input data
- target: the test output data
- num_iterations: 10
- learning_rate: 1e-9
- print_cost: True
- batch_size: 5


In [18]:
run_sgd(data, target, data, target, 10, 1e-9, True, 5)

6782.30311364966
5368.591912249655
4569.723944316613
4103.017060428749
3817.213452158155
3631.1047683557113
3500.891068293897
3402.8009234066453
3323.8228860831937


In [19]:
run_sgd(data, target, data, target, 10000, 1e-7, True, 2)


254.4691855871671
251.60411816589033
250.20529992095058
249.35875032670168
248.74750648195587
248.24829155550086
247.8080004010619
247.4019129312684
247.01786599555078
246.64960436048682
246.2937618049513
245.94841987428612
245.61239923514756
245.2849053505749
244.96534951266221
244.65325775521475
244.34822410986152
244.0498863294781
243.75791303103418
243.47199666318255
243.1918494606881
242.91720094422215
242.6477962321686
242.38339479042145
242.12376942860698
241.86870544388225
241.6179998606334
241.37146073845136
241.1289065330761
240.890165501344
240.65507514448146
240.42348168584888
240.1952395802047
239.97021105211948
239.7482656615152
239.52927989453863
239.31313677814754
239.09972551692394
238.88894115074348
238.68068423202595
238.47486052138305
238.27138070055693
238.07016010161885
237.87111845146208
237.67417963068766
237.47927144603716
237.28632541558255
237.09527656592925
236.90606324073914
236.71862691991888
236.53291204886148
236.34886587716642
236.16643830629556
235.985

In [20]:
# run_sgd(data, target, data, target, 10, 1e-9, True, 5)


# Q3 Iris Dataset
The code is loading the Iris dataset from the sklearn library, which is a well-known dataset in the field of machine learning and contains 150 samples of iris flowers with 4 attributes.

The load_iris() function is used to load the dataset, which returns an object containing the data and target values.

The data is stored in the variable `X` and the targets are stored in the variable `Y`.

Then the targets are one-hot encoded: each numerical label is converted into a binary vector whose length equals the number of classes, with a single 1 at the index of the sample's class.


In [31]:
# load iris dataset from sklearn
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
Y = iris.target
# one-hot encode Y
Y = np.eye(3)[Y]

# for feature in iris.feature_names:
#     print(feature)

print("Shape of X", X.shape)

Shape of X (150, 4)


In [32]:
Y.shape

(150, 3)
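
The `np.eye(3)[Y]` idiom used above can be illustrated on a few made-up labels. Indexing the identity matrix with an integer array selects one row per label, and `argmax` along the rows recovers the original class indices.

```python
import numpy as np

labels = np.array([0, 2, 1, 2])          # hypothetical class indices
one_hot = np.eye(3)[labels]              # row i is the labels[i]-th row of the identity
print(one_hot)

recovered = one_hot.argmax(axis=1)       # decode back to class indices
print(recovered)
```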