<center>
In God We Trust
</center>

# CE417: Artificial Intelligence

Dr. Mahdiyeh Soleymani Baghshah, Associate Professor

Computer Engineering Department,
Sharif University of Technology,
Tehran, Tehran, Iran


# MLP MNIST Classifier (20 Points)

Corresponding TAs: Parham Saremi, Aryan Ahadinia

In this question, we implement a dense neural network from scratch and use it to train a model for MNIST classification. MNIST is a dataset of 28 by 28 pixel images of handwritten digits. You are going to implement the neural network using NumPy. You are NOT PERMITTED to use any library other than NumPy.

**Required features of the model**:
Your implementation should be parameterized and dynamic, meaning that your MLP can be instantiated with any number of layers and any size for each layer.

**NOTE**: Most of your score is for your implementation and for the existence and quality of your results. (The final numbers do not matter that much, but your model's ability to learn is important.)

**NOTE**: Your module's logic must be implemented in NumPy without any Python `for` loops (or `while` loops :)). However, you may use `for` loops to iterate over the layers.


In [1]:
# You are not allowed to add any other packages.

from abc import abstractmethod
import numpy as np
from tqdm import tqdm


## Loss Function (3 Points)

The loss function is one of the most important parts of most ML methods. In this part, we implement loss functions. We have provided an abstract class for loss functions, `LossFunction`. In the following cells, you have to implement Mean Squared Error, Mean Absolute Error, and Cross Entropy Loss by writing their `forward` and `backward` methods.

Hint: In the `forward` call, save the variables you will need as fields of the class instance. You are going to need them in the `backward` call to calculate the gradient.
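One way to confirm that a `backward` method matches its `forward` is a finite-difference check. The following is a minimal sketch, assuming predictions and targets are stored one sample per column, shape `(outputs, m)`. The names `SquaredErrorSketch` and `numerical_gradient` are hypothetical and exist only for this illustration.

```python
import numpy as np


class SquaredErrorSketch:
    """Illustrative loss: sum of squared errors divided by the batch size m."""

    def forward(self, y_hat, y):
        # Cache everything backward will need
        self.y_hat, self.y, self.m = y_hat, y, y.shape[1]
        return np.sum((y_hat - y) ** 2) / self.m

    def backward(self):
        # Gradient of the loss with respect to y_hat
        return 2 * (self.y_hat - self.y) / self.m


def numerical_gradient(loss, y_hat, y, eps=1e-6):
    """Central finite differences of loss.forward with respect to y_hat."""
    grad = np.zeros_like(y_hat)
    # A Python loop is fine here: this is a test helper, not model logic
    for idx in np.ndindex(*y_hat.shape):
        step = np.zeros_like(y_hat)
        step[idx] = eps
        grad[idx] = (loss.forward(y_hat + step, y) - loss.forward(y_hat - step, y)) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
y_hat = rng.normal(size=(3, 4))
y = rng.normal(size=(3, 4))

loss = SquaredErrorSketch()
numeric = numerical_gradient(loss, y_hat, y)
loss.forward(y_hat, y)  # re-cache the unperturbed inputs before calling backward
analytic = loss.backward()
max_abs_diff = np.max(np.abs(analytic - numeric))
print(max_abs_diff)
```

If the scaling in `forward` and `backward` disagrees (for example, a mean over all elements in one and a division by `m` in the other), the difference printed here will not be close to zero.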


In [2]:
class LossFunction:
    @abstractmethod
    def forward(self, y_hat, y):
        raise NotImplementedError

    @abstractmethod
    def backward(self):
        raise NotImplementedError


In [3]:
class MeanSquaredError(LossFunction):
    def forward(self, y_hat, y):
        # Hint: Saving some fields for backward
        # y_hat and y have shape (outputs, m), where m is the batch size
        self.m = y.shape[1]
        self.y_hat = y_hat
        self.y = y
        # loss = sum((y_hat - y) ** 2) / m
        ###################################
        ############ CODE HERE ############
        ###################################
        return np.sum(np.square(y_hat - y)) / self.m

    def backward(self):
        ###################################
        ############ CODE HERE ############
        ###################################
        # d(loss)/d(y_hat), using the same 1/m scaling as forward
        return 2 * (self.y_hat - self.y) / self.m

    def __repr__(self):
        return "Mean Squared Error"

    def __str__(self):
        return self.__repr__()


In [4]:
class MeanAbsoluteError(LossFunction):
    def forward(self, y_hat, y):
        self.m = y.shape[1]
        self.y_hat = y_hat
        self.y = y
        ###################################
        ############ CODE HERE ############
        ###################################
        # loss = sum(|y_hat - y|) / m
        return np.sum(np.abs(y_hat - y)) / self.m

    def backward(self):
        ###################################
        ############ CODE HERE ############
        ###################################
        # The derivative of |u| is sign(u) (taken as 0 at u == 0)
        return np.sign(self.y_hat - self.y) / self.m

    def __repr__(self):
        return "Mean Absolute Error"

    def __str__(self):
        return self.__repr__()


In [5]:
class CrossEntropyLoss(LossFunction):
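    def forward(self, y_hat, y):
        self.m = y.shape[1]
        # Clip predictions away from 0 and 1 so that the logs here and the
        # division in backward stay finite
        self.y_hat = np.clip(y_hat, 1e-12, 1 - 1e-12)
        self.y = y
        ###################################
        ############ CODE HERE ############
        ###################################
        # Element-wise (binary) cross entropy, summed and divided by the batch size
        return -np.sum(self.y * np.log(self.y_hat) + (1 - self.y) * np.log(1 - self.y_hat)) / self.m

    def backward(self):
        ###################################
        ############ CODE HERE ############
        ###################################
        return (self.y_hat - self.y) / (self.y_hat * (1 - self.y_hat) * self.m)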

    def __repr__(self):
        return "Cross Entropy Loss"

    def __str__(self):
        return self.__repr__()


## Activation Functions (3 Points)

Now we are going to implement some activation functions: Sigmoid, Leaky ReLU, and Softmax. You have to implement both the `forward` and `backward` methods for each class.
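Softmax differs from Sigmoid and Leaky ReLU in that every output depends on every input in the same column, so its exact derivative is a Jacobian rather than an element-wise factor. The sketch below shows one loop-free way to apply that Jacobian to an upstream gradient, assuming one sample per column. The function names `softmax` and `softmax_backward` are hypothetical and take the upstream gradient explicitly, which the `ActivationFunction.backward()` interface in this notebook does not.

```python
import numpy as np


def softmax(z):
    # Shift by the column-wise max so that exp cannot overflow
    shifted = z - np.max(z, axis=0, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=0, keepdims=True)


def softmax_backward(s, da):
    """Return dL/dz given s = softmax(z) and da = dL/ds, column by column."""
    # ds_i/dz_k = s_i * (delta_ik - s_k), hence dL/dz_k = s_k * (da_k - sum_i da_i * s_i)
    return s * (da - np.sum(da * s, axis=0, keepdims=True))


rng = np.random.default_rng(1)
z = rng.normal(size=(5, 3))
da = rng.normal(size=(5, 3))
s = softmax(z)
dz = softmax_backward(s, da)

# Finite-difference check on one entry, using the scalar L = sum(da * softmax(z))
eps = 1e-6
bump = np.zeros_like(z)
bump[2, 1] = eps
numeric = (np.sum(da * softmax(z + bump)) - np.sum(da * softmax(z - bump))) / (2 * eps)
print(dz[2, 1], numeric)
```

Each column of `dz` sums to zero, which reflects the fact that adding a constant to all inputs of a softmax leaves its output unchanged.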


In [6]:
class ActivationFunction:
    @abstractmethod
    def forward(self, x):
        raise NotImplementedError

    @abstractmethod
    def backward(self):
        raise NotImplementedError


In [7]:
class Sigmoid(ActivationFunction):
    def forward(self, x):
        self.x = x
        ###################################
        ############ CODE HERE ############
        ###################################
        # Cache the output so that backward does not recompute it
        self.out = 1 / (1 + np.exp(-x))
        return self.out

    def backward(self):
        ###################################
        ############ CODE HERE ############
        ###################################
        return self.out * (1 - self.out)

    def __repr__(self) -> str:
        return "Sigmoid"

    def __str__(self) -> str:
        return self.__repr__()


In [8]:
class LeakyReLU(ActivationFunction):
    def __init__(self, alpha=0.01):
        self.alpha = alpha

    def forward(self, x):
        self.x = x
        ###################################
        ############ CODE HERE ############
        ###################################
        return np.maximum(x, self.alpha * x)

    def backward(self):
        ###################################
        ############ CODE HERE ############
        ###################################
        return np.where(self.x > 0, 1, self.alpha)

    def __repr__(self) -> str:
        return "Leaky ReLU"

    def __str__(self) -> str:
        return self.__repr__()


In [21]:
class Softmax(ActivationFunction):
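    def forward(self, x):
        self.x = x
        ###################################
        ############ CODE HERE ############
        ###################################
        # Subtract the column-wise max before exponentiating to avoid overflow
        exp = np.exp(x - np.max(x, axis=0, keepdims=True))
        self.out = exp / np.sum(exp, axis=0, keepdims=True)
        return self.out

    def backward(self):
        ###################################
        ############ CODE HERE ############
        ###################################
        # Element-wise (diagonal) part of the softmax Jacobian only; the
        # off-diagonal terms -s_i * s_j do not fit this element-wise interface.
        # Multiplied by CrossEntropyLoss.backward(), it yields (y_hat - y) / m.
        return self.out * (1 - self.out)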

    def __repr__(self) -> str:
        return "Softmax"

    def __str__(self) -> str:
        return self.__repr__()


## Dense Layer (4 Points)

Now it is time to implement a single dense layer. Each dense layer has an input size, an output size, and an activation function. You have to implement two methods: `forward` and `backward`.

Hint: The `backward` method receives the gradient from the previous backward step as input. It has to calculate the gradients of the weights and biases, save them in the class instance, and return the gradient with respect to the layer's input as output.
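The shapes involved in a dense layer's backward step can be sketched as follows, assuming one sample per column and a pre-activation `z = w @ x + b`. All variable names here are illustrative; `dz` stands for the gradient with respect to `z`, that is, the incoming gradient already multiplied by the activation's derivative.

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_out, m = 4, 3, 5
w = rng.normal(size=(n_out, n_in))
b = np.zeros((n_out, 1))
x = rng.normal(size=(n_in, m))    # layer input, one column per sample
dz = rng.normal(size=(n_out, m))  # gradient with respect to z = w @ x + b

dw = dz @ x.T                           # (n_out, n_in): uses the layer INPUT x
db = np.sum(dz, axis=1, keepdims=True)  # (n_out, 1): sum over the batch
dx = w.T @ dz                           # (n_in, m): passed on to the previous layer

# Finite-difference check for one weight, using the scalar L = sum(dz * z)
eps = 1e-6
w_plus, w_minus = w.copy(), w.copy()
w_plus[1, 2] += eps
w_minus[1, 2] -= eps
numeric_dw = (np.sum(dz * (w_plus @ x + b)) - np.sum(dz * (w_minus @ x + b))) / (2 * eps)
print(dw.shape, db.shape, dx.shape)
```

Note that `dw` needs the input `x` seen during the forward pass, so a layer has to keep a reference to it.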


In [10]:
class Layer:
    def __init__(self, input_size, output_size, activation):
        self.input_size = input_size
        self.output_size = output_size
        self.activation = activation

        self.w = np.random.randn(output_size, input_size) * np.sqrt(2 / input_size)
        self.b = np.zeros((output_size, 1))

        # Leave these fields unchanged, Use them in forward and backward
        self.z = None  # Output of transformation
        self.a = None  # Output of activation
        self.dw = None  # Gradient of weights
        self.db = None  # Gradient of biases

    def forward(self, x):
        ###################################
        ############ CODE HERE ############
        ###################################
        # x has shape (input_size, m); keep it for the weight gradient in backward
        self.x = x
        self.z = self.w @ x + self.b
        self.a = self.activation.forward(self.z)
        return self.a

    def backward(self, da):
        ###################################
        ############ CODE HERE ############
        ###################################
        dz = da * self.activation.backward()
        # The weight gradient uses the layer input x, not the layer output a
        self.dw = dz @ self.x.T
        self.db = np.sum(dz, axis=1, keepdims=True)
        return self.w.T @ dz

    def update(self, lr):
        self.w -= lr * self.dw
        self.b -= lr * self.db

    def __repr__(self) -> str:
        return f"Dense {self.input_size} -> {self.output_size} with {self.activation}"

    def __str__(self) -> str:
        return self.__repr__()


## Data (2 Points)

In the cells below, we have implemented the `Data` and `DataLoader` classes. Use these classes to load the MNIST dataset. Normalize the value of each pixel so that it lies in the interval [0, 1], and then shift the data so that it has zero mean. You have to download the MNIST data yourself; you are not permitted to use libraries to load it. Create two datasets: Train and Test.
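The steps described above can be sketched with NumPy alone. This is a minimal sketch under two assumptions: the data is available as raw bytes in the uncompressed IDX format (the format in which MNIST was originally distributed), and one global mean is subtracted from all pixels. The helper names `parse_idx` and `preprocess` are hypothetical, and the byte strings below are a tiny synthetic stand-in so that the example runs without any download.

```python
import numpy as np


def parse_idx(raw):
    """Decode an uncompressed unsigned-byte IDX byte string into a NumPy array."""
    header = np.frombuffer(raw, dtype=np.uint8, count=4)
    # Bytes 0-1 are zero, byte 2 is the type code (0x08 = unsigned byte), byte 3 is ndim
    if header[0] != 0 or header[1] != 0 or header[2] != 0x08:
        raise ValueError("not an unsigned-byte IDX payload")
    ndim = int(header[3])
    # Dimension sizes are stored as big-endian 32-bit unsigned integers
    dims = np.frombuffer(raw, dtype=">u4", count=ndim, offset=4)
    data = np.frombuffer(raw, dtype=np.uint8, offset=4 + 4 * ndim)
    return data.reshape(tuple(dims.tolist()))


def preprocess(images, labels, num_classes=10, mean=None):
    """Flatten, scale to [0, 1], shift to zero mean, and one-hot encode."""
    x = images.reshape(len(images), -1).astype(np.float64) / 255.0
    if mean is None:
        mean = x.mean()  # pass the training mean explicitly when preparing test data
    x = x - mean
    y = np.eye(num_classes)[labels]  # shape (samples, num_classes)
    return x, y, mean


# Synthetic payloads: two 2x2 "images" and their two labels
two = (2).to_bytes(4, "big")
fake_images = bytes([0, 0, 8, 3]) + two + two + two + bytes([0, 255, 0, 255, 255, 255, 0, 0])
fake_labels = bytes([0, 0, 8, 1]) + two + bytes([7, 3])

images = parse_idx(fake_images)
labels = parse_idx(fake_labels)
x, y, mean = preprocess(images, labels)
print(images.shape, labels, x.mean())
```

Downloaded MNIST files are often gzip-compressed, in which case they would need to be decompressed before their bytes are passed to a parser like this one.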


In [11]:
class Data:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __len__(self):
        return len(self.x)

    def __getitem__(self, index):
        return self.x[index], self.y[index]

    def __iter__(self):
        return iter(zip(self.x, self.y))


In [12]:
class DataLoader:
    def __init__(self, data, batch_size):
        self.data = data
        self.batch_size = batch_size

    def __iter__(self):
        for i in range(0, len(self.data), self.batch_size):
            yield self.data[i : i + self.batch_size]


In [13]:
###################################################
########## Load MNIST Dataset, CODE HERE ##########

###################################################
from keras.datasets import mnist
(train_X, train_y), (test_X, test_y) = mnist.load_data()


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


In [14]:
train_X

array([[[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       ...,
(output truncated)

In [15]:
train_y

array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

In [16]:
test_X

array([[[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       ...,
(output truncated)

## Neural Network (4 Points)

Now we are going to implement a neural network. You have to implement `forward`, `backward`, and `fit` functions.
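When writing `fit`, the order in which mini-batches are visited is a design choice. The `DataLoader` in this notebook walks through the data in a fixed order; reshuffling the samples at the start of every epoch is a common variation, which the assignment does not require. A minimal sketch, assuming samples are stored as rows; `shuffled_batches` is a hypothetical helper:

```python
import numpy as np


def shuffled_batches(x, y, batch_size, rng):
    """Yield (x_batch, y_batch) pairs in a fresh random order; samples are rows."""
    order = rng.permutation(len(x))
    # Looping over batches is fine; the work inside each batch stays vectorized
    for start in range(0, len(x), batch_size):
        idx = order[start : start + batch_size]
        yield x[idx], y[idx]


rng = np.random.default_rng(3)
x = np.arange(10).reshape(10, 1)
y = np.arange(10)
batches = list(shuffled_batches(x, y, batch_size=4, rng=rng))
print([len(xb) for xb, _ in batches])
```

Because the same index array selects rows from both `x` and `y`, every sample stays paired with its label.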


In [24]:
class NeuralNetwork:
    def __init__(self, layers, loss):
        self.layers = layers
        self.loss = loss

    def forward(self, x):
        ###################################
        ############ CODE HERE ############
        ###################################
        for layer in self.layers:
            x = layer.forward(x)
        return x

    def backward(self, da):
        ###################################
        ############ CODE HERE ############
        ###################################
        # Propagate the gradient from the last layer back to the first
        for layer in reversed(self.layers):
            da = layer.backward(da)
        return da

    def update(self, lr):
        for layer in self.layers:
            layer.update(lr)

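    def fit(self, data, epoch_number, learningRate, batch_size=1):
        ###################################
        ############ CODE HERE ############
        ###################################
        loader = DataLoader(data, batch_size)
        epoch_losses = []
        for epoch in range(epoch_number):
            batch_losses = []
            for x, y in loader:
                # DataLoader yields samples as rows; the layers expect columns.
                # Assumes y is one-hot with shape (batch, classes).
                x, y = x.reshape(len(x), -1).T, y.T
                y_hat = self.forward(x)
                # loss.forward caches y_hat and y, which loss.backward needs
                batch_losses.append(self.loss.forward(y_hat, y))
                self.backward(self.loss.backward())
                self.update(learningRate)
            epoch_losses.append(np.mean(batch_losses))
            print(f"Epoch {epoch} Loss: {epoch_losses[-1]}")
        # Mean loss per epoch, which can be used to draw the loss curve later
        return epoch_losses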

    def predict(self, x):
        return self.forward(x)

    def __repr__(self) -> str:
        return "\n".join([str(layer) for layer in self.layers])

    def __str__(self) -> str:
        return self.__repr__()


## Training Model (2 Points)

Now, use your neural network to train a model that predicts the class of MNIST images.

In [25]:
########################################
########## Train the Model #############
########################################

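'''
For training, it is enough to define two layers with sizes
128 - 10;
the network takes care of training them.

To define the layers, we use the
Layer
class that we defined above.
'''
# flatten each image to 784 features, scale pixels to [0, 1], shift to zero mean
x_train = train_X.reshape(len(train_X), -1) / 255.0
pixel_mean = x_train.mean()
x_train = x_train - pixel_mean
# one-hot encode the digit labels, shape (samples, 10)
y_train = np.eye(10)[train_y]
# define Data (fit builds its own DataLoader from it)
data = Data(x_train, y_train)
# define loss function
loss = CrossEntropyLoss()
# define layers
layer1 = Layer(28 * 28, 128, LeakyReLU())
layer2 = Layer(128, 10, Softmax())
layers = [layer1, layer2]
# define neural network
nn = NeuralNetwork(layers, loss)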

In [None]:
nn.fit(data, epoch_number = 2, learningRate = 0.1, batch_size=32)

## Loss Curve (1 Points)

Plot the loss for each epoch. The curve should be smooth and decreasing.
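One possible way to build such a curve is to average the per-batch losses of each epoch and plot the result. The sketch below assumes that every epoch has the same number of batches and that matplotlib is available for plotting; matplotlib is an assumption, since the notebook itself only permits NumPy for the model. The helper names and the loss values are made up for illustration and are not results of this notebook.

```python
import numpy as np


def epoch_means(batch_losses, epochs):
    """Average per-batch losses into one value per epoch (equal batches per epoch)."""
    losses = np.asarray(batch_losses, dtype=float)
    return losses.reshape(epochs, -1).mean(axis=1)


def plot_loss_curve(curve):
    # Imported lazily: matplotlib is an assumption and is only needed for drawing
    import matplotlib.pyplot as plt

    plt.plot(np.arange(1, len(curve) + 1), curve, marker="o")
    plt.xlabel("Epoch")
    plt.ylabel("Mean training loss")
    plt.title("Loss curve")
    plt.show()


# Made-up per-batch losses for 3 epochs of 2 batches each (illustration only)
curve = epoch_means([2.0, 1.6, 1.2, 1.0, 0.8, 0.6], epochs=3)
is_descending = bool(np.all(np.diff(curve) < 0))
print(curve, is_descending)
```

Calling `plot_loss_curve(curve)` would then draw the per-epoch values; the `is_descending` flag gives a quick numeric check of the "decreasing" requirement.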


In [19]:
########################################
########## Plot loss curve #############
########################################


## Evaluation (1 Points)

Now evaluate your model and measure how accurate it is on both the train and test datasets.


In [20]:
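def accuracy(y_hat, y):
    ###################################
    ############ CODE HERE ############
    ###################################
    # y_hat: (classes, m) predicted scores; y: (classes, m) one-hot labels
    # Fraction of samples whose highest-scoring class matches the label
    return np.mean(np.argmax(y_hat, axis=0) == np.argmax(y, axis=0))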


########################################
########## Calculate accuracy ##########
########################################
