# <center>**EE5179: Deep Learning for Imaging**</center>

# <center>Programming Assignment 1: MNIST Digit Classification using MLP</center>

# Table of contents
1. [Introduction](#introduction)
2. [Dataloading](#paragraph1)
3. [One hot encoding](#subparagraph1)
4. [Forward Propagation](#paragraph2)
5. [Backward Propagation](#paragraph3)
6. [Cost Function](#cost)
7. [Parameter Update](#parameter)

$I/P$ (784) → $h_1$ (500) → $h_2$ (250) → $h_3$ (100) → $O/P$ (10) \
Input Layer (784, number of samples) \
Hidden Layer 1 (500) \
Hidden Layer 2 (250) \
Hidden Layer 3 (100) \
Output Layer (10 units, one per class)

### Activation Functions



1. **Sigmoid** 
*sigmoid($x$)* = $\frac{1}{1+\exp(-x)}$

2. **Tanh** 
*tanh($x$)* = $\frac{\exp(x)-\exp(-x)}{\exp(x)+\exp(-x)}$

3. **ReLU** 
*f($x$)* = *max*(0, $x$)

4. **Softmax**

*softmax($z_i$)* = $\frac{\exp(z_i)}{\sum_j \exp(z_j)}$
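
The four activations above can be sketched in NumPy as follows. This is a minimal sketch rather than the assignment's reference code. The function names, the column-wise layout (one sample per column), and the max-subtraction step in `softmax` are assumptions added for illustration.

```python
import numpy as np

def sigmoid(x):
    # 1 / (1 + exp(-x)), applied element-wise
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # (exp(x) - exp(-x)) / (exp(x) + exp(-x)); NumPy provides this directly
    return np.tanh(x)

def relu(x):
    # max(0, x), applied element-wise
    return np.maximum(0.0, x)

def softmax(z):
    # z has shape (classes, samples); each column is normalised separately.
    # Subtracting the column maximum leaves the result unchanged but
    # avoids overflow in exp (an added assumption for numerical stability).
    shifted = z - np.max(z, axis=0, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=0, keepdims=True)
```

Each column of the `softmax` output sums to 1, so it can be read as a probability distribution over the classes.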

### Initialization


One common initialization scheme for deep NNs is Glorot (also known as Xavier) initialization. The idea is to initialize each weight with a small zero-mean random value whose variance depends on the fan-in and fan-out of the weight. The uniform variant is used here.

For example, each weight that connects an input node to a hidden node has a fan-in equal to the number of input nodes and a fan-out equal to the number of hidden nodes. In pseudo-code the initialization is: \
$w_{ij}$ ∼ $U$[−$M$, $M$] \
$M$ = $\sqrt{\frac{6}{N_i+N_o}}$ \
where $N_i$ is the number of inputs to the weight tensor and $N_o$ is the number of outputs of the weight tensor.
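```
import numpy as np

def initialise_parameter(dim):
  # dim lists the layer sizes, e.g. [784, 500, 250, 100, 10]
  np.random.seed(8)
  parameters = {}
  L = len(dim)
  for i in range(1, L):
    Ni = dim[i-1]  # fan-in of layer i
    No = dim[i]    # fan-out of layer i
    M = np.sqrt(6/(Ni+No))
    # Glorot-uniform weights with shape (No, Ni)
    parameters["W" + str(i)] = np.random.uniform(-M, M, (No, Ni))
    # biases start at zero
    parameters["b" + str(i)] = np.zeros((No, 1))
  return parameters
```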

All the biases in the model are initialized with zeros.
```np.zeros((dim[i], 1))```

Shape of the Weights and the Biases \
```         
            W1 (500, 784)
            b1 (500, 1)
            W2 (250, 500)
            b2 (250, 1)
            W3 (100, 250)
            b3 (100, 1)
            W4 (10, 100) 
            b4 (10, 1)
```
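
The shapes listed above follow directly from the layer sizes, which the short sketch below makes explicit. It also reports the Glorot bound $M$ for each layer and the total parameter count. The helper names `layer_summary` and `count_parameters` are hypothetical and not part of the assignment.

```python
import math

# Layer sizes from the architecture: input, three hidden layers, output
dim = [784, 500, 250, 100, 10]

def layer_summary(dim):
    """Return (W shape, b shape, Glorot bound M) for every layer."""
    rows = []
    for i in range(1, len(dim)):
        n_in, n_out = dim[i - 1], dim[i]
        bound = math.sqrt(6 / (n_in + n_out))
        rows.append(((n_out, n_in), (n_out, 1), bound))
    return rows

def count_parameters(dim):
    """Total number of weights and biases in the network."""
    return sum(dim[i] * dim[i - 1] + dim[i] for i in range(1, len(dim)))

for i, (w_shape, b_shape, bound) in enumerate(layer_summary(dim), start=1):
    print(f"W{i} {w_shape}  b{i} {b_shape}  M = {bound:.4f}")
print("total parameters:", count_parameters(dim))
```

For these layer sizes the network has 543,860 trainable parameters, most of them in the first weight matrix.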

### Forward Propagation

Forward propagation (or forward pass) refers to the calculation and storage of intermediate variables (including outputs) for a neural network in order from the input layer to the output layer. We now work step-by-step through the mechanics of a neural network with three hidden layers.



##### <center> $Z_1$ = $W_1$ \* $X$ + $b_1$       &nbsp; &nbsp; &nbsp; &nbsp;         Input to $h_1$ </center>  
#####        <center>  $Z_1$ = (500,784) *(784,60000) + (500,1) = (500,60000)</center>
##### <center> $A_1$ = *sigmoid*($Z_1$)  </center>  
#####        <center> (500,60000)</center>
##### <center> $Z_2$ = $W_2$ \* $A_1$ + $b_2$                    </center>  
#####        <center>  $Z_2$ = (250,500) *(500,60000) + (250,1) = (250,60000)</center>
##### <center> $A_2$ = *sigmoid*($Z_2$)   </center>  
#####        <center> (250,60000)</center>
##### <center> $Z_3$ = $W_3$ \* $A_2$ + $b_3$                  </center>  
#####        <center>  $Z_3$ = (100,250) *(250,60000) + (100,1) = (100,60000)</center>
##### <center> $A_3$ = *sigmoid*($Z_3$)   </center>  
#####        <center> (100,60000)</center>
##### <center> $Z_4$ = $W_4$ \* $A_3$ + $b_4$               </center>  
#####        <center>  $Z_4$ = (10,100) *(100,60000) + (10,1) = (10,60000)</center>
##### <center> $A_4$ =  *softmax* ($Z_4$)   </center>  
#####        <center> (10,60000)</center>
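
The equations above can be sketched as a single loop over the layers. This is a minimal sketch under stated assumptions: `init_params` and `forward` are hypothetical helpers, the seed and the batch of 32 random columns stand in for real MNIST data, and sigmoid is used in every hidden layer as in the equations.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    # Column-wise softmax; the max is subtracted for numerical stability
    e = np.exp(z - np.max(z, axis=0, keepdims=True))
    return e / np.sum(e, axis=0, keepdims=True)

def init_params(dim, seed=8):
    # Glorot-uniform weights and zero biases (hypothetical helper)
    rng = np.random.default_rng(seed)
    params = {}
    for i in range(1, len(dim)):
        m = np.sqrt(6 / (dim[i - 1] + dim[i]))
        params[f"W{i}"] = rng.uniform(-m, m, (dim[i], dim[i - 1]))
        params[f"b{i}"] = np.zeros((dim[i], 1))
    return params

def forward(X, params):
    # X has shape (784, N); returns the softmax output and a cache of Z, A
    n_layers = len(params) // 2
    cache = {"A0": X}
    A = X
    for i in range(1, n_layers + 1):
        Z = params[f"W{i}"] @ A + params[f"b{i}"]  # bias broadcasts over N
        A = softmax(Z) if i == n_layers else sigmoid(Z)
        cache[f"Z{i}"], cache[f"A{i}"] = Z, A
    return A, cache

# Random stand-in for a batch of 32 flattened MNIST images
X = np.random.default_rng(0).random((784, 32))
params = init_params([784, 500, 250, 100, 10])
A4, cache = forward(X, params)
print(A4.shape)  # (10, 32)
```

The cache keeps every $Z_i$ and $A_i$ because the backward pass needs them to compute the gradients.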


### Backward Propagation

Backpropagation refers to the method of calculating the gradient of neural network parameters. In short, the method traverses the network in reverse order, from the output to the input layer.
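
The reverse traversal can be sketched in NumPy as below. Several things here are assumptions, since the text does not fix them: the loss is taken to be the mean cross-entropy over one-hot labels (the usual pairing with a softmax output), the hidden layers use sigmoid, and the learning rate `lr` and all helper names are hypothetical. A tiny network with random data is used only to exercise the functions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - np.max(z, axis=0, keepdims=True))
    return e / np.sum(e, axis=0, keepdims=True)

def one_hot(labels, n_classes):
    # labels: integer array of shape (N,); returns shape (n_classes, N)
    Y = np.zeros((n_classes, labels.size))
    Y[labels, np.arange(labels.size)] = 1.0
    return Y

def forward(X, params):
    n_layers = len(params) // 2
    acts = [X]
    for i in range(1, n_layers + 1):
        Z = params[f"W{i}"] @ acts[-1] + params[f"b{i}"]
        acts.append(softmax(Z) if i == n_layers else sigmoid(Z))
    return acts  # acts[i] is A_i, with A_0 = X

def cross_entropy(A_out, Y):
    # Mean negative log-likelihood over the N samples (assumed loss)
    return -np.sum(Y * np.log(A_out + 1e-12)) / Y.shape[1]

def backward(acts, Y, params):
    n_layers = len(params) // 2
    N = Y.shape[1]
    grads = {}
    dZ = (acts[-1] - Y) / N  # gradient of softmax + cross-entropy w.r.t. Z_L
    for i in range(n_layers, 0, -1):
        grads[f"dW{i}"] = dZ @ acts[i - 1].T
        grads[f"db{i}"] = np.sum(dZ, axis=1, keepdims=True)
        if i > 1:
            dA = params[f"W{i}"].T @ dZ
            dZ = dA * acts[i - 1] * (1.0 - acts[i - 1])  # sigmoid derivative
    return grads

def sgd_step(params, grads, lr=0.1):
    # Plain gradient descent update; lr is a hypothetical learning rate
    return {key: value - lr * grads["d" + key] for key, value in params.items()}

# Tiny network and random data, only to exercise the functions
rng = np.random.default_rng(0)
dim = [6, 5, 4, 3]
params = {}
for i in range(1, len(dim)):
    m = np.sqrt(6 / (dim[i - 1] + dim[i]))
    params[f"W{i}"] = rng.uniform(-m, m, (dim[i], dim[i - 1]))
    params[f"b{i}"] = np.zeros((dim[i], 1))
X = rng.random((6, 8))
Y = one_hot(rng.integers(0, 3, size=8), 3)

loss_before = cross_entropy(forward(X, params)[-1], Y)
grads = backward(forward(X, params), Y, params)
params = sgd_step(params, grads)
loss_after = cross_entropy(forward(X, params)[-1], Y)
print(loss_before, loss_after)
```

With a small enough learning rate the loss after the update should typically be lower than before. A finite-difference check on a few weights is a useful way to confirm that the analytic gradients are correct.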

### Cost Function

### Gradient Descent

## Accuracy and Confusion Matrix 
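
One way to compute both quantities from the softmax output is sketched below. The function names, the convention that rows index the true class and columns the predicted class, and the hand-made three-class example are assumptions for illustration only.

```python
import numpy as np

def accuracy(probs, labels):
    # probs: (classes, N) network outputs; labels: (N,) true digits
    return float(np.mean(np.argmax(probs, axis=0) == labels))

def confusion_matrix(probs, labels, n_classes=10):
    # Entry [t, p] counts samples of true class t predicted as class p
    preds = np.argmax(probs, axis=0)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (labels, preds), 1)
    return cm

# Hand-made example with 3 classes and 4 samples
probs = np.array([[0.7, 0.1, 0.2, 0.3],
                  [0.2, 0.8, 0.3, 0.6],
                  [0.1, 0.1, 0.5, 0.1]])
labels = np.array([0, 1, 2, 0])
print(accuracy(probs, labels))  # 0.75
print(confusion_matrix(probs, labels, 3))
```

The diagonal of the confusion matrix holds the correctly classified samples, so its sum divided by the number of samples equals the accuracy.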

### Results 
* Sigmoid
* Tanh
* ReLU

All plots and numbers are to be included in a table.

| Activation Function | Train Accuracy | Test Accuracy |
| ----------- | ----------- | ----------- |
| Sigmoid | | |
| Tanh | | |
| ReLU | | |

### Package 
* PyTorch
* Plots and accuracy

\documentclass[12pt,a4paper]{article}

\usepackage{amsmath}
\usepackage[section]{algorithm}
\usepackage[numbered]{algo}
\usepackage{enumerate}

\begin{document}

\begin{algorithm}[h]
  \small
  \begin{enumerate}[(1)]
    \item For a sample $(x_n ,y^*_n)$, propagate the input $x_n$ through the
    network to compute the outputs $(v_{i_1}, \ldots, v_{i_{|V|}})$ (in topological order).
    \vspace{-6px}
    %\begin{enumerate}[(a)]
    %  \item Given a topological sort $V = (v_{i_1},\ldots,v_{i_{|V|}})$,
    %  sequentially compute the layers' outputs, also denoted by $v_{i_j}$.
    %  \item Then $y(x_n;w) = v_{i_{|V|}}$ is the network's output.
    %\end{enumerate}
    \item Compute the loss $\mathcal{L}_n := \mathcal{L}(v_{i_{|V|}}, y_n^*)$
    and its gradient
    \begin{align}
      \frac{\partial \mathcal{L}_n}{\partial v_{i_{|V|}}}.
    \end{align}
    \vspace{-6px}
    \item For each $j = |V|,\ldots,1$ compute
    \begin{align}
      \frac{\partial \mathcal{L}_n}{\partial w_j} =
      \frac{\partial \mathcal{L}_n}{\partial v_{i_{|V|}}} \prod_{k = j + 1}^{|V|} \frac{\partial v_{i_k}}{\partial v_{i_{k - 1}}}
      \frac{\partial v_{i_j}}{\partial w_j}.
    \end{align}
    where $w_j$ refers to the weights in node $i_j$.
    \vspace{-12px}
  \end{enumerate}
  \caption{Error backpropagation algorithm for a layered neural network
  represented as computation graph $G = (V,E)$.}
\end{algorithm}

\end{document}
