In [None]:
#Importing libraries
import torch
import math


**The objective here is to implement a multi-layer perceptron with one hidden layer from scratch and test it on MNIST.** 


**Activation function**

- We create two functions, sigma and its derivative.
- The sigma function will be applied to data for non-linear transformation.
- We need the derivative function of sigma in calculating backward propagation.

In [None]:
#Function to calculate the activation value i.e tanh
def sigma(x):
     return torch.tanh(x)

#Function to calculate derivative of activation function i.e tanh
def dsigma(x):
    return 1-(sigma(x))**2


**Loss**

- Here we calculate loss, where we sum the square difference of the predicted values of the target (v) and the actual values of the target (t).
- The loss is defined as: $||t-v||^2 = \sum (t-v)^2$.
- We also calculated the derivative of the of the loss which will help us get calculating the backward propagation.

In [19]:
#Function to calculate loss of a function
def loss(v,t):
    return (v - t).pow(2).sum()

#Function to calculate derivative of the loss function
def dloss(v,t):
    return 2*(v-t)

**Forward and backward passes**



- Here, we define a forward function, which takes the inputs, and the weigth and bias of two layers. This function computes forward pass in the network.
- For each neuron present in the network, forward pass involves two steps:

1.  Pre-activation represented by 's': it is a weighted sum of inputs (w and x) plus bias (b).
- $s = w^Tx + b $
2. Activation represented by 'x': activation function here is tanh.
- $\sigma(s)$
- The activation output of the final layer is the same as the predicted value of the network i.e x2.

- We also defined a backward function to calculate the back propagation. It updates the gradients of the model i.e weight and bias.

In [20]:
#Function for computng forward pass in the network.
def forward_pass(w1, b1, w2, b2, x):
    x0 = x
    s1 = torch.matmul(w1,x0) + b1
    x1 = sigma(s1)
    s2 =torch.matmul(w2,x1) + b2
    x2 = sigma(s2)

    return x0, s1, x1, s2, x2

#Function for computing backward pass in the network.
def backward_pass(w1, b1, w2, b2,
                  t,
                  x, s1, x1, s2, x2,
                  dl_dw1, dl_db1, dl_dw2, dl_db2):
    x0 = x
    dl_dx2 = dloss(x2, t)
    dl_ds2 = dsigma(s2) * dl_dx2
    dl_dx1 = torch.matmul(w2.t(),dl_dx2)

    dl_ds1 = dsigma(s1) * dl_dx1

    
    dl_dw2.add_(dl_ds2.view(-1, 1).mm(x1.view(1, -1)))
    dl_db2.add_(dl_ds2)
    dl_dw1.add_(dl_ds1.view(-1, 1).mm(x0.view(1, -1)))
    dl_db1.add_(dl_ds1)



**Training the network**


- Here we write the code to train and test the neural network with one hidden layer of 50 units. 
- The network have an input size of 784, which is the dimension of the MNIST training set, and an output size of 10, which is the number of classes.
1. We downloaded the data using the provided prologue.load_data function, with one-hot label vectors and normalized inputs.
- We multiplied the target label vector by $\zeta =0.9$ (so that they are strictly in the value range of tanh).
2. We created four weight and bias tensors, and fill them with random values sampled according to N(0,epsilon) with epsilon = $1e^{-6}$.
3. We created four tensor to sum up the gradients on individual samples, with respect to weigths and biases.

4. We performed 1,000 gradiented steps aith a step size $\eta$ equal to 0.1 divided by the number of training samples.
- Each of the above steps required to reset to zero the tensors for summing up the gradients (to avoid the gradients from recording a running tally of all the operations that had happened), and doing a forward and backward pass for each training example.
- We computed and printed the training loss, training error and test error after every step using the class of maximum response as the predicted one.

In [22]:
#Loading  data
import dlc_practical_prologue as prologue
train_input, train_target, test_input, test_target = prologue.load_data(one_hot_labels = True,
                                                                        normalize = True)
#Creating the number of classes, which is 10
n_classes = train_target.size(1)
#Creating the number of training samples whish is 784
n_train_samples = train_input.size(0)


zeta = 0.90

#Multiplying zeta with training input
train_input = train_input * zeta
#Multiplying zeta with testing input
test_input = test_input * zeta


#Number of units/neurons in one hidden layer
n_hidden = 50
#Gradient step size or learning rate
eta = 1e-1 / n_train_samples

epsilon = 1e-6

#Initalizing weights and bias tensors
w1 = torch.normal(0,epsilon,size=(n_hidden, train_input.size(1)))
b1 = torch.empty(n_hidden).normal_(0, epsilon)
w2 = torch.normal(0,epsilon, size = (n_classes, n_hidden))
b2 = torch.empty(n_classes).normal_(0, epsilon)

dl_dw1 = torch.empty(w1.size())
dl_db1 = torch.empty(b1.size())
dl_dw2 = torch.empty(w2.size())
dl_db2 = torch.empty(b2.size())

#Creating loop to perform 1000 gradient steps
for k in range(1000):

    # Forwrad and backward pass

    acc_loss = 0
    n_train_errors = 0
     
    #Reseting to zero the tensors for summing up the gradients
    dl_dw1.zero_()
    dl_db1.zero_()
    dl_dw2.zero_()
    dl_db2.zero_()

    for n in range(n_train_samples):
        x0, s1, x1, s2, x2 = forward_pass(w1, b1, w2, b2, train_input[n])

        pred = x2.max(0)[1].item()
        if train_target[n, pred] < 0.5: n_train_errors = n_train_errors + 1
        acc_loss = acc_loss + loss(x2, train_target[n])

        backward_pass(w1, b1, w2, b2,
                      train_target[n],
                      x0, s1, x1, s2, x2,
                      dl_dw1, dl_db1, dl_dw2, dl_db2)

    # Gradient step, updating weights and bias

    w1 = w1 - eta * dl_dw1
    b1 = b1 - eta * dl_db1
    w2 = w2 - eta * dl_dw2
    b2 = b2 - eta * dl_db2

    # Testing errors

    n_test_errors = 0

    for n in range(test_input.size(0)):
        _, _, _, _, x2 = forward_pass(w1, b1, w2, b2, test_input[n])

        #Maximum response as the as the predicted one.
        pred = x2.max(0)[1].item()
        if test_target[n, pred] < 0.5: n_test_errors = n_test_errors + 1

    #Printing training loss, training error, and test error 
    print('{:d} acc_train_loss {:.02f} acc_train_error {:.02f}% test_error {:.02f}%'
          .format(k,
                  acc_loss,
                  (100 * n_train_errors) / train_input.size(0),
                  (100 * n_test_errors) / test_input.size(0)))


* Using MNIST
** Reduce the data-set (use --full for the full thing)
** Use 1000 train and 1000 test samples




0 acc_train_loss 1000.00 acc_train_error 91.30% test_error 90.10%
1 acc_train_loss 963.68 acc_train_error 88.30% test_error 90.10%
2 acc_train_loss 940.46 acc_train_error 88.30% test_error 90.10%
3 acc_train_loss 925.61 acc_train_error 88.30% test_error 90.10%
4 acc_train_loss 916.12 acc_train_error 88.30% test_error 90.10%
5 acc_train_loss 910.03 acc_train_error 88.30% test_error 90.10%
6 acc_train_loss 906.13 acc_train_error 88.30% test_error 90.10%
7 acc_train_loss 903.63 acc_train_error 88.30% test_error 90.10%
8 acc_train_loss 902.01 acc_train_error 88.30% test_error 90.10%
9 acc_train_loss 900.98 acc_train_error 88.30% test_error 90.10%
10 acc_train_loss 900.32 acc_train_error 88.30% test_error 90.10%
11 acc_train_loss 899.88 acc_train_error 88.30% test_error 90.10%
12 acc_train_loss 899.61 acc_train_error 88.30% test_error 90.10%
13 acc_train_loss 899.43 acc_train_error 88.30% test_error 90.10%
14 acc_train_loss 899.31 acc_train_error 88.30% test_error 90.10%
15 acc_train_loss 8

124 acc_train_loss 348.53 acc_train_error 12.70% test_error 21.10%
125 acc_train_loss 363.61 acc_train_error 11.00% test_error 24.30%
126 acc_train_loss 384.24 acc_train_error 13.80% test_error 22.90%
127 acc_train_loss 390.93 acc_train_error 11.90% test_error 23.60%
128 acc_train_loss 373.40 acc_train_error 13.40% test_error 20.60%
129 acc_train_loss 349.50 acc_train_error 10.90% test_error 20.70%
130 acc_train_loss 336.96 acc_train_error 10.40% test_error 19.90%
131 acc_train_loss 331.12 acc_train_error 10.90% test_error 20.10%
132 acc_train_loss 330.07 acc_train_error 9.60% test_error 20.40%
133 acc_train_loss 331.13 acc_train_error 11.70% test_error 20.30%
134 acc_train_loss 332.57 acc_train_error 9.40% test_error 21.20%
135 acc_train_loss 335.23 acc_train_error 11.30% test_error 19.70%
136 acc_train_loss 335.71 acc_train_error 9.30% test_error 21.00%
137 acc_train_loss 336.14 acc_train_error 10.80% test_error 19.40%
138 acc_train_loss 333.60 acc_train_error 9.30% test_error 20.50%

248 acc_train_loss 196.96 acc_train_error 4.40% test_error 16.20%
249 acc_train_loss 195.55 acc_train_error 4.90% test_error 14.90%
250 acc_train_loss 194.45 acc_train_error 4.30% test_error 16.00%
251 acc_train_loss 193.07 acc_train_error 4.80% test_error 14.90%
252 acc_train_loss 192.08 acc_train_error 4.30% test_error 16.00%
253 acc_train_loss 190.81 acc_train_error 4.80% test_error 15.00%
254 acc_train_loss 189.99 acc_train_error 4.00% test_error 15.80%
255 acc_train_loss 188.88 acc_train_error 4.80% test_error 15.20%
256 acc_train_loss 188.27 acc_train_error 4.00% test_error 15.80%
257 acc_train_loss 187.35 acc_train_error 4.70% test_error 15.30%
258 acc_train_loss 186.97 acc_train_error 4.10% test_error 15.70%
259 acc_train_loss 186.26 acc_train_error 4.60% test_error 15.30%
260 acc_train_loss 186.15 acc_train_error 4.10% test_error 15.90%
261 acc_train_loss 185.65 acc_train_error 4.60% test_error 15.50%
262 acc_train_loss 185.84 acc_train_error 4.10% test_error 16.00%
263 acc_tr

373 acc_train_loss 154.62 acc_train_error 3.70% test_error 16.40%
374 acc_train_loss 153.45 acc_train_error 2.40% test_error 15.30%
375 acc_train_loss 150.57 acc_train_error 3.50% test_error 17.30%
376 acc_train_loss 150.10 acc_train_error 2.80% test_error 15.10%
377 acc_train_loss 148.31 acc_train_error 3.40% test_error 17.30%
378 acc_train_loss 147.82 acc_train_error 2.90% test_error 15.10%
379 acc_train_loss 146.29 acc_train_error 3.40% test_error 17.00%
380 acc_train_loss 145.41 acc_train_error 3.10% test_error 15.20%
381 acc_train_loss 143.87 acc_train_error 3.30% test_error 17.20%
382 acc_train_loss 142.72 acc_train_error 3.20% test_error 15.30%
383 acc_train_loss 141.34 acc_train_error 3.30% test_error 17.20%
384 acc_train_loss 140.27 acc_train_error 3.10% test_error 15.30%
385 acc_train_loss 139.34 acc_train_error 3.20% test_error 16.90%
386 acc_train_loss 138.72 acc_train_error 3.10% test_error 15.60%
387 acc_train_loss 138.48 acc_train_error 3.20% test_error 16.90%
388 acc_tr

498 acc_train_loss 110.07 acc_train_error 1.40% test_error 15.90%
499 acc_train_loss 109.81 acc_train_error 1.60% test_error 15.10%
500 acc_train_loss 109.35 acc_train_error 1.40% test_error 16.10%
501 acc_train_loss 110.16 acc_train_error 1.70% test_error 14.90%
502 acc_train_loss 111.01 acc_train_error 1.50% test_error 16.40%
503 acc_train_loss 113.35 acc_train_error 1.60% test_error 15.20%
504 acc_train_loss 116.10 acc_train_error 1.50% test_error 16.80%
505 acc_train_loss 120.62 acc_train_error 1.70% test_error 14.90%
506 acc_train_loss 125.74 acc_train_error 1.80% test_error 17.40%
507 acc_train_loss 132.31 acc_train_error 1.70% test_error 15.10%
508 acc_train_loss 138.30 acc_train_error 1.90% test_error 17.50%
509 acc_train_loss 143.46 acc_train_error 1.80% test_error 15.10%
510 acc_train_loss 145.29 acc_train_error 1.90% test_error 17.10%
511 acc_train_loss 143.80 acc_train_error 1.60% test_error 15.20%
512 acc_train_loss 139.05 acc_train_error 1.70% test_error 16.90%
513 acc_tr

623 acc_train_loss 109.73 acc_train_error 1.10% test_error 16.10%
624 acc_train_loss 108.65 acc_train_error 0.90% test_error 15.10%
625 acc_train_loss 107.03 acc_train_error 1.00% test_error 16.00%
626 acc_train_loss 105.70 acc_train_error 0.90% test_error 15.30%
627 acc_train_loss 104.04 acc_train_error 1.00% test_error 15.90%
628 acc_train_loss 102.75 acc_train_error 0.90% test_error 15.30%
629 acc_train_loss 101.29 acc_train_error 1.00% test_error 15.80%
630 acc_train_loss 100.17 acc_train_error 0.80% test_error 15.40%
631 acc_train_loss 99.00 acc_train_error 1.00% test_error 15.80%
632 acc_train_loss 98.08 acc_train_error 0.90% test_error 15.40%
633 acc_train_loss 97.16 acc_train_error 0.90% test_error 15.70%
634 acc_train_loss 96.41 acc_train_error 0.90% test_error 15.40%
635 acc_train_loss 95.70 acc_train_error 0.90% test_error 15.70%
636 acc_train_loss 95.08 acc_train_error 0.90% test_error 15.50%
637 acc_train_loss 94.51 acc_train_error 0.80% test_error 15.90%
638 acc_train_los

749 acc_train_loss 84.76 acc_train_error 0.60% test_error 15.60%
750 acc_train_loss 86.24 acc_train_error 0.50% test_error 15.70%
751 acc_train_loss 88.49 acc_train_error 0.60% test_error 15.90%
752 acc_train_loss 90.38 acc_train_error 0.50% test_error 15.60%
753 acc_train_loss 92.97 acc_train_error 0.50% test_error 15.80%
754 acc_train_loss 94.74 acc_train_error 0.50% test_error 15.70%
755 acc_train_loss 96.98 acc_train_error 0.50% test_error 16.20%
756 acc_train_loss 97.77 acc_train_error 0.50% test_error 15.60%
757 acc_train_loss 98.80 acc_train_error 0.50% test_error 16.40%
758 acc_train_loss 98.06 acc_train_error 0.60% test_error 15.60%
759 acc_train_loss 97.56 acc_train_error 0.50% test_error 16.40%
760 acc_train_loss 95.63 acc_train_error 0.60% test_error 15.60%
761 acc_train_loss 94.15 acc_train_error 0.50% test_error 16.00%
762 acc_train_loss 91.94 acc_train_error 0.60% test_error 15.40%
763 acc_train_loss 90.33 acc_train_error 0.50% test_error 15.90%
764 acc_train_loss 88.59 

875 acc_train_loss 82.00 acc_train_error 0.40% test_error 16.10%
876 acc_train_loss 79.56 acc_train_error 0.30% test_error 16.00%
877 acc_train_loss 78.54 acc_train_error 0.40% test_error 15.90%
878 acc_train_loss 76.25 acc_train_error 0.30% test_error 16.10%
879 acc_train_loss 75.33 acc_train_error 0.40% test_error 16.00%
880 acc_train_loss 73.44 acc_train_error 0.30% test_error 16.00%
881 acc_train_loss 72.79 acc_train_error 0.40% test_error 15.80%
882 acc_train_loss 71.36 acc_train_error 0.30% test_error 15.90%
883 acc_train_loss 71.04 acc_train_error 0.40% test_error 15.90%
884 acc_train_loss 70.06 acc_train_error 0.30% test_error 15.90%
885 acc_train_loss 70.09 acc_train_error 0.40% test_error 15.90%
886 acc_train_loss 69.52 acc_train_error 0.30% test_error 16.20%
887 acc_train_loss 69.92 acc_train_error 0.40% test_error 15.80%
888 acc_train_loss 69.74 acc_train_error 0.30% test_error 15.90%
889 acc_train_loss 70.53 acc_train_error 0.40% test_error 15.80%
890 acc_train_loss 70.74 

In [1]:
accuracy = 100-16.80
accuracy

83.2

**Conclusion**

We achieved $0.10\%$ training error and $16.80\%$ testing error.

The accuracy is $83.2\%$