# Loading and normalizing data

In [1]:
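import mnist  # helper module (not shown); load() returns the four NumPy arrays below
import torch
import torch.nn as nn
from torch import tensor  # `import torch.tensor as tensor` fails on recent PyTorch versions
import torch.nn.functional as F
from time import time

x_train, t_train, x_test, t_test = mnist.load()

# Scale pixel values from [0, 255] to [0, 1]; labels become int64 class indices
xtrain = tensor(x_train / 255, dtype=torch.float)
ttrain = tensor(t_train, dtype=torch.int64)
xtest = tensor(x_test / 255, dtype=torch.float)
ttest = tensor(t_test, dtype=torch.int64)
print(xtrain.shape)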
print(xtrain.shape)

torch.Size([60000, 784])


# Setting up the model
We define a class Net that inherits from **nn.Module**.

Initial model:

**nn.Linear**: applies a linear transformation to the incoming data.
1. The first layer *ly1* is a linear transformation from 784 inputs (one per pixel) to 80 neurons, followed by a ReLU activation.
2. The second layer *ly2* is a linear transformation from 80 neurons to 30 neurons, followed by a ReLU activation.
3. The third layer *ly3* is a linear transformation from 30 neurons to 10 outputs (one score per digit 0-9).
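As a quick check on model size, the number of trainable parameters can be worked out by hand: each fully connected layer with `in` inputs and `out` outputs holds an `out × in` weight matrix plus `out` biases. A small pure-Python sketch (the helper name `count_linear_params` is ours, not a PyTorch function):

```python
def count_linear_params(in_features: int, out_features: int) -> int:
    """Trainable values in one fully connected layer:
    an out x in weight matrix plus one bias per output unit."""
    return in_features * out_features + out_features

# Layer sizes used by Net: 784 -> 80 -> 30 -> 10
layer_sizes = [(784, 80), (80, 30), (30, 10)]

per_layer = [count_linear_params(i, o) for i, o in layer_sizes]
total = sum(per_layer)

print(per_layer)  # [62800, 2430, 310]
print(total)      # 65540
```

With PyTorch available, `sum(p.numel() for p in net.parameters())` should report the same total for the `Net` defined in the next cell.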

**Note:** We also tried dropout: *ly1_drop* drops out 50% of the output of *ly1*, and *ly2_drop* drops out 50% of the output of *ly2*. Since the model did not perform well with dropout here, those lines are commented out.
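For reference, a dropout layer such as `nn.Dropout(0.5)` behaves differently in training and evaluation: in training mode it zeroes each element independently with probability `p` and scales the surviving elements by `1/(1-p)`, so the expected activation is unchanged; in evaluation mode it passes the input through. A minimal pure-Python sketch of that behavior (`inverted_dropout` is an illustrative helper, not a PyTorch function):

```python
import random

def inverted_dropout(values, p=0.5, training=True, rng=None):
    """Zero each value with probability p and scale the survivors by 1/(1-p).

    In evaluation mode (training=False) the input is returned unchanged.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(values)
    rng = rng if rng is not None else random.Random()
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]

rng = random.Random(0)  # fixed seed so the example is repeatable
activations = [1.0, 2.0, 3.0, 4.0]
print(inverted_dropout(activations, p=0.5, rng=rng))         # each entry is 0.0 or doubled
print(inverted_dropout(activations, p=0.5, training=False))  # unchanged
```

Because of this mode switch, if the dropout lines are re-enabled, calling `net.eval()` before evaluating on the test set (and `net.train()` before training) keeps dropout from being applied at test time.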



In [2]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.ly1 = nn.Linear(784, 80)  # 784 input pixels -> 80 hidden units
        # self.ly1_drop = nn.Dropout(0.5)  # dropout was tried but did not help here
        self.ly2 = nn.Linear(80, 30)   # 80 -> 30 hidden units
        # self.ly2_drop = nn.Dropout(0.5)
        self.ly3 = nn.Linear(30, 10)   # 30 -> 10 class scores (digits 0-9)

    def forward(self, x):
        x = torch.relu(self.ly1(x))
        # x = self.ly1_drop(torch.relu(self.ly1(x)))  # dropout variant
        x = torch.relu(self.ly2(x))
        # x = self.ly2_drop(torch.relu(self.ly2(x)))
        x = self.ly3(x)
        # No softmax here: CrossEntropyLoss applies log-softmax itself,
        # so the model returns raw scores (logits).
        # x = torch.softmax(self.ly3(x), dim=1)
        return x


net = Net()

# Model training 
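* CrossEntropyLoss: PyTorch's cross-entropy loss already includes the softmax (it combines log-softmax with the negative log-likelihood loss), so the softmax in our model (previous cell) is commented out and the network returns raw scores.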

* Optimizer: we first tried SGD (stochastic gradient descent) with momentum, which reached an accuracy of 0.967 in 2.1 seconds. We then tried Adam, another popular method for stochastic optimization, which reached an accuracy of 0.97 in 2.7 seconds. The cell below uses SGD; the Adam line is commented out.

* We train for 5 epochs and split the data into mini-batches. For each batch:
    * We extract a batch of images *x* from the training data and the matching labels *t*.
    * **optimizer.zero_grad** resets the gradients to zero. PyTorch accumulates gradients across backward passes, so without this call the gradients from the previous batch would be mixed into the current update.
    * We feed batch *x* through the model (forward pass) to get the output *y*.
    * We compute the cross-entropy loss between the output *y* and the labels *t*.
    * **loss.backward** runs backpropagation to compute the gradient of the loss with respect to each model parameter.
    * **optimizer.step** updates the model parameters using those gradients.
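The point in the first bullet, that the model should output raw scores because the loss applies the softmax itself, can be checked numerically. For one sample with logits `z` and true class `t`, cross-entropy on logits is `-log(softmax(z)[t]) = log(sum_j exp(z[j])) - z[t]`, averaged over the batch. A pure-Python sketch with illustrative helper names (not PyTorch functions):

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax: subtract the max before exponentiating."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum_exp for z in logits]

def cross_entropy(batch_logits, targets):
    """Mean negative log-likelihood of the target classes, computed from raw
    scores: log-softmax followed by NLL, as described for CrossEntropyLoss."""
    losses = [-log_softmax(z)[t] for z, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)

# Two samples, three classes; the network outputs raw scores (no softmax layer).
logits = [[2.0, 0.5, -1.0],
          [0.0, 0.0, 0.0]]
targets = [0, 2]
print(cross_entropy(logits, targets))

# If the model also applied softmax, the loss would apply it a second time
# to probabilities instead of logits, giving a different loss value.
probs = [[math.exp(v) for v in log_softmax(z)] for z in logits]
print(cross_entropy(probs, targets))
```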
    
**Note that a single epoch already reaches 0.93 accuracy and takes only 0.34 seconds.**
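The SGD-with-momentum optimizer used in the training cell (`lr=0.1`, `momentum=0.9`) keeps a running velocity for each parameter. With PyTorch's defaults (no dampening, no Nesterov), each step computes `v = momentum * v + grad` followed by `p = p - lr * v`. A pure-Python sketch on a toy one-parameter loss `f(p) = p**2`, whose gradient is `2p`; the toy loss is our assumption, chosen only for illustration:

```python
def sgd_momentum(grad_fn, p0, lr=0.1, momentum=0.9, steps=500):
    """Minimize a 1-D function with SGD + momentum (no dampening, no Nesterov).

    Each step: v = momentum * v + grad;  p = p - lr * v
    """
    p, v = p0, 0.0
    for _ in range(steps):
        g = grad_fn(p)
        v = momentum * v + g
        p = p - lr * v
    return p

def grad(p):
    """Gradient of the toy loss f(p) = p**2, minimized at p = 0."""
    return 2.0 * p

print(sgd_momentum(grad, p0=5.0))                # very close to 0
print(sgd_momentum(grad, p0=5.0, momentum=0.0))  # plain SGD, also close to 0
```

Setting `momentum=0.0` reduces the update to plain gradient descent, which makes it easy to compare the two behaviors on the same toy problem.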

Note that the batch size can be changed. A larger batch size may run faster per epoch but does not necessarily give better performance, and a batch that is too large can consume too much memory and slow training down. Choose the size based on the size of the data and the available computing resources.
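One detail worth knowing when trying other batch sizes: the training cell runs `tsize // bsize` (originally `int(tsize/bsize)`) batches per epoch, which skips a final partial batch whenever the batch size does not divide 60,000 evenly (30, 300 and 2000 all divide it). A sketch of a helper that also yields the last, smaller batch (`batch_ranges` is a hypothetical name):

```python
def batch_ranges(total, batch_size):
    """Yield (start, stop) slice bounds covering all `total` samples,
    including a final smaller batch if batch_size does not divide total."""
    for start in range(0, total, batch_size):
        yield start, min(start + batch_size, total)

for bsize in (30, 300, 2000, 7000):
    full_batches = 60000 // bsize              # batches the original loop runs
    ranges = list(batch_ranges(60000, bsize))  # batches including any remainder
    print(bsize, full_batches, len(ranges), ranges[-1])
```

In the training loop, `for start, stop in batch_ranges(tsize, bsize): x = xtrain[start:stop]` could then replace the index arithmetic.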

The triple-quoted (disabled) block at the top of the next cell is the experiment without batches, i.e. full-batch training on all 60,000 images at once. Its results were worse: it took more time and reached lower accuracy, even with 20 epochs.
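One likely contributor to this gap is the number of parameter updates: full-batch training performs one optimizer step per epoch, while mini-batch training performs one step per batch. The arithmetic for the two runs described here:

```python
tsize = 60000  # training images
bsize = 300    # mini-batch size used in the training cell

full_batch_updates = 20 * 1                # 20 epochs, one step per epoch
mini_batch_updates = 5 * (tsize // bsize)  # 5 epochs, 200 steps per epoch

print(full_batch_updates, mini_batch_updates)  # 20 1000
```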

In [3]:
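criterion = nn.CrossEntropyLoss()
# optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
optimizer = torch.optim.SGD(net.parameters(), lr=0.1, momentum=0.9)

# Disabled experiment: full-batch training (all 60,000 images in every step)
"""
for epoch in range(20):
    optimizer.zero_grad()
    y = net(xtrain)
    loss = criterion(y, ttrain)
    print('epoch:', epoch, 'loss:', loss.item())
    loss.backward()
    optimizer.step()
print('done')
"""

stime = time()
tsize = xtrain.shape[0]  # total number of training samples
bsize = 300              # batch size (30 and 2000 were also tried)
for epoch in range(5):
    for i in range(tsize // bsize):  # full batches only; a partial last batch is skipped
        x = xtrain[bsize * i:bsize * (i + 1)]  # batch of images
        t = ttrain[bsize * i:bsize * (i + 1)]  # matching labels

        optimizer.zero_grad()   # clear gradients left over from the previous batch
        y = net(x)              # forward pass: raw class scores
        loss = criterion(y, t)  # cross-entropy on the raw scores
        loss.backward()         # backpropagation: compute gradients
        optimizer.step()        # update the parameters
print(loss.item())  # loss of the last batch in the last epoch
print('time', time() - stime)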


0.14222663640975952
time 2.082205057144165


# Evaluation 
We feed the test data (*xtest*) into the model, take the highest-scoring class for each image as its prediction, and compare the predictions with the test labels (*ttest*). The accuracy computed in the last cell is 0.97.
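Each prediction is the index of the largest of the 10 output scores (`ypred.max(1)[1]`, equivalently `ypred.argmax(dim=1)`), and accuracy is the fraction of predictions that equal the labels. A pure-Python sketch of the same computation on made-up scores:

```python
def argmax(scores):
    """Index of the largest score (the first index wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])

def accuracy(batch_scores, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    preds = [argmax(s) for s in batch_scores]
    correct = sum(p == t for p, t in zip(preds, labels))
    return correct / len(labels)

scores = [[0.1, 2.3, -0.5],  # predicted class 1
          [1.7, 0.2, 0.9],   # predicted class 0
          [0.0, 0.4, 3.1],   # predicted class 2
          [2.2, 2.5, 0.1]]   # predicted class 1
labels = [1, 0, 1, 1]
print(accuracy(scores, labels))  # 0.75
```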

Note that tuning hyperparameters such as the learning rate, batch size, or number of epochs may speed up training or improve accuracy.

In [4]:
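with torch.no_grad():  # inference only; no gradients are needed
    ypred = net(xtest)

print(ypred.argmax(dim=1))  # predicted digit for each test image
print(ttest)                # true labels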

tensor([7, 2, 1,  ..., 4, 5, 6])
tensor([7, 2, 1,  ..., 4, 5, 6])


In [6]:
# Accuracy: fraction of test images whose predicted digit matches the label
float((ttest == ypred.argmax(dim=1)).sum()) / len(ttest)

0.97