# PyTorch - First Neural Net!


This notebook focuses on creating a neural network using PyTorch and the MNIST dataset. 
This material is the same as the guide by Jeff Hu published on [KDnuggets](https://www.kdnuggets.com/2018/02/simple-starter-guide-build-neural-network.html).


**@notebook_author: [Juarez Monteiro](https://jrzmnt.github.io).**

---



*This guide serves as a basic hands-on work to lead you through building a neural network from scratch. Most of the mathematical concepts and scientific decisions are left out.*

### Import PyTorch

This cell loads PyTorch and the torchvision utilities we need. Great! Well begun is half done.



In [1]:
import torch
import torch.nn as nn
import torchvision.datasets as dsets
import torchvision.transforms as transforms
from torch.autograd import Variable

### Initialize Hyper-parameters

In [2]:
input_size = 784      # The image size = 28x28 = 784
hidden_size = 500     # The number of nodes in the hidden layer
num_classes = 10      # The number of output classes. In this case, from 0 to 9
num_epochs = 5        # The number of passes over the entire training dataset
batch_size = 100      # The number of images processed in one iteration
learning_rate = 0.001 # The step size of each weight update
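As a quick sanity check, the hyper-parameters determine how many optimizer steps the training loop performs. The sketch below assumes the standard MNIST training split of 60,000 images; it accounts for the `Step [.../600]` counter that appears in the training log.

```python
# Derived quantities, assuming the standard MNIST training split of 60,000 images.
image_height, image_width = 28, 28
input_size = image_height * image_width          # 784 values per flattened image
num_train_images = 60000
batch_size = 100
num_epochs = 5

steps_per_epoch = num_train_images // batch_size  # 600 optimizer steps per epoch
total_steps = steps_per_epoch * num_epochs        # 3000 steps in total

print(input_size, steps_per_epoch, total_steps)   # 784 600 3000
```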

### Download MNIST Dataset

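MNIST is a database of handwritten digits (0 to 9) widely used for training and evaluating image-processing systems. It contains 60,000 training images and 10,000 test images, each a 28x28 grayscale image.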

In [3]:
train_dataset = dsets.MNIST(root='./../../../Datasets/',
                           train=True,
                           transform=transforms.ToTensor(),
                           download=True
                           )

test_dataset = dsets.MNIST(root='./../../../Datasets/',
                           train=False,
                           transform=transforms.ToTensor()
                          )
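`transforms.ToTensor()` converts each 28x28 grayscale image from 8-bit pixel values in [0, 255] into a float tensor with values in [0.0, 1.0] and shape (channels, height, width) = (1, 28, 28). The plain-Python sketch below mimics that conversion on nested lists; the helper names are hypothetical, and this is only an illustration, not how torchvision implements it.

```python
def to_unit_range(image):
    """Scale a 2-D list of 0-255 integer pixels to floats in [0.0, 1.0]."""
    return [[pixel / 255.0 for pixel in row] for row in image]

def add_channel_dim(image):
    """Wrap a 2-D grayscale image as (channels, height, width) with 1 channel."""
    return [image]

# Hypothetical 28x28 image: all black except one white pixel in the centre.
image = [[0] * 28 for _ in range(28)]
image[14][14] = 255

tensor_like = add_channel_dim(to_unit_range(image))
shape = (len(tensor_like), len(tensor_like[0]), len(tensor_like[0][0]))
print(shape)                    # (1, 28, 28)
print(tensor_like[0][14][14])   # 1.0
```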

### Load the Dataset
After downloading the MNIST dataset, we wrap each split in a `DataLoader`, which feeds the data to the network in mini-batches of `batch_size` images.

In [4]:
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True
                                          )

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False
                                         )

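*Note: We shuffle `train_dataset` so that each epoch presents the images in a different order, which makes the learning process independent of the data order. The test set is not shuffled: no weights are updated during evaluation, so the order does not affect the accuracy, and a fixed order keeps the results reproducible.*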
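Conceptually, a `DataLoader` with `shuffle=True` draws a new random order of sample indices each epoch and cuts it into consecutive mini-batches, while `shuffle=False` keeps the original order. The following is a simplified plain-Python model of that behaviour; the function `iterate_minibatches` is hypothetical, and PyTorch actually uses its own samplers and random generator.

```python
import random

def iterate_minibatches(num_samples, batch_size, shuffle, seed=None):
    """Yield one list of sample indices per mini-batch (hypothetical helper).

    With shuffle=True a new random order is drawn on every call, i.e. every
    epoch (unless a fixed seed is passed); with shuffle=False the order is fixed.
    """
    indices = list(range(num_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, num_samples, batch_size):
        yield indices[start:start + batch_size]

# Sizes assumed from the standard MNIST split and batch_size = 100.
train_batches = list(iterate_minibatches(60000, 100, shuffle=True, seed=0))
test_batches = list(iterate_minibatches(10000, 100, shuffle=False))

print(len(train_batches), len(test_batches))  # 600 100
print(test_batches[0][:5])                    # [0, 1, 2, 3, 4]
```

If the dataset size were not divisible by the batch size, the last batch would simply be smaller, which is also what `DataLoader` does by default (`drop_last=False`).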

### Build the Feedforward Neural Network
Now that our datasets are ready, we can start building the **neural network**. *A conceptual illustration is shown below:*


![img](https://cdn-images-1.medium.com/max/800/1*toBL6XleRkwABSwTAFaY_g.png "Example of a Feedforward Neural Network.")

### Feedforward Neural Network Model Structure

The NN includes **two fully-connected layers** (`fc1` and `fc2`) and a **non-linear ReLU layer** in between. 
This structure is usually called a **1-hidden-layer NN**, because the output layer (`fc2`) is not counted.

In the forward pass, an input image (`x`) goes through the network and produces an output (`out`) with one score per class, indicating how likely the image is to belong to each of the 10 classes. These are raw, unnormalized scores (logits); a softmax would turn them into probabilities. 
*For example, a cat image might score 0.8 for the dog class and 0.3 for the airplane class.*
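The forward pass of this architecture computes `out = fc2(relu(fc1(x)))`, where each fully-connected layer computes `W x + b`. The sketch below works through that computation in plain Python on toy dimensions (3 inputs, 2 hidden units, 2 classes) with hypothetical hand-picked weights, and then applies a softmax to show how the raw scores could be read as probabilities.

```python
import math

def linear(x, weights, bias):
    """Fully-connected layer: W x + b, with W given as one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(values):
    """Element-wise max(0, x)."""
    return [max(0.0, v) for v in values]

def softmax(scores):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy network: 3 inputs -> 2 hidden units -> 2 classes
# (the MNIST network uses 784 -> 500 -> 10 with learned weights).
W1, b1 = [[1.0, -1.0, 0.0], [0.5, 0.5, 0.5]], [0.0, 0.0]
W2, b2 = [[1.0, 0.0], [0.0, 2.0]], [0.0, 0.0]

x = [2.0, 1.0, 0.0]
hidden = relu(linear(x, W1, b1))     # fc1 followed by ReLU -> [1.0, 1.5]
out = linear(hidden, W2, b2)         # fc2: one raw score per class
print(out)                           # [1.0, 3.0]

probs = softmax(out)
print([round(p, 3) for p in probs])  # [0.119, 0.881]
```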

In [5]:
class Net(nn.Module):
    
    # Initialize the neural network
    def __init__(self, input_size, hidden_size, num_classes):
        
        # Inherited from the parent class nn.Module
        super(Net, self).__init__()
        
        # 1st Full-Connected Layer: 784 (input data) -> 500 (hidden node)
        self.fc1 = nn.Linear(input_size, hidden_size)
        
        # Non-Linear ReLU Layer: max(0,x)
        self.relu = nn.ReLU()
        
        # 2nd Full-Connected Layer: 500 (hidden node) -> 10 (output class)
        self.fc2 = nn.Linear(hidden_size, num_classes)
    
    # Forward pass: stacking each layer together
    def forward(self, x):
        
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out
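Each `nn.Linear(in_features, out_features)` layer holds an `out_features x in_features` weight matrix plus `out_features` biases, while `nn.ReLU` has no parameters. A small sketch of the resulting parameter count for this network (the helper `linear_param_count` is hypothetical):

```python
def linear_param_count(in_features, out_features):
    """Weights (out_features x in_features) plus one bias per output unit."""
    return in_features * out_features + out_features

input_size, hidden_size, num_classes = 784, 500, 10
fc1_params = linear_param_count(input_size, hidden_size)   # 392,500
fc2_params = linear_param_count(hidden_size, num_classes)  # 5,010
total_params = fc1_params + fc2_params                     # ReLU adds no parameters

print(fc1_params, fc2_params, total_params)  # 392500 5010 397510
```

In PyTorch, `sum(p.numel() for p in net.parameters())` should report the same total for the instantiated `net`.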

### Instantiate the NN
We now create an instance of the network with the structure defined above.

In [6]:
net = Net(input_size, hidden_size, num_classes)
print(net)

Net(
  (fc1): Linear(in_features=784, out_features=500, bias=True)
  (relu): ReLU()
  (fc2): Linear(in_features=500, out_features=10, bias=True)
)


### Enable GPU
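Note: you can uncomment this line to run the code on a GPU. The input tensors (`images`, `labels`) must then be moved to the GPU as well, e.g. with `.cuda()`.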

In [7]:
#net = net.cuda()

### Choose the Loss Function and Optimizer

The loss function (**criterion**) measures how far the network's output is from the correct class, which tells us how well or poorly the network performs.
The **optimizer** determines how the weights are updated so that they converge toward the values that minimize the loss.
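`nn.CrossEntropyLoss` expects raw scores (logits) and combines a log-softmax with the negative log-likelihood loss, averaging over the batch by default. A minimal plain-Python sketch of that computation, for illustration only (the function names are hypothetical):

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy of one sample from raw scores: -log(softmax(logits)[target]).

    Uses the log-sum-exp trick, which equals log-softmax followed by the
    negative log-likelihood of the target class.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def mean_cross_entropy(batch_logits, targets):
    """Average over the batch, like CrossEntropyLoss's default reduction='mean'."""
    losses = [cross_entropy(z, t) for z, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)

# Equal scores over 10 classes: each class gets probability 1/10, so loss = log(10).
print(round(cross_entropy([0.0] * 10, 3), 4))       # 2.3026
# A confident, correct prediction gives a loss close to 0.
print(cross_entropy([10.0, 0.0, 0.0], 0) < 0.001)   # True
# A confident, wrong prediction gives a large loss (about 10 here).
print(round(cross_entropy([10.0, 0.0, 0.0], 1), 2)) # 10.0
```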

In [8]:
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)
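`torch.optim.Adam` keeps running averages of each parameter's gradient and squared gradient and scales every step accordingly. The sketch below applies the original Adam update rule to a single scalar parameter, using `torch.optim.Adam`'s default betas and eps; PyTorch's implementation may differ in small numerical details, so treat it as an illustration rather than a reimplementation.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update of a single scalar parameter; t is the 1-based step count.

    The default lr, betas and eps match torch.optim.Adam's defaults.
    Returns the updated parameter and moment estimates.
    """
    m = beta1 * m + (1 - beta1) * grad          # running average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running average of squared gradients
    m_hat = m / (1 - beta1 ** t)                # correct the bias from starting at 0
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Hypothetical parameter value and gradient for a single first step.
theta, m, v = 1.0, 0.0, 0.0
theta, m, v = adam_step(theta, grad=0.5, m=m, v=v, t=1)
print(round(theta, 6))  # 0.999: the first step moves about lr against the gradient's sign
```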

### Training the Model

This process may take around 3 to 5 minutes, depending on your machine.
Detailed explanations are given as comments (#) in the following code.

In [9]:
for epoch in range(num_epochs):
    
    # Load a batch: i is the batch index, images are the inputs, labels are the classes
    for i, (images, labels) in enumerate(train_loader):
        
        # Flatten each image from a 28 x 28 matrix into a vector of size 784
        images = images.view(-1, 28*28)
        
        # Reset the gradients accumulated in the previous iteration to zero
        optimizer.zero_grad()
        
        # Forward pass: compute the class scores for each image
        outputs = net(images)
        
        # Compute the loss: difference between the predicted scores and the true labels
        loss = criterion(outputs, labels)
        
        # Backward pass: compute the gradient of the loss with respect to every weight
        loss.backward()
        
        # Optimizer: update all weights using the computed gradients
        optimizer.step()
        
        # Logging (loss.item() returns the loss as a Python float)
        if (i+1) % 100 == 0:
            print('Epoch [%d/%d], Step [%d/%d], Loss: %.4f'
                  % (epoch+1, num_epochs, i+1, len(train_dataset)//batch_size, loss.item()))



Epoch [1/5], Step [100/600], Loss: 0.3313
Epoch [1/5], Step [200/600], Loss: 0.3219
Epoch [1/5], Step [300/600], Loss: 0.1749
Epoch [1/5], Step [400/600], Loss: 0.1913
Epoch [1/5], Step [500/600], Loss: 0.1426
Epoch [1/5], Step [600/600], Loss: 0.2501
Epoch [2/5], Step [100/600], Loss: 0.1005
Epoch [2/5], Step [200/600], Loss: 0.0947
Epoch [2/5], Step [300/600], Loss: 0.0938
Epoch [2/5], Step [400/600], Loss: 0.2195
Epoch [2/5], Step [500/600], Loss: 0.0469
Epoch [2/5], Step [600/600], Loss: 0.1042
Epoch [3/5], Step [100/600], Loss: 0.0409
Epoch [3/5], Step [200/600], Loss: 0.0576
Epoch [3/5], Step [300/600], Loss: 0.0665
Epoch [3/5], Step [400/600], Loss: 0.0377
Epoch [3/5], Step [500/600], Loss: 0.1296
Epoch [3/5], Step [600/600], Loss: 0.1437
Epoch [4/5], Step [100/600], Loss: 0.0291
Epoch [4/5], Step [200/600], Loss: 0.0323
Epoch [4/5], Step [300/600], Loss: 0.0554
Epoch [4/5], Step [400/600], Loss: 0.0228
Epoch [4/5], Step [500/600], Loss: 0.0379
Epoch [4/5], Step [600/600], Loss:
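The five steps of the training loop (zero the gradients, forward pass, loss, backward pass, update) can be illustrated without PyTorch. The sketch below fits a single weight `w` to hypothetical toy data generated from `y = 2x`, using a mean-squared-error loss and plain gradient descent instead of cross-entropy and Adam. The gradient is accumulated with `+=` because PyTorch's `backward()` likewise adds into each parameter's `.grad`, which is why `optimizer.zero_grad()` is called at every iteration.

```python
# Hypothetical toy data generated from y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
n = len(xs)

w = 0.0      # the single trainable weight
grad = 0.0   # plays the role of w.grad in PyTorch
lr = 0.01

for step in range(200):
    grad = 0.0                                               # 1. optimizer.zero_grad()
    preds = [w * x for x in xs]                              # 2. outputs = net(images)
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n  # 3. loss = criterion(...)
    grad += sum(2 * (p - y) * x                              # 4. loss.backward() adds
                for p, y, x in zip(preds, ys, xs)) / n       #    d(loss)/dw into the grad
    w -= lr * grad                                           # 5. optimizer.step() (plain SGD)

print(round(w, 4))  # 2.0
```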

### Testing the Model

Similar to training, we load batches of **test images and collect the outputs**. The **differences** are:

 - No loss or gradient calculations
 - No weight updates
 - Correct predictions are counted

In [11]:
correct = 0
total = 0

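# Disable gradient tracking: no backward pass is needed during evaluation
with torch.no_grad():
    for images, labels in test_loader:
        
        # Flatten each 28 x 28 image into a vector of 784 values
        images = images.view(-1, 28*28)
        outputs = net(images)
        
        # Choose the best class from the output (the class with the highest score)
        _, predicted = torch.max(outputs, 1)
        
        # Increment the total count
        total += labels.size(0)
        
        # Increment the correct count (.item() converts the 0-dim tensor to a Python int)
        correct += (predicted == labels).sum().item()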
    
print('Accuracy of the network on the 10K test images: %d%%' % (100 * correct / total))

Accuracy of the network on the 10K test images: 97%
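The accuracy computation reduces to taking the index of the highest score in each output row (the indices returned by `torch.max(outputs, 1)`) and comparing it with the labels. A plain-Python sketch with hypothetical scores and labels:

```python
def argmax(scores):
    """Index of the highest score (the 'indices' part of torch.max(outputs, 1))."""
    return max(range(len(scores)), key=lambda i: scores[i])

def accuracy(batch_outputs, labels):
    """Percentage of rows whose highest-scoring class equals the label."""
    predicted = [argmax(row) for row in batch_outputs]
    correct = sum(p == t for p, t in zip(predicted, labels))
    return 100 * correct / len(labels)

# Hypothetical scores for 4 images over 3 classes, and their true labels.
outputs = [[0.1, 2.0, -1.0], [3.0, 0.5, 0.2], [0.0, 0.1, 0.4], [1.0, 0.9, 0.8]]
labels = [1, 0, 2, 2]
print(accuracy(outputs, labels))  # 75.0 (3 of 4 predictions are correct)
```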


### Save the Trained Model

The trained weights (`state_dict`) can be saved as a pickle file and loaded later for reuse.

In [14]:
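#torch.save(net.state_dict(), 'mnist_nn_model.pkl')

# To restore the weights later into a network with the same structure:
#net.load_state_dict(torch.load('mnist_nn_model.pkl'))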