# Creating a Feedforward Neural Network

## Load packages, data, and hyperparameters

We will use [PyTorch](https://pytorch.org/) to create a neural network that classifies handwritten digits.

In [2]:
import torch
import torch.nn as nn
import torchvision.datasets as dsets
import torchvision.transforms as transforms
from torch.autograd import Variable

Hyperparameters control how the network learns and are not updated during training. Although hyperparameters can be tuned to optimize a model, they are held fixed in this example.

In [14]:
input_size = 784       # Flattened image size: 28 x 28 = 784 pixels
hidden_size = 500      # Number of nodes in the hidden layer
num_classes = 10       # Number of output classes (digits 0 to 9)
num_epochs = 5         # Number of full passes over the training set
batch_size = 100       # Number of images processed per iteration
learning_rate = 0.001  # Step size the optimizer uses for weight updates

This example uses the MNIST dataset, which contains 28 x 28 grayscale images of handwritten digits (60,000 for training and 10,000 for testing) and is a standard benchmark for image classification.
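To make the hyperparameters concrete, the short sketch below works out how many weight updates they imply. It assumes the standard MNIST training split of 60,000 images; `train_size`, `steps_per_epoch`, and `total_steps` are names introduced here for illustration only.

```python
# Assumption: the standard MNIST training split has 60,000 images.
train_size = 60000
batch_size = 100
num_epochs = 5

steps_per_epoch = train_size // batch_size     # iterations needed to see every image once
total_steps = steps_per_epoch * num_epochs     # weight updates over the whole run

print(steps_per_epoch)  # 600
print(total_steps)      # 3000
```

With a batch size of 100, the training log that prints every 100 steps would therefore show six lines per epoch.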

In [15]:
#download the dataset
train_dataset = dsets.MNIST(root='./data',
                           train=True,
                           transform=transforms.ToTensor(),
                           download=True)

test_dataset = dsets.MNIST(root='./data',
                           train=False,
                           transform=transforms.ToTensor())

#load the dataset
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                          batch_size=batch_size,
                                          shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

## Build Feedforward Neural Network (FNN)

A feedforward neural network accepts inputs at an input layer and passes them to one or more hidden layers. The hidden layers in turn pass their results to the output layer, which produces the outputs.

This FNN has two fully connected layers: fc1 maps the input to the hidden layer, and fc2 maps the hidden layer to the output layer. A ReLU activation sits between them.

Each input image runs through the forward pass to generate one score per class; the higher the score, the more likely the image belongs to that class.
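As a rough check on the size of this architecture, the number of trainable parameters can be worked out by hand. The sketch below is plain arithmetic and assumes each fully connected layer carries a bias term; `fc1_params` and `fc2_params` are illustrative names, not part of the notebook.

```python
input_size, hidden_size, num_classes = 784, 500, 10

# A fully connected layer has one weight per (input, output) pair plus one bias per output.
fc1_params = input_size * hidden_size + hidden_size      # 784*500 + 500
fc2_params = hidden_size * num_classes + num_classes     # 500*10 + 10

print(fc1_params)               # 392500
print(fc2_params)               # 5010
print(fc1_params + fc2_params)  # 397510
```

Almost all of the parameters sit in the first layer, because it connects every one of the 784 pixels to every hidden node.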

In [16]:
# define a class for the NN
class Net(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(Net, self).__init__()                    # Initialize the parent class nn.Module
        self.fc1 = nn.Linear(input_size, hidden_size)  # 1st fully connected layer: 784 (input) -> 500 (hidden nodes)
        self.relu = nn.ReLU()                          # Non-linear ReLU activation: max(0, x)
        self.fc2 = nn.Linear(hidden_size, num_classes) # 2nd fully connected layer: 500 (hidden nodes) -> 10 (output classes)
    
    def forward(self, x):                              # Forward pass: apply each layer in sequence
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out

In [17]:
# create a FNN based on the created structure
net = Net(input_size, hidden_size, num_classes)

In [18]:
# move the model's parameters to the GPU (requires a CUDA-capable device)
net.cuda()

Net(
  (fc1): Linear(in_features=784, out_features=500, bias=True)
  (relu): ReLU()
  (fc2): Linear(in_features=500, out_features=10, bias=True)
)
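
Once the model lives on the GPU, every tensor passed to it must be on the GPU as well. One common way to keep the two in step is to pick a device once and move both the model and the data to it. The sketch below is a minimal illustration under stated assumptions: `device` and `model` are names introduced here, `model` is a stand-in with the same layer sizes as `Net`, and the batch is random data in place of real images.

```python
import torch
import torch.nn as nn

# Pick the GPU when one is available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Stand-in model with the same layer sizes as Net (784 -> 500 -> 10).
model = nn.Sequential(nn.Linear(784, 500), nn.ReLU(), nn.Linear(500, 10)).to(device)

# A fake batch of 4 flattened images; real batches would come from train_loader.
images = torch.rand(4, 784).to(device)   # inputs must live on the model's device
outputs = model(images)

print(outputs.shape, outputs.device.type)
```

Written this way, the same code runs on machines with or without a GPU.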

The loss function measures how far the network's outputs are from the true class labels; in other words, it quantifies how well the network performs. Based on the gradients of this loss, the optimizer updates the weights to reduce the loss over time.
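`nn.CrossEntropyLoss` applies a softmax to the raw scores and then takes the negative log of the probability assigned to the true class. The sketch below reproduces that calculation for a single example in plain Python. The three-class scores are made-up values for illustration, and the helper name `cross_entropy` is hypothetical.

```python
import math

def cross_entropy(scores, label):
    """Softmax followed by negative log-likelihood for one example."""
    shift = max(scores)                              # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    prob_true = exps[label] / sum(exps)              # softmax probability of the true class
    return -math.log(prob_true)

scores = [2.0, 1.0, 0.1]   # made-up raw scores for 3 classes

loss_right = cross_entropy(scores, 0)   # true class has the highest score
loss_wrong = cross_entropy(scores, 2)   # true class has the lowest score
print(round(loss_right, 3), round(loss_wrong, 3))
```

The loss is small when the true class receives the highest score and grows as the network assigns that class a lower probability.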

In [19]:
# choose the loss function that compares the outputs with the true class labels
criterion = nn.CrossEntropyLoss()
# choose the optimizer that updates the weights using the gradients of the loss
optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)

## Train the Model

To train the model, a batch of images is loaded and each image is flattened from a 28 x 28 matrix into a vector of size 784. The gradients left over from the previous step are reset to zero. The forward pass computes the class scores for each image. The loss function measures the difference between those scores and the known labels. A backward pass then computes the gradient of the loss with respect to each weight, and the optimizer uses these gradients to update the weights of both layers.
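The same cycle (reset gradient, forward pass, loss, backward pass, update) can be traced by hand on a model with a single weight. The sketch below is a toy illustration only: it fits `y = 2x` with a squared-error loss, uses plain gradient descent in place of Adam, and derives the gradient by hand in place of autograd. The data, `w`, and `lr` are made-up values.

```python
# Toy data for y = 2x; the "model" is a single weight w.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = 0.0      # initial weight
lr = 0.05    # learning rate

for epoch in range(50):
    for x, y in data:
        grad = 0.0                     # reset the gradient (like optimizer.zero_grad())
        pred = w * x                   # forward pass
        loss = (pred - y) ** 2         # squared-error loss
        grad += 2 * (pred - y) * x     # backward pass: d(loss)/dw
        w -= lr * grad                 # update step (like optimizer.step())

print(w)
```

After training, `w` ends up very close to 2, the slope that generated the data.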

In [20]:
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):   # Load a batch as (batch index, (images, labels))
        images = images.view(-1, 28*28).cuda()            # Flatten each 28 x 28 image to a vector of size 784 and move it to the GPU
        labels = labels.cuda()                            # Labels must be on the same device as the model
        
        optimizer.zero_grad()                             # Reset the gradients from the previous step to zero
        outputs = net(images)                             # Forward pass: compute the class scores for each image
        loss = criterion(outputs, labels)                 # Compute the loss between the class scores and the known labels
        loss.backward()                                   # Backward pass: compute the gradients of the loss
        optimizer.step()                                  # Optimizer: update the weights using the gradients
        
        if (i+1) % 100 == 0:                              # Log every 100 steps
            print('Epoch [%d/%d], Step [%d/%d], Loss: %.4f'
                 %(epoch+1, num_epochs, i+1, len(train_dataset)//batch_size, loss.item()))

## Test the Model

As in training, we load batches of test images and collect the outputs. The differences are that (1) no loss or gradients are computed, (2) no weights are updated, and (3) the number of correct predictions is counted.
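The counting step amounts to picking the highest-scoring class for each image and comparing it with the label. The sketch below shows that logic in plain Python on made-up scores for four examples and three classes; none of these values come from the actual model.

```python
# Made-up class scores for 4 examples and 3 classes.
outputs = [
    [0.1, 2.0, -1.0],
    [1.5, 0.2, 0.3],
    [0.0, 0.1, 3.0],
    [2.2, 2.5, 0.1],
]
labels = [1, 0, 2, 0]

correct = 0
for scores, label in zip(outputs, labels):
    predicted = scores.index(max(scores))   # index of the highest score = predicted class
    correct += int(predicted == label)

accuracy = 100 * correct / len(labels)
print(accuracy)  # 75.0
```

Three of the four predictions match their labels, so the accuracy is 75%.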

In [21]:
correct = 0
total = 0
with torch.no_grad():                              # No gradients are needed for evaluation
    for images, labels in test_loader:
        images = images.view(-1, 28*28).cuda()     # Flatten and move to the GPU, as in training
        labels = labels.cuda()                     # Keep labels on the same device as the predictions
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)       # Choose the class with the highest score
        total += labels.size(0)                    # Increment the total count
        correct += (predicted == labels).sum().item()  # Increment the correct count

print('Accuracy of the network on the 10K test images: %d %%' % (100 * correct / total))

In [22]:
#save the model
#torch.save(net.state_dict(), 'fnn_model.pkl')

In [None]:
https://github.com/yhuag/neural-network-lab/blob/master/Feedforward%20Neural%20Network.ipynb