# Creating a Basic Feedforward Neural Network with PyTorch

Credit to https://github.com/yunjey for the original Python code found at: 

https://github.com/yunjey/pytorch-tutorial/blob/master/tutorials/01-basics/feedforward_neural_network/main.py

`torch.nn` is the subpackage of `torch` that provides the building blocks for neural networks. When an `nn.Parameter` is assigned as an attribute of an `nn.Module`, it is registered automatically and can be accessed through the module's `parameters()` iterator.
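As a minimal sketch of that automatic registration, the snippet below builds a small module and lists what `named_parameters()` reports. `TinyModule` and its layer sizes are hypothetical and exist only for this illustration.

```python
import torch.nn as nn


class TinyModule(nn.Module):
    """Hypothetical module used only to illustrate parameter registration."""

    def __init__(self):
        super().__init__()
        # Assigning a layer as an attribute registers its weight and bias
        self.layer = nn.Linear(4, 2)


tiny = TinyModule()

# Map each registered parameter name to its shape
param_shapes = {name: tuple(p.shape) for name, p in tiny.named_parameters()}
print(param_shapes)  # {'layer.weight': (2, 4), 'layer.bias': (2,)}
```

The same mechanism is what lets `model.parameters()` hand every weight and bias to the optimizer later on.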

In [1]:
import torch
import torch.nn as nn

import torchvision
import torchvision.transforms as transforms

Device Configuration

In [2]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

Assign Model and Training Hyperparameters

In [3]:
# Model
input_size = 784
hidden_size = 500
num_classes = 10

# Training
num_epochs = 5
batch_size = 64
learning_rate = 0.01

For ease, use an existing dataset (MNIST, handwritten digit recognition), and create a training and a testing `Dataset` and `DataLoader` for it.

In [4]:
# Training
training_dataset = torchvision.datasets.MNIST('data', 
                                              train=True,
                                              transform=transforms.ToTensor(),
                                              download=True)

training_dataloader = torch.utils.data.DataLoader(training_dataset,
                                                 batch_size=batch_size,
                                                 shuffle=True)

# Testing
testing_dataset = torchvision.datasets.MNIST('data', 
                                             train=False,
                                             transform=transforms.ToTensor())

testing_dataloader = torch.utils.data.DataLoader(testing_dataset,
                                                batch_size=batch_size,
                                                shuffle=False)
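To see what a `DataLoader` yields without downloading anything, here is a small sketch that uses a stand-in dataset. The `TensorDataset` of 100 blank images is an assumption made for illustration; it only mimics the shape of MNIST samples.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: 100 blank single-channel 28x28 "images" with dummy labels
fake_images = torch.zeros(100, 1, 28, 28)
fake_labels = torch.zeros(100, dtype=torch.long)

fake_loader = DataLoader(TensorDataset(fake_images, fake_labels),
                         batch_size=64,
                         shuffle=False)

# Each iteration yields one (images, labels) pair holding up to 64 samples
batch_shapes = [tuple(images.shape) for images, _ in fake_loader]
print(batch_shapes)      # [(64, 1, 28, 28), (36, 1, 28, 28)]
print(len(fake_loader))  # 2
```

The last batch is smaller whenever the dataset size is not a multiple of the batch size.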

# Create the model

`nn.Module` is the base class for all neural network modules.

Your models should also subclass this class.


In [7]:
class NeuralNetwork(nn.Module):

    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        output = self.fc1(x)
        output = self.relu(output)
        output = self.fc2(output)
        return output

In [8]:
# Move the model to the configured device so it matches the input batches
model = NeuralNetwork(input_size, hidden_size, num_classes).to(device)
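As a quick sanity check on the architecture, the sketch below counts the trainable parameters and checks the output shape. It uses an `nn.Sequential` stand-in with the same three layers as `NeuralNetwork`; the stand-in and the name `sketch` are assumptions made so the snippet runs on its own.

```python
import torch
import torch.nn as nn

input_size, hidden_size, num_classes = 784, 500, 10

# Stand-in with the same layers as NeuralNetwork: Linear -> ReLU -> Linear
sketch = nn.Sequential(nn.Linear(input_size, hidden_size),
                       nn.ReLU(),
                       nn.Linear(hidden_size, num_classes))

# Each Linear layer holds (in_features * out_features) weights plus out_features biases
num_params = sum(p.numel() for p in sketch.parameters())
print(num_params)  # 397510 = (784*500 + 500) + (500*10 + 10)

# A batch of 64 flattened images produces 64 rows of 10 class scores
output_shape = tuple(sketch(torch.zeros(64, input_size)).shape)
print(output_shape)  # (64, 10)
```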

## Loss and Optimizer

In [9]:
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
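`nn.CrossEntropyLoss` expects raw class scores (logits), which is why the model's `forward` does not end with a softmax. A small sketch, using made-up scores and targets, shows that it matches a log-softmax followed by the negative log-likelihood loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Made-up raw scores for 4 samples and 10 classes, with made-up target classes
logits = torch.randn(4, 10)
targets = torch.tensor([3, 0, 9, 1])

# Built-in criterion applied directly to the raw scores
loss_a = nn.CrossEntropyLoss()(logits, targets)

# The same value computed in two explicit steps
loss_b = F.nll_loss(F.log_softmax(logits, dim=1), targets)

losses_match = torch.allclose(loss_a, loss_b)
print(losses_match)  # True
```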

# Training the Model

In [15]:
# Total steps per epoch is the number of batches, i.e. ceil(number of training images / batch size), which is the length of training_dataloader
total_steps = len(training_dataloader)
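The step count can be worked out by hand. This sketch assumes the standard MNIST training set of 60,000 images and the batch size of 64 chosen earlier:

```python
import math

num_training_images = 60000  # size of the MNIST training split
batch_size = 64

# The DataLoader keeps the final partial batch by default, so round up
steps_per_epoch = math.ceil(num_training_images / batch_size)

# Images left over for the final, smaller batch
last_batch_size = num_training_images - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch, last_batch_size)  # 938 32
```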

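`.to()`
Performs tensor dtype and/or device conversion. The target `torch.dtype` and `torch.device` are inferred from the arguments of `self.to(*args, **kwargs)`.

Returns a tensor with the specified dtype and/or device. If the tensor already has the requested dtype and device, the tensor itself is returned.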
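A short sketch of both uses of `.to()`, on a made-up tensor. It runs on CPU-only machines too, since `device` falls back to `'cpu'` when CUDA is unavailable.

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.ones(2, 3)  # float32 tensor created on the CPU

# dtype conversion: returns a new float64 tensor
x_double = x.to(torch.float64)
print(x.dtype, x_double.dtype)  # torch.float32 torch.float64

# device conversion: copies to the GPU if one is configured
x_on_device = x.to(device)
print(x_on_device.device.type == device.type)  # True

# No conversion needed: the very same tensor object comes back
unchanged = x.to(torch.float32)
print(unchanged is x)  # True
```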

Each batch of `images` is packaged as a tensor in the following format: number of images in the batch, number of channels per image, image height, image width.

Since the batch size is 64, a full batch contains 64 training images. MNIST images are single channel, so the number of channels is 1. The image height and width are the same: 28 pixels.

Therefore, the shape of a full image batch is (64, 1, 28, 28). The tensor can be reshaped so that the data for each image is a single row of values. In the reshaped tensor, of shape (64, 784), each row is one training image and each column holds the intensity of one pixel.
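The reshape can be checked on a made-up batch with the same shape as a full MNIST batch. The values here are just a running count, chosen so that rows are easy to compare:

```python
import torch

# Made-up batch shaped like 64 single-channel 28x28 images
batch = torch.arange(64 * 1 * 28 * 28, dtype=torch.float32).reshape(64, 1, 28, 28)

# -1 lets PyTorch infer the number of rows; each row gets 28*28 = 784 values
flat = batch.reshape(-1, 28 * 28)
print(flat.shape)  # torch.Size([64, 784])

# Row i of the flattened tensor is image i with its pixels laid out in order
row_matches_image = torch.equal(flat[5], batch[5].flatten())
print(row_matches_image)  # True
```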

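`to(device)` moves a tensor to the device selected in the Device Configuration step. A module and its inputs must be on the same device, so when training on a CUDA GPU the images and labels have to be copied from CPU memory to GPU memory before the forward pass. On a CPU-only machine the call leaves the tensors where they are.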

In [67]:
for epoch in range(num_epochs):
    for index, data in enumerate(training_dataloader):
        images, labels = data

        # Flatten each image to a row of 784 values and move the batch to the configured device
        images = images.reshape(-1, 28*28).to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backprop and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Report the loss every 100 steps
        if (index+1) % 100 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                  .format(epoch+1, num_epochs, index+1, total_steps, loss.item()))


In [19]:
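# After a full epoch, `images` is the final batch: the 32 leftover images (60000 = 937 * 64 + 32)
images.reshape(-1, 28*28).shape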

torch.Size([32, 784])


# Testing the Model

Gradients are not needed during evaluation, so the loop runs inside `torch.no_grad()`.

`model(images)` outputs one row of 10 raw class scores (logits) per image, not probabilities; `nn.CrossEntropyLoss` applies the softmax internally during training. The number of rows equals the number of test images in the batch.

`torch.max(outputs, 1)` takes the maximum along dimension 1 because you want the highest-scoring class for each image, rather than the single largest value in the entire batch. It returns two tensors: the first holds the maximum values, and the second holds the indices of those values, which are the predicted classes.

`labels.size(0)` gives the number of labels in the batch as a plain Python integer rather than a tensor.

`predicted == labels` compares the two tensors element by element and returns a boolean tensor that is `True` wherever the prediction matches the label. Summing it counts the correct predictions.
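These steps can be traced on a tiny made-up example with three samples and three classes. The scores and labels below are invented for illustration:

```python
import torch

# Made-up class scores: one row per sample, one column per class
outputs = torch.tensor([[0.1, 2.0, -1.0],
                        [1.5, 0.2, 0.3],
                        [0.0, 0.1, 3.0]])
labels = torch.tensor([1, 0, 1])

# Maximum along dimension 1: the best score and its class index for each row
max_values, predicted = torch.max(outputs, 1)
print(predicted.tolist())  # [1, 0, 2]

# Element-wise comparison gives a boolean tensor
matches = predicted == labels
print(matches.tolist())  # [True, True, False]

# Sum the matches and convert to plain Python numbers
correct = matches.sum().item()
total = labels.size(0)
print(correct, total)  # 2 3
```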

In [65]:
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in testing_dataloader:
        images = images.reshape(-1, 28*28).to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    print('Accuracy of the network on the 10000 test images: {} %'.format(100 * correct / total))

Accuracy of the network on the 10000 test images: 96.82 %


In [66]:
torch.save(model.state_dict(), 'basic_feedforward.ckpt')
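A saved `state_dict` holds only the weights, so loading it requires building the same architecture first. The sketch below is a round trip under a few assumptions: it uses a hypothetical `build_model` helper with the same layers as `NeuralNetwork`, and an in-memory buffer instead of the `basic_feedforward.ckpt` file so that it leaves nothing on disk.

```python
import io

import torch
import torch.nn as nn


def build_model():
    """Hypothetical helper: same layers as NeuralNetwork, as nn.Sequential."""
    return nn.Sequential(nn.Linear(784, 500), nn.ReLU(), nn.Linear(500, 10))


trained = build_model()

# Save the weights to an in-memory buffer (a file path works the same way)
buffer = io.BytesIO()
torch.save(trained.state_dict(), buffer)
buffer.seek(0)

# Rebuild the architecture, then load the saved weights into it
restored = build_model()
restored.load_state_dict(torch.load(buffer))
restored.eval()

# Both models now give identical outputs for the same input
x = torch.randn(2, 784)
with torch.no_grad():
    same_outputs = torch.equal(trained(x), restored(x))
print(same_outputs)  # True
```

When a checkpoint saved on a GPU is loaded on a CPU-only machine, passing `map_location='cpu'` to `torch.load` remaps the tensors.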