In [1]:
import torch
from torch import nn
from torchvision import datasets, transforms

# Dealing with Neural Networks
Deep learning is a class of machine learning algorithms designed to loosely mimic
the neurons in our brain. A neuron receives inputs from a number of surrounding
neurons and sums them up; if the sum exceeds a certain threshold, the neuron fires.
The following is a diagram of a neural unit:

![neural_network](../media/nnet.png)
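
As a rough illustration of the sum-and-threshold idea, the sketch below computes a weighted sum for a single unit. The input values, weights, bias, and threshold are made-up numbers chosen for the example, and the per-input weights are the usual artificial-neuron refinement of the plain sum described above.

```python
import torch

# Hypothetical inputs from three upstream neurons and their connection weights.
inputs = torch.tensor([0.5, -1.0, 2.0])
weights = torch.tensor([0.8, 0.2, 0.5])
bias = torch.tensor(-0.5)
threshold = 0.0  # assumed firing threshold

# Weighted sum: 0.4 - 0.2 + 1.0 - 0.5 = 0.7
weighted_sum = torch.dot(inputs, weights) + bias
fires = bool(weighted_sum > threshold)

print(f"weighted sum = {weighted_sum.item():.2f}, fires = {fires}")
```

In a trainable network the hard threshold is replaced by a differentiable activation such as ReLU, which is what the model later in this notebook uses.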

We will be using the Fashion-MNIST dataset. This is a dataset of Zalando's article images,
consisting of a training set of 60,000 examples and a test set of 10,000 examples. We will
take each 28 x 28 grayscale image and flatten it into a vector of 784 values.

We will define transforms for preprocessing our image data.
Each image consists of 28 x 28 grayscale pixels, so we first convert it into a tensor with the transforms.ToTensor() transform, which scales the pixel values to the range [0, 1]. We then normalize with a mean of 0.5 and a
standard deviation of 0.5 using transforms.Normalize((0.5,), (0.5,)); this subtracts 0.5 and divides by 0.5, so the pixel values end up in the range [-1, 1], which makes it easier for the model to train. We combine all of the transformations with transforms.Compose().
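
The arithmetic behind this normalization, and the later flattening to 784 values, can be sketched with plain tensors. The random tensor below is only a stand-in for the output of ToTensor() on one image, not real Fashion-MNIST data.

```python
import torch

# Stand-in for a ToTensor() output: one channel, 28 x 28, values in [0, 1].
image = torch.rand(1, 28, 28)

mean, std = 0.5, 0.5
# Same per-channel arithmetic that Normalize((0.5,), (0.5,)) applies.
normalized = (image - mean) / std

# Flatten the single-channel image into a vector of 1 * 28 * 28 = 784 values.
flat = normalized.view(-1)

print(normalized.min().item() >= -1.0, normalized.max().item() <= 1.0)
print(flat.shape)
```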

In [2]:
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,)),
                                ])

Let's define the batch_size to divide our dataset into chunks to be fed into the model:

In [3]:
batch_size = 64
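
To see how a batch size splits a dataset into chunks, here is a small sketch that uses a synthetic TensorDataset instead of the real download. The 150 fake images and random labels are assumptions made only for this illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for an image dataset: 150 fake 1 x 28 x 28 images, labels 0-9.
images = torch.rand(150, 1, 28, 28)
labels = torch.randint(0, 10, (150,))
dataset = TensorDataset(images, labels)

loader = DataLoader(dataset, batch_size=64, shuffle=True)

# 150 samples with batch_size=64 give two full batches and one partial batch.
batch_sizes = [image_batch.shape[0] for image_batch, _ in loader]
print(len(loader), batch_sizes)
```

The last batch is smaller because 150 is not a multiple of 64; passing drop_last=True to the DataLoader would discard it instead.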

Let's pull the dataset from torchvision, apply the transform, and create batches.

In [4]:
# Download and load the train data
trainset = datasets.FashionMNIST('Fashion_MNIST/', download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True)

# Download and load the test data
testset = datasets.FashionMNIST('Fashion_MNIST/', download=True, train=False, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=True)

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to Fashion_MNIST/FashionMNIST/raw/train-images-idx3-ubyte.gz


  0%|          | 0/26421880 [00:00<?, ?it/s]

Extracting Fashion_MNIST/FashionMNIST/raw/train-images-idx3-ubyte.gz to Fashion_MNIST/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to Fashion_MNIST/FashionMNIST/raw/train-labels-idx1-ubyte.gz


  0%|          | 0/29515 [00:00<?, ?it/s]

Extracting Fashion_MNIST/FashionMNIST/raw/train-labels-idx1-ubyte.gz to Fashion_MNIST/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to Fashion_MNIST/FashionMNIST/raw/t10k-images-idx3-ubyte.gz


  0%|          | 0/4422102 [00:00<?, ?it/s]

Extracting Fashion_MNIST/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to Fashion_MNIST/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to Fashion_MNIST/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz


  0%|          | 0/5148 [00:00<?, ?it/s]

Extracting Fashion_MNIST/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to Fashion_MNIST/FashionMNIST/raw



Define the neural network class, which has to be a subclass of nn.Module:

In [5]:
class FashionNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden1 = nn.Linear(784, 256)
        self.hidden2 = nn.Linear(256, 128)
        self.output = nn.Linear(128, 10)
        self.softmax = nn.Softmax(dim=1)
        self.activation = nn.ReLU()
    def forward(self, x):
        x = self.hidden1(x)
        x = self.activation(x)
        x = self.hidden2(x)
        x = self.activation(x)
        x = self.output(x)
        output = self.softmax(x)
        return output

We use nn.Linear() to define fully connected layers by passing in the input and output dimensions.<br>
We use a softmax layer on the final output because this is a classification problem with 10 output classes.<br>
We use ReLU activation in the layers before the output layer so that the network can learn nonlinear relationships in the data.<br>
The hidden1 layer takes 784 input units and gives out 256 output units.<br>
The hidden2 layer outputs 128 units, and the output layer has 10 output units representing the 10 output classes.<br>
The softmax layer converts the activations into probabilities that sum to 1 along dimension 1.
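
A quick way to check the softmax behavior is to apply it to a small batch of made-up scores and sum along dimension 1. The logits here are arbitrary example values.

```python
import torch
from torch import nn

# Two samples with three arbitrary class scores each.
logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 0.5, 0.5]])

probs = nn.Softmax(dim=1)(logits)

print(probs.sum(dim=1))          # each row sums to 1
print(probs[0].argmax().item())  # the largest score keeps the largest probability
```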

There is another method that we can use to define models using nn.Sequential() and pass in the required layers without needing to define a class. There are also other transformations that can be applied to the input image that we will explore in the subsequent chapters.
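
As a sketch of the nn.Sequential() alternative, the same stack of layers could be written roughly as follows. The variable name sequential_model and the random input batch are illustrative choices, not part of the recipe.

```python
import torch
from torch import nn

# Layers are applied in the order they are listed.
sequential_model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
    nn.Softmax(dim=1),
)

# A random batch of 4 flattened images, used only to check the shapes.
out = sequential_model(torch.rand(4, 784))
print(out.shape)
```

nn.Sequential() is convenient for a straight chain of layers; a custom nn.Module class is the more flexible choice when the forward pass needs branching or other custom logic.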

We will then create the network object:

In [6]:
model = FashionNetwork()

Let's have a quick look at our model:

In [7]:
print(model)

FashionNetwork(
  (hidden1): Linear(in_features=784, out_features=256, bias=True)
  (hidden2): Linear(in_features=256, out_features=128, bias=True)
  (output): Linear(in_features=128, out_features=10, bias=True)
  (softmax): Softmax(dim=1)
  (activation): ReLU()
)


In this recipe, we completed the network by defining the forward pass, in which we tied together the network components defined in the constructor. A network defined with nn.Module needs to have a forward() method. It takes the input tensor and passes it through the components defined in the __init__() method, in the order of operations given in forward().
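
The relationship between calling a module and its forward() method can be seen on a deliberately small network. TinyNetwork below is a hypothetical class written for this illustration, and the random batch stands in for real images.

```python
import torch
from torch import nn

class TinyNetwork(nn.Module):  # hypothetical, smaller than FashionNetwork
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(784, 16)
        self.output = nn.Linear(16, 10)
        self.activation = nn.ReLU()

    def forward(self, x):
        # Components run in the order written here, not in the order declared above.
        return self.output(self.activation(self.hidden(x)))

tiny = TinyNetwork()
batch = torch.rand(64, 1, 28, 28)      # stand-in batch of 64 images
flat = batch.view(batch.shape[0], -1)  # flatten to shape (64, 784)
scores = tiny(flat)                    # calling the module invokes forward()
print(scores.shape)
```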

## Defining the loss function

In this recipe, we will define a loss function for our fashion dataset using the loss function
available in PyTorch.

In this recipe, we replace softmax with log softmax so that we work with log
probabilities instead of probabilities, which has a nice theoretical interpretation. There are
several reasons for doing this, including improved numerical stability and better-behaved
gradients during optimization. These advantages can be extremely important when training a
model is computationally challenging and expensive. Furthermore, the log heavily penalizes
the model when it assigns a low probability to the correct class.
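
The numerical benefit can be illustrated by comparing the log of a softmax with a direct log softmax on extreme scores. The logits below are exaggerated on purpose to force underflow.

```python
import torch
import torch.nn.functional as F

# Deliberately extreme scores so that small probabilities underflow to zero.
logits = torch.tensor([[1000.0, 0.0, -1000.0]])

naive = torch.log(F.softmax(logits, dim=1))  # log(0) produces -inf
stable = F.log_softmax(logits, dim=1)        # computed directly in log space

print(naive)
print(stable)
```

The naive version loses the small probabilities before the log is taken, while log softmax keeps finite values that a loss function can still use.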

In [8]:
class FashionNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden1 = nn.Linear(784, 256)
        self.hidden2 = nn.Linear(256, 128)
        self.output = nn.Linear(128, 10)
        self.log_softmax = nn.LogSoftmax(dim=1)
        self.activation = nn.ReLU()
    def forward(self, x):
        x = self.hidden1(x)
        x = self.activation(x)
        x = self.hidden2(x)
        x = self.activation(x)
        x = self.output(x)
        output = self.log_softmax(x)
        return output

In [9]:
model = FashionNetwork()
print(model)

FashionNetwork(
  (hidden1): Linear(in_features=784, out_features=256, bias=True)
  (hidden2): Linear(in_features=256, out_features=128, bias=True)
  (output): Linear(in_features=128, out_features=10, bias=True)
  (log_softmax): LogSoftmax(dim=1)
  (activation): ReLU()
)


Now, we will define our loss function; we will use negative log likelihood loss
for this:

In [10]:
criterion = nn.NLLLoss()
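
As a sanity check on what this loss computes, the sketch below compares nn.NLLLoss() on log softmax outputs with nn.CrossEntropyLoss() on raw scores and with a manual calculation. The random logits and labels are placeholders for real model outputs.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 10)          # placeholder scores: 8 samples, 10 classes
labels = torch.randint(0, 10, (8,))  # placeholder class labels

log_probs = F.log_softmax(logits, dim=1)

nll = nn.NLLLoss()(log_probs, labels)
cross_entropy = nn.CrossEntropyLoss()(logits, labels)

# NLL picks the log-probability of the correct class, negates it, and averages.
manual = -log_probs[torch.arange(8), labels].mean()

print(nll.item(), cross_entropy.item(), manual.item())
```

This is why a model that ends in log softmax is paired with nn.NLLLoss(), while a model that returns raw scores would be paired with nn.CrossEntropyLoss().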

## Implementing optimizers

Backpropagation is the method by which neural networks learn from errors: it computes the partial derivatives of the error with respect to each weight, and these are used to modify the weights so that the error is minimized.
Optimization functions are responsible for modifying the weights to reduce the error. The gradient points in the direction in which the error increases, so the optimizer moves the weights in the opposite direction. The optimizer combines the model parameters and the gradients of the loss function to iteratively modify the model parameters and reduce the model error. Optimizers can be thought of as fiddling with the model weights to get the
best possible model based on the difference between the model's predictions and the actual output, while the loss function acts as a guide by indicating when the optimizer is going right or wrong.
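
The idea of stepping against the gradient can be sketched by hand on a one-parameter problem. This is plain gradient descent rather than Adam, and the quadratic loss, starting value, and learning rate are assumptions chosen for the example.

```python
import torch

# One trainable parameter; the toy loss (w - 3)^2 is minimized at w = 3.
w = torch.tensor(0.0, requires_grad=True)
learning_rate = 0.1

for step in range(50):
    loss = (w - 3.0) ** 2
    loss.backward()                  # backpropagation fills w.grad with d(loss)/dw
    with torch.no_grad():
        w -= learning_rate * w.grad  # move against the gradient
    w.grad.zero_()                   # clear the gradient before the next step

print(w.item())
```

An optimizer object such as optim.Adam wraps the update and gradient-reset steps behind step() and zero_grad(), and adds its own rule for scaling each update.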

In [11]:
from torch import optim

We will use the Adam optimizer and pass in the model parameters:

In [12]:
optimizer = optim.Adam(model.parameters())

In [13]:
optimizer.defaults

{'lr': 0.001,
 'betas': (0.9, 0.999),
 'eps': 1e-08,
 'weight_decay': 0,
 'amsgrad': False,
 'maximize': False,
 'foreach': None,
 'capturable': False}

In [14]:
optimizer = optim.Adam(model.parameters(), lr=3e-3)

In [15]:
optimizer

Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    capturable: False
    eps: 1e-08
    foreach: None
    lr: 0.003
    maximize: False
    weight_decay: 0
)

Now we will start training our model, starting with the number of epochs:

In [16]:
epoch = 10

In [17]:
for _ in range(epoch):
    running_loss = 0  # reset the running loss at the start of each epoch
    for image, label in trainloader:  # iterate over batches from the training loader
        optimizer.zero_grad()  # reset the gradients to zero
        image = image.view(image.shape[0], -1)  # flatten each image into a 784-element vector
        pred = model(image)  # forward pass: get the log-probabilities
        loss = criterion(pred, label)  # calculate the loss/error
        loss.backward()  # backpropagate to compute the gradients
        optimizer.step()  # update the weights using the gradients
        running_loss += loss.item()  # accumulate the batch loss
    # print the mean batch loss after each epoch
    print(f'Training loss: {running_loss/len(trainloader):.4f}')

Training loss: 0.4926
Training loss: 0.3888
Training loss: 0.3509
Training loss: 0.3291
Training loss: 0.3147
Training loss: 0.3026
Training loss: 0.2906
Training loss: 0.2776
Training loss: 0.2733
Training loss: 0.2650
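
The reason the loop calls optimizer.zero_grad() on every batch is that PyTorch accumulates gradients across backward() calls. A minimal sketch with a tiny linear layer, unrelated to the model above, shows the effect.

```python
import torch
from torch import nn

layer = nn.Linear(2, 1)
x = torch.ones(1, 2)

layer(x).sum().backward()
first = layer.weight.grad.clone()

layer(x).sum().backward()  # no reset in between: the new gradient is added
accumulated = layer.weight.grad.clone()

layer.zero_grad()          # reset, as the training loop does on each batch
layer(x).sum().backward()
fresh = layer.weight.grad.clone()

print(first, accumulated, fresh)
```

Without the reset, each update would be based on the sum of the gradients from all previous batches rather than the current one.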


## Implementing dropouts

Overfitting happens when a model memorizes the data it is given for training rather than generalizing over the solution space; that is, it learns the minute details and noise of the training data instead of grasping the bigger
picture, and so performs poorly on new data. Regularization is the process of preventing models from overfitting.

Dropout is one of the most popular regularization techniques in neural networks. Randomly selected neurons are turned off during training: their contribution is temporarily removed from the forward pass, and their weights are not updated in the backward pass. As a result, no single neuron or subset of neurons gets all the decisive power of the model; rather, all the neurons are forced to make active contributions to predictions.

Dropout can be intuitively understood as training a large ensemble of smaller models, each learning to capture various features, under one big model definition.
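
The sketch below shows how nn.Dropout behaves in training mode and in evaluation mode. The all-ones input is an arbitrary choice that makes the dropped and rescaled values easy to see.

```python
import torch
from torch import nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.25)
x = torch.ones(1, 1000)

drop.train()
train_out = drop(x)  # about 25% zeros; survivors are scaled by 1 / (1 - 0.25)

drop.eval()
eval_out = drop(x)   # dropout is disabled, so the input passes through unchanged

zero_fraction = (train_out == 0).float().mean().item()
print(f"zeroed in training mode: {zero_fraction:.2f}")
print(torch.equal(eval_out, x))
```

Because of this difference, a model containing dropout layers should be switched with model.train() before training and model.eval() before making predictions.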

In [19]:
class FashionNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden1 = nn.Linear(784, 256)
        self.hidden2 = nn.Linear(256, 128)
        self.output = nn.Linear(128, 10)
        self.log_softmax = nn.LogSoftmax(dim=1)  # normalize across the class dimension
        self.activation = nn.ReLU()
        self.drop = nn.Dropout(p=0.25)  # each activation is zeroed with probability 0.25 during training
    def forward(self, x):
        x = self.hidden1(x)
        x = self.activation(x)
        x = self.drop(x)  # apply dropout to hidden-layer activations, not to the inputs or outputs
        x = self.hidden2(x)
        x = self.activation(x)
        x = self.drop(x)
        x = self.output(x)
        output = self.log_softmax(x)
        return output

In [20]:
model = FashionNetwork()
print(model)

FashionNetwork(
  (hidden1): Linear(in_features=784, out_features=256, bias=True)
  (hidden2): Linear(in_features=256, out_features=128, bias=True)
  (output): Linear(in_features=128, out_features=10, bias=True)
  (log_softmax): LogSoftmax(dim=1)
  (activation): ReLU()
  (drop): Dropout(p=0.25, inplace=False)
)


## Implementing functional APIs

In [21]:
import torch.nn.functional as F

In [22]:
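class FashionNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        # only layers with learnable weights need to be defined in the constructor
        self.hidden1 = nn.Linear(784, 256)
        self.hidden2 = nn.Linear(256, 128)
        self.output = nn.Linear(128, 10)

    def forward(self, x):
        # stateless operations are called directly from torch.nn.functional
        x = F.relu(self.hidden1(x))
        x = F.relu(self.hidden2(x))
        x = F.log_softmax(self.output(x), dim=1)  # dim=1 normalizes across the classes
        return x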
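
As a small check that the functional calls compute the same values as their module counterparts, the following sketch compares the two on a random tensor. The tensor shape is arbitrary.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 10)  # arbitrary batch of 4 samples with 10 scores each

relu_module = nn.ReLU()(x)
relu_functional = F.relu(x)

log_softmax_module = nn.LogSoftmax(dim=1)(x)
log_softmax_functional = F.log_softmax(x, dim=1)

print(torch.equal(relu_module, relu_functional))
print(torch.allclose(log_softmax_module, log_softmax_functional))
```

The functional form suits operations without learnable parameters. One caveat is that F.dropout does not follow model.train() and model.eval() on its own, so it is usually called as F.dropout(x, p=0.25, training=self.training) inside forward().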