In [1]:
%matplotlib inline

# Build a Neural Network

Neural networks are composed of layers/modules that perform operations on data.

The [`torch.nn`](https://pytorch.org/docs/stable/nn.html) namespace provides all the building blocks you need to
build your own neural network.

Every module in PyTorch subclasses [`nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html).

- A neural network is itself a module that consists of other modules (layers).
- This nested structure makes it easy to build and manage complex architectures.
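
As a minimal sketch of this nesting, a module can hold other modules as attributes, and `named_children()` lists its direct submodules. The `TinyBlock` and `TinyNet` classes and their layer sizes are hypothetical names chosen for illustration; they are not part of this tutorial's model.

```python
import torch
from torch import nn


class TinyBlock(nn.Module):
    """Hypothetical building block: one linear layer followed by ReLU."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.linear(x))


class TinyNet(nn.Module):
    """Hypothetical network that is itself a module made of two TinyBlocks."""

    def __init__(self):
        super().__init__()
        self.block1 = TinyBlock(4, 8)
        self.block2 = TinyBlock(8, 2)

    def forward(self, x):
        return self.block2(self.block1(x))


net = TinyNet()
# Direct submodules are registered under their attribute names.
child_names = [name for name, _ in net.named_children()]
print(child_names)  # ['block1', 'block2']

out = net(torch.rand(3, 4))  # a batch of 3 samples with 4 features each
print(out.shape)
```

Assigning a module to an attribute is what registers it, so the outer module can find the parameters of every nested layer.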

### Import the necessary libraries

In [2]:
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

### Get Device for Training

We want to train our model on a hardware accelerator such as a GPU, if one is available.

Check whether
[`torch.cuda`](https://pytorch.org/docs/stable/notes/cuda.html) is available; otherwise,
continue to use the CPU.



In [3]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f'Using {device} device')

Using cpu device
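
A module and its input tensors need to live on the same device. The following is a small sketch of that idea, assuming a throwaway `nn.Linear` layer (hypothetical, not the tutorial's model) so it runs on either CPU or GPU.

```python
import torch
from torch import nn

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Move the layer's parameters to the selected device.
layer = nn.Linear(4, 2).to(device)

# Create the input directly on the same device.
x = torch.rand(3, 4, device=device)

out = layer(x)
param_device = next(layer.parameters()).device.type
print(param_device, out.device.type)
```

If the input stayed on the CPU while the layer was on the GPU, the forward pass would raise a device-mismatch error.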


## Define the Class

Define the neural network by

- subclassing ``nn.Module``, and
- initializing the neural network layers in ``__init__``.

Every ``nn.Module`` subclass implements the operations on input data in the ``forward`` method.



In [4]:
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
            nn.ReLU()
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

Create an instance of ``NeuralNetwork``, move it to the ``device``, and print
its structure.

In [5]:
model = NeuralNetwork().to(device)
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
    (5): ReLU()
  )
)


To use the model, pass it the input data.

This executes the model's ``forward`` method, along with some [background operations](https://github.com/pytorch/pytorch/blob/270111b7b611d174967ed204776985cefca9c144/torch/nn/modules/module.py#L866).

**Do not call ``model.forward()`` directly!**

Calling the model on the input returns a tensor holding **10 raw predicted values**, one per class, for each input sample.

Get the prediction probabilities by passing this output through an instance of the ``nn.Softmax`` module.



In [6]:
X = torch.rand(1, 28, 28, device=device)
logits = model(X) 
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")

Predicted class: tensor([5])
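
One way to see why the model should be called rather than ``forward`` is to register a forward hook, which is one of the background operations handled by the call. This is a minimal sketch on a standalone `nn.Linear` layer; the `record_shape` function and `calls` list are hypothetical helpers for illustration.

```python
import torch
from torch import nn

layer = nn.Linear(4, 2)
calls = []


def record_shape(module, inputs, output):
    # Forward hooks receive the module, its inputs, and its output.
    calls.append(tuple(output.shape))


handle = layer.register_forward_hook(record_shape)
x = torch.rand(3, 4)

layer(x)          # goes through __call__, so the hook fires
layer.forward(x)  # bypasses __call__, so the hook does not fire

handle.remove()   # detach the hook once it is no longer needed
print(calls)      # [(3, 2)]
```

Only the first call is recorded, which shows that calling ``forward`` directly skips the hook machinery.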


--------------




## Model Layers

To break down the layers in the FashionMNIST model, take a sample minibatch of 3 images of size 28x28
and follow it as it passes through each layer.

In [7]:
input_image = torch.rand(3,28,28)
print(input_image.size())

torch.Size([3, 28, 28])


### nn.Flatten

Initialize the [`nn.Flatten`](https://pytorch.org/docs/stable/generated/torch.nn.Flatten.html)
layer to convert each 2D 28x28 image into a contiguous array of 784 pixel values
(the minibatch dimension, at dim=0, is maintained).



In [8]:
flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())

torch.Size([3, 784])
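
As a quick check of what flattening does, the sketch below compares `nn.Flatten` with an explicit `reshape` that keeps the batch dimension. The `start_dim=0` variant is shown only for contrast and is not used in this tutorial's model.

```python
import torch
from torch import nn

batch = torch.rand(3, 28, 28)

# Default nn.Flatten keeps dim 0 (the batch) and merges the rest.
flat = nn.Flatten()(batch)
same = batch.reshape(3, -1)
print(torch.equal(flat, same))  # True

# With start_dim=0 the batch dimension is flattened too.
flat_all = nn.Flatten(start_dim=0)(batch)
print(flat_all.shape)  # torch.Size([2352])
```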


### nn.Linear 

The [linear layer](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html)
is a module that applies a linear transformation to the input using its stored weights and biases.

In [9]:
layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size())

torch.Size([3, 20])
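
The linear transformation can be written out by hand as `x @ W.T + b`. The sketch below, using a freshly initialized layer rather than `layer1` above, checks that this matches the module's output up to floating-point tolerance.

```python
import torch
from torch import nn

torch.manual_seed(0)  # only to make the random values repeatable

layer = nn.Linear(in_features=28 * 28, out_features=20)
x = torch.rand(3, 28 * 28)

# The weight is stored with shape (out_features, in_features).
print(layer.weight.shape)  # torch.Size([20, 784])
print(layer.bias.shape)    # torch.Size([20])

manual = x @ layer.weight.T + layer.bias
matches = torch.allclose(layer(x), manual, atol=1e-5)
print(matches)
```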


### nn.ReLU

Non-linear activations create the complex mappings between the model's inputs and outputs.

They are applied after linear transformations to introduce **nonlinearity**, helping neural networks
learn a wide variety of phenomena and patterns.

This model uses [`nn.ReLU`](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html) between the
linear layers.

In [10]:
print(f"Before ReLU: {hidden1}\n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}")

Before ReLU: tensor([[-0.5680,  0.3874,  0.3821, -0.5361, -0.1964, -0.3354, -0.2435,  0.3195,
         -0.1812, -0.1125, -0.1830, -0.3787, -0.3603, -0.7641, -0.0396,  0.1862,
          0.6584,  0.4859,  0.3829, -0.4981],
        [-0.4336,  0.0613,  0.4952, -0.4822, -0.2920, -0.2154, -0.2760,  0.6031,
          0.0697, -0.0041,  0.0721, -0.5252, -0.8989, -0.9482, -0.1632,  0.2312,
          0.3729,  0.1233,  0.5083,  0.0125],
        [-0.4565, -0.0355,  0.1932, -0.2305, -0.0583, -0.0084, -0.1247,  0.1883,
         -0.3462, -0.2060, -0.1600, -0.4131, -0.4139, -0.8751, -0.0260,  0.2280,
          0.5177,  0.5567,  0.1884, -0.2145]], grad_fn=<AddmmBackward>)


After ReLU: tensor([[0.0000, 0.3874, 0.3821, 0.0000, 0.0000, 0.0000, 0.0000, 0.3195, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.1862, 0.6584, 0.4859,
         0.3829, 0.0000],
        [0.0000, 0.0613, 0.4952, 0.0000, 0.0000, 0.0000, 0.0000, 0.6031, 0.0697,
         0.0000, 0.0721, 0.0000, 0.0000, 0.0000, 0.000
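
ReLU computes `max(0, x)` element-wise, which is why every negative entry becomes zero in the output above. A minimal sketch on a small hand-written tensor (the values are made up for illustration):

```python
import torch
from torch import nn

x = torch.tensor([[-1.5, 0.0, 2.0],
                  [0.25, -0.5, -4.0]])

relu_out = nn.ReLU()(x)

# Clamping at zero from below is the same element-wise operation.
manual = torch.clamp(x, min=0)

print(relu_out)
print(torch.equal(relu_out, manual))  # True
```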

### nn.Sequential

[`nn.Sequential`](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html) is an ordered
container of modules.

- The data is passed through all the modules in the same order as they are defined.
- Use sequential containers to put together a quick network like ``seq_modules``.



In [11]:
seq_modules = nn.Sequential(
    flatten,
    layer1,
    nn.ReLU(),
    nn.Linear(20, 10)
)
input_image = torch.rand(3,28,28)
logits = seq_modules(input_image)
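
To illustrate the ordering, the sketch below builds a container with the same layer sizes as ``seq_modules`` (under the hypothetical name `seq`) and applies its modules one by one in a loop. The result should match calling the container directly.

```python
import torch
from torch import nn

torch.manual_seed(0)  # only to make the random values repeatable

seq = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 20),
    nn.ReLU(),
    nn.Linear(20, 10),
)
x = torch.rand(3, 28, 28)

# Iterating over the container yields the modules in definition order.
manual = x
for module in seq:
    manual = module(manual)

seq_out = seq(x)
print(torch.allclose(seq_out, manual))

# Modules can also be accessed by position.
print(seq[1])
```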

### nn.Softmax

The last linear layer of the neural network returns `logits`, raw values in $(-\infty, \infty)$, which are passed to the
[`nn.Softmax`](https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html) module.

- The logits are scaled to values in $[0, 1]$ representing the model's predicted probabilities for each class.
- The ``dim`` parameter indicates the dimension along which the values must sum to 1.

In [12]:
softmax = nn.Softmax(dim=1)
pred_probab = softmax(logits)
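
The scaling can be checked by hand: softmax exponentiates each logit and divides by the sum along ``dim``. The sketch below uses small made-up logits rather than the model's output.

```python
import torch
from torch import nn

logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 2.5, -1.0]])

probs = nn.Softmax(dim=1)(logits)

# Manual softmax: exponentiate, then normalize each row.
exp = torch.exp(logits)
manual = exp / exp.sum(dim=1, keepdim=True)

row_sums = probs.sum(dim=1)      # each row sums to 1 along dim=1
predicted = probs.argmax(dim=1)  # index of the most probable class per row

print(row_sums)
print(predicted)  # tensor([0, 1])
```

Softmax preserves the ordering of the logits, so taking `argmax` of the probabilities gives the same class as taking `argmax` of the logits.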

## Model Parameters

Many layers inside a neural network are **parameterized**, i.e., they have associated weights
and biases that are optimized during training.

Subclassing ``nn.Module`` automatically tracks all fields defined inside your model object and makes all parameters
accessible through the model's ``parameters()`` or ``named_parameters()`` methods.

In [13]:
print("Model structure: ", model, "\n\n")

for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")

Model structure:  NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
    (5): ReLU()
  )
) 


Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values : tensor([[ 0.0311, -0.0348, -0.0201,  ...,  0.0128,  0.0173, -0.0127],
        [-0.0073,  0.0250,  0.0059,  ..., -0.0003, -0.0159,  0.0077]],
       grad_fn=<SliceBackward>) 

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values : tensor([-0.0110,  0.0192], grad_fn=<SliceBackward>) 

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values : tensor([[ 0.0004, -0.0424,  0.0169,  ..., -0.0185, -0.0436,  0.0224],
        [-0.0210,  0.0376, -0.0205,  ..., -0.0282, -0.0434,  0.0192]],
       grad_fn=<SliceBackward>) 

Layer: linear_relu_
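
A common use of ``parameters()`` is counting how many values a model has. The sketch below rebuilds a stack with the same linear layer sizes as this tutorial's model under the hypothetical name `stack`; ReLU layers contribute no parameters.

```python
from torch import nn

stack = nn.Sequential(
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

# Parameter names follow the position of each layer in the container.
names = [name for name, _ in stack.named_parameters()]
print(names)

# numel() gives the number of elements in each weight or bias tensor.
total = sum(p.numel() for p in stack.parameters())
trainable = sum(p.numel() for p in stack.parameters() if p.requires_grad)
print(total, trainable)  # 669706 669706
```

The total is (784*512 + 512) + (512*512 + 512) + (512*10 + 10) = 669706.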

--------------


