# Build the Neural Network

Neural networks comprise layers/modules that perform operations on data.

The torch.nn namespace provides all the building blocks you need to build your own neural network. Every module in PyTorch subclasses nn.Module. A neural network is itself a module that consists of other modules (layers). This nested structure makes it easy to build and manage complex architectures.
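
As a small illustration of this nesting, the sketch below builds a toy two-level model and walks its module tree. The layer sizes and the names block and net are made up for the example; they are not part of the FashionMNIST model built in the following sections.

```python
import torch
from torch import nn

# Hypothetical toy model: a Sequential (itself a module) nested inside another Sequential.
block = nn.Sequential(nn.Linear(4, 8), nn.ReLU())
net = nn.Sequential(block, nn.Linear(8, 2))

# named_modules() walks the whole tree, starting at the root (whose name is the empty string).
module_names = [name for name, _ in net.named_modules()]
print(module_names)

# Every node in the tree, from the root down to the leaf layers, is an nn.Module.
all_modules = all(isinstance(m, nn.Module) for m in net.modules())
print(all_modules)
```

The dotted names (for example 0.1) reflect the path from the root module to each nested layer.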

In the following sections, we'll build a neural network to classify images in the FashionMNIST dataset.

In [1]:
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

## Get device for training

We want to be able to train our model on a hardware accelerator such as a GPU (CUDA) or MPS, if one is available. Otherwise, we fall back to the CPU.

In [2]:
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)

print(f"using {device} device")

using mps device
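
The same device string can be passed to .to() to move tensors onto the selected hardware. The following is a minimal sketch that repeats the device selection so it runs on its own; the tensor x is a made-up example.

```python
import torch

# Same selection logic as above, repeated so this sketch is self-contained.
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)

x = torch.ones(2, 3)   # tensors are created on the CPU by default
x_dev = x.to(device)   # returns a tensor on the target device (x itself is unchanged)

print(x.device.type, x_dev.device.type)
```

A model and its inputs must live on the same device; mixing devices in a forward pass raises a runtime error.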


## Define the Class

We define our neural network by subclassing nn.Module, and initialize the neural network layers in __init__. Every nn.Module subclass implements the operations on input data in the forward method.

In [3]:
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )
    
    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

We create an instance of NeuralNetwork, move it to the device, and print its structure.

In [4]:
model = NeuralNetwork().to(device)
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
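
The printed structure is enough to work out how many learnable values the model holds: each Linear layer stores in_features x out_features weights plus out_features biases. The sketch below rebuilds only the Sequential stack (the name stack is ours, chosen for the example) and compares the count reported by PyTorch with the hand calculation.

```python
import torch
from torch import nn

# Same layer sizes as linear_relu_stack in NeuralNetwork.
stack = nn.Sequential(
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

# Count every element of every learnable tensor (weights and biases).
total = sum(p.numel() for p in stack.parameters())

# Hand calculation: weights + biases for each of the three Linear layers.
by_hand = (784 * 512 + 512) + (512 * 512 + 512) + (512 * 10 + 10)

print(total, by_hand)  # 669706 669706
```

ReLU and Flatten contribute nothing to the total because they have no learnable parameters.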


To use the model, we pass it the input data. This executes the model's forward method, along with some background operations. Do not call model.forward() directly!
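
One concrete difference between the two call styles is hooks: functions registered on a module are run when the module is called, but not when forward is invoked directly. The sketch below shows this on a single, hypothetical Linear layer; the layer size and the calls list are made up for the example.

```python
import torch
from torch import nn

layer = nn.Linear(4, 2)
calls = []

# A forward hook records the output shape each time it fires.
# list.append returns None, so the hook leaves the output unchanged.
handle = layer.register_forward_hook(
    lambda module, inputs, output: calls.append(tuple(output.shape))
)

x = torch.rand(3, 4)
layer(x)          # goes through nn.Module.__call__, so the hook fires
layer.forward(x)  # bypasses __call__, so the hook is skipped
handle.remove()   # clean up the hook

print(calls)  # [(3, 2)]
```

Only one entry is recorded even though the layer computed its output twice.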

Calling the model on the input returns a 2-dimensional tensor in which dim=0 indexes the samples in the batch (each with its own output of 10 raw predicted values, one per class) and dim=1 indexes the individual values of each output.

We get the prediction probabilities by passing the output through an instance of the nn.Softmax module.

In [5]:
X = torch.rand(1, 28, 28, device=device)
logits = model(X)
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")

Predicted class: tensor([2], device='mps:0')
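
To make the softmax step concrete, the sketch below applies nn.Softmax to a small, hand-written logits tensor (2 samples, 3 classes; the values are made up). It checks that each row of probabilities sums to 1, that the module matches the textbook formula exp(z_i) / sum_j exp(z_j), and that softmax does not change which class scores highest.

```python
import torch
from torch import nn

# Hypothetical logits: 2 samples (dim=0), 3 classes (dim=1).
logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 0.5, 3.0]])

# dim=1 normalizes across the class values of each sample.
probs = nn.Softmax(dim=1)(logits)
row_sums = probs.sum(dim=1)

# The same computation written out by hand.
manual = torch.exp(logits) / torch.exp(logits).sum(dim=1, keepdim=True)

# Softmax is monotonic, so the highest logit is also the highest probability.
same_argmax = torch.equal(probs.argmax(dim=1), logits.argmax(dim=1))

print(row_sums)
print(probs.argmax(dim=1), same_argmax)
```

Because the argmax is unchanged, softmax is only needed when the probabilities themselves are of interest, not to pick the predicted class.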


## Model Layers

Let's break down the layers in the FashionMNIST model.

To illustrate, we will take a sample minibatch of 3 images of size 28x28 and see what happens to it as we pass it through the network.

In [6]:
input_image = torch.rand(3, 28, 28)
print(input_image.size())

torch.Size([3, 28, 28])


### nn.Flatten

We initialize the nn.Flatten layer to convert each 2D 28x28 image into a contiguous array of 784 pixel values (the minibatch dimension, dim=0, is maintained).

In [7]:
flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())

torch.Size([3, 784])
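
The sketch below looks more closely at what flattening does, using a made-up batch filled with consecutive numbers so that the element order is easy to follow. It assumes the default nn.Flatten arguments (start_dim=1, end_dim=-1) shown in the printed model structure, and contrasts them with start_dim=0.

```python
import torch
from torch import nn

# Hypothetical batch of 3 "images" holding the values 0, 1, 2, ... in order.
batch = torch.arange(3 * 28 * 28, dtype=torch.float32).reshape(3, 28, 28)

# Default Flatten keeps dim=0 and merges the remaining dimensions row by row.
flat = nn.Flatten()(batch)
same_as_reshape = torch.equal(flat, batch.reshape(3, -1))

# With start_dim=0 the batch dimension is merged as well.
flat_all = nn.Flatten(start_dim=0)(batch)

print(flat.shape, flat_all.shape, same_as_reshape)
print(flat[1, 0].item())  # first pixel of the second image
```

With the defaults, each row of the result is one image laid out row by row, so the first pixel of the second image holds the value 784.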


### nn.Linear

The linear layer is a module that applies a linear transformation to the input using its stored weights and biases.

In [8]:
layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)

print(hidden1.size())

torch.Size([3, 20])
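
The linear transformation can be written as y = x W^T + b, where W is the layer's weight matrix and b its bias vector. The sketch below checks this against nn.Linear on random data; the seed and the input x are arbitrary choices made for the example.

```python
import torch
from torch import nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

layer = nn.Linear(in_features=28 * 28, out_features=20)
x = torch.rand(3, 28 * 28)

out = layer(x)

# Weights are stored as (out_features, in_features), hence the transpose.
manual = x @ layer.weight.T + layer.bias

print(layer.weight.shape, layer.bias.shape)
print(torch.allclose(out, manual, atol=1e-5))
```

The weight and bias tensors are created and randomly initialized when the layer is constructed, which is why the hidden values differ from run to run unless a seed is set.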


### nn.ReLU

Non-linear activations are what create the complex mappings between the model's inputs and outputs. They are applied after linear transformations to introduce nonlinearity, helping neural networks learn a wide variety of phenomena.

In this model, we use nn.ReLU between our linear layers, but there are other activation functions that can introduce non-linearity in your model.
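
As a quick comparison, the sketch below applies ReLU and two other activation modules from torch.nn to the same hand-picked values. The input tensor and the LeakyReLU slope of 0.1 are arbitrary choices for illustration, not settings used by the model in this tutorial.

```python
import torch
from torch import nn

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])  # made-up sample values

# ReLU computes max(0, x) element-wise: negatives become 0, positives pass through.
relu_out = nn.ReLU()(x)
same_as_clamp = torch.equal(relu_out, torch.clamp(x, min=0))

# LeakyReLU scales negative inputs by a small slope instead of zeroing them.
leaky_out = nn.LeakyReLU(negative_slope=0.1)(x)

# Tanh squashes every input into the open interval (-1, 1).
tanh_out = nn.Tanh()(x)

print(relu_out)
print(leaky_out)
print(tanh_out)
```

These modules hold no learnable parameters, so swapping one for another changes the model's behavior without changing its parameter count.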

In [9]:
print(f"before relu: {hidden1} \n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"after relu: {hidden1}")

before relu: tensor([[ 0.3142, -0.1475,  0.7625, -0.1126, -0.0028,  0.4756, -0.2604, -0.1498,
         -0.1373,  0.0050,  0.1223,  0.3017, -0.0232, -0.6451, -0.1084,  0.2931,
          0.2026, -0.3242, -0.0237, -0.1021],
        [ 0.1892,  0.0031,  0.4144, -0.2846, -0.1556, -0.1566, -0.0593,  0.0804,
          0.0124, -0.1292,  0.0777,  0.4637, -0.4387, -0.8024, -0.0849,  0.4886,
          0.1013, -0.1476, -0.1980, -0.1790],
        [ 0.0638,  0.3232,  0.7931, -0.0231, -0.0059,  0.4532, -0.3270, -0.0854,
          0.1416,  0.2374, -0.1072,  0.4645, -0.1028, -0.7788,  0.0210,  0.3702,
          0.2642, -0.0280, -0.0492,  0.1410]], grad_fn=<AddmmBackward0>) 


after relu: tensor([[0.3142, 0.0000, 0.7625, 0.0000, 0.0000, 0.4756, 0.0000, 0.0000, 0.0000,
         0.0050, 0.1223, 0.3017, 0.0000, 0.0000, 0.0000, 0.2931, 0.2026, 0.0000,
         0.0000, 0.0000],
        [0.1892, 0.0031, 0.4144, 0.0000, 0.0000, 0.0000, 0.0000, 0.0804, 0.0124,
         0.0000, 0.0777, 0.4637, 0.0000, 0.0000, 0.0