# Build the Neural Network

`torch.nn` provides the building blocks for creating neural networks. Every model subclasses `torch.nn.Module`, which holds the layers (themselves modules) and defines a `forward(input)` method that returns the `output`.

We are going to build a model that classifies images from the FashionMNIST dataset.

In [1]:
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

## Device for Training

If a CUDA GPU is available, use it for training. Otherwise, fall back to the MPS backend if it is available, and finally to the CPU.

In [2]:
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")

Using cuda device
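As a rough sketch of how the `device` string is used afterwards: tensors and modules are created on the CPU by default and are moved with `.to(device)`. The names `x` and `layer` are illustrative and not part of the original notebook.

```python
import torch
from torch import nn

# Same selection logic as above: CUDA first, then MPS, then CPU.
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)

x = torch.rand(2, 3)        # hypothetical input, created on the CPU by default
layer = nn.Linear(3, 1)     # hypothetical layer, parameters also start on the CPU

x = x.to(device)            # for tensors, .to() returns a tensor on the target device
layer = layer.to(device)    # for modules, .to() moves the parameters and returns the module

out = layer(x)              # input and parameters must be on the same device
print(x.device.type, out.device.type)
```

Both printed device types should match the selected `device`; mixing a CPU tensor with a module on another device raises a runtime error.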


## Define the Class

Define the model by subclassing `nn.Module`. Call `super().__init__()` first, then create the neural network layers in `__init__`. The `forward` method defines how the input flows through those layers.

In [3]:
class NerualNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
            nn.ReLU()
        )
    
    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

In [4]:
model = NerualNetwork().to(device)
print(model)

NerualNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
    (5): ReLU()
  )
)


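The model is defined by the class `NerualNetwork`, which extends `nn.Module`. `__init__` defines the layers of the network, and `forward` defines the actual computation: it takes the input data and passes it through the layers in order. To run the model, pass it the input (`model(X)`) rather than calling `forward` directly. Note that this stack ends with `nn.ReLU()`, so the values returned as `logits` are never negative.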

In [5]:
X = torch.rand(1, 28, 28, device=device)
logits = model(X)
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")

Predicted class: tensor([6], device='cuda:0')
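The first dimension of the input is the batch dimension, so the same call works for any number of images. The sketch below uses a deliberately small stand-in class, `TinyNet`, which is hypothetical and not the model from this notebook, to show the input and output shapes and to check that calling the module gives the same result as `forward` when no hooks are registered.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in model: flatten, then one linear layer."""

    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.fc = nn.Linear(28 * 28, 10)

    def forward(self, x):
        return self.fc(self.flatten(x))


tiny_model = TinyNet()
batch = torch.rand(4, 28, 28)          # a batch of 4 random 28x28 "images"

with torch.no_grad():                  # gradients are not needed for a quick check
    out_call = tiny_model(batch)       # preferred: __call__ runs hooks, then forward()
    out_forward = tiny_model.forward(batch)

print(out_call.shape)                  # one row of 10 scores per image
print(torch.equal(out_call, out_forward))
```

Calling the module is still the recommended form, because `__call__` also runs any registered hooks that a direct `forward` call would skip.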


## Model Layers

To see what each layer does, take a sample minibatch of 3 images of size 28x28 and pass it through the layers one at a time.

In [7]:
input_image = torch.rand(3, 28, 28)
print(input_image.size())

torch.Size([3, 28, 28])


### nn.Flatten

`nn.Flatten` converts each 2D 28x28 image into a 1D tensor of 784 values, so an input of size `(3, 28, 28)` becomes `(3, 28*28)`. The minibatch dimension (dim 0) is kept.

In [8]:
flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())

torch.Size([3, 784])
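As a small check of what flattening means here, the sketch below compares `nn.Flatten` with two equivalent ways of collapsing every dimension except the first. The variable names are illustrative.

```python
import torch
from torch import nn

images = torch.rand(3, 28, 28)

flat_module = nn.Flatten()(images)                 # default start_dim=1 keeps dim 0
flat_reshape = images.reshape(images.size(0), -1)  # same result via reshape
flat_func = torch.flatten(images, start_dim=1)     # functional form

print(flat_module.shape)
print(torch.equal(flat_module, flat_reshape))
print(torch.equal(flat_module, flat_func))
```

All three produce a `(3, 784)` tensor with identical values. The module form is convenient because it can be placed inside a model or an `nn.Sequential`.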


### nn.Linear

A linear layer applies a linear transformation to its input using its stored weights and biases.

In [10]:
layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size())

torch.Size([3, 20])
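The transformation computed by `nn.Linear` is `y = x @ W.T + b`, where `W` has shape `(out_features, in_features)` and `b` has shape `(out_features,)`. The following sketch reproduces the layer's output by hand. The variable names are illustrative, and the seed is only there to make the run repeatable.

```python
import torch
from torch import nn

torch.manual_seed(0)
flat_batch = torch.rand(3, 28 * 28)
linear = nn.Linear(in_features=28 * 28, out_features=20)

out_module = linear(flat_batch)
# Manual version: matrix-multiply by the transposed weight, then add the bias.
out_manual = flat_batch @ linear.weight.T + linear.bias

print(linear.weight.shape, linear.bias.shape)
print(torch.allclose(out_module, out_manual, atol=1e-5))
```

The comparison uses `torch.allclose` rather than exact equality because the two code paths may round floating-point sums slightly differently.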


### nn.ReLU

A non-linear activation function applied to the output of the linear layer. `nn.ReLU` replaces negative values with zero and leaves positive values unchanged.

In [11]:
print(f"Before ReLU: {hidden1}\n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}")

Before ReLU: tensor([[-0.4002, -0.1341, -0.0208, -0.5645,  0.1906,  0.2422,  0.4346, -0.2328,
          0.0578, -0.0147,  0.0323,  0.0865, -0.1331, -0.6104, -0.1172,  0.1024,
         -0.0350,  0.3996, -0.0200,  0.1878],
        [ 0.3342,  0.1330, -0.0168, -0.3143,  0.0918,  0.0762,  0.2696, -0.4713,
          0.2836,  0.2177, -0.1606, -0.0189,  0.0950, -0.6881, -0.2108, -0.0788,
          0.1499,  0.0394, -0.0248, -0.1180],
        [-0.1174, -0.0715,  0.0584, -0.4914, -0.1830,  0.1703,  0.4167, -0.0601,
         -0.1321,  0.0018, -0.2493,  0.0330,  0.1562, -0.2690, -0.3354, -0.0751,
         -0.2483, -0.0276,  0.0654,  0.0171]], grad_fn=<AddmmBackward0>)


After ReLU: tensor([[0.0000, 0.0000, 0.0000, 0.0000, 0.1906, 0.2422, 0.4346, 0.0000, 0.0578,
         0.0000, 0.0323, 0.0865, 0.0000, 0.0000, 0.0000, 0.1024, 0.0000, 0.3996,
         0.0000, 0.1878],
        [0.3342, 0.1330, 0.0000, 0.0000, 0.0918, 0.0762, 0.2696, 0.0000, 0.2836,
         0.2177, 0.0000, 0.0000, 0.0950, 0.0000, 0.00
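ReLU is simply `max(0, x)` applied to every element. As a minimal sketch on a small hand-written tensor (the values are chosen for illustration), the module gives the same result as clamping at zero:

```python
import torch
from torch import nn

sample = torch.tensor([[-0.4002, 0.1906, 0.0000],
                       [0.3342, -0.0168, 0.2696]])

relu_module = nn.ReLU()(sample)
relu_manual = torch.clamp(sample, min=0)   # element-wise max(0, x)

print(relu_module)
print(torch.equal(relu_module, relu_manual))
```

Negative entries become `0.0`, while non-negative entries pass through unchanged.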

### nn.Sequential

`nn.Sequential` is an ordered container of modules. Data passed to the container flows through every module in the order in which they were defined, so a single call runs the whole stack.

In [12]:
seq_modules = nn.Sequential(
    flatten,
    layer1,
    nn.ReLU(),
    nn.Linear(20, 10)
)
input_image = torch.rand(3, 28, 28)
logits = seq_modules(input_image)
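To make the "in order" behaviour concrete, the sketch below rebuilds a similar container and checks that calling it matches applying each module by hand. The names `stack`, `images`, and `out_manual` are illustrative.

```python
import torch
from torch import nn

torch.manual_seed(0)
stack = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 20),
    nn.ReLU(),
    nn.Linear(20, 10),
)
images = torch.rand(3, 28, 28)

out_seq = stack(images)

# Equivalent manual loop: a Sequential can be iterated in definition order.
out_manual = images
for module in stack:
    out_manual = module(out_manual)

print(out_seq.shape)
print(torch.allclose(out_seq, out_manual))
print(stack[1])   # individual modules can be accessed by integer index
```

Because the container is indexable and iterable, individual layers can be inspected or reused without unpacking the model by hand.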

### nn.Softmax

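The last linear layer of the neural network returns logits, which are raw, unnormalized scores. `nn.Softmax` rescales them to values in [0, 1] that sum to 1 along the dimension given by `dim`, so each row can be read as the model's predicted probabilities for each class.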

In [13]:
logits.shape

torch.Size([3, 10])

In [14]:
softmax = nn.Softmax(dim=1)
pred_probab = softmax(logits)
pred_probab

tensor([[0.1068, 0.0800, 0.0946, 0.1149, 0.1175, 0.1176, 0.0728, 0.0863, 0.1255,
         0.0840],
        [0.1107, 0.0875, 0.0951, 0.1015, 0.1170, 0.1041, 0.0822, 0.0952, 0.1224,
         0.0843],
        [0.1071, 0.0896, 0.1002, 0.0986, 0.1204, 0.1151, 0.0734, 0.0895, 0.1210,
         0.0851]], grad_fn=<SoftmaxBackward0>)

In [16]:
y_pred = pred_probab.argmax(1)
y_pred

tensor([8, 8, 8])
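Softmax exponentiates each logit and divides by the sum of the exponentials in the same row. The following sketch, on a small hand-written tensor (the values are chosen for illustration), checks the module against that formula and shows that softmax does not change which class has the highest score.

```python
import torch
from torch import nn

scores = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 0.5, 3.0]])

probs = nn.Softmax(dim=1)(scores)

# Manual softmax; subtracting the row maximum first is a common stability trick
# and does not change the result.
shifted = scores - scores.max(dim=1, keepdim=True).values
exp = torch.exp(shifted)
probs_manual = exp / exp.sum(dim=1, keepdim=True)

print(probs.sum(dim=1))                  # each row sums to 1 (up to rounding)
print(torch.allclose(probs, probs_manual))
print(probs.argmax(1), scores.argmax(1))
```

Since softmax preserves the ordering within a row, `argmax` on the logits and on the probabilities selects the same class.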

## Model Parameters

Layers of the neural network have parameters (weights and biases) that can be optimized during training. You can track these parameters using `model.parameters()` or `model.named_parameters()` methods.

In [17]:
print("Model structure: ", model, "\n\n")

Model structure:  NerualNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
    (5): ReLU()
  )
) 
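A common use of `parameters()` is counting how many values the model learns. The sketch below rebuilds the same three linear layers in a standalone `nn.Sequential` (the name `param_stack` is illustrative) so that it runs on its own. ReLU layers contribute no parameters.

```python
from torch import nn

# Stand-in with the same linear layer sizes as the model printed above.
param_stack = nn.Sequential(
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

total = sum(p.numel() for p in param_stack.parameters())
trainable = sum(p.numel() for p in param_stack.parameters() if p.requires_grad)
print(total, trainable)  # 669706 669706

# Per-tensor breakdown: name, shape, and element count.
for name, param in param_stack.named_parameters():
    print(name, tuple(param.shape), param.numel())
```

The total follows from the layer sizes: (784*512 + 512) + (512*512 + 512) + (512*10 + 10) = 669,706.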




In [18]:
for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values: {param[:2]} \n")

Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values: tensor([[ 0.0237, -0.0028, -0.0038,  ..., -0.0335,  0.0267,  0.0106],
        [-0.0146,  0.0141, -0.0224,  ...,  0.0124, -0.0149, -0.0329]],
       device='cuda:0', grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values: tensor([ 0.0052, -0.0355], device='cuda:0', grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values: tensor([[ 0.0010,  0.0365,  0.0022,  ...,  0.0206,  0.0233, -0.0356],
        [ 0.0019,  0.0159, -0.0084,  ...,  0.0079, -0.0004,  0.0151]],
       device='cuda:0', grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.2.bias | Size: torch.Size([512]) | Values: tensor([-0.0411, -0.0052], device='cuda:0', grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.4.weight | Size: torch.Size([10, 512]) | Values: tensor([[-0.0439,  0.0076,  0.0346,  ..., -0.0124,  0.0384,  0.0367],
        [ 0.0139, -0.0104, -0.0364,  ...