# **Recap of Previous Lecture ⬇**

## **Artificial Neural Network (ANN) Architecture**

The diagram below illustrates the structure of an Artificial Neural Network (ANN), including the input layer, hidden layers, and output layer. It visually represents how data flows through neurons, with connections (weights) adjusting during training to improve predictions.


![ANN Diagram](https://i0.wp.com/i.postimg.cc/rmZkmFJ2/Artificial-neural-network-architecture-ANN-i-h-1-h-2-h-n-o.png?resize=710%2C416&ssl=1)


### **Building Blocks of ANN**

| **Category** | **Components & Definitions** |
| ----- | ----- |
| **Neural Units** | **Neuron (Perceptron), Node, Unit →** Fundamental units that process inputs and pass signals forward. |
| **Weights & Bias** | **Weights (w), Bias (b) →** Learnable parameters that adjust how inputs influence outputs. |
| **Layers** | **Input Layer, Hidden Layer, Output Layer →** Different stages where data is processed and transformed. |
| **Activation Functions** | **ReLU, Sigmoid, Tanh, Softmax →** Functions applied to a layer's output; ReLU, Sigmoid, and Tanh introduce non-linearity so the network can learn complex patterns, while Softmax turns output scores into class probabilities. |
| **Learning Process** | **Forward Propagation, Backpropagation, Loss Function →** The mechanism of learning by adjusting weights based on errors. |
| **Optimization** | **Optimizer, Gradient Descent, Adam, RMSprop →** Algorithms that update weights to minimize loss. |
| **Hyperparameters** | **Learning Rate, Epochs, Batch Size →** Parameters set before training to control how the model learns. |
| **Regularization** | **Dropout, L1/L2 Regularization (Lasso, Ridge) →** Techniques to prevent overfitting, either by penalizing large weights (L1/L2) or by randomly deactivating neurons during training (Dropout). |
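
As a quick refresher on the activation row of the table, here is a minimal NumPy sketch of the four functions it names. The helper names `relu`, `sigmoid`, `tanh`, and `softmax` are our own illustrative definitions, not a library API; subtracting the row maximum inside `softmax` is a common numerical-stability trick that leaves the result unchanged.

```python
import numpy as np

def relu(z):
    # max(0, z) element-wise
    return np.maximum(z, 0)

def sigmoid(z):
    # squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # squashes any real number into (-1, 1)
    return np.tanh(z)

def softmax(z):
    # Subtract the max for numerical stability; outputs are positive and sum to 1
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))           # [0. 0. 3.]
print(sigmoid(0.0))      # 0.5
print(tanh(0.0))         # 0.0
print(softmax(z).sum())  # 1.0, up to floating-point rounding
```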


> **Note:** We will go through each of these building blocks in detail to gain a better understanding of the ANN architecture.
>


### **Perceptron Architecture**
The diagram below illustrates the structure of a perceptron, showcasing how inputs are weighted, summed, and passed through an activation function to produce an output.


![Perceptron](https://aiml.com/wp-content/uploads/2023/09/perceptron-a-neuron-2.png)
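
A single worked example may make the diagram concrete. Assume two inputs $x = (1, 0)$, made-up weights $w = (0.5, -0.3)$, bias $b = -0.2$, and a step activation that outputs 1 when $z \ge 0$ (all hypothetical values chosen only for illustration):

$$
z = w_1 x_1 + w_2 x_2 + b = (0.5)(1) + (-0.3)(0) + (-0.2) = 0.3, \qquad \hat{y} = \text{step}(z) = 1
$$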

### **Perceptron Training Algorithm**
The following algorithm outlines the steps involved in training a perceptron, where weights are adjusted iteratively based on errors to achieve accurate classification.

**The process typically follows these steps:**

* **Initialize weights:** Set initial weights randomly.

* **Forward pass:** Compute the weighted sum of inputs and pass it through an activation function.

* **Calculate error:** Compare the predicted output with the actual target.

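* **Update weights:** Adjust the weights in proportion to the error and the learning rate, using the perceptron learning rule w ← w + η(y − ŷ)x (the bias is updated the same way, as if its input were fixed at 1).

* **Repeat:** Iterate through the dataset multiple times (epochs) until the model converges, i.e. until every training example is classified correctly or a maximum number of epochs is reached.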
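
The steps above can be sketched in NumPy on the logical AND function, which is linearly separable. This is a minimal illustration under our own assumptions (a step activation, small random initial weights from `np.random.default_rng`, learning rate 0.1, and hypothetical helpers `predict` and `train_perceptron`), not a reference implementation. Training stops early once an epoch passes with no misclassifications.

```python
import numpy as np

def predict(X, w, b):
    # Step activation: output 1 where the weighted sum is >= 0, else 0
    return (X @ w + b >= 0).astype(int)

def train_perceptron(X, y, lr=0.1, epochs=100, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1])  # initialize weights randomly (small values)
    b = 0.0
    for epoch in range(epochs):
        mistakes = 0
        for xi, target in zip(X, y):
            y_hat = int(np.dot(w, xi) + b >= 0)  # forward pass
            error = target - y_hat               # calculate error: -1, 0 or +1
            w += lr * error * xi                 # update weights (perceptron rule)
            b += lr * error                      # bias behaves like a weight on input 1
            mistakes += int(error != 0)
        if mistakes == 0:                        # repeat until no errors remain
            break
    return w, b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])  # logical AND
w, b = train_perceptron(X, y)
print(predict(X, w, b))  # expected: [0 0 0 1]
```

On data that is not linearly separable (for example XOR) this loop would never reach zero mistakes and simply stops after `epochs` passes.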

## **Geometric Intuition of MLP**
Before diving into the mathematical details of forward and backward propagation, let’s first develop an intuitive understanding of how a Multi-Layer Perceptron (MLP) transforms data in a geometric sense.

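An MLP applies a series of linear transformations (matrix multiplications) followed by non-linear activations. Geometrically, each linear map stretches, rotates, and shifts the input space, and each activation bends or folds it, so classes that cannot be separated by a straight line in the original space can become linearly separable in the hidden representation; the output layer then draws a simple boundary in that transformed space. This is what lets an MLP learn complex decision boundaries for classification and fit non-linear functions for regression.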
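
To make the geometric picture concrete, here is a tiny NumPy sketch using XOR, the classic case that no single straight line can separate. The weights are hand-picked by us for illustration (they are not learned): one linear map plus ReLU moves the four points so that a single threshold on the output separates the two classes.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])  # XOR labels

# Hand-picked (not learned) weights for a 2-2-1 network with ReLU hidden units
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
w2 = np.array([1.0, -2.0])

# Linear map + ReLU: the hidden points become (0,0), (1,0), (1,0), (2,1)
H = np.maximum(X @ W1 + b1, 0)

# In hidden space one line, h1 - 2*h2 = 0.5, now separates the classes
scores = H @ w2
pred = (scores > 0.5).astype(int)
print(pred)  # [0 1 1 0]
```

Both XOR-positive inputs collapse onto the same hidden point (1, 0), which is exactly the kind of folding a single-layer perceptron cannot perform.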

To visualize how an MLP learns to separate data in high-dimensional space, explore the interactive demonstration below:

🔗 [MLP Geometric Intuition Demo](https://playground.tensorflow.org/)

👉 Click the link above to see how different MLP architectures transform data and learn decision boundaries in real-time!

# **Lecture 2: Forward and Backward Propagation in a 2-Layer MLP**


![Ann_single](https://github.com/sohag95/ANN_Lecture_Materials/blob/main/matrix.jpg?raw=true)

**Understanding the Flow of Information in a Multi-Layer Perceptron (MLP)**

The diagram above represents a 2-layer MLP architecture, consisting of an input layer, one hidden layer, and an output layer. We will now analyze how data flows through this network during:

* **Forward Propagation –** Computing activations layer by layer.

* **Backward Propagation –** Updating weights using gradients.

**1. Forward Propagation:**

1. Input features pass through the first layer, where weights and biases transform them.

2. The activation function (e.g., ReLU, Sigmoid) is applied to introduce non-linearity.

3. The process repeats for the output layer to compute final predictions.
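
In symbols, and assuming the setup used in the NumPy code later in this notebook (a ReLU hidden layer, no activation on the output layer, and a sum-of-squared-errors loss), the forward pass for a batch $x \in \mathbb{R}^{N \times D_{in}}$ with $W_1 \in \mathbb{R}^{D_{in} \times H}$ and $W_2 \in \mathbb{R}^{H \times D_{out}}$ can be written as:

$$
\begin{aligned}
h &= x W_1 + b_1 \\
h_{\text{relu}} &= \max(0,\, h) \\
\hat{y} &= h_{\text{relu}} W_2 + b_2 \\
L &= \sum_{n,k} \left(\hat{y}_{nk} - y_{nk}\right)^2
\end{aligned}
$$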

**2. Backward Propagation:**

1. The loss function measures the error between predicted and actual outputs.

2. Gradients of the loss w.r.t. weights are computed using the chain rule.

3. The weights are updated using an optimization algorithm like Gradient Descent.
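
Applying the chain rule to the same forward pass ($h = xW_1 + b_1$, $h_{\text{relu}} = \max(0, h)$, $\hat{y} = h_{\text{relu}}W_2 + b_2$, $L = \sum (\hat{y} - y)^2$) gives the gradients that a hand-written backward pass computes. Here $\odot$ is element-wise multiplication, $\mathbb{1}[h > 0]$ is the ReLU derivative, $\eta$ is the learning rate, and each bias gradient sums over the $N$ rows of the batch because the bias is broadcast to every row:

$$
\begin{aligned}
\frac{\partial L}{\partial \hat{y}} &= 2(\hat{y} - y) \\
\frac{\partial L}{\partial W_2} &= h_{\text{relu}}^{\top} \frac{\partial L}{\partial \hat{y}}, \qquad
\frac{\partial L}{\partial b_2} = \sum_{n=1}^{N} \left(\frac{\partial L}{\partial \hat{y}}\right)_{n,:} \\
\frac{\partial L}{\partial h_{\text{relu}}} &= \frac{\partial L}{\partial \hat{y}} W_2^{\top} \\
\frac{\partial L}{\partial h} &= \frac{\partial L}{\partial h_{\text{relu}}} \odot \mathbb{1}[h > 0] \\
\frac{\partial L}{\partial W_1} &= x^{\top} \frac{\partial L}{\partial h}, \qquad
\frac{\partial L}{\partial b_1} = \sum_{n=1}^{N} \left(\frac{\partial L}{\partial h}\right)_{n,:} \\
\theta &\leftarrow \theta - \eta \frac{\partial L}{\partial \theta} \quad \text{for each } \theta \in \{W_1, b_1, W_2, b_2\}
\end{aligned}
$$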

This understanding is crucial for training neural networks effectively. Now, let’s dive into the mathematical formulation of each step.

## ✍**Lecture Notes on Forward and Backward Propagation**
For a detailed explanation of forward and backward propagation in a 2-layer MLP, refer to the lecture notes provided in the PDF below. This document covers the step-by-step derivations, mathematical formulations, and key insights into how gradients are computed and used for weight updates.

📄 [Download Lecture Notes](https://github.com/sohag95/ANN_Lecture_Materials/blob/main/Lab%20Neural%20Net.pdf)
**(provided by Br. Tamal Maharaj)**

Click the link above to access the notes. Reviewing this material will help reinforce the concepts discussed in the lecture!

# **Implementation of Forward and Backward Propagation in a 2-Layer MLP**

## Warm-up: numpy
Before introducing PyTorch, we will first implement the network using numpy.

Numpy provides an n-dimensional array object, and many functions for manipulating these arrays. Numpy is a generic framework for scientific computing; it does not know anything about computation graphs, or deep learning, or gradients. However, we can easily use numpy to fit a two-layer network to random data by manually implementing the forward and backward passes through the network using numpy operations:

In [14]:
# Code in file tensor/two_layer_net_numpy.py
import numpy as np

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

# Randomly initialize weights
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)


In [15]:
learning_rate = 1e-6
for t in range(500):
  # Forward pass: compute predicted y
  h = x.dot(w1)
  h_relu = np.maximum(h, 0)
  y_pred = h_relu.dot(w2)

  # Compute and print loss
  loss = np.square(y_pred - y).sum()
  print(f"Iteration {t}: Loss = {loss:.6f}")

  # Backprop to compute gradients of w1 and w2 with respect to loss
  grad_y_pred = 2.0 * (y_pred - y)
  grad_w2 = h_relu.T.dot(grad_y_pred)
  grad_h_relu = grad_y_pred.dot(w2.T)
  grad_h = grad_h_relu.copy()
  grad_h[h < 0] = 0
  grad_w1 = x.T.dot(grad_h)

  # Update weights
  w1 -= learning_rate * grad_w1
  w2 -= learning_rate * grad_w2

Iteration 0: Loss = 41562340.899624
Iteration 1: Loss = 44499987.138660
Iteration 2: Loss = 50702481.330080
Iteration 3: Loss = 48111419.668701
Iteration 4: Loss = 32639419.871779
Iteration 5: Loss = 15578652.920235
Iteration 6: Loss = 6337092.426986
Iteration 7: Loss = 2990065.997808
Iteration 8: Loss = 1862371.708903
Iteration 9: Loss = 1397678.237552
Iteration 10: Loss = 1137021.582692
Iteration 11: Loss = 955390.518012
Iteration 12: Loss = 815099.456147
Iteration 13: Loss = 701879.227962
Iteration 14: Loss = 608618.471690
Iteration 15: Loss = 530840.110504
Iteration 16: Loss = 465419.691936
Iteration 17: Loss = 409962.916863
Iteration 18: Loss = 362648.403355
Iteration 19: Loss = 322044.159590
Iteration 20: Loss = 286955.263990
Iteration 21: Loss = 256508.068581
Iteration 22: Loss = 229987.773369
Iteration 23: Loss = 206821.786190
Iteration 24: Loss = 186447.209760
Iteration 25: Loss = 168465.793223
Iteration 26: Loss = 152505.845589
Iteration 27: Loss = 138360.050377
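
A standard way to confirm that a hand-written backward pass is correct is a numerical gradient check: nudge each weight by a small ε and compare the resulting change in loss with the analytic gradient. The sketch below does this on deliberately tiny, made-up dimensions (our choice, so it runs instantly), reusing the same gradient formulas as the loop above; `numeric_grad` is a hypothetical helper written for this example. Small disagreements can appear if a hidden pre-activation lies within ε of zero, where ReLU is not differentiable.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D_in, H, D_out = 4, 5, 3, 2  # tiny, made-up sizes so the check runs instantly
x = rng.standard_normal((N, D_in))
y = rng.standard_normal((N, D_out))
w1 = rng.standard_normal((D_in, H))
w2 = rng.standard_normal((H, D_out))

def loss_fn():
    # Same forward pass and sum-of-squares loss as the training loop (no biases)
    h_relu = np.maximum(x.dot(w1), 0)
    return np.square(h_relu.dot(w2) - y).sum()

# Analytic gradients: same formulas as the hand-written backward pass
h = x.dot(w1)
h_relu = np.maximum(h, 0)
grad_y_pred = 2.0 * (h_relu.dot(w2) - y)
grad_w2 = h_relu.T.dot(grad_y_pred)
grad_h = grad_y_pred.dot(w2.T)
grad_h[h < 0] = 0
grad_w1 = x.T.dot(grad_h)

def numeric_grad(f, w, eps=1e-6):
    # Central differences, one entry at a time: (f(w+eps) - f(w-eps)) / (2*eps)
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        original = w[idx]
        w[idx] = original + eps
        f_plus = f()
        w[idx] = original - eps
        f_minus = f()
        w[idx] = original  # restore the weight
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

num_w1 = numeric_grad(loss_fn, w1)
num_w2 = numeric_grad(loss_fn, w2)
print(np.abs(num_w1 - grad_w1).max(), np.abs(num_w2 - grad_w2).max())  # both should be tiny
```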

### **Extending MLP Implementation: Adding Bias Terms to Forward and Backward Propagation**

In [3]:
import numpy as np

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

# Randomly initialize weights and biases
w1 = np.random.randn(D_in, H)
b1 = np.zeros((1, H))  # Bias for hidden layer
w2 = np.random.randn(H, D_out)
b2 = np.zeros((1, D_out))  # Bias for output layer

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.dot(w1) + b1  # Adding bias term
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2) + b2  # Adding bias term

    # Compute and print loss
    loss = np.square(y_pred - y).sum()
    print(f"Iteration {t}: Loss = {loss:.6f}")

    # Backprop to compute gradients
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_b2 = np.sum(grad_y_pred, axis=0, keepdims=True)

    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0

    grad_w1 = x.T.dot(grad_h)
    grad_b1 = np.sum(grad_h, axis=0, keepdims=True)

    # Update weights and biases
    w1 -= learning_rate * grad_w1
    b1 -= learning_rate * grad_b1
    w2 -= learning_rate * grad_w2
    b2 -= learning_rate * grad_b2


Iteration 0: Loss = 37865598.259198
Iteration 1: Loss = 33721822.564776
Iteration 2: Loss = 31801351.251798
Iteration 3: Loss = 27065982.656061
Iteration 4: Loss = 19503505.124063
Iteration 5: Loss = 11951835.828404
Iteration 6: Loss = 6736787.222641
Iteration 7: Loss = 3847463.146075
Iteration 8: Loss = 2392605.467024
Iteration 9: Loss = 1650190.756852
Iteration 10: Loss = 1239710.186026
Iteration 11: Loss = 985535.959410
Iteration 12: Loss = 811392.711802
Iteration 13: Loss = 681817.567590
Iteration 14: Loss = 580302.398593
Iteration 15: Loss = 498549.438831
Iteration 16: Loss = 431112.857068
Iteration 17: Loss = 374773.170976
Iteration 18: Loss = 327322.339609
Iteration 19: Loss = 287052.211527
Iteration 20: Loss = 252656.862095
Iteration 21: Loss = 223091.758009
Iteration 22: Loss = 197568.397756
Iteration 23: Loss = 175458.085238
Iteration 24: Loss = 156215.893077
Iteration 25: Loss = 139421.322066
Iteration 26: Loss = 124705.804943
Iteration 27: Loss = 111769.681655

## PyTorch: Tensors
Numpy is a great framework, but it cannot utilize GPUs to accelerate its numerical computations. For modern deep neural networks, GPUs often provide speedups of 50x or greater, so unfortunately numpy won't be enough for modern deep learning.

Here we introduce the most fundamental PyTorch concept: the Tensor. A PyTorch Tensor is conceptually identical to a numpy array: a Tensor is an n-dimensional array, and PyTorch provides many functions for operating on these Tensors. Any computation you might want to perform with numpy can also be accomplished with PyTorch Tensors; you should think of them as a generic tool for scientific computing.

However, unlike numpy, PyTorch Tensors can utilize GPUs to accelerate their numeric computations. To run a computation on a GPU, pass the device argument when constructing a Tensor to place it on the GPU; an existing Tensor can be moved with its .to(device) method.
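
A minimal sketch of device placement using only the standard `torch` API: it picks the GPU when `torch.cuda.is_available()` returns True and otherwise falls back to the CPU, so it also runs on machines without a GPU. Tensors taking part in one operation must live on the same device, and `.cpu()` is needed before converting a GPU Tensor back to a NumPy array.

```python
import numpy as np
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.randn(3, 4, device=device)  # created directly on the chosen device
b = torch.ones(4, 2).to(device)       # created on the CPU, then moved
c = a @ b                             # both operands live on the same device

print(c.device, c.shape)              # e.g. "cpu torch.Size([3, 2])" on a machine without a GPU

# NumPy interop: move to the CPU first, then convert
c_np = c.cpu().numpy()
back = torch.from_numpy(np.arange(3.0))  # shares memory with the NumPy array
```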

Here we use PyTorch Tensors to fit a two-layer network to random data. Like the numpy example above, we manually implement the forward and backward passes through the network, using operations on PyTorch Tensors:

In [4]:
# Code in file tensor/two_layer_net_tensor.py
import torch

device = torch.device('cpu')
#device = torch.device('cuda') # Uncomment this to run on GPU
print("device:",device)
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Randomly initialize weights
w1 = torch.randn(D_in, H, device=device)
w2 = torch.randn(H, D_out, device=device)

learning_rate = 1e-6
for t in range(500):
  # Forward pass: compute predicted y
  h = x.mm(w1)
  h_relu = h.clamp(min=0)
  y_pred = h_relu.mm(w2)

  # Compute and print loss; loss is a scalar, and is stored in a PyTorch Tensor
  # of shape (); we can get its value as a Python number with loss.item().
  loss = (y_pred - y).pow(2).sum()
  print(f"Iteration {t}: Loss = {loss.item():.6f}")

  # Backprop to compute gradients of w1 and w2 with respect to loss
  grad_y_pred = 2.0 * (y_pred - y)
  grad_w2 = h_relu.t().mm(grad_y_pred)
  grad_h_relu = grad_y_pred.mm(w2.t())
  grad_h = grad_h_relu.clone()
  grad_h[h < 0] = 0
  grad_w1 = x.t().mm(grad_h)

  # Update weights using gradient descent
  w1 -= learning_rate * grad_w1
  w2 -= learning_rate * grad_w2

device: cpu
Iteration 0: Loss = 26638656.000000
Iteration 1: Loss = 22390720.000000
Iteration 2: Loss = 21406948.000000
Iteration 3: Loss = 20782100.000000
Iteration 4: Loss = 18961674.000000
Iteration 5: Loss = 15572472.000000
Iteration 6: Loss = 11473173.000000
Iteration 7: Loss = 7715688.000000
Iteration 8: Loss = 4934873.000000
Iteration 9: Loss = 3121183.000000
Iteration 10: Loss = 2025165.500000
Iteration 11: Loss = 1375202.750000
Iteration 12: Loss = 987179.875000
Iteration 13: Loss = 746736.062500
Iteration 14: Loss = 590436.000000
Iteration 15: Loss = 482895.218750
Iteration 16: Loss = 404745.000000
Iteration 17: Loss = 345260.375000
Iteration 18: Loss = 298363.656250
Iteration 19: Loss = 260276.218750
Iteration 20: Loss = 228640.156250
Iteration 21: Loss = 201981.187500
Iteration 22: Loss = 179230.234375
Iteration 23: Loss = 159695.078125
Iteration 24: Loss = 142748.390625
Iteration 25: Loss = 127965.875000
Iteration 26: Loss = 115012.296875
Iteration 27: Loss = 103623.781250

## PyTorch: Autograd

In the above examples, we had to manually implement both the forward and backward passes of our neural network. Manually implementing the backward pass is not a big deal for a small two-layer network, but can quickly get very hairy for large complex networks.

Thankfully, we can use automatic differentiation to automate the computation of backward passes in neural networks. The autograd package in PyTorch provides exactly this functionality. When using autograd, the forward pass of your network will define a computational graph; nodes in the graph will be Tensors, and edges will be functions that produce output Tensors from input Tensors. Backpropagating through this graph then allows you to easily compute gradients.

This sounds complicated, but it is pretty simple to use in practice. If we want to compute gradients with respect to some Tensor, we set requires_grad=True when constructing that Tensor. Any PyTorch operations on that Tensor will cause a computational graph to be constructed, allowing us to later perform backpropagation through the graph. If x is a Tensor with requires_grad=True, then after backpropagation x.grad will be another Tensor holding the gradient of some scalar value (typically the loss) with respect to x.
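
A minimal example of this workflow on a single scalar, where the derivative can be checked by hand: for y = x² + 2x we have dy/dx = 2x + 2, which equals 8 at x = 3. It also shows that calling `backward()` again adds to `.grad` rather than overwriting it, which is why the training loop zeroes the gradients every iteration.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)

y = x ** 2 + 2 * x      # builds a small graph; dy/dx = 2x + 2
y.backward()            # populates x.grad
first_grad = x.grad.item()
print(first_grad)       # 8.0

# Gradients accumulate: a second backward pass adds to x.grad instead of replacing it
y = x ** 2 + 2 * x
y.backward()
second_grad = x.grad.item()
print(second_grad)      # 16.0

x.grad.zero_()          # reset, as the training loop does with w1.grad.zero_()
```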

Sometimes you may wish to prevent PyTorch from building computational graphs when performing certain operations on Tensors with requires_grad=True; for example we usually don't want to backpropagate through the weight update steps when training a neural network. In such scenarios we can use the torch.no_grad() context manager to prevent the construction of a computational graph.
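
The effect of `torch.no_grad()` is visible in the `requires_grad` flag of the results. This small sketch also shows why the weight update is wrapped in it: PyTorch refuses in-place updates of a leaf Tensor that requires grad while graph recording is on, but allows them inside `no_grad()`.

```python
import torch

w = torch.randn(3, requires_grad=True)

z = w * 2
print(z.requires_grad)     # True: the multiplication was recorded in the graph

with torch.no_grad():
    z_ng = w * 2
print(z_ng.requires_grad)  # False: nothing was recorded inside no_grad()

# An in-place update of the leaf w is allowed here; outside no_grad() it would raise
with torch.no_grad():
    w -= 0.1 * torch.ones(3)
print(w.requires_grad, w.is_leaf)  # True True: w is still a trainable leaf
```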

Here we use PyTorch Tensors and autograd to implement our two-layer network; now we no longer need to manually implement the backward pass through the network:

In [5]:
# Code in file autograd/two_layer_net_autograd.py
import torch

device = torch.device('cpu')
# device = torch.device('cuda') # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Create random Tensors for weights; setting requires_grad=True means that we
# want to compute gradients for these Tensors during the backward pass.
w1 = torch.randn(D_in, H, device=device, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
  # Forward pass: compute predicted y using operations on Tensors. Since w1 and
  # w2 have requires_grad=True, operations involving these Tensors will cause
  # PyTorch to build a computational graph, allowing automatic computation of
  # gradients. Since we are no longer implementing the backward pass by hand we
  # don't need to keep references to intermediate values.
  y_pred = x.mm(w1).clamp(min=0).mm(w2)

  # Compute and print loss. Loss is a Tensor of shape (), and loss.item()
  # is a Python number giving its value.
  loss = (y_pred - y).pow(2).sum()
  print(f"Iteration {t}: Loss = {loss.item():.6f}")

  # Use autograd to compute the backward pass. This call will compute the
  # gradient of loss with respect to all Tensors with requires_grad=True.
  # After this call w1.grad and w2.grad will be Tensors holding the gradient
  # of the loss with respect to w1 and w2 respectively.
  loss.backward()

  # Update weights using gradient descent. For this step we just want to mutate
  # the values of w1 and w2 in-place; we don't want to build up a computational
  # graph for the update steps, so we use the torch.no_grad() context manager
  # to prevent PyTorch from building a computational graph for the updates
  with torch.no_grad():
    w1 -= learning_rate * w1.grad
    w2 -= learning_rate * w2.grad

    # Manually zero the gradients after running the backward pass
    w1.grad.zero_()
    w2.grad.zero_()

Iteration 0: Loss = 28609870.000000
Iteration 1: Loss = 24658280.000000
Iteration 2: Loss = 24529634.000000
Iteration 3: Loss = 24502900.000000
Iteration 4: Loss = 22552322.000000
Iteration 5: Loss = 17924538.000000
Iteration 6: Loss = 12406175.000000
Iteration 7: Loss = 7668046.000000
Iteration 8: Loss = 4543764.500000
Iteration 9: Loss = 2725073.250000
Iteration 10: Loss = 1734911.875000
Iteration 11: Loss = 1188862.500000
Iteration 12: Loss = 875106.187500
Iteration 13: Loss = 681594.437500
Iteration 14: Loss = 552340.687500
Iteration 15: Loss = 459709.781250
Iteration 16: Loss = 389477.625000
Iteration 17: Loss = 333957.187500
Iteration 18: Loss = 288806.562500
Iteration 19: Loss = 251364.640625
Iteration 20: Loss = 219865.656250
Iteration 21: Loss = 193105.000000
Iteration 22: Loss = 170204.984375
Iteration 23: Loss = 150516.312500
Iteration 24: Loss = 133471.218750
Iteration 25: Loss = 118661.015625
Iteration 26: Loss = 105767.703125
Iteration 27: Loss = 94510.414062
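
To see that autograd computes the same thing as the hand-written backward pass, the sketch below runs both on one small, made-up batch and compares the results with `torch.allclose`. Using float64 and tiny dimensions is our choice, to keep the comparison tight and fast.

```python
import torch

torch.manual_seed(0)
N, D_in, H, D_out = 8, 5, 4, 3  # small, made-up sizes
x = torch.randn(N, D_in, dtype=torch.float64)
y = torch.randn(N, D_out, dtype=torch.float64)
w1 = torch.randn(D_in, H, dtype=torch.float64, requires_grad=True)
w2 = torch.randn(H, D_out, dtype=torch.float64, requires_grad=True)

# Autograd: forward pass, then loss.backward() fills w1.grad and w2.grad
h = x.mm(w1)
y_pred = h.clamp(min=0).mm(w2)
loss = (y_pred - y).pow(2).sum()
loss.backward()

# Manual backward pass with the same formulas as the Tensor example
with torch.no_grad():
    h_relu = h.clamp(min=0)
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h = grad_y_pred.mm(w2.t())
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

ok_w1 = torch.allclose(w1.grad, grad_w1)
ok_w2 = torch.allclose(w2.grad, grad_w2)
print(ok_w1, ok_w2)  # True True
```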

## PyTorch: nn
Computational graphs and autograd are a very powerful paradigm for defining complex operators and automatically taking derivatives; however, for large neural networks raw autograd can be a bit too low-level.

When building neural networks we frequently think of arranging the computation into layers, some of which have learnable parameters which will be optimized during learning.

In TensorFlow, packages like Keras, TensorFlow-Slim, and TFLearn provide higher-level abstractions over raw computational graphs that are useful for building neural networks.

In PyTorch, the nn package serves this same purpose. The nn package defines a set of Modules, which are roughly equivalent to neural network layers. A Module receives input Tensors and computes output Tensors, but may also hold internal state such as Tensors containing learnable parameters. The nn package also defines a set of useful loss functions that are commonly used when training neural networks.
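
For intuition about what a Module holds, this short sketch inspects a single `torch.nn.Linear` layer: it stores a weight of shape `(out_features, in_features)` and a bias of shape `(out_features,)`, and computes x Wᵀ + b. The sizes 4 and 3 are arbitrary choices for the example.

```python
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(4, 3)                 # in_features=4, out_features=3
print(layer.weight.shape, layer.bias.shape)   # torch.Size([3, 4]) torch.Size([3])

x = torch.randn(2, 4)                         # a batch of 2 inputs
manual = x @ layer.weight.t() + layer.bias    # the same affine map, written by hand
same = torch.allclose(layer(x), manual)
print(same)                                   # True

names = [name for name, _ in layer.named_parameters()]
print(names)                                  # ['weight', 'bias']
```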

In this example we use the nn package to implement our two-layer network:

In [None]:
# Code in file nn/two_layer_net_nn.py
import torch

device = torch.device('cpu')
#device = torch.device('cuda') # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. Each Linear Module computes output from input using a
# linear function, and holds internal Tensors for its weight and bias.
# After constructing the model we use the .to() method to move it to the
# desired device.
model = torch.nn.Sequential(
          torch.nn.Linear(D_in, H),
          torch.nn.ReLU(),
          torch.nn.Linear(H, D_out),
        ).to(device)

# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function. Setting
# reduction='sum' means that we are computing the *sum* of squared errors rather
# than the mean; this is for consistency with the examples above where we
# manually compute the loss, but in practice it is more common to use mean
# squared error as a loss by setting reduction='mean' (the default).
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-4
for t in range(500):
  # Forward pass: compute predicted y by passing x to the model. Module objects
  # override the __call__ operator so you can call them like functions. When
  # doing so you pass a Tensor of input data to the Module and it produces
  # a Tensor of output data.
  y_pred = model(x)

  # Compute and print loss. We pass Tensors containing the predicted and true
  # values of y, and the loss function returns a Tensor containing the loss.
  loss = loss_fn(y_pred, y)
  print(f"Iteration {t}: Loss = {loss.item():.6f}")

  # Zero the gradients before running the backward pass.
  model.zero_grad()

  # Backward pass: compute gradient of the loss with respect to all the learnable
  # parameters of the model. Internally, the parameters of each Module are stored
  # in Tensors with requires_grad=True, so this call will compute gradients for
  # all learnable parameters in the model.
  loss.backward()

  # Update the weights using gradient descent. Each parameter is a Tensor, so
  # we can access its data and gradients like we did before.
  with torch.no_grad():
    for param in model.parameters():
      param -= learning_rate * param.grad

Iteration 0: Loss = 719.143799
Iteration 1: Loss = 665.955078
Iteration 2: Loss = 620.433655
Iteration 3: Loss = 580.002136
Iteration 4: Loss = 543.803955
Iteration 5: Loss = 511.173065
Iteration 6: Loss = 481.436554
Iteration 7: Loss = 454.657715
Iteration 8: Loss = 430.189301
Iteration 9: Loss = 407.492371
Iteration 10: Loss = 386.327820
Iteration 11: Loss = 366.391663
Iteration 12: Loss = 347.479919
Iteration 13: Loss = 329.527100
Iteration 14: Loss = 312.512482
Iteration 15: Loss = 296.303741
Iteration 16: Loss = 280.661224
Iteration 17: Loss = 265.734741
Iteration 18: Loss = 251.484192
Iteration 19: Loss = 237.879181
Iteration 20: Loss = 224.893600
Iteration 21: Loss = 212.535049
Iteration 22: Loss = 200.766052
Iteration 23: Loss = 189.582733
Iteration 24: Loss = 178.945358
Iteration 25: Loss = 168.841995
Iteration 26: Loss = 159.214355
Iteration 27: Loss = 150.069473
Iteration 28: Loss = 141.398544
Iteration 29: Loss = 133.172226
Iteration 30: Loss = 125.369186
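
The manual parameter update in the loop above can also be delegated to `torch.optim`, which the MNIST example below relies on (with Adam). As a sketch on the same random-data setup, plain `torch.optim.SGD` with its default settings (no momentum, no weight decay) applies exactly the update `param -= lr * param.grad`; `optimizer.zero_grad()` and `optimizer.step()` take the place of the manual zeroing and update. The fixed seed is our addition for reproducibility.

```python
import torch

torch.manual_seed(0)
N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(reduction='sum')

# Plain SGD: each step performs param -= lr * param.grad for every parameter
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

losses = []
for t in range(500):
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()  # replaces model.zero_grad()
    loss.backward()
    optimizer.step()       # replaces the manual torch.no_grad() update loop
    losses.append(loss.item())

print(f"first loss {losses[0]:.2f}, last loss {losses[-1]:.6f}")
```

Swapping in another optimizer such as `torch.optim.Adam` only changes the line that constructs `optimizer`; the loop stays the same.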

# **Training a 3-Layer MLP on the MNIST Dataset Using PyTorch**

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader


In [None]:
# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Hyperparameters
num_epochs = 10
batch_size = 100
learning_rate = 0.001
input_size = 28 * 28  # MNIST images are 28x28 pixels
hidden_size_1 = 128  # Number of neurons in the first hidden layer
hidden_size_2 = 64   # Number of neurons in the second hidden layer
num_classes = 10  # Digits 0-9


In [None]:
# MNIST dataset
train_dataset = MNIST(root='./data', train=True, transform=transforms.ToTensor(), download=True)
test_dataset = MNIST(root='./data', train=False, transform=transforms.ToTensor())

# Data loader
train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)




In [None]:
# Define the ANN model
class ANN(nn.Module):
    def __init__(self):
        super(ANN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size_1)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size_1, hidden_size_2)
        self.relu2 = nn.ReLU()
        self.fc3 = nn.Linear(hidden_size_2, num_classes)

    def forward(self, x):
        x = x.view(-1, input_size)  # Flatten the image
        x = self.fc1(x)
        x = self.relu1(x)
        x = self.fc2(x)
        x = self.relu2(x)
        x = self.fc3(x)
        return x
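
As a sanity check on the architecture, the number of learnable parameters can be counted by hand: each `nn.Linear(in, out)` has `in * out` weights plus `out` biases, giving 784·128 + 128 = 100,480 for `fc1`, 128·64 + 64 = 8,256 for `fc2`, and 64·10 + 10 = 650 for `fc3`, or 109,386 in total. The sketch below rebuilds the same layer sizes with `nn.Sequential` (rather than the `ANN` class, so it runs on its own; the name `mlp` is ours) and confirms the count.

```python
import torch.nn as nn

# Same layer sizes as the ANN class: 784 -> 128 -> 64 -> 10
mlp = nn.Sequential(
    nn.Flatten(),                      # 1x28x28 image -> 784-vector
    nn.Linear(28 * 28, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 10),
)
n_params = sum(p.numel() for p in mlp.parameters() if p.requires_grad)
print(n_params)  # 109386
```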


In [None]:
# Initialize model
model = ANN().to(device)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# Training the model
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images, labels = images.to(device), labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i+1) % 100 == 0:
            print(f'Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{len(train_loader)}], Loss: {loss.item():.4f}')

print('Training finished.')


Epoch [1/10], Step [100/600], Loss: 0.4727
Epoch [1/10], Step [200/600], Loss: 0.4453
Epoch [1/10], Step [300/600], Loss: 0.2940
Epoch [1/10], Step [400/600], Loss: 0.1269
Epoch [1/10], Step [500/600], Loss: 0.1609
Epoch [1/10], Step [600/600], Loss: 0.1834
Epoch [2/10], Step [100/600], Loss: 0.1584
Epoch [2/10], Step [200/600], Loss: 0.0984
Epoch [2/10], Step [300/600], Loss: 0.1417
Epoch [2/10], Step [400/600], Loss: 0.0674
Epoch [2/10], Step [500/600], Loss: 0.1066
Epoch [2/10], Step [600/600], Loss: 0.1027
Epoch [3/10], Step [100/600], Loss: 0.0859
Epoch [3/10], Step [200/600], Loss: 0.1533
Epoch [3/10], Step [300/600], Loss: 0.1169
Epoch [3/10], Step [400/600], Loss: 0.0524
Epoch [3/10], Step [500/600], Loss: 0.1114
Epoch [3/10], Step [600/600], Loss: 0.0639
Epoch [4/10], Step [100/600], Loss: 0.0407
Epoch [4/10], Step [200/600], Loss: 0.0457
Epoch [4/10], Step [300/600], Loss: 0.1354
Epoch [4/10], Step [400/600], Loss: 0.0545
Epoch [4/10], Step [500/600], Loss: 0.0579
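
Note that the model's forward pass ends at `fc3` with no softmax. This works because `nn.CrossEntropyLoss` expects raw scores (logits) and applies log-softmax followed by negative log-likelihood internally. The sketch below checks that equivalence on made-up logits and labels, and shows why `torch.max(outputs, 1)` on logits in the test cell picks the same class as it would on probabilities: softmax preserves ordering.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 10)             # stand-in for raw fc3 outputs on 5 images
labels = torch.tensor([3, 0, 9, 1, 7])  # made-up target digits

ce = nn.CrossEntropyLoss()(logits, labels)
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)
same_loss = torch.allclose(ce, manual)
print(same_loss)                        # True

# Softmax is only needed when you want probabilities, e.g. for reporting
probs = F.softmax(logits, dim=1)
same_pred = torch.equal(logits.argmax(dim=1), probs.argmax(dim=1))
print(same_pred)                        # True
```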

In [None]:
# Testing the model
model.eval()
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print(f'Accuracy on the test images: {100 * correct / total:.2f}%')

Accuracy on the test images: 97.90%
