A neural network is a computational model inspired by the way biological neurons work in the human brain. It consists of interconnected nodes or neurons that process and learn from data, enabling tasks such as pattern recognition and decision-making in machine learning.

Things to know:

Activation function (see note): A function applied to the output of each neuron to introduce non-linearity into the network. Common examples include ReLU (Rectified Linear Unit) and sigmoid.

Hyperparameter: A parameter that is set before the training process begins, such as the learning rate or the number of layers. Unlike model parameters, hyperparameters are not learned from the training data.

Training loss: The loss computed on the training data during the model training process. Monitoring this helps assess how well the model is learning.
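To make the activation functions named above concrete, here is a minimal, framework-free sketch of ReLU and sigmoid applied to a few sample values. The helper names `relu` and `sigmoid` and the sample inputs are illustrative assumptions; in PyTorch the same operations are available as `nn.ReLU` and `torch.sigmoid`.

```python
import math

def relu(x):
    # ReLU keeps positive inputs and replaces negative ones with 0.
    return max(0.0, x)

def sigmoid(x):
    # Sigmoid squashes any real input into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

inputs = [-2.0, 0.0, 3.0]  # illustrative sample values
relu_out = [relu(x) for x in inputs]
sigmoid_out = [sigmoid(x) for x in inputs]

print(relu_out)                            # [0.0, 0.0, 3.0]
print([round(s, 3) for s in sigmoid_out])  # [0.119, 0.5, 0.953]
```

Both functions are non-linear, which is what lets stacked layers represent more than a single linear transformation.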

This code implements a complete training and evaluation pipeline for a neural network on the FashionMNIST dataset using PyTorch. It first sets key hyperparameters: the learning rate, batch size, and number of epochs. The FashionMNIST dataset is downloaded and loaded, with images transformed into tensors suitable for the model. DataLoaders are created to batch the data for training and testing. The device is set to GPU if available, otherwise CPU.

The neural network model is defined with a flatten layer that converts each 28x28 image into a 784-length vector, followed by a sequence of two fully connected layers with 512 neurons each and ReLU activations, and a final output layer producing 10 logits for the classes.

The `train_loop` function handles one epoch of training by iterating over batches, computing predictions, calculating the loss with cross-entropy, performing backpropagation, and updating weights using stochastic gradient descent. The `test_loop` function evaluates the model’s performance on the test set without computing gradients, calculating average loss and accuracy. Finally, the code runs the training and testing loops for the specified number of epochs, printing progress during each epoch and test results after it.
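The cross-entropy loss mentioned above can be sketched by hand for a single sample. This is a simplified, pure-Python illustration rather than PyTorch's implementation: `nn.CrossEntropyLoss` applies the same log-softmax idea to the raw logits and, by default, averages over the batch. The function name `cross_entropy` and the example logits are assumptions made for illustration.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# Ten equal logits: the model assigns probability 1/10 to every class.
uniform_loss = cross_entropy([0.0] * 10, target=3)
print(f"{uniform_loss:.4f}")  # 2.3026, i.e. ln(10)

# A confident, correct prediction gives a loss close to zero.
confident_loss = cross_entropy([0.0] * 3 + [10.0] + [0.0] * 6, target=3)
print(confident_loss < 0.01)  # True
```

With 10 classes, a model that predicts every class as equally likely scores ln(10) ≈ 2.3026. The first loss logged in the output, 2.300700, is close to that value, which is consistent with an untrained network making near-uniform predictions.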


In [None]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

# Hyperparameters
learning_rate = 1e-3
batch_size = 64
epochs = 10

training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using {device} device")

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits


model = NeuralNetwork().to(device)
print(model)

def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    # Set the model to training mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.train()
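    for batch, (X, y) in enumerate(dataloader):
        # Move the batch to the same device as the model (required when running on GPU)
        X, y = X.to(device), y.to(device)

        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)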

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * batch_size + len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")


def test_loop(dataloader, model, loss_fn):
    # Set the model to evaluation mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.eval()
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    # Evaluating the model with torch.no_grad() ensures that no gradients are computed during test mode
    # also serves to reduce unnecessary gradient computations and memory usage for tensors with requires_grad=True
    with torch.no_grad():
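        for X, y in dataloader:
            # Move the batch to the same device as the model (required when running on GPU)
            X, y = X.to(device), y.to(device)
            pred = model(X)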
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

# Loss function
loss_fn = nn.CrossEntropyLoss()
# Optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")

Using cpu device
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
Epoch 1
-------------------------------
loss: 2.300700  [   64/60000]
loss: 2.287054  [ 6464/60000]
loss: 2.267402  [12864/60000]
loss: 2.265351  [19264/60000]
loss: 2.245767  [25664/60000]
loss: 2.202098  [32064/60000]
loss: 2.229139  [38464/60000]
loss: 2.186772  [44864/60000]
loss: 2.184453  [51264/60000]
loss: 2.156507  [57664/60000]
Test Error: 
 Accuracy: 38.4%, Avg loss: 2.145272 

Epoch 2
-------------------------------
loss: 2.157872  [   64/60000]
loss: 2.146770  [ 6464/60000]
loss: 2.083392  [12864/60000]
loss: 2.108032  [19264/60000]
loss: 2.054494  [25664/60000]
loss: 1.976589  [32064/60000]
loss: 2.027245  [38464/60000]
loss: 1.937592  [44864
