# Neural Network Design

Each component of a neural network, such as its layers, neurons, activation functions, and parameters, plays a specific role in how the network learns from data. The goal of this notebook is to study how each design choice affects learning and performance. By changing the number of layers, the filter size, the learning rate, and other hyperparameters, we can observe how the network behaves, improves, or fails. Understanding these details helps us build better and more efficient models for real problems in vision, language, and generative tasks.

### What We Will Cover

#### 1. Architecture Design

- Number of layers (depth): How many layers (hidden, convolutional, etc.) the network has.

- Layer types: Dense (Fully Connected), Convolutional, Recurrent, Attention, Residual, etc.

- Layer order and connections: Sequential vs. skip connections (e.g. ResNet, U-Net).

- Width of layers: Number of neurons or filters per layer.

- Bottlenecks / latent dimensions: How compressed the representation becomes (important in VAEs and autoencoders).
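Depth and width map directly onto parameter count. The sketch below uses a hypothetical `make_mlp` helper (not a PyTorch API) that builds plain fully connected networks. It compares the 784 → 512 → 512 → 10 layout used later in this notebook with a deeper variant and a narrower variant.

```python
import torch.nn as nn


def make_mlp(in_features, hidden_width, depth, num_classes):
    """Build an MLP with `depth` hidden Linear+ReLU layers of equal width.

    Hypothetical helper for illustration only; not part of PyTorch.
    """
    layers = [nn.Flatten()]  # no parameters, just reshapes [B, 1, 28, 28] -> [B, 784]
    width_in = in_features
    for _ in range(depth):
        layers += [nn.Linear(width_in, hidden_width), nn.ReLU()]
        width_in = hidden_width
    layers.append(nn.Linear(width_in, num_classes))  # output layer: one logit per class
    return nn.Sequential(*layers)


def count_params(model):
    """Total number of trainable parameters (weights + biases)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


baseline = make_mlp(28 * 28, hidden_width=512, depth=2, num_classes=10)  # 784-512-512-10
deeper = make_mlp(28 * 28, hidden_width=512, depth=4, num_classes=10)    # two extra 512x512 layers
narrower = make_mlp(28 * 28, hidden_width=128, depth=2, num_classes=10)  # same depth, 4x narrower

for name, m in [("baseline", baseline), ("deeper", deeper), ("narrower", narrower)]:
    print(f"{name:9s} {count_params(m):>9,d} parameters")
# baseline: 669,706 | deeper: 1,195,018 | narrower: 118,282
```

Each `nn.Linear(n_in, n_out)` contributes `n_in * n_out + n_out` parameters. This is why width (which enters quadratically for hidden-to-hidden layers) usually changes the model size much more than adding one more layer of the same width.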

#### 2. Neuron-Level Parameters

- Activation functions: ReLU, LeakyReLU, Sigmoid, Tanh, GELU, Swish.

- Normalization: BatchNorm, LayerNorm, GroupNorm; these rescale intermediate activations, which helps stabilize training.

- Dropout rate: Probability that each neuron's output is randomly set to zero during training, which reduces overfitting.
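You can see what these neuron-level choices do by applying them to a small tensor. The sketch below is plain PyTorch, and the input values and seed are arbitrary. It prints each activation's response, shows that `nn.Dropout` zeroes entries and rescales the survivors by 1/(1 - p) only in training mode, and checks that `nn.LayerNorm` centers each sample.

```python
import torch
import torch.nn as nn

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

activations = {
    "ReLU": nn.ReLU(),
    "LeakyReLU": nn.LeakyReLU(negative_slope=0.01),
    "Sigmoid": nn.Sigmoid(),
    "Tanh": nn.Tanh(),
    "GELU": nn.GELU(),
    "SiLU (Swish)": nn.SiLU(),  # PyTorch's name for Swish with beta = 1
}
for name, fn in activations.items():
    print(f"{name:13s}", [round(v, 3) for v in fn(x).tolist()])
# ReLU prints [0.0, 0.0, 0.0, 0.5, 2.0]: negatives are clipped to zero

# Dropout is only active in training mode
torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
h = torch.ones(10)

dropout.train()
print("train:", dropout(h))  # about half the entries zeroed, survivors scaled to 1 / (1 - 0.5) = 2
dropout.eval()
print("eval: ", dropout(h))  # identity: dropout is disabled at evaluation time

# LayerNorm normalizes each sample over its features (about zero mean, unit variance)
ln = nn.LayerNorm(4)
sample = torch.tensor([[1.0, 2.0, 3.0, 4.0]])
print("layernorm:", ln(sample))
```

Because dropout behaves differently in the two modes, remember to call `model.train()` before training and `model.eval()` before validation or testing.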

#### 3. Training Parameters (Optimization)

- Learning rate: Often the single most important hyperparameter. It sets the step size of each parameter update, and therefore how fast (and how stably) the model learns.

- Optimizer type: SGD, Adam, RMSProp, AdamW.

- Weight initialization: Xavier, He, Normal, Uniform; affects early learning stability.

- Loss function: MSE, CrossEntropy, KL Divergence, etc., depending on the task.

- Batch size: Number of samples per gradient update.

- Number of epochs: How many times the full dataset passes through the model.
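The training parameters above all appear as concrete arguments in a standard PyTorch loop. The sketch below runs on hypothetical random data: 200 fake flattened images with random labels. It shows where each parameter goes: He (Kaiming) initialization, the `CrossEntropyLoss` loss, the `AdamW` optimizer and its learning rate, the batch size (via `DataLoader`), and the number of epochs.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical toy data: 200 random "images" flattened to 784 features, 10 classes
X = torch.randn(200, 784)
y = torch.randint(0, 10, (200,))
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)  # batch size

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

# Weight initialization: He (Kaiming) init, designed for ReLU networks.
# Applied to every Linear layer here for simplicity; nn.init.xavier_uniform_ is the Xavier variant.
for layer in model:
    if isinstance(layer, nn.Linear):
        nn.init.kaiming_normal_(layer.weight, nonlinearity="relu")
        nn.init.zeros_(layer.bias)

loss_fn = nn.CrossEntropyLoss()  # classification loss on raw logits
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)  # optimizer + learning rate

num_epochs = 3
history = []
for epoch in range(num_epochs):      # one epoch = one full pass over the dataset
    epoch_loss = 0.0
    for xb, yb in loader:            # one gradient update per batch
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item() * xb.size(0)
    history.append(epoch_loss / len(loader.dataset))
    print(f"epoch {epoch + 1}: mean loss {history[-1]:.4f}, {len(loader)} updates")
```

With 200 samples and a batch size of 32, each epoch performs ceil(200 / 32) = 7 updates. Halving the batch size roughly doubles the number of updates per epoch.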

#### 4. Data-Related Parameters

- Input size / image resolution: Affects both accuracy and computation.

- Data augmentation: Flips, rotations, noise, color jitter; improves generalization.

- Normalization / scaling: Preprocessing that rescales inputs to a common range or distribution (e.g., zero mean and unit variance per channel).
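Input normalization is the per-channel formula x_norm = (x - mean) / std, which is what torchvision's `transforms.Normalize` applies. The sketch below works on a hypothetical batch of random 28×28 grayscale images with values in [0, 1], the range `ToTensor` produces. It compares fixed statistics (mean = std = 0.5, which maps [0, 1] to [-1, 1]) with statistics computed from the data. It also shows a horizontal flip as the simplest augmentation.

```python
import torch

torch.manual_seed(0)
# Hypothetical batch: 4 grayscale 28x28 images with pixel values in [0, 1]
imgs = torch.rand(4, 1, 28, 28)

# Fixed statistics: (x - 0.5) / 0.5 maps [0, 1] onto [-1, 1]
mean, std = 0.5, 0.5
normed = (imgs - mean) / std

# Data statistics: standardize to roughly zero mean and unit standard deviation
data_mean, data_std = imgs.mean().item(), imgs.std().item()
standardized = (imgs - data_mean) / data_std

# Augmentation: a horizontal flip reverses the width axis
flipped = torch.flip(imgs, dims=[-1])

print(normed.min().item(), normed.max().item())              # within [-1, 1]
print(standardized.mean().item(), standardized.std().item())  # about 0 and 1
```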

#### 5. Regularization and Generalization

- L1 / L2 regularization: Penalizes large weights to reduce overfitting; L2 regularization is commonly implemented as weight decay in the optimizer.

- Early stopping: Stops training when validation loss stops improving.

- Dropout: Revisited here as a key technique for generalization control.
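Early stopping reduces to bookkeeping on the validation loss. The sketch below uses a hypothetical `EarlyStopping` helper (not a PyTorch class) with a `patience` setting, and runs it on a made-up validation curve. It also shows the two usual ways to apply L1/L2 penalties: adding them explicitly to the loss, or letting the optimizer apply weight decay.

```python
import torch
import torch.nn as nn


class EarlyStopping:
    """Stop when the validation loss has not improved by at least `min_delta` for `patience` epochs.

    Hypothetical helper for illustration only; not part of PyTorch.
    """

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss      # improvement: remember it and reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1      # no improvement this epoch
        return self.bad_epochs >= self.patience


# Made-up validation curve: improves, then stalls and rises (a sign of overfitting)
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.51, 0.53, 0.60]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(f"stopped at epoch {stopped_at}, best val loss {stopper.best}")  # epoch 7, best 0.5

# L1 / L2 penalties added to the loss explicitly (over all parameters, biases included, for brevity)
model = nn.Linear(10, 2)
lam = 1e-4
l1_penalty = lam * sum(p.abs().sum() for p in model.parameters())
l2_penalty = lam * sum((p ** 2).sum() for p in model.parameters())
# ...or let the optimizer apply (decoupled) weight decay instead of an explicit L2 term
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
```

In practice you would also save the model weights whenever `best` improves, so you can restore the best checkpoint after stopping.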

#### 6. Advanced Architectural Controls

- Skip connections: Allow gradients to flow through very deep networks (ResNet, U-Net).

- Attention mechanisms: Let the model focus on the most relevant features or positions in the input (the core building block of Transformers).

- Multi-branch architectures: Parallel paths combining different feature maps.

- Latent space structure: In VAEs or GANs, defines how data is represented internally.
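A skip connection computes y = x + F(x), so the block only has to learn a correction F on top of the identity. The sketch below is a simplified fully connected residual block. Real ResNet blocks use convolutions, BatchNorm, and a ReLU after the addition; those are omitted here. The sketch checks two consequences: when the branch outputs zero, the block is exactly the identity, and the gradient still reaches the input through the skip path.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Simplified residual block: y = x + F(x) with F = Linear -> ReLU -> Linear."""

    def __init__(self, dim):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.branch(x)  # the addition requires matching shapes


torch.manual_seed(0)
block = ResidualBlock(16)
x = torch.randn(4, 16)
print(block(x).shape)  # torch.Size([4, 16])

# Zero the last layer of the branch: F(x) = 0, so the block becomes the identity
nn.init.zeros_(block.branch[2].weight)
nn.init.zeros_(block.branch[2].bias)
print(torch.equal(block(x), x))  # True

# The gradient of sum(y) w.r.t. x is all ones, carried entirely by the skip path
x.requires_grad_(True)
block(x).sum().backward()
print(torch.allclose(x.grad, torch.ones_like(x)))  # True
```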

#### 7. Evaluation and Metrics

- Training vs. validation loss: Comparing the two reveals overfitting (training loss keeps falling while validation loss rises).

- Accuracy / F1 / Precision / Recall: Common metrics for classification tasks.

- PSNR / SSIM / FID: Metrics for generative or image-based models.

- Learning curves: Visualize training and validation behavior over time.
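The classification metrics above all come from counts of true positives (TP), false positives (FP), and false negatives (FN). PSNR is 10 · log10(MAX² / MSE). The sketch below computes them with plain PyTorch on made-up binary labels and a made-up image pair. In a real project, a metrics library would typically handle the multi-class averaging.

```python
import torch

# Made-up binary predictions vs. ground truth (1 = positive class)
y_true = torch.tensor([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = torch.tensor([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])

tp = ((y_pred == 1) & (y_true == 1)).sum().item()  # true positives
fp = ((y_pred == 1) & (y_true == 0)).sum().item()  # false positives
fn = ((y_pred == 0) & (y_true == 1)).sum().item()  # false negatives

accuracy = (y_pred == y_true).float().mean().item()
precision = tp / (tp + fp)  # of the predicted positives, how many are correct
recall = tp / (tp + fn)     # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)
print(f"acc={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# acc=0.80 precision=0.75 recall=0.75 f1=0.75


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = torch.mean((pred - target) ** 2)
    return (10 * torch.log10(max_val ** 2 / mse)).item()


target = torch.zeros(1, 1, 8, 8)
pred = torch.full((1, 1, 8, 8), 0.1)  # every pixel off by 0.1, so MSE = 0.01
psnr_db = psnr(pred, target)
print(f"PSNR = {psnr_db:.1f} dB")  # PSNR = 20.0 dB
```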

# Day 0

In [None]:
import torch 
import torch.nn as nn
from torchvision import datasets
from torch.utils.data import DataLoader
from torchvision.transforms import ToTensor

import os

## Loading Dataset

In [None]:
import os
from torch.utils.data import Dataset, DataLoader, random_split
from torchvision import datasets, transforms
from PIL import Image


class CustomDataset(Dataset):
    """
    A complete dataset class that:
      - Handles both torchvision and local folder datasets.
      - Applies correct transformations and augmentations.
      - Splits data into train/val/test.
      - Creates DataLoaders for each split.

    Designed for MNIST and general image folders.
    """

    def __init__(
        self,
        root,
        dataset_name=None,
        use_torchvision=False,
        img_size=128,
        batch_size=8,
        train_ratio=0.7,
        val_ratio=0.15,
        transform=None
    ):
        """
        Initialize dataset class parameters.

        Parameters
        ----------
        root : str
            Path to dataset folder or where torchvision downloads data.
        dataset_name : str
            Dataset name ("MNIST", etc.)
        use_torchvision : bool
            Whether to use a built-in dataset (True) or a folder (False).
        img_size : int
            Resize images to this size.
        batch_size : int
            Number of samples per batch.
        train_ratio : float
            Proportion of data for training.
        val_ratio : float
            Proportion of data for validation.
        transform : torchvision.transforms.Compose or None
            Custom user-defined transform.
        """

        super().__init__()
        self.root = root
        self.dataset_name = dataset_name
        self.use_torchvision = use_torchvision
        self.img_size = img_size
        self.batch_size = batch_size
        self.train_ratio = train_ratio
        self.val_ratio = val_ratio
        self.transform = transform

        # Determine color channels automatically
        if dataset_name == "MNIST":
            mean, std = [0.5], [0.5]  # Single channel (grayscale)
        else:
            mean, std = [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]  # RGB

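        # Training transformations: with data augmentation.
        # Horizontal flips are skipped for MNIST, because a mirrored digit is usually not a valid digit.
        augment = [transforms.RandomRotation(10)]
        if dataset_name != "MNIST":
            augment.insert(0, transforms.RandomHorizontalFlip())

        self.train_transform = transforms.Compose([
            transforms.Resize((img_size, img_size)),
            *augment,
            transforms.ToTensor(),
            transforms.Normalize(mean=mean, std=std)
        ])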

        # Validation/Test transformations — without augmentation
        self.test_transform = transforms.Compose([
            transforms.Resize((img_size, img_size)),
            transforms.ToTensor(),
            transforms.Normalize(mean=mean, std=std)
        ])

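    def _image_files(self):
        """
        Sorted list of files in the local folder, cached on first use.
        os.listdir returns entries in arbitrary order, so sorting keeps
        indices stable across calls; subdirectories are skipped.
        """
        if not hasattr(self, "_files"):
            self._files = sorted(
                f for f in os.listdir(self.root)
                if os.path.isfile(os.path.join(self.root, f))
            )
        return self._files

    def __len__(self):
        """
        Return the number of images in a local folder dataset.
        Not used for torchvision datasets.
        """
        if not self.use_torchvision:
            return len(self._image_files())
        else:
            raise NotImplementedError("Torchvision datasets handle __len__ internally.")

    def __getitem__(self, idx):
        """
        Retrieve a single image. No label is returned: the folder is treated as unlabeled.
        Only used for folder datasets (not torchvision).
        """
        img_path = os.path.join(self.root, self._image_files()[idx])
        image = Image.open(img_path).convert("RGB")

        if self.transform:
            image = self.transform(image)
        else:
            image = self.test_transform(image)
        return image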

    def setup(self):
        """
        Prepare datasets for training, validation, and testing.

        Includes:
        - Downloading or loading data.
        - Splitting datasets.
        - Assigning transforms.

        Splitting logic:
        - Training: for model learning.
        - Validation: for hyperparameter tuning.
        - Test: for final evaluation.
        """
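        # Fixed seed so the split is reproducible across runs
        generator = torch.Generator().manual_seed(42)

        if self.use_torchvision:
            if self.dataset_name == "MNIST":
                # random_split returns Subsets that share ONE underlying dataset, so a single
                # dataset object cannot give train and val different transforms.
                # Load the MNIST training set twice instead: with and without augmentation.
                train_full = datasets.MNIST(
                    root=self.root,
                    train=True,
                    download=True,
                    transform=self.train_transform
                )
                val_full = datasets.MNIST(
                    root=self.root,
                    train=True,
                    download=True,
                    transform=self.test_transform
                )
                test_dataset = datasets.MNIST(
                    root=self.root,
                    train=False,
                    download=True,
                    transform=self.test_transform
                )

                total_len = len(train_full)
                train_len = int(self.train_ratio * total_len)
                val_len = total_len - train_len  # val_ratio is unused: MNIST ships its own test set

                # Randomly split the training indices into train and val
                train_idx, val_idx = random_split(
                    range(total_len), [train_len, val_len], generator=generator
                )
                self.train_ds = torch.utils.data.Subset(train_full, train_idx.indices)
                self.val_ds = torch.utils.data.Subset(val_full, val_idx.indices)
                self.test_ds = test_dataset
            else:
                raise ValueError("Currently only MNIST is supported for torchvision datasets.")
        else:
            # Custom local folder dataset: one instance per transform, split by shared indices
            train_full = CustomDataset(root=self.root, transform=self.train_transform)
            eval_full = CustomDataset(root=self.root, transform=self.test_transform)

            total_len = len(eval_full)
            train_len = int(self.train_ratio * total_len)
            val_len = int(self.val_ratio * total_len)
            test_len = total_len - train_len - val_len

            # Split indices into train/val/test, then attach each split to the right transform
            train_idx, val_idx, test_idx = random_split(
                range(total_len), [train_len, val_len, test_len], generator=generator
            )
            self.train_ds = torch.utils.data.Subset(train_full, train_idx.indices)
            self.val_ds = torch.utils.data.Subset(eval_full, val_idx.indices)
            self.test_ds = torch.utils.data.Subset(eval_full, test_idx.indices)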

    def train_loader(self):
        """
        Create DataLoader for the training dataset.
        Shuffle=True for better generalization.
        """
        return DataLoader(self.train_ds, batch_size=self.batch_size, shuffle=True)

    def val_loader(self):
        """
        Create DataLoader for validation dataset.
        Shuffle=False for stable evaluation.
        """
        return DataLoader(self.val_ds, batch_size=self.batch_size, shuffle=False)

    def test_loader(self):
        """
        Create DataLoader for test dataset.
        Used for final evaluation.
        """
        return DataLoader(self.test_ds, batch_size=self.batch_size, shuffle=False)
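Watch for one detail when splitting data. `random_split` returns `Subset` objects that all wrap the *same* underlying dataset, so a transform assigned through one split's `.dataset` applies to every split. The toy sketch below demonstrates this with a hypothetical `ToyDataset` that stands in for a folder dataset. It then shows one way around the problem: build a separate dataset object per transform and split a shared set of indices.

```python
import torch
from torch.utils.data import Dataset, Subset, random_split


class ToyDataset(Dataset):
    """Hypothetical stand-in for an image dataset with a settable `transform`."""

    def __init__(self, n=10, transform=None):
        self.data = list(range(n))
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        x = self.data[idx]
        return self.transform(x) if self.transform else x


# Pitfall: both splits point at the same dataset object
full = ToyDataset()
train_split, val_split = random_split(full, [7, 3], generator=torch.Generator().manual_seed(0))
print(train_split.dataset is val_split.dataset)  # True

train_split.dataset.transform = lambda x: x * 100  # intended for training only...
leaked = [val_split[i] for i in range(len(val_split))]
print(leaked)  # ...but every validation sample is now a multiple of 100 as well

# Safer: one dataset object per transform, split by shared indices
train_idx, val_idx = random_split(range(10), [7, 3], generator=torch.Generator().manual_seed(0))
train_ds = Subset(ToyDataset(transform=lambda x: x * 100), train_idx.indices)
val_ds = Subset(ToyDataset(), val_idx.indices)
print([val_ds[i] for i in range(len(val_ds))])  # untransformed values in 0..9
```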


In [None]:
data_module = CustomDataset(
    root="data",
    dataset_name="MNIST",
    use_torchvision=True,
    img_size=28,
    batch_size=64
)

data_module.setup()

train_loader = data_module.train_loader()
val_loader = data_module.val_loader()
test_loader = data_module.test_loader()

for imgs, labels in train_loader:
    print(imgs.shape, labels.shape)
    break

## Simple Neural Network

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

# ===============================================================
# Fully Connected Neural Network for MNIST Classification
# ===============================================================
# Purpose:
#   - Takes a batch of MNIST images (1x28x28)
#   - Passes them through a series of linear transformations + ReLU activations
#   - Outputs 10 values (logits), one per class (digits 0–9)
#
# This is a simple "Multi-Layer Perceptron" (MLP), also called a Feedforward Neural Network.
# ===============================================================

class NeuralNetwork(nn.Module):
    """
    This class defines the structure and computation of our neural network.

    The model follows this pipeline:
        Input: [batch_size, 1, 28, 28]
        ↓
        Flatten → [batch_size, 784]
        ↓
        Linear(784 → 512) + ReLU
        ↓
        Linear(512 → 512) + ReLU
        ↓
        Linear(512 → 10)
        ↓
        Output logits: [batch_size, 10]
    """

    def __init__(self):
        super().__init__()

        # ----------------------------------------------------------
        # 1. Flatten Layer
        # ----------------------------------------------------------
        # Converts each image (1 channel, 28x28 pixels) into a 1D tensor of 784 elements.
        # nn.Linear operates on the last dimension, so each image must become one flat
        # 784-element feature vector before the fully connected layers.
        # No weights or biases here; it's a simple reshape operation.
        #
        # Input shape:  [batch_size, 1, 28, 28]
        # Output shape: [batch_size, 784]
        self.flatten = nn.Flatten()

        # ----------------------------------------------------------
        # 2. Define the Linear + ReLU layers using nn.Sequential
        # ----------------------------------------------------------
        self.linear_relu_stack = nn.Sequential(
            # Linear layer #1
            # Learns weights of shape [512, 784] and biases of shape [512].
            # Each output neuron “sees” *all 784 pixels* at once.
            nn.Linear(28 * 28, 512),

            # ReLU activation
            # Applies f(x) = max(0, x)
            # Keeps positive values, zeros out negative ones.
            nn.ReLU(),

            # Linear layer #2
            # Learns weights of shape [512, 512] and biases of shape [512].
            # Each neuron “sees” all 512 outputs from the previous layer.
            nn.Linear(512, 512),

            # ReLU activation again
            nn.ReLU(),

            # Linear layer #3 (Output layer)
            # Learns weights of shape [10, 512] and biases of shape [10].
            # Each neuron corresponds to one class (digits 0–9).
            nn.Linear(512, 10)
        )

    # ----------------------------------------------------------
    # 3. Forward Pass
    # ----------------------------------------------------------
    def forward(self, x):
        """
        Defines how data moves through the network step by step.

        Args:
            x: Tensor of shape [batch_size, 1, 28, 28]
               (A batch of grayscale MNIST images)

        Returns:
            logits: Tensor of shape [batch_size, 10]
                    Raw, unnormalized class scores.
        """

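        # (Equivalent to: logits = self.linear_relu_stack(self.flatten(x)).
        #  The layers are applied one at a time below only to annotate each step.)

        # Step 1: Flatten the image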
        # Before: x.shape = [batch_size, 1, 28, 28]
        # After:  x.shape = [batch_size, 784]
        x = self.flatten(x)

        # Step 2: First Linear Layer (784 → 512)
        # Each output neuron has 784 weights and 1 bias.
        # Mathematically: y = x @ W^T + b
        # W: [512, 784], b: [512]
        # Input shape:  [batch_size, 784]
        # Output shape: [batch_size, 512]
        x = self.linear_relu_stack[0](x)
        x = self.linear_relu_stack[1](x)  # ReLU activation

        # Step 3: Second Linear Layer (512 → 512)
        # Each of 512 neurons "looks" at all 512 values from the previous layer.
        # W: [512, 512], b: [512]
        # Input shape:  [batch_size, 512]
        # Output shape: [batch_size, 512]
        x = self.linear_relu_stack[2](x)
        x = self.linear_relu_stack[3](x)  # ReLU activation

        # Step 4: Output Linear Layer (512 → 10)
        # W: [10, 512], b: [10]
        # Produces 10 outputs per sample — one per digit class.
        # Input shape:  [batch_size, 512]
        # Output shape: [batch_size, 10]
        logits = self.linear_relu_stack[4](x)

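        # These logits are *not probabilities* yet.
        # nn.CrossEntropyLoss applies log-softmax internally (LogSoftmax + NLLLoss),
        # so pass raw logits to it; apply softmax yourself only when you need probabilities.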
        return logits
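The relationship between raw logits, `nn.CrossEntropyLoss`, and probabilities can be checked numerically. The sketch below uses a random tensor as a hypothetical stand-in for the model's output on a batch of 4 images, with made-up target labels. It confirms that the loss equals log-softmax followed by negative log-likelihood. It also shows that softmax probabilities sum to 1 per row and leave the predicted class unchanged.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # stand-in for model(x) on a batch of 4 images
targets = torch.tensor([3, 7, 0, 9])  # made-up true digit labels

# CrossEntropyLoss on raw logits...
ce = nn.CrossEntropyLoss()(logits, targets)
# ...equals log-softmax followed by the negative log-likelihood loss
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(ce, manual))  # True

# For inference: softmax gives probabilities, argmax gives the predicted class
probs = F.softmax(logits, dim=1)
preds = probs.argmax(dim=1)  # same as logits.argmax(dim=1), since softmax is monotonic
print(probs.sum(dim=1))      # each row sums to 1
print(preds)
```

Adding an explicit `nn.Softmax` layer before `nn.CrossEntropyLoss` would apply softmax twice, which usually slows or degrades training. Keeping the model's output as raw logits, as `NeuralNetwork` does, avoids that.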


In [19]:
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")
model = NeuralNetwork().to(device)

Using cpu device


In [20]:
print(f"Model structure: {model}\n\n")

for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")

Model structure: NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values : tensor([[-0.0149, -0.0226, -0.0097,  ..., -0.0318, -0.0063, -0.0088],
        [-0.0318,  0.0300, -0.0080,  ..., -0.0068, -0.0106,  0.0351]],
       grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values : tensor([-0.0252,  0.0305], grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values : tensor([[ 0.0292, -0.0166, -0.0224,  ..., -0.0140, -0.0096, -0.0180],
        [ 0.0011,  0.0438, -0.0197,  ...,  0.0135,  0.0252, -0.0388]],
       grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.2.bias | 