A typical training procedure for a neural network is as follows:

- Define the neural network that has some learnable parameters (or weights)
- Iterate over a dataset of inputs
- Process input through the network
- Compute the loss (how far the output is from being correct)
- Propagate gradients back into the network’s parameters
- Update the weights of the network, typically using a simple update rule: weight = weight - learning_rate * gradient
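
The steps above can be sketched on a toy problem. This is a minimal illustration, not the training loop of this section: the one-parameter "network" (prediction = w * x), the data values, and the learning rate are all assumptions chosen so that the ideal weight is 3.

```python
import torch

# Hypothetical one-parameter "network": prediction = w * x.
w = torch.tensor(0.0, requires_grad=True)
x, y_true = torch.tensor(2.0), torch.tensor(6.0)  # toy data, so the ideal w is 3
learning_rate = 0.05                              # assumed value for this sketch

for step in range(100):
    prediction = w * x                    # process input through the "network"
    loss = (prediction - y_true) ** 2     # compute the loss
    loss.backward()                       # propagate gradients back to w
    with torch.no_grad():
        w -= learning_rate * w.grad       # weight = weight - learning_rate * gradient
    w.grad.zero_()                        # clear the gradient before the next step

print(w.item())                           # ends up close to 3
```

The update runs inside torch.no_grad() so that the weight change itself is not recorded by autograd, and the gradient is zeroed because backward() accumulates into .grad.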

> torch.nn is designed around mini-batches. Layers in the torch.nn package
> expect inputs that are a mini-batch of samples, not a single sample.
> For example, nn.Conv2d takes a 4D Tensor of nSamples x nChannels x Height
> x Width. If you have a single sample, use input.unsqueeze(0) to add a
> fake batch dimension. (Recent PyTorch releases also accept unbatched input
> for some layers, nn.Conv2d among them.)
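
As a small sketch of the note above, the snippet below adds a fake batch dimension to a single sample before passing it to a convolution. The layer sizes mirror conv1 from this section; the random 32x32 sample is an assumption made for illustration.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 6, 5)          # 1 input channel, 6 output channels, 5x5 kernel
sample = torch.randn(1, 32, 32)    # a single sample: nChannels x Height x Width
batch = sample.unsqueeze(0)        # add a fake batch dimension at position 0

print(sample.shape)                # torch.Size([1, 32, 32])
print(batch.shape)                 # torch.Size([1, 1, 32, 32])
print(conv(batch).shape)           # torch.Size([1, 6, 28, 28])
```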

> In deep learning, "w.r.t." is an abbreviation for "with respect to". It is
> common in mathematical expressions, particularly for derivatives and
> gradients. The gradient of a function (such as a loss function) w.r.t. a
> variable (such as a weight or an input) measures how changes in that
> variable affect the value of the function.
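
To make "gradient w.r.t. a variable" concrete, here is a minimal sketch with one weight and one input. The squared-error loss and the numeric values are assumptions picked so the derivatives are easy to check by hand.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)   # a weight
x = torch.tensor(3.0, requires_grad=True)   # an input
y_true = torch.tensor(5.0)                  # hypothetical target

loss = (w * x - y_true) ** 2                # (2*3 - 5)^2 = 1
loss.backward()

print(w.grad)   # gradient of loss w.r.t. w: 2 * (w*x - y_true) * x = 6
print(x.grad)   # gradient of loss w.r.t. x: 2 * (w*x - y_true) * w = 4
```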

- torch.Tensor
  A multi-dimensional array with support for autograd operations like backward().
  Also holds the gradient w.r.t the tensor.
- torch.nn.Module
  Neural network module. Convenient way of encapsulating parameters, with helpers
  for moving them to GPU, exporting, loading, etc.
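- torch.nn.Parameter
  A kind of Tensor that is automatically registered as a parameter when assigned
  as an attribute to a Module.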
- torch.nn.functional
  Provides a collection of functions commonly used in building neural networks.
  Unlike modules defined in torch.nn, which are typically instantiated as layers
  (e.g., nn.Linear, nn.Conv2d), functions in torch.nn.functional are directly
  callable operations that take tensors as input and return tensors as output.
  - Activation functions: relu, sigmoid, softmax, tanh
  - Loss functions: mse_loss, cross_entropy, binary_cross_entropy
  - Pooling operations: max_pool1d, avg_pool2d
  - Convolution operations: conv1d, conv2d
  - Normalization techniques: batch_norm, layer_norm
  - Linear transformations: linear
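
The difference between a torch.nn module and its torch.nn.functional counterpart can be sketched as follows. The input values and layer sizes are arbitrary assumptions; the point is only that a module owns its parameters while the functional form receives them as arguments.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.tensor([[-1.0, 0.0, 2.0]])

# Module style: instantiate a layer object first, then call it.
relu_layer = nn.ReLU()
print(relu_layer(x))          # tensor([[0., 0., 2.]])

# Functional style: call the operation directly on the tensor.
print(F.relu(x))              # tensor([[0., 0., 2.]])

# A module owns its parameters; the functional form takes them as arguments.
linear_layer = nn.Linear(3, 2)
same_result = F.linear(x, linear_layer.weight, linear_layer.bias)
print(torch.allclose(linear_layer(x), same_result))   # True
```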

# Define the network

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
import torchaudio
import numpy as np


print("PyTorch version    :", torch.__version__)
print("Torchvision version:", torchvision.__version__)
print("Torchaudio version :", torchaudio.__version__)
print("CUDA version       :", torch.version.cuda)
print("cuDNN version      :", torch.backends.cudnn.version())
print("NumPy version      :", np.__version__)


if hasattr(torch, 'accelerator'):
    print("torch.accelerator is available\n")
    device = torch.accelerator.current_accelerator() if torch.accelerator.is_available() else "cpu"
else:
    print("torch.accelerator is NOT available\n")
    if torch.backends.mps.is_available():
        device = "mps"
    elif torch.cuda.is_available():
        device = "cuda"
    else:
        device = "cpu"

print("Selected device    :", device)




class Net(nn.Module):

    def __init__(self):
        super().__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 5*5 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)


    def forward(self, input):
        # Input: (N, 1, 32, 32), where N is the size of the batch.
        # Convolution layer C1: 1 input image channel, 6 output channels,
        # 5x5 square convolution, ReLU activation.
        # Outputs a Tensor of size (N, 6, 28, 28).
        #
        # How to work out the output size:
        #   output size = floor((I - K + 2P) / S) + 1
        #   I = input size, K = kernel size, P = padding, S = stride.
        # I + 2P - K is the last valid start index of the sliding kernel;
        # dividing by S counts the steps taken, and +1 counts the start position.
        # Here: (32 - 5 + 2*0) / 1 + 1 = 28.
        #
        # Padding: extra pixels (usually zeros) added to the border of the input
        #          to control the output size.
        #   No padding (P = 0): the output gets smaller.
        #   "Same" padding: keeps the output size equal to the input size (for S = 1).
        #   The PyTorch default is padding=0.
        #
        # Stride: how far the filter moves at each step.
        #   S = 1: moves one pixel at a time (most common, and the PyTorch default).
        #   S = 2: moves two pixels at a time (reduces the size faster).
        c1 = F.relu(self.conv1(input))

        # Subsampling layer S2: 2x2 max pooling, purely functional.
        # This layer has no parameters and outputs a (N, 6, 14, 14) Tensor.
        # max_pool2d with a (2, 2) window halves each spatial side: 28 -> 14.
        s2 = F.max_pool2d(c1, (2, 2))

        # Convolution layer C3: 6 input channels, 16 output channels,
        # 5x5 square convolution, ReLU activation.
        # Outputs a (N, 16, 10, 10) Tensor:
        # I + 2P - K = 14 + 0 - 5 = 9, then 9 / S + 1 = 9 / 1 + 1 = 10.
        c3 = F.relu(self.conv2(s2))

        # Subsampling layer S4: 2x2 max pooling, purely functional.
        # This layer has no parameters and outputs a (N, 16, 5, 5) Tensor.
        # Passing 2 is the same as passing (2, 2) for a square window.
        s4 = F.max_pool2d(c3, 2)

        # Flatten operation: purely functional, outputs a (N, 400) Tensor.
        # torch.flatten(s4, 1) uses start_dim=1: the batch dimension (dim 0) is
        # kept, and everything from dim 1 onward is flattened into one dimension.
        s4 = torch.flatten(s4, 1)

        # Fully connected layer F5: (N, 400) Tensor input,
        # outputs a (N, 120) Tensor, ReLU activation
        f5 = F.relu(self.fc1(s4))

        # Fully connected layer F6: (N, 120) Tensor input,
        # outputs a (N, 84) Tensor, ReLU activation
        f6 = F.relu(self.fc2(f5))

        # Output layer (called the Gaussian layer in LeNet; implemented here as
        # a plain fully connected layer with no activation): (N, 84) Tensor
        # input, outputs a (N, 10) Tensor
        output = self.fc3(f6)

        return output


net = Net()
print(net)



target = torch.randn(10)  # a dummy target, for example
# view(1, -1): make the first dimension 1 (a batch of 1) and let PyTorch
# infer the size of the second dimension, so target matches a (1, 10) output.
target = target.view(1, -1)





PyTorch version    : 2.0.1+cu117
Torchvision version: 0.15.2+cu117
Torchaudio version : 2.0.2+cu117
CUDA version       : 11.7
cuDNN version      : 8500
NumPy version      : 1.24.4
torch.accelerator is NOT available

Selected device    : cuda
Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
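
The in_features=400 reported for fc1 follows from the shape arithmetic in the forward() comments. The helper below is a hypothetical convenience function (not part of PyTorch) that applies the output-size formula to each convolution and pooling step, assuming the defaults used in this section: padding 0, convolution stride 1, and a pooling stride equal to the pooling window.

```python
def conv_output_size(size, kernel, padding=0, stride=1):
    """Output side length of a convolution or pooling window (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

side = 32                                          # input is 32 x 32
side = conv_output_size(side, kernel=5)            # conv1: 28
side = conv_output_size(side, kernel=2, stride=2)  # 2x2 max pool: 14
side = conv_output_size(side, kernel=5)            # conv2: 10
side = conv_output_size(side, kernel=2, stride=2)  # 2x2 max pool: 5

print(side, 16 * side * side)                      # 5 400
```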

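
One way the dummy target might be used is sketched below: a forward pass on a random input, a loss against the target, and a backward pass. To keep the snippet runnable on its own, it uses an nn.Sequential stand-in with the same layers as Net; that stand-in, the random 32x32 input, and the choice of mean-squared-error loss are assumptions, not something the code above prescribes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in for the Net class (an assumption, so the snippet runs alone):
# the same layers expressed with nn.Sequential.
net = nn.Sequential(
    nn.Conv2d(1, 6, 5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(6, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),                        # flattens from dim 1 by default
    nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
    nn.Linear(120, 84), nn.ReLU(),
    nn.Linear(84, 10),
)

input = torch.randn(1, 1, 32, 32)        # one random 32x32 single-channel image
output = net(input)

target = torch.randn(10).view(1, -1)     # dummy target reshaped to (1, 10)
loss = F.mse_loss(output, target)        # assumed loss for this sketch
loss.backward()                          # fills .grad on every parameter

print(output.shape)                      # torch.Size([1, 10])
print(net[0].weight.grad.shape)          # torch.Size([6, 1, 5, 5])
```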