# PyTorch Cheat Sheet

<!--- Start of badges -->
<!-- Badges: python,pytorch,machinelearning,deeplearning -->

<p align="left">
<img alt="Deeplearning" src="https://img.shields.io/badge/-Deep_Learning-333333.svg?logo=&style=flat-square" />
 <img alt="Machinelearning" src="https://img.shields.io/badge/-Machine_Learning-333333.svg?logo=&style=flat-square" />
 <img alt="Python" src="https://img.shields.io/badge/-Python-3776AB?logo=python&logoColor=white&style=flat-square" />
 <img alt="Pytorch" src="https://img.shields.io/badge/-PyTorch-EE4C2C?logo=pytorch&logoColor=white&style=flat-square" />
</p>
<!--- End of badges -->

<!--- Blurb
This notebook covers fundamental PyTorch concepts and provides code examples for tensor manipulation, dataset handling, building various neural network architectures, and implementing training and testing workflows. Key topics include tensor operations, custom datasets and transforms, common layers, model creation, loss functions, and optimisers.
-->

<!--- Start of Thumbnail-->
<!--- src="Images/pytorch_thumbnail.png" --->
<!--- End of Thumbnail-->

This notebook provides a comprehensive PyTorch cheat sheet, based on material from the ['Introduction to Neural Networks and PyTorch'](https://www.coursera.org/learn/deep-neural-networks-with-pytorch/home/welcome) and ['Deep Learning with PyTorch'](https://www.coursera.org/learn/advanced-deep-learning-with-pytorch/home/welcome) courses by IBM on Coursera. It covers fundamental PyTorch concepts and provides code examples for tensor manipulation, dataset handling, building various neural network architectures, and implementing training and testing workflows. Key topics include tensor operations, custom datasets and transforms, common layers, model creation, loss functions, and optimisers.




In [1]:
import generate_notebook_toc 
from IPython.display import display, Markdown

current_notebook_filename = "CS_PyTorch.ipynb"

html_toc = generate_notebook_toc.get_html_toc(current_notebook_filename)

display(Markdown(html_toc))

<div style="background-color: whitesmoke; padding: 10px; padding-left: 30px;">
  <h2>Table of Contents</h2>
  <hr>
  <div style="font-weight: bold; font-size: 1.1em;"><a href="#Basics">1. Basics</a></div>
  <div style="padding-left: 25px;"><a href="#Create-tensors">Create tensors</a></div>
  <div style="padding-left: 25px;"><a href="#Indexing">Indexing</a></div>
  <div style="padding-left: 25px;"><a href="#Slicing">Slicing</a></div>
  <div style="padding-left: 25px;"><a href="#Data-type-and-tensor-type">Data type and tensor type</a></div>
  <div style="padding-left: 25px;"><a href="#Size-&-dimensions">Size & dimensions</a></div>
  <div style="padding-left: 25px;"><a href="#Reshape">Reshape</a></div>
  <div style="padding-left: 25px;"><a href="#Numpy-array-to/from-tensor">Numpy array to/from tensor</a></div>
  <div style="padding-left: 25px;"><a href="#Pandas-series/dataframe-to-tensor">Pandas series/dataframe to tensor</a></div>
  <div style="padding-left: 25px;"><a href="#Tensor-to-python-list">Tensor to python list</a></div>
  <div style="font-weight: bold; font-size: 1.1em;"><a href="#Tensor-Operations">2. Tensor Operations</a></div>
  <div style="padding-left: 25px;"><a href="#Tensor-addition">Tensor addition</a></div>
  <div style="padding-left: 25px;"><a href="#Tensor-multiplication">Tensor multiplication</a></div>
  <div style="padding-left: 25px;"><a href="#Dot-product">Dot product</a></div>
  <div style="padding-left: 25px;"><a href="#Derivatives">Derivatives</a></div>
  <div style="padding-left: 25px;"><a href="#Partial-derivatives">Partial derivatives</a></div>
  <div style="font-weight: bold; font-size: 1.1em;"><a href="#Tensor-Functions">3. Tensor Functions</a></div>
  <div style="padding-left: 25px;"><a href="#Mean-and-standard-deviation">Mean and standard deviation</a></div>
  <div style="padding-left: 25px;"><a href="#Max-and-min">Max and min</a></div>
  <div style="padding-left: 25px;"><a href="#Sin">Sin</a></div>
  <div style="padding-left: 25px;"><a href="#Linspace">Linspace</a></div>
  <div style="font-weight: bold; font-size: 1.1em;"><a href="#Transforms">4. Transforms</a></div>
  <div style="padding-left: 25px;"><a href="#Torchvision-transforms">Torchvision transforms</a></div>
  <div style="padding-left: 25px;"><a href="#Create-custom-transforms-via-subclassing">Create custom transforms via subclassing</a></div>
  <div style="padding-left: 25px;"><a href="#Compose-transforms">Compose transforms</a></div>
  <div style="font-weight: bold; font-size: 1.1em;"><a href="#Datasets">5. Datasets</a></div>
  <div style="padding-left: 25px;"><a href="#Pre-built-datasets">Pre-built datasets</a></div>
  <div style="padding-left: 50px;"><a href="#MNIST">MNIST</a></div>
  <div style="padding-left: 25px;"><a href="#Create-custom-datasets-via-subclassing">Create custom datasets via subclassing</a></div>
  <div style="padding-left: 50px;"><a href="#Simple-dataset">Simple dataset</a></div>
  <div style="padding-left: 50px;"><a href="#Image-dataset">Image dataset</a></div>
  <div style="font-weight: bold; font-size: 1.1em;"><a href="#Data-Loader">6. Data Loader</a></div>
  <div style="font-weight: bold; font-size: 1.1em;"><a href="#Building-blocks">7. Building blocks</a></div>
  <div style="padding-left: 25px;"><a href="#Layers">Layers</a></div>
  <div style="padding-left: 50px;"><a href="#Linear-layers">Linear layers</a></div>
  <div style="padding-left: 50px;"><a href="#Convolutional-layers">Convolutional layers</a></div>
  <div style="padding-left: 50px;"><a href="#Max-pooling-layers">Max pooling layers</a></div>
  <div style="padding-left: 50px;"><a href="#Regularisation:-dropout-layers">Regularisation: dropout layers</a></div>
  <div style="padding-left: 50px;"><a href="#Regularisation:-batch-normalisation">Regularisation: batch normalisation</a></div>
  <div style="padding-left: 25px;"><a href="#Activation-functions">Activation functions</a></div>
  <div style="padding-left: 25px;"><a href="#Weight-initialisation">Weight initialisation</a></div>
  <div style="font-weight: bold; font-size: 1.1em;"><a href="#Models">8. Models</a></div>
  <div style="padding-left: 25px;"><a href="#Linear-Regression">Linear Regression</a></div>
  <div style="padding-left: 50px;"><a href="#Basic-model">Basic model</a></div>
  <div style="padding-left: 50px;"><a href="#Custom-model">Custom model</a></div>
  <div style="padding-left: 25px;"><a href="#Logistic-Regression">Logistic Regression</a></div>
  <div style="padding-left: 50px;"><a href="#Basic-model">Basic model</a></div>
  <div style="padding-left: 50px;"><a href="#Custom-model">Custom model</a></div>
  <div style="padding-left: 25px;"><a href="#Softmax">Softmax</a></div>
  <div style="padding-left: 50px;"><a href="#Basic-model">Basic model</a></div>
  <div style="padding-left: 50px;"><a href="#Custom-model">Custom model</a></div>
  <div style="padding-left: 25px;"><a href="#Neural-Networks">Neural Networks</a></div>
  <div style="padding-left: 50px;"><a href="#Conventional-Fully-Connected-Neural-Network">Conventional Fully Connected Neural Network</a></div>
  <div style="padding-left: 50px;"><a href="#Convolutional-neural-network-(CNN)">Convolutional neural network (CNN)</a></div>
  <div style="padding-left: 25px;"><a href="#Pre-trained-ResNet">Pre-trained ResNet</a></div>
  <div style="font-weight: bold; font-size: 1.1em;"><a href="#Train-and-Test">9. Train and Test</a></div>
  <div style="padding-left: 25px;"><a href="#Train/test/val-split">Train/test/val split</a></div>
  <div style="padding-left: 25px;"><a href="#Cost-function">Cost function</a></div>
  <div style="padding-left: 25px;"><a href="#Optimiser">Optimiser</a></div>
  <div style="padding-left: 25px;"><a href="#Learning-rate-scheduling">Learning rate scheduling</a></div>
  <div style="padding-left: 25px;"><a href="#Define-training-procedure">Define training procedure</a></div>
  <div style="padding-left: 25px;"><a href="#Make-predictions">Make predictions</a></div>
  <div style="padding-left: 25px;"><a href="#Evaluate">Evaluate</a></div>
  <div style="padding-left: 50px;"><a href="#Accuracy">Accuracy</a></div>
  <div style="font-weight: bold; font-size: 1.1em;"><a href="#Other">10. Other</a></div>
  <div style="padding-left: 25px;"><a href="#Run-on-GPU">Run on GPU</a></div>
  <div style="padding-left: 25px;"><a href="#Save-and-load-PyTorch-models">Save and load PyTorch models</a></div>
  <hr>
</div>

In [None]:
# !pip -q install torch
# !pip -q install numpy
# !pip -q install pandas
# !pip -q install torchvision

import torch
import numpy as np
import pandas as pd

## Basics

### Create tensors

In [None]:
# Create a 1D tensor
t = torch.tensor([0, 1, 2, 3, 4])
t = torch.tensor([0.0, 1.0, 2.0, 3.0, 4.0])
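t = torch.tensor([0.0, 1.0, 2.0, 3.0, 4.0], dtype=torch.int64) # set the data type explicitly (values are cast to int64)
t = torch.FloatTensor([0, 1, 2, 3, 4]) # legacy constructor; creates a float32 tensor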

# Create a 2D tensor
t = torch.tensor([[11, 12, 13], [21, 22, 23], [31, 32, 33]])
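Besides `torch.tensor`, PyTorch has factory functions that build a tensor of a given shape directly. A minimal sketch (the shapes below are arbitrary examples):

```python
import torch

zeros = torch.zeros(2, 3)          # 2x3 tensor filled with 0.0
ones = torch.ones(2, 3)            # 2x3 tensor filled with 1.0
steps = torch.arange(0, 5)         # tensor([0, 1, 2, 3, 4]); the end value is exclusive
rand = torch.rand(2, 3)            # uniform random values in [0, 1)
like = torch.zeros_like(rand)      # same shape and dtype as `rand`, filled with 0.0

print(zeros.shape, steps.dtype)    # torch.Size([2, 3]) torch.int64
```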

### Indexing

In [None]:
t[0] # returns the first row as a tensor
t[1, 2] # returns a 0-dim tensor holding the value in the 2nd row, 3rd column
t[1][2] # same as above
t[0, 0].item() # returns a Python number (.item() only works on single-element tensors)

# Update values at an index
t[0, 0] = 100 # a single element
t[0] = 100 # every element in the first row

### Slicing

In [None]:
t = torch.tensor([0.0, 1.0, 2.0, 3.0, 4.0]) # 1D tensor
t[1:4] # returns a view of t containing the values from index 1 to index 3 (the end index is excluded)

In [None]:
t2 = torch.tensor([[11, 12, 13], [21, 22, 23], [31, 32, 33]]) # 2D tensor

# tensor[row, col1:col2] is the SAME AS tensor[row][col1:col2]
t2[0, 0:2] # values in row 1, columns 1 & 2: tensor([11, 12])
t2[0][0:2] # same result

# tensor[row1:row2, col] is NOT THE SAME AS tensor[row1:row2][col]
t2[1:3, 1] # column 2 of rows 2 & 3: tensor([22, 32]); this is the correct way
t2[1:3][1] # selects rows 2 & 3 first, then the second of those rows: tensor([31, 32, 33])

# Update values at a slice (1D tensor)
t = torch.tensor([0.0, 1.0, 2.0, 3.0, 4.0])
t[3:5] = torch.tensor([300.0, 400.0])
t[1:3] = 200 # set the values at index 1 and index 2 to the same number

### Data type and tensor type

In [None]:
# Find data type
t.dtype

# Find tensor type
t.type()

# Redefine tensor type
t = t.type(torch.FloatTensor)

### Size & dimensions

In [None]:
# Size
t.size()
t.shape

# Dimensions
t.ndimension()

# Number of elements
t.numel()

### Reshape 

In [None]:
# Reshape tensor
t = t.view(-1,1) # -1: infer number of rows, 1: number of columns
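`view` returns a tensor that shares memory with the original and only works when the memory layout allows it (e.g. contiguous tensors); `reshape` returns a view when possible and a copy otherwise. A small sketch with arbitrary values:

```python
import torch

a = torch.arange(6)          # tensor([0, 1, 2, 3, 4, 5])
col = a.view(-1, 1)          # shape (6, 1); -1 is inferred from the number of elements
grid = a.view(2, 3)          # shape (2, 3)

grid[0, 0] = 100             # views share memory, so `a` changes too
print(a[0])                  # tensor(100)

# Transposing makes the tensor non-contiguous: .view(-1) would raise an error here,
# whereas .reshape(-1) falls back to copying the data
flat = grid.t().reshape(-1)
print(flat)                  # tensor([100,   3,   1,   4,   2,   5])
```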

### Numpy array to/from tensor

In [None]:
# numpy_array = np.array([0.0, 1.0, 2.0, 3.0, 4.0]) #1D
numpy_array = np.array([[0.0, 1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0, 9.0]]) #2D

new_tensor = torch.from_numpy(numpy_array)
back_to_numpy = new_tensor.numpy()

<code>torch.from_numpy()</code> and <code>.numpy()</code> do not copy the data: <code>new_tensor</code> and <code>back_to_numpy</code> share memory with <code>numpy_array</code>. As a result, modifying <code>numpy_array</code> in place (e.g. <code>numpy_array[:] = 0</code>) also changes <code>new_tensor</code> and <code>back_to_numpy</code>. Use <code>torch.tensor(numpy_array)</code> when an independent copy is needed.
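A short sketch of the shared-memory behaviour described above, compared with `torch.tensor`, which copies:

```python
import numpy as np
import torch

numpy_array = np.array([0.0, 1.0, 2.0])
new_tensor = torch.from_numpy(numpy_array)   # shares memory with numpy_array
back_to_numpy = new_tensor.numpy()           # shares the same memory as well
copied = torch.tensor(numpy_array)           # torch.tensor() always copies the data

numpy_array[:] = 0                           # in-place change (rebinding the name would not propagate)
print(new_tensor)      # tensor([0., 0., 0.], dtype=torch.float64)
print(back_to_numpy)   # [0. 0. 0.]
print(copied)          # tensor([0., 1., 2.], dtype=torch.float64)
```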

### Pandas series/dataframe to tensor

In [None]:
# 1D
pandas_series=pd.Series([0.1, 2, 0.3, 10.1])
new_tensor=torch.from_numpy(pandas_series.values)

# 2D
pandas_dataframe=pd.DataFrame({'a':[11,21,31],'b':[12,22,32]})
new_tensor=torch.from_numpy(pandas_dataframe.values)

### Tensor to python list

In [None]:
new_list=t.tolist()

## Tensor Operations

In [None]:
# Sample tensors
u = torch.tensor([[3, 2], [5, 1]])
v = torch.tensor([[4, 1], [3, 6]])
print(u)
print(v)

### Tensor addition

In [None]:
# Tensor + scalar
u+5

In [None]:
# Addition between two tensors
u + v

### Tensor multiplication

In [None]:
# Tensor * scalar
u * 3

In [None]:
# Element-wise Product/Hadamard Product
u * v

In [None]:
# Matrix multiplication
torch.mm(u, v)
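To make the difference between element-wise and matrix multiplication concrete, here are the results for the sample tensors `u` and `v`, worked out by hand (`@` is the operator form of matrix multiplication):

```python
import torch

u = torch.tensor([[3, 2], [5, 1]])
v = torch.tensor([[4, 1], [3, 6]])

hadamard = u * v          # [[3*4, 2*1], [5*3, 1*6]] = [[12, 2], [15, 6]]
matmul = torch.mm(u, v)   # rows of u dotted with columns of v:
# [[3*4 + 2*3, 3*1 + 2*6],
#  [5*4 + 1*3, 5*1 + 1*6]] = [[18, 15], [23, 11]]
print(torch.equal(matmul, u @ v))  # True
```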

### Dot product

In [None]:
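# torch.dot only accepts 1D tensors (u and v above are 2D)
a = torch.tensor([3, 2])
b = torch.tensor([4, 1])
torch.dot(a, b) # 3*4 + 2*1 = tensor(14)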

### Derivatives

In [None]:
# Create a tensor x with the parameter 'requires_grad' set to 'True'
x = torch.tensor(4.0, requires_grad = True) # a single value
x = torch.linspace(-10, 10, 10, requires_grad = True) # OR: several values (this one is used below)

# Define Y as a function of x
Y = x ** 2 + 2 * x + 1

# Take the derivative
# backward() needs a scalar: summing Y gives a scalar whose gradient w.r.t. each x[i] equals dY[i]/dx[i]
y = torch.sum(Y)
y.backward()

# Now we can access the derivative at x values
dY_dx = x.grad

In [None]:
print('data:',x.data)
print('grad_fn:',x.grad_fn)
print('grad:',x.grad)
print("is_leaf:",x.is_leaf)
print("requires_grad:",x.requires_grad)

print('\ndata:',y.data)
print('grad_fn:',y.grad_fn)
print('grad:',y.grad)
print("is_leaf:",y.is_leaf)
print("requires_grad:",y.requires_grad)
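For `Y = x**2 + 2*x + 1` the analytical derivative is `2*x + 2`, so the gradient from `backward()` can be checked by hand. Because each element of `Y` depends only on the matching element of `x`, the gradient of `torch.sum(Y)` is exactly the element-wise derivative. A quick sketch:

```python
import torch

x = torch.linspace(-10, 10, 10, requires_grad=True)
Y = x ** 2 + 2 * x + 1
torch.sum(Y).backward()                    # sum trick: scalar output, element-wise gradients

analytical = 2 * x.detach() + 2            # dY/dx = 2x + 2
print(torch.allclose(x.grad, analytical))  # True
```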

### Partial derivatives

In [None]:
# Create tensors with the parameter 'requires_grad' set to 'True'
u = torch.tensor([1.0, 2.0, 3.0],requires_grad=True)
v = torch.tensor([2.0, 3.0, 4.0],requires_grad=True)

# Create a tensor f which specifies the function
F = u * v + u ** 2

# Take the derivative
f = torch.sum(F) #When calculating the derivative with respect to a function with multiple values, you can use the sum trick to produce a scalar valued function and then take the gradient 
f.backward()

# Now we can access the derivative with respect to u and v
dF_du = u.grad # partial derivative with respect to u: v + 2u
dF_dv = v.grad # partial derivative with respect to v: u
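For `F = u * v + u**2` the partial derivatives are `v + 2u` with respect to `u` and `u` with respect to `v`. A sketch checking this against autograd; note that `.grad` accumulates over repeated `backward()` calls, which is why training loops call `optimizer.zero_grad()` at every step:

```python
import torch

u = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([2.0, 3.0, 4.0], requires_grad=True)
torch.sum(u * v + u ** 2).backward()

print(u.grad)  # equals v + 2u = [4., 7., 10.]
print(v.grad)  # equals u = [1., 2., 3.]
```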

## Tensor Functions

### Mean and standard deviation

In [None]:
t.mean()
t.std()

### Max and min

In [None]:
t.max()
t.min()

### Sin

In [None]:
pi_tensor = torch.tensor([0, np.pi/2, np.pi])
torch.sin(pi_tensor)

### Linspace

In [None]:
torch.linspace(-2, 2, steps = 5)

## Transforms

### Torchvision transforms

In [None]:
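from torchvision import transforms

new_height, new_width = 224, 224 # example target size
transforms.Resize((new_height, new_width)) # resize image
transforms.CenterCrop(20) # crop a 20x20 region from the centre of the image
transforms.ToTensor() # convert a PIL image / NumPy array (H x W x C) to a float tensor (C x H x W); uint8 values are scaled to [0, 1]
transforms.RandomVerticalFlip(p=1) # flip the image vertically with probability p (p=1 always flips)
transforms.RandomHorizontalFlip(p=1) # flip the image horizontally with probability p (p=1 always flips)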

### Create custom transforms via subclassing

In [None]:
class add_mult(object):
    
    # Constructor
    def __init__(self, addx = 1, muly = 2):
        self.addx = addx
        self.muly = muly
    
    # Executor
    def __call__(self, sample):
        x = sample[0]
        y = sample[1]
        x = x + self.addx
        y = y * self.muly
        sample = x, y
        return sample

class mult(object):
    
    # Constructor
    def __init__(self, mult = 100):
        self.mult = mult
        
    # Executor
    def __call__(self, sample):
        x = sample[0]
        y = sample[1]
        x = x * self.mult
        y = y * self.mult
        sample = x, y
        return sample
    
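# Requires `dataset` and a dataset class that accepts a `transform` argument (see the "Simple dataset" section below)
a_m = add_mult()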

for i in range(2):
    x, y = dataset[i]
    print('Index: ', i, 'Original x: ', x, 'Original y: ', y)
    x_, y_ = a_m(dataset[i])
    print('Index: ', i, 'Transformed x_:', x_, 'Transformed y_:', y_)   
    
dataset_tr = custom_dataset(transform=a_m)
print(' \n')
for i in range(2):
    x, y = dataset[i]
    print('Index: ', i, 'Original x: ', x, 'Original y: ', y)
    x_, y_ = dataset_tr[i]
    print('Index: ', i, 'Transformed x_:', x_, 'Transformed y_:', y_)

### Compose transforms

In [None]:
from torchvision import transforms

data_transform = transforms.Compose([add_mult(), mult()])

x,y=dataset[0]
x_,y_=data_transform(dataset[0])

print( 'Original x: ', x, 'Original y: ', y)
print( 'Transformed x_:', x_, 'Transformed y_:', y_)

## Datasets

### Pre-built datasets

#### MNIST

In [None]:
import torchvision.transforms as transforms
import torchvision.datasets as dsets
train_dataset = dsets.MNIST(root='./data', train=True, download=True, transform=transforms.ToTensor())
validation_dataset = dsets.MNIST(root='./data', train=False, download=True, transform=transforms.ToTensor()) # train=False loads the 10,000-image test split

### Create custom datasets via subclassing
#### Simple dataset

In [None]:
from torch.utils.data import Dataset

class custom_dataset(Dataset):
    
    # Constructor with default values
    def __init__(self, length = 100, transform = None):
        self.len = length
        self.x = 2 * torch.ones(length, 2)
        self.y = torch.ones(length, 1)
        self.transform = transform
     
    # Getter (called when indexing the dataset, e.g. dataset[0])
    def __getitem__(self, index):
        sample = self.x[index], self.y[index]
        if self.transform:
            sample = self.transform(sample)
        return sample
    
    # Get length (executed when calling len(dataset))
    def __len__(self):
        return self.len
    
dataset = custom_dataset()
print(dataset)

In [None]:
# Indexing (if a transform is defined when the dataset was created, this is applied automatically)
x,y = dataset[0]

In [None]:
# Return dataset length
len(dataset)

In [None]:
# iterating through the dataset
for i in range(3):
    x, y=dataset[i]
    print("index: ", i, '; x:', x, '; y:', y)
    
# for x,y in dataset:
#     print(' x:', x, 'y:', y)

#### Image dataset

In [None]:
import os
import pandas as pd
from PIL import Image
from torch.utils.data import Dataset

class ImageDataset(Dataset):

    # Constructor
    def __init__(self, csv_file, data_dir, transform=None):
        
        # Image directory
        self.data_dir = data_dir
        
        # The transform that will be applied to each image
        self.transform = transform
        data_dircsv_file = os.path.join(self.data_dir, csv_file)
        # Load the CSV file containing the image info (column 0: class label, column 1: image file name)
        self.data_name = pd.read_csv(data_dircsv_file)
        
        # Number of images in the dataset
        self.len = self.data_name.shape[0]
    
    # Get the length
    def __len__(self):
        return self.len
    
    # Getter
    def __getitem__(self, idx):
        
        # Image file path
        img_name = os.path.join(self.data_dir, self.data_name.iloc[idx, 1])
        # Open image file
        image = Image.open(img_name)
        
        # The class label for the image
        y = self.data_name.iloc[idx, 0]
        
        # If there is any transform method, apply it to the image
        if self.transform:
            image = self.transform(image)

        return image, y

# csv_file and directory are placeholders: the CSV file name and the folder containing it and the images
dataset = ImageDataset(csv_file=csv_file, data_dir=directory)
image = dataset[0][0]
y = dataset[0][1]

## Data Loader

In [None]:
from torch.utils.data import DataLoader

# Create the DataLoaders
# batch_size determines how many samples are used in each optimisation step
# shuffle=True reshuffles the training data at every epoch; the validation data does not need shuffling
train_loader = DataLoader(dataset=train_dataset, batch_size=2000, shuffle=True)
validation_loader = DataLoader(dataset=validation_dataset, batch_size=5000, shuffle=False)
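A sketch of how a `DataLoader` splits a dataset into batches, using a hypothetical toy dataset of 10 samples built with `TensorDataset`: the last batch holds the remainder unless `drop_last=True`.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Hypothetical toy dataset: 10 samples with 2 features each, plus integer labels
x = torch.arange(20, dtype=torch.float32).view(10, 2)
y = torch.arange(10)
toy_dataset = TensorDataset(x, y)

loader = DataLoader(toy_dataset, batch_size=4, shuffle=False)
batch_sizes = [xb.shape[0] for xb, yb in loader]
print(batch_sizes)  # [4, 4, 2]: the last batch holds the remaining samples

# drop_last=True discards the incomplete final batch
print(len(DataLoader(toy_dataset, batch_size=4, drop_last=True)))  # 2
```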

## Building blocks
### Layers

In [None]:
import torch
from torch import nn

#### Linear layers

In [None]:
input_size, output_size = 2, 1 # example sizes
nn.Linear(input_size, output_size, bias=True) # bias=False removes the additive bias term

#### Convolutional layers

**in_channels:** number of channels in the input (e.g. 1 for greyscale, 3 for RGB images). A separate kernel is applied to each input channel and the results from all input channels are summed.

**out_channels:** number of output channels. Each output channel has its own set of kernels, so specifying multiple output channels performs multiple convolutions with unique kernels.

After the convolution, the output has shape [batch_size, out_channels, new_image_rows, new_image_columns].

**stride:** number of pixels the kernel moves at each step

**padding:**
- adds rows and columns of zeros around the image
- helps preserve information at the borders
- stops the image shrinking with each convolution (e.g. kernel_size=3, stride=1 and padding=1 keep the size unchanged)

In [None]:
nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, stride = 2, padding = 1) 
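The output height (and likewise width) of a convolution or pooling layer is $\lfloor (n + 2p - k)/s \rfloor + 1$, where $n$ is the input size, $k$ the kernel size, $p$ the padding and $s$ the stride (dilation assumed to be 1). The sketch below checks this formula, via a hypothetical helper `conv_output_size`, against `nn.Conv2d` and `nn.MaxPool2d` on a random batch:

```python
import torch
from torch import nn

def conv_output_size(n, kernel_size, stride=1, padding=0):
    """Hypothetical helper: spatial output size of a conv/pool layer (dilation = 1)."""
    return (n + 2 * padding - kernel_size) // stride + 1

x = torch.randn(8, 1, 16, 16)  # [batch_size, in_channels, rows, columns]

conv = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3, stride=2, padding=1)
out = conv(x)
print(out.shape)                                                 # torch.Size([8, 4, 8, 8])
print(conv_output_size(16, kernel_size=3, stride=2, padding=1))  # 8

pool = nn.MaxPool2d(kernel_size=2)  # stride defaults to kernel_size
print(pool(out).shape)                                           # torch.Size([8, 4, 4, 4])
print(conv_output_size(8, kernel_size=2, stride=2))              # 4
```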

#### Max pooling layers

By default `stride=None`, which means the stride is set equal to `kernel_size` (non-overlapping windows).

In [None]:
nn.MaxPool2d(kernel_size=2,stride=1) 

#### Regularisation: dropout layers

Dropout is a widely used regularisation technique to improve model generalisation and prevent overfitting. Overfitting occurs when a model learns the training data too well, including noise and irrelevant details. Dropout temporarily deactivates a random subset of neurons in a layer during training. The dropout rate, typically between 0.2 and 0.5, determines the proportion of neurons that are deactivated. This process forces the network to learn more robust features and reduces its reliance on specific neurons or pathways.

Used for:

- dense layers of fully connected networks, especially in models handling image or text data.
- models with a large number of parameters, like deep neural networks, which are more prone to overfitting 
- when training data is scarce, dropout encourages the network to learn diverse feature representations.

In [None]:
dropout_rate = 0.5 # probability of zeroing each element
nn.Dropout(p=dropout_rate)
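In PyTorch, `nn.Dropout` is only active in training mode: each element is zeroed with probability `p` and the surviving elements are scaled by `1/(1-p)`, so the expected value is unchanged. In evaluation mode (`model.eval()`) the input passes through untouched. A quick check on a tensor of ones:

```python
import torch
from torch import nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()                      # training mode (the default for new modules)
out_train = drop(x)
# every element is either 0 or 1 / (1 - 0.5) = 2
print(set(out_train.tolist()) <= {0.0, 2.0})  # True

drop.eval()                       # evaluation mode: dropout is disabled
out_eval = drop(x)
print(torch.equal(out_eval, x))   # True
```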

#### Regularisation: batch normalisation

In [None]:
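n_nodes = 50 # number of features (neurons) in the preceding layer
nn.BatchNorm1d(n_nodes) # for [batch_size, n_nodes] inputs; use nn.BatchNorm2d(channels) after convolutional layers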
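Batch normalisation standardises each feature (node) across the mini-batch to roughly zero mean and unit variance, then applies a learnable scale (`weight`, initialised to 1) and shift (`bias`, initialised to 0). In training mode it uses the batch statistics and updates running averages that are used after `model.eval()`. A sketch on random data with an arbitrary 3-node layer:

```python
import torch
from torch import nn

torch.manual_seed(0)
n_nodes = 3
bn = nn.BatchNorm1d(n_nodes)             # input shape: [batch_size, n_nodes]
x = torch.randn(64, n_nodes) * 5 + 10    # features with mean ~10 and std ~5

out = bn(x)                              # training mode: normalise with the batch statistics
batch_mean = out.mean(dim=0)                                  # close to 0 for each node
batch_std = ((out - batch_mean) ** 2).mean(dim=0).sqrt()      # close to 1 for each node
print(batch_mean, batch_std)
```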

### Activation functions

In [None]:
import torch
from torch import nn

z = torch.linspace(-2, 2, 5) # example input

# Classes: use them as layers (e.g. in nn.Sequential), or instantiate them to get a callable object
nn.Sigmoid()(z)
nn.Tanh()(z)
nn.ReLU()(z)

# Functions: apply directly to a tensor
torch.sigmoid(z)
torch.tanh(z)
torch.relu(z)

### Weight initialisation

In [None]:
from torch import nn

linear = nn.Linear(10, 5) # example layer whose weights will be initialised

# Xavier (Glorot) initialisation
nn.init.xavier_uniform_(linear.weight) # good for tanh activation functions

# He (Kaiming) initialisation
nn.init.kaiming_uniform_(linear.weight, nonlinearity='relu') # good for relu activation functions

## Models

### Linear Regression

#### Basic model

In [None]:
from torch import nn
input_size=2
output_size=1

model = nn.Linear(input_size, output_size, bias=True)
print("Python dictionary: ",model.state_dict()) #weights are randomly initialised

#### Custom model

In [None]:
from torch import nn

class linear_regression(nn.Module):
    
    # Constructor
    def __init__(self, input_size, output_size):
        super(linear_regression, self).__init__()
        self.linear = nn.Linear(input_size, output_size)
    
    # Prediction function
    def forward(self, x):
        yhat = self.linear(x)
        return yhat
    
input_size=2
output_size=1
model = linear_regression(input_size, output_size)
print("Python dictionary: ",model.state_dict()) #weights are randomly initialised

### Logistic Regression
#### Basic model

In [None]:
from torch.nn import Linear, Sequential, Sigmoid

input_size=3

model = Sequential(Linear(input_size, 1), Sigmoid())
print("Python dictionary: ",model.state_dict()) #weights are randomly initialised

#### Custom model

In [None]:
from torch import nn

class logistic_regression(nn.Module):
    
    # Constructor
    def __init__(self, input_size):
        super(logistic_regression, self).__init__()
        self.linear = nn.Linear(input_size, 1)
    
    # Prediction
    def forward(self, x):
        yhat = torch.sigmoid(self.linear(x))
        return yhat
    
input_size=2
model = logistic_regression(input_size)
print("Python dictionary: ",model.state_dict()) #weights are randomly initialised

### Softmax
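Softmax regression is similar to linear regression, but with multiple outputs, one per class (i.e. for non-binary classification). A sample is assigned to the class whose output is largest. The softmax function converts these outputs (logits) into probabilities; the models below return the raw logits, because `nn.CrossEntropyLoss` applies the softmax internally.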
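A minimal sketch of how such outputs (logits) are turned into probabilities and a predicted class; the logit values are made up for illustration:

```python
import torch

logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 3.0, 0.2]])    # 2 samples, 3 classes
probs = torch.softmax(logits, dim=1)        # exp(z_i) / sum_j exp(z_j) along each row
print(probs.sum(dim=1))                     # each row sums to 1

# softmax preserves the ordering, so the argmax of the logits equals the argmax of the probabilities
pred_from_logits = logits.argmax(dim=1)
pred_from_probs = probs.argmax(dim=1)
print(pred_from_logits)                     # tensor([0, 1])
```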

#### Basic model

In [None]:
from torch import nn

input_size=1
output_size=3

model = nn.Sequential(nn.Linear(input_size, output_size))

#### Custom model

In [None]:
from torch import nn

class SoftMax(nn.Module):
    
    # Constructor
    def __init__(self, input_size, output_size):
        super(SoftMax, self).__init__()
        self.linear = nn.Linear(input_size, output_size)
        
    # Prediction
    def forward(self, x):
        z = self.linear(x)
        return z

input_size = 5
num_classes = 10
model = SoftMax(input_size, num_classes)
print("Python dictionary: ",model.state_dict()) #weights are randomly initialised

### Neural Networks

#### Conventional Fully Connected Neural Network 


In [None]:
from torch import nn

input_size = 1
output_size = 1
hidden_neurons = 9

model= torch.nn.Sequential(
    torch.nn.Linear(input_size, hidden_neurons), 
    torch.nn.Sigmoid(), #OR: torch.nn.Tanh(), torch.nn.ReLU()
    torch.nn.Linear(hidden_neurons,output_size),
    
    # For binary classification: 
    torch.nn.Sigmoid() 
)

In [None]:
from torch import nn

class Net(nn.Module):

    # Constructor
    def __init__(self, Layers,p=0):
        super(Net, self).__init__()
        
        #Optional: drop out (no drop out if p=0)
        self.drop = nn.Dropout(p=p)
        
        self.hidden = nn.ModuleList()
        self.batchnorm = nn.ModuleList()
        
        for input_size, output_size in zip(Layers, Layers[1:]):
            linear = nn.Linear(input_size, output_size)
            self.hidden.append(linear)
            
            #Optional: custom weight initialisation (choose one; the second call would overwrite the first)
            # nn.init.xavier_uniform_(linear.weight) # good for tanh activation
            nn.init.kaiming_uniform_(linear.weight, nonlinearity='relu') # good for relu activation
            
            #Optional: batch normalisation
            bn = nn.BatchNorm1d(output_size)
            self.batchnorm.append(bn)
             
    # Prediction
    def forward(self, activation):
        L = len(self.hidden)
        for (l, linear_transform, batch_norm) in zip(range(L), self.hidden, self.batchnorm):
            
            if l < L - 1: # Hidden layer activation function
                activation = torch.relu(self.drop(linear_transform(activation)))
                # activation = torch.tanh(self.drop(linear_transform(activation)))
                # activation = torch.sigmoid(self.drop(linear_transform(activation)))
                
                # Optional: batch normalisation
                activation = batch_norm(activation)
        
            else: # Output layer (choose one)
                
                # For regression / multi-class classification (raw outputs):
                activation = linear_transform(activation)
                
                # For binary classification, apply a sigmoid to get probabilities instead:
                # activation = torch.sigmoid(linear_transform(activation))
                
        return activation

Layers = [2, 50, 3] # first element: size of the input layer, last element: size of output layer, central elements: number of neurons in hidden layers
model = Net(Layers, p=0.5)

#### Convolutional neural network (CNN)

In [None]:
class CNN(nn.Module):
    
    # Constructor
    def __init__(self, out_1=16, out_2=32):
        super(CNN, self).__init__()
        
        #first Convolutional layers 
        self.cnn1 = nn.Conv2d(in_channels=1, out_channels=out_1, kernel_size=5, padding=2)
        #batch normalisation 
        self.conv1_bn = nn.BatchNorm2d(out_1)
        #max pooling 
        self.maxpool1=nn.MaxPool2d(kernel_size=2)
        #second Convolutional layers
        self.cnn2 = nn.Conv2d(in_channels=out_1, out_channels=out_2, kernel_size=5, stride=1, padding=2)
        #batch normalisation 
        self.conv2_bn = nn.BatchNorm2d(out_2)
        #max pooling 
        self.maxpool2=nn.MaxPool2d(kernel_size=2)
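        #fully connected layer (out_2 * 4 * 4 assumes 16x16 inputs: 16 -> 8 -> 4 after the two max pools; use out_2 * 7 * 7 for 28x28 MNIST images)
        self.fc1 = nn.Linear(out_2 * 4 * 4, 10)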
        #batch normalisation 
        self.bn_fc1 = nn.BatchNorm1d(10)
    
    # Prediction
    def forward(self, x):
        x = self.cnn1(x)
        x=self.conv1_bn(x)
        x = torch.relu(x)
        x = self.maxpool1(x)
        x = self.cnn2(x)
        x=self.conv2_bn(x)
        x = torch.relu(x)
        x = self.maxpool2(x)
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        x=self.bn_fc1(x)
        return x

out_1 = 16
out_2 = 32

model = CNN(out_1, out_2)

### Pre-trained ResNet

In [None]:
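from torchvision import models
from torch import nn, optim

# Load ResNet-18 with ImageNet weights (torchvision >= 0.13; older versions use pretrained=True)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze all pre-trained parameters
for param in model.parameters():
    param.requires_grad = False
    
# Replace the final layer: ResNet-18 produces 512 features; here there are 7 output classes
model.fc = nn.Linear(512, 7)

# Pass the optimizer only the parameters that still require gradients (i.e. the new fc layer)
optimizer = optim.Adam([parameters for parameters in model.parameters() if parameters.requires_grad], lr=0.003)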

## Train and Test

### Train/test/val split

In [None]:
from torch.utils.data import random_split

# Use 95% of the samples for training and the remaining 5% for validation
num_train = int(len(train_dataset) * 0.95)

# Randomly split the training dataset into training and validation subsets
split_train_, split_valid_ = random_split(train_dataset, [num_train, len(train_dataset) - num_train])

### Cost function

In [None]:
from torch import nn

# Mean squared error (regression)
criterion = nn.MSELoss()

# Binary cross entropy loss (binary classification; expects probabilities, i.e. sigmoid outputs, and float targets)
criterion = nn.BCELoss()

# Cross entropy loss (multi-class classification; expects raw logits and integer class labels)
criterion = nn.CrossEntropyLoss()
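`nn.CrossEntropyLoss` takes logits of shape `[batch_size, num_classes]` and integer labels of shape `[batch_size]`, and internally applies log-softmax followed by the negative log-likelihood. `nn.BCELoss` takes probabilities in [0, 1]. A sketch reproducing both by hand on arbitrary values:

```python
import torch
from torch import nn

logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 3.0, 0.2]])
labels = torch.tensor([0, 2])                            # class indices (int64)

ce = nn.CrossEntropyLoss()(logits, labels)
log_probs = torch.log_softmax(logits, dim=1)
manual_ce = -log_probs[torch.arange(2), labels].mean()   # mean negative log-probability of the true class
print(torch.allclose(ce, manual_ce))                     # True

probs = torch.tensor([0.9, 0.2])                         # sigmoid outputs
targets = torch.tensor([1.0, 0.0])                       # float targets for BCELoss
bce = nn.BCELoss()(probs, targets)
manual_bce = -(targets * probs.log() + (1 - targets) * (1 - probs).log()).mean()
print(torch.allclose(bce, manual_bce))                   # True
```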

### Optimiser

In [None]:
from torch import optim
optimizer = optim.SGD(model.parameters(), lr = 0.01, momentum = 0.1) # momentum can help the optimiser move past saddle points and shallow local minima
optimizer = optim.Adam(model.parameters(), lr = 0.001)

### Learning rate scheduling

In [None]:
# Multiplies the learning rate by gamma every step_size calls to scheduler.step()
# (typically called once per epoch, after optimizer.step()) to improve convergence and fine-tune the model
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)
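With `step_size=1`, every call to `scheduler.step()` multiplies the learning rate by `gamma`; the scheduler itself has no notion of epochs, so it is called once per epoch in the training loop. A sketch with a hypothetical placeholder model and the training step itself omitted:

```python
import torch
from torch import nn, optim

model = nn.Linear(1, 1)                         # hypothetical placeholder model
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)

lrs = []
for epoch in range(3):
    # ... training batches for one epoch would go here ...
    optimizer.step()                            # optimiser update comes before scheduler.step()
    lrs.append(optimizer.param_groups[0]["lr"])
    scheduler.step()                            # decay the learning rate for the next epoch

print(lrs)  # approximately [0.1, 0.01, 0.001]
```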

### Define training procedure

In [None]:
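def train(model, criterion, train_loader, validation_loader, optimizer, epochs=100):
    # Assumes 28x28 images (e.g. MNIST) that are flattened, and a classifier that outputs logits
    cost_list=[]
    accuracy_list=[]
    
    for epoch in range(epochs):
        COST=0
        model.train() # training mode: dropout active, batch norm uses batch statistics
        for x, y in train_loader:
            optimizer.zero_grad() # clear gradients from the previous step
            z = model(x.view(-1, 28 * 28))
            loss = criterion(z, y)
            loss.backward()
            optimizer.step()
            COST+=loss.item() # .item() returns a Python number
        cost_list.append(COST)
        
        correct = 0
        model.eval() # evaluation mode: dropout disabled, batch norm uses running statistics
        with torch.no_grad(): # gradients are not needed for validation
            for x, y in validation_loader:
                z = model(x.view(-1, 28 * 28))
                _, label = torch.max(z, 1)
                correct += (label == y).sum().item()
    
        accuracy = 100 * (correct / len(validation_loader.dataset))
        accuracy_list.append(accuracy)
    
    return cost_list, accuracy_list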
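The same pattern (zero the gradients, forward pass, loss, backward pass, optimiser step) applies to any model. A self-contained sketch that fits the `nn.Linear` regression model from above to hypothetical synthetic data, `y = 3x + 1` plus noise:

```python
import torch
from torch import nn, optim
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)
# Hypothetical synthetic data: y = 3x + 1 + noise
x = torch.linspace(-1, 1, 100).view(-1, 1)
y = 3 * x + 1 + 0.1 * torch.randn(x.shape)
loader = DataLoader(TensorDataset(x, y), batch_size=10, shuffle=True)

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

losses = []
for epoch in range(50):
    model.train()
    epoch_loss = 0.0
    for xb, yb in loader:
        optimizer.zero_grad()             # clear gradients from the previous step
        loss = criterion(model(xb), yb)   # forward pass + loss
        loss.backward()                   # backpropagation
        optimizer.step()                  # parameter update
        epoch_loss += loss.item() * xb.shape[0]
    losses.append(epoch_loss / len(loader.dataset))

print(losses[0], losses[-1])                   # the average loss should fall to roughly the noise variance (~0.01)
print(model.weight.item(), model.bias.item())  # should end up close to 3 and 1
```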

### Make predictions

In [None]:
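from torch.nn import Softmax

# x and data_set are placeholders for your input data

# Regression
yhat = model(x)

# Binary classification (the model ends with a sigmoid, so its outputs are probabilities)
z = model(data_set.x)
yhat = (z[:,0] > 0.5)

# Multi-class classification (the model outputs logits)
Softmax_fn = Softmax(dim=-1)
z = model(data_set.x)
probability = Softmax_fn(z) # class probabilities (only needed if you want probabilities)
_, yhat = z.max(1) # predicted class = index of the largest output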

### Evaluate

#### Accuracy

In [None]:
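# Binary classification (yhat is boolean: cast to float, and flatten the labels so the shapes match)
accuracy = (data_set.y.view(-1) == yhat.float()).float().mean()

# Multi-class classification
correct = (data_set.y == yhat).sum().item()
accuracy = correct / len(data_set)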
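Comparing tensors with `==` gives a boolean tensor, which must be cast to float before calling `.mean()`. Predictions and labels should also have the same shape; otherwise broadcasting (e.g. `[N, 1]` against `[N]`) silently produces an `N x N` comparison. A sketch with made-up labels and model outputs:

```python
import torch

# Hypothetical binary labels and sigmoid outputs, both of shape [N, 1]
y_binary = torch.tensor([[1.0], [0.0], [1.0], [1.0]])
probs = torch.tensor([[0.8], [0.3], [0.4], [0.9]])
yhat_binary = (probs > 0.5).float()
binary_acc = (yhat_binary == y_binary).float().mean().item()
print(binary_acc)   # 0.75

# Hypothetical multi-class labels (shape [N]) and logits (shape [N, num_classes])
y_multi = torch.tensor([2, 0, 1])
logits = torch.tensor([[0.1, 0.2, 3.0],
                       [2.0, 0.1, 0.3],
                       [0.5, 0.2, 0.1]])
_, yhat_multi = logits.max(dim=1)          # index of the largest logit in each row
multi_acc = (yhat_multi == y_multi).sum().item() / len(y_multi)
print(multi_acc)    # 2 correct out of 3
```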

## Other

### Run on GPU

In [None]:
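device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device) # move the model parameters to the device
x = x.to(device) # input tensors must be on the same device as the model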

### Save and load PyTorch models

In [None]:
torch.save(model.state_dict(), 'my_model.pth')
model.load_state_dict(torch.load('my_model.pth'))
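Saving the `state_dict` stores only the parameters, so loading requires first creating a model with the same architecture. A self-contained round trip to a temporary file (the file name is arbitrary); `map_location` allows a model saved on a GPU to be loaded on a CPU-only machine:

```python
import os
import tempfile
import torch
from torch import nn

model = nn.Linear(2, 1)
path = os.path.join(tempfile.mkdtemp(), "my_model.pth")
torch.save(model.state_dict(), path)

restored = nn.Linear(2, 1)                          # same architecture, different random weights
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()                                     # inference mode for dropout / batch norm layers

print(torch.equal(model.weight, restored.weight))   # True
```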