# Chapter 1

## PyTorch and object-oriented programming

In [1]:
# Object-Oriented Programming (OOP)
class BankAccount:
    def __init__(self, balance):
        self.balance = balance

# __init__ is called when BankAccount object is created
# balance is the attribute of the BankAccount object
account = BankAccount(100)
print(account.balance)

100


In [3]:
# Object-Oriented Programming (OOP)
# Methods: Python functions to perform tasks
class BankAccount:
    def __init__(self, balance):
        self.balance = balance

    # deposit method increases balance
    def deposit(self, amount):
        self.balance += amount

account = BankAccount(100)
account.deposit(50)
print(account.balance)
# 150

150


In [13]:
import torch

In [24]:
torch.cuda.is_available()

True

In [25]:
torch.cuda.current_device()

0

In [26]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [27]:
torch.set_default_device(device)

In [28]:
# Water potability dataset
# PyTorch Dataset
import pandas as pd
from torch.utils.data import Dataset
class WaterDataset(Dataset):
    # init: load data, store as numpy array
    # super().__init__() ensures
    # WaterDataset behaves like torch Dataset
    def __init__(self, csv_path):
        super().__init__()
        df = pd.read_csv(csv_path)
        self.data = df.to_numpy()
        
    # len: return the size of the dataset
    def __len__(self):
        return self.data.shape[0]
        
    # getitem: take one argument called idx
    # and return features and label for a single sample at index idx
    def __getitem__(self, idx):
        features = self.data[idx, :-1]
        label = self.data[idx, -1]
        return features, label


In [29]:
# PyTorch DataLoader
dataset_train = WaterDataset(
    "./data/water_potability/water_train.csv"
)

In [36]:
from torch.utils.data import DataLoader
dataloader_train = DataLoader(
    dataset_train,
    batch_size=2,
    shuffle=True,
    generator=torch.Generator(device=device),
)

In [37]:
features, labels = next(iter(dataloader_train))
print(f"Features: {features},\nLabels: {labels}")

Features: tensor([[0.7486, 0.5194, 0.2412, 0.5090, 0.7967, 0.2523, 0.5996, 0.5558, 0.4704],
        [0.4123, 0.4281, 0.4803, 0.6372, 0.5199, 0.3339, 0.4732, 0.4508, 0.6655]],
       device='cuda:0', dtype=torch.float64),
Labels: tensor([1., 0.], device='cuda:0', dtype=torch.float64)


In [38]:
# PyTorch Model
# Sequential model definition:
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(9, 16),
    nn.ReLU(),
    nn.Linear(16, 8),
    nn.ReLU(),
    nn.Linear(8, 1),
    nn.Sigmoid(),
)

# Class-based model definition:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(9, 16, dtype=torch.float64)
        self.fc2 = nn.Linear(16, 8, dtype=torch.float64)
        self.fc3 = nn.Linear(8, 1, dtype=torch.float64)

    def forward(self, x):
        x = nn.functional.relu(self.fc1(x))
        x = nn.functional.relu(self.fc2(x))
        x = nn.functional.sigmoid(self.fc3(x))
        return x

net = Net()

In [39]:
# net.to(device)

In [40]:
next(net.parameters()).is_cuda

True

## Optimizers, training, and evaluation

In [41]:
# Training loop
import torch.nn as nn
import torch.optim as optim
criterion = nn.BCELoss()
optimizer = optim.SGD(net.parameters(), lr=0.01)

for epoch in range(1000):
    for features, labels in dataloader_train:
        optimizer.zero_grad()
        outputs = net(features)
        loss = criterion(
            outputs, labels.view(-1, 1)
        )
        loss.backward()
        optimizer.step()

### Optimizers

#### Stochastic Gradient Descent (SGD)
```
optimizer = optim.SGD(net.parameters(), lr=0.01)
```

* Update depends on learning rate
* Simple and efficient, for basic models
* Rarely used in practice

#### Adaptive Gradient (Adagrad)
```
optimizer = optim.Adagrad(net.parameters(), lr=0.01)
```

* Adapts learning rate for each parameter
* Good for sparse data
* May decrease the learning rate too fast

#### Root Mean Square Propagation (RMSprop)
```
optimizer = optim.RMSprop(net.parameters(), lr=0.01)
```

* Update for each parameter based on the size of its previous gradients

#### Adaptive Moment Estimation (Adam)
```
optimizer = optim.Adam(net.parameters(), lr=0.01)
```

* Arguably the most versatile and widely used
* RMSprop + gradient momentum
* Often used as the go-to optimizer
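
The update rules behind these four optimizers can be sketched in plain Python on a single parameter. This is a simplified illustration rather than PyTorch's implementation: the toy loss `w**2`, the helper names (`run`, `sgd_step`, and so on), and the hyperparameter values are all assumptions chosen for the example.

```python
import math

def grad(w):
    # Gradient of the toy loss f(w) = w**2
    return 2.0 * w

def sgd_step(w, state, lr):
    # Plain SGD: step proportional to the learning rate and the gradient
    return w - lr * grad(w)

def adagrad_step(w, state, lr, eps=1e-10):
    # Adagrad: accumulate all squared gradients, so the effective
    # learning rate of this parameter can only shrink
    g = grad(w)
    state["sum_sq"] = state.get("sum_sq", 0.0) + g * g
    return w - lr * g / (math.sqrt(state["sum_sq"]) + eps)

def rmsprop_step(w, state, lr, alpha=0.99, eps=1e-8):
    # RMSprop: exponentially decaying average of squared gradients
    g = grad(w)
    state["avg_sq"] = alpha * state.get("avg_sq", 0.0) + (1 - alpha) * g * g
    return w - lr * g / (math.sqrt(state["avg_sq"]) + eps)

def adam_step(w, state, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: RMSprop-style scaling plus momentum on the gradient,
    # both with bias correction
    g = grad(w)
    t = state["t"] = state.get("t", 0) + 1
    state["m"] = beta1 * state.get("m", 0.0) + (1 - beta1) * g
    state["v"] = beta2 * state.get("v", 0.0) + (1 - beta2) * g * g
    m_hat = state["m"] / (1 - beta1 ** t)
    v_hat = state["v"] / (1 - beta2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)

def run(step_fn, w0=1.0, lr=0.1, steps=50):
    # Apply the same update rule repeatedly, starting from w0
    w, state = w0, {}
    for _ in range(steps):
        w = step_fn(w, state, lr)
    return w

final_weights = {
    "SGD": run(sgd_step),
    "Adagrad": run(adagrad_step),
    "RMSprop": run(rmsprop_step),
    "Adam": run(adam_step),
}
for name, w in final_weights.items():
    print(f"{name}: w = {w:.4f}, loss = {w * w:.6f}")
```

All four rules move `w` toward the minimum at 0; they differ only in how the raw gradient is rescaled before the step.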


In [44]:
# PyTorch DataLoader
dataset_test = WaterDataset(
    "./data/water_potability/water_test.csv"
)
from torch.utils.data import DataLoader
dataloader_test = DataLoader(
    dataset_test,
    batch_size=2,
    shuffle=True,
    generator=torch.Generator(device=device),
)

In [45]:
# Model evaluation
# Set up accuracy metric

# Put model in eval mode and iterate over
# test data batches with no gradients

# Pass data to model to get predicted
# probabilities
# Compute predicted labels
# Update accuracy metric

from torchmetrics import Accuracy

acc = Accuracy(task="binary")
net.eval()
with torch.no_grad():
    for features, labels in dataloader_test:
        outputs = net(features)
        preds = (outputs >= 0.5).float()
        acc(preds, labels.view(-1, 1))

accuracy = acc.compute()
print(f"Accuracy: {accuracy}")

Accuracy: 0.6163021922111511


## Vanishing and exploding gradients

In [None]:

Vanishing gradients

* Gradients get smaller and smaller during the backward pass
* Earlier layers get small parameter updates
* Model doesn't learn

Exploding gradients

* Gradients get bigger and bigger
* Parameter updates are too large
* Training diverges

Solution to unstable gradients

1. Proper weights initialization
2. Good activations
3. Batch normalization

In [None]:
# Weights initialization
layer = nn.Linear(8, 1)
print(layer.weight)

Parameter containing:
tensor([[-0.0195,  0.0992,  0.0391, -0.3386, -0.1892, -0.3170,  0.0212,  0.2148]])

In [None]:
Weights initialization

Good initialization ensures:

* Variance of layer inputs = variance of layer outputs
* Variance of gradients the same before and after a layer

How to achieve this depends on the activation:

* For ReLU and similar, we can use He/Kaiming initialization
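
To make the variance argument concrete, here is a small sketch of the bound that He/Kaiming uniform initialization samples from, following the formula documented for `kaiming_uniform_` (bound = gain × sqrt(3 / fan_in), with gain sqrt(2) for ReLU). The helper name `kaiming_uniform_bound` and the sample size are assumptions made for this illustration.

```python
import math
import random

def kaiming_uniform_bound(fan_in, gain=math.sqrt(2.0)):
    # He/Kaiming: target std = gain / sqrt(fan_in);
    # a uniform distribution on [-bound, bound] has std = bound / sqrt(3)
    std = gain / math.sqrt(fan_in)
    return math.sqrt(3.0) * std

fan_in = 8  # number of inputs of a layer such as nn.Linear(8, 1)
bound = kaiming_uniform_bound(fan_in)

# Draw many weights and check their variance against the target 2 / fan_in
random.seed(0)
weights = [random.uniform(-bound, bound) for _ in range(100_000)]
sample_var = sum(w * w for w in weights) / len(weights)

print(f"bound = {bound:.4f}")
print(f"target variance = {2.0 / fan_in:.4f}, sample variance = {sample_var:.4f}")
```

The factor 2 in the target variance compensates for ReLU zeroing out roughly half of the inputs.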

In [None]:
# Weights initialization
import torch.nn.init as init
init.kaiming_uniform_(layer.weight)
print(layer.weight)

Parameter containing:
tensor([[-0.3063, -0.2410,  0.0588,  0.2664,  0.0502, -0.0136,  0.2274,  0.0901]])

In [None]:
# He / Kaiming initialization
init.kaiming_uniform_(self.fc1.weight)
init.kaiming_uniform_(self.fc2.weight)
init.kaiming_uniform_(
    self.fc3.weight,
    nonlinearity="sigmoid",
)

In [None]:
# He / Kaiming initialization
import torch.nn as nn
import torch.nn.init as init

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(9, 16)
        self.fc2 = nn.Linear(16, 8)
        self.fc3 = nn.Linear(8, 1)

        init.kaiming_uniform_(self.fc1.weight)
        init.kaiming_uniform_(self.fc2.weight)
        init.kaiming_uniform_(
            self.fc3.weight,
            nonlinearity="sigmoid",
        )

    def forward(self, x):
        x = nn.functional.relu(self.fc1(x))
        x = nn.functional.relu(self.fc2(x))
        x = nn.functional.sigmoid(self.fc3(x))
        return x

In [None]:
Activation functions

nn.functional.relu()

* Often used as the default activation
* Zero for negative inputs - dying neurons

nn.functional.elu()

* Non-zero gradients for negative values - helps against dying neurons
* Average output around zero - helps against vanishing gradients
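
The difference between the two activations for negative inputs can be checked with a few lines of plain Python. This is a sketch of the standard ReLU and ELU formulas (with ELU's `alpha=1.0`), not the PyTorch functions themselves; the `*_grad` helpers are hypothetical names for the analytic derivatives.

```python
import math

def relu(x):
    # Zero for negative inputs, identity otherwise
    return max(0.0, x)

def elu(x, alpha=1.0):
    # Smoothly approaches -alpha for very negative inputs
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def relu_grad(x):
    # Gradient is exactly zero for negative inputs ("dying neurons")
    return 1.0 if x > 0 else 0.0

def elu_grad(x, alpha=1.0):
    # Gradient stays positive for negative inputs
    return 1.0 if x > 0 else alpha * math.exp(x)

for x in (-2.0, -0.5, 0.5, 2.0):
    print(
        f"x={x:+.1f}  relu={relu(x):.3f} (grad {relu_grad(x):.3f})  "
        f"elu={elu(x):+.3f} (grad {elu_grad(x):.3f})"
    )
```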

In [None]:
Batch normalization

After a layer:

1. Normalize the layer's outputs by:
   * Subtracting the mean
   * Dividing by the standard deviation
2. Scale and shift normalized outputs using learned parameters

Model learns optimal inputs distribution for each layer:

* Faster loss decrease
* Helps against unstable gradients

# Batch normalization
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(9, 16)
        self.bn1 = nn.BatchNorm1d(16)
        ...

    def forward(self, x):
        x = self.fc1(x)
        x = self.bn1(x)
        x = nn.functional.elu(x)
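
The two batch normalization steps can be sketched for a single feature in plain Python. This is a minimal illustration of the arithmetic, not `nn.BatchNorm1d` itself: the function name `batch_norm` is hypothetical, and `gamma`/`beta` stand in for the parameters the layer would learn.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    # 1. Normalize: subtract the batch mean, divide by the batch std
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    normalized = [(v - mean) / math.sqrt(var + eps) for v in values]
    # 2. Scale and shift with the (normally learned) parameters
    return [gamma * n + beta for n in normalized]

batch = [2.0, 4.0, 6.0, 8.0]  # one feature across a batch of 4 samples

out = batch_norm(batch)
print([round(o, 3) for o in out])

scaled = batch_norm(batch, gamma=2.0, beta=1.0)
print([round(s, 3) for s in scaled])
```

With `gamma=1` and `beta=0` the output has mean 0 and variance close to 1; other values let the model choose a different distribution for the next layer's inputs.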


# Chapter 2

In [None]:
## Handling images with PyTorch

Intermediate Deep Learning with PyTorch

Michal Oleszak
Machine Learning Engineer

Clouds dataset

1 https://www.kaggle.com/competitions/cloud-type-classification2/data

INTERMEDIATE DEEP LEARNING WITH PYTORCH

What is an image?

* Image consists of pixels ("picture elements")
* Each pixel contains color information
* Grayscale images: one integer in 0-255 per pixel, e.g. 30
* Color images: three integers, one for each color channel (Red, Green, Blue), e.g. RGB = (52, 171, 235)

Loading images to PyTorch

Desired directory structure:

clouds_train
- cumulus
  - 75cbf18.jpg
  - ...
- cumulonimbus
  - ...
clouds_test
- cumulus
- cumulonimbus
- ...

* Main folders: clouds_train and clouds_test
* Inside each main folder: one folder per category
* Inside each class folder: image files

# Loading images to PyTorch
from torchvision.datasets import ImageFolder
from torchvision import transforms

# Define transformations: parse to tensor, resize to 128×128
train_transforms = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize((128, 128)),
])

# Create dataset passing the path to data and the predefined transformations
dataset_train = ImageFolder(
    "data/clouds_train",
    transform=train_transforms,
)

# Displaying images
dataloader_train = DataLoader(
    dataset_train,
    shuffle=True,
    batch_size=1,
)
image, label = next(iter(dataloader_train))
print(image.shape)

torch.Size([1, 3, 128, 128])

image = image.squeeze().permute(1, 2, 0)
print(image.shape)

torch.Size([128, 128, 3])

import matplotlib.pyplot as plt
plt.imshow(image)
plt.show()

# Data augmentation: generating more data by applying
# random transformations to original images
# - Increases the size and diversity of the training set
# - Improves model robustness
# - Reduces overfitting
train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(45),
    transforms.ToTensor(),
    transforms.Resize((128, 128)),
])
dataset_train = ImageFolder(
    "data/clouds/train",
    transform=train_transforms,
)

## Convolutional Neural Networks

Why not use linear layers?

Linear layers:

* Slow training
* Overfitting
* Don't recognize spatial patterns

A better alternative: convolutional layers!

Convolutional layer

* Slide filter(s) of parameters over the input
* At each position, perform convolution
* Resulting feature map:
  * Preserves spatial patterns from input
  * Uses fewer parameters than linear layer
* One filter = one feature map
* Apply activations to feature maps
* All feature maps combined form the output

nn.Conv2d(3, 32, kernel_size=3)

Convolution

1. Multiply the input patch and the filter element-wise (top-left field: 2 × 1 = 2)
2. Sum the result

Together, the two steps compute the dot product of the input patch and the filter.
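
The two convolution steps can be written out for a single filter in plain Python. This is a minimal sketch with made-up numbers (the `image` and `kernel` values are assumptions, chosen so that the top-left product is 2 × 1 = 2); like `nn.Conv2d`, it slides the filter without flipping it.

```python
def convolve2d(image, kernel):
    # Slide the kernel over the image (no padding, stride 1)
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply patch and kernel element-wise, then sum
            total = 0
            for di in range(k):
                for dj in range(k):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map

image = [
    [2, 0, 1, 3],
    [1, 1, 0, 2],
    [0, 2, 1, 1],
    [3, 1, 2, 0],
]
kernel = [
    [1, 0],
    [0, -1],
]
feature_map = convolve2d(image, kernel)
print(feature_map)
```

A 4×4 input and a 2×2 filter give a 3×3 feature map, one value per filter position.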

Zero-padding

* Add a frame of zeros to convolutional layer's input
* Maintains spatial dimensions of the input and output tensors
* Ensures border pixels are treated equally to others

nn.Conv2d(
    3, 32, kernel_size=3, padding=1
)
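
As an illustration of why `padding=1` keeps the spatial size with a 3×3 kernel, the sketch below pads a tiny image by hand and applies the usual output-size formula. The helper names `zero_pad` and `conv_output_size` are hypothetical, not part of PyTorch.

```python
def zero_pad(image, padding=1):
    # Surround the 2D image with a frame of zeros of the given width
    width = len(image[0]) + 2 * padding
    padded = [[0] * width for _ in range(padding)]
    for row in image:
        padded.append([0] * padding + list(row) + [0] * padding)
    padded.extend([0] * width for _ in range(padding))
    return padded

def conv_output_size(size, kernel_size, padding=0, stride=1):
    # Standard output-size formula for a convolution along one dimension
    return (size + 2 * padding - kernel_size) // stride + 1

image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
for row in zero_pad(image):
    print(row)

print(conv_output_size(128, kernel_size=3))             # without padding
print(conv_output_size(128, kernel_size=3, padding=1))  # with padding=1
```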

Max Pooling

* Slide non-overlapping window over input
* At each position, retain only the maximum value
* Used after convolutional layers to reduce spatial dimensions

nn.MaxPool2d(kernel_size=2)
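
A plain-Python sketch of the same operation on one feature map, assuming (as with the default of `nn.MaxPool2d`) that the stride equals the kernel size so the windows do not overlap. The function name `max_pool2d` and the input values are made up for the example.

```python
def max_pool2d(feature_map, kernel_size=2):
    # Non-overlapping windows: the stride equals the kernel size
    pooled = []
    for i in range(0, len(feature_map) - kernel_size + 1, kernel_size):
        row = []
        for j in range(0, len(feature_map[0]) - kernel_size + 1, kernel_size):
            window = [
                feature_map[i + di][j + dj]
                for di in range(kernel_size)
                for dj in range(kernel_size)
            ]
            # Keep only the maximum value of each window
            row.append(max(window))
        pooled.append(row)
    return pooled

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 1, 6],
]
pooled = max_pool2d(feature_map)
print(pooled)
```

Each spatial dimension is halved: the 4×4 input becomes a 2×2 output.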

# Convolutional Neural Network
# feature_extractor: (convolution, activation, pooling), repeated twice and flattened
# classifier: single linear layer
# forward(): pass input image through feature extractor and classifier
class Net(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ELU(),
            nn.MaxPool2d(kernel_size=2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ELU(),
            nn.MaxPool2d(kernel_size=2),
            nn.Flatten(),
        )
        self.classifier = nn.Linear(64*16*16, num_classes)

    def forward(self, x):
        x = self.feature_extractor(x)
        x = self.classifier(x)
        return x

# Feature extractor output size
self.feature_extractor = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.ELU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.ELU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Flatten(),
)
self.classifier = nn.Linear(64*16*16, num_classes)
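
The `64*16*16` input size of the classifier can be traced layer by layer. The sketch below assumes 64×64 input images, the size produced by `Resize((64, 64))` elsewhere in this chapter; with 128×128 inputs the same arithmetic would give `64*32*32`. The helper names `conv_out` and `pool_out` are hypothetical.

```python
def conv_out(size, kernel_size, padding=0, stride=1):
    # Output size of a convolution along one spatial dimension
    return (size + 2 * padding - kernel_size) // stride + 1

def pool_out(size, kernel_size):
    # Max pooling with stride equal to the kernel size
    return size // kernel_size

size = 64      # assumed input height and width
channels = 3   # RGB input
for out_channels in (32, 64):
    size = conv_out(size, kernel_size=3, padding=1)  # padding=1 keeps the size
    channels = out_channels                          # one feature map per filter
    size = pool_out(size, kernel_size=2)             # pooling halves the size
    print(f"after block: {channels} x {size} x {size}")

flattened = channels * size * size
print(flattened)
```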


## Training image classifiers

Data augmentation revisited


What should not be augmented

* Augmentations can impact the label
* Whether this is confusing depends on the task
* Always choose augmentations with the data and task in mind!

# Augmentations for cloud classification
# Random rotation: expose model to different angles of cloud formations
# Horizontal flip: simulate different viewpoints of the sky
# Auto contrast adjustment: simulate different lighting conditions
train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(45),
    transforms.RandomAutocontrast(),
    transforms.ToTensor(),
    transforms.Resize((128, 128))
])

Cross-Entropy loss

* Binary classification: binary cross-entropy (BCE) loss
* Multi-class classification: cross-entropy loss

criterion = nn.CrossEntropyLoss()
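
For a single sample, multi-class cross-entropy is the negative log of the softmax probability given to the true class. The plain-Python sketch below illustrates that formula; the logits are made-up numbers and the helper names are hypothetical. `nn.CrossEntropyLoss` likewise expects raw, unnormalized model outputs.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    # Negative log-probability assigned to the true class
    return -math.log(softmax(logits)[target])

logits = [2.0, 1.0, 0.1]  # raw model outputs for 3 hypothetical classes

print(cross_entropy(logits, target=0))  # true class scored highest: small loss
print(cross_entropy(logits, target=2))  # true class scored lowest: large loss
```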

# Image classifier training loop
net = Net(num_classes=7)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)

for epoch in range(10):
    for images, labels in dataloader_train:
        optimizer.zero_grad()
        outputs = net(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

## Evaluating image classifiers

# Data augmentation at test time

# Data augmentation for training data:
train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(45),
    transforms.RandomAutocontrast(),
    transforms.ToTensor(),
    transforms.Resize((64, 64)),
])
dataset_train = ImageFolder(
    "clouds_train",
    transform=train_transforms,
)

# Data augmentation for test data:
test_transforms = transforms.Compose([
    #
    # NO DATA AUGMENTATION AT TEST TIME
    #
    transforms.ToTensor(),
    transforms.Resize((64, 64)),
])
dataset_test = ImageFolder(
    "clouds_test",
    transform=test_transforms,
)

Precision & Recall: binary classification

* Precision: fraction of positive predictions that are correct
* Recall: fraction of all positive examples that are correctly predicted

Precision & Recall: multi-class classification

In multi-class classification, precision and recall are computed separately for each class, e.g. for cumulus:

* Precision: fraction of cumulus predictions that were correct
* Recall: fraction of all cumulus examples correctly predicted

Averaging multi-class metrics

With 7 classes, we have 7 precision and 7 recall scores. We can analyze them per class, or aggregate:

* Micro average: global calculation
* Macro average: mean of per-class metrics
* Weighted average: weighted mean of per-class metrics

# Averaging multi-class metrics
from torchmetrics import Recall

recall_per_class = Recall(task="multiclass", num_classes=7, average=None)
recall_micro = Recall(task="multiclass", num_classes=7, average="micro")
recall_macro = Recall(task="multiclass", num_classes=7, average="macro")
recall_weighted = Recall(task="multiclass", num_classes=7, average="weighted")

When to use each:

* Micro: imbalanced datasets
* Macro: care about performance on small classes
* Weighted: consider errors in larger classes as more important
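
The three averaging modes can be worked through by hand on a tiny, made-up example with 3 classes. This is a plain-Python sketch of the definitions, not the torchmetrics implementation; the labels, predictions, and the helper `per_class_recall` are assumptions for illustration.

```python
def per_class_recall(labels, preds, num_classes):
    # Recall of class c = correctly predicted c / all true examples of c
    recalls, supports = [], []
    for c in range(num_classes):
        support = sum(1 for y in labels if y == c)
        true_pos = sum(1 for y, p in zip(labels, preds) if y == c and p == c)
        recalls.append(true_pos / support if support else 0.0)
        supports.append(support)
    return recalls, supports

# Imbalanced toy data: class 0 has 6 examples, classes 1 and 2 have 2 each
labels = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
preds = [0, 0, 0, 0, 0, 1, 1, 0, 2, 1]

recalls, supports = per_class_recall(labels, preds, num_classes=3)

# Micro: pool all samples, then compute one global score
micro = sum(1 for y, p in zip(labels, preds) if y == p) / len(labels)
# Macro: unweighted mean of the per-class scores
macro = sum(recalls) / len(recalls)
# Weighted: mean of per-class scores weighted by class size
weighted = sum(r * s for r, s in zip(recalls, supports)) / sum(supports)

print("per class:", [round(r, 4) for r in recalls])
print("micro:", micro, "macro:", round(macro, 4), "weighted:", weighted)
```

The macro average is pulled down by the two small classes, which is why it suits cases where performance on small classes matters.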

# Evaluation loop
# Import and define precision and recall metrics
from torchmetrics import Precision, Recall

metric_precision = Precision(
    task="multiclass", num_classes=7, average="macro"
)
metric_recall = Recall(
    task="multiclass", num_classes=7, average="macro"
)

# Iterate over test examples with no gradient
net.eval()
with torch.no_grad():
    for images, labels in dataloader_test:
        # For each test batch, get model outputs, take most likely class,
        # and pass it to metric functions along with the labels
        outputs = net(images)
        _, preds = torch.max(outputs, 1)
        metric_precision(preds, labels)
        metric_recall(preds, labels)

# Compute the metrics
precision = metric_precision.compute()
recall = metric_recall.compute()
print(f"Precision: {precision}")
print(f"Recall: {recall}")

Precision: 0.7284010648727417
Recall: 0.763038694858551

# Analyzing performance per class
# Compute metric with average=None; this gives one score per class
metric_recall = Recall(
    task="multiclass", num_classes=7, average=None
)
net.eval()
with torch.no_grad():
    for images, labels in dataloader_test:
        outputs = net(images)
        _, preds = torch.max(outputs, 1)
        metric_recall(preds, labels)
recall = metric_recall.compute()
print(recall)

tensor([0.6364, 1.0000, 0.9091, 0.7917,
        0.5049, 0.9500, 0.5493],
       dtype=torch.float32)

# Dataset's .class_to_idx attribute maps class names to indices
dataset_test.class_to_idx

{'cirriform clouds': 0,
 'clear sky': 1,
 'cumulonimbus clouds': 2,
 'cumulus clouds': 3,
 'high cumuliform clouds': 4,
 'stratiform clouds': 5,
 'stratocumulus clouds': 6}

# Analyzing performance per class
# k = class name, e.g. cirriform clouds
# v = class index, e.g. 0
# recall[v] = tensor(0.6364, dtype=torch.float32)
# recall[v].item() = 0.6364
{
    k: recall[v].item()
    for k, v
    in dataset_test.class_to_idx.items()
}

{'cirriform clouds': 0.6363636255264282,
 'clear sky': 1.0,
 'cumulonimbus clouds': 0.9090909361839294,
 'cumulus clouds': 0.7916666865348816,
 'high cumuliform clouds': 0.5048543810844421,
 'stratiform clouds': 0.949999988079071,
 'stratocumulus clouds': 0.5492957830429077}



# Chapter 3

In [None]:
## Handling sequences with PyTorch

Sequential data

* Ordered in time or space
* Order of the data points contains dependencies between them
* Examples of sequential data:
  * Time series
  * Text
  * Audio waves

Electricity consumption prediction

Task: predict future electricity consumption based on past patterns

Electricity consumption dataset:

|        | timestamp           | consumption |
|--------|---------------------|-------------|
| 0      | 2011-01-01 00:15:00 | -0.704319   |
| 1      | 2011-01-01 00:30:00 | -0.704319   |
| ...    | ...                 | ...         |
| 140254 | 2014-12-31 23:45:00 | -0.095751   |
| 140255 | 2015-01-01 00:00:00 | -0.095751   |

1 Trindade, Artur. (2015). ElectricityLoadDiagrams20112014. UCI Machine Learning Repository. https://doi.org/10.24432/C58C86.

Train-test split

* No random splitting for time series!
* Look-ahead bias: model has info about the future
* Solution: split by time
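
Splitting by time amounts to cutting the ordered data at one point, so that every training row precedes every test row. A minimal sketch with made-up rows follows; the function `split_by_time` and the 75% cutoff are assumptions for illustration.

```python
def split_by_time(rows, train_fraction=0.75):
    # rows must already be sorted by timestamp (oldest first)
    cutoff = int(len(rows) * train_fraction)
    return rows[:cutoff], rows[cutoff:]

# Hypothetical (timestamp, consumption) rows, ordered in time
rows = [(f"2011-01-01 {hour:02d}:00:00", 0.1 * hour) for hour in range(8)]

train, test = split_by_time(rows)
print(len(train), len(test))
print(train[-1][0], "<", test[0][0])
```

Because nothing is shuffled, the model never sees data from after the test period starts, which avoids look-ahead bias.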

Creating sequences

* Sequence length = number of data points in one training example
* 24 × 4 = 96 data points (four 15-minute readings per hour) -> consider last 24 hours
* Predict single next data point

# Creating sequences in Python
import numpy as np

# Take data and sequence length as inputs
def create_sequences(df, seq_length):
    # Initialize inputs and targets lists
    xs, ys = [], []
    # Iterate over data points
    for i in range(len(df) - seq_length):
        # Define inputs and target
        x = df.iloc[i:(i+seq_length), 1]
        y = df.iloc[i+seq_length, 1]
        # Append to pre-initialized lists
        xs.append(x)
        ys.append(y)
    # Return inputs and targets as NumPy arrays
    return np.array(xs), np.array(ys)
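
To see what the sliding window produces, here is the same idea on a short plain list instead of a DataFrame. The function name `create_sequences_from_list` and the numbers are made up for this sketch.

```python
def create_sequences_from_list(values, seq_length):
    xs, ys = [], []
    for i in range(len(values) - seq_length):
        xs.append(values[i:i + seq_length])  # inputs: seq_length points
        ys.append(values[i + seq_length])    # target: the next point
    return xs, ys

values = [10, 20, 30, 40, 50, 60]
xs, ys = create_sequences_from_list(values, seq_length=3)
for x, y in zip(xs, ys):
    print(x, "->", y)
```

Six data points with a sequence length of 3 yield three training examples, each window shifted by one step.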

# TensorDataset
# Create training examples
X_train, y_train = create_sequences(train_data, seq_length)
print(X_train.shape, y_train.shape)

(34944, 96) (34944,)

# Convert them to a Torch Dataset
from torch.utils.data import TensorDataset

dataset_train = TensorDataset(
    torch.from_numpy(X_train).float(),
    torch.from_numpy(y_train).float(),
)

Applicability to other sequential data

Same techniques are applicable to other sequences:

* Large Language Models
* Speech recognition

## Recurrent Neural Networks

Recurrent neuron

* Feed-forward networks: connections only point forward
* RNNs: have connections pointing back
* Recurrent neuron:
  * Input x
  * Output y
  * Hidden state h
* In PyTorch: nn.RNN()
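
A single recurrent neuron can be sketched in a few lines of plain Python: at every time step it combines the current input with its previous hidden state. The weights below are arbitrary assumptions and `rnn_forward` is a hypothetical helper; the `tanh` nonlinearity matches the default of `nn.RNN`.

```python
import math

def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0):
    # One recurrent neuron unrolled over time:
    # h_t = tanh(w_x * x_t + w_h * h_{t-1} + b), and the output y_t equals h_t
    h = 0.0  # initial hidden state
    outputs = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)
        outputs.append(h)
    return outputs, h

# Only the first input is non-zero; later outputs still "remember" it
outputs, last_hidden = rnn_forward([1.0, 0.0, 0.0, 0.0])
print([round(y, 4) for y in outputs])
```

The outputs after the first step are non-zero only because of the connection pointing back, and they fade over time.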

Unrolling recurrent neuron through time

Deep RNNs


Sequence-to-sequence architecture

* Pass sequence as input, use the entire output sequence
* Example: Real-time speech recognition

Sequence-to-vector architecture

* Pass sequence as input, use only the last output
* Example: Text topic classification

Vector-to-sequence architecture

* Pass single input, use the entire output sequence
* Example: Text generation

Encoder-decoder architecture

* Pass entire input sequence, only then start using output sequence
* Example: Machine translation

# RNN in PyTorch
# Define model class with __init__ method
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Define recurrent layer, self.rnn
        self.rnn = nn.RNN(
            input_size=1,
            hidden_size=32,
            num_layers=2,
            batch_first=True,
        )
        # Define linear layer, self.fc
        self.fc = nn.Linear(32, 1)

    def forward(self, x):
        # Initialize first hidden state to zeros
        h0 = torch.zeros(2, x.size(0), 32)
        # Pass input and first hidden state through RNN layer
        out, _ = self.rnn(x, h0)
        # Select last RNN's output and pass it through linear layer
        out = self.fc(out[:, -1, :])
        return out

## LSTM and GRU cells

Short-term memory problem

* RNN cells maintain memory via hidden state
* This memory is very short-term
* Two more powerful cells solve the problem:
  * LSTM (Long Short-Term Memory) cell
  * GRU (Gated Recurrent Unit) cell

RNN cell

* Two inputs:
  * current input data x
  * previous hidden state h
* Two outputs:
  * current output y
  * next hidden state h

LSTM cell

* Three inputs and outputs (two hidden states):
  * h: short-term state
  * c: long-term state
* Three "gates":
  * Forget gate: what to remove from long-term memory
  * Input gate: what to save to long-term memory
  * Output gate: what to return at the current time step
* Outputs h and y are the same

# LSTM in PyTorch
class Net(nn.Module):
    def __init__(self, input_size):
        super().__init__()
        # __init__(): replace nn.RNN with nn.LSTM
        self.lstm = nn.LSTM(
            input_size=input_size,
            hidden_size=32,
            num_layers=2,
            batch_first=True,
        )
        self.fc = nn.Linear(32, 1)

    def forward(self, x):
        # forward(): add another hidden state c,
        # initialize c and h with zeros
        h0 = torch.zeros(2, x.size(0), 32)
        c0 = torch.zeros(2, x.size(0), 32)
        # Pass both hidden states to lstm layer
        out, _ = self.lstm(x, (h0, c0))
        out = self.fc(out[:, -1, :])
        return out

GRU cell

* Simplified version of LSTM cell
* Just one hidden state
* No output gate

# GRU in PyTorch
class Net(nn.Module):
    def __init__(self, input_size):
        super().__init__()
        # __init__(): replace nn.RNN with nn.GRU
        self.gru = nn.GRU(
            input_size=input_size,
            hidden_size=32,
            num_layers=2,
            batch_first=True,
        )
        self.fc = nn.Linear(32, 1)

    def forward(self, x):
        # forward(): use the gru layer
        h0 = torch.zeros(2, x.size(0), 32)
        out, _ = self.gru(x, h0)
        out = self.fc(out[:, -1, :])
        return out

Should I use RNN, LSTM, or GRU?

* RNN is not used much anymore
* GRU is simpler than LSTM, so it needs less computation
* Relative performance varies per use case
* Try both and compare

## Training and evaluating RNNs

Mean Squared Error Loss

* Error: $\text{prediction} - \text{target}$
* Squared Error: $(\text{prediction} - \text{target})^2$
* Mean Squared Error: $\operatorname{avg}\left[(\text{prediction} - \text{target})^2\right]$

Squaring the error:

* Ensures positive and negative errors don't cancel out
* Penalizes large errors more

In PyTorch:

criterion = nn.MSELoss()
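
A quick plain-Python check of the formula, with made-up predictions and targets, shows why the errors are squared before averaging. The function name `mean_squared_error` is hypothetical and only mirrors the definition above.

```python
def mean_squared_error(predictions, targets):
    # Average of the squared differences between predictions and targets
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared_errors) / len(squared_errors)

predictions = [0.5, -0.2, 1.0]
targets = [0.0, 0.2, 1.0]

# Raw errors of opposite sign partly cancel when averaged
errors = [p - t for p, t in zip(predictions, targets)]
mean_error = sum(errors) / len(errors)
mse = mean_squared_error(predictions, targets)

print(f"mean error: {mean_error:.4f}")
print(f"mean squared error: {mse:.4f}")
```

The mean of the raw errors is close to zero even though two of the three predictions are off, while the MSE reflects both mistakes.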

# Expanding tensors
# Recurrent layers expect input shape (batch_size, seq_length, num_features)
# We got (batch_size, seq_length)
for seqs, labels in dataloader_train:
    print(seqs.shape)

torch.Size([32, 96])

# We must add one dimension at the end
seqs = seqs.view(32, 96, 1)
print(seqs.shape)

torch.Size([32, 96, 1])

# Squeezing tensors
# In evaluation loop, we need to revert the reshaping done in the training loop
# Shapes of model outputs and labels must match for the loss function

# Labels are of shape (batch_size)
for seqs, labels in test_loader:
    print(labels.shape)

torch.Size([32])

# Model outputs are (batch_size, 1)
out = net(seqs)
print(out.shape)

torch.Size([32, 1])

# We can drop the last dimension from model outputs
out = net(seqs).squeeze()
print(out.shape)

torch.Size([32])

# Training loop
# Instantiate model, define loss & optimizer
net = Net(input_size=1)
criterion = nn.MSELoss()
optimizer = optim.Adam(
    net.parameters(), lr=0.001
)

# Iterate over epochs and data batches
for epoch in range(num_epochs):
    for seqs, labels in dataloader_train:
        # Reshape input sequence
        seqs = seqs.view(32, 96, 1)
        # Squeeze outputs to shape (batch_size) so they match the labels
        outputs = net(seqs).squeeze()
        # The rest: as usual
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Evaluation loop
import torchmetrics

# Set up MSE metric
mse = torchmetrics.MeanSquaredError()

# Iterate through test data with no gradients
net.eval()
with torch.no_grad():
    for seqs, labels in test_loader:
        # Reshape model inputs
        seqs = seqs.view(32, 96, 1)
        # Squeeze model outputs
        outputs = net(seqs).squeeze()
        # Update the metric
        mse(outputs, labels)

# Compute final metric value
print(f"Test MSE: {mse.compute()}")

Test MSE: 0.13292162120342255

LSTM vs. GRU

* LSTM: Test MSE: 0.13292162120342255
* GRU: Test MSE: 0.12187089771032333

GRU preferred: same or better results with less processing power



# Chapter 4

In [None]:
## Multi-input models

Why multi-input?

* Using more information
* Multi-modal models
* Metric learning
* Self-supervised learning

Omniglot dataset

1 Lake, B. M., Salakhutdinov, R., and Tenenbaum, J. B. (2015). Human-level concept learning through probabilistic program induction. Science, 350(6266), 1332-1338.

Character classification

# Two-input Dataset
from PIL import Image

class OmniglotDataset(Dataset):
    # Assign samples and transforms
    def __init__(self, transform, samples):
        self.transform = transform
        self.samples = samples

    # Implement __len__()
    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        img_path, alphabet, label = self.samples[idx]
        # Load and transform image
        img = Image.open(img_path).convert('L')
        img = self.transform(img)
        # Return both inputs and label
        return img, alphabet, label

print(samples[0])

[(
    'omniglot_train/.../0459_14.png',
    array([1., 0., 0., ..., 0., 0., 0.]),
    0
)]

# Tensor concatenation
x = torch.tensor([
    [1, 2, 3],
])
y = torch.tensor([
    [4, 5, 6],
])

# Concatenation along axis 0
torch.cat((x, y), dim=0)

[[1, 2, 3],
 [4, 5, 6]]

# Concatenation along axis 1
torch.cat((x, y), dim=1)

[[1, 2, 3, 4, 5, 6]]

Two-input architecture
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Define image processing layer
        self.image_layer = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.MaxPool2d(kernel_size=2),
            nn.ELU(),
            nn.Flatten(),
            nn.Linear(16*32*32, 128)
        )
        # Define alphabet processing layer
        self.alphabet_layer = nn.Sequential(
            nn.Linear(30, 8),
            nn.ELU(),
        )
        # Define classifier layer
        self.classifier = nn.Sequential(
            nn.Linear(128 + 8, 964),
        )

    def forward(self, x_image, x_alphabet):
        # Pass image through image layer
        x_image = self.image_layer(x_image)
        # Pass alphabet through alphabet layer
        x_alphabet = self.alphabet_layer(x_alphabet)
        # Concatenate image and alphabet outputs
        x = torch.cat((x_image, x_alphabet), dim=1)
        # Pass the result through classifier
        return self.classifier(x)
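
To see why the classifier takes `128 + 8` input features, it can help to trace tensor shapes through the two branches. The sketch below rebuilds only the two input branches with the same layer sizes and feeds them a dummy batch. The 64x64 image size is an assumption, chosen because it yields the `16*32*32` features that the linear layer expects after one 2x2 pooling step.

```python
import torch
import torch.nn as nn

# Same layer definitions as the two input branches of Net.
image_layer = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),
    nn.MaxPool2d(kernel_size=2),
    nn.ELU(),
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 128),
)
alphabet_layer = nn.Sequential(nn.Linear(30, 8), nn.ELU())

# Dummy batch of 4 samples; the 64x64 image size is an assumption.
images = torch.randn(4, 1, 64, 64)
alphabets = torch.zeros(4, 30)
alphabets[:, 0] = 1.0  # one-hot: every dummy sample belongs to alphabet 0

image_features = image_layer(images)           # shape (4, 128)
alphabet_features = alphabet_layer(alphabets)  # shape (4, 8)
combined = torch.cat((image_features, alphabet_features), dim=1)  # shape (4, 136)
print(image_features.shape, alphabet_features.shape, combined.shape)
```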

Training loop
import torch.optim as optim

net = Net()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.01)

for epoch in range(10):
    # Training data consists of three items: image, alphabet vector, labels
    for img, alpha, labels in dataloader_train:
        optimizer.zero_grad()
        # We pass the model images and alphabets
        outputs = net(img, alpha)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

Multi-output models

Michal Oleszak
Machine Learning Engineer

Why multi-output?
Multi-task learning
Multi-label classification
Regularization

Character and alphabet classification

Two-output Dataset
We can use the same Dataset...

class OmniglotDataset(Dataset):
    def __init__(self, transform, samples):
        self.transform = transform
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        img_path, alphabet, label = self.samples[idx]
        img = Image.open(img_path).convert('L')
        img = self.transform(img)
        return img, alphabet, label

...with updated samples:

print(samples[0])

[(
    'omniglot_train/.../0459_14.png',
    0,
    0,
)]

Two-output architecture
class Net(nn.Module):
    def __init__(self, num_alpha, num_char):
        super().__init__()
        # Define image-processing sub-network
        self.image_layer = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.MaxPool2d(kernel_size=2),
            nn.ELU(),
            nn.Flatten(),
            nn.Linear(16*32*32, 128)
        )
        # Define output-specific classifiers
        # (num_alpha=30 and num_char=964 for this dataset)
        self.classifier_alpha = nn.Linear(128, num_alpha)
        self.classifier_char = nn.Linear(128, num_char)

    def forward(self, x):
        # Pass image through dedicated sub-network
        x_image = self.image_layer(x)
        # Pass the result through each output layer
        output_alpha = self.classifier_alpha(x_image)
        output_char = self.classifier_char(x_image)
        # Return both outputs
        return output_alpha, output_char

Training loop
for epoch in range(10):
    for images, labels_alpha, labels_char in dataloader_train:
        optimizer.zero_grad()
        # Model produces two outputs
        outputs_alpha, outputs_char = net(images)
        # Calculate loss for each output
        loss_alpha = criterion(outputs_alpha, labels_alpha)
        loss_char = criterion(outputs_char, labels_char)
        # Combine the losses to one total loss
        loss = loss_alpha + loss_char
        # Backprop and optimize with the total loss
        loss.backward()
        optimizer.step()
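
Because the two losses are summed before `backward()`, one backward pass sends gradients into both classifier heads and into the shared sub-network. The sketch below checks this on synthetic data. `TinyTwoHeadNet`, its layer sizes, and the random batch are hypothetical stand-ins chosen to keep the example small; they are not the course's model or data.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)


class TinyTwoHeadNet(nn.Module):
    """Hypothetical stand-in for Net: shared trunk, two classifier heads."""

    def __init__(self, num_alpha, num_char):
        super().__init__()
        self.trunk = nn.Sequential(nn.Flatten(), nn.Linear(8 * 8, 16), nn.ELU())
        self.classifier_alpha = nn.Linear(16, num_alpha)
        self.classifier_char = nn.Linear(16, num_char)

    def forward(self, x):
        features = self.trunk(x)
        return self.classifier_alpha(features), self.classifier_char(features)


net = TinyTwoHeadNet(num_alpha=3, num_char=5)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.01)

# Synthetic batch: 4 single-channel 8x8 "images" with random labels.
images = torch.randn(4, 1, 8, 8)
labels_alpha = torch.randint(0, 3, (4,))
labels_char = torch.randint(0, 5, (4,))

optimizer.zero_grad()
outputs_alpha, outputs_char = net(images)
loss = criterion(outputs_alpha, labels_alpha) + criterion(outputs_char, labels_char)
loss.backward()

# Every parameter, in the trunk and in both heads, now holds a gradient.
has_grad = {name: p.grad is not None for name, p in net.named_parameters()}
print(has_grad)
optimizer.step()
```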

Evaluation of multi-output models and loss weighting

Model evaluation
from torchmetrics import Accuracy

# Set up metric for each output
acc_alpha = Accuracy(
    task="multiclass", num_classes=30
)
acc_char = Accuracy(
    task="multiclass", num_classes=964
)

net.eval()
with torch.no_grad():
    # Iterate over test loader and get outputs
    for images, labels_alpha, labels_char in dataloader_test:
        out_alpha, out_char = net(images)
        # Calculate prediction for each output
        _, pred_alpha = torch.max(out_alpha, 1)
        _, pred_char = torch.max(out_char, 1)
        # Update accuracy metrics
        acc_alpha(pred_alpha, labels_alpha)
        acc_char(pred_char, labels_char)

# Calculate final accuracy scores
print(f"Alphabet: {acc_alpha.compute()}")
print(f"Character: {acc_char.compute()}")

Alphabet: 0.3166305720806122
Character: 0.24064336717128754
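
To make the prediction step concrete, the sketch below computes the plain fraction of correct predictions by hand for a single output, which is the quantity an accuracy metric tracks in its simplest form. The logits and labels are hypothetical values chosen for illustration.

```python
import torch

# Hypothetical logits for 4 samples and 3 classes.
logits = torch.tensor([
    [2.0, 0.1, 0.3],
    [0.2, 1.5, 0.1],
    [0.9, 0.8, 0.7],
    [0.1, 0.2, 3.0],
])
labels = torch.tensor([0, 1, 2, 2])

# torch.max along dim 1 returns (values, indices); the indices are the predictions.
_, preds = torch.max(logits, 1)

# Fraction of samples whose predicted class equals the label.
accuracy = (preds == labels).float().mean().item()
print(preds.tolist(), accuracy)
```

Here the third sample is misclassified (class 0 has the largest logit, while the label is 2), so three of the four predictions are correct.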

Multi-output training loop revisited
for epoch in range(10):
    for images, labels_alpha, labels_char in dataloader_train:
        optimizer.zero_grad()
        outputs_alpha, outputs_char = net(images)
        loss_alpha = criterion(outputs_alpha, labels_alpha)
        loss_char = criterion(outputs_char, labels_char)
        loss = loss_alpha + loss_char
        loss.backward()
        optimizer.step()

Two losses: for alphabets and characters
Final loss defined as sum of alphabet and character losses:
loss = loss_alpha + loss_char
Both classification tasks deemed equally important

Varying task importance
Suppose character classification is twice as important as alphabet classification
Approach 1: Scale more important loss by a factor of 2
loss = loss_alpha + loss_char * 2
Approach 2: Assign weights that sum to 1
loss = 0.33 * loss_alpha + 0.67 * loss_char
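
Both approaches give the character loss twice the weight of the alphabet loss; they differ only in overall scale. The sketch below uses hypothetical loss values and the exact weights 1/3 and 2/3 (which the slide rounds to 0.33 and 0.67) to show that the first total is three times the second. For plain SGD, such a constant factor scales the gradients and therefore acts like a change in learning rate.

```python
# Hypothetical per-batch loss values for illustration.
loss_alpha = 1.2
loss_char = 3.0

# Approach 1: scale the more important loss by a factor of 2.
loss_scaled = loss_alpha + 2 * loss_char

# Approach 2: weights that sum to 1 (exact versions of 0.33 and 0.67).
loss_normalized = (1 / 3) * loss_alpha + (2 / 3) * loss_char

ratio = loss_scaled / loss_normalized
print(loss_scaled, loss_normalized, ratio)
```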

Warning: losses on different scales
Losses must be on the same scale before they are weighted and added
Example tasks:
Predict house price -> MSE loss
Predict quality: low, medium, high -> CrossEntropy loss
Cross-entropy loss is typically in the single digits
MSE loss can reach tens of thousands
Added as-is, the MSE term would dominate and the model would ignore the quality assessment task
Solution: Normalize both losses before weighting and adding
loss_price = loss_price / torch.max(loss_price)
loss_quality = loss_quality / torch.max(loss_quality)
loss = 0.7 * loss_price + 0.3 * loss_quality
Note: this normalization is only meaningful for per-sample loss tensors; for a scalar loss, torch.max returns the loss itself and the ratio is 1
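
One way to read the normalization snippet, assumed here rather than confirmed by the slides, is that each loss is kept per sample with `reduction="none"`, divided by its largest value in the batch, and only then weighted, added, and averaged. The data below are hypothetical, and detaching the maximum so that it acts as a constant scale is a design choice of this sketch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical batch of 4 houses: predicted and true prices, 3-class quality logits.
price_pred = torch.tensor([310_000.0, 150_000.0, 480_000.0, 220_000.0], requires_grad=True)
price_true = torch.tensor([300_000.0, 180_000.0, 450_000.0, 200_000.0])
quality_logits = torch.randn(4, 3, requires_grad=True)
quality_true = torch.tensor([0, 2, 1, 1])

# reduction="none" keeps one loss value per sample instead of a scalar.
loss_price = nn.MSELoss(reduction="none")(price_pred, price_true)
loss_quality = nn.CrossEntropyLoss(reduction="none")(quality_logits, quality_true)
raw_price_max = loss_price.max().item()  # on the order of 1e8 for these prices

# Divide by the batch maximum so both per-sample losses lie in [0, 1].
# detach() treats the maximum as a constant scale (an assumption of this sketch).
loss_price = loss_price / loss_price.max().detach()
loss_quality = loss_quality / loss_quality.max().detach()

# Weight, add, and reduce to a scalar before calling backward().
loss = (0.7 * loss_price + 0.3 * loss_quality).mean()
loss.backward()
print(raw_price_max, loss.item())
```

After normalization, the largest per-sample value of each loss is 1, so the 0.7 and 0.3 weights express the intended relative importance instead of being swamped by the raw MSE magnitude.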

Wrap-up

What you learned
1. Training robust neural networks
   PyTorch and OOP
   Optimizers
   Vanishing and exploding gradients

2. Images and convolutional neural networks
   Handling images with PyTorch
   Training and evaluating convolutional networks
   Data augmentation

3. Sequences and recurrent neural networks
   Handling sequences with PyTorch
   Training and evaluating recurrent networks (LSTM and GRU)

4. Multi-input and multi-output architectures
   Multi-input models
   Multi-output models
   Loss weighting

What's next?
What you might consider learning next:
Transformers
Self-supervised learning

Courses:
Deep Learning for Text with PyTorch
Deep Learning for Images with PyTorch
Efficient AI Model Training with PyTorch

Congratulations and good luck!

