# 02 - Define, train and evaluate a basic neural network in PyTorch

These tutorials are inspired by the book "[Deep Learning with PyTorch](https://pytorch.org/assets/deep-learning/Deep-Learning-with-PyTorch.pdf)" by Stevens et al. and can be seen as a summary of Part I of the book, which covers PyTorch itself. Following the tutorials should normally be enough; reading the book is not required.

## Contents

1. Loading data  
    1. Loading CIFAR-10  (see previous tutorial)  
    2. From CIFAR-10 to CIFAR-2  
2. Basic building blocks for neural networks in PyTorch  
    1. The 'torch.nn' module and the 'torch.nn.Module' class  
    2. Our network as a nn.Sequential object  
    3. PyTorch notations and dimensions  
    4. Inspecting a module object
3. Training our model  
    1. Training on CPU  
    2. Training on GPU 

In [15]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from datetime import datetime
from torch.utils.data import random_split

torch.manual_seed(123)

<torch._C.Generator at 0x7f015bfff730>

## 1. Loading data

### 1.1 Loading CIFAR-10  (see previous tutorial)

In [16]:
def load_cifar(train_val_split=0.9, data_path='../data/', preprocessor=None):
    
    # Define preprocessor if not already given
    if preprocessor is None:
        preprocessor = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.4915, 0.4823, 0.4468),
                                (0.2470, 0.2435, 0.2616))
        ])
    
    # load datasets
    data_train_val = datasets.CIFAR10(
        data_path,       
        train=True,      
        download=True,  
        transform=preprocessor)

    data_test = datasets.CIFAR10(
        data_path, 
        train=False,
        download=True,
        transform=preprocessor)

    # train/validation split
    n_train = int(len(data_train_val)*train_val_split)
    n_val =  len(data_train_val) - n_train

    data_train, data_val = random_split(
        data_train_val, 
        [n_train, n_val],
        generator=torch.Generator().manual_seed(123)
    )

    print("Size of the train dataset:        ", len(data_train))
    print("Size of the validation dataset:   ", len(data_val))
    print("Size of the test dataset:         ", len(data_test))
    
    return (data_train, data_val, data_test)

cifar10_train, cifar10_val, cifar10_test = load_cifar()

Files already downloaded and verified
Files already downloaded and verified
Size of the train dataset:         45000
Size of the validation dataset:    5000
Size of the test dataset:          10000


### 1.2 From CIFAR-10 to CIFAR-2

We define a lighter version of CIFAR-10, called CIFAR-2, which contains only the airplanes and birds (labels 0 and 2 in CIFAR-10, remapped to labels 0 and 1).

In [17]:
label_map = {0: 0, 2: 1}
class_names = ['airplane', 'bird']

# For each dataset, keep only airplanes and birds
cifar2_train = [(img, label_map[label]) for img, label in cifar10_train if label in [0, 2]]
cifar2_val = [(img, label_map[label]) for img, label in cifar10_val if label in [0, 2]]
cifar2_test = [(img, label_map[label]) for img, label in cifar10_test if label in [0, 2]]

print('Size of the training dataset: ', len(cifar2_train))
print('Size of the validation dataset: ', len(cifar2_val))
print('Size of the test dataset: ', len(cifar2_test))

Size of the training dataset:  9017
Size of the validation dataset:  983
Size of the test dataset:  2000
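The filtering and relabeling above can be checked on a handful of made-up `(image, label)` pairs. In this minimal sketch, strings are a hypothetical stand-in for the image tensors (no CIFAR data is loaded): CIFAR-10 labels 0 (airplane) and 2 (bird) are kept and remapped to 0 and 1, and every other label is dropped.

```python
from collections import Counter

label_map = {0: 0, 2: 1}  # CIFAR-10 label -> CIFAR-2 label
class_names = ['airplane', 'bird']

# Hypothetical toy samples: strings stand in for image tensors
toy_cifar10 = [("img_a", 0), ("img_b", 1), ("img_c", 2), ("img_d", 9), ("img_e", 2)]

# Same comprehension as above, using the keys of label_map as the filter
toy_cifar2 = [(img, label_map[label]) for img, label in toy_cifar10 if label in label_map]
print(toy_cifar2)  # [('img_a', 0), ('img_c', 1), ('img_e', 1)]

# Count how many samples of each remaining class were kept
counts = Counter(class_names[label] for _, label in toy_cifar2)
print(counts)  # Counter({'bird': 2, 'airplane': 1})
```

A count like this is a quick way to check that the two classes stay roughly balanced after filtering.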


## 2. Basic building blocks for neural networks in PyTorch 

### 2.1 The 'torch.nn' module and the 'torch.nn.Module' class

In PyTorch, the basic building blocks for neural networks are available in the [torch.nn](https://pytorch.org/docs/stable/nn.html) module (often imported as 'nn'). The base class for all of these components is [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html).

For example:

- the [nn.ReLU](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html#torch.nn.ReLU) activation function is a subclass of the [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) class
- the 1D convolutional layer [nn.Conv1d](https://pytorch.org/docs/stable/generated/torch.nn.Conv1d.html#torch.nn.Conv1d) is a subclass of the [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) class
- the MSE loss function [nn.MSELoss](https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss) is a subclass of the [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) class
- the distance function [nn.PairwiseDistance](https://pytorch.org/docs/stable/generated/torch.nn.PairwiseDistance.html#torch.nn.PairwiseDistance) is a subclass of the [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) class
- the container [nn.Sequential](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html#torch.nn.Sequential) (we will see what it is exactly in Section 2.2) is also a subclass of the [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) class

There is one exception:

- [nn.Parameter](https://pytorch.org/docs/stable/generated/torch.nn.parameter.Parameter.html) is not a subclass of [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) but of [torch.Tensor](https://pytorch.org/docs/stable/tensors.html#torch.Tensor) instead (the other extremely important class in PyTorch)

So, in short, almost everything in torch.nn is an nn.Module :)

In [18]:
print("--- Things implemented in torch.nn inherit from the nn.Module class ---")
print("ReLU activation function:          ", issubclass(nn.ReLU, nn.Module))
print("Conv1d layer:                      ", issubclass(nn.Conv1d, nn.Module))
print("MSELoss loss function:             ", issubclass(nn.MSELoss, nn.Module))
print("PairwiseDistance distance measure: ", issubclass(nn.PairwiseDistance, nn.Module))
print("Sequential (group of layers):      ", issubclass(nn.Sequential, nn.Module))
print("\n--- nn.Parameter is not a subclass of nn.Module but of torch.Tensor instead ---")
print("nn.Parameter, subclass of nn.Module?    ", issubclass(nn.Parameter, nn.Module))
print("nn.Parameter, subclass of torch.Tensor? ", issubclass(nn.Parameter, torch.Tensor))

--- Things implemented in torch.nn inherit from the nn.Module class ---
ReLU activation function:           True
Conv1d layer:                       True
MSELoss loss function:              True
PairwiseDistance distance measure:  True
Sequential (group of layers):       True

--- nn.Parameter is not a subclass of nn.Module but of torch.Tensor instead ---
nn.Parameter, subclass of nn.Module?     False
nn.Parameter, subclass of torch.Tensor?  True
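Why does it matter that nn.Parameter is a tensor? Because an nn.Module automatically registers any nn.Parameter assigned as one of its attributes, while a plain tensor attribute is ignored. The minimal sketch below illustrates this with a hypothetical module `Scale` (not part of PyTorch); it also overrides `forward`, which is how custom modules define their computation.

```python
import torch
import torch.nn as nn


class Scale(nn.Module):
    """Hypothetical module computing y = x * weight + offset."""

    def __init__(self, n_features):
        super().__init__()
        # An nn.Parameter attribute is registered automatically as a parameter
        self.weight = nn.Parameter(torch.ones(n_features))
        # A plain tensor attribute is NOT registered as a parameter
        self.offset = torch.zeros(n_features)

    def forward(self, x):
        return x * self.weight + self.offset


m = Scale(3)
param_names = [name for name, _ in m.named_parameters()]
print(param_names)                         # ['weight']
print(isinstance(m.weight, torch.Tensor))  # True
out = m(torch.ones(2, 3))
print(out.shape)                           # torch.Size([2, 3])
```

Only `weight` would be returned by `m.parameters()`, and therefore only `weight` would be updated by an optimizer built from it.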




### 2.2 Our network as a nn.Sequential object

*(inspired by 6.3. Finally a neural network)*

So what is this *[nn.Sequential](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html#torch.nn.Sequential) container*? [nn.Sequential](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html#torch.nn.Sequential) provides a simple way to chain [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) objects: the input is passed to the first module, and the output of each module is fed as input to the next one, in the order in which they were given.
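The chaining described above can be checked with a minimal sketch: calling an nn.Sequential on an input gives the same result as calling its modules one after the other. The layer sizes below are arbitrary and chosen only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer1 = nn.Linear(4, 3)
act = nn.ReLU()
layer2 = nn.Linear(3, 2)

# nn.Sequential calls its modules one after the other, in order
seq = nn.Sequential(layer1, act, layer2)

x = torch.randn(5, 4)  # batch of 5 inputs with 4 features each
manual = layer2(act(layer1(x)))
print(torch.allclose(seq(x), manual))  # True
print(seq[0] is layer1)                # True: submodules can be accessed by index
```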

Links to the documentation:
- [nn.Flatten](https://pytorch.org/docs/stable/generated/torch.nn.Flatten.html#torch.nn.Flatten), flattens each input tensor
- [nn.Linear](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear), fully connected linear layer
- [nn.Tanh](https://pytorch.org/docs/stable/generated/torch.nn.Tanh.html?highlight=tanh#torch.nn.Tanh), activation function
- [nn.ReLU](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html?highlight=relu#torch.nn.ReLU), activation function

In [19]:
n_in = 32*32*3   # Determined by our dataset: 32x32 RGB images
n_hidden1 = 256  # Choose whatever you want here, often powers of 2
n_hidden2 = 64
n_out = 2        # Determined by our number of classes, so 2: birds and planes

model_seq = nn.Sequential(
    # Flatten is required here because each input is a (3, 32, 32) tensor
    # while linear layers expect flat (1D) feature vectors
    nn.Flatten(),              
    nn.Linear(n_in, n_hidden1),
    nn.Tanh(),                   
    nn.Linear(n_hidden1, n_hidden2),
    nn.ReLU(),
    nn.Linear(n_hidden2, n_out),
    # Note that we don't need a softmax function in the output layer if we
    # use nn.CrossEntropyLoss as the loss function
    )
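The comment about softmax can be checked numerically: nn.CrossEntropyLoss applies log-softmax to the raw outputs (logits) itself, so it gives the same value as nn.NLLLoss applied to `torch.log_softmax` of the logits. The logits and targets below are made-up values. If class probabilities are needed (e.g. for display), `torch.softmax` can be applied explicitly to the network outputs.

```python
import torch
import torch.nn as nn

# Made-up logits for a batch of 3 inputs and 2 classes, and their target classes
logits = torch.tensor([[2.0, -1.0], [0.5, 0.3], [-0.2, 1.2]])
targets = torch.tensor([0, 1, 1])

ce = nn.CrossEntropyLoss()(logits, targets)
manual = nn.NLLLoss()(torch.log_softmax(logits, dim=1), targets)
print(torch.allclose(ce, manual))  # True

# Class probabilities, if needed: each row sums to 1
probs = torch.softmax(logits, dim=1)
print(probs.sum(dim=1))
```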

### 2.3 PyTorch notations and dimensions

PyTorch modules (neural networks, layers, loss functions, etc.) expect inputs of specific shapes. The required shapes are usually given in the documentation, using PyTorch's own notation. For example, the documentation of the [nn.Conv2d](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html#torch.nn.Conv2d) layer states that the input shape should be ``(N, C_in, H_in, W_in)`` and the output shape ``(N, C_out, H_out, W_out)``.  
Here are the most common notations and their meanings:

- ``N``: batch size (how many inputs are fed at the same time)
- ``C``: number of channels (the number of color channels for an image, e.g. RGB = 3, RGBA = 4, or, for the output of a convolutional layer, its number of filters)
- ``H``: height of the image
- ``W``: width of the image
- ``∗``: any number of additional dimensions, including none
- $_{in}$ / $_{out}$: as a subscript, refers to "input" and "output", e.g. $H_{in}$ for the input height and $H_{out}$ for the output height
- ``L``: sequence length (for recurrent neural networks)
- ``in_features``: number of components of the input tensor (when the input is a vector)
- ``out_features``: number of components of the output tensor (when the output is a vector)

When the inputs are images, the expected input shape in PyTorch is usually ``(N, C_in, H_in, W_in)``, because the first layer is typically either:
- a [nn.Conv2d](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html#torch.nn.Conv2d) layer (which expects exactly ``(N, C_in, H_in, W_in)`` inputs)
- or a [nn.Linear](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear) layer (which expects ``(N, in_features)`` inputs) preceded by a [nn.Flatten](https://pytorch.org/docs/stable/generated/torch.nn.Flatten.html#torch.nn.Flatten) layer (which reshapes ``(N, C_in, H_in, W_in)`` inputs into ``(N, C_in*H_in*W_in)`` inputs)
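These shape conventions can be checked on a random batch. In this sketch the batch size (8) and the Conv2d settings (16 filters, `kernel_size=3`, `padding=1`) are arbitrary choices for illustration; with stride 1, a 3x3 kernel and a padding of 1, the spatial size is preserved.

```python
import torch
import torch.nn as nn

x = torch.randn(8, 3, 32, 32)  # N=8, C_in=3, H_in=32, W_in=32

# nn.Flatten keeps the batch dimension and merges the others: (N, C_in*H_in*W_in)
flat = nn.Flatten()(x)
print(flat.shape)  # torch.Size([8, 3072])

# nn.Linear expects (N, in_features) inputs
lin = nn.Linear(in_features=3 * 32 * 32, out_features=256)
print(lin(flat).shape)  # torch.Size([8, 256])

# nn.Conv2d maps (N, C_in, H_in, W_in) to (N, C_out, H_out, W_out)
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
print(conv(x).shape)  # torch.Size([8, 16, 32, 32])
```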


#### Feeding one image into our custom neural network

As explained in the previous paragraph, we should first make sure that our input has shape ``(N, C_in, H_in, W_in)``, or more specifically ``(1, 3, 32, 32)``, because we want a batch composed of only one (i.e. ``N = 1``) RGB image (i.e. ``C_in = 3``) of dimensions ``32x32`` (i.e. ``H_in = 32``, ``W_in = 32``).

In [20]:
# Shape of an image
print("Shape of an image:                       ", cifar2_train[0][0].shape)
# Add an extra dimension for the batch dimension
batch_t = torch.unsqueeze(cifar2_train[0][0], 0)
print("Shape of our input batch of one image:   ", batch_t.shape)
# Feed our batch into our network and get the output
out = model_seq(batch_t)
print("Shape of our output batch of one image:  ", out.shape)
print("Output tensor (values are just rubbish because the nn is not trained yet!):\n Output: ", out)


Shape of an image:                        torch.Size([3, 32, 32])
Shape of our input batch of one image:    torch.Size([1, 3, 32, 32])
Shape of our output batch of one image:   torch.Size([1, 2])
Output tensor (values are just rubbish because the nn is not trained yet!):
 Output:  tensor([[-0.1395, -0.3130]], grad_fn=<AddmmBackward0>)
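Besides `torch.unsqueeze`, there are other common ways to build a batch. This sketch uses random tensors in place of CIFAR images: it shows equivalent ways of adding a batch dimension of size 1, and `torch.stack` to combine several images of the same shape into one batch (the choice of 4 images is arbitrary).

```python
import torch

img = torch.randn(3, 32, 32)  # one (C, H, W) image, like cifar2_train[0][0]

# Equivalent ways to add a leading batch dimension of size 1
print(torch.unsqueeze(img, 0).shape)  # torch.Size([1, 3, 32, 32])
print(img.unsqueeze(0).shape)         # torch.Size([1, 3, 32, 32])
print(img[None].shape)                # torch.Size([1, 3, 32, 32])

# Several images of the same shape can be stacked into one batch (N=4 here)
imgs = [torch.randn(3, 32, 32) for _ in range(4)]
batch = torch.stack(imgs)
print(batch.shape)  # torch.Size([4, 3, 32, 32])
```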


### 2.4 Inspecting a module object

We saw earlier that [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) is an essential part of the PyTorch library and that it is the base class for all the basic components of a neural network.
The fact that so many PyTorch objects inherit from this class has many advantages. One of them is that they share many important methods such as:

- [forward](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.forward): Defines the computation performed at every call. **Should be overridden by all subclasses** (we will see this later)
- [modules](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.modules): Returns an iterator over all modules in the network.
- [named\_modules](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.named_modules): Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
- [parameters](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.parameters): Returns an iterator over module parameters (the so-called [nn.Parameter](https://pytorch.org/docs/stable/generated/torch.nn.parameter.Parameter.html), the subclass of Tensor)
- [named\_parameters](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.named_parameters): Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
- [state_dict](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.state_dict): Returns a dictionary containing a whole state of the module (can be useful when saving a module)
- [load\_state\_dict](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.load_state_dict): Copies parameters and buffers from state_dict into this module and its descendants (can be useful when loading/copying a module)
- [to](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.to): Moves and/or casts the parameters and buffers (typically to a GPU or CPU)
- [cpu](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.cpu) / [cuda](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.cuda): Moves all model parameters and buffers to the CPU / GPU
- [train](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.train) / [eval](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.eval): Sets the module in training/evaluation mode
- [zero_grad](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.zero_grad): Sets gradients of all model parameters to zero.
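A few of these methods can be tried on a tiny model. In this minimal sketch the one-layer models are arbitrary examples: `state_dict` maps parameter names to tensors, `load_state_dict` copies them into another module with the same structure, and `train` / `eval` toggle the `training` flag of the module and all its submodules.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 2))
other = nn.Sequential(nn.Linear(4, 2))  # same architecture, different random weights

# state_dict maps parameter (and buffer) names to tensors
state = model.state_dict()
print(list(state.keys()))  # ['0.weight', '0.bias']

# load_state_dict copies those values into a module with the same structure
other.load_state_dict(state)
print(torch.equal(other[0].weight, model[0].weight))  # True

# eval() / train() set the `training` flag recursively
model.eval()
print(model.training, model[0].training)  # False False
model.train()
print(model.training)  # True
```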

We will already use most of these methods in this tutorial. Let's start with the ones returning parameters and modules. We also use [torch.numel](https://pytorch.org/docs/stable/generated/torch.numel.html#torch.numel) to get the total number of elements in a given tensor.

In [21]:
print("Inspecting parameters")
# Iterate over all the named parameters of our network
for p in model_seq.named_parameters():
    # p is a tuple: 
    # - p[0] is the name of the parameter
    # - p[1] is a tensor containing the current parameter values
    print("name: ", p[0], "   length: ", p[1].numel())

print("\nTotal number of trainable parameters: ", sum(p.numel() for p in model_seq.parameters() if p.requires_grad))

print("\nInspecting modules")
# Iterate over all the named modules of our network
for m in model_seq.named_modules():
    print(m)

Inspecting parameters
name:  1.weight    length:  786432
name:  1.bias    length:  256
name:  3.weight    length:  16384
name:  3.bias    length:  64
name:  5.weight    length:  128
name:  5.bias    length:  2

Total number of trainable parameters:  803266

Inspecting modules
('', Sequential(
  (0): Flatten(start_dim=1, end_dim=-1)
  (1): Linear(in_features=3072, out_features=256, bias=True)
  (2): Tanh()
  (3): Linear(in_features=256, out_features=64, bias=True)
  (4): ReLU()
  (5): Linear(in_features=64, out_features=2, bias=True)
))
('0', Flatten(start_dim=1, end_dim=-1))
('1', Linear(in_features=3072, out_features=256, bias=True))
('2', Tanh())
('3', Linear(in_features=256, out_features=64, bias=True))
('4', ReLU())
('5', Linear(in_features=64, out_features=2, bias=True))
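The parameter counts above follow from a simple rule: an nn.Linear layer with bias holds `in_features * out_features` weights plus `out_features` biases, while Flatten, Tanh and ReLU have no parameters. The helper `linear_n_params` below is hypothetical (written for this check, not a PyTorch function); the sketch compares its total with the count PyTorch reports for the same architecture.

```python
import torch.nn as nn


def linear_n_params(in_features, out_features, bias=True):
    """Hypothetical helper: number of trainable values in an nn.Linear layer."""
    return in_features * out_features + (out_features if bias else 0)


# (in_features, out_features) of the linear layers at indices 1, 3 and 5
sizes = [(3072, 256), (256, 64), (64, 2)]
counts = [linear_n_params(i, o) for i, o in sizes]
print(counts)       # [786688, 16448, 130]
print(sum(counts))  # 803266

# Same architecture as model_seq, counted by PyTorch
model = nn.Sequential(nn.Flatten(), nn.Linear(3072, 256), nn.Tanh(),
                      nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 2))
n_torch = sum(p.numel() for p in model.parameters())
print(n_torch)  # 803266
```

Almost all of these parameters sit in the first layer, which is typical of fully connected networks applied to raw images.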


## 3. Training our model

### 3.1 Training on CPU

#### Defining the training loop 

*(inspired by 8.4 Training our convnet)*



Links to the documentation:
- [nn.Module.train()](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.train): Sets the module in training mode. As stated in the PyTorch documentation:

    > "Some models use modules which have different training and evaluation behavior, such as batch normalization. To switch between these modes, use [model.train()](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.train) or [model.eval()](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.eval) (from the [torch.nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#module)) as appropriate."

- [nn.Module.zero_grad](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.zero_grad): Sets gradients of all model parameters to zero.
- [torch.Tensor.backward()](https://pytorch.org/docs/stable/generated/torch.Tensor.backward.html#torch.Tensor.backward): backpropagates the loss, i.e. computes the gradients of the loss with respect to the parameters
- [torch.optim.SGD.step()](https://pytorch.org/docs/stable/generated/torch.optim.SGD.html?highlight=step#torch.optim.SGD.step): updates the trainable parameters using these gradients
- [torch.Tensor.item()](https://pytorch.org/docs/stable/generated/torch.Tensor.item.html?highlight=item#torch.Tensor.item): returns the tensor's value as a standard Python number
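The order `backward` / `step` / `zero_grad` in the loop below matters because PyTorch *accumulates* gradients: each call to `backward` adds to the `.grad` already stored in the parameters. This minimal sketch uses a single made-up parameter `w` and the toy loss $w_1^2 + w_2^2$ (gradient $2w$) to show the accumulation, the reset done by `zero_grad`, and one SGD update $w \leftarrow w - \text{lr} \cdot \nabla w$.

```python
import torch
import torch.nn as nn
import torch.optim as optim

w = nn.Parameter(torch.tensor([1.0, 2.0]))
optimizer = optim.SGD([w], lr=0.1)

# loss = w1^2 + w2^2, so d(loss)/dw = 2 * w = [2., 4.]
loss = (w ** 2).sum()
loss.backward()
print(w.grad)  # tensor([2., 4.])

# Calling backward again WITHOUT zeroing accumulates: [2., 4.] + [2., 4.]
(w ** 2).sum().backward()
print(w.grad)  # tensor([4., 8.])

# Resetting the gradients before the next backward pass gives the true gradient
optimizer.zero_grad()
(w ** 2).sum().backward()
print(w.grad)  # tensor([2., 4.])

# SGD step: w <- w - lr * grad = [1 - 0.2, 2 - 0.4]
optimizer.step()
print(w.detach())  # approximately [0.8, 1.6]
```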

In [22]:
def train(n_epochs, optimizer, model, loss_fn, train_loader):
    
    n_batch = len(train_loader)
    
    # Store here the average training loss of each epoch
    losses_train = []
    
    # Set the network in training mode
    model.train()
    
    # Re-initialize gradients, just in case the model has been inappropriately 
    # manipulated before the training
    optimizer.zero_grad(set_to_none=True)
    
    for epoch in range(1, n_epochs + 1): 
        
        # Training loss for the current epoch
        loss_train = 0

        # Loop over our dataset (in batches the data loader creates for us)
        for imgs, labels in train_loader:
            
            # Feed a batch into our model
            outputs = model(imgs)
            
            # Compute the loss we wish to minimize 
            # Note that by default, it is the mean loss that is computed
            # (so entire_batch_loss / batch_size)
            loss = loss_fn(outputs, labels) 
            
            
            # Perform the backward step. That is, compute the gradients of all parameters we want the network to learn
            loss.backward()
            
            # Update the model
            optimizer.step() 
            
            # Zero out gradients before the next round (or the end of training)
            optimizer.zero_grad() 

            # Update loss for this epoch
            # It is important to transform the loss to a number with .item()
            loss_train += loss.item()
            
        # Store the current epoch loss, averaged over the batches
        losses_train.append(loss_train / n_batch)

        if epoch == 1 or epoch % 10 == 0:
            print('{}  |  Epoch {}  |  Training loss {:.3f}'.format(
                datetime.now().time(), epoch, loss_train / n_batch))
            
    return losses_train

#### Train the model using the training loop

Links to the documentation:
- [torch.utils.data.DataLoader](https://pytorch.org/docs/stable/data.html?highlight=dataloader#torch.utils.data.DataLoader), efficiently loads the dataset into batches
- [optim.SGD](https://pytorch.org/docs/stable/generated/torch.optim.SGD.html?highlight=sgd#torch.optim.SGD), optimizer
- [nn.CrossEntropyLoss](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html?highlight=crossentropy#torch.nn.CrossEntropyLoss), loss function
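To see what `len(train_loader)` (used as `n_batch` in `train`) corresponds to, this sketch builds a DataLoader over a hypothetical stand-in dataset of the same size as `cifar2_train` (9017 samples of random features and labels, not the real images). With `batch_size=64` and the default `drop_last=False`, there are ⌈9017 / 64⌉ = 141 batches, and the last one only holds the remaining 57 samples.

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

n_samples = 9017  # same size as cifar2_train
# Hypothetical stand-in data: 4 features per sample and random 0/1 labels
dataset = TensorDataset(torch.randn(n_samples, 4), torch.randint(0, 2, (n_samples,)))
loader = DataLoader(dataset, batch_size=64, shuffle=True)

print(len(loader), math.ceil(n_samples / 64))  # 141 141
batch_sizes = [x.shape[0] for x, _ in loader]
print(batch_sizes[0], batch_sizes[-1])         # 64 57
```

Because the last batch is smaller, dividing the sum of the batch losses by `n_batch` gives a value slightly different from the mean loss over all samples; with 141 batches the difference should be small.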

In [23]:
# The DataLoader batches up the examples of our cifar dataset
# Here we use shuffle = True in order to shuffle the dataset for the training
train_loader = torch.utils.data.DataLoader(cifar2_train, batch_size=64, shuffle=True) 

# Instantiate the optimizer, here:
# 1. Stochastic Gradient Descent optimizer, 
# 2. that has to be applied to our parameters (model.parameters())
# 3. With a learning rate of 1e-2
optimizer = optim.SGD(model_seq.parameters(), lr=1e-2)

# Instantiate the loss function (here we use cross entropy)
loss_fn = nn.CrossEntropyLoss()

# Now all we have to do is call the training loop
# WARNING: THIS MIGHT BE EXTREMELY SLOW. INTERRUPT YOUR KERNEL TO STOP THE TRAINING
train(
    n_epochs = 21,
    optimizer = optimizer,
    model = model_seq,
    loss_fn = loss_fn,
    train_loader = train_loader,
)
print('')

08:26:27.500662  |  Epoch 1  |  Training loss 0.544
08:26:34.727586  |  Epoch 10  |  Training loss 0.343
08:26:43.514128  |  Epoch 20  |  Training loss 0.243



#### Measuring accuracy

In [24]:
# Here we use shuffle=False: shuffling is only useful for training,
# and a fixed order makes it easier to check the predictions.
train_loader = torch.utils.data.DataLoader(cifar2_train, batch_size=64, shuffle=False)
val_loader = torch.utils.data.DataLoader(cifar2_val, batch_size=64, shuffle=False)

def compute_accuracy(model, loader):
    model.eval()
    correct = 0
    total = 0

    # We do not need gradients here, since we will not update the parameters.
    with torch.no_grad():
        for imgs, labels in loader:

            outputs = model(imgs)
            _, predicted = torch.max(outputs, dim=1)
            total += labels.shape[0]
            correct += int((predicted == labels).sum())

    acc =  correct / total
    print("Accuracy: {:.2f}".format(acc))
    return acc

print("Training accuracy:")
compute_accuracy(model_seq, train_loader)
print("Validation accuracy:")
compute_accuracy(model_seq, val_loader)

Training accuracy:
Accuracy: 0.91
Validation accuracy:
Accuracy: 0.85


0.8463886063072228
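The prediction step in `compute_accuracy` relies on `torch.max(outputs, dim=1)`, which returns both the maximum values and their indices along the class dimension; the indices are the predicted classes, the same as `argmax`. This sketch applies the same accuracy computation to made-up outputs and labels.

```python
import torch

# Made-up network outputs for 4 inputs and 2 classes, and the true labels
outputs = torch.tensor([[0.2, 1.5], [2.0, -1.0], [0.1, 0.3], [1.0, 0.0]])
labels = torch.tensor([1, 0, 0, 0])

# torch.max along dim=1 returns (max values, indices of the max)
values, predicted = torch.max(outputs, dim=1)
print(predicted)                                      # tensor([1, 0, 1, 0])
print(torch.equal(predicted, outputs.argmax(dim=1)))  # True

correct = int((predicted == labels).sum())
print(correct / labels.shape[0])  # 0.75
```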

### 3.2 Training on GPU

*(Inspired by 8.4.3 Training on the GPU)*

#### Check if a GPU is available


In [25]:
device = (torch.device('cuda') if torch.cuda.is_available()
          else torch.device('cpu'))
print(f"Training on device {device}.")

Training on device cpu.
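The two functions below differ from the CPU versions only by moving data to `device`. One detail worth keeping in mind, shown in this small sketch with an arbitrary `nn.Linear(4, 2)`: `Module.to` moves the parameters of the module in place, whereas `Tensor.to` returns a tensor on the target device, so its result must be reassigned (as in `imgs = imgs.to(device=device)`).

```python
import torch
import torch.nn as nn

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

# Module.to moves the module's parameters in place (and returns the module)
model = nn.Linear(4, 2).to(device=device)
# Tensor.to returns a tensor on the target device: reassign the result
x = torch.randn(3, 4).to(device=device)

out = model(x)
print(next(model.parameters()).device.type, x.device.type, out.device.type)
```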


#### Defining the training loop 

In [26]:
def train_on_gpu(n_epochs, optimizer, model, loss_fn, train_loader):
    
    n_batch = len(train_loader)
    losses_train = []
    model.train()
    optimizer.zero_grad(set_to_none=True)
    
    for epoch in range(1, n_epochs + 1):
        # Training loss for the current epoch
        loss_train = 0.0
        for imgs, labels in train_loader:
            # These two lines are what differs from
            # our previous train function.
            # They move imgs and labels to the device we are training
            # on (gpu if available, cpu otherwise)
            imgs = imgs.to(device=device) 
            labels = labels.to(device=device)

            outputs = model(imgs)
            
            loss = loss_fn(outputs, labels)
            loss.backward()
            
            optimizer.step()
            optimizer.zero_grad()

            loss_train += loss.item()
            
        losses_train.append(loss_train / n_batch)

        if epoch == 1 or epoch % 10 == 0:
            print('{}  |  Epoch {}  |  Training loss {:.3f}'.format(
                datetime.now().time(), epoch, loss_train / n_batch))
    return losses_train

#### Train the model using the training loop

In [27]:
# Again shuffle = True for the training phase
train_loader = torch.utils.data.DataLoader(cifar2_train, batch_size=64, shuffle=True)

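# Move our model (all its parameters) to the device (the GPU if available).
# If you forget to move either the model or the inputs to the
# device, you will get errors about tensors not being on the same
# device, because PyTorch operators do not support mixing GPU and
# CPU inputs. Note that model_seq is not re-initialized here, so
# training resumes from the weights learned on the CPU above.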
model_seq.to(device=device) 
optimizer = optim.SGD(model_seq.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()


# WARNING: this is supposed to be much faster than before, but it
# might still take a while if no GPU is available.
# AGAIN, INTERRUPT YOUR KERNEL IF IT'S TOO SLOW
train_on_gpu(
    n_epochs = 20,
    optimizer = optimizer,
    model = model_seq,
    loss_fn = loss_fn,
    train_loader = train_loader,
)
print('')

08:26:46.384795  |  Epoch 1  |  Training loss 0.227
08:26:53.472497  |  Epoch 10  |  Training loss 0.147
08:27:00.503042  |  Epoch 20  |  Training loss 0.084



#### Measuring accuracy

In [28]:
# Again shuffle = False outside training
train_loader = torch.utils.data.DataLoader(cifar2_train, batch_size=64, shuffle=False)
val_loader = torch.utils.data.DataLoader(cifar2_val, batch_size=64, shuffle=False)

def compute_accuracy_on_gpu(model, loader):
    model.eval()
    correct = 0
    total = 0

    with torch.no_grad():
        for imgs, labels in loader:
            # These two lines are what differs from
            # our previous compute_accuracy function.
            # They move imgs and labels to the device we are predicting
            # on (gpu if available, cpu otherwise)
            imgs = imgs.to(device=device)
            labels = labels.to(device=device)

            outputs = model(imgs)
            _, predicted = torch.max(outputs, dim=1)
            total += labels.shape[0]
            correct += int((predicted == labels).sum())

    acc =  correct / total
    print("Accuracy: {:.2f}".format(acc))
    return acc

print("Training accuracy:")
compute_accuracy_on_gpu(model_seq, train_loader)
print("Validation accuracy:")
compute_accuracy_on_gpu(model_seq, val_loader)

Training accuracy:
Accuracy: 0.98
Validation accuracy:
Accuracy: 0.84


0.8402848423194303