# Lab Instructions

In the lab, you're presented with a task, such as building a dataset, training a model, or writing a training loop. We provide code structured so that you can fill in the blanks using the knowledge you acquired in the chapters that precede the lab. You should be able to find appropriate snippets of code in the course content that work well in the lab with minor or no adjustments.

The blanks in the code are indicated by an ellipsis (`...`) and a comment (`# write your code here`).

In some cases, we'll provide partial code to ensure that the right variables are populated and that any code that follows runs accordingly:

```python
# write your code here
x = ...
```

The solution should be a single statement that replaces the ellipsis, such as:

```python
# write your code here
x = [0, 1, 2]
```

In other cases, when no new variable is being created, the blanks are shown as in the example below:

```python
# write your code here
...
```

Although we're showing you only a single ellipsis (`...`), you may have to write more than one line of code to complete the step, such as:

```python
# write your code here
for i, xi in enumerate(x):
    x[i] = xi * 2
```

## Installation Notes

To run this notebook on Google Colab, you will need to install the following library: `portalocker`.

In Google Colab, you can run the following command to install this library:

In [None]:
!pip install portalocker

## 8.5 Lab 4: Sentiment Analysis

In this lab, you'll fine-tune an encoder-based model to perform sentiment analysis on the Stanford Sentiment Treebank (SST2) dataset. You'll load RoBERTa's sibling, XLM-RoBERTa, use its prescribed transformations to preprocess text in the SST2 dataset, and fine-tune (train) it for one epoch.

### 8.5.1 Model

![](https://raw.githubusercontent.com/dvgodoy/assets/main/PyTorchInPractice/images/ch0/model_step1.png)

You'll use Torchtext's `XLMR_BASE_ENCODER` in this lab. Create an instance of a classification head (`RobertaClassificationHead`) to perform binary classification (we have two classes, "positive" and "negative" sentiment), matching the input dimensions to the embeddings generated by the base model, and then load the model with the head attached to it.

In [None]:
import torchtext

xlmr_base = torchtext.models.XLMR_BASE_ENCODER

# write your code here
classifier_head = ...

# Tip: you can call a method from xlmr_base to load the model with the head
# write your code here
model = ...
model

### 8.5.2 Dataset

![](https://raw.githubusercontent.com/dvgodoy/assets/main/PyTorchInPractice/images/ch0/data_step1.png)

Now, you will load Torchtext's ["Stanford Sentiment Treebank (SST2)"](https://pytorch.org/text/stable/datasets.html#sst2) dataset. This dataset uses Torchdata's `DataPipe`s instead of traditional `Dataset`s. It is already split into `train`, `dev` (validation), and `test` sets. You only need to specify it in the `split` argument in the constructor of `SST2`.

In [None]:
from torchtext.datasets import SST2

datapipes = {}
# write your code here
datapipes['train'] = ...
datapipes['val'] = ...

Let's take a look at one data point from the SST2 dataset. Just run the code below as is to visualize the output:

In [None]:
row = next(iter(datapipes['train']))
text, label = row
text, label

Each data point is a tuple containing a line of text and the corresponding label: the sentiment (0 for negative, 1 for positive).

### 8.5.3 Transforms

![](https://raw.githubusercontent.com/dvgodoy/assets/main/PyTorchInPractice/images/ch0/data_step3.png)

You already know the drill: you must preprocess the input (the text) using the prescribed transformation for the model you're using, so it gets tokenized, converted into token ids, and prepended/appended with the appropriate special tokens.

Retrieve the transformation function/model from the XLM-RoBERTa model, and write a function that takes a tuple of `(text, label)` and returns another tuple of `(list of token ids, label)`.
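
To make these three stages concrete, here is a toy sketch in plain Python. It is not the model's actual transformation: `toy_vocab` and `toy_transform` are hypothetical names, the vocabulary is made up, and a whitespace split stands in for the real subword tokenizer.

```python
# Hypothetical toy vocabulary (illustration only, not the model's vocabulary)
toy_vocab = {'<s>': 0, '<pad>': 1, '</s>': 2, '<unk>': 3,
             'a': 4, 'great': 5, 'movie': 6}

def toy_transform(text):
    # 1. Tokenize: a whitespace split stands in for the real tokenizer
    tokens = text.lower().split()
    # 2. Convert tokens into ids, falling back to the unknown token
    ids = [toy_vocab.get(tok, toy_vocab['<unk>']) for tok in tokens]
    # 3. Prepend/append the special tokens marking start and end
    return [toy_vocab['<s>']] + ids + [toy_vocab['</s>']]

toy_ids = toy_transform('A great movie')
print(toy_ids)  # [0, 4, 5, 6, 2]
```

The result is a plain Python list whose length depends on the input text, which is also what you should expect from the real transformation.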

In [None]:
# write your code here
transform_fn = ...
transform_fn(text)

In [None]:
def apply_transform(row):
    text, label = row
    # Use the transform_fn you retrieved in the previous cell to
    # preprocess the text
    # write your code here
    ...

Let's apply your function to our data point to see if it is working as expected (just run the code below as is to visualize the output):

In [None]:
apply_transform(row)

Did you notice the transformation is returning a regular Python list of token ids, not a PyTorch tensor? Remember, we cannot make a tensor out of lists of different lengths (see section 6.3.3). The solution? Padding the shorter sentences, so they all have the same length.
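
As a rough illustration of what padding does, the plain-Python sketch below pads a few made-up sequences to a common length. The helper `pad_sequences` and the padding value of `1` are assumptions for this example only; in the lab, the padding itself is handled by `to_tensor`.

```python
def pad_sequences(sequences, padding_value):
    # Find the longest sequence; every other one is padded to this length
    max_len = max(len(seq) for seq in sequences)
    return [seq + [padding_value] * (max_len - len(seq)) for seq in sequences]

# Three made-up sequences of token ids with different lengths
ragged = [[0, 4, 5, 2], [0, 6, 2], [0, 4, 5, 6, 7, 2]]
padded = pad_sequences(ragged, padding_value=1)
for row in padded:
    print(row)
```

Once every row has the same length, the nested list is rectangular and can be converted into a single tensor.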

But, how can we think of padding sentences if we don't have a mini-batch yet? It turns out, datapipes offer a `batch()` method that we can leverage to group data points together as mini-batches way before even thinking of creating a data loader.

Let's try one out (just run the code below as is to visualize the output):

In [None]:
batched_datapipe = datapipes['train'].map(apply_transform).batch(4)
batch_of_tuples = next(iter(batched_datapipe))
batch_of_tuples

The returned mini-batch is a list of four elements, each element being a tuple `(token ids, label)` returned as-is from the previous step of the datapipe.

However, in order to pad the sequences, it would be much better to have a list of lists of token ids instead. In other words, we need to turn rows into columns, and there is a method for that as well: `rows2columnar()`. Even better, we can name the columns, and they will be returned as dictionary keys. Just run the code below as is to visualize the output:

In [None]:
columnar_datapipe = batched_datapipe.rows2columnar(['token_ids', 'labels'])
dict_of_batches = next(iter(columnar_datapipe))

dict_of_batches['labels'], dict_of_batches['token_ids']
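
Conceptually, turning rows into columns is a transpose. A plain-Python sketch with made-up token ids (`toy_rows` and `toy_columns` are illustrative names, not part of the lab) might look like this:

```python
# A made-up mini-batch of three (token ids, label) rows
toy_rows = [([0, 4, 2], 1), ([0, 5, 6, 2], 0), ([0, 7, 2], 1)]
column_names = ['token_ids', 'labels']

# zip(*rows) transposes the rows into one tuple per column;
# each column is then stored under its name
toy_columns = {name: list(col)
               for name, col in zip(column_names, zip(*toy_rows))}

print(toy_columns['token_ids'])  # [[0, 4, 2], [0, 5, 6, 2], [0, 7, 2]]
print(toy_columns['labels'])     # [1, 0, 1]
```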

Awesome, now we're ready for the next step!

Write a function that takes a batch of (transformed) data points, pads the sequences (using `to_tensor` and the padding id defined in the cell below), and converts the labels into a tensor as well.

In [None]:
import torch
from torchtext.functional import to_tensor

padding_idx = transform_fn[1].vocab.lookup_indices(['<pad>'])[0]

def tensor_batch(batch):
    tokens = batch['token_ids']
    labels = batch['labels']
    # write your code here
    ...

Now, let's line up all these steps:
- applying transformation
- batching sequences
- turning rows of tuples into columns
- padding sequences

![](https://raw.githubusercontent.com/dvgodoy/assets/main/PyTorchInPractice/images/ch0/data_step4.png)

Just run the code below as is to apply all preprocessing steps to the datapipes:

In [None]:
for k in datapipes.keys():
    datapipes[k] = datapipes[k].map(apply_transform)
    datapipes[k] = datapipes[k].batch(16)
    datapipes[k] = datapipes[k].rows2columnar(['token_ids', 'labels'])
    datapipes[k] = datapipes[k].map(tensor_batch)

If we fetch from our data pipe, it should return a tuple of two tensors, each tensor containing as many rows as the mini-batch size. Just run the code below as is to visualize the output:

In [None]:
dp_out = next(iter(datapipes['train']))
dp_out

![](https://raw.githubusercontent.com/dvgodoy/assets/main/PyTorchInPractice/images/ch0/data_step5.png)

Now, create a data loader for each data pipe. Since the batches are already defined inside the data pipe, the batch size should be `None`. It is still OK to shuffle the training set, though.

In [None]:
from torch.utils.data import DataLoader

dataloaders = {}
# write your code here
dataloaders['train'] = ...
dataloaders['val'] = ...

Now, let's fetch a mini-batch from our data loader (just run the code below as is to visualize the output):

In [None]:
dl_out = next(iter(dataloaders['train']))
dl_out

Do you see any difference between the two outputs, from the (batched) datapipe and the data loader? The former returns a tuple while the latter returns a list, but the contents are the same: a mini-batch of features and a mini-batch of labels. The length of the features may differ depending on how long the longest sequence in a given mini-batch is.

Just run the two cells of code below as they are to visualize their outputs:

In [None]:
dp_out[0].shape, dl_out[0].shape # features

In [None]:
dp_out[1].shape, dl_out[1].shape # labels

This means that it is possible to use data pipes directly in the training loop.

### 8.5.4 Training

Now, it is time to write a training loop to fine-tune your XLM-RoBERTa model on the SST2 dataset. This is a large model, and the training set has over 67,000 data points, so you can train it over a single epoch, that is, looping over the mini-batches from the datapipe (or data loader) only once. For the sake of speed, keep the evaluation for the end only.

#### 8.5.4.1 Loss Function

![](https://raw.githubusercontent.com/dvgodoy/assets/main/PyTorchInPractice/images/ch0/model_step2.png)

Sentiment analysis is a classification task, so we need to use the appropriate loss function for the task. Even though it is a binary classification, RoBERTa's classification head is actually producing two logits instead of one, so you have to use `CrossEntropyLoss` (which can handle two or more logits).
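
To see why two logits and cross-entropy are a valid way to handle a binary problem, consider the plain-Python sketch below for a single data point. The helper names and the logit values are made up for illustration. It computes the cross-entropy over two logits and compares it with the single-logit (sigmoid) formulation applied to the difference between the two logits.

```python
import math

def cross_entropy(logits, target):
    # Log-sum-exp, subtracting the max logit for numerical stability
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # Negative log-probability assigned to the target class
    return log_sum_exp - logits[target]

def binary_cross_entropy(logit, target):
    # Single-logit formulation: sigmoid followed by negative log-likelihood
    p = 1.0 / (1.0 + math.exp(-logit))
    return -math.log(p if target == 1 else 1.0 - p)

two_logits = [-0.5, 1.5]  # made-up logits for (negative, positive)
ce = cross_entropy(two_logits, target=1)
bce = binary_cross_entropy(two_logits[1] - two_logits[0], target=1)
print(ce, bce)  # the two values agree up to floating-point error
```

Both formulations yield the same loss, so nothing is lost by using two logits. Keep in mind that `CrossEntropyLoss` averages the loss over the mini-batch by default, while this sketch handles one data point only.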

In [None]:
import torch.nn as nn

loss_fn = ...

#### 8.5.4.2 Optimizer

![](https://raw.githubusercontent.com/dvgodoy/assets/main/PyTorchInPractice/images/ch0/model_step3.png)

Although `Adam` is the optimizer of choice, we suggest you try out `AdamW`, a modified version that is also commonly used.

In [None]:
import torch.optim as optim

# suggested learning rate
lr = 1e-5

optimizer = ...

#### 8.5.4.3 Training Loop

![](https://raw.githubusercontent.com/dvgodoy/assets/main/PyTorchInPractice/images/ch0/model_step4.png)

So far, we haven't logged or inspected our losses in real time. Why bother, if it takes only a minute to train the model? This time is different, though: fine-tuning XLM-RoBERTa on more than 67,000 data points, even for a single epoch, will take about 15 minutes or so in Google Colab. So, let's use TensorBoard to see how our loss is doing as training progresses.

First, we need to load it using the corresponding Jupyter magic (just run the code below as is to load TensorBoard):

In [None]:
%load_ext tensorboard
%tensorboard --logdir runs

Next, we need to create an instance of the `SummaryWriter` to be able to send loss values to TensorBoard. Just run the code below as is to create it:

In [None]:
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter('runs/roberta')

Now, it's your turn to write the missing parts of the training loop below. We have already taken care of sending the losses to TensorBoard for you.
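
As a refresher on how the four steps fit together, here is a toy sketch without PyTorch: a one-parameter model trained with a manually derived gradient. Everything in it (the data, `weight`, `grad`, and the learning rate) is made up for illustration; in the lab, the model, the loss function, and the optimizer do this work for you.

```python
# Toy data generated from y = 2 * x, so the ideal weight is 2.0
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

weight = 0.0  # the single trainable parameter
grad = 0.0    # accumulated gradient (plays the role of a parameter's .grad)
lr = 0.05
losses = []

for x, y in data * 20:  # 20 passes over the toy data
    # Step 1 - forward pass
    prediction = weight * x
    # Step 2 - computing the loss (squared error)
    loss = (prediction - y) ** 2
    # Step 3 - computing the gradient d(loss)/d(weight), which accumulates
    grad += 2 * (prediction - y) * x
    losses.append(loss)
    # Step 4 - updating the parameter and zeroing the gradient
    weight -= lr * grad
    grad = 0.0

print(weight)  # approaches 2.0
```

Gradients accumulate in step 3, so forgetting to zero them in step 4 would mix gradients from different mini-batches.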

In [None]:
from tqdm import tqdm

device = 'cuda' if torch.cuda.is_available() else 'cpu'

model.to(device)

batch_losses = []

## Training
for i, (batch_features, batch_targets) in tqdm(enumerate(datapipes['train'])):
    # Set the model's mode
    # write your code here
    ...
    
    # Send batch features and targets to the device
    # write your code here
    ...
    
    # Step 1 - forward pass
    # write your code here
    predictions = ...

    # Step 2 - computing the loss
    # write your code here
    loss = ...

    # Step 3 - computing the gradients
    # Tip: it requires a single method call to backpropagate gradients
    # write your code here
    ...

    batch_losses.append(loss.item())
    
    writer.add_scalars(main_tag='loss',
                       tag_scalar_dict={'training': loss.item()},
                       global_step=i)    

    # Step 4 - updating parameters and zeroing gradients
    # Tip: it takes two calls to optimizer's methods
    # write your code here
    ...


writer.close()

## Validation   
with torch.inference_mode():
    val_losses = []

    for i, (val_features, val_targets) in enumerate(dataloaders['val']):
        # Set the model's mode
        # write your code here
        ...

        # Send batch features and targets to the device
        # write your code here
        ...

        # Step 1 - forward pass
        # write your code here
        predictions = ...

        # Step 2 - computing the loss
        # write your code here
        loss = ...
        
        val_losses.append(loss.item())

By the end of it, your losses on TensorBoard should look more or less like this (if you drag the slider on the right to the maximum level of smoothing):

![](https://raw.githubusercontent.com/dvgodoy/assets/main/PyTorchInPractice/images/ch6/tensorboard.png)

### 8.5.5 Inference

![](https://raw.githubusercontent.com/dvgodoy/assets/main/PyTorchInPractice/images/ch0/model_step5.png)

Write a function that takes some text (a sequence of words), a model, its prescribed transformations, and a list of target categories for the classification, and returns the most likely category and the corresponding probability.

Since you're handling a single sequence, there's no need for any padding, but you still need to provide a tensor containing a mini-batch (of one) as input to the model.

The model returns two logits, one for each class, so you must use the softmax function to convert them into probabilities.
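
For intuition, the plain-Python sketch below converts two made-up logits into probabilities and picks the most likely category. The names `softmax`, `toy_logits`, and `toy_categories` are illustrative only; in your function, you would work with tensors instead.

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

toy_categories = ['negative', 'positive']
toy_logits = [-1.0, 2.0]  # made-up logits for a single data point

probs = softmax(toy_logits)
# Index of the highest probability, that is, the most likely category
top_index = max(range(len(probs)), key=probs.__getitem__)
top = {'label': toy_categories[top_index], 'value': probs[top_index]}
print(top)
```

The probabilities add up to one, and the category with the largest logit is also the one with the largest probability.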

In [None]:
def predict(sequence, model, transforms_fn, categories):        
    # Build a tensor of token ids out of the input sequence
    # write your code here
    ...

    # Set the model to the appropriate mode
    # write your code here
    ...

    device = next(iter(model.parameters())).device
    
    # Use the model to make predictions/logits
    # Tip: Don't forget to send the input to the same device as the model
    # Tip: Don't forget models take mini-batches as inputs, not single data points
    # write your code here
    pred = ...
    
    # Compute the probabilities corresponding to the logits
    # and return the top value and index
    # write your code here
    probabilities = ...
    values, indices = ...
    
    return [{'label': categories[i], 'value': v.item()} for i, v in zip(indices, values)]

Now, try out your prediction function and fine-tuned model (just run the code cells below as they are to visualize their outputs):

In [None]:
categories = ['negative', 'positive']
text = "I am really liking this course"
predict(text, model, xlmr_base.transform(), categories)

In [None]:
text = "This course is too complicated!"
predict(text, model, xlmr_base.transform(), categories)

That's cool, but what if we could perform sentiment analysis out-of-the-box? That's what we'll do in the second part of Chapter 6.