# TP BERT



## Introduction

In this lab, we will:

0. Understand how to use a *Jupyter Notebook*, a very popular tool in data analysis  
1. Prepare a dataset for training a [BERT](https://arxiv.org/abs/1810.04805) model  
2. Understand how to use a pre-trained BERT model  
3. Create a multi-class classification model that leverages the hidden representations of a BERT encoder  
4. Train this model and test its performance  

With the following tools:  
1. [PyTorch](https://pytorch.org/docs/stable/index.html): an *open-source* Python library for machine learning  
2. [HuggingFace’s Transformers](https://huggingface.co/transformers/): a PyTorch-based library for natural language processing, especially Transformer models (like BERT)  
3. [HuggingFace’s Tokenizers](https://github.com/huggingface/tokenizers): a fast tokenization library (implemented in Rust, with Python bindings), explicitly designed to work with the Transformers library  
4. Google Colab, which hosts this *Jupyter Notebook*. Before starting the lab, you can review introductory pages on [Colab](https://colab.research.google.com/github/tensorflow/examples/blob/master/courses/udacity_intro_to_tensorflow_for_deep_learning/l01c01_introduction_to_colab_and_python.ipynb#scrollTo=YHI3vyhv5p85) and [Notebooks](https://realpython.com/jupyter-notebook-introduction/)

Colab machines come with a Linux system and a Python environment with several pre-installed libraries like PyTorch. However, if you're using a Colab machine, you must first install the HuggingFace libraries, which are not pre-installed.

To run a Unix command in a Jupyter Notebook, prefix the command with an exclamation mark: `!command`.

In [None]:
!pip install transformers tokenizers

Now we can import the main libraries we will need.

In [None]:
!pip install matplotlib

In [None]:
import torch
import transformers
import pandas as pd

# Managing arrays
import numpy as np

import random, math

# Plotting tools:
import matplotlib.pyplot as plt
import seaborn as sns
# load the TensorBoard notebook extension
%load_ext tensorboard

**GPU**

Training our neural network requires many matrix computations. To run these computations faster, we can use a graphics processing unit (*GPU*). If you are on Colab, you can enable a *GPU* by selecting _Runtime -> Change runtime type -> GPU_.

In [None]:
if torch.cuda.is_available():
  print("GPU is available.")
  device = torch.device("cuda")
else:
  print("Will work on CPU.")
  # define `device` in both branches so that it always exists
  device = torch.device("cpu")

### To be submitted

Each exercise in this lab requires a response either in textual form or as code. Every answer must be written in one or more cells following the statement of each exercise.

You will submit a compressed directory named `tp_bert_nom1_nom2.zip` containing the contents of the `tp_bert.zip` directory, where the `tp_bert.ipynb` file has been updated with your responses.

## Data

We will train our model on a **multi-class classification task**. Specifically, the task is to classify texts into three sentiment categories:

1. Negative
2. Neutral
3. Positive

These data are collected in the [FinancialPhraseBank-v1.0 dataset](https://www.researchgate.net/publication/251231364_FinancialPhraseBank-v10), which you can find in the directory of this lab.

Here is the essential information to understand the task, as outlined in the README.txt file:

---

<em>The key arguments for the low utilization of statistical techniques in financial sentiment analysis have been the difficulty of implementation for practical applications and the lack of high quality training data for building such models. Especially in the case of finance and economic texts, annotated collections are a scarce resource and many are reserved for proprietary use only. To resolve the missing training data problem, we present a collection of ∼ 5000 sentences to establish human-annotated standards for benchmarking alternative modeling techniques.

<em>The objective of the phrase level annotation task was to classify each example sentence into a positive, negative or neutral category by considering only the information explicitly available in the given sentence. Since the study is focused only on financial and economic domains, the annotators were asked to consider the sentences from the view point of an investor only; i.e. whether the news may have positive, negative or neutral influence on the stock price. As a result, sentences which have a sentiment that is not relevant from an economic or financial perspective are considered neutral.

<em>This release of the financial phrase bank covers a collection of 4840 sentences. The selected collection of phrases was annotated by 16 people with adequate background knowledge on financial markets. Three of the annotators were researchers and the remaining 13 annotators were master’s students at Aalto University School of Business with majors primarily in finance, accounting, and economics.

<em>Given the large number of overlapping annotations (5 to 8 annotations per sentence), there are several ways to define a majority vote based gold standard. To provide an objective comparison, we have formed 4 alternative reference datasets based on the strength of majority agreement:

1. sentences with 100% agreement [file=Sentences_AllAgree.txt];
2. sentences with more than 75% agreement [file=Sentences_75Agree.txt];
3. sentences with more than 66% agreement [file=Sentences_66Agree.txt]; and
4. sentences with more than 50% agreement [file=Sentences_50Agree.txt].

<em>All reference datasets are included in the release. The files are in a machine-readable "@"-separated format:

<em>**sentence@sentiment**

<em>where sentiment is either "positive, neutral or negative".

<em>E.g.,  The operating margin came down to 2.4 % from 5.7 % .@negative</em>

---

In [None]:
## Colab
from google.colab import drive
drive.mount('/content/drive')
%cd drive/MyDrive/TP_BERT/
# ^ replace the path above with your own Drive folder

#### Exercise 1

We will use the sentences in _Sentences_75Agree.txt_ to train and test our model. To do this, download the file to the Colab machine (using the interface on the left) or, if you are working on a local _Jupyter Notebook_, place the _Sentences_75Agree.txt_ file in the same directory as the _Notebook_.

Next, write the `load_data()` function, which reads the sentences contained in this file and separates them from their labels. The result of this function, assigned to the variable `df_data`, should be a dataframe, where each entry contains a sentence as the first element and its label as the second element. For example:

````
df_data.iloc[0][0] == 'A high court in Finland has fined seven local asphalt companies more than   lion ( $ 117 million ) for operating a cartel .'
df_data.iloc[0][1] == 'negative'
````

> Note: The encoding of the _Sentences_75Agree.txt_ file is `iso-8859-1`.

In [None]:
# Function to load FinancialPhraseBank data
# Each line has the form: <sentence>@<label>
import pandas as pd

def load_data(filename, classes):
    """Load a FinancialPhraseBank 'Sentences_*.txt' file.

    Args:
        filename (str): path to the txt file.
        classes (list[str]): allowed labels (e.g., ['negative','neutral','positive']).

    Returns:
        pd.DataFrame: two columns [0]=sentence, [1]=label
    """
    sentences = []
    labels = []
    # the file is encoded in ISO-8859-1 (see the note above), not UTF-8
    with open(filename, 'r', encoding='iso-8859-1') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # split on the last '@' (sentences may contain '@' rarely)
            if '@' not in line:
                continue
            sent, lab = line.rsplit('@', 1)
            lab = lab.strip()
            if lab in classes:
                sentences.append(sent.strip())
                labels.append(lab)
    return pd.DataFrame({0: sentences, 1: labels})

# Choose one of the provided datasets (66/75/AllAgree)
filename = "FinancialPhraseBank-v1.0/Sentences_75Agree.txt"
classes = ['negative', 'neutral', 'positive']

df_data = load_data(filename, classes)
df_data.head()


Test your code by running the following cell.

In [None]:
assert type(df_data.iloc[0][0]) == str, "The first column should be a sentence."
assert df_data.iloc[0][1] in classes, "The second column should belong to one of the three classes available."
assert len(df_data) == 3453, "The size of dataframe should be of 3453 sentences."
print("Test passed!")

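Now we will split our data into a training set and a test set, so that the model can be evaluated on sentences it has not seen during training.

**Question**: Write a function that adds a `Split` column whose value is `Train` or `Test` for each row (see below).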

In [None]:
df_data.iloc[2].name

In [None]:
import numpy as np

def assign_train_test_split(df, train_size=0.80, seed=42):
    """Create a 'Split' column with values 'Train' or 'Test'.

    The assignment is random but reproducible with the seed.
    """
    rng = np.random.default_rng(seed)
    mask = rng.random(len(df)) < train_size
    df.loc[:, 'Split'] = np.where(mask, 'Train', 'Test')
    return df

assign_train_test_split(df_data)
df_data['Split'].value_counts()

# Add numeric label column; ids follow the order of `classes`, so that classes[id] is the label name
label_map = {name: idx for idx, name in enumerate(classes)}
df_data.loc[:, 'label'] = df_data[1].map(label_map).astype(int)
df_data[['Split', 1, 'label']].head()


In [None]:
#Did your function work?
assert(np.fabs(len(df_data[df_data["Split"]=="Train"])-len(df_data)*0.8) < 100)
assert(np.fabs(len(df_data[df_data["Split"]=="Test"])-len(df_data)*0.2) < 100)

In [None]:
df_data.value_counts("Split", normalize=True)

#### Exercise 2

Before a classification task, it’s always a good idea to check how the classes are distributed in the training and test data, and whether the two distributions are (more or less) similar.

1. Why?

2. What is the number of training and test data corresponding to each class?

3. Given the distribution of labels in the _test set_, what are the expectations regarding the performance of our model in terms of _accuracy_?
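
For question 3, a useful reference point is the accuracy of a trivial classifier that always predicts the most frequent class. The sketch below illustrates the idea on made-up labels; `toy_test_labels` and `majority_baseline_accuracy` are illustrative names, and the counts are not the real distribution of the test set.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy of a classifier that always predicts the most frequent label."""
    counts = Counter(labels)
    _, majority_count = counts.most_common(1)[0]
    return majority_count / len(labels)

# Hypothetical labels, NOT the real FinancialPhraseBank distribution
toy_test_labels = ["neutral"] * 6 + ["positive"] * 3 + ["negative"] * 1

print(Counter(toy_test_labels))
print(majority_baseline_accuracy(toy_test_labels))  # 0.6
```

On the real data, the same computation could be applied to the label column of the rows where `Split == "Test"`. A trained model is only interesting if it clearly beats this number.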

In [None]:
# WRITE CODE TO ANSWER THE QUESTIONS HERE,  PRINT ANSWERS OR WRITE THEM IN A TEXT CELL BELOW

### Baseline


#### Exercise 3

In the cell below, we train a [multiclass naive Bayes classification model](https://en.wikipedia.org/wiki/Naive_Bayes_classifier) to establish a baseline performance. Thus, the goal moving forward will be to outperform this naive model.

1. Why do we expect a BERT-type classification model to be more powerful than the naive Bayes model?
2. Under what conditions can non-neural models be more advantageous?

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_val_score

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score


In [None]:
def get_tfidf_vectors_and_labels(df, split="Train", max_features=1000, vectorizer=None):
    texts = df[df.Split == split][0]
    if vectorizer is None:
        # new TF-IDF vectorizer keeping only the `max_features` most frequent terms,
        # fitted on the texts of the requested split
        vectorizer = TfidfVectorizer(max_features=max_features)
        vectorizer.fit(texts)

    # compute the TF-IDF vectors for each text with the fitted vectorizer
    vectors = vectorizer.transform(texts)
    labels = df[df.Split == split][1]
    return vectors.toarray(), labels, vectorizer

In [None]:
def do_tfidf_prediction(df, max_features, model = 'nb'):
    # fit the vectorizer on the training split only, then reuse it for the test split,
    # so that both splits share the same vocabulary and feature columns
    vectors_train, labels_train, vectorizer = get_tfidf_vectors_and_labels(df, split="Train", max_features=max_features)
    vectors_test, labels_test, _ = get_tfidf_vectors_and_labels(df, split="Test", vectorizer=vectorizer)

    if model == 'nb':
      # train a multinomial Naive-Bayes classifier on the tfidf vectors
      classifier = MultinomialNB().fit(vectors_train, labels_train)
    else:
      ## Here is another classifier:
      classifier = LogisticRegression().fit(vectors_train, labels_train)

    # use classifier to predict the labels of the test set
    predicted_train = classifier.predict(vectors_train)
    predicted_test = classifier.predict(vectors_test)

    f1_train = f1_score(labels_train, predicted_train, average='macro') #or weighted if you'd like!
    f1_test = f1_score(labels_test, predicted_test, average='macro') #or weighted if you'd like!
    accuracy = (predicted_test == labels_test).sum()/len(predicted_test)

    return f1_train, f1_test, accuracy

In [None]:
f1_train, f1_test, accuracy = do_tfidf_prediction(df_data, max_features = 100)
f1_train, f1_test, accuracy

#### Evaluation
We have several choices of metric to measure the performance of our classifier. The most obvious is accuracy: the number of correct predictions divided by the total number of predictions. However, this metric is less informative for unbalanced classes (as we have in this case).

**Question**: give an example of a classification problem in which the accuracy metric is misleadingly optimistic.

Therefore, in NLP classification problems, we typically use the F1-score, which is the harmonic mean of precision and recall. This gives us a more nuanced picture of our classifier's performance over all the classes. There are several flavours of F1 score, most notably "macro" and "weighted": the former weights each class equally, while the latter weights each class according to its prevalence, giving higher weight to more prevalent classes.
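
To make the difference between the two averages concrete, here is a small sketch in plain Python on made-up labels. The helper names (`per_class_f1`, `f1_macro_and_weighted`) and the toy data are assumptions for illustration; the F1 of a class that is never predicted is set to 0, which is one common convention.

```python
from collections import Counter

def per_class_f1(y_true, y_pred, cls):
    """F1 of one class, treating it as the 'positive' class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    # harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

def f1_macro_and_weighted(y_true, y_pred):
    support = Counter(y_true)  # number of true examples per class
    scores = {c: per_class_f1(y_true, y_pred, c) for c in support}
    macro = sum(scores.values()) / len(scores)
    weighted = sum(scores[c] * support[c] for c in support) / len(y_true)
    return macro, weighted

# Toy data: 8 neutral and 2 negative sentences, and a model that always says "neutral"
y_true = ["neutral"] * 8 + ["negative"] * 2
y_pred = ["neutral"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
macro, weighted = f1_macro_and_weighted(y_true, y_pred)
print(f"accuracy={accuracy:.3f}  macro F1={macro:.3f}  weighted F1={weighted:.3f}")
```

In this toy case the macro average is pulled down by the class that is never predicted, while the weighted average stays close to the accuracy.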

**Question**: Which type of F1 score would you like to use?

In [None]:
f1_scores_train = []
f1_scores_valid = []
max_features_array = [5, 10, 50, 100, 200,300,400,500,600,700,800,900,1000]
for max_features in max_features_array:
    f1_train, f1_valid, acc = do_tfidf_prediction(df_data, max_features = max_features)
    f1_scores_train.append(f1_train)
    f1_scores_valid.append(f1_valid)

plt.plot(np.arange(len(max_features_array)),f1_scores_train,label="F1 Train")
plt.plot(np.arange(len(max_features_array)),f1_scores_valid,label="F1 Test")
plt.xticks(np.arange(len(max_features_array)),max_features_array)
plt.xlabel("Number of Features")
plt.ylabel("F1 score")
plt.legend()

**Question**: What is the optimal value for max_features? Evaluate the classifier on the test set for that best value.

## Tokenization

The HuggingFace library provides tools for tokenizing text data according to the models we will use: [see the docs](https://huggingface.co/transformers/tokenizer_summary.html).

In this lab, we will use a pre-trained model called [DistilBERT](https://arxiv.org/pdf/1910.01108.pdf): an encoder with a [Transformer](https://arxiv.org/abs/1706.03762) architecture that is a distilled version of BERT, making it lighter in terms of memory and faster. The authors present it in their paper as follows:

<em>As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models in on-the-edge and/or under constrained computational training or inference budgets remains challenging. In this work, we propose a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can then be fine-tuned with good performances on a wide range of tasks like its larger counterparts. While most prior work investigated the use of distillation for building task-specific models, we leverage knowledge distillation during the pre-training phase and show that it is possible to reduce the size of a BERT model by 40%, while retaining 97% of its language understanding capabilities and being 60% faster. To leverage the inductive biases learned by larger models during pre-training, we introduce a triple loss combining language modeling, distillation and cosine-distance losses. Our smaller, faster and lighter model is cheaper to pre-train and we demonstrate its capabilities for on-device computations in a proof-of-concept experiment and a comparative on-device study.

In [None]:
x_train = df_data[df_data.Split=='Train'][0].values
y_train = df_data[df_data.Split=='Train']['label'].values
x_test = df_data[df_data.Split=='Test'][0].values
y_test = df_data[df_data.Split=='Test']['label'].values

In [None]:
from transformers import DistilBertTokenizer

MAX_LEN = 512

# padding and truncation are options of the tokenizer call, not of from_pretrained
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')

# let's check out how the tokenizer works
for n in range(3):
    # tokenize sentences
    tokenizer_out = tokenizer(x_train[n])
    # convert token ids back to token strings
    encoded_tok = tokenizer.convert_ids_to_tokens(tokenizer_out.input_ids)
    # decode tokens back to string
    decoded = tokenizer.decode(tokenizer_out.input_ids)
    print(tokenizer_out)
    print(encoded_tok, '\n')
    print(decoded, '\n')
    print('---------------- \n')


#### Exercise 4

1. What does the output of the tokenizer (`tokenizer_out`) represent?
2. What are the special tokens introduced by the tokenizer? What is their function?
3. Why do some tokens start with ## (for example, ##rea ##der)?
4. Note that the tokenizer we are using has been pre-trained. Why does a tokenizer like this require pre-training before its application?
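
As a hint for question 3, the sketch below shows a simplified greedy longest-match-first split of a word into sub-word pieces, in the spirit of WordPiece. It is not the Hugging Face implementation: `wordpiece_tokenize` and `toy_vocab` are hypothetical, and the real tokenizer also normalizes text, splits punctuation, and uses a much larger learned vocabulary.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one word into sub-word pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # try the longest remaining substring first, then shrink it
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # pieces that continue a word carry the ## prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk_token]  # no piece fits: the whole word is unknown
        pieces.append(match)
        start = end
    return pieces

# Tiny hypothetical vocabulary
toy_vocab = {"read", "##er", "##s", "market", "##ing"}

print(wordpiece_tokenize("readers", toy_vocab))    # ['read', '##er', '##s']
print(wordpiece_tokenize("marketing", toy_vocab))  # ['market', '##ing']
print(wordpiece_tokenize("xyz", toy_vocab))        # ['[UNK]']
```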

#### Exercise 5

BERT (and DistilBERT) handles sequences with a maximum length of 512 tokens (`MAX_LEN=512`). Check whether this length is appropriate for our task: look at the distribution of the lengths of the texts we need to classify, and decide whether reducing `MAX_LEN` would be beneficial in order to decrease the processing time of the sequences.
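
One possible way to reason about this is to pick the smallest length that covers a chosen fraction of the sequences. The sketch below uses whitespace tokens plus two special tokens as a stand-in for real token counts; `choose_max_len`, `toy_sentences`, and the coverage values are illustrative assumptions. In the notebook, the lengths would rather come from the tokenizer, for example `len(tokenizer(s).input_ids)` for each sentence `s`.

```python
import math

def choose_max_len(lengths, coverage=1.0):
    """Smallest length L such that at least `coverage` of the sequences have length <= L."""
    ordered = sorted(lengths)
    k = max(1, math.ceil(coverage * len(ordered)))
    return ordered[k - 1]

toy_sentences = [
    "Profit rose .",
    "The operating margin came down to 2.4 % from 5.7 % .",
    "Sales were flat compared with the previous quarter .",
    "The company did not disclose the price of the deal .",
]

# Stand-in for real token counts: whitespace tokens + 2 special tokens
toy_lengths = [len(s.split()) + 2 for s in toy_sentences]

print(toy_lengths)                        # [5, 14, 11, 13]
print(choose_max_len(toy_lengths, 1.0))   # 14: no sentence is truncated
print(choose_max_len(toy_lengths, 0.75))  # 13: the longest sentence would be truncated
```

A histogram of the real lengths (for instance with `plt.hist`) gives the same information visually.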

In [None]:
# WRITE CODE TO ANSWER THE QUESTION HERE.  PRINT THE ANSWER OR WRITE IT IN A TEXT CELL BELOW

Now that we understand how the tokenizer works, we can write a Dataset class that will serve us for training and testing. Indeed, the Python classes that handle batch creation and training require a Dataset-type class as input, like the one below.

**Note**: Change `MAX_LEN` if you found a value in the previous exercise that is more advantageous in terms of computation time.

In [None]:
from torch.utils.data import Dataset, DataLoader

# MAX_LEN =

class MyDataset(Dataset):
    def __init__(self, sentences, labels, tokenizer, max_len):
        # variables that are set when the class is instantiated
        self.sentences = sentences
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_len = max_len

    def __len__(self):
        return len(self.sentences)

    def __getitem__(self, item):
        # select the sentence and its class
        sentence = str(self.sentences[item])
        label = self.labels[item]
        # tokenize the sentence
        tokenizer_out = self.tokenizer(
            sentence,
            add_special_tokens=True,
            max_length=self.max_len,
            return_token_type_ids=False,
            padding='max_length',  # replaces the deprecated pad_to_max_length=True
            return_attention_mask=True,
            return_tensors='pt',
            truncation=True
            )
        # return a dictionary with the output of the tokenizer and the label
        return  {
            'input_ids': tokenizer_out['input_ids'].flatten(),
            'attention_mask': tokenizer_out['attention_mask'].flatten(),
            'label': torch.tensor(label, dtype=torch.long)
        }


# instantiate two MyDataset objects
train_dataset = MyDataset(x_train, y_train, tokenizer, MAX_LEN)
test_dataset = MyDataset(x_test, y_test, tokenizer, MAX_LEN)

In [None]:
train_dataset[0]

## Getting to know the model

It’s time to understand how DistilBERT works. Hugging Face offers us several pre-trained DistilBERT models; we will use [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased), which has a deep architecture with 66,362,880 parameters and was pre-trained on the same data as BERT (BookCorpus and English Wikipedia) for about 90 hours using eight 16 GB GPUs. This will allow us to achieve very good results by attaching a very lightweight classifier (a single layer) on top of DistilBERT, which will do most of the work, that is, encoding our texts. Our classifier will only require training for a few minutes.

We can find the names of all the other pre-trained models offered by Hugging Face [at this address](https://huggingface.co/models).

To download a pre-trained model, we use the `.from_pretrained` method:


In [None]:
from transformers import DistilBertModel

PRE_TRAINED_MODEL_NAME = 'distilbert-base-uncased'

distilbert = DistilBertModel.from_pretrained(PRE_TRAINED_MODEL_NAME)
distilbert

We can encode the text sequences with a forward pass through DistilBERT, which returns the hidden representations from the last layer.

**Question**: Describe the model. How many layers does it have? What is the dimension of those layers?


**Note**: We use the `.unsqueeze(0)` function because normally we pass batches to the model, while this time it is just a single element from the *MyDataset* class. By using `.unsqueeze(0)`, we treat it as a batch of size 1.

In [None]:
first_sent = train_dataset[0]

hidden_state = distilbert(
    input_ids=first_sent['input_ids'].unsqueeze(0), attention_mask=first_sent['attention_mask'].unsqueeze(0)
    )

hidden_state[0].shape

In [None]:
distilbert.config

**Question**: What is the size of the vocabulary?

#### Exercise 6

We now have the necessary elements to build a multi-class classification model based on DistilBERT.

Regarding the code below:

1. What is the function of this code snippet?
   ```Python
   if freeze_encoder:
       for param in self.encoder.parameters():
           param.requires_grad = False
   ```

2. Why do we only keep the hidden representation of the first token of each sequence with the command `pooled = hidden_state[:, 0]`?

3. Complete the code under `if labels is not None:` in order to calculate the loss function of the model when targets are passed to the `forward` function.
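
For question 3, it may help to see what a cross-entropy loss computes from raw logits. The sketch below is a plain-Python worked version on made-up numbers: for each example, the loss is the negative log of the softmax probability assigned to the target class, and the batch loss is the mean over examples (the default behaviour of `nn.CrossEntropyLoss`). The function name and the toy values are assumptions.

```python
import math

def cross_entropy(logits_batch, targets):
    """Mean over the batch of -log softmax(logits)[target]."""
    total = 0.0
    for logits, target in zip(logits_batch, targets):
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[target]
    return total / len(targets)

# Two hypothetical examples with 3 classes each
toy_logits = [[2.0, 0.5, -1.0],   # fairly confident in class 0
              [0.0, 0.0, 0.0]]    # no preference between the classes
toy_targets = [0, 2]

print(cross_entropy(toy_logits, toy_targets))
```

The second example contributes `log(3)`, the loss of a uniform guess over three classes, which is a handy sanity check for the first training steps.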


In [None]:
# See: https://github.com/huggingface/transformers/blob/main/src/transformers/models/distilbert/modeling_distilbert.py#L1304

In [None]:
import torch
import torch.nn as nn
from transformers import DistilBertPreTrainedModel

PRE_TRAINED_MODEL_NAME = 'distilbert-base-uncased'
FREEZE_PRETRAINED_MODEL = True

class DistilBertForSentimentClassification(DistilBertPreTrainedModel):
    """A lightweight DistilBERT classifier (3-way sentiment).

    We reuse DistilBERT as an encoder and add a small classification head.
    """
    def __init__(self, config, num_labels, freeze_encoder=False):
        super().__init__(config)
        self.num_labels = num_labels

        # DistilBERT encoder (pretrained)
        self.encoder = DistilBertModel.from_pretrained(PRE_TRAINED_MODEL_NAME)

        # Optional: freeze encoder to train only the head
        if freeze_encoder:
            for p in self.encoder.parameters():
                p.requires_grad = False

        # Classification head (similar to HF's DistilBertForSequenceClassification)
        self.pre_classifier = nn.Linear(config.dim, config.dim)
        self.classifier = nn.Linear(config.dim, num_labels)
        self.dropout = nn.Dropout(config.seq_classif_dropout)

        self.post_init()

    def forward(
        self,
        input_ids=None,
        attention_mask=None,
        labels=None,
        output_attentions=None,
        output_hidden_states=None,
        return_dict=True,
        **kwargs,
    ):
        # Encode
        outputs = self.encoder(
            input_ids=input_ids,
            attention_mask=attention_mask,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=True,
        )

        hidden_state = outputs.last_hidden_state  # (bs, seq_len, dim)
        pooled = hidden_state[:, 0]               # CLS token representation
        pooled = self.pre_classifier(pooled)
        pooled = nn.ReLU()(pooled)
        pooled = self.dropout(pooled)
        logits = self.classifier(pooled)

        loss = None
        if labels is not None:
            loss_fct = nn.CrossEntropyLoss()
            loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))

        if not return_dict:
            output = (logits, outputs.hidden_states, outputs.attentions)
            return ((loss,) + output) if loss is not None else output

        return {
            'loss': loss,
            'logits': logits,
            'hidden_states': outputs.hidden_states,
            'attentions': outputs.attentions,
        }

# instantiate model
model = DistilBertForSentimentClassification(
    config=distilbert.config,
    num_labels=len(classes),
    freeze_encoder=FREEZE_PRETRAINED_MODEL,
)

# print info about model's parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print('model total params: ', total_params)
print('model trainable params: ', trainable_params)
print('\n', model)


**Question**: what does the line `pooled = hidden_state[:, 0]` do? *Hint*: it selects a token. What is specific about it?

## Training

*Let's train our classifier!*

The time has come. Fortunately, *HuggingFace* provides us with the necessary Python class to handle the training: `Trainer`. We pass it the hyperparameters using the `TrainingArguments` class. We will also pass it the metrics we want for model evaluation during the test phase:

1. accuracy
2. precision
3. recall
4. f1


In [None]:
# clean logs and results directory from old files
# this could be useful if we were running the cells below multiple times
!rm -rf ./logs ./results

In [None]:
from transformers import Trainer, TrainingArguments
from sklearn.metrics import precision_recall_fscore_support, accuracy_score

def compute_metrics(pred):
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)

    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average='macro', zero_division=0
    )
    acc = accuracy_score(labels, preds)
    return {
        'accuracy': acc,
        'precision': precision,
        'recall': recall,
        'f1': f1,
    }

training_args = TrainingArguments(
    output_dir='./results',
    evaluation_strategy='epoch',
    save_strategy='epoch',
    logging_dir='./logs',
    logging_first_step=True,
    logging_steps=50,
    num_train_epochs=16,
    per_device_train_batch_size=8,
    learning_rate=5e-5,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    compute_metrics=compute_metrics,
)


To visualize the evolution of the *loss* throughout the training, we can use *TensorBoard*:

In [None]:
%tensorboard --logdir logs

In [None]:
train_results = trainer.train()

In [None]:
test_results = trainer.predict(test_dataset=test_dataset)

**Question**: complete the following.

In [None]:
print('Predictions: \n', test_results.predictions)
print('\nAccuracy: ', test_results.metrics.get('test_accuracy'))
print('Precision (macro): ', test_results.metrics.get('test_precision'))
print('Recall (macro): ', test_results.metrics.get('test_recall'))
print('F1 (macro): ', test_results.metrics.get('test_f1'))
print('\nClasses:', classes)


If we did things correctly, the performance (*accuracy*) of our model in classifying sentences it has never seen is around 70.0%. This is not very satisfying when we consider:

1. that the class distribution in our test set was not balanced (see Exercise 2)
2. that a naïve Bayes classification model achieves similar or even better accuracy (see Exercise 3)


**Question**: visualize the prediction of the model using a confusion matrix
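
As a starting point, the sketch below builds a confusion matrix by hand from made-up class ids; `confusion_matrix` here is a local helper, and `toy_true` / `toy_pred` are not model outputs. In the notebook, the predicted ids could be obtained with `test_results.predictions.argmax(-1)` and the true ids from `test_results.label_ids`; scikit-learn's `confusion_matrix` and seaborn's `heatmap` are one way to compute and plot the real matrix.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """matrix[i][j] = number of examples of true class i predicted as class j."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

# Hypothetical class ids for 7 examples and 3 classes
toy_true = [0, 0, 1, 1, 2, 2, 2]
toy_pred = [0, 1, 1, 1, 2, 0, 2]

cm = confusion_matrix(toy_true, toy_pred, 3)
for row in cm:
    print(row)
# [1, 1, 0]
# [0, 2, 0]
# [1, 0, 2]
```

The diagonal counts the correct predictions; off-diagonal cells show which classes are confused with each other.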

### Improving performance

#### Exercise 7

1. How can we improve the model's result?
2. Retrain the model by tuning the hyperparameters and calculate its performance using the metrics previously used. What _accuracy_ and _F1_ scores do you achieve?

**Hint**: consider letting the model improve the sentence encoding...

In [None]:
# WRITE CODE TO ANSWER THE QUESTION HERE.  PRINT THE ANSWER OR WRITE IT IN A TEXT CELL BELOW

If we are satisfied with the performance of our model, we can save it to use in the next exercise:


In [None]:
MODEL_PATH = './my_model'
trainer.save_model(MODEL_PATH)

## Predictions

#### Exercise 8

Using the model we just saved, predict the class to which the following sentences belong:

````
  "CocaCola saw its share price dropping of more than 25% this semester.",
  "Despite most of the company's sales are taking place in China, the shareholders decided not to relocate the production there.",
  "This year's profits quintuplicated with respect to the last year."
````

1. What classes did the model predict?
2. With what probability?
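
For question 2, recall that the model outputs one raw score (logit) per class, and that a softmax turns these scores into probabilities. The sketch below uses made-up logits and placeholder class names (`toy_logits`, `toy_class_names`); in the notebook, the logits would come from the saved model applied to the tokenized sentences, and the names from the label mapping used during training.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

toy_class_names = ["class_a", "class_b", "class_c"]  # placeholder names
toy_logits = [1.2, -0.3, 3.1]                        # hypothetical model output

probs = softmax(toy_logits)
best = max(range(len(probs)), key=probs.__getitem__)  # index of the most probable class
print(toy_class_names[best], round(probs[best], 3))
```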

In [None]:
# WRITE CODE TO ANSWER THE QUESTION HERE.  PRINT THE ANSWER OR WRITE IT IN A TEXT CELL BELOW


## Try another model

#### Exercise 9 (bonus)

Experiment with other pre-trained models. The list of pre-trained models is available here: https://huggingface.co/models.

- What did you try? Leave your code in the cells below.
- What did you learn?