# Week 9: Sentence Level Classification with BERT

Your goal this week is to train a classifier that can predict the CEFR level of any given sentence. In this notebook we will guide you through using 🤗[Hugging Face](https://huggingface.co/) and its `transformers` library as the training framework, with [PyTorch](https://pytorch.org/) as the deep learning backend. Feel free to use [TensorFlow](https://www.tensorflow.org) instead if that's what you are more familiar with.

For this assignment we provide a dataset of sentences labeled with their CEFR levels. Your task is to fine-tune BERT on this dataset to build a sentence classifier.

## Prepare your environment

As always, we highly recommend that you install all packages with a virtual environment manager, like [venv](https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/) or [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html), to prevent version conflicts of different packages.  

### Install CUDA
Deep learning is computationally intensive. Training on the CPU alone takes a long time, especially on a large dataset, which is why using a GPU is generally recommended.  
To use a GPU for computation, you have to install the [CUDA toolkit](https://developer.nvidia.com/cuda-toolkit) as well as the [cuDNN library](https://developer.nvidia.com/cudnn) provided by NVIDIA.  

If you already have CUDA installed on your machine, then great! You're done here.  
If you don't, refer to [Appendix 1](#Appendix-1-Install-CUDA) to see how to install it.


### Install Python packages
The following Python packages will be used in this tutorial:

1. `numpy`: for matrix operations
2. `scikit-learn`: for label encoding
3. `datasets`: for data preparation
4. `transformers`: for model loading and fine-tuning
5. `torch` (PyTorch): the backend DL framework
  - Note that the PyTorch version must support the CUDA version you've installed if you want to use a GPU.
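
Before moving on, it can help to confirm that the packages are importable in your environment. The following is a minimal sketch using only the standard library; `missing_modules` is a hypothetical helper, and the names in `REQUIRED_MODULES` are the import names (for example `sklearn` for `scikit-learn` and `torch` for PyTorch), which differ from some of the package names above.

```python
from importlib.util import find_spec

# Import names of the packages used in this tutorial.
REQUIRED_MODULES = ["numpy", "sklearn", "datasets", "transformers", "torch"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", missing if missing else "none")
```

If anything is reported as missing, install it in your virtual environment before continuing.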

### Select GPU(s) for your backend

Skip this section if you have no intention of using a GPU with TensorFlow/PyTorch.

In [1]:
import os

# select your GPU. Note that this should be set before you load tensorflow or pytorch.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# To use multiple GPUs, combine all GPU ID with commas
# e.g. >>> os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,3"
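
If you switch between machines with different GPU layouts, a small helper can build the comma-separated string for you. This is only a sketch: `select_gpus` is a hypothetical function name, and like the cell above it must run before TensorFlow or PyTorch is imported.

```python
import os


def select_gpus(gpu_ids):
    """Expose only the given GPU IDs to CUDA and return the resulting setting."""
    value = ",".join(str(gpu_id) for gpu_id in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


print(select_gpus([0, 1, 3]))  # 0,1,3
```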

In [2]:
import torch
# Check whether a GPU is available to PyTorch
torch.cuda.is_available()

True

## Prepare the dataset

Before starting the training, we need to load and process our dataset - but wait, let's decide which model we want to use first.  

On the off chance that you've never heard of it, [BERT](https://arxiv.org/abs/1810.04805) (**B**idirectional **E**ncoder **R**epresentations from **T**ransformers) is a language model proposed by Google AI in 2018, and it's currently one of the most popular models used in NLP.  
You can learn more about it here:
- [BERT Explained: A Complete Guide with Theory and Tutorial](https://towardsml.com/2019/09/17/bert-explained-a-complete-guide-with-theory-and-tutorial/) by Samia, 2019.


However, we will not use BERT directly in this tutorial, because it's large and takes too long to train. Instead, we'll be using [DistilBERT](https://medium.com/huggingface/distilbert-8cf3380435b5), a lightweight version of BERT that retains about 95% of the original model's performance.




In [3]:
# the model you want to use. Available models can be found here: https://huggingface.co/models
MODEL_NAME = "distilbert-base-uncased"

### Load data

Like the `transformers` library, `datasets` is also a package by Hugging Face. It gives access to many public datasets and can help us with data processing.  
We can use the `load_dataset` function to read the input `.csv` file provided for this assignment.

Reference:
 - [Official datasets document](https://huggingface.co/docs/datasets)
 - [datasets.load_dataset](https://huggingface.co/docs/datasets/loading.html)

In [4]:
from datasets import load_dataset

dataset = load_dataset("csv", data_files="data")
dataset

Using custom data configuration default-942e063d8014240f
Found cached dataset csv (/home/bill/.cache/huggingface/datasets/csv/default-942e063d8014240f/0.0.0/6b34fb8fcf56f7c8ba51dc895bfa2bfbe43546f190a60fcf74bb5e8afdcc2317)


  0%|          | 0/1 [00:00<?, ?it/s]

DatasetDict({
    train: Dataset({
        features: ['text', 'level'],
        num_rows: 23020
    })
})

In [5]:
print(dataset["train"])
print(dataset["train"][1])

Dataset({
    features: ['text', 'level'],
    num_rows: 23020
})
{'text': "Unfortunately he was too fast and I couldn't keep up with him.", 'level': 'B2'}


In [6]:
print(dataset["train"]["text"][:5])
print(dataset["train"]["level"][:5])

['No longer a remote, backward, unimportant country, it became a force to be reckoned with in Europe.', "Unfortunately he was too fast and I couldn't keep up with him.", 'Most mushrooms are totally harmless, but some are poisonous.', 'This provided solid evidence that he committed the crime.', "You can't just accept everything you read in the newspapers at face value."]
['C2', 'B2', 'B2', 'C2', 'C1']


### Preprocessing

As always, texts should be tokenized, converted to token IDs, and padded before being fed to the model.  
But not to worry, there are libraries from Hugging Face to help with this, too.

#### Sentence processing

Different pre-trained language models use different preprocessing (vocabulary, special tokens, and so on), which is why we should use the tokenizer that was trained along with the model. In our case, we are using DistilBERT, so we should use the DistilBERT tokenizer.  

With Hugging Face, loading different tokenizers is extremely easy: just import `AutoTokenizer` from `transformers` and tell it which model you plan to use, and it will handle everything for you.

Reference:
 - [transformers.AutoTokenizer](https://huggingface.co/docs/transformers/master/en/model_doc/auto#transformers.AutoTokenizer)

In [7]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer("Unfortunately he was too fast and I couldn't keep up with him.")

{'input_ids': [101, 6854, 2002, 2001, 2205, 3435, 1998, 1045, 2481, 1005, 1056, 2562, 2039, 2007, 2032, 1012, 102], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}

#### Label processing

Our labels also need to be processed, so let's do that next.

For this tutorial, we'll use the `OneHotEncoder` provided by scikit-learn.

For now, just declare a new encoder and call `fit` so it learns the set of levels in the data. Hint: you should end up with 6 labels.

Documents:
 - [sklearn.preprocessing.OneHotEncoder](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder)

In [8]:
import pandas as pd

df = pd.DataFrame(dataset["train"])
df.head()

|   | text | level |
|---|------|-------|
| 0 | No longer a remote, backward, unimportant coun... | C2 |
| 1 | Unfortunately he was too fast and I couldn't k... | B2 |
| 2 | Most mushrooms are totally harmless, but some ... | B2 |
| 3 | This provided solid evidence that he committed... | C2 |
| 4 | You can't just accept everything you read in t... | C1 |


In [9]:
df[["level"]].head()

|   | level |
|---|-------|
| 0 | C2 |
| 1 | B2 |
| 2 | B2 |
| 3 | C2 |
| 4 | C1 |


In [10]:
from sklearn.preprocessing import OneHotEncoder

encoder = OneHotEncoder().fit(df[["level"]])
encoder.categories_

[array(['A1', 'A2', 'B1', 'B2', 'C1', 'C2'], dtype=object)]

In [11]:
# check if you still have 6 labels
LABELS_NUM = len(encoder.categories_[0])
LABELS_NUM

6

In [12]:
encoded_levels = encoder.transform(df[["level"]]).toarray()
encoded_levels[:5]

array([[0., 0., 0., 0., 0., 1.],
       [0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 1.],
       [0., 0., 0., 0., 1., 0.]])

In [13]:
encoder.transform(pd.DataFrame({"level": ["B2"]})).toarray()[0]

array([0., 0., 0., 1., 0., 0.])
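
To make the encoding less of a black box, here is a minimal pure-Python sketch of what one-hot encoding does for a single level. It is an illustration only, not how scikit-learn implements it; `one_hot` is a hypothetical helper, and `LEVELS` is assumed to list the categories in the same sorted order the encoder reported above.

```python
# Categories in the same order as encoder.categories_[0] above.
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def one_hot(level, categories=LEVELS):
    """Return a vector with 1.0 at the position of `level` and 0.0 elsewhere."""
    vector = [0.0] * len(categories)
    vector[categories.index(level)] = 1.0
    return vector


print(one_hot("B2"))  # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```

The position of the `1.0` is the index of the level in the category list, which is also what we rely on later when mapping predictions back to labels.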

#### Process the data

To make things easier, we can write a function to process our dataset in batches. 

In [14]:
def preprocess(dataslice):
  """ Tokenize a batch of sentences and one-hot encode their levels.

      Input: a batch of your dataset
      Example: { 'text': ['sentence1', 'sentence2', ...],
                 'level': ['label1', 'label2', ...] }

      Output: a batch of processed dataset
      Example: { 'input_ids': ...,
                 'attention_mask': ...,
                 'label': ... }
  """
  tokenized_inputs = tokenizer(dataslice["text"])
  # encode the whole batch of levels at once instead of one row at a time
  levels = pd.DataFrame({"level": dataslice["level"]})
  tokenized_inputs["label"] = encoder.transform(levels).toarray().tolist()
  return tokenized_inputs

In [15]:
# map the function to the whole dataset
processed_data = dataset.map(preprocess, batched = True)
processed_data

Loading cached processed dataset at /home/bill/.cache/huggingface/datasets/csv/default-942e063d8014240f/0.0.0/6b34fb8fcf56f7c8ba51dc895bfa2bfbe43546f190a60fcf74bb5e8afdcc2317/cache-f6760c99273257a6.arrow


DatasetDict({
    train: Dataset({
        features: ['text', 'level', 'input_ids', 'attention_mask', 'label'],
        num_rows: 23020
    })
})

In [16]:
processed_data['train'][1]

{'text': "Unfortunately he was too fast and I couldn't keep up with him.",
 'level': 'B2',
 'input_ids': [101,
  6854,
  2002,
  2001,
  2205,
  3435,
  1998,
  1045,
  2481,
  1005,
  1056,
  2562,
  2039,
  2007,
  2032,
  1012,
  102],
 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
 'label': [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]}

### DataCollator

You might have noticed that we skipped padding the sentences. That's because we are going to do it during training, so that each batch is padded only to the length of its own longest sentence.  

To do training-time processing, we can use the data collator classes provided by `transformers`. And guess what - `transformers` has a class that will handle padding for us, too!

 - [transformers.DataCollatorWithPadding](https://huggingface.co/docs/transformers/master/en/main_classes/data_collator#transformers.DataCollatorWithPadding)

In [17]:
# declare a collator to do padding during training
from transformers import DataCollatorWithPadding

data_collator = DataCollatorWithPadding(tokenizer)
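
Conceptually, padding a batch means extending every sequence to the length of the longest one and marking the added positions in the attention mask. The sketch below illustrates that idea in plain Python; it is not the `DataCollatorWithPadding` implementation, `pad_batch` is a hypothetical helper, and the padding ID of 0 is an assumption that matches the `pad_token_id` shown in the saved model configuration later in this notebook.

```python
def pad_batch(batch_input_ids, pad_token_id=0):
    """Right-pad every sequence to the longest length in the batch."""
    max_len = max(len(ids) for ids in batch_input_ids)
    input_ids, attention_mask = [], []
    for ids in batch_input_ids:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_token_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = pad_batch([[101, 6854, 102], [101, 2002, 2001, 2205, 102]])
print(batch["input_ids"])       # [[101, 6854, 102, 0, 0], [101, 2002, 2001, 2205, 102]]
print(batch["attention_mask"])  # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```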

## Training

Finally, we can move on to training.

### Preparation

We can load the pretrained model from `transformers`.  
Generally, you need to build your own model on top of BERT if you want to use it for a downstream task, but again, sequence classification is a common task. With support from the `transformers` library, it can be done in two lines of code:

1. Import the `AutoModelForSequenceClassification` class.
2. Load the pretrained model.

In [18]:
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels = LABELS_NUM)

Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_projector.weight', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.bias', 'vocab_transform.weight', 'vocab_transform.bias']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'classifier.bias', 'classifier

#### Split train/val data

The `Dataset` class we prepared before has a `train_test_split` method. You can use it to split your (processed) dataset.

Document:
 - [datasets.Dataset - Sort, shuffle, select, split, and shard](https://huggingface.co/docs/datasets/process.html#sort-shuffle-select-split-and-shard)

In [19]:
# choose a validation size and split your data
train_val_dataset = processed_data["train"].train_test_split(test_size=0.1, seed=42)

Loading cached split indices for dataset at /home/bill/.cache/huggingface/datasets/csv/default-942e063d8014240f/0.0.0/6b34fb8fcf56f7c8ba51dc895bfa2bfbe43546f190a60fcf74bb5e8afdcc2317/cache-f50113bbaf795ae7.arrow and /home/bill/.cache/huggingface/datasets/csv/default-942e063d8014240f/0.0.0/6b34fb8fcf56f7c8ba51dc895bfa2bfbe43546f190a60fcf74bb5e8afdcc2317/cache-325c892c50db8cac.arrow


In [20]:
print(train_val_dataset)

DatasetDict({
    train: Dataset({
        features: ['text', 'level', 'input_ids', 'attention_mask', 'label'],
        num_rows: 20718
    })
    test: Dataset({
        features: ['text', 'level', 'input_ids', 'attention_mask', 'label'],
        num_rows: 2302
    })
})


#### Setup training parameters

We are using the Trainer API to do the training. `Trainer` is yet another utility provided by Hugging Face, which helps you train the model with ease.  

Document:
- [transformers.TrainingArguments](https://huggingface.co/docs/transformers/master/en/main_classes/trainer#transformers.TrainingArguments)
- [transformers.Trainer](https://huggingface.co/docs/transformers/master/en/main_classes/trainer#transformers.Trainer)

In [21]:
from transformers import TrainingArguments, Trainer

In [22]:
# set and tune your training properties
OUTPUT_DIR = "./trained_model/"
LEARNING_RATE = 1e-5
BATCH_SIZE = 128
EPOCH = 20
training_args = TrainingArguments(
  output_dir=OUTPUT_DIR,
  learning_rate=LEARNING_RATE,
  per_device_train_batch_size=BATCH_SIZE,
  per_device_eval_batch_size=BATCH_SIZE,
  num_train_epochs=EPOCH
)

# now give all the information to a trainer
trainer = Trainer(
  model=model,
  args=training_args,
  train_dataset=train_val_dataset["train"],
  eval_dataset=train_val_dataset["test"],
  data_collator=data_collator,  # pad each batch during training
  tokenizer=tokenizer
)

### Training

This is the easy part. Simply ask the trainer to train the model for you!

In [23]:
train_model = True
if train_model:
  trainer.train()

The following columns in the training set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text, level. If text, level are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 20718
  Num Epochs = 20
  Instantaneous batch size per device = 128
  Total train batch size (w. parallel, distributed & accumulation) = 128
  Gradient Accumulation steps = 1
  Total optimization steps = 3240
  Number of trainable parameters = 66958086
You're using a DistilBertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Step | Training Loss |
|------|---------------|
| 500  | 0.3864 |
| 1000 | 0.2996 |
| 1500 | 0.252 |
| 2000 | 0.2176 |
| 2500 | 0.193 |
| 3000 | 0.179 |


Saving model checkpoint to ./trained_model/checkpoint-500
Configuration saved in ./trained_model/checkpoint-500/config.json
Model weights saved in ./trained_model/checkpoint-500/pytorch_model.bin
tokenizer config file saved in ./trained_model/checkpoint-500/tokenizer_config.json
Special tokens file saved in ./trained_model/checkpoint-500/special_tokens_map.json
Saving model checkpoint to ./trained_model/checkpoint-1000
Configuration saved in ./trained_model/checkpoint-1000/config.json
Model weights saved in ./trained_model/checkpoint-1000/pytorch_model.bin
tokenizer config file saved in ./trained_model/checkpoint-1000/tokenizer_config.json
Special tokens file saved in ./trained_model/checkpoint-1000/special_tokens_map.json
Saving model checkpoint to ./trained_model/checkpoint-1500
Configuration saved in ./trained_model/checkpoint-1500/config.json
Model weights saved in ./trained_model/checkpoint-1500/pytorch_model.bin
tokenizer config file saved in ./trained_model/checkpoint-1500/token
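
The `Total optimization steps = 3240` line in the training log follows directly from the dataset size, batch size, and number of epochs. Here is a small worked sketch, assuming a single device and no gradient accumulation (as the log reports); `total_optimization_steps` is a hypothetical helper.

```python
import math


def total_optimization_steps(num_examples, batch_size, epochs):
    """Number of weight updates for one device without gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / batch_size)  # last batch may be smaller
    return steps_per_epoch * epochs


# 20,718 training sentences, batch size 128, 20 epochs
print(total_optimization_steps(20718, 128, 20))  # 3240
```

This kind of estimate is handy when tuning `BATCH_SIZE` and `EPOCH`, since it tells you roughly how long a run will be before you start it.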

### Save for future use

Hint: try using `save_pretrained`

In [24]:
save_model = True  # set to False to skip saving, e.g. when re-running the notebook
if save_model:
  model.save_pretrained(OUTPUT_DIR)

Configuration saved in ./trained_model/config.json
Model weights saved in ./trained_model/pytorch_model.bin


## Prediction

Now we know exactly how to train a model, but how do we use it for predicting results?

### Load finetuned model

In [25]:
# load the model that you saved

mymodel = AutoModelForSequenceClassification.from_pretrained(OUTPUT_DIR)

loading configuration file ./trained_model/config.json
Model config DistilBertConfig {
  "_name_or_path": "./trained_model/",
  "activation": "gelu",
  "architectures": [
    "DistilBertForSequenceClassification"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4",
    "5": "LABEL_5"
  },
  "initializer_range": 0.02,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3,
    "LABEL_4": 4,
    "LABEL_5": 5
  },
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "problem_type": "multi_label_classification",
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights_": true,
  "torch_dtype": "float32",
  "transformers_version": "4.24.0",
  "vocab_size": 30522
}

loading weights file ./trained_model/py

### Get the prediction

Here are a few example sentences:

In [26]:
examples = [
    # A2
    "Remember to write me a letter.",
    # B2
    "Strawberries and cream - a perfect combination.",
    "This so-called \"Perfect Evening\" was so disappointing, as well as discouraging us from coming to your Circle Theatre again.",
    # C1
    "Some may altogether give up their studies, which I think is a disastrous move.",
]

All we need to do is tokenize them into token IDs, and then we can get predictions by calling your fine-tuned model.  

Since we don't have a DataCollator to pad the sentences and convert them into tensors this time, we have to ask the tokenizer to pad and return tensors ourselves.

In [27]:
# Tokenize the sentences into padded tensors
model_input = tokenizer(examples, truncation=True, padding=True, return_tensors="pt")
# Get the output
logits = mymodel(**model_input).logits
logits

tensor([[-0.7418,  0.7113, -3.2214, -4.4236, -4.0650, -4.0229],
        [-5.7855, -5.3816, -3.4287,  1.6458, -2.5465, -2.3510],
        [-5.5406, -4.9264, -1.6752,  1.4860, -3.0862, -4.9147],
        [-5.9225, -5.8372, -4.5457,  1.4580, -1.6507, -3.1034]],
       grad_fn=<AddmmBackward0>)

Logits aren't very readable for us. Let's apply the softmax function to transform them into probability-like numbers.

In [28]:
from torch import nn

predicted_probabilities = nn.functional.softmax(logits, dim=-1)
predicted_probabilities

tensor([[1.8318e-01, 7.8337e-01, 1.5347e-02, 4.6124e-03, 6.6013e-03, 6.8857e-03],
        [5.6897e-04, 8.5212e-04, 6.0065e-03, 9.6041e-01, 1.4514e-02, 1.7646e-02],
        [8.4011e-04, 1.5527e-03, 4.0092e-02, 9.4616e-01, 9.7791e-03, 1.5710e-03],
        [5.8867e-04, 6.4106e-04, 2.3323e-03, 9.4439e-01, 4.2179e-02, 9.8666e-03]],
       grad_fn=<SoftmaxBackward0>)
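
To see what softmax does to a single row, the sketch below recomputes the first row of the output above by hand with the standard library. It is for illustration only; in the notebook you should keep using `nn.functional.softmax`. The `softmax` helper here is our own, and the logits are copied from the first example sentence.

```python
import math


def softmax(logits):
    """Exponentiate each logit and normalize so the values sum to 1."""
    highest = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


first_row = [-0.7418, 0.7113, -3.2214, -4.4236, -4.0650, -4.0229]
probabilities = softmax(first_row)
print([round(p, 4) for p in probabilities])
print("most likely index:", probabilities.index(max(probabilities)))
```

The largest probability sits at index 1, which corresponds to 'A2' in the category list.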

#### Transform logits back to labels

Now you've got the output. Write a function to map it back into labels!

In [29]:
categories = encoder.categories_[0]
categories

array(['A1', 'A2', 'B1', 'B2', 'C1', 'C2'], dtype=object)

In [30]:
import numpy as np

def get_labels(predicted_probabilities):
  return [categories[label_idx] for label_idx in np.argmax(predicted_probabilities.detach().numpy(), axis=1)]

get_labels(predicted_probabilities)

['A2', 'B2', 'B2', 'B2']

## Evaluation

Let's see how you did!  
Load the test data (here, the held-out split we created earlier) and calculate your accuracy.

We want you to calculate the three kinds of accuracy mentioned in the lecture, which will also be explained in the following section.

In [31]:
# load test data
examples = train_val_dataset["test"]["text"]
examples[:5]

['She is of French nationality.',
 'Ice skating, as an Olympic competition, was introduced in 1908.',
 'So, these two experiences from my childhood taught me a lot of real truth about life and since that time they have been serving me as a measure of my affection or attachment, sorrow or disappointment.',
 'Thankfully, no one was harmed in the accident.',
 'He had a narrow escape when a falling tree crushed his car.']

In [32]:
# preprocess
model_input = tokenizer(examples, truncation=True, padding=True, return_tensors="pt")
logits = mymodel(**model_input).logits
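
The cell above sends all 2,302 sentences through the model in a single call with gradient tracking enabled, which can use a lot of memory. If you run into memory problems, one option is to predict in mini-batches without gradients. The following is a sketch under that assumption: `iter_batches` and `predict_logits` are hypothetical helpers, and the batch size of 64 is an arbitrary choice.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def predict_logits(sentences, tokenizer, model, batch_size=64):
    """Run `model` over `sentences` in mini-batches without tracking gradients."""
    import torch  # imported here so iter_batches has no dependencies

    model.eval()  # make sure dropout is disabled for inference
    chunks = []
    with torch.no_grad():
        for batch in iter_batches(sentences, batch_size):
            model_input = tokenizer(batch, truncation=True, padding=True, return_tensors="pt")
            chunks.append(model(**model_input).logits)
    return torch.cat(chunks, dim=0)


# 2,302 sentences with batch size 64 are processed in 36 batches
print(len(list(iter_batches(list(range(2302)), 64))))  # 36
```

With these helpers, `logits = predict_logits(examples, tokenizer, mymodel)` would be a drop-in alternative to the cell above.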

In [33]:
# get predictions
predicted_probabilities = nn.functional.softmax(logits, dim=-1)

In [34]:
# transform predictions back into labels
predicted_labels = get_labels(predicted_probabilities)

In [35]:
# print a few predictions to check that the outputs look reasonable; do this after each step to see whether your model needs adjusting
for idx, (sent, level) in enumerate(zip(examples, predicted_labels)):
  if idx >= 10:
    break
  print(f'{level}: {sent}') 

B1: She is of French nationality.
B2: Ice skating, as an Olympic competition, was introduced in 1908.
C2: So, these two experiences from my childhood taught me a lot of real truth about life and since that time they have been serving me as a measure of my affection or attachment, sorrow or disappointment.
C1: Thankfully, no one was harmed in the accident.
C2: He had a narrow escape when a falling tree crushed his car.
B2: The theme of loss runs through most of his novels.
C2: First of all, some people treasure certain possessions since they are valuable in its literal meaning, like diamonds or gold.
A2: I must look like the typical tourist with my shorts and my camera.
C1: It might be that the object in question reminds the owner of a beloved person, a deceased relative, a lost love, or a trip with his or her spouse.
C2: Undoubtedly, the human rights' defenders would protest.


### Six Level Accuracy

Six level accuracy, or exact accuracy, is probably the metric you're most familiar with:

$
accuracy = \frac{\#exactly\:the\:same\:levels}{\#total}
$

Example:
```
Prediction:   A1 A2 B1 B2 C1 C2
Ground truth: A2 B1 B1 B2 B2 C2
                    ^  ^     ^
```

The six level accuracy is $\frac{3}{6} = 0.5$

As a requirement, <u>your exact accuracy should be higher than $0.5$</u>.

In [36]:
# calculate accuracy
corrects_num = 0
for idx in range(len(predicted_labels)):
  if predicted_labels[idx] == train_val_dataset["test"]["level"][idx]:
    corrects_num = corrects_num + 1
corrects_num / len(predicted_labels)

0.5195482189400521
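
As a sanity check, the toy example from this section can be reproduced with a small function. This is just a sketch; `exact_accuracy` is a hypothetical helper and the two lists are the prediction and ground truth rows from the example.

```python
def exact_accuracy(predictions, ground_truth):
    """Fraction of predictions that match the ground truth exactly."""
    matches = sum(pred == gold for pred, gold in zip(predictions, ground_truth))
    return matches / len(ground_truth)


toy_predictions = ["A1", "A2", "B1", "B2", "C1", "C2"]
toy_ground_truth = ["A2", "B1", "B1", "B2", "B2", "C2"]
print(exact_accuracy(toy_predictions, toy_ground_truth))  # 0.5
```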

### Three Level Accuracy

Three level accuracy is used when you only want a more general sense of right or wrong: it compares only the letter band (A, B, or C) and ignores the number.

$
accuracy = \frac{\#the\:same\:ABC\:levels}{\#total}
$

Example:
```
Prediction:   A1 A2 B1 B2 C1 C2
Ground truth: A2 B1 B1 B2 B2 C2
              ^     ^  ^     ^
```

The three level accuracy is $\frac{4}{6} = 0.667$

As a requirement, <u>your three level accuracy should be higher than $0.6$</u>.

In [37]:
# calculate accuracy
corrects_num = 0
for idx in range(len(predicted_labels)):
  if predicted_labels[idx][0] == train_val_dataset["test"]["level"][idx][0]:
    corrects_num = corrects_num + 1
corrects_num / len(predicted_labels)

0.7145960034752389
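
The toy example for this metric can be checked in the same way. In this sketch, `three_level_accuracy` is a hypothetical helper that compares only the first character of each level.

```python
def three_level_accuracy(predictions, ground_truth):
    """Fraction of predictions whose letter band (A, B, or C) matches."""
    matches = sum(pred[0] == gold[0] for pred, gold in zip(predictions, ground_truth))
    return matches / len(ground_truth)


toy_predictions = ["A1", "A2", "B1", "B2", "C1", "C2"]
toy_ground_truth = ["A2", "B1", "B1", "B2", "B2", "C2"]
print(round(three_level_accuracy(toy_predictions, toy_ground_truth), 3))  # 0.667
```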

### Fuzzy accuracy

However, the level of a sentence is relatively subjective. In practice, linguistic evaluations commonly tolerate errors of $\pm1$ level.  

For example, if the actual label is 'B1', we'll also consider the prediction 'right' if the model predicts 'B2' or 'A2'.

Hence, the fuzzy accuracy is

$
accuracy = \frac{\#good\:enough\:answers}{\#total}
$

Example:
```
Prediction:   0 1 2 3 4 5
Ground truth: 0 1 1 3 3 3
              ^ ^ ^ ^ ^
```

The fuzzy accuracy is $\frac{5}{6} = 0.833$

As a requirement, <u>your fuzzy accuracy should be higher than $0.8$</u>.

In [38]:
fuzz = {}
for idx in range(len(categories)):
  candidates = [categories[idx]]
  if idx > 0:
    candidates.append(categories[idx - 1])
  if idx < len(categories) - 1:
    candidates.append(categories[idx + 1])
  fuzz[categories[idx]] = candidates
fuzz

{'A1': ['A1', 'A2'],
 'A2': ['A2', 'A1', 'B1'],
 'B1': ['B1', 'A2', 'B2'],
 'B2': ['B2', 'B1', 'C1'],
 'C1': ['C1', 'B2', 'C2'],
 'C2': ['C2', 'C1']}

In [39]:
corrects_num = 0
for idx in range(len(predicted_labels)):
  if predicted_labels[idx] in fuzz[train_val_dataset["test"]["level"][idx]]:
    corrects_num = corrects_num + 1
corrects_num / len(predicted_labels)

0.8566463944396178
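
An equivalent way to think about fuzzy accuracy is as a distance between level indices: a prediction counts when it is at most one step away from the ground truth. The sketch below reproduces the toy example this way; `fuzzy_accuracy` and its `tolerance` parameter are hypothetical, and the ground truth list is the example's `0 1 1 3 3 3` row written as level names.

```python
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def fuzzy_accuracy(predictions, ground_truth, levels=LEVELS, tolerance=1):
    """Fraction of predictions within `tolerance` levels of the ground truth."""
    rank = {level: idx for idx, level in enumerate(levels)}
    hits = sum(abs(rank[pred] - rank[gold]) <= tolerance
               for pred, gold in zip(predictions, ground_truth))
    return hits / len(ground_truth)


toy_predictions = ["A1", "A2", "B1", "B2", "C1", "C2"]   # indices 0 1 2 3 4 5
toy_ground_truth = ["A1", "A2", "A2", "B2", "B2", "B2"]  # indices 0 1 1 3 3 3
print(round(fuzzy_accuracy(toy_predictions, toy_ground_truth), 3))  # 0.833
```

Setting `tolerance=0` turns this into the exact accuracy from earlier.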

## TA's Note

Congratulations, you made it to the end of the tutorial! Make sure you make an appointment to show your work and turn in your finished assignment before next week's lesson. We will ask you to run your code, so double check that everything is working and that your model is saved. Don't worry if you didn't pass the evaluation requirements; you'll still get partial points for trying.

## Appendix 


<a name="Appendix-1-Install-CUDA"></a>

### Appendix 1 - Install CUDA

1. Check your GPU vs. CUDA compatibility:
   - [NVIDIA -> Your GPU Compute Capability](https://developer.nvidia.com/cuda-gpus) -> GeForce and TITAN Products
2. Check library vs. CUDA compatibility: 
   - PyTorch: [Previous PyTorch Versions](https://pytorch.org/get-started/previous-versions/)
   - TensorFlow: [Linux/macOS](https://www.tensorflow.org/install/source#tested_build_configurations) or [Windows](https://www.tensorflow.org/install/source_windows#tested_build_configurations)
3. Note the highest CUDA version that fits your system.

#### >> for conda/mamba users

You can directly install the CUDA library with the selected CUDA version.
1. Get [the driver for NVIDIA GPU](https://www.nvidia.com/download/index.aspx)
2. `conda/mamba install -c conda-forge cudatoolkit=${VERSION}`

#### >> for non-conda users

1. Get [the driver for NVIDIA GPU](https://www.nvidia.com/download/index.aspx)
2. Download and install [CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit-archive)
3. Download and install [cuDNN Library](https://developer.nvidia.com/rdp/cudnn-archive)

### Appendix 2 - Further Readings

1. [Hugging Face Official Tutorials](https://github.com/huggingface/notebooks/tree/master/examples)
2. How to use BERT with other downstream tasks: [How to use BERT from the Hugging Face transformer library](https://towardsdatascience.com/how-to-use-bert-from-the-hugging-face-transformer-library-d373a22b0209)
3. Training with the PyTorch backend: [transformers-tutorials](https://github.com/abhimishra91/transformers-tutorials)
4. A more complicated example that includes manual data/training processing with PyTorch: [Transformers for Multi-Label Classification made simple](https://towardsdatascience.com/transformers-for-multilabel-classification-71a1a0daf5e1)
5. A TensorFlow example: [Text Classification with tensorflow](https://github.com/huggingface/notebooks/blob/master/examples/text_classification-tf.ipynb)