In [5]:
# Load augmented dataset
import numpy as np
import pandas as pd
# Change the data path as necessary.
aug_df = pd.read_csv('/content/french_hw2.csv', sep=',')
aug_df = aug_df.dropna()
# fi_df = pd.read_csv('/content/final_data.tsv',sep='\t')

In [6]:
aug_df

Unnamed: 0,sentence,label_ID
0,Les mariages SAM-SEX deviennent courants dans ...,15
1,L'acceptation par le gouvernement du mariage g...,11
2,Des règles adéquates de santé et de sécurité d...,9
3,La stabilité financière et l'accès aux ressour...,10
4,Les dirigeants politiques ont discuté des avan...,13
...,...,...
295,L'arrivée des immigrants dans un nouveau pays ...,7
296,Les implications en matière de santé et de séc...,9
297,L'autorité publique de diverses nations a pris...,7
298,Le mariage égal-sexe fournit la même liberté e...,10


# Part 2: Model Training and Testing

Now we'll move on to fine-tuning pretrained language models on your dataset. This part of the homework is meant to be an introduction to the HuggingFace library, and it contains code that will potentially be useful for your final projects. Since we're dealing with large models, the first step is to change to a GPU runtime.

## Adding a hardware accelerator

Please go to the menu and add a GPU as follows:

`Edit > Notebook Settings > Hardware accelerator > (GPU)`

Run the following cell to confirm that the GPU is detected.

In [7]:
import torch
torch.cuda.empty_cache()

# Confirm that the GPU is detected

assert torch.cuda.is_available()

# Get the GPU device name.
device_name = torch.cuda.get_device_name()
n_gpu = torch.cuda.device_count()
print(f"Found device: {device_name}, n_gpu: {n_gpu}")
device = torch.device("cuda")

Found device: Tesla T4, n_gpu: 1


## Installing Hugging Face's Transformers library
We will use Hugging Face's Transformers (https://github.com/huggingface/transformers), an open-source library that provides general-purpose architectures for natural language understanding and generation with a collection of various pretrained models made by the NLP community. This library will allow us to easily use pretrained models like `BERT` and perform experiments on top of them. We can use these models to solve downstream target tasks, such as text classification, question answering, and sequence labeling.

Run the following cell to install Hugging Face's Transformers library and download a sample data file called seed.tsv that contains 250 sentences in English, annotated with their frame.

In [None]:
!pip install transformers
!pip install -U -q PyDrive

The cell below imports some helper functions we wrote to demonstrate the task on the sample seed dataset.

In [8]:
from helpers import tokenize_and_format, flat_accuracy

# Part 1: Data Prep and Model Specifications

Upload your data using the file explorer to the left. We have provided a function below to tokenize and format your data as BERT requires. Make sure that your tsv file, titled final_data.tsv, has one column, "sentence", and another column, "labels_ID", containing integer or float values.

If you run the cell below without modifications, it will run on the seed.tsv example data we have provided. It imports some helper functions we wrote to demonstrate the task on the sample dataset. You should first run all of the following cells with seed.tsv just to see how everything works. Then, once you understand the whole preprocessing / fine-tuning process, change the tsv in the below cell to your final_data.tsv file, add any extra preprocessing code you wish, and then run the cells again on your own data.
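
The source of `helpers.py` is not shown in this notebook, so the following is only a sketch of the kind of formatting BERT expects, not the actual `tokenize_and_format` implementation. It assumes a fixed length of 64 and a padding ID of 0, which is what the token IDs printed further down suggest. The name `pad_and_mask` is hypothetical.

```python
def pad_and_mask(token_ids, max_len=64, pad_id=0):
    """Pad (or naively truncate) a list of token IDs and build the attention mask.

    Assumption: a real tokenizer would also keep the closing [SEP] token when
    truncating; this sketch simply cuts the list at max_len.
    """
    ids = list(token_ids)[:max_len]
    mask = [1] * len(ids)              # 1 marks a real token
    padding = max_len - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding  # 0 marks padding


ids, mask = pad_and_mask([101, 154, 112, 102], max_len=8)
print(ids)   # [101, 154, 112, 102, 0, 0, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0, 0, 0]
```

The attention mask tells the model which positions hold real tokens, so the padded zeros do not influence the prediction.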

In [9]:
from helpers import tokenize_and_format, flat_accuracy
import pandas as pd
import numpy as np
aug_df["label_ID"] = aug_df["label_ID"].astype(float)
# fi_df["label_id"] = fi_df["label_id"].astype(float)
# df = fi_df
# df = pd.read_csv('seed.tsv')
df = aug_df

df = df.sample(frac=1).reset_index(drop=True)

texts = df.sentence.values
labels = df.label_ID.values

### tokenize_and_format() is a helper function provided in helpers.py ###
input_ids, attention_masks = tokenize_and_format(texts)

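# One-hot encode the 1-based label IDs. The width is fixed so that it always
# matches num_labels in the model cell, even if a class is absent from the data.
num_labels = 15
label_list = []
for label in labels:
  label_array = np.zeros(num_labels)
  label_array[int(label)-1] = 1
  label_list.append(label_array)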

# Convert the lists into tensors.
input_ids = torch.cat(input_ids, dim=0)
attention_masks = torch.cat(attention_masks, dim=0)
labels = torch.tensor(np.array(label_list))

# Print sentence 0, now as a list of IDs.
print('Original: ', texts[0])
print('Token IDs:', input_ids[0])

Original:  L'immigration est un élément important de l'histoire humaine, et il peut avoir un impact significatif sur l'identité culturelle d'une personne.
Token IDs: tensor([  101,   154,   112, 38451, 10182, 10119, 16644, 12652, 10102,   154,
          112, 13119, 53310,   117, 10137, 10145, 12835, 13810, 10119, 17796,
        17929, 21369, 10344,   154,   112, 59321, 76385,   146,   112, 10249,
        28161,   119,   102,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0])
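
The label loop in the cell above turns each 1-based `label_ID` into a one-hot vector. A minimal sketch of the same encoding, with the vector width passed in explicitly so it cannot drift from the model's `num_labels`, might look like this. The function name `one_hot` is hypothetical.

```python
import numpy as np


def one_hot(label_ids, num_labels):
    """Encode 1-based labels as rows of a (len(label_ids), num_labels) float array."""
    encoded = np.zeros((len(label_ids), num_labels))
    for row, label in enumerate(label_ids):
        encoded[row, int(label) - 1] = 1.0   # label k goes to column k-1
    return encoded


example = one_hot([15.0, 11.0, 9.0], num_labels=15)
print(example.shape)                    # (3, 15)
print(example.argmax(axis=1).tolist())  # [14, 10, 8]
```

Taking `argmax` over a row recovers the zero-based class index, which is how the evaluation code later compares predictions with labels.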


## Create train/test/validation splits

Here we split your dataset into 3 parts: a training set, a validation set, and a testing set. Each item in your dataset will be a 3-tuple containing an input_id tensor, an attention_mask tensor, and a label tensor.
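
As a quick illustration of the 80/10/10 arithmetic used in the next cell, here is a small sketch. The function name `split_sizes` is hypothetical; the point is that the test set absorbs whatever is left after rounding down the other two sizes.

```python
def split_sizes(total, train_frac=0.8, val_frac=0.1):
    """Return (num_train, num_val, num_test) for a dataset of `total` items."""
    num_train = int(total * train_frac)    # rounded down
    num_val = int(total * val_frac)        # rounded down
    num_test = total - num_train - num_val # remainder, so nothing is dropped
    return num_train, num_val, num_test


num_train, num_val, num_test = split_sizes(105)
print(num_train, num_val, num_test)  # 84 10 11

# Contiguous index ranges, valid because the data was shuffled beforehand.
train_idx = range(0, num_train)
val_idx = range(num_train, num_train + num_val)
test_idx = range(num_train + num_val, 105)
```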



In [10]:

total = len(df)

num_train = int(total * .8)
num_val = int(total * .1)
num_test = total - num_train - num_val

# make lists of 3-tuples (already shuffled the dataframe in cell above)

train_set = [(input_ids[i], attention_masks[i], labels[i]) for i in range(num_train)]
val_set = [(input_ids[i], attention_masks[i], labels[i]) for i in range(num_train, num_val+num_train)]
test_set = [(input_ids[i], attention_masks[i], labels[i]) for i in range(num_val + num_train, total)]

train_text = [texts[i] for i in range(num_train)]
val_text = [texts[i] for i in range(num_train, num_val+num_train)]
test_text = [texts[i] for i in range(num_val + num_train, total)]


Here we choose the model we want to fine-tune from https://huggingface.co/transformers/pretrained_models.html. Because the task requires us to label sentences, we will be using BertForSequenceClassification below. You may see a warning that states that `some weights of the model checkpoint at [model name] were not used when initializing. . .` This warning is expected and means that you should fine-tune your pre-trained model before using it on your downstream task. See [here](https://github.com/huggingface/transformers/issues/5421#issuecomment-652582854) for more info.
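
One detail that helps when reading the loss values later: the labels prepared earlier in this notebook are one-hot float vectors rather than integer class indices. As far as we can tell from the Transformers source, when the config's `problem_type` is unset and the labels are not integer typed, `BertForSequenceClassification` treats the task as multi-label and averages a binary cross-entropy with logits over every (example, class) entry. The NumPy sketch below only illustrates that formula under this assumption. It is not the library's code, and the function name is made up.

```python
import numpy as np


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over all entries, in a numerically stable form."""
    logits = np.asarray(logits, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # Equivalent to -[t*log(sigmoid(x)) + (1-t)*log(1-sigmoid(x))], without overflow.
    per_entry = (np.maximum(logits, 0) - logits * targets
                 + np.log1p(np.exp(-np.abs(logits))))
    return per_entry.mean()


targets = np.eye(15)[[6, 2]]           # one-hot rows for classes 7 and 3
untrained = np.zeros((2, 15))          # all-zero logits: every sigmoid is 0.5
confident = 10.0 * (2 * targets - 1)   # +10 on the true class, -10 elsewhere

print(bce_with_logits(untrained, targets))  # about 0.693, i.e. ln 2
print(bce_with_logits(confident, targets))  # close to 0
```

Because the average runs over all 15 class entries per example, these values sit on a different scale from a softmax cross-entropy and should not be compared directly with losses from integer-label setups.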

In [11]:
from transformers import BertForSequenceClassification, AdamW, BertConfig

model = BertForSequenceClassification.from_pretrained(
    "sandeepvarma99/tacl-french", # model
    num_labels = 15, # The number of output labels.   
    output_attentions = False, # Whether the model returns attentions weights.
    output_hidden_states = False, # Whether the model returns all hidden-states.
)
# model.dropout = nn.Dropout(p=0.1)
# Tell pytorch to run this model on the GPU.
model.cuda()
# print(model.config)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at sandeepvarma99/tacl-french and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(105879, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12

# ACTION REQUIRED #

Define your fine-tuning hyperparameters in the cell below (we have randomly picked some values to start with). We want you to experiment with different configurations to find the one that works best (i.e., highest accuracy) on your validation set. Feel free to also change pretrained models to others available in the HuggingFace library (you'll have to modify the cell above to do this). You might find papers on BERT fine-tuning stability (e.g., [Mosbach et al., ICLR 2021](https://openreview.net/pdf?id=nzpLWnVAyah)) to be of interest.
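
One way to organize this experimentation is to enumerate the configurations up front and keep the validation accuracy of each. The sketch below is only an assumption about how you might structure the search; the candidate values are placeholders rather than recommendations, and `pick_best` is a hypothetical helper.

```python
from itertools import product

# Hypothetical candidate values, chosen only to illustrate the grid.
learning_rates = [2e-5, 3e-5, 5e-5]
batch_sizes = [16, 32]
epoch_counts = [3, 5]

configs = [
    {"lr": lr, "batch_size": bs, "epochs": ep}
    for lr, bs, ep in product(learning_rates, batch_sizes, epoch_counts)
]
print(len(configs))  # 12


def pick_best(results):
    """results: list of (config, val_accuracy) pairs; returns the best config."""
    return max(results, key=lambda pair: pair[1])[0]
```

Each configuration should start from a freshly loaded pretrained model and a new optimizer. Otherwise later runs continue training the weights left behind by earlier ones and the comparison is not fair.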

In [17]:
batch_size = 64
optimizer = AdamW(model.parameters(), lr=5e-05)  # learning rate set explicitly; epsilon left at its default
epochs = 20

# Fine-tune your model
Here we provide code for fine-tuning your model, monitoring the loss, and checking your validation accuracy. Rerun both of the below cells when you change your hyperparameters above.
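
Both cells below walk over a dataset in slices of `batch_size`, computing `int(len(data)/batch_size) + 1` batches and skipping the final slice when it turns out to be empty. A sketch of equivalent batching that never produces an empty slice is shown here. The helper name `iter_batches` is hypothetical.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items`, the last one possibly shorter."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


print([len(b) for b in iter_batches(list(range(150)), 64)])  # [64, 64, 22]
print([len(b) for b in iter_batches(list(range(128)), 64)])  # [64, 64]
```

Note that when the dataset is smaller than `batch_size`, the whole set ends up in a single batch, so each epoch performs only one optimizer step on it.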

In [13]:
# function to get validation accuracy
def get_validation_performance(val_set):
    # Put the model in evaluation mode
    model.eval()

    # Tracking variables 
    total_eval_accuracy = 0
    total_eval_loss = 0

    num_batches = int(len(val_set)/batch_size) + 1

    total_correct = 0

    for i in range(num_batches):

      end_index = min(batch_size * (i+1), len(val_set))

      batch = val_set[i*batch_size:end_index]
      
      if len(batch) == 0: continue

      input_id_tensors = torch.stack([data[0] for data in batch])
      input_mask_tensors = torch.stack([data[1] for data in batch])
      label_tensors = torch.stack([data[2] for data in batch])
      
      # Move tensors to the GPU
      b_input_ids = input_id_tensors.to(device)
      b_input_mask = input_mask_tensors.to(device)
      b_labels = label_tensors.to(device)
        
      # Tell pytorch not to bother with constructing the compute graph during
      # the forward pass, since this is only needed for backprop (training).
      with torch.no_grad():        

        # Forward pass, calculate logit predictions.
        outputs = model(b_input_ids, 
                                token_type_ids=None, 
                                attention_mask=b_input_mask,
                                labels=b_labels)
        loss = outputs.loss
        logits = outputs.logits
            
        # Accumulate the validation loss.
        total_eval_loss += loss.item()
        
        # Move logits and labels to CPU
        logits = (logits).detach().cpu().numpy()
        label_ids = b_labels.to('cpu').numpy()


        # Calculate the number of correctly labeled examples in batch
        pred_flat = np.argmax(logits, axis=1).flatten()
        labels_flat = np.argmax(label_ids, axis=1).flatten()

        num_correct = np.sum(pred_flat == labels_flat)
        total_correct += num_correct
        
    # Report the final accuracy for this validation run.
    print("Num of correct predictions =", total_correct)
    avg_val_accuracy = total_correct / len(val_set)
    return avg_val_accuracy



In [18]:
import random

# training loop

# For each epoch...
for epoch_i in range(0, epochs):
    # Perform one full pass over the training set.

    print("")
    print('======== Epoch {:} / {:} ========'.format(epoch_i + 1, epochs))
    print('Training...')

    # Reset the total loss for this epoch.
    total_train_loss = 0

    # Put the model into training mode.
    model.train()

    # For each batch of training data...
    num_batches = int(len(train_set)/batch_size) + 1

    for i in range(num_batches):
      end_index = min(batch_size * (i+1), len(train_set))

      batch = train_set[i*batch_size:end_index]

      if len(batch) == 0: continue

      input_id_tensors = torch.stack([data[0] for data in batch])
      input_mask_tensors = torch.stack([data[1] for data in batch])
      label_tensors = torch.stack([data[2] for data in batch])

      # Move tensors to the GPU
      b_input_ids = input_id_tensors.to(device)
      b_input_mask = input_mask_tensors.to(device)
      b_labels = label_tensors.to(device) 

      # Perform a forward pass (evaluate the model on this training batch).
      outputs = model(b_input_ids, token_type_ids=None, attention_mask=b_input_mask, labels=b_labels)
      loss = outputs.loss
      logits = outputs.logits

      total_train_loss += loss.item()

      # Clear the previously calculated gradient
      model.zero_grad()     

      # Perform a backward pass to calculate the gradients.
      loss.backward()

      # Update parameters and take a step using the computed gradient.
      optimizer.step()
        
    # ========================================
    #               Validation
    # ========================================
    # After the completion of each training epoch, measure our performance on
    # our validation set using get_validation_performance() from the cell above.
    print(f"Total loss: {total_train_loss}")
    val_acc = get_validation_performance(val_set)
    print(f"Validation accuracy: {val_acc}")
    
print("")
print("Training complete!")



Training...
Total loss: 0.2585902260027878
Num of correct predictions = 22
Validation accuracy: 0.7333333333333333

Training...
Total loss: 0.2457738999067159
Num of correct predictions = 22
Validation accuracy: 0.7333333333333333

Training...
Total loss: 0.2306817524080139
Num of correct predictions = 22
Validation accuracy: 0.7333333333333333

Training...
Total loss: 0.21507233057442743
Num of correct predictions = 22
Validation accuracy: 0.7333333333333333

Training...
Total loss: 0.2029948530786593
Num of correct predictions = 22
Validation accuracy: 0.7333333333333333

Training...
Total loss: 0.19144185050487675
Num of correct predictions = 22
Validation accuracy: 0.7333333333333333

Training...
Total loss: 0.1806408211808755
Num of correct predictions = 21
Validation accuracy: 0.7

Training...
Total loss: 0.17144745302438322
Num of correct predictions = 22
Validation accuracy: 0.7333333333333333

Training...
Total loss: 0.161642719121624
Num of correct predictions = 22
Validatio

# Evaluate your model on the test set
After you're satisfied with your hyperparameters (i.e., you're unable to achieve higher validation accuracy by modifying them further), it's time to evaluate your model on the test set! Run the below cell to compute test set accuracy.


In [19]:
get_validation_performance(test_set)

Num of correct predictions = 23


0.7666666666666667

## Error Analysis
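
Beyond reading individual mistakes, it can help to tally which labels are involved most often. The sketch below assumes a list of `(text, predicted_label, true_label)` tuples shaped like the `wrong_examples` list built in the cell below. The helper names are hypothetical, and the sample uses the label pairs from the first four misclassified examples printed in this section, with texts abbreviated.

```python
from collections import Counter


def confusion_pairs(wrong_examples):
    """Count (true_label, predicted_label) pairs among misclassified examples."""
    return Counter((true, pred) for _text, pred, true in wrong_examples)


def predicted_counts(wrong_examples):
    """Count how often each label was the (wrong) prediction."""
    return Counter(pred for _text, pred, _true in wrong_examples)


sample = [
    ("Divers pays ...", 15, 7),
    ("Il est important ...", 13, 3),
    ("Atteindre une meilleure ...", 9, 10),
    ("Les conséquences légales ...", 15, 5),
]
print(confusion_pairs(sample)[(7, 15)])         # 1
print(predicted_counts(sample).most_common(1))  # [(15, 2)]
```

A label that dominates the wrong predictions may indicate that the model is falling back on one class, though a handful of examples is too few to draw firm conclusions.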

In [20]:
# Evaluate the model on the test set
test_acc = get_validation_performance(test_set)

# Put the model in evaluation mode
model.eval()

# Tracking variables
total_test_loss = 0
total_correct = 0
wrong_examples = []

num_batches = int(len(test_set)/batch_size) + 1

for i in range(num_batches):
    end_index = min(batch_size * (i+1), len(test_set))
    batch = test_set[i*batch_size:end_index]

    if len(batch) == 0:
        continue

    input_id_tensors = torch.stack([data[0] for data in batch])
    input_mask_tensors = torch.stack([data[1] for data in batch])
    label_tensors = torch.stack([data[2] for data in batch])

    # Move tensors to the GPU
    b_input_ids = input_id_tensors.to(device)
    b_input_mask = input_mask_tensors.to(device)
    b_labels = label_tensors.to(device)

    with torch.no_grad():
        # Forward pass, calculate logit predictions
        outputs = model(b_input_ids,
                        token_type_ids=None,
                        attention_mask=b_input_mask,
                        labels=b_labels)
        loss = outputs.loss
        logits = outputs.logits

        # Accumulate the test loss
        total_test_loss += loss.item()

        # Move logits and labels to CPU
        logits = (logits).detach().cpu().numpy()
        label_ids = b_labels.to('cpu').numpy()

        # Calculate the number of correctly labeled examples in batch
        pred_flat = np.argmax(logits, axis=1).flatten()
        labels_flat = np.argmax(label_ids, axis=1).flatten()

        num_correct = np.sum(pred_flat == labels_flat)
        total_correct += num_correct

        # Find examples that the model gets wrong
        for j in range(len(batch)):
            if pred_flat[j] != labels_flat[j]:
                text = test_text[i*batch_size+j]
                predicted_label = pred_flat[j] + 1
                true_label = labels_flat[j] + 1
                wrong_examples.append((text, predicted_label, true_label))

# Print some of the examples that the model gets wrong
print("Test accuracy:", test_acc)
print("Number of wrong examples:", len(wrong_examples))
print("wrong examples:")
for i, (text, predicted_label, true_label) in enumerate(wrong_examples[:5]):
    print(f"\nExample {i+1}:")
    print("Text:", text)
    print("Predicted label:", predicted_label)
    print("True label:", true_label)


Num of correct predictions = 23
Test accuracy: 0.7666666666666667
Number of wrong examples: 7
wrong examples:

Example 1:
Text: Divers pays ont constaté une augmentation de la migration illégale qui a mis un poids excessif sur leurs propriétés.
Predicted label: 15
True label: 7

Example 2:
Text: Il est important de considérer les implications morales des politiques d'immigration, ainsi que la façon dont elles peuvent affecter les personnes touchées.
Predicted label: 13
True label: 3

Example 3:
Text: Atteindre une meilleure éducation, des soins de santé et des possibilités d'emploi peut augmenter considérablement la qualité de vie pour les immigrants.
Predicted label: 9
True label: 10

Example 4:
Text: Les conséquences légales et constitutionnelles du mariage gay ont reçu une attention considérable ces dernières années.
Predicted label: 15
True label: 5

Example 5:
Text: Nous devons nous tenir contre toute tentative de créer un système de justice à deuxtières, où les immigrants sont tr