# Behavioural Training 🤖⚙️

The second experiment consists of a "classic" fine-tuning: we unfreeze BERT's weights and train them together with the fully connected (FC) classification layer. By doing so, we also adapt BERT to our task (and, to a lesser extent, to our domain).
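To put "unfreezing" in numbers: the FC head of `BertForSequenceClassification` is a single linear layer from BERT's 768-dimensional pooled output to the 2 labels, so a frozen-encoder setup would update only a tiny fraction of the parameters. The sketch below does the arithmetic, assuming bert-base's hidden size of 768 and taking the total of 108,311,810 trainable parameters from the Trainer log of this run (that total includes embeddings and pooler).

```python
# bert-base models use a hidden size of 768; this task has 2 labels.
HIDDEN_SIZE = 768
NUM_LABELS = 2

# The classification head is one linear layer: a weight matrix plus a bias vector.
head_params = HIDDEN_SIZE * NUM_LABELS + NUM_LABELS

# Total trainable parameters reported by the Trainer for this run (nothing frozen).
total_params = 108_311_810
encoder_params = total_params - head_params

print(f"head:    {head_params:,}")     # head:    1,538
print(f"encoder: {encoder_params:,}")  # encoder: 108,310,272
```

In other words, this experiment trains roughly 70,000 times more parameters than a head-only setup would.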

Important points:
* Dataset: [medical_questions_pairs](https://huggingface.co/datasets/medical_questions_pairs)
* Model: [bert-base-cased](https://huggingface.co/bert-base-cased)
* Auxiliary functions (such as `compute_metrics`) are defined in a separate file, `training_aux.py`
* We will log the results to Weights & Biases.

In [6]:
import torch
import config

if torch.cuda.is_available():
    device = torch.device("cuda:0")
else:
    device = torch.device("cpu")

In [7]:
device

device(type='cuda', index=0)

## 1. Data preparation

### 1.1. Import and set creation

Load the data and create the partitions. The dataset only provides a `train` split, so we hold out 7% of it as a test set.

In [8]:
from datasets import load_dataset

# Download and extract data
data = load_dataset("medical_questions_pairs")
data = data['train']

# Split it
data = data.train_test_split(test_size=0.07, seed=config.SEED)


In [None]:
data

DatasetDict({
    train: Dataset({
        features: ['dr_id', 'question_1', 'question_2', 'label'],
        num_rows: 2834
    })
    test: Dataset({
        features: ['dr_id', 'question_1', 'question_2', 'label'],
        num_rows: 214
    })
})
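The 2,834/214 sizes above follow from `test_size=0.07` applied to the 3,048 pairs of the original split. The sketch below reproduces them, assuming the same rounding convention as scikit-learn's `train_test_split` (test size rounded up, the remainder goes to training), which is consistent with the output; `split_sizes` is a hypothetical helper.

```python
import math


def split_sizes(n_samples: int, test_size: float) -> tuple[int, int]:
    """Hypothetical helper: (n_train, n_test) for a fractional test_size."""
    # Assumed convention: round the test partition up, give the rest to training.
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


# 3,048 = 2,834 + 214 rows in the DatasetDict shown above.
print(split_sizes(3048, 0.07))  # (2834, 214)
```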

As we can see, the dataset is small (2,834 training and 214 test pairs). We will have to take that into account when training the models.

### 1.2. Tokenize and encode data

As mentioned, we will use the **bert-base-cased** tokenizer.
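In the tokenization cell the tokenizer is called with both questions, so each pair becomes a single BERT input, `[CLS] question_1 [SEP] question_2 [SEP]`, and `token_type_ids` tell the model which segment each token belongs to. The pure-Python sketch below illustrates that layout on whitespace-split words standing in for WordPiece tokens. `encode_pair` is a hypothetical helper, and its truncation loop only approximates the tokenizer's default "longest first" strategy (what `truncation=True` selects).

```python
def encode_pair(tokens_a, tokens_b, max_length=512):
    """Hypothetical helper: build a BERT-style sentence-pair input."""
    tokens_a, tokens_b = list(tokens_a), list(tokens_b)
    # Three special tokens are added: [CLS], [SEP], [SEP].
    budget = max_length - 3
    # Approximate "longest first" truncation: trim the longer sequence until both fit.
    while len(tokens_a) + len(tokens_b) > budget:
        if len(tokens_a) >= len(tokens_b):
            tokens_a.pop()
        else:
            tokens_b.pop()
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment ids: 0 for [CLS] + question 1 + first [SEP], 1 for question 2 + last [SEP].
    token_type_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, token_type_ids


tokens, types = encode_pair(["is", "it", "flu", "?"], ["do", "I", "have", "flu", "?"])
print(tokens)
print(types)  # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
```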

In [9]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(config.checkpoint, use_fast=True)


In [10]:
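# Encode each question pair as one sequence; padding='max_length' pads every example to the model maximum (512 tokens)
data = data.map(lambda x: tokenizer(x['question_1'], x['question_2'], truncation=True, padding='max_length'), batched=True)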


In [11]:
from transformers import DataCollatorWithPadding

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
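`DataCollatorWithPadding` pads each batch only up to its longest sequence and builds the matching `attention_mask`. Note that the `map` call above already pads every example to the tokenizer's maximum length (`padding='max_length'`, 512 tokens for bert-base-cased), so in this notebook the collator finds nothing left to pad; dropping that argument from the `map` call would let the collator pad per batch, which is usually faster for short texts such as these questions. A minimal pure-Python sketch of per-batch ("dynamic") padding, using a hypothetical `pad_batch` helper, illustrative token ids, and pad id 0 as in BERT's vocabulary:

```python
def pad_batch(batch, pad_token_id=0):
    """Hypothetical helper: pad token-id lists to the longest sequence in the batch."""
    longest = max(len(ids) for ids in batch)
    input_ids, attention_mask = [], []
    for ids in batch:
        n_pad = longest - len(ids)
        input_ids.append(list(ids) + [pad_token_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


padded = pad_batch([[101, 7592, 102], [101, 102]])
print(padded["input_ids"])       # [[101, 7592, 102], [101, 102, 0]]
print(padded["attention_mask"])  # [[1, 1, 1], [1, 1, 0]]
```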

## 2. Behavioural fine-tuning

Next, we train the whole model (BERT plus the FC layer) so that it adapts to our specific task.

In this case, we leave BERT's weights unfrozen.
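Whether BERT is frozen comes down to each parameter's `requires_grad` flag. The sketch below uses a tiny hypothetical `ToyClassifier` whose `bert` and `classifier` attributes mirror the attribute names of `BertForSequenceClassification`; the layers and sizes are made up, and `set_encoder_trainable` / `count_trainable` are hypothetical helpers. Loading the pretrained model as in the next cell leaves every parameter trainable, which is the setup used here; the Trainer log further down reports all 108,311,810 parameters as trainable.

```python
import torch.nn as nn


class ToyClassifier(nn.Module):
    """Hypothetical stand-in: a tiny 'encoder' plus a linear head.

    Only the attribute names (`bert`, `classifier`) mirror
    BertForSequenceClassification; the layers and sizes are made up.
    """

    def __init__(self, vocab_size=100, hidden=16, num_labels=2):
        super().__init__()
        self.bert = nn.Sequential(nn.Embedding(vocab_size, hidden), nn.Linear(hidden, hidden))
        self.classifier = nn.Linear(hidden, num_labels)


def set_encoder_trainable(model, trainable):
    """Hypothetical helper: freeze or unfreeze every encoder parameter."""
    for param in model.bert.parameters():
        param.requires_grad = trainable


def count_trainable(model):
    """Number of parameters that will receive gradients."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


model = ToyClassifier()

set_encoder_trainable(model, False)  # frozen encoder: only the head learns
frozen_count = count_trainable(model)

set_encoder_trainable(model, True)   # behavioural fine-tuning: everything learns
unfrozen_count = count_trainable(model)

print(frozen_count, unfrozen_count)  # 34 1906
```

With the real model, the same loop over `model.bert.parameters()` would give a frozen-encoder setup instead.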

In [12]:
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(config.checkpoint, num_labels=2)


Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at b

### 2.1. Init WandB

In [13]:
import wandb

wandb.login()

ERROR:wandb.jupyter:Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


<IPython.core.display.Javascript object>

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit: 

··········


[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


True

In [14]:
run_name = 'behavioural_training'
notes = "This experiment consists of a behavioural fine-tuning. We want to adapt the model to our target task by also training the encoder's weights."
run = wandb.init(project='fine-tuning-mlms',
                 name=run_name,
                 notes=notes,
                 job_type='train')


[34m[1mwandb[0m: Currently logged in as: [33mjjceamoran[0m. Use [1m`wandb login --relogin`[0m to force relogin


In [15]:
from transformers import Trainer, TrainingArguments
from training_aux import compute_metrics
import sklearn

training_args = TrainingArguments(
    output_dir="./experiments/" + run_name,
    learning_rate=3e-5, # low learning rate.
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=8,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    report_to='wandb',
    run_name=run_name
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=data['train'],
    eval_dataset=data['test'],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics
)
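With 2,834 training pairs and a batch size of 8, each epoch takes ceil(2834 / 8) = 355 optimization steps (the last partial batch is kept by default), hence the 2,840 total steps in the training log and checkpoint names that are multiples of 355. Since no scheduler or warmup is set in `TrainingArguments`, the Trainer's defaults apply: the learning rate decays linearly from 3e-5 to zero, with no warmup. The sketch below reproduces that arithmetic; `schedule_summary` is a hypothetical helper, not a Trainer function.

```python
import math


def schedule_summary(n_examples, batch_size, epochs, peak_lr, warmup_steps=0):
    """Hypothetical helper: steps per epoch, total steps and a linear LR schedule."""
    # The last partial batch is kept (no drop_last), hence the ceiling.
    steps_per_epoch = math.ceil(n_examples / batch_size)
    total_steps = steps_per_epoch * epochs

    def lr_at(step):
        # Linear warmup (none by default), then linear decay to zero at the last step.
        if step < warmup_steps:
            return peak_lr * step / warmup_steps
        remaining = max(0, total_steps - step)
        return peak_lr * remaining / max(1, total_steps - warmup_steps)

    return steps_per_epoch, total_steps, lr_at


steps_per_epoch, total_steps, lr_at = schedule_summary(2834, 8, 8, 3e-5)
print(steps_per_epoch, total_steps)  # 355 2840
print([f"checkpoint-{steps_per_epoch * epoch}" for epoch in (1, 7)])
print(lr_at(0), lr_at(total_steps // 2), lr_at(total_steps))
```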

In [16]:
trainer.train()

The following columns in the training set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: question_1, question_2, dr_id. If question_1, question_2, dr_id are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 2834
  Num Epochs = 8
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 2840
  Number of trainable parameters = 108311810
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


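| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|
| 1 | No log | 0.493627 | 0.79 | 1 |
| 2 | 0.502200 | 0.520202 | 0.81 | 1 |
| 3 | 0.267700 | 0.768769 | 0.82 | 1 |
| 4 | 0.267700 | 0.972102 | 0.82 | 1 |
| 5 | 0.118500 | 1.013869 | 0.84 | 1 |
| 6 | 0.032900 | 1.261037 | 0.8 | 1 |
| 7 | 0.032900 | 1.179059 | 0.85 | 1 |
| 8 | 0.012000 | 1.199661 | 0.83 | 1 |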


The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: question_1, question_2, dr_id. If question_1, question_2, dr_id are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 214
  Batch size = 8
Saving model checkpoint to ./experiments/behavioural_training/checkpoint-355
Configuration saved in ./experiments/behavioural_training/checkpoint-355/config.json
Model weights saved in ./experiments/behavioural_training/checkpoint-355/pytorch_model.bin
tokenizer config file saved in ./experiments/behavioural_training/checkpoint-355/tokenizer_config.json
Special tokens file saved in ./experiments/behavioural_training/checkpoint-355/special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: question_1, questi

TrainOutput(global_step=2840, training_loss=0.16475957799965227, metrics={'train_runtime': 2124.0411, 'train_samples_per_second': 10.674, 'train_steps_per_second': 1.337, 'total_flos': 5965253847121920.0, 'train_loss': 0.16475957799965227, 'epoch': 8.0})
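The F1 column reads 1 at every epoch while accuracy stays between 0.79 and 0.85. Taken at face value that cannot happen: an F1 of exactly 1 requires no false positives and no false negatives, i.e. every prediction correct, which would also make accuracy 1. The `compute_metrics` in `training_aux` is not shown here, so the cause (the metric itself, or how it is rounded or logged) cannot be determined from this notebook. Below is a minimal, dependency-free reference sketch (not the author's implementation) that can be used to cross-check it; it assumes the Trainer passes a `(logits, labels)` pair and treats label 1 as the positive class.

```python
def compute_metrics(eval_pred):
    """Reference sketch: accuracy and binary F1 (positive class = 1)."""
    logits, labels = eval_pred
    # Predicted class = index of the largest logit in each row.
    preds = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    labels = [int(y) for y in labels]

    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)

    accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positives at all.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1}


metrics = compute_metrics(([[2.0, -1.0], [0.1, 0.3], [1.5, 0.2], [-0.5, 0.5]], [0, 1, 1, 1]))
print(metrics)  # {'accuracy': 0.75, 'f1': 0.8}
```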

In [17]:
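# Log the epoch-7 checkpoint as a model artifact
# (checkpoint-2485 = 7 epochs x 355 steps; epoch 7 has the highest validation accuracy, 0.85)

artifact = wandb.Artifact('classifier', type='model')
artifact.add_dir('./experiments/behavioural_training/checkpoint-2485')
wandb.log_artifact(artifact)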

[34m[1mwandb[0m: Adding directory to artifact (./experiments/behavioural_training/checkpoint-2485)... Done. 6.7s


<wandb.sdk.wandb_artifacts.Artifact at 0x7fc602499d90>
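Two different "best" checkpoints are involved in this run. With `load_best_model_at_end=True` and no `metric_for_best_model`, the Trainer compares checkpoints by validation loss, so the model it reloads at the end of `trainer.train()` is the epoch-1 one (`checkpoint-355`). Validation loss rises after epoch 1 while training loss keeps falling, which suggests overfitting on this small dataset. The artifact logged above, `checkpoint-2485`, is instead the end of epoch 7, the epoch with the highest validation accuracy. The sketch below reproduces both choices from the logged table; `best_epoch` is a hypothetical helper. Passing `metric_for_best_model="accuracy"` to `TrainingArguments` would make the Trainer select by accuracy itself (higher is then treated as better).

```python
# Validation metrics copied from the training log: epoch -> (validation loss, accuracy).
history = {
    1: (0.493627, 0.79),
    2: (0.520202, 0.81),
    3: (0.768769, 0.82),
    4: (0.972102, 0.82),
    5: (1.013869, 0.84),
    6: (1.261037, 0.80),
    7: (1.179059, 0.85),
    8: (1.199661, 0.83),
}
STEPS_PER_EPOCH = 355  # 2,840 optimization steps / 8 epochs


def best_epoch(history, by):
    """Hypothetical helper: lowest validation loss, or highest accuracy."""
    if by == "loss":
        return min(history, key=lambda epoch: history[epoch][0])
    return max(history, key=lambda epoch: history[epoch][1])


for criterion in ("loss", "accuracy"):
    epoch = best_epoch(history, criterion)
    print(criterion, epoch, f"checkpoint-{epoch * STEPS_PER_EPOCH}")
# loss 1 checkpoint-355
# accuracy 7 checkpoint-2485
```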