# Adaptive + Behavioural fine-tuning 🤖⚙️

Okay, at this point we have already trained bert-base-cased on our own dataset with adaptive fine-tuning (MLM objective). Our model should now be better adjusted to the semantics of our specific medical domain.

In this experiment, we will further adjust our model to the target task. That is, a round of behavioural fine-tuning on top of an already adapted backbone! Let's see what happens.

<figure style='text-align:center;'>
  <img src="../data/images/A+BFT.png">
  
  <figcaption>
  Adaptive + Behavioural fine-tuning schema
  </figcaption>
</figure>

Important points:
* Dataset: [medical_questions_pairs](https://huggingface.co/datasets/medical_questions_pairs)
* Model: [bert-base-cased](https://huggingface.co/bert-base-cased)
* We will define auxiliary functions in the auxiliar.py file.
* We will log the results to Weights & Biases.
<br>

In [1]:
import torch
import config

if torch.cuda.is_available():
    device = torch.device("cuda:0")
else:
    device = torch.device("cpu")

In [None]:
device

device(type='cuda', index=0)

## 1. Data preparation

### 1.1. Import and set creation

Import data and create partitions.

In [4]:
from datasets import load_dataset

# Download and extract data
data = load_dataset("medical_questions_pairs")
data = data['train']

# Split it
data = data.train_test_split(test_size=0.07, seed=config.SEED)

In [5]:
data

DatasetDict({
    train: Dataset({
        features: ['dr_id', 'question_1', 'question_2', 'label'],
        num_rows: 2834
    })
    test: Dataset({
        features: ['dr_id', 'question_1', 'question_2', 'label'],
        num_rows: 214
    })
})

As we can see, there are not many samples. We will have to take that into consideration when training the models.
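
The partition sizes above follow directly from `test_size=0.07`. As a quick sanity check, here is a minimal sketch of the arithmetic, assuming the split rounds the test partition up to a whole sample (an assumption about the library's rounding that happens to match the counts shown above):

```python
import math

n_total = 2834 + 214   # rows reported in the DatasetDict above
test_size = 0.07       # fraction passed to train_test_split

# Assumption: the test partition is rounded up to a whole sample
n_test = math.ceil(n_total * test_size)
n_train = n_total - n_test

print(n_train, n_test)
```

With an evaluation set of 214 pairs, each misclassified pair moves accuracy by roughly 0.47 points, so small differences between runs should be read with care.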

### 1.2. Tokenize and encode data

As mentioned, we will use the **bert-base-cased** tokenizer.

**NOTE: If we had created a new version of the tokenizer during adaptive fine-tuning (by adding new tokens to the vocabulary), we would need to load the tokenizer from our new model's checkpoint. We didn't, so we can still use bert-base-cased.**

In [6]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(config.checkpoint, use_fast=True)

In [7]:
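# Encode each question pair as a single input, truncated and padded to the model's maximum length
data = data.map(
    lambda x: tokenizer(x['question_1'], x['question_2'], truncation=True, padding='max_length'),
    batched=True,
)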

In [8]:
from transformers import DataCollatorWithPadding

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
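
To make the role of the collator concrete, here is a minimal pure-Python sketch of dynamic padding: each batch is padded only up to its longest sequence. The function `pad_batch` and the toy token ids are illustrative and not part of the `transformers` API; the only fact taken from the model is that `[PAD]` has id 0 in the bert-base-cased vocabulary.

```python
def pad_batch(batch, pad_id=0):
    """Pad every sequence in the batch to the length of the longest one."""
    longest = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (longest - len(seq)) for seq in batch]

# Toy token ids (hypothetical values, not real tokenizer output)
batch = [[101, 7, 8, 102], [101, 9, 102]]
padded = pad_batch(batch)
print(padded)
```

Note that the `map` call above already pads every example with `padding='max_length'`, so in this notebook all sequences reach the collator with the same length and it has nothing left to pad. Dropping `padding='max_length'` from the tokenizer call would let the collator pad per batch instead, which usually shortens the inputs.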

## 2. Adapted backbone + Behavioural fine-tuning 

Okay! Time to train the model.

Now, we will download our previously trained backbone from Weights & Biases and load it with an untrained classification head.

For this experiment we are not freezing the encoder's parameters. We will train everything in order to adapt the whole model to the target task.

In [None]:
import wandb
from tempfile import TemporaryDirectory
from transformers import AutoModelForSequenceClassification

# Download the adapted backbone artifact to a temp dir
with TemporaryDirectory() as temp_dir:
    run = wandb.init()
    artifact = run.use_artifact('jjceamoran/fine-tuning-mlms/encoder:v0', type='model')
    artifact_dir = artifact.download(temp_dir)

    # Load the backbone weights and attach a new, untrained 2-class head
    model = AutoModelForSequenceClassification.from_pretrained(artifact_dir, num_labels=2)

### 2.1. Init WandB

In [10]:
import wandb

wandb.login()



True

In [None]:
run_name = 'adaptative_and_behavioural_training'
notes = "This experiment consists of a full head + backbone training on the objective task. We will use a domain-adapted backbone."
run = wandb.init(project='fine-tuning-mlms',
                 name=run_name,
                 notes=notes,
                 job_type='train')


In [13]:
from transformers import Trainer, TrainingArguments
from training_aux import compute_metrics
import sklearn

training_args = TrainingArguments(
    output_dir="./experiments/" + run_name,
    learning_rate=2e-5, # low learning rate.
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=8,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    report_to='wandb',
    run_name=run_name
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=data['train'],
    eval_dataset=data['test'],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics
)
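
`compute_metrics` is imported from the auxiliary module and its body is not shown here. As a rough idea of what such a function could look like, the sketch below computes accuracy as a percentage from the logits and labels that `Trainer` hands over at evaluation time. This is a hypothetical stand-in, not the author's implementation; it is written without NumPy so it runs on plain nested lists as well as on arrays.

```python
def compute_metrics(eval_pred):
    """Hypothetical stand-in: accuracy (in %) from (logits, labels)."""
    logits, labels = eval_pred
    # Predicted class = index of the largest logit in each row
    preds = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    correct = sum(int(p) == int(l) for p, l in zip(preds, labels))
    return {"accuracy": round(100.0 * correct / len(preds), 2)}

# Toy example: 4 pairs, 3 of them classified correctly
toy_logits = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
toy_labels = [1, 0, 0, 0]
print(compute_metrics((toy_logits, toy_labels)))
```

The key name `accuracy` matters: it is what `metric_for_best_model="accuracy"` looks up when selecting the checkpoint to reload at the end of training.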

In [14]:
trainer.train()

The following columns in the training set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: question_2, dr_id, question_1. If question_2, dr_id, question_1 are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 2834
  Num Epochs = 8
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 2840
  Number of trainable parameters = 108311810
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


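| Epoch | Training Loss | Validation Loss | Accuracy (%) |
|-------|---------------|-----------------|--------------|
| 1 | No log | 0.473525 | 77.57 |
| 2 | 0.480200 | 0.587687 | 82.71 |
| 3 | 0.227400 | 0.719719 | 82.24 |
| 4 | 0.227400 | 0.939398 | 84.11 |
| 5 | 0.078400 | 0.978745 | 85.05 |
| 6 | 0.022700 | 1.112461 | 84.11 |
| 7 | 0.022700 | 1.198448 | 83.64 |
| 8 | 0.010400 | 1.239758 | 83.18 |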


The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: question_2, dr_id, question_1. If question_2, dr_id, question_1 are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 214
  Batch size = 8
Saving model checkpoint to ./experiments/adaptative_and_behavioural_training/checkpoint-355
Configuration saved in ./experiments/adaptative_and_behavioural_training/checkpoint-355/config.json
Model weights saved in ./experiments/adaptative_and_behavioural_training/checkpoint-355/pytorch_model.bin
tokenizer config file saved in ./experiments/adaptative_and_behavioural_training/checkpoint-355/tokenizer_config.json
Special tokens file saved in ./experiments/adaptative_and_behavioural_training/checkpoint-355/special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `BertF

TrainOutput(global_step=2840, training_loss=0.14489131322209264, metrics={'train_runtime': 2266.7972, 'train_samples_per_second': 10.002, 'train_steps_per_second': 1.253, 'total_flos': 5965253847121920.0, 'train_loss': 0.14489131322209264, 'epoch': 8.0})

In [15]:
# Log model

artifact = wandb.Artifact('classifier', type='model')
artifact.add_dir('./experiments/adaptative_and_behavioural_training/checkpoint-1775')
wandb.log_artifact(artifact)

[34m[1mwandb[0m: Adding directory to artifact (./experiments/adaptative_and_behavioural_training/checkpoint-1775)... Done. 9.5s


<wandb.sdk.wandb_artifacts.Artifact at 0x7f9fd1fe1b50>
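
The checkpoint number logged above can be traced back to the training log. Here is a minimal sketch of that bookkeeping, using only numbers reported in this notebook (the variable names are illustrative) and assuming one optimizer step per batch, as the log's gradient accumulation of 1 suggests:

```python
import math

n_train, batch_size, epochs = 2834, 8, 8

# One optimizer step per batch; the last, partial batch still counts
steps_per_epoch = math.ceil(n_train / batch_size)
total_steps = steps_per_epoch * epochs

# Validation accuracy (%) per epoch, copied from the training results
accuracy = {1: 77.57, 2: 82.71, 3: 82.24, 4: 84.11,
            5: 85.05, 6: 84.11, 7: 83.64, 8: 83.18}
best_epoch = max(accuracy, key=accuracy.get)
best_checkpoint = f"checkpoint-{best_epoch * steps_per_epoch}"

print(steps_per_epoch, total_steps, best_epoch, best_checkpoint)
```

Epoch 5 gives the highest validation accuracy (85.05), which is why `checkpoint-1775` is the directory uploaded as the `classifier` artifact. Validation loss, by contrast, is lowest after epoch 1 (0.47) and climbs to 1.24 by epoch 8 while training loss keeps falling, a pattern consistent with overfitting on a dataset this small.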