# Adaptive fine-tuning + Head Training 🤖⚙️

At this point we have already trained bert-base-cased on our own dataset with the adaptive fine-tuning framework (MLM objective). The model should now be better adjusted to the semantics of our medical domain.

Let's retrain our baseline classifier, this time using our fine-tuned backbone!

<figure style='text-align:center;'>
  <img src="../data/images/A+HT.png">
  
  <figcaption>
  Adaptive fine-tuning + head-only training scheme
  </figcaption>
</figure>

Important points:
* Dataset: [medical_questions_pairs](https://huggingface.co/datasets/medical_questions_pairs)
* Model: [bert-base-cased](https://huggingface.co/bert-base-cased)
* We define auxiliary functions in the auxiliar.py file.
* We log the results to Weights & Biases.
<br>

In [1]:
import torch
import config

if torch.cuda.is_available():
    device = torch.device("cuda:0")
else:
    device = torch.device("cpu")

In [2]:
device

device(type='cuda', index=0)

## 1. Data preparation

### 1.1. Import and set creation

Import data and create partitions.

In [2]:
from datasets import load_dataset

# Download and extract data
data = load_dataset("medical_questions_pairs")
data = data['train']

# Split it
data = data.train_test_split(test_size=0.07, seed=config.SEED)






In [3]:
data

DatasetDict({
    train: Dataset({
        features: ['dr_id', 'question_1', 'question_2', 'label'],
        num_rows: 2834
    })
    test: Dataset({
        features: ['dr_id', 'question_1', 'question_2', 'label'],
        num_rows: 214
    })
})

As we can see, the dataset is small: 2,834 training samples and 214 test samples. We will have to take that into consideration when training the models.
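
The partition sizes shown above can be reproduced by hand. The following is a minimal sketch, assuming the fractional test share is rounded up and the remainder goes to the training set; `split_sizes` is a hypothetical helper written for illustration, not part of the `datasets` library.

```python
import math
from typing import Tuple


def split_sizes(n_samples: int, test_size: float) -> Tuple[int, int]:
    """Return (n_train, n_test), rounding the fractional test share up."""
    n_test = math.ceil(n_samples * test_size)
    return n_samples - n_test, n_test


# Total number of rows reported in the DatasetDict above
n_total = 2834 + 214

n_train, n_test = split_sizes(n_total, test_size=0.07)
print(n_train, n_test)  # 2834 214
```

With only 214 test samples, a single prediction changes the accuracy by roughly half a percentage point, so small differences between runs should be read with care.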

### 1.2. Tokenize and encode data

As mentioned, we will use the **bert-base-cased** tokenizer.

**NOTE: If we had created a new version of the tokenizer during adaptive fine-tuning (by adding new tokens to the vocabulary), we would need to load the tokenizer from our new model's checkpoint. We didn't, so we can still use bert-base-cased.**

In [4]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(config.checkpoint, use_fast=True)

In [5]:
data = data.map(lambda x: tokenizer(x['question_1'], x['question_2'], truncation=True, padding='max_length'), batched=True)



In [6]:
from transformers import DataCollatorWithPadding

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
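
`DataCollatorWithPadding` pads each batch to the length of its longest sequence (dynamic padding). Since the `map` call above already pads every example with `padding='max_length'`, the collator most likely has nothing left to pad in this notebook. The plain-Python sketch below only illustrates the idea of dynamic padding; `pad_batch` and the token ids are made up for illustration and are not the library's implementation.

```python
def pad_batch(batch, pad_id=0):
    """Pad every sequence in the batch to the length of the longest one."""
    longest = max(len(ids) for ids in batch)
    input_ids = [ids + [pad_id] * (longest - len(ids)) for ids in batch]
    attention_mask = [[1] * len(ids) + [0] * (longest - len(ids)) for ids in batch]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Toy token ids (made up for illustration, not real tokenizer output)
batch = [[101, 7592, 102], [101, 2054, 2003, 2009, 102]]

padded = pad_batch(batch)
print(padded["input_ids"])       # [[101, 7592, 102, 0, 0], [101, 2054, 2003, 2009, 102]]
print(padded["attention_mask"])  # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```

Padding per batch rather than to a fixed maximum keeps short question pairs short, which usually reduces the compute spent on padding tokens.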

## 2. Adapted backbone + Head training 

Okay! Time to train the model.

Now we will download our previously trained backbone from Weights & Biases and load it with an untrained classification head.

Remember that in this experiment we only want to train the classification head, so we freeze all of the backbone's parameters.
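
As a sanity check on what freezing leaves trainable: the classification head of `BertForSequenceClassification` is a single linear layer on top of the pooled output, so its size can be worked out by hand. The sketch below assumes the bert-base hidden size of 768 and the two labels used in this notebook; `linear_param_count` is a hypothetical helper. The result matches the "Number of trainable parameters" line in the training log further down.

```python
def linear_param_count(in_features: int, out_features: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


HIDDEN_SIZE = 768  # hidden size of bert-base models
NUM_LABELS = 2     # same value as num_labels when loading the model

trainable = linear_param_count(HIDDEN_SIZE, NUM_LABELS)
print(trainable)  # 1538
```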

In [7]:
import wandb
from tempfile import TemporaryDirectory
from transformers import AutoModelForSequenceClassification

# Download the artifact to a temporary directory
with TemporaryDirectory() as temp_dir:
    run = wandb.init()
    artifact = run.use_artifact('jjceamoran/fine-tuning-mlms/encoder:v0', type='model')
    artifact_dir = artifact.download(temp_dir)

    # Load the adapted backbone with a new, untrained classification head
    model = AutoModelForSequenceClassification.from_pretrained(artifact_dir, num_labels=2)

# Freeze the backbone weights so that only the head is trained
for param in model.bert.parameters():
    param.requires_grad = False

ERROR:wandb.jupyter:Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: jjceamoran. Use `wandb login --relogin` to force relogin
wandb: Downloading large artifact encoder:v0, 1240.91MB. 11 files...
wandb:   11 of 11 files downloaded.
Done. 0:0:4.5
Some weights of the model checkpoint at /tmp/tmp_w0la0tv were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.predictions.decoder.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenc

### 2.1. Init WandB

In [8]:
import wandb

wandb.login()



True

In [9]:
run_name = 'adaptative_and_head_training'
notes = "This experiment trains only the classification head on top of a domain-adapted backbone."
run = wandb.init(project='fine-tuning-mlms',
                 name=run_name,
                 notes=notes,
                 job_type='train')



In [10]:
from transformers import Trainer, TrainingArguments
from training_aux import compute_metrics
import sklearn

training_args = TrainingArguments(
    output_dir="./experiments/" + run_name,
    learning_rate=3e-5, # low learning rate.
    per_device_train_batch_size=16,  # matches the batch size reported in the training log
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    report_to='wandb',
    run_name=run_name
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=data['train'],
    eval_dataset=data['test'],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics
)
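
The `compute_metrics` function is imported from `training_aux` and its body is not shown in this notebook. Judging by the Accuracy column in the results, it probably looks something like the sketch below; this is a hypothetical stand-in, not the actual helper. It assumes the `Trainer` passes a pair of NumPy arrays (logits and labels) that can be unpacked like a tuple.

```python
import numpy as np


def compute_metrics(eval_pred):
    """Hypothetical stand-in for training_aux.compute_metrics: plain accuracy."""
    logits, labels = eval_pred
    # Predicted class = index of the largest logit in each row
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": float((predictions == labels).mean())}


# Toy logits and labels, made up for illustration
logits = np.array([[2.0, -1.0], [0.1, 0.3], [1.5, 0.2], [-0.4, 0.9]])
labels = np.array([0, 1, 1, 1])

print(compute_metrics((logits, labels)))  # {'accuracy': 0.75}
```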

In [11]:
trainer.train()

The following columns in the training set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: dr_id, question_2, question_1. If dr_id, question_2, question_1 are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 2834
  Num Epochs = 5
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 890
  Number of trainable parameters = 1538
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


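| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1     | No log        | 0.65828         | 0.672897 |
| 2     | No log        | 0.641414        | 0.686916 |
| 3     | 0.666700      | 0.630341        | 0.682243 |
| 4     | 0.666700      | 0.62478         | 0.686916 |
| 5     | 0.666700      | 0.623056        | 0.686916 |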


The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: dr_id, question_2, question_1. If dr_id, question_2, question_1 are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 214
  Batch size = 16
Saving model checkpoint to ./experiments/adaptative_and_head_training/checkpoint-178
Configuration saved in ./experiments/adaptative_and_head_training/checkpoint-178/config.json
Model weights saved in ./experiments/adaptative_and_head_training/checkpoint-178/pytorch_model.bin
tokenizer config file saved in ./experiments/adaptative_and_head_training/checkpoint-178/tokenizer_config.json
Special tokens file saved in ./experiments/adaptative_and_head_training/checkpoint-178/special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` 

TrainOutput(global_step=890, training_loss=0.6565721318962868, metrics={'train_runtime': 498.7547, 'train_samples_per_second': 28.411, 'train_steps_per_second': 1.784, 'total_flos': 3728283654451200.0, 'train_loss': 0.6565721318962868, 'epoch': 5.0})
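
The step counts in the log above follow directly from the dataset size and the batch size. A small worked example, assuming the batch size of 16 reported in the log, no gradient accumulation, and that the last partial batch of each epoch is kept:

```python
import math

num_examples = 2834  # "Num examples" in the training log
batch_size = 16      # "Total train batch size" in the training log
num_epochs = 5       # "Num Epochs" in the training log

# The last, smaller batch of each epoch still counts as one optimization step
steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * num_epochs

print(steps_per_epoch, total_steps)  # 178 890
```

This is why the first checkpoint is saved as `checkpoint-178` (end of epoch 1) and why training finishes at `global_step=890`.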

In [None]:
# Log model

artifact = wandb.Artifact('classifier', type='model')
artifact.add_dir('./experiments/behavioural_training/checkpoint-2485')
wandb.log_artifact(artifact)

wandb: Adding directory to artifact (./experiments/behavioural_training/checkpoint-2485)... Done. 6.7s


<wandb.sdk.wandb_artifacts.Artifact at 0x7fc602499d90>