# Parameter-Efficient Fine-Tuning (PEFT) of BERT base model to predict medical diagnosis - tutorial 
https://medium.com/@nubyra/parameter-efficient-fine-tuning-peft-of-bert-base-model-to-predict-medical-diagnosis-5086a1828f4b

## Load Python Libraries

In [1]:
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TextClassificationPipeline, DataCollatorWithPadding,
                          Trainer, TrainingArguments,
                          BertForSequenceClassification, pipeline)
from peft import PeftModel, PeftConfig, LoraConfig, TaskType, get_peft_model
import torch
import pandas as pd
import numpy as np
import os




## Load medical diagnosis dataset

In [2]:
data_files = {"train": "train.jsonl", "test": "test.jsonl"}
dataset = load_dataset("gretelai/symptom_to_diagnosis", data_files=data_files)
dataset = dataset.rename_column("output_text", "label")
print(dataset)

DatasetDict({
    train: Dataset({
        features: ['label', 'input_text'],
        num_rows: 853
    })
    test: Dataset({
        features: ['label', 'input_text'],
        num_rows: 212
    })
})


In [3]:
for entry in dataset['train'].select(range(5)):
    print('INPUT: {} \nOUTPUT: {}\n'.format(entry['input_text'], entry['label']))

INPUT: I've been having a lot of pain in my neck and back. I've also been having trouble with my balance and coordination. I've been coughing a lot and my limbs feel weak. 
OUTPUT: cervical spondylosis

INPUT: I have a rash on my face that is getting worse. It is red, inflamed, and has blisters that are bleeding clear pus. It is really painful. 
OUTPUT: impetigo

INPUT: I have been urinating blood. I sometimes feel sick to my stomach when I urinate. I often feel like I have a fever. 
OUTPUT: urinary tract infection

INPUT: I have been having trouble with my muscles and joints. My neck is really tight and my muscles feel weak. I have swollen joints and it is hard to move around without becoming stiff. It is also really uncomfortable to walk. 
OUTPUT: arthritis

INPUT: I have been feeling really sick. My body hurts a lot and I have no appetite. I have also developed rashes on my arms and face. The back of my eyes hurt a lot. 
OUTPUT: dengue



In [4]:
train_counts = pd.DataFrame({'Diagnosis': dataset['train']['label']})
train_counts = train_counts.groupby('Diagnosis').size().reset_index(name='train_set')

test_counts = pd.DataFrame({'Diagnosis': dataset['test']['label']})
test_counts = test_counts.groupby('Diagnosis').size().reset_index(name='test_set')

display(train_counts.merge(test_counts, on='Diagnosis'))

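|    | Diagnosis            | train_set | test_set |
|---:|:---------------------|----------:|---------:|
|  0 | allergy              |        40 |       10 |
|  1 | arthritis            |        40 |       10 |
|  2 | bronchial asthma     |        40 |       10 |
|  3 | cervical spondylosis |        40 |       10 |
|  4 | chicken pox          |        40 |       10 |
|  5 | common cold          |        39 |       10 |
|  6 | dengue               |        40 |       10 |
|  7 | diabetes             |        40 |       10 |
|  8 | drug reaction        |        40 |        8 |
|  9 | fungal infection     |        39 |        9 |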


## Prediction using Pre-trained BERT model

In [5]:
sorted_labels = sorted(set(dataset['train']['label']))
label2id = dict(zip(sorted_labels, range(0, len(sorted_labels))))
id2label = dict(zip(range(0, len(sorted_labels)), sorted_labels))

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
foundation_model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased",
                                                                    num_labels=len(label2id),
                                                                    label2id=label2id,
                                                                    id2label=id2label)

classifier = pipeline("text-classification", model=foundation_model, tokenizer=tokenizer)
predicted_labels = classifier(dataset['test']['input_text'])

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


In [6]:
test_array = np.asarray(dataset['test']['label'])
pred_array = np.asarray([item['label'] for item in predicted_labels])
foundation_accuracy = round(sum(test_array == pred_array)*100/len(test_array), 2)
print(f"Foundation Model Accuracy: {foundation_accuracy}%")

Foundation Model Accuracy: 7.08%
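
For context, the 7.08% figure can be set against a uniform random guess. The sketch below assumes 22 diagnosis classes (label ids run from 0 to 21 in the prediction output at the end of this notebook) and reads 7.08% of the 212 test rows as 15 correct predictions; both numbers are inferred from the outputs rather than stated by the author.

```python
# Assumed from the notebook outputs: 212 test rows, 22 diagnosis classes.
n_test = 212
n_classes = 22

chance_accuracy = 100 / n_classes        # expected accuracy of a uniform random guess
n_correct = 15                           # assumption: 15 hits reproduces the reported 7.08%
observed_accuracy = round(n_correct * 100 / n_test, 2)

print(f"Chance level: {chance_accuracy:.2f}%")
print(f"Observed:     {observed_accuracy}%")
```

Under these assumptions the untrained classification head performs only slightly above chance, which fits the "newly initialized" warning printed when the model was loaded.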


## PEFT-LoRA configuration for BERT

In [7]:
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS, r=64, lora_alpha=1, lora_dropout=0.1
)

peft_model = get_peft_model(foundation_model, lora_config)
print(peft_model.bert)

BertModel(
  (embeddings): BertEmbeddings(
    (word_embeddings): Embedding(30522, 768, padding_idx=0)
    (position_embeddings): Embedding(512, 768)
    (token_type_embeddings): Embedding(2, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): BertEncoder(
    (layer): ModuleList(
      (0-11): 12 x BertLayer(
        (attention): BertAttention(
          (self): BertSdpaSelfAttention(
            (query): lora.Linear(
              (base_layer): Linear(in_features=768, out_features=768, bias=True)
              (lora_dropout): ModuleDict(
                (default): Dropout(p=0.1, inplace=False)
              )
              (lora_A): ModuleDict(
                (default): Linear(in_features=768, out_features=64, bias=False)
              )
              (lora_B): ModuleDict(
                (default): Linear(in_features=64, out_features=768, bias=False)
              )
              (lora_embedding_A
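
The printed `lora.Linear` wrapper can be read as a frozen base layer plus a trainable low-rank update. Below is a minimal NumPy sketch, assuming the standard LoRA formulation `y = W x + b + (lora_alpha / r) * B(A x)` with `B` initialised to zero. Dropout is omitted and the weights are random placeholders, not values from the model in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, lora_alpha = 768, 64, 1          # sizes from the printout and LoraConfig above
scaling = lora_alpha / r               # 1/64 with these settings

W = rng.normal(size=(d, d)) * 0.02     # frozen base weight (placeholder values)
b = np.zeros(d)                        # frozen base bias
A = rng.normal(size=(r, d)) * 0.02     # lora_A: 768 -> 64, trainable
B = np.zeros((d, r))                   # lora_B: 64 -> 768, trainable, assumed zero at init

def lora_linear(x):
    """Base projection plus the scaled low-rank update."""
    base = x @ W.T + b
    update = (x @ A.T) @ B.T
    return base + scaling * update

x = rng.normal(size=(2, d))            # a batch of two hidden vectors
out = lora_linear(x)
print(out.shape)
print(np.allclose(out, x @ W.T + b))   # the update is zero until B is trained
```

With `B` starting at zero, the wrapped layer initially behaves exactly like the original `Linear`, so fine-tuning begins from the pre-trained behaviour.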

In [8]:
peft_model.print_trainable_parameters()

trainable params: 2,376,214 || all params: 111,875,372 || trainable%: 2.1240
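
The trainable-parameter count can be reproduced by hand. The sketch assumes LoRA wraps the `query` and `value` projections in each of the 12 layers (only `query` is visible in the truncated printout) and that the 22-class classifier head is also trained; under those assumptions the total matches the reported figure.

```python
hidden, r = 768, 64
n_layers = 12
wrapped_per_layer = 2                  # assumption: query and value projections
n_classes = 22                         # assumption: 22 diagnosis labels

# Each wrapped layer adds lora_A (hidden x r) and lora_B (r x hidden), no biases.
lora_params = n_layers * wrapped_per_layer * (hidden * r + r * hidden)

# Classification head: weight (n_classes x hidden) plus bias.
classifier_params = n_classes * hidden + n_classes

trainable = lora_params + classifier_params
print(f"{lora_params:,} + {classifier_params:,} = {trainable:,}")
print(f"trainable%: {100 * trainable / 111_875_372:.4f}")
```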


## Train with PEFT-BERT model

### Preprocess train/test data with appropriate tokenizer

In [9]:
def preprocess_function(examples):
    """Preprocess the dataset by returning tokenized examples."""
    tokens = tokenizer(examples["input_text"], padding="max_length", truncation=True)
    tokens['label'] = [label2id[l] for l in examples["label"]]
    return tokens

splits = ['train', 'test']

tokenized_ds = {} 

for split in splits:
    tokenized_ds[split] = dataset[split].map(preprocess_function, batched=True)

print(tokenized_ds)

{'train': Dataset({
    features: ['label', 'input_text', 'input_ids', 'token_type_ids', 'attention_mask'],
    num_rows: 853
}), 'test': Dataset({
    features: ['label', 'input_text', 'input_ids', 'token_type_ids', 'attention_mask'],
    num_rows: 212
})}


In [10]:
print("A tokenized training input example:")
print(tokenized_ds["train"][0]["input_ids"])
print("\n")
print("A tokenized training label example:")
print(tokenized_ds["train"][0]["label"])

A tokenized training input example:
[101, 1045, 1005, 2310, 2042, 2383, 1037, 2843, 1997, 3255, 1999, 2026, 3300, 1998, 2067, 1012, 1045, 1005, 2310, 2036, 2042, 2383, 4390, 2007, 2026, 5703, 1998, 12016, 1012, 1045, 1005, 2310, 2042, 21454, 1037, 2843, 1998, 2026, 10726, 2514, 5410, 1012, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,

### Train PEFT-BERT model

In [11]:
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()*100}


# The HuggingFace Trainer class handles the training and eval loop for PyTorch for us.
# Read more about it here https://huggingface.co/docs/transformers/main_classes/trainer
trainer = Trainer(
    model=peft_model,
    args=TrainingArguments(
        output_dir="bert-lora",
        learning_rate=2e-3,
        # Reduce the batch size if you don't have enough memory
        per_device_train_batch_size=4,
        per_device_eval_batch_size=4,
        num_train_epochs=15,
        weight_decay=0.01,
        # Recent transformers releases deprecate this name in favor of `eval_strategy`
        evaluation_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

print("Starting to train...")
trainer.train()


Starting to train...


{'eval_loss': 2.2217342853546143, 'eval_accuracy': 28.77358490566038, 'eval_runtime': 7.5106, 'eval_samples_per_second': 28.227, 'eval_steps_per_second': 7.057, 'epoch': 1.0}


{'eval_loss': 0.8072425127029419, 'eval_accuracy': 73.11320754716981, 'eval_runtime': 7.5486, 'eval_samples_per_second': 28.085, 'eval_steps_per_second': 7.021, 'epoch': 2.0}
{'loss': 2.0961, 'grad_norm': 4.281309127807617, 'learning_rate': 0.001688473520249221, 'epoch': 2.34}


{'eval_loss': 0.6400379538536072, 'eval_accuracy': 79.24528301886792, 'eval_runtime': 7.8099, 'eval_samples_per_second': 27.145, 'eval_steps_per_second': 6.786, 'epoch': 3.0}


{'eval_loss': 0.47103190422058105, 'eval_accuracy': 87.73584905660378, 'eval_runtime': 7.6676, 'eval_samples_per_second': 27.649, 'eval_steps_per_second': 6.912, 'epoch': 4.0}
{'loss': 0.504, 'grad_norm': 2.1039960384368896, 'learning_rate': 0.0013769470404984424, 'epoch': 4.67}


{'eval_loss': 0.2670034170150757, 'eval_accuracy': 91.50943396226415, 'eval_runtime': 7.5753, 'eval_samples_per_second': 27.986, 'eval_steps_per_second': 6.996, 'epoch': 5.0}


{'eval_loss': 0.37952834367752075, 'eval_accuracy': 88.67924528301887, 'eval_runtime': 7.5986, 'eval_samples_per_second': 27.9, 'eval_steps_per_second': 6.975, 'epoch': 6.0}


{'eval_loss': 0.25924569368362427, 'eval_accuracy': 91.50943396226415, 'eval_runtime': 7.6764, 'eval_samples_per_second': 27.617, 'eval_steps_per_second': 6.904, 'epoch': 7.0}
{'loss': 0.1809, 'grad_norm': 2.9897329807281494, 'learning_rate': 0.0010654205607476634, 'epoch': 7.01}


{'eval_loss': 0.274662047624588, 'eval_accuracy': 92.45283018867924, 'eval_runtime': 7.6683, 'eval_samples_per_second': 27.646, 'eval_steps_per_second': 6.912, 'epoch': 8.0}


{'eval_loss': 0.15629740059375763, 'eval_accuracy': 95.28301886792453, 'eval_runtime': 7.751, 'eval_samples_per_second': 27.351, 'eval_steps_per_second': 6.838, 'epoch': 9.0}
{'loss': 0.0618, 'grad_norm': 0.28020086884498596, 'learning_rate': 0.0007538940809968847, 'epoch': 9.35}


{'eval_loss': 0.17917802929878235, 'eval_accuracy': 95.75471698113208, 'eval_runtime': 7.6844, 'eval_samples_per_second': 27.588, 'eval_steps_per_second': 6.897, 'epoch': 10.0}


{'eval_loss': 0.23992441594600677, 'eval_accuracy': 93.86792452830188, 'eval_runtime': 7.7859, 'eval_samples_per_second': 27.229, 'eval_steps_per_second': 6.807, 'epoch': 11.0}
{'loss': 0.0221, 'grad_norm': 0.061193205416202545, 'learning_rate': 0.0004423676012461059, 'epoch': 11.68}


{'eval_loss': 0.17434239387512207, 'eval_accuracy': 95.75471698113208, 'eval_runtime': 7.8977, 'eval_samples_per_second': 26.843, 'eval_steps_per_second': 6.711, 'epoch': 12.0}


{'eval_loss': 0.17112325131893158, 'eval_accuracy': 96.22641509433963, 'eval_runtime': 8.0524, 'eval_samples_per_second': 26.328, 'eval_steps_per_second': 6.582, 'epoch': 13.0}


{'eval_loss': 0.1830337941646576, 'eval_accuracy': 95.28301886792453, 'eval_runtime': 8.194, 'eval_samples_per_second': 25.873, 'eval_steps_per_second': 6.468, 'epoch': 14.0}
{'loss': 0.011, 'grad_norm': 0.005021353252232075, 'learning_rate': 0.0001308411214953271, 'epoch': 14.02}


{'eval_loss': 0.18218187987804413, 'eval_accuracy': 94.81132075471697, 'eval_runtime': 8.0089, 'eval_samples_per_second': 26.47, 'eval_steps_per_second': 6.618, 'epoch': 15.0}
{'train_runtime': 1236.0515, 'train_samples_per_second': 10.352, 'train_steps_per_second': 2.597, 'train_loss': 0.44819680712304755, 'epoch': 15.0}


TrainOutput(global_step=3210, training_loss=0.44819680712304755, metrics={'train_runtime': 1236.0515, 'train_samples_per_second': 10.352, 'train_steps_per_second': 2.597, 'total_flos': 3460510521077760.0, 'train_loss': 0.44819680712304755, 'epoch': 15.0})
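
The `learning_rate` values in the log are consistent with a linear decay from `2e-3` to zero over the 3210 training steps, logged every 500 steps. The sketch below assumes the `Trainer` defaults (linear schedule, no warmup, `logging_steps=500`), none of which are set explicitly in the training cell.

```python
import math

base_lr = 2e-3
steps_per_epoch = math.ceil(853 / 4)     # 853 training rows, batch size 4
total_steps = steps_per_epoch * 15       # 15 epochs

def linear_lr(step):
    """Linear decay to zero with no warmup (assumed default schedule)."""
    return base_lr * (total_steps - step) / total_steps

print(total_steps)
for step in range(500, total_steps, 500):
    print(f"step {step:4d}  epoch {step / steps_per_epoch:5.2f}  lr {linear_lr(step):.6f}")
```

The printed epochs and rates can be compared against the `'epoch'` and `'learning_rate'` fields of the `loss` log lines above.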

## Save PEFT model

In [12]:
peft_bert_model_path = "fine-tuned-peft-model-weights/"
peft_model.save_pretrained(peft_bert_model_path)

# check the size of the saved model
for file_name in os.listdir(peft_bert_model_path):
    file_size = os.path.getsize(os.path.join(peft_bert_model_path, file_name))
    print(f"File Name: {file_name}; File Size: {file_size / 1024:.2f}KB")

File Name: adapter_config.json; File Size: 0.68KB
File Name: adapter_model.safetensors; File Size: 9288.92KB
File Name: README.md; File Size: 4.97KB
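
The adapter file size lines up with the trainable-parameter count. This is a rough check, assuming the adapter stores the 2,376,214 trainable parameters as 32-bit floats and that the remainder is file header metadata.

```python
trainable_params = 2_376_214             # from print_trainable_parameters() above
bytes_per_param = 4                      # assumption: float32 weights

expected_kb = trainable_params * bytes_per_param / 1024
reported_kb = 9288.92                    # adapter_model.safetensors size printed above

print(f"Expected tensor data: {expected_kb:.2f}KB")
print(f"Unaccounted overhead: {reported_kb - expected_kb:.2f}KB")
```

Only the LoRA matrices and the classifier head are saved here, so the base `bert-base-uncased` checkpoint must still be available when the adapter is loaded.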


## Predict using saved PEFT model

In [13]:
# Load the base model named in the adapter config, then attach the saved LoRA adapter

config = PeftConfig.from_pretrained(peft_bert_model_path)
model = BertForSequenceClassification.from_pretrained(config.base_model_name_or_path,
                                                    num_labels=len(label2id),
                                                    label2id=label2id,
                                                    id2label=id2label)
model = PeftModel.from_pretrained(model, peft_bert_model_path)

trainer = Trainer(
    model=model,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer)
)

test_predictions = trainer.predict(tokenized_ds['test'])
print(test_predictions)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


PredictionOutput(predictions=array([[ -8.259553  ,  -6.122034  , -13.978308  , ...,   0.65173936,
         -0.40767723,   4.227573  ],
       [ -2.1437595 ,  -5.514235  ,  -8.889157  , ...,  -1.5703596 ,
         -0.38314834,   3.6594245 ],
       [ -3.5587769 ,  -8.240544  ,  -8.518811  , ...,   5.849249  ,
         -3.198     ,  -5.308662  ],
       ...,
       [  6.2493873 ,  -6.7055526 ,  23.585546  , ...,   3.6332254 ,
         10.616182  ,  -2.0770226 ],
       [  3.6201847 ,  -8.221383  ,  20.452805  , ...,   3.485164  ,
         11.051069  ,  -0.9255245 ],
       [ -8.896778  ,  -4.8086658 ,  -6.2238364 , ...,   8.318126  ,
         -2.9648018 ,  -4.5387554 ]], dtype=float32), label_ids=array([16, 16,  8, 17,  9,  3, 10, 18,  1, 17,  1,  3, 14, 19, 16, 14, 15,
        2,  7, 13, 15, 14, 10, 14,  4,  1, 20,  4,  9, 13, 19,  3, 17,  2,
        4, 10, 12, 11, 19,  9,  5, 11,  1, 12, 15,  8,  7, 17, 20,  6,  6,
        4, 15,  3, 14, 13, 14,  5,  7, 17, 10,  2, 21,  5, 10, 18,  3, 
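
`PredictionOutput.predictions` holds one row of logits per test example, so the predicted class is the index of the largest value in each row. Here is a small sketch with made-up logits and a hypothetical three-label `id2label` mapping; in the notebook itself the inputs would be `test_predictions.predictions`, `test_predictions.label_ids`, and the `id2label` dictionary built earlier.

```python
import numpy as np

# Made-up logits for 4 examples over 3 classes; not values from the notebook.
logits = np.array([[ 2.1, -0.3,  0.4],
                   [-1.0,  3.2,  0.1],
                   [ 0.2,  0.1,  1.9],
                   [ 1.5,  1.7, -0.6]])
label_ids = np.array([0, 1, 2, 0])                       # hypothetical ground truth
id2label = {0: "allergy", 1: "arthritis", 2: "dengue"}   # illustrative subset

pred_ids = logits.argmax(axis=1)                         # most likely class per row
pred_names = [id2label[i] for i in pred_ids.tolist()]
accuracy = (pred_ids == label_ids).mean() * 100

print(pred_names)
print(f"Accuracy: {accuracy:.1f}%")
```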