# Intent Detection LoRA

Sample notebook that uses the `DORIE` package to fine-tune an insurance intent detection (classification) model. It serves as an example of how to use the library for downstream classification tasks. This sequence classification task assigns each customer utterance to one of several intents, so that the contact center can route the phone call or chat through a pre-defined or dynamic route to satisfy the customer's needs.

In [1]:
from pathlib import Path
import sys

sys.path.insert(0, str(Path().resolve().parent))

from libs.dorie.intent.finetune import Intent
from libs.dorie.loader.adaptation import return_peft_model
from libs.dorie.loader.datatokenizer import MyDataset

from peft import TaskType



## Load the dataset
See the [Synthetic data notebook](./Synthetic_Data_Generation.ipynb) for more information on generating data with the OpenAI API. The dataset is loaded from the Hugging Face Hub, where you can also browse it as [synthetic_insurance_data](https://huggingface.co/datasets/stevenloaiza/synthetic_insurance_data).

In [2]:
# Set the data configuration; the Intent class will load the data from the path specified here
datapath = "stevenloaiza/synthetic_insurance_data"
dataclass = MyDataset(path=datapath)

# The returned dataset is not passed on explicitly, since the fine-tuning modules load it implicitly.
# Loading it here only shows the data that will be used for fine-tuning.
mydata = dataclass.loader()
mydata

Generating train split: 100%|██████████| 785/785 [00:00<00:00, 76540.17 examples/s]
Generating test split: 100%|██████████| 435/435 [00:00<00:00, 227865.90 examples/s]
Map: 100%|██████████| 785/785 [00:00<00:00, 4691.39 examples/s]
Map: 100%|██████████| 435/435 [00:00<00:00, 3302.01 examples/s]


DatasetDict({
    train: Dataset({
        features: ['label', 'text', 'input_ids', 'attention_mask'],
        num_rows: 785
    })
    test: Dataset({
        features: ['label', 'text', 'input_ids', 'attention_mask'],
        num_rows: 435
    })
})

## Load PEFT LoRA model

 * This section of the code is responsible for loading a PEFT (Parameter-Efficient Fine-Tuning) LoRA (Low-Rank Adaptation) model.
 * PEFT LoRA models are used to fine-tune large language models efficiently by adapting a small number of parameters.
 * This approach is particularly useful for tasks like intent detection, where the model needs to be adapted to understand specific intents from user inputs.
 * The model loading process typically involves specifying the model architecture, loading pre-trained weights, and preparing the model for inference or further training.
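
To make "adapting a small number of parameters" concrete, here is a minimal, library-free sketch of a single LoRA-adapted linear layer. It assumes the standard LoRA formulation, `h = W0 x + (lora_alpha / r) * B A x`, with `B` initialised to zero so that the adapted layer starts out identical to the frozen base layer. The toy dimensions and the helper names (`matvec`, `lora_forward`) are illustrative only and are not part of `DORIE` or `peft`.

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def lora_forward(w0, a, b, x, r, lora_alpha):
    """Return W0 x + (lora_alpha / r) * B (A x) for one linear layer."""
    scaling = lora_alpha / r
    base_out = matvec(w0, x)                 # frozen pre-trained path
    low_rank_out = matvec(b, matvec(a, x))   # trainable low-rank path
    return [h + scaling * d for h, d in zip(base_out, low_rank_out)]

random.seed(0)
d_in, d_out, r, lora_alpha = 16, 16, 2, 8   # toy sizes, not roberta-base sizes

w0 = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
a = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(r)]       # r x d_in, trainable
b = [[0.0] * r for _ in range(d_out)]                                    # d_out x r, starts at zero
x = [random.gauss(0, 1) for _ in range(d_in)]

# With B = 0 the adapter contributes nothing, so training starts from the base model.
assert lora_forward(w0, a, b, x, r, lora_alpha) == matvec(w0, x)

full_params = d_out * d_in            # parameters in a full weight update
lora_params = r * d_in + d_out * r    # parameters in the low-rank update
print(f"full update: {full_params} params, LoRA update: {lora_params} params")
```

Even at this toy size the low-rank update needs a quarter of the parameters of a full update; the gap widens as the layer dimensions grow relative to `r`.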

In [3]:
BASE_MODEL = "roberta-base"
LORA_CONFIG = {
    "task_type": TaskType.SEQ_CLS, 
    "inference_mode": False, 
    "r": 8, 
    "lora_alpha": 32, 
    "lora_dropout": 0.1
}
lora_model = return_peft_model(
    model_name_or_path=BASE_MODEL,
    lora_config=LORA_CONFIG,
    num_labels=dataclass.numLabels,
    verbose=True
)

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


trainable params: 888,580 || all params: 125,537,288 || trainable%: 0.7078
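
The trainable-parameter count reported above can be reproduced by hand. The sketch below rests on two assumptions, since the internals of `return_peft_model` are not shown here: the adapter is applied to the `query` and `value` projections in each of the 12 encoder layers, and the classification head (`classifier.dense` and `classifier.out_proj`) is trained in full. The hidden size, layer count, and label count are taken from the model config printed later in this notebook. Under these assumptions the arithmetic lands exactly on the reported figure.

```python
hidden_size = 768      # roberta-base hidden size
num_layers = 12        # roberta-base encoder layers
num_labels = 4         # payPrem, addDriver, saleQuote, startClaim
r = 8                  # LoRA rank from LORA_CONFIG

# Each adapted projection gets A (r x hidden) and B (hidden x r).
params_per_adapter = r * hidden_size + hidden_size * r
adapted_modules_per_layer = 2  # assumption: query and value projections
lora_params = params_per_adapter * adapted_modules_per_layer * num_layers

# Classification head: dense (hidden -> hidden) and out_proj (hidden -> labels), with biases.
head_params = (hidden_size * hidden_size + hidden_size) + (hidden_size * num_labels + num_labels)

trainable = lora_params + head_params
all_params = 125_537_288  # total reported in the notebook output

print(f"LoRA params: {lora_params:,}")                       # 294,912
print(f"head params: {head_params:,}")                       # 593,668
print(f"trainable:   {trainable:,}")                         # 888,580
print(f"trainable%:  {100 * trainable / all_params:.4f}")    # 0.7078
```

About two thirds of the trainable parameters sit in the newly initialized classification head; the LoRA matrices themselves account for roughly 295k.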


## Set Up Intent Detection Class

In this section, we will set up the intent detection class using the `Intent` class from the `DORIE` package. We will configure the class with the dataset and the PEFT LoRA model that we have loaded in the previous steps. This setup will allow us to fine-tune the model for the intent detection task.


In [16]:
TRAINING_CONFIG = {   
    "baseModel": f"{BASE_MODEL}",
    "device": "cpu",
    "modelArgs": {
        "output_dir": "./results",
        "num_train_epochs": 10,
        "per_device_train_batch_size": 32,
        "per_device_eval_batch_size": 32,
        "eval_strategy": "epoch",
        "save_strategy": "epoch",
        "learning_rate": 5e-5,
        "save_total_limit": 2,
        "load_best_model_at_end": True,
        "metric_for_best_model": "accuracy",
        "greater_is_better": True,
        "save_on_each_node": True
    },
    "model": lora_model
}

intent_classifier = Intent(
    datapath = datapath,
    config = TRAINING_CONFIG,
    dataclass = dataclass,
    # Local model trainer
    trainer = None,
    inference_test = False
)

## Train
The `train` method of the `intent_classifier` object is called to start the training process. This method will fine-tune the model using the provided dataset and configuration settings.

In [17]:
intent_classifier.train()

 10%|█         | 25/250 [00:36<04:20,  1.16s/it]
 10%|█         | 25/250 [00:43<04:20,  1.16s/it]

{'eval_loss': 1.2358351945877075, 'eval_accuracy': 0.46436781609195404, 'eval_runtime': 7.2618, 'eval_samples_per_second': 59.903, 'eval_steps_per_second': 1.928, 'epoch': 1.0}


 20%|██        | 50/250 [01:16<04:12,  1.26s/it]
 20%|██        | 50/250 [01:24<04:12,  1.26s/it]

{'eval_loss': 1.1695953607559204, 'eval_accuracy': 0.6229885057471264, 'eval_runtime': 7.4215, 'eval_samples_per_second': 58.613, 'eval_steps_per_second': 1.886, 'epoch': 2.0}


 30%|███       | 75/250 [01:57<03:21,  1.15s/it]
 30%|███       | 75/250 [02:04<03:21,  1.15s/it]

{'eval_loss': 0.8723050355911255, 'eval_accuracy': 0.7839080459770115, 'eval_runtime': 7.2394, 'eval_samples_per_second': 60.088, 'eval_steps_per_second': 1.934, 'epoch': 3.0}


 40%|████      | 100/250 [02:38<02:38,  1.06s/it]
 40%|████      | 100/250 [02:45<02:38,  1.06s/it]

{'eval_loss': 0.5939724445343018, 'eval_accuracy': 0.8045977011494253, 'eval_runtime': 7.0745, 'eval_samples_per_second': 61.488, 'eval_steps_per_second': 1.979, 'epoch': 4.0}


 50%|█████     | 125/250 [03:24<02:41,  1.29s/it]
 50%|█████     | 125/250 [03:32<02:41,  1.29s/it]

{'eval_loss': 0.5081363916397095, 'eval_accuracy': 0.8068965517241379, 'eval_runtime': 7.3168, 'eval_samples_per_second': 59.452, 'eval_steps_per_second': 1.913, 'epoch': 5.0}


 60%|██████    | 150/250 [04:06<02:04,  1.24s/it]
 60%|██████    | 150/250 [04:13<02:04,  1.24s/it]

{'eval_loss': 0.47060272097587585, 'eval_accuracy': 0.8183908045977012, 'eval_runtime': 7.5607, 'eval_samples_per_second': 57.535, 'eval_steps_per_second': 1.852, 'epoch': 6.0}


 70%|███████   | 175/250 [04:55<01:56,  1.56s/it]
 70%|███████   | 175/250 [05:04<01:56,  1.56s/it]

{'eval_loss': 0.4596567451953888, 'eval_accuracy': 0.8137931034482758, 'eval_runtime': 9.2256, 'eval_samples_per_second': 47.151, 'eval_steps_per_second': 1.518, 'epoch': 7.0}


 80%|████████  | 200/250 [05:45<01:10,  1.41s/it]
 80%|████████  | 200/250 [05:54<01:10,  1.41s/it]

{'eval_loss': 0.44719111919403076, 'eval_accuracy': 0.8160919540229885, 'eval_runtime': 8.6623, 'eval_samples_per_second': 50.218, 'eval_steps_per_second': 1.616, 'epoch': 8.0}


 90%|█████████ | 225/250 [06:29<00:30,  1.24s/it]
 90%|█████████ | 225/250 [06:38<00:30,  1.24s/it]

{'eval_loss': 0.44470515847206116, 'eval_accuracy': 0.8206896551724138, 'eval_runtime': 8.3918, 'eval_samples_per_second': 51.836, 'eval_steps_per_second': 1.668, 'epoch': 9.0}


100%|██████████| 250/250 [07:17<00:00,  1.60s/it]
100%|██████████| 250/250 [07:26<00:00,  1.60s/it]

{'eval_loss': 0.44348734617233276, 'eval_accuracy': 0.8160919540229885, 'eval_runtime': 8.7468, 'eval_samples_per_second': 49.733, 'eval_steps_per_second': 1.601, 'epoch': 10.0}


100%|██████████| 250/250 [07:27<00:00,  1.60s/it]

{'train_runtime': 447.0512, 'train_samples_per_second': 17.56, 'train_steps_per_second': 0.559, 'train_loss': 0.6921998291015625, 'epoch': 10.0}


100%|██████████| 250/250 [07:28<00:00,  1.79s/it]
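
The step counts and accuracy values in the log above follow directly from the dataset sizes and the batch size. As a quick sanity check (assuming a single device and no gradient accumulation, which is consistent with the 250-step progress bar):

```python
import math

train_rows, test_rows = 785, 435   # split sizes shown when the dataset was loaded
batch_size = 32                    # per_device_train_batch_size and per_device_eval_batch_size
epochs = 10                        # num_train_epochs

steps_per_epoch = math.ceil(train_rows / batch_size)
total_steps = steps_per_epoch * epochs
eval_batches = math.ceil(test_rows / batch_size)

print(steps_per_epoch, total_steps, eval_batches)   # 25 250 14

# eval_accuracy is a fraction of the 435 test rows, so it maps back to a whole count.
final_accuracy = 0.8160919540229885   # epoch 10
best_accuracy = 0.8206896551724138    # epoch 9, the highest value in the log
print(round(final_accuracy * test_rows), round(best_accuracy * test_rows))   # 355 357
```

Epoch 9 has the highest accuracy in the log (357 of 435 test utterances correct), slightly above the final epoch (355 of 435).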


In [18]:
intent_classifier.config['model'].config

RobertaConfig {
  "_attn_implementation_autoset": true,
  "_name_or_path": "roberta-base",
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "payPrem",
    "1": "addDriver",
    "2": "saleQuote",
    "3": "startClaim"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "addDriver": 1,
    "payPrem": 0,
    "saleQuote": 2,
    "startClaim": 3
  },
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "problem_type": "single_label_classification",
  "transformers_version": "4.46.3",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}

## Push to Hub
This section shows how to upload the fine-tuned model to the Hugging Face Hub so that it can be shared and deployed easily. The `push_to_hub` method performs the upload; as the output below shows, only the LoRA adapter weights (`adapter_model.safetensors`, about 3.56 MB) are transferred.

In [19]:
# Save Model
intent_classifier.push_to_hub(model_name='stevenloaiza/dorie-intent-classifier')

adapter_model.safetensors: 100%|██████████| 3.56M/3.56M [00:00<00:00, 7.92MB/s]
2025-01-19 20:40:12,670 - __name__ - INFO - Model saved to Hugging Face Hub as stevenloaiza/dorie-intent-classifier


## Inference
Load the learned LoRA matrices `B` and `A`, whose product `BA` is the low-rank decomposition of the weight update. The adapter is combined with the unchanged pre-trained base model (recall that LoRA learns `BA` as the update to the weight matrix `W` rather than modifying `W` itself).

In [20]:
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSequenceClassification, AutoTokenizer

peft_model_id = "stevenloaiza/dorie-intent-classifier"
config = PeftConfig.from_pretrained(peft_model_id)
inference_model = AutoModelForSequenceClassification.from_pretrained(config.base_model_name_or_path, num_labels=4)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the LoRA adapter on top of the base model
inference_model = PeftModel.from_pretrained(inference_model, peft_model_id)

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [50]:
inference_model.to("cpu")
_ = inference_model.eval()  # switch to eval mode (disables dropout)

input_text = "Can you add someone who's just moved in with me to our policy?"

def run_inference(input_text, inference_model):
    """Tokenize the text, run a forward pass, and print the predicted intent."""
    inputs = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True, max_length=512)
    inputs = {key: value.to("cpu") for key, value in inputs.items()}

    # Gradients are not needed at inference time
    with torch.no_grad():
        outputs = inference_model(**inputs)
        predictions = torch.argmax(outputs.logits, dim=-1)

    # Class index to intent name (matches id2label in the model config)
    labelMap = {
        0: "payPrem",
        1: "addDriver",
        2: "saleQuote",
        3: "startClaim",
    }
    predicted_intent = labelMap[predictions.item()]

    print(f"Input text: {input_text}")
    print(f"Predicted intent: {predicted_intent}")

run_inference(input_text, inference_model)

Input text: Can you add someone who's just moved in with me to our policy?
Predicted intent: addDriver


## Merging LoRA model with the base model
From the LoRA paper, the forward pass yields `W0x + BAx`, which can be combined as `(W0 + BA)x = W_LoRA x`. Merging the adapter into the base weights therefore adds no inference latency compared with the base model.
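
The identity behind merging can be checked numerically. The following is a small, library-free sketch with toy dimensions; the helper names are illustrative, and `scaling` stands for `lora_alpha / r`, the factor that `peft` applies to `BA` (it is left out of the formula above for brevity).

```python
import math
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def matmul(left, right):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*right))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in left]

random.seed(0)
d, r, scaling = 5, 2, 4.0   # toy sizes; scaling stands for lora_alpha / r

w0 = [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]  # frozen base weight
a = [[random.gauss(0, 1) for _ in range(d)] for _ in range(r)]   # r x d
b = [[random.gauss(0, 1) for _ in range(r)] for _ in range(d)]   # d x r
x = [random.gauss(0, 1) for _ in range(d)]

# Unmerged: base path and low-rank path are evaluated separately on every call.
unmerged = [h + scaling * u for h, u in zip(matvec(w0, x), matvec(b, matvec(a, x)))]

# Merged: fold the update into the weight once, then do a single matrix-vector product.
ba = matmul(b, a)
w_merged = [[w + scaling * u for w, u in zip(w_row, u_row)] for w_row, u_row in zip(w0, ba)]
merged = matvec(w_merged, x)

outputs_match = all(
    math.isclose(m, u, rel_tol=1e-9, abs_tol=1e-9) for m, u in zip(merged, unmerged)
)
print(outputs_match)  # True
```

The two results agree up to floating-point rounding, which is why the merged model is expected to give the same prediction as the adapter-wrapped model.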

In [51]:
merged_model = inference_model.merge_and_unload()
run_inference(input_text, merged_model)

Input text: Can you add someone who's just moved in with me to our policy?
Predicted intent: addDriver
