# Intent Router

# Banking Intent Router Implementation

In this example, we will demonstrate LLM-based routing for a banking use case by classifying incoming prompts based on their intent. We'll utilize the [Banking dataset](https://github.com/PolyAI-LDN/task-specific-datasets/tree/master/banking_data), which provides an excellent foundation for developing an intent classification model in the banking domain.


## Dataset Information
* Source: Banking dataset from PolyAI-LDN
* Size: 13,083 customer service queries
* Split: 10,003 training examples, 3,080 test examples
* Categories: 77 distinct intents grouped into 10 main categories

## Intent Categories

To streamline the routing process, we will group the 77 detailed banking intents into 10 broader categories:

* Billing and Payments
* Account Management
* Security and Fraud Prevention
* Transaction Support
* Technical Support
* Financial Planning
* International Services
* Customer Education
* Dispute Resolution
* Product Information

## Implementation Approach

This hierarchical approach enables efficient routing and management of customer inquiries while maintaining the granularity needed for accurate response generation. By organizing queries into these logical categories, we can ensure consistent handling of similar requests while optimizing the routing process for large-scale customer service operations.
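
Once a query has a broad category, routing can be as simple as a dictionary lookup. The sketch below is illustrative only: the handler functions, the `ROUTES` table, and the fallback behaviour are hypothetical and not part of the original notebook.

```python
from typing import Callable, Dict

# Hypothetical handlers; a real system might forward the query to a
# specialised prompt, model, or support queue instead.
def handle_billing(query: str) -> str:
    return f"[billing queue] {query}"

def handle_security(query: str) -> str:
    return f"[fraud team] {query}"

def handle_default(query: str) -> str:
    return f"[general support] {query}"

# Routing table keyed by the broad category names used in this notebook.
ROUTES: Dict[str, Callable[[str], str]] = {
    "billing": handle_billing,
    "security_and_fraud_prevention": handle_security,
}

def route(query: str, predicted_category: str) -> str:
    """Dispatch a query to the handler registered for its category."""
    handler = ROUTES.get(predicted_category, handle_default)
    return handler(query)

print(route("I think my card was stolen", "security_and_fraud_prevention"))
print(route("Which ATMs can I use?", "customer_education"))
```

Categories without a registered handler fall back to the default handler, so new categories can be added to the table one at a time.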

In [1]:
import pandas as pd
import random

In [2]:
def prepare_data(path):
    intent_mapping = {
        'billing': [
            'bill_pay', 'refund_not_showing_up', 'pending_card_payment', 'declined_transfer',
            'card_payment_fee_charged', 'card_payment_not_recognised', 'transfer_fee_charged',
            'transfer_not_received_by_recipient', 'declined_cash_withdrawal', 'pending_top_up',
            'pending_transfer', 'top_up_by_card_charge', 'top_up_by_bank_transfer_charge',
            'top_up_failed', 'balance_not_updated_after_deposit', 'request_refund',
            'reverted_transfer', 'failed_transfer', 'receiving_money', 'sending_money',
            'withdraw_money', 'pending_cash_withdrawal', 'card_swallowed', 'top_up_limits',
            'verify_top_up', 'top_up_by_bank_transfer', 'top_up_by_card',
            'balance_not_updated_after_cheque_or_cash_deposit', 'topping_up_by_card',
            'transfer_into_account', 'transfer_not_received', 'transfer_fee', 'card_payment_fee',
            'declined_card_payment', 'transaction_charged_twice', 'direct_debit_payment_not_recognised',
            'balance_not_updated_after_bank_transfer', 'top_up_by_cash_or_cheque',
            'extra_charge_on_statement', 'Refund_not_showing_up'
        ],
        'account_management': [
            'activate_my_card', 'closing_account', 'edit_personal_details', 'verify_my_identity',
            'change_pin', 'terminate_account', 'unable_to_verify_identity', 'passcode_forgotten',
            'pin_blocked', 'order_physical_card', '0', 'getting_virtual_card',
            'get_physical_card', 'card_arrival', 'card_about_to_expire', 'card_linking'
        ],
        'security_and_fraud_prevention': [
            'compromised_card', 'lost_or_stolen_card', 'verify_source_of_funds',
            'lost_or_stolen_phone', 'why_verify_identity'
        ],
        'transaction_support': [
            'cancel_transfer', 'exchange_rate', 'wrong_exchange_rate_applied',
            'card_payment_wrong_exchange_rate', 'exchange_via_app', 'transfer_timing',
            'wrong_amount_of_cash_received', 'wrong_exchange_rate_for_cash_withdrawal'
        ],
        'technical_support': [
            'apple_pay_or_google_pay', 'card_not_working', 'virtual_card_not_working',
            'contactless_not_working', 'automatic_top_up', 'top_up_reverted'
        ],
        'financial_planning': [
            'disposable_card_limits', 'exchange_charge', 'cash_withdrawal_charge'
        ],
        'international_services': [
            'country_support', 'fiat_currency_support', 'supported_cards_and_currencies',
            'visa_or_mastercard'
        ],
        'customer_education': [
            'atm_support', 'age_limit', 'card_acceptance', 'beneficiary_not_allowed'
        ],
        'dispute_resolution': [
            'reverted_card_payment?', 'cash_withdrawal_not_recognised'
        ],
        'product_information': [
            'get_disposable_virtual_card', 'card_delivery_estimate'
        ]
    }
    def map_category_to_intent(category):
        for intent, categories in intent_mapping.items():
            if category in categories:
                return intent
        return 'other'

    dataset_frame = pd.read_csv(path, names=['text', 'category'], header=0)

    # Map the categories to our intents
    dataset_frame['intent'] = dataset_frame['category'].apply(map_category_to_intent)

    return dataset_frame
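
The lookup in `prepare_data` scans every list for each row. As an alternative sketch (not the notebook's method), the mapping can be inverted once into a flat dictionary, which also makes it easy to catch a label that was accidentally assigned to two groups. The abbreviated `group_to_labels` dictionary and the helper names below are illustrative.

```python
from collections import Counter

# Abbreviated, illustrative subset of the mapping used in prepare_data.
group_to_labels = {
    "billing": ["bill_pay", "request_refund", "top_up_failed"],
    "dispute_resolution": ["reverted_card_payment?", "cash_withdrawal_not_recognised"],
    "product_information": ["get_disposable_virtual_card", "card_delivery_estimate"],
}

def invert_mapping(mapping):
    """Build a flat {fine_grained_label: broad_intent} dict and reject duplicates."""
    counts = Counter(label for labels in mapping.values() for label in labels)
    duplicates = sorted(label for label, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"Labels assigned to more than one group: {duplicates}")
    return {label: group for group, labels in mapping.items() for label in labels}

label_to_intent = invert_mapping(group_to_labels)

def lookup(label):
    """Return the broad intent for a fine-grained label, or 'other' if unmapped."""
    return label_to_intent.get(label, "other")

print(lookup("bill_pay"))
print(lookup("some_unmapped_label"))
```

With pandas, the same dictionary could be applied with `dataset_frame['category'].map(label_to_intent).fillna('other')`.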

### Train and Test Data

In [3]:
train_path = 'https://raw.githubusercontent.com/PolyAI-LDN/task-specific-datasets/master/banking_data/train.csv'
test_path = 'https://raw.githubusercontent.com/PolyAI-LDN/task-specific-datasets/master/banking_data/test.csv'

### Data Preparation

In [4]:
df = prepare_data(train_path)

print("\nCount of prompts for each intent:")
print(df['intent'].value_counts())

# Filter out 'other' intents
df_filtered = df[df['intent'] != 'other']

# Create a new dataframe with all prompts and their intents
result_df = df_filtered[['text', 'intent']]

# Shuffle the dataframe
result_df = result_df.sample(frac=1).reset_index(drop=True)

# Display the first few prompts
print("\nFirst few prompts:")
print(result_df.head(10))

# Save to CSV
result_df.to_csv('categorized_prompts_train.csv', index=False)
print(f"\nSaved {len(result_df)} prompts to 'categorized_prompts_train.csv'")

# Print a few example prompts for each category
for intent in ['billing', 'account_management', 'transaction_support', 'technical_support']:
    print(f"\n{intent.capitalize()} prompts:")
    for prompt in result_df[result_df['intent'] == intent]['text'].head(5):
        print(f"- {prompt}")


Count of prompts for each intent:
intent
billing                          4178
account_management               1681
transaction_support              1025
technical_support                 587
security_and_fraud_prevention     523
international_services            519
financial_planning                419
customer_education                412
dispute_resolution                321
product_information               209
other                             129
Name: count, dtype: int64

First few prompts:
                                                text  \
0     Can someone assist me with activating my card?   
1  What is going on with it saying my card paymen...   
2  If my funds are running low, will the app top ...   
3                Why do I need to verify the top-up?   
4  Do I get any sort of discount on volume for a ...   
5                      Can I top up using my cheque?   
6  I live in the EU.  Can I order one of your cards?   
7  Can you tell me how long it would take, to 

In [5]:
df = prepare_data(test_path)

print("\nCount of prompts for each intent:")
print(df['intent'].value_counts())

# Filter out 'other' intents
df_filtered = df[df['intent'] != 'other']

# Create a new dataframe with all prompts and their intents
result_df = df_filtered[['text', 'intent']]

# Shuffle the dataframe
result_df = result_df.sample(frac=1).reset_index(drop=True)

# Display the first few prompts
print("\nFirst few prompts:")
print(result_df.head(10))

# Save to CSV
result_df.to_csv('categorized_prompts_test.csv', index=False)
print(f"\nSaved {len(result_df)} prompts to 'categorized_prompts_test.csv'")

# Print a few example prompts for each category
for intent in ['billing', 'account_management', 'transaction_support', 'technical_support']:
    print(f"\n{intent.capitalize()} prompts:")
    for prompt in result_df[result_df['intent'] == intent]['text'].head(5):
        print(f"- {prompt}")


Count of prompts for each intent:
intent
billing                          1160
account_management                560
transaction_support               280
technical_support                 240
security_and_fraud_prevention     200
international_services            160
customer_education                160
financial_planning                120
product_information                80
dispute_resolution                 80
other                              40
Name: count, dtype: int64

First few prompts:
                                                text  \
0          What should I do if I forgot my passcode?   
1         The exchange rate on my purchase is wrong.   
2            lost phone, can i still access account?   
3    There is a payment in the App that is not mine.   
4                               About this card PIN?   
5  what's the process for getting a disposable vi...   
6                Why was I charged for card payment?   
7      I tried verifying my ID, but it won't l

# Training Router Model 

In this example, we will use the DeBERTa-v3 transformer for our intent classification model. DeBERTa-v3 offers the following key advantages.

### Performance
DeBERTa-v3 demonstrates strong performance on natural language understanding tasks, achieving significant improvements over previous models:
* Surpasses previous state-of-the-art models on the GLUE benchmark by +1.37%
* Achieves 90.6% accuracy on MNLI-matched and 88.4% F1 score on SQuAD v2.0
* Outperforms RoBERTa-base, XLNet-base, and ELECTRA-base on key benchmarks


### Deployment
DeBERTa-v3 offers practical advantages for deployment:
* Contains only 86M backbone parameters while maintaining a rich 128K token vocabulary
* Provides faster prediction times compared to larger language models
* Demonstrates strong performance even with limited training data

These characteristics make DeBERTa-v3 a good choice for our banking intent classification task, where we need both accurate understanding of customer queries and efficient processing for real-world applications.

### Key Components of the Router Model
* Model Architecture: DeBERTa-v3 transformer
* Training Parameters:
  * Batch size: 64
  * Maximum sequence length: 128
  * Learning rate: 2e-5
* Data Processing: Custom dataset class for handling banking queries
* Model Export: TorchScript for deployment

In [6]:
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader
from transformers import DebertaV2Tokenizer, DebertaV2ForSequenceClassification
from torch.optim import AdamW
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import classification_report
import numpy as np
from tqdm import tqdm
from torch.nn.utils import clip_grad_norm_
from torch.optim.lr_scheduler import ReduceLROnPlateau

In [8]:
# Custom dataset class
class CustomDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_length):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        text = self.texts[idx]
        label = self.labels[idx]

        encoding = self.tokenizer.encode_plus(
            text,
            add_special_tokens=True,
            max_length=self.max_length,
            return_token_type_ids=False,
            padding='max_length',
            truncation=True,
            return_attention_mask=True,
            return_tensors='pt',
        )

        return {
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten(),
            'labels': torch.tensor(label, dtype=torch.long)
        }

# Prepare data
train_path = 'categorized_prompts_train.csv'
test_path = 'categorized_prompts_test.csv'
train_df = pd.read_csv(train_path, names=['text', 'intent'], header=0)
test_df = pd.read_csv(test_path, names=['text', 'intent'], header=0)
le = LabelEncoder()
train_df['label'] = le.fit_transform(train_df['intent'])
test_df['label'] = le.transform(test_df['intent'])

In [9]:
le.classes_

array(['account_management', 'billing', 'customer_education',
       'dispute_resolution', 'financial_planning',
       'international_services', 'product_information',
       'security_and_fraud_prevention', 'technical_support',
       'transaction_support'], dtype=object)
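
The classes come out in alphabetical order because `LabelEncoder` assigns integer ids to the sorted unique labels. A minimal pure-Python sketch of the same idea, using a made-up four-row sample rather than the real dataset:

```python
# Illustrative sample of intent labels (not the real training data).
intents = ["billing", "account_management", "billing", "transaction_support"]

# Sorted unique labels play the role of le.classes_.
classes = sorted(set(intents))
label_to_id = {label: idx for idx, label in enumerate(classes)}
id_to_label = {idx: label for label, idx in label_to_id.items()}

# Equivalent in spirit to le.transform(...) and le.inverse_transform(...).
encoded = [label_to_id[label] for label in intents]
decoded = [id_to_label[idx] for idx in encoded]

print(classes)
print(encoded)
```

The id at position `i` of the model's logits therefore corresponds to the `i`-th class in this sorted list, which is why the encoder (or an equivalent label map) has to be saved alongside the model.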

## Training

### Model Initialization
The code initializes the DeBERTa-v3 model and tokenizer from Microsoft's pre-trained base model, configuring it for single-label classification with the number of labels matching our intent categories. The model uses a maximum sequence length of 128 tokens and a batch size of 64 for efficient training.
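
To make the fixed sequence length and batch size concrete, here is a simplified sketch of what padding to `max_length` produces and how many batches per epoch result. The `pad_or_truncate` helper and the `pad_id=0` default are assumptions for illustration; the real tokenizer also inserts special tokens and preserves them when truncating.

```python
import math

def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    ids = list(token_ids[:max_length])
    attention_mask = [1] * len(ids)          # 1 marks a real token
    padding = max_length - len(ids)
    return ids + [pad_id] * padding, attention_mask + [0] * padding  # 0 marks padding

input_ids, attention_mask = pad_or_truncate([5, 6, 7], max_length=5)
print(input_ids, attention_mask)

# Batches per epoch: 9,874 training rows remain after dropping 'other'
# (10,003 - 129), and 3,040 test rows (3,080 - 40).
batch_size = 64
train_batches = math.ceil(9874 / batch_size)
test_batches = math.ceil(3040 / batch_size)
print(train_batches, test_batches)
```

The batch counts agree with the `155/155` and `48/48` progress bars in the training output.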

### Data Processing
The implementation creates custom datasets and dataloaders for both training and testing data:
* Uses PyTorch's DataLoader with shuffle enabled for training data
* Implements multi-worker data loading with 4 workers
* Enables pin_memory for faster data transfer to GPU

### Training Configuration
The training setup includes:
* AdamW optimizer with a learning rate of 2e-5
* ReduceLROnPlateau scheduler for adaptive learning rate adjustment
* Gradient clipping with a maximum norm of 1.0 to prevent exploding gradients
* Automatic device selection (GPU/CPU) for training

We will also save the labels for use in post-processing.
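
Gradient clipping by global norm rescales all gradients by a common factor whenever their combined L2 norm exceeds the limit. The pure-Python sketch below mirrors that rule on a flat list of numbers; the `clip_by_global_norm` helper is illustrative, while the training code itself relies on `torch.nn.utils.clip_grad_norm_`.

```python
import math

def clip_by_global_norm(grads, max_norm=1.0, eps=1e-6):
    """Scale grads so their L2 norm is at most max_norm; return (clipped, norm_before)."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    # The scale never exceeds 1, so small gradients are left untouched.
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm

clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
norm_after = math.sqrt(sum(g * g for g in clipped))
print(f"norm before: {norm_before:.3f}, norm after: {norm_after:.3f}")
```

The direction of the gradient is preserved; only its magnitude is capped.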

### Training Loop
The training process runs for 10 epochs. Each epoch includes:
* Loss calculation and backpropagation
* Regular evaluation on the test data
* F1-score monitoring for model improvement
* Automatic saving of the best-performing model, tokenizer, and metadata

The model saves a checkpoint whenever it achieves a better weighted-average F1-score, storing the model weights, tokenizer configuration, label encoder, and performance metadata for later use.

In [10]:
# Initialize tokenizer and model
tokenizer = DebertaV2Tokenizer.from_pretrained('microsoft/DeBERTa-v3-base')
model = DebertaV2ForSequenceClassification.from_pretrained('microsoft/DeBERTa-v3-base', num_labels=len(le.classes_),problem_type="single_label_classification")

# Create datasets and dataloaders
max_length = 128
batch_size = 64

train_dataset = CustomDataset(train_df['text'].tolist(), train_df['label'].tolist(), tokenizer, max_length)
test_dataset = CustomDataset(test_df['text'].tolist(), test_df['label'].tolist(), tokenizer, max_length)

train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=4, pin_memory=True)
test_dataloader = DataLoader(test_dataset, batch_size=batch_size)

# Set up the optimizer
optimizer = AdamW(model.parameters(), lr=2e-5)
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=2, verbose=True)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

# Training loop
num_epochs = 10
best_f1 = 0
for epoch in range(num_epochs):
    model.train()
    total_loss = 0
    progress_bar = tqdm(train_dataloader, desc=f"Epoch {epoch + 1}/{num_epochs}")
    for batch in progress_bar:
        optimizer.zero_grad()
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        labels = batch['labels'].to(device)

        outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
        loss = outputs.loss
        total_loss += loss.item()
        loss.backward()
        clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        
        progress_bar.set_postfix({'loss': f"{loss.item():.4f}"})

    avg_loss = total_loss / len(train_dataloader)
    print(f"Epoch {epoch + 1}/{num_epochs}, Average Loss: {avg_loss:.4f}")

    # Evaluation
    model.eval()
    test_predictions = []
    test_true_labels = []

    with torch.no_grad():
        for batch in tqdm(test_dataloader, desc="Evaluating"):
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            labels = batch['labels'].to(device)

            outputs = model(input_ids, attention_mask=attention_mask)
            _, preds = torch.max(outputs.logits, dim=1)
            
            test_predictions.extend(preds.cpu().tolist())
            test_true_labels.extend(labels.cpu().tolist())

    # Print classification report
    report = classification_report(test_true_labels, test_predictions, target_names=le.classes_, output_dict=True)
    print(f"Epoch {epoch + 1}/{num_epochs}")
    print(classification_report(test_true_labels, test_predictions, target_names=le.classes_))

    # Update the learning rate
    scheduler.step(avg_loss)

    # Save the best model
    if report['weighted avg']['f1-score'] > best_f1:
        best_f1 = report['weighted avg']['f1-score']
        # Save the entire model
        model.save_pretrained('intent_classfier/model')
        # Save the tokenizer
        tokenizer.save_pretrained('intent_classfier/model')
        # Save the label encoder
        import joblib
        joblib.dump(le, 'intent_classfier/label_encoder.joblib')

        # save some metadata
        import json
        metadata = {
            'f1_score': best_f1,
            'num_labels': len(le.classes_),
            'problem_type': 'single_label_classification'
        }
        with open('intent_classfier/metadata.json', 'w') as f:
            json.dump(metadata, f)

        print(f"New best model saved with F1-score: {best_f1:.4f}")

print("Training completed!")

tokenizer_config.json:   0%|          | 0.00/52.0 [00:00<?, ?B/s]

spm.model:   0%|          | 0.00/2.46M [00:00<?, ?B/s]

config.json:   0%|          | 0.00/579 [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/371M [00:00<?, ?B/s]

Some weights of DebertaV2ForSequenceClassification were not initialized from the model checkpoint at microsoft/DeBERTa-v3-base and are newly initialized: ['classifier.bias', 'classifier.weight', 'pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/10: 100%|██████████| 155/155 [00:21<00:00,  7.32it/s, loss=0.9359]


Epoch 1/10, Average Loss: 1.2800


Evaluating: 100%|██████████| 48/48 [00:02<00:00, 17.94it/s]
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


Epoch 1/10
                               precision    recall  f1-score   support

           account_management       0.79      0.98      0.88       560
                      billing       0.82      0.98      0.89      1160
           customer_education       0.78      0.22      0.34       160
           dispute_resolution       0.00      0.00      0.00        80
           financial_planning       0.75      0.45      0.56       120
       international_services       0.59      0.86      0.70       160
          product_information       0.00      0.00      0.00        80
security_and_fraud_prevention       0.91      0.54      0.68       200
            technical_support       0.92      0.55      0.69       240
          transaction_support       0.75      0.89      0.81       280

                     accuracy                           0.79      3040
                    macro avg       0.63      0.55      0.55      3040
                 weighted avg       0.76      0.79      0.75    

Epoch 2/10: 100%|██████████| 155/155 [00:20<00:00,  7.49it/s, loss=0.1234]


Epoch 2/10, Average Loss: 0.4709


Evaluating: 100%|██████████| 48/48 [00:02<00:00, 17.96it/s]


Epoch 2/10
                               precision    recall  f1-score   support

           account_management       0.95      0.97      0.96       560
                      billing       0.95      0.96      0.96      1160
           customer_education       0.87      0.91      0.89       160
           dispute_resolution       0.89      0.69      0.77        80
           financial_planning       0.94      0.84      0.89       120
       international_services       0.95      0.87      0.91       160
          product_information       0.85      0.84      0.84        80
security_and_fraud_prevention       0.96      0.90      0.93       200
            technical_support       0.88      0.89      0.89       240
          transaction_support       0.87      0.94      0.90       280

                     accuracy                           0.93      3040
                    macro avg       0.91      0.88      0.89      3040
                 weighted avg       0.93      0.93      0.93    

Epoch 3/10: 100%|██████████| 155/155 [00:20<00:00,  7.46it/s, loss=0.0819]


Epoch 3/10, Average Loss: 0.2227


Evaluating: 100%|██████████| 48/48 [00:02<00:00, 17.86it/s]


Epoch 3/10
                               precision    recall  f1-score   support

           account_management       0.97      0.97      0.97       560
                      billing       0.97      0.96      0.97      1160
           customer_education       0.86      0.97      0.91       160
           dispute_resolution       0.83      0.88      0.85        80
           financial_planning       0.91      0.82      0.86       120
       international_services       0.97      0.92      0.95       160
          product_information       0.77      0.89      0.83        80
security_and_fraud_prevention       0.96      0.98      0.97       200
            technical_support       1.00      0.84      0.91       240
          transaction_support       0.88      0.94      0.91       280

                     accuracy                           0.94      3040
                    macro avg       0.91      0.92      0.91      3040
                 weighted avg       0.94      0.94      0.94    

Epoch 4/10: 100%|██████████| 155/155 [00:20<00:00,  7.45it/s, loss=0.0170]


Epoch 4/10, Average Loss: 0.1470


Evaluating: 100%|██████████| 48/48 [00:02<00:00, 17.87it/s]


Epoch 4/10
                               precision    recall  f1-score   support

           account_management       0.96      0.97      0.96       560
                      billing       0.97      0.96      0.97      1160
           customer_education       0.95      0.95      0.95       160
           dispute_resolution       0.93      0.89      0.91        80
           financial_planning       0.98      0.82      0.90       120
       international_services       0.97      0.94      0.95       160
          product_information       0.74      0.93      0.82        80
security_and_fraud_prevention       0.91      0.96      0.93       200
            technical_support       0.99      0.89      0.94       240
          transaction_support       0.87      0.96      0.92       280

                     accuracy                           0.95      3040
                    macro avg       0.93      0.93      0.92      3040
                 weighted avg       0.95      0.95      0.95    

Epoch 5/10: 100%|██████████| 155/155 [00:20<00:00,  7.45it/s, loss=0.0708]


Epoch 5/10, Average Loss: 0.1052


Evaluating: 100%|██████████| 48/48 [00:02<00:00, 17.85it/s]


Epoch 5/10
                               precision    recall  f1-score   support

           account_management       0.94      0.99      0.96       560
                      billing       0.97      0.97      0.97      1160
           customer_education       0.97      0.94      0.96       160
           dispute_resolution       0.88      0.91      0.90        80
           financial_planning       0.98      0.84      0.91       120
       international_services       0.96      0.94      0.95       160
          product_information       0.94      0.81      0.87        80
security_and_fraud_prevention       0.96      0.94      0.95       200
            technical_support       0.99      0.93      0.95       240
          transaction_support       0.88      0.96      0.92       280

                     accuracy                           0.95      3040
                    macro avg       0.95      0.92      0.93      3040
                 weighted avg       0.95      0.95      0.95    

Epoch 6/10: 100%|██████████| 155/155 [00:20<00:00,  7.45it/s, loss=0.1994]


Epoch 6/10, Average Loss: 0.0821


Evaluating: 100%|██████████| 48/48 [00:02<00:00, 17.87it/s]


Epoch 6/10
                               precision    recall  f1-score   support

           account_management       0.96      0.98      0.97       560
                      billing       0.97      0.97      0.97      1160
           customer_education       0.95      0.97      0.96       160
           dispute_resolution       0.88      0.88      0.88        80
           financial_planning       0.97      0.88      0.92       120
       international_services       0.97      0.93      0.95       160
          product_information       0.87      0.93      0.90        80
security_and_fraud_prevention       0.95      0.98      0.97       200
            technical_support       0.98      0.93      0.95       240
          transaction_support       0.92      0.95      0.94       280

                     accuracy                           0.96      3040
                    macro avg       0.94      0.94      0.94      3040
                 weighted avg       0.96      0.96      0.96    

Epoch 7/10: 100%|██████████| 155/155 [00:20<00:00,  7.46it/s, loss=0.0044]


Epoch 7/10, Average Loss: 0.0569


Evaluating: 100%|██████████| 48/48 [00:02<00:00, 17.85it/s]


Epoch 7/10
                               precision    recall  f1-score   support

           account_management       0.97      0.98      0.98       560
                      billing       0.98      0.96      0.97      1160
           customer_education       0.98      0.90      0.94       160
           dispute_resolution       0.85      0.94      0.89        80
           financial_planning       0.96      0.89      0.93       120
       international_services       0.89      0.96      0.92       160
          product_information       0.86      0.93      0.89        80
security_and_fraud_prevention       0.95      0.98      0.97       200
            technical_support       1.00      0.93      0.96       240
          transaction_support       0.90      0.95      0.93       280

                     accuracy                           0.96      3040
                    macro avg       0.93      0.94      0.94      3040
                 weighted avg       0.96      0.96      0.96    

Epoch 8/10: 100%|██████████| 155/155 [00:20<00:00,  7.46it/s, loss=0.0033]


Epoch 8/10, Average Loss: 0.0539


Evaluating: 100%|██████████| 48/48 [00:02<00:00, 17.84it/s]


Epoch 8/10
                               precision    recall  f1-score   support

           account_management       0.96      0.99      0.97       560
                      billing       0.97      0.97      0.97      1160
           customer_education       0.97      0.93      0.95       160
           dispute_resolution       0.88      0.90      0.89        80
           financial_planning       0.98      0.84      0.91       120
       international_services       0.94      0.93      0.93       160
          product_information       0.90      0.90      0.90        80
security_and_fraud_prevention       0.97      0.96      0.97       200
            technical_support       0.99      0.92      0.95       240
          transaction_support       0.88      0.94      0.91       280

                     accuracy                           0.95      3040
                    macro avg       0.94      0.93      0.94      3040
                 weighted avg       0.96      0.95      0.95    

Epoch 9/10: 100%|██████████| 155/155 [00:20<00:00,  7.46it/s, loss=0.0055]


Epoch 9/10, Average Loss: 0.0356


Evaluating: 100%|██████████| 48/48 [00:02<00:00, 17.79it/s]


Epoch 9/10
                               precision    recall  f1-score   support

           account_management       0.97      0.98      0.98       560
                      billing       0.97      0.96      0.96      1160
           customer_education       0.93      0.94      0.94       160
           dispute_resolution       0.87      0.95      0.91        80
           financial_planning       0.97      0.85      0.91       120
       international_services       0.96      0.93      0.94       160
          product_information       0.91      0.90      0.91        80
security_and_fraud_prevention       0.96      0.97      0.97       200
            technical_support       0.93      0.93      0.93       240
          transaction_support       0.91      0.94      0.92       280

                     accuracy                           0.95      3040
                    macro avg       0.94      0.94      0.94      3040
                 weighted avg       0.95      0.95      0.95    

Epoch 10/10: 100%|██████████| 155/155 [00:20<00:00,  7.45it/s, loss=0.0039]


Epoch 10/10, Average Loss: 0.0349


Evaluating: 100%|██████████| 48/48 [00:02<00:00, 17.74it/s]

Epoch 10/10
                               precision    recall  f1-score   support

           account_management       0.98      0.98      0.98       560
                      billing       0.95      0.97      0.96      1160
           customer_education       0.99      0.87      0.92       160
           dispute_resolution       0.86      0.90      0.88        80
           financial_planning       0.95      0.90      0.92       120
       international_services       0.97      0.91      0.94       160
          product_information       0.86      0.91      0.88        80
security_and_fraud_prevention       0.96      0.98      0.97       200
            technical_support       0.99      0.88      0.93       240
          transaction_support       0.89      0.96      0.92       280

                     accuracy                           0.95      3040
                    macro avg       0.94      0.93      0.93      3040
                 weighted avg       0.95      0.95      0.95   
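
The checkpointing criterion is the weighted-average F1-score, which weights each class's F1 by its support, whereas the macro average treats every class equally. As a sanity check, both can be recomputed from the per-class rows of the epoch 10 report above. Because the inputs are already rounded to two decimals, the results are approximate.

```python
# (f1-score, support) per class, copied from the epoch 10 report.
per_class = {
    "account_management": (0.98, 560),
    "billing": (0.96, 1160),
    "customer_education": (0.92, 160),
    "dispute_resolution": (0.88, 80),
    "financial_planning": (0.92, 120),
    "international_services": (0.94, 160),
    "product_information": (0.88, 80),
    "security_and_fraud_prevention": (0.97, 200),
    "technical_support": (0.93, 240),
    "transaction_support": (0.92, 280),
}

total_support = sum(support for _, support in per_class.values())

# Macro average: plain mean over classes.
macro_f1 = sum(f1 for f1, _ in per_class.values()) / len(per_class)

# Weighted average: mean over classes weighted by the number of test examples.
weighted_f1 = sum(f1 * support for f1, support in per_class.values()) / total_support

print(f"support={total_support} macro={macro_f1:.2f} weighted={weighted_f1:.2f}")
```

The large `billing` and `account_management` classes dominate the weighted figure, so a drop in a small class such as `dispute_resolution` shows up more clearly in the macro average.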




# Model Evaluation

In [11]:
model = DebertaV2ForSequenceClassification.from_pretrained('intent_classfier/model')
tokenizer = DebertaV2Tokenizer.from_pretrained('intent_classfier/model')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
max_length = 128

In [12]:
def classify_prompt(prompt):
    model.eval()
    encoding = tokenizer.encode_plus(
        prompt,
        add_special_tokens=True,
        max_length=max_length,
        return_token_type_ids=False,
        padding='max_length',
        truncation=True,
        return_attention_mask=True,
        return_tensors='pt',
    )
    
    input_ids = encoding['input_ids'].to(device)
    attention_mask = encoding['attention_mask'].to(device)
    
    with torch.no_grad():
        outputs = model(input_ids, attention_mask=attention_mask)
        print(outputs)
        _, preds = torch.max(outputs.logits, dim=1)
    
    return le.inverse_transform([preds.item()])[0]

In [13]:
new_prompt = "I want to open a new account, do you any new opening bonus ?"
predicted_category = classify_prompt(new_prompt)
print(f"The prompt '{new_prompt}' is classified as: {predicted_category}")

SequenceClassifierOutput(loss=None, logits=tensor([[ 4.6592,  1.2768, -1.5765, -2.6721, -0.5963, -1.7226, -0.2149,  0.0449,
          1.6722, -2.9001]], device='cuda:0'), hidden_states=None, attentions=None)
The prompt 'I want to open a new account, do you any new opening bonus ?' is classified as: account_management
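
The raw logits printed above can be turned into probabilities with a softmax, which gives a rough confidence for the routing decision. The sketch below reuses the logits and class order from this notebook; the `decide` helper and its `threshold` value are hypothetical additions, shown only to illustrate how low-confidence queries might be sent to a fallback.

```python
import math

# Class order as printed by le.classes_ earlier in the notebook.
classes = [
    "account_management", "billing", "customer_education", "dispute_resolution",
    "financial_planning", "international_services", "product_information",
    "security_and_fraud_prevention", "technical_support", "transaction_support",
]

# Logits copied from the SequenceClassifierOutput shown above.
logits = [4.6592, 1.2768, -1.5765, -2.6721, -0.5963,
          -1.7226, -0.2149, 0.0449, 1.6722, -2.9001]

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def decide(probs, labels, threshold=0.5):
    """Return (label, probability); fall back to 'other' below the threshold."""
    best = max(range(len(probs)), key=probs.__getitem__)
    label = labels[best] if probs[best] >= threshold else "other"
    return label, probs[best]

probs = softmax(logits)
label, confidence = decide(probs, classes)
print(f"{label} (p = {confidence:.2f})")
```

For this prompt the top class holds roughly 90% of the probability mass, with `technical_support` and `billing` as distant runners-up.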


In [14]:
# Create label json for decoding and post processing
import json
import joblib

le = joblib.load('intent_classfier/label_encoder.joblib')
label_map = {i: label for i, label in enumerate(le.classes_)}

with open('labels.json', 'w') as f:
    json.dump(label_map, f)

# Convert to TorchScript

Let's convert the trained model to TorchScript so that we can add the router model to a Triton server for serving.

In [16]:
import torch
from transformers import DebertaV2Tokenizer, DebertaV2ForSequenceClassification
import json
import joblib

# Wrapper module to convert dict output to tuple
class WrapperModule(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, input_ids, attention_mask):
        outputs = self.model(input_ids, attention_mask=attention_mask)
        return (outputs.logits,)

# Check for GPU availability
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Load the saved model and tokenizer
model_path = 'intent_classfier/model'
tokenizer = DebertaV2Tokenizer.from_pretrained(model_path)
model = DebertaV2ForSequenceClassification.from_pretrained(model_path)

# Move model to the appropriate device
model = model.to(device)

# Wrap the model
wrapped_model = WrapperModule(model)
wrapped_model.eval()

# Load metadata
with open('intent_classfier/metadata.json', 'r') as f:
    metadata = json.load(f)

# Create dummy input
max_length = 128
dummy_text = "This is a dummy input for tracing"
dummy_input = tokenizer(dummy_text, return_tensors="pt", padding="max_length", max_length=max_length, truncation=True)

# Move input tensors to the appropriate device
dummy_input = {k: v.to(device) for k, v in dummy_input.items()}

# Get output from the original model
with torch.no_grad():
    original_output = model(**dummy_input)

# Trace the wrapped model
traced_model = torch.jit.trace(wrapped_model, (dummy_input['input_ids'], dummy_input['attention_mask']))

# Save the traced model
torch.jit.save(traced_model, 'triton_template/intent_router/1/model.pt')
print("Model traced and saved as traced_model.pt")

# Load the traced model
loaded_traced_model = torch.jit.load('triton_template/intent_router/1/model.pt')
loaded_traced_model = loaded_traced_model.to(device)

# Get output from the traced model
with torch.no_grad():
    traced_output = loaded_traced_model(dummy_input['input_ids'], dummy_input['attention_mask'])

# Compare outputs
assert torch.allclose(original_output.logits, traced_output[0], atol=1e-5), "Outputs do not match!"
print("Validation successful: Original and traced model outputs match.")

# Test with a new input
test_text = "I want to open a new account, do you any new opening bonus ?"
test_input = tokenizer(test_text, return_tensors="pt", padding="max_length", max_length=max_length, truncation=True)
test_input = {k: v.to(device) for k, v in test_input.items()}

with torch.no_grad():
    original_test_output = model(**test_input)
    traced_test_output = loaded_traced_model(test_input['input_ids'], test_input['attention_mask'])

assert torch.allclose(original_test_output.logits, traced_test_output[0], atol=1e-5), "Test outputs do not match!"
print("Test successful: Original and traced model outputs match for new input.")

# Print predicted class
le = joblib.load('intent_classfier/label_encoder.joblib')
predicted_class = le.classes_[traced_test_output[0].cpu().argmax().item()]
print(f"Predicted class for '{test_text}': {predicted_class}")

Using device: cuda
Model traced and saved as traced_model.pt
Validation successful: Original and traced model outputs match.
Test successful: Original and traced model outputs match for new input.
Predicted class for 'I want to open a new account, do you any new opening bonus ?': account_management


In [17]:
traced_output

(tensor([[ 5.6651,  1.1302, -1.6061, -2.5253, -1.1188, -2.1029, -0.4598,  0.2260,
           0.5146, -1.8105]], device='cuda:0'),)
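
The validation above relies on `torch.allclose`, which treats two tensors as equal when `|a - b| <= atol + rtol * |b|` holds elementwise. With `atol=1e-5` and the default `rtol=1e-5`, the small floating-point differences that tracing can introduce are tolerated. The plain-Python sketch below mirrors that rule for flat lists; the function name `allclose`, its default `atol` (set to match the notebook's call rather than PyTorch's default), and the sample values are illustrative only.

```python
def allclose(a, b, rtol=1e-5, atol=1e-5):
    """Return True if every pair satisfies |x - y| <= atol + rtol * |y|."""
    return len(a) == len(b) and all(
        abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b)
    )

reference = [5.6651, 1.1302, -1.6061]
tiny_drift = [5.6651 + 1e-6, 1.1302, -1.6061]   # within tolerance
large_drift = [5.6651 + 1e-3, 1.1302, -1.6061]  # outside tolerance

print(allclose(reference, tiny_drift))
print(allclose(reference, large_drift))
```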

# Triton Deployment

Now that we have the traced model in TorchScript format, we can add it to the [Triton Inference Server](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/contents.html) and use the [ensemble pipeline feature](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/ensemble_models.html) to set up the pre- and post-processing pipeline.

The pre- and post-processing code is available under the `triton_template/preprocessing_intent_router/` and `triton_template/postprocessing_intent_router/` directories. The `triton_template/intent_router_ensemble/` directory contains the config that defines how the pre-processing, model, and post-processing steps are linked together.

This will be the same as the code downloaded from NGC when setting up the default task router.

The model repository in the `/routers` directory is organized as follows:

```
model_repository/
├── intent_router/
│   ├── 1/
│   │   └── model.pt
│   └── config.pbtxt
├── intent_router_ensemble/
│   ├── 1/
│   └── config.pbtxt
├── postprocessing_intent_router/
│   ├── 1/
│   │   ├── labels.json
│   │   ├── model.py
│   │   └── __pycache__/
│   │       └── model.cpython-310.pyc
│   └── config.pbtxt
└── preprocessing_intent_router/
    ├── 1/
    │   ├── model.py
    │   └── __pycache__/
    │       └── model.cpython-310.pyc
    └── config.pbtxt
```

Now copy the contents of the `triton_template/` folder to `/model_repository`:

In [18]:
!cp -r triton_template/* /model_repository

On your original machine, not within the Docker JupyterLab notebook, start the router server by running `make up`. 

In [20]:
!curl -v http://router-server:8000/v2/models/intent_router_ensemble/ready

*   Trying 172.21.0.4:8000...
* Connected to router-server (172.21.0.4) port 8000 (#0)
> GET /v2/models/intent_router_ensemble/ready HTTP/1.1
> Host: router-server:8000
> User-Agent: curl/7.81.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain
< 
* Connection #0 to host router-server left intact
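
The server may take a little while to load the models after `make up`, so a single readiness check can fail if it runs too early. A minimal sketch of a polling helper is shown below. The helper `wait_until_ready` and the simulated `fake_check` are hypothetical and only illustrate the retry pattern; they are not part of the router code.

```python
import time

def wait_until_ready(check, retries=10, delay=2.0, sleep=time.sleep):
    """Call check() until it returns True or the retries are exhausted.

    check is any zero-argument callable returning a bool. Exceptions (for
    example connection errors while the server is still starting) are
    treated as "not ready yet".
    """
    for attempt in range(retries):
        try:
            if check():
                return True
        except Exception:
            pass  # server not reachable yet
        if attempt < retries - 1:
            sleep(delay)
    return False

# Simulated check that succeeds on the third call, standing in for a real server.
calls = {"n": 0}

def fake_check():
    calls["n"] += 1
    return calls["n"] >= 3

print(wait_until_ready(fake_check, delay=0))
```

With the Triton HTTP client used later in this notebook, `check` could be `lambda: triton_client.is_model_ready("intent_router_ensemble")`.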


In [23]:
import tritonclient.http as httpclient
import numpy as np
import json

def load_labels():
    with open('labels.json', 'r') as f:
        return json.load(f)

def send_request(triton_client, text):
    input_text = np.array([[text]], dtype=object)
    inputs = [httpclient.InferInput("INPUT", input_text.shape, "BYTES")]
    inputs[0].set_data_from_numpy(input_text)

    outputs = [httpclient.InferRequestedOutput("OUTPUT")]

    response = triton_client.infer(model_name="intent_router_ensemble", inputs=inputs, outputs=outputs)
    return response

def map_vector_to_label(one_hot_vector, labels):
    predicted_index = np.argmax(one_hot_vector)
    return labels[str(predicted_index)]

# Load labels
labels = load_labels()

# Initialize Triton client
triton_client = httpclient.InferenceServerClient(url="router-server:8000")

# Example prompt
prompt = "What are the benefits of getting a disposable virtual card?"

# Send request
result = send_request(triton_client, prompt)

# Get one-hot encoded vector from the response
one_hot_vector = result.as_numpy("OUTPUT")

# Map the vector to a label
predicted_label = map_vector_to_label(one_hot_vector[0], labels)

print(f"Input prompt: '{prompt}'")
print(f"Predicted intent: {predicted_label}")

Input prompt: 'What are the benefits of getting a disposable virtual card?'
Predicted intent: product_information
