# Problem Statement: Multi-Intent Classification for Customer Support Chatbot

You are building a customer support chatbot for a retail company that sells products online. The goal of the chatbot is to assist customers in multiple ways, including answering product-related queries, tracking orders, handling refunds, and providing general information about store policies.

Each customer query can have multiple intents, such as requesting information about a product and also asking about its availability. The chatbot should be able to classify these queries into one or more intents simultaneously. For example, the query "What are the features of the latest phone, and can I return it?" has two intents: one related to product information and the other related to returns.

### Objective:
Create a model that can classify a given customer query into one or more intents from the following categories:

* Product Inquiry - Queries related to product details (e.g., features, pricing, availability).

* Order Tracking - Queries related to tracking orders (e.g., "Where is my order?").

* Refund Request - Queries related to requesting a refund (e.g., "How do I return this product?").

* Store Policy - Queries related to the store’s policies (e.g., return policies, delivery times).

The model should be able to assign one or more of these intents to each query.

# Approach:

I use a pretrained BERT model for the following reasons:
1. It reads each word in the <b>context</b> of the words on both sides of it, which helps when a single query mixes several intents.
2. <b>Transfer learning</b> makes it possible to get strong results from relatively small datasets.
3. Because the model is pretrained, you do not need to train it from scratch; you only fine-tune it for the target problem.


### Python version: 3.10.17
### Main libraries used: scikit-learn, PyTorch, transformers

In [1]:
# Import all necessary libraries
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import numpy as np
import re
import torch
from torch.utils.data import Dataset, DataLoader
from transformers import BertTokenizer, BertModel
import torch.nn as nn
import torch.nn.functional as F

from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.utils.class_weight import compute_class_weight

from ast import literal_eval

from torch.optim import AdamW

In [2]:
# Read the prepared data
df = pd.read_csv('prepared_data.csv')
df

Unnamed: 0,text,intents
0,could you explain what is your return policy??...,"['Store Policy', 'Product Inquiry']"
1,what are the different color options that are ...,"['Product Inquiry', 'Store Policy', 'Order Tra..."
2,can you explain your warranty terms? would lik...,"['Store Policy', 'Refund Request']"
3,would like to get details about can you provid...,"['Order Tracking', 'Store Policy', 'Refund Req..."
4,could you explain do you have this goods in st...,"['Product Inquiry', 'Refund Request', 'Order T..."
...,...,...
95,can i return my order? please help me understa...,"['Refund Request', 'Store Policy']"
96,what is the price of the new headphones? i do ...,"['Product Inquiry', 'Refund Request', 'Store P..."
97,can i cancel my shipment and get a money back?...,"['Refund Request', 'Order Tracking', 'Store Po..."
98,could you explain i do not want this goods any...,"['Refund Request', 'Product Inquiry']"


### Each query can carry several labels, so scikit-learn's MultiLabelBinarizer is used to turn the intent lists into multi-hot vectors

In [3]:
df['intents'] = df['intents'].apply(literal_eval)

mlb = MultiLabelBinarizer()
df['label_vector'] = mlb.fit_transform(df['intents']).tolist()
intent_labels = mlb.classes_
print(intent_labels)

['Order Tracking' 'Product Inquiry' 'Refund Request' 'Store Policy']
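
To make the label transform concrete, here is a minimal pure-Python sketch of the multi-hot encoding that `MultiLabelBinarizer` produces for the four intents. The helper `to_multi_hot` is a hypothetical name used only for illustration and is not part of scikit-learn; it assumes the class order printed above.

```python
# Class order copied from the printed mlb.classes_ (alphabetical).
CLASSES = ["Order Tracking", "Product Inquiry", "Refund Request", "Store Policy"]

def to_multi_hot(intents, classes=CLASSES):
    """Return a 0/1 vector with a 1 at the position of every intent present."""
    present = set(intents)
    return [1 if name in present else 0 for name in classes]

print(to_multi_hot(["Store Policy", "Product Inquiry"]))  # [0, 1, 0, 1]
print(to_multi_hot(["Store Policy", "Refund Request"]))   # [0, 0, 1, 1]
```

The position of each 1 depends only on the class order, not on the order in which the intents were listed for the query.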


In [4]:
df.head()

Unnamed: 0,text,intents,label_vector
0,could you explain what is your return policy??...,"[Store Policy, Product Inquiry]","[0, 1, 0, 1]"
1,what are the different color options that are ...,"[Product Inquiry, Store Policy, Order Tracking]","[1, 1, 0, 1]"
2,can you explain your warranty terms? would lik...,"[Store Policy, Refund Request]","[0, 0, 1, 1]"
3,would like to get details about can you provid...,"[Order Tracking, Store Policy, Refund Request]","[1, 0, 1, 1]"
4,could you explain do you have this goods in st...,"[Product Inquiry, Refund Request, Order Tracking]","[1, 1, 1, 0]"


In [5]:
# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


### The classes are slightly imbalanced, so per-class weights are computed to keep the more frequent intents from dominating the loss

In [6]:
# Compute per-class weights as normalized inverse label frequency

labels = np.array(df['label_vector'].tolist())

# Sum over axis to get counts per class
label_counts = labels.sum(axis=0)

# Normalize inverse frequency
class_weights = len(labels) / (label_counts + 1e-6)
class_weights = class_weights / class_weights.sum() * len(class_weights)

# Convert to tensor
pos_weights = torch.tensor(class_weights, dtype=torch.float).to(device)
pos_weights


tensor([1.0027, 1.0350, 0.9437, 1.0186])
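
The weights above are normalized inverse frequencies. PyTorch's documentation for `BCEWithLogitsLoss` describes `pos_weight` as the ratio of negative to positive examples for each class, so one alternative would be to compute that ratio directly. The sketch below is plain Python and is not what this notebook does; `neg_pos_ratio` is a hypothetical helper and the toy label matrix is made up for illustration.

```python
def neg_pos_ratio(label_rows):
    """Per-class (negatives / positives) for a list of multi-hot label vectors."""
    n_rows = len(label_rows)
    n_classes = len(label_rows[0])
    ratios = []
    for c in range(n_classes):
        positives = sum(row[c] for row in label_rows)
        negatives = n_rows - positives
        # Guard against a class with no positive examples.
        ratios.append(negatives / positives if positives else 1.0)
    return ratios

# Toy data: 4 queries, 4 classes (illustrative values only).
toy_labels = [
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
]
print(neg_pos_ratio(toy_labels))  # [1.0, 1.0, 3.0, 0.3333333333333333]
```

A ratio above 1 raises the penalty for missing a rare class, while a ratio below 1 lowers it for a class that is positive in most rows.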

### Splitting the data into train (70%), validation (15%), and test (15%) sets

In [7]:
train_texts, temp_texts, train_labels, temp_labels = train_test_split(df['text'], df['label_vector'], test_size=0.3, random_state=42)
val_texts, test_texts, val_labels, test_labels = train_test_split(temp_texts, temp_labels, test_size=0.5, random_state=42)
# Tokenize each split with the BERT tokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def encode(texts):
    return tokenizer(list(texts), padding=True, truncation=True, return_tensors="pt")

train_encodings = encode(train_texts)
val_encodings = encode(val_texts)
test_encodings = encode(test_texts)

In [8]:
# Custom Dataset that pairs the tokenized encodings with their label vectors
class CustomDataset(Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels
    def __getitem__(self, idx):
        item = {key: val[idx] for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx], dtype=torch.float)
        return item
    def __len__(self):
        return len(self.labels)

train_dataset = CustomDataset(train_encodings, train_labels.tolist())
val_dataset = CustomDataset(val_encodings, val_labels.tolist())
test_dataset = CustomDataset(test_encodings, test_labels.tolist())

train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=8)
test_loader = DataLoader(test_dataset, batch_size=8)

In [9]:
# Pretrained BertModel encoder with a custom linear classification head
class BertForMultiLabel(nn.Module):
    def __init__(self, num_labels):
        super(BertForMultiLabel, self).__init__()
        self.bert = BertModel.from_pretrained('bert-base-uncased')
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask, labels=None):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        logits = self.classifier(outputs.pooler_output)  # [batch_size, num_labels]
        return logits

In [10]:
# Inspect the configuration of the pretrained encoder
bert_model = BertModel.from_pretrained('bert-base-uncased')
config = bert_model.config
print(config)

BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.52.4",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}



In [11]:
model = BertForMultiLabel(num_labels=len(intent_labels))
model.to(device)
optimizer = AdamW(model.parameters(), lr=2e-5)
# BCEWithLogitsLoss applies a sigmoid to each label independently, which suits multi-label classification
loss_fn = nn.BCEWithLogitsLoss(pos_weight=pos_weights)
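
For reference, the loss can be written out as follows. With logits $z_{n,c}$, targets $y_{n,c} \in \{0, 1\}$, per-class positive weights $p_c$ (the `pos_weights` tensor above), $\sigma$ the sigmoid, $N$ the batch size, and $C$ the number of labels, `BCEWithLogitsLoss` with its default mean reduction computes:

$$
\mathcal{L} = -\frac{1}{NC}\sum_{n=1}^{N}\sum_{c=1}^{C}\Bigl[\,p_c\, y_{n,c}\,\log\sigma(z_{n,c}) + (1 - y_{n,c})\,\log\bigl(1 - \sigma(z_{n,c})\bigr)\Bigr]
$$

Only the positive term is scaled by $p_c$, so a weight above 1 penalizes a missed intent more heavily than a spurious one.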

The model is fine-tuned on the training set and checked against the validation set. After every epoch, the average training loss and the average validation loss are printed.

In [12]:
for epoch in range(15):
    print(f"Epoch {epoch + 1}")
    # Switch back to training mode (the validation phase below sets eval mode)
    model.train()
    total_train_loss = 0

    for batch in train_loader:
        optimizer.zero_grad()

        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        labels = batch['labels'].to(device)

        logits = model(input_ids=input_ids, attention_mask=attention_mask)
        loss = loss_fn(logits, labels)
        total_train_loss += loss.item()

        loss.backward()
        optimizer.step()

    avg_train_loss = total_train_loss / len(train_loader)
    print(f"Average Training Loss: {avg_train_loss:.4f}")

    # Validation phase
    model.eval()
    total_val_loss = 0

    with torch.no_grad():
        for batch in val_loader:
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            labels = batch['labels'].to(device)

            logits = model(input_ids=input_ids, attention_mask=attention_mask)
            loss = loss_fn(logits, labels)
            total_val_loss += loss.item()

    avg_val_loss = total_val_loss / len(val_loader)
    print(f"Validation Loss: {avg_val_loss:.4f}")


Epoch 1
Average Training Loss: 0.6703
Validation Loss: 0.6756
Epoch 2
Average Training Loss: 0.6143
Validation Loss: 0.6275
Epoch 3
Average Training Loss: 0.5702
Validation Loss: 0.6184
Epoch 4
Average Training Loss: 0.5353
Validation Loss: 0.6400
Epoch 5
Average Training Loss: 0.5196
Validation Loss: 0.5969
Epoch 6
Average Training Loss: 0.4736
Validation Loss: 0.6305
Epoch 7
Average Training Loss: 0.4527
Validation Loss: 0.5669
Epoch 8
Average Training Loss: 0.4085
Validation Loss: 0.5823
Epoch 9
Average Training Loss: 0.3721
Validation Loss: 0.5590
Epoch 10
Average Training Loss: 0.3290
Validation Loss: 0.5287
Epoch 11
Average Training Loss: 0.2913
Validation Loss: 0.4949
Epoch 12
Average Training Loss: 0.2512
Validation Loss: 0.4815
Epoch 13
Average Training Loss: 0.2227
Validation Loss: 0.4672
Epoch 14
Average Training Loss: 0.1951
Validation Loss: 0.4650
Epoch 15
Average Training Loss: 0.1713
Validation Loss: 0.4425


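The trained model is evaluated with the following metrics:
1. hamming_loss - the fraction of individual label predictions that are wrong, counted over all instances and all labels (lower is better).
2. precision_score - macro-averaged over the four intents.
3. recall_score - macro-averaged over the four intents.
4. f1_score - macro-averaged over the four intents.
5. Subset accuracy (accuracy_score) - the fraction of instances whose predicted label set exactly matches the true label set.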
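
The difference between Hamming loss and subset accuracy is easiest to see on a small example. The following pure-Python sketch recomputes both by hand; the function names and the toy label matrices are illustrative assumptions, not part of the notebook or of scikit-learn.

```python
def hamming_loss_manual(y_true, y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    wrong = sum(
        t != p
        for row_t, row_p in zip(y_true, y_pred)
        for t, p in zip(row_t, row_p)
    )
    total = len(y_true) * len(y_true[0])
    return wrong / total

def subset_accuracy_manual(y_true, y_pred):
    """Fraction of rows whose full label vector is predicted exactly."""
    exact = sum(row_t == row_p for row_t, row_p in zip(y_true, y_pred))
    return exact / len(y_true)

# Toy data: 3 queries, 4 labels each (illustrative values only).
y_true = [[0, 1, 0, 1], [1, 0, 1, 1], [0, 0, 1, 0]]
y_pred = [[0, 1, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]]

print(hamming_loss_manual(y_true, y_pred))     # 2 wrong slots out of 12
print(subset_accuracy_manual(y_true, y_pred))  # 1 exact row out of 3
```

A single wrong label makes the whole row count as a miss for subset accuracy, which is why it is the stricter of the two metrics.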

In [16]:
from sklearn.metrics import hamming_loss, precision_score, recall_score, f1_score, accuracy_score
import numpy as np

def evaluate(model, dataloader, device):
    model.eval()
    preds, true_labels = [], []

    with torch.no_grad():
        for batch in dataloader:
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            labels = batch['labels'].to(device)

            # The model returns raw logits; a sigmoid turns them into per-label probabilities
            logits = model(input_ids=input_ids, attention_mask=attention_mask)
            probs = torch.sigmoid(logits)
            preds.extend(probs.cpu().numpy())
            true_labels.extend(labels.cpu().numpy())

    preds = np.array(preds)
    true_labels = np.array(true_labels)
    binarized_preds = (preds >= 0.5).astype(int)

    metrics = {
        'Hamming Loss': hamming_loss(true_labels, binarized_preds),
        'Precision': precision_score(true_labels, binarized_preds, average='macro', zero_division=0),
        'Recall': recall_score(true_labels, binarized_preds, average='macro', zero_division=0),
        'F1 Score': f1_score(true_labels, binarized_preds, average='macro', zero_division=0),
        'Subset Accuracy': accuracy_score(true_labels, binarized_preds)
    }
    return metrics


In [17]:
test_metrics = evaluate(model, test_loader, device)

print("Test Metrics:", test_metrics)

Test Metrics: {'Hamming Loss': 0.05, 'Precision': 0.9295454545454546, 'Recall': 1.0, 'F1 Score': 0.9618421052631578, 'Subset Accuracy': 0.8}
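
With a test set this small, each metric moves in coarse steps. As a rough check, the sketch below converts the reported figures into counts. It assumes the test split holds 15 queries, which is what the 0.3 and 0.5 `train_test_split` calls above yield for a dataset of about 100 rows; `N_TEST` and `N_LABELS` are assumptions and are not printed anywhere in the notebook.

```python
N_TEST = 15    # assumed size of the test split
N_LABELS = 4   # number of intent classes

hamming_loss_reported = 0.05
subset_accuracy_reported = 0.8

# Hamming loss counts wrong label slots; subset accuracy counts exact rows.
wrong_slots = round(hamming_loss_reported * N_TEST * N_LABELS)
exact_rows = round(subset_accuracy_reported * N_TEST)

print(wrong_slots)  # 3
print(exact_rows)   # 12
```

Under that assumption, a single additional mistake would change subset accuracy by about 6.7 percentage points, so these scores should be read as indicative rather than precise.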


In [18]:
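def predict(text, threshold=0.8):
    # Inference uses a stricter threshold (0.8) than the 0.5 used in evaluate()
    model.eval()
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    # Move the inputs to the same device as the model
    inputs = {key: val.to(device) for key, val in inputs.items()}
    with torch.no_grad():
        logits = model(inputs["input_ids"], inputs["attention_mask"])
        probs = torch.sigmoid(logits)[0].cpu().numpy()
    intents = [mlb.classes_[i] for i, p in enumerate(probs) if p > threshold]
    return intents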

print(predict("I’d like to request a refund for my recent order and also get more details about your latest laptop models."))
print(predict('would like to get details about product i received is different from the one that i placed shipment, need help with repayment. i have been waiting for the order long time do you have this product in stock?'))
print(predict('How can I get a refund for my order #56789?'))

['Product Inquiry', 'Store Policy']
['Order Tracking', 'Product Inquiry', 'Refund Request']
['Refund Request']
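
The `predict` function keeps an intent only when its sigmoid probability exceeds 0.8, whereas `evaluate` binarizes at 0.5. The pure-Python sketch below shows how the same logits can produce different intent sets under the two thresholds. The logit values are made up for illustration and `select_intents` is a hypothetical helper, not part of the notebook.

```python
import math

CLASSES = ["Order Tracking", "Product Inquiry", "Refund Request", "Store Policy"]

def sigmoid(z):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def select_intents(logits, threshold):
    """Keep every class whose sigmoid probability is above the threshold."""
    return [name for name, z in zip(CLASSES, logits) if sigmoid(z) > threshold]

example_logits = [-2.0, 2.5, 1.0, 0.2]  # illustrative values only

print(select_intents(example_logits, 0.5))
# ['Product Inquiry', 'Refund Request', 'Store Policy']
print(select_intents(example_logits, 0.8))
# ['Product Inquiry']
```

A higher threshold trades recall for precision, so the test metrics computed at 0.5 do not directly describe the behaviour of `predict` at 0.8.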
