# Intent Classification

In this notebook, we fine-tune a pre-trained BERT model on a dataset of questions and their corresponding intents, then use the fine-tuned model to classify new questions into one of those intents.

The goal is an NLP-based framework that parses an input question and assigns its intent to one of the question types below.

Question Types:
1. Why is action A not used in the plan, rather than being used?
2. Why is action A used in the plan, rather than not being used?
3. Why is action A used in state S, rather than action B?

## Data Acquisition

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
from datasets import Dataset
import torch
from sklearn.metrics import classification_report, accuracy_score

In [5]:
# Load the CSV file into a DataFrame
df = pd.read_csv('../../../data/extra/combined_dataset.csv')

# Display the first few rows of the DataFrame
print(f"Number of rows in the dataset: {df.shape[0]}")
df.head()

Number of rows in the dataset: 346


Unnamed: 0,text,label
0,Why was action A excluded from the plan?,1
1,What were the reasons for omitting action A fr...,1
2,Can you explain why action A was not considere...,1
3,Why didn't the plan include action A?,1
4,What is the rationale for not using action A i...,1


## Data Preprocessing

In [None]:
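# Convert the intent labels to numbers (1, 2, 3).
# Only applies when the DataFrame has a raw 'intent' column; the CSV loaded
# above already contains a numeric 'label' column, so this cell was not run.
if 'intent' in df.columns:
    df['label'] = pd.factorize(df['intent'])[0] + 1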
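
The cell above relies on `pd.factorize`, which assigns integer codes in order of first appearance, starting at 0; the `+ 1` then shifts them to start at 1. A dependency-free sketch of the same idea follows. The helper `factorize_labels` and the intent names are made up for illustration and do not come from the dataset.

```python
def factorize_labels(values):
    """Assign 0-based integer codes in order of first appearance,
    mimicking the codes returned by pandas.factorize."""
    codes, uniques = [], {}
    for value in values:
        if value not in uniques:
            uniques[value] = len(uniques)
        codes.append(uniques[value])
    return codes, list(uniques)

# Hypothetical intent names, used only for illustration.
intents = ["not_used", "used", "not_used", "used_instead_of", "used"]
codes, uniques = factorize_labels(intents)
one_based = [c + 1 for c in codes]  # same shift as the notebook cell

print(codes)      # [0, 1, 0, 2, 1]
print(one_based)  # [1, 2, 1, 3, 2]
print(uniques)
```

Note that the shifted labels run from 1 to 3, whereas a classifier with three outputs indexes its classes from 0 to 2.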

In [36]:
# Train/test split
train_df, test_df = train_test_split(df, test_size=0.2, random_state=13, stratify=df['label'])

# Convert DataFrame to Hugging Face Dataset
train_dataset = Dataset.from_pandas(train_df)
test_dataset = Dataset.from_pandas(test_df)
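
Passing `stratify=df['label']` keeps the class proportions roughly equal in the train and test splits. The sketch below shows the idea in plain Python on toy labels. It is not scikit-learn's algorithm (which handles per-class rounding differently), and `stratified_split` is a hypothetical helper.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_size=0.2, seed=13):
    """Return (train_idx, test_idx) with about test_size of each class in test."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # shuffle within the class only
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx

# Toy labels: 20 examples for each of three classes.
labels = [1] * 20 + [2] * 20 + [3] * 20
train_idx, test_idx = stratified_split(labels)

test_counts = Counter(labels[i] for i in test_idx)
train_counts = Counter(labels[i] for i in train_idx)
print(test_counts)   # 4 examples per class
print(train_counts)  # 16 examples per class
```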

In [37]:
# Tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Tokenization function
def tokenize_function(examples):
    return tokenizer(examples['text'], padding="max_length", truncation=True, max_length=128)

# Tokenize datasets
tokenized_train = train_dataset.map(tokenize_function, batched=True)
tokenized_test = test_dataset.map(tokenize_function, batched=True)

Map:   0%|          | 0/276 [00:00<?, ? examples/s]

Map:   0%|          | 0/70 [00:00<?, ? examples/s]
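
With `padding="max_length"` and `truncation=True`, every example ends up with exactly `max_length` token ids plus an attention mask marking which positions are real tokens. The toy sketch below illustrates only that shape logic. It uses made-up token ids and ignores the special tokens (`[CLS]`, `[SEP]`) and WordPiece splitting that the real `BertTokenizer` handles; `pad_or_truncate` is a hypothetical helper.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    ids = list(token_ids)[:max_length]   # truncation
    mask = [1] * len(ids)                # 1 = real token
    padding = max_length - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding  # 0 = padding

# Made-up token ids, not real BERT vocabulary ids.
short_ids, short_mask = pad_or_truncate([11, 12, 13], max_length=5)
long_ids, long_mask = pad_or_truncate([11, 12, 13, 14, 15, 16, 17], max_length=5)

print(short_ids, short_mask)  # [11, 12, 13, 0, 0] [1, 1, 1, 0, 0]
print(long_ids, long_mask)    # [11, 12, 13, 14, 15] [1, 1, 1, 1, 1]
```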

## Model Training

In [6]:
from code.templates.torch_utils import get_best_available_device

device = get_best_available_device()
print(f"Using device: {device}")

Using device: mps
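
`get_best_available_device` comes from a project-local module whose source is not shown here. A plausible selection order, assumed rather than confirmed, is CUDA first, then Apple's MPS backend, then CPU. The sketch below separates that decision from PyTorch so it runs on its own; `pick_device` is a hypothetical name.

```python
def pick_device(cuda_available, mps_available):
    """Prefer CUDA, then Apple's MPS backend, then fall back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

# With PyTorch installed, the two flags could come from
# torch.cuda.is_available() and torch.backends.mps.is_available().
print(pick_device(cuda_available=False, mps_available=True))   # mps
print(pick_device(cuda_available=False, mps_available=False))  # cpu
```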


In [39]:
# Load the pre-trained BERT model for sequence classification
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=3)
model.to(device);

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [40]:
# Training arguments
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    eval_strategy="epoch",
    save_steps=10_000,
    save_total_limit=2,
)

# Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_test,
)

# Train the model
trainer.train()

Epoch,Training Loss,Validation Loss
1,No log,0.490747
2,No log,0.048875
3,No log,0.005299


TrainOutput(global_step=105, training_loss=0.22288242521740143, metrics={'train_runtime': 84.7918, 'train_samples_per_second': 9.765, 'train_steps_per_second': 1.238, 'total_flos': 54464477469696.0, 'train_loss': 0.22288242521740143, 'epoch': 3.0})
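
The numbers in `TrainOutput` can be cross-checked against the training arguments. The sketch below assumes a single device and no gradient accumulation (the `TrainingArguments` default). It also suggests why the Training Loss column shows "No log": the run has only 105 steps, fewer than the Trainer's default logging interval (`logging_steps`, 500 by default), so no training loss is logged.

```python
import math

train_examples = 276      # from the Map progress output
batch_size = 8            # per_device_train_batch_size
epochs = 3                # num_train_epochs
train_runtime = 84.7918   # seconds, from TrainOutput

steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * epochs
samples_per_second = train_examples * epochs / train_runtime
steps_per_second = total_steps / train_runtime

print(steps_per_epoch, total_steps)   # 35 105
print(round(samples_per_second, 3))   # 9.765
print(round(steps_per_second, 3))     # 1.238
```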

## Model Evaluation

In [41]:
# Evaluate the model
results = trainer.evaluate()
print(results)

{'eval_loss': 0.005299075040966272, 'eval_runtime': 1.4641, 'eval_samples_per_second': 47.809, 'eval_steps_per_second': 6.147, 'epoch': 3.0}
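
`trainer.evaluate()` reports only the loss and timing here because no `compute_metrics` function was passed to the `Trainer`. A minimal sketch of such a function is shown below, written in plain Python so it runs on toy data. It assumes the labels are 0-based class indices, and the toy logits are not model output.

```python
def compute_metrics(eval_pred):
    """Turn (logits, labels) into an accuracy score."""
    logits, labels = eval_pred
    # Index of the largest logit in each row = predicted class index.
    predictions = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(predictions)}

# Toy logits for three classes, for illustration only.
toy_logits = [[2.0, 0.1, -1.0], [0.2, 1.5, 0.3], [0.1, 0.2, 0.9], [1.0, 0.9, 0.8]]
toy_labels = [0, 1, 2, 1]
metrics = compute_metrics((toy_logits, toy_labels))
print(metrics)  # {'accuracy': 0.75}
```

It could then be passed as `Trainer(..., compute_metrics=compute_metrics)`, in which case the returned values appear in the evaluation results with an `eval_` prefix.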


In [42]:
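# Function to classify new questions
def classify_question(question):
    inputs = tokenizer(question, return_tensors="pt", padding=True, truncation=True, max_length=128).to(device)
    # Inference only, so gradients are not needed
    with torch.no_grad():
        outputs = model(**inputs)
    # argmax over the logits yields a 0-based class index (0 to num_labels - 1)
    prediction = torch.argmax(outputs.logits, dim=1)
    return prediction.item()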

In [43]:
# Predict and compare
test_df['predicted_label'] = test_df['text'].apply(classify_question)
y_true = test_df['label']
y_pred = test_df['predicted_label']

In [44]:
# Print the classification report
print(classification_report(y_true, y_pred, zero_division=0.0))
print(f"Accuracy: {accuracy_score(y_true, y_pred):.2f}")

              precision    recall  f1-score   support

           1       0.77      1.00      0.87        24
           2       0.59      1.00      0.74        23
           3       0.00      0.00      0.00        23

    accuracy                           0.67        70
   macro avg       0.45      0.67      0.54        70
weighted avg       0.46      0.67      0.54        70

Accuracy: 0.67
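
The report can be reproduced from a confusion matrix, which makes the failure pattern explicit. In the sketch below, the counts for classes 1 and 2 follow from their recall of 1.00 and their support. The 7/16 split of class 3 between predictions 1 and 2 is inferred from the rounded precisions, not read from the data.

```python
# Rows: true label; columns: predicted label.
confusion = {
    1: {1: 24, 2: 0, 3: 0},
    2: {1: 0, 2: 23, 3: 0},
    3: {1: 7, 2: 16, 3: 0},   # inferred split, see the note above
}
classes = sorted(confusion)

def precision(c):
    predicted = sum(confusion[t][c] for t in classes)
    return confusion[c][c] / predicted if predicted else 0.0

def recall(c):
    actual = sum(confusion[c].values())
    return confusion[c][c] / actual if actual else 0.0

total = sum(sum(row.values()) for row in confusion.values())
accuracy = sum(confusion[c][c] for c in classes) / total
macro_precision = sum(precision(c) for c in classes) / len(classes)

for c in classes:
    print(c, round(precision(c), 2), round(recall(c), 2))
print(round(accuracy, 2), round(macro_precision, 2))  # 0.67 0.45
```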


## Result Analysis

In [45]:
# Display the rows in which the predictions didn't match the label
incorrect_predictions = test_df[test_df['label'] != test_df['predicted_label']]

print(f"{incorrect_predictions.shape[0]} incorrect predictions out of {test_df.shape[0]} test samples.")
incorrect_predictions

23 incorrect predictions out of 70 test samples.


Unnamed: 0,text,label,predicted_label
278,Why is 'push the box up' preferred over 'movin...,3,2
275,What's the rationale behind 'moving up' and no...,3,1
245,Why is 'move down' preferred over 'move up' in...,3,2
328,Can you explain why 'move the blank space up' ...,3,2
232,Why is 'move left' selected instead of 'push r...,3,2
287,Why is 'moving up' used rather than 'pushing d...,3,1
227,Why is 'move right' chosen instead of 'push up...,3,2
215,Why was 'push box to the left' chosen over 'mo...,3,2
32,Why is action A applied in state S instead of ...,3,1
218,Why did the plan opt for 'push box to the left...,3,2


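All 23 test questions with label 3 are misclassified, and label 3 is never predicted. A likely cause is a label-range mismatch: the dataset labels run from 1 to 3, while a model created with `num_labels=3` has class indices 0 to 2, so `classify_question` can never return 3. Shifting the labels to 0-2 before training, and shifting predictions back by one afterwards, is the first thing to check.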
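
One way to see why label 3 can never be predicted is to look at the range of values an argmax over three logits can take. The sketch below is a plain-Python illustration with toy logits. The helpers `to_class_index` and `to_dataset_label` are hypothetical names for converting between the dataset's 1-based labels and 0-based class indices.

```python
NUM_LABELS = 3

def argmax(row):
    """Index of the largest value in a list of logits."""
    return max(range(len(row)), key=lambda i: row[i])

def to_class_index(label):
    """Dataset label (1..3) -> class index (0..2), for training."""
    return label - 1

def to_dataset_label(index):
    """Class index (0..2) -> dataset label (1..3), for reporting."""
    return index + 1

# An argmax over NUM_LABELS logits can only return 0, 1, or 2.
reachable = set(range(NUM_LABELS))
print(3 in reachable)  # False

# Toy logits where the last class wins.
toy_logits = [0.1, 0.3, 2.4]
predicted_label = to_dataset_label(argmax(toy_logits))
print(predicted_label)  # 3
```

Applied to the notebook, this would mean training on `df['label'] - 1` and adding 1 to the value returned by `classify_question` before comparing with the original labels. The results above were produced without this change.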

In [46]:
# Get rows which were incorrectly classified but not originally labeled 3
incorrect_predictions[(incorrect_predictions['label'] != 3)]

Unnamed: 0,text,label,predicted_label
