# **Multilingual Intent Detection Using Transformer Models**

#### In an increasingly globalized digital world, machines need to understand user intent accurately across languages to support inclusive, intelligent systems. This project addresses multilingual intent detection with transformer-based models, using the MASSIVE dataset, a benchmark for multilingual spoken language understanding (SLU) covering 51 languages and 60 intent classes.

#### We fine-tuned the XLM-RoBERTa base model, which is pretrained for cross-lingual transfer, to classify English utterances by intent. The model was trained on 5,000 English (`en-US`) utterances sampled from the MASSIVE training split and evaluated on 1,000 utterances sampled from the test split, using Hugging Face's Trainer API. Inputs were tokenized with the XLM-RoBERTa subword tokenizer, truncated to at most 64 tokens, and padded to a fixed length of 64 tokens so that all batches share the same shape.

#### The model's performance was evaluated using standard classification metrics:

- Accuracy: 86%

- Weighted F1-score: 0.855

- Evaluation loss: 0.606

- Evaluation throughput: ~487 samples/sec

#### Training for 3 epochs on a single Tesla P100 GPU took about 163 seconds, with:

- Training throughput: ~92 samples/sec

- Total training FLOPs: 493.6 trillion

- Average training loss: 0.787

#### These results show that a compact fine-tuning setup reaches 86% accuracy on English intent classification at low computational cost, which makes it a reasonable backbone for intent detection in digital assistants, voice interfaces, and automation platforms. Because fine-tuning used only English data, accuracy in other languages depends on XLM-RoBERTa's cross-lingual transfer and was not measured here; the Portuguese example in the final section is a single spot check.

In [4]:
import tensorflow as tf
if tf.test.gpu_device_name():
    print('GPU Found')
    !nvidia-smi
else:
    print("GPU not Found")

GPU Found
Mon May 19 05:42:29 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla P100-PCIE-16GB           Off |   00000000:00:04.0 Off |                    0 |
| N/A   35C    P0             30W /  250W |     257MiB /  16384MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                      

I0000 00:00:1747633349.470304      35 gpu_device.cc:2022] Created device /device:GPU:0 with 15513 MB memory:  -> device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0


# 1. Import Libraries

In [25]:
!pip install -q transformers datasets scikit-learn

In [26]:
from datasets import load_dataset
from transformers import XLMRobertaTokenizer, XLMRobertaForSequenceClassification, Trainer, TrainingArguments
from sklearn.metrics import accuracy_score, f1_score
import numpy as np
import torch  # needed by predict_intent (torch.no_grad) in section 7

# 2. Load MASSIVE dataset

### Dataset
#### MASSIVE (Multilingual Amazon SLURP for Slot-filling, Intent classification, and Virtual assistant Evaluation) is a dataset designed for multilingual intent classification and slot filling. It comprises:
- Size: about 1 million utterances
- Languages: 51 typologically diverse languages
- Domains: 18
- Intents: 60
- Slots: 55

#### This coverage makes the dataset well suited to training and evaluating multilingual intent detection models.

In [28]:
dataset = load_dataset("AmazonScience/massive")

# 3. Filter for English samples

In [29]:
en_train = dataset['train'].filter(lambda x: x['locale'] == 'en-US').shuffle(seed=42).select(range(5000))
en_test = dataset['test'].filter(lambda x: x['locale'] == 'en-US').shuffle(seed=42).select(range(1000))
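
The subsampling above chains three steps: keep only `en-US` rows, shuffle with a fixed seed, then take the first N rows. The sketch below mimics that pattern in plain Python on made-up records. The `subsample` helper and the toy rows are hypothetical, and the permutation that `datasets` produces will differ because it uses its own random generator; the point is only that a fixed seed makes the selected subset reproducible.

```python
import random

def subsample(rows, locale, n, seed=42):
    """Filter by locale, shuffle reproducibly, and keep the first n rows."""
    kept = [r for r in rows if r["locale"] == locale]
    rng = random.Random(seed)  # fixed seed -> same order on every run
    rng.shuffle(kept)
    return kept[:n]

# Toy records standing in for MASSIVE rows (illustrative only).
rows = [{"id": i, "locale": "en-US" if i % 2 == 0 else "pt-PT"} for i in range(20)]

first = subsample(rows, "en-US", n=5)
second = subsample(rows, "en-US", n=5)
print([r["id"] for r in first])
print(first == second)  # the same subset is selected both times
```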

In [8]:
print(en_train.column_names)

['id', 'locale', 'partition', 'scenario', 'intent', 'utt', 'annot_utt', 'worker_id', 'slot_method', 'judgments']


# 4. Load tokenizer and model

In [30]:
tokenizer = XLMRobertaTokenizer.from_pretrained('xlm-roberta-base')
model = XLMRobertaForSequenceClassification.from_pretrained('xlm-roberta-base', num_labels=60)

Some weights of XLMRobertaForSequenceClassification were not initialized from the model checkpoint at xlm-roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


# 5. Tokenization

In [31]:
def preprocess(example):
    return tokenizer(example['utt'], truncation=True, padding='max_length', max_length=64)
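
To make the effect of `truncation=True` and `padding='max_length'` concrete, here is a minimal plain-Python sketch of what happens to one sequence of token ids. The `pad_or_truncate` helper, the token ids, and `pad_id=1` are assumptions chosen for illustration; the real tokenizer also takes care to keep the closing special token when it truncates, which this sketch does not model.

```python
def pad_or_truncate(token_ids, max_length, pad_id):
    """Return fixed-length input_ids and the matching attention_mask."""
    ids = token_ids[:max_length]                    # truncation: drop tokens past max_length
    n_pad = max_length - len(ids)
    attention_mask = [1] * len(ids) + [0] * n_pad   # 1 = real token, 0 = padding
    input_ids = ids + [pad_id] * n_pad              # padding: fill up to max_length
    return input_ids, attention_mask

# Hypothetical token ids; pad_id=1 is assumed here for illustration.
short_ids, short_mask = pad_or_truncate([0, 4865, 1961, 2], max_length=8, pad_id=1)
long_ids, long_mask = pad_or_truncate(list(range(100, 112)), max_length=8, pad_id=1)

print(short_ids, short_mask)
print(long_ids, long_mask)
```

With fixed-length padding every utterance occupies 64 positions, however short it is. An alternative that pads each batch only to its longest member is `DataCollatorWithPadding` from `transformers`; it is not used in this notebook.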

In [34]:
train_enc = en_train.map(preprocess, batched=True)
test_enc = en_test.map(preprocess, batched=True)

train_enc = train_enc.rename_column("intent", "labels")
test_enc = test_enc.rename_column("intent", "labels")

train_enc.set_format(type='torch', columns=['input_ids', 'attention_mask', 'labels'])
test_enc.set_format(type='torch', columns=['input_ids', 'attention_mask', 'labels'])


In [35]:
def compute_metrics(pred):
    labels = pred.label_ids
    preds = np.argmax(pred.predictions, axis=1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds, average='weighted')
    }
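
For readers unfamiliar with `average='weighted'`, the sketch below recomputes both metrics by hand on toy data: predictions are the argmax of each logit row, and the weighted F1 is the per-class F1 averaged with weights proportional to how often each class occurs in the true labels. The `weighted_f1` helper, the logits, and the three-class setup are hypothetical; the notebook itself relies on scikit-learn.

```python
from collections import Counter

def weighted_f1(labels, preds):
    """Per-class F1, averaged with weights equal to each class's true-label count."""
    support = Counter(labels)
    total = 0.0
    for cls, n_true in support.items():
        tp = sum(1 for y, p in zip(labels, preds) if y == cls and p == cls)
        n_pred = sum(1 for p in preds if p == cls)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += f1 * n_true
    return total / len(labels)

# Toy logits for 4 utterances over 3 made-up intent classes.
logits = [[2.0, 0.1, 0.3], [0.2, 1.5, 0.1], [0.9, 0.8, 0.1], [0.1, 0.2, 3.0]]
labels = [0, 1, 1, 2]

preds = [max(range(len(row)), key=row.__getitem__) for row in logits]  # argmax per row
accuracy = sum(y == p for y, p in zip(labels, preds)) / len(labels)
f1 = weighted_f1(labels, preds)
print(preds, accuracy, f1)
```

On this toy data the third utterance is misclassified, so accuracy is 3 out of 4, and the weighted F1 comes out at roughly the same value. On imbalanced data the two metrics can diverge, which is why the notebook reports both.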


In [39]:
from transformers import TrainingArguments
print(TrainingArguments.__module__)


transformers.training_args


# 6. Training Parameters

In [46]:
training_args = TrainingArguments(
    output_dir="./results",
    do_train=True,
    do_eval=True,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    logging_dir="./logs",
    eval_strategy="epoch",
    save_strategy="epoch",
    report_to=[]  # disable external logging integrations such as W&B
)


In [47]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_enc,
    eval_dataset=test_enc,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)



In [48]:
trainer.train()

| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|
| 1 | No log | 0.884678 | 0.78 | 0.754111 |
| 2 | 1.044600 | 0.673678 | 0.835 | 0.827545 |
| 3 | 1.044600 | 0.606342 | 0.86 | 0.855545 |


TrainOutput(global_step=939, training_loss=0.7874292083187733, metrics={'train_runtime': 162.6467, 'train_samples_per_second': 92.224, 'train_steps_per_second': 5.773, 'total_flos': 493590136320000.0, 'train_loss': 0.7874292083187733, 'epoch': 3.0})
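
The step count and throughput in `TrainOutput` can be cross-checked from the training setup. The short calculation below uses only numbers that appear in this notebook (5,000 training utterances, batch size 16, 3 epochs, and the reported runtime) and assumes a single GPU, so the per-device batch size equals the effective batch size.

```python
import math

# Values taken from the training setup and the TrainOutput above.
n_train = 5000            # training utterances
batch_size = 16           # per_device_train_batch_size (single GPU assumed)
epochs = 3
train_runtime = 162.6467  # seconds

steps_per_epoch = math.ceil(n_train / batch_size)  # the last batch of each epoch is partial
total_steps = steps_per_epoch * epochs
samples_per_second = n_train * epochs / train_runtime
steps_per_second = total_steps / train_runtime

print(steps_per_epoch, total_steps)
print(round(samples_per_second, 3), round(steps_per_second, 3))
```

This gives 313 steps per epoch and 939 steps in total, matching `global_step=939`. It also suggests why epoch 1 shows "No log": assuming the Trainer's default `logging_steps` of 500, the first training loss is recorded at step 500, which falls in epoch 2, and that same value (1.0446) is repeated for epoch 3 because the next logging point at step 1000 is never reached.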

In [49]:
eval_results = trainer.evaluate()
print("Evaluation Results:", eval_results)

Evaluation Results: {'eval_loss': 0.6063419580459595, 'eval_accuracy': 0.86, 'eval_f1': 0.8555451589559256, 'eval_runtime': 2.0545, 'eval_samples_per_second': 486.73, 'eval_steps_per_second': 30.664, 'epoch': 3.0}
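
Since the accuracy is estimated from 1,000 test utterances, it carries some sampling uncertainty. As a rough guide only, the sketch below computes a normal-approximation 95% interval for a proportion; this is a standard textbook approximation rather than anything computed in the original run, and it assumes the test utterances are independent samples.

```python
import math

accuracy = 0.86  # eval_accuracy reported above
n_eval = 1000    # size of the English test subset

# Standard error of a proportion under the normal approximation.
std_err = math.sqrt(accuracy * (1 - accuracy) / n_eval)
low = accuracy - 1.96 * std_err
high = accuracy + 1.96 * std_err

print(f"standard error: {std_err:.4f}")
print(f"approximate 95% interval: {low:.3f} to {high:.3f}")
```

The interval spans roughly two percentage points on either side of 86%, so small differences between runs or configurations on this subset should be read with caution.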


# 7. Testing on Custom Data

In [60]:
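def predict_intent(text, model, tokenizer, label_list):
    """Return the predicted intent name for a single utterance."""
    # Run on whichever device the model lives on (CPU or GPU).
    device = next(model.parameters()).device
    # Tokenize as during training: truncate and pad to 64 tokens.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding='max_length', max_length=64)
    # token_type_ids are not needed by XLM-RoBERTa; drop them if present.
    if "token_type_ids" in inputs:
        inputs.pop("token_type_ids")
    inputs = {k: v.to(device) for k, v in inputs.items()}
    model.eval()  # disable dropout for inference
    with torch.no_grad():  # no gradients are needed for prediction
        outputs = model(**inputs)
        logits = outputs.logits
    # Index of the highest-scoring class for the single input.
    predicted_class_id = logits.argmax(dim=-1).item()
    return label_list[predicted_class_id]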
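
`predict_intent` returns only the top class. When a confidence score or a ranked list is useful, the logits can be passed through a softmax first. The sketch below shows the idea in plain Python; the `softmax` and `top_k` helpers, the logit values, and the four-label subset are hypothetical and chosen only for illustration.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(logits, label_list, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(label_list, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical logits over a made-up subset of intent labels.
labels = ["play_music", "alarm_set", "weather_query", "general_joke"]
logits = [4.2, 0.3, 1.1, -0.5]

ranking = top_k(logits, labels, k=2)
for name, prob in ranking:
    print(f"{name}: {prob:.3f}")
```

Inside the PyTorch function, the equivalent step would be `torch.softmax(logits, dim=-1)` applied before taking the argmax or the top entries.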


In [73]:
# Get label names
label_list = train_enc.features['labels'].names

# Predict on a Portuguese utterance ("Play some music") to spot-check cross-lingual transfer
test_text = "Reproduz alguma música"
predicted_intent = predict_intent(test_text, model, tokenizer, label_list)
print(f"Input text: {test_text}")
print(f"Predicted intent: {predicted_intent}")

Input text: Reproduz alguma música
Predicted intent: play_music
