# Enhancing Singapore Airlines' Service Through Automated Sentiment Analysis of Customer Reviews



**Motivation**



## Singapore Airlines Customer Reviews Dataset Information

The [Singapore Airlines Customer Reviews Dataset](https://www.kaggle.com/datasets/kanchana1990/singapore-airlines-reviews) aggregates 10,000 anonymized customer reviews, providing a broad perspective on the passenger experience with Singapore Airlines. 

The dataset has the following columns:
- **`published_date`**: Date and time of review publication.
- **`published_platform`**: Platform where the review was posted.
- **`rating`**: Customer satisfaction rating, from 1 (lowest) to 5 (highest).
- **`type`**: Specifies the content as a review.
- **`text`**: Detailed customer feedback.
- **`title`**: Summary of the review.
- **`helpful_votes`**: Number of users finding the review helpful.

## Additional web scraping of online reviews

During our EDA, we noticed two main trends in the distribution of our dataset:
1. Fewer than 10% of our reviews were published between 2022 and 2024, making it hard to capture recent trends in sentiment.
2. Most of the reviews were highly positive. This may simply reflect that SIA receives mostly positive feedback, but we wanted more negative reviews to improve the robustness of our model.

### TripAdvisor

We scraped additional airline reviews from [TripAdvisor](https://www.tripadvisor.com.sg/Airline_Review-d8729151-Reviews-Singapore-Airlines), specifically for the years 2022 to 2024.

The scraped data has the following columns:
- **`Year`**: Year of review publication.
- **`Month`**: Month of review publication.
- **`Title`**: Title of the review.
- **`Review Text`**: Main text of the review.
- **`Rating`**: Numerical rating given by the reviewer (scale: 1 to 5).


### Skytrax

We also scraped reviews from [Skytrax](https://www.airlinequality.com/airline-reviews/singapore-airlines/?sortby=post_date%3ADesc&pagesize=100), another source of online airline reviews.

The scraped data has the following columns:
- **`Year`**: Year of review publication.
- **`Month`**: Month of review publication.
- **`Title`**: Title of the review.
- **`Review Text`**: Main text of the review.
- **`Rating`**: Numerical rating given by the reviewer (scale: 1 to 10).
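
Because Skytrax ratings run from 1 to 10 while the other two sources use 1 to 5, the scales need to be reconciled before the sources can be compared. The notebook does not show how this was done, so the following is only a minimal sketch of one possible approach, a linear rescaling. The function `rescale_rating` and its parameters are hypothetical names introduced for illustration.

```python
def rescale_rating(rating, src_min=1, src_max=10, dst_min=1, dst_max=5):
    """Linearly map a rating from [src_min, src_max] onto [dst_min, dst_max].

    Hypothetical helper: an assumed way to align Skytrax ratings (1 to 10)
    with the 1 to 5 scale used by the other sources.
    """
    if not src_min <= rating <= src_max:
        raise ValueError(f"rating {rating} outside [{src_min}, {src_max}]")
    fraction = (rating - src_min) / (src_max - src_min)
    return dst_min + fraction * (dst_max - dst_min)


# Show where a few Skytrax ratings land on the 1 to 5 scale.
for skytrax_rating in (1, 4, 7, 10):
    print(skytrax_rating, "->", round(rescale_rating(skytrax_rating), 2))
```

A linear map keeps the endpoints aligned (1 stays 1, 10 becomes 5). Other choices, such as binning ratings directly into sentiment labels, would be equally defensible.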

## Importing Libraries

Run the cell below to pip install the dependencies for this notebook.

In [1]:
!pip3 install -r requirements.txt



In [5]:
# Import necessary libraries

# Data manipulation
import pandas as pd
import numpy as np
from datetime import datetime 

# Statistical functions
from scipy.stats import zscore

# Text Preprocessing and NLP
import nltk
# Stopwords (common words to ignore) from NLTK
from nltk.corpus import stopwords

# WordNet lexical database (backs the lemmatizer below)
from nltk.corpus import wordnet

# Tokenizing sentences/words
from nltk.tokenize import word_tokenize
# Lemmatization (converting words to their base form)
from nltk.stem import WordNetLemmatizer


# For generating n-grams
from nltk.util import ngrams
from collections import Counter

## Data Preparation (Loading CSV)

Load the preprocessed CSV file `final_df.csv` into a pandas DataFrame `data`.

In [6]:
data = pd.read_csv('final_df.csv')

In [7]:
data.head()

| Unnamed: 0 | year | month | sentiment | processed_full_review |
|---|---|---|---|---|
| 0 | 2024 | 3 | Neutral | ok use airlin go singapor london heathrow issu... |
| 1 | 2024 | 3 | Negative | don give money book paid receiv email confirm ... |
| 2 | 2024 | 3 | Positive | best airlin world best airlin world seat food ... |
| 3 | 2024 | 3 | Negative | premium economi seat singapor airlin not worth... |
| 4 | 2024 | 3 | Negative | imposs get promis refund book flight full mont... |


In [8]:
data['sentiment'].value_counts()

sentiment
Positive    7913
Negative    2441
Neutral     1164
Name: count, dtype: int64

In [5]:
data['year'].value_counts()

year
2019    5129
2018    2596
2022    1184
2023    1111
2020     888
2024     514
2021      96
Name: count, dtype: int64
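
To put the two distributions above in perspective, the printed counts can be turned into shares. The sketch below simply re-enters the numbers shown in the outputs (the variable names are illustrative and not part of the notebook) and computes the proportion of reviews from 2022 to 2024 and the share of each sentiment class.

```python
# Counts copied from the value_counts() outputs above.
year_counts = {2019: 5129, 2018: 2596, 2022: 1184, 2023: 1111,
               2020: 888, 2024: 514, 2021: 96}
sentiment_counts = {"Positive": 7913, "Negative": 2441, "Neutral": 1164}

total = sum(year_counts.values())
# Both views should describe the same set of rows.
assert total == sum(sentiment_counts.values())

# Share of reviews published in 2022 or later.
recent_share = sum(n for year, n in year_counts.items() if year >= 2022) / total

# Share of each sentiment class; the largest is the majority-class baseline.
class_shares = {label: n / total for label, n in sentiment_counts.items()}
majority_share = max(class_shares.values())

print(f"Total reviews: {total}")
print(f"Share from 2022 to 2024: {recent_share:.1%}")
for label, share in class_shares.items():
    print(f"{label}: {share:.1%}")
```

Under these counts, roughly a quarter of the combined data is from 2022 onwards, and a classifier that always predicts Positive would already reach about 69% accuracy. That figure is a useful baseline when reading the validation metrics later.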

# Llama 3.1

The Meta Llama 3.1 collection of multilingual LLMs comprises pretrained and instruction-tuned generative models in 8B, 70B and 405B sizes. The instruction-tuned, text-only models are optimised for multilingual dialogue use cases and outperform many of the available open-source and closed chat models on common industry benchmarks. Note that the code in this notebook loads the smaller `meta-llama/Llama-3.2-1B-Instruct` checkpoint, a model from the same Llama family.

Llama 3.1 is an auto-regressive language model that uses an optimised transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

### Transformer-Based Architecture
- Llama uses a decoder-only transformer architecture, which is similar in structure to models like GPT. In a decoder-only architecture, the model generates text by predicting the next token in a sequence, making it suitable for tasks like text generation, question answering and summarisation.

- Llama uses self-attention with multi-headed attention layers, which allows the model to capture relationships between words and understand context over varying distances.
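
The next-token constraint described above can be sketched as single-head scaled dot-product attention with a causal mask, where each position only attends to itself and earlier positions. This is a plain-Python illustration on toy vectors, not Llama's implementation: real models use learned query, key and value projections, many heads and batched tensors. The function name `causal_self_attention` is hypothetical.

```python
import math


def causal_self_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k and v are lists of vectors, one per token position.
    """
    d = len(q[0])
    outputs = []
    for i, q_i in enumerate(q):
        # Causal mask: position i only sees positions 0..i.
        scores = [sum(a * b for a, b in zip(q_i, k[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        # Numerically stable softmax over the visible positions.
        max_score = max(scores)
        exps = [math.exp(s - max_score) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the visible value vectors.
        out = [sum(w * v[j][c] for j, w in enumerate(weights))
               for c in range(len(v[0]))]
        outputs.append(out)
    return outputs


# Toy example: three token vectors used as queries, keys and values.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = causal_self_attention(x, x, x)
for position, row in enumerate(y):
    print(position, [round(value, 3) for value in row])
```

Because of the mask, changing a later token never alters the outputs at earlier positions, which is what lets a decoder-only model be trained to predict every next token in parallel.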

### Pre-Normalization and RMSNorm
- **Layer Normalization:** Llama uses RMSNorm (Root Mean Square Normalization) instead of LayerNorm, the normalization technique typically used in transformers. RMSNorm rescales activations by their root mean square and skips the mean subtraction that LayerNorm performs.

- **Pre-Normalization:** The model applies normalization layers before the attention and feed-forward layers instead of after. This change has been shown to stabilise training and reduce issues that can arise when training deep transformers.
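
The two ideas above can be sketched in a few lines of plain Python. This is an illustration only: the names `rms_norm` and `pre_norm_block` are hypothetical, `eps` is an assumed stabilising constant, and the sublayer is a stand-in for attention or the feed-forward network.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Scale x by the reciprocal of its root mean square, then by a learned weight."""
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    return [w * v * inv_rms for v, w in zip(x, weight)]


def pre_norm_block(x, sublayer, weight):
    """Pre-normalization: normalize first, apply the sublayer, then add the residual."""
    normed = rms_norm(x, weight)
    return [xi + yi for xi, yi in zip(x, sublayer(normed))]


hidden = [3.0, -4.0, 0.0, 5.0]
gain = [1.0] * len(hidden)  # learned per-dimension weight, set to 1 here

normed = rms_norm(hidden, gain)
# Stand-in sublayer that doubles its input.
out = pre_norm_block(hidden, lambda h: [2.0 * v for v in h], gain)

print([round(v, 3) for v in normed])
print([round(v, 3) for v in out])
```

Note that no mean is subtracted, so the sign and relative size of each activation are preserved. In the pre-norm block the residual path carries the unnormalized input forward unchanged.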

### Rotary Positional Embeddings (RoPE)
- **Positional Encoding:** Llama uses Rotary Positional Embeddings (RoPE), which were also used in models like GPT-NeoX and GPT-J. Unlike the absolute positional embeddings used in BERT, RoPE rotates query and key vectors by position-dependent angles, so attention scores depend on the relative distance between tokens. This enhances the model's ability to capture position-dependent relationships between tokens.

- **Extended Context Length:** RoPE also allows for efficient scaling of the context window, meaning Llama can handle longer sequences more effectively than traditional transformers with fixed positional encodings. This is useful for tasks that require understanding longer documents or paragraphs.
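
The relative-position property can be checked numerically with a small sketch. It follows the original RoPE formulation that rotates consecutive pairs of dimensions. Library implementations may pair dimensions differently, and the `base` value here is an assumed default rather than Llama's configured one. The helper names are hypothetical.

```python
import math


def apply_rope(vec, position, base=10000.0):
    """Rotate consecutive pairs of dimensions by position-dependent angles."""
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even number of dimensions"
    rotated = []
    for i in range(0, d, 2):
        angle = position * base ** (-i / d)  # earlier pairs rotate faster
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        x0, x1 = vec[i], vec[i + 1]
        rotated += [x0 * cos_a - x1 * sin_a, x0 * sin_a + x1 * cos_a]
    return rotated


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [0.5, -1.0, 2.0, 0.3]
k = [1.5, 0.2, -0.7, 1.1]

# Same offset (2 positions apart) at two different absolute locations.
score_near_start = dot(apply_rope(q, 3), apply_rope(k, 1))
score_far_along = dot(apply_rope(q, 103), apply_rope(k, 101))
print(score_near_start, score_far_along)
```

The two scores agree up to floating-point error because rotating the query by position m and the key by position n leaves a dot product that depends only on m minus n.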

### SwiGLU Activation
- Llama uses the SwiGLU activation in its feed-forward layers rather than the GeLU (Gaussian Error Linear Unit) found in many earlier transformers. SwiGLU is a gated linear unit built on SiLU (Swish), a smooth alternative to ReLU, and gated activations of this kind have been reported to improve quality in transformer feed-forward layers.
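
As a reference point for the activation functions discussed here, the sketch below writes GeLU and SiLU in plain Python and shows the gated form (SwiGLU) that Llama's feed-forward block applies, where one projection of the input gates another. The function names and toy inputs are illustrative only, and the learned projection matrices are omitted.

```python
import math


def gelu(x):
    """Exact GeLU: x times the standard normal CDF at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def silu(x):
    """SiLU (Swish): x times the logistic sigmoid of x."""
    return x / (1.0 + math.exp(-x))


def swiglu(gate, up):
    """Gated unit: SiLU of the gate branch multiplies the up branch elementwise."""
    return [silu(g) * u for g, u in zip(gate, up)]


# Compare the two smooth activations on a few inputs.
for value in (-2.0, 0.0, 2.0):
    print(value, round(gelu(value), 4), round(silu(value), 4))

# Toy gate and up vectors, standing in for two linear projections of the input.
gated = swiglu([0.0, 1.0, -1.0], [2.0, 2.0, 2.0])
print([round(v, 4) for v in gated])
```

In a full feed-forward block, the gate and up vectors would come from two separate linear projections of the same hidden state, and a third projection would map the gated result back to the model dimension.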


In [None]:
# !pip install sentencepiece
# !pip install "accelerate>=0.26.0"

In [9]:
import torch
from torch.utils.data import TensorDataset, DataLoader
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score
from tqdm import tqdm
import numpy as np
import random

# Set random seeds for reproducibility (PyTorch, NumPy and Python)
seed = 42
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
np.random.seed(seed)
random.seed(seed)

# Allow TF32 on supported GPUs for faster matrix multiplications
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# Set number of threads for CPU operations
torch.set_num_threads(8)

# Initialize tokenizer and model
model_id = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True, use_fast=True)

# Add padding token
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# Initialize model with proper pad token configuration
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=3,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    pad_token_id=tokenizer.pad_token_id
)

Some weights of LlamaForSequenceClassification were not initialized from the model checkpoint at meta-llama/Llama-3.2-1B-Instruct and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [10]:
# Update model config
model.config.pad_token_id = tokenizer.pad_token_id
model.config.use_cache = False  # Disable cache for training

In [11]:
# Labels mapping
sentiment_dict = {'Negative': 0, 'Neutral': 1, 'Positive': 2}
y = data['sentiment'].map(sentiment_dict).values

# Split dataset
train_d, val_d, train_labels, val_labels = train_test_split(
    data['processed_full_review'], 
    y, 
    test_size=0.2, 
    random_state=42
)
texts_train = list(train_d)
texts_val = list(val_d)

In [12]:
# Tokenize data with proper padding
max_length = 64
tokenized_texts_train = tokenizer(
    texts_train,
    padding=True,
    truncation=True,
    return_tensors="pt",
    max_length=max_length,
    return_attention_mask=True
)
tokenized_texts_val = tokenizer(
    texts_val,
    padding=True,
    truncation=True,
    return_tensors="pt",
    max_length=max_length,
    return_attention_mask=True
)

In [13]:
train_labels = torch.tensor(list(train_labels))
val_labels = torch.tensor(list(val_labels))

# Create TensorDataset
train_dataset = TensorDataset(
    tokenized_texts_train['input_ids'],
    tokenized_texts_train['attention_mask'],
    train_labels
)
val_dataset = TensorDataset(
    tokenized_texts_val['input_ids'],
    tokenized_texts_val['attention_mask'],
    val_labels
)

In [14]:
# Training configuration
batch_size = 8
num_epochs = 3
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
accumulation_steps = 8

# Set up optimizer with weight decay for regularization
optimizer = AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)
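
The configuration above has two consequences that are easy to miss: gradient accumulation multiplies the batch size seen by each optimizer step, and `StepLR` with `step_size=1` shrinks the learning rate after every epoch. The sketch below works both out from the values in the cell; the derived variable names are illustrative.

```python
# Values copied from the training configuration above.
batch_size = 8
accumulation_steps = 8
num_epochs = 3
base_lr = 2e-5
gamma = 0.1  # StepLR(step_size=1) multiplies the rate by gamma after every epoch

# Samples contributing to each optimizer step.
effective_batch_size = batch_size * accumulation_steps

# Learning rate in effect during each epoch.
lr_per_epoch = [base_lr * gamma ** epoch for epoch in range(num_epochs)]

print(f"Effective batch size: {effective_batch_size}")
for epoch, lr in enumerate(lr_per_epoch, start=1):
    print(f"Epoch {epoch}: lr = {lr:.1e}")
```

With these settings the third epoch runs at one hundredth of the initial learning rate, so most of the learning happens in the first epoch.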

In [15]:
# Create DataLoaders
train_loader = DataLoader(
    train_dataset,
    batch_size=batch_size,
    shuffle=True,
    num_workers=4,
    pin_memory=True,
    prefetch_factor=2,
    persistent_workers=True
)
val_loader = DataLoader(
    val_dataset,
    batch_size=batch_size,
    shuffle=False,
    num_workers=4,
    pin_memory=True,
    prefetch_factor=2,
    persistent_workers=True
)

In [16]:
# Initialize tracking variables
train_losses = []
val_losses = []
val_accuracies = []
train_accuracies = []

In [None]:
# Move model to device
model.to(device)

# Gradient scaler for mixed-precision training.
# Loss scaling mainly matters for float16; the model here is loaded in bfloat16.
scaler = torch.amp.GradScaler()

for epoch in range(num_epochs):
    # Training phase
    model.train()
    train_loss = 0.0
    tr_correct_preds = 0
    all_tr_labels = []
    all_tr_preds = []
    optimizer.zero_grad()

    for batch_idx, batch in enumerate(tqdm(train_loader, desc=f"Epoch {epoch + 1}/{num_epochs} - Training")):
        input_ids, attention_mask, labels = batch
        input_ids = input_ids.to(device)
        attention_mask = attention_mask.to(device)
        labels = labels.to(device)

        # Forward pass with mixed precision (torch.cuda.amp.autocast is deprecated)
        with torch.autocast(device_type=device.type, dtype=torch.bfloat16):
            outputs = model(
                input_ids=input_ids,
                attention_mask=attention_mask,
                labels=labels
            )
            tr_loss = outputs.loss / accumulation_steps
            train_loss += tr_loss.item() * accumulation_steps

        # Backward pass with scaled gradients
        scaler.scale(tr_loss).backward()

        # Gradient accumulation
        if (batch_idx + 1) % accumulation_steps == 0 or (batch_idx + 1) == len(train_loader):
            scaler.unscale_(optimizer)
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            scaler.step(optimizer)
            scaler.update()
            optimizer.zero_grad()

        # Get predictions
        tr_logits = outputs.logits.detach()
        tr_preds = torch.argmax(tr_logits, dim=1)
        tr_correct_preds += torch.sum(tr_preds == labels).item()

        # Collect predictions and true labels
        all_tr_labels.extend(labels.cpu().numpy())
        all_tr_preds.extend(tr_preds.cpu().numpy())

    scheduler.step()

    # Calculate training metrics
    avg_train_loss = train_loss / len(train_loader)
    train_losses.append(avg_train_loss)
    train_accuracy = tr_correct_preds / len(train_d)
    train_accuracies.append(train_accuracy)

    train_precision = precision_score(all_tr_labels, all_tr_preds, average='weighted')
    train_recall = recall_score(all_tr_labels, all_tr_preds, average='weighted')
    train_f1 = f1_score(all_tr_labels, all_tr_preds, average='weighted')

    # Validation phase
    model.eval()
    val_loss = 0.0
    correct_preds = 0
    all_val_labels = []
    all_val_preds = []

    with torch.no_grad():
        for batch in tqdm(val_loader, desc=f"Epoch {epoch + 1}/{num_epochs} - Validation"):
            input_ids, attention_mask, labels = batch
            input_ids = input_ids.to(device)
            attention_mask = attention_mask.to(device)
            labels = labels.to(device)

            outputs = model(
                input_ids=input_ids,
                attention_mask=attention_mask,
                labels=labels
            )
            
            loss = outputs.loss
            val_loss += loss.item()

            logits = outputs.logits
            preds = torch.argmax(logits, dim=1)
            correct_preds += torch.sum(preds == labels).item()

            all_val_labels.extend(labels.cpu().numpy())
            all_val_preds.extend(preds.cpu().numpy())

    # Calculate validation metrics
    avg_val_loss = val_loss / len(val_loader)
    val_losses.append(avg_val_loss)
    val_accuracy = correct_preds / len(val_d)
    val_accuracies.append(val_accuracy)

    val_precision = precision_score(all_val_labels, all_val_preds, average='weighted')
    val_recall = recall_score(all_val_labels, all_val_preds, average='weighted')
    val_f1 = f1_score(all_val_labels, all_val_preds, average='weighted')

    # Print metrics
    print(f"\nEpoch {epoch + 1}/{num_epochs}")
    print(f"Training Loss: {avg_train_loss:.4f}, Training Accuracy: {train_accuracy:.4f}")
    print(f"Training Precision: {train_precision:.4f}, Training Recall: {train_recall:.4f}, Training F1: {train_f1:.4f}")
    print(f"Validation Loss: {avg_val_loss:.4f}, Validation Accuracy: {val_accuracy:.4f}")
    print(f"Validation Precision: {val_precision:.4f}, Validation Recall: {val_recall:.4f}, Validation F1: {val_f1:.4f}")

# Save the model and tokenizer
model.save_pretrained("./llama-sentiment-model")
tokenizer.save_pretrained("./llama-sentiment-model")

