# Enhancing Singapore Airlines' Service Through Automated Sentiment Analysis of Customer Reviews



## Singapore Airlines Customer Reviews Dataset Information

The [Singapore Airlines Customer Reviews Dataset](https://www.kaggle.com/datasets/kanchana1990/singapore-airlines-reviews) aggregates 10,000 anonymized customer reviews, providing a broad perspective on the passenger experience with Singapore Airlines. 

The dataset contains the following columns:
- **`published_date`**: Date and time of review publication.
- **`published_platform`**: Platform where the review was posted.
- **`rating`**: Customer satisfaction rating, from 1 (lowest) to 5 (highest).
- **`type`**: Specifies the content as a review.
- **`text`**: Detailed customer feedback.
- **`title`**: Summary of the review.
- **`helpful_votes`**: Number of users finding the review helpful.

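As a quick sanity check on this schema, the sketch below loads a two-row CSV with the columns listed above and verifies the column set and the rating range. The rows are invented for illustration and are not taken from the dataset; `EXPECTED_COLUMNS`, `SAMPLE_CSV`, and `check_schema` are hypothetical names.

```python
import io

import pandas as pd

# Column names as documented above.
EXPECTED_COLUMNS = [
    "published_date", "published_platform", "rating",
    "type", "text", "title", "helpful_votes",
]

# Two made-up rows, for illustration only (not real reviews).
SAMPLE_CSV = """published_date,published_platform,rating,type,text,title,helpful_votes
2024-01-05 10:30:00,Desktop,5,review,Friendly crew and a smooth flight.,Great flight,2
2024-01-06 18:45:00,Mobile,2,review,Long delay and little information.,Disappointing,0
"""


def check_schema(df):
    """Raise ValueError if columns are missing or ratings fall outside 1-5."""
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    if not df["rating"].between(1, 5).all():
        raise ValueError("Ratings must lie between 1 and 5")
    return True


sample_df = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=["published_date"])
print(check_schema(sample_df))
print(sample_df.dtypes)
```

The same check could be run on the real file after loading it, assuming its header matches the column names above.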
## Importing Libraries

Uncomment the cell below to install this notebook's dependencies with pip.

In [None]:
# !pip3 install -r ../requirements.txt

In [1]:
# Import necessary libraries

# Data manipulation
import pandas as pd
import numpy as np

# Statistical functions
from scipy.stats import zscore

# For concurrency (running functions in parallel)
from concurrent.futures import ThreadPoolExecutor

# For caching (to speed up repeated function calls)
from functools import lru_cache

# For progress tracking
from tqdm import tqdm

# Plotting and Visualisation
import matplotlib.pyplot as plt
import seaborn as sns

# Language Detection packages
# `langdetect` for detecting language
from langdetect import detect as langdetect_detect, DetectorFactory
from langdetect.lang_detect_exception import LangDetectException
# `langid` for an alternative language detection method
from langid import classify as langid_classify

# Text Preprocessing and NLP
# Stopwords (common words to ignore) from NLTK
from nltk.corpus import stopwords

# WordNet lexical database (supports lemmatization)
from nltk.corpus import wordnet

# Tokenizing sentences/words
from nltk.tokenize import word_tokenize
# Lemmatization (converting words to their base form)
from nltk.stem import WordNetLemmatizer
import nltk
# Regular expressions for text pattern matching
import re

# Word Cloud generation
from wordcloud import WordCloud

# For generating n-grams
from nltk.util import ngrams
from collections import Counter

# Libraries for Word2Vec and Logistic Regression
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, KFold, cross_val_score
from sklearn.metrics import accuracy_score
from sklearn.metrics import f1_score, make_scorer

In [None]:
data = pd.read_csv("../final_df.csv")

# Word2Vec + ComplementNB

Word2Vec features do not work with Complement Naive Bayes: the model requires non-negative input, and Word2Vec embeddings contain negative values. The cell below fails with a `ValueError` for this reason.
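A toy illustration of why averaged embeddings end up with negative components. The three-dimensional vectors in `toy_vectors` are made up for this sketch and do not come from the trained model; `average_vector` is a hypothetical helper that mirrors the averaging done by `get_average_word2vec` in the next cell.

```python
import numpy as np

# Hypothetical 3-dimensional word vectors (the notebook's Word2Vec vectors have 100 dimensions).
toy_vectors = {
    "seat": np.array([0.20, -0.50, 0.10]),
    "comfortable": np.array([0.40, -0.10, -0.30]),
    "crew": np.array([-0.60, 0.30, 0.20]),
}


def average_vector(review, vectors, vector_size):
    """Average the vectors of known words; return zeros if none are known."""
    known = [vectors[word] for word in review.split() if word in vectors]
    if known:
        return np.mean(known, axis=0)
    return np.zeros(vector_size)


review_vector = average_vector("comfortable seat", toy_vectors, 3)
print(review_vector)
print("Has negative components:", bool((review_vector < 0).any()))
```

Any review whose words have negative components on the same dimension will keep a negative value there after averaging, which is enough to trigger the non-negativity check.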

In [6]:
from gensim.models import Word2Vec
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import ComplementNB
from sklearn.metrics import accuracy_score, classification_report
import numpy as np

# Tokenize the processed reviews for Word2Vec training
tokenized_reviews = [review.split() for review in data['processed_full_review']]

# Train the Word2Vec model
w2v_model = Word2Vec(sentences=tokenized_reviews, vector_size=100, window=5, min_count=1, sg=1, workers=4, seed=42)

# Function to compute the average word vectors for each review
def get_average_word2vec(review, model, vector_size):
    words = review.split()
    word_vecs = [model.wv[word] for word in words if word in model.wv]
    if word_vecs:
        return np.mean(word_vecs, axis=0)
    else:
        return np.zeros(vector_size)

# Create the feature matrix by averaging word vectors for each review
vector_size = w2v_model.vector_size
X = np.array([get_average_word2vec(review, w2v_model, vector_size) for review in data['processed_full_review']])

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, data['sentiment'], test_size=0.2, random_state=42)

# Initialize and train the Complement Naive Bayes model
nb_model = ComplementNB(alpha=5.0)
nb_model.fit(X_train, y_train)

# Make predictions
nb_predictions = nb_model.predict(X_test)

# Evaluate the model
print("Complement NB Accuracy:", accuracy_score(y_test, nb_predictions))
print("Complement NB Classification Report:\n", classification_report(y_test, nb_predictions, digits=4))

ValueError: Negative values in data passed to ComplementNB (input X)
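
One possible workaround, not used in this notebook, is to rescale each feature to the [0, 1] range before fitting. The sketch below reproduces the error on random stand-in features and then fits `ComplementNB` inside a pipeline with `MinMaxScaler`. The data, labels, and variable names are assumptions for illustration. Rescaling dense embeddings satisfies the non-negativity check but does not turn them into the count-like features that Naive Bayes assumes, so results may still be poor.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import ComplementNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Random stand-in for averaged embeddings (contains negative values).
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 10))
y_demo = rng.choice(["Negative", "Neutral", "Positive"], size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Fitting on the raw features fails, as in the cell above.
try:
    ComplementNB(alpha=5.0).fit(X_tr, y_tr)
    raw_fit_failed = False
except ValueError as err:
    raw_fit_failed = True
    print("Raw features:", err)

# Scale to [0, 1] using training statistics only; clip=True keeps test
# values that fall outside the training range within [0, 1].
scaled_nb = make_pipeline(MinMaxScaler(clip=True), ComplementNB(alpha=5.0))
scaled_nb.fit(X_tr, y_tr)
demo_predictions = scaled_nb.predict(X_te)
print("Predictions shape:", demo_predictions.shape)
```

Putting the scaler inside the pipeline means it is refit on the training portion of each split, which avoids leaking test-set statistics into training.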

# Word2Vec + Random Forest

In [None]:
from gensim.models import Word2Vec
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
import numpy as np

# Tokenize the processed reviews for Word2Vec training
tokenized_reviews = [review.split() for review in data['processed_full_review']]

# Train the Word2Vec model
w2v_model = Word2Vec(sentences=tokenized_reviews, vector_size=100, window=5, min_count=1, sg=1, workers=4, seed=42)

# Function to compute the average word vectors for each review
def get_average_word2vec(review, model, vector_size):
    words = review.split()
    word_vecs = [model.wv[word] for word in words if word in model.wv]
    if word_vecs:
        return np.mean(word_vecs, axis=0)
    else:
        return np.zeros(vector_size)

# Create the feature matrix by averaging word vectors for each review
vector_size = w2v_model.vector_size
X = np.array([get_average_word2vec(review, w2v_model, vector_size) for review in data['processed_full_review']])
y = data['sentiment']

# Stratified 5-fold cross-validation
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
accuracy_scores = []
precision_scores = []
recall_scores = []
f1_scores = []

for train_index, test_index in skf.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    
    # Initialize and train the Random Forest model
    rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
    rf_model.fit(X_train, y_train)
    
    # Make predictions
    rf_predictions = rf_model.predict(X_test)
    
    # Evaluate the model
    accuracy = accuracy_score(y_test, rf_predictions)
    accuracy_scores.append(accuracy)
    
    report = classification_report(y_test, rf_predictions, digits=4, output_dict=True)
    precision_scores.append(report["weighted avg"]["precision"])
    recall_scores.append(report["weighted avg"]["recall"])
    f1_scores.append(report["weighted avg"]["f1-score"])

    print(f"Fold Accuracy: {accuracy}")
    print(f"Fold Classification Report:\n", classification_report(y_test, rf_predictions, digits=4))

# Print average scores across all folds
print("\nAverage Accuracy across folds:", np.mean(accuracy_scores))
print("Average Precision across folds:", np.mean(precision_scores))
print("Average Recall across folds:", np.mean(recall_scores))
print("Average F1 Score across folds:", np.mean(f1_scores))

Fold Accuracy: 0.8454861111111112
Fold Classification Report:
               precision    recall  f1-score   support

    Negative     0.7697    0.7807    0.7752       488
     Neutral     0.6333    0.1631    0.2594       233
    Positive     0.8742    0.9659    0.9178      1583

    accuracy                         0.8455      2304
   macro avg     0.7591    0.6366    0.6508      2304
weighted avg     0.8277    0.8455    0.8210      2304

Fold Accuracy: 0.8519965277777778
Fold Classification Report:
               precision    recall  f1-score   support

    Negative     0.7851    0.7787    0.7819       488
     Neutral     0.5591    0.2232    0.3190       233
    Positive     0.8865    0.9672    0.9251      1583

    accuracy                         0.8520      2304
   macro avg     0.7436    0.6563    0.6753      2304
weighted avg     0.8319    0.8520    0.8335      2304

Fold Accuracy: 0.8315972222222222
Fold Classification Report:
               precision    recall  f1-score   sup

# Word2Vec + Logistic Regression

In [8]:
from gensim.models import Word2Vec
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
import numpy as np

# Tokenize the processed reviews for Word2Vec training
tokenized_reviews = [review.split() for review in data['processed_full_review']]

# Train the Word2Vec model
w2v_model = Word2Vec(sentences=tokenized_reviews, vector_size=100, window=5, min_count=1, sg=1, workers=4, seed=42)

# Function to compute the average word vectors for each review
def get_average_word2vec(review, model, vector_size):
    words = review.split()
    word_vecs = [model.wv[word] for word in words if word in model.wv]
    if word_vecs:
        return np.mean(word_vecs, axis=0)
    else:
        return np.zeros(vector_size)

# Create the feature matrix by averaging word vectors for each review
vector_size = w2v_model.vector_size
X = np.array([get_average_word2vec(review, w2v_model, vector_size) for review in data['processed_full_review']])
y = data['sentiment']

# Stratified 5-fold cross-validation
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
accuracy_scores = []
precision_scores = []
recall_scores = []
f1_scores = []

for train_index, test_index in skf.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    
    # Initialize and train the Logistic Regression model
    clf = LogisticRegression(random_state=42, multi_class='multinomial', solver='lbfgs', max_iter=100)
    clf.fit(X_train, y_train)
    
    # Make predictions
    clf_predictions = clf.predict(X_test)
    
    # Evaluate the model
    accuracy = accuracy_score(y_test, clf_predictions)
    accuracy_scores.append(accuracy)
    
    report = classification_report(y_test, clf_predictions, digits=4, output_dict=True)
    precision_scores.append(report["weighted avg"]["precision"])
    recall_scores.append(report["weighted avg"]["recall"])
    f1_scores.append(report["weighted avg"]["f1-score"])

    print(f"Fold Accuracy: {accuracy}")
    print(f"Fold Classification Report:\n", classification_report(y_test, clf_predictions, digits=4))

# Print average scores across all folds
print("\nAverage Accuracy across folds:", np.mean(accuracy_scores))
print("Average Precision across folds:", np.mean(precision_scores))
print("Average Recall across folds:", np.mean(recall_scores))
print("Average F1 Score across folds:", np.mean(f1_scores))

Fold Accuracy: 0.8537326388888888
Fold Classification Report:
               precision    recall  f1-score   support

    Negative     0.7758    0.8012    0.7883       488
     Neutral     0.5225    0.2489    0.3372       233
    Positive     0.8988    0.9589    0.9279      1583

    accuracy                         0.8537      2304
   macro avg     0.7324    0.6697    0.6845      2304
weighted avg     0.8347    0.8537    0.8386      2304

Fold Accuracy: 0.8563368055555556
Fold Classification Report:
               precision    recall  f1-score   support

    Negative     0.7837    0.8094    0.7964       488
     Neutral     0.5000    0.2232    0.3086       233
    Positive     0.8998    0.9640    0.9308      1583

    accuracy                         0.8563      2304
   macro avg     0.7278    0.6655    0.6786      2304
weighted avg     0.8348    0.8563    0.8394      2304

Fold Accuracy: 0.8459201388888888
Fold Classification Report:
               precision    recall  f1-score   support

    Negative     0.7523    0.8217    0.7855       488
     Neutral     0.5000    0.2189    0.3045       233
    Positive     0.8969    0.9457    0.9207      1583

    accuracy                         0.8459      2304
   macro avg     0.7164    0.6621    0.6702      2304
weighted avg     0.8262    0.8459    0.8297      2304

Fold Accuracy: 0.8471558836300478
Fold Classification Report:
               precision    recall  f1-score   support

    Negative     0.7907    0.7971    0.7939       488
     Neutral     0.5000    0.2017    0.2875       233
    Positive     0.8824    0.9576    0.9185      1582

    accuracy                         0.8472      2303
   macro avg     0.7243    0.6522    0.6666      2303
weighted avg     0.8242    0.8472    0.8282      2303

Fold Accuracy: 0.8588797221016066
Fold Classification Report:
               precision    recall  f1-score   support

    Negative     0.7813    0.8548    0.8164       489
     Neutral     0.5392    0.2371    0.3293       232
    Positive     0.9034    0.9513    0.9267      1582

    accuracy                         0.8589      2303
   macro avg     0.7413    0.6811    0.6908      2303
weighted avg     0.8408    0.8589    0.8431      2303


Average Accuracy across folds: 0.8524050378129975
Average Precision across folds: 0.8321199062382114
Average Recall across folds: 0.8524050378129975
Average F1 Score across folds: 0.8358060729131429
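
The gap between the macro and weighted averages in these reports comes from class imbalance: the weighted average is dominated by the large Positive class, while the macro average gives the weak Neutral scores equal weight. The sketch below recomputes both F1 averages for the first logistic regression fold from its per-class F1 scores and supports; the variable names are illustrative. Support-weighted recall is also mathematically equal to accuracy, which is why the average recall and average accuracy above are identical.

```python
import numpy as np

# Per-class F1 and support from the first logistic regression fold above.
classes = ["Negative", "Neutral", "Positive"]
f1_per_class = np.array([0.7883, 0.3372, 0.9279])
support = np.array([488, 233, 1583])

# Macro average: every class counts equally.
macro_f1 = f1_per_class.mean()

# Weighted average: each class counts in proportion to its support.
weighted_f1 = np.average(f1_per_class, weights=support)

print(f"Macro F1:    {macro_f1:.4f}")
print(f"Weighted F1: {weighted_f1:.4f}")
```

Both values should agree with that fold's `macro avg` and `weighted avg` rows up to rounding of the inputs. If performance on Neutral reviews matters, the macro average is arguably the more informative summary to track across folds.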
