# Enhancing Singapore Airlines' Service Through Automated Sentiment Analysis of Customer Reviews



**Motivation**



## Singapore Airlines Customer Reviews Dataset Information

The [Singapore Airlines Customer Reviews Dataset](https://www.kaggle.com/datasets/kanchana1990/singapore-airlines-reviews) aggregates 10,000 anonymized customer reviews, providing a broad perspective on the passenger experience with Singapore Airlines. 

The dataset contains the following columns:
- **`published_date`**: Date and time of review publication.
- **`published_platform`**: Platform where the review was posted.
- **`rating`**: Customer satisfaction rating, from 1 (lowest) to 5 (highest).
- **`type`**: Specifies the content as a review.
- **`text`**: Detailed customer feedback.
- **`title`**: Summary of the review.
- **`helpful_votes`**: Number of users finding the review helpful.

## Gated Recurrent Unit (GRU)

GRUs are a type of recurrent neural network commonly used for sequential data such as text. Their gating mechanism helps the model learn long-term dependencies, such as retaining the context of earlier words in a long review. This makes them well suited to predicting the sentiment of reviews.
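
To make the gating idea concrete, here is a minimal NumPy sketch of a single GRU time step. It is not the Keras implementation: the function name `gru_step`, the random weights, and the toy dimensions are assumptions for illustration, bias terms are omitted, and some formulations swap the roles of `z` and `1 - z`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One GRU time step (illustrative sketch; bias terms omitted)."""
    W_z, U_z, W_r, U_r, W_h, U_h = params
    z = sigmoid(W_z @ x_t + U_z @ h_prev)              # update gate: how much to overwrite
    r = sigmoid(W_r @ x_t + U_r @ h_prev)              # reset gate: how much history to use
    h_cand = np.tanh(W_h @ x_t + U_h @ (r * h_prev))   # candidate hidden state
    return (1.0 - z) * h_prev + z * h_cand             # blend old state and candidate

rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3

# Alternating input-to-hidden (W) and hidden-to-hidden (U) matrices
params = [rng.normal(scale=0.5, size=(hidden_dim, d))
          for d in (input_dim, hidden_dim) * 3]

# Run a toy sequence of 5 "word vectors" through the cell
h = np.zeros(hidden_dim)
sequence = rng.normal(size=(5, input_dim))
for x_t in sequence:
    h = gru_step(x_t, h, params)

final_state = h  # plays the role of the output when return_sequences=False
print(final_state)
```

The final hidden state summarizes the whole sequence, which is why a single vector can be passed on to a classification layer.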

#### Common terms and definitions:
- **Embeddings**: Vectors of numbers that capture the meaning of words and the relationships between them. They also reduce the high dimensionality of language to a compact form that the model can work with.
- **Embedding layer**: Layer in a deep learning model that learns these embeddings during training.
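
As a rough illustration of what an embedding layer stores, the sketch below treats it as a lookup table: one row of numbers per word index. The vocabulary, the embedding size, and the random values are hypothetical; a real layer would learn the values during training.

```python
import numpy as np

# Hypothetical toy vocabulary: word -> integer index (0 reserved for padding)
vocab = {"<pad>": 0, "good": 1, "great": 2, "delay": 3}
embedding_dim = 4

# An embedding layer is essentially a (vocab_size x embedding_dim) matrix.
# Random values stand in for weights that would normally be learned.
rng = np.random.default_rng(42)
embedding_matrix = rng.normal(size=(len(vocab), embedding_dim))

# Looking up embeddings is plain row indexing
token_ids = np.array([vocab["great"], vocab["delay"]])
vectors = embedding_matrix[token_ids]

print(vectors.shape)  # one embedding_dim-sized vector per token
```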


In [1]:
# Imports specific to GRU
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.utils import class_weight
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GRU, Dense, Dropout
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import EarlyStopping
from scikeras.wrappers import KerasClassifier
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.regularizers import L1L2
from sklearn.metrics import classification_report

In [3]:
data = pd.read_csv("final_df.csv")
data

| Unnamed: 0 | year | month | sentiment | processed_full_review |
| --- | --- | --- | --- | --- |
| 0 | 2024 | 3 | Neutral | ok use airlin go singapor london heathrow issu... |
| 1 | 2024 | 3 | Negative | don give money book paid receiv email confirm ... |
| 2 | 2024 | 3 | Positive | best airlin world best airlin world seat food ... |
| 3 | 2024 | 3 | Negative | premium economi seat singapor airlin not worth... |
| 4 | 2024 | 3 | Negative | imposs get promis refund book flight full mont... |
| ... | ... | ... | ... | ... |
| 11513 | 2021 | 11 | Negative | websit buggi paid first busi class ticket webs... |
| 11514 | 2021 | 10 | Negative | reduc level qualiti servic fear futur airlin t... |
| 11515 | 2021 | 10 | Negative | chang would cost usd book ticket singapor airl... |
| 11516 | 2021 | 8 | Negative | disappoint flight check secur check frankfurt ... |


#### **Preparing data for GRU**

GRUs cannot process raw words directly; the words must first be tokenized into numerical representations (here, integers). Each unique word in the vocabulary is assigned a specific integer value, so every review becomes a sequence of integers.
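
The mapping from words to integers can be sketched in plain Python. This is a simplified stand-in for a tokenizer, not the Keras implementation; the three toy reviews are made up, and the rule of giving the most frequent words the smallest indices is an assumption chosen for illustration.

```python
from collections import Counter

# Hypothetical, already-preprocessed reviews
reviews = [
    "seat food great",
    "food cold seat broken",
    "great crew great food",
]

# Count how often each word appears across all reviews
counts = Counter(word for review in reviews for word in review.split())

# Assign indices by descending frequency; index 0 is left free for padding
vocab = {word: idx
         for idx, (word, _) in enumerate(counts.most_common(), start=1)}

# Replace every word with its integer index, keeping word order
sequences = [[vocab[word] for word in review.split()] for review in reviews]

print(vocab)
print(sequences)
```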

#### TensorFlow's Tokenizer or CountVectorizer?

Unlike CountVectorizer, which uses a bag-of-words model (a sparse matrix of word counts) that ignores word order and is not compatible with embedding layers, TensorFlow's Tokenizer preserves the word order in each sequence and is directly compatible with embedding layers, making it the more suitable choice here.
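
The difference can be seen with two made-up reviews that contain the same words in a different order. The sketch below uses `collections.Counter` as a stand-in for a bag-of-words representation and a hand-built vocabulary for the integer sequences; neither is the actual scikit-learn or Keras implementation.

```python
from collections import Counter

# Two hypothetical reviews: same words, opposite meaning
review_a = "service not good food bad"
review_b = "service good food not bad"

# Bag-of-words view: only word counts survive, so the reviews look identical
bag_a = Counter(review_a.split())
bag_b = Counter(review_b.split())
same_bag = bag_a == bag_b

# Sequence view: each word becomes an integer and the order is kept
vocab = {word: idx
         for idx, word in enumerate(sorted(set(review_a.split())), start=1)}
seq_a = [vocab[word] for word in review_a.split()]
seq_b = [vocab[word] for word in review_b.split()]
same_sequence = seq_a == seq_b

print("Identical as bags of words:", same_bag)
print("Identical as sequences:", same_sequence)
```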


In [4]:
# Extract features and labels
X = data['processed_full_review'].values  # The reviews (features)
y = data['sentiment'].values  # The sentiment labels

# Directly apply one-hot encoding
# Convert the categorical labels (e.g., "positive", "neutral", "negative") to one-hot encoded labels
y = pd.get_dummies(y).values  # Automatically converts to one-hot

# Tokenize and pad the text data
tokenizer = Tokenizer(num_words=5000)  # Limit to top 5000 words
tokenizer.fit_on_texts(X)
X_tokenized = tokenizer.texts_to_sequences(X)

# Set a manageable padding length (e.g., 300 words)
max_sequence_len = 300
X_padded = pad_sequences(X_tokenized, maxlen=max_sequence_len)

# Split the data into training and testing sets (70% training, 30% testing)
X_train, X_test, y_train, y_test = train_test_split(X_padded, y, test_size=0.3, random_state=42, stratify=np.argmax(y, axis=1))

# Compute class weights based on the training labels
y_train_labels = np.argmax(y_train, axis=1)  # Convert one-hot to single-label encoding for class weights
class_weights = class_weight.compute_class_weight('balanced', classes=np.unique(y_train_labels), y=y_train_labels)
class_weights_dict = dict(enumerate(class_weights))

print("Class Weights:", class_weights_dict)  # Check the weights calculated

Class Weights: {0: 1.5733801717408276, 1: 3.2973415132924337, 2: 0.4851657940663176}
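
The `'balanced'` heuristic weights each class by `n_samples / (n_classes * class_count)`, so rarer classes receive larger weights. The sketch below reproduces the printed values by hand. The per-class counts are not shown in the notebook; they are inferred from the printed weights and should be treated as an assumption.

```python
# Training-set class counts inferred from the printed weights (assumption)
# 0 = Negative, 1 = Neutral, 2 = Positive (alphabetical one-hot column order)
class_counts = {0: 1708, 1: 815, 2: 5539}

n_samples = sum(class_counts.values())
n_classes = len(class_counts)

# 'balanced' weighting: n_samples / (n_classes * count for that class)
balanced_weights = {
    label: n_samples / (n_classes * count)
    for label, count in class_counts.items()
}

print(balanced_weights)
```

The Neutral class is the rarest, so it gets the largest weight and its errors contribute most to the loss during training.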


#### **Building the GRU Model**
The integer sequences are then fed into the GRU model. However, converting words into integers alone does not give the model any understanding of their meaning.

To give the model more context about the relationships between words, word embeddings (created by the Embedding layer in Keras) are used. They transform the integer tokens into dense vectors of real numbers that capture semantic properties such as the similarity between words (something the CountVectorizer cannot do).
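
One common way to measure similarity between dense vectors is cosine similarity. The sketch below uses hand-picked, hypothetical 3-dimensional vectors purely for illustration; they are not embeddings learned by the model in this notebook.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings, hand-picked so related words point the same way
embeddings = {
    "great": [0.9, 0.8, 0.1],
    "excellent": [0.85, 0.9, 0.05],
    "delay": [-0.7, 0.1, 0.9],
}

sim_related = cosine_similarity(embeddings["great"], embeddings["excellent"])
sim_unrelated = cosine_similarity(embeddings["great"], embeddings["delay"])

print(f"great vs excellent: {sim_related:.3f}")
print(f"great vs delay:     {sim_unrelated:.3f}")
```

Integer token IDs carry no such notion of closeness: the distance between two IDs only reflects their position in the vocabulary.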

#### TensorFlow's embedding or Word2Vec embedding?
Word2Vec creates word embeddings __outside__ of the deep learning model (in this case we used Gensim). After training with Word2Vec, the embeddings are loaded into the downstream model as __pre-trained embeddings__ and are typically kept fixed. As a result, they may miss task-specific nuances (domain-specific language).

The embedding layer in the TensorFlow model learns the word embeddings __during model training__. Hence the embeddings learned are __optimized for your dataset__ and the specific task at hand (such as sentiment classification).


In [5]:
# Basic GRU model build
model = Sequential()

# Adding an Embedding layer to turn words into dense vectors
model.add(Embedding(input_dim=5000, output_dim=128))

# Add a GRU layer with 64 units (or neurons)
# The parameter units specifies the number of GRU neurons in this layer
# return_sequences=False tells the GRU to output only the final hidden state, more suitable for sentence classification
# kernel_regularizer adds L1 and L2 penalties on the layer's input weights
model.add(GRU(units=64, return_sequences=False, kernel_regularizer=L1L2(l1=0.001, l2=0.001)))  # 0.01 = 73.81%, loss: 0.8660

# Add a dropout layer to prevent overfitting
# By dropping neurons randomly while training, it ensures the model is not overly-reliant on a single neuron
model.add(Dropout(0.3))

# Output layer (for 3 classes: positive, neutral, negative)
model.add(Dense(3, activation='softmax'))

# Compile the model
# Categorical cross-entropy suits multiclass problems (it measures how far the predicted probability distribution is from the actual distribution of the target class)
# 'accuracy' is currently the key performance metric being tracked; recall may be worth tracking instead to make misclassifying negative reviews more costly

optimizer = Adam(learning_rate=0.001)
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])


# Fit the model with specified number of epochs
history = model.fit(X_train, y_train, 
                    epochs=10,             # Number of epochs (you can adjust this number)
                    batch_size=32,          # Batch size
                    validation_split=0.2,   # 20% of the training data will be used for validation
                    class_weight=class_weights_dict)  # Optionally include class weights if the dataset is imbalanced


# Evaluate the optimized model
test_loss, test_accuracy = model.evaluate(X_test, y_test)

# Test loss has no inherent meaning unless compared to other models, lower the better
print(f"Test Loss: {test_loss}")
print(f"Test Accuracy: {test_accuracy * 100:.2f}%")

# Make predictions on the test set
y_pred = np.argmax(model.predict(X_test), axis=1)
y_true = np.argmax(y_test, axis=1)

# Print classification report for detailed metrics per class
print("\nClassification Report:")
print(classification_report(y_true, y_pred, target_names=["Negative", "Neutral", "Positive"], digits=4))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test Loss: 0.8983107805252075
Test Accuracy: 81.31%

Classification Report:
              precision    recall  f1-score   support

    Negative     0.7643    0.7299    0.7467       733
     Neutral     0.3388    0.4126    0.3721       349
    Positive     0.9142    0.8976    0.9058      2374

    accuracy                         0.8131      3456
   macro avg     0.6724    0.6800    0.6749      3456
weighted avg     0.8243    0.8131    0.8182      3456
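
The two averaged rows in the report are computed differently: the macro average treats every class equally, while the weighted average weights each class by its support. The sketch below recomputes both for the precision column from the per-class values printed above; the variable names are illustrative.

```python
# Per-class precision and support copied from the report above
precision = {"Negative": 0.7643, "Neutral": 0.3388, "Positive": 0.9142}
support = {"Negative": 733, "Neutral": 349, "Positive": 2374}

# Macro average: plain mean over classes, regardless of class size
macro_precision = sum(precision.values()) / len(precision)

# Weighted average: each class contributes in proportion to its support
total_support = sum(support.values())
weighted_precision = sum(
    precision[label] * support[label] for label in precision
) / total_support

print(f"macro avg precision:    {macro_precision:.4f}")
print(f"weighted avg precision: {weighted_precision:.4f}")
```

Because the Positive class dominates the test set, the weighted average sits well above the macro average, which is pulled down by the weak Neutral class.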



In [6]:
# GRU model with configurable hyperparameters
max_sequence_len = max([len(x) for x in X_tokenized])  # Longest tokenized review (X_padded above still uses a length of 300)
def create_gru_model(input_dimensions=5000, output_dimensions=128, gru_units=128, dropout_rate=0.2, learning_rate=0.001):
    # Build the GRU model
    model = Sequential()   

    # Adding an Embedding layer to turn words into dense vectors
    # input_dim - Vocabulary size for the Embedding layer
    # output_dim - dimension of dense embedding vectors/number of numbers each word will be represented by 
    model.add(Embedding(input_dim=input_dimensions, output_dim=output_dimensions))

    # Add a GRU layer with gru_units units (or neurons)
    # The parameter units specifies the number of GRU neurons in this layer
    # return_sequences=False tells the GRU to output only the final hidden state, more suitable for sentence classification
    model.add(GRU(units=gru_units, return_sequences=False))

    # Add a dropout layer to prevent overfitting
    # By dropping neurons randomly while training, it ensures the model is not overly-reliant on a single neuron
    model.add(Dropout(dropout_rate))

    # Output layer (for 3 classes: positive, neutral, negative)
    model.add(Dense(3, activation='softmax'))

    # Compile the model
    # Categorical cross-entropy suits multiclass problems (it measures how far the predicted probability distribution is from the actual distribution of the target class)
    # 'accuracy' is currently the key performance metric being tracked; recall may be worth tracking instead to make misclassifying negative reviews more costly
    optimizer = Adam(learning_rate=learning_rate)
    model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
 
    return model 


##### **Adam (Adaptive Moment Estimation):**
A popular optimization algorithm that adjusts the learning rate during training for each parameter. 

Key features:

1. Adaptive Learning Rates: Adam adjusts the learning rates for different parameters individually, allowing the model to converge faster and more effectively.

2. Momentum: Adam uses momentum to accelerate gradient descent, especially in the presence of noise or small gradients.

Adam is widely used because it combines the advantages of two other optimizers: AdaGrad (which works well with sparse gradients) and RMSProp (which works well on non-stationary objectives).
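
The update rule can be sketched in a few lines of plain Python. This is a simplified single-parameter version for illustration, not the Keras optimizer; the helper name `adam_minimize`, the toy objective `f(x) = x**2`, and the learning rate of 0.1 are assumptions.

```python
import math

def adam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=500):
    """Minimize a 1-D function with Adam, given its gradient function."""
    x, m, v = x0, 0.0, 0.0
    history = [x]
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g        # momentum: running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)  # per-parameter scaled step
        history.append(x)
    return history

# Toy objective f(x) = x^2 with gradient 2x, starting from x = 5
trajectory = adam_minimize(lambda x: 2 * x, x0=5.0)

print(f"after 1 step:    {trajectory[1]:.4f}")
print(f"after 500 steps: {trajectory[-1]:.4f}")
```

Dividing by the square root of the squared-gradient average is what makes the step size adaptive: the first step has a magnitude close to the learning rate no matter how large the raw gradient is.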

#### GRU Cross Validation

In [9]:
# Parameter grid to test for cross validation 
param_grid = {
    'model__input_dimensions': [1000, 5000],
    'model__output_dimensions': [128],
    'model__gru_units': [128, 256],
    'model__dropout_rate': [0.1, 0.3],
    'model__learning_rate': [0.001, 0.01],
}
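
To get a sense of the cost of this search, the sketch below enumerates the combinations in the grid with `itertools.product`. It assumes the 3-fold cross-validation (`cv=3`) passed to `GridSearchCV` in this notebook; the variable names `combinations`, `n_folds`, and `n_fits` are illustrative.

```python
from itertools import product

# Same grid as defined above
param_grid = {
    'model__input_dimensions': [1000, 5000],
    'model__output_dimensions': [128],
    'model__gru_units': [128, 256],
    'model__dropout_rate': [0.1, 0.3],
    'model__learning_rate': [0.001, 0.01],
}

# Every combination of one value per hyperparameter
combinations = [
    dict(zip(param_grid, values))
    for values in product(*param_grid.values())
]

n_folds = 3  # assumed to match cv=3 in GridSearchCV
n_fits = len(combinations) * n_folds

print(f"{len(combinations)} combinations x {n_folds} folds = {n_fits} model fits")
```

With the default `refit=True`, `GridSearchCV` trains one more model on the whole training set using the best combination, so the search is expensive even for a small grid.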

#### Finding the best parameters for the GRU Model

In [10]:
from sklearn.model_selection import GridSearchCV

# Wrap model function that was built above in a KerasClassifier
model = KerasClassifier(
    model=create_gru_model,
    input_dimensions=1000,  # default value
    output_dimensions=128,  # default value
    gru_units=128,         # default value
    dropout_rate=0.2,      # default value
    learning_rate=0.001,   # default value
    epochs=10,
    batch_size=128,
)

# Perform GridSearchCV using the Parameter Grid defined above 
grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=3)
grid_result = grid.fit(X_train, y_train)

# Print the best results + use those results for training later 
print(f"Best Parameters: {grid_result.best_params_}")


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10

Building the best model using the best parameters found by GridSearchCV


In [None]:
best_params = grid_result.best_params_
# Unpack best parameters into create_gru_model function arguments
best_model = create_gru_model(
    input_dimensions=best_params['model__input_dimensions'],
    output_dimensions=best_params['model__output_dimensions'],
    gru_units=best_params['model__gru_units'],
    dropout_rate=best_params['model__dropout_rate'],
    learning_rate=best_params['model__learning_rate'],
)

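# create_gru_model already compiles the model; it is recompiled here with the same loss, learning rate and metric
optimizer = Adam(learning_rate=best_params['model__learning_rate'])
best_model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])

# Retrain model on the full training set
history = best_model.fit(
    X_train, y_train,
    epochs=10,        # Same epoch count as in the grid search
    batch_size=128,   # Same batch size as in the grid search
    validation_data=(X_test, y_test)  # The test set is used to monitor validation metrics here
)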

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


#### Evaluating the model

In [18]:
# Evaluate the optimized model
test_loss, test_accuracy = best_model.evaluate(X_test, y_test)

# Test loss has no inherent meaning unless compared to other models, lower the better
print(f"Test Loss: {test_loss}")
print(f"Test Accuracy: {test_accuracy * 100:.2f}%")

Test Loss: 0.5247097611427307
Test Accuracy: 83.83%


#### Classification Report

In [20]:
from sklearn.metrics import classification_report

class_names = ['negative', 'neutral', 'positive']

# Get the predicted classes for X_test
y_pred = best_model.predict(X_test)
y_pred_classes = np.argmax(y_pred, axis=1)
y_true = np.argmax(y_test, axis=1)

# Print the classification report
print(classification_report(y_true, y_pred_classes, target_names=class_names, digits=4))

              precision    recall  f1-score   support

    negative     0.7539    0.7940    0.7734       733
     neutral     0.3594    0.2894    0.3206       349
    positive     0.9213    0.9326    0.9269      2374

    accuracy                         0.8383      3456
   macro avg     0.6782    0.6720    0.6737      3456
weighted avg     0.8291    0.8383    0.8332      3456
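
The f1-score column is the harmonic mean of precision and recall, which is why the Neutral class scores low on it even though overall accuracy is high. The sketch below recomputes it from the precision and recall values printed above; the helper name `f1_from` is illustrative.

```python
# (precision, recall) per class, copied from the report above
report = {
    "negative": (0.7539, 0.7940),
    "neutral": (0.3594, 0.2894),
    "positive": (0.9213, 0.9326),
}

def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

f1_scores = {label: f1_from(p, r) for label, (p, r) in report.items()}

for label, score in f1_scores.items():
    print(f"{label:>8}: {score:.4f}")
```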

