<a href="https://colab.research.google.com/github/nathangtg/machine-learning/blob/main/AirlineReview.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Airline Review Prediction Model**
Predicting a review's overall rating from its category scores, using pandas and scikit-learn

In [3]:
import pandas as pd

# Getting the data
airline_reviews = pd.read_csv("/content/airlines_reviews.csv")
display(airline_reviews)

Unnamed: 0,Title,Name,Review Date,Airline,Verified,Reviews,Type of Traveller,Month Flown,Route,Class,Seat Comfort,Staff Service,Food & Beverages,Inflight Entertainment,Value For Money,Overall Rating,Recommended
0,Flight was amazing,Alison Soetantyo,2024-03-01,Singapore Airlines,True,Flight was amazing. The crew onboard this fl...,Solo Leisure,December 2023,Jakarta to Singapore,Business Class,4,4,4,4,4,9,yes
1,seats on this aircraft are dreadful,Robert Watson,2024-02-21,Singapore Airlines,True,Booking an emergency exit seat still meant h...,Solo Leisure,February 2024,Phuket to Singapore,Economy Class,5,3,4,4,1,3,no
2,Food was plentiful and tasty,S Han,2024-02-20,Singapore Airlines,True,Excellent performance on all fronts. I would...,Family Leisure,February 2024,Siem Reap to Singapore,Economy Class,1,5,2,1,5,10,yes
3,“how much food was available,D Laynes,2024-02-19,Singapore Airlines,True,Pretty comfortable flight considering I was f...,Solo Leisure,February 2024,Singapore to London Heathrow,Economy Class,5,5,5,5,5,10,yes
4,“service was consistently good”,A Othman,2024-02-19,Singapore Airlines,True,The service was consistently good from start ...,Family Leisure,February 2024,Singapore to Phnom Penh,Economy Class,5,5,5,5,5,10,yes
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8095,an uneventful flight,N Vickers,2016-06-20,Korean Air,True,"KE124, Brisbane to Incheon (A330) and KE867,...",Business,June 2016,BNE to ULN via ICN,Economy Class,5,4,5,3,4,7,yes
8096,Korean Air always impresses,Kim Holloway,2016-06-12,Korean Air,False,Our recent flight was our fourth trip to the...,Couple Leisure,June 2016,SYD to LHR via ICN,Economy Class,3,5,5,4,5,10,yes
8097,didn’t offer anything,C Clark,2016-06-06,Korean Air,True,I flew Korean Air from Bali to Seoul in Pres...,Business,April 2016,DPS to ICN,Business Class,4,5,5,5,1,2,no
8098,appreciated the service onboard,E Petan,2016-04-21,Korean Air,False,Seoul to Paris with Korean Air. I am traveli...,Business,April 2016,ICN to CDG,Business Class,5,1,3,4,5,10,yes


Encode the categorical columns, then select the features (X) and the target (y)

In [4]:
import pandas as pd

# airline_reviews is the DataFrame loaded in the previous cell

# Integer codes for the two categorical features.
# Note: .map() returns NaN for any label that is not a key in these dicts.
class_mapping = {'Economy Class': 1, 'Premium Economy': 2, 'Business Class': 3, 'First Class': 4}
travellers_mapping = {'Solo Leisure': 1, 'Family Leisure': 2, 'Business': 3, 'Couple Leisure': 4}

# Feature columns used to predict the overall rating
airline_features = [
    'Type of Traveller',
    'Class',
    'Seat Comfort',
    'Staff Service',
    'Food & Beverages',
    'Inflight Entertainment',
    'Value For Money'
]

# Replace the text labels in both categorical columns with their integer codes
airline_reviews['Class'] = airline_reviews['Class'].map(class_mapping)
airline_reviews['Type of Traveller'] = airline_reviews['Type of Traveller'].map(travellers_mapping)

# Define your features (X) and target variable (y)
X = airline_reviews[airline_features]
y = airline_reviews['Overall Rating']

# Check if everything looks fine
print(X.head())  # Print the first few rows of your features
print(y.head())  # Print the first few rows of your target variable


   Type of Traveller  Class  Seat Comfort  Staff Service  Food & Beverages  \
0                  1      3             4              4                 4   
1                  1      1             5              3                 4   
2                  2      1             1              5                 2   
3                  1      1             5              5                 5   
4                  2      1             5              5                 5   

   Inflight Entertainment  Value For Money  
0                       4                4  
1                       4                1  
2                       1                5  
3                       5                5  
4                       5                5  
0     9
1     3
2    10
3    10
4    10
Name: Overall Rating, dtype: int64
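
Because `.map()` silently produces NaN for any label that is missing from the mapping dictionary, it can be worth checking for unmapped values before training. The following is a minimal sketch on a small hypothetical sample (the `sample` Series and its `'Economy'` label are made up for illustration and are not taken from the dataset); the same check could be applied to `airline_reviews['Class']` and `airline_reviews['Type of Traveller']`.

```python
import pandas as pd

class_mapping = {'Economy Class': 1, 'Premium Economy': 2, 'Business Class': 3, 'First Class': 4}

# Hypothetical sample: 'Economy' is deliberately NOT a key in class_mapping
sample = pd.Series(['Economy Class', 'Business Class', 'Economy', 'First Class'], name='Class')

# Labels without a matching key become NaN
encoded = sample.map(class_mapping)

# Collect the original labels that failed to map
unmapped = sample[encoded.isna()]
n_unmapped = int(encoded.isna().sum())

print("Number of unmapped rows:", n_unmapped)
print("Unmapped labels:", unmapped.unique().tolist())
```

If the count is non-zero, the mapping can be extended or the affected rows handled explicitly before they reach the model.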


For this prediction model I decided to use a Random Forest. Although a forest is harder to visualise than a single decision tree, averaging many trees can help produce a more accurate prediction.
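
To illustrate the idea behind a Random Forest regressor, the sketch below checks that the forest's prediction is the average of the predictions of its individual trees. It uses synthetic data (`X_demo` and `y_demo` are made-up stand-ins, not the airline dataset), so the numbers themselves carry no meaning.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data: five score-like features in the range 1-5
rng = np.random.default_rng(0)
X_demo = rng.integers(1, 6, size=(300, 5)).astype(float)
y_demo = 2 * X_demo.mean(axis=1) + rng.normal(0, 0.5, size=300)

forest = RandomForestRegressor(n_estimators=25, random_state=62).fit(X_demo, y_demo)

# Predict the first five rows with every tree, then average across trees
tree_preds = np.array([tree.predict(X_demo[:5]) for tree in forest.estimators_])
manual_average = tree_preds.mean(axis=0)

# The forest's own prediction for the same rows
forest_pred = forest.predict(X_demo[:5])

print("Forest equals the average of its trees:", np.allclose(manual_average, forest_pred))
```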

In [5]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split, RandomizedSearchCV
import pandas as pd
import numpy as np

# Split the data into training (80%) and validation (20%) sets
train_X, val_X, train_y, val_y = train_test_split(X, y, train_size=0.8, random_state=62)

# Define the parameter distributions to search
param_dist = {
    'n_estimators': [100, 200, 300],
    'max_depth': [None] + list(np.arange(10, 31, 5)),
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

# Creating the model
review_model = RandomForestRegressor(random_state=62)

# Randomized search
random_search = RandomizedSearchCV(estimator=review_model, param_distributions=param_dist, n_iter=50,
                                   cv=5, scoring='neg_mean_squared_error', n_jobs=-1, random_state=42)
random_search.fit(train_X, train_y)

# Get the best parameters
best_params = random_search.best_params_
print("Best parameters:", best_params)

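# Re-create the model with the best parameters found by the search
# (with the default refit=True, random_search.best_estimator_ already holds such a model)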
review_model = RandomForestRegressor(**best_params, random_state=62)

# Fitting the model with training data
review_model.fit(train_X, train_y)

# Predicting on the validation set
y_pred = review_model.predict(val_X)

# Create a DataFrame to hold actual and predicted values
results_df = pd.DataFrame({'Actual': val_y, 'Predicted': y_pred})

# Print the DataFrame
print(results_df)

# Check the shapes of val_y and y_pred
print("Shape of val_y:", val_y.shape)
print("Shape of y_pred:", y_pred.shape)


Best parameters: {'n_estimators': 100, 'min_samples_split': 10, 'min_samples_leaf': 4, 'max_depth': 10}
      Actual  Predicted
8032       8   8.931991
4108      10   9.200882
5309       1   1.548616
4894       1   1.346960
5589       3   3.303084
...      ...        ...
5380       1   2.383094
3590       9   9.300519
4569      10   9.043071
4210       1   1.284449
819        5   1.563579

[1620 rows x 2 columns]
Shape of val_y: (1620,)
Shape of y_pred: (1620,)
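
As a side note on the search step: `RandomizedSearchCV` refits the best configuration on the full training data by default (`refit=True`), so the tuned model is also available as `best_estimator_`. The sketch below, on synthetic data with a deliberately tiny search space (both are assumptions made to keep it fast), suggests that this estimator and a manually re-created model give the same predictions when the same `random_state` is used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data, not the airline dataset
rng = np.random.default_rng(0)
X_demo = rng.integers(1, 6, size=(200, 3)).astype(float)
y_demo = X_demo.sum(axis=1) + rng.normal(0, 0.5, size=200)

# A very small search space so the example runs quickly
search = RandomizedSearchCV(
    estimator=RandomForestRegressor(random_state=62),
    param_distributions={'n_estimators': [10, 20], 'max_depth': [None, 5]},
    n_iter=3, cv=3, scoring='neg_mean_squared_error', random_state=42,
)
search.fit(X_demo, y_demo)

# Model refit by the search itself on all of X_demo
best_model = search.best_estimator_

# Model re-created by hand from the best parameters, as in the cell above
manual_model = RandomForestRegressor(**search.best_params_, random_state=62).fit(X_demo, y_demo)

same = np.allclose(best_model.predict(X_demo), manual_model.predict(X_demo))
print("Predictions match:", same)
```

Using `best_estimator_` directly would save one training run; re-creating the model by hand, as this notebook does, keeps the final training step explicit.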


In [10]:
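# Measure the average prediction error on the validation set
from sklearn.metrics import mean_absolute_error

mae = mean_absolute_error(val_y, y_pred)

# MAE is an error, not an accuracy: lower is better, and it is expressed in rating points
print('Mean absolute error on the validation set:', mae)

Mean absolute error on the validation set: 1.1675399747574329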
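
To make the metric concrete, the sketch below computes the mean absolute error by hand for the ten rows visible in the results table printed earlier, and compares it with a naive baseline that always predicts the mean rating. These ten rows are only the displayed excerpt of the 1620 validation rows, so the figures differ from the full-set value and are illustrative only; `mean_absolute_error_manual` is a hypothetical helper written for this example.

```python
# Actual ratings and predictions copied from the ten rows shown in the results table
actual = [8, 10, 1, 1, 3, 1, 9, 10, 1, 5]
predicted = [8.931991, 9.200882, 1.548616, 1.346960, 3.303084,
             2.383094, 9.300519, 9.043071, 1.284449, 1.563579]

def mean_absolute_error_manual(y_true, y_hat):
    """Average of |actual - predicted| over all pairs."""
    return sum(abs(t - p) for t, p in zip(y_true, y_hat)) / len(y_true)

model_mae = mean_absolute_error_manual(actual, predicted)

# Baseline: always predict the mean of the actual ratings
mean_rating = sum(actual) / len(actual)
baseline_mae = mean_absolute_error_manual(actual, [mean_rating] * len(actual))

print(f"Model MAE on these rows:    {model_mae:.3f}")
print(f"Baseline MAE on these rows: {baseline_mae:.3f}")
```

Comparing against a baseline like this gives the error some context: an MAE is easier to judge once it is known how far off a constant guess would be.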
