<a href="https://colab.research.google.com/github/enockmwizerwa123/My-task/blob/main/Exercise_on_python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h3><center>Air Fare Prediction</center></h3>

# Introduction
Welcome to this data analysis notebook on flight fare prediction! In this notebook, we will explore and analyze the "Flight Fare Prediction" dataset, which contains valuable information about airline ticket prices. Whether you are a traveler looking to plan your next trip or a data enthusiast interested in predicting flight fares, this dataset offers a fascinating opportunity to gain insights and build predictive models.

# Dataset Overview:
The "Flight Fare Prediction" dataset is a comprehensive collection of flight-related data compiled from various sources. It encompasses a wide range of attributes that influence flight fares, including departure and arrival locations, travel durations, airline carriers, and more. By leveraging this dataset, we can uncover patterns, correlations, and factors affecting flight fares, ultimately enabling us to create a predictive model for estimating ticket prices.

# Objective:
The primary objective of this analysis is to understand the underlying factors contributing to flight fares and develop a reliable model for predicting fare prices accurately. By examining the dataset, we aim to extract meaningful insights and discover which variables play a significant role in determining the cost of airline tickets.

# Notebook Structure:
 To accomplish our objective, we will follow a systematic approach, breaking the analysis down into the following key steps.

# Data Understanding and Exploration:
 We will begin by gaining an in-depth understanding of the dataset's structure, the meaning behind each attribute, and the overall distribution of the data. Exploratory data analysis techniques will be employed to uncover initial insights and detect any notable trends or outliers.

# Feature Engineering:
 Next, we will preprocess and transform the dataset as necessary. This step may involve handling missing values, encoding categorical variables, normalizing data, and creating additional features to enhance the predictive power of our model.

# Data Visualization:
 Visualizations such as charts, graphs, and maps will be utilized to present the data in an intuitive and easily digestible manner. These visual representations will help us grasp the relationships between different variables and gain a holistic view of the dataset.

# Model Development:
 We will train and evaluate several machine learning models to predict flight fares accurately. This step will involve splitting the dataset into training and testing sets, selecting appropriate algorithms, tuning model parameters, and assessing the performance of each model.

# Conclusion and Future Work:
 Finally, we will summarize our findings, discuss the limitations of our analysis, and outline potential avenues for further exploration and improvement.


In [4]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")
import plotly.express as px
%matplotlib inline
import matplotlib
pd.set_option('display.max_columns', None) #Enable to show max columns in code cells
sns.set_style('darkgrid') #set sns plot background
matplotlib.rcParams['font.size'] = 14 #set the default plot font size for this notebook
matplotlib.rcParams['figure.figsize'] = (10, 6) #set the default plot size for this notebook
matplotlib.rcParams['figure.facecolor'] = '#00000000' #set matplotlib plot background
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True);
# pd.set_option('display.float_format', lambda x: '%.3f' % x)

<h2><center>Import the datasets</center></h2>

In [5]:
data_train = pd.read_excel("/content/Data_Train.xlsx")
data_test = pd.read_excel("/content/Test_set.xlsx")
data_sample = pd.read_excel("/content/Sample_submission.xlsx")

# <center>The Data Overview</center>

In [6]:
data_train.info(), data_test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10683 entries, 0 to 10682
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Airline          10683 non-null  object
 1   Date_of_Journey  10683 non-null  object
 2   Source           10683 non-null  object
 3   Destination      10683 non-null  object
 4   Route            10682 non-null  object
 5   Dep_Time         10683 non-null  object
 6   Arrival_Time     10683 non-null  object
 7   Duration         10683 non-null  object
 8   Total_Stops      10682 non-null  object
 9   Additional_Info  10683 non-null  object
 10  Price            10683 non-null  int64 
dtypes: int64(1), object(10)
memory usage: 918.2+ KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2671 entries, 0 to 2670
Data columns (total 10 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Airline          2671 non-null   object
 1   Date_o

(None, None)

In [7]:
data_train.isna().sum(),data_test.isna().sum()

(Airline            0
 Date_of_Journey    0
 Source             0
 Destination        0
 Route              1
 Dep_Time           0
 Arrival_Time       0
 Duration           0
 Total_Stops        1
 Additional_Info    0
 Price              0
 dtype: int64,
 Airline            0
 Date_of_Journey    0
 Source             0
 Destination        0
 Route              0
 Dep_Time           0
 Arrival_Time       0
 Duration           0
 Total_Stops        0
 Additional_Info    0
 dtype: int64)

In [8]:
data_train.describe(), data_test.describe()

(              Price
 count  10683.000000
 mean    9087.064121
 std     4611.359167
 min     1759.000000
 25%     5277.000000
 50%     8372.000000
 75%    12373.000000
 max    79512.000000,
             Airline Date_of_Journey Source Destination            Route  \
 count          2671            2671   2671        2671             2671   
 unique           11              44      5           6              100   
 top     Jet Airways       9/05/2019  Delhi      Cochin  DEL → BOM → COK   
 freq            897             144   1145        1145              624   
 
        Dep_Time Arrival_Time Duration Total_Stops Additional_Info  
 count      2671         2671     2671        2671            2671  
 unique      199          704      320           5               6  
 top       10:00        19:00   2h 50m      1 stop         No info  
 freq         62          113      122        1431            2148  )

<h1>Data Cleaning and Feature Engineering</h1>

In [9]:
data_train.Total_Stops.value_counts(),data_test.Total_Stops.value_counts()

(1 stop      5625
 non-stop    3491
 2 stops     1520
 3 stops       45
 4 stops        1
 Name: Total_Stops, dtype: int64,
 1 stop      1431
 non-stop     849
 2 stops      379
 3 stops       11
 4 stops        1
 Name: Total_Stops, dtype: int64)

In [10]:
data_train['Total_Stops']=data_train['Total_Stops'].replace(['4 stops'], '3 stops')
data_test['Total_Stops']=data_test['Total_Stops'].replace(['4 stops'], '3 stops')

In [11]:
# data_train.groupby(by='Additional_Info')['Price'].describe().sort_values(by='mean',ascending=False)
data_train['Additional_Info'].value_counts(),data_test['Additional_Info'].value_counts()

(No info                         8345
 In-flight meal not included     1982
 No check-in baggage included     320
 1 Long layover                    19
 Change airports                    7
 Business class                     4
 No Info                            3
 1 Short layover                    1
 Red-eye flight                     1
 2 Long layover                     1
 Name: Additional_Info, dtype: int64,
 No info                         2148
 In-flight meal not included      444
 No check-in baggage included      76
 1 Long layover                     1
 Business class                     1
 Change airports                    1
 Name: Additional_Info, dtype: int64)

In [12]:
# Merge the duplicate 'No Info' label into 'No info'
data_train['Additional_Info']=data_train['Additional_Info'].replace(['No Info'], 'No info')

# Extracting date parts as part of feature engineering

In [13]:
def split_date(df):
    """Parse Date_of_Journey in place and add Month, Day and WeekOfYear columns."""
    df['Date_of_Journey'] = pd.to_datetime(df['Date_of_Journey'], format="%d/%m/%Y")
    df['Month'] = df.Date_of_Journey.dt.month
    df['Day'] = df.Date_of_Journey.dt.day
    # isocalendar().week returns a nullable unsigned integer column, so cast it to a plain int
    df['WeekOfYear'] = df.Date_of_Journey.dt.isocalendar().week
    df['WeekOfYear'] = df['WeekOfYear'].astype(int)

In [14]:
split_date(data_train)
split_date(data_test)
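
As a quick sanity check of the date features, the sketch below applies the same parsing steps to a hypothetical two-row frame. The two dates are taken from the first rows shown later in this notebook; the `DayOfWeek` column is an assumption added for illustration and is not part of the pipeline above.

```python
import pandas as pd

# Hypothetical toy frame; only the column name and date format match the real dataset.
toy = pd.DataFrame({"Date_of_Journey": ["24/03/2019", "1/05/2019"]})

parsed = pd.to_datetime(toy["Date_of_Journey"], format="%d/%m/%Y")
toy["Month"] = parsed.dt.month
toy["Day"] = parsed.dt.day
toy["WeekOfYear"] = parsed.dt.isocalendar().week.astype(int)
# Assumed extra feature: 0 = Monday ... 6 = Sunday
toy["DayOfWeek"] = parsed.dt.dayofweek

print(toy)
```

A day-of-week feature like this is one possible way to use more of `Date_of_Journey`, since weekday and weekend fares may differ.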

In [15]:
data_train['Arrival_Time']=data_train['Arrival_Time'].replace(['01:10 22 Mar'], '01:10 25 Mar')

In [16]:
def next_day(df):
    for i in df.index:
        if len(df.loc[i, 'Arrival_Time']) <= 5:
            df.loc[i, 'Next_Day'] = 0
        else:
            df.loc[i, 'Next_Day'] = 1

In [17]:
next_day(data_train)
next_day(data_test)
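
The row-by-row loop in `next_day` can also be written as a single vectorized string-length check. This is a minimal sketch on a hypothetical toy frame, assuming (as the loop does) that any `Arrival_Time` longer than `HH:MM` carries a date suffix.

```python
import pandas as pd

# Hypothetical values in the same two formats as the Arrival_Time column.
toy = pd.DataFrame({"Arrival_Time": ["13:15", "01:10 25 Mar", "23:30"]})

# Entries longer than "HH:MM" (5 characters) include a date, i.e. the flight lands on a later day.
toy["Next_Day"] = (toy["Arrival_Time"].str.len() > 5).astype(int)

print(toy)
```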

In [18]:
def convert_time_columns(df):
    # Keep only the departure hour (0-23) from the 'Dep_Time' column
    df['Dep_Time'] = pd.to_datetime(df['Dep_Time'], format='%H:%M').dt.hour
    df['Dep_Time'] = df['Dep_Time'].astype(int)

    # 'Arrival_Time' is either "HH:MM" or "HH:MM DD Mon"; keep only the arrival hour (0-23)
    for i in df.index:
        if len(df.loc[i, 'Arrival_Time']) <= 5:
            df.loc[i, 'Arrival_Time'] = pd.to_datetime(df.loc[i, 'Arrival_Time'], format='%H:%M').time().hour
        else:
            df.loc[i, 'Arrival_Time'] = df.loc[i, 'Arrival_Time'][:5]
            df.loc[i, 'Arrival_Time'] = pd.to_datetime(df.loc[i, 'Arrival_Time'], format='%H:%M').time().hour
    df['Arrival_Time'] = df['Arrival_Time'].astype(int)

In [19]:
convert_time_columns(data_train)
convert_time_columns(data_test)
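
Because the first five characters of `Arrival_Time` hold the time in both formats, the hour extraction can be sketched without an explicit loop. The frame and the `Dep_Hour` / `Arrival_Hour` column names below are hypothetical; the notebook itself overwrites the original columns.

```python
import pandas as pd

# Hypothetical rows covering both Arrival_Time formats.
toy = pd.DataFrame({
    "Dep_Time": ["22:20", "05:50"],
    "Arrival_Time": ["01:10 22 Mar", "13:15"],
})

toy["Dep_Hour"] = pd.to_datetime(toy["Dep_Time"], format="%H:%M").dt.hour
# Slice off any trailing date before parsing the time.
toy["Arrival_Hour"] = pd.to_datetime(toy["Arrival_Time"].str[:5], format="%H:%M").dt.hour

print(toy)
```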

# Calculating flight time in minutes

In [20]:
def calculate_total_minutes(df):
    for i in df.index:
        duration = df.loc[i, 'Duration']
        hours, minutes = 0, 0

        if 'h' in duration:
            hours = int(duration.split('h')[0])

        if 'm' in duration:
            minutes = int(duration.split('m')[0].split()[-1])

        total_minutes = hours * 60 + minutes
        df.loc[i, 'Duration'] = total_minutes

    df['Duration'] = df['Duration'].astype(int)

In [21]:
calculate_total_minutes(data_train)
calculate_total_minutes(data_test)
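
The same conversion can be sketched with regular expressions instead of a loop. The helper name `duration_to_minutes` is hypothetical, and the sketch assumes durations always look like `"2h 50m"`, `"19h"` or `"5m"`.

```python
import pandas as pd

def duration_to_minutes(duration: pd.Series) -> pd.Series:
    """Convert strings such as '2h 50m' to total minutes."""
    # A missing hour or minute part yields NaN from extract, which we treat as zero.
    hours = duration.str.extract(r"(\d+)\s*h", expand=False).fillna("0").astype(int)
    minutes = duration.str.extract(r"(\d+)\s*m", expand=False).fillna("0").astype(int)
    return hours * 60 + minutes

toy = pd.Series(["2h 50m", "19h", "5m", "7h 25m"])
total_minutes = duration_to_minutes(toy)
print(total_minutes.tolist())  # [170, 1140, 5, 445]
```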

In [22]:
def red_eye_flight(df):
    mask = ((df['Arrival_Time'].between(0, 7) | df['Dep_Time'].between(22, 23)) & (df['Duration'] <= 600))
    df.loc[mask, 'Additional_Info'] = df.loc[mask, 'Additional_Info'].replace('No info', 'Red-eye flight')

In [23]:
red_eye_flight(data_train)
red_eye_flight(data_test)
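
To make the red-eye rule concrete: a flight is relabelled when it arrives between hour 0 and 7 or departs at hour 22 or 23, lasts at most 600 minutes, and currently has 'No info'. The sketch below applies that rule to a hypothetical four-row frame; the values are made up for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "Dep_Time": [22, 9, 23, 10],
    "Arrival_Time": [1, 4, 6, 14],
    "Duration": [170, 1140, 420, 240],
    "Additional_Info": ["No info", "No info", "In-flight meal not included", "No info"],
})

# Parentheses matter: | and & bind tighter than comparisons in pandas expressions.
overnight = toy["Arrival_Time"].between(0, 7) | toy["Dep_Time"].between(22, 23)
mask = overnight & (toy["Duration"] <= 600)

# Only rows without other information are relabelled.
toy.loc[mask & (toy["Additional_Info"] == "No info"), "Additional_Info"] = "Red-eye flight"

print(toy["Additional_Info"].tolist())
```

Row 1 stays 'No info' because its duration exceeds 600 minutes, and row 2 keeps its existing label.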

# Grouping infrequent categories into 'Other'

In [24]:
def group_additional_info(df):
    specific_categories = [
        'In-flight meal not included',
        'Red-eye flight',
        'Business class',
        'No check-in baggage included',
        'No info'
    ]

    df['Additional_Info'] = df['Additional_Info'].apply(lambda x: x if x in specific_categories else 'Other')

In [25]:
group_additional_info(data_train)
group_additional_info(data_test)

In [26]:
data_train['Additional_Info'].value_counts()

No info                         7257
In-flight meal not included     1982
Red-eye flight                  1092
No check-in baggage included     320
Other                             28
Business class                     4
Name: Additional_Info, dtype: int64

In [27]:
data_train.groupby(by='Additional_Info')['Total_Stops'].value_counts()

Additional_Info               Total_Stops
Business class                1 stop            4
In-flight meal not included   1 stop         1432
                              non-stop        301
                              2 stops         249
No check-in baggage included  non-stop        304
                              1 stop           16
No info                       1 stop         3734
                              non-stop       2249
                              2 stops        1228
                              3 stops          45
Other                         1 stop           21
                              2 stops           6
                              3 stops           1
Red-eye flight                non-stop        637
                              1 stop          418
                              2 stops          37
Name: Total_Stops, dtype: int64

# Fixing Airline issues

In [28]:
def replace_airline_values(df):
    df['Airline'] = df['Airline'].replace(['Vistara Premium economy'], 'Vistara')
    df['Airline'] = df['Airline'].replace(['Multiple carriers Premium economy'], 'Multiple carriers')
    df['Airline'] = df['Airline'].replace(['Trujet'], 'Multiple carriers')

In [29]:
replace_airline_values(data_train)
replace_airline_values(data_test)

# Filling NA Values

In [30]:
data_train['Route']=data_train['Route'].fillna('DEL → MAA → COK')
data_train['Total_Stops']=data_train['Total_Stops'].fillna('1 stop')

In [31]:
def group_routes(df):
    specific_categories = [
        'DEL → BOM → COK',
        'BLR → DEL',
        'CCU → BOM → BLR',
        'CCU → BLR',
        'BOM → HYD',
        'CCU → DEL → BLR',
        'BLR → BOM → DEL',
        'MAA → CCU',
        'DEL → HYD → COK',
        'DEL → JAI → BOM → COK',
        'DEL → BLR → COK',
        'DEL → COK',
        'DEL → AMD → BOM → COK',
        'DEL → MAA → COK',
        'DEL → IDR → BOM → COK',
        'DEL → HYD → MAA → COK'
    ]

    df['Route'] = df['Route'].apply(lambda x: x if x in specific_categories else 'Others')

In [32]:
group_routes(data_train)
group_routes(data_test)
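
Instead of a hard-coded list of routes, the set of categories to keep could be derived from a count threshold on the training data and then reused for the test data. This is a minimal sketch under that assumption; the helper names, the threshold and the toy routes are all hypothetical.

```python
import pandas as pd

def fit_frequent_categories(train_series, min_count):
    """Return the categories seen at least `min_count` times in the training data."""
    counts = train_series.value_counts()
    return set(counts[counts >= min_count].index)

def group_rare(series, frequent, other_label="Others"):
    """Keep frequent categories and replace everything else with `other_label`."""
    return series.where(series.isin(frequent), other_label)

# Hypothetical toy data; the strings only mimic the dataset's route format.
train_routes = pd.Series(["BLR → DEL"] * 4 + ["CCU → BLR"] * 3 + ["BOM → HYD", "MAA → CCU"])
test_routes = pd.Series(["BLR → DEL", "BOM → HYD", "DEL → COK"])

frequent = fit_frequent_categories(train_routes, min_count=3)
train_grouped = group_rare(train_routes, frequent)
test_grouped = group_rare(test_routes, frequent)

print(train_grouped.value_counts())
print(test_grouped.tolist())
```

Learning the kept categories from the training set only means the test set is grouped with exactly the same rule, including routes it has never seen.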

# Some best practices

If you have a categorical column with categories that have only a few instances, encoding those categories as-is might lead to a lower accuracy because the model may struggle to generalize well with limited data. In such cases, you can consider a few alternative approaches to handle the encoding of this column:

1. Group infrequent categories: You can group the infrequent categories into a single category to increase their representation. For example, you can replace all categories with fewer than a certain threshold of instances with a new category called "Other." This way, you preserve some information about these infrequent categories while reducing the number of unique categories.

2. Frequency encoding: Instead of using one-hot encoding or label encoding, you can encode the categories based on their frequency in the dataset. Replace each category with its share of the rows in the dataset (a proportion between 0 and 1). This way, the encoded value captures the relative representation of each category.

3. Target encoding: Target encoding, also known as mean encoding, replaces each category with the mean (or other aggregation) of the target variable for that category. This method leverages the target variable to encode the categories and can provide useful information if there is a relationship between the category and the target. However, be cautious to avoid overfitting or data leakage when using target encoding.

4. Combine categories based on domain knowledge: If you have domain knowledge or insights about the categories, you can merge similar or related categories into broader groups. This can help consolidate the information and improve the representation of infrequent categories.

It is essential to evaluate the impact of different encoding strategies on your specific dataset and model performance. Consider using cross-validation or other evaluation techniques to assess the effectiveness of each approach. Additionally, the choice of encoding method may depend on the nature of the data, the specific machine learning algorithm you are using, and the overall context of your problem.
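
Target encoding is not used in this notebook, but the leakage caution above can be illustrated with an out-of-fold variant: each row is encoded with category means computed on the other folds only. This is a sketch under stated assumptions; the function name `target_encode_oof`, the toy prices and the fold count are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

def target_encode_oof(df, column, target, n_splits=5, seed=42):
    """Out-of-fold mean encoding: a row never sees its own target value."""
    encoded = pd.Series(np.nan, index=df.index, dtype=float)
    global_mean = df[target].mean()
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, enc_idx in kfold.split(df):
        # Category means are computed on the fitting folds only.
        fold_means = df.iloc[fit_idx].groupby(column)[target].mean()
        encoded.iloc[enc_idx] = df.iloc[enc_idx][column].map(fold_means).to_numpy()
    # Categories absent from the fitting folds fall back to the global mean.
    return encoded.fillna(global_mean)

# Hypothetical toy data, not taken from the real dataset.
toy = pd.DataFrame({
    "Airline": ["IndiGo", "IndiGo", "Vistara", "Vistara", "GoAir", "IndiGo", "Vistara", "IndiGo"],
    "Price": [3900, 4100, 6000, 6200, 3500, 4000, 5800, 4200],
})
toy["Airline_target_enc"] = target_encode_oof(toy, "Airline", "Price", n_splits=4)
print(toy)
```

The single 'GoAir' row can never appear in its own fitting folds, so it receives the global mean rather than its own price.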

# Pre-Processing
1. Frequency encoding for high-cardinality categorical columns

In [33]:
def frequency_encode(df, column):
    frequency_map = df[column].value_counts(normalize=True).to_dict()
    df[column + '_freq_encoded'] = df[column].map(frequency_map)

In [34]:
frequency_encode(data_train,'Route')
#frequency_encode(train,'Source')
#frequency_encode(train,'Destination')

frequency_encode(data_test,'Route')
#frequency_encode(test,'Source')
#frequency_encode(test,'Destination')

The frequency_encode function takes a DataFrame (df) and a column name (column) as inputs and performs the following steps:

1. It calculates the frequency of each unique value in the specified column using the value_counts(normalize=True) method. Setting normalize=True ensures that the frequencies are relative proportions (between 0 and 1) rather than absolute counts.

2. The resulting frequency distribution is converted to a dictionary using the to_dict() method.

3. Using the map function, the original column values are replaced with their corresponding frequency values based on the frequency map.

4. A new column is added to the DataFrame, suffixed with '_freq_encoded', to store the frequency-encoded values.

By applying this function to a specific column in the DataFrame, you can create a new column containing the frequency-encoded values. Frequency encoding is useful when dealing with large categorical columns, as it captures the relative importance or prevalence of each category based on their frequencies.

Note: Make sure to apply this function separately for each categorical column you want to frequency encode.
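
One design choice worth considering: if the frequency map is computed separately on each dataset, the same category can receive different encoded values in train and test. A minimal sketch of learning the map once on the training data and reusing it is shown below; the helper names and toy routes are hypothetical, and mapping unseen categories to 0.0 is an assumption.

```python
import pandas as pd

def fit_frequency_map(train, column):
    """Relative frequency of each category in the training data."""
    return train[column].value_counts(normalize=True)

def apply_frequency_map(df, column, frequency_map):
    """Encode a column with a previously fitted map; unseen categories get 0.0."""
    return df[column].map(frequency_map).fillna(0.0)

# Hypothetical toy frames.
train = pd.DataFrame({"Route": ["BLR → DEL", "BLR → DEL", "CCU → BLR", "Others"]})
test = pd.DataFrame({"Route": ["CCU → BLR", "BOM → HYD"]})

route_freq = fit_frequency_map(train, "Route")
train["Route_freq_encoded"] = apply_frequency_map(train, "Route", route_freq)
test["Route_freq_encoded"] = apply_frequency_map(test, "Route", route_freq)

print(train)
print(test)
```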

# Selecting input columns for modelling

In [35]:
data_train.columns

Index(['Airline', 'Date_of_Journey', 'Source', 'Destination', 'Route',
       'Dep_Time', 'Arrival_Time', 'Duration', 'Total_Stops',
       'Additional_Info', 'Price', 'Month', 'Day', 'WeekOfYear', 'Next_Day',
       'Route_freq_encoded'],
      dtype='object')

In [36]:
"""input_cols=['Airline',
       'Dep_Time', 'Arrival_Time', 'Duration', 'Total_Stops',
       'Additional_Info', 'Month', 'Day', 'WeekOfYear', 'Next_Day',
       'Route_freq_encoded', 'Source_freq_encoded',
       'Destination_freq_encoded']"""

input_cols=['Airline', 'Source', 'Destination', 'Route',
       'Dep_Time', 'Arrival_Time', 'Duration', 'Total_Stops',
       'Additional_Info', 'Month', 'Day', 'WeekOfYear', 'Next_Day',
       'Route_freq_encoded']

target_cols='Price'

In [38]:
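# creating new dataframes as inputs and targets.
# .copy() gives independent frames, so adding columns later does not modify a view of the original data
inputs=data_train[input_cols].copy()
target=data_train[target_cols]
test_inputs=data_test[input_cols].copy()
# test_target=submission[target_cols]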

Selecting Numeric and Categorical Columns

In [39]:
"""numeric_cols=[
       'Dep_Time', 'Arrival_Time', 'Duration', 'Month', 'Day', 'WeekOfYear', 'Next_Day',
       'Route_freq_encoded', 'Source_freq_encoded',
       'Destination_freq_encoded']"""

numeric_cols=[
       'Dep_Time', 'Arrival_Time', 'Duration', 'Month', 'Day', 'WeekOfYear', 'Next_Day',
       'Route_freq_encoded']

categorical_cols=['Airline','Total_Stops','Additional_Info','Source','Destination']

Encoding categorical columns

In [40]:
from sklearn.preprocessing import OneHotEncoder

encoder = OneHotEncoder(sparse_output=False, handle_unknown='ignore').fit(inputs[categorical_cols]) # Categories unseen during fit are encoded as all zeros instead of raising an error

In [41]:
encoded_cols = list(encoder.get_feature_names_out(categorical_cols))

inputs[encoded_cols] = encoder.transform(inputs[categorical_cols])
test_inputs[encoded_cols] = encoder.transform(test_inputs[categorical_cols])

In [42]:
inputs

Unnamed: 0,Airline,Source,Destination,Route,Dep_Time,Arrival_Time,Duration,Total_Stops,Additional_Info,Month,Day,WeekOfYear,Next_Day,Route_freq_encoded,Airline_Air Asia,Airline_Air India,Airline_GoAir,Airline_IndiGo,Airline_Jet Airways,Airline_Jet Airways Business,Airline_Multiple carriers,Airline_SpiceJet,Airline_Vistara,Total_Stops_1 stop,Total_Stops_2 stops,Total_Stops_3 stops,Total_Stops_non-stop,Additional_Info_Business class,Additional_Info_In-flight meal not included,Additional_Info_No check-in baggage included,Additional_Info_No info,Additional_Info_Other,Additional_Info_Red-eye flight,Source_Banglore,Source_Chennai,Source_Delhi,Source_Kolkata,Source_Mumbai,Destination_Banglore,Destination_Cochin,Destination_Delhi,Destination_Hyderabad,Destination_Kolkata,Destination_New Delhi
0,IndiGo,Banglore,New Delhi,BLR → DEL,22,1,170,non-stop,Red-eye flight,3,24,12,1.0,0.145278,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
1,Air India,Kolkata,Banglore,Others,5,13,445,2 stops,No info,5,1,18,0.0,0.143967,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
2,Jet Airways,Delhi,Cochin,Others,9,4,1140,2 stops,No info,6,9,23,1.0,0.143967,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
3,IndiGo,Kolkata,Banglore,Others,18,23,325,1 stop,No info,5,12,19,0.0,0.143967,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
4,IndiGo,Banglore,New Delhi,Others,16,21,285,1 stop,No info,3,1,9,0.0,0.143967,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10678,Air Asia,Kolkata,Banglore,CCU → BLR,19,22,150,non-stop,No info,4,9,15,0.0,0.067771,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
10679,Air India,Kolkata,Banglore,CCU → BLR,20,23,155,non-stop,No info,4,27,17,0.0,0.067771,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
10680,Jet Airways,Banglore,Delhi,BLR → DEL,8,11,180,non-stop,No info,4,27,17,0.0,0.145278,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
10681,Vistara,Banglore,New Delhi,BLR → DEL,11,14,160,non-stop,No info,3,1,9,0.0,0.145278,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0


In [43]:
from sklearn.preprocessing import QuantileTransformer

scaler = QuantileTransformer()

In [44]:
# Fit the scaler on the training inputs only and reuse it for the test inputs,
# so that both sets are mapped with the same quantiles
scaler.fit(inputs[numeric_cols])
inputs[numeric_cols] = scaler.transform(inputs[numeric_cols])
test_inputs[numeric_cols] = scaler.transform(test_inputs[numeric_cols])
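
For intuition, `QuantileTransformer` with its default uniform output maps each value to its approximate rank in the fitted data, on a 0 to 1 scale. The sketch below uses synthetic durations (an assumption, not the real data) and an explicit `n_quantiles` chosen to be no larger than the sample size.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

# Synthetic "duration in minutes" values, for illustration only.
rng = np.random.default_rng(0)
train_duration = rng.integers(75, 2000, size=(200, 1)).astype(float)
test_duration = np.array([[75.0], [2000.0], [500.0]])

# Quantiles are learned from the training column and reused for the test column.
demo_scaler = QuantileTransformer(n_quantiles=100, random_state=42).fit(train_duration)
test_scaled = demo_scaler.transform(test_duration)

print(test_scaled.ravel())
```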

# Final DataFrame for the Model

In [45]:
X = inputs[numeric_cols + encoded_cols]
X_test = test_inputs[numeric_cols + encoded_cols]

# Splitting into training and validation sets

In [46]:
from sklearn.model_selection import train_test_split
X_train, X_val, y_train, y_val = train_test_split(X,target,test_size=0.2, random_state=42)

# Baseline Model
We create a basic Linear Regression model to act as a baseline that later models can improve upon.

In [47]:
# A helper function that trains the model, predicts, and returns the (train, validation) RMSE
def rmse(model):
    model.fit(X_train, y_train)
    train_preds=model.predict(X_train)
    val_preds=model.predict(X_val)
    # Take the square root explicitly; the squared=False argument is deprecated in recent scikit-learn releases
    train_rmse = np.sqrt(mean_squared_error(y_train, train_preds))
    val_rmse = np.sqrt(mean_squared_error(y_val, val_preds))
    return train_rmse,val_rmse

In [48]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

linear_regression = LinearRegression()

rmse(linear_regression)


(2513.207965418992, 2522.32877952565)
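
For reference, RMSE is the square root of the mean squared difference between actual and predicted prices, so it is expressed in the same units as `Price`. The sketch below computes it by hand and via scikit-learn on hypothetical values, which are not results from this notebook.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical actual and predicted prices, for illustration only.
y_true = np.array([4000.0, 7500.0, 13000.0, 6200.0])
y_pred = np.array([4300.0, 7100.0, 12000.0, 6200.0])

rmse_manual = np.sqrt(np.mean((y_true - y_pred) ** 2))
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_pred))

print(rmse_manual, rmse_sklearn)
```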

# XGBoost Regressor

In [49]:
from xgboost import XGBRegressor
gradient_boosting = XGBRegressor(random_state=42, n_jobs=-1, objective='reg:squarederror')

rmse(gradient_boosting)

(750.4638498938829, 1391.9423963511313)
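
The gap between the training and validation RMSE suggests some overfitting, and a single split can be noisy. One possible way to get a steadier estimate is k-fold cross-validation, sketched here on synthetic data. scikit-learn's `GradientBoostingRegressor` is swapped in for `XGBRegressor` so that the sketch depends only on scikit-learn; the data and settings are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data, for illustration only.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(300, 5))
y_demo = 3 * X_demo[:, 0] - 2 * X_demo[:, 1] + rng.normal(scale=0.5, size=300)

model = GradientBoostingRegressor(random_state=42)
cv = KFold(n_splits=5, shuffle=True, random_state=42)

# scikit-learn reports negated RMSE so that higher is better; flip the sign back.
scores = -cross_val_score(model, X_demo, y_demo, cv=cv, scoring="neg_root_mean_squared_error")

print(scores.mean(), scores.std())
```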

Things that can be changed around:
- Include Date_of_Journey in the model.
- Apply frequency encoding to the 'Source' and 'Destination' columns.
- Group the Route column differently.
- Hyperparameter tuning. (I actually tried tuning some parameters, but the score was worse than the standard model, hence the standard model is used here.)
- Try different scaling techniques.
- Try different encoding strategies.
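
As a starting point for the hyperparameter tuning item, a randomized search scored with cross-validated RMSE could look like the sketch below. It runs on synthetic data and again uses scikit-learn's `GradientBoostingRegressor` as a stand-in for `XGBRegressor`; the parameter grid is an assumption, not a recommendation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic regression data, for illustration only.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(300, 5))
y_demo = 3 * X_demo[:, 0] - 2 * X_demo[:, 1] + rng.normal(scale=0.5, size=300)

# Hypothetical search space.
param_distributions = {
    "n_estimators": [25, 50, 100],
    "max_depth": [2, 3, 4],
    "learning_rate": [0.03, 0.1, 0.3],
}

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=42),
    param_distributions,
    n_iter=5,
    cv=3,
    scoring="neg_root_mean_squared_error",
    random_state=42,
)
search.fit(X_demo, y_demo)

print(search.best_params_)
print(-search.best_score_)  # cross-validated RMSE of the best candidate
```

Scoring each candidate across several folds, rather than on one validation split, makes it less likely that a tuned model looks better or worse purely by chance.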