## FEATURES
The various features of the cleaned dataset are explained below:
1) Airline: The name of the airline company is stored in the airline column. It is a categorical feature having 6 different airlines.
2) Flight: Flight stores information regarding the plane's flight code. It is a categorical feature.
3) Source City: City from which the flight takes off. It is a categorical feature having 6 unique cities.
4) Departure Time: This is a derived categorical feature obtained created by grouping time periods into bins. It stores information about the departure time and have 6 unique time labels.
5) Stops: A categorical feature with 3 distinct values that stores the number of stops between the source and destination cities.
6) Arrival Time: This is a derived categorical feature created by grouping time intervals into bins. It has six distinct time labels and keeps information about the arrival time.
7) Destination City: City where the flight will land. It is a categorical feature having 6 unique cities.
8) Class: A categorical feature that contains information on seat class; it has two distinct values: Business and Economy.
9) Duration: A continuous feature that displays the overall amount of time it takes to travel between cities in hours.
10)Days Left: This is a derived characteristic that is calculated by subtracting the trip date by the booking date.
11) Price: Target variable stores information of the ticket price.

In [799]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [800]:
df = pd.read_excel('../dataset/flight_price.xlsx')
df

Unnamed: 0,Airline,Date_of_Journey,Source,Destination,Route,Dep_Time,Arrival_Time,Duration,Total_Stops,Additional_Info,Price
0,IndiGo,24/03/2019,Banglore,New Delhi,BLR → DEL,22:20,01:10 22 Mar,2h 50m,non-stop,No info,3897
1,Air India,1/05/2019,Kolkata,Banglore,CCU → IXR → BBI → BLR,05:50,13:15,7h 25m,2 stops,No info,7662
2,Jet Airways,9/06/2019,Delhi,Cochin,DEL → LKO → BOM → COK,09:25,04:25 10 Jun,19h,2 stops,No info,13882
3,IndiGo,12/05/2019,Kolkata,Banglore,CCU → NAG → BLR,18:05,23:30,5h 25m,1 stop,No info,6218
4,IndiGo,01/03/2019,Banglore,New Delhi,BLR → NAG → DEL,16:50,21:35,4h 45m,1 stop,No info,13302
...,...,...,...,...,...,...,...,...,...,...,...
10678,Air Asia,9/04/2019,Kolkata,Banglore,CCU → BLR,19:55,22:25,2h 30m,non-stop,No info,4107
10679,Air India,27/04/2019,Kolkata,Banglore,CCU → BLR,20:45,23:20,2h 35m,non-stop,No info,4145
10680,Jet Airways,27/04/2019,Banglore,Delhi,BLR → DEL,08:20,11:20,3h,non-stop,No info,7229
10681,Vistara,01/03/2019,Banglore,New Delhi,BLR → DEL,11:30,14:10,2h 40m,non-stop,No info,12648


In [801]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10683 entries, 0 to 10682
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Airline          10683 non-null  object
 1   Date_of_Journey  10683 non-null  object
 2   Source           10683 non-null  object
 3   Destination      10683 non-null  object
 4   Route            10682 non-null  object
 5   Dep_Time         10683 non-null  object
 6   Arrival_Time     10683 non-null  object
 7   Duration         10683 non-null  object
 8   Total_Stops      10682 non-null  object
 9   Additional_Info  10683 non-null  object
 10  Price            10683 non-null  int64 
dtypes: int64(1), object(10)
memory usage: 918.2+ KB


In [802]:
# check missing value
df.isnull().sum()

Airline            0
Date_of_Journey    0
Source             0
Destination        0
Route              1
Dep_Time           0
Arrival_Time       0
Duration           0
Total_Stops        1
Additional_Info    0
Price              0
dtype: int64

### Feature Engineering

In [803]:
# copy the dataset
df_temp = df.copy()

In [804]:
df_temp.head(3)

Unnamed: 0,Airline,Date_of_Journey,Source,Destination,Route,Dep_Time,Arrival_Time,Duration,Total_Stops,Additional_Info,Price
0,IndiGo,24/03/2019,Banglore,New Delhi,BLR → DEL,22:20,01:10 22 Mar,2h 50m,non-stop,No info,3897
1,Air India,1/05/2019,Kolkata,Banglore,CCU → IXR → BBI → BLR,05:50,13:15,7h 25m,2 stops,No info,7662
2,Jet Airways,9/06/2019,Delhi,Cochin,DEL → LKO → BOM → COK,09:25,04:25 10 Jun,19h,2 stops,No info,13882


#### Let working on date_of_journey 

In [805]:
## first way
df_temp['day'] = df_temp['Date_of_Journey'].str.split('/').str[0]
df_temp['month'] = df_temp['Date_of_Journey'].str.split('/').str[1]
df_temp['year'] = df_temp['Date_of_Journey'].str.split('/').str[2]

In [806]:
df_temp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10683 entries, 0 to 10682
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Airline          10683 non-null  object
 1   Date_of_Journey  10683 non-null  object
 2   Source           10683 non-null  object
 3   Destination      10683 non-null  object
 4   Route            10682 non-null  object
 5   Dep_Time         10683 non-null  object
 6   Arrival_Time     10683 non-null  object
 7   Duration         10683 non-null  object
 8   Total_Stops      10682 non-null  object
 9   Additional_Info  10683 non-null  object
 10  Price            10683 non-null  int64 
 11  day              10683 non-null  object
 12  month            10683 non-null  object
 13  year             10683 non-null  object
dtypes: int64(1), object(13)
memory usage: 1.1+ MB


In [807]:
df_temp['day'] = df_temp['day'].astype(int)
df_temp['month'] = df_temp['month'].astype(int)
df_temp['year'] = df_temp['month'].astype(int)

In [808]:
df_temp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10683 entries, 0 to 10682
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Airline          10683 non-null  object
 1   Date_of_Journey  10683 non-null  object
 2   Source           10683 non-null  object
 3   Destination      10683 non-null  object
 4   Route            10682 non-null  object
 5   Dep_Time         10683 non-null  object
 6   Arrival_Time     10683 non-null  object
 7   Duration         10683 non-null  object
 8   Total_Stops      10682 non-null  object
 9   Additional_Info  10683 non-null  object
 10  Price            10683 non-null  int64 
 11  day              10683 non-null  int32 
 12  month            10683 non-null  int32 
 13  year             10683 non-null  int32 
dtypes: int32(3), int64(1), object(10)
memory usage: 1.0+ MB


In [809]:
# now i don't need to Date_of_Journey column
df_temp.drop('Date_of_Journey', axis=1, inplace=True)

In [810]:
df_temp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10683 entries, 0 to 10682
Data columns (total 13 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Airline          10683 non-null  object
 1   Source           10683 non-null  object
 2   Destination      10683 non-null  object
 3   Route            10682 non-null  object
 4   Dep_Time         10683 non-null  object
 5   Arrival_Time     10683 non-null  object
 6   Duration         10683 non-null  object
 7   Total_Stops      10682 non-null  object
 8   Additional_Info  10683 non-null  object
 9   Price            10683 non-null  int64 
 10  day              10683 non-null  int32 
 11  month            10683 non-null  int32 
 12  year             10683 non-null  int32 
dtypes: int32(3), int64(1), object(9)
memory usage: 959.9+ KB


#### Now let't working on Arrival_Time columns

In [811]:
df_temp.head(5)

Unnamed: 0,Airline,Source,Destination,Route,Dep_Time,Arrival_Time,Duration,Total_Stops,Additional_Info,Price,day,month,year
0,IndiGo,Banglore,New Delhi,BLR → DEL,22:20,01:10 22 Mar,2h 50m,non-stop,No info,3897,24,3,3
1,Air India,Kolkata,Banglore,CCU → IXR → BBI → BLR,05:50,13:15,7h 25m,2 stops,No info,7662,1,5,5
2,Jet Airways,Delhi,Cochin,DEL → LKO → BOM → COK,09:25,04:25 10 Jun,19h,2 stops,No info,13882,9,6,6
3,IndiGo,Kolkata,Banglore,CCU → NAG → BLR,18:05,23:30,5h 25m,1 stop,No info,6218,12,5,5
4,IndiGo,Banglore,New Delhi,BLR → NAG → DEL,16:50,21:35,4h 45m,1 stop,No info,13302,1,3,3


In [812]:
df_temp['Arrival_Time'] = df_temp['Arrival_Time'].apply(lambda x:x.split(' ')[0])

In [813]:
df_temp.head(3)

Unnamed: 0,Airline,Source,Destination,Route,Dep_Time,Arrival_Time,Duration,Total_Stops,Additional_Info,Price,day,month,year
0,IndiGo,Banglore,New Delhi,BLR → DEL,22:20,01:10,2h 50m,non-stop,No info,3897,24,3,3
1,Air India,Kolkata,Banglore,CCU → IXR → BBI → BLR,05:50,13:15,7h 25m,2 stops,No info,7662,1,5,5
2,Jet Airways,Delhi,Cochin,DEL → LKO → BOM → COK,09:25,04:25,19h,2 stops,No info,13882,9,6,6


In [814]:
df_temp['hour'] = df_temp['Arrival_Time'].str.split(':').str[0]
df_temp['minute'] = df_temp['Arrival_Time'].str.split(':').str[1]

In [815]:
df_temp.head(2)

Unnamed: 0,Airline,Source,Destination,Route,Dep_Time,Arrival_Time,Duration,Total_Stops,Additional_Info,Price,day,month,year,hour,minute
0,IndiGo,Banglore,New Delhi,BLR → DEL,22:20,01:10,2h 50m,non-stop,No info,3897,24,3,3,1,10
1,Air India,Kolkata,Banglore,CCU → IXR → BBI → BLR,05:50,13:15,7h 25m,2 stops,No info,7662,1,5,5,13,15


In [816]:
# convert str to int
df_temp['hour'] = df_temp['hour'].astype(int)
df_temp['minute'] = df_temp['minute'].astype(int)

In [817]:
df_temp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10683 entries, 0 to 10682
Data columns (total 15 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Airline          10683 non-null  object
 1   Source           10683 non-null  object
 2   Destination      10683 non-null  object
 3   Route            10682 non-null  object
 4   Dep_Time         10683 non-null  object
 5   Arrival_Time     10683 non-null  object
 6   Duration         10683 non-null  object
 7   Total_Stops      10682 non-null  object
 8   Additional_Info  10683 non-null  object
 9   Price            10683 non-null  int64 
 10  day              10683 non-null  int32 
 11  month            10683 non-null  int32 
 12  year             10683 non-null  int32 
 13  hour             10683 non-null  int32 
 14  minute           10683 non-null  int32 
dtypes: int32(5), int64(1), object(9)
memory usage: 1.0+ MB


In [818]:
# now drop the Arrival_Time column
df_temp.drop('Arrival_Time', axis=1, inplace=True)

In [819]:
df_temp.head(2)

Unnamed: 0,Airline,Source,Destination,Route,Dep_Time,Duration,Total_Stops,Additional_Info,Price,day,month,year,hour,minute
0,IndiGo,Banglore,New Delhi,BLR → DEL,22:20,2h 50m,non-stop,No info,3897,24,3,3,1,10
1,Air India,Kolkata,Banglore,CCU → IXR → BBI → BLR,05:50,7h 25m,2 stops,No info,7662,1,5,5,13,15


#### Let's working on Dep_Time Columns

In [820]:
df_temp['dep_hour'] = df['Dep_Time'].str.split(':').str[0]
df_temp['dep_min'] = df['Dep_Time'].str.split(':').str[1]

In [821]:
df_temp['dep_hour'] = df_temp['dep_hour'].astype(int)
df_temp['dep_min'] = df_temp['dep_min'].astype(int)

In [822]:
df_temp.head(3)

Unnamed: 0,Airline,Source,Destination,Route,Dep_Time,Duration,Total_Stops,Additional_Info,Price,day,month,year,hour,minute,dep_hour,dep_min
0,IndiGo,Banglore,New Delhi,BLR → DEL,22:20,2h 50m,non-stop,No info,3897,24,3,3,1,10,22,20
1,Air India,Kolkata,Banglore,CCU → IXR → BBI → BLR,05:50,7h 25m,2 stops,No info,7662,1,5,5,13,15,5,50
2,Jet Airways,Delhi,Cochin,DEL → LKO → BOM → COK,09:25,19h,2 stops,No info,13882,9,6,6,4,25,9,25


In [823]:
df_temp.drop('Dep_Time', axis=1, inplace=True)

In [824]:
df_temp.head()

Unnamed: 0,Airline,Source,Destination,Route,Duration,Total_Stops,Additional_Info,Price,day,month,year,hour,minute,dep_hour,dep_min
0,IndiGo,Banglore,New Delhi,BLR → DEL,2h 50m,non-stop,No info,3897,24,3,3,1,10,22,20
1,Air India,Kolkata,Banglore,CCU → IXR → BBI → BLR,7h 25m,2 stops,No info,7662,1,5,5,13,15,5,50
2,Jet Airways,Delhi,Cochin,DEL → LKO → BOM → COK,19h,2 stops,No info,13882,9,6,6,4,25,9,25
3,IndiGo,Kolkata,Banglore,CCU → NAG → BLR,5h 25m,1 stop,No info,6218,12,5,5,23,30,18,5
4,IndiGo,Banglore,New Delhi,BLR → NAG → DEL,4h 45m,1 stop,No info,13302,1,3,3,21,35,16,50


#### Let's Working on Total Stops

In [825]:
df_temp['Total_Stops'].unique()

array(['non-stop', '2 stops', '1 stop', '3 stops', nan, '4 stops'],
      dtype=object)

In [826]:
custom_map = {'non-stop': 0, '1 stop':1, '2 stops': 2, '3 stops': 3, '4 stops': 4, np.nan: 1}

df_temp['Total_Stops'] = df_temp['Total_Stops'].map(custom_map)


In [827]:
df_temp['Total_Stops'].unique()

array([0, 2, 1, 3, 4], dtype=int64)

In [828]:
df_temp.Total_Stops.isnull().sum()

0

In [829]:
df_temp.head(3)

Unnamed: 0,Airline,Source,Destination,Route,Duration,Total_Stops,Additional_Info,Price,day,month,year,hour,minute,dep_hour,dep_min
0,IndiGo,Banglore,New Delhi,BLR → DEL,2h 50m,0,No info,3897,24,3,3,1,10,22,20
1,Air India,Kolkata,Banglore,CCU → IXR → BBI → BLR,7h 25m,2,No info,7662,1,5,5,13,15,5,50
2,Jet Airways,Delhi,Cochin,DEL → LKO → BOM → COK,19h,2,No info,13882,9,6,6,4,25,9,25


In [830]:
# dorp route column
df_temp.drop('Route', axis=1, inplace=True)

In [831]:
df_temp.head(3)

Unnamed: 0,Airline,Source,Destination,Duration,Total_Stops,Additional_Info,Price,day,month,year,hour,minute,dep_hour,dep_min
0,IndiGo,Banglore,New Delhi,2h 50m,0,No info,3897,24,3,3,1,10,22,20
1,Air India,Kolkata,Banglore,7h 25m,2,No info,7662,1,5,5,13,15,5,50
2,Jet Airways,Delhi,Cochin,19h,2,No info,13882,9,6,6,4,25,9,25


#### Let's working on Duration

In [832]:
df_temp['duration_hour'] = df_temp['Duration'].str.split(' ').str[0].str.split('h').str[0]
df_temp['duration_min'] = df_temp['Duration'].str.split(' ').str[1].str.split('m').str[0]

In [833]:
df_temp.head(3)

Unnamed: 0,Airline,Source,Destination,Duration,Total_Stops,Additional_Info,Price,day,month,year,hour,minute,dep_hour,dep_min,duration_hour,duration_min
0,IndiGo,Banglore,New Delhi,2h 50m,0,No info,3897,24,3,3,1,10,22,20,2,50.0
1,Air India,Kolkata,Banglore,7h 25m,2,No info,7662,1,5,5,13,15,5,50,7,25.0
2,Jet Airways,Delhi,Cochin,19h,2,No info,13882,9,6,6,4,25,9,25,19,


In [834]:
df_temp.drop('Duration', axis=1, inplace=True)

In [835]:
df_temp.head(2)

Unnamed: 0,Airline,Source,Destination,Total_Stops,Additional_Info,Price,day,month,year,hour,minute,dep_hour,dep_min,duration_hour,duration_min
0,IndiGo,Banglore,New Delhi,0,No info,3897,24,3,3,1,10,22,20,2,50
1,Air India,Kolkata,Banglore,2,No info,7662,1,5,5,13,15,5,50,7,25


In [836]:
df_temp['duration_hour'].isnull().sum()

0

In [837]:
df_temp['duration_min'].isnull().sum()

1032

In [838]:
df_temp['duration_min'].value_counts()

duration_min
30    1446
20     997
50     972
35     939
55     910
15     903
45     896
25     803
40     637
5      623
10     525
Name: count, dtype: int64

In [839]:
df_temp['duration_min'].mode()

0    30
Name: duration_min, dtype: object

In [840]:
df_temp['duration_min'] = df_temp['duration_min'].fillna(30)

In [841]:
df_temp['duration_min'].isnull().sum()

0

In [842]:
df_temp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10683 entries, 0 to 10682
Data columns (total 15 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Airline          10683 non-null  object
 1   Source           10683 non-null  object
 2   Destination      10683 non-null  object
 3   Total_Stops      10683 non-null  int64 
 4   Additional_Info  10683 non-null  object
 5   Price            10683 non-null  int64 
 6   day              10683 non-null  int32 
 7   month            10683 non-null  int32 
 8   year             10683 non-null  int32 
 9   hour             10683 non-null  int32 
 10  minute           10683 non-null  int32 
 11  dep_hour         10683 non-null  int32 
 12  dep_min          10683 non-null  int32 
 13  duration_hour    10683 non-null  object
 14  duration_min     10683 non-null  object
dtypes: int32(7), int64(2), object(6)
memory usage: 959.9+ KB


In [843]:
df_temp.head(5)

Unnamed: 0,Airline,Source,Destination,Total_Stops,Additional_Info,Price,day,month,year,hour,minute,dep_hour,dep_min,duration_hour,duration_min
0,IndiGo,Banglore,New Delhi,0,No info,3897,24,3,3,1,10,22,20,2,50
1,Air India,Kolkata,Banglore,2,No info,7662,1,5,5,13,15,5,50,7,25
2,Jet Airways,Delhi,Cochin,2,No info,13882,9,6,6,4,25,9,25,19,30
3,IndiGo,Kolkata,Banglore,1,No info,6218,12,5,5,23,30,18,5,5,25
4,IndiGo,Banglore,New Delhi,1,No info,13302,1,3,3,21,35,16,50,4,45


In [844]:
invalid_values = df_temp[~df_temp['duration_hour'].str.isnumeric()]
print(invalid_values)

        Airline  Source Destination  Total_Stops Additional_Info  Price  day  \
6474  Air India  Mumbai   Hyderabad            2         No info  17327    6   

      month  year  hour  minute  dep_hour  dep_min duration_hour duration_min  
6474      3     3    16      55        16       50            5m           30  


In [845]:
df_temp.drop(index=6474, inplace=True)

In [846]:
invalid_values = df_temp[~df_temp['duration_hour'].str.isnumeric()]
print(invalid_values)

Empty DataFrame
Columns: [Airline, Source, Destination, Total_Stops, Additional_Info, Price, day, month, year, hour, minute, dep_hour, dep_min, duration_hour, duration_min]
Index: []


In [847]:
df_temp['duration_hour'] = df_temp['duration_hour'].astype(int)
df_temp['duration_min'] = df_temp['duration_min'].astype(int)

In [848]:
# df_temp[df_temp['duration_min'].str.contains('5m', na=False)]

In [849]:
df_temp.info()

<class 'pandas.core.frame.DataFrame'>
Index: 10682 entries, 0 to 10682
Data columns (total 15 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Airline          10682 non-null  object
 1   Source           10682 non-null  object
 2   Destination      10682 non-null  object
 3   Total_Stops      10682 non-null  int64 
 4   Additional_Info  10682 non-null  object
 5   Price            10682 non-null  int64 
 6   day              10682 non-null  int32 
 7   month            10682 non-null  int32 
 8   year             10682 non-null  int32 
 9   hour             10682 non-null  int32 
 10  minute           10682 non-null  int32 
 11  dep_hour         10682 non-null  int32 
 12  dep_min          10682 non-null  int32 
 13  duration_hour    10682 non-null  int32 
 14  duration_min     10682 non-null  int32 
dtypes: int32(9), int64(2), object(4)
memory usage: 959.7+ KB


In [850]:
df_temp['duration_hour'].isnull().sum(), df_temp['duration_min'].isnull().sum() 

(0, 0)

In [851]:
df_temp.head(5)

Unnamed: 0,Airline,Source,Destination,Total_Stops,Additional_Info,Price,day,month,year,hour,minute,dep_hour,dep_min,duration_hour,duration_min
0,IndiGo,Banglore,New Delhi,0,No info,3897,24,3,3,1,10,22,20,2,50
1,Air India,Kolkata,Banglore,2,No info,7662,1,5,5,13,15,5,50,7,25
2,Jet Airways,Delhi,Cochin,2,No info,13882,9,6,6,4,25,9,25,19,30
3,IndiGo,Kolkata,Banglore,1,No info,6218,12,5,5,23,30,18,5,5,25
4,IndiGo,Banglore,New Delhi,1,No info,13302,1,3,3,21,35,16,50,4,45


#### Let's working on Airline, Source, Destination with OneHotEncoding

In [852]:
df['Additional_Info'].unique()

array(['No info', 'In-flight meal not included',
       'No check-in baggage included', '1 Short layover', 'No Info',
       '1 Long layover', 'Change airports', 'Business class',
       'Red-eye flight', '2 Long layover'], dtype=object)

In [853]:
df.Airline.unique()

array(['IndiGo', 'Air India', 'Jet Airways', 'SpiceJet',
       'Multiple carriers', 'GoAir', 'Vistara', 'Air Asia',
       'Vistara Premium economy', 'Jet Airways Business',
       'Multiple carriers Premium economy', 'Trujet'], dtype=object)

In [854]:
df.Destination.unique()

array(['New Delhi', 'Banglore', 'Cochin', 'Kolkata', 'Delhi', 'Hyderabad'],
      dtype=object)

In [855]:
from sklearn.preprocessing import OneHotEncoder 

# get categorical features
categorical_columns = ['Airline', 'Source', 'Destination', 'Additional_Info']
encoder = OneHotEncoder()
encoded_columns = encoder.fit_transform(df[categorical_columns]).toarray()

In [856]:
encoded_columns

array([[0., 0., 0., ..., 0., 1., 0.],
       [0., 1., 0., ..., 0., 1., 0.],
       [0., 0., 0., ..., 0., 1., 0.],
       ...,
       [0., 0., 0., ..., 0., 1., 0.],
       [0., 0., 0., ..., 0., 1., 0.],
       [0., 1., 0., ..., 0., 1., 0.]])

In [866]:
encoded_df = pd.DataFrame(encoded_columns, columns=encoder.get_feature_names_out())

In [867]:
encoded_df.drop(index=6474, inplace=True)

In [868]:
encoded_df.shape

(10682, 33)

In [871]:
df_temp.shape

(10682, 15)

In [872]:
df_temp.drop(categorical_columns, axis=1, inplace=True)

In [877]:
final_df = pd.concat([df_temp, encoded_df], axis=1) 
final_df.head(3)

Unnamed: 0,Total_Stops,Price,day,month,year,hour,minute,dep_hour,dep_min,duration_hour,duration_min,Airline_Air Asia,Airline_Air India,Airline_GoAir,Airline_IndiGo,Airline_Jet Airways,Airline_Jet Airways Business,Airline_Multiple carriers,Airline_Multiple carriers Premium economy,Airline_SpiceJet,Airline_Trujet,Airline_Vistara,Airline_Vistara Premium economy,Source_Banglore,Source_Chennai,Source_Delhi,Source_Kolkata,Source_Mumbai,Destination_Banglore,Destination_Cochin,Destination_Delhi,Destination_Hyderabad,Destination_Kolkata,Destination_New Delhi,Additional_Info_1 Long layover,Additional_Info_1 Short layover,Additional_Info_2 Long layover,Additional_Info_Business class,Additional_Info_Change airports,Additional_Info_In-flight meal not included,Additional_Info_No Info,Additional_Info_No check-in baggage included,Additional_Info_No info,Additional_Info_Red-eye flight
0,0,3897,24,3,3,1,10,22,20,2,50,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
1,2,7662,1,5,5,13,15,5,50,7,25,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
2,2,13882,9,6,6,4,25,9,25,19,30,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0


In [878]:
final_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 10682 entries, 0 to 10682
Data columns (total 44 columns):
 #   Column                                        Non-Null Count  Dtype  
---  ------                                        --------------  -----  
 0   Total_Stops                                   10682 non-null  int64  
 1   Price                                         10682 non-null  int64  
 2   day                                           10682 non-null  int32  
 3   month                                         10682 non-null  int32  
 4   year                                          10682 non-null  int32  
 5   hour                                          10682 non-null  int32  
 6   minute                                        10682 non-null  int32  
 7   dep_hour                                      10682 non-null  int32  
 8   dep_min                                       10682 non-null  int32  
 9   duration_hour                                 10682 non-null  int3