<h1>Flight Price Dataset of Bangladesh</h1>
<p>This dataset was inspired by real-world flight data from Bangladesh.</p>
<h2>About Dataset</h2>
<h3>Introduction</h3>
<p>The "Bangladesh Flight Fare Dataset" is a synthetic dataset of 57,000 flight records representing air travel originating from Bangladesh. It simulates realistic flight fare dynamics, capturing key factors such as airline operations, airport specifics, travel classes, booking behaviors, and seasonal variations specific to Bangladesh’s aviation market. It is designed for researchers, data scientists, and analysts interested in flight fare prediction, travel pattern analysis, or machine learning and deep learning applications. By combining real-world-inspired statistical distributions with aviation industry standards, the dataset provides a foundation for exploring flight economics in a South Asian context.</p>
<h3>Dataset Purpose</h3>
<p>This dataset aims to:</p>
<ul>
<li>Facilitate predictive modeling of flight fares, with "Total Fare (BDT)" as the primary target variable.</li>
<li>Enable analysis of travel trends, including the impact of cultural festivals (e.g., Eid, Hajj) and booking timings on pricing.</li>
<li>Serve as a training resource for machine learning (ML) and deep learning (DL) models, with sufficient sample size (57,000) and feature diversity for generalization.</li>
<li>Provide a realistic yet synthetic representation of Bangladesh’s air travel ecosystem, blending domestic and international flight scenarios.</li>
</ul>
<h2>Data Source</h2>
<h3>Kaggle</h3>
<p><a href="https://www.kaggle.com/datasets/mahatiratusher/flight-price-dataset-of-bangladesh">Flight Price Dataset of Bangladesh</a></p>

In [28]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pickle
import warnings
warnings.filterwarnings('ignore')

In [20]:
df = pd.read_csv("./datasets/Flight_Price_Dataset_of_Bangladesh.csv")
df.head()

Unnamed: 0,Airline,Source,Source Name,Destination,Destination Name,Departure Date & Time,Arrival Date & Time,Duration (hrs),Stopovers,Aircraft Type,Class,Booking Source,Base Fare (BDT),Tax & Surcharge (BDT),Total Fare (BDT),Seasonality,Days Before Departure
0,Malaysian Airlines,CXB,Cox's Bazar Airport,CCU,Netaji Subhas Chandra Bose International Airpo...,2025-11-17 06:25:00,2025-11-17 07:38:10,1.219526,Direct,Airbus A320,Economy,Online Website,21131.225021,5169.683753,26300.908775,Regular,10
1,Cathay Pacific,BZL,Barisal Airport,CGP,"Shah Amanat International Airport, Chittagong",2025-03-16 00:17:00,2025-03-16 00:53:31,0.608638,Direct,Airbus A320,First Class,Travel Agency,11605.395471,200.0,11805.395471,Regular,14
2,British Airways,ZYL,"Osmani International Airport, Sylhet",KUL,Kuala Lumpur International Airport,2025-12-13 12:03:00,2025-12-13 14:44:22,2.689651,1 Stop,Boeing 787,Economy,Travel Agency,39882.499349,11982.374902,51864.874251,Winter Holidays,83
3,Singapore Airlines,RJH,"Shah Makhdum Airport, Rajshahi",DAC,"Hazrat Shahjalal International Airport, Dhaka",2025-05-30 03:21:00,2025-05-30 04:02:09,0.686054,Direct,Airbus A320,Economy,Direct Booking,4435.60734,200.0,4635.60734,Regular,56
4,British Airways,SPD,Saidpur Airport,YYZ,Toronto Pearson International Airport,2025-04-25 09:14:00,2025-04-25 23:17:20,14.055609,1 Stop,Airbus A350,Business,Direct Booking,59243.806146,14886.570922,74130.377068,Regular,90


In [21]:
df.shape

(57000, 17)

# 1. Preprocessing Data

## 1. Feature Selection

In [23]:
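# Drop the descriptive airport names (the Source and Destination codes already identify them)
# and the arrival timestamp
df.drop(columns=['Source Name', 'Destination Name', 'Arrival Date & Time'], inplace=True)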
df.head()

Unnamed: 0,Airline,Source,Destination,Departure Date & Time,Duration (hrs),Stopovers,Aircraft Type,Class,Booking Source,Base Fare (BDT),Tax & Surcharge (BDT),Total Fare (BDT),Seasonality,Days Before Departure
0,Malaysian Airlines,CXB,CCU,2025-11-17 06:25:00,1.219526,Direct,Airbus A320,Economy,Online Website,21131.225021,5169.683753,26300.908775,Regular,10
1,Cathay Pacific,BZL,CGP,2025-03-16 00:17:00,0.608638,Direct,Airbus A320,First Class,Travel Agency,11605.395471,200.0,11805.395471,Regular,14
2,British Airways,ZYL,KUL,2025-12-13 12:03:00,2.689651,1 Stop,Boeing 787,Economy,Travel Agency,39882.499349,11982.374902,51864.874251,Winter Holidays,83
3,Singapore Airlines,RJH,DAC,2025-05-30 03:21:00,0.686054,Direct,Airbus A320,Economy,Direct Booking,4435.60734,200.0,4635.60734,Regular,56
4,British Airways,SPD,YYZ,2025-04-25 09:14:00,14.055609,1 Stop,Airbus A350,Business,Direct Booking,59243.806146,14886.570922,74130.377068,Regular,90


In [24]:
df.shape

(57000, 14)
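
In the five rows displayed above, `Total Fare (BDT)` equals `Base Fare (BDT)` plus `Tax & Surcharge (BDT)` up to rounding. If that holds for the whole dataset (an assumption this notebook does not verify), keeping both components as predictors would leak the target. A minimal sketch of such a check, using only the five displayed rows; the tolerance is an assumed value:

```python
import numpy as np
import pandas as pd

# Fare columns copied from the five rows displayed above.
sample = pd.DataFrame({
    "Base Fare (BDT)": [21131.225021, 11605.395471, 39882.499349, 4435.60734, 59243.806146],
    "Tax & Surcharge (BDT)": [5169.683753, 200.0, 11982.374902, 200.0, 14886.570922],
    "Total Fare (BDT)": [26300.908775, 11805.395471, 51864.874251, 4635.60734, 74130.377068],
})

# Compare the sum of the two components with the reported total.
# atol is an assumed tolerance that absorbs rounding in the printed values.
component_sum = sample["Base Fare (BDT)"] + sample["Tax & Surcharge (BDT)"]
matches = np.isclose(component_sum, sample["Total Fare (BDT)"], atol=1e-3)

print(matches.all())
```

On the full data, the same comparison could be run on `df` directly by replacing `sample` with `df`.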

## 2. Missing Data

In [10]:
df.isnull().sum()

Airline                  0
Source                   0
Destination              0
Departure Date & Time    0
Duration (hrs)           0
Stopovers                0
Aircraft Type            0
Class                    0
Booking Source           0
Base Fare (BDT)          0
Tax & Surcharge (BDT)    0
Total Fare (BDT)         0
Seasonality              0
Days Before Departure    0
dtype: int64
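
The dataset has no missing values, so the check above returns all zeros. For data that does have gaps, a count alone can be hard to judge; the sketch below reports the share of missing values per column as well. The `toy` frame is hypothetical and only illustrates the pattern.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with deliberately missing values (not taken from the dataset).
toy = pd.DataFrame({
    "Airline": ["A", None, "C", "D"],
    "Duration (hrs)": [1.2, 0.6, np.nan, np.nan],
    "Class": ["Economy", "Business", "Economy", "Economy"],
})

# Count and percentage of missing values per column, largest first.
missing = pd.DataFrame({
    "n_missing": toy.isnull().sum(),
    "pct_missing": toy.isnull().mean() * 100,
}).sort_values("n_missing", ascending=False)

print(missing)
```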

## 3. Duplicates

In [11]:
df.duplicated().sum()

np.int64(0)
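
`df.duplicated()` flags only rows that repeat an earlier row in every column. The sketch below, on a hypothetical toy frame, shows two variations that may be useful: restricting the comparison to a subset of columns, and `keep=False` to list every copy rather than only the repeats.

```python
import pandas as pd

# Hypothetical toy frame (not taken from the dataset).
toy = pd.DataFrame({
    "Airline": ["A", "A", "B", "A"],
    "Source": ["DAC", "DAC", "CGP", "DAC"],
    "Total Fare (BDT)": [100.0, 100.0, 250.0, 120.0],
})

# Rows identical to an earlier row in every column.
full_dupes = toy.duplicated().sum()

# Rows that repeat an earlier (Airline, Source) pair, regardless of fare.
pair_dupes = toy.duplicated(subset=["Airline", "Source"]).sum()

# keep=False marks all members of a duplicate group, including the first one.
all_copies = toy[toy.duplicated(keep=False)]

print(full_dupes, pair_dupes, len(all_copies))
```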

## 4. Data Types

In [25]:
df.dtypes

Airline                   object
Source                    object
Destination               object
Departure Date & Time     object
Duration (hrs)           float64
Stopovers                 object
Aircraft Type             object
Class                     object
Booking Source            object
Base Fare (BDT)          float64
Tax & Surcharge (BDT)    float64
Total Fare (BDT)         float64
Seasonality               object
Days Before Departure      int64
dtype: object

In [26]:
df['Departure Date & Time'] = pd.to_datetime(df['Departure Date & Time'])
df.dtypes

Airline                          object
Source                           object
Destination                      object
Departure Date & Time    datetime64[ns]
Duration (hrs)                  float64
Stopovers                        object
Aircraft Type                    object
Class                            object
Booking Source                   object
Base Fare (BDT)                 float64
Tax & Surcharge (BDT)           float64
Total Fare (BDT)                float64
Seasonality                      object
Days Before Departure             int64
dtype: object
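
The conversion above lets pandas infer the timestamp format. A slightly more defensive variant passes the format explicitly and turns unparseable strings into `NaT` so they can be counted. This is a sketch: the format string is inferred from the displayed rows, and the third value is a made-up bad entry.

```python
import pandas as pd

# First two values are copied from the displayed rows; the last is a hypothetical bad entry.
raw = pd.Series(["2025-11-17 06:25:00", "2025-03-16 00:17:00", "not a date"])

# An explicit format avoids ambiguous inference; errors="coerce" yields NaT instead of raising.
parsed = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S", errors="coerce")

# Number of values that could not be parsed.
n_failed = parsed.isna().sum()

print(parsed.dtype, n_failed)
```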

In [27]:
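# Keep only the calendar date; note that .dt.date returns Python date objects,
# so the column dtype becomes object and the departure time of day is discarded
df['Departure Date & Time'] = df['Departure Date & Time'].dt.date
# Rename the column to reflect its new content
df = df.rename(columns={'Departure Date & Time': 'Date'})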
df.head()

Unnamed: 0,Airline,Source,Destination,Date,Duration (hrs),Stopovers,Aircraft Type,Class,Booking Source,Base Fare (BDT),Tax & Surcharge (BDT),Total Fare (BDT),Seasonality,Days Before Departure
0,Malaysian Airlines,CXB,CCU,2025-11-17,1.219526,Direct,Airbus A320,Economy,Online Website,21131.225021,5169.683753,26300.908775,Regular,10
1,Cathay Pacific,BZL,CGP,2025-03-16,0.608638,Direct,Airbus A320,First Class,Travel Agency,11605.395471,200.0,11805.395471,Regular,14
2,British Airways,ZYL,KUL,2025-12-13,2.689651,1 Stop,Boeing 787,Economy,Travel Agency,39882.499349,11982.374902,51864.874251,Winter Holidays,83
3,Singapore Airlines,RJH,DAC,2025-05-30,0.686054,Direct,Airbus A320,Economy,Direct Booking,4435.60734,200.0,4635.60734,Regular,56
4,British Airways,SPD,YYZ,2025-04-25,14.055609,1 Stop,Airbus A350,Business,Direct Booking,59243.806146,14886.570922,74130.377068,Regular,90
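
Truncating to `.dt.date` drops the time of day and leaves an `object` column. If calendar features are wanted later, one alternative (a sketch, not the approach taken in this notebook) is to keep the `datetime64` dtype with `.dt.normalize()` and derive numeric parts before discarding anything. The feature names below are illustrative.

```python
import pandas as pd

# Departure timestamps copied from the first three displayed rows.
departures = pd.to_datetime(pd.Series([
    "2025-11-17 06:25:00",
    "2025-03-16 00:17:00",
    "2025-12-13 12:03:00",
]))

as_date = departures.dt.date            # Python date objects, dtype object
normalized = departures.dt.normalize()  # time set to midnight, dtype stays datetime64

# Numeric calendar features derived from the full timestamp (names are illustrative).
features = pd.DataFrame({
    "month": departures.dt.month,
    "day_of_week": departures.dt.dayofweek,  # Monday is 0
    "hour": departures.dt.hour,
})

print(as_date.dtype, normalized.dtype)
print(features)
```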


In [29]:
# Save the cleaned DataFrame with pickle so the cleaning steps do not have to be rerun before analysis
with open('./datasets/data_cleaned.pkl', 'wb') as file:
    pickle.dump(df, file)

# 2. Exploratory Data Analysis

In [None]:
# Loading DataFrame from pickle file
with open('./datasets/data_cleaned.pkl', 'rb') as file:
    df = pickle.load(file)
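
The save and load cells above use the `pickle` module directly. pandas also offers `DataFrame.to_pickle` and `pd.read_pickle`, which do the same round trip in one call each. The sketch below uses a small stand-in frame and a temporary directory rather than the notebook's `./datasets/` path; as with any pickle, only load files from a trusted source.

```python
import os
import tempfile

import pandas as pd

# Small stand-in for the cleaned DataFrame (values copied from displayed rows).
df_small = pd.DataFrame({
    "Airline": ["Malaysian Airlines", "Cathay Pacific"],
    "Total Fare (BDT)": [26300.908775, 11805.395471],
    "Days Before Departure": [10, 14],
})

# Write to a temporary location and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data_cleaned.pkl")
    df_small.to_pickle(path)
    restored = pd.read_pickle(path)

# The round trip preserves values and dtypes.
print(restored.equals(df_small))
```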