# 1. Importing the Required Libraries

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings('ignore')

# 2. Reading the Dataset and Understanding the Columns/Features

In [3]:
data = pd.read_csv(r"C:\Users\Admin\Desktop\Data_Guru\Airlines_Passenger_Satisfaction_Prediction\dataset\airline_passenger_satisfaction.csv")
data.head()

Unnamed: 0,ID,Gender,Age,Customer Type,Type of Travel,Class,Flight Distance,Departure Delay,Arrival Delay,Departure and Arrival Time Convenience,...,On-board Service,Seat Comfort,Leg Room Service,Cleanliness,Food and Drink,In-flight Service,In-flight Wifi Service,In-flight Entertainment,Baggage Handling,Satisfaction
0,1,Male,48,First-time,Business,Business,821,2,5.0,3,...,3,5,2,5,5,5,3,5,5,Neutral or Dissatisfied
1,2,Female,35,Returning,Business,Business,821,26,39.0,2,...,5,4,5,5,3,5,2,5,5,Satisfied
2,3,Male,41,Returning,Business,Business,853,0,0.0,4,...,3,5,3,5,5,3,4,3,3,Satisfied
3,4,Male,50,Returning,Business,Business,1905,0,0.0,2,...,5,5,5,4,4,5,2,5,5,Satisfied
4,5,Female,49,Returning,Business,Business,3470,0,1.0,3,...,3,4,4,5,4,3,3,3,3,Satisfied


In [4]:
# By default, pandas may limit the number of columns displayed in the output, which can make it difficult to view all columns in a large dataset.
# To ensure all columns are visible, we can use `pd.set_option('display.max_columns', 50)`, which increases the maximum number of columns displayed to 50.

pd.set_option('display.max_columns',50)
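`pd.set_option` changes the setting for the rest of the session. When a wider view is only needed for one cell, `pd.option_context` is an alternative that restores the previous value on exit. The sketch below uses a hypothetical frame named `wide` (not the airline dataset) purely to show the behaviour.

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame used only for illustration (30 columns of zeros).
wide = pd.DataFrame(np.zeros((2, 30)), columns=[f"col_{i}" for i in range(30)])

# Remember the current setting so we can compare afterwards.
before = pd.get_option("display.max_columns")

# The option is changed only inside the `with` block.
with pd.option_context("display.max_columns", 50):
    inside = pd.get_option("display.max_columns")
    print(wide.head())

# Outside the block the previous value is back in effect.
after = pd.get_option("display.max_columns")
```

Which of the two to use is a matter of preference: the global option is convenient for exploratory work, while the context manager keeps other cells unaffected.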

In [5]:
data.head()  

Unnamed: 0,ID,Gender,Age,Customer Type,Type of Travel,Class,Flight Distance,Departure Delay,Arrival Delay,Departure and Arrival Time Convenience,Ease of Online Booking,Check-in Service,Online Boarding,Gate Location,On-board Service,Seat Comfort,Leg Room Service,Cleanliness,Food and Drink,In-flight Service,In-flight Wifi Service,In-flight Entertainment,Baggage Handling,Satisfaction
0,1,Male,48,First-time,Business,Business,821,2,5.0,3,3,4,3,3,3,5,2,5,5,5,3,5,5,Neutral or Dissatisfied
1,2,Female,35,Returning,Business,Business,821,26,39.0,2,2,3,5,2,5,4,5,5,3,5,2,5,5,Satisfied
2,3,Male,41,Returning,Business,Business,853,0,0.0,4,4,4,5,4,3,5,3,5,5,3,4,3,3,Satisfied
3,4,Male,50,Returning,Business,Business,1905,0,0.0,2,2,3,4,2,5,5,5,4,4,5,2,5,5,Satisfied
4,5,Female,49,Returning,Business,Business,3470,0,1.0,3,3,3,5,3,3,4,4,5,4,3,3,3,3,Satisfied


### Understanding the Dataset

| Field                          | Description                                                                                         |
|--------------------------------|-----------------------------------------------------------------------------------------------------|
| **ID**                         | Unique passenger identifier                                                                        |
| **Gender**                     | Gender of the passenger (Female/Male)                                                             |
| **Age**                        | Age of the passenger                                                                              |
| **Customer Type**              | Type of airline customer (First-time/Returning)                                                  |
| **Type of Travel**             | Purpose of the flight (Business/Personal)                                                        |
| **Class**                      | Travel class in the airplane for the passenger seat                                               |
| **Flight Distance**            | Flight distance in miles                                                                          |
| **Departure Delay**            | Flight departure delay in minutes                                                                |
| **Arrival Delay**              | Flight arrival delay in minutes                                                                  |
| **Departure and Arrival Time Convenience** | Satisfaction level with the convenience of the flight departure and arrival times from 1 (lowest) to 5 (highest) - 0 means "not applicable" |
| **Ease of Online Booking**     | Satisfaction level with the online booking experience from 1 (lowest) to 5 (highest) - 0 means "not applicable" |
| **Check-in Service**           | Satisfaction level with the check-in service from 1 (lowest) to 5 (highest) - 0 means "not applicable" |
| **Online Boarding**            | Satisfaction level with the online boarding experience from 1 (lowest) to 5 (highest) - 0 means "not applicable" |
| **Gate Location**              | Satisfaction level with the gate location in the airport from 1 (lowest) to 5 (highest) - 0 means "not applicable" |
| **On-board Service**           | Satisfaction level with the on-boarding service in the airport from 1 (lowest) to 5 (highest) - 0 means "not applicable" |
| **Seat Comfort**               | Satisfaction level with the comfort of the airplane seat from 1 (lowest) to 5 (highest) - 0 means "not applicable" |
| **Leg Room Service**           | Satisfaction level with the leg room of the airplane seat from 1 (lowest) to 5 (highest) - 0 means "not applicable" |
| **Cleanliness**                | Satisfaction level with the cleanliness of the airplane from 1 (lowest) to 5 (highest) - 0 means "not applicable" |
| **Food and Drink**             | Satisfaction level with the food and drinks on the airplane from 1 (lowest) to 5 (highest) - 0 means "not applicable" |
| **In-flight Service**          | Satisfaction level with the in-flight service from 1 (lowest) to 5 (highest) - 0 means "not applicable" |
| **In-flight Wifi Service**     | Satisfaction level with the in-flight Wifi service from 1 (lowest) to 5 (highest) - 0 means "not applicable" |
| **In-flight Entertainment**    | Satisfaction level with the in-flight entertainment from 1 (lowest) to 5 (highest) - 0 means "not applicable" |
| **Baggage Handling**           | Satisfaction level with the baggage handling from the airline from 1 (lowest) to 5 (highest) - 0 means "not applicable" |
| **Satisfaction**               | Overall satisfaction level with the airline (Satisfied/Neutral or Dissatisfied)                  |
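
Because a rating of 0 means "not applicable" rather than "worst", leaving zeros in place can pull averages down. One possible way to handle this, shown here as a sketch on a hypothetical mini-sample called `ratings` (the values are made up for illustration), is to count the zeros and replace them with `NaN` before aggregating.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-sample with two of the rating columns described above.
ratings = pd.DataFrame({
    "Seat Comfort": [5, 0, 3, 4],
    "Cleanliness": [0, 2, 5, 1],
})

# Count how many "not applicable" (0) answers each rating column has.
not_applicable = (ratings == 0).sum()

# Replace 0 with NaN so that mean() skips those answers.
ratings_clean = ratings.replace(0, np.nan)

mean_with_zeros = ratings["Seat Comfort"].mean()           # zeros counted as a score
mean_without_zeros = ratings_clean["Seat Comfort"].mean()  # zeros ignored

print(not_applicable)
print(mean_with_zeros, mean_without_zeros)
```

Whether zeros should be dropped, kept, or imputed depends on the later modelling step; this only illustrates the difference they make.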




# 3. Data Exploration, Cleaning and Handling

* ##### Initial Data Inspection

In [6]:
data.head()

Unnamed: 0,ID,Gender,Age,Customer Type,Type of Travel,Class,Flight Distance,Departure Delay,Arrival Delay,Departure and Arrival Time Convenience,Ease of Online Booking,Check-in Service,Online Boarding,Gate Location,On-board Service,Seat Comfort,Leg Room Service,Cleanliness,Food and Drink,In-flight Service,In-flight Wifi Service,In-flight Entertainment,Baggage Handling,Satisfaction
0,1,Male,48,First-time,Business,Business,821,2,5.0,3,3,4,3,3,3,5,2,5,5,5,3,5,5,Neutral or Dissatisfied
1,2,Female,35,Returning,Business,Business,821,26,39.0,2,2,3,5,2,5,4,5,5,3,5,2,5,5,Satisfied
2,3,Male,41,Returning,Business,Business,853,0,0.0,4,4,4,5,4,3,5,3,5,5,3,4,3,3,Satisfied
3,4,Male,50,Returning,Business,Business,1905,0,0.0,2,2,3,4,2,5,5,5,4,4,5,2,5,5,Satisfied
4,5,Female,49,Returning,Business,Business,3470,0,1.0,3,3,3,5,3,3,4,4,5,4,3,3,3,3,Satisfied


In [7]:
data.tail()

Unnamed: 0,ID,Gender,Age,Customer Type,Type of Travel,Class,Flight Distance,Departure Delay,Arrival Delay,Departure and Arrival Time Convenience,Ease of Online Booking,Check-in Service,Online Boarding,Gate Location,On-board Service,Seat Comfort,Leg Room Service,Cleanliness,Food and Drink,In-flight Service,In-flight Wifi Service,In-flight Entertainment,Baggage Handling,Satisfaction
129875,129876,Male,28,Returning,Personal,Economy Plus,447,2,3.0,4,4,4,4,2,5,1,4,4,4,5,4,4,4,Neutral or Dissatisfied
129876,129877,Male,41,Returning,Personal,Economy Plus,308,0,0.0,5,3,5,3,4,5,2,5,2,2,4,3,2,5,Neutral or Dissatisfied
129877,129878,Male,42,Returning,Personal,Economy Plus,337,6,14.0,5,2,4,2,1,3,3,4,3,3,4,2,3,5,Neutral or Dissatisfied
129878,129879,Male,50,Returning,Personal,Economy Plus,337,31,22.0,4,4,3,4,1,4,4,5,3,3,4,5,3,5,Satisfied
129879,129880,Female,20,Returning,Personal,Economy Plus,337,0,0.0,1,3,4,3,2,4,2,4,2,2,2,3,2,1,Neutral or Dissatisfied


* ##### Inspecting 10 Random Samples from the Dataset

In [8]:
data.sample(10)

Unnamed: 0,ID,Gender,Age,Customer Type,Type of Travel,Class,Flight Distance,Departure Delay,Arrival Delay,Departure and Arrival Time Convenience,Ease of Online Booking,Check-in Service,Online Boarding,Gate Location,On-board Service,Seat Comfort,Leg Room Service,Cleanliness,Food and Drink,In-flight Service,In-flight Wifi Service,In-flight Entertainment,Baggage Handling,Satisfaction
17316,17317,Male,19,Returning,Personal,Economy,403,0,8.0,4,1,5,1,3,4,3,4,3,3,4,1,3,4,Neutral or Dissatisfied
122377,122378,Male,37,First-time,Business,Economy,129,0,0.0,5,0,1,0,4,3,4,5,4,4,3,0,4,1,Satisfied
114604,114605,Male,44,Returning,Business,Business,325,66,80.0,1,1,4,4,1,5,5,5,4,5,5,1,5,5,Satisfied
24981,24982,Female,24,First-time,Business,Economy,484,0,0.0,4,5,1,5,2,1,1,4,1,1,1,5,1,3,Satisfied
128014,128015,Male,45,Returning,Business,Business,1440,0,0.0,5,5,4,3,5,3,4,3,1,1,3,3,3,3,Neutral or Dissatisfied
39662,39663,Male,22,First-time,Business,Economy,476,0,0.0,0,3,2,3,4,3,2,1,2,2,5,3,2,3,Neutral or Dissatisfied
70290,70291,Female,37,Returning,Business,Business,452,19,35.0,4,2,1,3,4,4,3,4,4,1,4,4,4,4,Neutral or Dissatisfied
8664,8665,Male,39,Returning,Business,Business,227,2,3.0,2,3,4,5,2,5,4,5,5,5,5,2,5,5,Satisfied
101923,101924,Male,38,First-time,Business,Business,2297,0,0.0,4,4,3,4,2,4,4,2,4,4,4,4,4,3,Satisfied
5676,5677,Female,35,Returning,Business,Business,3805,0,0.0,3,3,4,5,3,3,5,4,3,4,3,3,3,4,Satisfied
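
`data.sample(10)` returns different rows each time the cell runs. If a reproducible sample is wanted, `sample` accepts a `random_state` argument. The sketch below uses a hypothetical stand-in frame called `demo`, and the seed value 42 is an arbitrary choice.

```python
import pandas as pd

# Hypothetical stand-in for `data`; only the sampling call matters here.
demo = pd.DataFrame({"ID": range(1, 101)})

# The same random_state returns the same rows on every run.
first = demo.sample(10, random_state=42)
second = demo.sample(10, random_state=42)

print(first.equals(second))
```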


* ##### Checking the Number of Rows/Records and Columns/Features in the Dataset

In [9]:
print(f"Number of rows/records in the dataset: {data.shape[0]}")
print(f"Number of columns/features in the dataset: {data.shape[1]}")

Number of rows/records in the dataset: 129880
Number of columns/features in the dataset: 24


* ##### Checking the Column Names

In [10]:
print(f"Names of those columns are:{data.columns}")

Names of those columns are:Index(['ID', 'Gender', 'Age', 'Customer Type', 'Type of Travel', 'Class',
       'Flight Distance', 'Departure Delay', 'Arrival Delay',
       'Departure and Arrival Time Convenience', 'Ease of Online Booking',
       'Check-in Service', 'Online Boarding', 'Gate Location',
       'On-board Service', 'Seat Comfort', 'Leg Room Service', 'Cleanliness',
       'Food and Drink', 'In-flight Service', 'In-flight Wifi Service',
       'In-flight Entertainment', 'Baggage Handling', 'Satisfaction'],
      dtype='object')


* ##### Checking Basic Information About the Columns/Features in the Dataset

In [11]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 129880 entries, 0 to 129879
Data columns (total 24 columns):
 #   Column                                  Non-Null Count   Dtype  
---  ------                                  --------------   -----  
 0   ID                                      129880 non-null  int64  
 1   Gender                                  129880 non-null  object 
 2   Age                                     129880 non-null  int64  
 3   Customer Type                           129880 non-null  object 
 4   Type of Travel                          129880 non-null  object 
 5   Class                                   129880 non-null  object 
 6   Flight Distance                         129880 non-null  int64  
 7   Departure Delay                         129880 non-null  int64  
 8   Arrival Delay                           129487 non-null  float64
 9   Departure and Arrival Time Convenience  129880 non-null  int64  
 10  Ease of Online Booking                  1298
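
The `info()` listing shows a mix of `int64`, `float64` and `object` columns. A quick way to separate numeric from non-numeric columns is `select_dtypes`. The sketch below runs on a hypothetical three-row frame called `sample_rows` that mirrors a few of the columns, not on the full dataset.

```python
import pandas as pd

# Hypothetical three-row sample mirroring a few columns of the dataset.
sample_rows = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Age": [48, 35, 41],
    "Arrival Delay": [5.0, 39.0, None],
    "Satisfaction": ["Neutral or Dissatisfied", "Satisfied", "Satisfied"],
})

# Split the column names into numeric and non-numeric groups.
numeric_cols = sample_rows.select_dtypes(include="number").columns.tolist()
categorical_cols = sample_rows.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(categorical_cols)
```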

* ##### Checking for Null Values in the Dataset

In [12]:
data.isnull().sum()

ID                                          0
Gender                                      0
Age                                         0
Customer Type                               0
Type of Travel                              0
Class                                       0
Flight Distance                             0
Departure Delay                             0
Arrival Delay                             393
Departure and Arrival Time Convenience      0
Ease of Online Booking                      0
Check-in Service                            0
Online Boarding                             0
Gate Location                               0
On-board Service                            0
Seat Comfort                                0
Leg Room Service                            0
Cleanliness                                 0
Food and Drink                              0
In-flight Service                           0
In-flight Wifi Service                      0
In-flight Entertainment           

Insight: The "Arrival Delay" column contains null/missing values, 393 in total.
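
With 393 of 129,880 rows affected (about 0.3%), the gap is small. One possible treatment, not necessarily the one used later in this notebook, is to fill the missing delays with the column median. The sketch below demonstrates the idea on a hypothetical four-row frame called `toy`, with made-up values.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-sample; the real `data` frame has 129,880 rows.
toy = pd.DataFrame({
    "Departure Delay": [2, 26, 0, 31],
    "Arrival Delay": [5.0, np.nan, 0.0, 22.0],
})

# Share of missing values per column, in percent.
missing_pct = toy.isnull().mean() * 100

# One possible treatment: fill the gaps with the column median.
median_delay = toy["Arrival Delay"].median()
toy["Arrival Delay"] = toy["Arrival Delay"].fillna(median_delay)

print(missing_pct)
print(toy)
```

The median is used here because delay values are typically skewed by a few very long delays. Dropping the affected rows would be another reasonable option at this missing rate.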

In [15]:
data.head()

Unnamed: 0,ID,Gender,Age,Customer Type,Type of Travel,Class,Flight Distance,Departure Delay,Arrival Delay,Departure and Arrival Time Convenience,Ease of Online Booking,Check-in Service,Online Boarding,Gate Location,On-board Service,Seat Comfort,Leg Room Service,Cleanliness,Food and Drink,In-flight Service,In-flight Wifi Service,In-flight Entertainment,Baggage Handling,Satisfaction
0,1,Male,48,First-time,Business,Business,821,2,5.0,3,3,4,3,3,3,5,2,5,5,5,3,5,5,Neutral or Dissatisfied
1,2,Female,35,Returning,Business,Business,821,26,39.0,2,2,3,5,2,5,4,5,5,3,5,2,5,5,Satisfied
2,3,Male,41,Returning,Business,Business,853,0,0.0,4,4,4,5,4,3,5,3,5,5,3,4,3,3,Satisfied
3,4,Male,50,Returning,Business,Business,1905,0,0.0,2,2,3,4,2,5,5,5,4,4,5,2,5,5,Satisfied
4,5,Female,49,Returning,Business,Business,3470,0,1.0,3,3,3,5,3,3,4,4,5,4,3,3,3,3,Satisfied
