# Airline Passenger Satisfaction
*By Rohan Dhadwal* 


## 1. Introduction

**Dataset** - Airline Passenger Satisfaction

**Source** - https://www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction

This dataset represents airline passenger satisfaction survey results that were collected to evaluate factors that influence passenger satisfaction or dissatisfaction. It includes various aspects of the travel experience, such as inflight services, online booking, and airport conveniences. Each passenger's satisfaction is rated based on multiple features, including flight distance, class, inflight wifi service, departure/arrival time convenience, ease of online booking, gate location, food and drink quality, online boarding, seat comfort, inflight entertainment, on-board service, legroom, baggage handling, check-in service, inflight service, cleanliness, and any departure or arrival delays, along with demographic information like age, gender, and customer loyalty. These features collectively offer a detailed view of the factors that contribute to overall passenger satisfaction.

The dataset originates from an airline survey and is designed to assess the correlation between passenger satisfaction and specific services provided during air travel. The satisfaction outcomes are categorized as "satisfied" or "neutral or dissatisfied." This offers valuable insight into which factors most strongly affect customer experience, helping airlines optimize their services.

**Columns** -

Gender: Gender of the passengers (Female, Male)

Customer Type: The customer type (Loyal customer, disloyal customer)

Age: The actual age of the passengers

Type of Travel: Purpose of the flight of the passengers (Personal Travel, Business Travel)

Class: Travel class in the plane of the passengers (Business, Eco, Eco Plus)

Flight distance: The flight distance of this journey

Inflight wifi service: Satisfaction level of the inflight wifi service (0:Not Applicable;1-5)

Departure/Arrival time convenient: Satisfaction level with the convenience of the departure/arrival time

Ease of Online booking: Satisfaction level of online booking

Gate location: Satisfaction level of Gate location

Food and drink: Satisfaction level of Food and drink

Online boarding: Satisfaction level of online boarding

Seat comfort: Satisfaction level of Seat comfort

Inflight entertainment: Satisfaction level of inflight entertainment

On-board service: Satisfaction level of On-board service

Leg room service: Satisfaction level of Leg room service

Baggage handling: Satisfaction level of baggage handling

Check-in service: Satisfaction level of Check-in service

Inflight service: Satisfaction level of inflight service

Cleanliness: Satisfaction level of Cleanliness

Departure Delay in Minutes: Minutes of delay at departure

Arrival Delay in Minutes: Minutes of delay at arrival

Satisfaction: Airline satisfaction level (satisfied, neutral or dissatisfied)

**Research Objective and Goals** -

The objective of this project is to conduct comprehensive exploratory data analysis (EDA) to uncover the underlying patterns, distributions, and correlations within the airline passenger satisfaction dataset. By analyzing various aspects of the passenger journey, we aim to identify key factors that influence satisfaction and dissatisfaction levels. Ultimately, this analysis could be used to predict passenger satisfaction and suggest improvements to airline services. This will allow the airline to prioritize the factors that affect customer satisfaction and make informed decisions.

This exploration will serve as a foundation for further predictive modeling and customer experience optimization.

### EDA Hypotheses

Having worked in the hospitality industry before, I was always curious about the airline department in my previous organization. This sparked a personal interest in exploring how airlines measure and improve customer satisfaction. I see this project as an opportunity to make an impact in the airline industry by understanding the key drivers of passenger satisfaction and dissatisfaction.

**Key Research Questions:**

1. **What factors most strongly correlate with passenger satisfaction?** I am interested in identifying the most significant drivers of passenger satisfaction, such as inflight wifi service, seat comfort, or baggage handling, to understand what airlines should prioritize for improving the passenger experience.

2. **How do delays (departure and arrival) impact overall passenger satisfaction?** Since delays can be a common pain point in air travel, I am curious about how much of an impact they have on passenger satisfaction and whether other service aspects can compensate for the inconvenience.

3. **How does the satisfaction level vary by passenger age group?** Understanding how different age demographics perceive airline services can help tailor marketing strategies and service offerings, ensuring that airlines meet the diverse needs of their passengers.

These questions will guide my exploratory data analysis, in which I aim to derive actionable insights for enhancing the airline customer experience. As I delve deeper into the dataset, I might also analyze additional factors, such as interactions between multiple service features or trends within specific customer segments, to uncover further insights and potential areas for improvement in airline operations and customer satisfaction strategies.

## 2. Data Analysis


### 2-1. Importing Libraries

In [22]:
# Import libraries for EDA 
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import LabelEncoder

### 2-2. Importing Data

In [24]:
# Importing data.csv
df = pd.read_csv('data.csv')
df.head()

|  | Unnamed: 0 | id | Gender | Customer Type | Age | Type of Travel | Class | Flight Distance | Inflight wifi service | Departure/Arrival time convenient | ... | Inflight entertainment | On-board service | Leg room service | Baggage handling | Checkin service | Inflight service | Cleanliness | Departure Delay in Minutes | Arrival Delay in Minutes | satisfaction |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 19556 | Female | Loyal Customer | 52 | Business travel | Eco | 160 | 5 | 4 | ... | 5 | 5 | 5 | 5 | 2 | 5 | 5 | 50 | 44.0 | satisfied |
| 1 | 1 | 90035 | Female | Loyal Customer | 36 | Business travel | Business | 2863 | 1 | 1 | ... | 4 | 4 | 4 | 4 | 3 | 4 | 5 | 0 | 0.0 | satisfied |
| 2 | 2 | 12360 | Male | disloyal Customer | 20 | Business travel | Eco | 192 | 2 | 0 | ... | 2 | 4 | 1 | 3 | 2 | 2 | 2 | 0 | 0.0 | neutral or dissatisfied |
| 3 | 3 | 77959 | Male | Loyal Customer | 44 | Business travel | Business | 3377 | 0 | 0 | ... | 1 | 1 | 1 | 1 | 3 | 1 | 4 | 0 | 6.0 | satisfied |
| 4 | 4 | 36875 | Female | Loyal Customer | 49 | Business travel | Eco | 1182 | 2 | 3 | ... | 2 | 2 | 2 | 2 | 4 | 2 | 4 | 0 | 20.0 | satisfied |


### 2-3. Review Data


In [28]:
# Checking the rows and columns
df.shape

(25976, 25)

The dataset has 25,976 rows (data entries) and 25 columns.

In [30]:
# DataFrame information
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25976 entries, 0 to 25975
Data columns (total 25 columns):
 #   Column                             Non-Null Count  Dtype  
---  ------                             --------------  -----  
 0   Unnamed: 0                         25976 non-null  int64  
 1   id                                 25976 non-null  int64  
 2   Gender                             25976 non-null  object 
 3   Customer Type                      25976 non-null  object 
 4   Age                                25976 non-null  int64  
 5   Type of Travel                     25976 non-null  object 
 6   Class                              25976 non-null  object 
 7   Flight Distance                    25976 non-null  int64  
 8   Inflight wifi service              25976 non-null  int64  
 9   Departure/Arrival time convenient  25976 non-null  int64  
 10  Ease of Online booking             25976 non-null  int64  
 11  Gate location                      25976 non-null  int

In [32]:
# DataFrame summary statistics
df.describe()

|  | Unnamed: 0 | id | Age | Flight Distance | Inflight wifi service | Departure/Arrival time convenient | Ease of Online booking | Gate location | Food and drink | Online boarding | Seat comfort | Inflight entertainment | On-board service | Leg room service | Baggage handling | Checkin service | Inflight service | Cleanliness | Departure Delay in Minutes | Arrival Delay in Minutes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 25976.0 | 25976.0 | 25976.0 | 25976.0 | 25976.0 | 25976.0 | 25976.0 | 25976.0 | 25976.0 | 25976.0 | 25976.0 | 25976.0 | 25976.0 | 25976.0 | 25976.0 | 25976.0 | 25976.0 | 25976.0 | 25976.0 | 25893.0 |
| mean | 12987.5 | 65005.657992 | 39.620958 | 1193.788459 | 2.724746 | 3.046812 | 2.756775 | 2.977094 | 3.215353 | 3.261665 | 3.449222 | 3.357753 | 3.385664 | 3.350169 | 3.633238 | 3.314175 | 3.649253 | 3.286226 | 14.30609 | 14.740857 |
| std | 7498.769632 | 37611.526647 | 15.135685 | 998.683999 | 1.335384 | 1.533371 | 1.412951 | 1.282133 | 1.331506 | 1.355536 | 1.32009 | 1.338299 | 1.282088 | 1.318862 | 1.176525 | 1.269332 | 1.180681 | 1.31933 | 37.42316 | 37.517539 |
| min | 0.0 | 17.0 | 7.0 | 31.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 6493.75 | 32170.5 | 27.0 | 414.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 3.0 | 3.0 | 3.0 | 2.0 | 0.0 | 0.0 |
| 50% | 12987.5 | 65319.5 | 40.0 | 849.0 | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 3.0 | 4.0 | 3.0 | 0.0 | 0.0 |
| 75% | 19481.25 | 97584.25 | 51.0 | 1744.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 5.0 | 4.0 | 4.0 | 4.0 | 5.0 | 4.0 | 5.0 | 4.0 | 12.0 | 13.0 |
| max | 25975.0 | 129877.0 | 85.0 | 4983.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 1128.0 | 1115.0 |


### 2-4. Generate Sub-dataset

In [None]:
# 1. Make sub-dataset(s) from your original dataset for your research objective, goals by dropping unnecessary variables 
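
One possible way to build the sub-dataset is to drop columns that only identify a row rather than describe the passenger. The sketch below assumes that `Unnamed: 0` and `id` are the unnecessary variables (my assumption, based on the column list from `df.info()`), and it uses a small stand-in frame named `sample`, with values copied from the first rows of `df.head()`, instead of reading `data.csv`.

```python
import pandas as pd

# Small stand-in for the real DataFrame; values come from the df.head() output.
sample = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "id": [19556, 90035, 12360],
    "Age": [52, 36, 20],
    "Flight Distance": [160, 2863, 192],
    "satisfaction": ["satisfied", "satisfied", "neutral or dissatisfied"],
})

# Assumption: row counters and passenger ids carry no signal for the EDA.
cols_to_drop = ["Unnamed: 0", "id"]

# errors="ignore" keeps the call safe if a column has already been removed.
sub_df = sample.drop(columns=cols_to_drop, errors="ignore")
print(sub_df.columns.tolist())
```

Applied to the full dataset, the same call would be `df.drop(columns=cols_to_drop, errors="ignore")`.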


In [None]:
# 2. try to change variable names
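
A minimal sketch of one renaming scheme, assuming snake_case names are wanted so that columns are easier to type. The helper `to_snake_case` is hypothetical (it is not part of pandas), and `sample` is an empty stand-in frame that only carries a few of the original column names.

```python
import pandas as pd

# Empty stand-in frame that carries a few of the original column names.
sample = pd.DataFrame(columns=[
    "Customer Type",
    "Type of Travel",
    "Departure/Arrival time convenient",
    "On-board service",
    "Checkin service",
    "satisfaction",
])


def to_snake_case(name: str) -> str:
    """Lowercase a column name and replace spaces, slashes, and hyphens with underscores."""
    for ch in (" ", "/", "-"):
        name = name.replace(ch, "_")
    return name.lower()


# DataFrame.rename accepts a function that is applied to every column label.
renamed = sample.rename(columns=to_snake_case)
print(renamed.columns.tolist())
```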

In [None]:
# 3. check if there are missing values
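
The `df.describe()` output already hints at missing data: `Arrival Delay in Minutes` has a count of 25,893 against 25,976 rows, which leaves 83 missing values. The sketch below shows one way to count missing values per column and fill them. The stand-in frame `sample` is illustrative (the `NaN` is inserted by hand), and filling with the median is an assumption on my part, not the only reasonable choice.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in; the NaN is inserted by hand to mimic a missing delay.
sample = pd.DataFrame({
    "Departure Delay in Minutes": [50, 0, 0, 0, 0],
    "Arrival Delay in Minutes": [44.0, 0.0, np.nan, 6.0, 20.0],
})

# Number of missing values in each column.
missing_counts = sample.isna().sum()
print(missing_counts)

# Assumption: the median is a reasonable fill value because delays are right-skewed.
median_delay = sample["Arrival Delay in Minutes"].median()
cleaned = sample.fillna({"Arrival Delay in Minutes": median_delay})
print(cleaned)
```

Dropping the affected rows with `dropna(subset=["Arrival Delay in Minutes"])` would be an alternative, given that they make up a small share of the data.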

In [None]:
# 4. Check if there are duplicated values
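
A short sketch of a duplicate check, looking both at fully duplicated rows and at repeated `id` values. The stand-in frame `sample` contains a deliberately repeated row for illustration; it does not imply that the real dataset has duplicates.

```python
import pandas as pd

# Illustrative stand-in with one deliberately repeated row.
sample = pd.DataFrame({
    "id": [19556, 90035, 90035, 12360],
    "Age": [52, 36, 36, 20],
})

# Rows that repeat an earlier row in every column.
n_duplicate_rows = int(sample.duplicated().sum())

# Passenger ids that appear more than once.
n_duplicate_ids = int(sample["id"].duplicated().sum())

# Keep the first occurrence of each row and rebuild a clean index.
deduplicated = sample.drop_duplicates().reset_index(drop=True)
print(n_duplicate_rows, n_duplicate_ids, len(deduplicated))
```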

### 2-5. Checking Outliers

In [None]:
# check if quantitative variables have outliers.
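
One common way to flag outliers is the 1.5 × IQR rule. For `Departure Delay in Minutes`, the quartiles reported by `df.describe()` are 0 and 12, so the upper fence is 12 + 1.5 × 12 = 30 minutes, and the maximum of 1,128 minutes lies far beyond it. The sketch below assumes this rule is acceptable for the delay columns; `iqr_outlier_mask` is a hypothetical helper and `delays` is a small illustrative series.

```python
import pandas as pd


def iqr_outlier_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask that is True where a value lies outside the k * IQR fences."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return (series < lower_fence) | (series > upper_fence)


# Illustrative delays chosen so that the quartiles are 0 and 12, as in df.describe().
delays = pd.Series(
    [0, 0, 0, 0, 5, 10, 12, 15, 1128],
    name="Departure Delay in Minutes",
)

mask = iqr_outlier_mask(delays)
print(delays[mask].tolist())
```

Because delays are heavily right-skewed, flagged values may be genuine long delays rather than errors, so it is worth inspecting them before removing anything.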

### 2-6. Generating Plot(s)

In [None]:
# generate plots to support your objective and goals
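
As a starting point, a stacked bar chart of satisfaction counts per travel class could support the first research question. The sketch below is only one possible plot, built on a stand-in frame `sample` that holds the `Class` and `satisfaction` values of the five rows shown by `df.head()`; the axis labels and title are my own choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in frame with the Class and satisfaction values from df.head().
sample = pd.DataFrame({
    "Class": ["Eco", "Business", "Eco", "Business", "Eco"],
    "satisfaction": [
        "satisfied",
        "satisfied",
        "neutral or dissatisfied",
        "satisfied",
        "satisfied",
    ],
})

# Count passengers for every (class, satisfaction) combination.
counts = pd.crosstab(sample["Class"], sample["satisfaction"])

# Stacked bars: one bar per class, split by satisfaction level.
ax = counts.plot(kind="bar", stacked=True)
ax.set_xlabel("Class")
ax.set_ylabel("Passengers")
ax.set_title("Satisfaction by travel class")
plt.tight_layout()
```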

## 3. EDA


In [None]:
# perform your EDA
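
The research questions above could be approached roughly as follows: encode satisfaction as 0/1, correlate it with the numeric features, and compare the satisfaction rate across age groups. This is a sketch under stated assumptions, not the final analysis: the values in `sample` are hypothetical, the column `is_satisfied` and the age bins (30 and under, 31-50, over 50) are my own choices, and Pearson correlation on 0/1 and ordinal ratings is only a rough first look.

```python
import pandas as pd

# Hypothetical stand-in values; the real analysis would use the full DataFrame.
sample = pd.DataFrame({
    "Age": [20, 36, 44, 49, 52, 70],
    "Online boarding": [1, 4, 5, 4, 5, 2],
    "Departure Delay in Minutes": [0, 0, 0, 0, 50, 30],
    "satisfaction": [
        "neutral or dissatisfied",
        "satisfied",
        "satisfied",
        "satisfied",
        "satisfied",
        "neutral or dissatisfied",
    ],
})

# Encode the target as 1 (satisfied) or 0 (neutral or dissatisfied).
sample["is_satisfied"] = (sample["satisfaction"] == "satisfied").astype(int)

# Questions 1 and 2: correlation of each numeric feature with satisfaction.
feature_cols = ["Age", "Online boarding", "Departure Delay in Minutes"]
correlations = (
    sample[feature_cols]
    .corrwith(sample["is_satisfied"])
    .sort_values(ascending=False)
)
print(correlations)

# Question 3: share of satisfied passengers per age group.
# pd.cut uses right-inclusive bins by default: (0, 30], (30, 50], (50, 100].
sample["age_group"] = pd.cut(
    sample["Age"],
    bins=[0, 30, 50, 100],
    labels=["30 and under", "31-50", "over 50"],
)
rate_by_age = sample.groupby("age_group", observed=True)["is_satisfied"].mean()
print(rate_by_age)
```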

## 4. Conclusion 

In [None]:
# Answer the questions

## 5. Summary

(summarize your results) 

## References

> provide a list of references