# Data Card

### About Dataset

## Context
#### US Airline passenger satisfaction survey: https://www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction/data

## Content
#### Satisfaction: Airline satisfaction level (satisfied, neutral or dissatisfied)
#### Age: The actual age of the passenger
#### Gender: Gender of the passenger (Female, Male)
#### Type of Travel: Purpose of the passenger's flight (Personal Travel, Business Travel)
#### Class: Travel class of the passenger (Business, Eco, Eco Plus)
#### Customer Type: The customer type (Loyal customer, disloyal customer)
#### Flight distance: The flight distance of this journey
#### Inflight wifi service: Satisfaction level of the inflight wifi service (0: Not Applicable; 1-5)
#### Ease of Online booking: Satisfaction level of online booking
#### Inflight service: Satisfaction level of inflight service
#### Online boarding: Satisfaction level of online boarding
#### Inflight entertainment: Satisfaction level of inflight entertainment
#### Food and drink: Satisfaction level of food and drink
#### Seat comfort: Satisfaction level of seat comfort
#### On-board service: Satisfaction level of on-board service
#### Leg room service: Satisfaction level of leg room service
#### Departure/Arrival time convenient: Satisfaction level of departure/arrival time convenience
#### Baggage handling: Satisfaction level of baggage handling
#### Gate location: Satisfaction level of gate location
#### Cleanliness: Satisfaction level of cleanliness
#### Check-in service: Satisfaction level of check-in service
#### Departure Delay in Minutes: Departure delay in minutes
#### Arrival Delay in Minutes: Arrival delay in minutes
#### Flight cancelled: Whether the flight was cancelled (Yes, No)
#### Flight time in minutes: Flight duration in minutes


# Import Libraries

In [1]:
import pandas as pd
import numpy as np
import myfunctions as mf # my module
import importlib
importlib.reload(mf) # reload my functions if any update is made 

<module 'myfunctions' from 'C:\\Users\\igyan\\Desktop\\DS\\PROJECTS\\my_projects\\myfunctions.py'>

# Load Dataset

In [2]:
# file located on my local machine
file_path = r"C:\Users\igyan\Desktop\DS\PROJECTS\my_projects\satisfaction.csv"
user_df = pd.read_csv(file_path)

# print the first 5 rows
user_df.head()

| | id | satisfaction_v2 | Gender | Customer Type | Age | Type of Travel | Class | Flight Distance | Seat comfort | Departure/Arrival time convenient | ... | Online support | Ease of Online booking | On-board service | Leg room service | Baggage handling | Checkin service | Cleanliness | Online boarding | Departure Delay in Minutes | Arrival Delay in Minutes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11112 | satisfied | Female | Loyal Customer | 65 | Personal Travel | Eco | 265 | 0 | 0 | ... | 2 | 3 | 3 | 0 | 3 | 5 | 3 | 2 | 0 | 0.0 |
| 1 | 110278 | satisfied | Male | Loyal Customer | 47 | Personal Travel | Business | 2464 | 0 | 0 | ... | 2 | 3 | 4 | 4 | 4 | 2 | 3 | 2 | 310 | 305.0 |
| 2 | 103199 | satisfied | Female | Loyal Customer | 15 | Personal Travel | Eco | 2138 | 0 | 0 | ... | 2 | 2 | 3 | 3 | 4 | 4 | 4 | 2 | 0 | 0.0 |
| 3 | 47462 | satisfied | Female | Loyal Customer | 60 | Personal Travel | Eco | 623 | 0 | 0 | ... | 3 | 1 | 1 | 0 | 1 | 4 | 1 | 3 | 0 | 0.0 |
| 4 | 120011 | satisfied | Female | Loyal Customer | 70 | Personal Travel | Eco | 354 | 0 | 0 | ... | 4 | 2 | 2 | 0 | 2 | 4 | 2 | 5 | 0 | 0.0 |


# Dataset Overview

In [3]:
# Dataset Info
user_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 129880 entries, 0 to 129879
Data columns (total 24 columns):
 #   Column                             Non-Null Count   Dtype  
---  ------                             --------------   -----  
 0   id                                 129880 non-null  int64  
 1   satisfaction_v2                    129880 non-null  object 
 2   Gender                             129880 non-null  object 
 3   Customer Type                      129880 non-null  object 
 4   Age                                129880 non-null  int64  
 5   Type of Travel                     129880 non-null  object 
 6   Class                              129880 non-null  object 
 7   Flight Distance                    129880 non-null  int64  
 8   Seat comfort                       129880 non-null  int64  
 9   Departure/Arrival time convenient  129880 non-null  int64  
 10  Food and drink                     129880 non-null  int64  
 11  Gate location                      1298

In [4]:
# checking for missing values
print('Missing Values:\n')
user_df.isnull().sum()

Missing Values:



id                                     0
satisfaction_v2                        0
Gender                                 0
Customer Type                          0
Age                                    0
Type of Travel                         0
Class                                  0
Flight Distance                        0
Seat comfort                           0
Departure/Arrival time convenient      0
Food and drink                         0
Gate location                          0
Inflight wifi service                  0
Inflight entertainment                 0
Online support                         0
Ease of Online booking                 0
On-board service                       0
Leg room service                       0
Baggage handling                       0
Checkin service                        0
Cleanliness                            0
Online boarding                        0
Departure Delay in Minutes             0
Arrival Delay in Minutes             393
dtype: int64

In [5]:
# check for duplicate rows in dataset
user_df.duplicated().sum()

0

In [6]:
# check for outliers 
mf.check_for_outliers(user_df)

id                                       0
Age                                      0
Flight Distance                       2581
Seat comfort                             0
Departure/Arrival time convenient        0
Food and drink                           0
Gate location                            0
Inflight wifi service                    0
Inflight entertainment                   0
Online support                           0
Ease of Online booking                   0
On-board service                     13270
Leg room service                         0
Baggage handling                         0
Checkin service                      15370
Cleanliness                              0
Online boarding                          0
Departure Delay in Minutes           18098
Arrival Delay in Minutes             17492
dtype: int64
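
`mf.check_for_outliers` lives in the author's private module, so its definition is not shown in this notebook. The capping step later refers to lower and upper bounds, which suggests the common 1.5 × IQR rule. The following is a minimal sketch under that assumption; the name `count_iqr_outliers`, the factor `k=1.5`, and the demo data are illustrative and not taken from the module.

```python
import pandas as pd

def count_iqr_outliers(df: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count, per numeric column, the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # The bound Series are aligned with the DataFrame columns; NaN compares as False
    mask = (numeric < lower) | (numeric > upper)
    return mask.sum()

# Hypothetical toy data: one extreme delay, ratings on a 1-5 scale
demo = pd.DataFrame({"delay": [0, 0, 1, 2, 3, 500], "rating": [1, 2, 3, 4, 5, 3]})
counts = count_iqr_outliers(demo)
print(counts)
```

Under this rule, a heavily right-skewed column such as a delay in minutes tends to produce many flagged values, which would be consistent with the large counts reported above for the two delay columns.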


# Data Cleaning

### Standardizing Column Names

In [7]:
# standardize column names to snake_case (lowercase, with underscores instead of spaces)
mf.format_column_name(user_df)

standardized columns names:



['id',
 'satisfaction_v2',
 'gender',
 'customer_type',
 'age',
 'type_of_travel',
 'class',
 'flight_distance',
 'seat_comfort',
 'departure/arrival_time_convenient',
 'food_and_drink',
 'gate_location',
 'inflight_wifi_service',
 'inflight_entertainment',
 'online_support',
 'ease_of_online_booking',
 'on-board_service',
 'leg_room_service',
 'baggage_handling',
 'checkin_service',
 'cleanliness',
 'online_boarding',
 'departure_delay_in_minutes',
 'arrival_delay_in_minutes']
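
The body of `mf.format_column_name` is not shown, but the output above is consistent with lowercasing each name and replacing spaces with underscores. A minimal sketch under that assumption (the function name `to_snake_case` is hypothetical):

```python
import pandas as pd

def to_snake_case(df: pd.DataFrame) -> list:
    """Lowercase column names and replace spaces with underscores, in place."""
    df.columns = (
        df.columns.str.strip()
        .str.lower()
        .str.replace(" ", "_", regex=False)
    )
    return df.columns.tolist()

# Hypothetical empty frame carrying three of the original column names
demo = pd.DataFrame(
    columns=["Customer Type", "On-board service", "Departure/Arrival time convenient"]
)
names = to_snake_case(demo)
print(names)
```

As in the notebook output, the hyphen and slash survive this transformation, so columns such as `on-board_service` must be accessed with bracket notation rather than attribute access.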

### Imputing Missing Values

In [8]:
# impute missing values in arrival_delay_in_minutes with the column mean
mean_arrival_delay = user_df['arrival_delay_in_minutes'].mean()
user_df['arrival_delay_in_minutes'] = user_df['arrival_delay_in_minutes'].fillna(mean_arrival_delay)

# verify if dataset has no missing values 
user_df.isnull().sum().any()

False

### Handling Outliers Using Vectorization

In [9]:
# handle outliers by capping them at the lower and upper bound values
mf.handle_outliers(user_df)

# verify if there are no outliers
mf.check_for_outliers(user_df)

Outliers capped successfully.

id                                   0
age                                  0
flight_distance                      0
seat_comfort                         0
departure/arrival_time_convenient    0
food_and_drink                       0
gate_location                        0
inflight_wifi_service                0
inflight_entertainment               0
online_support                       0
ease_of_online_booking               0
on-board_service                     0
leg_room_service                     0
baggage_handling                     0
checkin_service                      0
cleanliness                          0
online_boarding                      0
departure_delay_in_minutes           0
arrival_delay_in_minutes             0
dtype: int64
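
The capping performed by `mf.handle_outliers` can be approximated with pandas' vectorized `DataFrame.clip`. The sketch below assumes the bounds are Q1 − 1.5 × IQR and Q3 + 1.5 × IQR, which the notebook does not confirm; the function name `cap_iqr_outliers` and the demo data are hypothetical.

```python
import pandas as pd

def cap_iqr_outliers(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Return a copy with numeric columns clipped to [Q1 - k*IQR, Q3 + k*IQR]."""
    out = df.copy()
    cols = out.select_dtypes(include="number").columns
    q1 = out[cols].quantile(0.25)
    q3 = out[cols].quantile(0.75)
    iqr = q3 - q1
    # axis=1 aligns the per-column bounds with the DataFrame columns
    out[cols] = out[cols].clip(lower=q1 - k * iqr, upper=q3 + k * iqr, axis=1)
    return out

# Hypothetical toy data with one extreme delay value
demo = pd.DataFrame({"delay": [0, 0, 1, 2, 3, 500], "label": list("abcdef")})
capped = cap_iqr_outliers(demo)
print(capped["delay"].tolist())
```

Unlike the call in the cell above, which appears to modify `user_df` in place, this sketch returns a copy so the original values stay available for comparison. Capping also applies to every numeric column, including identifiers and 0-5 rating scales, so it may be worth restricting `cols` to genuinely continuous variables.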


# Exploratory Data Analysis