# <span style = "color: gray"> Airline Passenger Satisfaction Prediction </span>

***

This dataset contains an airline passenger satisfaction survey. Which factors are most strongly correlated with a satisfied (or dissatisfied) passenger? Can you predict passenger satisfaction?

## <span style = "color : blue"> Content </span>

* Gender: Gender of the passengers (Female, Male)

* Customer Type: The customer type (Loyal customer, disloyal customer)

* Age: The actual age of the passengers

* Type of Travel: Purpose of the passenger's flight (Personal Travel, Business Travel)

* Class: Passenger's travel class (Business, Eco, Eco Plus)

* Flight distance: The flight distance of this journey

* Inflight wifi service: Satisfaction level of the inflight wifi service (0: Not Applicable; 1-5)

* Departure/Arrival time convenient: Satisfaction level with the convenience of the departure/arrival time

* Ease of Online booking: Satisfaction level of online booking

* Gate location: Satisfaction level of Gate location

* Food and drink: Satisfaction level of Food and drink

* Online boarding: Satisfaction level of online boarding

* Seat comfort: Satisfaction level of Seat comfort

* Inflight entertainment: Satisfaction level of inflight entertainment

* On-board service: Satisfaction level of On-board service

* Leg room service: Satisfaction level of Leg room service

* Baggage handling: Satisfaction level of baggage handling

* Check-in service: Satisfaction level of Check-in service

* Inflight service: Satisfaction level of inflight service

* Cleanliness: Satisfaction level of Cleanliness

* Departure Delay in Minutes: Departure delay in minutes

* Arrival Delay in Minutes: Arrival delay in minutes

* Satisfaction: Airline satisfaction level (satisfied, neutral or dissatisfied)

## Let's dive into it

### Import necessary libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Read 'airline_passenger_satisfaction.csv' dataset and store it in a DataFrame

In [None]:
df = pd.read_csv('airline_passenger_satisfaction.csv')

### View the top 5 rows

In [None]:
pd.set_option('display.max_columns', None)

In [None]:
df.head()

### Find info of the dataset

In [None]:
df.info()

### Find basic statistical information about the dataset

In [None]:
df.describe()

### Check for any null values

In [None]:
df.isna().sum()

### Fill missing value with mean

In [None]:
df['Arrival Delay in Minutes'] = df['Arrival Delay in Minutes'].fillna(df['Arrival Delay in Minutes'].mean())
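
Delay columns are often right-skewed, so a few very long delays can pull the mean upward. The sketch below compares mean and median imputation on a small made-up frame (the values are illustrative and not taken from the dataset; `filled_mean` and `filled_median` are hypothetical column names). It assigns the result back instead of using `inplace=True` on a single column.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the 'Arrival Delay in Minutes' column (values are made up)
col = 'Arrival Delay in Minutes'
toy = pd.DataFrame({col: [0.0, 5.0, np.nan, 10.0, 185.0]})

mean_value = toy[col].mean()      # NaN is skipped when computing the mean
median_value = toy[col].median()  # less sensitive to the single long delay

# Assign the filled column back rather than calling fillna(..., inplace=True)
toy['filled_mean'] = toy[col].fillna(mean_value)
toy['filled_median'] = toy[col].fillna(median_value)

print(toy)
```

Whether the mean or the median is the better choice depends on how skewed the real column is, which `df.describe()` can help judge.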

### View unique values in all categorical columns

In [None]:
categorical_columns = ['Gender', 'Customer Type', 'Type of Travel', 'Class', 'satisfaction']

for feature in categorical_columns:
    print(f'Unique values in {feature}', df[feature].unique().tolist())




### Change values in satisfaction to:
* Neutral or dissatisfied to 0
* Satisfied to 1


In [None]:
def satisfied(x):
    if x == 'neutral or dissatisfied':
        return 0
    else:
        return 1

In [None]:
df['satisfaction']= df['satisfaction'].apply(satisfied)

In [None]:
df.head()

In [None]:
df['satisfaction'].unique()
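
An explicit dictionary with `Series.map` is one alternative to the helper function. The sketch below runs on a toy series whose labels are assumed to match the dataset. Unlike the `else` branch, an unexpected label becomes `NaN` rather than silently turning into 1.

```python
import pandas as pd

# Toy stand-in for the 'satisfaction' column (labels assumed to match the dataset)
toy = pd.Series(['satisfied', 'neutral or dissatisfied', 'satisfied'],
                name='satisfaction')

# Explicit mapping: any label missing from the dict becomes NaN
mapping = {'neutral or dissatisfied': 0, 'satisfied': 1}
encoded = toy.map(mapping)

# A label that is not in the mapping is flagged as missing
unexpected = pd.Series(['unknown']).map(mapping)

print(encoded.tolist())
print(unexpected.isna().tolist())
```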

### Drop Unnamed: 0 and id column

In [None]:
df.drop(columns=['Unnamed: 0','id'],inplace=True)

In [None]:
df.head()

### Convert the remaining categorical columns to numeric using one-hot encoding

In [None]:
df = pd.get_dummies(df,dtype=int)

In [None]:
df
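
By default `pd.get_dummies` creates one column per category, so a two-valued column such as Gender yields two columns that mirror each other. The sketch below uses a small made-up frame to show how `drop_first=True` removes that redundancy. It is an optional variation, not what the notebook does above.

```python
import pandas as pd

# Toy frame with two categorical columns and one numeric column (values are made up)
toy = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Female'],
    'Class': ['Eco', 'Business', 'Eco Plus'],
    'Age': [25, 40, 31],
})

# One column per category
full = pd.get_dummies(toy, dtype=int)

# Drop the first category of each encoded column to avoid mirrored columns
reduced = pd.get_dummies(toy, dtype=int, drop_first=True)

print(full.columns.tolist())
print(reduced.columns.tolist())
```

Tree-based models such as Random Forest are usually not hurt by the redundant columns, so either form is workable here.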

# <span style = "color : red"> Visualization </span>

### Plot a pairplot of the dataset

In [None]:
sns.pairplot(df)

### Plot a countplot of Type of travel

In [None]:
sns.countplot(data=df,x='Type of Travel_Business travel',hue='Type of Travel_Business travel')
plt.xticks(labels=['Personal Travel','Business Travel'],ticks=[0,1])
plt.legend(['personal','business'])
plt.show()

### Plot a countplot of Customer Type
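
A countplot of Customer Type can follow the same pattern. The sketch below uses a small made-up frame; the two labels and the resulting dummy column name `Customer Type_Loyal Customer` are assumptions about the dataset, so check `df.columns` before reusing it.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Toy stand-in for the raw column; the two labels are assumed to match the dataset
toy = pd.DataFrame({'Customer Type': ['Loyal Customer', 'disloyal Customer',
                                      'Loyal Customer', 'Loyal Customer']})
toy = pd.get_dummies(toy, dtype=int)

# After one-hot encoding, 1 marks a loyal customer and 0 a disloyal one
col = 'Customer Type_Loyal Customer'
ax = sns.countplot(data=toy, x=col)
ax.set_xticks([0, 1])
ax.set_xticklabels(['Disloyal', 'Loyal'])
ax.set_xlabel('Customer Type')
```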

### Split the columns into input and target variables

In [None]:
X, y = df.drop('satisfaction',axis=1),df['satisfaction']

### Standardise the data using StandardScaler

In [27]:
from sklearn.preprocessing import StandardScaler

In [28]:
st = StandardScaler()
X = pd.DataFrame(st.fit_transform(X),columns=st.get_feature_names_out())

In [None]:
X.head()

### Split the dataset into training and testing set

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=42)

### Check the shape of X_train and X_test

In [26]:
X_train.shape

In [None]:
X_test.shape
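
The notebook scales the full feature matrix before splitting, which lets the test rows influence the scaler's mean and standard deviation. A common variation is to split first and fit the scaler on the training rows only. The sketch below shows that order on synthetic data; `X_toy`, `y_toy` and the column names are placeholders, not the airline data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix and target (not the airline data)
rng = np.random.default_rng(42)
X_toy = pd.DataFrame(rng.normal(loc=50, scale=10, size=(100, 3)),
                     columns=['a', 'b', 'c'])
y_toy = pd.Series(rng.integers(0, 2, size=100))

X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42)

scaler = StandardScaler()
# Learn the mean and standard deviation from the training rows only
X_tr_scaled = pd.DataFrame(scaler.fit_transform(X_tr),
                           columns=X_tr.columns, index=X_tr.index)
# Reuse the training statistics on the test rows
X_te_scaled = pd.DataFrame(scaler.transform(X_te),
                           columns=X_te.columns, index=X_te.index)

print(X_tr_scaled.shape, X_te_scaled.shape)
```

Passing `index=` keeps the scaled frames aligned with `y_tr` and `y_te`, whose indices are shuffled by the split.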

### Create Random Forest model

In [None]:
from sklearn.ensemble import RandomForestClassifier

In [None]:
rd = RandomForestClassifier(random_state=42)  # fixed seed so results are reproducible

### Train the model with X_train and y_train

In [None]:
rd.fit(X_train,y_train)

### Check the score of our trained model on the training set

In [None]:
rd.score(X_train,y_train)
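
A score computed on the rows the model was trained on is usually optimistic for a Random Forest. Cross-validation gives a less biased estimate; the sketch below illustrates the idea on synthetic data from `make_classification` rather than the airline dataset, and the sample and tree counts are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic binary classification data (a stand-in, not the airline data)
X_toy, y_toy = make_classification(n_samples=300, n_features=8, random_state=42)

model = RandomForestClassifier(n_estimators=50, random_state=42)

# Accuracy on the same rows the model was fitted on
train_score = model.fit(X_toy, y_toy).score(X_toy, y_toy)

# Accuracy averaged over 5 held-out folds
cv_scores = cross_val_score(model, X_toy, y_toy, cv=5)

print(train_score, cv_scores.mean())
```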

### Make predictions with X_test

In [None]:
y_pred = rd.predict(X_test)

### Check the accuracy score of our prediction

In [None]:
from sklearn import metrics

In [None]:
metrics.accuracy_score(y_test,y_pred)

### Create a confusion matrix

In [None]:
metrics.confusion_matrix(y_test,y_pred)

### Plot confusion matrix on heatmap

In [None]:
sns.heatmap(metrics.confusion_matrix(y_test,y_pred),annot=True,fmt='d')  # fmt='d' shows counts as integers

### Create classification report

In [None]:
print(metrics.classification_report(y_test,y_pred))
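
To come back to the opening question of which factors matter most, a fitted Random Forest exposes `feature_importances_`. The sketch below ranks the features of a synthetic dataset; the `feature_i` names are placeholders. With the notebook's objects, the same pattern would presumably use `rd` and `X_train.columns`.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with placeholder feature names (not the airline data)
X_arr, y_toy = make_classification(n_samples=300, n_features=6,
                                   n_informative=3, random_state=42)
X_toy = pd.DataFrame(X_arr, columns=[f'feature_{i}' for i in range(6)])

model = RandomForestClassifier(n_estimators=50, random_state=42).fit(X_toy, y_toy)

# Impurity-based importances, one value per feature, sorted from high to low
importances = pd.Series(model.feature_importances_,
                        index=X_toy.columns).sort_values(ascending=False)

print(importances)
```

Impurity-based importances are a rough guide rather than a causal ranking, and they can favor features with many distinct values.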

***

# <span style = "color : green;font-size:40px"> Great Job! </span>