# Airline Passenger Satisfaction

## Introduction

In this Machine Learning project, our objective is to predict airline passenger satisfaction from features such as flight distance, inflight wifi service, cleanliness, and food and drink. Customer satisfaction is especially important in the airline industry, where it can determine whether an airline succeeds or fails. By accurately predicting passenger satisfaction, airlines can see where to improve their services and increase customer loyalty.

## Objectives

- To accurately predict airline passenger satisfaction based on various features.
- To identify which features have the most impact on passenger satisfaction.
- To provide insights to airlines for improving their services and increasing customer loyalty.

## Possible Results

The possible results of this project are:

- A Machine Learning model that can accurately predict airline passenger satisfaction.
- Identification of the most important features that impact passenger satisfaction.
- Insights and recommendations for airlines to improve their services and increase customer loyalty.

## Methodology

1. **Data Collection**: Collect data from various sources such as public datasets, surveys, and web scraping.
2. **Data Cleaning**: Clean the data by removing duplicates, handling missing values, and removing irrelevant features.
3. **Exploratory Data Analysis (EDA)**: Perform EDA to understand the data and identify trends, patterns, and outliers.
4. **Feature Engineering**: Create new features from the existing features that are relevant to the problem.
5. **Model Selection**: Select the appropriate Machine Learning algorithm that can accurately predict passenger satisfaction.
6. **Model Training**: Train the selected model on the cleaned and transformed data.
7. **Model Evaluation**: Evaluate the model's performance using various metrics such as accuracy, precision, recall, and F1-score.
8. **Hyperparameter Tuning**: Optimize the model's hyperparameters to improve its performance.
9. **Model Deployment**: Deploy the model in a production environment and test its performance in a real-world scenario.
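Steps 7 and 8 can be illustrated together. The sketch below is a minimal example under stated assumptions, not the project's actual configuration: it uses synthetic data from `make_classification` in place of the passenger table, and the `RandomForestClassifier` search grid is a hypothetical choice. `GridSearchCV` cross-validates every parameter combination, scores each by F1, and refits the best one on the full training split.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the passenger data (assumption: a binary target,
# like the `satisfaction` column).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Hypothetical search space; real values would come from EDA and a baseline model.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="f1",  # one of the evaluation metrics listed in step 7
    cv=3,          # 3-fold cross-validation per parameter combination
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Cross-validated F1:", round(search.best_score_, 3))
print("Held-out F1:", round(search.score(X_test, y_test), 3))
```

Keeping the test split out of the search means the held-out F1 is not inflated by the tuning itself.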

To implement this methodology in Python, we will use libraries such as pandas, numpy, matplotlib, seaborn, scikit-learn, and xgboost. The analysis will be done in a Jupyter Notebook, and Python scripts will be used to deploy the model in a production environment.

To start the project, we will collect data from various sources such as public datasets, surveys, and web scraping. Then, we will clean the data by removing duplicates, handling missing values, and removing irrelevant features. After cleaning the data, we will perform exploratory data analysis (EDA) to understand the data and identify trends, patterns, and outliers.
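A minimal sketch of the cleaning step, assuming pandas and a tiny invented frame that reuses two column names from the dataset. The float dtype of `Arrival Delay in Minutes` in the previews further down hints that this column may contain missing values; filling them with the median is one common choice, assumed here rather than taken from the project.

```python
import numpy as np
import pandas as pd

# Invented values; only the column names come from the dataset.
df = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "Departure Delay in Minutes": [25, 0, 0, 11],
    "Arrival Delay in Minutes": [18.0, np.nan, np.nan, 9.0],
})

# 1) Remove exact duplicate rows (pandas treats NaN == NaN for this check).
df = df.drop_duplicates()

# 2) Impute missing arrival delays with the column median (NaNs are skipped).
median_delay = df["Arrival Delay in Minutes"].median()
df = df.fillna({"Arrival Delay in Minutes": median_delay})

print(df)
```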

Next, we will engineer new features from the existing ones where they are relevant to the problem. We will then select a Machine Learning algorithm suited to predicting passenger satisfaction, train it on the cleaned and transformed data, evaluate it with metrics such as accuracy, precision, recall, and F1-score, and tune its hyperparameters to improve its performance.
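Because `satisfaction` takes two values (`satisfied` and `neutral or dissatisfied`), training and evaluation map naturally onto a binary classifier. The sketch below assumes scikit-learn's `LogisticRegression` as a simple baseline and uses synthetic data from `make_classification` instead of the real table; it computes the four metrics named above on a held-out split.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic binary data standing in for the encoded passenger features.
X, y = make_classification(n_samples=500, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Baseline classifier; max_iter raised so the solver converges comfortably.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Metrics for the positive class (label 1 = "satisfied" after encoding).
scores = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
for name, value in scores.items():
    print(f"{name:>9}: {value:.3f}")
```

Reporting precision and recall alongside accuracy matters if the two classes turn out to be imbalanced, since accuracy alone can hide poor performance on the minority class.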

Finally, we will deploy the model in a production environment and test its performance in a real-world scenario.
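One common way to hand a trained scikit-learn model to a deployment script is to serialize it with `joblib`. This sketch assumes that approach; the model, the synthetic data, and the file name `satisfaction_model.joblib` are hypothetical placeholders.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Train a small placeholder model on synthetic data.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Notebook side: persist the trained model (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), "satisfaction_model.joblib")
joblib.dump(model, path)

# Deployment-script side: load once at startup, then predict on new rows.
loaded = joblib.load(path)
predictions = loaded.predict(X[:5])
print(predictions)
```

The loading side must use compatible scikit-learn and joblib versions, so pinning them in the deployment environment is advisable.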

In this project, we use two CSV files (`train.csv` and `test.csv`) stored in a GitHub repository and read them into the Jupyter Notebook with the pandas `read_csv` function.

## 1. Libraries

### 1.2 User-defined functions

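```python
import sys
from pathlib import Path

# sys.path entries must be directories, not files: add the folder that
# contains the `utils` package (assumed here to be the notebook's directory).
sys.path.append(str(Path.cwd()))
import utils

utils.letsgo()
```

```text
Utils load correctly
```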
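The cell above depends on how Python resolves packages: each entry in `sys.path` should be a directory that contains importable modules, so an entry pointing at a file such as `__init__.py` is simply skipped. The self-contained sketch below builds a throwaway package in a temporary folder to show this; the package name `demo_utils` and the body of its `letsgo` function are hypothetical, since the contents of the real `utils` package are not shown here.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Throwaway project folder holding a hypothetical package `demo_utils`
# (named so it cannot shadow the real `utils`).
project_root = Path(tempfile.mkdtemp())
package_dir = project_root / "demo_utils"
package_dir.mkdir()
(package_dir / "__init__.py").write_text(
    "def letsgo():\n"
    "    print('Utils load correctly')\n"
)

# Ineffective: a file path is not a search location, so the import
# machinery skips this entry.
sys.path.append(str(package_dir / "__init__.py"))

# Effective: add the directory that *contains* the package.
sys.path.append(str(project_root))
importlib.invalidate_caches()  # make the freshly written package visible

demo_utils = importlib.import_module("demo_utils")
demo_utils.letsgo()  # prints: Utils load correctly
```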


### 1.3 Other libraries

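```python
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.ticker import ScalarFormatter

%matplotlib inline

# Apply the theme first: sns.set()/sns.set_theme() resets the palette,
# so calling it after set_palette() would discard the "bright" palette.
sns.set_theme(color_codes=True)

# Save a palette to a variable and make it the default:
palette = sns.color_palette("bright")
sns.set_palette(palette)

# Machine learning
# Note: these are regression estimators and metrics. The `satisfaction`
# target is a binary label, so classifier counterparts (e.g.
# LogisticRegression, RandomForestClassifier, XGBClassifier) and metrics
# such as accuracy, precision, recall and F1 match the stated methodology.
from sklearn import metrics, tree
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
```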

## 2. Load dataset

```python
url = 'https://raw.githubusercontent.com/jeanpierec/ia_esp/main/P0_finalproject/datasets/test.csv'
df_test = pd.read_csv(url)

url = 'https://raw.githubusercontent.com/jeanpierec/ia_esp/main/P0_finalproject/datasets/train.csv'
df_train = pd.read_csv(url)
```

```python
df_test.head()
```

| Unnamed: 0.1 | Unnamed: 0 | id | Gender | Customer Type | Age | Type of Travel | Class | Flight Distance | Inflight wifi service | Departure/Arrival time convenient | ... | Inflight entertainment | On-board service | Leg room service | Baggage handling | Checkin service | Inflight service | Cleanliness | Departure Delay in Minutes | Arrival Delay in Minutes | satisfaction |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 19556 | Female | Loyal Customer | 52 | Business travel | Eco | 160 | 5 | 4 | ... | 5 | 5 | 5 | 5 | 2 | 5 | 5 | 50 | 44.0 | satisfied |
| 1 | 1 | 90035 | Female | Loyal Customer | 36 | Business travel | Business | 2863 | 1 | 1 | ... | 4 | 4 | 4 | 4 | 3 | 4 | 5 | 0 | 0.0 | satisfied |
| 2 | 2 | 12360 | Male | disloyal Customer | 20 | Business travel | Eco | 192 | 2 | 0 | ... | 2 | 4 | 1 | 3 | 2 | 2 | 2 | 0 | 0.0 | neutral or dissatisfied |
| 3 | 3 | 77959 | Male | Loyal Customer | 44 | Business travel | Business | 3377 | 0 | 0 | ... | 1 | 1 | 1 | 1 | 3 | 1 | 4 | 0 | 6.0 | satisfied |
| 4 | 4 | 36875 | Female | Loyal Customer | 49 | Business travel | Eco | 1182 | 2 | 3 | ... | 2 | 2 | 2 | 2 | 4 | 2 | 4 | 0 | 20.0 | satisfied |

```python
df_train.head()
```

| Unnamed: 0.1 | Unnamed: 0 | id | Gender | Customer Type | Age | Type of Travel | Class | Flight Distance | Inflight wifi service | Departure/Arrival time convenient | ... | Inflight entertainment | On-board service | Leg room service | Baggage handling | Checkin service | Inflight service | Cleanliness | Departure Delay in Minutes | Arrival Delay in Minutes | satisfaction |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 70172 | Male | Loyal Customer | 13 | Personal Travel | Eco Plus | 460 | 3 | 4 | ... | 5 | 4 | 3 | 4 | 4 | 5 | 5 | 25 | 18.0 | neutral or dissatisfied |
| 1 | 1 | 5047 | Male | disloyal Customer | 25 | Business travel | Business | 235 | 3 | 2 | ... | 1 | 1 | 5 | 3 | 1 | 4 | 1 | 1 | 6.0 | neutral or dissatisfied |
| 2 | 2 | 110028 | Female | Loyal Customer | 26 | Business travel | Business | 1142 | 2 | 2 | ... | 5 | 4 | 3 | 4 | 4 | 4 | 5 | 0 | 0.0 | satisfied |
| 3 | 3 | 24026 | Female | Loyal Customer | 25 | Business travel | Business | 562 | 2 | 5 | ... | 2 | 2 | 5 | 3 | 1 | 4 | 2 | 11 | 9.0 | neutral or dissatisfied |
| 4 | 4 | 119299 | Male | Loyal Customer | 61 | Business travel | Business | 214 | 3 | 3 | ... | 3 | 3 | 4 | 4 | 3 | 3 | 3 | 0 | 0.0 | satisfied |
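Both previews contain `Unnamed: 0.1` and `Unnamed: 0`, which are most likely leftover row indices from an earlier `to_csv` call, plus a passenger `id` that is an identifier rather than a feature. The sketch below drops them, turns `satisfaction` into a 0/1 target, and one-hot encodes categorical columns with `pandas.get_dummies`. It runs on two invented rows and encodes only `Gender` and `Class`; this is an assumption for brevity, and the full dataset has further categorical columns such as `Customer Type` and `Type of Travel`.

```python
import pandas as pd

# Two invented rows shaped like a slice of the previews above.
df = pd.DataFrame({
    "Unnamed: 0.1": [0, 1],
    "Unnamed: 0": [0, 1],
    "id": [101, 102],
    "Gender": ["Male", "Female"],
    "Class": ["Eco Plus", "Business"],
    "satisfaction": ["neutral or dissatisfied", "satisfied"],
})

# Drop leftover index columns and the passenger identifier.
leftover = [c for c in df.columns if c.startswith("Unnamed")]
df = df.drop(columns=leftover + ["id"])

# Binary target: 1 = satisfied, 0 = neutral or dissatisfied.
y = (df.pop("satisfaction") == "satisfied").astype(int)

# One-hot encode the remaining categorical columns.
X = pd.get_dummies(df, columns=["Gender", "Class"])

print(X.columns.tolist())
print(y.tolist())
```

The same transformation would need to be applied identically to `df_train` and `df_test` so that both end up with matching columns.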
