<a href="https://colab.research.google.com/github/swapnalishamrao/Supervised_ML_Classification_Project/blob/main/Supervised_ML_Classification_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <font size='8px'><font color='indigo'><u>**Airline_Passenger_Referral_Prediction**

##### **Project Type**    - Supervised ML Classification
##### **Contribution**    - Individual
##### **Name**            - Swapnali Shamrao Mane

# <font size='6px'><font color='red'><u>**Project Summary & Technical Documentation-**

Air travel changed the world by connecting people globally in record time, making it a standout invention of the 1900s. Its main strength, speed, turned it into an essential way to transport both goods and people.

In the fast-paced world of air travel, where making passengers happy is crucial, airlines need to predict which passengers will recommend them to others. Knowing which passengers are likely to tell their friends and networks about an airline can significantly improve customer satisfaction and increase profits.

### **Steps**:

 **Getting Data**

 **Cleaning & Preprocessing Data**:

 This involves fixing missing information, handling extreme values, and making data easier to use for analysis.

 **Exploratory Data Analysis (EDA)**:

  This means looking at the data using different graphs and charts.

**Dividing the Data**:

Splitting it into parts for training and testing.

**Choosing Models and Hyperparameter Tuning**:

To make accurate predictions, several classification models are used, such as Logistic Regression, Random Forest, and Support Vector Machines (SVM). Hyperparameter tuning is then performed on these models to optimize their performance and reduce overfitting.

**Measuring Performance**:

 This focuses on metrics that tell how well the models work. The most important is Recall, followed by Accuracy and ROC AUC. These metrics show how well the models can correctly identify passengers who will recommend airlines. This is essential for making customer-focused strategies.
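
The steps above (splitting, tuning, and measuring) can be sketched end to end on synthetic data. This is a minimal illustration and not the notebook's actual pipeline: the data from `make_classification`, the 80/20 split, the choice of Logistic Regression, and the small grid over `C` are all assumptions made for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the airline data (assumption: binary target, 1 = recommends).
X, y = make_classification(n_samples=500, n_features=6, random_state=42)

# Hold out 20% of the rows for testing, keeping the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Hypothetical grid over the regularization strength; tuning is scored on recall.
param_grid = {"C": [0.1, 1.0, 10.0]}
search = GridSearchCV(
    LogisticRegression(max_iter=1000), param_grid, scoring="recall", cv=3
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out test set.
y_pred = search.predict(X_test)
y_prob = search.predict_proba(X_test)[:, 1]
scores = {
    "recall": recall_score(y_test, y_pred),
    "accuracy": accuracy_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_prob),
}
print(search.best_params_)
print(scores)
```

Scoring the grid search on recall mirrors the priority stated above; accuracy and ROC AUC are reported alongside it so that a high recall obtained by predicting "yes" for everyone would still be visible.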

#<font size='4px'><font color='3971'><u> **GitHub Link -**

[GitHub Link](https://github.com/swapnalishamrao/Supervised_ML_Classification_Project)

# <font size='6px'><font color='paintgray'><u>**Problem Statement-**


In the competitive airline industry, pleasing customers and keeping them loyal is crucial for success. Airlines are always looking for new ways to make passengers happier and improve their reputation. One big challenge is figuring out which passengers are likely to recommend the airline to others.

The goal is to create a model that can predict which passengers will refer the airline to their friends. This model will help airlines:

  - Make customers happier

  - Make more money

  - Advertise more effectively

  - Provide better service

  - Stay ahead of the competition



# <font size='6px'><font color='indi'>***Let's Begin !***

## ***1. Know Your Data***

## <font size='5px'><font color='indigo'>**Importing Libraries and Connecting Drive**

In [1]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import warnings
# Suppress all warnings to keep the notebook output readable
warnings.filterwarnings('ignore')

# Models from scikit-learn used for model building
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC, LinearSVC

# Preprocessing, data splitting, cross-validation, and hyperparameter search
from sklearn.preprocessing import LabelEncoder
from sklearn import model_selection
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV

# Variance inflation factor for multicollinearity checks
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Metrics for evaluating the models
from sklearn import metrics
from sklearn.metrics import classification_report,confusion_matrix
from sklearn.metrics import accuracy_score,precision_score
from sklearn.metrics import recall_score,f1_score,roc_curve, roc_auc_score

## **Dataset Loading**

In [2]:
# Mounting drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
# Load the dataset from Drive
df = pd.read_excel("/content/drive/MyDrive/data_airline_reviews.xlsx")

##<font size='5px'><font color='darkorange'> **Dataset First View**

In [4]:
# First Five Observations
df.head()

Unnamed: 0,airline,overall,author,review_date,customer_review,aircraft,traveller_type,cabin,route,date_flown,seat_comfort,cabin_service,food_bev,entertainment,ground_service,value_for_money,recommended
0,,,,,,,,,,,,,,,,,
1,Turkish Airlines,7.0,Christopher Hackley,8th May 2019,✅ Trip Verified | London to Izmir via Istanb...,,Business,Economy Class,London to Izmir via Istanbul,2019-05-01 00:00:00,4.0,5.0,4.0,4.0,2.0,4.0,yes
2,,,,,,,,,,,,,,,,,
3,Turkish Airlines,2.0,Adriana Pisoi,7th May 2019,✅ Trip Verified | Istanbul to Bucharest. We ...,,Family Leisure,Economy Class,Istanbul to Bucharest,2019-05-01 00:00:00,4.0,1.0,1.0,1.0,1.0,1.0,no
4,,,,,,,,,,,,,,,,,


In [5]:
# Last five observations
df.tail()

Unnamed: 0,airline,overall,author,review_date,customer_review,aircraft,traveller_type,cabin,route,date_flown,seat_comfort,cabin_service,food_bev,entertainment,ground_service,value_for_money,recommended
131890,Ukraine International,,Andriy Yesypenko,19th May 2006,Kiev - London (Gatwick) in business class (in ...,,,,,,,,,,,,no
131891,,,,,,,,,,,,,,,,,
131892,Ukraine International,,Volodya Bilotkach,29th April 2006,Several flights - KBP to AMS (3 times one way)...,,,,,,,,,,,,no
131893,,,,,,,,,,,,,,,,,
131894,Ukraine International,,Kasper Hettinga,10th February 2006,KBP-AMS with UIA. Although it was a relatively...,,,,,,,,,,,,no


##<font size='5px'><font color='#skyblue'> **Data Inspection**

In [6]:
# Checking shape of the dataset
df.shape

(131895, 17)

The dataset has 131,895 rows (observations) and 17 columns.

In [7]:
# Checking the column names of the dataset
df.columns

Index(['airline', 'overall', 'author', 'review_date', 'customer_review',
       'aircraft', 'traveller_type', 'cabin', 'route', 'date_flown',
       'seat_comfort', 'cabin_service', 'food_bev', 'entertainment',
       'ground_service', 'value_for_money', 'recommended'],
      dtype='object')

##<font size='5px'><font color='deeppink'>**Dataset Information**

In [8]:
# Dataset Info
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 131895 entries, 0 to 131894
Data columns (total 17 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   airline          65947 non-null  object 
 1   overall          64017 non-null  float64
 2   author           65947 non-null  object 
 3   review_date      65947 non-null  object 
 4   customer_review  65947 non-null  object 
 5   aircraft         19718 non-null  object 
 6   traveller_type   39755 non-null  object 
 7   cabin            63303 non-null  object 
 8   route            39726 non-null  object 
 9   date_flown       39633 non-null  object 
 10  seat_comfort     60681 non-null  float64
 11  cabin_service    60715 non-null  float64
 12  food_bev         52608 non-null  float64
 13  entertainment    44193 non-null  float64
 14  ground_service   39358 non-null  float64
 15  value_for_money  63975 non-null  float64
 16  recommended      64440 non-null  object 
dtypes: float64(7), object(10)

1. The dataset contains features of two dtypes: `object` and `float64`.

2. It therefore mixes categorical and numerical data.

3. There are 17 features in total.
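
The non-null counts from `df.info()` top out at 65,947 of 131,895 rows, and the `head()` and `tail()` outputs show alternating blank rows, which suggests that roughly half of the rows are entirely empty. A minimal sketch of how one might check for and drop such rows follows; the small `toy` frame is a hypothetical stand-in for `df`, and the names `n_empty`, `cleaned`, and `missing_pct` are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy frame mimicking the pattern seen above: every other row is entirely empty.
toy = pd.DataFrame(
    {
        "airline": [np.nan, "Turkish Airlines", np.nan, "Turkish Airlines"],
        "overall": [np.nan, 7.0, np.nan, 2.0],
        "recommended": [np.nan, "yes", np.nan, "no"],
    }
)

# Count rows in which every column is missing.
n_empty = int(toy.isna().all(axis=1).sum())

# Drop only the fully empty rows; partially filled rows are kept.
cleaned = toy.dropna(how="all").reset_index(drop=True)

# Percentage of missing values per column after the drop.
missing_pct = cleaned.isna().mean().mul(100).round(2)

print(n_empty, cleaned.shape)
print(missing_pct)
```

On the real data the same calls would be `df.isna().all(axis=1).sum()` and `df.dropna(how='all')`. Whether the notebook handles the blank rows this way is not shown in this section.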

In [9]:
# Basic description of Dataset
df.describe(include='all')

Unnamed: 0,airline,overall,author,review_date,customer_review,aircraft,traveller_type,cabin,route,date_flown,seat_comfort,cabin_service,food_bev,entertainment,ground_service,value_for_money,recommended
count,65947,64017.0,65947,65947,65947,19718,39755,63303,39726,39633,60681.0,60715.0,52608.0,44193.0,39358.0,63975.0,64440
unique,81,,44069,3015,61172,2088,4,4,24549,63,,,,,,,2
top,Spirit Airlines,,Anders Pedersen,19th January 2015,On March 2/14 a friend and I were booked on an...,A320,Solo Leisure,Economy Class,Bangkok to Hong Kong,August 2015,,,,,,,no
freq,2934,,96,253,6,2157,14798,48558,35,1204,,,,,,,33894
mean,,5.14543,,,,,,,,,2.95216,3.191814,2.90817,2.863372,2.69282,2.943962,
std,,3.477532,,,,,,,,,1.441362,1.565789,1.481893,1.507262,1.612215,1.58737,
min,,1.0,,,,,,,,,1.0,1.0,1.0,1.0,1.0,1.0,
25%,,1.0,,,,,,,,,1.0,2.0,1.0,1.0,1.0,1.0,
50%,,5.0,,,,,,,,,3.0,3.0,3.0,3.0,3.0,3.0,
75%,,9.0,,,,,,,,,4.0,5.0,4.0,4.0,4.0,4.0,
