# "Invistico" Airlines
## Decision Tree Approach to Predicting Customer Satisfaction

---
#### Overview:
Purpose: Identify the features that most strongly influence customer satisfaction.

Objective: Build a decision tree model that predicts customer satisfaction from historical flight-experience feedback for an anonymized airline.

**Part 1:** EDA & Data Preparation
- Univariate, Bivariate, and Multivariate Analysis
- Feature Engineering, Imputations, Data Cleaning
- Checking Model Assumptions

**Part 2:** Model Building and Evaluation
- Data Preparation - Dummy Encoding, Transformations, Data Leakage Mitigation, Scaling
- Model Training, Testing
- Checking Model Assumptions
- Model Evaluations

**Part 3:** Interpreting Model Results and Conducting Final Model Evaluation
- Results Visualizations
- Model Interpretations
- Full Dataset Predictions and Performance Evaluation
  
**Part 4:** Conclusion/Summary

---

### **Change Log**

Date | Author | Version | Change Desc
--- | --- | --- | ---
2024_0501 | S. Souto | v1 | Initial Version
2024_0813 | S. Souto | v1.1 | Updated format to match portfolio projects

---

### **Data Sources**

1. Original data: `Invistico_Airline.csv`, from Kaggle.com

### **Notebook Setup**

```python
# Import packages and libraries
import pandas as pd

# Packages for visualizations

# Packages for date conversions

# Packages for modeling, evaluation
```

```python
# Notebook setup
pd.set_option('display.max_columns', None)
```

## Part 1: EDA & Data Preparation

### Data Loading

```python
# Load dataset into dataframe, save copy
df0 = pd.read_csv('data/Invistico_Airline.csv')
df1 = df0.copy()
```

```python
df1.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 129880 entries, 0 to 129879
Data columns (total 22 columns):
 #   Column                             Non-Null Count   Dtype  
---  ------                             --------------   -----  
 0   satisfaction                       129880 non-null  object 
 1   Customer Type                      129880 non-null  object 
 2   Age                                129880 non-null  int64  
 3   Type of Travel                     129880 non-null  object 
 4   Class                              129880 non-null  object 
 5   Flight Distance                    129880 non-null  int64  
 6   Seat comfort                       129880 non-null  int64  
 7   Departure/Arrival time convenient  129880 non-null  int64  
 8   Food and drink                     129880 non-null  int64  
 9   Gate location                      129880 non-null  int64  
 10  Inflight wifi service              129880 non-null  int64  
 11  Inflight entertainment             1298
```

### Initial Exploration

```python
df1.head()
```

| | satisfaction | Customer Type | Age | Type of Travel | Class | Flight Distance | Seat comfort | Departure/Arrival time convenient | Food and drink | Gate location | Inflight wifi service | Inflight entertainment | Online support | Ease of Online booking | On-board service | Leg room service | Baggage handling | Checkin service | Cleanliness | Online boarding | Departure Delay in Minutes | Arrival Delay in Minutes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | satisfied | Loyal Customer | 65 | Personal Travel | Eco | 265 | 0 | 0 | 0 | 2 | 2 | 4 | 2 | 3 | 3 | 0 | 3 | 5 | 3 | 2 | 0 | 0.0 |
| 1 | satisfied | Loyal Customer | 47 | Personal Travel | Business | 2464 | 0 | 0 | 0 | 3 | 0 | 2 | 2 | 3 | 4 | 4 | 4 | 2 | 3 | 2 | 310 | 305.0 |
| 2 | satisfied | Loyal Customer | 15 | Personal Travel | Eco | 2138 | 0 | 0 | 0 | 3 | 2 | 0 | 2 | 2 | 3 | 3 | 4 | 4 | 4 | 2 | 0 | 0.0 |
| 3 | satisfied | Loyal Customer | 60 | Personal Travel | Eco | 623 | 0 | 0 | 0 | 3 | 3 | 4 | 3 | 1 | 1 | 0 | 1 | 4 | 1 | 3 | 0 | 0.0 |
| 4 | satisfied | Loyal Customer | 70 | Personal Travel | Eco | 354 | 0 | 0 | 0 | 3 | 4 | 3 | 4 | 2 | 2 | 0 | 2 | 4 | 2 | 5 | 0 | 0.0 |


#### Check for missing and duplicate data

```python
# Check for duplicates
print('Shape of dataframe:', df1.shape)
print('Shape of dataframe with duplicates dropped:', df1.drop_duplicates().shape)
```

```
Shape of dataframe: (129880, 22)
Shape of dataframe with duplicates dropped: (129880, 22)
```


```python
# Check for missing values
print('Total count of missing values:', df1.isna().sum().sum())
```

```
Total count of missing values: 393
```


```python
# Display missing values per column in dataframe
print('Missing values per column:')
df1.isna().sum()
```

```
Missing values per column:

satisfaction                           0
Customer Type                          0
Age                                    0
Type of Travel                         0
Class                                  0
Flight Distance                        0
Seat comfort                           0
Departure/Arrival time convenient      0
Food and drink                         0
Gate location                          0
Inflight wifi service                  0
Inflight entertainment                 0
Online support                         0
Ease of Online booking                 0
On-board service                       0
Leg room service                       0
Baggage handling                       0
Checkin service                        0
Cleanliness                            0
Online boarding                        0
Departure Delay in Minutes             0
Arrival Delay in Minutes             393
dtype: int64
```

#### Summary Statistics

```python
# Display descriptive stats
df1.describe(include='all')
```

| | satisfaction | Customer Type | Age | Type of Travel | Class | Flight Distance | Seat comfort | Departure/Arrival time convenient | Food and drink | Gate location | Inflight wifi service | Inflight entertainment | Online support | Ease of Online booking | On-board service | Leg room service | Baggage handling | Checkin service | Cleanliness | Online boarding | Departure Delay in Minutes | Arrival Delay in Minutes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 129880 | 129880 | 129880.0 | 129880 | 129880 | 129880.0 | 129880.0 | 129880.0 | 129880.0 | 129880.0 | 129880.0 | 129880.0 | 129880.0 | 129880.0 | 129880.0 | 129880.0 | 129880.0 | 129880.0 | 129880.0 | 129880.0 | 129880.0 | 129487.0 |
| unique | 2 | 2 |  | 2 | 3 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| top | satisfied | Loyal Customer |  | Business travel | Business |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| freq | 71087 | 106100 |  | 89693 | 62160 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| mean |  |  | 39.427957 |  |  | 1981.409055 | 2.838597 | 2.990645 | 2.851994 | 2.990422 | 3.24913 | 3.383477 | 3.519703 | 3.472105 | 3.465075 | 3.485902 | 3.695673 | 3.340807 | 3.705759 | 3.352587 | 14.713713 | 15.091129 |
| std |  |  | 15.11936 |  |  | 1027.115606 | 1.392983 | 1.527224 | 1.443729 | 1.30597 | 1.318818 | 1.346059 | 1.306511 | 1.30556 | 1.270836 | 1.292226 | 1.156483 | 1.260582 | 1.151774 | 1.298715 | 38.071126 | 38.46565 |
| min |  |  | 7.0 |  |  | 50.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% |  |  | 27.0 |  |  | 1359.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 3.0 | 2.0 | 3.0 | 2.0 | 3.0 | 3.0 | 3.0 | 2.0 | 0.0 | 0.0 |
| 50% |  |  | 40.0 |  |  | 1925.0 | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 3.0 | 4.0 | 4.0 | 0.0 | 0.0 |
| 75% |  |  | 51.0 |  |  | 2544.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 5.0 | 5.0 | 4.0 | 5.0 | 5.0 | 4.0 | 5.0 | 4.0 | 12.0 | 13.0 |


Preliminary analysis:
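A few ratios follow directly from the summary tables above. A minimal sketch of that arithmetic, with the figures copied by hand from the `describe()` and `isna()` outputs (no data is loaded, and the variable names are illustrative only):

```python
# Figures copied from the describe()/isna() outputs above
n_rows = 129_880
n_satisfied = 71_087          # 'freq' of the top satisfaction label
n_missing_arrival = 393       # NaNs in 'Arrival Delay in Minutes'

satisfied_share = n_satisfied / n_rows
missing_share = n_missing_arrival / n_rows

# Both delay columns have median 0.0 but means near 15 minutes,
# which points to a long right tail of large delays.
dep_delay_mean, dep_delay_median = 14.713713, 0.0
right_skewed = dep_delay_mean > dep_delay_median

print(f"satisfied share: {satisfied_share:.1%}")             # satisfied share: 54.7%
print(f"missing arrival-delay share: {missing_share:.2%}")   # missing arrival-delay share: 0.30%
print("departure delay right-skewed:", right_skewed)         # departure delay right-skewed: True
```

The classes are therefore only mildly imbalanced, and the missing arrival delays affect well under 1% of rows.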


### Feature Engineering
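One likely first step toward a decision tree is turning the text target into a numeric label. A minimal sketch, assuming the positive class is the string `'satisfied'` (the `top` value in the summary above) and treating every other label as 0. `encode_target` and the toy rows are hypothetical, and the second label name `'dissatisfied'` is an assumption because it is not shown above:

```python
import pandas as pd


def encode_target(df, col="satisfaction", positive="satisfied"):
    """Return a copy of df with `col` mapped to 1 for `positive`, 0 for any other label."""
    out = df.copy()
    out[col] = (out[col] == positive).astype(int)
    return out


# Hypothetical toy rows; 'dissatisfied' is an assumed name for the second label
toy = pd.DataFrame({
    "satisfaction": ["satisfied", "dissatisfied", "satisfied"],
    "Age": [65, 47, 15],
})
encoded = encode_target(toy)
print(encoded["satisfaction"].tolist())  # [1, 0, 1]
```

Working on a copy keeps `df1` untouched, matching the notebook's habit of preserving earlier frames.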

### Univariate Analysis
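For univariate checks, per-level shares of a 0 to 5 rating and the skewness of the two delay columns are quick signals (the summary statistics above show delay medians of 0 against means near 15). A sketch with pandas `value_counts(normalize=True)` and `DataFrame.skew()`; the helper name and toy values are hypothetical:

```python
import pandas as pd


def summarize_univariate(df, rating_col, delay_cols):
    """Return the share of each rating level and the sample skewness of each delay column."""
    rating_share = df[rating_col].value_counts(normalize=True).sort_index()
    delay_skew = df[delay_cols].skew()  # > 0 means a longer right tail
    return rating_share, delay_skew


# Hypothetical toy rows; the 310/305 delays echo the head() output above
toy = pd.DataFrame({
    "Seat comfort": [0, 3, 3, 4, 5],
    "Departure Delay in Minutes": [0, 0, 0, 5, 310],
    "Arrival Delay in Minutes": [0.0, 0.0, 0.0, 4.0, 305.0],
})
shares, skews = summarize_univariate(
    toy, "Seat comfort", ["Departure Delay in Minutes", "Arrival Delay in Minutes"]
)
print(shares)
print(skews)
```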

### Imputations
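`Arrival Delay in Minutes` is the only column with missing values (393 rows). The notebook does not yet state an imputation method; one common option is the column median, which is robust to the long delay tail. A minimal sketch under that assumption. `impute_median` is hypothetical, and its `reference` argument lets the median come from a training split only, in line with the data-leakage step planned for Part 2:

```python
import numpy as np
import pandas as pd


def impute_median(df, col="Arrival Delay in Minutes", reference=None):
    """Fill NaNs in `col` with the median of `reference` (defaults to df itself)."""
    ref = df if reference is None else reference
    median = ref[col].median()  # NaNs are skipped by default
    out = df.copy()
    out[col] = out[col].fillna(median)
    return out


# Hypothetical toy column with one missing value
toy = pd.DataFrame({"Arrival Delay in Minutes": [0.0, 305.0, np.nan, 0.0, 13.0]})
filled = impute_median(toy)
print(filled["Arrival Delay in Minutes"].tolist())  # [0.0, 305.0, 6.5, 0.0, 13.0]
```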

### Bivariate Analysis
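A simple bivariate view is the satisfied share within each level of a categorical feature such as `Class`. A sketch, assuming the raw string target; the helper name and toy rows (including the `'dissatisfied'` label) are hypothetical:

```python
import pandas as pd


def satisfaction_rate_by(df, group_col, target_col="satisfaction"):
    """Share of 'satisfied' rows within each level of `group_col`, highest first."""
    is_satisfied = df[target_col].eq("satisfied").astype(float)
    return is_satisfied.groupby(df[group_col]).mean().sort_values(ascending=False)


# Hypothetical toy rows
toy = pd.DataFrame({
    "satisfaction": ["satisfied", "dissatisfied", "satisfied", "dissatisfied", "satisfied"],
    "Class": ["Business", "Eco", "Business", "Eco", "Eco"],
})
rates = satisfaction_rate_by(toy, "Class")
print(rates)  # Business 1.0, Eco about 0.333
```

The same helper applies unchanged to `Customer Type` or `Type of Travel`.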

### Data Cleaning
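Column names in this dataset mix spaces, slashes, and hyphens (`Departure/Arrival time convenient`, `On-board service`). If the notebook standardizes them, a snake_case pass is one option; it is an assumption here, not a step the notebook already takes, and later cells would need the new names. `to_snake_case` and `clean_columns` are hypothetical helpers:

```python
import re

import pandas as pd


def to_snake_case(name):
    """Lower-case a column name and collapse non-alphanumeric runs into '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def clean_columns(df):
    """Return a copy of df with snake_case column names."""
    out = df.copy()
    out.columns = [to_snake_case(c) for c in out.columns]
    return out


# Hypothetical empty frame reusing a few column names from df1
toy = pd.DataFrame(columns=["Customer Type", "Departure/Arrival time convenient", "On-board service"])
print(list(clean_columns(toy).columns))
# ['customer_type', 'departure_arrival_time_convenient', 'on_board_service']
```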

### Multivariate Analysis
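For a first multivariate pass, Pearson correlations between the numeric features and a 0/1 target (the point-biserial correlation) can rank candidate predictors before modeling. A sketch with `DataFrame.corr()`; it assumes the target has already been encoded as 0/1, and the helper and toy values are hypothetical:

```python
import pandas as pd


def top_correlations(df, target_col="satisfaction", n=5):
    """Rank numeric features by absolute Pearson correlation with a 0/1 target."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr()[target_col].drop(target_col)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order).head(n)  # keep the sign, sort by magnitude


# Hypothetical toy rows with an already-encoded target
toy = pd.DataFrame({
    "satisfaction": [1, 0, 1, 0, 1],
    "Seat comfort": [5, 1, 4, 2, 5],
    "Gate location": [3, 3, 2, 4, 3],
})
result = top_correlations(toy)
print(result)  # Seat comfort about 0.95, Gate location about -0.65
```

Correlation only captures linear association, so a tree may still split on features that rank low here.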

## Part 2: Model Building and Evaluation
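The Part 2 steps outlined in the overview (dummy encoding, train/test split, training, evaluation) can be sketched with scikit-learn's `DecisionTreeClassifier`. This is a minimal illustration, not the notebook's final pipeline: `fit_tree`, `max_depth=4`, `test_size=0.25`, and the toy data are assumptions. The fitted tree's `feature_importances_` speak directly to the stated purpose of finding the most influential features.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_tree(df, target_col="satisfaction", max_depth=4, test_size=0.25, random_state=0):
    """Dummy-encode, split, and fit a decision tree.

    Returns (fitted model, held-out accuracy, feature importances sorted high to low).
    """
    # One-hot encode object columns; drop_first removes one redundant level per feature
    X = pd.get_dummies(df.drop(columns=target_col), drop_first=True)
    y = (df[target_col] == "satisfied").astype(int)

    # Stratify so both splits keep the same satisfied/other ratio
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state
    )

    model = DecisionTreeClassifier(max_depth=max_depth, random_state=random_state)
    model.fit(X_train, y_train)

    importances = pd.Series(model.feature_importances_, index=X.columns)
    return model, model.score(X_test, y_test), importances.sort_values(ascending=False)


# Hypothetical toy data: satisfaction depends only on 'Seat comfort' >= 3
n = 48
seat = [i % 6 for i in range(n)]
toy = pd.DataFrame({
    "satisfaction": ["satisfied" if s >= 3 else "dissatisfied" for s in seat],
    "Seat comfort": seat,
    "Class": ["Business" if i % 2 == 0 else "Eco" for i in range(n)],
    "Age": [20 + i for i in range(n)],
})
model, test_acc, importances = fit_tree(toy)
print(f"held-out accuracy: {test_acc:.2f}")
print(importances)
```

Tree splits are threshold comparisons, so scaling features generally does not change a decision tree's fit. Any imputation statistic, such as the arrival-delay median, should be computed from the training split only to avoid leakage.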
