# Extra Examples - Pivoting

Let's use the customer satisfaction dataset for this example: https://www.kaggle.com/johndddddd/customer-satisfaction

We'll load the data with `pd.read_excel`, so you might need to run `pip install xlrd` (or `pip install openpyxl` on recent pandas versions, which use it for `.xlsx` files) and restart your kernel. Be warned: the Excel file takes a long time to load. If you have any issues, I'll also save it out as a CSV and include that so you can use `pd.read_csv` instead.
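
As a rough sketch of that fallback, the helper below prefers the CSV copy when it is already on disk and only parses the Excel file otherwise. The function name `load_satisfaction` and its default paths are assumptions made for illustration; they mirror the file names used in the next cell but are not part of the original notebook.

```python
from pathlib import Path

import pandas as pd


def load_satisfaction(xlsx_path="satisfaction.xlsx", csv_path="satisfaction.csv"):
    """Load the survey data, preferring the CSV copy when it exists (hypothetical helper)."""
    if Path(csv_path).exists():
        # Reading the CSV avoids parsing the Excel workbook again
        return pd.read_csv(csv_path)
    # Fall back to the original workbook; this needs an Excel engine installed
    return pd.read_excel(xlsx_path)
```

Calling `df = load_satisfaction()` would then behave like the cell below on the first run and skip the Excel parse on later runs, once `satisfaction.csv` has been written.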

Let's investigate:

1. Assign a numeric ranking to "satisfaction_v2".
2. Pivot to show average satisfaction by gender and class.
3. What is most correlated with satisfaction?
4. Are the online features correlated with each other?

In [13]:
import pandas as pd
df = pd.read_excel("satisfaction.xlsx")
df.to_csv("satisfaction.csv", index=False, float_format="%0.1f")
df.head()

,id,satisfaction_v2,Gender,Customer Type,Age,Type of Travel,Class,Flight Distance,Seat comfort,Departure/Arrival time convenient,...,Online support,Ease of Online booking,On-board service,Leg room service,Baggage handling,Checkin service,Cleanliness,Online boarding,Departure Delay in Minutes,Arrival Delay in Minutes
0,11112,satisfied,Female,Loyal Customer,65,Personal Travel,Eco,265,0,0,...,2,3,3,0,3,5,3,2,0,0.0
1,110278,satisfied,Male,Loyal Customer,47,Personal Travel,Business,2464,0,0,...,2,3,4,4,4,2,3,2,310,305.0
2,103199,satisfied,Female,Loyal Customer,15,Personal Travel,Eco,2138,0,0,...,2,2,3,3,4,4,4,2,0,0.0
3,47462,satisfied,Female,Loyal Customer,60,Personal Travel,Eco,623,0,0,...,3,1,1,0,1,4,1,3,0,0.0
4,120011,satisfied,Female,Loyal Customer,70,Personal Travel,Eco,354,0,0,...,4,2,2,0,2,4,2,5,0,0.0


In [2]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 129880 entries, 0 to 129879
Data columns (total 24 columns):
 #   Column                             Non-Null Count   Dtype  
---  ------                             --------------   -----  
 0   id                                 129880 non-null  int64  
 1   satisfaction_v2                    129880 non-null  object 
 2   Gender                             129880 non-null  object 
 3   Customer Type                      129880 non-null  object 
 4   Age                                129880 non-null  int64  
 5   Type of Travel                     129880 non-null  object 
 6   Class                              129880 non-null  object 
 7   Flight Distance                    129880 non-null  int64  
 8   Seat comfort                       129880 non-null  int64  
 9   Departure/Arrival time convenient  129880 non-null  int64  
 10  Food and drink                     129880 non-null  int64  
 11  Gate location                      1298

## Make satisfaction_v2 numeric

In [3]:
df.satisfaction_v2.unique()

array(['satisfied', 'neutral or dissatisfied'], dtype=object)

In [14]:
# 1 for 'satisfied', 0 for 'neutral or dissatisfied'
df["satisfaction_binary"] = (df["satisfaction_v2"] == "satisfied").astype(int)
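
An alternative worth considering is an explicit mapping, which makes the ranking visible and surfaces unexpected labels instead of silently scoring them as 0. The sketch below runs on a small made-up frame (`toy`), not the real data; the two labels are the ones returned by `unique()` above.

```python
import pandas as pd

# Toy stand-in for the real column, using the two labels seen in the data
toy = pd.DataFrame(
    {"satisfaction_v2": ["satisfied", "neutral or dissatisfied", "satisfied"]}
)

mapping = {"satisfied": 1, "neutral or dissatisfied": 0}
toy["satisfaction_binary"] = toy["satisfaction_v2"].map(mapping)

# map() leaves NaN for any label missing from the dict, so this check fails loudly
assert toy["satisfaction_binary"].notna().all()
print(toy["satisfaction_binary"].tolist())  # [1, 0, 1]
```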

In [5]:
df.head()

,id,satisfaction_v2,Gender,Customer Type,Age,Type of Travel,Class,Flight Distance,Seat comfort,Departure/Arrival time convenient,...,Ease of Online booking,On-board service,Leg room service,Baggage handling,Checkin service,Cleanliness,Online boarding,Departure Delay in Minutes,Arrival Delay in Minutes,satisfaction_binary
0,11112,satisfied,Female,Loyal Customer,65,Personal Travel,Eco,265,0,0,...,3,3,0,3,5,3,2,0,0.0,1
1,110278,satisfied,Male,Loyal Customer,47,Personal Travel,Business,2464,0,0,...,3,4,4,4,2,3,2,310,305.0,1
2,103199,satisfied,Female,Loyal Customer,15,Personal Travel,Eco,2138,0,0,...,2,3,3,4,4,4,2,0,0.0,1
3,47462,satisfied,Female,Loyal Customer,60,Personal Travel,Eco,623,0,0,...,1,1,0,1,4,1,3,0,0.0,1
4,120011,satisfied,Female,Loyal Customer,70,Personal Travel,Eco,354,0,0,...,2,2,0,2,4,2,5,0,0.0,1


## Satisfaction based on gender and class

In [8]:
df.pivot_table(index="Class", columns="Gender", values='satisfaction_binary', aggfunc='mean')

Class / Gender,Female,Male
Business,0.720628,0.697997
Eco,0.590799,0.19009
Eco Plus,0.57793,0.258493
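
Because `satisfaction_binary` is 0 or 1, each cell's mean is simply the share of satisfied customers in that group. The sketch below uses a small made-up frame (`toy`) to show that reading, and that the pivot is equivalent to a `groupby` followed by `unstack`.

```python
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({
    "Class": ["Eco", "Eco", "Business", "Business", "Eco", "Business"],
    "Gender": ["Female", "Male", "Female", "Male", "Female", "Male"],
    "satisfaction_binary": [1, 0, 1, 1, 0, 0],
})

# Same call shape as the cell above
pivoted = toy.pivot_table(index="Class", columns="Gender",
                          values="satisfaction_binary", aggfunc="mean")

# Equivalent long-hand: group, average, then spread Gender into columns
grouped = (toy.groupby(["Class", "Gender"])["satisfaction_binary"]
              .mean()
              .unstack("Gender"))

print(pivoted)
print(pivoted.loc["Eco", "Female"])  # 0.5: one of the two Eco/Female rows is satisfied
```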


## What is most correlated with satisfaction?

In [12]:
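# numeric_only=True skips the text columns (needed on pandas 2.0+, where corr() no longer drops them by default)
df.corr(numeric_only=True).satisfaction_binary.sort_values(ascending=False)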

satisfaction_binary                  1.000000
Inflight entertainment               0.523496
Ease of Online booking               0.431772
Online support                       0.390143
On-board service                     0.352047
Online boarding                      0.338147
Leg room service                     0.304928
Checkin service                      0.266179
Baggage handling                     0.260347
Cleanliness                          0.259330
Seat comfort                         0.242384
Inflight wifi service                0.227062
Food and drink                       0.120677
Age                                  0.117971
id                                   0.013728
Gate location                       -0.012071
Departure/Arrival time convenient   -0.015507
Flight Distance                     -0.039224
Departure Delay in Minutes          -0.073909
Arrival Delay in Minutes            -0.080691
Name: satisfaction_binary, dtype: float64
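
The listing above includes the target itself and the `id` column, neither of which is informative. One possible variation, sketched here on made-up data (`toy`), drops both and uses `corrwith` to correlate every remaining column against the target directly.

```python
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "Seat comfort": [1, 2, 2, 4, 5, 5],
    "Gate location": [3, 1, 4, 1, 5, 2],
    "satisfaction_binary": [0, 0, 0, 1, 1, 1],
})

target = toy["satisfaction_binary"]
# Leave out the identifier and the target so they do not clutter the ranking
features = toy.drop(columns=["id", "satisfaction_binary"])

ranked = features.corrwith(target).sort_values(ascending=False)
print(ranked)
```

In this toy frame "Seat comfort" rises with the target and ranks first, while "Gate location" has the same average in both groups and so shows no linear relationship.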

## Can we check if the online features have duplicate info?

This is a very open question, so there are a lot of ways to go about it. Looking at correlations is one; checking how often each combination of ratings occurs across the features is another.
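
The frequency approach is not worked through in the cells that follow, so here is a minimal sketch of what it could look like, using `pd.crosstab` on a made-up frame (`toy`). The column names match the real data, but the ratings are invented for illustration.

```python
import pandas as pd

# Made-up ratings for illustration only
toy = pd.DataFrame({
    "Online support": [1, 2, 2, 4, 5, 5, 3, 3],
    "Online boarding": [1, 2, 3, 4, 5, 5, 3, 2],
})

# Count how often each pair of ratings occurs together
counts = pd.crosstab(toy["Online support"], toy["Online boarding"])
print(counts)

# Share of rows where the two ratings are identical
agreement = (toy["Online support"] == toy["Online boarding"]).mean()
print(agreement)  # 0.75
```

If most of the counts sit on the diagonal, the two features are largely giving the same rating and may be carrying duplicate information.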

In [24]:
df.corr(numeric_only=True)['Ease of Online booking'].sort_values()

Arrival Delay in Minutes            -0.039806
Departure Delay in Minutes          -0.036545
Flight Distance                     -0.022299
id                                   0.000830
Gate location                        0.001442
Departure/Arrival time convenient    0.001755
Food and drink                       0.041189
Age                                  0.071594
Checkin service                      0.137744
Seat comfort                         0.211531
Inflight entertainment               0.321731
Leg room service                     0.355122
Baggage handling                     0.398322
Cleanliness                          0.417675
satisfaction_binary                  0.431772
On-board service                     0.436264
Inflight wifi service                0.601100
Online support                       0.617489
Online boarding                      0.684320
Ease of Online booking               1.000000
Name: Ease of Online booking, dtype: float64

In [25]:
df.corr(numeric_only=True)['Online support'].sort_values()

Arrival Delay in Minutes            -0.036087
Departure Delay in Minutes          -0.034018
Flight Distance                     -0.032022
Departure/Arrival time convenient   -0.000546
Gate location                        0.002908
Food and drink                       0.028554
id                                   0.054023
Cleanliness                          0.095726
Baggage handling                     0.102444
Seat comfort                         0.120278
Age                                  0.121201
Leg room service                     0.138433
On-board service                     0.157930
Checkin service                      0.206824
satisfaction_binary                  0.390143
Inflight entertainment               0.441957
Inflight wifi service                0.557340
Ease of Online booking               0.617489
Online boarding                      0.669843
Online support                       1.000000
Name: Online support, dtype: float64

In [26]:
df.corr(numeric_only=True)['Online boarding'].sort_values()

Arrival Delay in Minutes            -0.021784
Departure Delay in Minutes          -0.020045
Gate location                       -0.003043
Departure/Arrival time convenient   -0.000623
Flight Distance                      0.009604
Food and drink                       0.013587
id                                   0.025328
Age                                  0.037973
Cleanliness                          0.106238
Baggage handling                     0.111920
Leg room service                     0.112900
Seat comfort                         0.130396
On-board service                     0.139506
Checkin service                      0.184344
satisfaction_binary                  0.338147
Inflight entertainment               0.355714
Inflight wifi service                0.631786
Online support                       0.669843
Ease of Online booking               0.684320
Online boarding                      1.000000
Name: Online boarding, dtype: float64
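
In the three outputs above, the online features correlate with each other at roughly 0.62 to 0.68, noticeably higher than with most other columns. The three separate lookups can be condensed into one small correlation matrix; the sketch below shows the idea on made-up data (`toy`), with `online_cols` as an assumed name for the list of columns of interest.

```python
import pandas as pd

online_cols = ["Ease of Online booking", "Online support", "Online boarding"]

# Made-up ratings for illustration only
toy = pd.DataFrame({
    "Ease of Online booking": [1, 2, 3, 4, 5, 3],
    "Online support": [1, 2, 3, 5, 4, 3],
    "Online boarding": [2, 2, 3, 4, 5, 3],
    "Age": [30, 41, 25, 58, 36, 47],
})

# Select only the online columns, then compute their pairwise correlations
online_corr = toy[online_cols].corr()
print(online_corr.round(2))
```

On the real data the same selection would be `df[online_cols].corr()`, which gives all three pairwise values in a single 3x3 table.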