# Extra Examples - Pivoting

Let's use the customer satisfaction dataset for this example: https://www.kaggle.com/johndddddd/customer-satisfaction

The original data is an Excel file, so loading it with `pd.read_excel` may require you to run `pip install xlrd` and restart your kernel. Be warned: loading it that way takes a long time. In case you have any issues, I've also saved it out as a CSV so you can use `read_csv`, which is what the code below does.

Let's investigate:

1. Assign a numeric ranking to "satisfaction_v2" (essentially a dummy variable).
2. Pivot to show average satisfaction by gender and class.
3. What is most correlated with satisfaction?
4. Are the online features correlated in count?

In [1]:
import pandas as pd
df = pd.read_csv("satisfaction.csv")
#df.to_csv("satisfaction.csv", index=False, float_format="%0.1f")
df.head()

Unnamed: 0,id,satisfaction_v2,Gender,Customer Type,Age,Type of Travel,Class,Flight Distance,Seat comfort,Departure/Arrival time convenient,...,Online support,Ease of Online booking,On-board service,Leg room service,Baggage handling,Checkin service,Cleanliness,Online boarding,Departure Delay in Minutes,Arrival Delay in Minutes
0,11112,satisfied,Female,Loyal Customer,65,Personal Travel,Eco,265,0,0,...,2,3,3,0,3,5,3,2,0,0.0
1,110278,satisfied,Male,Loyal Customer,47,Personal Travel,Business,2464,0,0,...,2,3,4,4,4,2,3,2,310,305.0
2,103199,satisfied,Female,Loyal Customer,15,Personal Travel,Eco,2138,0,0,...,2,2,3,3,4,4,4,2,0,0.0
3,47462,satisfied,Female,Loyal Customer,60,Personal Travel,Eco,623,0,0,...,3,1,1,0,1,4,1,3,0,0.0
4,120011,satisfied,Female,Loyal Customer,70,Personal Travel,Eco,354,0,0,...,4,2,2,0,2,4,2,5,0,0.0


In [2]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 129880 entries, 0 to 129879
Data columns (total 24 columns):
 #   Column                             Non-Null Count   Dtype  
---  ------                             --------------   -----  
 0   id                                 129880 non-null  int64  
 1   satisfaction_v2                    129880 non-null  object 
 2   Gender                             129880 non-null  object 
 3   Customer Type                      129880 non-null  object 
 4   Age                                129880 non-null  int64  
 5   Type of Travel                     129880 non-null  object 
 6   Class                              129880 non-null  object 
 7   Flight Distance                    129880 non-null  int64  
 8   Seat comfort                       129880 non-null  int64  
 9   Departure/Arrival time convenient  129880 non-null  int64  
 10  Food and drink                     129880 non-null  int64  
 11  Gate location                      1298

## Make satisfaction_v2 numeric

In [3]:
# your code here
def dummy(x):
    """Return 1 for 'satisfied' and 0 for any other label."""
    return 1 if x == "satisfied" else 0

In [4]:
# Option 1: row-wise apply (works, but slow on ~130k rows)
df['dummy'] = df.apply(lambda row: dummy(row['satisfaction_v2']), axis=1)

In [5]:
# Option 2: list comprehension over the single column (same result)
df['dummy'] = [dummy(x) for x in df['satisfaction_v2']]
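
A vectorized alternative is worth knowing, since it avoids calling a Python function once per row. The following is a minimal sketch on a small made-up frame (`toy` is a hypothetical stand-in, only the column name and labels come from the dataset), not the notebook's own solution:

```python
import pandas as pd

# Hypothetical stand-in for the real data; only the column name and labels match.
toy = pd.DataFrame(
    {"satisfaction_v2": ["satisfied", "neutral or dissatisfied", "satisfied"]}
)

# The comparison yields a boolean Series; astype(int) converts True/False to 1/0.
toy["dummy"] = (toy["satisfaction_v2"] == "satisfied").astype(int)

# An explicit mapping does the same job; labels missing from the dict become NaN
# instead of silently turning into 0, which can help catch unexpected values.
toy["dummy_map"] = toy["satisfaction_v2"].map(
    {"satisfied": 1, "neutral or dissatisfied": 0}
)

print(toy)
```

On the real data the same idea would be `df["dummy"] = (df["satisfaction_v2"] == "satisfied").astype(int)`.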

In [6]:
df

Unnamed: 0,id,satisfaction_v2,Gender,Customer Type,Age,Type of Travel,Class,Flight Distance,Seat comfort,Departure/Arrival time convenient,...,Ease of Online booking,On-board service,Leg room service,Baggage handling,Checkin service,Cleanliness,Online boarding,Departure Delay in Minutes,Arrival Delay in Minutes,dummy
0,11112,satisfied,Female,Loyal Customer,65,Personal Travel,Eco,265,0,0,...,3,3,0,3,5,3,2,0,0.0,1
1,110278,satisfied,Male,Loyal Customer,47,Personal Travel,Business,2464,0,0,...,3,4,4,4,2,3,2,310,305.0,1
2,103199,satisfied,Female,Loyal Customer,15,Personal Travel,Eco,2138,0,0,...,2,3,3,4,4,4,2,0,0.0,1
3,47462,satisfied,Female,Loyal Customer,60,Personal Travel,Eco,623,0,0,...,1,1,0,1,4,1,3,0,0.0,1
4,120011,satisfied,Female,Loyal Customer,70,Personal Travel,Eco,354,0,0,...,2,2,0,2,4,2,5,0,0.0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
129875,119211,satisfied,Female,disloyal Customer,29,Personal Travel,Eco,1731,5,5,...,2,3,3,4,4,4,2,0,0.0,1
129876,97768,neutral or dissatisfied,Male,disloyal Customer,63,Personal Travel,Business,2087,2,3,...,3,2,3,3,1,2,1,174,172.0,0
129877,125368,neutral or dissatisfied,Male,disloyal Customer,69,Personal Travel,Eco,2320,3,0,...,4,4,3,4,2,3,2,155,163.0,0
129878,251,neutral or dissatisfied,Male,disloyal Customer,66,Personal Travel,Eco,2450,3,2,...,3,3,2,3,2,1,2,193,205.0,0


## Satisfaction based on gender and class

In [7]:
# your code here
# Average the dummy per (Gender, Class) pair, then pivot; .T puts Class on the rows
a = df.groupby(['Gender', 'Class'])['dummy'].mean().reset_index()
a.pivot(index="Gender", columns="Class", values="dummy").T

Class (rows) / Gender (columns),Female,Male
Business,0.720628,0.697997
Eco,0.590799,0.19009
Eco Plus,0.57793,0.258493


In [8]:
df.pivot_table(index="Gender", columns="Class", values="dummy", aggfunc="mean")

Gender (rows) / Class (columns),Business,Eco,Eco Plus
Female,0.720628,0.590799,0.57793
Male,0.697997,0.19009,0.258493
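
The two cells above reach the same table by different routes: aggregate first and then pivot, or let `pivot_table` aggregate for you. A small sketch on made-up data (the `toy` frame and its values are hypothetical) shows the equivalence, here using `unstack` for the reshaping step:

```python
import pandas as pd

# Hypothetical toy data using the same column names as the dataset.
toy = pd.DataFrame(
    {
        "Gender": ["Female", "Female", "Male", "Male", "Male"],
        "Class": ["Eco", "Business", "Eco", "Eco", "Business"],
        "dummy": [1, 1, 0, 1, 1],
    }
)

# Route 1: pivot_table groups and averages in one call.
via_pivot = toy.pivot_table(
    index="Gender", columns="Class", values="dummy", aggfunc="mean"
)

# Route 2: group, average, then move the Class level into the columns.
via_groupby = toy.groupby(["Gender", "Class"])["dummy"].mean().unstack("Class")

print(via_pivot)
print(via_groupby)
```

Plain `pivot` only reshapes and raises an error on duplicate index/column pairs, which is why the first route in the notebook needs the `groupby` step beforehand.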


## What is most correlated with satisfaction?

In [46]:
# your code here
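# numeric_only=True skips the text columns (pandas 2.0+ raises on them otherwise)
df.corr(numeric_only=True)["dummy"].drop("dummy").sort_values().tail(1)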

Inflight entertainment    0.523496
Name: dummy, dtype: float64
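
Taking the last value of an ascending sort only finds the strongest positive correlation. If strong negative relationships matter too, ranking by absolute value is one option. This is a sketch on a made-up frame (`toy` and its numbers are hypothetical and chosen only for illustration); `select_dtypes` is used here to drop text columns so it also runs on older pandas versions:

```python
import pandas as pd

# Hypothetical toy data; column names mirror the dataset, values are invented.
toy = pd.DataFrame(
    {
        "Gender": ["F", "M", "F", "M", "F", "M"],
        "Inflight entertainment": [5, 4, 4, 2, 1, 1],
        "Gate location": [3, 1, 4, 1, 5, 2],
        "dummy": [1, 1, 1, 0, 0, 0],
    }
)

# Keep numeric columns only, correlate, and drop the self-correlation entry.
corr = toy.select_dtypes(include="number").corr()["dummy"].drop("dummy")

# Rank by magnitude so strong negative correlations are not overlooked.
ranked = corr.abs().sort_values(ascending=False)
top = ranked.index[0]

print(ranked)
print("Most correlated feature:", top)
```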

## Can we check if Online features have duplicate info?

This is a very open question, so there are a lot of ways to go about it. Correlation is one option;
cross-tabulating how often each combination of scores occurs is another.
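
Both ideas can be sketched as follows. The code for the cell below was not preserved, so this is an assumption about one reasonable approach rather than the original solution; the `toy` frame and its values are hypothetical, while the column names come from the dataset:

```python
import pandas as pd

# Hypothetical toy ratings; the three column names match the dataset.
toy = pd.DataFrame(
    {
        "Online support": [1, 1, 2, 2, 3, 3, 4, 5],
        "Ease of Online booking": [1, 1, 2, 2, 3, 3, 4, 5],
        "Online boarding": [1, 2, 2, 2, 3, 4, 4, 5],
    }
)
online_cols = ["Online support", "Ease of Online booking", "Online boarding"]

# Approach 1: pairwise correlation between the online features.
corr = toy[online_cols].corr()

# Approach 2: count how often each combination of scores occurs.
# Rows are (Ease of Online booking, Online support); columns are Online boarding.
counts = pd.crosstab(
    [toy["Ease of Online booking"], toy["Online support"]],
    toy["Online boarding"],
)

print(corr)
print(counts)
```

If most of the counts sit where all three scores agree, the features carry largely overlapping information.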

In [58]:
# your code here

Ease of Online booking,Online support,Online boarding = 0,Online boarding = 1,Online boarding = 2,Online boarding = 3,Online boarding = 4,Online boarding = 5
0,2,4,1,0,0,0,0
0,3,3,0,1,1,1,0
0,4,6,0,0,1,0,0
1,1,0,8955,28,22,21,9
1,2,0,404,220,79,67,7
1,3,0,479,244,404,250,5
1,4,0,436,193,300,488,118
1,5,0,230,10,122,107,238
2,1,0,157,579,38,38,36
2,2,0,149,11225,145,127,25
