## 2021: Week 23 - NPS for Airlines

This week Prep Air are looking into their Net Promoter Score (NPS) and how this compares with a variety of other new airlines. NPS usually takes the form of asking customers "How likely are you to recommend this company on a scale of 0-10?" You then subtract the percentage of detractors from the percentage of promoters and end up with a score between -100 and +100. The higher the NPS, the better!
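As a quick illustration of the arithmetic, here is a minimal sketch using made-up ratings. The `ratings` list and the `nps` helper are hypothetical and not part of the challenge data; the sketch assumes the usual bands of 0-6 for detractors and 9-10 for promoters.

```python
def nps(ratings):
    """Return % promoters minus % detractors for a list of 0-10 ratings."""
    promoters = sum(r >= 9 for r in ratings)
    detractors = sum(r <= 6 for r in ratings)
    return 100 * (promoters - detractors) / len(ratings)

# Hypothetical ratings: 5 promoters, 3 passives, 2 detractors
ratings = [10, 9, 9, 10, 9, 8, 7, 7, 3, 6]
example_nps = nps(ratings)
print(example_nps)  # 30.0
```

Passives count towards the total number of responses but towards neither side of the subtraction.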

However, like most metrics, it doesn't tell you a lot on its own. Do customers feel strongly one way or the other about any airline? To find out, it would be good to compare Prep Air's NPS with other airlines' Net Promoter Scores too! In this challenge we'll use Z-Scores to standardise the scores and see whether Prep Air is above or below average.

### Input
1. Prep Air's customer ratings:

![img](https://lh3.googleusercontent.com/-ADGamsiZ3x8/YKFA_jX_tUI/AAAAAAAAA0M/bLB5P05KjqYqlnYFXMRcUfBcrW_gm9sdwCLcBGAsYHQ/image.png)

2. Other airlines' customer ratings

### Requirement
- Input the data
- Combine Prep Air dataset with other airlines
- Exclude any airlines that have had fewer than 50 customers respond
- Classify customer responses to the question in the following way:
    - 0-6 = Detractors
    - 7-8 = Passive
    - 9-10 = Promoters
- Calculate the NPS for each airline
    - NPS = % Promoters - % Detractors
        - Note: I rounded the %s down to the nearest whole number, so if your answer differs slightly from mine then this could be why! 
- Calculate the average and standard deviation of the dataset
- Take each airline's NPS and subtract the average, then divide this by the standard deviation
- Filter to just show Prep Air's NPS along with their Z-Score
- Output the data
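
In symbols, the two calculation steps above amount to the following, where $\mu$ and $\sigma$ are the mean and standard deviation of the per-airline NPS values (the symbols are notation introduced here, not taken from the challenge):

$$
\mathrm{NPS} = \%\,\text{Promoters} - \%\,\text{Detractors}, \qquad z = \frac{\mathrm{NPS} - \mu}{\sigma}
$$

A positive $z$ means the airline's NPS is above the average; a negative $z$ means it is below.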

### Output
![img](https://lh3.googleusercontent.com/-d9BUQRa31c4/YKFB3mZbWTI/AAAAAAAAA0U/cA2n_CNs34I_zLqvdBWOFRSkiMZ6WyqXgCLcBGAsYHQ/w200-h64/image.png)

3 fields
- Airline
- NPS
- Z-Score

1 row (2 including headers)

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Input the data

In [3]:
data = pd.read_excel("./data/NPS Input.xlsx", sheet_name=["Airlines", "Prep Air"])
airlines = data["Airlines"].copy()
prep = data["Prep Air"].copy()

In [8]:
airlines.head()

| | Airline | CustomerID | How likely are you to recommend this airline? |
|---|---|---|---|
| 0 | Schmeler, Schimmel and Collier | 013d950 | 6 |
| 1 | Schmeler, Schimmel and Collier | 0d25185 | 10 |
| 2 | Schmeler, Schimmel and Collier | a1b541d | 10 |
| 3 | Schmeler, Schimmel and Collier | 6b24ea8 | 9 |
| 4 | Schmeler, Schimmel and Collier | d5f96ab | 7 |


In [6]:
### Combine Prep Air dataset with other airlines

In [10]:
airlines = pd.concat([airlines, prep], axis=0)
airlines.shape

(5907, 3)
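
A small aside on `pd.concat`: stacking frames row-wise keeps each frame's original index, so row labels repeat unless you ask for a fresh one. The sketch below uses hypothetical toy frames (`others`, `prep_toy`) to show `ignore_index=True`; it is optional for this challenge, since the later steps never rely on the row index.

```python
import pandas as pd

# Hypothetical frames with the same columns
others = pd.DataFrame({"Airline": ["A", "B"], "Score": [9, 4]})
prep_toy = pd.DataFrame({"Airline": ["Prep Air"], "Score": [7]})

# ignore_index=True renumbers the rows 0..n-1 instead of repeating 0
combined = pd.concat([others, prep_toy], ignore_index=True)
print(combined.index.tolist())  # [0, 1, 2]
```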

In [None]:
### Exclude any airlines that have had fewer than 50 customers respond

In [28]:
grouped = airlines.groupby(["Airline"])["CustomerID"].count()
less_than_50 = list(grouped[grouped < 50].index)
airlines = airlines.loc[~airlines["Airline"].isin(less_than_50), :]
airlines["Airline"].value_counts().tail()

Toy, Bartell and Williamson    52
Satterfield Inc                52
Hamill, Koepp and Robel        52
Simonis Inc                    51
Walsh LLC                      50
Name: Airline, dtype: int64
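
An alternative way to express the same filter, sketched on hypothetical toy data with a lower threshold, is to broadcast each airline's response count back onto its rows with `transform` and filter in one step. The names `toy`, `min_responses` and `kept` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data: airline "A" has 3 responses, "B" has 1
toy = pd.DataFrame({
    "Airline": ["A", "A", "A", "B"],
    "CustomerID": ["c1", "c2", "c3", "c4"],
})
min_responses = 2  # the challenge uses 50

# Broadcast each airline's response count back onto its rows, then filter
counts = toy.groupby("Airline")["CustomerID"].transform("count")
kept = toy[counts >= min_responses]
print(kept["Airline"].unique())  # ['A']
```

This avoids building an intermediate list of airline names, at the cost of being slightly less explicit about which airlines were dropped.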

In [None]:
### Classify customer responses to the question in the following way:

In [51]:
def nps_classification(x):
    if x <= 6:
        return "Detractors"
    elif 7 <= x <= 8:
        return "Passive"
    else:
        return "Promoters"

In [52]:
airlines = airlines.rename(columns={"How likely are you to recommend this airline?": "Score"})
airlines["NPS"] = airlines["Score"].map(nps_classification)
airlines["NPS"].value_counts()

Passive       1770
Promoters     1739
Detractors    1253
Name: NPS, dtype: int64
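
The same classification can also be done without a Python-level function by binning the scores with `pd.cut`. This is a sketch on hypothetical ratings; the bin edges are chosen so that the right-inclusive intervals match the 0-6, 7-8 and 9-10 bands.

```python
import pandas as pd

scores = pd.Series([0, 6, 7, 8, 9, 10])  # hypothetical ratings

# Bins are right-inclusive: (-1, 6], (6, 8], (8, 10]
labels = pd.cut(
    scores,
    bins=[-1, 6, 8, 10],
    labels=["Detractors", "Passive", "Promoters"],
)
print(labels.tolist())
# ['Detractors', 'Detractors', 'Passive', 'Passive', 'Promoters', 'Promoters']
```

Note that `pd.cut` returns a categorical column, and scores outside the bins become missing values rather than falling through to "Promoters".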

### Calculate the NPS for each airline
- NPS = % Promoters - % Detractors
- Note: I rounded the %s down to the nearest whole number, so if your answer differs slightly from mine then this could be why! 
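
One way to apply the rounding-down note literally is to floor each percentage before subtracting. The sketch below does this on hypothetical toy data using a row-normalised `pd.crosstab`; the frame `toy` and the names `pct` and `nps_floor` are assumptions for illustration. The notebook cells in this section round the difference instead, so results can differ slightly between the two approaches.

```python
import numpy as np
import pandas as pd

# Hypothetical classified responses for two airlines
toy = pd.DataFrame({
    "Airline": ["A"] * 3 + ["B"] * 3,
    "NPS": ["Promoters", "Promoters", "Detractors",
            "Promoters", "Passive", "Detractors"],
})

# Row-normalised crosstab gives each category's share per airline, in percent
pct = pd.crosstab(toy["Airline"], toy["NPS"], normalize="index") * 100
# Make sure all three categories exist, even if one never occurs
pct = pct.reindex(columns=["Detractors", "Passive", "Promoters"], fill_value=0)

# Round each percentage down before subtracting, as the note describes
nps_floor = np.floor(pct["Promoters"]) - np.floor(pct["Detractors"])
print(nps_floor.to_dict())  # {'A': 33.0, 'B': 0.0}
```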

In [61]:
grouped = airlines.groupby("Airline")
grouped.get_group("Prep Air")["NPS"].value_counts()

Passive       75
Promoters     74
Detractors    35
Name: NPS, dtype: int64

In [73]:
prep_counts = grouped.get_group("Prep Air")["NPS"].value_counts()
# Look categories up by label rather than by position in the sorted index
detractors = prep_counts["Detractors"]
promoters = prep_counts["Promoters"]
total = prep_counts.sum()
np.round((promoters / total) - (detractors / total), 2) * 100

21.0

In [76]:
def calculate_NPS_airline(df_):
    """Return a dict mapping each airline name to its NPS."""
    name_nps = {}
    # Group by a plain column name so each group key is a string
    group = df_.groupby("Airline")
    for name in df_["Airline"].value_counts().index:
        counts = group.get_group(name)["NPS"].value_counts()
        # Look categories up by label; .get() returns 0 if one is absent
        detractors = counts.get("Detractors", 0)
        promoters = counts.get("Promoters", 0)
        total = counts.sum()
        nps = np.round((promoters / total) - (detractors / total), 2) * 100
        name_nps[name] = nps
    return name_nps

In [87]:
name_nps = calculate_NPS_airline(airlines)
name_nps = pd.Series(name_nps).sort_values(ascending=False)
name_nps.head(10)

Maggio Group                      39.0
Kautzer-Langworth                 33.0
Mueller Group                     28.0
Stracke and Sons                  25.0
Farrell and Sons                  25.0
Crist Group                       22.0
Schmeler, Schimmel and Collier    22.0
Prep Air                          21.0
Von, Brown and Frami              20.0
Goyette Inc                       20.0
dtype: float64

In [89]:
### Calculate the average and standard deviation of the dataset

In [92]:
nps_mean = name_nps.mean()
nps_std = name_nps.std()
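
One detail worth knowing here: pandas' `Series.std()` defaults to the sample standard deviation (`ddof=1`), whereas NumPy's `np.std` defaults to the population version (`ddof=0`). The challenge does not say which one is intended, so the sketch below, on hypothetical values, just shows how to get either.

```python
import numpy as np
import pandas as pd

nps_values = pd.Series([10.0, 20.0, 30.0])  # hypothetical NPS values

sample_std = nps_values.std()            # pandas default: ddof=1
population_std = nps_values.std(ddof=0)  # matches np.std's default
print(sample_std, population_std)
```

With only a few dozen airlines the two differ slightly, which can be enough to shift a Z-Score in the third decimal place.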

In [94]:
### Take each airline's NPS and subtract the average, then divide this by the standard deviation

In [109]:
z_score = (name_nps - nps_mean).div(nps_std).round(3)
output = (
    pd.concat([name_nps, z_score], axis=1)
    .reset_index()
    .rename(columns={"index": "Airline", 0: "NPS", 1: "Z-Score"})
)
output.head()

| | Airline | NPS | Z-Score |
|---|---|---|---|
| 0 | Maggio Group | 39.0 | 2.956 |
| 1 | Kautzer-Langworth | 33.0 | 2.339 |
| 2 | Mueller Group | 28.0 | 1.825 |
| 3 | Stracke and Sons | 25.0 | 1.517 |
| 4 | Farrell and Sons | 25.0 | 1.517 |


In [None]:
### Filter to just show Prep Air's NPS along with their Z-Score

In [116]:
prep_air = output.loc[output["Airline"] == "Prep Air", :]
prep_air

| | Airline | NPS | Z-Score |
|---|---|---|---|
| 7 | Prep Air | 21.0 | 1.106 |


In [None]:
### Output the data

In [None]:
# NOTE: "./" is a directory; to_csv needs the path of a target file
prep_air.to_csv("./")
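
As a minimal sketch of the output step, the example below writes a one-row frame to a CSV file and reads it back. The file name `nps_output.csv` and the temporary directory are hypothetical, since the notebook does not name an output file; the row values are the Prep Air result shown above.

```python
import os
import tempfile

import pandas as pd

# One-row result standing in for the filtered output
result = pd.DataFrame(
    {"Airline": ["Prep Air"], "NPS": [21.0], "Z-Score": [1.106]}
)

# Hypothetical file name; to_csv needs a file path, not a directory
out_path = os.path.join(tempfile.mkdtemp(), "nps_output.csv")
result.to_csv(out_path, index=False)  # index=False keeps just the 3 fields

check = pd.read_csv(out_path)
print(list(check.columns))  # ['Airline', 'NPS', 'Z-Score']
```

Passing `index=False` matters here because the expected output has exactly three fields, and the default would add the row index as a fourth column.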