# Preppin' Data
## 2024: Week 8 - Prep Air Loyalty
**Created by:** Jenny Martin | [Challenge Link](https://preppindata.blogspot.com/2024/02/2024-week-8-prep-air-loyalty.html)

For this week's challenge, Prep Air have asked for some What If? Analysis. <br>
They're considering 2 different systems for rewarding customer loyalty and want to understand how that might impact cost and how many customers might benefit from the program.

In [1]:
# Input the csv file
import pandas as pd
import numpy as np
customers = pd.read_csv("Prep Air Updated Customers.csv", parse_dates=["First Flight", "Last Date Flown"]).sort_index()
customers

Unnamed: 0,Customer ID,first_name,last_name,email,gender,First Flight,Last Date Flown,Number of Flights
0,1,Denyse,Gebuhr,dgebuhr0@vinaora.com,Female,2023-01-05,2023-01-05,1
1,2,Keene,Devennie,kdevennie1@plala.or.jp,Male,2023-10-05,2023-10-05,1
2,3,Tyler,McGrail,tmcgrail2@nyu.edu,Male,2022-07-18,2023-11-09,27
3,4,Drusi,Ibeson,dibeson3@hostgator.com,Female,2022-05-28,2023-11-22,32
4,5,Stanwood,Seacroft,sseacroft4@wikispaces.com,Male,2022-08-19,2023-12-23,5
...,...,...,...,...,...,...,...,...
9994,9995,Hesther,Braidwood,hbraidwoodrm@reuters.com,Female,2023-09-09,2024-01-19,29
9995,9996,Jelene,Dodgshun,jdodgshunrn@angelfire.com,Female,2022-03-16,2023-09-16,22
9996,9997,Ira,Duff,iduffro@delicious.com,Male,2022-02-08,2023-11-26,32
9997,9998,Yalonda,Carrivick,ycarrivickrp@samsung.com,Female,2022-06-18,2023-03-12,14


In [2]:
# Input the xlsx file
tier = pd.read_excel("Prep Air Loyalty.xlsx", sheet_name="Prep Air Loyalty")
tier

Unnamed: 0,Tier Grouping,Number of Flights,Tier,Benefits
0,5,1-4,Tier 0,
1,5,5-9,Tier 1,Early Seat Selection
2,5,10-14,Tier 2,Free Seat Selection
3,5,15-19,Tier 3,Priority Bag Drop & Boarding
4,5,20-24,Tier 4,First Checked Bag Free
5,5,25-29,Tier 5,First Class Lounge Access
6,5,30+,Tier 6,"First Class Lounge Access for 1 Guest, , £250 ..."
7,10,1-9,Tier 0,
8,10,10-19,Tier 1,"Free Seat Selection, Early Seat Selection"
9,10,20-29,Tier 2,"First Checked Bag Free, Priority Bag Drop & Bo..."


In [3]:
# Input the Costings sheet from the same workbook
costs = pd.read_excel("Prep Air Loyalty.xlsx", sheet_name="Costings")
costs

Unnamed: 0,Benefit,Cost
0,Early Seat Selection,0
1,Free Seat Selection,£15 per flight
2,First Checked Bag Free,£35 per flight
3,Priority Bag Drop & Boarding,0
4,First Class Lounge Access,£50 per flight
5,First Class Lounge Access for 1 Guest,£50 per flight
6,£250 off a flight each Year,£250 a year


In [4]:
# To be part of either Prep Air Loyalty Scheme, Customers must have flown in the last year (on or after 21st February 2023)
flown_this_year = customers[customers["Last Date Flown"].between("2023-02-21", "2024-02-20")].copy()
flown_this_year.head()

Unnamed: 0,Customer ID,first_name,last_name,email,gender,First Flight,Last Date Flown,Number of Flights
1,2,Keene,Devennie,kdevennie1@plala.or.jp,Male,2023-10-05,2023-10-05,1
2,3,Tyler,McGrail,tmcgrail2@nyu.edu,Male,2022-07-18,2023-11-09,27
3,4,Drusi,Ibeson,dibeson3@hostgator.com,Female,2022-05-28,2023-11-22,32
4,5,Stanwood,Seacroft,sseacroft4@wikispaces.com,Male,2022-08-19,2023-12-23,5
5,6,Kelcey,McCaw,kmccaw5@mlb.com,Agender,2023-02-28,2023-02-28,1
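The eligibility rule only names a lower bound (on or after 21st February 2023). As a minimal sketch, the same test can be written as a single comparison against a cutoff timestamp. The three dates below are a small sample for illustration, and the names `last_flown`, `cutoff` and `eligible` are hypothetical.

```python
import pandas as pd

# Small sample of "Last Date Flown" values (illustration only)
last_flown = pd.Series(pd.to_datetime(["2023-01-05", "2023-02-21", "2023-10-05"]))

# Customers qualify if they flew on or after the cutoff date
cutoff = pd.Timestamp("2023-02-21")
eligible = last_flown >= cutoff

print(eligible.tolist())
```

The comparison is inclusive of the cutoff day itself, which matches the "on or after" wording.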


Create a parameter so that the number of flights a customer has taken is bucketed into either groups of 5 or groups of 10 <br>
e.g. if the parameter selected is 5, the groupings will be 1-4, 5-9 etc<br>
if the parameter selected is 10, the groupings will be 1-9, 10-19 etc<br>
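
The bucketing can be illustrated with plain floor division. This is a minimal sketch, not the notebook's own code: `tier_number` and `bucket_label` are hypothetical helper names, and the open-ended top bucket (e.g. 30+) is not modelled.

```python
def tier_number(flights: int, parameter: int) -> int:
    """Floor-divide the flight count by the bucket width to get the tier number."""
    return flights // parameter


def bucket_label(flights: int, parameter: int) -> str:
    """Return the flight range a flight count falls into, e.g. '5-9'."""
    tier = tier_number(flights, parameter)
    lower = max(tier * parameter, 1)  # the first bucket starts at 1 flight, not 0
    upper = (tier + 1) * parameter - 1
    return f"{lower}-{upper}"


for flights in (1, 5, 27):
    print(flights, tier_number(flights, 5), bucket_label(flights, 5), bucket_label(flights, 10))
```

With a parameter of 5, a customer with 27 flights lands in the 25-29 bucket (tier 5); with a parameter of 10 the same customer lands in 20-29 (tier 2).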

Choose one of the parameter values:

In [5]:
parameter = 5

In [6]:
# parameter = 10

In [7]:
# Create a field to categorize customers based on the selected parameter, called Tier
flown_this_year["Tier"] = flown_this_year['Number of Flights'] // parameter
flown_this_year["Tier Grouping"] = parameter
flown_this_year.head()

Unnamed: 0,Customer ID,first_name,last_name,email,gender,First Flight,Last Date Flown,Number of Flights,Tier,Tier Grouping
1,2,Keene,Devennie,kdevennie1@plala.or.jp,Male,2023-10-05,2023-10-05,1,0,5
2,3,Tyler,McGrail,tmcgrail2@nyu.edu,Male,2022-07-18,2023-11-09,27,5,5
3,4,Drusi,Ibeson,dibeson3@hostgator.com,Female,2022-05-28,2023-11-22,32,6,5
4,5,Stanwood,Seacroft,sseacroft4@wikispaces.com,Male,2022-08-19,2023-12-23,5,1,5
5,6,Kelcey,McCaw,kmccaw5@mlb.com,Agender,2023-02-28,2023-02-28,1,0,5


In [8]:
# Estimate the average number of flights a customer takes per year
flown_this_year["Years as Customer"] = flown_this_year["Last Date Flown"].dt.year - flown_this_year["First Flight"].dt.year + 1
flown_this_year["Avg Flights per Year"] = flown_this_year["Number of Flights"] / flown_this_year["Years as Customer"]
flown_this_year.head()

Unnamed: 0,Customer ID,first_name,last_name,email,gender,First Flight,Last Date Flown,Number of Flights,Tier,Tier Grouping,Years as Customer,Avg Flights per Year
1,2,Keene,Devennie,kdevennie1@plala.or.jp,Male,2023-10-05,2023-10-05,1,0,5,1,1.0
2,3,Tyler,McGrail,tmcgrail2@nyu.edu,Male,2022-07-18,2023-11-09,27,5,5,2,13.5
3,4,Drusi,Ibeson,dibeson3@hostgator.com,Female,2022-05-28,2023-11-22,32,6,5,2,16.0
4,5,Stanwood,Seacroft,sseacroft4@wikispaces.com,Male,2022-08-19,2023-12-23,5,1,5,2,2.5
5,6,Kelcey,McCaw,kmccaw5@mlb.com,Agender,2023-02-28,2023-02-28,1,0,5,1,1.0
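
As a worked check of the estimate above, the calculation counts every calendar year between the first and last flight (inclusive) and divides the flight total by that count. The helper name `avg_flights_per_year` is hypothetical; the dates and flight counts are taken from the rows shown above.

```python
from datetime import date


def avg_flights_per_year(first_flight: date, last_flown: date, flights: int) -> float:
    """Flights divided by the number of calendar years spanned (inclusive)."""
    years_as_customer = last_flown.year - first_flight.year + 1
    return flights / years_as_customer


# Customer 3: first flight 2022-07-18, last flown 2023-11-09, 27 flights
tyler = avg_flights_per_year(date(2022, 7, 18), date(2023, 11, 9), 27)
# Customer 2: a single flight on 2023-10-05
keene = avg_flights_per_year(date(2023, 10, 5), date(2023, 10, 5), 1)

print(tyler, keene)
```

Because only the year component is used, a customer who first flew in December and last flew the following January still counts as a two-year customer.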


In [9]:
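# Preview the Prep Air Loyalty rows for the selected parameter value.
# The filtered result is not assigned, so `tier` still holds both groupings;
# the merge on "Tier Grouping" in In [11] keeps only the matching rows.
tier[tier["Tier Grouping"] == parameter]
tier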

Unnamed: 0,Tier Grouping,Number of Flights,Tier,Benefits
0,5,1-4,Tier 0,
1,5,5-9,Tier 1,Early Seat Selection
2,5,10-14,Tier 2,Free Seat Selection
3,5,15-19,Tier 3,Priority Bag Drop & Boarding
4,5,20-24,Tier 4,First Checked Bag Free
5,5,25-29,Tier 5,First Class Lounge Access
6,5,30+,Tier 6,"First Class Lounge Access for 1 Guest, , £250 ..."
7,10,1-9,Tier 0,
8,10,10-19,Tier 1,"Free Seat Selection, Early Seat Selection"
9,10,20-29,Tier 2,"First Checked Bag Free, Priority Bag Drop & Bo..."


In [10]:
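# Convert the Tier label (e.g. "Tier 2") to an integer so it can be compared with the customer's Tier
tier["Tier"] = tier["Tier"].str.split(" ").str.get(1)
tier["Tier"] = tier["Tier"].astype(int)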
tier

Unnamed: 0,Tier Grouping,Number of Flights,Tier,Benefits
0,5,1-4,0,
1,5,5-9,1,Early Seat Selection
2,5,10-14,2,Free Seat Selection
3,5,15-19,3,Priority Bag Drop & Boarding
4,5,20-24,4,First Checked Bag Free
5,5,25-29,5,First Class Lounge Access
6,5,30+,6,"First Class Lounge Access for 1 Guest, , £250 ..."
7,10,1-9,0,
8,10,10-19,1,"Free Seat Selection, Early Seat Selection"
9,10,20-29,2,"First Checked Bag Free, Priority Bag Drop & Bo..."


Join the Prep Air Loyalty to the Customer dataset in a way that each customer also experiences the benefits of lower Tiers<br>
e.g. a Tier 2 customer gets all the benefits of Tier 0, Tier 1 and Tier 2 

In [11]:
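# Pair every customer with every tier row of the selected Tier Grouping
benefits = flown_this_year.merge(tier, how="left", on=["Tier Grouping"])
# Keep the customer's own tier and all lower tiers (Tier_x = customer, Tier_y = scheme)
benefits = benefits[benefits["Tier_y"] <= benefits["Tier_x"]]
# Drop tiers that carry no benefit (Tier 0)
benefits = benefits.dropna(subset=["Benefits"])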
benefits

Unnamed: 0,Customer ID,first_name,last_name,email,gender,First Flight,Last Date Flown,Number of Flights_x,Tier_x,Tier Grouping,Years as Customer,Avg Flights per Year,Number of Flights_y,Tier_y,Benefits
8,3,Tyler,McGrail,tmcgrail2@nyu.edu,Male,2022-07-18,2023-11-09,27,5,5,2,13.5,5-9,1,Early Seat Selection
9,3,Tyler,McGrail,tmcgrail2@nyu.edu,Male,2022-07-18,2023-11-09,27,5,5,2,13.5,10-14,2,Free Seat Selection
10,3,Tyler,McGrail,tmcgrail2@nyu.edu,Male,2022-07-18,2023-11-09,27,5,5,2,13.5,15-19,3,Priority Bag Drop & Boarding
11,3,Tyler,McGrail,tmcgrail2@nyu.edu,Male,2022-07-18,2023-11-09,27,5,5,2,13.5,20-24,4,First Checked Bag Free
12,3,Tyler,McGrail,tmcgrail2@nyu.edu,Male,2022-07-18,2023-11-09,27,5,5,2,13.5,25-29,5,First Class Lounge Access
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
61965,9999,Netta,Course,ncourserq@weebly.com,Female,2022-03-08,2023-06-29,26,5,5,2,13.0,5-9,1,Early Seat Selection
61966,9999,Netta,Course,ncourserq@weebly.com,Female,2022-03-08,2023-06-29,26,5,5,2,13.0,10-14,2,Free Seat Selection
61967,9999,Netta,Course,ncourserq@weebly.com,Female,2022-03-08,2023-06-29,26,5,5,2,13.0,15-19,3,Priority Bag Drop & Boarding
61968,9999,Netta,Course,ncourserq@weebly.com,Female,2022-03-08,2023-06-29,26,5,5,2,13.0,20-24,4,First Checked Bag Free
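
The tier inheritance can be checked on a small toy table. This is a sketch under stated assumptions: the two customers and the trimmed tier table are made up for illustration, and the suffixes `_customer` and `_benefit` are hypothetical replacements for the default `_x` and `_y`.

```python
import pandas as pd

# Toy customers: one in Tier 0, one in Tier 2, both under the 5-flight grouping
toy_customers = pd.DataFrame(
    {"Customer ID": [1, 2], "Tier": [0, 2], "Tier Grouping": [5, 5]}
)

# Trimmed tier table: three rows for grouping 5, one row for grouping 10
toy_tiers = pd.DataFrame(
    {
        "Tier Grouping": [5, 5, 5, 10],
        "Tier": [0, 1, 2, 0],
        "Benefits": [None, "Early Seat Selection", "Free Seat Selection", None],
    }
)

# Pair each customer with every tier of their grouping, then keep tiers at or below their own
joined = toy_customers.merge(toy_tiers, on="Tier Grouping", suffixes=("_customer", "_benefit"))
inherited = joined[joined["Tier_benefit"] <= joined["Tier_customer"]]
inherited = inherited.dropna(subset=["Benefits"]).sort_values(["Customer ID", "Tier_benefit"])

benefits_by_customer = inherited.groupby("Customer ID")["Benefits"].apply(list).to_dict()
print(benefits_by_customer)
```

The Tier 0 customer drops out entirely because Tier 0 carries no benefit, while the Tier 2 customer keeps the benefits of Tier 1 and Tier 2.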


In [12]:
# Join on the Costing dataset
benefit_costs = benefits.merge(costs, how="left", left_on="Benefits", right_on="Benefit").drop(columns=["Benefit", "Number of Flights_y", "Tier_x"])
benefit_costs.head()

Unnamed: 0,Customer ID,first_name,last_name,email,gender,First Flight,Last Date Flown,Number of Flights_x,Tier Grouping,Years as Customer,Avg Flights per Year,Tier_y,Benefits,Cost
0,3,Tyler,McGrail,tmcgrail2@nyu.edu,Male,2022-07-18,2023-11-09,27,5,2,13.5,1,Early Seat Selection,0
1,3,Tyler,McGrail,tmcgrail2@nyu.edu,Male,2022-07-18,2023-11-09,27,5,2,13.5,2,Free Seat Selection,£15 per flight
2,3,Tyler,McGrail,tmcgrail2@nyu.edu,Male,2022-07-18,2023-11-09,27,5,2,13.5,3,Priority Bag Drop & Boarding,0
3,3,Tyler,McGrail,tmcgrail2@nyu.edu,Male,2022-07-18,2023-11-09,27,5,2,13.5,4,First Checked Bag Free,£35 per flight
4,3,Tyler,McGrail,tmcgrail2@nyu.edu,Male,2022-07-18,2023-11-09,27,5,2,13.5,5,First Class Lounge Access,£50 per flight
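
The merge on `Benefits` only finds a cost when a cell holds exactly one benefit name. Some tier rows list several benefits separated by commas, and one way to handle those is to split them into one row per benefit before joining. The sketch below uses toy data: the first string appears in the tier table above, while the second (with an empty entry between commas) is made up for illustration.

```python
import pandas as pd

toy_tier = pd.DataFrame(
    {
        "Tier": [1, 2],
        "Benefits": [
            "Free Seat Selection, Early Seat Selection",
            "First Checked Bag Free, , Priority Bag Drop & Boarding",  # made-up example
        ],
    }
)

# One row per benefit: split on commas, explode the lists, trim whitespace
split = toy_tier.assign(Benefits=toy_tier["Benefits"].str.split(","))
split = split.explode("Benefits").reset_index(drop=True)
split["Benefits"] = split["Benefits"].str.strip()

# Discard empty entries left behind by consecutive commas
split = split[split["Benefits"] != ""].reset_index(drop=True)

print(split)
```

After this step each `Benefits` value is a single name that can match the `Benefit` column of the Costings sheet.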


In [13]:
# Format the Cost column as numeric values only (strip the £ sign and the unit text)
benefit_costs["Cost"] = benefit_costs["Cost"].str.split(" ").str.get(0).str.lstrip('£').fillna(0).astype(int)
benefit_costs.head()

Unnamed: 0,Customer ID,first_name,last_name,email,gender,First Flight,Last Date Flown,Number of Flights_x,Tier Grouping,Years as Customer,Avg Flights per Year,Tier_y,Benefits,Cost
0,3,Tyler,McGrail,tmcgrail2@nyu.edu,Male,2022-07-18,2023-11-09,27,5,2,13.5,1,Early Seat Selection,0
1,3,Tyler,McGrail,tmcgrail2@nyu.edu,Male,2022-07-18,2023-11-09,27,5,2,13.5,2,Free Seat Selection,15
2,3,Tyler,McGrail,tmcgrail2@nyu.edu,Male,2022-07-18,2023-11-09,27,5,2,13.5,3,Priority Bag Drop & Boarding,0
3,3,Tyler,McGrail,tmcgrail2@nyu.edu,Male,2022-07-18,2023-11-09,27,5,2,13.5,4,First Checked Bag Free,35
4,3,Tyler,McGrail,tmcgrail2@nyu.edu,Male,2022-07-18,2023-11-09,27,5,2,13.5,5,First Class Lounge Access,50
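
The cost text also says whether a charge applies per flight or per year, which the formatting step above discards. A minimal sketch of keeping both pieces of information, assuming the cost values look like those in the Costings sheet (`parse_cost` is a hypothetical helper):

```python
import re


def parse_cost(cost) -> tuple[int, bool]:
    """Return (amount in pounds, whether the cost is charged per flight)."""
    text = str(cost)
    match = re.search(r"\d+", text)
    amount = int(match.group()) if match else 0
    per_flight = "per flight" in text.lower()
    return amount, per_flight


for raw in (0, "£15 per flight", "£50 per flight", "£250 a year"):
    print(raw, parse_cost(raw))
```

Values without any digits fall back to an amount of 0, mirroring the `fillna(0)` in the cell above.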


Calculate the Yearly Cost of each Benefit<br>
e.g. if the Benefit Cost is per flight, then make sure to multiply it by the Avg Number of Flights that customer takes in a year

In [14]:
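# Benefits whose Cost is multiplied by the customer's average flights per year.
# Note: the Costings sheet also prices both First Class Lounge Access benefits
# per flight, but they are not in this list, so they are charged once per year here.
flight_benefits = [
    "Early Seat Selection",
    "First Checked Bag Free",
    "Free Seat Selection",
    "Priority Bag Drop & Boarding"
]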

def yearly_costs(customer_benefits):
    if customer_benefits["Benefits"] in flight_benefits:
        return customer_benefits["Cost"] * customer_benefits["Avg Flights per Year"]
    else:
        return customer_benefits["Cost"]

benefit_costs["Yearly Cost"] = benefit_costs.apply(yearly_costs, axis=1)
benefit_costs

Unnamed: 0,Customer ID,first_name,last_name,email,gender,First Flight,Last Date Flown,Number of Flights_x,Tier Grouping,Years as Customer,Avg Flights per Year,Tier_y,Benefits,Cost,Yearly Cost
0,3,Tyler,McGrail,tmcgrail2@nyu.edu,Male,2022-07-18,2023-11-09,27,5,2,13.5,1,Early Seat Selection,0,0.0
1,3,Tyler,McGrail,tmcgrail2@nyu.edu,Male,2022-07-18,2023-11-09,27,5,2,13.5,2,Free Seat Selection,15,202.5
2,3,Tyler,McGrail,tmcgrail2@nyu.edu,Male,2022-07-18,2023-11-09,27,5,2,13.5,3,Priority Bag Drop & Boarding,0,0.0
3,3,Tyler,McGrail,tmcgrail2@nyu.edu,Male,2022-07-18,2023-11-09,27,5,2,13.5,4,First Checked Bag Free,35,472.5
4,3,Tyler,McGrail,tmcgrail2@nyu.edu,Male,2022-07-18,2023-11-09,27,5,2,13.5,5,First Class Lounge Access,50,50.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
22283,9999,Netta,Course,ncourserq@weebly.com,Female,2022-03-08,2023-06-29,26,5,2,13.0,1,Early Seat Selection,0,0.0
22284,9999,Netta,Course,ncourserq@weebly.com,Female,2022-03-08,2023-06-29,26,5,2,13.0,2,Free Seat Selection,15,195.0
22285,9999,Netta,Course,ncourserq@weebly.com,Female,2022-03-08,2023-06-29,26,5,2,13.0,3,Priority Bag Drop & Boarding,0,0.0
22286,9999,Netta,Course,ncourserq@weebly.com,Female,2022-03-08,2023-06-29,26,5,2,13.0,4,First Checked Bag Free,35,455.0
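
A vectorised alternative to the row-wise `apply` is sketched below. It assumes a boolean `Per Flight` column, which is hypothetical and does not exist in the notebook; here it is filled in by hand from the wording of the Costings sheet, so lounge access is treated as a per-flight cost. The rows describe one customer averaging 13.5 flights per year.

```python
import pandas as pd

rows = pd.DataFrame(
    {
        "Benefits": [
            "Free Seat Selection",
            "First Checked Bag Free",
            "First Class Lounge Access",
            "£250 off a flight each Year",
        ],
        "Cost": [15.0, 35.0, 50.0, 250.0],
        "Per Flight": [True, True, True, False],  # hypothetical flag column
        "Avg Flights per Year": [13.5, 13.5, 13.5, 13.5],
    }
)

# Keep the flat cost for yearly benefits; scale per-flight costs by the flight average
rows["Yearly Cost"] = rows["Cost"].where(
    ~rows["Per Flight"], rows["Cost"] * rows["Avg Flights per Year"]
)

print(rows[["Benefits", "Yearly Cost"]])
```

`Series.where` keeps the original value where the condition is true and substitutes the second argument elsewhere, so no Python-level loop over rows is needed.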


In [15]:
# Total up the Yearly Cost for each Tier

tier_pivot = benefit_costs.pivot_table(index="Tier_y", values="Yearly Cost", aggfunc="sum")
tier_pivot.head()

Tier_y,Yearly Cost
1,0.0
2,1075660.0
3,0.0
4,1739372.0
5,95950.0


In [16]:
# and count the Number of Customers in Each Tier

tier_pivot_2 = benefit_costs.pivot_table(index="Tier_y", values="Customer ID", aggfunc="nunique").rename(columns={"Customer ID": "Number of Customers"})
tier_pivot_2.head()

Tier_y,Number of Customers
1,6764
2,5526
3,4309
4,3089
5,1919


## Output 1
#### 5 Tier Grouping Output
<br>
6 rows<br>
3 fields:<br>
- Tier<br>
- Yearly Cost<br>
- Number of Customers<br>

In [17]:
# joining both pivot tables
# Output the data

output = tier_pivot.merge(tier_pivot_2, how="left", left_index=True, right_index=True)
output.index.name = "Tier"
output

Tier,Yearly Cost,Number of Customers
1,0.0,6764
2,1075660.0,5526
3,0.0,4309
4,1739372.0,3089
5,95950.0,1919
6,0.0,681
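
The two pivot tables and the merge could also be produced in one step with a grouped named aggregation. This is a sketch on made-up toy rows, not the notebook's data; `toy_costs` and `summary` are hypothetical names.

```python
import pandas as pd

# Toy benefit rows: one row per customer per inherited tier (made-up values)
toy_costs = pd.DataFrame(
    {
        "Customer ID": [1, 1, 2, 2, 3],
        "Tier": [1, 2, 1, 2, 1],
        "Yearly Cost": [0.0, 30.0, 0.0, 45.0, 0.0],
    }
)

# Sum the yearly cost and count distinct customers per tier in a single pass
summary = toy_costs.groupby("Tier").agg(
    **{
        "Yearly Cost": ("Yearly Cost", "sum"),
        "Number of Customers": ("Customer ID", "nunique"),
    }
)

print(summary)
```

The dictionary unpacking is only needed because the output column names contain spaces, which cannot be written as keyword arguments directly.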


## Output 2
#### 10 Tier Grouping Output
<br>
3 rows<br>
3 fields:<br>
- Tier<br>
- Yearly Cost<br>
- Number of Customers<br>

In [18]:
# Generating csv output file
output.to_csv("output-202408.csv")