## 2021: Week 14 - Prep Air In-Flight Purchases

We are revisiting Prep Air this week, looking at some flight details to provide data-driven answers. As part of running an airline, we are always interested in how successful the in-flight service is and whether we can make improvements to boost sales.

For this week's challenge we have been given several data sources, which we want to combine to answer questions about purchasing patterns on our flights.

### Input

- Passenger List

    A list of all the passengers from a selection of flights. This includes their name, a passenger number, the flight number and the total they purchased whilst on the flight. Note that each flight can hold a maximum of 120 passengers, and not all flights are full.


- Seat List

    A mapping of where each passenger sits within each flight. This is the same for all flights across our fleet and includes the row number and seat letter within each row.


- Flight Details

    Details about each flight: the flight ID, the departure and arrival airports, and the departure date and time, all packed into a single delimited field.


- Plane Details

    This documents where the business class section is on each of the planes. We provide free in-flight purchases for Business Class passengers.

### Requirement
- Input the Data
- Assign a label for where each seat is located. Labels are assigned as follows:
    - A & F - Window Seats
    - B & E - Middle Seats
    - C & D - Aisle Seats 
- Combine the Seat List and Passenger List tables. 

- Parse the Flight Details so that they are in separate fields 

- Calculate the time of day for each flight. Times are assigned as follows:
    - Morning - Before 12:00
    - Afternoon - Between 12:00 and 18:00
    - Evening - After 18:00

- Join the Flight Details & Plane Details to the Passenger & Seat tables. We should be able to identify what rows are Business or Economy Class for each flight. 

- Answer the following questions: 
    1. What time of day were the most purchases made? We want to take a look at the average for the flights within each time period. 
    2. What seat position had the highest purchase amount? Is the aisle seat the highest earner because it's closest to the trolley?
    3. As Business Class purchases are free, how much is this costing us? 

- Bonus: If you have Tableau Prep 2021.1 or later, you can now output to Excel files. Can you combine all of the outputs into a single Excel workbook, with a different sheet for each output?

### Output

1. What time of day were the most purchases made? (Avg per flight)
2. What seat position had the highest purchase amount?
3. Business class purchases are free. How much is this costing us?

In [465]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Input the data

In [466]:
data = pd.read_excel("./data/Week 14 - Input.xlsx", sheet_name=["Passenger List", "SeatList",
                                                                "FlightDetails", "PlaneDetails"])

In [467]:
passenger_list = data["Passenger List"].copy()
seat_list = data["SeatList"].copy()
flight_details = data["FlightDetails"].copy()
plane_details = data["PlaneDetails"].copy()

In [468]:
passenger_list = passenger_list.dropna(axis=1)
passenger_list.head(10)

Unnamed: 0,first_name,last_name,passenger_number,flight_number,purchase_amount
0,Jerrylee,Rein,1,1,48.29
1,Forester,Iashvili,2,1,0.0
2,Shaun,Sherwill,3,1,0.0
3,Werner,Basile,4,1,58.21
4,Kerwinn,Skillen,5,1,41.96
5,Rockey,Grafton-Herbert,6,1,53.89
6,Barbara,Garstang,7,1,0.0
7,Clyve,Anderson,8,1,0.0
8,Isiahi,Roycroft,9,1,38.07
9,Alvis,Burdus,10,1,45.41


### Assign a label for where each seat is located

In [469]:
a = seat_list["A"]
f = seat_list["F"]
window_seat = pd.concat([a, f], axis=0).to_frame(name="Window Seats").reset_index(drop=True)

b = seat_list["B"]
e = seat_list["E"]
middle_seat = pd.concat([b, e], axis=0).to_frame(name="Middle Seats").reset_index(drop=True)

c = seat_list["C"]
d = seat_list["D"]
aisle_seat = pd.concat([c, d], axis=0).to_frame(name="Aisle Seats").reset_index(drop=True)

In [470]:
seat_list = pd.concat([window_seat, middle_seat, aisle_seat], axis=1)
seat_list.columns

Index(['Window Seats', 'Middle Seats', 'Aisle Seats'], dtype='object')
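As an alternative to concatenating the lettered columns by hand, the same labelling can be sketched by reshaping the seat map to one record per seat and mapping each seat letter to its position. This is a minimal sketch on toy data: the `Row` column name and the sequential numbering of seats are assumptions about the layout of the real Seat List, and `toy_seats`, `position_by_letter` and `seat_long` are illustrative names.

```python
import pandas as pd

# Toy seat map (assumed layout): one record per cabin row, one column per
# seat letter, each cell holding the passenger number assigned to that seat.
toy_seats = pd.DataFrame({
    "Row": [1, 2],
    "A": [1, 7], "B": [2, 8], "C": [3, 9],
    "D": [4, 10], "E": [5, 11], "F": [6, 12],
})

# Seat letter -> position, as listed in the requirements.
position_by_letter = {
    "A": "Window", "F": "Window",
    "B": "Middle", "E": "Middle",
    "C": "Aisle", "D": "Aisle",
}

# Reshape to one record per seat, then label each seat by its letter.
seat_long = toy_seats.melt(id_vars="Row", var_name="Seat Letter",
                           value_name="passenger_number")
seat_long["Seat Position"] = seat_long["Seat Letter"].map(position_by_letter)

print(seat_long.sort_values("passenger_number"))
```

Keeping the row number next to each passenger number is what makes this shape convenient: the long table can be merged onto the passenger list on `passenger_number` in a single step.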

### Combine the Seat List and Passenger List tables

In [471]:
passenger_list.loc[passenger_list["passenger_number"].isin(seat_list["Window Seats"]), "Seat Position"] = "Window"
passenger_list.loc[passenger_list["passenger_number"].isin(seat_list["Middle Seats"]), "Seat Position"] = "Middle"
passenger_list.loc[passenger_list["passenger_number"].isin(seat_list["Aisle Seats"]), "Seat Position"] = "Aisle"
passenger_list.sample(10)

Unnamed: 0,first_name,last_name,passenger_number,flight_number,purchase_amount,Seat Position
542,Alison,Sphinxe,73,6,57.62,Window
613,Jannelle,Baniard,24,7,55.09,Window
403,Laryssa,Hallows,10,5,0.0,Aisle
538,Carolyne,Musso,69,6,17.73,Aisle
666,Whitby,Ladley,77,7,19.33,Middle
812,Haze,Golder,15,9,0.0,Aisle
775,Mitchel,Poppleton,79,8,18.41,Window
805,Vasilis,Wallage,8,9,4.66,Middle
425,Lyda,Spearett,32,5,0.0,Middle
482,Lilas,Brugger,13,6,27.32,Window


### Parse the Flight Details so that they are in separate fields

In [472]:
raw_col = "[FlightID|DepAir|ArrAir|DepDate|DepTime]"
field_names = ["Flight ID", "DepAir", "ArrAir", "DepDate", "DepTime"]

# Strip the surrounding brackets, then split the pipe-delimited string into one column per field
parts = flight_details[raw_col].str.strip("[]").str.split("|", expand=True)
parts.columns = field_names
flight_details = pd.concat([flight_details.drop(raw_col, axis=1), parts], axis=1)
flight_details["Flight ID"] = flight_details["Flight ID"].astype(int)
flight_details

Unnamed: 0,Flight ID,DepAir,ArrAir,DepDate,DepTime
0,1,LHR,SEA,2020-10-08,14:53:00
1,2,MTY,JFK,2020-12-03,06:51:00
2,3,SEA,BOS,2020-11-21,20:45:00
3,4,LHR,BOS,2020-10-31,21:01:00
4,5,MTY,CAI,2020-12-07,09:33:00
5,6,JFK,LHR,2020-11-10,05:05:00
6,7,TPE,LHR,2020-12-01,14:22:00
7,8,BOS,SEA,2020-12-26,16:45:00
8,9,JFK,LHR,2020-10-23,19:06:00
9,10,CAI,LHR,2020-12-22,10:54:00


### Calculate the time of day for each flight

In [473]:
passenger_list = passenger_list.merge(flight_details, how="left", 
                                      left_on="flight_number", right_on="Flight ID")
passenger_list.head(10)

Unnamed: 0,first_name,last_name,passenger_number,flight_number,purchase_amount,Seat Position,Flight ID,DepAir,ArrAir,DepDate,DepTime
0,Jerrylee,Rein,1,1,48.29,Window,1,LHR,SEA,2020-10-08,14:53:00
1,Forester,Iashvili,2,1,0.0,Middle,1,LHR,SEA,2020-10-08,14:53:00
2,Shaun,Sherwill,3,1,0.0,Aisle,1,LHR,SEA,2020-10-08,14:53:00
3,Werner,Basile,4,1,58.21,Aisle,1,LHR,SEA,2020-10-08,14:53:00
4,Kerwinn,Skillen,5,1,41.96,Middle,1,LHR,SEA,2020-10-08,14:53:00
5,Rockey,Grafton-Herbert,6,1,53.89,Window,1,LHR,SEA,2020-10-08,14:53:00
6,Barbara,Garstang,7,1,0.0,Window,1,LHR,SEA,2020-10-08,14:53:00
7,Clyve,Anderson,8,1,0.0,Middle,1,LHR,SEA,2020-10-08,14:53:00
8,Isiahi,Roycroft,9,1,38.07,Aisle,1,LHR,SEA,2020-10-08,14:53:00
9,Alvis,Burdus,10,1,45.41,Aisle,1,LHR,SEA,2020-10-08,14:53:00


In [474]:
# DepTime holds zero-padded "HH:MM:SS" strings, so string comparison matches chronological order
morning = flight_details["DepTime"] < "12:00:00"
afternoon = (flight_details["DepTime"] >= "12:00:00") & (flight_details["DepTime"] <= "18:00:00")
evening = flight_details["DepTime"] > "18:00:00"

In [475]:
flight_details.loc[morning, "DepartTime"] = "Morning"
flight_details.loc[afternoon, "DepartTime"] = "Afternoon"
flight_details.loc[evening, "DepartTime"] = "Evening"
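The bucketing rule can also be written as a small function, which makes the boundary cases explicit. The following is a minimal sketch on toy data; `time_of_day_label` and `toy_flights` are illustrative names, and the choice to treat exactly 12:00 and exactly 18:00 as Afternoon mirrors the masks above.

```python
from datetime import time

import pandas as pd


def time_of_day_label(dep_time: str) -> str:
    """Bucket an 'HH:MM:SS' string into Morning, Afternoon or Evening."""
    t = pd.to_datetime(dep_time, format="%H:%M:%S").time()
    if t < time(12, 0):
        return "Morning"
    if t <= time(18, 0):
        return "Afternoon"  # 12:00:00 through 18:00:00 inclusive
    return "Evening"


# Toy departure times, including both boundary values.
toy_flights = pd.DataFrame(
    {"DepTime": ["06:51:00", "12:00:00", "18:00:00", "18:00:01", "20:45:00"]}
)
toy_flights["DepartTime"] = toy_flights["DepTime"].map(time_of_day_label)
print(toy_flights)
```

Parsing to a real time value rather than comparing strings means the function fails loudly on malformed input instead of silently misclassifying it.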

### Join the Flight Details & Plane Details to the Passenger & Seat tables. 
### We should be able to identify what rows are Business or Economy Class for each flight

In [476]:
passenger_list = passenger_list.merge(plane_details, how="left", 
                                      left_on="flight_number", right_on="FlightNo.")
passenger_list = passenger_list.merge(flight_details[["Flight ID", "DepartTime"]], how="left", on="Flight ID")
passenger_list = passenger_list.drop(["FlightNo.", "Flight ID"], axis=1)
passenger_list

Unnamed: 0,first_name,last_name,passenger_number,flight_number,purchase_amount,Seat Position,DepAir,ArrAir,DepDate,DepTime,Business Class,DepartTime
0,Jerrylee,Rein,1,1,48.29,Window,LHR,SEA,2020-10-08,14:53:00,1-5,Afternoon
1,Forester,Iashvili,2,1,0.00,Middle,LHR,SEA,2020-10-08,14:53:00,1-5,Afternoon
2,Shaun,Sherwill,3,1,0.00,Aisle,LHR,SEA,2020-10-08,14:53:00,1-5,Afternoon
3,Werner,Basile,4,1,58.21,Aisle,LHR,SEA,2020-10-08,14:53:00,1-5,Afternoon
4,Kerwinn,Skillen,5,1,41.96,Middle,LHR,SEA,2020-10-08,14:53:00,1-5,Afternoon
...,...,...,...,...,...,...,...,...,...,...,...,...
995,Skye,McLaverty,106,10,10.46,Aisle,CAI,LHR,2020-12-22,10:54:00,1-5,Morning
996,Margaux,Rymour,107,10,0.00,Middle,CAI,LHR,2020-12-22,10:54:00,1-5,Morning
997,Corny,Vaszoly,108,10,44.60,Window,CAI,LHR,2020-12-22,10:54:00,1-5,Morning
998,Vittorio,Rushbrook,109,10,0.00,Window,CAI,LHR,2020-12-22,10:54:00,1-5,Morning


In [477]:
passenger_list["Business Class"].value_counts()

1-5     606
1-2     107
1-8     104
1-3     101
1-10     82
Name: Business Class, dtype: int64

In [478]:
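# Split on "-" for both bounds; taking only the first character would break for a start row of 10 or more
from_ = plane_details["Business Class"].map(lambda x: x.split("-")[0]).astype(int).values
to_ = plane_details["Business Class"].map(lambda x: x.split("-")[-1]).astype(int).values + 1
to_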

array([ 6,  9, 11,  6,  6,  6,  3,  4,  6,  6])

In [479]:
grouped = passenger_list.groupby("flight_number")
results = []
for i in range(0, 10):
    # Copy the group so the assignment below does not raise SettingWithCopyWarning
    tmp = grouped.get_group(i + 1).copy()
    # from_ and to_ are indexed by position, which assumes plane_details is ordered by flight number
    index_ = tmp["passenger_number"].isin(np.arange(from_[i], to_[i]))
    tmp.loc[index_, "Business_class_check"] = True
    results.append(tmp)


In [480]:
final_output = pd.concat(results)
final_output = final_output.drop("Business Class", axis=1)
final_output["Business_class_check"] = final_output["Business_class_check"].fillna(False)
final_output

Unnamed: 0,first_name,last_name,passenger_number,flight_number,purchase_amount,Seat Position,DepAir,ArrAir,DepDate,DepTime,DepartTime,Business_class_check
0,Jerrylee,Rein,1,1,48.29,Window,LHR,SEA,2020-10-08,14:53:00,Afternoon,True
1,Forester,Iashvili,2,1,0.00,Middle,LHR,SEA,2020-10-08,14:53:00,Afternoon,True
2,Shaun,Sherwill,3,1,0.00,Aisle,LHR,SEA,2020-10-08,14:53:00,Afternoon,True
3,Werner,Basile,4,1,58.21,Aisle,LHR,SEA,2020-10-08,14:53:00,Afternoon,True
4,Kerwinn,Skillen,5,1,41.96,Middle,LHR,SEA,2020-10-08,14:53:00,Afternoon,True
...,...,...,...,...,...,...,...,...,...,...,...,...
995,Skye,McLaverty,106,10,10.46,Aisle,CAI,LHR,2020-12-22,10:54:00,Morning,False
996,Margaux,Rymour,107,10,0.00,Middle,CAI,LHR,2020-12-22,10:54:00,Morning,False
997,Corny,Vaszoly,108,10,44.60,Window,CAI,LHR,2020-12-22,10:54:00,Morning,False
998,Vittorio,Rushbrook,109,10,0.00,Window,CAI,LHR,2020-12-22,10:54:00,Morning,False
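One caveat about the check above: it compares `passenger_number` against the Business Class range, whereas the brief describes that range as rows of the plane. If the range is read as row numbers, each passenger first needs a row, which can come from the Seat List. The following is a minimal sketch of that row-based reading on toy data; the `Row` column name and the seat layout are assumptions, and the toy figures are invented purely for illustration.

```python
import pandas as pd

# Toy seat map (assumed layout): passenger numbers per seat letter, by row.
toy_seats = pd.DataFrame({
    "Row": [1, 2, 3],
    "A": [1, 7, 13], "B": [2, 8, 14], "C": [3, 9, 15],
    "D": [4, 10, 16], "E": [5, 11, 17], "F": [6, 12, 18],
})
toy_passengers = pd.DataFrame({
    "flight_number": [1, 1, 1, 1],
    "passenger_number": [2, 8, 13, 18],
    "purchase_amount": [10.0, 20.0, 5.0, 0.0],
})
toy_planes = pd.DataFrame({"FlightNo.": [1], "Business Class": ["1-2"]})

# One record per seat, so every passenger number carries its row.
seat_long = toy_seats.melt(id_vars="Row", var_name="Seat Letter",
                           value_name="passenger_number")

merged = toy_passengers.merge(seat_long, on="passenger_number", how="left")
merged = merged.merge(toy_planes, left_on="flight_number",
                      right_on="FlightNo.", how="left")

# Split "first-last" into integer bounds and test each passenger's row.
bounds = merged["Business Class"].str.split("-", expand=True).astype(int)
merged["Business_class_check"] = merged["Row"].between(bounds[0], bounds[1])

business_cost = merged.loc[merged["Business_class_check"], "purchase_amount"].sum()
print(merged[["passenger_number", "Row", "Business_class_check"]])
print(business_cost)
```

Because the bounds are computed per record after the merge, this version does not depend on the order of the plane details table or on a fixed number of flights.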


### What time of day were the most purchases made? 
### We want to take a look at the average for the flights within each time period. 

In [481]:
total_amount = final_output.groupby(["DepartTime"])["purchase_amount"].sum()

In [482]:
num_flight = final_output.groupby(["DepartTime"])["flight_number"].nunique()
num_flight

DepartTime
Afternoon    3
Evening      3
Morning      4
Name: flight_number, dtype: int64

In [483]:
time_of_day = round(total_amount / num_flight, 2)
time_of_day = time_of_day.reset_index()
time_of_day = time_of_day.rename(columns={0: "Avg per Flight"})
time_of_day["Rank"] = time_of_day["Avg per Flight"].rank(ascending=False).astype(int)
time_of_day

Unnamed: 0,DepartTime,Avg per Flight,Rank
0,Afternoon,2373.95,1
1,Evening,2192.29,3
2,Morning,2213.83,2
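The same average can be reached in a single pipeline by totalling purchases per flight first and then averaging those totals within each time period. This is a minimal sketch on toy data with invented figures; it gives the same result as dividing the period total by the number of flights as long as each flight belongs to exactly one period.

```python
import pandas as pd

# Toy passenger-level data: two morning flights and one evening flight.
toy = pd.DataFrame({
    "flight_number": [1, 1, 2, 3, 3],
    "DepartTime": ["Morning", "Morning", "Morning", "Evening", "Evening"],
    "purchase_amount": [10.0, 20.0, 50.0, 5.0, 15.0],
})

# Step 1: total purchases for each flight.
per_flight = toy.groupby(["DepartTime", "flight_number"],
                         as_index=False)["purchase_amount"].sum()

# Step 2: average of the flight totals within each time period.
avg_per_flight = (
    per_flight.groupby("DepartTime")["purchase_amount"]
    .mean()
    .round(2)
    .rename("Avg per Flight")
    .reset_index()
)
print(avg_per_flight)
```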


### What seat position had the highest purchase amount? 
### Is the aisle seat the highest earner because it's closest to the trolley?

In [484]:
seat_position = final_output.groupby(["Seat Position"])["purchase_amount"].sum().reset_index()
seat_position = seat_position.rename(columns={"purchase_amount": "Purchase Amount"})
seat_position["Rank"] = seat_position["Purchase Amount"].rank(ascending=False).astype(int)
seat_position

Unnamed: 0,Seat Position,Purchase Amount,Rank
0,Aisle,7636.04,2
1,Middle,7225.66,3
2,Window,7692.32,1


### As Business Class purchases are free, how much is this costing us?

In [485]:
business_or_economy = final_output.groupby(["Business_class_check"])["purchase_amount"].sum().reset_index()
business_or_economy["Rank"] = business_or_economy["purchase_amount"].rank(ascending=False).astype(int)
business_or_economy

Unnamed: 0,Business_class_check,purchase_amount,Rank
0,False,21412.4,1
1,True,1141.62,2


In [486]:
with pd.ExcelWriter("./output/Week14_output.xlsx") as writer:
    time_of_day.to_excel(writer, sheet_name="Time of Day")
    seat_position.to_excel(writer, sheet_name="Seat Position")
    business_or_economy.to_excel(writer, sheet_name="Business or Economy")