# Preppin' Data
## 2024: Week 4 - Unpopular Seats
**Created by:** Carl Allchin | [Challenge Link](https://preppindata.blogspot.com/2024/01/2024-week-4-unpopular-seats.html)

This challenge tests how to use join types to return only the data required for the output.<br>
This week we want to find out which seats are never chosen on our planes, because we are considering charging customers a fee to choose their seat when booking.

In [1]:
# Input the Excel workbook containing the four worksheets of data

import pandas as pd
flights = pd.read_excel("PD 2024 Wk 4 Input.xlsx", sheet_name=["Flow Card", "Non_flow Card", "Non_flow Card2"])
seatplan = pd.read_excel("PD 2024 Wk 4 Input.xlsx", sheet_name="Seat Plan")

In [2]:
flights

{'Flow Card':       CustomerID  Seat  Row Class
 0            654     2    2    FC
 1            466     4    5    FC
 2             27     4    3    FC
 3            519     1    4    FC
 4            933     2    3    FC
 ...          ...   ...  ...   ...
 9719        3040    10   38     E
 9720        4429     3   28     E
 9721        2593    10   37     E
 9722        4336     6   42     E
 9723        4378     3   36     E
 
 [9724 rows x 4 columns],
 'Non_flow Card':       CustomerID  Seat  Row Class
 0            765     1    3    FC
 1            501     2    7    FC
 2            885     4    2    FC
 3            203     1    5    FC
 4            676     2    3    FC
 ...          ...   ...  ...   ...
 9744        3257     3   39     E
 9745        3321     3   35     E
 9746        2906     8   29     E
 9747        2207     3   36     E
 9748        3510     9   29     E
 
 [9749 rows x 4 columns],
 'Non_flow Card2':       CustomerID  Seat  Row Class
 0            242    

In [3]:
seatplan

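| | Class | Seat | Row |
|---|---|---|---|
| 0 | FC | 1 | 1 |
| 1 | FC | 2 | 1 |
| 2 | FC | 3 | 1 |
| 3 | FC | 4 | 1 |
| 4 | FC | 1 | 2 |
| ... | ... | ... | ... |
| 291 | E | 6 | 42 |
| 292 | E | 7 | 42 |
| 293 | E | 8 | 42 |
| 294 | E | 9 | 42 |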


In [4]:
# Union the Flow Card and Non-Flow card data sets together
# Create a data field to show whether the seat was booked by someone with the Flow Card or not
# Call this field 'Flow Card?'

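flights = pd.concat(
    [flights["Flow Card"], flights["Non_flow Card"], flights["Non_flow Card2"]],
    keys=["Yes", "No", "No"],  # one label per worksheet, in list order
    names=["Flow Card?"],
).reset_index(level="Flow Card?").reset_index(drop=True)  # keep the label as a column, drop the old row index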
flights

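| | Flow Card? | CustomerID | Seat | Row | Class |
|---|---|---|---|---|---|
| 0 | Yes | 654 | 2 | 2 | FC |
| 1 | Yes | 466 | 4 | 5 | FC |
| 2 | Yes | 27 | 4 | 3 | FC |
| 3 | Yes | 519 | 1 | 4 | FC |
| 4 | Yes | 933 | 2 | 3 | FC |
| ... | ... | ... | ... | ... | ... |
| 29208 | No | 3005 | 7 | 35 | E |
| 29209 | No | 4685 | 4 | 27 | E |
| 29210 | No | 2512 | 8 | 38 | E |
| 29211 | No | 3863 | 4 | 37 | E |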
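
The union step can also be written by tagging each worksheet before stacking. The sketch below is only an illustration of that alternative: the three small frames and their values are made up, and the names `flow`, `non_flow`, `non_flow2` and `union` are hypothetical stand-ins for the real worksheets.

```python
import pandas as pd

# Toy stand-ins for the three booking worksheets (values invented for illustration)
flow = pd.DataFrame({"CustomerID": [1, 2], "Seat": [1, 2], "Row": [1, 1], "Class": ["FC", "FC"]})
non_flow = pd.DataFrame({"CustomerID": [3], "Seat": [1], "Row": [1], "Class": ["FC"]})
non_flow2 = pd.DataFrame({"CustomerID": [4], "Seat": [3], "Row": [2], "Class": ["FC"]})

# Add the 'Flow Card?' flag to each sheet, then stack them with a fresh 0..n-1 index
union = pd.concat(
    [
        flow.assign(**{"Flow Card?": "Yes"}),
        non_flow.assign(**{"Flow Card?": "No"}),
        non_flow2.assign(**{"Flow Card?": "No"}),
    ],
    ignore_index=True,
)
print(union)
```

With this approach the flag is an ordinary column from the start, so there is no extra index level to remove afterwards.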


In [5]:
# Aggregate the Seat Bookings to count how many bookings there are for: Each Seat, In each Row, In each Class, For Flow and Non-Flow Card holders

flights_pivot = flights.pivot_table(index=["Flow Card?", "Class", "Row", "Seat"], aggfunc="size").reset_index(name="Count of Bookings")
flights_pivot

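| | Flow Card? | Class | Row | Seat | Count of Bookings |
|---|---|---|---|---|---|
| 0 | No | BC | 9 | 1 | 69 |
| 1 | No | BC | 9 | 2 | 63 |
| 2 | No | BC | 9 | 3 | 51 |
| 3 | No | BC | 9 | 4 | 65 |
| 4 | No | BC | 10 | 1 | 58 |
| ... | ... | ... | ... | ... | ... |
| 573 | Yes | PE | 26 | 4 | 37 |
| 574 | Yes | PE | 26 | 5 | 31 |
| 575 | Yes | PE | 26 | 6 | 30 |
| 576 | Yes | PE | 26 | 7 | 33 |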
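
The same count can be produced with `groupby(...).size()`, which some readers may find easier to follow than a pivot table. This is a minimal sketch on invented data; `bookings` and `counts` are hypothetical names, not part of the original notebook.

```python
import pandas as pd

# Toy bookings (values invented for illustration)
bookings = pd.DataFrame({
    "Flow Card?": ["Yes", "Yes", "No", "No", "No"],
    "Class": ["FC", "FC", "FC", "FC", "E"],
    "Row": [1, 1, 1, 1, 30],
    "Seat": [1, 1, 1, 2, 5],
})

# Count the rows in each (Flow Card?, Class, Row, Seat) group
counts = (
    bookings.groupby(["Flow Card?", "Class", "Row", "Seat"])
    .size()
    .reset_index(name="Count of Bookings")
)
print(counts)
```

Only combinations that occur in the bookings appear in the result, which is why the seat plan has to be joined on in the next step.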


In [6]:
# Join on the Seating Plan data to ensure you have a data set for every seat on the plane, even if it hasn't been booked

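# A left join keeps every seat in the plan; seats without bookings get NaN counts
combined = seatplan.merge(flights_pivot, how="left", on=["Class", "Row", "Seat"])
# Summing over both 'Flow Card?' values skips NaN, so unbooked seats total 0
combined_pivot = combined.pivot_table(values="Count of Bookings", index=["Class", "Row", "Seat"], aggfunc="sum").reset_index()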
combined_pivot

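| | Class | Row | Seat | Count of Bookings |
|---|---|---|---|---|
| 0 | BC | 9 | 1 | 101.0 |
| 1 | BC | 9 | 2 | 87.0 |
| 2 | BC | 9 | 3 | 92.0 |
| 3 | BC | 9 | 4 | 94.0 |
| 4 | BC | 10 | 1 | 82.0 |
| ... | ... | ... | ... | ... |
| 291 | PE | 26 | 4 | 111.0 |
| 292 | PE | 26 | 5 | 101.0 |
| 293 | PE | 26 | 6 | 90.0 |
| 294 | PE | 26 | 7 | 119.0 |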
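
To make the handling of unbooked seats explicit, the missing counts can be filled with 0 before totalling. The sketch below uses invented data and hypothetical names (`seat_plan`, `counts`, `joined`, `totals`) to show a left join followed by an explicit fill and a grouped sum.

```python
import pandas as pd

# Toy seat plan and booking counts (values invented for illustration)
seat_plan = pd.DataFrame({"Class": ["FC", "FC", "FC"], "Row": [1, 1, 1], "Seat": [1, 2, 3]})
counts = pd.DataFrame({
    "Flow Card?": ["Yes", "No", "No"],
    "Class": ["FC", "FC", "FC"],
    "Row": [1, 1, 1],
    "Seat": [1, 1, 2],
    "Count of Bookings": [2, 1, 1],
})

# Left join: every seat in the plan survives, matched or not
joined = seat_plan.merge(counts, how="left", on=["Class", "Row", "Seat"])

# Fill unmatched seats with 0, then total the counts per seat
totals = (
    joined.fillna({"Count of Bookings": 0})
    .groupby(["Class", "Row", "Seat"], as_index=False)["Count of Bookings"]
    .sum()
    .astype({"Count of Bookings": int})
)
print(totals)
```

Filling explicitly also lets the count column go back to integers, since the NaN values introduced by the join force it to float.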


In [20]:
# Only return the records for the seats that haven't been booked

# Keep only the seat identifiers for rows whose total is 0
no_bookings = combined_pivot.loc[combined_pivot["Count of Bookings"] == 0, ["Class", "Row", "Seat"]]
no_bookings

| | Class | Row | Seat |
|---|---|---|---|
| 54 | E | 28 | 5 |
| 95 | E | 32 | 6 |
| 134 | E | 36 | 5 |
| 145 | E | 37 | 6 |
| 174 | E | 40 | 5 |
| 175 | E | 40 | 6 |
| 184 | E | 41 | 5 |
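
Since the challenge is about join types, it may help to see the unbooked seats found with an anti-join instead of a count. This is a sketch on invented data, assuming the `indicator=True` option of `DataFrame.merge`; the names `seat_plan`, `bookings`, `booked`, `flagged` and `unbooked` are hypothetical.

```python
import pandas as pd

# Toy seat plan and bookings (values invented for illustration)
seat_plan = pd.DataFrame({"Class": ["E", "E", "E"], "Row": [28, 28, 28], "Seat": [4, 5, 6]})
bookings = pd.DataFrame({"Class": ["E", "E", "E"], "Row": [28, 28, 28], "Seat": [4, 6, 4]})

# One row per seat that has been booked at least once
booked = bookings[["Class", "Row", "Seat"]].drop_duplicates()

# indicator=True adds a '_merge' column saying where each row was found
flagged = seat_plan.merge(booked, how="left", on=["Class", "Row", "Seat"], indicator=True)

# 'left_only' rows exist in the seat plan but never in the bookings
unbooked = flagged.loc[flagged["_merge"] == "left_only", ["Class", "Row", "Seat"]]
print(unbooked)
```

This variant skips the aggregation entirely, which is convenient when only the list of unbooked seats is needed and not the booking counts.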


## Output

In [22]:
# Output the data set showing which seats, rows and classes have NOT been booked
output = no_bookings.rename_axis(columns=None)
output

| | Class | Row | Seat |
|---|---|---|---|
| 54 | E | 28 | 5 |
| 95 | E | 32 | 6 |
| 134 | E | 36 | 5 |
| 145 | E | 37 | 6 |
| 174 | E | 40 | 5 |
| 175 | E | 40 | 6 |
| 184 | E | 41 | 5 |


In [23]:
# Write the result to a CSV file, leaving out the DataFrame index
output.to_csv("output-202404.csv", index=False)