# Preppin' Data
## 2024: Week 2 - Average Price Analysis
**Created by:** Carl Allchin | [Challenge Link](https://preppindata.blogspot.com/2024/01/2024-week-2-average-price-analysis.html)

The input data set for this week is the output from week one.

In [1]:
# Input the two csv files
import pandas as pd
output1 = pd.read_csv("PD 2024 Wk 1 Output Flow Card.csv", parse_dates=["Date"], date_format="%d/%m/%Y")
output2 = pd.read_csv("PD 2024 Wk 1 Output Non-Flow Card.csv", parse_dates=["Date"], date_format="%d/%m/%Y")
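
The explicit `%d/%m/%Y` format matters because day-first strings such as `03/04/2024` are ambiguous. A minimal sketch with made-up values, using `pd.to_datetime` with the same format string instead of the challenge files:

```python
import pandas as pd

# Made-up day-first date strings, not taken from the challenge files
raw = pd.Series(["03/04/2024", "22/07/2024"])

# An explicit format removes the day/month ambiguity
dates = pd.to_datetime(raw, format="%d/%m/%Y")
print(dates.dt.day.tolist())    # [3, 22]
print(dates.dt.month.tolist())  # [4, 7]
```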

In [2]:
output1.head()

,Date,Flight Number,From,To,Class,Price,Flow Card?,Bags Checked,Meal Type
0,2024-07-22,PA010,Tokyo,New York,Economy,2380.0,Yes,0,Egg Free
1,2024-04-20,PA002,New York,London,Economy,3490.0,Yes,1,Vegan
2,2024-01-23,PA010,Tokyo,New York,Premium Economy,825.0,Yes,1,Vegetarian
3,2024-06-05,PA006,Tokyo,London,First Class,618.0,Yes,3,Vegan
4,2024-03-30,PA004,Perth,London,First Class,446.0,Yes,1,Nut Free


In [3]:
output2.head()

,Date,Flight Number,From,To,Class,Price,Flow Card?,Bags Checked,Meal Type
0,2024-09-28,PA008,Perth,New York,Economy,1855.0,No,2,Vegetarian
1,2024-10-01,PA008,Perth,New York,Business Class,634.8,No,0,Vegetarian
2,2024-03-04,PA007,New York,Perth,Business Class,458.4,No,3,Nut Free
3,2024-02-25,PA010,Tokyo,New York,Premium Economy,1435.0,No,0,
4,2024-03-29,PA004,Perth,London,Economy,2730.0,No,2,Vegan


In [4]:
# Union the files together
flights = pd.concat([output1, output2])
flights

,Date,Flight Number,From,To,Class,Price,Flow Card?,Bags Checked,Meal Type
0,2024-07-22,PA010,Tokyo,New York,Economy,2380.0,Yes,0,Egg Free
1,2024-04-20,PA002,New York,London,Economy,3490.0,Yes,1,Vegan
2,2024-01-23,PA010,Tokyo,New York,Premium Economy,825.0,Yes,1,Vegetarian
3,2024-06-05,PA006,Tokyo,London,First Class,618.0,Yes,3,Vegan
4,2024-03-30,PA004,Perth,London,First Class,446.0,Yes,1,Nut Free
...,...,...,...,...,...,...,...,...,...
1890,2024-03-06,PA006,Tokyo,London,Premium Economy,940.0,No,2,Vegetarian
1891,2024-05-05,PA009,New York,Tokyo,Economy,1360.0,No,3,Nut Free
1892,2024-06-14,PA008,Perth,New York,First Class,245.0,No,1,Dairy Free
1893,2024-01-16,PA010,Tokyo,New York,Economy,2410.0,No,2,Egg Free
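
As a side note on the union step, `pd.concat` keeps each input's own row labels unless told otherwise. The sketch below uses two tiny stand-in frames with made-up rows (the names `flow` and `non_flow` are hypothetical) to show the difference `ignore_index=True` makes:

```python
import pandas as pd

# Two tiny stand-in frames (made-up rows) with the same columns
flow = pd.DataFrame({"Flight Number": ["PA010", "PA002"], "Flow Card?": ["Yes", "Yes"]})
non_flow = pd.DataFrame({"Flight Number": ["PA008"], "Flow Card?": ["No"]})

# Default concat keeps each frame's own row labels, so labels can repeat
stacked = pd.concat([flow, non_flow])
print(stacked.index.tolist())  # [0, 1, 0]

# ignore_index=True renumbers the rows 0..n-1
renumbered = pd.concat([flow, non_flow], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2]
```

Repeated labels are harmless for the aggregations in this notebook, since the grouping uses column values and not row labels.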


In [5]:
# Convert the Date field to a Quarter Number instead
# Name this field Quarter
flights["Quarter"] = flights["Date"].dt.quarter.astype(str)
flights

,Date,Flight Number,From,To,Class,Price,Flow Card?,Bags Checked,Meal Type,Quarter
0,2024-07-22,PA010,Tokyo,New York,Economy,2380.0,Yes,0,Egg Free,3
1,2024-04-20,PA002,New York,London,Economy,3490.0,Yes,1,Vegan,2
2,2024-01-23,PA010,Tokyo,New York,Premium Economy,825.0,Yes,1,Vegetarian,1
3,2024-06-05,PA006,Tokyo,London,First Class,618.0,Yes,3,Vegan,2
4,2024-03-30,PA004,Perth,London,First Class,446.0,Yes,1,Nut Free,1
...,...,...,...,...,...,...,...,...,...,...
1890,2024-03-06,PA006,Tokyo,London,Premium Economy,940.0,No,2,Vegetarian,1
1891,2024-05-05,PA009,New York,Tokyo,Economy,1360.0,No,3,Nut Free,2
1892,2024-06-14,PA008,Perth,New York,First Class,245.0,No,1,Dairy Free,2
1893,2024-01-16,PA010,Tokyo,New York,Economy,2410.0,No,2,Egg Free,1
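
To see the quarter conversion in isolation, here is a small sketch on four dates that appear in the data above. It assumes nothing beyond pandas itself:

```python
import pandas as pd

# Four dates taken from the rows shown above, one per quarter
dates = pd.Series(pd.to_datetime(["2024-01-23", "2024-04-20", "2024-07-22", "2024-12-26"]))

# .dt.quarter returns integers 1-4; astype(str) stores them as text labels
quarters = dates.dt.quarter.astype(str)
print(quarters.tolist())  # ['1', '2', '3', '4']
```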


In [6]:
flights = flights.set_index(["Quarter","Flow Card?", "Class"]).sort_index()
flights

Quarter,Flow Card?,Class,Date,Flight Number,From,To,Price,Bags Checked,Meal Type
1,No,Business Class,2024-03-04,PA007,New York,Perth,458.4,3,Nut Free
1,No,Business Class,2024-03-10,PA011,Perth,Tokyo,556.8,3,Dairy Free
1,No,Business Class,2024-03-07,PA006,Tokyo,London,811.2,0,Egg Free
1,No,Business Class,2024-03-26,PA008,Perth,New York,484.8,2,Egg Free
1,No,Business Class,2024-03-02,PA003,London,Perth,439.2,1,Dairy Free
...,...,...,...,...,...,...,...,...,...
4,Yes,Premium Economy,2024-11-29,PA003,London,Perth,915.0,0,Dairy Free
4,Yes,Premium Economy,2024-12-26,PA006,Tokyo,London,1060.0,1,Vegan
4,Yes,Premium Economy,2024-10-17,PA009,New York,Tokyo,1172.5,1,Nut Free
4,Yes,Premium Economy,2024-11-08,PA005,London,Tokyo,747.5,0,Dairy Free


**Aggregate the data in the following ways:**<br>
Median price per Quarter, Flow Card? and Class<br>
Minimum price per Quarter, Flow Card? and Class<br>
Maximum price per Quarter, Flow Card? and Class<br><br>
**Create three separate flows where you have only one of the aggregated measures in each.** <br>
One for the minimum price<br>
One for the median price<br>
One for the maximum price<br>
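
The notebook follows the challenge and builds three separate flows. As an alternative sketch (not the approach used below), the same three measures can be computed in one pass with `groupby(...).agg(...)`. The frame `toy` and its prices are made up for illustration:

```python
import pandas as pd

# Made-up prices for illustration only
toy = pd.DataFrame({
    "Quarter": ["1", "1", "1", "2", "2"],
    "Flow Card?": ["No", "No", "No", "Yes", "Yes"],
    "Class": ["Economy"] * 5,
    "Price": [100.0, 300.0, 200.0, 50.0, 150.0],
})

# One row per group, one column per aggregation
summary = (
    toy.groupby(["Quarter", "Flow Card?", "Class"])["Price"]
    .agg(["median", "min", "max"])
    .reset_index()
)
print(summary)
```

This gives one row per Quarter, Flow Card? and Class combination with `median`, `min` and `max` columns, which is the same shape the union step produces later.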

In [7]:
# Median price per Quarter, Flow Card? and Class
price_median = flights.pivot_table(values="Price", index=["Quarter","Flow Card?", "Class"], aggfunc="median").rename(columns={"Price": "medium"})
price_median

Quarter,Flow Card?,Class,medium
1,No,Business Class,574.8
1,No,Economy,2340.0
1,No,First Class,438.0
1,No,Premium Economy,1075.0
1,Yes,Business Class,523.2
1,Yes,Economy,2325.0
1,Yes,First Class,447.5
1,Yes,Premium Economy,1160.0
2,No,Business Class,553.8
2,No,Economy,2325.0


In [8]:
# Minimum price per Quarter, Flow Card? and Class
price_min = flights.pivot_table(values="Price", index=["Quarter","Flow Card?", "Class"], aggfunc="min").rename(columns={"Price": "minimum"})
price_min

Quarter,Flow Card?,Class,minimum
1,No,Business Class,241.2
1,No,Economy,1030.0
1,No,First Class,204.0
1,No,Premium Economy,515.0
1,Yes,Business Class,249.6
1,Yes,Economy,1020.0
1,Yes,First Class,201.0
1,Yes,Premium Economy,502.5
2,No,Business Class,240.0
2,No,Economy,1000.0


In [9]:
# Maximum price per Quarter, Flow Card? and Class
price_max = flights.pivot_table(values="Price", index=["Quarter","Flow Card?", "Class"], aggfunc="max").rename(columns={"Price": "maximum"})
price_max

Quarter,Flow Card?,Class,maximum
1,No,Business Class,834.0
1,No,Economy,3455.0
1,No,First Class,699.0
1,No,Premium Economy,1702.5
1,Yes,Business Class,840.0
1,Yes,Economy,3500.0
1,Yes,First Class,698.0
1,Yes,Premium Economy,1737.5
2,No,Business Class,828.0
2,No,Economy,3480.0


In [10]:
# Union these flows back together
flights = pd.concat([price_median, price_min, price_max], axis="columns").reset_index()
flights

,Quarter,Flow Card?,Class,medium,minimum,maximum
0,1,No,Business Class,574.8,241.2,834.0
1,1,No,Economy,2340.0,1030.0,3455.0
2,1,No,First Class,438.0,204.0,699.0
3,1,No,Premium Economy,1075.0,515.0,1702.5
4,1,Yes,Business Class,523.2,249.6,840.0
5,1,Yes,Economy,2325.0,1020.0,3500.0
6,1,Yes,First Class,447.5,201.0,698.0
7,1,Yes,Premium Economy,1160.0,502.5,1737.5
8,2,No,Business Class,553.8,240.0,828.0
9,2,No,Economy,2325.0,1000.0,3480.0


In [11]:
# Optional = you might want to add a column to show which aggregation each value is minimum, medium or maximum.
flights = flights.melt(id_vars=["Quarter", "Flow Card?", "Class"], value_vars=['medium', 'minimum', 'maximum'], var_name="Price Aggregation", value_name="Price")
flights

,Quarter,Flow Card?,Class,Price Aggregation,Price
0,1,No,Business Class,medium,574.8
1,1,No,Economy,medium,2340.0
2,1,No,First Class,medium,438.0
3,1,No,Premium Economy,medium,1075.0
4,1,Yes,Business Class,medium,523.2
...,...,...,...,...,...
91,4,No,Premium Economy,maximum,1730.0
92,4,Yes,Business Class,maximum,834.0
93,4,Yes,Economy,maximum,3460.0
94,4,Yes,First Class,maximum,697.0
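
The melt step turns one column per aggregation into one row per aggregation. A minimal sketch on a made-up two-row frame (the name `wide` and its values are illustrative only):

```python
import pandas as pd

# Made-up wide frame: one column per aggregation
wide = pd.DataFrame({
    "Class": ["Economy", "First Class"],
    "minimum": [1000.0, 200.0],
    "maximum": [3500.0, 700.0],
})

# Each value column becomes a block of rows labelled by its old column name
long = wide.melt(
    id_vars=["Class"],
    value_vars=["minimum", "maximum"],
    var_name="Price Aggregation",
    value_name="Price",
)
print(long["Price Aggregation"].tolist())  # ['minimum', 'minimum', 'maximum', 'maximum']
print(long.shape)  # (4, 3)
```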


In [12]:
# Now pivot the data to have a column per class for each quarter and whether the passenger had a flow card or not
flights = flights.pivot_table(index=["Quarter", "Flow Card?", "Price Aggregation"], columns="Class", values="Price")
flights

Quarter,Flow Card?,Price Aggregation,Business Class,Economy,First Class,Premium Economy
1,No,maximum,834.0,3455.0,699.0,1702.5
1,No,medium,574.8,2340.0,438.0,1075.0
1,No,minimum,241.2,1030.0,204.0,515.0
1,Yes,maximum,840.0,3500.0,698.0,1737.5
1,Yes,medium,523.2,2325.0,447.5,1160.0
1,Yes,minimum,249.6,1020.0,201.0,502.5
2,No,maximum,828.0,3480.0,694.0,1745.0
2,No,medium,553.8,2325.0,445.0,1205.0
2,No,minimum,240.0,1000.0,202.0,507.5
2,Yes,maximum,840.0,3490.0,696.0,1737.5


What's this you see? Economy seats are the most expensive and First Class the cheapest?<br>
When you check with your manager, you realise the original data has been incorrectly classified, so you need to change the names of these columns.<br><br>
**Change the name of the following columns:**<br>
Economy to First<br>
First Class to Economy<br>
Business Class to Premium<br>
Premium Economy to Business<br>
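
Note that `Economy` is both an old name and a new name in this list. The sketch below suggests that a single `rename` call with a dictionary handles this safely, because every label is looked up once against the mapping and the renames are not chained. The one-row frame `wide` reuses the Quarter 1, non-Flow Card maximum prices shown above:

```python
import pandas as pd

# One row of prices under the original (misclassified) column names
wide = pd.DataFrame(
    [[3455.0, 699.0, 834.0, 1702.5]],
    columns=["Economy", "First Class", "Business Class", "Premium Economy"],
)

mapping = {
    "Economy": "First",
    "First Class": "Economy",
    "Business Class": "Premium",
    "Premium Economy": "Business",
}

# All labels are translated in one pass, so "Economy" -> "First" does not
# collide with "First Class" -> "Economy"
renamed = wide.rename(columns=mapping)
print(renamed.columns.tolist())  # ['First', 'Economy', 'Premium', 'Business']
```

Applying the four renames one after another in separate calls would be riskier, since an intermediate frame would hold two columns called `Economy`.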

In [13]:
flights = flights.rename(columns={"Economy": "First", 
                                  "First Class": "Economy", 
                                  "Business Class": "Premium",
                                  "Premium Economy": "Business"})
flights

Quarter,Flow Card?,Price Aggregation,Premium,First,Economy,Business
1,No,maximum,834.0,3455.0,699.0,1702.5
1,No,medium,574.8,2340.0,438.0,1075.0
1,No,minimum,241.2,1030.0,204.0,515.0
1,Yes,maximum,840.0,3500.0,698.0,1737.5
1,Yes,medium,523.2,2325.0,447.5,1160.0
1,Yes,minimum,249.6,1020.0,201.0,502.5
2,No,maximum,828.0,3480.0,694.0,1745.0
2,No,medium,553.8,2325.0,445.0,1205.0
2,No,minimum,240.0,1000.0,202.0,507.5
2,Yes,maximum,840.0,3490.0,696.0,1737.5


## Output
24 rows (25 including headers)<br>
<br>
**6 data fields:**
- Flow Card?
- Quarter
- Economy
- Premium
- Business
- First
- Optional = you might want to add a column to show which aggregation each value is minimum, medium or maximum.

In [14]:
# Output the data
output = flights.rename_axis(columns=None)
output

Quarter,Flow Card?,Price Aggregation,Premium,First,Economy,Business
1,No,maximum,834.0,3455.0,699.0,1702.5
1,No,medium,574.8,2340.0,438.0,1075.0
1,No,minimum,241.2,1030.0,204.0,515.0
1,Yes,maximum,840.0,3500.0,698.0,1737.5
1,Yes,medium,523.2,2325.0,447.5,1160.0
1,Yes,minimum,249.6,1020.0,201.0,502.5
2,No,maximum,828.0,3480.0,694.0,1745.0
2,No,medium,553.8,2325.0,445.0,1205.0
2,No,minimum,240.0,1000.0,202.0,507.5
2,Yes,maximum,840.0,3490.0,696.0,1737.5


In [16]:
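# Generate the CSV output file
# reset_index() turns Quarter, Flow Card? and Price Aggregation into columns so they are written to the file
output.reset_index().to_csv("output-202402.csv", index=False)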
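
Because `Quarter`, `Flow Card?` and `Price Aggregation` sit in the index of `output`, whether they reach the CSV depends on how `to_csv` is called. A minimal sketch on a made-up two-row frame (the name `toy` is hypothetical) compares the header line in both cases:

```python
import pandas as pd

# Small frame with the three grouping fields stored in the index
idx = pd.MultiIndex.from_tuples(
    [("1", "No", "maximum"), ("1", "No", "minimum")],
    names=["Quarter", "Flow Card?", "Price Aggregation"],
)
toy = pd.DataFrame({"Economy": [699.0, 204.0]}, index=idx)

# With no path argument, to_csv returns the CSV text as a string
dropped = toy.to_csv(index=False)             # index levels are left out
kept = toy.reset_index().to_csv(index=False)  # levels become ordinary columns

print(dropped.splitlines()[0])  # Economy
print(kept.splitlines()[0])     # Quarter,Flow Card?,Price Aggregation,Economy
```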