### Preppin' Data Challenge 2024: Week 2 - Average Price Analysis
Input the two csv files

In [288]:
import pandas as pd
import numpy as np

df1 = pd.read_csv('PD2024_W2_FLOWCARD.csv')
df2 = pd.read_csv('PD2024_WK2_NONFLOWCARD.csv')

Union the files together

In [289]:
df = pd.concat([df1, df2], ignore_index=True)
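Here `pd.concat` acts as the union step. The sketch below uses two tiny hypothetical frames (toy rows, not the challenge files). It shows that rows are stacked with columns matched by name, and that a column present in only one input is kept and filled with `NaN` for the other.

```python
import pandas as pd

# Two hypothetical extracts with the same columns (toy data, not the challenge files)
flow = pd.DataFrame({"Date": ["01/01/2024"], "Flow Card?": ["Yes"], "Class": ["Economy"], "Price": [450.0]})
non_flow = pd.DataFrame({"Date": ["02/01/2024"], "Flow Card?": ["No"], "Class": ["Economy"], "Price": [430.0]})

# pd.concat stacks the rows and aligns columns by name; ignore_index=True
# replaces the two overlapping 0..n-1 indexes with a fresh RangeIndex
union = pd.concat([flow, non_flow], ignore_index=True)
print(union.index.tolist())  # [0, 1]

# A column present in only one input is kept and filled with NaN for the other
extra = non_flow.assign(Source="non-flow")
mixed = pd.concat([flow, extra], ignore_index=True)
print(mixed["Source"].isna().tolist())  # [True, False]
```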

Convert the Date field to a Quarter Number instead

- Name this field Quarter

In [290]:
# Convert the dd/mm/yyyy date strings to datetime
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')

# Convert each date to its quarter number (integer 1-4)
df['Quarter'] = df['Date'].dt.quarter
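The explicit `format='%d/%m/%Y'` matters. A string like `01/04/2024` is 1 April (Q2) when read day-first but 4 January (Q1) when read month-first. The sketch below uses hypothetical date strings and assumes the day-first layout used in the cell above:

```python
import pandas as pd

# Hypothetical date strings in the dd/mm/yyyy layout assumed above
dates = pd.Series(["01/04/2024", "15/07/2024", "31/12/2024"])

# Parse with an explicit day-first format, then take the quarter
parsed = pd.to_datetime(dates, format="%d/%m/%Y")
print(parsed.dt.quarter.tolist())  # [2, 3, 4]

# Reading the first string month-first gives a different quarter
month_first = pd.to_datetime(dates.iloc[:1], format="%m/%d/%Y")
print(month_first.dt.quarter.tolist())  # [1]
```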


Aggregate the data in the following ways:

- Median price per Quarter, Flow Card? and Class
- Minimum price per Quarter, Flow Card? and Class
- Maximum price per Quarter, Flow Card? and Class

Create three separate flows where you have only one of the aggregated measures in each.

- One for the minimum price
- One for the median price
- One for the maximum price

Optional: you might want to add a column showing which aggregation each value is: minimum, median or maximum.

In [291]:
# Median price per Quarter, Flow Card? and Class
df_flow1 = df.groupby(['Quarter', 'Flow Card?', 'Class'], as_index=False).agg({'Price':'median'})
df_flow1['Aggregation Unit'] = 'Median'

In [292]:
# Minimum price per Quarter, Flow Card? and Class
df_flow2 = df.groupby(['Quarter', 'Flow Card?', 'Class'], as_index=False).agg({'Price':'min'})
df_flow2['Aggregation Unit'] = 'Minimum'

In [293]:
# Maximum price per Quarter, Flow Card? and Class
df_flow3 = df.groupby(['Quarter', 'Flow Card?', 'Class'], as_index=False).agg({'Price':'max'})
df_flow3['Aggregation Unit'] = 'Maximum'
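The three cells above mirror the challenge's three separate flows. If separate flows are not required, a single `groupby` with named aggregation followed by `melt` gives the same long layout in one pass. The sketch below runs on hypothetical toy rows: the column names match the notebook, but the data does not.

```python
import pandas as pd

# Hypothetical rows standing in for the unioned data
df = pd.DataFrame({
    "Quarter": [1, 1, 1, 2],
    "Flow Card?": ["Yes", "Yes", "Yes", "No"],
    "Class": ["Economy", "Economy", "Economy", "Economy"],
    "Price": [100.0, 200.0, 400.0, 300.0],
})

keys = ["Quarter", "Flow Card?", "Class"]

# One pass computes all three measures via named aggregation
stats = df.groupby(keys, as_index=False).agg(
    Median=("Price", "median"),
    Minimum=("Price", "min"),
    Maximum=("Price", "max"),
)

# Melt to long format: one row per group and measure, with the
# measure name in 'Aggregation Unit' and its value in 'Price'
long = stats.melt(id_vars=keys, var_name="Aggregation Unit", value_name="Price")
print(long)
```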

Now pivot the data to have a column per class for each quarter and whether the passenger had a flow card or not

In [294]:
#Pivoting df_flow1
Pivot_flow1 = df_flow1.pivot(index = ['Quarter', 'Flow Card?', 'Aggregation Unit'], columns='Class', values = 'Price')
Pivot_flow1.reset_index(inplace=True) #avoid having hierarchical index after pivoting the DataFrame

In [295]:
#Pivoting df_flow2
Pivot_flow2 = df_flow2.pivot(index = ['Quarter', 'Flow Card?', 'Aggregation Unit'], columns='Class', values = 'Price')
Pivot_flow2.reset_index(inplace=True) #avoid having hierarchical index after pivoting the DataFrame

In [296]:
#Pivoting df_flow3
Pivot_flow3 = df_flow3.pivot(index = ['Quarter', 'Flow Card?', 'Aggregation Unit'], columns='Class', values = 'Price')
Pivot_flow3.reset_index(inplace=True) #avoid having hierarchical index after pivoting the DataFrame
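Each pivot above reshapes one flow. Because `Aggregation Unit` is already one of the index columns, the long frames could also be stacked first and pivoted once. The sketch below uses hypothetical rows and assumes pandas 1.1 or later, which accepts a list for `index` in `DataFrame.pivot`. Note that `pivot` raises a `ValueError` if any index/column pair occurs more than once; in that case `pivot_table` with an explicit `aggfunc` is the usual fallback.

```python
import pandas as pd

# Hypothetical long-format rows: one price per (Quarter, Flow Card?, Aggregation Unit, Class)
long = pd.DataFrame({
    "Quarter": [1, 1, 1, 1],
    "Flow Card?": ["Yes", "Yes", "Yes", "Yes"],
    "Aggregation Unit": ["Median", "Median", "Minimum", "Minimum"],
    "Class": ["Economy", "First Class", "Economy", "First Class"],
    "Price": [450.0, 2300.0, 200.0, 1000.0],
})

# A single pivot turns each Class value into its own column
wide = long.pivot(index=["Quarter", "Flow Card?", "Aggregation Unit"], columns="Class", values="Price")

# reset_index moves the index levels back into ordinary columns;
# rename_axis(columns=None) drops the leftover 'Class' label on the column axis
wide = wide.reset_index().rename_axis(columns=None)
print(wide)
```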

Union these flows back together

In [297]:
df = pd.concat([Pivot_flow1, Pivot_flow2, Pivot_flow3], ignore_index=True)
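After this union, each combination of `Quarter`, `Flow Card?` and `Aggregation Unit` should appear exactly once. `union_looks_complete` below is a hypothetical helper for checking that. It assumes four quarters, `Yes`/`No` flow-card values (as in the output further down) and the three `Aggregation Unit` labels set earlier.

```python
import itertools
import pandas as pd

KEYS = ["Quarter", "Flow Card?", "Aggregation Unit"]

def union_looks_complete(wide: pd.DataFrame, quarters=(1, 2, 3, 4),
                         flow_values=("Yes", "No"),
                         aggs=("Median", "Minimum", "Maximum")) -> bool:
    """Hypothetical check: True if every key combination appears exactly once."""
    # Build the full set of expected key combinations
    expected = set(itertools.product(quarters, flow_values, aggs))
    # Collect the combinations actually present, as plain tuples
    actual = list(wide[KEYS].itertuples(index=False, name=None))
    # Equal length plus equal sets rules out both gaps and duplicates
    return len(actual) == len(expected) and set(actual) == expected

# Hypothetical toy union covering all 24 combinations
toy = pd.DataFrame(
    list(itertools.product((1, 2, 3, 4), ("Yes", "No"), ("Median", "Minimum", "Maximum"))),
    columns=KEYS,
)
print(union_looks_complete(toy))           # True
print(union_looks_complete(toy.iloc[1:]))  # False (one combination missing)
```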

What's this you see??? Economy seats are the most expensive and First Class the cheapest? When you check with your manager, you realise the original data was incorrectly classified, so you need to change the names of these columns.

Change the name of the following columns:

- Economy to First
- First Class to Economy
- Business Class to Premium
- Premium Economy to Business

In [298]:
# Apply all four corrections in a single mapping
df = df.rename(columns={
    'Economy': 'First',
    'First Class': 'Economy',
    'Business Class': 'Premium',
    'Premium Economy': 'Business',
})
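In this notebook the renames happen not to collide, because `First` and `First Class` are different labels. A single mapping is still the safer habit when labels are being swapped, as this hypothetical two-column example shows:

```python
import pandas as pd

# Hypothetical frame whose two labels need to be swapped
df = pd.DataFrame({"Economy": [2300.0], "First": [450.0]})

# Chained renames applied one at a time collide: after the first call
# both columns are named 'First', and the second call renames both back
chained = df.rename(columns={"Economy": "First"}).rename(columns={"First": "Economy"})
print(list(chained.columns))  # ['Economy', 'Economy']

# A single mapping is applied to every label independently, so the swap works
swapped = df.rename(columns={"Economy": "First", "First": "Economy"})
print(list(swapped.columns))  # ['First', 'Economy']
```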

In [299]:
# Reorder the columns to match the desired output
df = df[['Flow Card?', 'Quarter', 'Economy', 'Premium', 'Business', 'First', 'Aggregation Unit']]

Output the data

In [300]:
df.to_csv('PD2024W2_OUTPUT.csv', index=False)

In [301]:
#verifying the output
Output = pd.read_csv('PD2024W2_OUTPUT.csv')
Output

|   | Flow Card? | Quarter | Economy | Premium | Business | First | Aggregation Unit |
|---|---|---|---|---|---|---|---|
| 0 | No | 1 | 438.0 | 574.8 | 1075.0 | 2340.0 | Median |
| 1 | Yes | 1 | 447.5 | 523.2 | 1160.0 | 2325.0 | Median |
| 2 | No | 2 | 445.0 | 553.8 | 1205.0 | 2325.0 | Median |
| 3 | Yes | 2 | 459.0 | 517.8 | 1071.25 | 2290.0 | Median |
| 4 | No | 3 | 487.0 | 490.8 | 1125.0 | 2285.0 | Median |
| 5 | Yes | 3 | 457.0 | 553.8 | 1090.0 | 2347.5 | Median |
| 6 | No | 4 | 428.0 | 555.6 | 1062.5 | 2202.5 | Median |
| 7 | Yes | 4 | 424.0 | 522.6 | 1108.75 | 2212.5 | Median |
| 8 | No | 1 | 204.0 | 241.2 | 515.0 | 1030.0 | Minimum |
| 9 | Yes | 1 | 201.0 | 249.6 | 502.5 | 1020.0 | Minimum |
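Displaying the re-read file is a visual check. `pandas.testing.assert_frame_equal` turns it into an automatic one. The sketch below writes to an in-memory buffer (an assumption, standing in for `PD2024W2_OUTPUT.csv`) and uses two of the rows from the output above:

```python
import io
import pandas as pd

# Two sample rows in the final column order
out = pd.DataFrame({
    "Flow Card?": ["No", "Yes"],
    "Quarter": [1, 1],
    "Economy": [438.0, 447.5],
    "Premium": [574.8, 523.2],
    "Business": [1075.0, 1160.0],
    "First": [2340.0, 2325.0],
    "Aggregation Unit": ["Median", "Median"],
})

# Write to an in-memory buffer instead of a file, then read it back
buffer = io.StringIO()
out.to_csv(buffer, index=False)
buffer.seek(0)
round_trip = pd.read_csv(buffer)

# Raises AssertionError if columns, dtypes or values differ
pd.testing.assert_frame_equal(out, round_trip)
print("round trip OK")
```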
