# Preppin' Data
## 2024: Week 1 - Prep Air's Flow Card
**Created by:** Carl Allchin | [Challenge Link](https://preppindata.blogspot.com/2024/01/2024-week-1-prep-airs-flow-card.html)

At Preppin' Data we use a number of (mock) companies to look at the challenges they have with their data. For January, we're going to focus on our own airline, Prep Air. <br>
The airline has introduced a new loyalty card called the Flow Card. We need to clean up a number of data sets to determine how well the card is doing. <br>

The first task is setting some context for later weeks by understanding how popular the Flow Card is. Our stakeholder would like two data sets about our passengers: one for card users and one for those who don't use the card.

In [1]:
# Input the data
import pandas as pd
prepair = pd.read_csv("PD 2024 Wk 1 Input.csv")
prepair

Unnamed: 0,Flight Details,Flow Card?,Bags Checked,Meal Type
0,2024-07-22//PA010//Tokyo-New York//Economy//2380,1,0,Egg Free
1,2024-09-28//PA008//Perth-New York//Economy//1855,0,2,Vegetarian
2,2024-04-20//PA002//New York-London//Economy//3490,1,1,Vegan
3,2024-01-23//PA010//Tokyo-New York//Premium Eco...,1,1,Vegetarian
4,2024-10-01//PA008//Perth-New York//Business Cl...,0,0,Vegetarian
...,...,...,...,...
3773,2024-05-05//PA009//New York-Tokyo//Economy//1360,0,3,Nut Free
3774,2024-06-14//PA008//Perth-New York//First Class...,0,1,Dairy Free
3775,2024-01-16//PA010//Tokyo-New York//Economy//2410,0,2,Egg Free
3776,2024-08-16//PA005//London-Tokyo//Premium Econo...,0,0,Nut Free


In [2]:
# Split the Flight Details field to form: Date, Flight Number, From-To, Class, Price
prepair[["Date", "Flight Number", "From-To", "Class", "Price"]] = prepair["Flight Details"].str.split("//", expand=True)
prepair

Unnamed: 0,Flight Details,Flow Card?,Bags Checked,Meal Type,Date,Flight Number,From-To,Class,Price
0,2024-07-22//PA010//Tokyo-New York//Economy//2380,1,0,Egg Free,2024-07-22,PA010,Tokyo-New York,Economy,2380
1,2024-09-28//PA008//Perth-New York//Economy//1855,0,2,Vegetarian,2024-09-28,PA008,Perth-New York,Economy,1855
2,2024-04-20//PA002//New York-London//Economy//3490,1,1,Vegan,2024-04-20,PA002,New York-London,Economy,3490
3,2024-01-23//PA010//Tokyo-New York//Premium Eco...,1,1,Vegetarian,2024-01-23,PA010,Tokyo-New York,Premium Economy,825
4,2024-10-01//PA008//Perth-New York//Business Cl...,0,0,Vegetarian,2024-10-01,PA008,Perth-New York,Business Class,634.79999999999995
...,...,...,...,...,...,...,...,...,...
3773,2024-05-05//PA009//New York-Tokyo//Economy//1360,0,3,Nut Free,2024-05-05,PA009,New York-Tokyo,Economy,1360
3774,2024-06-14//PA008//Perth-New York//First Class...,0,1,Dairy Free,2024-06-14,PA008,Perth-New York,First Class,245
3775,2024-01-16//PA010//Tokyo-New York//Economy//2410,0,2,Egg Free,2024-01-16,PA010,Tokyo-New York,Economy,2410
3776,2024-08-16//PA005//London-Tokyo//Premium Econo...,0,0,Nut Free,2024-08-16,PA005,London-Tokyo,Premium Economy,960
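
Assigning five column names in one step only works if every row splits into exactly five parts. A minimal sketch of checking that first, assuming a small stand-in Series built from two of the rows shown above (the names `details` and `parts` are illustrative, not part of the original notebook):

```python
import pandas as pd

# Stand-in for prepair["Flight Details"], using two rows from the input above
details = pd.Series([
    "2024-07-22//PA010//Tokyo-New York//Economy//2380",
    "2024-09-28//PA008//Perth-New York//Economy//1855",
])

# expand=True returns one column per piece of the split string
parts = details.str.split("//", expand=True)

# Fail early if any row produced more or fewer than five pieces
assert parts.shape[1] == 5

parts.columns = ["Date", "Flight Number", "From-To", "Class", "Price"]
print(parts)
```

Note that every piece is still a string at this point, which is why the later cells convert Date and Price explicitly.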


In [3]:
# Drop the Flight Details column; pop returns the removed column, which is the output shown
prepair.pop("Flight Details")

0        2024-07-22//PA010//Tokyo-New York//Economy//2380
1        2024-09-28//PA008//Perth-New York//Economy//1855
2       2024-04-20//PA002//New York-London//Economy//3490
3       2024-01-23//PA010//Tokyo-New York//Premium Eco...
4       2024-10-01//PA008//Perth-New York//Business Cl...
                              ...                        
3773     2024-05-05//PA009//New York-Tokyo//Economy//1360
3774    2024-06-14//PA008//Perth-New York//First Class...
3775     2024-01-16//PA010//Tokyo-New York//Economy//2410
3776    2024-08-16//PA005//London-Tokyo//Premium Econo...
3777    2024-01-06//PA004//Perth-London//First Class//236
Name: Flight Details, Length: 3778, dtype: object
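
If the echoed Series is not wanted, `DataFrame.drop` is one alternative to `pop`. A minimal sketch, assuming a small stand-in frame (`sample` and `trimmed` are illustrative names):

```python
import pandas as pd

# Stand-in frame using two rows from the input shown above
sample = pd.DataFrame({
    "Flight Details": [
        "2024-07-22//PA010//Tokyo-New York//Economy//2380",
        "2024-09-28//PA008//Perth-New York//Economy//1855",
    ],
    "Flow Card?": [1, 0],
})

# drop returns a new DataFrame without the column and leaves the original untouched
trimmed = sample.drop(columns="Flight Details")
print(list(trimmed.columns))
```

Unlike `pop`, this does not modify `sample` in place, so the result has to be assigned to a name to be kept.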

In [4]:
# Split the From-To field to form: From, To
prepair[["From", "To"]] = prepair["From-To"].str.split("-", expand=True)
prepair

Unnamed: 0,Flow Card?,Bags Checked,Meal Type,Date,Flight Number,From-To,Class,Price,From,To
0,1,0,Egg Free,2024-07-22,PA010,Tokyo-New York,Economy,2380,Tokyo,New York
1,0,2,Vegetarian,2024-09-28,PA008,Perth-New York,Economy,1855,Perth,New York
2,1,1,Vegan,2024-04-20,PA002,New York-London,Economy,3490,New York,London
3,1,1,Vegetarian,2024-01-23,PA010,Tokyo-New York,Premium Economy,825,Tokyo,New York
4,0,0,Vegetarian,2024-10-01,PA008,Perth-New York,Business Class,634.79999999999995,Perth,New York
...,...,...,...,...,...,...,...,...,...,...
3773,0,3,Nut Free,2024-05-05,PA009,New York-Tokyo,Economy,1360,New York,Tokyo
3774,0,1,Dairy Free,2024-06-14,PA008,Perth-New York,First Class,245,Perth,New York
3775,0,2,Egg Free,2024-01-16,PA010,Tokyo-New York,Economy,2410,Tokyo,New York
3776,0,0,Nut Free,2024-08-16,PA005,London-Tokyo,Premium Economy,960,London,Tokyo
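
Splitting on every hyphen relies on no city name containing one. The routes shown here are safe, but as a hedged sketch, limiting the split with `n=1` keeps the result at two columns even when the destination is hyphenated. The second route below is hypothetical and does not appear in the data:

```python
import pandas as pd

routes = pd.Series([
    "Tokyo-New York",               # taken from the data above
    "London-Stratford-upon-Avon",   # hypothetical route, for illustration only
])

# n=1 splits on the first hyphen only, so at most two pieces come back
split = routes.str.split("-", n=1, expand=True)
split.columns = ["From", "To"]
print(split)
```

This would still misplace the boundary if the origin city were the hyphenated one, so it is a guard rather than a full fix.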


In [5]:
# Drop the From-To column; pop returns the removed column, which is the output shown
prepair.pop("From-To")

0        Tokyo-New York
1        Perth-New York
2       New York-London
3        Tokyo-New York
4        Perth-New York
             ...       
3773     New York-Tokyo
3774     Perth-New York
3775     Tokyo-New York
3776       London-Tokyo
3777       Perth-London
Name: From-To, Length: 3778, dtype: object

In [6]:
# Convert Date to a datetime type
prepair["Date"] = pd.to_datetime(prepair["Date"], format="%Y-%m-%d")
prepair

Unnamed: 0,Flow Card?,Bags Checked,Meal Type,Date,Flight Number,Class,Price,From,To
0,1,0,Egg Free,2024-07-22,PA010,Economy,2380,Tokyo,New York
1,0,2,Vegetarian,2024-09-28,PA008,Economy,1855,Perth,New York
2,1,1,Vegan,2024-04-20,PA002,Economy,3490,New York,London
3,1,1,Vegetarian,2024-01-23,PA010,Premium Economy,825,Tokyo,New York
4,0,0,Vegetarian,2024-10-01,PA008,Business Class,634.79999999999995,Perth,New York
...,...,...,...,...,...,...,...,...,...
3773,0,3,Nut Free,2024-05-05,PA009,Economy,1360,New York,Tokyo
3774,0,1,Dairy Free,2024-06-14,PA008,First Class,245,Perth,New York
3775,0,2,Egg Free,2024-01-16,PA010,Economy,2410,Tokyo,New York
3776,0,0,Nut Free,2024-08-16,PA005,Premium Economy,960,London,Tokyo
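
With an explicit `format`, `pd.to_datetime` raises an error on any value that does not match. A minimal sketch of the more forgiving variant, assuming you would rather get `NaT` for bad values and inspect them afterwards (the `"not-a-date"` entry is made up for illustration):

```python
import pandas as pd

dates = pd.Series(["2024-07-22", "not-a-date"])  # second value is a made-up bad entry

# errors="coerce" turns unparseable values into NaT instead of raising
parsed = pd.to_datetime(dates, format="%Y-%m-%d", errors="coerce")

# Count the rows that failed to parse
bad_rows = int(parsed.isna().sum())
print(parsed)
print(bad_rows)
```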


In [7]:
# Convert Price to a decimal (float) value
prepair["Price"] = prepair["Price"].astype(float)
prepair

Unnamed: 0,Flow Card?,Bags Checked,Meal Type,Date,Flight Number,Class,Price,From,To
0,1,0,Egg Free,2024-07-22,PA010,Economy,2380.0,Tokyo,New York
1,0,2,Vegetarian,2024-09-28,PA008,Economy,1855.0,Perth,New York
2,1,1,Vegan,2024-04-20,PA002,Economy,3490.0,New York,London
3,1,1,Vegetarian,2024-01-23,PA010,Premium Economy,825.0,Tokyo,New York
4,0,0,Vegetarian,2024-10-01,PA008,Business Class,634.8,Perth,New York
...,...,...,...,...,...,...,...,...,...
3773,0,3,Nut Free,2024-05-05,PA009,Economy,1360.0,New York,Tokyo
3774,0,1,Dairy Free,2024-06-14,PA008,First Class,245.0,Perth,New York
3775,0,2,Egg Free,2024-01-16,PA010,Economy,2410.0,Tokyo,New York
3776,0,0,Nut Free,2024-08-16,PA005,Premium Economy,960.0,London,Tokyo
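
The Business Class price changes from `634.79999999999995` to `634.8` in the output above. No rounding has been applied: the long string is a 17-digit rendering of the same double-precision number that pandas displays as `634.8`. A small sketch to confirm this, using a stand-in Series rather than the full data:

```python
import pandas as pd

# Two raw price strings as they appear after the split
raw = pd.Series(["2380", "634.79999999999995"])
prices = raw.astype(float)

# Both literals parse to the same double-precision value
same = float("634.79999999999995") == 634.8

print(prices)
print(same)
```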


In [8]:
# Change the Flow Card field to Yes / No values instead of 1 / 0
prepair["Flow Card?"] = prepair["Flow Card?"].replace({1: "Yes", 0: "No"})
prepair

Unnamed: 0,Flow Card?,Bags Checked,Meal Type,Date,Flight Number,Class,Price,From,To
0,Yes,0,Egg Free,2024-07-22,PA010,Economy,2380.0,Tokyo,New York
1,No,2,Vegetarian,2024-09-28,PA008,Economy,1855.0,Perth,New York
2,Yes,1,Vegan,2024-04-20,PA002,Economy,3490.0,New York,London
3,Yes,1,Vegetarian,2024-01-23,PA010,Premium Economy,825.0,Tokyo,New York
4,No,0,Vegetarian,2024-10-01,PA008,Business Class,634.8,Perth,New York
...,...,...,...,...,...,...,...,...,...
3773,No,3,Nut Free,2024-05-05,PA009,Economy,1360.0,New York,Tokyo
3774,No,1,Dairy Free,2024-06-14,PA008,First Class,245.0,Perth,New York
3775,No,2,Egg Free,2024-01-16,PA010,Economy,2410.0,Tokyo,New York
3776,No,0,Nut Free,2024-08-16,PA005,Premium Economy,960.0,London,Tokyo


## Outputs
Create two tables: one for Flow Card holders and one for non-Flow Card holders.

In [9]:
# Output 1 - Flow Card holders
output1 = prepair[prepair["Flow Card?"] == "Yes"]
output1

Unnamed: 0,Flow Card?,Bags Checked,Meal Type,Date,Flight Number,Class,Price,From,To
0,Yes,0,Egg Free,2024-07-22,PA010,Economy,2380.0,Tokyo,New York
2,Yes,1,Vegan,2024-04-20,PA002,Economy,3490.0,New York,London
3,Yes,1,Vegetarian,2024-01-23,PA010,Premium Economy,825.0,Tokyo,New York
6,Yes,3,Vegan,2024-06-05,PA006,First Class,618.0,Tokyo,London
8,Yes,1,Nut Free,2024-03-30,PA004,First Class,446.0,Perth,London
...,...,...,...,...,...,...,...,...,...
3764,Yes,2,Egg Free,2024-11-23,PA005,Economy,2070.0,London,Tokyo
3766,Yes,3,Nut Free,2024-11-04,PA003,First Class,210.0,London,Perth
3770,Yes,0,Dairy Free,2024-04-29,PA012,Economy,3490.0,Tokyo,Perth
3772,Yes,2,Vegetarian,2024-09-26,PA001,First Class,207.0,London,New York


In [10]:
# Output 2 - Non-Flow Card holders
output2 = prepair[prepair["Flow Card?"] == "No"]
output2

Unnamed: 0,Flow Card?,Bags Checked,Meal Type,Date,Flight Number,Class,Price,From,To
1,No,2,Vegetarian,2024-09-28,PA008,Economy,1855.0,Perth,New York
4,No,0,Vegetarian,2024-10-01,PA008,Business Class,634.8,Perth,New York
5,No,3,Nut Free,2024-03-04,PA007,Business Class,458.4,New York,Perth
7,No,0,,2024-02-25,PA010,Premium Economy,1435.0,Tokyo,New York
13,No,2,Vegan,2024-03-29,PA004,Economy,2730.0,Perth,London
...,...,...,...,...,...,...,...,...,...
3771,No,2,Vegetarian,2024-03-06,PA006,Premium Economy,940.0,Tokyo,London
3773,No,3,Nut Free,2024-05-05,PA009,Economy,1360.0,New York,Tokyo
3774,No,1,Dairy Free,2024-06-14,PA008,First Class,245.0,Perth,New York
3775,No,2,Egg Free,2024-01-16,PA010,Economy,2410.0,Tokyo,New York


In [14]:
# Generating csv output files
output1.to_csv("output1-202401.csv", index=False)
output2.to_csv("output2-202401.csv", index=False)
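
Before handing the files over, it can be worth checking that the two outputs together account for every input row exactly once. A minimal sketch on a stand-in frame (`sample`, `holders` and `non_holders` are illustrative names; the prices are taken from rows shown above):

```python
import pandas as pd

sample = pd.DataFrame({
    "Flow Card?": ["Yes", "No", "Yes", "No"],
    "Price": [2380.0, 1855.0, 3490.0, 634.8],
})

holders = sample[sample["Flow Card?"] == "Yes"]
non_holders = sample[sample["Flow Card?"] == "No"]

# Every row lands in exactly one of the two outputs
assert len(holders) + len(non_holders) == len(sample)
assert holders.index.intersection(non_holders.index).empty

# With no path given, to_csv returns the CSV text, handy for a quick look
csv_text = holders.to_csv(index=False)
print(csv_text)
```

The first assertion would fail if the Flow Card field held anything other than Yes or No, for example a missing value.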