## 2021: Week 32 Excelling through aggregation

My partner is an amazing Excel user, as are many of her colleagues. At the pub, a frequent getting-to-know-you question was "What's your favourite Excel function?". As a SQL / Tableau user, I knew after my first meeting that I had to up my game. SUMIFS became my go-to answer, and it is one of the functions we will look to replicate in Prep this week.

SUMIF, or SUMIFS if you have multiple conditions, lets you scan a data set and sum the values that match the conditions you set. When working with large tables with multiple entries per category, this is a great way to create totals that help you analyse the data set. Whilst SUMIF doesn't exist within Prep, the IF function and an aggregation step can be combined to create the same effect.

Excel allows for lots of different types of aggregation, so whilst SUMIF was my go-to answer, average, minimum, count etc. are all possible too.
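
For readers who think in pandas rather than Excel, the idea can be sketched on a few made-up rows. The column names and values below are illustrative assumptions, not the challenge data: a boolean mask plays the role of the SUMIFS criteria, and `.sum()` or `.mean()` plays the role of the aggregation.

```python
import pandas as pd

# Toy data, invented purely for illustration (not the challenge input)
sales = pd.DataFrame(
    {
        "Class": ["Economy", "Economy", "Business", "Economy"],
        "Days Until Flight": [10, 3, 8, 7],
        "Ticket Sales": [100, 200, 300, 400],
    }
)

# SUMIFS-style criteria: every condition must hold, so the masks are combined with &
mask = (sales["Class"] == "Economy") & (sales["Days Until Flight"] >= 7)

# Aggregate only the rows that satisfy the criteria
sumifs_total = sales.loc[mask, "Ticket Sales"].sum()      # analogue of SUMIFS
averageifs_mean = sales.loc[mask, "Ticket Sales"].mean()  # analogue of AVERAGEIFS
```

Both conditions must hold for a row to count, which is how SUMIFS treats multiple criteria.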

The challenge this week is forming the logic in Prep to replicate SUMIFS and AVERAGEIFS.

### Input
Daily ticket sales for each flight for six months.
![img](https://1.bp.blogspot.com/-WhxHHlosk1c/YRIqRHZT3gI/AAAAAAAACPI/AeYzSx7Fy_EbMjT0krNuS9oPjd8hS5JuwCLcBGAsYHQ/w640-h196/Screenshot%2B2021-08-10%2Bat%2B08.26.42.png)

### Requirement
- Input data
- Form Flight name
- Work out how many days there are between the sale and the flight departing
- Classify daily sales of a flight as:
    - Less than 7 days before departure
    - 7 or more days before departure
- Mimic the SUMIFS and AVERAGEIFS functions by aggregating the previous requirement's fields by each Flight and Class
- Round all data to zero decimal places
- Output the data

### Output
![img](https://1.bp.blogspot.com/-C6WFeDTp2IQ/YRJfOw3MMJI/AAAAAAAACPQ/07jL6a59TsY9HPSlmlVc_W-iB5wjZT2zQCLcBGAsYHQ/w640-h122/Screenshot%2B2021-08-10%2Bat%2B12.12.39.png)

One file containing 6 data fields:
- Flight
- Class
- Avg. daily sales 7 days of more until the flight
- Avg. daily sales less than 7 days until the flight
- Sales 7 days of more until the flight
- Sales less than 7 days until the flight
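
As a rough illustration of that shape, the sketch below builds the four measures from a handful of invented rows. The `Days Until Flight` column and the helper columns `early` and `late` are hypothetical names chosen for the example; the output field names are copied from the list above. It mirrors the IF-then-aggregate idea rather than any particular Prep flow.

```python
import pandas as pd

# Toy rows, invented for illustration; "Days Until Flight" is a hypothetical helper column
toy = pd.DataFrame(
    {
        "Flight": ["London to Perth"] * 4,
        "Class": ["Economy"] * 4,
        "Days Until Flight": [9, 8, 3, 2],
        "Ticket Sales": [100, 202, 300, 500],
    }
)

is_early = toy["Days Until Flight"] >= 7

# The "IF" step: keep the sale where the condition holds, otherwise leave it missing
toy["early"] = toy["Ticket Sales"].where(is_early)
toy["late"] = toy["Ticket Sales"].where(~is_early)

# The aggregation step: sum and mean skip missing values by default
summary = (
    toy.groupby(["Flight", "Class"])
    .agg(
        **{
            "Avg. daily sales 7 days of more until the flight": ("early", "mean"),
            "Avg. daily sales less than 7 days until the flight": ("late", "mean"),
            "Sales 7 days of more until the flight": ("early", "sum"),
            "Sales less than 7 days until the flight": ("late", "sum"),
        }
    )
    .round(0)        # round all measures to zero decimal places
    .reset_index()   # turn Flight and Class back into ordinary columns
)
```

The dictionary unpacking is only there because the output field names contain spaces, which rules out plain keyword arguments.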

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Input the data

In [4]:
df = pd.read_csv("./data/PD 2021 Wk 32 Input - Data.csv")
df.head()

| | Departure | Destination | Date | Class | Date of Flight | Ticket Sales |
|---|---|---|---|---|---|---|
| 0 | London | Perth | 01/01/2021 | Economy | 31/01/2021 | 572 |
| 1 | London | Perth | 02/01/2021 | Economy | 31/01/2021 | 1111 |
| 2 | London | Perth | 03/01/2021 | Economy | 31/01/2021 | 845 |
| 3 | London | Perth | 04/01/2021 | Economy | 31/01/2021 | 862 |
| 4 | London | Perth | 05/01/2021 | Economy | 31/01/2021 | 1087 |


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1448 entries, 0 to 1447
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Departure       1448 non-null   object
 1   Destination     1448 non-null   object
 2   Date            1448 non-null   object
 3   Class           1448 non-null   object
 4   Date of Flight  1448 non-null   object
 5   Ticket Sales    1448 non-null   int64 
dtypes: int64(1), object(5)
memory usage: 68.0+ KB


### Form Flight name

In [9]:
df["Flight"] = df["Departure"] + " to " + df["Destination"]
df["Flight"].value_counts()

London to Perth    362
Perth to London    362
London to Paris    362
Paris to London    362
Name: Flight, dtype: int64

### Work out how many days between the sale and the flight departing

In [14]:
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")
df["Date of Flight"] = pd.to_datetime(df["Date of Flight"], format="%d/%m/%Y")

In [17]:
df["Sales days"] = df["Date of Flight"] - df["Date"]
df["Sales days"].value_counts()

15 days    48
14 days    48
1 days     48
2 days     48
3 days     48
4 days     48
5 days     48
6 days     48
7 days     48
8 days     48
9 days     48
10 days    48
11 days    48
12 days    48
13 days    48
0 days     48
16 days    48
17 days    48
18 days    48
19 days    48
20 days    48
21 days    48
22 days    48
23 days    48
24 days    48
25 days    48
26 days    48
27 days    48
29 days    40
28 days    40
30 days    24
Name: Sales days, dtype: int64
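
The result above is a timedelta column. If plain integers are easier to work with downstream, one option is the `.dt.days` accessor. A minimal sketch using two sale dates from the input (the variable names are just for illustration):

```python
import pandas as pd

# Two sale dates and their flight date, taken from the input rows shown earlier
sale_dates = pd.to_datetime(pd.Series(["01/01/2021", "25/01/2021"]), format="%d/%m/%Y")
flight_dates = pd.to_datetime(pd.Series(["31/01/2021", "31/01/2021"]), format="%d/%m/%Y")

gap = flight_dates - sale_dates  # timedelta64 Series, e.g. "30 days"
gap_in_days = gap.dt.days        # the same gaps as plain integers
```

With integers, the 7-day threshold becomes an ordinary numeric comparison such as `gap_in_days < 7`.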

### Classify daily sales of a flight as:
- Less than 7 days before departure
- 7 or more days before departure

In [24]:
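# Compare against an explicit Timedelta rather than relying on the string "7 days" being parsed
less_than_7_days = df.loc[df["Sales days"] < pd.Timedelta(days=7)]
more_than_7_days = df.loc[df["Sales days"] >= pd.Timedelta(days=7)]  # 7 or more days, despite the name
less_than_7_days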

| | Departure | Destination | Date | Class | Date of Flight | Ticket Sales | Flight | Sales days |
|---|---|---|---|---|---|---|---|---|
| 24 | London | Perth | 2021-01-25 | Economy | 2021-01-31 | 971 | London to Perth | 6 days |
| 25 | London | Perth | 2021-01-26 | Economy | 2021-01-31 | 569 | London to Perth | 5 days |
| 26 | London | Perth | 2021-01-27 | Economy | 2021-01-31 | 333 | London to Perth | 4 days |
| 27 | London | Perth | 2021-01-28 | Economy | 2021-01-31 | 833 | London to Perth | 3 days |
| 28 | London | Perth | 2021-01-29 | Economy | 2021-01-31 | 1006 | London to Perth | 2 days |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1428 | Paris | London | 2021-06-11 | Business | 2021-06-15 | 1218 | Paris to London | 4 days |
| 1429 | Paris | London | 2021-06-12 | Business | 2021-06-15 | 1486 | Paris to London | 3 days |
| 1430 | Paris | London | 2021-06-13 | Business | 2021-06-15 | 863 | Paris to London | 2 days |
| 1431 | Paris | London | 2021-06-14 | Business | 2021-06-15 | 844 | Paris to London | 1 days |
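
An alternative to splitting the data into two frames is to keep everything in one frame and add a label column, which tends to be convenient for a later group-by. A minimal sketch with `np.where`, using made-up day counts and a hypothetical `sales_days` variable standing in for the `Sales days` column:

```python
import numpy as np
import pandas as pd

# Made-up gaps between sale and departure, standing in for df["Sales days"]
sales_days = pd.Series(pd.to_timedelta([6, 7, 0, 30], unit="D"))

# One label per row: the first string where the condition holds, the second otherwise
label = np.where(
    sales_days < pd.Timedelta(days=7),
    "Less than 7 days before departure",
    "7 or more days before departure",
)
```

The boundary case matters here: exactly 7 days falls into the "7 or more" group because the comparison is strict.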
