## 2021: Week 21 Getting Trolleyed

Our final challenge for calculations month is all about the Analytical Calculations in Tableau Prep: Level of Detail calculations and Rankings. These calculations let you answer your stakeholders' questions before you've even visualised anything. Sometimes all we need is the answer. If you're not using Prep for this challenge, I hope I have made a challenge you can replicate too (you might just need to use a join).

### Challenge
With the Prep Air - New Trolley Inventory project finally delivered at the end of May, we want to find out which products we are now selling for a much higher price than we did before the project. We want to analyse the top five products by price rise per destination.

### Input
One spreadsheet with 10 sheets, one sheet per month (Jan is Month 1, Feb is Month 2, etc.)

![img](https://1.bp.blogspot.com/-KkflvnUjLZU/YKvdxeHrXVI/AAAAAAAACLU/c9omraL8U5AhyH8u6A-yFphCvXCxYvU9gCLcBGAsYHQ/w640-h150/Screenshot%2B2021-05-24%2Bat%2B18.09.04.png)

### Requirements
- Input data
- Bring all the sheets together
- Use the Day of Month and Table Names (sheet name in other tools) to form a date field for the purchase called 'Date'
- Create 'New Trolley Inventory?' field to show whether the purchase was made on or after 1st June 2021 (the first date with the revised inventory after the project closed)
- Remove lots of the detail of the product name:
    - Only return any names before the '-' (hyphen)
    - If a product doesn't have a hyphen return the full product name
- Make price a numeric field
- Work out the average selling price per product
- Work out the Variance (difference) between the selling price and the average selling price
- Rank the Variances (1 being the largest positive variance) per destination and whether the product was sold before or after the new trolley inventory project delivery
- Return only ranks 1-5 
- Output the data
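The cleaning and ranking requirements above can be sketched in pandas roughly as follows. This is a minimal illustration on invented toy rows, not the challenge data. It assumes the price strings carry a `$` prefix and that ties should share the lower rank (`method="min"`); neither is confirmed by the brief.

```python
import pandas as pd

# Toy rows standing in for the combined sheets (values invented for illustration).
df = pd.DataFrame({
    "Product": ["Soap - Lemon", "Soap - Mint", "Pillow", "Pillow", "Mug - Large", "Mug - Small"],
    "Price": ["$10.00", "$14.00", "$20.00", "$30.00", "$5.00", "$9.00"],
    "Destination": ["Edinburgh"] * 6,
    "New Trolley Inventory?": [False, True, False, True, False, True],
})

# Keep only the text before the first hyphen; names without a hyphen stay whole.
df["Product"] = df["Product"].str.split("-").str[0].str.strip()

# Assumption: prices are strings with a leading "$". Strip it and convert to a number.
df["Price"] = pd.to_numeric(df["Price"].str.replace("$", "", regex=False))

# Average selling price per product, repeated on every row of that product.
df["Average Price per Product"] = df.groupby("Product")["Price"].transform("mean")
df["Variance"] = df["Price"] - df["Average Price per Product"]

# Rank within each destination and pre/post-project group; 1 = largest positive variance.
# Assumption: tied variances share the lower rank (method="min").
df["Variance Rank by Destination"] = (
    df.groupby(["Destination", "New Trolley Inventory?"])["Variance"]
      .rank(method="min", ascending=False)
      .astype(int)
)

# Keep ranks 1-5 only.
top = df[df["Variance Rank by Destination"] <= 5]
```

Using `transform("mean")` keeps the row count unchanged, so the variance can be computed without the join mentioned in the introduction.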

### Output
We want to know which two products appeared more than once in the rankings and whether they were sold before or after the project delivery. Tweet us your answer!

![img](https://1.bp.blogspot.com/-NTuxXrdAUSU/YKvjS3y9jBI/AAAAAAAACLc/5T643zvPbf8G7NeS3k24uXhCwtKGsGKfQCLcBGAsYHQ/w640-h122/Screenshot%2B2021-05-24%2Bat%2B18.32.40.png)

One file:
11 fields:
- New Trolley Inventory
- Variance Rank by Destination (remember this also factors in the pre / post project delivery)
- Variance 
- Average Price per Product
- Date
- Product
- First name
- Last Name
- Email
- Price
- Destination

50 rows (51 rows including headers)
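
Once the ranked output exists, the question posed for this section (which products appear more than once, and on which side of the project delivery) could be answered with a simple count. The sketch below uses a hypothetical handful of rows rather than the real output, so the products shown are not the answer to the challenge.

```python
import pandas as pd

# Hypothetical slice of the ranked output (products and flags are invented).
output = pd.DataFrame({
    "Product": ["Soap", "Pillow", "Soap", "Mug", "Pillow", "Soap"],
    "New Trolley Inventory?": [True, True, True, False, True, False],
})

# Count how often each product appears before and after the project delivery.
counts = (
    output.groupby(["Product", "New Trolley Inventory?"])
          .size()
          .reset_index(name="Appearances")
)

# Keep only the combinations that show up more than once in the rankings.
repeats = counts[counts["Appearances"] > 1]
print(repeats)
```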

In [28]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [29]:
### Input data

In [30]:
# Read the ten monthly sheets ("Month 1" ... "Month 10") into a dict of DataFrames keyed by sheet name
sheet_names = [f"Month {i}" for i in range(1, 11)]
data = pd.read_excel("./data/PD 2021 Wk 21 Input.xlsx", sheet_name=sheet_names)

In [31]:
### Bring all the sheets together

In [32]:
files = []
for i in range(1, 11):
    file_name = "Month " + str(i)
    file = data[file_name].copy()
    files.append(file)
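
As an alternative to keeping a list of separate frames, the sheets could be stacked into a single DataFrame with `pd.concat`, which accepts the kind of dict that `read_excel` returns. The sketch below uses a small invented dict in place of the real workbook, and the `Table Names` column label is borrowed from the requirements rather than from the data.

```python
import pandas as pd

# Stand-in for the dict returned by pd.read_excel(..., sheet_name=[...]);
# the rows are invented purely for illustration.
data = {
    "Month 1": pd.DataFrame({"Day of Month": [9, 19], "Product": ["Soap", "Mug"]}),
    "Month 2": pd.DataFrame({"Day of Month": [3], "Product": ["Pillow"]}),
}

# Stack every sheet; the dict keys become an index level, which is then
# moved into an ordinary column so the sheet name travels with each row.
combined = (
    pd.concat(data, names=["Table Names"])
      .reset_index(level="Table Names")
      .reset_index(drop=True)
)

# Pull the month number out of sheet names such as "Month 7".
combined["Month"] = combined["Table Names"].str.extract(r"(\d+)", expand=False).astype(int)
```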

In [33]:
### Use the Day of Month and Table Names (sheet name in other tools) to form a date field for the purchase called 'Date'

In [34]:
for i in range(1, 11):
    files[i-1].loc[:, "Year"] = 2021
    files[i-1].loc[:, "Month"] = i
    files[i-1].loc[:, "Day of Month"] = files[i-1].loc[:, "Day of Month"].astype(int)

In [45]:
for i in range(0, 10):
    # Build a "YYYY/M/D" string per row with vectorised string concatenation, then parse it
    date = files[i]["Year"].astype(str) + "/" + files[i]["Month"].astype(str) + "/" + files[i]["Day of Month"].astype(str)
    files[i].loc[:, "Date"] = pd.to_datetime(date, format="%Y/%m/%d")
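
A shorter route to the same `Date` field is to hand `pd.to_datetime` a frame with `year`, `month` and `day` columns, which avoids building strings. The sketch below uses made-up rows. It also derives the 'New Trolley Inventory?' flag from the requirements, assuming it should be a boolean.

```python
import pandas as pd

# Invented rows covering both sides of the 1st June 2021 cut-off.
df = pd.DataFrame({"Month": [1, 5, 6, 10], "Day of Month": [9, 31, 1, 15]})

# pd.to_datetime can assemble dates from columns named year, month and day.
parts = pd.DataFrame({"year": 2021, "month": df["Month"], "day": df["Day of Month"]})
df["Date"] = pd.to_datetime(parts)

# True for purchases made on or after the first day of the revised inventory.
df["New Trolley Inventory?"] = df["Date"] >= pd.Timestamp("2021-06-01")
```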

In [47]:
files[0].info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   Day of Month  1000 non-null   int32         
 1   first_name    1000 non-null   object        
 2   last_name     1000 non-null   object        
 3   email         1000 non-null   object        
 4   Product       1000 non-null   object        
 5   Price         1000 non-null   object        
 6   Destination   1000 non-null   object        
 7   Year          1000 non-null   int64         
 8   Month         1000 non-null   int64         
 9   Date          1000 non-null   datetime64[ns]
dtypes: datetime64[ns](1), int32(1), int64(2), object(6)
memory usage: 74.3+ KB


0       2021/1/9
1      2021/1/19
2      2021/1/25
3       2021/1/9
4      2021/1/21
         ...    
995     2021/1/2
996    2021/1/28
997    2021/1/11
998    2021/1/19
999     2021/1/3
Length: 1000, dtype: object