## 2021: Week 21 Getting Trolleyed

Our final challenge for calculations month is all about the Analytical Calculations in Tableau Prep: Level of Detail calculations and Rankings. These calculations let you answer your stakeholders' questions before you've visualised anything; sometimes all we need is the answer. If you're not using Prep for this challenge, I have hopefully made a challenge you can replicate too (you might just need to use a join).

### Challenge
With the Prep Air - New Trolley Inventory project finally delivered at the end of May, we want to analyse which products we are now selling for a much higher price than before the project. We want to analyse the top five products based on price rise per destination.

### Input
One spreadsheet with 10 sheets, one sheet per month (Jan is month 1, Feb is month 2, etc.).

![img](https://1.bp.blogspot.com/-KkflvnUjLZU/YKvdxeHrXVI/AAAAAAAACLU/c9omraL8U5AhyH8u6A-yFphCvXCxYvU9gCLcBGAsYHQ/w640-h150/Screenshot%2B2021-05-24%2Bat%2B18.09.04.png)

### Requirements
- Input data
- Bring all the sheets together
- Use the Day of Month and Table Names (sheet name in other tools) to form a date field for the purchase called 'Date'
- Create 'New Trolley Inventory?' field to show whether the purchase was made on or after 1st June 2021 (the first date with the revised inventory after the project closed)
- Remove lots of the detail of the product name:
    - Only return any names before the '-' (hyphen)
    - If a product doesn't have a hyphen return the full product name
- Make price a numeric field
- Work out the average selling price per product
- Work out the Variance (difference) between the selling price and the average selling price
- Rank the Variances (1 being the largest positive variance) per destination and whether the product was sold before or after the new trolley inventory project delivery
- Return only ranks 1-5 
- Output the data

### Output
We want to know which two products appeared more than once in the rankings and whether they were sold before or after the project delivery. Tweet us your answer!

![img](https://1.bp.blogspot.com/-NTuxXrdAUSU/YKvjS3y9jBI/AAAAAAAACLc/5T643zvPbf8G7NeS3k24uXhCwtKGsGKfQCLcBGAsYHQ/w640-h122/Screenshot%2B2021-05-24%2Bat%2B18.32.40.png)

One file with 11 fields:
- New Trolley Inventory
- Variance Rank by Destination (remember this also factors in the pre / post project delivery)
- Variance 
- Average Price per Product
- Date
- Product
- First name
- Last Name
- Email
- Price
- Destination

50 rows (51 rows including headers)

```python
import pandas as pd
```

### Input data

```python
# Read all ten monthly sheets; a list for sheet_name returns {sheet name: DataFrame}
sheet_names = [f"Month {i}" for i in range(1, 11)]
data = pd.read_excel("./data/PD 2021 Wk 21 Input.xlsx", sheet_name=sheet_names)
```

### Bring all the sheets together

```python
# Copy each monthly sheet into a list, in month order
files = [data[f"Month {i}"].copy() for i in range(1, 11)]
```
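The requirement builds the month from the table (sheet) name. `pd.read_excel(path, sheet_name=None)` returns a dict of every sheet keyed by its name, so the month can be read from that name instead of from the loop position. Below is a minimal sketch of that alternative. The two in-memory sheets are hypothetical stand-ins for the workbook:

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel(path, sheet_name=None): {sheet name: DataFrame}
sheets = {
    "Month 1": pd.DataFrame({"Day of Month": [9, 19], "Product": ["Emulsifier", "Chambord Royal"]}),
    "Month 6": pd.DataFrame({"Day of Month": [1], "Product": ["Wine - Red"]}),
}

# The dict keys become an index level named "Sheet", which is turned back into a column
combined = (
    pd.concat(sheets, names=["Sheet", "Row"])
    .reset_index(level="Sheet")
    .reset_index(drop=True)
)

# "Month 6" -> 6: take the trailing number from the sheet name
combined["Month"] = combined["Sheet"].str.extract(r"(\d+)$", expand=False).astype(int)

# to_datetime can assemble dates from columns named year / month / day
combined["Date"] = pd.to_datetime(
    pd.DataFrame({"year": 2021, "month": combined["Month"], "day": combined["Day of Month"]})
)
print(combined[["Sheet", "Month", "Date"]])
```

With this approach, the month comes from the sheet's own name. Dropping or reordering sheets therefore cannot shift the month numbers.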

### Use the Day of Month and Table Names (sheet name in other tools) to form a date field for the purchase called 'Date'

```python
# Add the year and the month number (Month 1 = January, taken from the sheet order)
for month, frame in enumerate(files, start=1):
    frame["Year"] = 2021
    frame["Month"] = month
    frame["Day of Month"] = frame["Day of Month"].astype(int)
```

```python
# Assemble each purchase date from its year, month and day columns
for frame in files:
    parts = frame[["Year", "Month", "Day of Month"]].rename(
        columns={"Year": "year", "Month": "month", "Day of Month": "day"})
    frame["Date"] = pd.to_datetime(parts)
```

```python
# Stack the monthly sheets into one table with a fresh 0..n-1 index
df = pd.concat(files, axis=0, ignore_index=True)
df.shape
```

```
(10000, 10)
```

### Create 'New Trolley Inventory?' field to show whether the purchase was made on or after 1st June 2021 (the first date with the revised inventory after the project closed)

```python
# True for purchases on or after 1st June 2021, the first day of the new inventory
df["New Trolley Inventory?"] = df["Date"] >= "2021-06-01"
```
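The flag means "on or after" 1st June, so the comparison must be `>=`. That way a purchase made on 1st June itself counts as new inventory. Here is a quick check on three hypothetical dates either side of the cut-over:

```python
import pandas as pd

# Hypothetical purchase dates around the 1st June 2021 cut-over
dates = pd.Series(pd.to_datetime(["2021-05-31", "2021-06-01", "2021-06-02"]))

# ">=" includes the cut-over day; ">" would wrongly flag 1st June as old inventory
new_inventory = dates >= pd.Timestamp("2021-06-01")
print(new_inventory.tolist())  # [False, True, True]
```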

### Remove lots of the detail of the product name:
- Only return any names before the '-' (hyphen)
- If a product doesn't have a hyphen return the full product name

```python
df.head()
```

| | Day of Month | first_name | last_name | email | Product | Price | Destination | Year | Month | Date | New Trolley Inventory? |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9 | Daffie | Clemont | dclemont0@unc.edu | Emulsifier | $10.14 | New York | 2021 | 1 | 2021-01-09 | False |
| 1 | 19 | Lucio | Muzzall | lmuzzall1@dell.com | Chambord Royal | $33.89 | London | 2021 | 1 | 2021-01-19 | False |
| 2 | 25 | Corbie | Shrigley | cshrigley2@sourceforge.net | Apples - Sliced / Wedge | $1.64 | Perth | 2021 | 1 | 2021-01-25 | False |
| 3 | 9 | Sioux | Couth | scouth3@bluehost.com | Vinegar - White Wine | $19.84 | Paris | 2021 | 1 | 2021-01-09 | False |
| 4 | 21 | Almira | Rickards | arickards4@godaddy.com | Food Colouring - Pink | $20.15 | Edinburgh | 2021 | 1 | 2021-01-21 | False |


```python
def product_name_preprocessing(df_):
    # Keep only the text before the first hyphen; names without a hyphen are returned whole.
    # strip() drops the space left before the hyphen (e.g. "Apples - Sliced / Wedge" -> "Apples")
    df_["Product"] = df_["Product"].str.split("-", n=1).str[0].str.strip()
    return df_
```

```python
df = product_name_preprocessing(df)
# Number of purchases per shortened product name
df["Product"].value_counts()
```

```
Wine                        692
Cheese                      296
Bread                       265
Soup                        194
Beef                        179
                           ... 
Soup Bowl Clear 8oz92008      1
Ham Black Forest              1
Iced Tea Concentrate          1
Maple Syrup                   1
Creme De Cacao Mcguines       1
Name: Product, Length: 1028, dtype: int64
```
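To see what the hyphen rule does to individual names, the same split can be applied to a few product strings taken from the `head()` output above. Splitting on the first hyphen leaves a trailing space behind (`"Apples "`). That is why a `strip()` is worth adding before grouping by product:

```python
import pandas as pd

products = pd.Series(["Apples - Sliced / Wedge", "Vinegar - White Wine", "Chambord Royal"])

# Text before the first hyphen; names without a hyphen come back unchanged
before_hyphen = products.str.split("-", n=1).str[0]
print(before_hyphen.tolist())  # ['Apples ', 'Vinegar ', 'Chambord Royal']

# Trimming the leftover space means "Apples " and "Apples" land in the same group
cleaned = before_hyphen.str.strip()
print(cleaned.tolist())  # ['Apples', 'Vinegar', 'Chambord Royal']
```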

### Make price a numeric field

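```python
# Strip the dollar sign (as a literal character, not a regex) and convert to float
df["Price"] = df["Price"].str.replace("$", "", regex=False).astype(float)
```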
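Spelling out `regex=False` makes it explicit that `"$"` is a literal character here, not the regular-expression end-of-string anchor. The sketch below applies this to price strings copied from the `head()` output. It uses `pd.to_numeric` as an alternative to `astype(float)`, which reports the offending value if any non-numeric text remains:

```python
import pandas as pd

prices = pd.Series(["$10.14", "$33.89", "$1.64"])

# Literal removal of the dollar sign, then conversion to floats
numeric = pd.to_numeric(prices.str.replace("$", "", regex=False))
print(numeric.tolist())  # [10.14, 33.89, 1.64]
```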


### Work out the average selling price per product

```python
# Average selling price per product (one row per product)
group_avg = df.groupby("Product")["Price"].mean()
group_avg
```

```
Product
7up Diet, 355 Ml      27.690000
Absolut Citron        23.888000
Alize Gold Passion    18.872000
Alize Red Passion     27.090000
Alize Sunset          19.880000
                        ...    
Yoplait               29.397500
Yoplait Drink         15.756667
Yucca                 19.820000
Yukon Jack            26.597500
Zucchini              24.514444
Name: Price, Length: 1028, dtype: float64
```
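The introduction notes that outside Prep you may need a join. A Level of Detail calculation is an aggregate at one grain that is then joined back onto the row-level data. In pandas, `groupby(...).transform("mean")` does both steps at once. Here is a sketch on hypothetical rows that checks the two routes agree:

```python
import pandas as pd

sales = pd.DataFrame({
    "Product": ["Wine", "Wine", "Cheese", "Cheese", "Cheese"],
    "Price": [10.0, 30.0, 5.0, 7.0, 9.0],
})

# Route 1: aggregate to one row per product, then join back (the join-based approach)
avg = sales.groupby("Product")["Price"].mean().rename("Avg Price per Product")
joined = sales.merge(avg, how="left", left_on="Product", right_index=True)

# Route 2: transform broadcasts each group's mean back onto its own rows
sales["Avg Price per Product"] = sales.groupby("Product")["Price"].transform("mean")

print(sales["Avg Price per Product"].tolist())  # [20.0, 20.0, 7.0, 7.0, 7.0]
assert joined["Avg Price per Product"].tolist() == sales["Avg Price per Product"].tolist()
```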

### Work out the Variance (difference) between the selling price and the average selling price

```python
# Look up each row's product average (equivalent to a left join on the product name)
df["Avg Price per Product"] = df["Product"].map(group_avg)
# Note: abs() ranks by the size of the gap in either direction; the requirement asks for the
# largest *positive* variance, which would be df["Price"] - df["Avg Price per Product"] without abs()
df["Variance"] = (df["Price"] - df["Avg Price per Product"]).abs()
df.head()
```

| | Day of Month | first_name | last_name | email | Product | Price | Destination | Year | Month | Date | New Trolley Inventory? | Avg Price per Product | Variance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9 | Daffie | Clemont | dclemont0@unc.edu | Emulsifier | 10.14 | New York | 2021 | 1 | 2021-01-09 | False | 21.6 | 11.46 |
| 1 | 19 | Lucio | Muzzall | lmuzzall1@dell.com | Chambord Royal | 33.89 | London | 2021 | 1 | 2021-01-19 | False | 16.034 | 17.856 |
| 2 | 25 | Corbie | Shrigley | cshrigley2@sourceforge.net | Apples | 1.64 | Perth | 2021 | 1 | 2021-01-25 | False | 19.782 | 18.142 |
| 3 | 9 | Sioux | Couth | scouth3@bluehost.com | Vinegar | 19.84 | Paris | 2021 | 1 | 2021-01-09 | False | 23.054821 | 3.214821 |
| 4 | 21 | Almira | Rickards | arickards4@godaddy.com | Food Colouring | 20.15 | Edinburgh | 2021 | 1 | 2021-01-21 | False | 26.033333 | 5.883333 |
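The requirement ranks by the largest *positive* variance, meaning the price sits above the product average. `abs()` instead ranks by the size of the gap in either direction, so a product sold far below its average can also rank highly. The sketch below uses three rows whose prices and averages are copied from the outputs in this notebook, purely to show the difference:

```python
import pandas as pd

rows = pd.DataFrame({
    "Product": ["Foil Wrap", "Emulsifier", "Chambord Royal"],
    "Price": [0.91, 10.14, 33.89],
    "Avg Price per Product": [42.611667, 21.6, 16.034],
})

# Signed difference: positive means sold above the product's average price
rows["Signed Variance"] = rows["Price"] - rows["Avg Price per Product"]
rows["Absolute Variance"] = rows["Signed Variance"].abs()

# The largest absolute gap is a sale far *below* average...
print(rows.loc[rows["Absolute Variance"].idxmax(), "Product"])  # Foil Wrap
# ...while the largest positive variance is a sale above average
print(rows.loc[rows["Signed Variance"].idxmax(), "Product"])  # Chambord Royal
```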


### Rank the Variances (1 being the largest positive variance) per destination and whether the product was sold before or after the new trolley inventory project delivery

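```python
# Trim stray whitespace so that destinations group correctly
df["Destination"] = df["Destination"].str.strip()
# Dense rank within each destination and pre/post-project group: 1 = largest Variance
rank = (df.groupby(["Destination", "New Trolley Inventory?"])["Variance"]
          .rank(method="dense", ascending=False)
          .astype(int))
df["Rank"] = rank
```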
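Ranking inside a `groupby` restarts the numbering for every Destination / New Trolley Inventory? combination. With `method="dense"`, tied variances share a rank and no gap is left after them. Here is a sketch on hypothetical values, with `method="min"` for comparison:

```python
import pandas as pd

sales = pd.DataFrame({
    "Destination": ["Paris", "Paris", "Paris", "Paris", "Perth", "Perth"],
    "New Trolley Inventory?": [True, True, True, True, False, False],
    "Variance": [9.0, 7.0, 7.0, 3.0, 4.0, 8.0],
})

groups = sales.groupby(["Destination", "New Trolley Inventory?"])["Variance"]

# Dense: ties share a rank and the next value takes the next integer (Paris: 1, 2, 2, 3)
sales["Dense Rank"] = groups.rank(method="dense", ascending=False).astype(int)
# Min: ties share a rank but leave a gap afterwards (Paris: 1, 2, 2, 4)
sales["Min Rank"] = groups.rank(method="min", ascending=False).astype(int)

print(sales)
```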

### Return only ranks 1-5 

```python
# Keep ranks 1-5 in each group
output = df[df["Rank"] <= 5].reset_index(drop=True)
output.shape
```

```
(50, 14)
```
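Tied variances share a dense rank, so keeping ranks 1-5 can return more than five rows for a group. If exactly five rows per group are needed, one option is to sort by variance and take `head(n)` per group. The sketch below uses hypothetical data containing a tie and a top-2 cut to keep it short:

```python
import pandas as pd

sales = pd.DataFrame({
    "Destination": ["Paris"] * 4,
    "Variance": [9.0, 7.0, 7.0, 3.0],
})
sales["Rank"] = (sales.groupby("Destination")["Variance"]
                 .rank(method="dense", ascending=False).astype(int))

# Rank-based filter: the tie at rank 2 lets three rows through a "top 2" cut
by_rank = sales[sales["Rank"] <= 2]

# Row-count-based filter: sort, then keep exactly two rows per destination
by_count = sales.sort_values("Variance", ascending=False).groupby("Destination").head(2)

print(len(by_rank), len(by_count))  # 3 2
```

Which behaviour is right depends on whether tied sales should both appear in the ranking.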

### Output the data

```python
# Drop the helper columns and give the rank its output name
output = output.drop(columns=["Day of Month", "Year", "Month"])
output = output.rename(columns={"Rank": "Variance Rank by Destination"})
output.head()
```

| | first_name | last_name | email | Product | Price | Destination | Date | New Trolley Inventory? | Avg Price per Product | Variance | Variance Rank by Destination |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Kizzie | Bruggeman | kbruggemanq@thetimes.co.uk | Foil Wrap | 0.91 | New York | 2021-01-09 | False | 42.611667 | 41.701667 | 1 |
| 1 | Leslie | Streight | lstreight22@ed.gov | Hot Choc Vending | 1.08 | Perth | 2021-01-19 | False | 29.518 | 28.438 | 3 |
| 2 | Ingemar | Burgiss | iburgissbp@howstuffworks.com | Black Currants | 2.91 | Paris | 2021-01-07 | False | 40.9025 | 37.9925 | 1 |
| 3 | Catherina | Eymer | ceymere6@facebook.com | Eggplant Italian | 3.73 | Paris | 2021-01-23 | False | 35.298333 | 31.568333 | 2 |
| 4 | Fairfax | Raikes | fraikesox@twitpic.com | Rabbit | 1.15 | Edinburgh | 2021-01-22 | False | 26.529565 | 25.379565 | 3 |


```python
# index=False keeps the file to the 11 output fields
output.to_csv("./output/Week21_output.csv", index=False)
```
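The output question asks which products appear more than once in the rankings, and whether they were sold before or after the delivery. One way to answer it is to count rows per product in `output`. The sketch below uses a hypothetical stand-in for `output` with placeholder product names:

```python
import pandas as pd

# Hypothetical stand-in for the ranked output table
output = pd.DataFrame({
    "Product": ["Product A", "Product B", "Product A", "Product C", "Product B", "Product D"],
    "New Trolley Inventory?": [False, True, True, False, True, False],
})

# Products that appear in more than one ranking row
counts = output["Product"].value_counts()
repeated = counts[counts > 1].index

# For each repeated product, the pre/post delivery flags it was ranked under
answer = (
    output[output["Product"].isin(repeated)]
    .groupby("Product")["New Trolley Inventory?"]
    .unique()
)
print(answer)
```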