## 2021: Week 4

This is the last in the 'Starter Challenges' series to get you up and Preppin' to start the new year. We've enjoyed running this mini-series so much we're already looking at creating another similar series later in the year. 

This week's challenge introduces some more of the fundamental skills and gives you chances to practice the skills you've picked up over the last few weeks. As always, we'll be guiding you along the way with some useful help links if you need a couple of reminders or a chance to explore those new techniques.

The new technique for you to learn this week is Joins. If you've worked with different data solutions for a number of years, you'll be familiar with Joins, but if you are new to them, you're in for a treat! A Join brings two data sources together by matching rows on one or more shared fields. Because data is often spread across many different locations, this makes much richer and deeper analysis possible. Use the help links if this is a new technique for you. Joins are one of the harder concepts to pick up, so make sure you've set aside a good amount of time to explore.
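If joins are new to you, here is a minimal pandas sketch of the idea using two tiny made-up tables (the stores and numbers are illustrative, not the challenge data). `DataFrame.merge` with `how="inner"` keeps only the rows whose key values appear in both tables, so a target with no matching sales row is dropped.

```python
import pandas as pd

# Illustrative actual sales per store and quarter
sales = pd.DataFrame({
    "Store": ["Leeds", "Leeds", "York"],
    "Quarter": [1, 2, 1],
    "Products Sold": [120, 95, 140],
})

# Illustrative targets; "Bristol" has no matching sales rows
targets = pd.DataFrame({
    "Store": ["Leeds", "Leeds", "York", "Bristol"],
    "Quarter": [1, 2, 1, 1],
    "Target": [110, 100, 130, 90],
})

# Inner join: keep rows whose (Store, Quarter) pair exists in both tables
joined = sales.merge(targets, on=["Store", "Quarter"], how="inner")
print(joined)
```

Swapping `how="inner"` for `"left"`, `"right"` or `"outer"` changes which unmatched rows survive.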

### Input

![Input data](https://1.bp.blogspot.com/-sJ7fitvkMMk/YA69_pgQbII/AAAAAAAACHk/NQ8TVmd0fLcGOyipZhPWl5_VTsfvZIfhgCLcBGAsYHQ/w640-h192/Screenshot%2B2021-01-25%2Bat%2B12.47.37.png)

What's new this week is a set of Quarterly Targets that each store is expected to achieve.

### Requirements

- Input the file 
- Union the Stores data together
- Remove any unnecessary data fields your Input step might create and rename the 'Table Names' as 'Store' 
- Pivot the product columns 
- Split the 'Customer Type - Product' field to create: 
    - Customer Type
    - Product
    - Also rename the Values column resulting from your pivot as 'Products Sold'
- Turn the date into a 'Quarter' number 
- Sum up the products sold by Store and Quarter 
- Add the Targets data 
- Join the Targets data with the aggregated Stores data 
    - Note: this should give you 20 rows of data
- Remove any duplicate fields formed by the Join
- Calculate the Variance between each Store's Quarterly actual sales and the target. Call this field 'Variance to Target' 
- Rank the Stores based on the Variance to Target in each quarter 
    - The greater the variance the better the rank
- Output the data 

### Output

![Output data](https://1.bp.blogspot.com/-h9JR5XeRNBM/YA6_dbPM22I/AAAAAAAACH4/C0D2jBnLwgcv-YEPX0YTq__u2yPkY5CwQCLcBGAsYHQ/w640-h246/Screenshot%2B2021-01-25%2Bat%2B12.42.54.png)

6 Data Fields:

- Quarter
- Rank
- Store
- Products Sold
- Target 
- Variance to Target

20 Rows (21 rows including headers)
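One way to confirm a result matches this spec is to assert on its column order and row count. The helper `check_output` below is a hypothetical name used for illustration; it assumes the final table is a pandas DataFrame, and the demo frame is placeholder data rather than the real answer.

```python
import pandas as pd

# Column order required by the output spec
EXPECTED_COLUMNS = ["Quarter", "Rank", "Store", "Products Sold", "Target", "Variance to Target"]
EXPECTED_ROWS = 20  # 5 stores x 4 quarters

def check_output(df: pd.DataFrame) -> None:
    """Hypothetical helper: raise AssertionError if df does not match the spec."""
    assert list(df.columns) == EXPECTED_COLUMNS, f"unexpected columns: {list(df.columns)}"
    assert len(df) == EXPECTED_ROWS, f"expected {EXPECTED_ROWS} rows, got {len(df)}"
    # Every (Quarter, Store) pair should appear exactly once
    assert not df.duplicated(["Quarter", "Store"]).any(), "duplicate Quarter/Store rows"

# Placeholder frame with the right shape (values are zeros, not the answer)
stores = ["Birmingham", "Leeds", "London", "Manchester", "York"]
dummy = pd.DataFrame(
    [(q, r + 1, s, 0.0, 0.0, 0.0) for q in range(1, 5) for r, s in enumerate(stores)],
    columns=EXPECTED_COLUMNS,
)
check_output(dummy)  # passes silently
```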

```python
import pandas as pd
```

```python
# Input the data
data = pd.read_excel(
    "./data/PD 2021 Wk 4 Input.xlsx",
    sheet_name=["Manchester", "London", "Leeds", "York", "Birmingham", "Targets"],
)
data.keys()
```

```
dict_keys(['Manchester', 'London', 'Leeds', 'York', 'Birmingham', 'Targets'])
```

```python
# read_excel already returns one DataFrame per sheet, so no extra wrapping is needed
manchester = data["Manchester"]
london = data["London"]
leeds = data["Leeds"]
york = data["York"]
birmingham = data["Birmingham"]
targets = data["Targets"]
```

```python
# Work through the steps on a single store (Manchester) first
# Pivot the product columns: one row per Date and 'Customer Type - Product' pair
manchester = manchester.melt(id_vars="Date", var_name="Customer_Product",
                             value_name="Amount")

# Split the 'Customer Type - Product' field to create: Customer Type, Product
# Splitting on " - " rather than "-" avoids leaving stray spaces around each value
customer_product = manchester["Customer_Product"].str.split(" - ", n=1, expand=True)
manchester["Customer Type"] = customer_product[0]
manchester["Product"] = customer_product[1]
manchester = manchester.rename(columns={"Amount": "Products Sold"})
manchester
```

| | Date | Customer_Product | Products Sold | Customer Type | Product |
|---|---|---|---|---|---|
| 0 | 2021-01-21 | New - Saddles | 13.0 | New | Saddles |
| 1 | 2021-02-21 | New - Saddles | 1.0 | New | Saddles |
| 2 | 2021-03-21 | New - Saddles | 8.0 | New | Saddles |
| 3 | 2021-04-21 | New - Saddles | 3.0 | New | Saddles |
| 4 | 2021-05-21 | New - Saddles | 2.0 | New | Saddles |
| ... | ... | ... | ... | ... | ... |
| 91 | 2021-08-21 | Existing - Bags | 9.0 | Existing | Bags |
| 92 | 2021-09-21 | Existing - Bags | 23.0 | Existing | Bags |
| 93 | 2021-10-21 | Existing - Bags | 7.0 | Existing | Bags |
| 94 | 2021-11-21 | Existing - Bags | 0.0 | Existing | Bags |
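Why split on `" - "` rather than `"-"`? The column headers have spaces around the hyphen, so splitting on the bare hyphen leaves a trailing space on the customer type and a leading space on the product, which can quietly break later filters or joins on those values. This small sketch uses a made-up two-row table in the same wide layout as a store sheet.

```python
import pandas as pd

# Illustrative wide table in the same layout as a store sheet
wide = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-21", "2021-02-21"]),
    "New - Saddles": [13, 1],
    "Existing - Bags": [4, 9],
})

# Pivot the product columns into rows (wide -> long)
long = wide.melt(id_vars="Date", var_name="Customer_Product", value_name="Products Sold")

# Naive split keeps the spaces around the hyphen
naive = long["Customer_Product"].str.split("-", expand=True)
print(repr(naive.iloc[0, 0]), repr(naive.iloc[0, 1]))  # 'New ' ' Saddles'

# Splitting on " - " (at most once) gives clean values
clean = long["Customer_Product"].str.split(" - ", n=1, expand=True)
long["Customer Type"] = clean[0]
long["Product"] = clean[1]
print(long)
```

Calling `.str.strip()` on each part after a bare `"-"` split would work too; splitting on the full separator just does it in one step.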


```python
# Turn the date into a Quarter number
manchester["Quarter"] = pd.to_datetime(manchester["Date"]).dt.quarter
```
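`.dt.quarter` maps each month to its calendar quarter: January to March give 1, April to June give 2, and so on. A quick check on a few made-up dates, alongside the equivalent month arithmetic:

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(["2021-01-21", "2021-03-31", "2021-04-01", "2021-12-21"]))
quarters = dates.dt.quarter
print(quarters.tolist())  # [1, 1, 2, 4]

# Equivalent arithmetic on the month number (calendar quarters)
by_month = (dates.dt.month - 1) // 3 + 1
assert (quarters == by_month).all()
```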

```python
# Remove the combined field now that it has been split. Reading the sheets in pandas
# creates no 'Table Names' field; the Store column is added from the sheet name instead
manchester = manchester.drop(["Customer_Product"], axis=1)
```

```python
# Repeat the steps above for any store sheet: pivot, split, tag the rows with the
# store name (the equivalent of renaming 'Table Names' as 'Store'), and add the Quarter
def tidy_data(df, store_name):
    df = df.melt(id_vars="Date", var_name="Customer_Product", value_name="Products Sold")
    customer_product = df["Customer_Product"].str.split(" - ", n=1, expand=True)
    df["Customer Type"] = customer_product[0]
    df["Product"] = customer_product[1]
    df["Store"] = str(store_name)
    df = df.drop("Customer_Product", axis=1)
    df["Quarter"] = pd.to_datetime(df["Date"]).dt.quarter
    return df
```

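```python
# Union the Stores data: tidy every store sheet and stack the results
results = []
for key, value in data.items():
    if key == "Targets":
        continue  # skip the Targets sheet wherever it appears in the dict
    results.append(tidy_data(value, key))
df = pd.concat(results, axis=0, ignore_index=True)
```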
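An alternative sketch for the union step, assuming the sheets are held in a dict keyed by sheet name (which is what `pd.read_excel` returns when `sheet_name` is a list): `pd.concat` can turn the dict keys into a Store column directly. The tiny frames below are made-up stand-ins for the real sheets.

```python
import pandas as pd

# Stand-ins for the sheets returned by read_excel (illustrative values)
sheets = {
    "Leeds": pd.DataFrame({"Date": ["2021-01-21"], "New - Saddles": [5]}),
    "York": pd.DataFrame({"Date": ["2021-01-21"], "New - Saddles": [7]}),
    "Targets": pd.DataFrame({"Store": ["Leeds"], "Quarter": [1], "Target": [490]}),
}

# Keep only the store sheets, then union them; dict keys become an index level
stores = {name: frame for name, frame in sheets.items() if name != "Targets"}
union = (
    pd.concat(stores, names=["Store", "row"])
    .reset_index(level="Store")   # move the Store level into a column
    .reset_index(drop=True)       # discard the per-sheet row numbers
)
print(union)
```

The explicit loop is easier to follow for beginners; the `concat` version avoids having to add the Store column by hand.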

```python
# Sum up the products sold by Store and Quarter
grouped = df.groupby(["Store", "Quarter"])["Products Sold"].sum().reset_index()
grouped
```

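| | Store | Quarter | Products Sold |
|---|---|---|---|
| 0 | Birmingham | 1 | 477.0 |
| 1 | Birmingham | 2 | 346.0 |
| 2 | Birmingham | 3 | 348.0 |
| 3 | Birmingham | 4 | 404.0 |
| 4 | Leeds | 1 | 488.0 |
| 5 | Leeds | 2 | 331.0 |
| 6 | Leeds | 3 | 279.0 |
| 7 | Leeds | 4 | 349.0 |
| 8 | London | 1 | 425.0 |
| 9 | London | 2 | 324.0 |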


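```python
# Add the Targets data & join it with the aggregated Stores data
# Merging on the shared key columns keeps a single copy of Store and Quarter,
# so the join creates no duplicate fields that need removing
grouped = grouped.merge(targets, on=["Quarter", "Store"], how="inner")
grouped
```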

| | Store | Quarter | Products Sold | Target |
|---|---|---|---|---|
| 0 | Birmingham | 1 | 477.0 | 475.0 |
| 1 | Birmingham | 2 | 346.0 | 325.0 |
| 2 | Birmingham | 3 | 348.0 | 300.0 |
| 3 | Birmingham | 4 | 404.0 | 400.0 |
| 4 | Leeds | 1 | 488.0 | 490.0 |
| 5 | Leeds | 2 | 331.0 | 325.0 |
| 6 | Leeds | 3 | 279.0 | 300.0 |
| 7 | Leeds | 4 | 349.0 | 400.0 |
| 8 | London | 1 | 425.0 | 475.0 |
| 9 | London | 2 | 324.0 | 325.0 |
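Since the requirements expect exactly 20 rows after the join, it can help to let pandas check the join for you. `merge` accepts `validate="one_to_one"`, which raises an error if either side has repeated keys, and `indicator=True`, which adds a `_merge` column recording whether each row matched. The frames below are small illustrative stand-ins, not the real data.

```python
import pandas as pd

# Illustrative stand-ins for the aggregated actuals and the Targets sheet
actuals = pd.DataFrame({"Store": ["Leeds", "York"], "Quarter": [1, 1], "Products Sold": [120, 140]})
targets = pd.DataFrame({"Store": ["Leeds", "York", "Hull"], "Quarter": [1, 1, 1], "Target": [110, 130, 90]})

# Outer join with an indicator column to spot unmatched keys
check = actuals.merge(
    targets,
    on=["Store", "Quarter"],
    how="outer",
    validate="one_to_one",  # raises MergeError if keys repeat on either side
    indicator=True,
)
unmatched = check[check["_merge"] != "both"]
print(unmatched[["Store", "Quarter", "_merge"]])  # Hull appears only in targets
```

If `unmatched` is empty, the inner join used in the notebook loses no rows.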


```python
# Calculate the Variance between each Store's Quarterly actual sales and the target. Call this field 'Variance to Target'
grouped["Variance to Target"] = grouped["Products Sold"] - grouped["Target"]
grouped
```

| | Store | Quarter | Products Sold | Target | Variance to Target |
|---|---|---|---|---|---|
| 0 | Birmingham | 1 | 477.0 | 475.0 | 2.0 |
| 1 | Birmingham | 2 | 346.0 | 325.0 | 21.0 |
| 2 | Birmingham | 3 | 348.0 | 300.0 | 48.0 |
| 3 | Birmingham | 4 | 404.0 | 400.0 | 4.0 |
| 4 | Leeds | 1 | 488.0 | 490.0 | -2.0 |
| 5 | Leeds | 2 | 331.0 | 325.0 | 6.0 |
| 6 | Leeds | 3 | 279.0 | 300.0 | -21.0 |
| 7 | Leeds | 4 | 349.0 | 400.0 | -51.0 |
| 8 | London | 1 | 425.0 | 475.0 | -50.0 |
| 9 | London | 2 | 324.0 | 325.0 | -1.0 |


```python
# Rank the Stores based on the Variance to Target in each quarter
# (the greater the variance, the better the rank, so rank in descending order)
grouped["Rank"] = grouped.groupby(["Quarter"])["Variance to Target"].rank(method="min", ascending=False).astype(int)
grouped
```

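| | Store | Quarter | Products Sold | Target | Variance to Target | Rank |
|---|---|---|---|---|---|---|
| 0 | Birmingham | 1 | 477.0 | 475.0 | 2.0 | 2 |
| 1 | Birmingham | 2 | 346.0 | 325.0 | 21.0 | 2 |
| 2 | Birmingham | 3 | 348.0 | 300.0 | 48.0 | 2 |
| 3 | Birmingham | 4 | 404.0 | 400.0 | 4.0 | 3 |
| 4 | Leeds | 1 | 488.0 | 490.0 | -2.0 | 3 |
| 5 | Leeds | 2 | 331.0 | 325.0 | 6.0 | 3 |
| 6 | Leeds | 3 | 279.0 | 300.0 | -21.0 | 4 |
| 7 | Leeds | 4 | 349.0 | 400.0 | -51.0 | 5 |
| 8 | London | 1 | 425.0 | 475.0 | -50.0 | 5 |
| 9 | London | 2 | 324.0 | 325.0 | -1.0 | 4 |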
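The `method` argument decides what happens on ties. With `method="min"`, tied stores share the better (lower) rank and the next rank is skipped, as in sports standings. The challenge does not say how ties should be handled, so here is a small comparison on made-up variances with a deliberate tie (the labels A to D are placeholders for stores).

```python
import pandas as pd

variance = pd.Series([9.0, 2.0, 2.0, -35.0], index=["A", "B", "C", "D"])

# Larger variance -> better (smaller) rank, hence ascending=False
ranks = pd.DataFrame({
    "variance": variance,
    "min": variance.rank(method="min", ascending=False).astype(int),
    "dense": variance.rank(method="dense", ascending=False).astype(int),
    "first": variance.rank(method="first", ascending=False).astype(int),
})
print(ranks)
```

`"dense"` closes the gap after a tie, while `"first"` breaks ties by the order in which rows appear.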


```python
# Sort by Quarter, then Rank, and put the columns in the required order
final_output = grouped.sort_values(["Quarter", "Rank"], ascending=True)
final_output = final_output.loc[:, ["Quarter", "Rank", "Store", "Products Sold", "Target", "Variance to Target"]]
final_output = final_output.reset_index(drop=True)
final_output
```

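| | Quarter | Rank | Store | Products Sold | Target | Variance to Target |
|---|---|---|---|---|---|---|
| 0 | 1 | 1 | York | 499.0 | 490.0 | 9.0 |
| 1 | 1 | 2 | Birmingham | 477.0 | 475.0 | 2.0 |
| 2 | 1 | 3 | Leeds | 488.0 | 490.0 | -2.0 |
| 3 | 1 | 4 | Manchester | 440.0 | 475.0 | -35.0 |
| 4 | 1 | 5 | London | 425.0 | 475.0 | -50.0 |
| 5 | 2 | 1 | York | 329.0 | 300.0 | 29.0 |
| 6 | 2 | 2 | Birmingham | 346.0 | 325.0 | 21.0 |
| 7 | 2 | 3 | Leeds | 331.0 | 325.0 | 6.0 |
| 8 | 2 | 4 | London | 324.0 | 325.0 | -1.0 |
| 9 | 2 | 5 | Manchester | 288.0 | 300.0 | -12.0 |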


```python
# Output the data; index=False keeps the file to the six required fields
final_output.to_csv("./output/Week4_output.csv", index=False)
```
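By default `to_csv` also writes the DataFrame index as an unnamed first column, which would give seven fields instead of the six the spec asks for. Writing to an in-memory buffer shows the difference (the one-row frame is a placeholder).

```python
import io

import pandas as pd

frame = pd.DataFrame({"Quarter": [1], "Rank": [1], "Store": ["York"]})

# Default: the index is written as an extra, unnamed first column
with_index = io.StringIO()
frame.to_csv(with_index)

# index=False: only the DataFrame's own columns are written
without_index = io.StringIO()
frame.to_csv(without_index, index=False)

print(with_index.getvalue().splitlines()[0])     # ,Quarter,Rank,Store
print(without_index.getvalue().splitlines()[0])  # Quarter,Rank,Store
```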