## 2021: Week 31 Excelling in Prep

When you are working with data in most organisations, you will frequently come across requests from Excel users phrased in Excel terms. This week's challenge looks at a term that is synonymous with Excel: the pivot table.

Pivot tables often contain summarised data values, include totals and filter out certain parts of the data set. This week's challenge is to take an input and turn it into a pivot table, which is likely to be structured differently to most of our analytical outputs.
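In pandas, the closest analogue to an Excel pivot table is `DataFrame.pivot_table`. The minimal sketch below uses a few made-up rows (not the challenge data) to show how `margins=True` adds the grand-total row and column that Excel pivot tables show by default; the `margins_name` parameter can rename the default "All" label if needed.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from the challenge input
sales = pd.DataFrame({
    "Store": ["York", "York", "Bristol", "Bristol"],
    "Item": ["Wheels", "Tyres", "Wheels", "Tyres"],
    "Number of Items": [2, 1, 4, 3],
})

# Rows = Store, columns = Item, cells = summed quantities,
# plus an extra "All" row and column holding the totals
summary = sales.pivot_table(
    index="Store",
    columns="Item",
    values="Number of Items",
    aggfunc="sum",
    margins=True,
)
print(summary)
```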

### Input
We are analysing the weekly sales of bike components from Allchains, Preppin's bike store. Returns are products that were deemed faulty before they were sold.

![img](https://1.bp.blogspot.com/-dk0JLWbsnSc/YQl4-TLNElI/AAAAAAAACO4/4Rpsr-zZ5HoSqyYShmxv0pgfUwb-fgguwCLcBGAsYHQ/w640-h262/Screenshot%2B2021-08-03%2Bat%2B18.11.28.png)

### Requirement
- Input data
- Remove the 'Return to Manufacturer' records
- Create a total for each Store of all the items sold
- Aggregate the data to Store sales by Item
- Output the data

### Output
![img](https://1.bp.blogspot.com/-zAcqRtDPLE0/YQl8uhM-dUI/AAAAAAAACPA/Q5_ORFfX0pI6KNOr5q4qAPPadn30PtRpgCLcBGAsYHQ/w640-h108/Screenshot%2B2021-08-03%2Bat%2B18.28.07.png)

6 columns:
- Items sold per store
- Wheels
- Tyres
- Saddles
- Brakes
- Store

4 rows of data (5 rows including header)
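The output specification can be turned into a quick self-check. This is a minimal sketch: `check_output` is a hypothetical helper, and the expected column order and row count are copied from the specification above rather than from any official checker. It also assumes the store total should equal the sum of the four item columns.

```python
import pandas as pd

# Copied from the output specification: column order and number of data rows
EXPECTED_COLUMNS = ["Items sold per store", "Wheels", "Tyres", "Saddles", "Brakes", "Store"]
EXPECTED_ROWS = 4  # data rows, excluding the header


def check_output(frame: pd.DataFrame) -> list[str]:
    """Hypothetical helper: list mismatches against the spec (empty list means it passes)."""
    problems = []
    if list(frame.columns) != EXPECTED_COLUMNS:
        problems.append(f"columns {list(frame.columns)} != {EXPECTED_COLUMNS}")
    else:
        # Assumed rule: the total is the row sum of the four item columns
        items = EXPECTED_COLUMNS[1:5]
        if not (frame[items].sum(axis=1) == frame["Items sold per store"]).all():
            problems.append("'Items sold per store' is not the row sum of the item columns")
    if len(frame) != EXPECTED_ROWS:
        problems.append(f"{len(frame)} rows, expected {EXPECTED_ROWS}")
    return problems
```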

```python
import pandas as pd
```

### Input data

```python
# Load the raw weekly sales records
df = pd.read_csv("./data/PD 2021 Wk 31 Input.csv")
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 71 entries, 0 to 70
Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Date             71 non-null     object
 1   Store            71 non-null     object
 2   Item             71 non-null     object
 3   Status           71 non-null     object
 4   Number of Items  71 non-null     int64 
dtypes: int64(1), object(4)
memory usage: 2.9+ KB
```


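```python
df.head()
```

|   | Date | Store | Item | Status | Number of Items |
|---|------|-------|------|--------|-----------------|
| 0 | 05/07/2021 | Bristol | Saddles | Sold | 3 |
| 1 | 05/07/2021 | York | Saddles | Sold | 3 |
| 2 | 05/07/2021 | Wimbledon | Saddles | Sold | 1 |
| 3 | 05/07/2021 | Wimbledon | Saddles | Return to Manufacturer | 1 |
| 4 | 05/07/2021 | Stratford | Saddles | Sold | 2 |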


### Remove the 'Return to Manufacturer' records

```python
# Collect the index labels of every returned (faulty) record, then drop those rows
return_to_manufac_idx = df.loc[df["Status"] == "Return to Manufacturer"].index
df = df.drop(return_to_manufac_idx, axis=0)
df.shape
```

```
(64, 5)
```
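A more common pandas idiom is to keep the wanted rows with a boolean mask instead of collecting index labels and dropping them. The sketch below uses a few invented rows (not the challenge data) and checks that both approaches give the same result. Both keep the original index labels, so add `.reset_index(drop=True)` if a contiguous index is needed.

```python
import pandas as pd

# Invented rows for illustration
records = pd.DataFrame({
    "Store": ["Bristol", "Wimbledon", "Wimbledon"],
    "Status": ["Sold", "Sold", "Return to Manufacturer"],
    "Number of Items": [3, 1, 1],
})

# Boolean mask: True for every row that is NOT a return
sold_only = records[records["Status"] != "Return to Manufacturer"]

# The drop-by-index approach used in the notebook, for comparison
dropped = records.drop(records.loc[records["Status"] == "Return to Manufacturer"].index)

print(sold_only.equals(dropped))  # True
```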

### Create a total for each Store of all the items sold

```python
# Sum "Number of Items" per Store, move Store back into a column, and rename the total
pivot_table = df.pivot_table(index="Store", values="Number of Items", aggfunc="sum")
pivot_table = pivot_table.reset_index()
pivot_table = pivot_table.rename(columns={"Number of Items": "Items sold per store"})
pivot_table
```

|   | Store | Items sold per store |
|---|-------|----------------------|
| 0 | Bristol | 30 |
| 1 | Stratford | 29 |
| 2 | Wimbledon | 30 |
| 3 | York | 28 |
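A single-index `pivot_table` with `aggfunc="sum"` is equivalent to a grouped sum. This sketch, on invented rows, shows that `groupby(...).sum()` followed by `Series.reset_index(name=...)` produces the same per-store totals without a separate rename step.

```python
import pandas as pd

# Invented rows for illustration
records = pd.DataFrame({
    "Store": ["York", "Bristol", "York"],
    "Number of Items": [3, 2, 4],
})

# The notebook's approach: pivot_table, reset_index, rename
via_pivot = (
    records.pivot_table(index="Store", values="Number of Items", aggfunc="sum")
    .reset_index()
    .rename(columns={"Number of Items": "Items sold per store"})
)

# Grouped sum; reset_index(name=...) names the summed column directly
via_groupby = (
    records.groupby("Store")["Number of Items"]
    .sum()
    .reset_index(name="Items sold per store")
)

print(via_pivot.to_dict("list") == via_groupby.to_dict("list"))  # True
```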


### Aggregate the data to Store sales by Item

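```python
# Sum items per (Store, Item) pair, then spread the Item values across the columns
grouped = df.groupby(["Store", "Item"])["Number of Items"].sum().unstack()
grouped.columns.name = None  # drop the leftover "Item" label on the column axis
grouped = grouped.reset_index()
grouped
```

|   | Store | Brakes | Saddles | Tyres | Wheels |
|---|-------|--------|---------|-------|--------|
| 0 | Bristol | 7 | 6 | 9 | 8 |
| 1 | Stratford | 10 | 8 | 6 | 5 |
| 2 | Wimbledon | 10 | 5 | 8 | 7 |
| 3 | York | 6 | 9 | 6 | 7 |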
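The `groupby` + `unstack` step can also be written as one `pivot_table` call with `columns="Item"`. One difference is worth knowing: if a store sold none of an item, `unstack()` leaves `NaN` in that cell, whereas `pivot_table(..., fill_value=0)` fills it with 0. In the challenge data every store sells every item, so the gap does not arise there. A sketch on invented rows where York sells no Brakes:

```python
import pandas as pd

# Invented rows: York has no Brakes sales
records = pd.DataFrame({
    "Store": ["Bristol", "Bristol", "York"],
    "Item": ["Brakes", "Wheels", "Wheels"],
    "Number of Items": [2, 5, 3],
})

# groupby + unstack: the missing (York, Brakes) pair becomes NaN
via_unstack = records.groupby(["Store", "Item"])["Number of Items"].sum().unstack()

# pivot_table with fill_value=0: the same grid, but with 0 instead of NaN
via_pivot = records.pivot_table(
    index="Store", columns="Item", values="Number of Items", aggfunc="sum", fill_value=0
)

print(via_unstack)
print(via_pivot)
```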


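```python
# Join the per-store totals to the per-item breakdown, then order the columns as specified
output = pivot_table.merge(grouped, how="inner", on="Store")
output = output.loc[:, ["Items sold per store", "Wheels", "Tyres", "Saddles", "Brakes", "Store"]]
output
```

|   | Items sold per store | Wheels | Tyres | Saddles | Brakes | Store |
|---|----------------------|--------|-------|---------|--------|-------|
| 0 | 30 | 8 | 9 | 6 | 7 | Bristol |
| 1 | 29 | 5 | 6 | 8 | 10 | Stratford |
| 2 | 30 | 7 | 8 | 5 | 10 | Wimbledon |
| 3 | 28 | 7 | 6 | 9 | 6 | York |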
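Because the per-item table already holds every sold item, the store total could also be derived from it with a row-wise sum, which removes the separate pivot and the merge. In this sketch a small hand-made frame, `per_item` (a hypothetical stand-in for `grouped`), holds the Bristol and York rows copied from the per-item table above.

```python
import pandas as pd

# Hypothetical stand-in for `grouped`: two rows copied from the per-item table
per_item = pd.DataFrame({
    "Store": ["Bristol", "York"],
    "Brakes": [7, 6],
    "Saddles": [6, 9],
    "Tyres": [9, 6],
    "Wheels": [8, 7],
})

item_columns = ["Wheels", "Tyres", "Saddles", "Brakes"]

# Row-wise sum across the item columns gives each store's total
with_total = per_item.assign(**{"Items sold per store": per_item[item_columns].sum(axis=1)})

# Put the columns in the order the output specification asks for
with_total = with_total[["Items sold per store", *item_columns, "Store"]]
print(with_total)
```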


### Output the data

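```python
# index=False stops pandas writing the 0-3 row index as an extra, unnamed first column
output.to_csv("./output/Week31_output.csv", index=False)
```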
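To confirm the file matches the six-column specification, the CSV can be read back. This sketch writes to an in-memory buffer (`io.StringIO`) instead of `./output/Week31_output.csv` so it runs anywhere. `demo_output` and `round_trip` are hypothetical names, and the rows are copied from the output table above. It shows why `index=False` matters: without it, the row index comes back as an extra `Unnamed: 0` column.

```python
import io

import pandas as pd

# Hypothetical stand-in for `output`: two rows copied from the output table
demo_output = pd.DataFrame({
    "Items sold per store": [30, 28],
    "Wheels": [8, 7],
    "Tyres": [9, 6],
    "Saddles": [6, 9],
    "Brakes": [7, 6],
    "Store": ["Bristol", "York"],
})


def round_trip(frame: pd.DataFrame, **to_csv_kwargs) -> pd.DataFrame:
    """Write `frame` to an in-memory CSV and read it straight back."""
    buffer = io.StringIO()
    frame.to_csv(buffer, **to_csv_kwargs)
    buffer.seek(0)
    return pd.read_csv(buffer)


print(round_trip(demo_output).columns.tolist())               # starts with 'Unnamed: 0'
print(round_trip(demo_output, index=False).columns.tolist())  # the six expected columns
```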