## 2021: Week 33 Excelling at adding one more row

If you've spent as long as I have in the data world, you will inevitably have had moments when a challenge is a lot harder to solve with your sophisticated tools than with Excel. The people you work with are likely to describe challenges to you in Excel terms and expect your solution to follow the same steps as their logic. It's not always that easy, though.

Last week, when working with some client data (I've converted this to an Allchains example), my team was challenged to look at orders captured in a weekly snapshot that was then exported to Excel.

Each week the file shows any order that is still open, i.e. one that hasn't been fulfilled (delivered to the customer). The challenge is to classify when an order is new (the first report it appears in), unfulfilled (any subsequent report it appears in) or completed (the week after the order last appears in a report). But what if we needed to know whether the order was fulfilled and when?

In Excel, we'd stack all of those rows of data on top of each other and just INSERT an extra row for each order after the last time it appears in a weekly snapshot. We can't right-click and add that additional row in Prep, so we need to think of some alternative logic.

### Input
5 worksheets in one Excel file with the same format
![img](https://1.bp.blogspot.com/-ciSacUA9Css/YRLDuvVsyxI/AAAAAAAACPY/9htDlXATbpojtlX4mNHQmqgdBbaYvBKiwCLcBGAsYHQ/s320/Screenshot%2B2021-08-10%2Bat%2B19.18.54.png)

### Requirement
- Input the data
- Create one complete data set
- Use the Table Names field to create the Reporting Date
- Find the Minimum and Maximum date where an order appeared in the reports
- Add one week on to the maximum date to show when an order was fulfilled by
- Apply this logic:
    - The first time an order appears it should be classified as a 'New Order'
    - The week after the last time an order appears in a report (the maximum date) is when the order is classed as 'Fulfilled' 
    - Any week between 'New Order' and 'Fulfilled' status is classed as an 'Unfulfilled Order' 
- Pull all of the data sets together
- Remove any unnecessary fields
- Output the data
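The requirement list above can be sketched in pandas as follows. This is a minimal illustration under stated assumptions, not the solution used later in this post: the `snapshots` data and the `classify_orders` helper are hypothetical, and the sketch reads the requirement as giving every order a 'Fulfilled' row, including orders that appear in only one report.

```python
import pandas as pd

# Hypothetical stacked snapshots: one row per order per weekly report.
snapshots = pd.DataFrame(
    {
        "Orders": ["A", "B", "B", "C", "C", "C"],
        "Reporting Date": pd.to_datetime(
            ["2021-01-01", "2021-01-01", "2021-01-08",
             "2021-01-01", "2021-01-08", "2021-01-15"]
        ),
    }
)


def classify_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Label each snapshot row and append one 'Fulfilled' row per order."""
    out = df.reset_index(drop=True)

    # Minimum date per order: the first report is the 'New Order' row.
    first_seen = out.groupby("Orders")["Reporting Date"].transform("min")
    out["Order Status"] = "Unfulfilled Order"
    out.loc[out["Reporting Date"] == first_seen, "Order Status"] = "New Order"

    # Maximum date per order, moved forward one week: the extra row.
    last_idx = out.groupby("Orders")["Reporting Date"].idxmax()
    fulfilled = out.loc[last_idx].copy()
    fulfilled["Reporting Date"] = fulfilled["Reporting Date"] + pd.Timedelta(weeks=1)
    fulfilled["Order Status"] = "Fulfilled"

    # Pull the data sets together.
    return (
        pd.concat([out, fulfilled], ignore_index=True)
        .sort_values(["Orders", "Reporting Date"])
        .reset_index(drop=True)
    )


result = classify_orders(snapshots)
print(result)
```

Building the extra rows as a separate frame and concatenating them is the closest pandas equivalent to inserting a row in Excel: nothing is inserted in place, the new rows are simply stacked and sorted into position.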

### Output
![img](https://1.bp.blogspot.com/-QAVqr4bOUQk/YRLTJYSNtyI/AAAAAAAACPg/uPmCoQale7cXWUbBzAdHveNsQ8Fxz4uQACLcBGAsYHQ/w640-h472/Screenshot%2B2021-08-10%2Bat%2B20.27.21.png)

4 data fields:
- Order Status
- Orders
- Sale Date
- Reporting Date

35 rows (36 rows including headers)

In [185]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Input the data

In [186]:
data = pd.read_excel("./data/Allchains Weekly Orders.xlsx", sheet_name=[0, 1, 2, 3, 4])

### Create one complete data set
### Use the Table Names field to create the Reporting Date

In [187]:
results = []
reporting_date = ["01/01/2021", "08/01/2021", "15/01/2021", "22/01/2021", "29/01/2021"]

for i in range(len(data.keys())):
    df = data[i].copy()
    df["Reporting Date"] = reporting_date[i]
    results.append(df)
df = pd.concat(results, axis=0)
df

Unnamed: 0,Orders,Sale Date,Reporting Date
0,A,2020-12-29,01/01/2021
1,B,2020-12-31,01/01/2021
2,C,2021-01-01,01/01/2021
0,B,2020-12-31,08/01/2021
1,C,2021-01-01,08/01/2021
2,D,2021-01-04,08/01/2021
3,E,2021-01-07,08/01/2021
4,F,2021-01-08,08/01/2021
0,B,2020-12-31,15/01/2021
1,D,2021-01-04,15/01/2021
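The cell above hard-codes the reporting dates. If the worksheet names themselves carry the dates, they could be used directly, since `pd.read_excel(path, sheet_name=None)` returns a dict keyed by sheet name. The sketch below assumes sheet names such as `01-01-2021`; that format, the `stack_sheets` helper and the sample frames are all assumptions for illustration, not taken from the actual file.

```python
import pandas as pd


def stack_sheets(sheets: dict, date_format: str = "%d-%m-%Y") -> pd.DataFrame:
    """Stack per-sheet frames and derive 'Reporting Date' from each sheet name."""
    frames = []
    for name, frame in sheets.items():
        frame = frame.copy()
        # Assumption: the sheet name parses as a date in `date_format`.
        frame["Reporting Date"] = pd.to_datetime(name, format=date_format)
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


# Stand-in for the dict that pd.read_excel(path, sheet_name=None) would return.
sheets = {
    "01-01-2021": pd.DataFrame({"Orders": ["A", "B"]}),
    "08-01-2021": pd.DataFrame({"Orders": ["B"]}),
}
stacked = stack_sheets(sheets)
print(stacked)
```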


### Find the Minimum and Maximum date where an order appeared in the reports
### Add one week on to the maximum date to show when an order was fulfilled by

In [188]:
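# Sale Date is already in ISO order (YYYY-MM-DD), so no explicit format is needed
df["Sale Date"] = pd.to_datetime(df["Sale Date"])
# Reporting Date was entered as day/month/year strings above
df["Reporting Date"] = pd.to_datetime(df["Reporting Date"], format="%d/%m/%Y")
# ascending takes booleans, not the strings "True"
df = df.sort_values(by=["Reporting Date", "Sale Date"], ascending=[True, True])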
df = df.reset_index(drop=True)
df

Unnamed: 0,Orders,Sale Date,Reporting Date
0,A,2020-12-29,2021-01-01
1,B,2020-12-31,2021-01-01
2,C,2021-01-01,2021-01-01
3,B,2020-12-31,2021-01-08
4,C,2021-01-01,2021-01-08
5,D,2021-01-04,2021-01-08
6,E,2021-01-07,2021-01-08
7,F,2021-01-08,2021-01-08
8,B,2020-12-31,2021-01-15
9,D,2021-01-04,2021-01-15


In [189]:
### The first time an order appears it should be classified as a 'New Order'
# .copy() gives an independent frame, so the assignment below does not raise a SettingWithCopyWarning
new_orders = df.drop_duplicates(subset="Orders", keep="first").copy()
new_orders["Order Status"] = "New Order"
# Keep only the status column; the row index is preserved so it can be joined back onto df
new_orders = new_orders[["Order Status"]]
new_orders


Unnamed: 0,Order Status
0,New Order
1,New Order
2,New Order
5,New Order
6,New Order
7,New Order
11,New Order
12,New Order
13,New Order
14,New Order


In [190]:
df = df.join(new_orders, how="left")
df

Unnamed: 0,Orders,Sale Date,Reporting Date,Order Status
0,A,2020-12-29,2021-01-01,New Order
1,B,2020-12-31,2021-01-01,New Order
2,C,2021-01-01,2021-01-01,New Order
3,B,2020-12-31,2021-01-08,
4,C,2021-01-01,2021-01-08,
5,D,2021-01-04,2021-01-08,New Order
6,E,2021-01-07,2021-01-08,New Order
7,F,2021-01-08,2021-01-08,New Order
8,B,2020-12-31,2021-01-15,
9,D,2021-01-04,2021-01-15,


In [191]:
# add one week to the maximum date
# .copy() avoids a SettingWithCopyWarning when the dates are shifted in the next cell
maximum_date = df.drop_duplicates(subset="Orders", keep="last").copy()

In [192]:
days_7 = pd.Timedelta("7 days")
maximum_date["Reporting Date"] = maximum_date["Reporting Date"] + days_7
maximum_date


Unnamed: 0,Orders,Sale Date,Reporting Date,Order Status
0,A,2020-12-29,2021-01-08,New Order
4,C,2021-01-01,2021-01-15,
6,E,2021-01-07,2021-01-15,New Order
10,F,2021-01-08,2021-01-22,
12,H,2021-01-14,2021-01-22,New Order
13,I,2021-01-15,2021-01-22,New Order
16,D,2021-01-04,2021-01-29,
17,G,2021-01-12,2021-01-29,
18,J,2021-01-15,2021-01-29,
21,B,2020-12-31,2021-02-05,


In [193]:
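# Drop orders whose last report is also their first (e.g. A, E, H and I);
# note that these orders therefore get no 'Fulfilled' row
to_drop_idx = maximum_date[maximum_date["Order Status"] == "New Order"].index
maximum_date = maximum_date.drop(to_drop_idx)
maximum_date["Order Status"] = "Fulfilled"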
maximum_date

Unnamed: 0,Orders,Sale Date,Reporting Date,Order Status
4,C,2021-01-01,2021-01-15,Fulfilled
10,F,2021-01-08,2021-01-22,Fulfilled
16,D,2021-01-04,2021-01-29,Fulfilled
17,G,2021-01-12,2021-01-29,Fulfilled
18,J,2021-01-15,2021-01-29,Fulfilled
21,B,2020-12-31,2021-02-05,Fulfilled
22,K,2021-01-19,2021-02-05,Fulfilled
23,L,2021-01-20,2021-02-05,Fulfilled


In [194]:
# The week after the last time an order appears in a report (the maximum date) is when the order is classed as 'Fulfilled' 
df = pd.concat([df, maximum_date], axis=0)
df = df.sort_values(by=["Orders", "Sale Date", "Reporting Date"])
df

Unnamed: 0,Orders,Sale Date,Reporting Date,Order Status
0,A,2020-12-29,2021-01-01,New Order
1,B,2020-12-31,2021-01-01,New Order
3,B,2020-12-31,2021-01-08,
8,B,2020-12-31,2021-01-15,
15,B,2020-12-31,2021-01-22,
21,B,2020-12-31,2021-01-29,
21,B,2020-12-31,2021-02-05,Fulfilled
2,C,2021-01-01,2021-01-01,New Order
4,C,2021-01-01,2021-01-08,
4,C,2021-01-01,2021-01-15,Fulfilled


In [195]:
# Any week between 'New Order' and 'Fulfilled' status is classed as an 'Unfulfilled Order' 
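# Fill only the status column so missing values in other fields are left untouched
df["Order Status"] = df["Order Status"].fillna("Unfulfilled Order")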
df = df.reset_index(drop=True)
df

Unnamed: 0,Orders,Sale Date,Reporting Date,Order Status
0,A,2020-12-29,2021-01-01,New Order
1,B,2020-12-31,2021-01-01,New Order
2,B,2020-12-31,2021-01-08,Unfulfilled Order
3,B,2020-12-31,2021-01-15,Unfulfilled Order
4,B,2020-12-31,2021-01-22,Unfulfilled Order
5,B,2020-12-31,2021-01-29,Unfulfilled Order
6,B,2020-12-31,2021-02-05,Fulfilled
7,C,2021-01-01,2021-01-01,New Order
8,C,2021-01-01,2021-01-08,Unfulfilled Order
9,C,2021-01-01,2021-01-15,Fulfilled


### Output the data

In [196]:
# index=False keeps the file to the four data fields listed under Output
df.to_csv("./output/Week33_output.csv", index=False)
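Before relying on the output, a quick sanity check of the status logic can help. The sketch below is one possible check, assuming a frame with the `Orders`, `Reporting Date` and `Order Status` columns used above; the `check_statuses` helper and the sample rows are hypothetical.

```python
import pandas as pd


def check_statuses(df: pd.DataFrame) -> list:
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    for order, rows in df.groupby("Orders"):
        # Statuses in chronological order for this order
        statuses = rows.sort_values("Reporting Date")["Order Status"].tolist()
        if statuses[0] != "New Order" or statuses.count("New Order") != 1:
            problems.append(f"{order}: expected exactly one 'New Order' row, in the first week")
        if statuses.count("Fulfilled") > 1:
            problems.append(f"{order}: more than one 'Fulfilled' row")
        if "Fulfilled" in statuses and statuses[-1] != "Fulfilled":
            problems.append(f"{order}: 'Fulfilled' is not the final row")
    return problems


# Hypothetical output rows for a single order, for illustration only.
sample = pd.DataFrame(
    {
        "Orders": ["B", "B", "B"],
        "Reporting Date": pd.to_datetime(["2021-01-01", "2021-01-08", "2021-01-15"]),
        "Order Status": ["New Order", "Unfulfilled Order", "Fulfilled"],
    }
)
print(check_statuses(sample))
```

Returning a list of messages rather than raising on the first failure makes it easier to see every order that breaks the rules in one pass.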