## 2021: Week 26 Rolling Weekly Revenue

As an analyst, you are likely to be asked for some complex calculations. Rolling (or moving) calculations are one type that I've always preferred to do in the data preparation step, where possible, rather than when visualising the data. It's saved me from some mistakes!

This week's challenge is about creating moving calculations. Take the example below: to understand a rolling week's values for 5th January (yes, British date format), you can include the 3 days before the 5th (i.e. the 2nd, 3rd and 4th) as well as the 3 days after it (i.e. the 6th, 7th and 8th).

![img](https://1.bp.blogspot.com/-MoKLMsFheJg/YNGknU47ZfI/AAAAAAAACNg/vnU1FF_3TDMZaulc0jLrkm8iE8zkR1ceACLcBGAsYHQ/s320/Screenshot%2B2021-06-22%2Bat%2B09.51.16.png)

Clearly, you need to define what your rolling period should and should not include. A rolling week could cover the current date plus the 6 days before it, or the 7 days before it if you exclude the current date. You could look forward over the same period instead, but ultimately you have to articulate to your audience what you are covering. The nature of the data might also influence your decision.
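To make the difference between window definitions concrete, here is a minimal sketch in pandas. The toy values and the variable names (`revenue`, `centred`, `trailing`) are made up for illustration and are not part of the challenge data; it also assumes one row per day with no gaps, because an integer `window` counts rows rather than calendar days.

```python
import pandas as pd

# Hypothetical toy data: one value per day for the first 8 days of January.
dates = pd.date_range("2021-01-01", periods=8, freq="D")
revenue = pd.Series([10, 20, 30, 40, 50, 60, 70, 80], index=dates)

# Centred week: 3 days before, the day itself and 3 days after.
centred = revenue.rolling(window=7, center=True, min_periods=1).sum()

# Trailing week: the day itself plus the 6 days before it.
trailing = revenue.rolling(window=7, min_periods=1).sum()

fifth = pd.Timestamp("2021-01-05")
print(centred.loc[fifth])   # sums the 2nd to the 8th
print(trailing.loc[fifth])  # sums the 1st to the 5th (no earlier data exists)
```

The two definitions give different numbers for the same date, which is why the chosen window needs to be stated alongside the result.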

### Challenge

Create a rolling weekly total and average for each Prep Air destination, plus an overall figure for all destinations combined. The rolling week is as detailed above: the 3 days before a date, the 3 days after it, and the day itself.

### Input
![img](https://1.bp.blogspot.com/-2VNBiBM-aU4/YNGys3bL8aI/AAAAAAAACNo/_eZm3QyNKWcjA-c644nnWIoe7y6cLWPUQCLcBGAsYHQ/w400-h251/Screenshot%2B2021-06-22%2Bat%2B10.51.33.png)

### Requirements
- Input data
- Create a data set that gives 7 rows per date (unless those dates aren't included in the data set).
    - i.e. 1st Jan only has 4 rows of data (1st, 2nd, 3rd & 4th)
- Remove any additional fields you don't need 
- Create the Rolling Week Total and Rolling Week Average per destination
    - Records that have fewer than 7 days of data should remain included in the output
- Create the Rolling Week Total and Rolling Week Average for the whole data set
- Pull the data together for the previous two requirements
- Output the data
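
The "7 rows per date" step above describes a self-join, which can also be sketched directly in pandas. This is only an illustration under stated assumptions: the miniature `data` frame and the helper column names `Offset` and `Window Date` are hypothetical, and `merge(how="cross")` needs pandas 1.2 or later.

```python
import pandas as pd

# Hypothetical miniature input with the same columns as the challenge file.
data = pd.DataFrame({
    "Destination": ["London"] * 5,
    "Date": pd.date_range("2021-01-01", periods=5, freq="D"),
    "Revenue": [100, 200, 300, 400, 500],
})

# Scaffold: every (Destination, Date) row paired with offsets of -3 to +3 days.
offsets = pd.DataFrame({"Offset": range(-3, 4)})
scaffold = data[["Destination", "Date"]].merge(offsets, how="cross")
scaffold["Window Date"] = scaffold["Date"] + pd.to_timedelta(scaffold["Offset"], unit="D")

# The inner join drops window dates that are not in the data,
# so 1st Jan keeps only 4 rows (1st, 2nd, 3rd and 4th).
windows = scaffold.merge(
    data.rename(columns={"Date": "Window Date"}),
    on=["Destination", "Window Date"],
    how="inner",
)

# Aggregate each date's window into the two required measures.
rolling = (
    windows.groupby(["Destination", "Date"])["Revenue"]
    .agg(**{"Rolling Week Total": "sum", "Rolling Week Avg": "mean"})
    .reset_index()
)
print(rolling)
```

Because the join is on actual dates rather than row positions, this version would still behave sensibly if a destination had missing days.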

### Output
![img](https://1.bp.blogspot.com/-0-VO01JEnew/YNGy3KFpaSI/AAAAAAAACNs/8cCDJ8226hInO7sbhjVBj2md2eD0LQEbQCLcBGAsYHQ/w400-h169/Screenshot%2B2021-06-22%2Bat%2B10.52.12.png)

One table:

Four data fields:
- Destination
- Date
- Rolling Week Avg
- Rolling Week Total


360 rows (361 including headers)

In [72]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [73]:
df = pd.read_csv("./data/PD 2021 Wk 26 Input - Sheet1.csv")

In [74]:
df[df["Date"] == "01/01/2021"]

Unnamed: 0,Destination,Date,Revenue
0,London,01/01/2021,232572
90,New York,01/01/2021,236769
180,Perth,01/01/2021,137371


In [75]:
df[df["Destination"] == "London"]

Unnamed: 0,Destination,Date,Revenue
0,London,01/01/2021,232572
1,London,02/01/2021,105610
2,London,03/01/2021,149849
3,London,04/01/2021,164463
4,London,05/01/2021,129130
...,...,...,...
85,London,27/03/2021,166969
86,London,28/03/2021,243855
87,London,29/03/2021,163782
88,London,30/03/2021,127081


In [77]:
# Take an explicit copy so later assignments do not raise SettingWithCopyWarning
london = df.loc[df["Destination"] == "London"].copy()
london["Date"] = pd.to_datetime(london["Date"], format="%d/%m/%Y")
london = london.sort_values(by="Date")


In [78]:
rolling_avg = london.set_index("Date")["Revenue"].rolling(window=7, center=True, min_periods=3).mean()
rolling_avg.name = "Rolling Week Avg"

In [80]:
london = london.set_index("Date").join(rolling_avg, how="left")
london

Date,Destination,Revenue,Rolling Week Avg
2021-01-01,London,232572,163123.500000
2021-01-02,London,105610,156324.800000
2021-01-03,London,149849,168986.666667
2021-01-04,London,164463,159290.857143
2021-01-05,London,129130,149116.142857
...,...,...,...
2021-03-27,London,166969,160212.000000
2021-03-28,London,243855,168414.714286
2021-03-29,London,163782,173854.333333
2021-03-30,London,127081,180690.800000


In [81]:
# Take an explicit copy so later assignments do not raise SettingWithCopyWarning
newyork = df.loc[df["Destination"] == "New York"].copy()
newyork["Date"] = pd.to_datetime(newyork["Date"], format="%d/%m/%Y")
newyork = newyork.sort_values(by="Date")


In [82]:
rolling_avg = newyork.set_index("Date")["Revenue"].rolling(window=7, center=True, min_periods=3).mean()
rolling_avg.name = "Rolling Week Avg"

In [83]:
newyork = newyork.set_index("Date").join(rolling_avg, how="left")
newyork

Date,Destination,Revenue,Rolling Week Avg
2021-01-01,New York,236769,210739.250000
2021-01-02,New York,192002,215581.400000
2021-01-03,New York,227616,202301.833333
2021-01-04,New York,186570,200880.857143
2021-01-05,New York,234950,191463.714286
...,...,...,...
2021-03-27,New York,128248,178532.285714
2021-03-28,New York,144219,174800.428571
2021-03-29,New York,165175,163330.500000
2021-03-30,New York,235352,170670.200000


In [84]:
# Take an explicit copy so later assignments do not raise SettingWithCopyWarning
perth = df.loc[df["Destination"] == "Perth"].copy()
perth["Date"] = pd.to_datetime(perth["Date"], format="%d/%m/%Y")
perth = perth.sort_values(by="Date")


In [85]:
rolling_avg = perth.set_index("Date")["Revenue"].rolling(window=7, center=True, min_periods=3).mean()
rolling_avg.name = "Rolling Week Avg"

In [86]:
perth = perth.set_index("Date").join(rolling_avg, how="left")
perth

Date,Destination,Revenue,Rolling Week Avg
2021-01-01,Perth,137371,112465.250000
2021-01-02,Perth,127151,115022.200000
2021-01-03,Perth,58638,108085.000000
2021-01-04,Perth,126701,102687.142857
2021-01-05,Perth,125250,104282.142857
...,...,...,...
2021-03-27,Perth,117829,100556.428571
2021-03-28,Perth,98244,102836.428571
2021-03-29,Perth,88599,95822.166667
2021-03-30,Perth,80576,98884.200000


In [88]:
df = pd.concat([london, newyork, perth], axis=0)
df

Date,Destination,Revenue,Rolling Week Avg
2021-01-01,London,232572,163123.500000
2021-01-02,London,105610,156324.800000
2021-01-03,London,149849,168986.666667
2021-01-04,London,164463,159290.857143
2021-01-05,London,129130,149116.142857
...,...,...,...
2021-03-27,Perth,117829,100556.428571
2021-03-28,Perth,98244,102836.428571
2021-03-29,Perth,88599,95822.166667
2021-03-30,Perth,80576,98884.200000
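
The three per-destination blocks above repeat the same steps, so they could probably be collapsed into a single grouped calculation that also yields the Rolling Week Total. The sketch below is one possible way to do that, not the notebook's method; the miniature `raw` frame is hypothetical, and it assumes each destination has exactly one row per day, since the window counts rows.

```python
import pandas as pd

# Hypothetical miniature input in the same shape as the challenge file.
raw = pd.DataFrame({
    "Destination": ["London"] * 4 + ["Perth"] * 4,
    "Date": ["01/01/2021", "02/01/2021", "03/01/2021", "04/01/2021"] * 2,
    "Revenue": [10, 20, 30, 40, 1, 2, 3, 4],
})
raw["Date"] = pd.to_datetime(raw["Date"], format="%d/%m/%Y")
raw = raw.sort_values(["Destination", "Date"])

# Apply the centred 7-row window separately within each destination.
grouped = raw.groupby("Destination")["Revenue"]
raw["Rolling Week Total"] = grouped.transform(
    lambda s: s.rolling(window=7, center=True, min_periods=1).sum()
)
raw["Rolling Week Avg"] = grouped.transform(
    lambda s: s.rolling(window=7, center=True, min_periods=1).mean()
)
print(raw)
```

With a centred 7-day window over complete daily data, the smallest window (at either end of the series) holds 4 values, so any `min_periods` up to 4 keeps the edge dates in the output.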


In [91]:
# Sum revenue across destinations once, then reuse it for both measures
daily_total = df.groupby("Date")["Revenue"].sum()
all_total_rolling_week = daily_total.rolling(window=7, center=True, min_periods=3).sum()
all_avg_rolling_week = daily_total.rolling(window=7, center=True, min_periods=3).mean()
all_total_rolling_week

Date
2021-01-01    1945312.0
2021-01-02    2434642.0
2021-01-03    2876241.0
2021-01-04    3240012.0
2021-01-05    3114034.0
                ...    
2021-03-27    3075105.0
2021-03-28    3122361.0
2021-03-29    2598042.0
2021-03-30    2251226.0
2021-03-31    1838180.0
Name: Revenue, Length: 90, dtype: float64
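
The remaining requirement is to pull the per-destination and overall figures into one table with the four output fields. A minimal sketch of that final step follows, using hypothetical toy data. Two things here are assumptions rather than facts from the challenge: the label `"All"` for the overall rows, and the overall average being the mean of the daily all-destination totals (which mirrors the cell above but may not be the intended definition). The helper `centred_week` is also made up for this example.

```python
import pandas as pd

def centred_week(revenue: pd.Series) -> pd.DataFrame:
    """Return the centred 7-row rolling mean and sum of a date-indexed Series."""
    window = revenue.rolling(window=7, center=True, min_periods=1)
    return pd.DataFrame({
        "Rolling Week Avg": window.mean(),
        "Rolling Week Total": window.sum(),
    })

# Hypothetical toy data: 2 destinations, 4 consecutive days each.
dates = pd.date_range("2021-01-01", periods=4, freq="D")
toy = pd.DataFrame({
    "Destination": ["London"] * 4 + ["Perth"] * 4,
    "Date": list(dates) * 2,
    "Revenue": [10, 20, 30, 40, 1, 2, 3, 4],
})

# Per-destination measures, one block per destination.
per_destination = pd.concat([
    centred_week(group.set_index("Date")["Revenue"].sort_index()).assign(Destination=name)
    for name, group in toy.groupby("Destination")
])

# Overall measures from the daily totals; "All" is an assumed label.
daily_total = toy.groupby("Date")["Revenue"].sum()
overall = centred_week(daily_total).assign(Destination="All")

# Stack both results and keep only the four output fields.
output = pd.concat([per_destination, overall]).reset_index()[
    ["Destination", "Date", "Rolling Week Avg", "Rolling Week Total"]
]
print(output)
```

On the real data, the same stacking of 3 destinations plus one overall series across 90 dates would give the 360 rows listed in the Output section.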