### Pandas Lab: Time Shifts & Multi Level Indexing

This lab introduces you to working with time in a more granular way, and to building features when your data has hierarchies or panels.

Panel data means you have repeated observations for the same objects.  This is an important concept because many statistical methods don't explicitly account for values that are naturally correlated with one another over time.

But lots of data **is** highly correlated over time!  

By the time you're done with this lab, you'll have built 10 columns that capture a variety of information about how an observed value is changing with respect to itself.

**Question 1:** To get more information out of your dates, create columns in your dataset for the following aspects of each timestamp:

  - What quarter it's in
  - What month it's in
  - What year it's in
  - The number of days passed in the `visit_date` column

If you want to try adding different pandas date parts, you can find them here:  https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#time-date-components
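As a rough sketch of how date parts can be pulled out of a datetime column, the `.dt` accessor exposes them as attributes. The toy values below are made up, and "days passed" is read here as days since the earliest date in the column, which is only one possible interpretation of the task.

```python
import pandas as pd

# Toy data: `visit_date` matches the lab's column name, the dates are invented
toy = pd.DataFrame({
    'visit_date': pd.to_datetime(['2016-01-15', '2016-04-02', '2017-11-30'])
})

# The .dt accessor exposes date components of a datetime column
toy['quarter'] = toy['visit_date'].dt.quarter
toy['month'] = toy['visit_date'].dt.month
toy['year'] = toy['visit_date'].dt.year

# Assumption: "days passed" means days elapsed since the earliest date in the column
toy['days_elapsed'] = (toy['visit_date'] - toy['visit_date'].min()).dt.days

print(toy)
```

If your date column was read in as strings, it needs to be converted with `pd.to_datetime` before `.dt` will work.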

In [1]:
# your answer here

**Question 2:** Time Series Embedding

Lots of times, if you're trying to predict the value of something tomorrow, the most important piece of information is its value today, and yesterday, and so on.

However, your data won't really "know" about those values unless they can be observed alongside the current observation.

To that end, make three columns that capture the value of the following:

 - The attendance recorded for the previous observation
 - The attendance from two observations ago
 - The attendance from 7 observations ago (i.e., week over week)
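A minimal sketch of how row-based lags behave, using the `shift` method on a made-up `attendance` column (the values and the new column names are hypothetical):

```python
import pandas as pd

# Toy data: one entity, one observation per row, already sorted by time
toy = pd.DataFrame({'attendance': [10, 12, 9, 15, 11, 14, 13, 20, 18]})

# shift(n) moves every value down n rows, so each row sees the value from n rows earlier
toy['prev_1'] = toy['attendance'].shift(1)
toy['prev_2'] = toy['attendance'].shift(2)
toy['prev_7'] = toy['attendance'].shift(7)

print(toy)
```

The first `n` rows of each lagged column are `NaN` because there is nothing earlier to look back at. If your data holds several entities stacked together, a plain `shift` will pull values across entity boundaries, so the lag would need to be computed within each group.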

In [181]:
# your answer here

**Bonus:** Only attempt this one if you have extra time at the end of the entire lab.  

The `shift` method goes back a fixed number of rows, but not a fixed *amount of time.*  However, if you'd like you can in fact shift back a fixed amount of time.  

To do this you need access to two tools:
 - a date offset, which can be created with `pd.DateOffset`
 - the `freq` argument of `shift`, which is where the offset gets passed in
 
Here's a quick example of how it might work:

In [7]:
# this is some fake data that we are generating for the example
import pandas as pd
import numpy as np
dates = pd.date_range(start='2021-01-01', periods=300)
vals  = pd.DataFrame({
    'Date': dates,
    'Val': [np.random.choice(500) for i in range(300)],
})

# and here's what they look like
vals

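|     | Date       | Val |
|-----|------------|-----|
| 0   | 2021-01-01 | 234 |
| 1   | 2021-01-02 | 423 |
| 2   | 2021-01-03 | 275 |
| 3   | 2021-01-04 | 246 |
| 4   | 2021-01-05 | 494 |
| ... | ...        | ... |
| 295 | 2021-10-23 | 41  |
| 296 | 2021-10-24 | 65  |
| 297 | 2021-10-25 | 174 |
| 298 | 2021-10-26 | 488 |
| 299 | 2021-10-27 | 481 |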


In [8]:
# next we'll create a date offset
offset = pd.DateOffset(days=3)
# you can then use this to perform date math in various ways -- notice the dates were shifted forward by 3 days
vals['Date'] + offset

0     2021-01-04
1     2021-01-05
2     2021-01-06
3     2021-01-07
4     2021-01-08
         ...    
295   2021-10-26
296   2021-10-27
297   2021-10-28
298   2021-10-29
299   2021-10-30
Name: Date, Length: 300, dtype: datetime64[ns]

In [14]:
# now, if you want to, you can pass the offset into the shift method
# notice you need to set the Date column as the index
vals.set_index('Date')['Val'].shift(freq=offset)

Date
2021-01-04    234
2021-01-05    423
2021-01-06    275
2021-01-07    246
2021-01-08    494
             ... 
2021-10-26     41
2021-10-27     65
2021-10-28    174
2021-10-29    488
2021-10-30    481
Name: Val, Length: 300, dtype: int64

You might notice that this method changes *the index* vs. changing the actual values.  This means you'd need to merge these values back into the original dataset to match each value with its offset.
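One way the merge step could look, sketched on a small made-up frame. The column name `Val_3d_ago` is hypothetical, and a left merge on the date is just one reasonable choice here:

```python
import pandas as pd

# Small made-up dataset in the same shape as the example above
dates = pd.date_range(start='2021-01-01', periods=6)
vals = pd.DataFrame({'Date': dates, 'Val': [5, 8, 2, 9, 4, 7]})
offset = pd.DateOffset(days=3)

# Shifting with freq moves the index: each value is now labeled with the date 3 days later
shifted = (
    vals.set_index('Date')['Val']
        .shift(freq=offset)
        .rename('Val_3d_ago')   # hypothetical column name
        .reset_index()
)

# A left merge keeps every original row and attaches the value from 3 days earlier
merged = vals.merge(shifted, on='Date', how='left')

print(merged)
```

Dates with no observation exactly three days earlier end up as `NaN` in the new column, which is the main difference from a row-based `shift` when your dates have gaps.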

If you are feeling ambitious, try re-creating the previous columns using this method instead.

In [None]:
# your answer here

**Question 3:** Window Statistics

Lots of times, we want to capture some idea of momentum, or how some value compares with what's usually observed.

For example, if we had 48 purchases in a store today, how does that number compare to what's happened in the last 14 days?  Are things trending up or trending down?  

This also allows us to get a clearer picture of general trends in values, even if there are irregular daily spikes.

To handle these sorts of issues, pandas has an entire set of tools for calculating window statistics, accessed through the `rolling` method.  It works like this:

In [9]:
# I'll create a sample dataframe with 36 days' worth of values (Jan 1 through Feb 5)
import numpy as np
import pandas as pd
index = pd.date_range(start='01/01/2020', end='02/05/2020')
sample_df = pd.DataFrame(np.random.randn(36), index=index, columns=['Value'])
# and here's what it looks like
sample_df.head()

|            | Value     |
|------------|-----------|
| 2020-01-01 | -0.253379 |
| 2020-01-02 | -0.838158 |
| 2020-01-03 | -1.131807 |
| 2020-01-04 | -1.708901 |
| 2020-01-05 | -0.1963   |


In [11]:
# and now we'll see rolling 10-day averages (the first 9 rows have no full window yet)
sample_df.rolling(10).mean()

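|            | Value     |
|------------|-----------|
| 2020-01-01 | NaN       |
| 2020-01-02 | NaN       |
| 2020-01-03 | NaN       |
| 2020-01-04 | NaN       |
| 2020-01-05 | NaN       |
| 2020-01-06 | NaN       |
| 2020-01-07 | NaN       |
| 2020-01-08 | NaN       |
| 2020-01-09 | NaN       |
| 2020-01-10 | -0.366059 |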


You can specify the number of observations in each window, and choose your aggregator -- `mean()`, `min()`, `sum()`, etc. -- although `mean()` is the most common.
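To make the window size and aggregator choices concrete, here is a small sketch on a made-up series. It also shows the `min_periods` argument, which is not covered above but controls how many observations a window needs before it returns a value:

```python
import pandas as pd

# Made-up series with easy-to-check values
s = pd.Series([1, 2, 3, 4, 5], index=pd.date_range('2020-01-01', periods=5))

# Same 3-observation window, different aggregators
roll_mean = s.rolling(3).mean()
roll_sum = s.rolling(3).sum()
roll_min = s.rolling(3).min()

# min_periods=1 returns a value as soon as one observation is available
partial_mean = s.rolling(3, min_periods=1).mean()

print(pd.DataFrame({
    'mean': roll_mean,
    'sum': roll_sum,
    'min': roll_min,
    'partial_mean': partial_mean,
}))
```

With the default settings the first two rows are `NaN` because a full window of three observations isn't available yet.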

**Your Turn:** Calculate the 7-, 25-, and 60-day moving averages of visits for each restaurant in the dataset.

Be mindful of performing these at the appropriate level of your dataset.
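As a sketch of what "the appropriate level" can mean in practice, the toy example below computes a rolling mean separately within each group. The column names `store_id` and `visitors`, the window of 2, and the data are all hypothetical; your dataset's names and windows will differ.

```python
import pandas as pd

# Toy panel: two hypothetical stores, four days each
toy = pd.DataFrame({
    'store_id': ['a'] * 4 + ['b'] * 4,
    'visit_date': list(pd.date_range('2020-01-01', periods=4)) * 2,
    'visitors': [10, 20, 30, 40, 1, 2, 3, 4],
})

# Sort so that rows within each store are in time order
toy = toy.sort_values(['store_id', 'visit_date'])

# Compute the rolling mean inside each store so windows never span two stores
toy['rolling_2'] = (
    toy.groupby('store_id')['visitors']
       .transform(lambda s: s.rolling(2).mean())
)

print(toy)
```

Notice that the first row for store `b` is `NaN` rather than an average that mixes in the last value from store `a`, which is what a plain `rolling` over the whole column would produce.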

In [2]:
# your answer here

One additional note:  for a calculation such as this, it's best to shift the values by one observation -- why might this be the case?