# 4. Rolling Windows

### Objectives
* Learn what a rolling window is
* Learn how to use the `rolling` method and know that it is very similar to `resample`
* Use both offset aliases and integers to specify a window size

## Introduction
Often during time series analysis, we would like to calculate a statistic over a rolling window of time. For example, we might want to know the average of the last 3 observations.

## Visualization of Rolling Window
Here, we have a rolling window size of 3, which includes the current observation plus the preceding two. 

![][1]

[1]: images/rollingwindow3.png

Let's create the same data as a Pandas Series.

In [None]:
import pandas as pd
import numpy as np

s = pd.Series([4, 1, 12, 8, 10, 18, 2, 12])
s

We now use the `rolling` method to return the same result as seen in the above visualization. The rest of this notebook explains this method.

In [None]:
s.rolling(3, min_periods=1).agg(['mean', np.size]).round(1)
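To make the mechanics concrete, the sketch below rebuilds the same rolling mean with an explicit loop and compares it with the result from `rolling`. The names `window`, `manual`, and `built_in` are illustrative only and are not part of the original notebook.

```python
import pandas as pd

s = pd.Series([4, 1, 12, 8, 10, 18, 2, 12])
window = 3

# For each position, average the current value and up to two preceding values.
manual_means = []
for i in range(len(s)):
    start = max(0, i - window + 1)  # the window cannot start before the first row
    manual_means.append(s.iloc[start:i + 1].mean())
manual = pd.Series(manual_means)

# min_periods=1 lets the first two (incomplete) windows produce a value.
built_in = s.rolling(window, min_periods=1).mean()

print(pd.DataFrame({'manual': manual, 'built_in': built_in}).round(1))
```

The two columns should agree row for row, which shows that a rolling window is just a slice that slides forward one observation at a time.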

## Get Stock Market Data Again
With our stock data, we might want to know, for each day, the average closing price over the last 5 trading days. The **`rolling`** method helps accomplish this task.

First, let's read in Amazon stock data from the last 5 years.

In [None]:
url = 'https://api.iextrading.com/1.0/stock/AMZN/chart/5y'
amzn = pd.read_json(url)
amzn = amzn.set_index('date')
amzn.head()

The **`rolling`** method works very similarly to **`resample`**. We pass it the offset alias for the length of our window and then aggregate as usual. It works best when you have a DatetimeIndex; otherwise, you will need to name the datetime column with the `on` parameter.
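If the dates live in an ordinary column instead of the index, the `on` parameter of `rolling` names that column. The following is a minimal sketch on a small made-up DataFrame; the name `df` and its values are assumptions for illustration and are not the Amazon data.

```python
import pandas as pd

# Hypothetical data: dates stored in a regular column, not in the index.
df = pd.DataFrame({
    'date': pd.date_range('2017-11-06', periods=6, freq='D'),
    'close': [10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
})

# on='date' tells rolling which column defines the time-based window.
rolled = df.rolling('3D', on='date').mean()
print(rolled)
```

The `date` column is carried through unchanged, and `close` holds the mean of each 3-day window.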

The result will always be a DataFrame (or Series) with the same number of rows as the original. The following takes the mean of the most recent 5-day period at each date.

In [None]:
amzn.rolling('5D').agg({'close': 'mean'}).head(10)

### Explanation
At each data point, Pandas finds the average of the **last** 5 days' worth of data, which **includes the current day**. For instance, let's say the current day is Nov 10, 2017. Pandas will get all data back to Nov 6, 2017, and aggregate all values found within this range. In this dataset, where we have only one value per day, the maximum number of values to be aggregated in any window is 5.

This does not mean that every window contains 5 values. Most will contain fewer, as there are no trading days on the weekend.

We can include an additional aggregation function, `np.size`, to find the number of values in each window. The string `'size'` raises an error here because rolling objects do not provide a `size` method. The string `'count'` is an alternative that counts the non-missing values in each window.

In [None]:
amzn.rolling('5D').agg({'close': ['mean', np.size]}).head(10)
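The weekend effect described above can be reproduced without downloading anything. The sketch below uses made-up prices on ten consecutive business days starting Monday, Nov 6, 2017; the name `prices` and its values are assumptions for illustration only.

```python
import pandas as pd

# Ten business days (Mon-Fri, two weeks) with hypothetical closing prices.
dates = pd.bdate_range('2017-11-06', periods=10)
prices = pd.Series(range(100, 110), index=dates, dtype='float64')

# Count how many observations fall inside each 5-calendar-day window.
counts = prices.rolling('5D').count()
print(counts)
```

Only the Friday windows reach 5 values. In the second week, the Monday, Tuesday, and Wednesday windows each hold just 3 values because the weekend takes up two of the five calendar days.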

## Keep window size the same with an integer
Instead of using an offset alias, you can set a fixed window size with an integer. The following will always use the last 5 values (trading days in this case), regardless of how many calendar days pass, to determine an average.

When using an integer for the window, the **`rolling`** method requires that number of values to be present; otherwise, the result is a missing value. This is why the first four rows below are missing.

In [None]:
amzn.rolling(5).agg({'close': 'mean'}).head(10)

### Set the minimum window size
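If you would like a non-missing value even when fewer values than the window size are available, use the `min_periods` parameter to set the minimum number of observations required. With an integer window, `min_periods` defaults to the window size. With an offset alias, Pandas defaults to a minimum period of 1.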

In [None]:
amzn.rolling(5, min_periods=3).agg({'close': ['mean', np.size]}).head(10)
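The effect of `min_periods` is easiest to see on a short Series. This sketch reuses the eight values from the start of the notebook; the names `strict`, `relaxed`, and `comparison` are illustrative only.

```python
import pandas as pd

s = pd.Series([4, 1, 12, 8, 10, 18, 2, 12])

# Default for an integer window: min_periods equals the window size (3 here),
# so the first two rows are missing.
strict = s.rolling(3).mean()

# min_periods=1 produces a value as soon as one observation is available.
relaxed = s.rolling(3, min_periods=1).mean()

comparison = pd.DataFrame({'strict': strict, 'relaxed': relaxed}).round(1)
print(comparison)
```

From the third row onward the two columns are identical, since every window is full from that point on.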

You can center the window around the current row with the `center` parameter. It will use an equal number of values before and after the current row.

In [None]:
amzn.rolling(5, min_periods=3, center=True).agg({'close': ['mean', np.size]}).head(10)
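As a small sketch of what `center=True` changes, the example below compares a trailing and a centered window of size 3 on the eight values from the start of the notebook. The names `trailing` and `centered` are illustrative only.

```python
import pandas as pd

s = pd.Series([4, 1, 12, 8, 10, 18, 2, 12])

# Trailing window: the current row plus the two rows before it.
trailing = s.rolling(3).mean()

# Centered window: one row before, the current row, and one row after.
centered = s.rolling(3, center=True).mean()

print(pd.DataFrame({'trailing': trailing, 'centered': centered}).round(1))
```

With an odd integer window, the centered result is the trailing result shifted up by one row, so the missing values move from the first two rows to the first and last rows.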

# Plotting
Let's find the trailing 50-day min, mean, and max of the closing price. Here, we will require at least 50 trading days' worth of data.

In [None]:
rolling_stats = amzn.rolling(50).agg({'close': ['min', 'mean', 'max']})
rolling_stats.head()

Remove all rows that did not have a full window of 50 trading days.

In [None]:
rolling_stats = rolling_stats.dropna()
rolling_stats.head()

Rename columns:

In [None]:
rolling_stats.columns = ['Min', 'Mean', 'Max']

Import matplotlib and choose a nice style:

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

plt.style.use("ggplot")

In [None]:
rolling_stats.plot(figsize=(16, 8), style=['-', '--', '-'], title='AMZN Rolling Windows')

## Resampling and Rolling Windows with a Series - A bit easier
Resampling and rolling window calculations can be done on Series that have DatetimeIndexes. The syntax becomes a bit easier since you don't have to specify an aggregating column. If you are only applying one aggregating function, you can call it directly as a method. With Series **`s`**, the syntax will look like this:

```
>>> s.resample('5D').sum()
```

We select the closing price as a Series and proceed to call both the **`resample`** and **`rolling`** methods on it.

In [None]:
close = amzn['close']
close.head()

Find the mean over each two-month period.

In [None]:
close.resample('2M').mean().head()

Find the rolling mean of the previous 5 trading days.

In [None]:
close.rolling(5).mean().head(10)
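The Series shortcut gives the same numbers as the DataFrame form used earlier. The sketch below checks this on a small made-up DataFrame; the name `prices` and its values are assumptions for illustration and stand in for the Amazon data.

```python
import pandas as pd

# Hypothetical price data with a DatetimeIndex.
prices = pd.DataFrame(
    {'close': [10.0, 12.0, 11.0, 15.0, 14.0, 13.0],
     'volume': [100, 120, 90, 150, 130, 110]},
    index=pd.date_range('2017-11-06', periods=6, freq='D'),
)

# DataFrame form: name the column inside agg.
via_frame = prices.rolling(3).agg({'close': 'mean'})['close']

# Series form: select the column first, then call the method directly.
via_series = prices['close'].rolling(3).mean()

print(pd.DataFrame({'via_frame': via_frame, 'via_series': via_series}))
```

Both approaches return the same values, so the choice between them is mostly a matter of readability.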

Multiple aggregation functions.

In [None]:
close.resample('2M').agg(['min', 'mean', 'max', np.size]).head()

# Exercises

## Problem 1
<span  style="color:green; font-size:16px">Use the employee dataset for this problem. Attempt to take a rolling average on salary using a 30-day time span on hire date. Does the error message make sense?</span>

## Problem 2
<span  style="color:green; font-size:16px">Set hire date as the index and then select the salary column as a Series. Sort the Series by date and drop the missing values. Now select a subset that only has hire dates from 1990 onwards. Then find a 1,000-day rolling average. Finally, make a call to the `plot` method. Make sure you inline matplotlib if you did not do it earlier.</span>

## Problem 3
<span  style="color:green; font-size:16px">Read in the energy consumption dataset. Select just the residential source and plot a 12-month trailing rolling mean of the energy.</span>