# Rolling Windows

In the previous chapter, we placed each row into a single group and then performed an aggregation, returning a single row for each group. In this chapter, we will again group by a fixed period of time, but allow the start and end of the period to shift so that it aligns with the current row. The group constantly slides down the data, representing a different "window" of time at each row. This type of analysis is referred to as rolling windows or, when the aggregation is a mean, moving averages.

Take a look at the image below. The rolling mean is the average of three values: the current original value and the two preceding it. The window size here is three. The first two rows lack enough data for the full window size, but a mean can still be calculated from the values that are available.

![1]

[1]: images/rolling_mean.png

## The `rolling` method

The `rolling` method allows you to create windows of time around each row of data. Let's begin by creating the same data from above as a Pandas Series.

In [None]:
import pandas as pd
s = pd.Series([4, 2, 12, 10, 8, 18, 4, 14])
s

### Integer window size

The `rolling` method is similar to `resample`, where the first argument represents the time period. You can pass either an integer or an offset alias as the first argument. As with `resample` (and also `groupby`), calling the `rolling` method by itself only informs pandas of the window size. You'll need to chain an additional method to aggregate the values in the current window. Below, we choose a window size of three, which uses the current value and the previous two, and aggregate with the mean.

In [None]:
s.rolling(3).mean()

By default, missing values are returned whenever there are fewer values than the window size. Use the parameter `min_periods` to set the minimum number of values required to produce a result. With `min_periods=1`, the result matches the image from the beginning of the chapter.

In [None]:
s.rolling(3, min_periods=1).mean()
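
To make the window explicit, the sketch below rebuilds the same result by hand. It slices the current row plus up to two rows before it and takes the mean of each slice. The name `manual` is introduced only for this illustration.

```python
import pandas as pd

s = pd.Series([4, 2, 12, 10, 8, 18, 4, 14])

# Build each trailing window by hand: the current row plus up to two before it
manual = pd.Series(
    [s.iloc[max(0, i - 2): i + 1].mean() for i in range(len(s))],
    index=s.index,
)

# The same calculation with the rolling method
rolled = s.rolling(3, min_periods=1).mean()

# Place both side by side to compare them row by row
print(pd.DataFrame({'original': s, 'manual': manual, 'rolling': rolled}))
```

The two columns should agree, which shows that `min_periods=1` simply lets the first rows use whatever values are available instead of returning missing values.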

Perform multiple aggregations by passing them as strings within a list to the `agg` method. The `size` method is, unfortunately, not available to `rolling` objects. Below, we use `count`, which returns the same number as `size` whenever there are no missing values. We use it here to show the window size.

In [None]:
s.rolling(3, min_periods=1).agg(['min', 'max', 'count'])

### Centering the window

Instead of using the current row as the last value in the window, it's possible to use it as the middle value by setting the `center` parameter to `True`. Below, the current row, the two rows before it, and the two rows after it form a window of size five.

In [None]:
s.rolling(5, min_periods=1, center=True).agg(['min', 'max', 'count'])
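
As a quick check on which rows a centered window covers, the sketch below counts the rows from two positions before to two positions after each row and compares that with the `count` aggregation. The name `manual_counts` is introduced only for this illustration.

```python
import pandas as pd

s = pd.Series([4, 2, 12, 10, 8, 18, 4, 14])

# Centered window of size five: two rows before, the current row, two rows after
centered_counts = s.rolling(5, min_periods=1, center=True).count()

# The same counts from explicit slices, clipped at the start of the Series
# (slicing past the end is clipped automatically)
manual_counts = [len(s.iloc[max(0, i - 2): i + 3]) for i in range(len(s))]

print(pd.DataFrame({'rolling': centered_counts, 'manual': manual_counts}))
```

The window is truncated at both ends of the Series, not just the beginning, because a centered window also looks ahead.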

## Rolling with offset aliases

Let's read in the stocks dataset, set the index to be a datetime and then select Exxon-Mobil as a Series.

In [None]:
stocks = pd.read_csv('../data/stocks/stocks10.csv', parse_dates=['date'], 
                     index_col='date')
stocks.head(3)

In [None]:
xom = stocks['XOM']
xom.head()

Using an integer window size of five works the same in this Series with a DatetimeIndex as it does with any other Series.

In [None]:
xom.rolling(5, min_periods=1).agg(['mean', 'count']).head(10)

The window size of a Series (or DataFrame) with a DatetimeIndex may be set with an offset alias. Here, we use `'5D'` for five days. You might assume that this result will be the same as the last, but it is not. When using an offset alias, only the rows within the time period are considered. Since this dataset does not contain weekends, the five-day period will only include five observations if the current day is Friday. All the other trading days overlap with one or both weekend days, which do not have rows in the dataset. Take note of the window size under the `'count'` column. Using an integer window size always uses that number of rows in the group (except for the rows at the beginning and, when the window is centered, at the end).

In [None]:
xom.rolling('5D', min_periods=1).agg(['mean', 'count']).head(10)

Using a window size of seven days will make the number of observations in each group equal to five, as each seven-day period will contain exactly one Saturday and one Sunday. The only periods that will have fewer than five observations are those that contain a holiday (and the first few rows of the data).

In [None]:
xom.rolling('7D', min_periods=1).agg(['mean', 'count']).head(10)
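
The difference between integer and offset windows can be reproduced without the stocks file. The sketch below uses a made-up Series with one value per weekday and no holidays as a stand-in for the real data. The names `prices` and `counts` are assumptions for this illustration only.

```python
import pandas as pd

# Hypothetical stand-in for the stock data: one value per weekday, no holidays
idx = pd.bdate_range('2024-01-01', periods=15)  # 2024-01-01 is a Monday
prices = pd.Series(range(15), index=idx, dtype='float64')

# Count the observations in each window for three different window definitions
counts = pd.DataFrame({
    'weekday': list(idx.day_name()),
    'int_5': prices.rolling(5, min_periods=1).count().to_numpy(),
    '5D': prices.rolling('5D').count().to_numpy(),
    '7D': prices.rolling('7D').count().to_numpy(),
}, index=idx)

print(counts)
```

After the first week, the `'5D'` column should reach five only on Fridays, while the integer and `'7D'` columns stay at five on every row.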

### Unavailable offset aliases

Because years, quarters, and months are not fixed time periods, pandas does not allow them to be used as offset aliases with `rolling`. Here, we verify that using a month does not work and raises an error. Although weeks are always seven days, they are not allowed either.

In [None]:
xom.rolling('M').mean()
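
One way to explore which windows are accepted is to wrap the call in a `try` block. The sketch below assumes that pandas signals an unsupported window with a `ValueError`. The helper `window_is_allowed` and the made-up `daily` Series are hypothetical names for this illustration. The month is passed as the offset object `pd.offsets.MonthEnd()` rather than as a string, because the string alias for month end differs between pandas versions.

```python
import pandas as pd

# Made-up daily data used only for this check
idx = pd.date_range('2024-01-01', periods=60, freq='D')
daily = pd.Series(range(60), index=idx, dtype='float64')

def window_is_allowed(series, window):
    """Return True if pandas accepts `window` for a rolling mean on `series`."""
    try:
        series.rolling(window).mean()
    except ValueError:
        # Assumption: unsupported windows are reported with a ValueError
        return False
    return True

for window in ['7D', 'W', pd.offsets.MonthEnd()]:
    print(window, window_is_allowed(daily, window))

# A fixed number of days can serve as an approximation of a month
approx_month = daily.rolling('30D', min_periods=1).mean()
```

When a calendar month is what you want, a fixed window such as `'30D'` is only an approximation. Grouping by actual months is what `resample` from the previous chapter is for.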

## DataFrame `rolling` method

The `rolling` method works similarly on DataFrames. Here we take the mean of each column using a 14 day window size.

In [None]:
stocks.rolling('14D', min_periods=1).mean().round(1).head(10)

Use the `agg` method to map specific columns to specific aggregations.

In [None]:
stocks.rolling('7D', min_periods=1).agg({'MSFT': 'mean', 'WMT': 'min'}).head()
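
The same dictionary form can be tried on a small made-up DataFrame, where the results are easy to verify by eye. The column names `a` and `b` are placeholders for this illustration and not part of the stocks data.

```python
import pandas as pd

# Small made-up DataFrame with a daily DatetimeIndex
idx = pd.date_range('2024-01-01', periods=6, freq='D')
df = pd.DataFrame({'a': [1.0, 2, 3, 4, 5, 6],
                   'b': [6.0, 5, 4, 3, 2, 1]}, index=idx)

# Rolling three-day window: mean of column a, minimum of column b
out = df.rolling('3D').agg({'a': 'mean', 'b': 'min'})
print(out)
```

Each key of the dictionary selects a column and each value names the aggregation applied to that column, so the result has one column per key.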

### Moving averages

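Rolling window averages are commonly referred to as "moving averages", particularly with stock market data. Let's create the 200-day moving average, minimum, and maximum for Exxon-Mobil. The integer window of 200 counts rows, so it covers 200 trading days rather than 200 calendar days.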

In [None]:
xom_stats = xom.rolling(200).agg(['max', 'mean', 'min']).dropna()
xom_stats.head()

Visualizing these results can give another perspective on the performance of the stock.

In [None]:
import seaborn as sns
sns.set_theme(rc={'figure.figsize': (5, 2.7), 'figure.dpi': 127}, 
              font_scale=0.8)
xom_stats.plot(figsize=(6, 3), style=['-', '--', '-'], title='XOM Rolling Windows');

## Exercises

Execute the following cell that reads in the temperature dataset and use it for the exercises.

In [None]:
temp = pd.read_csv('../data/weather/temperature.csv', 
                   parse_dates=['datetime'], index_col='datetime')
temp.head()

### Exercise 1

<span style="color:green; font-size:16px">Calculate a 6-hour moving average of temperature. Set the minimum number of rows used in the group to 1.</span>

### Exercise 2

<span style="color:green; font-size:16px">How many observations are there in each 30-day rolling window of time? Use the `count` method because the `size` method is not available.</span>

### Exercise 3

<span style="color:green; font-size:16px">Calculate the 30-day moving average for Los Angeles and Houston using a 1-row minimum. In what percentage of the rows does Houston have the higher moving average temperature?</span>

### Exercise 4

<span style="color:green; font-size:16px">Calculate the minimum, maximum, and mean temperatures for Houston using a rolling 14-day period and plot the results.</span>