# Window functions

Window functions perform calculations across a set of table rows (window). When a window size is specified, a window function will continuously  perform the same calculation as it slides through the specified window.

Window functions are used to calculate rolling statistics like rolling means and rolling sums. When applied to time series data, we are able to perform rolling statistics that are time aware.

Pandas has two main functions for windowing `.rolling()` and `.expanding()`. These functions can be called on series and dataframes.

**`.rolling()`** - This function expects the parameter `window` which specifies the size of the window. The function is then applied to the set rows matching the window. If window size of say 2 is specified the function applied considers the current row and the previous only.

**`.expanding()`** - Unlike the a rolling function this function continuously  expands its windows size starting with a specified minimum.

Read more on pandas window functions [here](https://pandas.pydata.org/pandas-docs/stable/computation.html#window-functions)

# Rolling function example


In [7]:
import pandas as pd
import numpy as np

# generate data with an interval of 30 seconds
df = pd.DataFrame(np.random.randint(low =1,high=5,size=2881),
                  np.random.randint(low =1,high=5,size=2881),
                     #index=pd.date_range(start='1/1/2018',
                      #                   end='1/2/2018',
                       #                  freq='30s'),
                      columns=['A','B'])
df.head()

ValueError: Shape of passed values is (1, 2881), indices imply (2, 2881)

In [2]:
#this calculates the sum of the current row and the previous row 
df['rolling_sum'] = df['A'].rolling(window=2).sum() 
df.head()

Unnamed: 0,A,rolling_sum
2018-01-01 00:00:00,3,
2018-01-01 00:00:30,2,5.0
2018-01-01 00:01:00,2,4.0
2018-01-01 00:01:30,3,5.0
2018-01-01 00:02:00,3,6.0


# Expanding function example

In [4]:
#this calculates the sum of the current row and the previous row 
df['expanding_sum'] = df['A'].expanding(min_periods=2).sum() 
df.head()

Unnamed: 0,A,rolling_sum,expanding_sum
2018-01-01 00:00:00,3,,
2018-01-01 00:00:30,2,5.0,5.0
2018-01-01 00:01:00,2,4.0,7.0
2018-01-01 00:01:30,3,5.0,10.0
2018-01-01 00:02:00,3,6.0,13.0


From the above example the rolling sum add the current row to the previous only while the expanding sum adds current row to all previous rows. The first rows are empty because the window size is 2.

The rolling function can be made time-aware by setting rolling frequency to a time based frequency.This uses the offset aliases found [here](http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases)

In the example below we set the window size to be 60 seconds, this means the function will be applied continuously  on only 60 seconds of data

In [8]:
df['rolling_sum_60seconds'] = df['A'].rolling('60s').sum()
df.head()

Unnamed: 0,A,rolling_sum,expanding_sum,rolling_sum_60seconds
2018-01-01 00:00:00,3,,,3.0
2018-01-01 00:00:30,2,5.0,5.0,5.0
2018-01-01 00:01:00,2,4.0,7.0,4.0
2018-01-01 00:01:30,3,5.0,10.0,5.0
2018-01-01 00:02:00,3,6.0,13.0,6.0


# Assessment

Using the data in Timeseries/manuplating_dates/data/internet_traffic provide the following statistics


*   Average houly bits transferred
*   Total daily bits transferred

The data shows in megabits the amount of data transferred by an ISP at 5 minute intervals

In [None]:
#write code here