# 3. Grouping by Time

### Objectives
* Group by time with **`resample`**
* Use offset aliases to determine amount of time
* Use the **`rolling`** method to calculate moving window statistics

## Introduction
In previous notebooks, we learned how to downsample/upsample time series data. In this notebook, we will group spans of time together to get a result. For instance, we can find out the number of up or down days for a stock within each trading month, or calculate the number of flights per day for an airline.

# Grouping by time
Pandas gives you the ability to group by a period of time. A concrete example using the Amazon closing stock data makes this clearer. Note that the date is set as the index.

In [None]:
import pandas as pd

url = 'https://api.iextrading.com/1.0/stock/AMZN/chart/5y'
amzn = pd.read_json(url)
amzn = amzn.set_index('date')
amzn.head()

### Find the average closing price of Amazon for every month
If we are interested in finding the average closing price of Amazon for every month, then we need to group by month and aggregate the closing price with the mean function.

### Grouping column, aggregating column, and aggregating method
This procedure is very similar to how we grouped and aggregated columns in previous notebooks. The only difference is that our **grouping column** will now be a datetime column with an additional specification for the amount of time.

### Use the `resample` method
Instead of the **`groupby`** method, we use a special method for grouping time together called **`resample`**. We must pass the **`resample`** method an offset alias string. The rest of the process is exactly the same as with the **`groupby`** method. We call the **`agg`** method and pass it a dictionary mapping the **aggregating columns** to the **aggregating functions**.

### `resample` syntax
The first parameter we pass to **`resample`** is the offset alias. Here, we choose to group by month.

In [None]:
amzn.resample('M').agg({'close': 'mean'}).head(10)
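
To make the parallel with **`groupby`** concrete, here is a minimal sketch on a small made-up DataFrame (the `toy` data below is hypothetical and not part of the Amazon dataset). It groups by week rather than by month, and assumes that `pd.Grouper` with a `freq` is the `groupby` counterpart of `resample`.

```python
import pandas as pd

# Hypothetical toy data: 14 consecutive daily closes, starting Monday 2018-01-01
idx = pd.date_range('2018-01-01', periods=14, freq='D')
toy = pd.DataFrame({'close': range(100, 114)}, index=idx)

# Group by calendar week and take the mean of the aggregating column
by_resample = toy.resample('W').agg({'close': 'mean'})

# The same grouping spelled with groupby and a pd.Grouper on the index
by_groupby = toy.groupby(pd.Grouper(freq='W')).agg({'close': 'mean'})

print(by_resample)
print(by_groupby)
```

Both results should contain one row per week, labeled with the Sunday that ends the week.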

### Use any number of aggregation functions
Map the aggregating column to a list of aggregating functions.

In [None]:
amzn.resample('M').agg({'close': ['size', 'min', 'mean', 'max']}).head(10)

## Offset Aliases iframe
The offset aliases are again embedded in the notebook as an iframe.

In [None]:
from IPython.display import IFrame
IFrame('http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases', width=800, height=500)

### Group by Quarter

In [None]:
amzn.resample('Q').agg({'close': ['size', 'min', 'mean', 'max']}).head()

### Label as the entire Period
Notice how the end date of each time period, down to the month and day, is used as the returned index label. We can change the index labels so that they show just the time period we are aggregating over by setting the `kind` parameter to 'period'.

In [None]:
amzn_period = amzn.resample('Q', kind='period').agg({'close': ['size', 'min', 'mean', 'max']})
amzn_period

## The PeriodIndex
We no longer have a DatetimeIndex. Pandas has a completely separate type of object for this called the **PeriodIndex**. The index label '2016Q1' refers to the entire period of the first quarter of 2016. Let's inspect the index to see the new type.

In [None]:
amzn_period.index
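
Newer Pandas releases (2.2 and later, to our knowledge) deprecate the `kind` parameter of `resample`. A possible alternative, sketched here on hypothetical toy data, is to resample as usual and then convert the resulting index with `to_period`. The `'QS'` (quarter start) alias is used only because it is accepted across a wide range of Pandas versions.

```python
import pandas as pd

# Hypothetical toy data: one value per day for the first half of 2016
idx = pd.date_range('2016-01-01', '2016-06-30', freq='D')
toy = pd.DataFrame({'close': range(len(idx))}, index=idx)

# Resample by quarter; the labels are timestamps (first day of each quarter)
quarterly = toy.resample('QS').agg({'close': ['size', 'mean']})

# Convert the DatetimeIndex labels into quarterly periods
quarterly_period = quarterly.to_period('Q')

print(type(quarterly.index).__name__)
print(quarterly_period.index)
```

The second index should be a PeriodIndex whose labels stand for whole quarters rather than single dates.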

## The Period data type
Pandas also has a completely separate data type called a **Period** to represent **columns** of data in a DataFrame that are specific **periods of time**. This is directly analogous to the PeriodIndex, but for DataFrame columns. Examples of a Period are the entire month of June 2014, or the entire 15 minute period from June 12, 2014 5:15 to June 12, 2014 5:30.
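
The two examples above can be built directly with the `pd.Period` constructor. This is only a small sketch; the variable names are made up for illustration.

```python
import pandas as pd

# The entire month of June 2014
june = pd.Period('2014-06', freq='M')

# The 15 minute period starting June 12, 2014 at 5:15
quarter_hour = pd.Period('2014-06-12 05:15', freq='15min')

# A Period is a span of time, so it has both a start and an end
print(june.start_time, june.end_time)
print(quarter_hour.start_time)
```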

### Convert a datetime column to a Period
We can use the `to_period` method available with the `dt` accessor to convert datetimes to Period data types. You must pass it an offset alias to denote the length of the time period. Let's convert the `date` column in the weather dataset to a monthly Period column.

In [None]:
weather = pd.read_csv('../data/weather.csv', parse_dates=['date'])
weather.head()

Let's make the conversion from datetime to period and assign the result as a new column in the DataFrame.

In [None]:
weather['date_period'] = weather['date'].dt.to_period('M')
weather.head()

### Why is the data type "object"?
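Unfortunately, older versions of Pandas (before 0.24) don't explicitly label the Period data type as such when outputting the data types and report the column as object. Newer versions report a dedicated `period[M]` data type instead. Either way, if we inspect an individual element, we will see that it is indeed a Period object.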

In [None]:
weather.dtypes

Inspect an individual element.

In [None]:
weather.loc[0, 'date_period']

### The `dt` accessor works for Period columns
Even when the column is labeled as object, Pandas still has attributes and methods specific to periods.

In [None]:
weather['date_period'].dt.month.head()

In [None]:
# Return the span of time
weather['date_period'].dt.freq
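
Since the weather file may not be at hand, here is a self-contained sketch of the same steps on a few made-up dates. The `dates` Series is hypothetical.

```python
import pandas as pd

# Hypothetical datetime column
dates = pd.Series(pd.to_datetime(['2014-06-12', '2014-07-04', '2015-01-31']))

# Convert each datetime to the month that contains it
periods = dates.dt.to_period('M')

# Each element is a Period object
first = periods.iloc[0]
print(type(first).__name__, first)

# The dt accessor exposes period-specific attributes
print(periods.dt.month.tolist())
print(periods.dt.start_time.tolist())
```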

# Anchored offsets
By default, when grouping by week, Pandas chooses to end the week on Sunday. Let's verify this by grouping by week and checking the weekday name of the first index label.

In [None]:
week_mean = amzn.resample('W').agg({'close': ['size', 'min', 'mean', 'max']})
week_mean.head()

In [None]:
week_mean.index[0].day_name()

### Anchor by a different day
You can anchor the week to any day you choose by appending a dash and then the first three letters of the day of the week. Let's anchor the week to Wednesday.

In [None]:
amzn.resample('W-WED').agg({'close': ['size', 'min', 'mean', 'max']}).head()
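
We can check the anchor in the same way as before. The sketch below uses hypothetical toy data (14 daily values starting on a Monday) and confirms that every label falls on a Wednesday.

```python
import pandas as pd

# Hypothetical toy data: 14 consecutive days, starting Monday 2018-01-01
idx = pd.date_range('2018-01-01', periods=14, freq='D')
toy = pd.DataFrame({'close': range(100, 114)}, index=idx)

# Count the rows in each week, with weeks ending on Wednesday
wed_sizes = toy.resample('W-WED').size()

# Weekday name of each index label
label_days = wed_sizes.index.day_name().tolist()

print(wed_sizes)
print(label_days)
```

The first and last groups are partial weeks because the data does not start or end on the anchor day.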

### Longer intervals of time with numbers appended to offset aliases
We can actually add more details to our offset aliases by using a number to specify an amount of that particular offset alias. For instance, **`5M`** will group in 5 month intervals.

In [None]:
amzn.resample('5M').agg({'close': ['size', 'min', 'mean', 'max']}).head()

Group by every 22 weeks anchored to Thursday.

In [None]:
amzn.resample('22W-THU').agg({'close': ['size', 'min', 'mean', 'max']}).head()
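
To see the effect of the number on its own, the following sketch compares one week groups with two week groups on hypothetical toy data. Where exactly the first group begins is left to Pandas, so we only look at the spacing between labels.

```python
import pandas as pd

# Hypothetical toy data: 70 consecutive days, starting Monday 2018-01-01
idx = pd.date_range('2018-01-01', periods=70, freq='D')
toy = pd.DataFrame({'close': range(70)}, index=idx)

one_week = toy.resample('W').size()
two_week = toy.resample('2W').size()

# Gap between consecutive index labels for the two week grouping
gaps = two_week.index.to_series().diff().dropna()

print(len(one_week), len(two_week))
print(gaps.unique())
```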

# Calling `resample` on a datetime column
The `resample` method can still work without a DatetimeIndex. If there is a column that is of the datetime data type, you can use the `on` parameter to specify that column. Let's reset the index and then call `resample` on that DataFrame.

In [None]:
amzn_reset = amzn.reset_index()
amzn_reset.head()

The only difference is that we specify the grouping column with the `on` parameter. The result is exactly the same.

In [None]:
amzn_reset.resample('W-WED', on='date').agg({'close': ['size', 'min', 'mean', 'max']}).head()
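
The claim that both approaches give the same result can be checked on a small example. This is a sketch on hypothetical toy data, not on the Amazon DataFrame.

```python
import pandas as pd

# Hypothetical toy data with the dates in the index
idx = pd.date_range('2018-01-01', periods=14, freq='D', name='date')
toy = pd.DataFrame({'close': range(100, 114)}, index=idx)

# Same data with the dates moved into an ordinary column
toy_reset = toy.reset_index()

by_index = toy.resample('W-WED').agg({'close': 'mean'})
by_column = toy_reset.resample('W-WED', on='date').agg({'close': 'mean'})

print(by_index)
print(by_column)
```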

# Exercises

## Problem 1
<span  style="color:green; font-size:16px">Read in stock data for Apple (AAPL) for the last 5 years. Set the date as the index and keep just the closing price and the volume columns.</span>

## Problem 2
<span  style="color:green; font-size:16px">In which week did AAPL have the greatest number of its shares traded?</span>

## Problem 3
<span  style="color:green; font-size:16px">With help from the `diff` method, find the quarter containing the greatest number of up days.</span>

## Problem 4
<span  style="color:green; font-size:16px">Find the mean price per year along with the minimum and maximum volume. Have the label for each row be the first day of the year.</span>

## Problem 5
<span  style="color:green; font-size:16px">Execute the cell below exactly as it is to read in the employee dataset. Then use `to_datetime` to convert the hire date column into a datetime.</span>

In [None]:
# execute this as is
emp = pd.read_csv('../data/employee.csv')

## Problem 6
<span  style="color:green; font-size:16px">Without putting `hire_date` into the index, find the mean salary based on `hire_date` over 5 year periods. Also return the number of salaries used in the mean calculation for each period.</span>