# 5. Grouping by Time and another Column

### Objectives

* Use `Grouper` to group by time to give the exact same output as `resample`
* Independently group a column and then by an amount of time with `groupby.resample`
* Simultaneously group by time and another column by using `Grouper` within `groupby`

## Group by an amount of time with `Grouper` instead of `resample`
In the previous notebooks, we learned how to group by an amount of time with `resample`. Let's do that again here, finding the average salary of the employees hired within each 5-year span.

In [None]:
import pandas as pd
emp = pd.read_csv('../data/employee.csv', parse_dates=['hire_date'], index_col='hire_date')
emp.head()

In [None]:
# Note: pandas 2.2+ deprecates the 'Y' alias in favor of 'YE' (e.g. '5YE')
emp.resample('5Y').agg({'salary': 'mean'}).round(-3)

## Replicating `resample` with `groupby` + `Grouper`
The following syntax is a bit strange, so it might take reading it a few times to understand what is going on. 

You can group by time within the `groupby` method but you must use the `pd.Grouper` type to specify the frequency (the offset alias).

## Specify the frequency
The main parameter for `Grouper` is `freq`. Set it to the offset alias.

In [None]:
tg = pd.Grouper(freq='5Y')

### Think of `pd.Grouper` like a dictionary that holds information
This object does nothing on its own. Its purpose is to be passed to a grouping method such as `groupby`. It might be easier to just think of it as a dictionary that holds the frequency. Once we pass it to `groupby`, we can aggregate like we normally do and get the same result as we did with `resample`.

In [None]:
emp.groupby(tg).agg({'salary':'mean'}).round(-3)
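To check the equivalence without relying on `employee.csv`, here is a minimal sketch on a made-up table. The `toy` DataFrame and its values are invented for illustration, and the offset object `pd.offsets.YearEnd(5)` is an assumption standing in for the `'5Y'` alias, since the spelling of the year-end alias differs between pandas versions.

```python
import pandas as pd

# Made-up hire dates and salaries; not taken from employee.csv
toy = pd.DataFrame(
    {"salary": [50_000, 60_000, 70_000, 80_000, 90_000]},
    index=pd.DatetimeIndex(
        ["2000-03-01", "2003-07-15", "2006-01-10", "2011-11-30", "2012-05-05"],
        name="hire_date",
    ),
)

# Offset object standing in for the '5Y' alias (5-year bins ending on Dec 31)
five_years = pd.offsets.YearEnd(5)

by_resample = toy.resample(five_years).agg({"salary": "mean"})
by_grouper = toy.groupby(pd.Grouper(freq=five_years)).agg({"salary": "mean"})

# True when both results have the same index, columns, and values
same = by_resample.equals(by_grouper)
print(same)
```

`DataFrame.equals` treats missing values in the same position as equal, so empty time bins do not break the comparison.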

# Grouping by an amount of time and another column
There are two different ways to group by time and another column. The difference is subtle, but it can change the result. The datetime column and the other column can either be grouped **together** or grouped **independently**.

Let's say we wanted to find the average salary over 5-year time periods for each gender.

### Group together
To group gender and a 5-year time span together, we must use `groupby`. We simply pass a list containing both the column name and the `Grouper` object to `groupby`.

In [None]:
tg = pd.Grouper(freq='5Y')
groups = ['gender', tg]

emp.groupby(groups).agg({'salary':'mean'}).round(-3)

### Datetimes are the same
Notice how the datetimes for the female and male groups line up on the same 5-year bins. This is not going to be the case below.
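The shared bins can be checked on a small made-up table. This is only a sketch: the `toy` data is invented, and `pd.offsets.YearEnd(5)` is an assumption standing in for the `'5Y'` alias. It compares the time labels from grouping together against the labels that a plain `resample` over the whole index produces.

```python
import pandas as pd

# Made-up data; not taken from employee.csv
toy = pd.DataFrame(
    {
        "gender": ["Male", "Male", "Female", "Male", "Female"],
        "salary": [50_000, 60_000, 70_000, 80_000, 90_000],
    },
    index=pd.DatetimeIndex(
        ["2000-03-01", "2003-07-15", "2006-01-10", "2011-11-30", "2012-05-05"],
        name="hire_date",
    ),
)

five_years = pd.offsets.YearEnd(5)

# Group gender and time together
together = toy.groupby(["gender", pd.Grouper(freq=five_years)])["salary"].mean()

# Bin labels computed from the whole index, ignoring gender
all_bins = toy.resample(five_years)["salary"].mean().index

# Time labels (second index level) of the grouped result
labels = together.index.get_level_values(1)

# True when every label in the grouped result is one of the shared bins
shared = labels.isin(all_bins).all()
print(shared)
```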

## Group independently
To group independently, we first group the non-datetime column with the `groupby` method. The GroupBy object has a `resample` method, which then groups by an amount of time **within** each of the groups you just created. You use it just as if it were called from a DataFrame. Notice how the datetimes in the result are different for males and females.

In [None]:
emp.groupby('gender').resample('5Y').agg({'salary':'mean'}).round(-3)

In [None]:
emp[emp['gender'] == 'Female'].index.min()

## Different results
It's important to see that you will get different results depending on whether you group together or group independently. When grouping independently, each gender's time bins start from that gender's own earliest hire date. The results differ because the hire years of the earliest male and female employees are not an exact multiple of 5 years apart. The earliest hire date for female employees is 1975, while it is 1958 for males. If the first male and female employees had both been hired in 1958, the returned datetime index would have been the same.
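The contrast can be sketched on made-up data in which the first male hire is in 2000 and the first female hire is in 2006. Everything here is an illustrative assumption: the `toy` table is invented, and `pd.offsets.YearEnd(5)` stands in for the `'5Y'` alias.

```python
import pandas as pd

# Made-up data; not taken from employee.csv
toy = pd.DataFrame(
    {
        "gender": ["Male", "Male", "Female", "Male", "Female"],
        "salary": [50_000, 60_000, 70_000, 80_000, 90_000],
    },
    index=pd.DatetimeIndex(
        ["2000-03-01", "2003-07-15", "2006-01-10", "2011-11-30", "2012-05-05"],
        name="hire_date",
    ),
)

five_years = pd.offsets.YearEnd(5)

# Together: one set of bins built from the whole index
together = toy.groupby(["gender", pd.Grouper(freq=five_years)])["salary"].mean()

# Independently: each gender is resampled over its own date range
independent = toy.groupby("gender")["salary"].resample(five_years).mean()

first_hire_year = {}
first_label_year = {}
for gender in ["Female", "Male"]:
    first_hire_year[gender] = toy.index[toy["gender"] == gender].min().year
    first_label_year[gender] = independent.loc[gender].index.min().year

# Time labels for the female group under each approach
female_together = list(together.loc["Female"].index)
female_independent = list(independent.loc["Female"].index)
print(female_together)
print(female_independent)
```

In this sketch only the female labels change between the two approaches, because the male group owns the earliest date in the table and so anchors the shared bins.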

## Using a pivot table with `Grouper` for easier comparisons
You can pass a `Grouper` object to a pivot table to get a nice final product. This groups gender together with time.

In [None]:
emp.pivot_table(index=tg, columns='gender', values='salary').round(-3)
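As a rough check that the pivot table holds the same numbers as grouping together, the sketch below compares the two on made-up data. The `toy` table is invented, and `pd.offsets.YearEnd(5)` is an assumption standing in for the `'5Y'` alias. It relies on `pivot_table` using the mean as its default aggregation.

```python
import pandas as pd

# Made-up data; not taken from employee.csv
toy = pd.DataFrame(
    {
        "gender": ["Male", "Male", "Female", "Male", "Female"],
        "salary": [50_000, 60_000, 70_000, 80_000, 90_000],
    },
    index=pd.DatetimeIndex(
        ["2000-03-01", "2003-07-15", "2006-01-10", "2011-11-30", "2012-05-05"],
        name="hire_date",
    ),
)

five_years = pd.offsets.YearEnd(5)

# Wide layout: one row per time bin, one column per gender
wide = toy.pivot_table(
    index=pd.Grouper(freq=five_years), columns="gender", values="salary"
)

# Long layout: one row per (gender, time bin) pair
long = toy.groupby(["gender", pd.Grouper(freq=five_years)])["salary"].mean()

# Each long-format mean should appear in the matching (time bin, gender) cell
cells_match = all(
    wide.loc[when, gender] == value for (gender, when), value in long.items()
)
print(cells_match)
```

The wide layout puts the genders side by side, which is what makes row-by-row comparison easier than in the long `groupby` result.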

## Using `Grouper` on a datetime column
If your datetime column is not in the index, you can still use `Grouper`. Just specify the column name with the `key` parameter. See the example below with `hire_date` not in the index.

In [None]:
emp2 = pd.read_csv('../data/employee.csv', parse_dates=['hire_date'])
emp2.head()

In [None]:
tg2 = pd.Grouper(freq='10Y', key='hire_date')
emp2.groupby(['gender', tg2]).agg({'salary':'mean'})
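To see that `key` is just another way of pointing `Grouper` at the datetimes, the sketch below groups a made-up table twice: once with the dates in a column and once after moving them into the index. The `toy2` data is invented, and `pd.offsets.YearEnd(10)` is an assumption standing in for the `'10Y'` alias.

```python
import pandas as pd

# Made-up data with hire_date as an ordinary column
toy2 = pd.DataFrame(
    {
        "hire_date": pd.to_datetime(
            ["2000-03-01", "2003-07-15", "2006-01-10", "2011-11-30", "2012-05-05"]
        ),
        "gender": ["Male", "Male", "Female", "Male", "Female"],
        "salary": [50_000, 60_000, 70_000, 80_000, 90_000],
    }
)

ten_years = pd.offsets.YearEnd(10)

# Datetimes in a column: name the column with `key`
by_key = toy2.groupby(
    ["gender", pd.Grouper(freq=ten_years, key="hire_date")]
)["salary"].mean()

# Datetimes in the index: no `key` needed
by_index = (
    toy2.set_index("hire_date")
    .groupby(["gender", pd.Grouper(freq=ten_years)])["salary"]
    .mean()
)

# True when both results have the same index and values
same = by_key.equals(by_index)
print(same)
```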

# Exercises

## Problem 1
<span  style="color:green; font-size:16px">Read in the energy consumption dataset. Find the average energy consumption per sector for each 10-year time span, beginning from the first year of data. Return the results as both a groupby and a pivot table. Experiment with adding 'S' to the end of your offset alias. How does this change the results?</span>