<img src=../images/gdd-logo.png width=300px align=right> 

# Custom Aggregating Functions

In this notebook, we look at writing custom aggregation functions.

This is useful whenever the default aggregations like `.max()`, `.sum()`, `.mean()`, etc. are not enough and you want a bit more flexibility in your data manipulation.

This notebook covers:

* [Exercise: Find the range](#1)
* [Using the custom function](#2)

First of all, let's load Pandas and the dataset again:

In [None]:
import pandas as pd

chickweight = (
    pd.read_csv('../data/chickweight.csv')
    .rename(str.lower, axis='columns')
)

Let's say you want to find the range of a column. Unfortunately, a pandas column (a `Series`) has no `.range()` method:

In [None]:
# chickweight['weight'].range()
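
A quick way to confirm this without triggering an error is to check for the attribute directly. The sketch below uses a small, made-up `Series` (the name `weights` and its values are assumptions for illustration, not the chickweight data):

```python
import pandas as pd

# Made-up values, used only to illustrate the missing method
weights = pd.Series([40, 42, 50, 58])

# A Series has .min() and .max(), but no .range()
has_max = hasattr(weights, 'max')
has_range = hasattr(weights, 'range')

print(has_max)
print(has_range)
```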

Therefore, you need to write your own function that does this.

<a id='1'></a>
### <mark> Exercise: Find the range</mark>

1. In the cell below, find the range of the `'weight'` column (without using `groupby`).

<details>
    
  <summary><span style="color:blue">Show hint</span></summary>
  
The range is given by **subtracting** the **min**imum from the **max**imum.
    
</details>

2. Now finish writing the function below, called `my_range`. The function takes one column of data as an argument, so you should be able to call it using `my_range(chickweight['weight'])`.

<details>
    
  <summary><span style="color:blue">Show hint</span></summary>
  
When the function is called, `column` will be `chickweight['weight']`. So inside the function, replace `chickweight['weight']` with `column`.
    
The function receives a series as input and should return its range.


</details>

In [None]:
def my_range(column):
    # YOUR CODE HERE (you can delete the line with 'pass')
    pass

In [None]:
# my_range(chickweight['weight'])

**Bonus:** Create the same function, but this time using a `lambda` expression.

In [None]:
# %load ../answers/03_Aggregations/ex-range.py

<a id='2'></a>
## Using the custom function

Now that you have created a function that finds the range, you can use it in the `.agg()` method (here it is named `get_range`):

In [None]:
def get_range(col):
    return col.max() - col.min()

(
    chickweight
    .groupby('time')
    .agg(weight_range = ('weight', get_range))
)

<mark>**Question:** Why don't you put `get_range` in quotes like you do with `'mean'`?</mark>

<details>
    
  <summary><span style="color:blue">Show answer</span></summary>
  
  Because `mean` is a built-in method (e.g. `df['col'].mean()`) that the `.agg()` method can look up by name within pandas.
    
  `get_range` is a function that we need to reference directly.

</details>
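
The difference between the two can be sketched on a small, made-up DataFrame. The `toy` frame and its values are assumptions for illustration only; the column names simply mirror the ones used in this notebook:

```python
import pandas as pd

# Hypothetical toy data, not the chickweight dataset
toy = pd.DataFrame({
    'time': [0, 0, 2, 2],
    'weight': [40, 42, 50, 58],
})

def get_range(col):
    # Range = maximum minus minimum of the column
    return col.max() - col.min()

result = (
    toy
    .groupby('time')
    .agg(weight_mean = ('weight', 'mean'),       # string: pandas looks up its own method by name
         weight_range = ('weight', get_range))   # no quotes: the function object itself is passed
)
print(result)
```

Each group's `weight` column is handed to `get_range` as a series, exactly as in the exercise above.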

You can also use a `lambda` function to avoid having to define and name the function.

In [None]:
(
    chickweight
    .groupby('time')
    .agg(weight_range = ('weight', lambda col: col.max() - col.min()) )
)

Putting it all together:

In [None]:
(
    chickweight
    .groupby(['time', 'diet'])
    .agg(num_chickens = ('rownum', 'count'),
         weight_mean = ('weight', 'mean'),
         weight_range = ('weight', lambda col: col.max() - col.min()),
    )
)
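
To try the combined aggregation without the CSV file, here is a minimal sketch on made-up data. The `toy` frame and its values are assumptions; only the column names (`rownum`, `time`, `diet`, `weight`) are taken from the cell above. Grouping by two columns gives a result indexed by `(time, diet)` pairs, so a single group can be selected with a tuple:

```python
import pandas as pd

# Hypothetical toy data with the same column names as chickweight
toy = pd.DataFrame({
    'rownum': [1, 2, 3, 4, 5, 6],
    'time':   [0, 0, 0, 0, 2, 2],
    'diet':   [1, 1, 2, 2, 1, 1],
    'weight': [40, 42, 41, 45, 50, 58],
})

summary = (
    toy
    .groupby(['time', 'diet'])
    .agg(num_chickens = ('rownum', 'count'),
         weight_mean = ('weight', 'mean'),
         weight_range = ('weight', lambda col: col.max() - col.min()),
    )
)
print(summary)

# Select the row for time == 0 and diet == 2
print(summary.loc[(0, 2)])
```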