<img src=../images/gdd-logo.png width=300px align=right>

# Transform

Using `.groupby()` and then merging the aggregates back into the original dataframe is a common pattern. For example, you might group by some key, compute an aggregate (say, average session length) and add that information to a raw dataset.

Doing the aggregation first makes sense, but the join/merge that follows can be expensive, especially for large dataframes. There is an alternative.

In this section we will cover:

* [Overview of `transform()`](#overview)
* [<mark>Exercises: Use `.assign()` and `.transform()` to create new columns</mark>](#exercise)

Before we do anything, though, let's import pandas again and read in our data.

In [None]:
import pandas as pd

chickweight = (
    pd.read_csv('../data/chickweight.csv') 
    .rename(str.lower, axis='columns')
)

<a id='overview'></a>

## Overview of `transform()`

<img src="../images/07_Transform/alternative.png" width="140" height="140" align="center"/>

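The `.transform()` method lets us do the aggregation and the join/merge in one go: it returns a result indexed like the original data, with each row holding the aggregate value for its group.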
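To make that claim concrete before we touch the chick data, here is a minimal sketch on a tiny dataframe with made-up values. The `diet` and `weight` column names mirror `chickweight`, but the numbers are invented. It computes the per-diet mean both ways (aggregate then merge, versus `.transform()`) and compares the per-row results.

```python
import pandas as pd

# Tiny, made-up stand-in for the chickweight data (values are invented)
df = pd.DataFrame({
    "diet": [1, 1, 2, 2, 2],
    "weight": [40, 50, 60, 70, 80],
})

# Approach 1: aggregate to one row per diet, then merge back onto every row
means = df.groupby("diet")["weight"].mean().reset_index()  # columns: diet, weight
merged = df.merge(means, on="diet", suffixes=("", "_mean"))

# Approach 2: transform broadcasts the per-diet mean straight back to every row
transformed = df.groupby("diet")["weight"].transform("mean")

print(merged["weight_mean"].tolist())  # [45.0, 45.0, 70.0, 70.0, 70.0]
print(transformed.tolist())            # [45.0, 45.0, 70.0, 70.0, 70.0]
```

Both give the same per-row values; the transform version simply never needs an explicit merge step.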

To demonstrate how this works, suppose we want to add the mean chick weight per diet to the dataframe. First, the familiar aggregation:

In [None]:
mean_weight_per_diet = (
    chickweight
    .groupby("diet")['weight']
    .mean()
)
mean_weight_per_diet

The problem here is that aggregating produces a Series with one row per diet, so its length differs from that of our original dataframe.

In [None]:
chickweight.shape, mean_weight_per_diet.shape

This is why we previously used a merge or join to combine the two.

In [None]:
(
    chickweight
    .merge(mean_weight_per_diet, on='diet', suffixes=('','_mean'))
)

However, with the `.transform()` method we can calculate the mean chick weight per diet and add it to the `chickweight` dataframe in one go.

With `.transform()`, each value is the mean weight of the diet that row belongs to.

In [None]:
(
    chickweight
    .groupby("diet")['weight']
    .transform('mean')
)
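To see exactly which value lands on which row, here is a small sketch with invented numbers where the diets are deliberately interleaved rather than sorted. Each row receives the mean of its own diet, in its original position, so the groups do not need to be contiguous.

```python
import pandas as pd

# Invented toy data: diets are deliberately interleaved, not sorted
df = pd.DataFrame({
    "diet": [1, 2, 1, 2],
    "weight": [40, 100, 60, 120],
})

# Each row gets the mean weight of its own diet, in the original row order
diet_mean = df.groupby("diet")["weight"].transform("mean")
print(diet_mean.tolist())  # [50.0, 110.0, 50.0, 110.0]
```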

Importantly, the output has the same length as the original dataframe (and keeps its index):

In [None]:
(
    chickweight
    .groupby("diet")['weight']
    .transform('mean')
).shape
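The shape check confirms the lengths match; the index matters too, because column assignment and `.assign()` align on index labels. A minimal sketch with invented data and a non-default index (as you might have after filtering rows) shows that the transformed result keeps the original index, whereas the plain aggregation is indexed by diet.

```python
import pandas as pd

# Invented toy data with a non-default index (e.g. left over after filtering)
df = pd.DataFrame(
    {"diet": [1, 1, 2], "weight": [40, 60, 90]},
    index=[10, 20, 30],
)

agg = df.groupby("diet")["weight"].mean()                 # indexed by diet: 1, 2
per_row = df.groupby("diet")["weight"].transform("mean")  # indexed like df: 10, 20, 30

print(agg.index.tolist())      # [1, 2]
print(per_row.index.tolist())  # [10, 20, 30]

# Because the indexes match, plain column assignment lines up row by row
df["mean_weight_diet"] = per_row
print(df["mean_weight_diet"].tolist())  # [50.0, 50.0, 90.0]
```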

This means we can easily add this information as a new column in one go with the `.assign()` method.

In [None]:
(
    chickweight
    .assign(mean_weight_diet = lambda df: df.groupby("diet")['weight'].transform('mean'))
)
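The `lambda df: ...` form is what makes this work inside a method chain: `.assign()` calls the function with the dataframe as it stands at that point. Since pandas 0.23 (on Python 3.6+), keyword arguments are applied in order, so a later column can use one created earlier in the same call. Here is a minimal sketch with invented data; `weight_minus_diet_mean` is a hypothetical column name chosen only for illustration.

```python
import pandas as pd

# Invented toy data
df = pd.DataFrame({"diet": [1, 1, 2, 2], "weight": [40, 60, 90, 110]})

result = df.assign(
    # per-diet mean, broadcast back to every row
    mean_weight_diet=lambda d: d.groupby("diet")["weight"].transform("mean"),
    # hypothetical extra column: this lambda sees the column created just above
    weight_minus_diet_mean=lambda d: d["weight"] - d["mean_weight_diet"],
)
print(result)
```

Note that `.assign()` returns a new dataframe; `df` itself is left unchanged.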

<a id='exercise'></a>
## <mark>Exercises: Use `.assign()` and `.transform()` to create new columns</mark>

### Exercise 1

Take the original `chickweight` dataframe and create these columns on the raw data without performing a join: 

1. **mean_weight_diet**: which calculates the mean weight per diet 
2. **mean_weight_diet_time**: which calculates the mean weight per diet at a given time

**BONUS:**

3. **num_chickens_diet**: which calculates the number of chickens per diet. Hint: explore what the `.nunique()` method does.

In [None]:
# %load ../answers/07_Transform/ex-transforms.py

### Exercise 2: Find the fattest chicken per diet

Do you remember how, in the last notebook, we tried to find the fattest chicken per diet?

Is it any easier now that we know about the `.transform()` method?

In [None]:
# %load ../answers/07_Transform/ex-fattest-chick-transform.py