# Usage options

This section covers ways to compute values over groups and practical ways to operate on "groupby" objects.

## Iterating

You can iterate through the result of `pandas.DataFrame.groupby`. Each iteration yields a tuple of two values:

- The value of the grouping variable for this iteration;
- The subset of the original data set that corresponds to that value of the grouping variable.

The following example shows the result of the first iteration over a `DataFrameGroupBy` object and then uses the object in a `for` loop.

In [None]:
display(HTML("<b>Some iteration returns</b>"))
display(next(iter(basic_frame.groupby("A"))))

display(HTML("<b>Whole cycle</b>"))
for a_val, subframe in basic_frame.groupby("A"):
    print("====" + a_val + "=====")
    display(subframe)

('a',
    A  B   C
 0  a  2  10
 1  a  1  20)

====a=====


   A  B   C
0  a  2  10
1  a  1  20


====b=====


   A  B   C
2  b  3  30
3  b  4  40


====c=====


   A  B   C
4  c  6  50
5  c  5  60
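Because each iteration yields a `(key, sub-frame)` pair, the pairs can be collected straight into a dictionary. The sketch below assumes that `basic_frame` has the contents shown in the outputs above; the name `subframes` is mine and only illustrative.

```python
import pandas as pd

# Assumed reconstruction of `basic_frame` from the outputs shown above.
basic_frame = pd.DataFrame({
    "A": ["a", "a", "b", "b", "c", "c"],
    "B": [2, 1, 3, 4, 6, 5],
    "C": [10, 20, 30, 40, 50, 60],
})

# Each iteration yields (value of "A", sub-frame), so a dict comprehension
# gives one sub-frame per group, keyed by the group value.
subframes = {a_val: subframe for a_val, subframe in basic_frame.groupby("A")}

print(list(subframes))   # the group keys
print(subframes["b"])    # rows of the original frame where A == "b"
```

Each sub-frame keeps the row labels of the original frame, so it can be traced back to the source rows.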


## `agg` - rule by dict

Passing a dict to `agg` applies a separate aggregation function to each column, using the syntax `{<var_name_1>:<aggregation_function_1>, <var_name_2>:<aggregation_function_2>, ...}`.

The following example uses this syntax to compute the maximum of `B` and the sum of `C` within each group of `A`:

In [None]:
display(HTML("<b>Aggregation</b>"))
display(basic_frame.groupby("A").agg({"B":"max", "C":"sum"}))

   B    C
A
a  2   30
b  4   70
c  6  110
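To make clear what the dict does, the sketch below compares it with aggregating each column separately and assembling the result by hand. It assumes `basic_frame` has the contents shown in the outputs above; `by_dict` and `by_hand` are illustrative names.

```python
import pandas as pd

# Assumed reconstruction of `basic_frame` from the outputs shown above.
basic_frame = pd.DataFrame({
    "A": ["a", "a", "b", "b", "c", "c"],
    "B": [2, 1, 3, 4, 6, 5],
    "C": [10, 20, 30, 40, 50, 60],
})
grouped = basic_frame.groupby("A")

# One call: each dict key is a column, each value is its aggregation.
by_dict = grouped.agg({"B": "max", "C": "sum"})

# The same numbers, computed column by column and combined manually.
by_hand = pd.DataFrame({
    "B": grouped["B"].max(),
    "C": grouped["C"].sum(),
})

print(by_dict)
print(by_hand)
```

The dict form is mostly a convenience: it keeps all per-column rules in one place instead of one expression per column.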


## `apply` - combine results

<a href="https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.apply.html#pandas.core.groupby.DataFrameGroupBy.apply">Pandas documentation about apply function.</a>

### Basic idea

The distinguishing feature of this method is that it passes each group to the aggregation function as a whole `pandas.DataFrame`.

The following example shows this: `example_function` just prints its input, and for each value of "A" it prints a DataFrame.

In [None]:
def example_function(subdf):
    # Print a separator and the group that `apply` passes in.
    print("=========")
    print(subdf)
    return 5

res = basic_frame.groupby("A")[
    ["A", "B", "C"]
].apply(example_function)

   A  B   C
0  a  2  10
1  a  1  20
   A  B   C
2  b  3  30
3  b  4  40
   A  B   C
4  c  6  50
5  c  5  60
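The example above does not show what `apply` returns. As a minimal sketch, assuming `basic_frame` has the contents shown in the outputs above, the code below records the type of each input and inspects the combined result. The helper `constant_five` and the set `seen_types` are hypothetical names used only for illustration.

```python
import pandas as pd

# Assumed reconstruction of `basic_frame` from the outputs shown above.
basic_frame = pd.DataFrame({
    "A": ["a", "a", "b", "b", "c", "c"],
    "B": [2, 1, 3, 4, 6, 5],
    "C": [10, 20, 30, 40, 50, 60],
})

seen_types = set()

def constant_five(subdf):
    # Hypothetical stand-in for the function above: remember the type
    # of the input and return the same scalar for every group.
    seen_types.add(type(subdf).__name__)
    return 5

res = basic_frame.groupby("A")[["B", "C"]].apply(constant_five)

print(seen_types)          # type(s) of the objects passed to the function
print(type(res).__name__)  # type of the combined result
print(res)
```

When the function returns a scalar, the per-group results are combined into a `pandas.Series` indexed by the values of "A".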


### Use case

This makes `apply` well suited to cases where, for each value of variable A, you need a value of variable C selected by a condition on variable B.

In particular, the following example shows how to obtain, for each value of "A", the "C" value that corresponds to the minimum "B" value.

- For `"A" == "a"` I got `"C" == 20`, because it corresponds to `"B" == 1`, which is the minimum among the rows where `"A" == "a"`;
- For `"A" == "b"` I got `"C" == 30`, because it corresponds to `"B" == 3`, which is the minimum among the rows where `"A" == "b"`;
- For `"A" == "c"` I got `"C" == 60`, because it corresponds to `"B" == 5`, which is the minimum among the rows where `"A" == "c"`.

In [None]:
result = basic_frame.groupby("A")[["B", "C"]].apply(
    lambda subset: subset.loc[subset["B"].idxmin(), "C"]
)
display(HTML("<b>Result</b>"))
result.rename("C").to_frame()

    C
A
a  20
b  30
c  60
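For comparison, the same values can be obtained without `apply` by asking the group-wise `idxmin` for the row labels first and then selecting those rows. This is only an alternative sketch, assuming `basic_frame` has the contents shown in the outputs above; `rows_with_min_b` and `alt_result` are illustrative names.

```python
import pandas as pd

# Assumed reconstruction of `basic_frame` from the outputs shown above.
basic_frame = pd.DataFrame({
    "A": ["a", "a", "b", "b", "c", "c"],
    "B": [2, 1, 3, 4, 6, 5],
    "C": [10, 20, 30, 40, 50, 60],
})

# For each group of "A", the row label where "B" is smallest.
rows_with_min_b = basic_frame.groupby("A")["B"].idxmin()

# Select those rows from the original frame and keep "A" as the index.
alt_result = basic_frame.loc[rows_with_min_b, ["A", "C"]].set_index("A")
print(alt_result)
```

The `apply` version reads closer to the problem statement, while this version relies only on built-in group-wise methods.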


### vs `agg`

Other group-wise methods, such as `agg` and `transform`, may seem redundant because `apply` can do everything they can. However, according to the <a href="https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.apply.html#pandas.core.groupby.DataFrameGroupBy.apply">pandas documentation</a>, the more specific methods may be faster than `apply`. I have not been able to test this yet.
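One way to check this claim is to time both approaches on the same data. The sketch below is only a starting point: the synthetic frame `big_frame`, its size, and the number of repetitions are arbitrary assumptions of mine, and the timings will differ between machines and pandas versions.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: an arbitrary size chosen only for this sketch.
rng = np.random.default_rng(0)
big_frame = pd.DataFrame({
    "A": rng.integers(0, 1000, size=100_000),
    "B": rng.random(100_000),
})
grouped = big_frame.groupby("A")["B"]

# Both expressions compute the per-group maximum of "B".
via_agg = grouped.agg("max")
via_apply = grouped.apply(lambda s: s.max())

# Time each approach a few times and report the totals.
agg_time = timeit.timeit(lambda: grouped.agg("max"), number=3)
apply_time = timeit.timeit(
    lambda: grouped.apply(lambda s: s.max()), number=3
)
print(f"agg:   {agg_time:.4f} s")
print(f"apply: {apply_time:.4f} s")
```

Checking that `via_agg` and `via_apply` hold the same values before comparing the timings ensures that both calls do the same work.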

## `transform`

This function returns aggregations as a `pandas.Series`/`pandas.DataFrame` indexed like the original `pandas.DataFrame`.

For example, in the following cell I use `transform` to get, for each record of the original `pandas.DataFrame`, the mean value of `B` within that record's group of `A`.

In [None]:
temp_frame = basic_frame.copy()

temp_frame["mean B by A"] = (
    temp_frame.
    groupby("A")["B"].
    transform("mean")
)
display(temp_frame)

   A  B   C  mean B by A
0  a  2  10          1.5
1  a  1  20          1.5
2  b  3  30          3.5
3  b  4  40          3.5
4  c  6  50          5.5
5  c  5  60          5.5
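To show what `transform` saves, the sketch below rebuilds the same column in two steps: aggregate per group, then map the group means back onto the rows. It assumes `basic_frame` has the contents shown in the outputs above; the variable names are illustrative.

```python
import pandas as pd

# Assumed reconstruction of `basic_frame` from the outputs shown above.
basic_frame = pd.DataFrame({
    "A": ["a", "a", "b", "b", "c", "c"],
    "B": [2, 1, 3, 4, 6, 5],
    "C": [10, 20, 30, 40, 50, 60],
})

# Two steps: one mean per group, then look it up for every row.
group_means = basic_frame.groupby("A")["B"].mean()
via_map = basic_frame["A"].map(group_means)

# One step: `transform` broadcasts the group mean back to the rows.
via_transform = basic_frame.groupby("A")["B"].transform("mean")

print(via_map.tolist())
print(via_transform.tolist())
```

Both results are indexed like `basic_frame`, which is why either can be assigned directly as a new column.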


Here a single command returns a `pandas.DataFrame` that, for each record of the original `pandas.DataFrame`, holds the mean values of the `B` and `C` columns within that record's group of `A`.

In [None]:
display(
    temp_frame.
    groupby("A")[["B", "C"]].
    transform("mean")
)

     B     C
0  1.5  15.0
1  1.5  15.0
2  3.5  35.0
3  3.5  35.0
4  5.5  55.0
5  5.5  55.0
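Because the result shares its index with the original frame, it can be combined with the original columns row by row. As one possible use, the sketch below subtracts the group means to get each record's deviation from its group. It assumes `basic_frame` has the contents shown in the outputs above; `deviations` is an illustrative name.

```python
import pandas as pd

# Assumed reconstruction of `basic_frame` from the outputs shown above.
basic_frame = pd.DataFrame({
    "A": ["a", "a", "b", "b", "c", "c"],
    "B": [2, 1, 3, 4, 6, 5],
    "C": [10, 20, 30, 40, 50, 60],
})

# Group means of "B" and "C", aligned with the rows of `basic_frame`.
group_means = basic_frame.groupby("A")[["B", "C"]].transform("mean")

# The indexes match, so subtraction pairs each row with its group mean.
deviations = basic_frame[["B", "C"]] - group_means
print(deviations)
```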
