## Aggregates

In [20]:
import pandas as pd

grocery_df = pd.read_csv("data/grocery_catalog.csv")

A single aggregate can be computed in one chained expression:
 - `groupby` groups rows by a key.
 - `["<column>"]` selects the values to aggregate.
 - `sum()`, `mean()`, or any other reducing function computes the aggregate for each group.

In [21]:
grocery_df.groupby("category", as_index=False)["quantity"].sum()

,category,quantity
0,Bakery,15
1,Dairy,8
2,Fruit,162
3,Vegetable,55
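
The cell above passes `as_index=False`, which is why `category` appears as an ordinary column. The following is a minimal sketch of the difference from the default, using a small hypothetical `toy_df` (made-up values, not the contents of `grocery_catalog.csv`):

```python
import pandas as pd

# Hypothetical toy data for illustration only; not the real grocery catalog.
toy_df = pd.DataFrame(
    {
        "category": ["Fruit", "Fruit", "Dairy", "Bakery"],
        "quantity": [10, 5, 3, 7],
    }
)

# Default (as_index=True): the result is a Series indexed by the group key.
as_series = toy_df.groupby("category")["quantity"].sum()

# as_index=False: the result is a DataFrame and the group key stays a column.
as_frame = toy_df.groupby("category", as_index=False)["quantity"].sum()

print(as_series)
print(as_frame)
```

Keeping the key as a column tends to be more convenient when the result is merged or exported afterwards; the indexed form is handy for lookups such as `as_series["Fruit"]`.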


To aggregate several columns at once, pass `agg()` a dict that maps each column to its aggregation function (`agg` is an alias for `aggregate`).

In [22]:
grocery_df.groupby("category", as_index=False).agg({"quantity": "sum", "price": "mean"})

# Note: "sum" and "mean" are strings, but pandas maps them to its optimized built-in implementations.
# You can also pass callables (lambdas, etc.), which are usually slower:
# grocery_df.groupby("category", as_index=False).agg(
#     {"quantity": lambda col: sum(col), "price": lambda col: sum(col) / len(col)}
# )

,category,quantity,price
0,Bakery,15,1.5
1,Dairy,8,1.95
2,Fruit,162,1.016667
3,Vegetable,55,0.725
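
As a quick check of the claim that string names and callables are interchangeable, here is a small sketch that runs both forms on a hypothetical `toy_df` (made-up values, not the real catalog) and compares the results:

```python
import pandas as pd

# Hypothetical toy data for illustration only.
toy_df = pd.DataFrame(
    {
        "category": ["Fruit", "Fruit", "Dairy"],
        "quantity": [10, 5, 3],
        "price": [1.0, 2.0, 4.0],
    }
)

# String names: pandas dispatches to its built-in group-wise implementations.
by_name = toy_df.groupby("category", as_index=False).agg(
    {"quantity": "sum", "price": "mean"}
)

# Callables: each lambda is called once per group with that group's column.
by_callable = toy_df.groupby("category", as_index=False).agg(
    {"quantity": lambda col: col.sum(), "price": lambda col: col.mean()}
)

# Both approaches should produce the same per-group values.
print(by_name["quantity"].tolist() == by_callable["quantity"].tolist())
print(by_name["price"].tolist() == by_callable["price"].tolist())
```

The string form is generally the better default; a callable is mainly useful when no built-in name covers the reduction you need.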


In [23]:
# Use keyword arguments to name the output columns: new_name=("source_column", "function")
grocery_df.groupby("category", as_index=False).agg(
    average_calories=("calories", "mean")
)

,category,average_calories
0,Bakery,470.0
1,Dairy,280.0
2,Fruit,98.333333
3,Vegetable,17.5
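
Named aggregation also accepts several keyword arguments in one call, and the same source column can feed more than one output. The sketch below assumes a hypothetical `toy_df` with made-up values; `pd.NamedAgg` is the explicit equivalent of the `(column, function)` tuple.

```python
import pandas as pd

# Hypothetical toy data for illustration only.
toy_df = pd.DataFrame(
    {
        "category": ["Fruit", "Fruit", "Dairy"],
        "calories": [50, 100, 280],
        "price": [1.0, 2.0, 4.0],
    }
)

summary = toy_df.groupby("category", as_index=False).agg(
    # Tuple form: (source column, aggregation function).
    average_calories=("calories", "mean"),
    # pd.NamedAgg spells out the same pair with named fields.
    max_price=pd.NamedAgg(column="price", aggfunc="max"),
    # The same source column can be reused for another output column.
    item_count=("calories", "count"),
)

print(summary)
```

Each keyword becomes a column name in the result, so the output needs no renaming step afterwards.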
