## Performing Grouped Aggregations

Let us understand how to perform grouped (by key) aggregations using Pandas.
* Here are the steps we need to follow:
  * Make sure the data is read into a Data Frame.
  * Identify the key on which the data should be aggregated. If the key is a derived field that is not yet part of the Data Frame, one common approach is to first add the derived field to the Data Frame as a new column.
  * Group the values by the key using the `groupby` function on the Data Frame. `groupby` accepts column names from the Data Frame as well as a Series aligned with it, such as `orders['order_date']`.
  * Apply the required aggregate functions to get aggregated results for each key.
* We can apply multiple aggregate functions at a time on the grouped data using `agg`.
* A Pandas Data Frame exposes a function called `rename`, which can be used to provide aliases for the aggregated fields.
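
The steps above can be sketched end to end on a small, made-up Data Frame. The name `orders_sample` and its rows are assumptions for illustration only. The real `orders` Data Frame is loaded by the notebook that is run in the next cell.

```python
import pandas as pd

# Hypothetical sample data; the real `orders` data is loaded from CSV files.
orders_sample = pd.DataFrame({
    'order_id': [1, 2, 3, 4],
    'order_date': ['2013-07-25', '2013-07-25', '2013-08-01', '2013-08-02'],
    'order_status': ['COMPLETE', 'CLOSED', 'COMPLETE', 'COMPLETE'],
})

# Derived key: add the year and month as a new column.
orders_sample['order_month'] = orders_sample['order_date'].str.slice(0, 7)

# Group by the key, apply several aggregate functions, then alias one of them.
monthly_summary = orders_sample. \
    groupby('order_month')['order_id']. \
    agg(['count', 'min', 'max']). \
    rename(columns={'count': 'order_count'})

print(monthly_summary)
```

The result is indexed by `order_month`, with one column per aggregate function.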

In [None]:
%run 06_csv_to_pandas_data_frame.ipynb

* Getting number of orders per day

In [None]:
orders

In [None]:
orders.groupby(orders['order_date'])

In [None]:
list(orders.groupby(orders['order_date'])['order_id'])[:3]
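
Converting a grouped object to a list is a handy way to inspect it: each element is a pair of the group key and the values that belong to that key. A minimal sketch on made-up data (`orders_sample` is a hypothetical stand-in for `orders`):

```python
import pandas as pd

# Hypothetical sample data standing in for `orders`.
orders_sample = pd.DataFrame({
    'order_id': [1, 2, 3],
    'order_date': ['2013-07-25', '2013-07-25', '2013-07-26'],
})

grouped = orders_sample.groupby('order_date')['order_id']

# Each element is a (key, Series) pair; keys are sorted by default.
groups = list(grouped)

for order_date, order_ids in grouped:
    print(order_date, order_ids.tolist())
```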

In [None]:
orders.groupby(orders['order_date'])['order_id'].count()

* Getting number of orders per status

In [None]:
orders.groupby('order_status')['order_status'].count()
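
One detail worth keeping in mind: `count` counts only the non-null values of the selected column, while `size` counts all rows in each group. The sketch below uses made-up data with a missing value to show the difference. The Data Frame `sample` and its `order_customer_id` values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing customer id.
sample = pd.DataFrame({
    'order_status': ['COMPLETE', 'COMPLETE', 'CLOSED'],
    'order_customer_id': [10.0, np.nan, 30.0],
})

# count ignores the missing value in the selected column.
non_null_counts = sample.groupby('order_status')['order_customer_id'].count()

# size counts every row in the group, missing values included.
row_counts = sample.groupby('order_status').size()

print(non_null_counts)
print(row_counts)
```

Counting the grouping column itself, as in the cell above, is safe as long as rows with a missing key are not expected, because `groupby` drops missing keys by default.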

* Computing revenue per order

In [None]:
order_items

In [None]:
list(order_items. \
    groupby('order_item_order_id')['order_item_subtotal'])[:5]

In [None]:
order_items. \
    groupby('order_item_order_id')['order_item_subtotal']. \
    sum()

In [None]:
order_items. \
    groupby('order_item_order_id')['order_item_subtotal']. \
    agg(['sum', 'min', 'max', 'count'])

In [None]:
order_items. \
    groupby('order_item_order_id')['order_item_subtotal']. \
    agg(['sum', 'min', 'max', 'count']). \
    rename(columns={'count': 'item_count', 'sum': 'revenue'})
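
As an alternative to `agg` followed by `rename`, recent Pandas versions also support named aggregation, where the aliases are given as keyword arguments to `agg`. A minimal sketch, assuming a small made-up `order_items_sample` Data Frame:

```python
import pandas as pd

# Hypothetical sample data standing in for `order_items`.
order_items_sample = pd.DataFrame({
    'order_item_order_id': [1, 2, 2, 2],
    'order_item_subtotal': [299.98, 199.99, 250.0, 129.99],
})

# Keyword names become the output column names.
order_revenue = order_items_sample. \
    groupby('order_item_order_id')['order_item_subtotal']. \
    agg(revenue='sum', item_count='count')

print(order_revenue)
```

This produces only the aliased columns, so it suits cases where every aggregate needs a new name.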

In [None]:
order_items.rename(columns={'order_item_order_id': 'order_id'})

### Task 1

Get order_item_count and order_revenue for each order_id.

In [None]:
order_items

In [None]:
order_items. \
    groupby('order_item_order_id')['order_item_subtotal']. \
    agg(['sum', 'count']). \
    rename(columns={'sum': 'order_revenue', 'count': 'order_item_count'}). \
    reset_index()

### Task 2

Get the order count by month using orders data for a specific order_status (for example, `COMPLETE`).

In [None]:
orders

In [None]:
orders.order_date.str.slice(0, 7)

In [None]:
orders['order_month'] = orders.order_date.str.slice(0, 7)

In [None]:
orders

In [None]:
orders.query('order_status == "COMPLETE"'). \
    groupby('order_month')['order_id']. \
    count(). \
    sort_index()
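
If adding an `order_month` column to the Data Frame is not desirable, the derived key can also be passed to `groupby` directly as a Series. The sketch below assumes made-up data in `orders_sample`; the names `completed` and `monthly_counts` are illustrative as well.

```python
import pandas as pd

# Hypothetical sample data standing in for `orders`.
orders_sample = pd.DataFrame({
    'order_id': [1, 2, 3, 4],
    'order_date': ['2013-07-25', '2013-07-30', '2013-08-01', '2013-08-02'],
    'order_status': ['COMPLETE', 'COMPLETE', 'CLOSED', 'COMPLETE'],
})

completed = orders_sample.query('order_status == "COMPLETE"')

# Derive the key as a Series; its name becomes the name of the result index.
order_month = completed['order_date'].str.slice(0, 7).rename('order_month')

monthly_counts = completed. \
    groupby(order_month)['order_id']. \
    count(). \
    sort_index()

print(monthly_counts)
```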

### Task 3

Get order_revenue and order_quantity for each order_id. Sum the quantity of all items for each order_id to get order_quantity.

In [None]:
# sum() keeps a single level of column names, which are replaced in the next cell
order_metrics = order_items. \
    groupby('order_item_order_id')[['order_item_subtotal', 'order_item_quantity']]. \
    sum()

In [None]:
order_metrics.columns = ['order_revenue', 'order_quantity']

In [None]:
order_metrics
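
Instead of assigning to `columns` after the aggregation, the same result can be sketched with named aggregation across several columns, where each output name maps to a pair of source column and aggregate function. The sample data below is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for `order_items`.
order_items_sample = pd.DataFrame({
    'order_item_order_id': [1, 2, 2],
    'order_item_subtotal': [299.98, 199.99, 250.0],
    'order_item_quantity': [1, 1, 5],
})

# Each keyword is an output column: (source column, aggregate function).
order_metrics_sample = order_items_sample. \
    groupby('order_item_order_id'). \
    agg(
        order_revenue=('order_item_subtotal', 'sum'),
        order_quantity=('order_item_quantity', 'sum'),
    )

print(order_metrics_sample)
```

This keeps each output name next to the column and function it comes from, which can be easier to read than a positional list of new column names.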