## Performing Aggregations

Let us understand how to perform grouped (by key) aggregations using Pandas.
* We can apply multiple aggregate functions at a time.
* A Pandas Data Frame exposes a function called `rename`, which we can use to provide aliases for the aggregated fields.
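
The two points above can be sketched on a small, made-up Data Frame. The data here is hypothetical; only the column names are borrowed from the `order_items` data used later in this section.

```python
import pandas as pd

# Hypothetical toy data; column names mirror the order_items data set.
items = pd.DataFrame({
    'order_item_order_id': [1, 2, 2, 2, 4],
    'order_item_subtotal': [299.98, 199.99, 250.00, 129.99, 49.98],
})

# Apply two aggregate functions at once, then alias the resulting columns.
summary = (
    items
    .groupby('order_item_order_id')['order_item_subtotal']
    .agg(['sum', 'count'])
    .rename(columns={'sum': 'revenue', 'count': 'item_count'})
)
print(summary)
```

Passing a list to `agg` produces one column per function, named after the function, which is why `rename` is needed to get friendlier names.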

In [1]:
%run 06_csv_to_pandas_data_frame.ipynb

* Getting number of orders per day

In [2]:
orders.groupby('order_date')['order_id'].count()

order_date
2013-07-25 00:00:00.0    143
2013-07-26 00:00:00.0    269
2013-07-27 00:00:00.0    202
2013-07-28 00:00:00.0    187
2013-07-29 00:00:00.0    253
                        ... 
2014-07-20 00:00:00.0    285
2014-07-21 00:00:00.0    235
2014-07-22 00:00:00.0    138
2014-07-23 00:00:00.0    166
2014-07-24 00:00:00.0    185
Name: order_id, Length: 364, dtype: int64

* Getting number of orders per status

In [3]:
orders.groupby('order_status')['order_status'].count()

order_status
CANCELED            1428
CLOSED              7556
COMPLETE           22899
ON_HOLD             3798
PAYMENT_REVIEW       729
PENDING             7610
PENDING_PAYMENT    15030
PROCESSING          8275
SUSPECTED_FRAUD     1558
Name: order_status, dtype: int64
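
One detail worth keeping in mind with `count`: it only counts non-null values in the selected column, whereas `size` counts every row in the group. The sketch below uses hypothetical toy data (not the real `orders` data) with one missing `order_id` to show the difference.

```python
import pandas as pd

# Hypothetical toy data with one missing order_id.
toy = pd.DataFrame({
    'order_status': ['COMPLETE', 'COMPLETE', 'CLOSED'],
    'order_id': [1, None, 3],
})

# count() skips the missing order_id in the COMPLETE group.
counts = toy.groupby('order_status')['order_id'].count()

# size() counts rows, whether or not any column is null.
sizes = toy.groupby('order_status').size()

print(counts)
print(sizes)
```

When the counted column has no nulls, as with a key column, both approaches give the same numbers.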

* Computing revenue per order

In [4]:
order_items. \
    groupby('order_item_order_id')['order_item_subtotal']. \
    sum()

order_item_order_id
1         299.98
2         579.98
4         699.85
5        1129.86
7         579.92
          ...   
68879    1259.97
68880     999.77
68881     129.99
68882     109.99
68883    2149.99
Name: order_item_subtotal, Length: 57431, dtype: float64

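* Computing revenue, minimum, maximum and item count per order, with aliases for the aggregated fields

In [5]: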
order_items. \
    groupby('order_item_order_id')['order_item_subtotal']. \
    agg(['sum', 'min', 'max', 'count']). \
    rename(columns={'count': 'item_count', 'sum': 'revenue'})

| order_item_order_id | revenue | min | max | item_count |
|---|---|---|---|---|
| 1 | 299.98 | 299.98 | 299.98 | 1 |
| 2 | 579.98 | 129.99 | 250.00 | 3 |
| 4 | 699.85 | 49.98 | 299.95 | 4 |
| 5 | 1129.86 | 99.96 | 299.98 | 5 |
| 7 | 579.92 | 79.95 | 299.98 | 3 |
| ... | ... | ... | ... | ... |
| 68879 | 1259.97 | 129.99 | 999.99 | 3 |
| 68880 | 999.77 | 149.94 | 250.00 | 5 |
| 68881 | 129.99 | 129.99 | 129.99 | 1 |
| 68882 | 109.99 | 50.00 | 59.99 | 2 |
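
As an alternative to `agg` followed by `rename`, Pandas also supports named aggregation, where the alias is given as the keyword and the function as the value. This is a sketch on hypothetical toy data, assuming pandas 0.25 or later; it is not the approach used in this notebook.

```python
import pandas as pd

# Hypothetical toy data; column names mirror the order_items data set.
items = pd.DataFrame({
    'order_item_order_id': [1, 2, 2, 2],
    'order_item_subtotal': [299.98, 199.99, 250.00, 129.99],
})

# Named aggregation: keyword = output column name, value = aggregate function.
named = (
    items
    .groupby('order_item_order_id')['order_item_subtotal']
    .agg(revenue='sum', item_count='count')
)
print(named)
```

The result has the same shape as the `agg` plus `rename` version, with the aliases applied in a single step.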


* Renaming a column of the Data Frame itself

In [6]:
order_items.rename(columns={'order_item_order_id': 'order_id'})

| | order_item_id | order_id | order_item_product_id | order_item_quantity | order_item_subtotal | order_item_product_price |
|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 957 | 1 | 299.98 | 299.98 |
| 1 | 2 | 2 | 1073 | 1 | 199.99 | 199.99 |
| 2 | 3 | 2 | 502 | 5 | 250.00 | 50.00 |
| 3 | 4 | 2 | 403 | 1 | 129.99 | 129.99 |
| 4 | 5 | 4 | 897 | 2 | 49.98 | 24.99 |
| ... | ... | ... | ... | ... | ... | ... |
| 172193 | 172194 | 68881 | 403 | 1 | 129.99 | 129.99 |
| 172194 | 172195 | 68882 | 365 | 1 | 59.99 | 59.99 |
| 172195 | 172196 | 68882 | 502 | 1 | 50.00 | 50.00 |
| 172196 | 172197 | 68883 | 208 | 1 | 1999.99 | 1999.99 |
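
Note that `rename` returns a new Data Frame by default and leaves the original untouched, so the renamed column is lost unless the result is assigned to a variable. A minimal sketch with hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data; column names mirror the order_items data set.
items = pd.DataFrame({
    'order_item_order_id': [1, 2],
    'order_item_subtotal': [299.98, 199.99],
})

# rename returns a copy with the new column name; items itself is unchanged.
renamed = items.rename(columns={'order_item_order_id': 'order_id'})

print(list(items.columns))
print(list(renamed.columns))
```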


### Task 1

Get order_item_count and order_revenue for each order_id.

In [9]:
order_items. \
    groupby('order_item_order_id')['order_item_subtotal']. \
    agg(['sum', 'count']). \
    rename(columns={'sum': 'order_revenue', 'count': 'order_item_count'}). \
    reset_index()

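| | order_item_order_id | order_revenue | order_item_count |
|---|---|---|---|
| 0 | 1 | 299.98 | 1 |
| 1 | 2 | 579.98 | 3 |
| 2 | 4 | 699.85 | 4 |
| 3 | 5 | 1129.86 | 5 |
| 4 | 7 | 579.92 | 3 |
| ... | ... | ... | ... |
| 57426 | 68879 | 1259.97 | 3 |
| 57427 | 68880 | 999.77 | 5 |
| 57428 | 68881 | 129.99 | 1 |
| 57429 | 68882 | 109.99 | 2 |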


### Task 2

Get the order count by month from the orders data for a specific order_status (COMPLETE in this example).

In [14]:
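# order_date is a string such as '2013-07-25 00:00:00.0'; the first 7 characters give the month (YYYY-MM)
orders['order_month'] = orders.order_date.str.slice(0, 7)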

In [16]:
orders.query('order_status == "COMPLETE"'). \
    groupby('order_month')['order_id']. \
    count(). \
    sort_index()

order_month
2013-07     515
2013-08    1880
2013-09    1933
2013-10    1783
2013-11    2141
2013-12    1898
2014-01    1911
2014-02    1869
2014-03    1967
2014-04    1932
2014-05    1854
2014-06    1797
2014-07    1419
Name: order_id, dtype: int64
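
The month can also be derived by parsing `order_date` as a timestamp instead of slicing the string. The sketch below uses hypothetical toy data in the same date format as the output above, and `assign` to avoid modifying the original Data Frame. It is an alternative for illustration, not the approach taken in this notebook.

```python
import pandas as pd

# Hypothetical toy data in the same shape as the orders data set.
toy_orders = pd.DataFrame({
    'order_id': [1, 2, 3, 4],
    'order_date': [
        '2013-07-25 00:00:00.0',
        '2013-07-26 00:00:00.0',
        '2013-08-01 00:00:00.0',
        '2013-08-02 00:00:00.0',
    ],
    'order_status': ['COMPLETE', 'CLOSED', 'COMPLETE', 'COMPLETE'],
})

# Parse the strings as timestamps, then format them as YYYY-MM.
order_month = pd.to_datetime(toy_orders['order_date']).dt.strftime('%Y-%m')

# Filter on status, then count orders per month.
monthly = (
    toy_orders
    .assign(order_month=order_month)
    .query('order_status == "COMPLETE"')
    .groupby('order_month')['order_id']
    .count()
)
print(monthly)
```

Parsing the dates costs a little more than slicing, but it fails loudly on malformed values rather than silently producing a wrong month key.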