## Using groupby

Let us see how `itertools.groupby` can be used to perform aggregations by key.

* `itertools.groupby` groups consecutive elements of an iterable that share the same key.
* By applying aggregate functions to each group, it can handle use cases such as the following:
  * Get count by order status.
  * Get revenue for each order.
  * Get order count by month.
* The data must be pre-sorted by the grouping key. `groupby` only groups consecutive elements, so sorting ensures that all the values associated with each key are adjacent and end up in a single group.
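As a rough sketch of the third use case, the following counts orders per month. It assumes each order is a comma-separated line with `order_date` at index 1; the sample lines are hypothetical, and the helper `order_month` is an illustrative name, not part of this notebook. The same key function is used for sorting and for grouping, which is what makes each month a single contiguous run.

```python
import itertools

# Hypothetical sample records in an assumed orders layout:
# order_id,order_date,order_customer_id,order_status
orders = [
    "1,2013-07-25 00:00:00.0,11599,CLOSED",
    "2,2013-08-01 00:00:00.0,256,PENDING_PAYMENT",
    "3,2013-07-26 00:00:00.0,12111,COMPLETE",
    "4,2013-08-02 00:00:00.0,8827,CLOSED",
    "5,2013-07-27 00:00:00.0,11318,COMPLETE",
]

def order_month(order):
    """Return the yyyy-MM part of the order_date field (index 1)."""
    return order.split(',')[1][:7]

# Sort by the same key that groupby uses so each month forms one contiguous run.
orders_by_month = sorted(orders, key=order_month)
order_count_by_month = [
    (month, sum(1 for _ in month_orders))
    for month, month_orders in itertools.groupby(orders_by_month, key=order_month)
]
print(order_count_by_month)  # [('2013-07', 3), ('2013-08', 2)]
```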

In [None]:
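# The alias `iter` shadows Python's built-in iter() function; it is kept here
# because the cells below refer to `iter.groupby` and `iter.starmap`.
import itertools as iter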

In [None]:
iter.groupby?

In [None]:
l = [1, 1, 3, 2, 1, 3, 2]

In [None]:
l_grouped = iter.groupby(l)

In [None]:
list(l_grouped)
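
With unsorted input, `groupby` starts a new group whenever the key changes, so the same value can show up as a key more than once. The sketch below, which uses a plain `import itertools` instead of the `iter` alias, lists only the keys to make this visible.

```python
import itertools

l = [1, 1, 3, 2, 1, 3, 2]

# groupby starts a new group every time the key changes, so an unsorted
# list yields one group per consecutive run rather than one per distinct value.
unsorted_keys = [key for key, _ in itertools.groupby(l)]
sorted_keys = [key for key, _ in itertools.groupby(sorted(l))]

print(unsorted_keys)  # [1, 3, 2, 1, 3, 2]
print(sorted_keys)    # [1, 2, 3]
```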

In [None]:
l_sorted = sorted(l)

In [None]:
ls_grouped = iter.groupby(l_sorted)

In [None]:
list(ls_grouped)

```{note}
`ls_grouped` is an iterator, so it is exhausted once `list(ls_grouped)` has read it. The next cells rebuild it (along with `l_sorted`) before using it again.
```
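The exhaustion described in the note can be checked directly. This sketch reuses the same sample values as `l`: a second pass over `ls_grouped` finds nothing, while the plain list `l_sorted` can be grouped again at any time by building a new `groupby` object.

```python
import itertools

l_sorted = sorted([1, 1, 3, 2, 1, 3, 2])
ls_grouped = itertools.groupby(l_sorted)

first_pass = [key for key, _ in ls_grouped]
second_pass = [key for key, _ in ls_grouped]  # the iterator is already exhausted

print(first_pass)   # [1, 2, 3]
print(second_pass)  # []

# l_sorted is a list, so grouping it again only needs a fresh groupby object.
regrouped = [key for key, _ in itertools.groupby(l_sorted)]
print(regrouped)    # [1, 2, 3]
```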

In [None]:
l_sorted = sorted(l)

In [None]:
ls_grouped = iter.groupby(l_sorted)

In [None]:
list(iter.starmap(lambda key, values: (key, len(list(values))), ls_grouped))
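
`starmap` unpacks each `(key, grouper)` pair into the two lambda parameters. An equivalent comprehension, shown here as an alternative rather than the notebook's approach, can count each group with `sum(1 for _ in values)`, which avoids building an intermediate list.

```python
import itertools

l = [1, 1, 3, 2, 1, 3, 2]

# starmap unpacks each (key, grouper) pair into the lambda's two parameters.
counts_starmap = list(itertools.starmap(
    lambda key, values: (key, len(list(values))),
    itertools.groupby(sorted(l)),
))

# The same aggregation as a comprehension; sum(1 for _ in values) counts
# the group's items without materializing them in a list.
counts_comprehension = [
    (key, sum(1 for _ in values)) for key, values in itertools.groupby(sorted(l))
]

print(counts_starmap)        # [(1, 3), (2, 2), (3, 2)]
print(counts_comprehension)  # [(1, 3), (2, 2), (3, 2)]
```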

In [None]:
%run 02_preparing_data_sets.ipynb

### Task 1 - Order Count by Status

Get count by order status using the orders data set.

In [None]:
orders[:3]

In [None]:
orders_sorted = sorted(orders, key=lambda k: k.split(',')[3])

In [None]:
orders_sorted[:3]

In [None]:
orders_grouped = iter.groupby(orders_sorted, lambda order: order.split(',')[3])

In [None]:
list(orders_grouped)[:3]
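
The groupers listed by `list(orders_grouped)` cannot be read afterwards. The Python documentation notes that because the source is shared, when the `groupby()` object is advanced, the previous group is no longer visible. This small sketch on made-up integers shows the effect and the usual fix of materializing each group while iterating.

```python
import itertools

data = [1, 1, 2, 2, 2, 3]

# list() on the groupby object keeps the (key, grouper) pairs, but every
# grouper shares the single underlying iterator with the groupby object.
stale = list(itertools.groupby(data))
stale_first_values = list(stale[0][1])  # groupby has moved past this group, so nothing is left

# Materializing each group while iterating keeps its values.
fresh = [(key, list(values)) for key, values in itertools.groupby(data)]

print(stale_first_values)  # []
print(fresh)               # [(1, [1, 1]), (2, [2, 2, 2]), (3, [3])]
```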

In [None]:
orders_sorted = sorted(orders, key=lambda k: k.split(',')[3])
orders_grouped = iter.groupby(orders_sorted, lambda order: order.split(',')[3])
order_count_by_status = iter.starmap(lambda key, values: (key, len(list(values))), orders_grouped)

In [None]:
list(order_count_by_status)
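
A variant of the same aggregation, sketched on hypothetical sample lines in an assumed orders layout with `order_status` at index 3. The named helper `order_status` is illustrative; defining the key once means the sort key and the grouping key cannot drift apart.

```python
import itertools

# Hypothetical sample lines in an assumed layout:
# order_id,order_date,order_customer_id,order_status
orders = [
    "1,2013-07-25 00:00:00.0,11599,CLOSED",
    "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT",
    "3,2013-07-25 00:00:00.0,12111,COMPLETE",
    "4,2013-07-25 00:00:00.0,8827,CLOSED",
    "5,2013-07-25 00:00:00.0,11318,COMPLETE",
    "6,2013-07-25 00:00:00.0,7130,COMPLETE",
]

def order_status(order):
    """Extract order_status (index 3) from a comma-separated order record."""
    return order.split(',')[3]

# One key function shared by sorted() and groupby().
orders_sorted = sorted(orders, key=order_status)
order_count_by_status = [
    (status, sum(1 for _ in status_orders))
    for status, status_orders in itertools.groupby(orders_sorted, key=order_status)
]
print(order_count_by_status)
# [('CLOSED', 2), ('COMPLETE', 3), ('PENDING_PAYMENT', 1)]
```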

### Task 2 - Revenue per Order

Get revenue per order using the order_items data set.

In [None]:
order_items[:4]

In [None]:
order_subtotals = map(lambda oi: (int(oi.split(',')[1]), float(oi.split(',')[4])), order_items)

In [None]:
list(order_subtotals)[:3]

In [None]:
order_subtotals = map(lambda oi: (int(oi.split(',')[1]), float(oi.split(',')[4])), order_items)
order_subtotals_sorted = sorted(order_subtotals)

In [None]:
order_subtotals_grouped = iter.groupby(order_subtotals_sorted, lambda rec: rec[0])

In [None]:
list(order_subtotals_grouped)[:3]

In [None]:
order_subtotals = map(lambda oi: (int(oi.split(',')[1]), float(oi.split(',')[4])), order_items)
order_subtotals_sorted = sorted(order_subtotals)

order_subtotals_grouped = iter.groupby(order_subtotals_sorted, lambda rec: rec[0])

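# next() takes only the first (key, grouper) pair, so that group's values are still readable.
# (Calling list() on the groupby object first would leave the grouper empty.)
item = next(order_subtotals_grouped)

In [None]:
print(list(item[1])) # A list of (order_id, subtotal) tuples for one order, similar to [(2, 199.99), (2, 250.0), (2, 129.99)]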

In [None]:
i = [(2, 199.99), (2, 250.0), (2, 129.99)]

In [None]:
list(map(lambda rec: rec[1], i))

In [None]:
sum(list(map(lambda rec: rec[1], i))) # this expression becomes the body of the lambda passed to starmap
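
Instead of a nested lambda, the per-group aggregation can be written as a named function and passed to `starmap`. This is a sketch: `order_revenue_for` is a hypothetical helper, the order 2 values mirror the example list `i` (ordered as `sorted` would order the tuples), and the order 1 and order 4 values are made up.

```python
import itertools

# Hypothetical (order_id, order_item_subtotal) pairs, already sorted by order_id.
order_subtotals_sorted = [
    (1, 299.98),
    (2, 129.99),
    (2, 199.99),
    (2, 250.0),
    (4, 49.98),
    (4, 150.0),
]

def order_revenue_for(order_id, subtotals):
    """Sum the subtotal field of one group and round to 2 decimal places."""
    return order_id, round(sum(rec[1] for rec in subtotals), 2)

# starmap calls order_revenue_for(key, grouper) for every group.
order_revenue = list(itertools.starmap(
    order_revenue_for,
    itertools.groupby(order_subtotals_sorted, key=lambda rec: rec[0]),
))
print(order_revenue)  # [(1, 299.98), (2, 579.98), (4, 199.98)]
```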

In [None]:
order_subtotals = map(lambda oi: (int(oi.split(',')[1]), float(oi.split(',')[4])), order_items)
order_subtotals_sorted = sorted(order_subtotals)

order_subtotals_grouped = iter.groupby(order_subtotals_sorted, lambda rec: rec[0])

order_revenue = iter.starmap(
    lambda key, values: (key, round(sum(list(map(lambda rec: rec[1], values))), 2)), 
    order_subtotals_grouped
)

In [None]:
list(order_revenue)[:3]

```{note}
Alternative solution that avoids the first `map`: sort and group the raw `order_items` records, and parse the subtotal inside the aggregation.
```

In [None]:
order_items_sorted = sorted(order_items, key=lambda oi: int(oi.split(',')[1]))

order_items_grouped = iter.groupby(order_items_sorted, lambda oi: int(oi.split(',')[1]))
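
Converting the order id with `int` affects the order of the output rather than the grouping itself: equal ids end up adjacent either way, but string keys sort lexicographically. A quick check on hypothetical lines in an assumed order_items layout (order id at index 1):

```python
# Hypothetical order_items lines in an assumed layout:
# order_item_id,order_item_order_id,order_item_product_id,quantity,subtotal,product_price
order_items = [
    "1,10,957,1,299.98,299.98",
    "2,2,1073,1,199.99,199.99",
    "3,1,502,5,250.0,50.0",
]

# Sorting on the raw string compares order ids character by character ...
by_string = [oi.split(',')[1] for oi in sorted(order_items, key=lambda oi: oi.split(',')[1])]
# ... while converting to int sorts them numerically.
by_int = [int(oi.split(',')[1]) for oi in sorted(order_items, key=lambda oi: int(oi.split(',')[1]))]

print(by_string)  # ['1', '10', '2']
print(by_int)     # [1, 2, 10]
```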

In [None]:
order_items[1:4]

In [None]:
values = order_items[1:4]

In [None]:
list(map(lambda rec: float(rec.split(',')[4]), values))

In [None]:
sum(list(map(lambda rec: float(rec.split(',')[4]), values)))

In [None]:
order_revenue = iter.starmap(
    lambda key, values: (key, round(sum(list(map(lambda rec: float(rec.split(',')[4]), values))), 2)), 
    order_items_grouped
)

In [None]:
list(order_revenue)[:3]
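
Both solutions should produce the same revenue per order. The sketch below runs them side by side on a few sample lines in an assumed order_items layout (order id at index 1, subtotal at index 4); the helpers `order_id` and `subtotal` are illustrative names. Rounding to 2 decimal places also absorbs the tiny floating-point differences that come from summing the subtotals in a different order.

```python
import itertools

# Hypothetical order_items lines (order id at index 1, subtotal at index 4).
order_items = [
    "1,1,957,1,299.98,299.98",
    "2,2,1073,1,199.99,199.99",
    "3,2,502,5,250.0,50.0",
    "4,2,403,1,129.99,129.99",
]

def order_id(oi):
    return int(oi.split(',')[1])

def subtotal(oi):
    return float(oi.split(',')[4])

# Approach 1: map to (order_id, subtotal) tuples first, then sort and group.
pairs = sorted((order_id(oi), subtotal(oi)) for oi in order_items)
revenue_from_pairs = [
    (key, round(sum(rec[1] for rec in recs), 2))
    for key, recs in itertools.groupby(pairs, key=lambda rec: rec[0])
]

# Approach 2: sort and group the raw lines, parsing the subtotal inside each group.
revenue_from_lines = [
    (key, round(sum(subtotal(oi) for oi in lines), 2))
    for key, lines in itertools.groupby(sorted(order_items, key=order_id), key=order_id)
]

print(revenue_from_pairs)  # [(1, 299.98), (2, 579.98)]
print(revenue_from_lines)  # [(1, 299.98), (2, 579.98)]
```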