## Aggregations using reduce

Let us understand how to perform global aggregations using `reduce`.

* We can use `reduce` on an iterable to return a single aggregated result.
* It takes the aggregation logic and the iterable as arguments. We can pass the aggregation logic either as a regular function or as a lambda function.
* `reduce` returns a single object such as an `int` or a `float`. Its type is typically the same as that of the elements in the collection being processed.
* Unlike `map` and `filter`, we need to import `reduce` from `functools`.
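
As a minimal sketch of these points, the snippet below aggregates a small hand-made list. The list `numbers` and the helper `add_two` are illustrative names, not part of the data sets used later. The aggregation logic is passed once as a lambda function and once as a regular function.

```python
from functools import reduce

# Illustrative data, not taken from the retail data sets
numbers = [1, 2, 3, 4, 5]

# Aggregation logic passed as a lambda function
total_lambda = reduce(lambda tot, ele: tot + ele, numbers)


# The same aggregation logic passed as a regular function
def add_two(tot, ele):
    return tot + ele


total_function = reduce(add_two, numbers)

print(total_lambda, total_function)  # 15 15
print(type(total_lambda))            # <class 'int'>
```

The result is an `int` here because the elements being aggregated are of type `int`.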

In [1]:
%run 02_preparing_data_sets.ipynb

In [2]:
orders[:10]

['1,2013-07-25 00:00:00.0,11599,CLOSED',
 '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
 '3,2013-07-25 00:00:00.0,12111,COMPLETE',
 '4,2013-07-25 00:00:00.0,8827,CLOSED',
 '5,2013-07-25 00:00:00.0,11318,COMPLETE',
 '6,2013-07-25 00:00:00.0,7130,COMPLETE',
 '7,2013-07-25 00:00:00.0,4530,COMPLETE',
 '8,2013-07-25 00:00:00.0,2911,PROCESSING',
 '9,2013-07-25 00:00:00.0,5657,PENDING_PAYMENT',
 '10,2013-07-25 00:00:00.0,5648,PENDING_PAYMENT']

In [3]:
len(orders)

68883

In [4]:
order_items[:10]

['1,1,957,1,299.98,299.98',
 '2,2,1073,1,199.99,199.99',
 '3,2,502,5,250.0,50.0',
 '4,2,403,1,129.99,129.99',
 '5,4,897,2,49.98,24.99',
 '6,4,365,5,299.95,59.99',
 '7,4,502,3,150.0,50.0',
 '8,4,1014,4,199.92,49.98',
 '9,5,957,1,299.98,299.98',
 '10,5,365,5,299.95,59.99']

In [5]:
len(order_items)

172198

### Task 1
Use `orders` to get the total number of records for a given month (201401).
* Filter the data.
* Perform a row-level transformation by changing each record to 1.
* Use `reduce` to aggregate.

In [6]:
orders[:10]

['1,2013-07-25 00:00:00.0,11599,CLOSED',
 '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
 '3,2013-07-25 00:00:00.0,12111,COMPLETE',
 '4,2013-07-25 00:00:00.0,8827,CLOSED',
 '5,2013-07-25 00:00:00.0,11318,COMPLETE',
 '6,2013-07-25 00:00:00.0,7130,COMPLETE',
 '7,2013-07-25 00:00:00.0,4530,COMPLETE',
 '8,2013-07-25 00:00:00.0,2911,PROCESSING',
 '9,2013-07-25 00:00:00.0,5657,PENDING_PAYMENT',
 '10,2013-07-25 00:00:00.0,5648,PENDING_PAYMENT']

In [7]:
order = '1,2013-07-25 00:00:00.0,11599,CLOSED'

In [8]:
order.split(',')

['1', '2013-07-25 00:00:00.0', '11599', 'CLOSED']

In [9]:
order.split(',')[1]

'2013-07-25 00:00:00.0'

In [10]:
order.split(',')[1][:7]

'2013-07'

In [11]:
order.split(',')[1][:7].replace('-', '')

'201307'

In [12]:
int(order.split(',')[1][:7].replace('-', ''))

201307

In [8]:
orders_filtered = filter(
    lambda order: int(order.split(',')[1][:7].replace('-', '')) == 201401,
    orders
)

In [11]:
orders_mapped = map(
    lambda order: 1,
    orders_filtered
)

In [10]:
from functools import reduce
reduce?

Docstring:
reduce(function, sequence[, initial]) -> value

Apply a function of two arguments cumulatively to the items of a sequence,
from left to right, so as to reduce the sequence to a single value.
For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates
((((1+2)+3)+4)+5).  If initial is present, it is placed before the items
of the sequence in the calculation, and serves as a default when the
sequence is empty.
Type:      builtin_function_or_method

In [12]:
# Adds up the 1s, giving the number of orders placed in 201401
reduce(
    lambda tot, ele: tot + ele,
    orders_mapped
)
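
The `reduce` docstring above mentions an optional `initial` argument. As a hedged sketch, the example below uses a three-record sample in the same format as `orders` to show why it can matter: when the filter matches nothing, `reduce` without `initial` raises a `TypeError`, while passing `0` returns a count of 0. The names `sample_orders` and `count_for_month` are hypothetical.

```python
from functools import reduce

# Small sample in the same format as orders (illustrative subset)
sample_orders = [
    '1,2013-07-25 00:00:00.0,11599,CLOSED',
    '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
    '3,2013-07-25 00:00:00.0,12111,COMPLETE',
]


def count_for_month(records, month):
    """Count records whose order date falls in the given yyyyMM month."""
    matching = filter(
        lambda order: int(order.split(',')[1][:7].replace('-', '')) == month,
        records
    )
    ones = map(lambda order: 1, matching)
    # The third argument (initial) is the default when no record matches
    return reduce(lambda tot, ele: tot + ele, ones, 0)


print(count_for_month(sample_orders, 201307))  # 3
print(count_for_month(sample_orders, 201401))  # 0

# Without an initial value, reducing an empty iterable raises TypeError
try:
    reduce(lambda tot, ele: tot + ele, [])
except TypeError as error:
    print('TypeError:', error)
```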

### Task 2

Use the order items data set to compute the total revenue generated for a given product_id.
* Filter for given product_id.
* Extract order_item_subtotal for each item.
* Aggregate to get the revenue for a given product id.

In [15]:
order_items[:10]

['1,1,957,1,299.98,299.98',
 '2,2,1073,1,199.99,199.99',
 '3,2,502,5,250.0,50.0',
 '4,2,403,1,129.99,129.99',
 '5,4,897,2,49.98,24.99',
 '6,4,365,5,299.95,59.99',
 '7,4,502,3,150.0,50.0',
 '8,4,1014,4,199.92,49.98',
 '9,5,957,1,299.98,299.98',
 '10,5,365,5,299.95,59.99']

In [16]:
order_item = '1,1,957,1,299.98,299.98'

In [17]:
order_item.split(',')

['1', '1', '957', '1', '299.98', '299.98']

In [18]:
order_item.split(',')[2]

'957'

In [19]:
int(order_item.split(',')[2])

957

In [20]:
float(order_item.split(',')[4])

299.98

In [20]:
items_for_product = filter(
    lambda order_item: int(order_item.split(',')[2]) == 502,
    order_items
)

In [21]:
list(items_for_product)[:10]

['3,2,502,5,250.0,50.0',
 '7,4,502,3,150.0,50.0',
 '20,8,502,1,50.0,50.0',
 '38,12,502,5,250.0,50.0',
 '42,14,502,1,50.0,50.0',
 '43,15,502,1,50.0,50.0',
 '60,20,502,5,250.0,50.0',
 '67,21,502,2,100.0,50.0',
 '70,24,502,1,50.0,50.0',
 '71,24,502,5,250.0,50.0']

In [26]:
items_for_product = filter(
    lambda order_item: int(order_item.split(',')[2]) == 502,
    order_items
)
item_subtotals = map(
    lambda order_item: float(order_item.split(',')[4]),
    items_for_product
)

In [27]:
list(item_subtotals)[:10]

[250.0, 150.0, 50.0, 250.0, 50.0, 50.0, 250.0, 100.0, 50.0, 250.0]

In [28]:
items_for_product = filter(
    lambda order_item: int(order_item.split(',')[2]) == 502,
    order_items
)
item_subtotals = map(
    lambda order_item: float(order_item.split(',')[4]),
    items_for_product
)
reduce(
    lambda total_revenue, item_revenue: total_revenue + item_revenue,
    item_subtotals
)

3147800.0

```{note}
Instead of writing a lambda function, we can also pass existing functions to `reduce`, such as `add` from the `operator` module or the built-in `min` and `max`.
```

In [62]:
from operator import add
items_for_product = filter(
    lambda order_item: int(order_item.split(',')[2]) == 502,
    order_items
)
item_subtotals = map(
    lambda order_item: float(order_item.split(',')[4]),
    items_for_product
)
reduce(
    add,
    item_subtotals
)

3147800.0

In [65]:
items_for_product = filter(
    lambda order_item: int(order_item.split(',')[2]) == 502,
    order_items
)
item_subtotals = map(
    lambda order_item: float(order_item.split(',')[4]),
    items_for_product
)
reduce(
    min,
    item_subtotals
)

50.0
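
For simple totals and extremes, Python's built-in `sum`, `min` and `max` also accept an iterable directly, so they can serve as a cross-check on the `reduce` calls above. A minimal sketch on a hand-made list, where `sample_subtotals` is an illustrative name holding the first three subtotals shown earlier for product 502:

```python
from functools import reduce
from operator import add

# First three subtotals for product 502, copied from the earlier output
sample_subtotals = [250.0, 150.0, 50.0]

total = reduce(add, sample_subtotals)
smallest = reduce(min, sample_subtotals)
largest = reduce(max, sample_subtotals)

print(total, smallest, largest)  # 450.0 50.0 250.0

# The built-ins applied to the whole list give the same answers
print(total == sum(sample_subtotals))     # True
print(smallest == min(sample_subtotals))  # True
print(largest == max(sample_subtotals))   # True
```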

### Task 3

Use the order items data set to get the total number of items sold as well as the total revenue generated for a given product_id.

In [68]:
t1 = (1, 200.0)

In [69]:
t2 = (2, 300.0)

In [25]:
res = (0, 0.0)

In [26]:
res = (res[0] + t1[0], res[1] + t1[1])

In [27]:
res

(1, 200.0)

In [28]:
res = (res[0] + t2[0], res[1] + t2[1])

In [29]:
res

(3, 500.0)

In [29]:
items_for_product = filter(
    lambda order_item: int(order_item.split(',')[2]) == 502,
    order_items
)

In [30]:
list(items_for_product)[:10]

['3,2,502,5,250.0,50.0',
 '7,4,502,3,150.0,50.0',
 '20,8,502,1,50.0,50.0',
 '38,12,502,5,250.0,50.0',
 '42,14,502,1,50.0,50.0',
 '43,15,502,1,50.0,50.0',
 '60,20,502,5,250.0,50.0',
 '67,21,502,2,100.0,50.0',
 '70,24,502,1,50.0,50.0',
 '71,24,502,5,250.0,50.0']

In [35]:
items_for_product = filter(
    lambda order_item: int(order_item.split(',')[2]) == 502,
    order_items
)
item_details = map(
    lambda order_item: (int(order_item.split(',')[3]), float(order_item.split(',')[4])),
    items_for_product
)

In [36]:
list(item_details)[:10]

[(5, 250.0),
 (3, 150.0),
 (1, 50.0),
 (5, 250.0),
 (1, 50.0),
 (1, 50.0),
 (5, 250.0),
 (2, 100.0),
 (1, 50.0),
 (5, 250.0)]

In [37]:
items_for_product = filter(
    lambda order_item: int(order_item.split(',')[2]) == 502,
    order_items
)
item_details = map(
    lambda order_item: (int(order_item.split(',')[3]), float(order_item.split(',')[4])),
    items_for_product
)
reduce(
    lambda tot, ele: (tot[0] + ele[0], tot[1] + ele[1]),
    item_details
)

(62956, 3147800.0)
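
The same aggregation can be sketched with named functions and an explicit starting value of `(0, 0.0)`, mirroring the manual `res` walk-through above. The sample below is the first ten `order_items` records shown earlier. `to_details` and `add_details` are hypothetical helper names, and each record is split only once when extracting the quantity and subtotal.

```python
from functools import reduce

# First ten order_items records, copied from the earlier output
sample_order_items = [
    '1,1,957,1,299.98,299.98',
    '2,2,1073,1,199.99,199.99',
    '3,2,502,5,250.0,50.0',
    '4,2,403,1,129.99,129.99',
    '5,4,897,2,49.98,24.99',
    '6,4,365,5,299.95,59.99',
    '7,4,502,3,150.0,50.0',
    '8,4,1014,4,199.92,49.98',
    '9,5,957,1,299.98,299.98',
    '10,5,365,5,299.95,59.99',
]


def to_details(order_item):
    """Return (quantity, subtotal) for one order item record."""
    fields = order_item.split(',')
    return int(fields[3]), float(fields[4])


def add_details(tot, ele):
    """Add two (quantity, subtotal) tuples element by element."""
    return tot[0] + ele[0], tot[1] + ele[1]


items_for_product = filter(
    lambda order_item: int(order_item.split(',')[2]) == 502,
    sample_order_items
)
result = reduce(add_details, map(to_details, items_for_product), (0, 0.0))

print(result)  # (8, 400.0)
```

With the starting value in place, a product_id that matches no records yields `(0, 0.0)` instead of raising an error.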

### Task 4

Create a collection with sales and commission percentage. Using that collection, compute the total commission amount. If the commission percentage is None or not present, treat it as 0.
* Each element in the collection should be a tuple.
* The first element is the sales amount and the second element is the commission percentage.
* The commission for each sale can be computed by multiplying the sales amount by the commission percentage (make sure to divide the commission percentage by 100).
* Some of the records do not have a commission percentage; in that case the commission amount for that sale shall be 0.
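
The rules above can be sketched as a small function before applying them to the full collection. This is an illustrative sketch rather than the approach used in the cells that follow: `commission_amount` is a hypothetical helper, and the record `(100.0, None)` is made up to show the `None` case.

```python
def commission_amount(sale):
    """Return the commission for one (sales_amount[, commission_pct]) tuple."""
    sale_amount = sale[0]
    # Treat a missing or None commission percentage as 0
    has_pct = len(sale) == 2 and sale[1] is not None
    commission_pct = sale[1] if has_pct else 0
    return sale_amount * (commission_pct / 100)


# One regular record, one without a percentage, one with None (made up)
sample_sales = [(376.0, 8), (234.84,), (100.0, None)]
commissions = [round(commission_amount(sale), 2) for sale in sample_sales]

print(commissions)  # [30.08, 0.0, 0.0]
```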

In [38]:
transactions = [(376.0, 8),
(548.23, 14),
(107.93, 8),
(838.22, 14),
(846.85, 21),
(234.84,),
(850.2, 21),
(992.2, 21),
(267.01,),
(958.91, 21),
(412.59,),
(283.14,),
(350.01, 14),
(226.95,),
(132.7, 14)]

In [39]:
type(transactions)

list

In [40]:
transactions[:6]

[(376.0, 8), (548.23, 14), (107.93, 8), (838.22, 14), (846.85, 21), (234.84,)]

In [41]:
sale = transactions[0]

In [42]:
type(sale)

tuple

In [43]:
commission_amount = round(sale[0] * (sale[1] / 100), 2)

In [44]:
commission_amount

30.08

In [45]:
sale = (234.84,)

In [46]:
commission_amount = round(sale[0] * (sale[1] / 100), 2) # errors out

IndexError: tuple index out of range

In [47]:
len(sale)

1

In [48]:
commission_pct = sale[1] / 100 if len(sale) == 2 else 0

In [49]:
commission_pct

0

In [58]:
transactions_fixed = map(
    lambda sale: sale[0] * (sale[1] / 100 if len(sale) == 2 else 0),
    transactions
)

In [59]:
list(transactions_fixed)

[30.080000000000002,
 76.75220000000002,
 8.634400000000001,
 117.35080000000002,
 177.8385,
 0.0,
 178.542,
 208.362,
 0.0,
 201.37109999999998,
 0.0,
 0.0,
 49.001400000000004,
 0.0,
 18.578]

In [61]:
transactions_fixed = map(
    lambda sale: sale[0] * (sale[1] / 100 if len(sale) == 2 else 0),
    transactions
)
reduce(
    lambda tot, ele: round(tot + ele, 2),
    transactions_fixed
)

1066.5

```{note}
We can also pass the `map` call directly as an argument to `reduce`, without assigning it to an intermediate variable.
```

In [60]:
reduce(
    lambda tot, ele: round(tot + ele, 2),
    map(
        lambda sale: sale[0] * (sale[1] / 100 if len(sale) == 2 else 0),
        transactions
    )
)

1066.5

In [72]:
round(
    reduce(
        add,
        map(
            lambda sale: sale[0] * (sale[1] / 100 if len(sale) == 2 else 0),
            transactions
        )
    ), 2
)

1066.51
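
The two results above differ (1066.5 versus 1066.51) because the earlier cells round the running total at every step, whereas the last cell rounds only once at the end. The sketch below reproduces both on the same `transactions` data. The names `commissions`, `rounded_each_step` and `rounded_once` are illustrative.

```python
from functools import reduce
from operator import add

transactions = [
    (376.0, 8), (548.23, 14), (107.93, 8), (838.22, 14), (846.85, 21),
    (234.84,), (850.2, 21), (992.2, 21), (267.01,), (958.91, 21),
    (412.59,), (283.14,), (350.01, 14), (226.95,), (132.7, 14),
]

# Commission per sale; a missing percentage is treated as 0
commissions = [
    sale[0] * (sale[1] / 100 if len(sale) == 2 else 0)
    for sale in transactions
]

# Rounding inside the lambda discards a little precision at every step
rounded_each_step = reduce(lambda tot, ele: round(tot + ele, 2), commissions)

# Rounding once at the end keeps full precision until the final result
rounded_once = round(reduce(add, commissions), 2)

print(rounded_each_step)  # 1066.5
print(rounded_once)       # 1066.51
```

Rounding once at the end is usually the safer choice, since intermediate rounding errors can accumulate across many records.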