# Overview of itertools:
Let us go through `itertools`, one of the important libraries for manipulating collections.
* Functions such as `filter` and `map` are built into core Python.
* `reduce` is part of `functools`.
* These functions alone are not enough for advanced operations such as grouped aggregations or joins.
* Python has several higher-level libraries that help with advanced aggregations such as grouped aggregations and joins. `itertools` is one of the popular libraries for manipulating collections.
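For contrast, here is a minimal sketch of the three core functions on a small list. Each one transforms, filters, or folds the whole collection, but none of them groups elements by key.

```python
from functools import reduce

ns = [1, 2, 3, 4]

# map: apply a function to every element
squares = list(map(lambda n: n * n, ns))        # [1, 4, 9, 16]

# filter: keep only elements for which the predicate is True
evens = list(filter(lambda n: n % 2 == 0, ns))  # [2, 4]

# reduce: fold the collection into a single value
total = reduce(lambda acc, n: acc + n, ns, 0)   # 10
```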

In [2]:
import itertools

In [3]:
itertools?

[0;31mType:[0m        module
[0;31mString form:[0m <module 'itertools' (built-in)>
[0;31mDocstring:[0m  
Functional tools for creating and using iterators.

Infinite iterators:
count(start=0, step=1) --> start, start+step, start+2*step, ...
cycle(p) --> p0, p1, ... plast, p0, p1, ...
repeat(elem [,n]) --> elem, elem, elem, ... endlessly or up to n times

Iterators terminating on the shortest input sequence:
accumulate(p[, func]) --> p0, p0+p1, p0+p1+p2
chain(p, q, ...) --> p0, p1, ... plast, q0, q1, ... 
chain.from_iterable([p, q, ...]) --> p0, p1, ... plast, q0, q1, ... 
compress(data, selectors) --> (d[0] if s[0]), (d[1] if s[1]), ...
dropwhile(pred, seq) --> seq[n], seq[n+1], starting when pred fails
groupby(iterable[, keyfunc]) --> sub-iterators grouped by value of keyfunc(v)
filterfalse(pred, seq) --> elements of seq where pred(elem) is False
islice(seq, [start,] stop [, step]) --> elements from
       seq[start:stop:step]
starmap(fun, seq) --> fun(*seq[0]), fun(*seq[1]), ..
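The infinite iterators never stop on their own, so in practice they are bounded with `islice` (or by a loop that breaks). The sketch below exercises a few of the functions listed above on small inputs; it imports `itertools` under its full name so it runs on its own.

```python
import itertools

# count() never stops, so islice takes only the first 5 values
first_five = list(itertools.islice(itertools.count(10, 5), 5))   # [10, 15, 20, 25, 30]

# cycle() repeats its input forever; islice bounds it as well
cycled = list(itertools.islice(itertools.cycle('AB'), 5))        # ['A', 'B', 'A', 'B', 'A']

# repeat() with n stops by itself after n values
repeated = list(itertools.repeat('x', 3))                        # ['x', 'x', 'x']

# compress keeps the data items whose selector is truthy
compressed = list(itertools.compress('ABCDEF', [1, 0, 1, 0, 1, 1]))  # ['A', 'C', 'E', 'F']

# dropwhile skips elements until the predicate first fails, then yields the rest
dropped = list(itertools.dropwhile(lambda x: x < 5, [1, 4, 6, 4, 1]))  # [6, 4, 1]

# filterfalse keeps the elements for which the predicate is False
odds = list(itertools.filterfalse(lambda x: x % 2 == 0, range(10)))    # [1, 3, 5, 7, 9]
```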

# Task 1:
Get cumulative sales from a list of transactions.

In [4]:
ns = [1, 2, 3, 4]

In [5]:
import itertools as iter  # note: this alias shadows the built-in iter(); a name such as `it` avoids that

In [6]:
iter.accumulate?

[0;31mInit signature:[0m [0miter[0m[0;34m.[0m[0maccumulate[0m[0;34m([0m[0mself[0m[0;34m,[0m [0;34m/[0m[0;34m,[0m [0;34m*[0m[0margs[0m[0;34m,[0m [0;34m**[0m[0mkwargs[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0;31mDocstring:[0m     
accumulate(iterable[, func]) --> accumulate object

Return series of accumulated sums (or other binary function results).
[0;31mType:[0m           type
[0;31mSubclasses:[0m     


In [7]:
list(iter.accumulate(ns))[:10]

[1, 3, 6, 10]

In [8]:
import operator as o
list(iter.accumulate(ns, o.mul))[:10]

[1, 2, 6, 24]
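`accumulate` is not limited to sums and products: any two-argument function works. A minimal sketch using `max` for a running maximum, plus the `initial` keyword argument (available from Python 3.8), which seeds the accumulation with a starting value:

```python
import itertools
import operator

ns = [1, 2, 3, 4]

# Any two-argument function works; max yields a running maximum
running_max = list(itertools.accumulate([3, 1, 4, 1, 5, 2], max))  # [3, 3, 4, 4, 5, 5]

# initial= (Python 3.8+) seeds the result, so the output is one element longer
with_initial = list(itertools.accumulate(ns, operator.add, initial=100))  # [100, 101, 103, 106, 110]
```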

In [9]:
iter.chain?

[0;31mInit signature:[0m [0miter[0m[0;34m.[0m[0mchain[0m[0;34m([0m[0mself[0m[0;34m,[0m [0;34m/[0m[0;34m,[0m [0;34m*[0m[0margs[0m[0;34m,[0m [0;34m**[0m[0mkwargs[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0;31mDocstring:[0m     
chain(*iterables) --> chain object

Return a chain object whose .__next__() method returns elements from the
first iterable until it is exhausted, then elements from the next
iterable, until all of the iterables are exhausted.
[0;31mType:[0m           type
[0;31mSubclasses:[0m     


In [10]:
l1 = [1, 2, 3, 4]
l2 = [4, 5, 6]

list(iter.chain(l1, l2))

[1, 2, 3, 4, 4, 5, 6]
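When the lists to concatenate are themselves held in a collection, `chain.from_iterable` takes that single outer iterable instead of separate arguments. A small sketch with an illustrative list of batches:

```python
import itertools

batches = [[1, 2], [3], [4, 5, 6]]

# chain.from_iterable takes one iterable of iterables,
# useful when the number of lists is not known in advance
flat = list(itertools.chain.from_iterable(batches))  # [1, 2, 3, 4, 5, 6]

# Same result as unpacking the outer list into chain(*batches)
flat_unpacked = list(itertools.chain(*batches))      # [1, 2, 3, 4, 5, 6]
```

`from_iterable` also reads the outer iterable lazily, so it works with a generator of lists without unpacking it up front.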

In [11]:
help(iter.starmap)

Help on class starmap in module itertools:

class starmap(builtins.object)
 |  starmap(function, sequence) --> starmap object
 |  
 |  Return an iterator whose values are returned from the function evaluated
 |  with an argument tuple taken from the given sequence.
 |  
 |  Methods defined here:
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object.  See help(type) for accurate signature.
 |  
 |  __next__(self, /)
 |      Implement next(self).
 |  
 |  __reduce__(...)
 |      Return state information for pickling.
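The key difference from `map` is that `starmap` unpacks each element into separate positional arguments. A small sketch with the built-in `pow`:

```python
import itertools

pairs = [(2, 5), (3, 2), (10, 3)]

# starmap unpacks each tuple: pow(2, 5), pow(3, 2), pow(10, 3)
powers = list(itertools.starmap(pow, pairs))                  # [32, 9, 1000]

# map passes each tuple as a single argument, so the lambda must index into it
powers_via_map = list(map(lambda p: pow(p[0], p[1]), pairs))  # [32, 9, 1000]
```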



# Task - Compute Commission Amount:
Create a collection with sales amounts and commission percentages. Using that collection, compute the total commission amount. If the commission percentage is None or not present, treat it as 0.
* Each element in the collection should be a tuple.
* First element is the sales amount and second element is commission percentage.
* Commission for each sale can be computed by multiplying commission percentage with sales (make sure to divide commission percentage by 100).
* Some records do not have a commission percentage; in that case the commission amount for that sale is 0.

In [12]:
transactions = [(376.0, 8),
(548.23, 14),
(107.93, 8),
(838.22, 14),
(846.85, 21),
(234.84, None),
(850.2, 21),
(992.2, 21),
(267.01, None),
(958.91, 21),
(412.59, None),
(283.14, None),
(350.01, 14),
(226.95, None),
(132.7, 14)]

In [16]:
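# starmap unpacks each (sale_amount, comm_pct) tuple into two arguments,
# so the lambda must take two parameters; nothing is computed until the result is iterated
iter.starmap(lambda sale_amount, comm_pct: sale_amount * (comm_pct/100) if comm_pct else 0, transactions)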

<itertools.starmap at 0x7fa1518925c0>

In [19]:
a = iter.starmap(lambda sale_amount, comm_pct: round(sale_amount * (comm_pct/100), 2) if comm_pct else 0, transactions)


In [20]:
for i in a: print(i)

30.08
76.75
8.63
117.35
177.84
0
178.54
208.36
0
201.37
0
0
49.0
0
18.58
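The task asks for the total commission, which the per-sale values above do not show yet. One way is to pass the `starmap` results straight to `sum`. The sketch below repeats the `transactions` data so it runs on its own, and uses a named helper `commission` (introduced here for readability) in place of the lambda. It rounds only the final total; adding up the per-sale values already rounded to 2 decimals, as printed above, gives 1066.50 instead.

```python
import itertools

transactions = [(376.0, 8), (548.23, 14), (107.93, 8), (838.22, 14),
                (846.85, 21), (234.84, None), (850.2, 21), (992.2, 21),
                (267.01, None), (958.91, 21), (412.59, None), (283.14, None),
                (350.01, 14), (226.95, None), (132.7, 14)]

def commission(sale_amount, comm_pct):
    # A missing (None) commission percentage means no commission
    return sale_amount * (comm_pct / 100) if comm_pct else 0

# sum consumes the lazy starmap iterator directly; round once at the end
total_commission = round(sum(itertools.starmap(commission, transactions)), 2)
print(total_commission)  # 1066.51
```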


# Using groupby:
Let us understand how we can use `itertools.groupby` to perform aggregations by key.
* `itertools.groupby` can be used to get the data grouped by a key.
* Applying an aggregate function to each group covers use cases such as the following:
    * Get count by order status.
    * Get revenue for each order.
    * Get order count by month.
* `groupby` only groups *consecutive* elements that share a key, so the data must first be sorted by that same key; otherwise one key can show up in several separate groups.
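A minimal sketch of the sort-then-group pattern on an illustrative list of words, grouped by length. Defining the key function once and passing it to both `sorted` and `groupby` keeps the two consistent; `sum(1 for _ in g)` counts a group without building a list.

```python
import itertools

# Illustrative data: words to be grouped by their length
words = ['apple', 'kiwi', 'fig', 'banana', 'plum', 'pear', 'cherry']

def word_length(word):
    return len(word)

# Sort and group with the same key so equal keys end up adjacent
words_sorted = sorted(words, key=word_length)

grouped = [(k, list(g)) for k, g in itertools.groupby(words_sorted, key=word_length)]
# [(3, ['fig']), (4, ['kiwi', 'plum', 'pear']), (5, ['apple']), (6, ['banana', 'cherry'])]

# Count each group without building an intermediate list
counts = [(k, sum(1 for _ in g)) for k, g in itertools.groupby(words_sorted, key=word_length)]
# [(3, 1), (4, 3), (5, 1), (6, 2)]
```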

In [22]:
import itertools as iter

In [23]:
iter.groupby?

[0;31mInit signature:[0m [0miter[0m[0;34m.[0m[0mgroupby[0m[0;34m([0m[0mself[0m[0;34m,[0m [0;34m/[0m[0;34m,[0m [0;34m*[0m[0margs[0m[0;34m,[0m [0;34m**[0m[0mkwargs[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0;31mDocstring:[0m     
groupby(iterable, key=None) -> make an iterator that returns consecutive
keys and groups from the iterable.  If the key function is not specified or
is None, the element itself is used for grouping.
[0;31mType:[0m           type
[0;31mSubclasses:[0m     


In [24]:
l = [1, 1, 3, 2, 1, 3, 2]

In [26]:
l_grouped = iter.groupby(l)

In [27]:
list(l_grouped)

[(1, <itertools._grouper at 0x7fa150ec60f0>),
 (3, <itertools._grouper at 0x7fa150ec6828>),
 (2, <itertools._grouper at 0x7fa151406208>),
 (1, <itertools._grouper at 0x7fa151406358>),
 (3, <itertools._grouper at 0x7fa151406320>),
 (2, <itertools._grouper at 0x7fa151406400>)]

In [28]:
l_sorted = sorted(l)

In [29]:
ls_grouped = iter.groupby(l_sorted)

In [30]:
list(iter.starmap(lambda key, values: (key, len(list(values))), ls_grouped))


[(1, 3), (2, 2), (3, 2)]

In [31]:
%run Data_Set_items.ipynb

# Task 1 - Order Count by Status:
Get the order count by status using the orders data set.

In [32]:
orders[:3]


['1,2013-07-25 00:00:00.0,11599,CLOSED',
 '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
 '3,2013-07-25 00:00:00.0,12111,COMPLETE']

In [33]:
orders_sorted = sorted(orders, key=lambda k: k.split(',')[3])

In [34]:
orders_sorted[:3]

['50,2013-07-25 00:00:00.0,5225,CANCELED',
 '112,2013-07-26 00:00:00.0,5375,CANCELED',
 '527,2013-07-28 00:00:00.0,5426,CANCELED']

In [35]:
orders_grouped = iter.groupby(orders_sorted, lambda order: order.split(',')[3])

In [36]:
list(orders_grouped)[:3]

[('CANCELED', <itertools._grouper at 0x7fa151892358>),
 ('CLOSED', <itertools._grouper at 0x7fa151892438>),
 ('COMPLETE', <itertools._grouper at 0x7fa150b81eb8>)]

In [37]:
orders_sorted = sorted(orders, key=lambda k: k.split(',')[3])
orders_grouped = iter.groupby(orders_sorted, lambda order: order.split(',')[3])
order_count_by_status = iter.starmap(lambda key, values: (key, len(list(values))), orders_grouped)

In [38]:
list(order_count_by_status)

[('CANCELED', 1428),
 ('CLOSED', 7556),
 ('COMPLETE', 22899),
 ('ON_HOLD', 3798),
 ('PAYMENT_REVIEW', 729),
 ('PENDING', 7610),
 ('PENDING_PAYMENT', 15030),
 ('PROCESSING', 8275),
 ('SUSPECTED_FRAUD', 1558)]
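The same pattern covers the "order count by month" use case listed earlier; only the key function changes. A sketch, assuming the comma-separated `orders` format shown above, where the second field is a timestamp such as `2013-07-25 00:00:00.0`. The helper name `order_count_by_month` is hypothetical, and the sample rows are the ones displayed earlier in this notebook.

```python
import itertools

def order_count_by_month(orders):
    # Key: the 'YYYY-MM' prefix of the order_date field (index 1)
    def month(order):
        return order.split(',')[1][:7]

    orders_sorted = sorted(orders, key=month)
    return [(key, sum(1 for _ in values))
            for key, values in itertools.groupby(orders_sorted, key=month)]

sample_orders = [
    '1,2013-07-25 00:00:00.0,11599,CLOSED',
    '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
    '3,2013-07-25 00:00:00.0,12111,COMPLETE',
    '50,2013-07-25 00:00:00.0,5225,CANCELED',
    '112,2013-07-26 00:00:00.0,5375,CANCELED',
    '527,2013-07-28 00:00:00.0,5426,CANCELED',
]
print(order_count_by_month(sample_orders))  # [('2013-07', 6)]
```

On the full data set the call would be `order_count_by_month(orders)`.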

# Task 2 - Revenue per Order:
Get the revenue per order using the order_items data set.

In [39]:
order_items[:4]

['1,1,957,1,299.98,299.98',
 '2,2,1073,1,199.99,199.99',
 '3,2,502,5,250.0,50.0',
 '4,2,403,1,129.99,129.99']

In [40]:
order_subtotals = map(lambda oi: (int(oi.split(',')[1]), float(oi.split(',')[4])), order_items)

In [41]:
list(order_subtotals)[:3]

[(1, 299.98), (2, 199.99), (2, 250.0)]

In [42]:
order_subtotals = map(lambda oi: (int(oi.split(',')[1]), float(oi.split(',')[4])), order_items)
order_subtotals_sorted = sorted(order_subtotals)

In [43]:
order_subtotals_grouped = iter.groupby(order_subtotals_sorted, lambda rec: rec[0])

In [44]:
list(order_subtotals_grouped)[:3]

[(1, <itertools._grouper at 0x7fa151406f28>),
 (2, <itertools._grouper at 0x7fa15140b1d0>),
 (4, <itertools._grouper at 0x7fa15140b208>)]

In [45]:
order_subtotals = map(lambda oi: (int(oi.split(',')[1]), float(oi.split(',')[4])), order_items)
order_subtotals_sorted = sorted(order_subtotals)

order_subtotals_grouped = iter.groupby(order_subtotals_sorted, lambda rec: rec[0])

item = list(order_subtotals_grouped)[0]

In [46]:
print(item[1]) # item[0] is the key (order id 1); item[1] is a lazy _grouper, already exhausted because list() advanced groupby past it

<itertools._grouper object at 0x7fa143a011d0>
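The `_grouper` objects share the underlying iterator with `groupby`, so once `groupby` moves to the next key, the previous group is no longer visible. Calling `list()` on the `groupby` object runs it to the end, which leaves every stored group empty. A small sketch using the first four `(order_id, subtotal)` pairs from the `order_items` sample above; the fix is to consume each group while iterating:

```python
import itertools

# First four (order_id, subtotal) pairs from the order_items sample
pairs = [(1, 299.98), (2, 199.99), (2, 250.0), (2, 129.99)]

# list() runs groupby to the end, so every stored _grouper is already stale
stale = list(itertools.groupby(pairs, key=lambda rec: rec[0]))
print(stale[0][0], list(stale[0][1]))  # 1 []

# Consuming each group as it is produced keeps its contents
groups = [(k, list(g)) for k, g in itertools.groupby(pairs, key=lambda rec: rec[0])]
print(groups[1])  # (2, [(2, 199.99), (2, 250.0), (2, 129.99)])
```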


In [47]:
order_subtotals = map(lambda oi: (int(oi.split(',')[1]), float(oi.split(',')[4])), order_items)
order_subtotals_sorted = sorted(order_subtotals)

order_subtotals_grouped = iter.groupby(order_subtotals_sorted, lambda rec: rec[0])

order_revenue = iter.starmap(
    lambda key, values: (key, round(sum(list(map(lambda rec: rec[1], values))), 2)), 
    order_subtotals_grouped
)

In [48]:
list(order_revenue)[:3]

[(1, 299.98), (2, 579.98), (4, 699.85)]
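`groupby` needs the data sorted first. When it is not, a dictionary keyed by order id can accumulate revenue in a single pass with no sort. This is a swapped-in technique from `collections`, not part of `itertools`. A sketch on the four sample `order_items` rows shown above:

```python
from collections import defaultdict

sample_order_items = [
    '1,1,957,1,299.98,299.98',
    '2,2,1073,1,199.99,199.99',
    '3,2,502,5,250.0,50.0',
    '4,2,403,1,129.99,129.99',
]

# Accumulate each order's subtotals directly; missing keys start at 0.0
revenue = defaultdict(float)
for oi in sample_order_items:
    fields = oi.split(',')
    revenue[int(fields[1])] += float(fields[4])

# Sort only the aggregated result, rounding for display
order_revenue = sorted((order_id, round(total, 2)) for order_id, total in revenue.items())
print(order_revenue)  # [(1, 299.98), (2, 579.98)]
```

The trade-off: the dictionary holds one running total per key in memory, while `groupby` on pre-sorted data only ever holds the current group.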

In [50]:
order_items_sorted = sorted(order_items, key=lambda oi: int(oi.split(',')[1]))

order_items_grouped = iter.groupby(order_items_sorted, lambda oi: int(oi.split(',')[1]))

In [51]:
order_items[1:4]

['2,2,1073,1,199.99,199.99', '3,2,502,5,250.0,50.0', '4,2,403,1,129.99,129.99']

In [52]:
values = order_items[1:4]

In [53]:
list(map(lambda rec: float(rec.split(',')[4]), values))


[199.99, 250.0, 129.99]

In [54]:
order_revenue = iter.starmap(
    lambda key, values: (key, round(sum(list(map(lambda rec: float(rec.split(',')[4]), values))), 2)), 
    order_items_grouped
)

In [55]:
list(order_revenue)[:3]

[(1, 299.98), (2, 579.98), (4, 699.85)]
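Both revenue versions, like the count by status, follow the same three steps: sort by key, group, and aggregate each group's values. A hypothetical helper `group_and_aggregate` (not part of `itertools`) can capture that shape; the sketch below applies it to sample rows shown earlier in this notebook.

```python
import itertools

def group_and_aggregate(records, key, value, agg):
    """Hypothetical helper: sort by key, group consecutive records,
    and apply agg to the values extracted from each group."""
    records_sorted = sorted(records, key=key)
    return [(k, agg(value(rec) for rec in group))
            for k, group in itertools.groupby(records_sorted, key=key)]

sample_order_items = [
    '1,1,957,1,299.98,299.98',
    '2,2,1073,1,199.99,199.99',
    '3,2,502,5,250.0,50.0',
    '4,2,403,1,129.99,129.99',
]

# Revenue per order: key = order id (field 1), value = subtotal (field 4)
revenue = group_and_aggregate(
    sample_order_items,
    key=lambda oi: int(oi.split(',')[1]),
    value=lambda oi: float(oi.split(',')[4]),
    agg=lambda vals: round(sum(vals), 2),
)
print(revenue)  # [(1, 299.98), (2, 579.98)]

sample_orders = [
    '1,2013-07-25 00:00:00.0,11599,CLOSED',
    '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
    '3,2013-07-25 00:00:00.0,12111,COMPLETE',
]

# Count by status: every record contributes 1
status_counts = group_and_aggregate(
    sample_orders,
    key=lambda o: o.split(',')[3],
    value=lambda o: 1,
    agg=sum,
)
print(status_counts)  # [('CLOSED', 1), ('COMPLETE', 1), ('PENDING_PAYMENT', 1)]
```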