<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Itertools" data-toc-modified-id="Itertools-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Itertools</a></span><ul class="toc-item"><li><span><a href="#Accumulate--(vs.--functools.reduce)" data-toc-modified-id="Accumulate--(vs.--functools.reduce)-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Accumulate  (vs.  functools.reduce)</a></span></li><li><span><a href="#Compress" data-toc-modified-id="Compress-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Compress</a></span></li><li><span><a href="#Cycle" data-toc-modified-id="Cycle-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Cycle</a></span></li></ul></li><li><span><a href="#more_itertools" data-toc-modified-id="more_itertools-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>more_itertools</a></span><ul class="toc-item"><li><span><a href="#Divide" data-toc-modified-id="Divide-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Divide</a></span></li><li><span><a href="#Partition" data-toc-modified-id="Partition-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Partition</a></span></li><li><span><a href="#Split_at" data-toc-modified-id="Split_at-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>Split_at</a></span></li><li><span><a href="#Bucket" data-toc-modified-id="Bucket-2.4"><span class="toc-item-num">2.4&nbsp;&nbsp;</span>Bucket</a></span></li><li><span><a href="#Consecutive_groups" data-toc-modified-id="Consecutive_groups-2.5"><span class="toc-item-num">2.5&nbsp;&nbsp;</span>Consecutive_groups</a></span></li><li><span><a href="#Collapse" data-toc-modified-id="Collapse-2.6"><span class="toc-item-num">2.6&nbsp;&nbsp;</span>Collapse</a></span></li><li><span><a href="#MAP_REDUCE" data-toc-modified-id="MAP_REDUCE-2.7"><span class="toc-item-num">2.7&nbsp;&nbsp;</span>MAP_REDUCE</a></span></li></ul></li></ul></div>

In [21]:
import datetime as dt
import itertools
import more_itertools
import sys
import time
from pathlib import Path
from pprint import pprint

import numpy as np
import pandas as pd

In [2]:
print(sys.executable)
print(sys.version)

C:\Users\r2d4\miniconda3\envs\py3\python.exe
3.8.3 (default, May 19 2020, 06:50:17) [MSC v.1916 64 bit (AMD64)]


## Itertools

### Accumulate  (vs.  functools.reduce)

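Applies a binary function (e.g. `max` or `operator.mul`) cumulatively to the items of an iterable and returns an iterator over all intermediate results. With `operator.mul`, the running results over `1, 2, 3, ...` are the factorials.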

In [3]:
data = [3, 4, 1, 3, 5, 6, 9, 0, 1]

# Calculate the running max
result = itertools.accumulate(data, max)  

list(result)  # accumulate returns a lazy iterator; list() materializes it

[3, 4, 4, 4, 5, 6, 9, 9, 9]

In [4]:
# Some more examples using operator (addition is default, if no function is passed)

import operator

print(list(itertools.accumulate(data, operator.mul)))
print(list(itertools.accumulate(data, operator.sub, initial=1000)))

[3, 12, 12, 36, 180, 1080, 9720, 0, 0]
[1000, 997, 993, 992, 989, 984, 978, 969, 969, 968]


Attention: If you don’t care about intermediate results, use `functools.reduce`, which returns only the final value, so no list of intermediate results has to be built:

In [5]:
from functools import reduce

result = reduce(max, data)

result

9
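
To make the relationship concrete, here is a small sketch (it uses `operator.add`, which the cells above do not) showing that the last value produced by `accumulate` is the value that `reduce` returns:

```python
import operator
from functools import reduce
from itertools import accumulate

data = [3, 4, 1, 3, 5, 6, 9, 0, 1]

# accumulate yields every running total, reduce only the final one
running_totals = list(accumulate(data, operator.add))
total = reduce(operator.add, data)

print(running_totals)  # [3, 7, 8, 11, 16, 22, 31, 31, 32]
print(total)           # 32
print(running_totals[-1] == total)  # True
```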

<img src="diff_between_red_acc.jpg" width="50%" />

### Compress

One of many ways to filter a sequence: `compress` takes an _iterable_ and an iterable of _boolean selectors_ and returns an iterator over the items whose corresponding selector is truthy.

(see the more advanced `more_itertools.partition()` if you want to pass a callable.)

In [6]:
dates = [
    "2020-01-01",
    "2020-02-04",
    "2020-02-01",
    "2020-01-24",
    "2020-01-08",
    "2020-02-10",
    "2020-02-15",
    "2020-02-11",
]

counts = [1, 4, 3, 8, 0, 7, 9, 2]

bools = [n > 3 for n in counts]
result = itertools.compress(dates, bools)

list(result)  # compress returns a lazy iterator; list() materializes it

['2020-02-04', '2020-01-24', '2020-02-10', '2020-02-15']
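
As a rough mental model, `compress` behaves like a `zip` over data and selectors followed by a filter. The sketch below (with made-up `letters` and `selectors`) compares the two and shows that the result stops at the shorter input:

```python
from itertools import compress

letters = "ABCDEF"
selectors = [1, 0, 1, 0, 1, 1]  # any truthy/falsy values work

picked = list(compress(letters, selectors))

# Roughly equivalent comprehension: keep d when its selector s is truthy
same = [d for d, s in zip(letters, selectors) if s]

# With fewer selectors than items, the remaining items are dropped
truncated = list(compress(letters, [1, 1]))

print(picked)     # ['A', 'C', 'E', 'F']
print(same)       # ['A', 'C', 'E', 'F']
print(truncated)  # ['A', 'B']
```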

### Cycle

`cycle` takes an iterable and returns an iterator that repeats its items endlessly.

In [7]:
players = ["Raph", "Esra"]

next_player = itertools.cycle(players).__next__

for i in range(3):
    print(next_player())

Raph
Esra
Raph


In [8]:
# # Example: Infinite Spinner
# for c in itertools.cycle(r"/-\|"):  # raw string avoids the invalid "\|" escape
#     print(c, end="\r")
#     time.sleep(0.2)
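
Because a cycle never ends, it usually has to be bounded by something else. A minimal sketch of two common ways to do that, `islice` and `zip` (the `tasks` list is invented for illustration):

```python
from itertools import cycle, islice

players = ["Raph", "Esra"]

# Take a fixed number of items from the infinite cycle
first_five = list(islice(cycle(players), 5))

# zip stops at the shorter input, so the cycle is cut off automatically
tasks = ["wash", "dry", "fold"]  # hypothetical example data
assignments = list(zip(tasks, cycle(players)))

print(first_five)   # ['Raph', 'Esra', 'Raph', 'Esra', 'Raph']
print(assignments)  # [('wash', 'Raph'), ('dry', 'Esra'), ('fold', 'Raph')]
```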

## more_itertools

### Divide

Divides the elements of *iterable* into *n* parts, maintaining order. If the length of *iterable* is not evenly divisible by *n*, the parts will not all have the same length (see the example below).

Note: This function will exhaust the iterable before returning and may require significant storage. If order is not important, see `distribute`, which does not first pull the iterable into memory.

In [9]:
data = ["first", "second", "third", "fourth", "fifth", "sixth", "seventh"]

[list(part) for part in more_itertools.divide(3, data)]

[['first', 'second', 'third'], ['fourth', 'fifth'], ['sixth', 'seventh']]
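
To illustrate the difference between keeping order and giving it up, here is a standard-library-only sketch. `divide_sketch` and `round_robin_sketch` are hypothetical helpers written for this note, not the `more_itertools` implementations. Both are eager, whereas the note above says `distribute` avoids pulling the iterable into memory.

```python
def divide_sketch(n, iterable):
    """Split into n contiguous parts; earlier parts get the extra items."""
    items = list(iterable)  # exhausts the input first
    quotient, remainder = divmod(len(items), n)
    parts, start = [], 0
    for index in range(n):
        size = quotient + (1 if index < remainder else 0)
        parts.append(items[start:start + size])
        start += size
    return parts


def round_robin_sketch(n, iterable):
    """Deal the items out in turn, so neighbours land in different parts."""
    items = list(iterable)  # eager only to keep the sketch short
    return [items[index::n] for index in range(n)]


data = ["first", "second", "third", "fourth", "fifth", "sixth", "seventh"]

contiguous = divide_sketch(3, data)
dealt = round_robin_sketch(3, data)

print(contiguous)
# [['first', 'second', 'third'], ['fourth', 'fifth'], ['sixth', 'seventh']]
print(dealt)
# [['first', 'fourth', 'seventh'], ['second', 'fifth'], ['third', 'sixth']]
```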

### Partition

Like `itertools.compress`, `partition` splits an iterable by a boolean criterion, but the criterion is a callable (a predicate) instead of a list of selectors, and both halves are kept.

Returns a 2-tuple of iterables derived from the input iterable.
The first yields the items for which `pred(item)` is false.
The second yields the items for which `pred(item)` is true.

In [10]:
# Example: Split based on file extension

files = [
    "foo.jpg",
    "bar.exe",
    "baz.gif",
    "text.txt",
    "data.bin",
]

# Define the predicate (it must take exactly one argument)
ALLOWED_EXTENSIONS = ('jpg', 'jpeg', 'gif', 'bmp', 'png')

def is_allowed(x):
    # Path.suffix also copes with names that have several dots or no extension
    return Path(x).suffix.lstrip(".").lower() in ALLOWED_EXTENSIONS

forbidden, allowed = more_itertools.partition(is_allowed, files)

print("Allowed:", list(allowed))
print("Forbidden:", list(forbidden))

Allowed: ['foo.jpg', 'baz.gif']
Forbidden: ['bar.exe', 'text.txt', 'data.bin']


In [11]:
# Example: split dates based on recency

dates = [ 
    dt.datetime(2015, 9, 15),
    dt.datetime(2020, 9, 16),
    dt.datetime(2020, 9, 17),
    dt.datetime(2019, 9, 1),
    dt.datetime(2020, 9, 2),
]

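def is_recent(x):
    # True for dates less than 30 days before now, so the output depends on the run date
    return dt.datetime.now() - x < dt.timedelta(days=30)

# partition returns the False items first, then the True items
old, recent = more_itertools.partition(is_recent, dates)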

print(list(old))
print(list(recent))

[datetime.datetime(2015, 9, 15, 0, 0), datetime.datetime(2019, 9, 1, 0, 0), datetime.datetime(2020, 9, 2, 0, 0)]
[datetime.datetime(2020, 9, 16, 0, 0), datetime.datetime(2020, 9, 17, 0, 0)]
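
The false-first, true-second order is easy to mix up. The following sketch rebuilds the behaviour from standard-library parts (`tee`, `filterfalse`, `filter`), following the `partition` recipe in the `itertools` documentation. `partition_sketch` is a name chosen here, and the real `more_itertools.partition` may differ in detail.

```python
from itertools import filterfalse, tee


def partition_sketch(pred, iterable):
    """Return (items where pred is false, items where pred is true)."""
    false_side, true_side = tee(iterable)  # two independent iterators
    return filterfalse(pred, false_side), filter(pred, true_side)


def is_odd(n):
    return n % 2 == 1


evens, odds = partition_sketch(is_odd, range(10))
evens, odds = list(evens), list(odds)

print(evens)  # [0, 2, 4, 6, 8]
print(odds)   # [1, 3, 5, 7, 9]
```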


### Split_at

Splits an iterable into lists at every item for which the predicate returns true. The matching items act as delimiters and are dropped.

This works like `str.split`, but on an arbitrary iterable instead of a string and with a callable instead of a delimiter.

In [12]:
import more_itertools

list(more_itertools.split_at(range(10), lambda n: n % 2 == 1))

[[0], [2], [4], [6], [8], []]
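
The trailing empty list appears because the last item (9) is itself a delimiter, just as `"a,b,".split(",")` ends with an empty string. A minimal sketch of the idea, where `split_at_sketch` is a hypothetical helper and not the library code:

```python
def split_at_sketch(iterable, pred):
    """Yield lists of items, starting a new list after each delimiter."""
    chunk = []
    for item in iterable:
        if pred(item):
            yield chunk  # close the current chunk; the delimiter is dropped
            chunk = []
        else:
            chunk.append(item)
    yield chunk  # whatever follows the last delimiter (possibly nothing)


numbers = list(split_at_sketch(range(10), lambda n: n % 2 == 1))
chars = list(split_at_sketch("a,b,,c", lambda ch: ch == ","))

print(numbers)              # [[0], [2], [4], [6], [8], []]
print(chars)                # [['a'], ['b'], [], ['c']]
print("a,b,,c".split(","))  # ['a', 'b', '', 'c']
```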

### Bucket

Splits an iterable into multiple child iterators ("buckets") based on a *key* function.<br>
Iterating over the bucket object yields the keys, and indexing it with a key (dict-like look-up) returns an iterator over the items of that bucket.

In [13]:
iterable = ['a1', 'b1', 'c1', 'a2', 'b2', 'c2', 'b3']
s = more_itertools.bucket(iterable, key=lambda x: x[0])  # Bucket by 1st character

print(sorted(s))  # Iterating over the bucket object yields its keys
print(list(s['a']))

['a', 'b', 'c']
['a1', 'a2']


In [14]:
ser_1 = pd.Series(dtype='object')
ser_2 = pd.Series(dtype='object')
arr_1 = np.empty(1)

iters = [ser_1, ser_2, arr_1]

# Bucket the objects by their type
iter_buckets = more_itertools.bucket(iters, key=type)

list(iter_buckets[pd.Series])

[Series([], dtype: object), Series([], dtype: object)]
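
If laziness is not needed, the same grouping can be approximated with a plain dictionary. This is only an eager stand-in (`bucket_eager` is a hypothetical helper); unlike `bucket`, it reads the whole input at once:

```python
from collections import defaultdict


def bucket_eager(iterable, key):
    """Group items into lists by key(item), preserving input order."""
    groups = defaultdict(list)
    for item in iterable:
        groups[key(item)].append(item)
    return dict(groups)


iterable = ['a1', 'b1', 'c1', 'a2', 'b2', 'c2', 'b3']
grouped = bucket_eager(iterable, key=lambda x: x[0])

print(sorted(grouped))  # ['a', 'b', 'c']
print(grouped['b'])     # ['b1', 'b2', 'b3']
```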

### Consecutive_groups


Yields groups of consecutive items using `itertools.groupby`. This works for numbers directly and for dates, letters or other orderable objects once they are mapped to a position. The *ordering* function determines whether two items are adjacent by returning their position.

By default, the ordering function is the identity function.

In [15]:
# Note: With the default ordering, consecutive_groups expects numbers, so we first
# convert the dates to ordinal numbers. A list comprehension then iterates over the
# groups of consecutive ordinals and converts each group back to datetime objects
# with map and fromordinal.


dates = [ 
    dt.datetime(2020, 1, 15),
    dt.datetime(2020, 1, 16),
    dt.datetime(2020, 1, 17),
    dt.datetime(2020, 2, 1),
    dt.datetime(2020, 2, 2),
    dt.datetime(2020, 2, 4)
]

ordinal_dates = [d.toordinal() for d in dates]

groups = [list(map(dt.datetime.fromordinal, group)) 
          for group in more_itertools.consecutive_groups(ordinal_dates)]

for group in groups:
    print(group)

[datetime.datetime(2020, 1, 15, 0, 0), datetime.datetime(2020, 1, 16, 0, 0), datetime.datetime(2020, 1, 17, 0, 0)]
[datetime.datetime(2020, 2, 1, 0, 0), datetime.datetime(2020, 2, 2, 0, 0)]
[datetime.datetime(2020, 2, 4, 0, 0)]


In [16]:
# Example for letters where we pass the ordering function

from string import ascii_lowercase

iterable = 'abcdfgilmnop'
ordering = ascii_lowercase.index
for group in more_itertools.consecutive_groups(iterable, ordering):
    print(list(group))

['a', 'b', 'c', 'd']
['f', 'g']
['i']
['l', 'm', 'n', 'o', 'p']
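
One common way to find such runs with `groupby` is to group on "index minus position": within a consecutive run that difference stays constant. The sketch below assumes this trick and uses a hypothetical helper name. It is meant to show the idea, not to reproduce the library source.

```python
from itertools import groupby
from string import ascii_lowercase


def consecutive_groups_sketch(iterable, ordering=lambda x: x):
    """Yield lists of items whose positions increase by exactly 1."""
    indexed = enumerate(iterable)
    # index - position is constant while the items are consecutive
    for _, group in groupby(indexed, key=lambda pair: pair[0] - ordering(pair[1])):
        yield [item for _, item in group]


number_runs = list(consecutive_groups_sketch([1, 2, 3, 7, 8, 10]))
letter_runs = list(consecutive_groups_sketch("abcdfg", ascii_lowercase.index))

print(number_runs)  # [[1, 2, 3], [7, 8], [10]]
print(letter_runs)  # [['a', 'b', 'c', 'd'], ['f', 'g']]
```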


### Collapse

Flattens an iterable with multiple levels of nesting (e.g., a list of lists of tuples) into a flat stream of non-iterable items. You can limit how many levels are flattened with *levels* and name a type that should be left unflattened with *base_type*.

In [17]:
# Example: Get all nodes of tree into flat list
tree = [40, [25, [10, 3, 17], [32, 30, 38]], [78, 50, 93]]  # [Root, SUB_TREE_1, SUB_TREE_2, ..., SUB_TREE_n]

print(list(more_itertools.collapse(tree)))
print(list(more_itertools.collapse(tree, levels=1)))

[40, 25, 10, 3, 17, 32, 30, 38, 78, 50, 93]
[40, 25, [10, 3, 17], [32, 30, 38], 78, 50, 93]
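
The effect of *levels* can be seen in a small recursive sketch. It is a simplification under stated assumptions: `collapse_sketch` is a hypothetical helper that only descends into lists and tuples, while the real `collapse` handles more iterable types.

```python
def collapse_sketch(iterable, levels=None, _depth=0):
    """Flatten nested lists/tuples, descending at most `levels` levels."""
    for item in iterable:
        is_nested = isinstance(item, (list, tuple))
        if is_nested and (levels is None or _depth < levels):
            # Descend one level deeper and yield the inner items in order
            yield from collapse_sketch(item, levels, _depth + 1)
        else:
            yield item


tree = [40, [25, [10, 3, 17], [32, 30, 38]], [78, 50, 93]]

flat = list(collapse_sketch(tree))
one_level = list(collapse_sketch(tree, levels=1))

print(flat)       # [40, 25, 10, 3, 17, 32, 30, 38, 78, 50, 93]
print(one_level)  # [40, 25, [10, 3, 17], [32, 30, 38], 78, 50, 93]
```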


### MAP_REDUCE

Return a dictionary that 
1. maps the items in an *iterable* to categories defined by *keyfunc*, 
2. transforms them with *valuefunc*, 
3. and then summarizes them by category with *reducefunc*.

*valuefunc* defaults to the identity function if it is unspecified. If *reducefunc* is unspecified, no summarization takes place. (Omitting certain functions can be useful to produce intermediate steps in the MapReduce process, as shown below.)

In [23]:
data = 'This sentence has words of various lengths in it, both short ones and long ones'.split()

keyfunc = len  # Group the words by their length

result = more_itertools.map_reduce(data, keyfunc)

pprint(result)

defaultdict(None,
            {2: ['of', 'in'],
             3: ['has', 'it,', 'and'],
             4: ['This', 'both', 'ones', 'long', 'ones'],
             5: ['words', 'short'],
             7: ['various', 'lengths'],
             8: ['sentence']})


In [24]:
valuefunc = lambda x: 1

result = more_itertools.map_reduce(data, keyfunc, valuefunc)

pprint(result)

defaultdict(None,
            {2: [1, 1],
             3: [1, 1, 1],
             4: [1, 1, 1, 1, 1],
             5: [1, 1],
             7: [1, 1],
             8: [1]})


In [26]:
reducefunc = sum

result = more_itertools.map_reduce(data, keyfunc, valuefunc, reducefunc)

pprint(result)

defaultdict(None, {4: 5, 8: 1, 3: 3, 5: 2, 2: 2, 7: 2})
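
The three steps can also be written out by hand, which may help when deciding what each function should do. This is a sketch with a hypothetical helper name and made-up fruit data. It returns a plain `dict`, whereas the outputs above show that `map_reduce` returns a `defaultdict`.

```python
from collections import defaultdict


def map_reduce_sketch(iterable, keyfunc, valuefunc=None, reducefunc=None):
    """Group by keyfunc, transform with valuefunc, summarize with reducefunc."""
    groups = defaultdict(list)
    for item in iterable:
        value = item if valuefunc is None else valuefunc(item)
        groups[keyfunc(item)].append(value)
    if reducefunc is None:
        return dict(groups)  # no summarization, keep the lists
    return {key: reducefunc(values) for key, values in groups.items()}


fruits = ["apple", "avocado", "banana", "blueberry", "cherry"]


def first_letter(word):
    return word[0]


by_letter = map_reduce_sketch(fruits, first_letter)
longest = map_reduce_sketch(fruits, first_letter, len, max)

print(by_letter)
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
print(longest)  # {'a': 7, 'b': 9, 'c': 6}
```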


