<div style="position: relative;">
<img src="https://user-images.githubusercontent.com/7065401/98728503-5ab82f80-2378-11eb-9c79-adeb308fc647.png"></img>

<h1 style="color: white; position: absolute; top:27%; left:10%;">
     Advanced Python
</h1>
<h2 style="color: white; position: absolute; top:36%; left:10%;">
    Iterators, Generators, Context Managers, and Decorators
</h2>


<h3 style="color: #ef7d22; font-weight: normal; position: absolute; top:58%; left:10%;">
    David Mertz, Ph.D.
</h3>

<h3 style="color: #ef7d22; font-weight: normal; position: absolute; top:63%; left:10%;">
    Data Scientist
</h3>
</div>

# Itertools

The module `itertools` is a collection of powerful, carefully designed functions for performing *iterator algebra*. That is, these functions permit *composition* of iterators in sophisticated ways while minimizing concrete instantiation of the elements of iterable sequences. In addition to the basic functions in the module itself, the [module documentation](https://docs.python.org/3/library/itertools.html) provides a number of short recipes for additional functions that combine two or three of the basic module functions. *Be aware that it is easy to get these recipes subtly wrong*. The third-party module `more_itertools` provides additional functions that are likewise designed to avoid common pitfalls and edge cases.

The basic goal of using the building blocks inside `itertools` is to avoid performing computations before they are required, to avoid the memory requirements of large collections, to avoid potentially slow I/O until strictly necessary, and so on. Iterators are lazy sequences rather than realized collections; when combined with functions or recipes in `itertools`, they retain this property.
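To make "lazy" concrete, here is a minimal sketch that records which inputs actually get processed. The helper `expensive_square()` and the list `evaluated` are hypothetical stand-ins for a costly computation; the point is that building the pipeline does no work, and consuming three items does exactly three computations.

```python
from itertools import count, islice

evaluated = []  # records which inputs were actually processed

def expensive_square(n):
    # Hypothetical stand-in for a costly computation; logs each call
    evaluated.append(n)
    return n * n

pipeline = map(expensive_square, count())  # builds the pipeline, computes nothing yet
print(evaluated)                           # []
first_three = list(islice(pipeline, 3))    # pulls exactly three items through
print(first_three)                         # [0, 1, 4]
print(evaluated)                           # [0, 1, 2]
```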

In [None]:
from itertools import (
    accumulate, 
    count,
    tee,
    takewhile, 
    dropwhile, 
    islice, 
    cycle
)
from math import inf, sqrt, log, isclose

The built-in functions `zip()`, `enumerate()`, `filter()`, `range()`, and `map()` can be considered "honorary itertools": `range()` produces its values lazily, and the others return lazy iterators that can also consume infinite or lazy iterators. Built-ins like `all()`, `any()`, `sum()`, `min()`, `max()`, and `functools.reduce()` also act on iterables, but in the general case they must exhaust the iterator rather than remain lazy (`all()` and `any()` can stop early once the answer is known).
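A short sketch of the difference, using only built-ins plus `count()` and `islice()`: the "honorary itertools" happily work over an infinite input as long as only finitely many results are requested, and `any()` short-circuits. By contrast, something like `sum(count())` would never return, so it is deliberately not run here.

```python
from itertools import count, islice

# zip() stops at its shortest input, so pairing with an infinite count() is safe
pairs = list(zip("abc", count(10)))
print(pairs)  # [('a', 10), ('b', 11), ('c', 12)]

# filter() over an infinite iterator is fine if we only take finitely many results
multiples_of_7 = filter(lambda n: n % 7 == 0, count(1))
first_multiples = list(islice(multiples_of_7, 4))
print(first_multiples)  # [7, 14, 21, 28]

# any() stops at the first true value, so it returns even on an infinite iterator
found = any(n > 1000 for n in count())
print(found)  # True
```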

## Massaging infinite streams of data

Let's create an infinite sequence similar to the primes from the last lesson. Unlike the primes, which require keeping a list of the primes found so far, the Fibonacci sequence needs very little state: only the last two numbers of the sequence. But like the primes, it is an infinite sequence of numbers.

For this implementation, we build in an optional upper bound, `max`, so we can easily get just a finite number of terms.

In [None]:
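def fibonacci(max=inf, a=1, b=2):
    # Yield Fibonacci numbers (1, 2, 3, 5, ...) while they stay below `max`.
    # Note: the parameter name `max` shadows the built-in max() inside this function.
    while a < max:
        yield a
        a, b = b, a+b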
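The built-in `max` bound is a convenience; the same effect is available from `itertools` with a generator that never stops on its own. In this sketch, `fib_unbounded()` is a hypothetical variant with no stopping point: `islice()` bounds it by *count*, while `takewhile()` bounds it by *value*, matching what `fibonacci(max=60)` yields.

```python
from itertools import islice, takewhile

def fib_unbounded(a=1, b=2):
    # Hypothetical variant of fibonacci() with no built-in stopping point
    while True:
        yield a
        a, b = b, a + b

# Bound by count: the first 8 terms
first_8 = list(islice(fib_unbounded(), 8))
print(first_8)  # [1, 2, 3, 5, 8, 13, 21, 34]

# Bound by value: every term below 60
below_60 = list(takewhile(lambda f: f < 60, fib_unbounded()))
print(below_60)  # [1, 2, 3, 5, 8, 13, 21, 34, 55]
```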

In [None]:
for n, f in zip(range(10), fibonacci()):
    print(f"Sequence {n}; Fib={f}")

In [None]:
for f_ln in map(log, fibonacci(max=60)):
    print(f_ln)

In [None]:
map(log, fibonacci(max=60))

## Combining tools

Here is a quick example of combining a few `itertools` functions. Let's keep a running sum of an arbitrary iterator. We can create a single lazy iterator that generates the position, the current number, and this running sum together.

In [None]:
def item_with_total(iterable):
    "Generically transform a stream of numbers into (n, num, running_sum)"
    s, t = tee(iterable)   # tee() returns two independent iterators over the input
    yield from zip(count(), t, accumulate(s))

The `yield from` construct delegates to another iterable; here it is equivalent to:

```python
for n, item, total in zip(count(), t, accumulate(s)):
    yield n, item, total
```
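It may help to see the two building blocks of `item_with_total()` separately on a small finite stream. `tee()` splits one iterator into independent copies (the original should not be used afterward), and `accumulate()` yields running results, which are sums by default or any binary function such as `operator.mul`.

```python
import operator
from itertools import accumulate, tee

stream = iter([3, 1, 4, 1, 5])
left, right = tee(stream)          # two independent iterators; don't reuse `stream` now
items = list(left)                 # [3, 1, 4, 1, 5]
running = list(accumulate(right))  # running sums: [3, 4, 8, 9, 14]
products = list(accumulate([3, 1, 4, 1, 5], operator.mul))  # [3, 3, 12, 12, 60]
print(items, running, products)
```

Because `items` is built completely before `right` is read, `tee()` must buffer every element in the meantime. In `item_with_total()`, `zip()` advances both copies in lockstep, so the buffer stays tiny, which is what keeps that function safe on infinite input.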

We can apply our new generator function, `item_with_total()`, to an arbitrary numeric iterable.

In [None]:
for n, item, total in item_with_total(range(101, 110)):
    print("%3d. Item: %3d; Total: %3d" % (n+1, item, total))

In [None]:
for n, item, total in item_with_total([37, 45, 22, 11, 86, 51]):
    print("%3d. Item: %3d; Total: %3d" % (n+1, item, total))

In [None]:
for n, fib, total in item_with_total(fibonacci(60)):
    print("%3d. Item: %3d; Total: %3d" % (n+1, fib, total))

In [None]:
item_with_total(fibonacci(60))

## Infinite sequences for convergent sums

Below, we represent the alternating harmonic series, whose sum converges to $\ln(2)$. By itself, this sequence of terms is similar to those we have seen for Fibonacci numbers or primes. But we can use `itertools` to do more with it.

$\ln(2) = \frac{1}{1} - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \cdots $
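Written in summation form, the slow convergence becomes explicit. Let $S_n$ be the partial sum of the first $n$ terms. Because the terms alternate in sign and decrease in magnitude toward zero, the standard alternating series bound applies:

$$
\ln(2) = \sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k}, \qquad
S_n = \sum_{k=1}^{n} \frac{(-1)^{k+1}}{k}, \qquad
\left| \ln(2) - S_n \right| \le \frac{1}{n+1}
$$

Since the error shrinks only in proportion to $1/n$, each additional correct decimal digit costs roughly ten times as many terms.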

In [None]:
def ln2_terms():
    sign = 1.
    for denom in count(1):
        yield sign/denom
        sign *= -1

In [None]:
log(2), list(islice(ln2_terms(), 0, 6))

We can rephrase this function a bit more concisely and idiomatically (it may also be slightly faster).

In [None]:
def ln2_terms():
    for denom, sign in enumerate(cycle([1.,-1.]), 1):
        yield sign/denom

list(islice(ln2_terms(), 0, 6))

We might also spell this as an (infinite) generator expression.

In [None]:
terms = (sign/denom for (denom, sign) in enumerate(cycle([1.,-1.]), 1))
list(islice(terms, 0, 6))

In [None]:
(sign/denom for (denom, sign) in enumerate(cycle([1.,-1.]), 1))
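One practical difference between the two spellings: a generator function can be called again to get a fresh iterator, while a generator expression is a single iterator that is consumed as you go. This sketch repeats the document's definitions so it runs on its own; a second `islice()` on `terms` continues where the first left off.

```python
from itertools import cycle, islice

def ln2_terms():
    for denom, sign in enumerate(cycle([1., -1.]), 1):
        yield sign / denom

terms = (sign / denom for denom, sign in enumerate(cycle([1., -1.]), 1))
first = list(islice(terms, 3))        # 1, -1/2, 1/3
second = list(islice(terms, 3))       # continues with -1/4, 1/5, -1/6
fresh = list(islice(ln2_terms(), 3))  # a new generator starts over at 1
print(first, second, fresh)
```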

### Measure convergence

We can use a bit of functional style to define a few convenience functions.

In [None]:
from functools import partial
close_to_ln2 = partial(isclose, log(2), rel_tol=0.01)
close_to_ln2(0.8)

In [None]:
close_to_ln2(0.6931)

In [None]:
far_from_ln2 = lambda x: not close_to_ln2(x)
far_from_ln2(0.8)
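`functools.partial` freezes some arguments of a function and returns a callable that supplies the rest. In this sketch, `close_to_ln2_explicit()` is a hypothetical hand-written equivalent of `close_to_ln2`; the partial object also exposes what it froze through its `func`, `args`, and `keywords` attributes.

```python
from functools import partial
from math import isclose, log

close_to_ln2 = partial(isclose, log(2), rel_tol=0.01)

def close_to_ln2_explicit(x):
    # Hypothetical hand-written equivalent of the partial object above
    return isclose(log(2), x, rel_tol=0.01)

print(close_to_ln2.func is isclose)  # True
print(close_to_ln2.args)             # a 1-tuple holding log(2)
print(close_to_ln2.keywords)         # {'rel_tol': 0.01}

# Both versions agree on a few sample inputs
for x in [0.8, 0.6931, 0.7]:
    assert close_to_ln2(x) == close_to_ln2_explicit(x)
```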

In [None]:
running_sum = accumulate(ln2_terms())
for x in takewhile(far_from_ln2, running_sum):
    print(x)

Actually, that does not converge all that quickly, even at a loose 1% relative tolerance! The default relative tolerance of `isclose()` is `1e-09`, and a tolerance anywhere near that takes a great many more terms. How many exactly?

In [None]:
def val_not_close_to(target, rel_tol):
    "Return a predicate that is true while an (index, value) pair's value is not close to target"
    def compare(tup):
        return not isclose(target, tup[1], rel_tol=rel_tol)
    return compare

# Use dropwhile() to skip partial sums of the infinite sequence until one is close to ln(2)
close_nums = dropwhile(val_not_close_to(log(2), 1e-6),
                       enumerate(accumulate(ln2_terms())))
list(islice(close_nums, 0, 3))

In [None]:
list(islice(close_nums, 0, 3))

In [None]:
next(close_nums)
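`takewhile()` and `dropwhile()` are mirror images: the first yields items only while the predicate holds and stops at the first failure; the second discards items while the predicate holds and then yields everything after, without re-checking. A small finite list makes the contrast visible, with `filter()` included for comparison.

```python
from itertools import dropwhile, takewhile

data = [1, 3, 5, 8, 9, 2, 4]
is_odd = lambda n: n % 2 == 1

head = list(takewhile(is_odd, data))  # stops at 8: [1, 3, 5]
tail = list(dropwhile(is_odd, data))  # skips the leading odds: [8, 9, 2, 4]
odds = list(filter(is_odd, data))     # checks every item: [1, 3, 5, 9]
print(head, tail, odds)
```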

In [None]:
%%time
# Let's get something more precise than a 1-in-a-million error.
# Creating the iterator is instant; the slow work happens lazily on next()
within_e8 = dropwhile(val_not_close_to(log(2), 1e-8),
                      enumerate(accumulate(ln2_terms())))
within_e8

In [None]:
%time next(within_e8)

In [None]:
%time next(within_e8)

Probably time to hit our math textbooks to find a faster-converging series if we want a 1-in-a-billion error. Clearly the `math` module has a faster method of taking natural logs.
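As one illustration of what a textbook might offer, here is a sketch using a different series, not the one above: from $-\ln(1-x) = x + x^2/2 + x^3/3 + \cdots$ at $x = 1/2$, we get $\ln(2) = \sum_{k \ge 1} 1/(k \cdot 2^k)$, whose terms shrink geometrically. The same `accumulate()`/`dropwhile()` pipeline then reaches a `1e-9` relative tolerance within a few dozen terms. The names `ln2_fast_terms()` and `first_close()` are hypothetical helpers for this sketch.

```python
from itertools import accumulate, count, dropwhile
from math import isclose, log

def ln2_fast_terms():
    # Terms of ln(2) = sum over k >= 1 of 1 / (k * 2**k)
    for k in count(1):
        yield 1.0 / (k * 2**k)

def first_close(target, rel_tol, terms):
    # Hypothetical helper: return (index, partial_sum) for the first
    # partial sum that is within rel_tol of target
    sums = enumerate(accumulate(terms))
    return next(dropwhile(lambda tup: not isclose(target, tup[1], rel_tol=rel_tol), sums))

n, approx = first_close(log(2), 1e-9, ln2_fast_terms())
print(n, approx)
```

The design is unchanged from the slow version; only the sequence of terms differs, which is the kind of swap that composing lazy iterators makes cheap.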