<div style="position: relative;">
<img src="https://user-images.githubusercontent.com/7065401/98728503-5ab82f80-2378-11eb-9c79-adeb308fc647.png"></img>

<h1 style="color: white; position: absolute; top:27%; left:10%;">
     Advanced Python
</h1>
<h2 style="color: white; position: absolute; top:36%; left:10%;">
    Iterators, Generators, Context Managers, and Decorators
</h2>


<h3 style="color: #ef7d22; font-weight: normal; position: absolute; top:58%; left:10%;">
    David Mertz, Ph.D.
</h3>

<h3 style="color: #ef7d22; font-weight: normal; position: absolute; top:63%; left:10%;">
    Data Scientist
</h3>
</div>

## Generator Functions

Class-based iterables and iterators can be a powerful approach, but it is often much more compact to write iterables using *generator functions*.  A generator function looks much like a regular function, merely containing somewhere inside it at least one `yield` statement rather than `return` statements (technically, a final `return` is permitted as a way of raising `StopIteration`).

Let us quickly look at a completely trivial example:

In [1]:
from typing import Iterable, Iterator

def genfunc():
    yield 1
    yield 2
    return 3
    yield 4

nums = genfunc()

In [2]:
print("Type of `genfunc`:", type(genfunc))
print("Type of `genfunc()`:", type(nums))
print("Is genfunc() an iterable?", isinstance(nums, Iterable))
print("Is genfunc() an iterator?", isinstance(nums, Iterator))

Type of `genfunc`: <class 'function'>
Type of `genfunc()`: <class 'generator'>
Is genfunc() an iterable? True
Is genfunc() an iterator? True


Let's use our iterable/iterator.  Notice that the `return` statement, whatever value it "returns", only stops iteration here: the loop never sees that value, and the `yield 4` that follows it is never reached.

In [3]:
for n in nums:
    print(n)

1
2
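The value given to `return` is not discarded entirely. Python attaches it to the `StopIteration` exception as its `value` attribute, so a caller that drives the generator manually with `next()` can still read it; a `for` loop simply ignores it. A minimal sketch, repeating `genfunc` from above so that it runs on its own:

```python
def genfunc():
    yield 1
    yield 2
    return 3
    yield 4   # never reached

gen = genfunc()
yielded = []
returned = None
while True:
    try:
        yielded.append(next(gen))
    except StopIteration as stop:
        # The value passed to `return` travels on the exception
        returned = stop.value
        break

print("Yielded:", yielded)
print("Returned:", returned)
```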


<img src="primes-vs-composites.png" align="right"/>

## A non-trivial generator

It may be a bit cliché, but an iterable over all the prime numbers still provides a nice example of an infinite iterable.  Notably, of course, it is simply not possible to put infinitely many items in a concrete collection like a list.  We construct this one with the help of a supporting generator function, which illustrates how generators often combine nicely.

Here we implement a variation on the [Sieve of Eratosthenes](https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes), which classically works by successively striking out each multiple of a prime from the sequence of natural numbers.  Because our sequence is unbounded, this version instead tests each odd candidate for divisibility by the primes already discovered, stopping at the candidate's square root.  The *statefulness* of this generator involves keeping an internal list of discovered primes, so memory usage *will* grow as the iterator produces values.

In [4]:
from math import sqrt, ceil

def up_to(seq, lim):
    "Elements of an iterable not exceeding a limit"
    for n in seq:
        if n <= lim:
            yield n
        else:
            break
                
def get_primes():
    "Pretty good Sieve of Eratosthenes"
    # Skip the even nums, stop at sqrt(candidate)
    yield 2
    candidate = 3
    found = []
    while True:
        lim = int(ceil(sqrt(candidate)))
        if all(candidate % prime != 0 for prime in up_to(found, lim)):
            yield candidate
            found.append(candidate)
        candidate += 2

One thing to keep in mind about an iterator is that it only goes in one direction.  Once it has yielded a particular item, there is generally no way to ask it for that item again.  In many contexts, such as a stream of data from an external source, one pass is all that is possible.  Of course, you are free to cache previous elements in your own code, if that is needed.

Let's read a few primes, then read a few more after that to illustrate.

In [5]:
# All infinitely many prime numbers
primes = get_primes()

In [6]:
n = 1
while n < 5:
    p = next(primes)
    print(f"Sequence {n}; Prime={p}")
    n += 1

Sequence 1; Prime=2
Sequence 2; Prime=3
Sequence 3; Prime=5
Sequence 4; Prime=7


We can keep looking for more primes, but the ones we have seen are "used up."

In [7]:
while n < 10:
    p = next(primes)
    print(f"Sequence {n}; Prime={p}")
    n += 1  

Sequence 5; Prime=11
Sequence 6; Prime=13
Sequence 7; Prime=17
Sequence 8; Prime=19
Sequence 9; Prime=23
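If earlier values are needed again, one option is to record them as they are consumed. The following is only a sketch: the `Remembered` class is a hypothetical helper, not something defined elsewhere in this lesson, and it wraps a simple stream of squares so that the block runs on its own. Any iterator, including `primes`, could be wrapped the same way, at the cost of memory that grows with every item read.

```python
from itertools import count

class Remembered:
    "Hypothetical wrapper that records every item it hands out"
    def __init__(self, iterable):
        self._it = iter(iterable)
        self.seen = []

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self._it)   # StopIteration propagates when exhausted
        self.seen.append(item)
        return item

# An infinite stream of squares stands in for any one-way iterator
squares = Remembered(n * n for n in count(1))
first = [next(squares) for _ in range(4)]
more = [next(squares) for _ in range(2)]

print("First batch:", first)
print("Second batch:", more)
print("Everything seen so far:", squares.seen)
```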


Creating a new copy of "all the primes" is as easy as calling the generator function again.  Of course, in some cases, the underlying stream may have changed, so making a new copy will not necessarily produce the same answers.

We need to take care not to iterate directly over infinite iterators without an "escape hatch."  Various approaches are possible, but if you do not include one, your program will not terminate until you manually kill it.

In [8]:
for p in get_primes():
    if p > 100:
        break
    print(p, end=' ')

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 

Of course, we actually already provided a good way to escape with the `up_to()` generator function defined above.

In [9]:
for p in up_to(get_primes(), 100):
    print(p, end=' ')

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
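The standard library's `itertools` module offers further escape hatches. As a sketch, `islice()` stops after a fixed number of items, and `takewhile()` stops at the first item that fails a test. The two generator functions are repeated from above only so that the block runs on its own.

```python
from itertools import islice, takewhile
from math import sqrt, ceil

def up_to(seq, lim):
    "Elements of an iterable not exceeding a limit"
    for n in seq:
        if n <= lim:
            yield n
        else:
            break

def get_primes():
    "Odd candidates tested against the primes found so far"
    yield 2
    candidate = 3
    found = []
    while True:
        lim = int(ceil(sqrt(candidate)))
        if all(candidate % prime != 0 for prime in up_to(found, lim)):
            yield candidate
            found.append(candidate)
        candidate += 2

# Stop after a fixed number of items
first_ten = list(islice(get_primes(), 10))

# Stop at the first item that fails the predicate
below_50 = list(takewhile(lambda p: p < 50, get_primes()))

print(first_ten)
print(below_50)
```

Note that `takewhile()` has to read the first failing item (here 53) in order to decide to stop, so that item is consumed from the underlying iterator and lost.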

<img src="click-counter.jpg" align="right"/>

## Injecting data

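As well as maintaining internal state between yields, a generator function can have state injected into it.  The caller injects a value with the generator's `.send()` method; inside the function body, that value becomes the result of the `yield` expression, which is why `yield` is combined with an assignment there.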

This course will not delve into it, but allowing values to be injected into generators enables an easy way to construct *coroutines* in Python.  That is, one generator can suspend at a certain yield, and another generator can take over execution, perhaps with data that became available via the first generator.  Over several Python versions, and about a decade, this capability evolved into the `asyncio` standard library module and the introduction of the `async` and `await` keywords to enable a full-fledged coroutine framework.  But that is a different INE course.

For a simple example, let's emulate in Python one of those physical clickers used by attendants to measure occupancy during concerts and other events.  Often we want a single click to indicate one person entered, but sometimes we would like to mark that a group of several people entered at once.  This iterator will become exhausted once we reach a maximum occupancy (by default `sys.maxsize`, which on a typical 64-bit platform is the largest signed 64-bit integer, around 9 quintillion). This is not an *infinite* iterator, but it can be a very, very long sequence.

In [10]:
import sys

def counter(max_occupancy=sys.maxsize):
    "Counter for positive stepwise accumulation"
    count = 0
    while True:   # loop forever
        # Pause here; `add` is None after next(), or the value passed to send()
        add = yield count
        if add is not None:
            if not isinstance(add, int) or add <= 0:
                print(f"Count by positive amount, not {add}", file=sys.stderr)
            else:
                count += add
        if count >= max_occupancy:
            yield count   # report the final count one last time
            print(f"Maximum occupancy exceeded: {count}")
            return        # exhaust the iterator

Let's count.

In [11]:
audience = counter(75)  # the theater holds 75
next(audience)

0
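That first `next()` call does more than read the count. A freshly created generator has not yet reached any `yield`, so there is no paused expression to receive a value, and sending anything other than `None` at that point raises a `TypeError`. A minimal sketch, using a hypothetical `running_total` generator rather than `counter()` so that it runs on its own:

```python
def running_total():
    "Hypothetical generator that accumulates whatever is sent in"
    total = 0
    while True:
        add = yield total
        if add is not None:
            total += add

fresh = running_total()
error = None
try:
    fresh.send(5)        # not yet paused at a yield
except TypeError as err:
    error = str(err)
    print("TypeError:", err)

primed = running_total()
start = next(primed)     # advance to the first yield; same as primed.send(None)
after = primed.send(5)   # now the paused yield can receive a value
print(start, after)
```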

We can inquire as to the number currently seated with `next()` as much as we like. This inquiry does not itself change the count.

In [12]:
for _ in range(5):
    print(next(audience), end=' ')

0 0 0 0 0 

The interface does *some* error checking.

In [13]:
audience.send(-1)
audience.send(1)

Count by positive amount, not -1


1

Let's allow a few people in one-by-one.

In [14]:
for seated in audience:
    print(seated, end=" ")
    if seated >= 10:
        break    
    audience.send(1)

1 2 3 4 5 6 7 8 9 10 

Let's let in a group of 5 people.

In [15]:
seated = audience.send(5)
print("Currently seated:", seated)

Currently seated: 15


We will start letting in small groups of various sizes.

In [16]:
from random import choice
for seated in audience:
    group_size = choice([1, 3, 5, 7])
    print("Currently seated:", seated)
    audience.send(group_size)

Currently seated: 15
Currently seated: 16
Currently seated: 21
Currently seated: 24
Currently seated: 27
Currently seated: 32
Currently seated: 33
Currently seated: 38
Currently seated: 41
Currently seated: 42
Currently seated: 49
Currently seated: 56
Currently seated: 61
Currently seated: 66
Currently seated: 67
Currently seated: 68
Currently seated: 73
Maximum occupancy exceeded: 80
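Once a generator has returned, it stays exhausted: a `for` loop over it ends immediately, and a bare `next()` raises `StopIteration`. Passing a default as the second argument to `next()` avoids the exception. A small sketch with a hypothetical finite generator, `countdown`:

```python
def countdown(n):
    "Hypothetical finite generator"
    while n > 0:
        yield n
        n -= 1

ticks = countdown(2)
values = [next(ticks), next(ticks)]

# A default value is returned instead of raising StopIteration
leftover = next(ticks, "exhausted")

# An exhausted generator produces nothing more
again = list(ticks)

print(values, leftover, again)
```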
