## Iteration Protocols

In [5]:
iterable = ['Spring', 'Summer', 'Autumn', 'Winter']
print(type(iterable))

<class 'list'>


In [6]:
iterator = iter(iterable)
print(type(iterator))

<class 'list_iterator'>


In [7]:
next(iterator)

'Spring'

In [8]:
next(iterator)

'Summer'

In [9]:
next(iterator)

'Autumn'

In [10]:
next(iterator)

'Winter'

In [12]:
# after the last element, a StopIteration exception is raised
next(iterator)

StopIteration: 
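
The cells above step through the iterator by hand. A `for` loop performs roughly the same steps, and the sketch below spells them out. The helper name `manual_for` is hypothetical and used only for illustration; it is not part of Python.

```python
def manual_for(iterable, action):
    """Roughly what `for item in iterable: action(item)` does."""
    iterator = iter(iterable)          # ask the iterable for an iterator
    while True:
        try:
            item = next(iterator)      # request the next element
        except StopIteration:          # raised once the iterator is exhausted
            break
        action(item)

collected = []
manual_for(['Spring', 'Summer', 'Autumn', 'Winter'], collected.append)
print(collected)
```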

### Simple usage

In [13]:
def first(iterable):
    iterator = iter(iterable)
    try:
        return next(iterator)
    except StopIteration:
        return "Your iterable is empty :("

In [14]:
first([10, 9, 8])

10

In [17]:
first({'ham': 1, 'ls': 2, 'usi': 3})

'ham'

In [18]:
first([])

'Your iterable is empty :('
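
The built-in `next` also accepts an optional second argument, which is returned instead of raising `StopIteration`. A shorter variant of `first` could be sketched as follows; the name `first_or` and its `default` parameter are hypothetical.

```python
def first_or(iterable, default=None):
    """Return the first item of `iterable`, or `default` if it is empty."""
    return next(iter(iterable), default)

print(first_or([10, 9, 8]))
print(first_or([], default='empty'))
```

Returning a caller-chosen default avoids confusing the "empty" message with a real first element that happens to be the same string.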

## Generator functions

* Iterables defined by functions
* Lazy evaluation, i.e., the next value is computed only on demand
* Can model infinite sequences of values (e.g. data from sensors or log files)
* Can be used together to compose pipelines

Generator function:

> A function that uses the `yield` keyword at least once in its definition.

A generator function may also include `return` statements; a `return` ends the iteration.

In [19]:
# Example
def gen123():
    yield 1
    yield 2
    yield 3

In [21]:
g = gen123()
g

<generator object gen123 at 0x00000268E518CA50>

A generator object is an iterator. So we can use the iterator protocol to iterate over it.

In [22]:
next(g)

1

In [23]:
next(g)

2

In [24]:
next(g)

3

In [25]:
next(g)

StopIteration: 

It works just like any other iterator.

**Because iterators are also iterables, generators can be used in all the usual Python constructs that expect iterable objects, such as `for` loops.**

In [26]:
for v in gen123():
    print(v)

1
2
3
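
Generator functions may also contain `return` statements. The sketch below, with a hypothetical `countdown` generator, shows one way to observe the effect: `return` ends the iteration, and the returned value is attached to the `StopIteration` exception as its `value` attribute.

```python
def countdown(start):
    """Yield start, start - 1, ..., 1, then finish with a return value."""
    while start > 0:
        yield start
        start -= 1
    return 'done'   # ends the generator; the value travels on StopIteration

gen = countdown(2)
values = [next(gen), next(gen)]
try:
    next(gen)
except StopIteration as stop:
    result = stop.value
print(values, result)
```

A `for` loop ignores the returned value and simply stops.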


## Maintaining State in Generators

In [40]:
def take(num_elem, iterable):
    counter = 0
    for item in iterable:
        print('take :: counter:', counter)
        if counter == num_elem:
            return
        counter += 1
        yield item

In [41]:
def distinct(iterable):
    seen = set()
    for item in iterable:
        print('distinct :: item:', item)
        if item in seen:
            continue
        yield item
        seen.add(item)

In [48]:
# Using two generators to find the first 3 distinct elements in an iterable
def run_pipeline():
    items = [3, 3, 6, 6, 6, 2, 1, 1]
    for unique_item in take(3, distinct(items)):
        print(f'\n{unique_item}\n')

In [49]:
run_pipeline()

distinct :: item: 3
take :: counter: 0

3

distinct :: item: 3
distinct :: item: 6
take :: counter: 1

6

distinct :: item: 6
distinct :: item: 6
distinct :: item: 2
take :: counter: 2

2

distinct :: item: 1
take :: counter: 3
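
The last two trace lines show that `take` requests a fourth item from `distinct` before it notices that the limit was reached. A variant that stops right after yielding the last wanted item could look like the sketch below. The name `take_exact` is hypothetical; `itertools.islice` from the standard library gives the same result.

```python
from itertools import islice

def take_exact(num_elem, iterable):
    """Yield at most `num_elem` items without requesting an extra one."""
    if num_elem <= 0:
        return
    for counter, item in enumerate(iterable, start=1):
        yield item
        if counter == num_elem:   # stop before the loop pulls another item
            return

source = iter([3, 6, 2, 1])
taken = list(take_exact(3, source))
leftover = next(source)           # the fourth item is still available

source2 = iter([3, 6, 2, 1])
taken_islice = list(islice(source2, 3))   # standard-library counterpart
leftover_islice = next(source2)
print(taken, leftover)
```

This matters when the upstream iterator is expensive or has side effects, since no value is fetched and then thrown away.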


## Laziness and Infinite Sequences

Generators are **lazy**, meaning that computation only happens when the next result is requested.

They do only as much work as is needed to produce the requested data.

This allows them to model **infinite (or very large) sequences**.

The sequence is never held in memory as a whole; each value is produced only when it is requested.

Examples of large sequences:
* Sensor readings
* Mathematical sequences
* Contents of large files
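
As an illustration of the file case, the sketch below filters lines lazily. The `error_lines` generator and the log contents are made up for this example, and `io.StringIO` stands in for a large file opened with `open(...)`.

```python
import io

def error_lines(lines):
    """Lazily yield stripped lines that contain the word ERROR."""
    for line in lines:            # file-like objects yield one line at a time
        if 'ERROR' in line:
            yield line.strip()

# io.StringIO is a stand-in for a real (possibly huge) log file
fake_log = io.StringIO(
    'INFO start\nERROR disk full\nINFO retry\nERROR timeout\n'
)
errors = error_lines(fake_log)
first_error = next(errors)        # reading pauses as soon as a match is found
remaining_errors = list(errors)   # consume the rest on demand
print(first_error, remaining_errors)
```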

**Implementation Example**

Let's implement an infinite mathematical sequence known as the Lucas numbers.

The first two elements are 2 and 1. Each subsequent element is the sum of the two preceding ones.

In [1]:
def lucas():
    yield 2
    a, b = 2, 1
    while True:
        yield b
        a, b = b, a+b

In [7]:
for x in lucas():
    if x < 1e6:
        print(x, end=', ')
    else:
        break

2, 1, 3, 4, 7, 11, 18, 29, 47, 76, 123, 199, 322, 521, 843, 1364, 2207, 3571, 5778, 9349, 15127, 24476, 39603, 64079, 103682, 167761, 271443, 439204, 710647, 
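
The loop with `break` above can also be expressed with `itertools`, which provides tools for cutting a finite piece out of an infinite generator. The following sketch repeats the `lucas` definition so that it runs on its own; the limits (10 items, values below 100) are arbitrary choices for the example.

```python
from itertools import islice, takewhile

def lucas():
    yield 2
    a, b = 2, 1
    while True:
        yield b
        a, b = b, a + b

first_ten = list(islice(lucas(), 10))                         # by count
below_hundred = list(takewhile(lambda x: x < 100, lucas()))   # by condition
print(first_ten)
print(below_hundred)
```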

## Generator expressions

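A generator expression is a single-line expression with the same syntax as a list or set comprehension, but enclosed in parentheses. It produces a generator object instead of building the whole collection.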
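
A rough way to see the difference from a list comprehension is to compare the size of the two objects. This is only a sketch: `sys.getsizeof` reports the size of the container object itself, not of the integers it refers to, and the exact numbers vary between Python versions.

```python
import sys

squares_list = [x * x for x in range(1000)]   # all 1000 values are built now
squares_gen = (x * x for x in range(1000))    # nothing is computed yet

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
same_total = sum(squares_list) == sum(squares_gen)
print(list_size > gen_size, same_total)
```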

In [12]:
million_squares = (x*x for x in range(1, int(1e6)+1))
million_squares

<generator object <genexpr> at 0x0000019B8880C7B0>

In [13]:
print(list(million_squares)[-10:])

[999982000081, 999984000064, 999986000049, 999988000036, 999990000025, 999992000016, 999994000009, 999996000004, 999998000001, 1000000000000]


In [14]:
print(list(million_squares)[-10:])

[]


Once a generator has been consumed, it is exhausted and yields nothing more.

To iterate again, you must evaluate the generator expression again to create a new generator.
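
One convenient way to get a fresh generator each time is to wrap the expression in a function. The sketch below uses a hypothetical helper named `squares`.

```python
def squares(n):
    """Return a fresh generator of the first n squares on every call."""
    return (x * x for x in range(1, n + 1))

gen = squares(3)
first_pass = list(gen)
second_pass = list(gen)          # same object, already exhausted
fresh_pass = list(squares(3))    # new generator, full sequence again
print(first_pass, second_pass, fresh_pass)
```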

In [16]:
# Using a function with a generator expression
sum(x*x for x in range(1, int(1e6)+1))

333333833333500000

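This is fast and needs little memory, because the generator produces the values one at a time instead of building a list of a million elements first.

Note: when a generator expression is the only argument of a function call, its own parentheses () can be omitted.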
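
The parentheses can be dropped only when the generator expression is the sole argument. With more than one argument they are required, as this short sketch with `sum` and its optional start value shows.

```python
# Sole argument: the call's parentheses are enough
total = sum(x * x for x in range(4))

# Second argument present: the generator expression needs its own parentheses
total_from_10 = sum((x * x for x in range(4)), 10)

print(total, total_from_10)
```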

In [30]:
# Generator expressions with a filter condition
sum(x for x in range(1, 1001) if x%2==0), sum(x for x in range(1, 1001) if x%3==0)

(250500, 166833)