# Generators

## Resources:
 * [Python Generators at Programiz.com](https://www.programiz.com/python-programming/generator)

## What Is a Generator?

A **generator** is a function that **returns an iterator** which we can iterate over one value at a time.

Technically speaking, to create a generator function we define a regular function with **at least one `yield`** statement, which is similar to a `return` statement. The difference is the following:
 * `return` terminates the function entirely
 * `yield` pauses the function, saving all of its state, and execution resumes from that point when the next value is requested
 
Note that a generator function may contain multiple `yield` and `return` statements. Reaching a `return` ends the iteration.

Example of a simple generator function:

In [42]:
def gnr3():
    '''A simple generator function
    with 3 `yield` statements'''
    n = 1
    print("1st yield")
    yield n # <-- as soon as we use `yield` in definition - function becomes generator
    
    n += 1
    print("2nd yield")
    yield n
    
    n += 1
    print("3rd yield")
    yield n

g = gnr3() # <-- function is called, but no execution started
print(g) # <-- function returns iterator - object of 'generator' type
print(next(g)) # <-- get 1st item of the iterator
print(next(g)) # <-- get 2nd item of the iterator
# or
print(g.__next__())   # <-- get 3rd item of the iterator
print(g.__next__())   # <-- no items left, will raise error `StopIteration`

<generator object gnr3 at 0x0000028431870E48>
1st yield
1
2nd yield
2
3rd yield
3


StopIteration: 

Note that in the example above the value of the variable `n` is remembered between calls.
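
Regarding `return` inside a generator function, here is a minimal sketch of how it ends the iteration early. The function name `countdown` and its parameters are hypothetical and chosen only for illustration.

```python
def countdown(start, stop_at):
    """Hypothetical generator: counts down from `start`
    and ends early once `stop_at` is reached."""
    n = start
    while n > 0:
        if n == stop_at:
            return  # ends the generator; a further next() raises StopIteration
        yield n
        n -= 1

values = list(countdown(5, 2))  # list() consumes the generator until it ends
print(values)
```

Here `yield` hands back one value per step, while `return` stops the generator for good, so `2` and `1` are never produced.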

The properties of an iterator apply to the output of a generator function, since it returns an iterator:
 * the generator output can be iterated over only once; to restart the process we need to create a new generator object
 * the generator output can be used directly in a `for` loop, as in the following example

In [22]:
for i in gnr3():
    print(i)

1st yield
1
2nd yield
2
3rd yield
3
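
The "only once" property can be checked with a short sketch. The generator `first_three` is a hypothetical stand-in without `print` calls, so the block is self-contained.

```python
def first_three():
    """Hypothetical generator yielding 1, 2, 3."""
    yield 1
    yield 2
    yield 3

g = first_three()
first_pass = list(g)    # consumes every item
second_pass = list(g)   # the same object is already exhausted
fresh_pass = list(first_three())  # a new generator object starts over

print(first_pass, second_pass, fresh_pass)
```

The second pass is empty because the generator object keeps no copy of the values it has already produced.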


## Generator Expressions

A **generator expression** creates an **anonymous generator**, much as a `lambda` expression creates an anonymous function.

The syntax of a generator expression is very similar to that of a list comprehension:
```
(expression for item in iterable)
```
Notice the **round** brackets.

Note that the major difference between a list comprehension and a generator expression is that a list comprehension produces the entire list at once, while a **generator expression produces one item at a time**.

In [46]:
lst = [1,2,3,4,5,6]
g = (i+1 for i in lst) # <-- use generator expression
print(g)
print(next(g)) # <-- get 1st item of the iterator
print(next(g)) # <-- get 2nd item of the iterator
# or
print(g.__next__()) # <-- get 3rd item of the iterator
print(g.__next__()) # <-- get 4th item of the iterator
print(*g) # <-- get all items left

<generator object <genexpr> at 0x0000028431870F48>
2
3
4
5
6 7


We can pass a generator expression to the `list` function to create a list:

In [47]:
lst = [1,2,3,4,5,6]
g = (i+1 for i in lst) # <-- use generator expression
lst_new = list(g)
print(lst_new)

[2, 3, 4, 5, 6, 7]


Exactly as in a list comprehension, a generator expression can have:
 * **multiple** `for...in` clauses
 * conditions applied to the items of the iterable
 * a condition of the form `if...else` as part of `expression`
 * another generator expression used **inside** it
 * **any iterable object** as its source
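
The variants listed above can be sketched as follows. All variable names are illustrative, and each generator expression is passed to `list` only to make the result visible.

```python
# multiple `for...in` clauses (the rightmost loop varies fastest)
pairs = list((x, y) for x in range(2) for y in "ab")

# a condition that filters the items
evens = list(n for n in range(10) if n % 2 == 0)

# `if...else` as part of the expression itself
labels = list("even" if n % 2 == 0 else "odd" for n in range(4))

# a generator expression nested inside another one
squares_plus_one = list(s + 1 for s in (n * n for n in range(4)))

# any iterable works as the source, here a string
upper_chars = list(c.upper() for c in "abc")

print(pairs)
print(evens)
print(labels)
print(squares_plus_one)
print(upper_chars)
```

When a generator expression is the only argument of a call, as in `list(...)` here, the extra pair of round brackets may be omitted.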

## Why Do We Need Generators?

Generators are **easy to implement**: 
 * methods like `__iter__()` and `__next__()` are implemented automatically
 * when the function terminates, `StopIteration` is raised automatically on further calls
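
The two points above can be verified with a small sketch. The generator `one_two` is hypothetical and exists only for this check.

```python
def one_two():
    """Hypothetical generator yielding 1 and 2."""
    yield 1
    yield 2

g = one_two()

# both iterator methods exist without us having written them
has_iter = hasattr(g, "__iter__")
has_next = hasattr(g, "__next__")
is_own_iterator = iter(g) is g  # a generator object is its own iterator

items = list(g)  # runs the function body to its end

# after the function has terminated, further calls raise StopIteration
try:
    next(g)
    stopped = False
except StopIteration:
    stopped = True

print(has_iter, has_next, is_own_iterator, items, stopped)
```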

Let's compare the definitions of
 * an iterator class that gives us the next power of 2 in each iteration
 * a generator function with the same behavior
 
The example is taken from [programiz.com](https://www.programiz.com/python-programming/generator):

In [23]:
class PowTwo:
    """Class to implement an iterator
    of powers of two"""

    def __init__(self, max = 0):
        self.max = max

    def __iter__(self):
        self.n = 0
        return self

    def __next__(self):
        if self.n <= self.max:
            result = 2 ** self.n
            self.n += 1
            return result
        else:
            raise StopIteration

x = PowTwo(3) # <-- returns iterable object
it = iter(x) # <-- creates iterator

print(next(it))
print(next(it))
# or
print(it.__next__())
print(it.__next__())

1
2
4
8


In [27]:
def PowTwoGnr(max = 0):
    """Generator function to implement an iterator
    of powers of two"""
    n = 0
    while n <= max:
        yield 2 ** n
        n += 1

it = PowTwoGnr(3) # <-- returns iterator directly, no `iter()` call needed

print(next(it))
print(next(it))
# or
print(it.__next__())
print(it.__next__())

1
2
4
8


Note that the generator function gives the same result but is much simpler to implement: most of the boilerplate code that is explicit in the iterator class definition is handled implicitly.

Generators are **memory efficient**:
 * a normal function that returns a sequence creates the entire sequence in memory before returning the result
 * a generator function produces one item at a time
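
As a rough illustration of the difference, the sketch below compares the size of a list with the size of an equivalent generator object. The variable names are illustrative. Note that `sys.getsizeof` reports only the container itself and not the integers it refers to, so the real gap is even larger than the numbers suggest.

```python
import sys

n = 100_000

squares_list = [i * i for i in range(n)]  # all items exist in memory at once
squares_gen = (i * i for i in range(n))   # items are produced on demand

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)

print(gen_size < list_size)

# the generator can still feed a consumer such as sum() item by item
total = sum(squares_gen)
print(total == sum(squares_list))
```

The size of the generator object stays the same no matter how large `n` is, while the list grows with `n`.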

Since a generator produces only one item at a time, it can represent an **infinite stream** of data, which otherwise could not be stored in memory. For instance, an infinite stream of odd numbers:

In [30]:
def AllOdd():
    n = 1
    while True:
        yield n
        n += 2

odd = AllOdd()
print(next(odd))
print(next(odd))
# or
print(odd.__next__())
print(odd.__next__())

1
3
5
7
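
Calling `list` on an infinite generator would never finish, so a bounded slice has to be taken instead. One possible way is `itertools.islice` from the standard library. The sketch below redefines the generator under the hypothetical name `all_odd` to stay self-contained.

```python
from itertools import islice

def all_odd():
    """Infinite stream of odd numbers (same logic as the example above)."""
    n = 1
    while True:
        yield n
        n += 2

# islice stops after 5 items, so the infinite loop is never run to the end
first_five = list(islice(all_odd(), 5))
print(first_five)
```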


## Pipeline Generator

Generators are lazily evaluated, producing items only when asked for; therefore they can be used to pipeline a series of operations, as in the following example:

In [38]:
with open('../data/test/test_14L.txt') as file:
    letter_2 = (line[1] for line in file) # <-- creates iterator over the 2nd character of each line
    vowel = (l for l in letter_2 if l in ['a','o','i','u','e']) # <-- creates iterator over vowels only
    print(*vowel) # <-- evaluates

o o u o o o o


Note that instead of a small file we could run this pipeline over the entire corpus of English literature and it would take very little memory, since we never read it all at once but access it line by line and letter by letter.
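
The same kind of pipeline can be tried without a file on disk. The sketch below assumes an in-memory text stream (`io.StringIO`) with made-up sample lines in place of the original data file, and adds one extra stage that skips lines shorter than two characters, since `line[1]` would raise `IndexError` on them.

```python
import io

# made-up sample data standing in for a real text file
sample = io.StringIO("hot\ncat\nsun\n\nbyte\n")

VOWELS = set("aeiou")

# stage 1: keep only lines that have at least two characters
long_enough = (line for line in sample if len(line.rstrip("\n")) >= 2)
# stage 2: take the 2nd character of each remaining line
second_letters = (line[1] for line in long_enough)
# stage 3: keep vowels only
vowels = (ch for ch in second_letters if ch in VOWELS)

# nothing has been read so far; list() pulls items through all stages
result = list(vowels)
print(result)
```

Each stage asks the previous one for a single item at a time, so only one line of the stream is being processed at any moment.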