# Iterables, Iterators and Generators

## Why sequences are iterable
Whenever the interpreter needs to iterate over an object `x`, it calls `iter(x)`. The `iter` built-in function will:
1. Check whether the object implements `__iter__` and, if so, call it to obtain an iterator.
2. If `__iter__` is not implemented but `__getitem__` is, create an iterator that attempts to fetch items in order, starting from index 0.
3. If that fails too, raise `TypeError`, usually with a message like `'C' object is not iterable`.

Because all sequences in Python implement `__getitem__`, they are all iterable. However, it is still recommended to implement `__iter__` explicitly, because the `__getitem__` fallback exists for backwards compatibility and may be removed in the future.
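The lookup order described above can be observed with two small classes. This is a minimal sketch; `Spam` and `Eggs` are hypothetical names used only for illustration.

```python
class Spam:
    """Hypothetical class: no __iter__, only __getitem__."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        # The IndexError raised by the list past the last item
        # tells the fallback iterator to stop.
        return self._items[index]


class Eggs:
    """Hypothetical class with neither __iter__ nor __getitem__."""


# iter() finds no __iter__, so it falls back to __getitem__ from index 0.
fallback_result = list(iter(Spam('abc')))
print(fallback_result)

# With neither method available, iter() raises TypeError.
try:
    iter(Eggs())
    eggs_error = None
except TypeError as exc:
    eggs_error = str(exc)
print(eggs_error)
```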

## Iterables vs Iterators
>iterable: Any object from which the built-in function `iter()` can obtain an iterator.

The standard interface for an iterator has 2 methods:
- `__next__`: Returns the next available item, raising `StopIteration` if there are no more items
- `__iter__`: Returns `self`. This allows iterators to be used wherever an iterable is expected, for example in a `for` loop.

So every iterator is also an iterable, but an iterable is not necessarily an iterator (most iterables, such as lists, have no `__next__` method).

The best way to check whether an object `x` is an iterator is `isinstance(x, abc.Iterator)`, where `abc` is `collections.abc`. The check works even if the class of `x` is not a real or virtual subclass of `abc.Iterator`, because `Iterator.__subclasshook__` tests whether the class implements both `__next__` and `__iter__`.
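As a rough illustration of that check, the sketch below defines a hypothetical `Countdown` class that implements both methods without inheriting from `abc.Iterator`, then compares it with a plain list.

```python
from collections import abc


class Countdown:
    """Hypothetical iterator; it does not inherit from abc.Iterator."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # Iterators return themselves from __iter__.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


cd = Countdown(3)
is_iterator = isinstance(cd, abc.Iterator)            # has __next__ and __iter__
list_is_iterator = isinstance([1, 2], abc.Iterator)   # a list has no __next__
list_is_iterable = isinstance([1, 2], abc.Iterable)   # but it does have __iter__
iter_returns_self = iter(cd) is cd

print(is_iterator, list_is_iterator, list_is_iterable, iter_returns_self)
```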

There is no way to check whether there are remaining items (or how many), other than to call `next()` and catch `StopIteration` when it happens. Also, it is not possible to reset a depleted iterator; you must call `iter()` on the iterable again to get a fresh one.
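A short sketch of what depletion looks like in practice. The two-argument form `next(it, default)` is used here as a convenience that returns a default instead of raising `StopIteration`; the `sentinel` name is just an illustrative choice.

```python
letters = ['a', 'b']

it = iter(letters)
first_pass = list(it)    # consumes every item
second_pass = list(it)   # the same iterator is now depleted

# next() with a default avoids the try/except around StopIteration.
sentinel = object()
after_exhaustion = next(it, sentinel)

# The only way to start over is to ask the iterable for a new iterator.
fresh_pass = list(iter(letters))

print(first_pass, second_pass, after_exhaustion is sentinel, fresh_pass)
```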

It is a **bad idea** to turn an iterable (like `Sentence`) into an iterator by implementing a `__next__` method, because it should be possible to obtain multiple *independent* iterators from the same iterable, and each iterator must keep its own internal state.
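One way to keep the two roles separate is the classic iterator pattern: the iterable builds a new iterator object on every `iter()` call. The sketch below is an assumption-laden illustration, not the notebook's own code; `IterableSentence` and `SentenceIterator` are hypothetical names.

```python
import re

RE_WORD = re.compile(r'\w+')


class SentenceIterator:
    """Hypothetical iterator that keeps its own position."""

    def __init__(self, words):
        self.words = words
        self.index = 0

    def __next__(self):
        try:
            word = self.words[self.index]
        except IndexError:
            raise StopIteration() from None
        self.index += 1
        return word

    def __iter__(self):
        return self


class IterableSentence:
    """Hypothetical iterable: it has __iter__ but no __next__."""

    def __init__(self, text):
        self.words = RE_WORD.findall(text)

    def __iter__(self):
        # A new iterator, with fresh state, for every call.
        return SentenceIterator(self.words)


s = IterableSentence('one two three')
it1 = iter(s)
it2 = iter(s)
a = next(it1)
b = next(it1)
c = next(it2)   # it2 is unaffected by the progress of it1
print(a, b, c)
```

Because each iterator carries its own `index`, advancing `it1` does not move `it2`.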

In [13]:
# Implementing a for loop by hand
# For loop
s = 'ABC'
for c in s:
    print(c)
    
# cumbersome way
it = iter(s)
while True:
    try:
        c = next(it)
        print(c)
    except StopIteration:
        del it
        break

A
B
C
A
B
C


In [16]:
# First example: a Sentence class that is a Sequence and therefore is also an iterable, but indirectly.
import re
import reprlib

RE_WORD = re.compile(r'\w+')  # raw string avoids an invalid escape sequence warning

class Sentence:
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)
        
    def __getitem__(self, index):
        return self.words[index]
    
    def __len__(self):
        return len(self.words)
    
    def __repr__(self):
        return f'Sentence({reprlib.repr(self.text)})'
    

In [20]:
# Make it an explicit iterable by implementing __iter__ as a generator function
class SentenceV2:
    
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)
        
    def __repr__(self):
        return f'Sentence({reprlib.repr(self.text)})'
    
    def __iter__(self):
        # This could be replaced with
        # return iter(self.words)
        for word in self.words:
            yield word

In [21]:
s = SentenceV2("My name is Cristobal and I'm in a plane")
print(s)
for w in s:
    print(w)

Sentence("My name is C...'m in a plane")
My
name
is
Cristobal
and
I
m
in
a
plane


## Generators

A generator function is any function that has the `yield` keyword in its body. Such a function is a *generator factory*: each time it is called, it returns a new generator.

>Generators are iterators that produce the values of the expression passed to the `yield` statement
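
A minimal sketch of the factory behaviour, using a hypothetical generator function `gen_123`: each call returns a separate generator object, and each generator advances independently.

```python
from collections import abc


def gen_123():
    """Hypothetical generator function: the body runs only on next()."""
    yield 1
    yield 2
    yield 3


g1 = gen_123()
g2 = gen_123()

distinct = g1 is not g2                        # every call builds a new generator
is_iterator = isinstance(g1, abc.Iterator)     # generators are iterators

first = next(g1)        # runs the body up to the first yield
rest = list(g1)         # resumes after the first yield
untouched = list(g2)    # g2 starts from the beginning

print(distinct, is_iterator, first, rest, untouched)
```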

### Generator expressions
Like a list comprehension, but returning a generator instead of a list, so items are produced lazily, one at a time. The syntax is the same as for listcomps, but using `()` instead of `[]`.

In [26]:
words = 'my name is cristobal'.split()
l1 = [word for word in words if len(word)>2]
for word in l1:
    print(word)

g1 = (word for word in words if len(word)>2)
for word in g1:
    print(word)

name
cristobal
name
cristobal
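The practical difference from a list comprehension is that a generator expression can be consumed only once. A small sketch, reusing the same sample words; the `lengths` name is illustrative.

```python
import types

words = 'my name is cristobal'.split()

# No list of lengths is built here; values are computed on demand.
lengths = (len(word) for word in words)
is_generator = isinstance(lengths, types.GeneratorType)

total = sum(lengths)       # consumes the generator
leftover = list(lengths)   # nothing is left after one pass

print(is_generator, total, leftover)
```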


## Generators in the standard library
Some generators in the standard library module `itertools`:

### `itertools.count`
An infinite generator that yields numbers beginning at a given `start` and increasing by a given `step` each time.

In [34]:
import itertools

gen = itertools.count(start=0, step=2)

print(next(gen))
print(next(gen))

0
2


### `itertools.takewhile`

Returns a generator that consumes another generator and yields its items for as long as a given predicate evaluates to true, stopping at the first item for which it does not. A predicate is a one-argument function that returns a `bool`; it is applied to each input item in turn to decide whether the output continues.

In [35]:
gen = itertools.takewhile(lambda x: x < 10, itertools.count(start=0, step=3))
print(list(gen))

[0, 3, 6, 9]
