# Iteration: the untold story

## Inside Python's Iteration Protocol

### Naomi Ceder
#### 2021-06-23 

#### Former chair, Python Software Foundation

#### https://naomiceder.tech, @naomiceder

### Iteration, the untold story

Description: How the iteration protocol works in Python

Topics:
* Intro/What makes iteration special in Python
* The iteration protocol
   * iterables - create a functioning iterable class (live coding)
   * iterators - create a functioning iterator class
   * generators - create an iterator with a generator function
* Conclusions & Questions

Level: This should be understandable for Python beginners and interesting for more advanced Pythonistas.

## Before we start 

This notebook can (will) be found at https://github.com/nceder/talks

*The Quick Python Book*, 3rd ed - http://bit.ly/quick-python (variables project is free)

*Exploring Python Fundamentals*, Manning liveproject - https://www.manning.com/bundles/exploring-python-fundamentals-pt1-ser


## Iteration = repetition with code and data

## Iteration protocol

### “Python’s most powerful useful feature”

-- Dave Beazley, "[Iterations of Evolution: The Unauthorized Biography of the For-Loop](https://www.youtube.com/watch?v=2AXuhgid7E4)"

In [None]:
# for loop (Python style)
a_list = [1, 2, 3, 4]

for item in a_list:
    print(item)

## Obvious, right?

It wasn't always so obvious...

## It *used* to be surprising

### Python and `for` loops

The `for` statement in Python differs a bit from what you may be
used to in C or Pascal.  Rather than always iterating over an
arithmetic progression of numbers (like in Pascal), or leaving the user
completely free in the iteration test and step (as C), Python's for 
statement iterates over the items of any sequence (e.g., a list
or a string), in the order that they appear in the sequence.

-- Python V 1.1 Docs, 1994

### And it works the same for different types
* `for key in a_dictionary:`
* `for char in a_string:`
* `for record in query_results:`
* `for line in a_file:`

etc...

## How does that work?

* **How does a `for` loop know the “next” item?**
* **How can `for` loops use so many different types?**
* **What makes an object “work” in a `for` loop?**

## Iteration protocol

* iteration in Python relies on a **protocol**, not types (from Python 2.2)
* It's a good example of Python's “duck typing” - anything that follows the protocol can be iterated over

### Iteration Protocol: 
* for iteration you need an **iterable** object
* and an **iterator** (which Python usually handles for you)
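
As a minimal sketch of those two roles, the built-in `iter()` and `next()` functions can be called by hand on a list. The variable names are illustrative only.

```python
# The iterable: a plain list
a_list = [1, 2, 3]

# Ask the iterable for an iterator (a for loop does this for you)
an_iterator = iter(a_list)

# The iterator hands back one item per call to next()
first = next(an_iterator)
second = next(an_iterator)
third = next(an_iterator)
print(first, second, third)

# Passing a default avoids StopIteration once the items run out
print(next(an_iterator, "no more items"))
```

The list itself never tracks a position; only the iterator does.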

## iterable

An object capable of returning its members **one at a time.** Examples of iterables include **all sequence types** (such as `list`, `str`, and `tuple`) and **some non-sequence types** like `dict`, file objects, and objects of any **classes you define** with an **`__iter__()`** method or with a **`__getitem__()`** method that implements Sequence semantics.

Iterables can be used in a `for` loop and in many other places where a sequence is needed (`zip()`, `map()`, …). When an iterable object is passed as an argument to the built-in function `iter()`, it returns an **iterator** for the object. This iterator is good for **one pass** over the set of values. When using iterables, it is usually **not necessary to call `iter()`** or deal with iterator objects yourself. The `for` statement **does that automatically for you,** creating a **temporary unnamed variable** to hold the iterator for the duration of the loop. *See also iterator, sequence, and generator.*

--Python glossary

## Iterable
* returns members one at a time
* e.g., `list`, `str`, `tuple` (sequence types)
* any class with `__iter__()` method that returns iterator
* **or** any class with `__getitem__()` with sequence semantics
* `for` statement creates an unnamed iterator from iterable automatically

### An iterable...

must return an iterator when the `iter()` function is called on it.

#### There are 2 ways an object can return an iterator - it can
* have a **`__getitem__()`** method with Sequence semantics - i.e., access items by integer index in [ ].
* implement an **`__iter__()`** method that returns an iterator (more on this soon)


### Is it an iterable?
* Does it have an `__iter__()` method?

In [None]:
# check with hasattr
a_list = [1, 2, 3, 4]

hasattr(a_list, "__iter__")

* Does it have `__getitem__()` that is sequence compliant? (harder to decide)
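
Because a `hasattr` check misses the `__getitem__()` case, one practical approach is to call `iter()` and see whether it raises `TypeError`. The helper `is_iterable` and the class `OnlyGetItem` below are hypothetical names made up for this sketch, not part of the standard library.

```python
def is_iterable(obj):
    """Return True if iter() can build an iterator from obj."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


class OnlyGetItem:
    """Hypothetical class with no __iter__(), only __getitem__()."""

    def __getitem__(self, index):
        if 0 <= index < 3:
            return index
        raise IndexError(index)


print(is_iterable([1, 2, 3]))               # a list
print(is_iterable(42))                      # an int
print(hasattr(OnlyGetItem(), "__iter__"))   # the hasattr check
print(is_iterable(OnlyGetItem()))           # the iter() check
```

The last two lines disagree, which is why the `__getitem__()` route is harder to detect by inspection alone.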

## Let’s make an iterable -  `Repeater`

An object that can be iterated over and returns the same value for the specified number of times.

We'll implement it using `__getitem__()`

```
repeat = Repeater("hello", 4)

for i in repeat:
    print(i)

hello
hello
hello
hello
```

In [None]:
class Repeater:
    def __init__(self, value, limit):
        self.value = value
        self.limit = limit
        
    def __getitem__(self, index):
        if 0 <= index < self.limit:
            return self.value
        else:
            raise IndexError
        
        

In [None]:
repeat = Repeater("hello", 4)

# does it have an __iter__ method?
hasattr(repeat, "__iter__")

In [None]:
# __getitem__ with sequence semantics?

repeat[0]

In [None]:
# can the iter() function return an iterator?

iter(repeat)

In [None]:
# for loop

for item in repeat:
    print(item)

### Behind the scenes

* an iterator is being created from the `repeat` object
* it can return the items using integer indexes starting from 0
* it continues until an `IndexError` is raised
* each time the object is iterated over, a new iterator is created and it starts from the beginning
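
Those steps can be approximated in plain Python. The function `walk_by_index` below is a hypothetical stand-in for what the automatically created iterator does; it is a sketch of the behavior, not the interpreter's actual implementation.

```python
class Repeater:
    def __init__(self, value, limit):
        self.value = value
        self.limit = limit

    def __getitem__(self, index):
        if 0 <= index < self.limit:
            return self.value
        raise IndexError(index)


def walk_by_index(obj):
    """Hypothetical helper: collect items by asking for index 0, 1, 2, ..."""
    items = []
    index = 0
    while True:
        try:
            items.append(obj[index])
        except IndexError:
            break              # IndexError signals the end of the items
        index += 1
    return items


repeat = Repeater("hello", 4)
print(walk_by_index(repeat))
print(list(repeat))            # the built-in machinery yields the same items
```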

In [None]:
# list comprehension

[x for x in repeat]

In [None]:
class Repeater:
    def __init__(self, value, limit):
        self.value = value
        self.limit = limit
        
    def __getitem__(self, index):   # The bit we need for an iterable
        if 0 <= index < self.limit:
            return self.value
        else:
            raise IndexError # only needed if we want iteration to end

### Yes, it's really that simple...

* ONLY the `__getitem__()` method was needed
* an IndexError is needed to end iteration


## But... what's an *Iterator*?

The Python `for` loop relies on being able to get a **next** item, but...

* the **iterable** doesn't know which item is next
* **the loop itself doesn't care** exactly where in the series that item is (or what type it is)
* the loop relies on the **iterator** to keep track of what's next
* any object that can do that can be iterated over, i.e., it is an **iterator**


An **iterator** has a `__next__()` method (`next()` in Python 2) that tracks and returns the next item in the series; the built-in `next()` function calls that method for you.

### iterator

An object representing **a stream of data**. Repeated **calls to the iterator’s `__next__()`** method (or passing it to the built-in function `next()`) **return successive items** in the stream. When **no more data are available a StopIteration exception is raised** instead. At this point, the iterator object is exhausted and any further calls to its `__next__()` method just raise StopIteration again... 

...Iterators are required to have an `__iter__()` method that returns the iterator object itself so **every iterator is also iterable** and may be used in most places where other iterables are accepted. One notable exception is code which attempts multiple iteration passes. **A container object (such as a list) produces a fresh new iterator each time** you pass it to the `iter()` function or use it in a for loop. Attempting this **with an iterator will just return the same exhausted iterator object** used in the previous iteration pass, making it appear like an empty container.

--Python glossary

### Iterator
* has `__next__()` method
* calls to `__next__()` method (`next()` function) return successive items
* raises `StopIteration` when no more data
* further calls just raise `StopIteration`
* must have `__iter__()` method, which returns self
* iterators are therefore iterables
* once exhausted they do not “refresh”
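
The difference between a container and an iterator shows up as soon as you try a second pass. This is a small sketch using a list and the iterator that `iter()` builds from it; the variable names are illustrative.

```python
a_list = [1, 2, 3]

# A container gives a fresh iterator on every pass
first_pass = [item for item in a_list]
second_pass = [item for item in a_list]
print(first_pass, second_pass)

# An iterator is good for one pass only
an_iterator = iter(a_list)
print(iter(an_iterator) is an_iterator)   # __iter__() returns the iterator itself
first_pass_it = list(an_iterator)
second_pass_it = list(an_iterator)        # already exhausted
print(first_pass_it, second_pass_it)
```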

### Let’s make an iterator - `RepeatIterator`

* implement `__next__()` method to return next item
* implement `__iter__()` method to return itself

In [None]:
class RepeatIterator:
    def __init__(self, value, limit):
        self.value = value
        self.limit = limit
        self.count = 0
        
    def __next__(self):
        if self.count < self.limit:
            self.count += 1
            return self.value
        else:
            raise StopIteration
            
    def __iter__(self):
        return self

In [None]:
repeat_iter = RepeatIterator("Hi", 4)

# __getitem__ with sequence semantics? (no: this raises TypeError)
repeat_iter[0]

In [None]:
repeat_iter = RepeatIterator("Hi", 4)
# does it have an __iter__ method?
hasattr(repeat_iter, "__iter__")

In [None]:
# does it return next item using next() function?

next(repeat_iter)

In [None]:
# calling iter on it, returns object itself
print(repeat_iter)

repeat_iter_iter = iter(repeat_iter)
print(repeat_iter_iter)

In [None]:
# calling iter() on iterable always returns new iterator
print(repeat)
old_repeat_iter = iter(repeat)
print(old_repeat_iter)

In [None]:
# after 1 next(), how many repetitions left?


for item in repeat_iter:
    print(item) 


### So making an iterator is pretty easy, too...
* `__next__()` method 
* `__iter__()` method that returns self
* “exhaustion” after one pass

In [None]:
# Let's loop again

for item in repeat_iter:
    print(item)


In [None]:
# one more next? (the iterator is exhausted, so this raises StopIteration)
next(repeat_iter)


### Behind the scenes
* `for` called `iter()` on `repeat_iter`, which returned itself (held in an anonymous variable)
* `for` called `next()` on iterator to get values for loop
* `for` caught `StopIteration` and stopped iterating
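
The three steps above can be sketched as an explicit loop. The helper `for_each` is a hypothetical name invented for this illustration; it approximates what the `for` statement does and is not how the interpreter implements it.

```python
def for_each(iterable, action):
    """Hypothetical helper that mimics `for item in iterable: action(item)`."""
    iterator = iter(iterable)          # step 1: get an iterator
    while True:
        try:
            item = next(iterator)      # step 2: ask for the next value
        except StopIteration:
            break                      # step 3: stop quietly when exhausted
        action(item)


collected = []
for_each("abc", collected.append)
print(collected)
```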

### (but you probably want to use a generator instead)

## generator functions

Generator functions are an easier way to write iterators.

* They use the `yield` keyword instead of `return` (a `yield` in the body makes a function a generator function)
* Calling a generator function returns a generator object, which is an iterator
* The generator object saves its state between items, so it knows which one is next

In [None]:
def repeat_gen(value, limit):
    for i in range(limit):
        yield value

repeat_gen_obj = repeat_gen("olá", 5)
print(repeat_gen_obj)

# a generator object has both iterator methods
print(hasattr(repeat_gen_obj, "__next__"))
print(hasattr(repeat_gen_obj, "__iter__"))

# iterate over it (good for one pass only)
for item in repeat_gen_obj:
    print(item)

### Behind the scenes

* calling a generator function (one containing `yield`) creates a generator object
* the generator object is an iterator
* the generator object saves its state each time it reaches a `yield`
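
A sketch that makes the pausing visible: the hypothetical generator function `countdown` prints a message when its body starts and when it finishes, so you can see that nothing runs until the first `next()` call.

```python
def countdown(start):
    """Hypothetical generator used to show where execution pauses."""
    print("starting")
    while start > 0:
        yield start            # pause here and hand back a value
        start -= 1             # resume here on the following next() call
    print("finished")


gen = countdown(2)             # nothing runs yet, not even the first print
print(type(gen).__name__)
print(next(gen))               # body runs up to the first yield
print(next(gen))               # resumes after the yield, loops, yields again
print(next(gen, "exhausted"))  # body runs to the end; the default is returned
```

The local variable `start` survives between calls, which is the saved state that a class-based iterator has to store on `self`.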

## Iteration in Python

* is a **protocol** (since Python 2.2)
* requires an **iterable** to iterate over
* requires an **iterator** (often automatically created behind the scenes) to track what's **next**
* **iterators can be used as iterables,** but don't "renew"
* use **generator** functions to create iterators, rather than classes
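
One way to combine these points, offered as a sketch rather than the only approach: write an iterable class whose `__iter__()` method is itself a generator function. Each call to `iter()` then produces a fresh generator object, so the object can be looped over repeatedly. The class name `Repeater` is reused here for illustration.

```python
class Repeater:
    """Iterable whose __iter__() is written as a generator function."""

    def __init__(self, value, limit):
        self.value = value
        self.limit = limit

    def __iter__(self):
        # Each call creates a new generator object, i.e. a fresh iterator
        for _ in range(self.limit):
            yield self.value


repeat = Repeater("hello", 2)
print(list(repeat))
print(list(repeat))    # a second pass works, unlike with a bare iterator
```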


## Resources

* [Python Tutorial - iterators](https://docs.python.org/2.7/tutorial/classes.html#iterators)
* [Python Tutorial - generators](https://docs.python.org/2.7/tutorial/classes.html#generators)
* [Python Tutorial - generator expressions](https://docs.python.org/2.7/tutorial/classes.html#generator-expressions)
* [Iterator types documentation](https://docs.python.org/dev/library/stdtypes.html#iterator-types)
* [Iterators, Functional Programming HOWTO](https://docs.python.org/dev/howto/functional.html#iterators)
* [Iterations of Evolution: The Unauthorized Biography of the For-Loop](https://www.youtube.com/watch?v=2AXuhgid7E4) - Dave Beazley, PyCon Pakistan 2017