# Iterators

## Iterable objects and iterators

You have probably noticed that most container objects can be looped over using a `for` statement. 

The `for` loop works with any **iterable** object (an object that can be iterated over).

In [2]:
for element in [1, 2, 3]:
    print(element)
    
print("-"*100)

with open("./P10-iterators.ipynb") as f:
    for line in f:
        print(line)
        if "id" in line:
            break
        

1
2
3
----------------------------------------------------------------------------------------------------
{

 "cells": [

  {

   "cell_type": "markdown",

   "id": "07601fbe",



An **iterable** object returns an **iterator** when passed to the **iter()** function. 

An **iterator** is an object used to iterate over an *iterable* object: it returns the elements one at a time through its **\_\_next_\_()** method.

The **\_\_next_\_()** method is invoked behind the scenes by the built-in function **next()**. In a `for` loop, `next()` itself is called automatically.
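Calling `iter()` and `next()` by hand is a quick way to see the difference between an iterable and an iterator. The variable names below are only illustrative.

```python
numbers = [10, 20, 30]          # a list is iterable, but it is not an iterator

it = iter(numbers)              # calls numbers.__iter__() and returns a list iterator
first = next(it)                # calls it.__next__() -> 10
second = next(it)               # -> 20

# An iterator is itself iterable: iter() on an iterator returns the same object
same_object = iter(it) is it    # True

# A list has no __next__ method, so next() cannot be called on it directly
list_has_next = hasattr(numbers, "__next__")   # False

print(first, second, same_object, list_has_next)   # 10 20 True False
```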

The following `for` loop:

In [83]:
d=[2,3,4]
for e in d:
    print(e, end=" ")

2 3 4 

is equivalent to:

In [84]:
d = [2, 3, 4]

d_iterator = iter(d)          # same as d_iterator = d.__iter__()
while True:
    try:
        e = next(d_iterator)  # same as e = d_iterator.__next__()
        print(e, end=" ")
    except StopIteration:
        break

# Built-ins such as sum(), map(), min(), max() and filter() consume iterables the same way

2 3 4 

Behind the scenes, the `for` statement calls the function **iter()** on the *iterable* object. 

The function *iter()* returns an **iterator** object that defines the method **\_\_next_\_()**, which accesses the elements in the container one at a time (the loop invokes it through *next()*).

When there are no more elements, *\_\_next_\_()* raises a **StopIteration** exception, which tells the `for` loop to terminate.
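The sketch below drives an iterator manually to show what happens at the end: once `StopIteration` has been raised the iterator stays exhausted, and `next()` accepts an optional default value that it returns instead of raising.

```python
it = iter("ab")

print(next(it))   # a
print(next(it))   # b

try:
    next(it)      # no elements left
except StopIteration:
    print("StopIteration raised")

# With a default value, next() returns it instead of raising StopIteration
after_end = next(it, "done")
print(after_end)  # done

# The iterator stays exhausted: looping over it again produces nothing
leftover = list(it)
print(leftover)   # []
```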

## The iterator protocol

To make a class iterable, you only have to define an **\_\_iter_\_()** method.

Behind the scenes, the *iter()* function calls the *\_\_iter_\_()* method on the given object.

The *\_\_iter_\_()* method should return an object with a **\_\_next_\_()** method.

The *\_\_next_\_()* method is the one invoked by *next()*.

**Note**: If the class defines *\_\_next_\_()*, then *\_\_iter_\_()* can just return *self*:


In [9]:
class Reverse:
    "Iterator for looping over a sequence backwards"
    def __init__(self, data):
        self.data = data
    def __iter__(self):
        self.index = len(self.data)
        return self
    def __next__(self):
        if self.index == 0:
            raise StopIteration
        self.index = self.index - 1
        return self.data[self.index]
    
for char in Reverse('spam'):
    print(char)
print("-"*40)

d = Reverse([1, 3, 5, 7, 9])
for i in d:
    print(i)
print("-"*40)
# The second loop also works, because __iter__() resets self.index
for i in d:
    print(i)

m
a
p
s
----------------------------------------
9
7
5
3
1
----------------------------------------
9
7
5
3
1
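Because `Reverse.__iter__()` returns `self`, every loop over the same `Reverse` object shares a single `index`. Two successive loops work (each call to `__iter__()` resets `index`), but two *nested* loops over the same object would interfere with each other. A common alternative, sketched here with the hypothetical class names `ReverseSequence` and `ReverseIterator`, is to have `__iter__()` return a fresh iterator object each time.

```python
class ReverseIterator:
    "Iterator that walks a sequence from the end to the beginning"
    def __init__(self, data):
        self.data = data
        self.index = len(data)

    def __iter__(self):
        return self            # an iterator returns itself

    def __next__(self):
        if self.index == 0:
            raise StopIteration
        self.index -= 1
        return self.data[self.index]


class ReverseSequence:
    "Iterable whose __iter__() creates a new, independent iterator each time"
    def __init__(self, data):
        self.data = data

    def __iter__(self):
        return ReverseIterator(self.data)


r = ReverseSequence([1, 2, 3])
# Each loop gets its own iterator, so nested loops no longer interfere
pairs = [(a, b) for a in r for b in r]
print(len(pairs))   # 9
print(pairs[:4])    # [(3, 3), (3, 2), (3, 1), (2, 3)]
```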


## The itertools module

The **itertools** module, in the standard library, provides many useful tools for working with iterators.

Here are some examples (this is not an exhaustive list of all available functions).

**chain()** – chains multiple iterables together into a single iterator.


In [14]:
import itertools

d1=[1, 2, 3]
d2=[4, 5, 6]
it1 = iter(d1)
it2 = iter(d2)
for e in itertools.chain(it1, it2):
    print(e, end=" ")
print()
# or, more directly, pass the iterables themselves:
for e in itertools.chain(d1, d2):
    print(e, end=" ")

1 2 3 4 5 6 
1 2 3 4 5 6 
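`chain()` also has an alternate constructor, `itertools.chain.from_iterable()`, which is handy when the iterables to chain are themselves stored in a container, for example to flatten a list of lists. A short sketch:

```python
import itertools

# from_iterable() takes ONE iterable whose elements are themselves iterables;
# the inner iterables are consumed lazily, one after the other
nested = [[1, 2], (3, 4), range(5, 7)]
flat = list(itertools.chain.from_iterable(nested))
print(flat)   # [1, 2, 3, 4, 5, 6]
```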

**zip_longest()** – this function accepts any number of iterables as arguments and a `fillvalue` keyword argument that defaults to `None`. It returns an iterator that aggregates elements from each of the iterables. If the iterables are of uneven length, missing values are filled in with `fillvalue`.

In [87]:
x = [1, 2, 3, 4, 5]
y = ['a', 'b', 'c']
print("zip_longest:")
for e in itertools.zip_longest(x, y):
    print(e, end="")
print("\nversus zip:")
for e in zip(x, y):
    print(e, end="")

zip_longest:
(1, 'a')(2, 'b')(3, 'c')(4, None)(5, None)
versus zip:
(1, 'a')(2, 'b')(3, 'c')
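Passing `fillvalue` explicitly replaces the default `None` placeholder; the `'-'` used here is an arbitrary choice.

```python
import itertools

x = [1, 2, 3, 4, 5]
y = ['a', 'b', 'c']

# Missing values from the shorter iterable are replaced by '-'
pairs = list(itertools.zip_longest(x, y, fillvalue='-'))
print(pairs)   # [(1, 'a'), (2, 'b'), (3, 'c'), (4, '-'), (5, '-')]
```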

In [12]:
fr=["ciel", "mer", "terre"]
en=["sky", "sea", 'earth']
fr_en=dict(zip(fr,en))
print(fr_en)
en_fr=dict(zip(en,fr))
print(en_fr)

{'ciel': 'sky', 'mer': 'sea', 'terre': 'earth'}
{'sky': 'ciel', 'sea': 'mer', 'earth': 'terre'}


**combinations()** – this function takes two arguments, an iterable *inputs* and a positive integer *n*, and returns an iterator over tuples of all combinations of *n* elements from *inputs* (the order within a tuple does not matter, and each input element appears at most once per tuple).

In [17]:
cards=[7,8,9,10,"jack","queen", "king", "ace"]
for e in itertools.combinations(cards, 2):
    print(e, end="")

(7, 8)(7, 9)(7, 10)(7, 'jack')(7, 'queen')(7, 'king')(7, 'ace')(8, 9)(8, 10)(8, 'jack')(8, 'queen')(8, 'king')(8, 'ace')(9, 10)(9, 'jack')(9, 'queen')(9, 'king')(9, 'ace')(10, 'jack')(10, 'queen')(10, 'king')(10, 'ace')('jack', 'queen')('jack', 'king')('jack', 'ace')('queen', 'king')('queen', 'ace')('king', 'ace')
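Since order does not matter and no card is reused, the number of tuples produced is the binomial coefficient C(n, r). A quick check with `math.comb()` (available since Python 3.8):

```python
import itertools
import math

cards = [7, 8, 9, 10, "jack", "queen", "king", "ace"]
hands = list(itertools.combinations(cards, 2))

# 8 cards taken 2 at a time: C(8, 2) = 8 * 7 / 2 = 28
print(len(hands), math.comb(len(cards), 2))   # 28 28
```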

**combinations_with_replacement()** – this function returns successive n-length combinations of elements from the iterable, allowing an individual element to be repeated within a combination.

In [18]:
cards=[7,8,9,10,"jack","queen", "king", "ace"]
for e in itertools.combinations_with_replacement(cards, 2):
    print(e, end="")

(7, 7)(7, 8)(7, 9)(7, 10)(7, 'jack')(7, 'queen')(7, 'king')(7, 'ace')(8, 8)(8, 9)(8, 10)(8, 'jack')(8, 'queen')(8, 'king')(8, 'ace')(9, 9)(9, 10)(9, 'jack')(9, 'queen')(9, 'king')(9, 'ace')(10, 10)(10, 'jack')(10, 'queen')(10, 'king')(10, 'ace')('jack', 'jack')('jack', 'queen')('jack', 'king')('jack', 'ace')('queen', 'queen')('queen', 'king')('queen', 'ace')('king', 'king')('king', 'ace')('ace', 'ace')
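With repetition allowed, the count follows the "combinations with repetition" formula C(n + r − 1, r). Checking it against the output above:

```python
import itertools
import math

cards = [7, 8, 9, 10, "jack", "queen", "king", "ace"]
hands = list(itertools.combinations_with_replacement(cards, 2))

# n = 8, r = 2: C(8 + 2 - 1, 2) = C(9, 2) = 36
print(len(hands), math.comb(len(cards) + 2 - 1, 2))   # 36 36
```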

**permutations()** – this function returns successive n-length permutations of elements from the iterable; unlike combinations, order matters, so `(7, 8)` and `(8, 7)` are both produced.

In [19]:
cards=[7,8,9,10,"jack","queen", "king", "ace"]
for e in itertools.permutations(cards, 2):
    print(e, end="")

(7, 8)(7, 9)(7, 10)(7, 'jack')(7, 'queen')(7, 'king')(7, 'ace')(8, 7)(8, 9)(8, 10)(8, 'jack')(8, 'queen')(8, 'king')(8, 'ace')(9, 7)(9, 8)(9, 10)(9, 'jack')(9, 'queen')(9, 'king')(9, 'ace')(10, 7)(10, 8)(10, 9)(10, 'jack')(10, 'queen')(10, 'king')(10, 'ace')('jack', 7)('jack', 8)('jack', 9)('jack', 10)('jack', 'queen')('jack', 'king')('jack', 'ace')('queen', 7)('queen', 8)('queen', 9)('queen', 10)('queen', 'jack')('queen', 'king')('queen', 'ace')('king', 7)('king', 8)('king', 9)('king', 10)('king', 'jack')('king', 'queen')('king', 'ace')('ace', 7)('ace', 8)('ace', 9)('ace', 10)('ace', 'jack')('ace', 'queen')('ace', 'king')
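The number of ordered pairs is P(n, r) = n! / (n − r)!, which `math.perm()` (Python 3.8+) computes directly. When the length argument is omitted, full-length permutations are produced.

```python
import itertools
import math

cards = [7, 8, 9, 10, "jack", "queen", "king", "ace"]
ordered_pairs = list(itertools.permutations(cards, 2))

# Order matters, so (7, 8) and (8, 7) are distinct: P(8, 2) = 8 * 7 = 56
print(len(ordered_pairs), math.perm(len(cards), 2))   # 56 56

# Without the second argument, every full-length ordering is produced
words = ["".join(p) for p in itertools.permutations("abc")]
print(words)   # ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']
```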

**count()** – this function counts, starting by default from 0. You can start counting from any number you like by setting the `start` keyword argument (default 0), and you can set the `step` keyword argument to choose the interval between the returned numbers (default 1).
In some ways, `count()` is similar to the built-in `range()` function, but `count()` has no stopping point: it always returns an infinite iterator.

In [21]:
list(zip(itertools.count(start=1, step=2), ['a', 'b', 'c', 'd']))

[(1, 'a'), (3, 'b'), (5, 'c'), (7, 'd')]
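Because `count()` never stops, it has to be paired with something that bounds it, such as `zip()` above or `itertools.islice()` (described further down). The step may also be negative or a float:

```python
import itertools

# Take only the first four values of an infinite countdown
countdown = list(itertools.islice(itertools.count(10, -3), 4))
print(countdown)   # [10, 7, 4, 1]

# A float start and step work too
halves = list(itertools.islice(itertools.count(0.0, 0.5), 4))
print(halves)      # [0.0, 0.5, 1.0, 1.5]
```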

**repeat()** – this function returns the same value over and over, indefinitely by default. You can limit the number of repetitions by passing a positive integer as a second argument (`times`).

In [69]:
print(list(zip(itertools.repeat(10), ['a', 'b', 'c'])))
                                
for e in itertools.repeat(1, 5):
    print(e, end=" ")

[(10, 'a'), (10, 'b'), (10, 'c')]
1 1 1 1 1 
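A common use of `repeat()` is to supply a constant argument to `map()`. In this sketch it feeds the exponent 2 to the built-in `pow()`; `map()` stops as soon as the finite `range(5)` is exhausted.

```python
import itertools

# pow(0, 2), pow(1, 2), ..., pow(4, 2)
squares = list(map(pow, range(5), itertools.repeat(2)))
print(squares)   # [0, 1, 4, 9, 16]
```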

**cycle()** – This function takes an iterable *inputs* as an argument and returns an infinite iterator over the values in *inputs*, going back to the beginning each time the end of *inputs* is reached.

In [70]:
print(list(zip(itertools.cycle((0,1,2)), ['a', 'b', 'c', 'd', 'e', 'f'])))

[(0, 'a'), (1, 'b'), (2, 'c'), (0, 'd'), (1, 'e'), (2, 'f')]
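`cycle()` is a natural fit for round-robin assignment. The worker and task names below are hypothetical; `zip()` stops when the finite `tasks` list runs out.

```python
import itertools

# Hypothetical example: assign tasks to workers in turn
workers = ["Ana", "Bo", "Cy"]
tasks = ["t1", "t2", "t3", "t4", "t5"]
assignment = list(zip(tasks, itertools.cycle(workers)))
print(assignment)
# [('t1', 'Ana'), ('t2', 'Bo'), ('t3', 'Cy'), ('t4', 'Ana'), ('t5', 'Bo')]
```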


**accumulate()** – This function takes an iterable *inputs* and an optional binary function *func* (a function with 2 arguments, addition by default), and returns an iterator over the accumulated results of applying *func* to the elements of *inputs*.

In [71]:
import operator  # the running sum below is similar to numpy.cumsum(array)
list(itertools.accumulate([1, 2, 3, 4, 5], operator.add))

[1, 3, 6, 10, 15]
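A few variations: omitting *func* gives running sums, any two-argument callable (such as `operator.mul` or the built-in `max`) works, and the `initial` keyword (Python 3.8+) prepends a starting value.

```python
import itertools
import operator

values = [3, 1, 4, 1, 5]

sums = list(itertools.accumulate(values))                     # default: addition
products = list(itertools.accumulate(values, operator.mul))   # running product
running_max = list(itertools.accumulate(values, max))         # running maximum
with_initial = list(itertools.accumulate(values, initial=100))

print(sums)          # [3, 4, 8, 9, 14]
print(products)      # [3, 3, 12, 12, 60]
print(running_max)   # [3, 3, 4, 4, 5]
print(with_initial)  # [100, 103, 104, 108, 109, 114]
```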

**product()** – This function takes any number of iterables as arguments and returns an iterator over tuples in the Cartesian product.

In [72]:
for e in itertools.product([1, 2], [10,20,30]):
    print(e, end=" ")

(1, 10) (1, 20) (1, 30) (2, 10) (2, 20) (2, 30) 
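`product()` produces the same tuples, in the same order, as nested `for` loops. Its `repeat` keyword takes the product of an iterable with itself, which is a compact way to enumerate, for example, all 3-bit patterns.

```python
import itertools

# Equivalent nested loops
nested = [(a, b) for a in [1, 2] for b in [10, 20, 30]]
same_as_nested = list(itertools.product([1, 2], [10, 20, 30])) == nested
print(same_as_nested)   # True

# repeat=3 is the same as product([0, 1], [0, 1], [0, 1])
bits = list(itertools.product([0, 1], repeat=3))
print(len(bits), bits[:3])   # 8 [(0, 0, 0), (0, 0, 1), (0, 1, 0)]
```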

**islice()** – This function works much the same way as slicing a sequence. You pass it an iterable, a starting point, and a stopping point, and it returns a slice. You can optionally include a step value as well. The biggest difference with a real slice is that `islice()` returns an iterator (and it does not accept negative indices).

In [23]:
name="Jean-Philippe"
print(name[2:6])

for c in itertools.islice("Jean-Philippe", 2, 6):
    print(c, end=" ")

an-P
a n - P 
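Because `islice()` works on any iterator, it consumes items from the underlying iterator rather than copying a sequence. The sketch below shows that the original iterator simply continues where `islice()` stopped, and uses the optional step argument.

```python
import itertools

it = iter(range(10))

# islice(iterable, stop): take the first 3 items from the shared iterator...
head = list(itertools.islice(it, 3))
print(head)      # [0, 1, 2]

# ...so the underlying iterator continues where islice() stopped
rest = list(it)
print(rest)      # [3, 4, 5, 6, 7, 8, 9]

# islice(iterable, start, stop, step), like "abcdefgh"[1:7:2]
stepped = list(itertools.islice("abcdefgh", 1, 7, 2))
print(stepped)   # ['b', 'd', 'f']
```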

**filterfalse()** – This function takes two arguments: a function that returns True or False (called a predicate), and an iterable *inputs*. It returns an iterator over the elements in *inputs* for which the predicate returns False.

In [75]:
for c in itertools.filterfalse(lambda x: x <= 0, [0, 1, -1, 2, -2, 3, -10, 8]):
    print(c, end=" ")

1 2 3 8 
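`filterfalse()` is the complement of the built-in `filter()`: with the same predicate, the two together split the input into two disjoint parts. The helper name `is_non_positive` is just illustrative.

```python
import itertools

data = [0, 1, -1, 2, -2, 3, -10, 8]

def is_non_positive(x):
    return x <= 0

kept = list(filter(is_non_positive, data))                     # predicate is True
dropped = list(itertools.filterfalse(is_non_positive, data))   # predicate is False
print(kept)      # [0, -1, -2, -10]
print(dropped)   # [1, 2, 3, 8]
```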

**takewhile()** – This function takes a predicate and an iterable *inputs* as arguments and returns an iterator over *inputs* that stops at the first element for which the predicate returns `False`.

**dropwhile()** – The `dropwhile()` function does the opposite of `takewhile()`: it skips elements while the predicate returns `True`, then returns an iterator beginning at the first element for which the predicate returns `False` (the remaining elements are not tested).

In [24]:
for c in itertools.takewhile(lambda x: x < 3, [0, 1, 2, 3, 4, 5]):
    print(c, end=" ")
print("\n","-"*80, sep="")
for c in itertools.dropwhile(lambda x: x < 3, [0, 1, 2, 3, 4, 5]):
    print(c, end=" ")

0 1 2 
--------------------------------------------------------------------------------
3 4 5 
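The difference from `filter()` only shows up when the input is not sorted: both `takewhile()` and `dropwhile()` make a single decision at the first element that fails the predicate, while `filter()` tests every element.

```python
import itertools

data = [1, 2, 5, 1, 2]

taken = list(itertools.takewhile(lambda x: x < 3, data))     # stops for good at 5
dropped = list(itertools.dropwhile(lambda x: x < 3, data))   # keeps 5 and everything after
filtered = list(filter(lambda x: x < 3, data))               # tests every element

print(taken)      # [1, 2]
print(dropped)    # [5, 1, 2]
print(filtered)   # [1, 2, 1, 2]
```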

**groupby()** – This function takes an iterable *inputs* and a *key* function to group by, and returns an iterator over tuples whose first components are keys and second components are iterators over the grouped data.

**Note**: As `groupby()` traverses the data, it aggregates elements until an element with a different key is encountered, at which point it starts a new group. To get a single group per key, you therefore need to sort your data on the same key that you group by, as the second loop below does.

In [77]:
data = [{'name': 'Alan', 'age': 34},
        {'name': 'Marco', 'age': 34},
        {'name': 'Dylan', 'age': 15},
        {'name': 'Kevin', 'age': 17},
        {'name': 'Mathias', 'age': 34},
        {'name': 'Yohan', 'age': 17}
       ]

for key, grp in itertools.groupby(data, key=lambda x: x['age']):
    print(f'{key}: {list(grp)}')

print()
data=sorted(data, key = lambda x: x['age'])
for key, grp in itertools.groupby(data, key=lambda x: x['age']):
    print(f'{key}: {list(grp)}')

34: [{'name': 'Alan', 'age': 34}, {'name': 'Marco', 'age': 34}]
15: [{'name': 'Dylan', 'age': 15}]
17: [{'name': 'Kevin', 'age': 17}]
34: [{'name': 'Mathias', 'age': 34}]
17: [{'name': 'Yohan', 'age': 17}]

15: [{'name': 'Dylan', 'age': 15}]
17: [{'name': 'Kevin', 'age': 17}, {'name': 'Yohan', 'age': 17}]
34: [{'name': 'Alan', 'age': 34}, {'name': 'Marco', 'age': 34}, {'name': 'Mathias', 'age': 34}]
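Called without `key`, `groupby()` groups runs of equal consecutive elements, which gives a compact run-length encoding. The sketch also shows a documented pitfall: all groups share the underlying iterator, so a group must be copied (for example with `list()`) before `groupby()` moves on to the next key.

```python
import itertools

# Run-length encoding: (character, length of its consecutive run)
text = "aaabccdddd"
runs = [(char, len(list(grp))) for char, grp in itertools.groupby(text)]
print(runs)   # [('a', 3), ('b', 1), ('c', 2), ('d', 4)]

# Copying each group immediately keeps its contents...
copied = [(k, list(g)) for k, g in itertools.groupby("aabbb")]
print(copied)                    # [('a', ['a', 'a']), ('b', ['b', 'b', 'b'])]

# ...whereas a group kept as an iterator is empty once groupby() has advanced
not_copied = list(itertools.groupby("aabbb"))
print(list(not_copied[0][1]))    # []
```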
