Deprecated: this page needs an update.

# What are generators and iterators in Python?

 > Why do I want iterators/generators, why not use lists? - because I can process data that is too big for memory.

Source: <https://www.reddit.com/r/Python/comments/40idba/easy_way_to_make_an_iterator_from_a_generator_in/>

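An iterator is an object which implements `__next__` (and `__iter__`, which returns the iterator itself). An iterable is an object which implements `__iter__`, or `__getitem__` with integer indexes starting at 0. Source: <https://stackoverflow.com/questions/9884132/what-exactly-are-iterator-iterable-and-iteration>.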

 * **A generator** is a function or a method which yields items (indefinitely or not). Calling it returns a generator object, which is itself an iterator.
 * **An iterator** is an object whose class implements `__iter__` and `__next__`. `__iter__` must return the iterator itself, and `__next__` must return the next item at each call (and raise `StopIteration` when there are no more items).

There are also iterables which implement `__getitem__` (usually together with `__len__`), like sequences.
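
To make the distinction concrete, here is a small sketch. The `Squares` class is a hypothetical example, not taken from the sources above. It shows that a list is an iterable but not an iterator, that `iter()` returns an iterator over it, and that a class with only `__getitem__` can still be iterated.

```python
from collections.abc import Iterable, Iterator

numbers = [10, 20, 30]

# A list is an iterable, but it is not an iterator itself.
list_is_iterable = isinstance(numbers, Iterable)   # True
list_is_iterator = isinstance(numbers, Iterator)   # False

# iter() asks the iterable for a fresh iterator.
numbers_iterator = iter(numbers)
first = next(numbers_iterator)    # 10
second = next(numbers_iterator)   # 20


class Squares:
    """Hypothetical iterable that only implements __getitem__."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)  # tells Python where the sequence ends
        return index * index


# Python falls back to calling __getitem__ with 0, 1, 2, ... until IndexError.
squares = list(Squares())
print(list_is_iterable, list_is_iterator, first, second, squares)
```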

TODO An iterator is used the same way as a generator: a for loop calls `__iter__` once, then calls `__next__` automatically for each item.

TODO https://stackoverflow.com/questions/9884132/what-exactly-are-iterator-iterable-and-iteration

TODO https://www.quora.com/Whats-the-difference-between-iterators-and-generators-in-Python

## Generators

We define a simple generator (a function which yields elements):

In [1]:
def my_generator():
    for i in range(3):
        yield i

Calling this function returns a generator object, which yields its values one by one until the function reaches its end. We can iterate over it, either using a for loop:

In [2]:
items = my_generator()
for item in items:
    print(item)

0
1
2


Or using the next function:

In [3]:
items = my_generator()
hasMoreItems = True
while hasMoreItems:
    try:
        item = next(items)
        print(item)
    except StopIteration:
        hasMoreItems = False

0
1
2


In that case we see the value of the `yield` statement: the generator function pauses at each `yield` until `next` is called again by the user. Any amount of time can pass between two `next` calls, and the remaining items are only generated on the fly when they are requested.

Iterating over a generator this way (using `next` calls) is useful when you want to get the next item at very specific moments. For example, if you need the next item only when you receive a specific request, and only if this request contains the specific word "a", you just need to keep the `items` generator object and call `next` on it when necessary.
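
The request scenario could look like the sketch below. The `handle_request` function and the list of requests are made up for illustration, and "contains" is interpreted here as a simple substring check, which is an assumption.

```python
def my_generator():
    for i in range(3):
        yield i


def handle_request(request, items):
    """Return the next item only if the request contains "a" (hypothetical helper)."""
    if "a" not in request:
        return None
    # The second argument of next() is returned instead of raising
    # StopIteration once the generator is exhausted.
    return next(items, None)


items = my_generator()  # keep this generator object between requests
requests = ["alpha", "xyz", "beta", "gamma", "delta"]  # made-up requests
answers = [handle_request(request, items) for request in requests]
print(answers)  # [0, None, 1, 2, None]
```

The generator only advances when a matching request arrives; the last request matches, but the generator is already exhausted, so the default `None` is returned.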

## Iterators

We define an iterator class, which only needs to implement the `__iter__` and `__next__` methods:

In [4]:
class MyIterator:
    def __init__(self):
        self.current = -1
        self.maximum = 2
    def __iter__(self):
        return self
    def __next__(self): # next(self) in Python 2
        if self.current >= self.maximum:
            raise StopIteration
        else:
            self.current += 1
            return self.current

The class doesn't yield anything; it returns the next item each time the user of an instance calls `next` on it (which calls the `__next__` method). An iterator class is useful when you need to **provide specific methods** like `def get_current(self): return self.current`.

And we iterate it the same way:

In [5]:
# Using a for loop:
items = MyIterator()
for item in items:
    print(item)
# Using the next function:
items = MyIterator()
hasMoreItems = True
while hasMoreItems:
    try:
        item = next(items)
        print(item)
    except StopIteration:
        hasMoreItems = False

0
1
2
0
1
2
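
As a sketch of the "specific methods" idea mentioned earlier, the class below repeats the counting logic of `MyIterator` and adds a `get_current` method. The class name `MyIteratorWithState` and the `maximum` parameter are assumptions made for this example.

```python
class MyIteratorWithState:
    """Same counting logic as MyIterator, plus a get_current() method."""

    def __init__(self, maximum=2):
        self.current = -1
        self.maximum = maximum

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.maximum:
            raise StopIteration
        self.current += 1
        return self.current

    def get_current(self):
        # Last value returned by __next__ (-1 before the first call).
        return self.current


items = MyIteratorWithState()
before = items.get_current()   # -1, nothing consumed yet
next(items)
next(items)
after = items.get_current()    # 1, the last item returned so far
remaining = list(items)        # [2], iteration continues where it stopped
print(before, after, remaining)
```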


## More explanations

Generators and iterators are useful to get a lazy sequence of items, which means each item is generated and returned on the fly, only when the user needs it. If you have an infinite sequence of items, you no longer need to generate it beforehand (which would be impossible anyway). The value of generators/iterators is that they work like a Python list in a for loop. Another advantage is that if you need to generate items, you can just define a generator and use the `yield` statement instead of storing each item in a list and returning it at the very end of the function. An iterator class is even more relevant when you need to interact with the iterator while items are being generated (this can be done using the `next` way of iterating).
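
A minimal sketch of the infinite case: the `even_numbers` generator below is a made-up example that never ends, and `itertools.islice` from the standard library takes only the first items from it.

```python
from itertools import islice


def even_numbers():
    """Yield 0, 2, 4, ... forever; nothing is computed ahead of time."""
    n = 0
    while True:
        yield n
        n += 2


# islice stops after 5 items, so the infinite generator is never exhausted.
first_five = list(islice(even_numbers(), 5))
print(first_five)  # [0, 2, 4, 6, 8]
```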

## Convert a generator to a list

You basically cannot access a specific item of a generator (or an iterator) without iterating over it:

In [6]:
items = my_generator()
print(items[1])

TypeError: 'generator' object is not subscriptable

But you can convert the generator object to a list:

In [7]:
items = list(my_generator())
print(items[1])

1


The problem is that the conversion will consume all items, which can **take a long time** (depending on how your items are generated) and the result **may not fit in RAM** (if the total size of your items is > 50 GB, for example).
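
If you only need one item at a given position, a possible alternative is to skip ahead with `itertools.islice` instead of building the whole list. The `nth` helper below is a sketch modeled on the recipe of the same name in the `itertools` documentation. Note that the items before the requested position are still generated (and discarded), so this saves memory, not time.

```python
from itertools import islice


def my_generator():
    for i in range(3):
        yield i


def nth(iterable, index, default=None):
    """Return the item at position `index`, consuming the items before it."""
    return next(islice(iterable, index, None), default)


second_item = nth(my_generator(), 1)
missing_item = nth(my_generator(), 10, default="not found")
print(second_item, missing_item)  # 1 not found
```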

## We cannot iterate several times

If you try to **iterate several times** over a generator (or an iterator) instance, you will get only one batch of items:

In [8]:
items = my_generator()
for i in items: print(i)
for i in items: print(i)

0
1
2


The same for the iterator:

In [9]:
items = MyIterator()
for i in items: print(i)
for i in items: print(i)

0
1
2


The solution is to **use the generator function** (or the iterator constructor) instead of the instance:

In [10]:
for i in my_generator(): print(i)
for i in my_generator(): print(i)

0
1
2
0
1
2


But imagine you need to **give your generator to a specific function** (from a specific library) which needs to **pass several times over all items**. Most libraries take a generator instance (not the generator function itself), so it won't work in that case. For instance, the constructor of [Doc2Vec from Gensim](https://radimrehurek.com/gensim/models/doc2vec.html) takes `documents`, which is an iterable. If you stream your documents from disk, your generator needs to be multi-iterable so that the algorithm can pass over all docs several times. Moreover, imagine your generator takes parameters:

In [11]:
def my_generator2(start, nbItems=10):
    for i in range(start, start + nbItems):
        yield i

All instances of your generator will be **specific to the parameters you give**: `items = my_generator2(100)`, `items = my_generator2(200)`... The solution is to **wrap your generator** or iterator in an iterable class which retains the args and kwargs. Here is the class I use:

In [12]:
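class AgainAndAgain():
    """Iterable which builds a fresh generator each time it is iterated."""
    def __init__(self, generator_func, *args, **kwargs):
        # Keep the function and its arguments, not a generator instance.
        self.generator_func = generator_func
        self.args = args
        self.kwargs = kwargs
    def __iter__(self):
        # Called by each for loop: start a new generator from scratch.
        return self.generator_func(*self.args, **self.kwargs)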

Source: <https://www.reddit.com/r/Python/comments/40idba/easy_way_to_make_an_iterator_from_a_generator_in/>

Now you can make any generator or iterator **multi-iterable** by giving the generator function (or iterator class) and all args/kwargs:

In [13]:
items = AgainAndAgain(my_generator2, 100, nbItems=3)
for i in items: print(i)
for i in items: print(i)

100
101
102
100
101
102
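
To illustrate why this matters for functions which pass over the data several times, here is a sketch with a hypothetical two-pass function, `spread`, applied to two wrappers built from the same generator function with different parameters. `AgainAndAgain` and `my_generator2` are repeated so that the example runs on its own.

```python
class AgainAndAgain:
    # Same wrapper as above, repeated so this example is self-contained.
    def __init__(self, generator_func, *args, **kwargs):
        self.generator_func = generator_func
        self.args = args
        self.kwargs = kwargs

    def __iter__(self):
        return self.generator_func(*self.args, **self.kwargs)


def my_generator2(start, nbItems=10):
    for i in range(start, start + nbItems):
        yield i


def spread(items):
    """Hypothetical two-pass function: largest item minus smallest item."""
    highest = max(items)   # first pass over all items
    lowest = min(items)    # second pass over all items
    return highest - lowest


# Two wrappers around the same generator function, with different parameters.
small = AgainAndAgain(my_generator2, 100, nbItems=3)
large = AgainAndAgain(my_generator2, 200, nbItems=5)
print(spread(small), spread(large))  # 2 4

# A plain generator object is exhausted after the first pass.
try:
    spread(my_generator2(100, nbItems=3))
    failed = False
except ValueError:
    failed = True  # min() received an already exhausted generator
print(failed)  # True
```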


The for loop will automatically call the `__iter__` method of `AgainAndAgain`, so the code above is equivalent to:

In [66]:
items = AgainAndAgain(my_generator2, 100, nbItems=3)
for i in iter(items): print(i)
for i in iter(items): print(i)

100
101
102
100
101
102


So if you want to iterate over the `AgainAndAgain` instance with `next`, you first need to convert it to an iterator with `iter(items)`, because the class defines `__iter__` but not `__next__`.

# Breaking a for loop - do we continue or restart the generator/iterator?

**The answer:** using an instance of a generator/iterator, we continue iterating.

If you use a for loop to iterate over a generator instance or an iterator instance and break out of it, the next loop continues where the previous one stopped:

In [14]:
items = my_generator2(0)
for i in items:
    print(i)
    break
for i in items:
    print(i)
    break

0
1


But, to be consistent with the behavior of a list, which restarts at the beginning of each for loop:

In [15]:
a = [0, 1, 2]
for i in a:
    print(i)
    break
for i in a:
    print(i)
    break

0
0


you can wrap the generator in `AgainAndAgain`:

In [16]:
items = AgainAndAgain(my_generator2, 0)
for i in items:
    print(i)
    break
for i in items:
    print(i)
    break

0
0
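
Another way to get the same restart behavior, shown here only as a sketch, is to write a class whose `__iter__` method is itself a generator function. The `CountFrom` class and its parameter names are hypothetical; the point is that each for loop calls `__iter__` and therefore gets a new generator.

```python
class CountFrom:
    """Hypothetical re-iterable container: __iter__ is a generator function."""

    def __init__(self, start, nb_items=3):
        self.start = start
        self.nb_items = nb_items

    def __iter__(self):
        # Each call creates a new generator, so every for loop starts over.
        for i in range(self.start, self.start + self.nb_items):
            yield i


items = CountFrom(0)
firsts = []
for i in items:
    firsts.append(i)
    break
for i in items:
    firsts.append(i)
    break
print(firsts)  # [0, 0]
```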


## TODO Warning: we need to call `itemsGenerator = iter(items)` each time we want to restart using `next()`. The for loop automatically calls `iter()` and then `next()`

In [47]:
items = AgainAndAgain(my_generator2, 3)

In [46]:
for i in items:
    print(i)
    break
for i in items:
    print(i)
    break

3
3


In [62]:
itemsGenerator = iter(items)

In [65]:
print(next(itemsGenerator))

5
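
The cells above can be condensed into the following sketch, with `AgainAndAgain` and `my_generator2` repeated so that it runs on its own. A fresh iterator starts at 3; the `5` printed above is what you would get if the `next` cell had been run three times on the same `itemsGenerator`, which is an assumption based on the cell numbers.

```python
class AgainAndAgain:
    # Same wrapper as above, repeated so this example is self-contained.
    def __init__(self, generator_func, *args, **kwargs):
        self.generator_func = generator_func
        self.args = args
        self.kwargs = kwargs

    def __iter__(self):
        return self.generator_func(*self.args, **self.kwargs)


def my_generator2(start, nbItems=10):
    for i in range(start, start + nbItems):
        yield i


items = AgainAndAgain(my_generator2, 3)

# The wrapper has __iter__ but no __next__, so next() rejects it.
try:
    next(items)
    next_works_on_wrapper = True
except TypeError:
    next_works_on_wrapper = False

# iter() returns a generator, which does support next().
items_generator = iter(items)
values = [next(items_generator) for _ in range(3)]   # [3, 4, 5]

# Calling iter() again restarts from the beginning.
items_generator = iter(items)
restarted = next(items_generator)   # 3
print(next_works_on_wrapper, values, restarted)
```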


Other sources:
 * <https://stackoverflow.com/questions/2776829/difference-between-pythons-generators-and-iterators/2776865#2776865>
 * <https://stackoverflow.com/questions/19151/build-a-basic-python-iterator>

Hey,

I read your tutorial on iterators and generators, and I have a few remarks. First of all, I found it very good overall, because it clearly highlights their concrete use and the advantages of using them (which are not obvious at first sight).

There are a few things you could improve, though:

 * Is `yield` necessarily tied to a generator? Can you have a `yield` in something other than a generator? Same question for `next`? Maybe insist more on the fact that `yield` and `next` are linked.
 * Are iterators/generators specific to Python, or are they objects that can be found in other languages?
 * I have the impression that the tutorial does not yet make it totally clear that when you go through a list, you load everything into RAM (if I'm not mistaken), and that this can be problematic in various situations.
 * If I understood correctly, a generator is an iterator, but more high-level, is that right? And the level above that would be lists, for example? So a list is an iterator (it's just that you don't see that behind it there is the class, the `__iter__`, the `__next__`...)?
 * What is `__iter__` used for in an iterator?
 * Maybe at the very end, it would be nice to give an example where you have several different parameterizations for different instances of the same generator (you say earlier that this can be a problem and that making an iterator of iterators solves everything).
 * There are a few typos along the way; I can point them out to you when we meet IRL.

See you