# Iterables, iterators and generators

So far we have used for loops with many different kinds of things.

```python
>>> for name in ['theelous3', 'RubyPinch', 'go|dfish']:
...     print(name)
...
theelous3
RubyPinch
go|dfish
>>> for letter in 'abc':
...     print(letter)
...
a
b
c
>>>
```



Looping over something with `for` is one way to **iterate** over it.

Other things iterate too. For example, `' '.join(['a', 'b', 'c'])` iterates
over the list `['a', 'b', 'c']`.
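As a small sketch of that idea, the built-ins below all iterate over their argument behind the scenes, just like a `for` loop would. The variable names are only illustrative.

```python
letters = ['a', 'b', 'c']

# Each of these calls iterates over its argument.
joined = ' '.join(letters)   # str.join loops over the list
as_tuple = tuple('abc')      # tuple() loops over the string
total = sum([1, 2, 3])       # sum() loops over the list

print(joined)    # a b c
print(as_tuple)  # ('a', 'b', 'c')
print(total)     # 6
```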

If we can iterate over something, then that "something" is called **iterable**. 

For example, strings and lists are
iterable, but integers and floats are not.

```python
>>> for thing in 123:
...     print(thing)
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not iterable
>>>
```



# Iterators

Lists and strings don't change when we iterate over them.

```python
>>> word = 'hi'
>>> for character in word:
...     print(character)
...
h
i
>>> word
'hi'
>>>
```

> **Definition**: Iterators are iterables that remember their position.

#### Examples of Iterators

We have already been using **iterators**, even if we did not call them by name.

This is because most of the time iterators are used anonymously within `for` loops
(which is the main use of iterators in the majority of cases anyway).

We used the following functions and methods in `for` loops (in order of appearance):

* `enumerate(iterable)`
* `range(start, end, step)`
* `dictionary.keys()`
* `dictionary.values()`
* `dictionary.items()`

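Strictly speaking, only `enumerate()` in this list returns an iterator. `range()` and the dictionary methods return iterables that can be looped over again and again.

However, it is always possible to call a function like `enumerate()` explicitly, and get an **iterator** object in return.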

The very same object can be used in a for loop.

```python
>>> e = enumerate('hello')
>>> for pair in e:
...     print(pair)
...
(0, 'h')
(1, 'e')
(2, 'l')
(3, 'l')
(4, 'o')
>>> for pair in e:
...     print(pair)
...
>>>
```

As the example above shows, the **iterator** remembers its current position.
Iterators can therefore only be used once, so we need to create a new iterator if
we want to do another for loop.

In other words: 

> Once the iterator has been fully consumed, there is no way to start again from the beginning
> of the sequence without creating a new iterator object.
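Here is a minimal sketch of that behaviour, using `list()` to consume the iterator instead of a `for` loop. The variable names are made up for this example.

```python
word = 'hello'

pairs = enumerate(word)
first_pass = list(pairs)    # consumes the iterator
second_pass = list(pairs)   # nothing is left to give

print(first_pass)   # [(0, 'h'), (1, 'e'), (2, 'l'), (3, 'l'), (4, 'o')]
print(second_pass)  # []

# To start over, we have to create a new iterator from the original string.
fresh_pass = list(enumerate(word))
print(fresh_pass == first_pass)  # True
```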

**Note**: **All iterators are iterables, but not all iterables are iterators.**

> **Question**: Can you think of why? What is an example of an iterable that is **not** an iterator?

### Iterating manually

Iterators have a magic method called `__next__` that gets the next value and
moves the iterator forward.

**Note**: Keep this in mind for when we talk about classes, objects, and magic methods.

```python
>>> e = enumerate('abc')
>>> e.__next__()
(0, 'a')
>>> e.__next__()
(1, 'b')
>>> e.__next__()
(2, 'c')
>>>
```



There's also a built-in `next()` function that does the same thing:

```python
>>> e = enumerate('abc')
>>> next(e)
(0, 'a')
>>> next(e)
(1, 'b')
>>> next(e)
(2, 'c')
>>>
```

In this example, `e` remembers its position, and every time we call `next(e)` it
gives us the next element and moves forward. 

When it has no more values to give us, calling `next(e)` raises a `StopIteration` exception:

```python
>>> next(e)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>
```



There is usually not a good way to check if the iterator is at the end,
and it's best to just try to get a value from it and
[**catch**](../basics/exceptions.md#catching-exceptions) `StopIteration`.

**That's actually what `for` loops do.**
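Roughly speaking, a `for` loop over an iterator behaves like the `while` loop below. This is a simplified sketch and not how Python implements it internally. The `collected` list is only there so we can look at the result afterwards.

```python
e = enumerate('abc')

collected = []
while True:
    try:
        pair = next(e)          # ask the iterator for its next value
    except StopIteration:
        break                   # the iterator is exhausted, so the loop ends
    # This part corresponds to the body of the for loop.
    collected.append(pair)
    print(pair)
```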

### Converting to iterators

Now we know what iterating over an iterator does. But how about
iterating over a list or a string? 

They are **not** iterators, so we can't call `next()` on them:

```python
>>> next('abc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object is not an iterator
>>>
```

There's a built-in function called `iter()` that converts anything
iterable to an iterator.

```python
>>> i = iter('abc')
>>> i
<str_iterator object at 0x7f987b860160>
>>> next(i)
'a'
>>> next(i)
'b'
>>> next(i)
'c'
>>> next(i)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>
```

Calling `iter()` on anything non-iterable gives us an error.

```python
>>> iter(123)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not iterable
>>>
```

Finally, if we try to convert an iterator to an iterator using `iter()` we just
get back the same iterator.
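We can check this with the `is` operator. The sketch below also shows that a list, which is not an iterator, gives us a new and independent iterator on every `iter()` call. The variable names are just for illustration.

```python
numbers = [1, 2, 3]

iterator = iter(numbers)
same_iterator = iter(iterator)
print(same_iterator is iterator)  # True

# A list hands out a separate iterator each time.
another = iter(numbers)
print(another is iterator)  # False

# Each iterator keeps track of its own position.
print(next(iterator), next(another))  # 1 1
```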

---


# Generators

It's possible to create a custom iterator with a class that defines an
`__iter__` method that returns self and a `__next__` method that gets
the next value. 

I'm not going to talk about it now because there's a
much simpler way to implement iterators. 

Let's make a function that
creates an iterator that behaves like `iter([1, 2, 3])` using the
`yield` keyword:

```python
>>> def first_generator():
...     yield 1
...     yield 2
...     yield 3
...
>>>
```

**Note about syntax**: We can only `yield` inside a function. Yielding elsewhere raises an
error.

```python
>>> yield 'hi'
  File "<stdin>", line 1
SyntaxError: 'yield' outside function
>>>
```



Let's try out our `first_generator` function and see how it works.

```python
>>> first_generator()
<generator object first_generator at 0xb723d9b4>
>>>
```

Putting a `yield` anywhere in a function makes it return a **generator** when we call it.
**Generators are iterators** with some more features that we don't need
to care about.

The **generator** that `first_generator` returns works just like other iterators:

```python
>>> g = first_generator()
>>> g
<generator object first_generator at 0xb72300f4>
>>> next(g)
1
>>> next(g)
2
>>> next(g)
3
>>> next(g)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> for number in first_generator():
...     print(number)
...
1
2
3
>>>
```



This whole thing may feel kind of insane. If we add some parts between
the yields, when do they run? How does Python know when to run what?

Let's find out.

```python
>>> def printygen():
...     print("starting")
...     yield 1
...     print("between 1 and 2")
...     yield 2
...     print("between 2 and 3")
...     yield 3
...     print("end")
...
>>> p = printygen()
>>>
```

That's weird! We called it, but it didn't print "starting"!

Let's see what happens if we call `next()` on it.

```python
>>> got = next(p)
starting
>>> got
1
>>>
```

Now it started, but it's frozen! It's just stuck on that `yield 1`.

An easy way to think about this is to compare it to our computers.
When we suspend a computer it goes into some kind of stand-by mode,
and when we later continue using it, all of our programs are
still there just like they were when we left.

A similar thing happens here. Our function is running, but it's just
stuck at the yield and waiting for us to call `next()` on it again.

```python
>>> next(p)
between 1 and 2
2
>>> next(p)
between 2 and 3
3
>>> next(p)
end
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>
```


Here's a drawing of what's going on:

![A picture of printygen.](../images/freeze-melt.png)

The good news is that **usually we don't need to worry about when
exactly the parts between the yields run**. Actually we don't even need
to use `iter()` and `next()` most of the time, but I think it's nice to
know how for loops work.

`yield` is useful when we want the function to output so many things
that making a list of them would be too slow or the list wouldn't fit in
the computer's memory. So instead of this...

```python
def retrieve_results():
    results = []
    # code that appends things to results
    return results
```

...we can do this:

```python
def lazy_retrieve_results():
    # code that yields stuff
```

Both of these functions can be used like this:

```python
for thing in retrieve_results():
    # do something with thing
```

However, the **lazy** version uses less memory, because it never holds all of the results at once.
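To make the comparison concrete, here is a small sketch with two hypothetical functions, `squares_list()` and `squares_lazy()`, that produce the same values. These names are not from the text above. They only illustrate the two styles.

```python
def squares_list(limit):
    """Build the whole list first, then return it."""
    results = []
    for number in range(limit):
        results.append(number ** 2)
    return results


def squares_lazy(limit):
    """Yield one square at a time, without storing them."""
    for number in range(limit):
        yield number ** 2


# Both can be used in a for loop in exactly the same way.
for square in squares_list(4):
    print(square)
for square in squares_lazy(4):
    print(square)

print(list(squares_lazy(4)) == squares_list(4))  # True
```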

It's actually possible to create an iterator that yields an infinite
number of things:

```python
>>> def count():
...     current = 1
...     while True:
...         yield current
...         current += 1
...
>>> c = count()
>>> next(c)
1
>>> next(c)
2
>>> next(c)
3
>>> next(c)
4
>>>
```

[The itertools module](https://docs.python.org/3/library/itertools.html)
contains many useful things like this. For example, `itertools.count(1)`
does the same thing as our `count()`.
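As a quick sketch, here is `itertools.count()` in action, together with `itertools.islice()`, which takes a limited number of items from an endless iterator. The variable names are made up for this example.

```python
import itertools

counter = itertools.count(1)
print(next(counter))  # 1
print(next(counter))  # 2

# islice() stops after 5 items, so list() does not run forever.
first_five = list(itertools.islice(itertools.count(1), 5))
print(first_five)  # [1, 2, 3, 4, 5]
```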

### Take away message:

Python is a language with **batteries included**: in other words, the
**Python standard library** is massive and contains lots of modules with functions and solutions
for almost any problem.

So, **if and only if** you know what you're doing (meaning you already know **how** what you're
trying to do should work, and you would be able to implement it in principle),
it is better to reuse _existing_ implementations, **without reinventing the wheel**.

## Summary

- An iterable is something that we can for loop over.
- An iterator is an iterable that remembers its position.
- For loops create an iterator of the iterable and call its `__next__`
  method until it raises a `StopIteration`.
- Functions that contain yields return generators. Calling `next()` on a
  generator runs it to the next yield and gives us the value it yielded.
- [The itertools module](https://docs.python.org/3/library/itertools.html)
  contains many useful iterator-related things.

---

# List Comprehensions

> I thought carefully before including this section. If you are brand new to programming,
> list comprehensions may look confusing at first.

List comprehensions are a **shorthand way** of creating and working with lists. 

It is good to be aware of list comprehensions, because you will see them in other people's code, 
and they are really useful when you understand how to use them. 

That said, if they don't make sense to you yet, don't worry about using them right away. 

When you have worked with enough lists, you will want to use comprehensions. 

For now, it is good enough to know they exist, and to recognize them when you see them. 

If you like them, go ahead and start trying to use them now.

### Numerical Comprehensions

Let's consider how we might make a list of the first ten square numbers. We could do it like this:

```python 

# Store the first ten square numbers in a list.
# Make an empty list that will hold our square numbers.
squares = []

# Go through the first ten numbers, square them, and add them to our list.
for number in range(1,11):
    new_square = number**2
    squares.append(new_square)
    
# Show that our list is correct.
for square in squares:
    print(square)
```

This should make sense at this point. If it doesn't, go over the code with these thoughts in mind:
- We make an empty list called *squares* that will hold the values we are interested in.
- Using the *range()* function, we start a loop that will go through the numbers 1-10.
- Each time we pass through the loop, we find the square of the current number by raising it to the second power.
- We add this new value to our list *squares*.
- We go through our newly-defined list and print out each square.

Now let's make this code shorter. We don't really need to store the new square in its own variable *new_square*; we can just add it directly to the list of squares. The line

```
    new_square = number**2
```

is taken out, and the next line takes care of the squaring:

```python 
# Store the first ten square numbers in a list.
# Make an empty list that will hold our square numbers.
squares = []

# Go through the first ten numbers, square them, and add them to our list.
for number in range(1,11):
    squares.append(number**2) 
```

List comprehensions allow us to **collapse the first three lines of code into one line**. 

Here's what it looks like:

```python 
>>> # Store the first ten square numbers in a list.
>>> squares = [number**2 for number in range(1,11)]
```

It should be pretty clear that this code is more concise than our previous approach, but it may not be
clear what is happening.

Let's take a look at everything that is happening in that first line:

We define a list called `squares`.

Look at the second part of what's in square brackets:

    for number in range(1,11)

This sets up a loop that goes through the numbers `1-10`, storing each value in the variable `number`. 

Now we can see what happens to each `number` in the loop:

    number**2

Each number is raised to the second power, and this is the value that is stored in the list we defined. 

We might read this line in the following way (pseudocode):

```
    squares = [raise 'number' to the second power, for each 'number' in the range 1-10]
```

### Another example
It is probably helpful to see a few more examples of how comprehensions can be used. Let's try to make the first ten even numbers, the longer way:

```python 
# Make an empty list that will hold the even numbers.
evens = []

# Loop through the numbers 1-10, double each one, and add it to our list.
for number in range(1,11):
    evens.append(number*2)
```

Here's how we might think of doing the same thing, using a list comprehension:

```
evens = [multiply each *number* by 2, for each *number* in the range 1-10]
```

Here is the same line in code:

```python 
# Make a list of the first ten even numbers.
evens = [number * 2 for number in range(1,11)]
```

### Non-numerical comprehensions

At this point you might think that **list comprehensions** are limited to numerical computations.

Of course, they are **not**. We can use comprehensions with non-numerical lists as well.

A list comprehension can be used whenever we need to
**create a new list by iterating over another iterable**.

In this example we will iterate over a **dictionary** and create a list of tuples,
more specifically a list of `(key, value)` pairs.

```python 
>>> country_map = {'IT': 'Italy', 'UK': 'United Kingdom', 'DE': 'Germany', 'DK': 'Denmark', 'FR': 'France'}
>>> country_list = [(code, country_name) for code, country_name in country_map.items()]
>>> print(country_map)
{'IT': 'Italy',
 'UK': 'United Kingdom',
 'DE': 'Germany',
 'DK': 'Denmark',
 'FR': 'France'}
>>> print(country_list)
[('IT', 'Italy'),
 ('UK', 'United Kingdom'),
 ('DE', 'Germany'),
 ('DK', 'Denmark'),
 ('FR', 'France')]
```

---

<a name='generator_expressions'></a>Generator Expressions
===

Similarly to **list comprehensions**, in Python it is possible to define **generator expressions**.

Generator expressions use the **same** syntax as list comprehensions, 
but are enclosed in **parentheses** rather than **brackets**.

Despite the very slight difference in the syntax, **generator expressions** save memory as they 
**yield** items one by one using the iterator protocol instead of building 
a whole list just to feed another constructor.
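The sketch below compares the two with `sys.getsizeof()`. The exact numbers depend on the Python version, so none are shown here. The names `as_list` and `as_gen` are made up for this example.

```python
import sys

as_list = [number**2 for number in range(1000)]   # builds all 1000 items right away
as_gen = (number**2 for number in range(1000))    # builds nothing yet

# The list stores every item; the generator object only stores its state.
print(sys.getsizeof(as_list))
print(sys.getsizeof(as_gen))

# Both give the same total when they are consumed.
list_total = sum(as_list)
gen_total = sum(as_gen)
print(list_total == gen_total)  # True
```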


### GenExps in Function Calls

If the generator expression is the single argument in a function call, 
there is no need to duplicate the enclosing parentheses.

```python 
>>> symbols = '$¢£¥€¤'
>>> tuple(ord(symbol) for symbol in symbols)  # generator expression to initialise a tuple
(36, 162, 163, 165, 8364, 164)
>>>
```

The `ord` function is a built-in function that returns the Unicode code point for a one-character 
string.

#### Array Constructor

The array constructor takes two arguments, so the parentheses around the generator expression are mandatory. 

The first argument of the array constructor defines the storage type used for the numbers in the array.

```python
import array

symbols = '$¢£¥€¤'  # same string as in the previous example
array.array('I', (ord(symbol) for symbol in symbols))
```

### Cartesian Product in a generator expression

```python 
>>> colors = ['black', 'white']
>>> sizes = ['S', 'M', 'L']
>>> for tshirt in ('%s %s' % (c, s) for c in colors for s in sizes):
...    print(tshirt)
...
black S
black M
black L
white S
white M
white L
```