## Iterators

An iterator is an object that lets a programmer traverse all the elements of a collection, regardless of how that collection is implemented.

Often an important piece of data analysis is repeating a similar calculation, over and over, in an automated fashion. For example, you may have a table of names that you'd like to split into first and last, or of dates that you'd like to convert to some standard format. One of Python's answers to this is the *iterator* syntax. We've seen this already with the `range` iterator:

In [1]:
for i in range(10):
    print(i, end=' ')

0 1 2 3 4 5 6 7 8 9 

Here, we're going to dig a bit deeper. It turns out that `range` does not produce a list, but a lazy object that hands out its values one at a time through an *iterator*, and learning how this works is key to understanding a wide class of very useful Python functionality.

### Iterating Over Lists

Iterators are perhaps most easily understood in the concrete case of iterating through a list. Consider the following:

In [2]:
for value in [2, 4, 6, 8, 10]:
    # do some operation
    print(value + 1, end=' ')

3 5 7 9 11 

The familiar "`for x in y`" syntax allows us to repeat some operation for each value in the list. The fact that the syntax of the code is so close to its English description ("*for [each] value in [the] list*") is just one of the syntactic choices that make Python such an intuitive language to learn and use.

But the face-value behavior is not what's *really* happening. When you write something like "`for val in L`", the Python interpreter checks whether `L` has an *iterator* interface, which you can check yourself with the built-in `iter` function:

In [3]:
iter([2, 4, 6, 8, 10])

<list_iterator at 0x7fcd9e1682e0>

It is this iterator object that provides the functionality required by the `for` loop. The iterator returned by `iter` gives you access to the next object for as long as one is available, which can be seen with the built-in function `next`:

In [31]:
I = iter([2, 4, 6, 8, 10])
print(I)

<list_iterator object at 0x7fcd9f50a1f0>


In [32]:
print(next(I))

2
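
To make the connection to the `for` loop concrete, here is a minimal sketch of roughly what the loop does behind the scenes: it calls `iter` once, then calls `next` repeatedly until the iterator raises `StopIteration`. The variable names (`values`, `iterator`, `collected`) are illustrative only, and the real interpreter does this in optimized internal code rather than with an explicit `while` loop.

```python
values = [2, 4, 6, 8, 10]

# Roughly what `for value in values: ...` does internally
iterator = iter(values)
collected = []
while True:
    try:
        value = next(iterator)   # ask the iterator for its next item
    except StopIteration:        # raised once the iterator is exhausted
        break
    collected.append(value + 1)

print(collected)  # [3, 5, 7, 9, 11]
```

Once the iterator is exhausted, further calls to `next` keep raising `StopIteration`, which is how the `for` loop knows when to stop.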


What is the purpose of this level of indirection? Well, it turns out this is incredibly useful because it allows Python to treat things as lists that are *not actually lists*.

### `range()`: a List is not Always a List

Perhaps the most common example of this indirect iteration is the `range()` function, which returns not a list, but a special `range` object:

In [6]:
range(20)

range(0, 20)

`range()`, like a list, exposes an iterator:

In [7]:
iter(range(10000))

<range_iterator at 0x7fcd9e168210>

So Python knows to treat it *as if* it were a list:

In [8]:
for i in range(20):
    print(i, end=' ')

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 
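
One detail worth sketching here: a `range` object can be traversed as many times as we like, whereas the iterator obtained from it with `iter` is used up after a single pass. The names below (`r`, `it`, `first_pass`, and so on) are chosen only for this illustration.

```python
r = range(3)

# A range can be traversed repeatedly...
first_pass = list(r)
second_pass = list(r)

# ...because each traversal asks iter() for a fresh iterator.
it = iter(r)
consumed = list(it)   # this pass uses the iterator up
leftover = list(it)   # nothing remains for a second pass

print(first_pass, second_pass)  # [0, 1, 2] [0, 1, 2]
print(consumed, leftover)       # [0, 1, 2] []
```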

The benefit of the iterator indirection is that *the full list is never explicitly created*. We can see this by doing a range calculation that would overwhelm our system memory if we actually instantiated it:

In [10]:
N = 10 ** 12
for i in range(N):
    if i >= 1000:
        break
    print(i, end=', ')

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ..., 997, 998, 999, 

If `range` were to actually create that list of one trillion values, it would occupy tens of terabytes of machine memory: a waste, given the fact that we're ignoring all but the first 1,000 values.
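
As a rough check of this claim, the sketch below compares the in-memory size of a huge `range` with that of a small list, using `sys.getsizeof`. The exact byte counts depend on the Python build, so none are shown; note also that `sys.getsizeof` on a list counts only the list's own storage, not the integers it refers to, so the real gap is even larger. The variable names are illustrative.

```python
import sys

lazy = range(10 ** 12)          # one trillion values, none of them stored
small_list = list(range(1000))  # only 1,000 values, all of them stored

lazy_size = sys.getsizeof(lazy)
list_size = sys.getsizeof(small_list)

print(lazy_size, list_size)  # exact numbers vary by Python build
print(lazy[-1])              # 999999999999, computed on demand
```

The `range` object only needs to remember its start, stop, and step, so any element can be computed when it is requested.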

In fact, there's no reason that iterators ever have to end at all. Python's `itertools` library contains a `count` function that acts as an infinite range:

In [11]:
from itertools import count

for i in count():
    if i >= 1000:
        break
    print(i, end=', ')

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ..., 997, 998, 999, 

Had we not thrown in a loop break here, it would go on happily counting until the process is manually interrupted or killed (using, for example, `ctrl-C`).
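
An alternative to a manual `break` is to cap the iterator itself. The following sketch uses `itertools.islice`, which takes at most a given number of items from any iterator, including an infinite one; the variable names are illustrative.

```python
from itertools import count, islice

# Take only the first five values from an infinite counter
first_five = list(islice(count(), 5))
print(first_five)  # [0, 1, 2, 3, 4]

# count also accepts a start value and a step
from_ten = list(islice(count(10, 2), 4))
print(from_ten)  # [10, 12, 14, 16]
```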

### Useful Iterators

This iterator syntax is used nearly universally in Python built-in types as well as the more data science-specific objects we'll explore in later sections. Here, we'll cover some of the more useful iterators in the Python language:

#### `enumerate`

Often you need not only to iterate over the values in a list, but also to keep track of each value's index. You might be tempted to do things this way:

In [12]:
L = [2, 4, 6, 8, 10]
for i in range(len(L)):
    print(i, L[i])

0 2
1 4
2 6
3 8
4 10


Although this does work, Python provides a cleaner syntax using the `enumerate` iterator:

In [13]:
D = [2, 4, 6, 8, 10]
for i, val in enumerate(D):
    print(i, val)

0 2
1 4
2 6
3 8
4 10


This is the more "Pythonic" way to enumerate the indices and values in a list.
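
`enumerate` also accepts an optional `start` argument, which is handy when the numbering should begin at something other than zero. A small sketch, with `numbered` as an illustrative name:

```python
L = [2, 4, 6, 8, 10]

# Number the values starting from 1 instead of 0
numbered = list(enumerate(L, start=1))
print(numbered)  # [(1, 2), (2, 4), (3, 6), (4, 8), (5, 10)]
```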

#### `zip`

Other times, you may have multiple lists that you want to iterate over simultaneously. You could certainly iterate over the index as in the non-Pythonic example we looked at previously, but it is better to use the `zip` iterator, which zips together iterables:

In [34]:
L = [2, 4, 6, 10, 17]
R = [15, 6, 67, 12, 15]
for lval, rval in zip(L, R):
    print(lval, rval)

2 15
4 6
6 67
10 12
17 15


Any number of iterables can be zipped together, and if they are different lengths, the shortest will determine the length of the `zip`.
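
The truncation behavior can be seen in the short sketch below. It also shows `itertools.zip_longest`, an alternative that pads the shorter inputs instead of stopping early; the list names and the choice of `None` as the fill value are just for illustration.

```python
from itertools import zip_longest

shorter = [1, 2]
longer = ['a', 'b', 'c']

# zip stops as soon as the shortest input runs out
truncated = list(zip(shorter, longer))
print(truncated)  # [(1, 'a'), (2, 'b')]

# zip_longest keeps going and fills the gaps
padded = list(zip_longest(shorter, longer, fillvalue=None))
print(padded)  # [(1, 'a'), (2, 'b'), (None, 'c')]
```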

#### `map` and `filter`

The `map` iterator takes a function and applies it to each value in an iterable:

In [15]:
# find the first 10 square numbers
square = lambda x: x ** 2
for val in map(square, range(10)):
    print(val, end=' ')

0 1 4 9 16 25 36 49 64 81 

The `filter` iterator looks similar, except it only passes through values for which the filter function evaluates to True:

In [16]:
# find values below 10 for which x % 2 is zero
is_even = lambda x: x % 2 == 0
for val in filter(is_even, range(10)):
    print(val, end=' ')

0 2 4 6 8 
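
Because `map` and `filter` both return iterators, they can be chained, and nothing is computed until a value is actually requested. The sketch below reuses the `square` and `is_even` functions from above; `even_squares`, `first`, and `rest` are illustrative names.

```python
square = lambda x: x ** 2
is_even = lambda x: x % 2 == 0

# Chain the two: keep the even numbers, then square them.
# No values are computed at this point.
even_squares = map(square, filter(is_even, range(10)))

first = next(even_squares)  # computes only the first value
rest = list(even_squares)   # computes the remaining values

print(first, rest)  # 0 [4, 16, 36, 64]
```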

### Specialized Iterators: `itertools`

We briefly looked at `itertools.count`, which acts as an infinite `range`. The `itertools` module contains a whole host of useful iterators. As an example, consider the `itertools.permutations` function, which iterates over all permutations of a sequence. A permutation is one of the possible orderings in which the elements of a sequence can be arranged:

In [18]:
from itertools import permutations
p = permutations(range(3))
print(p)

<itertools.permutations object at 0x7fcd9f50c860>


In [19]:
print(list(p))

[(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]
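
`permutations` also takes an optional second argument giving the length of each arrangement. The sketch below shows this, and checks that the number of full-length permutations of `n` items equals `n!`; the variable names are illustrative.

```python
from itertools import permutations
from math import factorial

# Ordered arrangements of length 2 drawn from three letters
pairs = list(permutations('abc', 2))
print(pairs)
# [('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'c'), ('c', 'a'), ('c', 'b')]

# Full-length permutations of 4 items: there are 4! = 24 of them
n_full = len(list(permutations(range(4))))
print(n_full == factorial(4))  # True
```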


Similarly, the `itertools.combinations` function iterates over all unique combinations of `N` values within a list, where the order of the values within a combination does not matter:

In [20]:
from itertools import combinations
c = combinations(range(4), 2)
print(c)

<itertools.combinations object at 0x7fcd9f50ccc0>


In [21]:
print(list(c))

[(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
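
The number of combinations produced matches the binomial coefficient, which can be checked with `math.comb` (assuming Python 3.8 or later, where `math.comb` is available). The name `triples` is illustrative.

```python
from itertools import combinations
from math import comb

# All ways to choose 3 letters out of 4, ignoring order
triples = list(combinations('abcd', 3))
print(triples)
# [('a', 'b', 'c'), ('a', 'b', 'd'), ('a', 'c', 'd'), ('b', 'c', 'd')]

print(len(triples) == comb(4, 3))  # True
```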


Somewhat related is the `product` iterator, which iterates over every tuple that can be formed by taking one element from each of two or more iterables (their Cartesian product):

In [22]:
from itertools import product
p = product('ab', range(3))
print(p)

<itertools.product object at 0x7fcd9f509c80>


In [23]:
print(list(p))

[('a', 0), ('a', 1), ('a', 2), ('b', 0), ('b', 1), ('b', 2)]


In [24]:
p = product('abc', range(6))
print(p)

<itertools.product object at 0x7fcd9f5071c0>


In [25]:
print(list(p))

[('a', 0), ('a', 1), ('a', 2), ('a', 3), ('a', 4), ('a', 5), ('b', 0), ('b', 1), ('b', 2), ('b', 3), ('b', 4), ('b', 5), ('c', 0), ('c', 1), ('c', 2), ('c', 3), ('c', 4), ('c', 5)]
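
One way to think about `product` is as a replacement for nested `for` loops. The sketch below builds the same pairs both ways and compares them, then uses the `repeat` argument to take the product of an iterable with itself; the variable names are illustrative.

```python
from itertools import product

# The nested-loop equivalent of product('ab', range(3))
nested = []
for letter in 'ab':
    for number in range(3):
        nested.append((letter, number))

print(nested == list(product('ab', range(3))))  # True

# repeat=2 pairs the iterable with itself
bits = list(product([0, 1], repeat=2))
print(bits)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```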
