# Data Science Day 5

## Iterators

- An iterator is an object which allows a programmer to traverse through all the elements of a collection, regardless of its specific implementation


In [1]:
for i in range(10):
    print(i, end=' ')

0 1 2 3 4 5 6 7 8 9 

- range( ) does not return a list; it returns a lazy range object that produces its values on demand as the loop iterates over it

### Iterating Over Lists

In [2]:
for value in [2, 4, 6, 8, 10]:
    # do some operation
    print(value + 1, end=' ')

3 5 7 9 11 

- The "for x in y" syntax allows us to repeat some operation for each value in the list
- When you write something like "for val in L", the Python interpreter checks whether L has an iterator interface, which you can check yourself with the built-in iter function:

In [3]:
iter([2, 4, 6, 8, 10])

<list_iterator at 0x107b2db50>

- It is this iterator object that provides the functionality required by the for loop
- The iterator object gives you access to the next element for as long as one is available, which can be seen with the built-in function next:

In [4]:
I = iter([2, 4, 6, 8, 10])

In [5]:
print(next(I))

2


In [6]:
print(next(I))

4


In [7]:
print(next(I))

6


- This indirection lets Python loop over objects that are not actually lists as if they were
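
- As a rough sketch of what a for loop does with iter and next behind the scenes (the names values, iterator, and collected are chosen here for illustration), the loop from In [2] can be written by hand. The iterator signals that it is exhausted by raising StopIteration:

```python
# Roughly what "for value in values: ..." does behind the scenes.
values = [2, 4, 6, 8, 10]
collected = []

iterator = iter(values)          # ask the list for an iterator
while True:
    try:
        value = next(iterator)   # fetch the next element
    except StopIteration:        # raised once the iterator is exhausted
        break
    collected.append(value + 1)

print(collected)  # [3, 5, 7, 9, 11]
```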

### range( ): A List is Not Always a List

- The range( ) function returns not a list, but a special range object:

In [8]:
range(10)

range(0, 10)

- range, like a list, exposes an iterator:

In [9]:
iter(range(10))

<range_iterator at 0x107b2d030>

- Python knows to treat it as if it's a list:

In [10]:
for i in range(10):
    print(i, end=' ')

0 1 2 3 4 5 6 7 8 9 
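
- Strictly speaking, a range is an iterable rather than an iterator: it hands out a fresh iterator each time it is looped over. The following sketch (variable names are illustrative) contrasts the two behaviors:

```python
r = range(3)

# A range can be looped over repeatedly: each pass gets a fresh iterator.
first_pass = list(r)
second_pass = list(r)
print(first_pass, second_pass)  # [0, 1, 2] [0, 1, 2]

# An iterator, by contrast, is consumed as it is read.
it = iter(r)
consumed = list(it)
leftover = list(it)
print(consumed, leftover)  # [0, 1, 2] []

# range itself has no __next__ method, so next() rejects it.
try:
    next(r)
except TypeError as err:
    range_is_iterator = False
    print("TypeError:", err)
else:
    range_is_iterator = True
```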

- The benefit of the iterator indirection is that the full list is never explicitly created
- We can see this by iterating over a range that would overwhelm our system memory if we actually instantiated it as a list:

In [12]:
N = 10 ** 12
for i in range(N):
    if i >= 10: break
    print(i, end=', ')

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 
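
- One way to check that range never materializes its values is to look at the size of the object itself. This is a small sketch assuming CPython, where sys.getsizeof reports the size of the range object only; the exact byte counts depend on the platform, so none are shown:

```python
import sys

small = range(10)
huge = range(10 ** 12)

# A range stores only its start, stop, and step, so the object is the
# same size no matter how many values it spans.
print(sys.getsizeof(small), sys.getsizeof(huge))

# Indexing and membership tests are computed arithmetically,
# without stepping through the values.
print(huge[-1])          # 999999999999
print(10 ** 11 in huge)  # True
```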

- If range were to actually create that list of one trillion values, it would occupy tens of terabytes of machine memory
- There's no reason that iterators ever have to end at all
- Python's itertools library contains a count function that acts as an infinite range:

In [13]:
from itertools import count

for i in count():
    if i >= 10:
        break
    print(i, end=', ')

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 

- Had we not thrown in a loop break, it would go on counting until the process is manually interrupted or killed (using Ctrl+C, for example)
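
- An alternative to breaking out of the loop by hand is itertools.islice, which takes a bounded slice of any iterator, including an infinite one. A minimal sketch (first_ten and evens are illustrative names):

```python
from itertools import count, islice

# islice(iterable, stop) yields at most `stop` items, then ends.
first_ten = list(islice(count(), 10))
print(first_ten)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# count also accepts a start value and a step.
evens = list(islice(count(0, 2), 5))
print(evens)  # [0, 2, 4, 6, 8]
```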

### Useful Iterators

- Often you need to iterate not only over the values in a list, but also keep track of the index
- One way to do this is:

In [14]:
L = [2, 4, 6, 8, 10]
for i in range(len(L)):
    print(i, L[i])

0 2
1 4
2 6
3 8
4 10


- Although this does work, Python provides a clearer syntax using the enumerate iterator:

In [19]:
for i, val in enumerate(L):
    print(i, val)

0 2
1 4
2 6
3 8
4 10
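
- enumerate also accepts an optional start argument, which is handy when the counter should begin at 1 rather than 0. A short sketch (the name numbered is illustrative):

```python
L = [2, 4, 6, 8, 10]

# start=1 changes only the counter, not which elements are visited.
numbered = list(enumerate(L, start=1))
print(numbered)  # [(1, 2), (2, 4), (3, 6), (4, 8), (5, 10)]
```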


- Other times, you may have multiple lists that you want to iterate over simultaneously
- You could iterate over the index as in the non-Pythonic example we looked at previously, but it is better to use the zip iterator, which zips together iterables:

In [20]:
L = [2, 4, 6, 8, 10]
R = [3, 6, 9, 12, 15]
for lval, rval in zip(L, R):
    print(lval, rval)

2 3
4 6
6 9
8 12
10 15


- Any number of iterables can be zipped together, and if they are different lengths, the shortest will determine the length of the zip
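
- The truncation to the shortest input can be seen directly. When the longer input should be kept instead, itertools.zip_longest pads the missing values. A sketch with illustrative inputs:

```python
from itertools import zip_longest

shorter = [1, 2, 3]
longer = ['a', 'b', 'c', 'd', 'e']

# zip stops as soon as the shortest input runs out.
truncated = list(zip(shorter, longer))
print(truncated)  # [(1, 'a'), (2, 'b'), (3, 'c')]

# zip_longest continues to the end of the longest input, filling gaps.
padded = list(zip_longest(shorter, longer, fillvalue=None))
print(padded)  # [(1, 'a'), (2, 'b'), (3, 'c'), (None, 'd'), (None, 'e')]
```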

- The map iterator takes a function and applies it to each value in an iterable:

In [21]:
# find the first 10 square numbers
square = lambda x: x ** 2
for val in map(square, range(10)):
    print(val, end=' ')

0 1 4 9 16 25 36 49 64 81 

- The filter iterator looks similar, except it only passes through values for which the filter function evaluates to True:

In [22]:
# find values below 10 for which x % 2 is zero
is_even = lambda x: x % 2 == 0
for val in filter(is_even, range(10)):
    print(val, end=' ')

0 2 4 6 8 

- The map and filter functions, along with the reduce function (which lives in Python's functools module), are fundamental components of the functional programming style
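
- To illustrate how the three fit together, here is a small sketch that chains filter, map, and reduce to sum the squares of the even numbers below 10 (the variable names are illustrative):

```python
from functools import reduce

# filter keeps the even numbers, map squares them, reduce folds them into a sum.
evens = filter(lambda x: x % 2 == 0, range(10))
squares = map(lambda x: x ** 2, evens)
total = reduce(lambda acc, x: acc + x, squares, 0)
print(total)  # 120

# The same computation written with the built-in sum, for comparison.
print(sum(x ** 2 for x in range(10) if x % 2 == 0))  # 120
```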

### Iterators as Function Arguments

- We saw that *args and **kwargs can be used to pass sequences and dictionaries to functions
- It turns out that the *args syntax works not just with sequences, but with any iterable:

In [23]:
print(*range(10))

0 1 2 3 4 5 6 7 8 9


- So, for example, we can compress the map example from before into the following:

In [24]:
print(*map(lambda x: x ** 2, range(10)))

0 1 4 9 16 25 36 49 64 81


- zip( ) can zip together any number of iterators or sequences:

In [25]:
L1 = (1, 2, 3, 4)
L2 = ('a', 'b', 'c', 'd')

In [26]:
z = zip(L1, L2)
print(*z)

(1, 'a') (2, 'b') (3, 'c') (4, 'd')

- Passing the zipped pairs back through zip with the * syntax reverses the operation, recovering the original sequences:


In [29]:
z = zip(L1, L2)
new_L1, new_L2 = zip(*z)
print(new_L1, new_L2)

(1, 2, 3, 4) ('a', 'b', 'c', 'd')
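
- Note that z is rebuilt before the second cell because an iterator can only be consumed once: after print(*z), the zip object has nothing left to yield. A short sketch of that behavior (first and second are illustrative names):

```python
L1 = (1, 2, 3, 4)
L2 = ('a', 'b', 'c', 'd')

z = zip(L1, L2)
first = list(z)    # consumes the iterator
second = list(z)   # nothing left to yield
print(first)   # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
print(second)  # []
```

- If the pairs are needed more than once, storing them in a list (as first does here) is one straightforward option.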


### Specialized Iterators: itertools

- The itertools module contains a whole host of useful iterators
- For example, consider the itertools.permutations function, which iterates over all permutations of a sequence:

In [30]:
from itertools import permutations
p = permutations(range(3))
print(*p)

(0, 1, 2) (0, 2, 1) (1, 0, 2) (1, 2, 0) (2, 0, 1) (2, 1, 0)


- Similarly, the itertools.combinations function iterates over all unique combinations of N values drawn from a sequence:

In [32]:
from itertools import combinations
c = combinations(range(4), 2)
print(*c)

(0, 1) (0, 2) (0, 3) (1, 2) (1, 3) (2, 3)


- Somewhat related is the product iterator, which iterates over the Cartesian product of two or more iterables, that is, every tuple formed by taking one element from each:

In [33]:
from itertools import product
p = product('ab', range(3))
print(*p)

('a', 0) ('a', 1) ('a', 2) ('b', 0) ('b', 1) ('b', 2)
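
- As a sanity check on these three iterators, the number of items each yields can be compared with the usual counting formulas. This sketch assumes Python 3.8 or later for math.comb; the variable names are illustrative:

```python
from itertools import combinations, permutations, product
from math import comb, factorial

items = range(4)

n_perms = len(list(permutations(items)))      # 4! orderings
n_combs = len(list(combinations(items, 2)))   # "4 choose 2" pairs
n_prod = len(list(product('ab', range(3))))   # 2 * 3 tuples
print(n_perms, n_combs, n_prod)  # 24 6 6
print(n_perms == factorial(4), n_combs == comb(4, 2))  # True True

# product behaves like nested for loops, with the last iterable varying fastest.
nested = [(letter, number) for letter in 'ab' for number in range(3)]
print(nested == list(product('ab', range(3))))  # True
```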


## List Comprehensions

- List comprehensions provide a concise way to create lists
- A list comprehension consists of brackets containing an expression followed by a for clause, then zero or more for or if clauses
- The expression can be anything, meaning you can put all kinds of objects in lists
- The list comprehension always returns a new list:

In [34]:
[i for i in range(20) if i % 3 > 0]

[1, 2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19]

- The result of this is a list of numbers which excludes multiples of 3

### Basic List Comprehensions

- List comprehensions are a way to compress a list-building for loop into a single short, readable line
- For example, here is a loop that constructs a list of the first 12 square numbers:

In [35]:
L = []
for n in range(12):
    L.append(n ** 2)
L

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]

- The list comprehension equivalent of this is the following:

In [36]:
[n ** 2 for n in range(12)]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]

- The basic syntax is [expr for var in iterable], where expr is any valid expression, var is a variable name, and iterable is any iterable Python object

### Multiple Iteration

- Sometimes you want to build a list not just from one value, but from two
- To do this, simply add another for expression in the comprehension:

In [37]:
[(i, j) for i in range(2) for j in range(3)]

[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]

- Notice that the second for expression acts as the interior index, varying the fastest in the resulting list
- This type of construction can be extended to three, four, or more iterators within the comprehension, though at some point, code readability will suffer
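
- The ordering of the for clauses mirrors nested loops read top to bottom. The sketch below writes out the equivalent loops and applies the same pattern to flatten a nested list (matrix is a made-up example):

```python
# Equivalent nested loops: the first for clause is the outer loop.
pairs = []
for i in range(2):
    for j in range(3):
        pairs.append((i, j))
print(pairs == [(i, j) for i in range(2) for j in range(3)])  # True

# The same pattern flattens a list of lists.
matrix = [[1, 2, 3], [4, 5, 6]]
flat = [value for row in matrix for value in row]
print(flat)  # [1, 2, 3, 4, 5, 6]
```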

### Conditionals on the Iterator

- You can further control the iteration by adding a conditional to the end of the expression
- In the first example of this section, we iterated over all numbers from 0 to 19, but left out multiples of 3:

In [38]:
[val for val in range(20) if val % 3 > 0]

[1, 2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19]

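- The expression (val % 3 > 0) evaluates to True unless val is divisible by 3
- The equivalent loop syntax is as follows (the condition is shortened to val % 3, which works because any nonzero remainder is treated as True):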

In [39]:
L = []
for val in range(20):
    if val % 3:
        L.append(val)
L

[1, 2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19]

### Conditionals on the Value

- Python has a conditional expression, similar to the single-line conditional enabled by the ?: operator in C, which is most often used within list comprehensions, lambda functions, and other places where a simple expression is desired:

In [40]:
val = -10
val if val >= 0 else -val

10

- We see that this simply duplicates the functionality of the built-in abs( ) function, but the construction lets you do some really interesting things within list comprehensions:

In [42]:
[val if val % 2 else -val
 for val in range(20) if val % 3]

[1, -2, -4, 5, 7, -8, -10, 11, 13, -14, -16, 17, 19]
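
- The comprehension above combines two different uses of if: the trailing if filters which values are visited, while the if/else at the front chooses what is stored for each value. Written as an explicit loop (result is an illustrative name), the two roles are easier to tell apart:

```python
result = []
for val in range(20):
    if val % 3:                                  # trailing "if": skip multiples of 3
        result.append(val if val % 2 else -val)  # conditional expression: negate evens
print(result)
# [1, -2, -4, 5, 7, -8, -10, 11, 13, -14, -16, 17, 19]
```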

- Note the line break within the list comprehension before the for expression: this is valid in Python and is often a nice way to break up long list comprehensions for greater readability
- What we're doing is constructing a list, leaving out multiples of 3, and negating all multiples of 2
- Once you understand the dynamics of list comprehensions, it's straightforward to move on to other types of comprehensions
- The syntax is largely the same; the only difference is the type of bracket you use
- For example, with curly braces, you can create a set with a set comprehension:

In [43]:
{n**2 for n in range(12)}

{0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121}

- Recall that a set is a collection that contains no duplicates
- The set comprehension respects this rule and eliminates any duplicate entries:

In [44]:
{a % 3 for a in range(1000)}

{0, 1, 2}

- With a slight tweak, adding a colon (:) to separate a key expression from a value expression, you can create a dict comprehension:

In [45]:
{n:n**2 for n in range(6)}

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
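
- Dict comprehensions pair naturally with zip and with the items method. The following sketch uses made-up data (names and values) to build a lookup table and then invert it:

```python
names = ['a', 'b', 'c']
values = [1, 2, 3]

# Build a dict from two parallel lists.
lookup = {name: value for name, value in zip(names, values)}
print(lookup)  # {'a': 1, 'b': 2, 'c': 3}

# Swap keys and values (this assumes the values are unique and hashable).
inverse = {value: name for name, value in lookup.items()}
print(inverse)  # {1: 'a', 2: 'b', 3: 'c'}
```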

- Finally, if you use parentheses rather than square brackets, you get what's called a generator expression:

In [46]:
(n**2 for n in range(12))

<generator object <genexpr> at 0x107ba7f20>

- A generator expression is essentially a list comprehension in which elements are generated as needed rather than all at once
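
- The practical consequences are that a generator expression uses very little memory and can be consumed only once. A brief sketch, assuming CPython for the sys.getsizeof comparison (the variable names are illustrative):

```python
import sys

squares_list = [n ** 2 for n in range(1000)]
squares_gen = (n ** 2 for n in range(1000))

# The list holds all 1000 results; the generator holds only its current state.
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))  # True

# Functions such as sum consume the generator one value at a time.
total = sum(squares_gen)
print(total)  # 332833500

# A second pass finds the generator already exhausted.
second_total = sum(squares_gen)
print(second_total)  # 0
```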