## Item 9: Consider Generator Expressions for Larger Comprehensions

List comprehensions create a new list containing one item for each value in the input. For large inputs, this can consume a lot of memory and may cause the program to crash.

For example, say we want to read a file and return the number of characters on each line. Doing this with a _list comprehension_ requires holding the length of every line of the file in memory at once.
For large files, or for network sockets that never stop sending data, a _list comprehension_ would be very problematic.
The following _list comprehension_ handles a relatively small text file.

```python
# len() counts the trailing newline, so blank lines have length 1
value = [len(x) for x in open('gpl.txt')]
print(value)
```

```
[47, 47, 1, 70, 62, 59, 1, 37, 1, 65, 35, 1, 72, 71, 72, 73, 71, 72, 70, 20, ...]
```

An alternative is to use _generator expressions_, a generalization of _list comprehensions_ and generators.
Generator expressions don't materialize the whole output sequence when they're run. Instead, they evaluate to an iterator that yields one item at a time from the expression.
The syntax is similar to a _list comprehension_, except that the expression is enclosed in `()` characters instead of `[]`.

```python
it = (len(x) for x in open('gpl.txt'))
print(it)
```

```
<generator object <genexpr> at 0x7fd184a620d0>
```
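To make the memory difference concrete without needing `gpl.txt`, the sketch below compares a fully built list with an equivalent generator expression using `sys.getsizeof`. Note that `sys.getsizeof` reports only the size of the container object itself (not the integers it refers to), and exact byte counts vary between Python versions and platforms, so treat the numbers as illustrative.

```python
import sys

n = 1_000_000

# The list comprehension builds all n results up front.
squares_list = [x * x for x in range(n)]

# The generator expression builds nothing yet; it only keeps
# enough state to produce the next value on demand.
squares_gen = (x * x for x in range(n))

list_bytes = sys.getsizeof(squares_list)
gen_bytes = sys.getsizeof(squares_gen)

print(f'list:      {list_bytes:,} bytes')
print(f'generator: {gen_bytes:,} bytes')
```

The list grows with `n`, while the generator object stays the same small size no matter how many items it will eventually produce.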


The returned iterator can be advanced one step at a time with the _next_ built-in function, producing the next output from the generator expression only when it is needed.
Memory use stays low because the iterator holds only one item at a time, and produces it only when you ask for the next one.

```python
print(next(it))
print(next(it))
print(next(it))
```

```
47
47
1
```
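When a generator expression runs out of items, _next_ raises `StopIteration`; passing a second argument to _next_ returns that default instead. The sketch below uses a small in-memory list of made-up lines as a stand-in for `gpl.txt`, so it runs on its own.

```python
lines = ['first line\n', '\n', 'third\n']  # hypothetical stand-in for a file

it = (len(x) for x in lines)

print(next(it))  # 11
print(next(it))  # 1 (just the newline)
print(next(it))  # 6

# The generator is now exhausted: next() with a default
# returns the default instead of raising StopIteration.
print(next(it, None))  # None

try:
    next(it)
except StopIteration:
    print('exhausted')
```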


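Generator expressions can be composed to produce combined results.
Here we use the iterator `it` from above as the input for another generator expression. Because `roots` draws from the same iterator, it picks up wherever `it` left off, and each step of `roots` also advances `it`.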

```python
roots = ((x, x**0.5) for x in it)

print(next(roots))
print(next(roots))
print(next(roots))
```

```
(47, 6.855654600401044)
(1, 1.0)
(70, 8.366600265340756)
```
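Here is a self-contained version of the same chain, using `io.StringIO` as a stand-in for `open('gpl.txt')` so it runs without the file (the text content is made up for illustration). Each stage pulls one item at a time from the stage before it, so only one line is in flight at any moment.

```python
import io

# Hypothetical file contents; io.StringIO iterates line by line like a file.
fake_file = io.StringIO('ab\n\nabcdefgh\n')

lengths = (len(line) for line in fake_file)  # stage 1: length of each line
roots = ((x, x ** 0.5) for x in lengths)     # stage 2: pair each length with its square root

# Nothing has been read yet; list() drives both stages to completion.
pairs = list(roots)
print(pairs)
```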


Chaining generators like this executes very quickly in Python, which makes it very useful for processing large or continuous streams of data.
One major caveat is that the iterators returned by generator expressions are stateful, so they can only be consumed once.
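The sketch below shows that single-use behaviour: a second pass over the same generator object produces nothing, and getting the values again requires evaluating the generator expression again.

```python
data = [1, 2, 3]

squares = (x * x for x in data)

first_pass = list(squares)   # consumes every item
second_pass = list(squares)  # the iterator is already exhausted

print(first_pass)   # [1, 4, 9]
print(second_pass)  # []

# Evaluating the generator expression again gives a fresh iterator.
again = list(x * x for x in data)
print(again)        # [1, 4, 9]
```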

##### Things to remember

- _List comprehensions_ can consume too much memory with large inputs or continuous data streams.
- Generator expressions avoid memory issues by producing outputs one at a time as an iterator.
- Generator expressions can be composed by passing the iterator from one generator expression into the _for_ subexpression of another.
- Generator expressions execute very quickly when chained together.

---

## Item 10: Prefer _enumerate_ Over _range_

The _range_ built-in function is useful for loops that iterate over a set of integers.

```python
from random import randint

random_bits = 0
count = 0
for i in range(64):
    if randint(0, 1):
        random_bits |= 1 << i  # set bit i
        count += 1             # track how many bits were set

print(random_bits)
print(count)
```

```
821403141336220275
30
```


When you need both the index and the item while iterating over a list, a common approach is to loop over _range(len(seq))_.

```python
flavour_list = ['vanilla', 'chocolate', 'pecan', 'strawberry']

for i in range(len(flavour_list)):
    flavour = flavour_list[i]
    print('%d: %s' % (i + 1, flavour))
```

```
1: vanilla
2: chocolate
3: pecan
4: strawberry
```


This looks clumsy: you have to get the length of the list, loop over a _range_ of indices, and then index into the list to get each item. It is also visually noisy.

The built-in _enumerate_ function wraps any iterator with a lazy generator. This generator yields pairs of the loop index and the next value from the iterator. The resulting code is much clearer.

```python
for i, flavour in enumerate(flavour_list):
    print('%d: %s' % (i + 1, flavour))
```

```
1: vanilla
2: chocolate
3: pecan
4: strawberry
```
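To see what wrapping an iterator with a lazy generator means, here is a rough pure-Python equivalent of _enumerate_ written as a generator function. `my_enumerate` is a hypothetical name used only for illustration; the real built-in is implemented in C, but the Python documentation describes its behaviour with essentially this loop.

```python
def my_enumerate(iterable, start=0):
    """Yield (index, item) pairs lazily, counting up from start."""
    n = start
    for item in iterable:
        yield n, item  # pause here until the caller asks for the next pair
        n += 1


flavour_list = ['vanilla', 'chocolate', 'pecan', 'strawberry']

for i, flavour in my_enumerate(flavour_list):
    print('%d: %s' % (i + 1, flavour))

# Behaves like the built-in, including a custom start value.
assert list(my_enumerate(flavour_list, 1)) == list(enumerate(flavour_list, 1))
```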


This can be made even shorter by passing the number _enumerate_ should start counting from as its _start_ argument (1 in this case).

```python
for i, flavour in enumerate(flavour_list, 1):
    print('%d: %s' % (i, flavour))
```

```
1: vanilla
2: chocolate
3: pecan
4: strawberry
```
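The _start_ argument can also be passed by keyword, and because _enumerate_ accepts any iterable, it works where `range(len(...))` cannot, such as on a generator expression, which has no length. A minimal sketch combining both:

```python
flavour_list = ['vanilla', 'chocolate', 'pecan', 'strawberry']

# A generator expression has no len(), so range(len(...)) would fail here.
upper_flavours = (f.upper() for f in flavour_list)

numbered = list(enumerate(upper_flavours, start=1))
print(numbered)
# [(1, 'VANILLA'), (2, 'CHOCOLATE'), (3, 'PECAN'), (4, 'STRAWBERRY')]
```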


##### Things to remember

- _enumerate_ provides concise syntax for looping over an iterator and getting the index of each item as you go.
- Prefer _enumerate_ instead of looping over a _range_ and indexing into a sequence.
- The _start_ argument of _enumerate_ defaults to 0. Set it to any integer to choose the number the count begins at.

---

## Item 11: Use _zip_ to Process Iterators in Parallel