# Item 32: Consider Generator Expressions for Large List Comprehensions

The problem with list comprehensions is that they may create new `list` instances containing one item for each value in the input sequences. This is fine for small inputs, but for large inputs it can consume significant amounts of memory and cause a program to crash.

Say, for example, that we want to read from a file and return the number of characters on each line. Doing this with a list comprehension would require holding the length of every line of the file in memory. If the file is enormous, or the input is a never-ending network socket, using a list comprehension would be problematic.

In [1]:
# Here we use a list comprehension in a way that can only handle small input values
value = [len(x) for x in open('my_file.txt')]
print(value)

[85, 31, 31, 2, 87, 3, 86, 59, 59, 142]


To solve this issue, Python provides *generator expressions*, which are a generalization of list comprehensions and generators. Generator expressions don't materialize the whole output sequence when they're run. Instead, a generator expression evaluates to an iterator that yields one item at a time from the expression.

You create a generator expression by putting list-comprehension-like syntax between `()` characters.

In [4]:
# Here we use a generator expression that is equivalent to the code above. Note that the generator expression
# evaluates to an iterator immediately and does no work until that iterator is advanced
it = (len(x) for x in open('my_file.txt'))
print(it)

<generator object <genexpr> at 0x7ff4b03f1510>
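To make the memory difference concrete, here is a minimal sketch that compares the size of a list comprehension's result with the size of the equivalent generator object. It uses `range(100_000)` as a hypothetical stand-in for a large input, not the file from the example above. Note that `sys.getsizeof` reports only the size of the container itself, not of the items it refers to, which is still enough to show that the list grows with the input while the generator does not.

```python
import sys

# Hypothetical stand-in for a large input; not the file used above.
numbers = range(100_000)

as_list = [n * 2 for n in numbers]  # materializes every result right away
as_gen = (n * 2 for n in numbers)   # produces results only on demand

list_size = sys.getsizeof(as_list)  # grows with the number of items
gen_size = sys.getsizeof(as_gen)    # small, independent of the input size

print(list_size > gen_size)  # True
```

The exact byte counts vary between Python versions and platforms, so the sketch only compares them rather than printing them.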


In [5]:
# The returned iterator can be advanced one step at a time to produce the next output from the generator expression,
# as needed (using the next built-in function). We can consume as much as we want from the generator without risking
# a blowup in memory usage
print(next(it))
print(next(it))
print(next(it))

85
31
31
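Because items are produced only when requested, a generator expression can even sit on top of an input that never ends. The sketch below assumes `itertools.count` as a hypothetical endless source (standing in for something like a network socket) and uses `itertools.islice` to take a fixed number of results. The same expression written as a list comprehension would never finish.

```python
import itertools

# Hypothetical endless source, standing in for something like a socket.
endless = itertools.count(1)

# Building the generator expression does not consume anything from the source.
squares = (n * n for n in endless)

# Take only the first five results; the rest are never computed.
first_five = list(itertools.islice(squares, 5))
print(first_five)  # [1, 4, 9, 16, 25]
```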


Another powerful outcome of generator expressions is that they can be composed together.

In [6]:
# Here, we take the iterator returned by the generator expression above and use it as the input for another generator
# expression
roots = ((x, x**0.5) for x in it)

In [7]:
# Each time we advance the iterator, it also advances the interior iterator, creating a domino effect of looping, 
# evaluating conditional expressions, and passing around inputs and outputs, all while being as memory efficient
# as possible
print(next(roots))
print(next(roots))
print(next(roots))

(2, 1.4142135623730951)
(87, 9.327379053088816)
(3, 1.7320508075688772)
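The domino effect can be made visible with a small self-contained sketch. It assumes an in-memory `io.StringIO` object in place of the file, and a hypothetical helper `traced_len` that records each call in a list, so we can see that advancing the outer iterator pulls exactly one item through the inner one.

```python
import io

log = []  # records each call so we can see when work actually happens


def traced_len(line):
    """Hypothetical helper: like len(), but notes every call in `log`."""
    log.append(f"len({line!r})")
    return len(line)


# In-memory stand-in for a file with three lines.
source = io.StringIO("ab\nabcd\nabcdef\n")

lengths = (traced_len(line.rstrip("\n")) for line in source)
roots = ((n, n ** 0.5) for n in lengths)

calls_before = list(log)  # nothing has run yet
first = next(roots)       # pulls one line through both generators
calls_after = list(log)   # exactly one call to traced_len

print(calls_before)  # []
print(calls_after)   # ["len('ab')"]
```

Only the first line has been read and measured at this point. The remaining lines stay untouched until `roots` is advanced again.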


Chaining generators together like this executes very quickly in Python. When we're looking for a way to compose functionality that operates on a large stream of input, generator expressions are a great choice. The only gotcha is that iterators returned by generator expressions are stateful, so we must be careful not to use these iterators more than once.
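The statefulness gotcha is easy to reproduce. In this minimal sketch, which uses a short hypothetical list of words as input, the second pass over the same iterator silently produces nothing because the first pass already exhausted it.

```python
# Hypothetical small input, used only to illustrate exhaustion.
words = ["alpha", "beta", "gamma"]

lengths = (len(word) for word in words)

first_pass = list(lengths)   # consumes every item
second_pass = list(lengths)  # the iterator is already exhausted

print(first_pass)   # [5, 4, 5]
print(second_pass)  # []
```

No exception is raised on the second pass, which is what makes this mistake hard to spot. If the results are needed more than once, one option is to create a fresh generator expression each time, or to store the results in a list when the input is known to be small.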