# Item 30: Consider Generators Instead of Returning Lists

In [3]:
# The simplest choice for a function that produces a sequence of results is to return a list of items
def index_words(text):
    result = []
    if text:
        result.append(0)
    for index, letter in enumerate(text):
        if letter == ' ':
            result.append(index + 1)
    return result

In [6]:
address = 'Four score and seven years ago...'
result = index_words(address)
print(result[:10])

[0, 5, 11, 15, 21, 27]


There are two problems with the `index_words` function:
- The code is a bit noisy and dense. Each time a new result is found, it calls `result.append`, and that call's bulk distracts from the value actually being added (`index + 1`). While the function body is about 130 characters long (without whitespace), only about 75 of those characters are important.
- It requires all results to be stored in the `list` before being returned. For huge inputs, this can cause a program to run out of memory and crash.

In [7]:
# A better way to write this function is by using a generator. Generators are produced by functions that use yield
# expressions
def index_words_iter(text):
    if text:
        yield 0
    for index, letter in enumerate(text):
        if letter == ' ':
            yield index + 1

When called, a generator function does not actually run; instead, it immediately returns an iterator. With each call to the `next` built-in function, the iterator advances the generator to its next `yield` expression. Each value passed to `yield` by the generator is returned by the iterator to the caller.

In [8]:
it = index_words_iter(address)
print(next(it))
print(next(it))

0
5
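The lazy behavior described above can be observed directly. The sketch below uses a hypothetical `traced_index_words` generator (not part of the original item), a copy of `index_words_iter` that records in a module-level `events` list when its body first starts running. It shows that calling the function executes none of its body, that each `next` call resumes it up to the following `yield`, and that `next` raises `StopIteration` once no `yield` remains.

```python
# Hypothetical list used only to record when the generator body starts running.
events = []

def traced_index_words(text):
    # This line executes on the first next() call, not when the function is called.
    events.append('body started')
    if text:
        yield 0
    for index, letter in enumerate(text):
        if letter == ' ':
            yield index + 1

it = traced_index_words('ab cd')
print(events)    # [] (calling the function ran none of its body)
print(next(it))  # 0
print(events)    # ['body started']
print(next(it))  # 3
try:
    next(it)     # the loop finishes without reaching another yield
except StopIteration:
    print('exhausted')
```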


The `index_words_iter` function is significantly easier to read because all interactions with the result `list` have been eliminated. Results are passed to `yield` expressions instead.

In [10]:
# We can easily convert the iterator returned by the generator to a list by passing it to the list built-in function
result = list(index_words_iter(address))
print(result[:10])

[0, 5, 11, 15, 21, 27]


Unlike `index_words`, which must hold every result in a list before returning, a generator version of the function can easily be adapted to take inputs of arbitrary length because its memory requirements are bounded.
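A minimal sketch of that claim, assuming only that `index_words_iter` is fed an iterable of characters rather than a `str` (its `enumerate` loop accepts any iterable). Here `huge_text` is an illustrative, lazily built stream of roughly three billion characters made with `itertools.repeat` and `itertools.chain.from_iterable`, and `itertools.islice` takes just the first few offsets, so neither the input nor the results are ever materialized. Calling the list-returning `index_words` on the same stream would instead try to consume all of it before returning anything.

```python
import itertools

def index_words_iter(text):
    if text:
        yield 0
    for index, letter in enumerate(text):
        if letter == ' ':
            yield index + 1

# A lazily generated "text" of about three billion characters; it is never
# stored in memory as a whole, so building it is essentially free.
huge_text = itertools.chain.from_iterable(itertools.repeat('ab ', 10**9))

# Pull only the first five offsets; the generator is never advanced further.
first_offsets = list(itertools.islice(index_words_iter(huge_text), 5))
print(first_offsets)  # [0, 3, 6, 9, 12]
```

One caveat with this input: an iterator object is always truthy, so the `if text:` check yields `0` even for an empty stream.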

In [11]:
# Say we define a generator that streams input from a file one line at a time and yields outputs one word at a time
def index_file(handle):
    offset = 0
    for line in handle:
        if line:
            yield offset
        for letter in line:
            offset += 1
            if letter == ' ':
                yield offset
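Because `index_file` only iterates over `handle`, any iterable of lines can stand in for a real file. A minimal sketch using `io.StringIO` as an in-memory substitute for an open text file (the two-line sample text is illustrative, and `index_file` is repeated so the snippet runs on its own): the offset keeps counting across lines, newline characters included, so each yielded value is a word's position within the whole input.

```python
import io

def index_file(handle):
    offset = 0
    for line in handle:
        if line:
            yield offset  # start of the first word on this line
        for letter in line:
            offset += 1
            if letter == ' ':
                yield offset  # start of the word after this space

# io.StringIO behaves like an open text file: iterating it yields one line at a time.
handle = io.StringIO('Four score\nand seven\n')
print(list(index_file(handle)))  # [0, 5, 11, 15]
```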

In [12]:
from itertools import islice

In [None]:
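# The working memory for this function is limited to the maximum length of one line of input. Running the generator
# produces the same results

# Write the sample text to disk first so that the file read below exists
with open('address.txt', 'w') as f:
    f.write(address)

with open('address.txt', 'r') as f:
    it = index_file(f)
    results = islice(it, 0, 10)
    print(list(results))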

The only gotcha with defining generators like this is that callers must be aware that the returned iterators are stateful and can't be reused.
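A short sketch of that gotcha, with `index_words_iter` redefined so the snippet runs on its own: once an iterator has been consumed, iterating it again silently produces nothing, and no error is raised. Getting the results a second time requires calling the generator function again to create a fresh iterator.

```python
def index_words_iter(text):
    if text:
        yield 0
    for index, letter in enumerate(text):
        if letter == ' ':
            yield index + 1

it = index_words_iter('Four score and seven')
print(list(it))  # [0, 5, 11, 15]
print(list(it))  # [] (the iterator is already exhausted)

# Calling the generator function again creates a new, independent iterator.
print(list(index_words_iter('Four score and seven')))  # [0, 5, 11, 15]
```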