Item 30: Consider Generators Instead of Returning Lists

Things to Remember
- Using generators can be clearer than the alternative of having a function return a list of accumulated results.
- The iterator returned by a generator produces the set of values passed to yield expressions within the generator function's body.
- Generators can produce a sequence of outputs for arbitrarily large inputs because their working memory doesn't include all inputs and outputs.  

In [None]:
# goal: find the index at which every word in a string begins
def index_words(text):
    result = []
    if text:
        result.append(0)
    for index, letter in enumerate(text):  # enumerate pairs each character with its index
        if letter == ' ':
            result.append(index + 1)  # the index where the next word begins
    return result
text = 'Using generators can be clearer than the alternative of having a function return a list of accumulated results'
result = index_words(text)
print(result[:10])


Problems with the above approach
- the code is a bit dense and noisy
- it does extra bookkeeping to maintain the result list, so the main logic (finding the index of every word) is de-emphasized
- it requires all results to be stored in the list before being returned. For huge inputs, this can cause a program to run out of memory and crash
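To make the memory point concrete, here is a rough sketch that compares a fully built list with a generator object producing the same values. The helpers `count_up_list` and `count_up_gen` are made up for this illustration, and `sys.getsizeof` reports only the size of the container object itself (not the elements it refers to), so the numbers are indicative rather than a full memory profile.

```python
import sys

def count_up_list(n):
    """Hypothetical helper: accumulate every value before returning."""
    result = []
    for i in range(n):
        result.append(i)
    return result

def count_up_gen(n):
    """Hypothetical helper: hand back one value at a time."""
    for i in range(n):
        yield i

big_list = count_up_list(1_000_000)
lazy = count_up_gen(1_000_000)

list_bytes = sys.getsizeof(big_list)  # grows with n (list object only, elements excluded)
gen_bytes = sys.getsizeof(lazy)       # small, and does not depend on n

print(list_bytes > gen_bytes)
```

The list has to hold a reference to every result at once, while the generator object only keeps the state needed to produce the next value.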

In [None]:
# the same function written as a generator
def index_words_iter(text):
    if text:
        yield 0
    for index, letter in enumerate(text):
        if letter == ' ':
            yield index + 1

it = index_words_iter(text)
print(next(it))
print(next(it))

- calling a generator function does not actually run its body; instead, the call immediately returns an iterator
- with each call to the next built-in function, the iterator advances the generator to its next yield expression
- each value passed to yield by the generator is returned by the iterator to the caller
- this function is significantly easier to read because all interactions with the result list have been eliminated; results are passed to yield expressions instead
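The behavior described above can be observed directly with a small sketch. The `numbers` generator and the `events` log are hypothetical names used only for this demonstration; the log records when each part of the generator body actually runs.

```python
events = []  # records what the generator body does, in order

def numbers():
    events.append('started')
    yield 1
    events.append('resumed')
    yield 2

it = numbers()
after_call = list(events)    # still empty: the body has not run yet

first = next(it)             # runs the body up to the first yield
after_first = list(events)

second = next(it)            # resumes after the first yield, runs to the second
after_second = list(events)

print(after_call, first, after_first, second, after_second)
# prints: [] 1 ['started'] 2 ['started', 'resumed']
```

Nothing in the body executes when `numbers()` is called; each `next` call runs just enough code to reach the following `yield`.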

In [None]:
# convert the iterator to a list
result = list(index_words_iter(text))
print(result[:10])

In [None]:
# - use a generator to deal with inputs of arbitrary length
# - the following code defines a generator that streams
#   input from a file one line at a time and yields outputs
#   one word at a time
import itertools
def index_file(handle):
    offset = 0
    for line in handle:  # the file is read one line at a time
        if line:
            yield offset  # start of the first word on this line
        for letter in line:
            offset += 1
            if letter == ' ':
                yield offset  # start of the word that follows this space

with open('text.txt') as f:
    it = index_file(f)
    results = itertools.islice(it, 0, 10)
    print(list(results)) 


- the with statement ensures clean-up code is executed;
  in this case f is closed automatically when the with block exits
- itertools.islice makes an iterator that returns selected elements from an iterable; here it takes the first 10 results
- callers must be aware that the iterators returned by generators are stateful and can't be reused once they have been exhausted
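The statefulness caveat is easy to miss, so here is a minimal sketch of it. It repeats the `index_file` logic so the cell runs on its own and, as an assumption for the demo, uses an in-memory `io.StringIO` in place of `text.txt`.

```python
import io
import itertools

def index_file(handle):
    # Same logic as the generator above, repeated so this cell is self-contained.
    offset = 0
    for line in handle:
        if line:
            yield offset
        for letter in line:
            offset += 1
            if letter == ' ':
                yield offset

handle = io.StringIO('one two three\nfour five\n')  # stand-in for a real file
it = index_file(handle)

first_two = list(itertools.islice(it, 0, 2))  # consumes only the first two values
rest = list(it)                               # continues where islice stopped
again = list(it)                              # the iterator is now exhausted

print(first_two, rest, again)
# prints: [0, 4] [8, 14, 19] []
```

To iterate over the results a second time, call the generator function again (on a fresh or rewound handle) or store the results in a list.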