In [1]:
def index_words(text):
    result = []
    if text:
        result.append(0)
    for index, letter in enumerate(text):
        if letter == ' ':
            result.append(index + 1)
    return result

address = 'Four score and seven years ago...'
result = index_words(address)
print(result)

[0, 5, 11, 15, 21, 27]


### There are two problems with the index_words function.
1. The code is a bit dense and noisy. Each time a new result is found, I call the append method. A better way to write this function is as a generator, that is, a function that uses yield expressions.

In [2]:
def index_words_iter(text):
    if text:
        yield 0
    for index, letter in enumerate(text):
        if letter == ' ':
            yield index + 1

When called, a generator function does not actually run; instead, it immediately returns an iterator. With each call to the next built-in function, the iterator advances the generator to its next yield expression. Each value passed to yield by the generator is returned by the iterator to the caller.

In [3]:
it = index_words_iter(address)
print(next(it))
print(next(it))

0
5
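
To make the lazy behavior more visible, here is a small sketch. The `traced_index_words` function and its `log` parameter are my own illustrative additions, not part of the original function; the sketch assumes the same word-indexing logic and simply records when the generator body starts and finishes.

```python
def traced_index_words(text, log):
    """Same logic as index_words_iter, but records when the body runs."""
    log.append('started')
    if text:
        yield 0
    for index, letter in enumerate(text):
        if letter == ' ':
            yield index + 1
    log.append('finished')

log = []
it = traced_index_words('Four score', log)
print(log)                # [] : calling the function ran none of the body

first = next(it)          # runs the body up to the first yield
print(first, log)         # 0 ['started']

second = next(it)         # resumes and runs to the next yield
third = next(it, None)    # generator is exhausted; the default is returned
print(second, third, log) # 5 None ['started', 'finished']
```

Passing a default to next avoids the StopIteration exception that is otherwise raised once the generator has no more values to yield.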


In [4]:
# You can easily convert the iterator returned by the generator to a list by
# passing it to the list built-in function.
result = list(index_words_iter(address))
print(result)


[0, 5, 11, 15, 21, 27]


2. The second problem with index_words is that it requires all results to be stored in the list before being returned. For huge inputs, this can cause a program to run out of memory and crash.
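
As a rough illustration of the memory difference, the sketch below compares the two versions on a synthetic input that I made up for this purpose. Note that `sys.getsizeof` reports only the size of the list object itself (its array of references), not the integers it points to, so the real footprint of the list is even larger than shown; exact byte counts vary by Python version, so only the comparison is printed.

```python
import sys

def index_words(text):
    result = []
    if text:
        result.append(0)
    for index, letter in enumerate(text):
        if letter == ' ':
            result.append(index + 1)
    return result

def index_words_iter(text):
    if text:
        yield 0
    for index, letter in enumerate(text):
        if letter == ' ':
            yield index + 1

# Synthetic input (an assumption for this demo): 100,000 short words.
text = 'word ' * 100_000

as_list = index_words(text)      # every index is materialized up front
as_gen = index_words_iter(text)  # nothing is computed yet

list_size = sys.getsizeof(as_list)
gen_size = sys.getsizeof(as_gen)
print(len(as_list), list_size > gen_size)

# The generator can still be consumed one value at a time,
# for example to count the indexes without building a list.
count = sum(1 for _ in index_words_iter(text))
print(count)
```

The generator object stays the same small size no matter how long the input is, because it only holds its current position rather than the accumulated results.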

### A generator version of this function can easily be adapted to take inputs of arbitrary length due to its bounded memory requirements.

In [8]:
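def index_file(handle):
    offset = 0  # running character count across all lines read so far
    for line in handle:
        if line:
            yield offset  # the first word of a line starts at the current offset
        for letter in line:
            offset += 1
            if letter == ' ':
                yield offset  # a new word starts right after each space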


In [15]:
import itertools
with open('address.txt', 'r') as f:
    it = index_file(f)
    # Lazily take only the first 10 word offsets.
    results = itertools.islice(it, 0, 10)
    print(list(results))
    # The same iterator then continues with the remaining offsets.
    print(list(it))

[0, 5, 11, 16, 20, 26, 29, 34, 38, 42]
[47, 51, 56, 63, 70, 76, 80, 87, 90, 96, 100, 108, 112, 118, 119, 122, 129, 134, 138, 146, 151, 160, 163, 167, 171, 176, 179, 181, 188, 193, 197, 205, 209, 213, 219, 227, 232, 233, 243, 250, 253, 258, 262, 266, 271, 275, 282, 285, 289, 293, 298, 302, 307, 312, 319, 322, 330, 334, 341, 346, 350, 351, 358, 362, 366, 371, 376, 380, 387, 390, 394, 399, 402, 406, 413, 416, 424, 430, 434, 445, 449, 456, 461, 465, 468, 469, 474, 479, 489, 493, 498, 501, 506, 512, 516, 523, 526, 532, 537, 544, 547, 549, 553]
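
Since address.txt is not included here, the following sketch shows one way to try index_file without a file on disk. It assumes an in-memory `io.StringIO` object can stand in for the file handle (it iterates line by line just like an open text file); the two-line sample text is made up for the demo.

```python
import io
import itertools

def index_file(handle):
    offset = 0
    for line in handle:
        if line:
            yield offset
        for letter in line:
            offset += 1
            if letter == ' ':
                yield offset

# Hypothetical sample input standing in for address.txt.
handle = io.StringIO('Four score\nand seven years\n')

it = index_file(handle)
first_three = list(itertools.islice(it, 0, 3))
rest = list(it)  # the iterator resumes where islice stopped

print(first_three)  # [0, 5, 11]
print(rest)         # [15, 21]
```

Because the offset keeps counting across lines (newline characters included), the first word of the second line is reported at 11 rather than 0.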
