In [1]:
def normalize(numbers):
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result

# Give this function a list of numbers, and it works as expected.
visits = [15, 35, 80]
percentages = normalize(visits)
print(percentages)
assert sum(percentages) == 100.0

[11.538461538461538, 26.923076923076923, 61.53846153846154]


In [3]:
# However, if we pass in an iterator produced by a generator, the function returns an empty list.
def read_visits(data_path):
    with open(data_path) as f:
        for line in f:
            yield int(line)

it = read_visits('')
percentages = normalize(it)
print(percentages)

# Looking back at the normalize function, we can see that the iterator is exhausted by the sum() call,
# so the for loop has nothing left to iterate over.

FileNotFoundError: [Errno 2] No such file or directory: ''

When you iterate over an iterator, you can't tell the difference between one that has no output and one that had output but is already exhausted.
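(An exhausted iterator raises StopIteration on every subsequent next() call, not just once; for loops, sum() and list() catch that exception silently, so no error surfaces.)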
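
A minimal sketch of that exhaustion behavior, using a small in-memory generator instead of a file. The `count_up` name is hypothetical and exists only for this illustration.

```python
def count_up(limit):
    """Hypothetical generator used only for illustration."""
    for i in range(limit):
        yield i

it = count_up(3)
first_pass = list(it)    # consumes every value the generator yields
second_pass = list(it)   # the same iterator is now exhausted

print(first_pass)   # [0, 1, 2]
print(second_pass)  # []

# Calling next() on the exhausted iterator raises StopIteration again.
still_exhausted = False
try:
    next(it)
except StopIteration:
    still_exhausted = True
print(still_exhausted)  # True
```

The second pass is indistinguishable from iterating over a generator that never yielded anything.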

To work around the generator problem above, we could call list(numbers) at the start of the function to copy the iterator's contents, but that copy could exhaust memory for very large inputs.
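
A sketch of that copying approach, assuming the input fits in memory. The name `normalize_copy` is chosen here for illustration.

```python
def normalize_copy(numbers):
    # Copy the iterator's contents so they can be traversed twice.
    # Assumption: the whole input fits in memory.
    numbers = list(numbers)
    total = sum(numbers)
    return [100 * value / total for value in numbers]

# iter(...) stands in for a generator that can only be consumed once.
percentages = normalize_copy(iter([15, 35, 80]))
print(percentages)
```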

### Iterator protocol
The iterator protocol is how Python for loops and related expressions traverse the contents of a container type. When Python sees a statement like <code>for x in somelist:</code>, it actually calls the built-in function <code>iter(somelist)</code>. That, in turn, calls the <code>somelist.__iter__()</code> special method. This method must return an iterator object (which itself implements the <code>__next__()</code> special method). Then, the for loop repeatedly calls the <code>next()</code> built-in function on that iterator object until a <code>StopIteration</code> exception is raised.
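
The steps above can be written out by hand. This is only a rough equivalent of what a for loop does, not how the interpreter implements it.

```python
somelist = [15, 35, 80]

iterator = iter(somelist)      # calls somelist.__iter__()
collected = []
while True:
    try:
        x = next(iterator)     # calls iterator.__next__()
    except StopIteration:
        break                  # the for loop ends here, silently
    collected.append(x)

print(collected)  # [15, 35, 80]
```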

In [5]:
# Use the Iterator protocol by implementing the __iter__ method using yield.
class ReadVisits:
    def __init__(self, data_path):
        self.data_path = data_path

    def __iter__(self):
        with open(self.data_path) as f:
            for line in f:
                yield int(line)

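# This class works correctly when passed to the original normalize function, because each call to __iter__ returns a new iterator object:
# sum() calls __iter__ once, and the for loop calls it again. The trade-off is that the input file is read twice.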
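
A self-contained usage sketch follows. The data file is a temporary file created only for this example, and `normalize` is rewritten with a list comprehension for brevity; both are assumptions for illustration, not part of the original cells.

```python
import os
import tempfile

def normalize(numbers):
    total = sum(numbers)                                # first call to __iter__
    return [100 * value / total for value in numbers]   # second call to __iter__

class ReadVisits:
    def __init__(self, data_path):
        self.data_path = data_path

    def __iter__(self):
        # Each call opens the file again and returns a fresh generator.
        with open(self.data_path) as f:
            for line in f:
                yield int(line)

# Hypothetical data file: one visit count per line.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
    f.write('15\n35\n80\n')
    path = f.name

try:
    visits = ReadVisits(path)
    percentages = normalize(visits)
    # Two separate calls to iter() give two distinct iterator objects.
    distinct = iter(visits) is not iter(visits)
finally:
    os.remove(path)

print(percentages)
print(distinct)  # True
```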