# Item 31: Be Defensive When Iterating Over Arguments

When a function takes a sequence of items as a parameter, it may need to iterate over that sequence multiple times. This is not a problem if the argument is a `list` object, but if it is an iterator or generator, the second pass silently produces no values.

The following is a normalization function that converts the input sequence to a sequence of percentages of the total amount.

In [None]:
def normalize(numbers):
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result

Given a `list` of numbers as input, this function works fine:

In [None]:
values = [15, 35, 80]
percentages = normalize(values)
print(percentages)
assert sum(percentages) == 100.0

However, let's say I want to scale this up to a large number of values stored in a file, and I use a generator to read them one at a time.

In [None]:
def read_values(data_path):
    with open(data_path) as f:
        for line in f:
            yield int(line)

In [None]:
path = '../data/values.txt'

In [None]:
it = read_values(path)
percentages = normalize(it)
print(percentages)

Passing the generator to `normalize` prints an empty list (`[]`) and raises no error. This happens because the call to `sum` in `normalize` exhausts the iterator, so the subsequent loop that calculates the percentages has nothing left to iterate over.

If you iterate over an iterator or generator that has already raised a `StopIteration` exception, you won't get any results the second time around.

In [None]:
it = read_values(path)
print(list(it)[:3])
print(list(it))  # already exhausted
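
The same effect can be reproduced without a data file. The following is a minimal sketch that uses a small in-memory generator expression as a stand-in for `read_values`. It shows that `next` raises `StopIteration` once the iterator is exhausted, while `sum` and `list` absorb that exception silently, which is why no error is reported.

```python
it = (n for n in [15, 35, 80])  # in-memory stand-in for read_values(path)

total = sum(it)      # consumes every value
leftover = list(it)  # nothing remains, and no error is raised

# Calling next() directly exposes the exception that list() and for loops hide.
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True

print(total, leftover, exhausted)
```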

One way to address this issue is to have your function explicitly exhaust the iterator and copy its output to a `list` object:

In [None]:
def normalize_copy(numbers):
    numbers_copy = list(numbers)
    total = sum(numbers_copy)
    result = []
    for value in numbers_copy:
        percent = 100 * value / total
        result.append(percent)
    return result

In [None]:
it = read_values(path)
percentages = normalize_copy(it)
print(percentages[:3])
print(sum(percentages))

The problem here is that the copy of the iterator's contents could be extremely large, which is probably why an iterator or generator was used in the first place.

We could work around this by having `normalize` take a function that returns a new iterator every time it's called:

In [None]:
def normalize_func(get_iter):
    total = sum(get_iter())
    result = []
    for value in get_iter():
        percent = 100 * value / total
        result.append(percent)
    return result

In [None]:
percentages = normalize_func(lambda: read_values(path))
print(percentages[:3])
print(sum(percentages))

However, using a `lambda` function with `read_values` feels a bit clumsy. Instead, we can replace `read_values` with a new container class that implements the *iterator protocol*:

In [None]:
class ReadValues:
    
    def __init__(self, data_path):
        self.data_path = data_path
        
    def __iter__(self):
        with open(self.data_path) as f:
            for line in f:
                yield int(line)

In [None]:
values = ReadValues(path)
percentages = normalize(values)
print(percentages[:3])
print(sum(percentages))

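This works because each call to `ReadValues.__iter__` returns a new iterator object. Both `sum` and the `for` loop in `normalize` call `iter` on their argument, so each gets its own fresh pass over the file.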

The *iterator protocol* states that when an iterator is passed to the `iter` built-in function, `iter` returns the iterator itself. However, when a container type is passed to `iter`, a new iterator object is returned each time. `list` and `ReadValues` objects are iterable containers that follow the protocol.
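
The difference is easy to see with built-in types. The following minimal sketch uses only a `list` and the iterators it produces; the variable names are illustrative.

```python
container = [15, 35, 80]

# A container hands out a fresh, independent iterator on every call to iter().
first = iter(container)
second = iter(container)
assert first is not second

# An iterator returns itself when passed to iter().
assert iter(first) is first

# Because the iterators are independent, advancing one leaves the other untouched.
assert next(first) == 15
assert next(first) == 35
assert next(second) == 15
```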

We can test for this behavior inside the function to ensure that a container, not an iterator, is being passed:

In [None]:
def normalize_defensive(numbers):
    if iter(numbers) is numbers:
        raise TypeError('Must supply a container')
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result

In [None]:
values = [15, 35, 80]
percentages = normalize_defensive(values)
print(percentages)
print(sum(percentages))

In [None]:
values = ReadValues(path)
percentages = normalize_defensive(values)
print(percentages[:3])
print(sum(percentages))

In [None]:
values = [15, 35, 80]
it = iter(values)
normalize_defensive(it)  # raises TypeError: Must supply a container
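
An alternative to comparing `iter(numbers)` with `numbers` is an `isinstance` check against `collections.abc.Iterator` from the standard library. The sketch below is one possible variant, not the only way to write it; the name `normalize_defensive_abc` is made up for this example.

```python
from collections.abc import Iterator


def normalize_defensive_abc(numbers):
    # Iterators (including generators) define both __iter__ and __next__,
    # so they count as instances of Iterator; lists and other containers do not.
    if isinstance(numbers, Iterator):
        raise TypeError('Must supply a container')
    total = sum(numbers)
    return [100 * value / total for value in numbers]


print(normalize_defensive_abc([15, 35, 80]))

try:
    normalize_defensive_abc(iter([15, 35, 80]))
except TypeError as exc:
    print(f'Rejected: {exc}')
```

Either check rejects iterators up front, so the caller gets a clear error instead of a silently empty result.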

## Things to Remember

- Beware of functions and methods that iterate over input arguments multiple times. If these arguments are iterators, you may see strange behavior and missing values.
- Python's iterator protocol defines how containers and iterators interact with the `iter` and `next` built-in functions, `for` loops, and related expressions.
- You can easily define your own iterable container by implementing the `__iter__` method as a generator.
- You can detect that a value is an iterator (instead of a container) if calling `iter` on it produces the same value as what you passed in.