# Item 17: Be Defensive When Iterating Over Arguments

- When a function takes a list of objects as a parameter, it's often important to iterate over that list multiple times. For example, say you want to analyze tourism numbers for the U.S. state of Texas. Imagine the data set is the number of visitors to each city (in millions per year). You'd like to figure out what percentage of overall tourism each city receives.

- To do this, you need a normalization function. It sums the inputs to determine the total number of tourists per year. Then it divides each city's individual visitor count by the total to find that city's contribution to the whole.

In [4]:
def normalize(numbers):
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result

In [5]:
visits = [15, 35, 80]
percentages = normalize(visits)
print(percentages)

[11.538461538461538, 26.923076923076923, 61.53846153846154]


- To scale this up, I need to read the data from a file that contains every city in all of Texas. I define a generator to do this because then I can reuse the same function later when I want to compute tourism numbers for the whole world, a much larger data set.

In [6]:
def read_visits(data_path):
    with open(data_path) as f:
        for line in f:
            yield int(line)

- Surprisingly, calling normalize on the generator's return value produces no results.

In [None]:
it = read_visits('/tmp/my_numbers.txt')
percentages = normalize(it)
print(percentages)
# >>>
# []

- The cause of this behavior is that an iterator only produces its results a single time. If you iterate over an iterator or generator that has already raised a StopIteration exception, you won't get any results the second time around.

In [None]:
it = read_visits('/tmp/my_numbers.txt')
print(list(it))
print(list(it)) # Already exhausted
# >>>
# [15, 35, 80]
# []

- What's confusing is that you also won't get any errors when you iterate over an already exhausted iterator. That's because for loops, the list constructor, and many other functions throughout the Python standard library expect the StopIteration exception to be raised during normal operation. These functions can't tell the difference between an iterator that has no output and an iterator that had output and is now exhausted.
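
- As a small illustration of that last point, the sketch below compares an iterator that never had output with one that has been exhausted. The count_up_to generator is a hypothetical helper used only for this example.

```python
def count_up_to(limit):
    """Hypothetical generator used only for this illustration."""
    for i in range(limit):
        yield i


empty = count_up_to(0)        # never had any output
exhausted = count_up_to(3)
first_pass = list(exhausted)  # consumes everything

# From the outside, both now look identical: each yields nothing.
print(first_pass)       # [0, 1, 2]
print(list(empty))      # []
print(list(exhausted))  # []

# Calling next directly exposes the StopIteration that for loops and
# list() absorb silently.
try:
    next(exhausted)
except StopIteration:
    print('StopIteration raised')
```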

- To solve this problem, you can explicitly exhaust an input iterator and keep a copy of its entire contents in a list. You can then iterate over the list version of the data as many times as you need to. Here's the same function as before, but it defensively copies the input iterator.

In [7]:
def normalize_copy(numbers):
    numbers = list(numbers)
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result

- Now the function works correctly on a generator's return value.

In [None]:
it = read_visits('/tmp/my_numbers.txt')
percentages = normalize_copy(it)
print(percentages)
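
- The cell above depends on a file that isn't included in these notes. As a self-contained check, the sketch below swaps in a hypothetical in-memory generator, fake_visits, in place of read_visits and compares the two functions side by side.

```python
def normalize(numbers):
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result


def normalize_copy(numbers):
    numbers = list(numbers)  # Copy the iterator's contents first
    return normalize(numbers)


def fake_visits():
    """Hypothetical stand-in for read_visits that needs no file."""
    for count in [15, 35, 80]:
        yield count


# sum() drains the generator, so the for loop in normalize sees nothing.
print(normalize(fake_visits()))       # []
# The defensive copy can be traversed twice.
print(normalize_copy(fake_visits()))
```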

- The problem with this approach is that the copy of the input iterator's contents could be large. Copying the iterator could cause your program to run out of memory and crash. One way around this is to accept a function that returns a new iterator each time it's called.

In [8]:
def normalize_func(get_iter):
    total = sum(get_iter())
    result = []
    for value in get_iter():
        percent = 100 * value / total
        result.append(percent)
    return result

- To use normalize_func, you can pass in a lambda expression that calls the generator and produces a new iterator each time.

In [9]:
path = '/tmp/my_numbers.txt'  # Same file as in the earlier cells
percentages = normalize_func(lambda: read_visits(path))

- Though it works, having to pass a lambda function like this is clumsy. The better way to achieve the same result is to provide a new container class that implements the *iterator protocol*.
- The iterator protocol is how Python for loops and related expressions traverse the contents of a container type. When Python sees a statement like for x in foo it will actually call iter(foo). The iter built-in function calls the foo.\_\_iter\_\_ special method in turn. The \_\_iter\_\_ method must return an iterator object (which itself implements the \_\_next\_\_ special method). Then the for loop repeatedly calls the next built-in function on the iterator object until it's exhausted (and raises a StopIteration exception).
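
- The steps just described can be written out by hand. This is only a sketch of what a for loop does, using a plain list as the container foo; it is not how you would normally write a loop.

```python
foo = [15, 35, 80]

# Roughly what `for x in foo: seen.append(x)` does behind the scenes.
iterator = iter(foo)        # calls foo.__iter__()
seen = []
while True:
    try:
        x = next(iterator)  # calls iterator.__next__()
    except StopIteration:
        break               # the for loop ends here, silently
    seen.append(x)

print(seen)  # [15, 35, 80]
```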

- It sounds complicated, but practically speaking you can achieve all of this behavior for your classes by implementing the \_\_iter\_\_ method as a generator. Here, I define an iterable container class that reads the files containing tourism data.

In [10]:
class ReadVisits(object):
    def __init__(self, data_path):
        self.data_path = data_path
        
    def __iter__(self):
        with open(self.data_path) as f:
            for line in f:
                yield int(line)

- This new container type works correctly when passed to the original function without any modifications.

In [None]:
visits = ReadVisits(path)
percentages = normalize(visits)
print(percentages)

- This works because the sum function in normalize will call ReadVisits.\_\_iter\_\_ to allocate a new iterator object. The for loop that normalizes the numbers will also call \_\_iter\_\_ to allocate a second iterator object. Each of those iterators will be advanced and exhausted independently, ensuring that each unique iteration sees all of the input data values. The only downside of this approach is that it reads the input data multiple times.
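
- To make the "reads the input data multiple times" cost visible, here is a sketch that uses a hypothetical in-memory container, CountingVisits, in place of ReadVisits. It counts how many separate passes normalize makes over the data.

```python
class CountingVisits:
    """Hypothetical in-memory stand-in for ReadVisits that counts passes."""

    def __init__(self, data):
        self.data = data
        self.passes = 0

    def __iter__(self):
        # Runs once per pass, when each new generator starts executing.
        self.passes += 1
        for value in self.data:
            yield value


def normalize(numbers):
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result


visits = CountingVisits([15, 35, 80])
percentages = normalize(visits)
print(visits.passes)  # 2: one pass for sum, one for the for loop
```

With a file-backed container, each of those passes would reopen and reread the file.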

In [32]:
a = [1, 2, 3, 4]
b = iter(a)
print(a)


[1, 2, 3, 4]


In [20]:
print(b)

<list_iterator object at 0x7fc03189ec88>


In [19]:
b

<list_iterator at 0x7fc03189ec88>

In [36]:
print(iter([1, 2]))

<list_iterator object at 0x7fc031822c50>


- Now that you know containers like ReadVisits work, you can write your functions to ensure that parameters aren't just iterators. The protocol states that when an iterator is passed to the iter built-in function, iter will return the iterator itself. In contrast, when a container type is passed to iter, a new iterator object will be returned each time. Thus, you can test an input value for this behavior and raise a TypeError to reject iterators.

In [38]:
def normalize_defensive(numbers):
    if iter(numbers) is iter(numbers):
        raise TypeError('Must supply a container')
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result

- This is ideal if you don't want to copy the full input iterator like normalize_copy above, but you also need to iterate over the input data multiple times. This function works as expected for list and ReadVisits inputs because they are containers. It will work for any type of container that follows the iterator protocol.

In [39]:
visits = [15, 35, 80]
normalize_defensive(visits)  # No error
path = '/tmp/my_numbers.txt'  # Same file as in the earlier cells
visits = ReadVisits(path)
normalize_defensive(visits)  # No error

In [41]:
it = iter(visits)
normalize_defensive(it)

TypeError: Must supply a container
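
- As an alternative sketch (not part of the notes above), the same rejection can be expressed with isinstance and the Iterator abstract base class from collections.abc. The function name normalize_defensive_abc is hypothetical.

```python
from collections.abc import Iterator


def normalize_defensive_abc(numbers):
    # Iterators (including generators) are rejected; containers pass.
    if isinstance(numbers, Iterator):
        raise TypeError('Must supply a container')
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result


percentages = normalize_defensive_abc([15, 35, 80])  # No error

try:
    normalize_defensive_abc(iter([15, 35, 80]))
except TypeError as e:
    print(e)  # Must supply a container
```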

## Things to Remember

- Beware of functions that iterate over input arguments multiple times. If these arguments are iterators, you may see strange behavior and missing values.
- Python's iterator protocol defines how containers and iterators interact with the iter and next built-in functions, for loops, and related expressions.
- You can easily define your own iterable container type by implementing the \_\_iter\_\_ method as a generator.
- You can detect that a value is an iterator (instead of a container) if calling iter on it twice produces the same result, which can then be progressed with the next built-in function.

# Item 18: Reduce Visual Noise with Variable Positional Arguments

- Accepting optional positional arguments (often called *star args* in reference to the conventional name for the parameter, \*args) can make a function call clearer and remove *visual noise*.

- For example, say you want to log some debug information. With a fixed number of arguments, you would need a function that takes a message and a list of values.

In [42]:
def log(message, values):
    if not values:
        print(message)
    else:
        values_str = ', '.join(str(x) for x in values)
        print('%s: %s' % (message, values_str))

In [45]:
log('My numbers are', [1, 2])
log('Hi there', [])

My numbers are: 1, 2
Hi there


- Having to pass an empty list when you have no values to log is cumbersome and noisy. It'd be better to leave out the second argument entirely. You can do this in Python by prefixing the last positional parameter name with \*. The first parameter for the log message is required, whereas any number of subsequent positional arguments are optional. The function body doesn't need to change, only the callers do.

In [46]:
def log(message, *values):
    if not values:
        print(message)
    else:
        values_str = ', '.join(str(x) for x in values)
        print('%s: %s' % (message, values_str))

In [47]:
log('My numbers are', 1, 2)
log('Hi there')  # Much better

My numbers are: 1, 2
Hi there


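- If you already have a list and want to call a variable-argument function like log, you can use the \* operator. It tells Python to pass the items from the sequence as positional arguments.

In [48]: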
favorites = [7, 33, 99]
log('Favorite colors', *favorites)

Favorite colors: 7, 33, 99


- There are two problems with accepting a variable number of positional arguments.

- The first issue is that the variable arguments are always turned into a tuple before they are passed to your function. This means that if the caller of your function uses the \* operator on a generator, it will be iterated until it's exhausted. The resulting tuple includes every value from the generator, which could consume a lot of memory and cause your program to crash.

In [50]:
def my_generator():
    for i in range(10):
        yield i
        
        
def my_func(*args):
    print(args)
    

it = my_generator()
my_func(*it)

(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)


- Functions that accept \*args are best for situations where you know the number of inputs in the argument list will be reasonably small. It's ideal for function calls that pass many literals or variable names together. It's primarily for the convenience of the programmer and the readability of the code.
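
- The following sketch restates the generator example so that the two effects can be checked directly: the arguments arrive as a tuple holding every value, and the generator is left drained after the call. Here my_func returns its arguments instead of printing them, which is a small change made only for this illustration.

```python
def my_generator():
    for i in range(10):
        yield i


def my_func(*args):
    return args


it = my_generator()
args = my_func(*it)

print(type(args).__name__, len(args))  # tuple 10
print(list(it))                        # []: the call drained the generator
```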


- The second issue with \*args is that you can't add new positional arguments to your function in the future without migrating every caller. If you try to add a positional argument in the front of the argument list, existing callers will subtly break if they aren't updated.

In [51]:
def log(sequence, message, *values):
    if not values:
        print('%s: %s' % (sequence, message))
    else:
        values_str = ', '.join(str(x) for x in values)
        print('%s: %s: %s' % (sequence, message, values_str))

In [53]:
log(1, 'Favorites', 7, 33)
log('Favorite numbers', 7, 33) # Old usage breaks

1: Favorites: 7, 33
Favorite numbers: 7: 33


- The problem here is that the second call to log used 7 as the message parameter because a sequence argument wasn't given. Bugs like this are hard to track down because the code still runs without raising any exceptions. To avoid this possibility entirely, you should use keyword-only arguments when you want to extend functions that accept \*args.
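
- One possible shape for that fix is sketched below. Making sequence a keyword-only parameter with a default of None is an assumption chosen for this example, not something the notes above prescribe.

```python
def log(message, *values, sequence=None):
    # Parameters after *values can only be passed by keyword.
    if sequence is not None:
        message = '%s: %s' % (sequence, message)
    if not values:
        print(message)
    else:
        values_str = ', '.join(str(x) for x in values)
        print('%s: %s' % (message, values_str))


log('Favorite numbers', 7, 33)       # Old usage still works
log('Favorites', 7, 33, sequence=1)  # New parameter must be named
```

Because sequence comes after \*values, existing positional callers can't accidentally fill it.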

## Things to Remember

- Functions can accept a variable number of positional arguments by using \*args in the def statement.
- You can use the items from a sequence as the positional arguments for a function with the \* operator.
- Using the \* operator with a generator may cause your program to run out of memory and crash.
- Adding new positional parameters to functions that accept \*args can introduce hard-to-find bugs.