### Item 1: Know Which Version of Python You’re Using

In [7]:
import sys
print(sys.version_info)

# python --version

sys.version_info(major=3, minor=5, micro=1, releaselevel='final', serial=0)


1. There are two major versions of Python still in active use: Python 2 and Python 3.
2. There are multiple popular runtimes for Python: CPython, Jython, IronPython, PyPy, etc.
3. Be sure that the command-line executable for running Python on your system is the version you expect it to be.
4. Prefer Python 3 for your next project because that is the primary focus of the Python community.
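
As a small illustration of the third point, a script can check at startup that it is running under the interpreter it expects. This is only a sketch: the helper name `require_python` and the default minimum of `(3, 0)` are assumptions made for the example, not something the item prescribes.

```python
import sys

def require_python(minimum=(3, 0)):
    """Raise RuntimeError if the running interpreter is older than `minimum`.

    The name and the default are hypothetical, chosen for illustration.
    """
    current = tuple(sys.version_info[:2])
    # Tuples compare element by element, so (3, 5) >= (3, 0) is True.
    if current < tuple(minimum):
        raise RuntimeError(
            'Python %d.%d or newer is required, found %d.%d'
            % (tuple(minimum) + current))
    return current

running = require_python()
print('Running on Python %d.%d' % running)
```

Failing early with a clear message tends to be easier to diagnose than a syntax error raised deep inside a module that the wrong interpreter could not parse.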

### Item 2: Follow the PEP 8 Style Guide

Python Enhancement Proposal #8, otherwise known as PEP 8, is the style guide for how to format Python code. You are welcome to write Python code however you want, as long as it has valid syntax. However, using a consistent style makes your code more approachable and easier to read. Sharing a common style with other Python programmers in the larger community facilitates collaboration on projects. But even if you are the only one who will ever read your code, following the style guide will make it easier to change things later.

PEP 8 has a wealth of details about how to write clear Python code. It continues to be updated as the Python language evolves. It’s worth reading the whole guide online (http://www.python.org/dev/peps/pep-0008/).

The Pylint tool (http://www.pylint.org/) is a popular static analyzer for Python source code. Pylint provides automated enforcement of the PEP 8 style guide and detects many other types of common errors in Python programs.

### Item 3: Know the Differences Between bytes, str, and unicode

In Python 3, there are two types that represent sequences of characters: bytes and str. Instances of bytes contain raw 8-bit values. Instances of str contain Unicode characters.

In Python 2, there are two types that represent sequences of characters: str and unicode. In contrast to Python 3, instances of str contain raw 8-bit values. Instances of unicode contain Unicode characters.

There are many ways to represent Unicode characters as binary data (raw 8-bit values). The most common encoding is UTF-8. Importantly, str instances in Python 3 and unicode instances in Python 2 do not have an associated binary encoding. 

**To convert Unicode characters to binary data, you must use the encode method.**

**To convert binary data to Unicode characters, you must use the decode method.**

When you’re writing Python programs, it’s important to do encoding and decoding of Unicode at the furthest boundary of your interfaces. The core of your program should use Unicode character types (str in Python 3, unicode in Python 2) and should not assume anything about character encodings. This approach allows you to be very accepting of alternative text encodings (such as Latin-1, Shift JIS, and Big5) while being strict about your output text encoding (ideally, UTF-8).
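
The boundary idea can be sketched as follows. The function name `shout`, the sample bytes, and the choice of Latin-1 as the incoming encoding are assumptions for illustration. The point is only that decoding happens once on the way in and encoding happens once on the way out.

```python
def shout(text):
    """Core logic: works only on str and knows nothing about encodings."""
    return text.upper()

# Input boundary: bytes arrive in a known source encoding (Latin-1 here,
# an assumption for this example) and are decoded exactly once.
incoming = b'caf\xe9'
text = incoming.decode('latin-1')

# Output boundary: the result is encoded exactly once, as UTF-8.
outgoing = shout(text).encode('utf-8')

print(repr(text))      # 'café'
print(repr(outgoing))  # b'CAF\xc3\x89'
```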

The split between character types leads to two common situations in Python code:

* You want to operate on raw 8-bit values that are UTF-8-encoded characters (or some other encoding).
* You want to operate on Unicode characters that have no specific encoding.

You’ll often need two helper functions to convert between these two cases and to ensure that the type of input values matches your code’s expectations.


In [9]:
def to_str(bytes_or_str):
    if isinstance(bytes_or_str, bytes):
        value = bytes_or_str.decode('utf-8')
    else:
        value = bytes_or_str
    return value  # Instance of str

def to_bytes(bytes_or_str):
    if isinstance(bytes_or_str, str):
        value = bytes_or_str.encode('utf-8')
    else:
        value = bytes_or_str
    return value  # Instance of bytes

### Item 4: Write Helper Functions Instead of Complex Expressions

As soon as your expressions get complicated, it’s time to consider splitting them into smaller pieces and moving logic into helper functions. What you gain in readability always outweighs what brevity may have afforded you. Don’t let Python’s pithy syntax for complex expressions get you into a mess like this.

```
from urllib.parse import parse_qs

my_values = parse_qs('red=5&blue=0&green=',
                     keep_blank_values=True)
print(repr(my_values))
>>> {'red': ['5'], 'green': [''], 'blue': ['0']}

# Dense one-liner: relies on the empty string being falsy
red = int(my_values.get('red', [''])[0] or 0)

# Clearer: an if/else expression spread over two statements
red = my_values.get('red', [''])
red = int(red[0]) if red[0] else 0

# Clearest when the logic is reused: a helper function
def get_first_int(values, key, default=0):
    found = values.get(key, [''])
    if found[0]:
        found = int(found[0])
    else:
        found = default
    return found

green = get_first_int(my_values, 'green')
```

1. Python’s syntax makes it all too easy to write single-line expressions that are overly complicated and difficult to read.
2. Move complex expressions into helper functions, especially if you need to use the same logic repeatedly.

3. The if/else expression provides a more readable alternative to using Boolean operators like or and and in expressions.


### Item 5: Know How to Slice Sequences

Python includes syntax for slicing sequences into pieces. Slicing lets you access a subset of a sequence’s items with minimal effort. The simplest uses for slicing are the built-in types list, str, and bytes. Slicing can be extended to any Python class that implements the `__getitem__` and `__setitem__` special methods.

**The basic form of the slicing syntax is somelist[start:end], where start is inclusive and end is exclusive.**


In [13]:
a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
print('First four:', a[:4])
print('Last four: ', a[-4:])
print('Middle two:', a[3:-3])

First four: ['a', 'b', 'c', 'd']
Last four:  ['e', 'f', 'g', 'h']
Middle two: ['d', 'e']


* When slicing from the start of a list, you should leave out the zero index to reduce visual noise.
   assert a[:5] == a[0:5]

* When slicing to the end of a list, you should leave out the final index because it’s redundant.
    assert a[5:] == a[5:len(a)]

* Slicing deals properly with start and end indexes that are beyond the boundaries of the list. That makes it easy for your code to establish a maximum length to consider for an input sequence.
   first_twenty_items = a[:20]
   last_twenty_items = a[-20:]

* In contrast, accessing the same index directly causes an exception.
```
 >>> a[20] 
   IndexError: list index out of range
 ```

The result of slicing a list is a whole new list. References to the objects from the original list are maintained. Modifying the result of slicing won’t affect the original list.

In [16]:
b = a[4:]
print('Before:  ', b)
b[1] = 99
print('After:    ', b)
print('No change:', a)

Before:   ['e', 'f', 'g', 'h']
After:     ['e', 99, 'g', 'h']
No change: ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']


* Avoid being verbose: Don’t supply 0 for the start index or the length of the sequence for the end index.

* Slicing is forgiving of start or end indexes that are out of bounds, making it easy to express slices on the front or back boundaries of a sequence (like a[:20] or a[-20:]).

* Assigning to a list slice will replace that range in the original sequence with what’s referenced even if their lengths are different.
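
The last point can be illustrated with a short sketch. The variable name `same_list` is introduced here only to show that the original list object is modified in place.

```python
a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
same_list = a              # a second reference to the same list object

# Replace five items (indexes 2 through 6) with three; the list shrinks.
a[2:7] = [99, 22, 14]
print(a)                   # ['a', 'b', 99, 22, 14, 'h']

# Assigning to a slice with no start or end replaces the whole contents
# in place instead of rebinding the name to a new list.
a[:] = [101, 102, 103]
print(same_list)           # [101, 102, 103]
```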

### Item 6: Avoid Using start, end, and stride in a Single Slice

In addition to basic slicing (see Item 5: “Know How to Slice Sequences”), Python has special syntax for the stride of a slice in the form somelist[start:end:stride]. This lets you take every nth item when slicing a sequence. For example, the stride makes it easy to group by even and odd indexes in a list.

In [19]:
   a = ['red', 'orange', 'yellow', 'green', 'blue', 'purple']
   odds = a[::2]
   evens = a[1::2]
   print(odds)
   print(evens)

['red', 'yellow', 'blue']
['orange', 'green', 'purple']


The point is that the stride part of the slicing syntax can be extremely confusing. Having three numbers within the brackets is hard enough to read because of its density. Then it’s not obvious when the start and end indexes come into effect relative to the stride value, especially when stride is negative.
To prevent problems, avoid using stride along with start and end indexes. If you must use a stride, prefer making it a positive value and omit start and end indexes. If you must use stride with start or end indexes, consider using one assignment to stride and another to slice.
```
a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
b = a[::2]   # ['a', 'c', 'e', 'g']
c = b[1:-1]  # ['c', 'e']
```
Slicing and then striding will create an extra shallow copy of the data. The first operation should try to reduce the size of the resulting slice by as much as possible. If your program can’t afford the time or memory required for two steps, consider using the islice function from the itertools built-in module (see Item 46: “Use Built-in Algorithms and Data Structures”), which doesn’t permit negative values for start, end, or stride.
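
A minimal sketch of the islice alternative is shown here. The variable names are chosen for illustration. `islice` takes its start, stop, and step as positional arguments and yields items lazily.

```python
from itertools import islice

a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

# islice(iterable, start, stop, step) yields items one at a time,
# without building an intermediate list for the stride step.
lazy = islice(a, 2, 7, 2)
middle = list(lazy)
print(middle)              # ['c', 'e', 'g']

# Negative arguments are rejected rather than interpreted.
try:
    islice(a, 0, None, -1)
    negative_allowed = True
except ValueError:
    negative_allowed = False
print(negative_allowed)    # False
```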

* Specifying start, end, and stride in a slice can be extremely confusing.
* Prefer using positive stride values in slices without start or end indexes. Avoid negative stride values if possible.
* Avoid using start, end, and stride together in a single slice. If you need all three parameters, consider doing two assignments (one to slice, another to stride) or using islice from the itertools built-in module.

### Item 7: Use List Comprehensions Instead of map and filter

Python provides compact syntax for deriving one list from another. These expressions are called list comprehensions. For example, say you want to compute the square of each number in a list. You can do this by providing the expression for your computation and the input sequence to loop over.

In [20]:
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
squares = [x**2 for x in a]
print(squares)

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]


Unless you’re applying a single-argument function, list comprehensions are clearer than the map built-in function for simple cases. map requires creating a lambda function for the computation, which is visually noisy.

In [21]:
squares = map(lambda x: x ** 2, a)

Unlike map, list comprehensions let you easily filter items from the input list, removing corresponding outputs from the result. For example, say you only want to compute the squares of the numbers that are divisible by 2. Here, I do this by adding a conditional expression to the list comprehension after the loop:

In [22]:
even_squares = [x**2 for x in a if x % 2 == 0]
print(even_squares)

[4, 16, 36, 64, 100]


The filter built-in function can be used along with map to achieve the same outcome, but it is much harder to read.

In [24]:
alt = map(lambda x: x**2, filter(lambda x: x % 2 == 0, a))
assert even_squares == list(alt)

Dictionaries and sets have their own equivalents of list comprehensions. These make it easy to create derivative data structures when writing algorithms.

In [27]:
chile_ranks = {'ghost': 1, 'habanero': 2, 'cayenne': 3}
rank_dict = {rank: name for name, rank in chile_ranks.items()}
chile_len_set = {len(name) for name in rank_dict.values()}
print(rank_dict)
print(chile_len_set)

{1: 'ghost', 2: 'habanero', 3: 'cayenne'}
{8, 5, 7}


Things to Remember
* List comprehensions are clearer than the map and filter built-in functions because they don’t require extra lambda expressions.
* List comprehensions allow you to easily skip items from the input list, a behavior map doesn’t support without help from filter.
* Dictionaries and sets also support comprehension expressions.

### Item 8: Avoid More Than Two Expressions in List Comprehensions

Beyond basic usage (see Item 7: “Use List Comprehensions Instead of map and filter”), list comprehensions also support multiple levels of looping. For example, say you want to simplify a matrix (a list containing other lists) into one flat list of all cells. Here, I do this with a list comprehension by including two for expressions. These expressions run in the order provided from left to right.

In [30]:
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [x for row in matrix for x in row]
print(flat)

[1, 2, 3, 4, 5, 6, 7, 8, 9]


The example above is simple, readable, and a reasonable usage of multiple loops. Another reasonable usage of multiple loops is replicating the two-level deep layout of the input list.

For example, say you want to square the value in each cell of a two-dimensional matrix. This expression is noisier because of the extra [] characters, but it’s still easy to read.


In [31]:
squared = [[x**2 for x in row] for row in matrix]
print(squared)

[[1, 4, 9], [16, 25, 36], [49, 64, 81]]


If this expression included another loop, the list comprehension would get so long that you’d have to split it over multiple lines.
```
my_lists = [
    [[1, 2, 3], [4, 5, 6]],
    # ...
]
flat = [x for sublist1 in my_lists
        for sublist2 in sublist1
        for x in sublist2]
```
At this point, the multiline comprehension isn’t much shorter than the alternative. Here, I produce the same result using normal loop statements. The indentation of this version makes the looping clearer than the list comprehension.
```
flat = []
for sublist1 in my_lists:
    for sublist2 in sublist1:
        flat.extend(sublist2)
```

List comprehensions also support multiple if conditions. Multiple conditions at the same loop level are an implicit and expression. For example, say you want to filter a list of numbers to only even values greater than four. These two list comprehensions are equivalent.


In [36]:
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b = [x for x in a if x > 4 if x % 2 == 0]
c = [x for x in a if x > 4 and x % 2 == 0]
print(b)
print(c)

[6, 8, 10]
[6, 8, 10]


Conditions can be specified at each level of looping after the for expression. For example, say you want to filter a matrix so the only cells remaining are those divisible by 3 in rows that sum to 10 or higher. Expressing this with list comprehensions is short, but extremely difficult to read.
```
   matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
   filtered = [[x for x in row if x % 3 == 0]
               for row in matrix if sum(row) >= 10]
   print(filtered)
   >>>
   [[6], [9]]
   ```
Though this example is a bit convoluted, in practice you’ll see situations arise where such expressions seem like a good fit. I strongly encourage you to avoid using list comprehensions that look like this. The resulting code is very difficult for others to comprehend. What you save in the number of lines doesn’t outweigh the difficulties it could cause later.

The rule of thumb is to avoid using more than two expressions in a list comprehension. This could be two conditions, two loops, or one condition and one loop. As soon as it gets more complicated than that, you should use normal if and for statements and write a helper function (see Item 16: “Consider Generators Instead of Returning Lists”).
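
As one possible reading of that advice, the matrix filter above could be rewritten with ordinary loops inside a helper. The function name `filter_matrix` and its parameters are hypothetical. This is a sketch of the approach, not code from the item.

```python
def filter_matrix(matrix, min_row_sum=10, divisor=3):
    """Keep cells divisible by `divisor` from rows summing to at least
    `min_row_sum`. Name and parameters are illustrative assumptions."""
    filtered = []
    for row in matrix:
        if sum(row) < min_row_sum:
            continue                 # skip rows whose total is too small
        kept = []
        for x in row:
            if x % divisor == 0:
                kept.append(x)
        filtered.append(kept)
    return filtered

matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(filter_matrix(matrix))         # [[6], [9]]
```

The helper is longer than the comprehension, but each condition sits on its own line next to the loop it belongs to.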

* List comprehensions support multiple levels of loops and multiple conditions per loop level.
* List comprehensions with more than two expressions are very difficult to read and should be avoided.

### Item 9: Consider Generator Expressions for Large Comprehensions

The problem with list comprehensions (see Item 7: “Use List Comprehensions Instead of map and filter”) is that they may create a whole new list containing one item for each value in the input sequence. This is fine for small inputs, but for large inputs this could consume significant amounts of memory and cause your program to crash.

For example, say you want to read a file and return the number of characters on each line. Doing this with a list comprehension would require holding the length of every line of the file in memory. If the file is absolutely enormous or perhaps a never-ending network socket, list comprehensions are problematic. Here, I use a list comprehension in a way that can only handle small input values.


```
value = [len(x) for x in open('/tmp/my_file.txt')]
print(value)
>>>
[100, 57, 15, 1, 12, 75, 5, 86, 89, 11]
```
To solve this, Python provides generator expressions, a generalization of list comprehensions and generators. Generator expressions don’t materialize the whole output sequence when they’re run. Instead, generator expressions evaluate to an iterator that yields one item at a time from the expression.
A generator expression is created by putting list-comprehension-like syntax between () characters. Here, I use a generator expression that is equivalent to the code above. However, the generator expression immediately evaluates to an iterator and doesn’t make any forward progress.

```
it = (len(x) for x in open('/tmp/my_file.txt'))
print(it)
>>>
<generator object <genexpr> at 0x101b81480>
```

The returned iterator can be advanced one step at a time to produce the next output from the generator expression as needed (using the next built-in function). Your code can consume as much of the generator expression as you want without risking a blowup in memory usage.
```
print(next(it))
print(next(it))
>>>
100
57
```
Another powerful outcome of generator expressions is that they can be composed together. Here, I take the iterator returned by the generator expression above and use it as the input for another generator expression.
```
roots = ((x, x**0.5) for x in it)
```
Each time I advance this iterator, it will also advance the interior iterator, creating a domino effect of looping, evaluating conditional expressions, and passing around inputs and outputs.
```
   print(next(roots))
   >>>
   (15, 3.872983346207417)
```
Chaining generators like this executes very quickly in Python. When you’re looking for a way to compose functionality that’s operating on a large stream of input, generator expressions are the best tool for the job. The only gotcha is that the iterators returned by generator expressions are stateful, so you must be careful not to use them more than once (see Item 17: “Be Defensive When Iterating Over Arguments”).
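
The statefulness gotcha is easy to demonstrate with a small in-memory input. The list of numbers here is made up for the example and stands in for the file used above.

```python
numbers = [1, 2, 3]
doubled = (x * 2 for x in numbers)

first_pass = list(doubled)    # consumes the generator
second_pass = list(doubled)   # nothing is left to yield

print(first_pass)             # [2, 4, 6]
print(second_pass)            # []
```

No exception is raised on the second pass, so reusing an exhausted generator can silently produce empty results.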

* List comprehensions can cause problems for large inputs by using too much memory.
* Generator expressions avoid memory issues by producing outputs one at a time as an iterator.
* Generator expressions can be composed by passing the iterator from one generator expression into the for subexpression of another.
* Generator expressions execute very quickly when chained together.

### Item 10: Prefer enumerate Over range



The range built-in function is useful for loops that iterate over a set of integers.
```
from random import randint

random_bits = 0
for i in range(64):
    if randint(0, 1):
        random_bits |= 1 << i
```
When you have a data structure to iterate over, like a list of strings, you can loop directly over the sequence.

```
flavor_list = ['vanilla', 'chocolate', 'pecan', 'strawberry']
for flavor in flavor_list:
    print('%s is delicious' % flavor)
```
Often, you’ll want to iterate over a list and also know the index of the current item in the list. For example, say you want to print the ranking of your favorite ice cream flavors. One way to do it is using range.

```
for i in range(len(flavor_list)):
    flavor = flavor_list[i]
    print('%d: %s' % (i + 1, flavor))
```
This looks clumsy compared with the other examples of iterating over flavor_list or range. You have to get the length of the list. You have to index into the array. It’s harder to read.
Python provides the enumerate built-in function for addressing this situation. enumerate wraps any iterator with a lazy generator. This generator yields pairs of the loop index and the next value from the iterator. The resulting code is much clearer.

```
for i, flavor in enumerate(flavor_list):
    print('%d: %s' % (i + 1, flavor))
>>>
1: vanilla
2: chocolate
3: pecan
4: strawberry
```
You can make this even shorter by specifying the number from which enumerate should begin counting (1 in this case).
```
for i, flavor in enumerate(flavor_list, 1):
    print('%d: %s' % (i, flavor))
```

* enumerate provides concise syntax for looping over an iterator and getting the index of each item from the iterator as you go.
* Prefer enumerate instead of looping over a range and indexing into a sequence.
* You can supply a second parameter to enumerate to specify the number from which to begin counting (zero is the default).
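
Because enumerate wraps any iterator lazily, it also works on sources that cannot be indexed at all. The generator function `flavors` below is a hypothetical stand-in for such a source.

```python
def flavors():
    """A made-up generator standing in for any non-indexable iterator."""
    yield 'vanilla'
    yield 'chocolate'
    yield 'pecan'

ranked = enumerate(flavors(), 1)    # nothing is consumed yet
first = next(ranked)
print(first)                        # (1, 'vanilla')

rest = list(ranked)                 # consume the remaining pairs
print(rest)                         # [(2, 'chocolate'), (3, 'pecan')]
```

The range(len(...)) approach would fail here, since a generator has no length.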

### Item 11: Use zip to Process Iterators in Parallel

Often in Python you find yourself with many lists of related objects. List comprehensions make it easy to take a source list and get a derived list by applying an expression (see Item 7: “Use List Comprehensions Instead of map and filter”).
```
names = ['Cecilia', 'Lise', 'Marie']
letters = [len(n) for n in names]
```
The items in the derived list are related to the items in the source list by their indexes. To iterate over both lists in parallel, you can iterate over the length of the names source list.
```
longest_name = None
max_letters = 0
for i in range(len(names)):
    count = letters[i]
    if count > max_letters:
        longest_name = names[i]
        max_letters = count
print(longest_name)
>>>
Cecilia
```
The problem is that this whole loop statement is visually noisy. The indexes into names and letters make the code hard to read. Indexing into the arrays by the loop index i happens twice. Using enumerate (see Item 10: “Prefer enumerate Over range”) improves this slightly, but it’s still not ideal.
```
for i, name in enumerate(names):
    count = letters[i]
    if count > max_letters:
        longest_name = name
        max_letters = count
```
To make this code clearer, Python provides the zip built-in function. In Python 3, zip wraps two or more iterators with a lazy generator. The zip generator yields tuples containing the next value from each iterator. The resulting code is much cleaner than indexing into multiple lists.

```
for name, count in zip(names, letters):
    if count > max_letters:
        longest_name = name
        max_letters = count
```
There are two problems with the zip built-in.

The first issue is that in Python 2 zip is not a generator; it will fully exhaust the supplied iterators and return a list of all the tuples it creates. This could potentially use a lot of memory and cause your program to crash. If you want to zip very large iterators in Python 2, you should use izip from the itertools built-in module (see Item 46: “Use Built-in Algorithms and Data Structures”).

The second issue is that zip behaves strangely if the input iterators are of different lengths. For example, say you add another name to the list above but forget to update the letter counts. Running zip on the two input lists will have an unexpected result.

```
names.append('Rosalind')
for name, count in zip(names, letters):
    print(name)
>>>
Cecilia
Lise
Marie
```
The new item for 'Rosalind' isn’t there. This is just how zip works. It keeps yielding tuples until a wrapped iterator is exhausted. This approach works fine when you know that the iterators are of the same length, which is often the case for derived lists created by list comprehensions. In many other cases, the truncating behavior of zip is surprising and bad. If you aren’t confident that the lengths of the lists you want to zip are equal, consider using the zip_longest function from the itertools built-in module instead (also called izip_longest in Python 2).
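
The difference between the two functions can be sketched with the same data. By default `zip_longest` pads the shorter input with None, and it also accepts a `fillvalue` keyword argument to use a different placeholder.

```python
from itertools import zip_longest

names = ['Cecilia', 'Lise', 'Marie', 'Rosalind']
letters = [7, 4, 5]                 # one count is missing

truncated = list(zip(names, letters))
padded = list(zip_longest(names, letters))

print(len(truncated))               # 3
print(padded[-1])                   # ('Rosalind', None)
```

The padded result makes the missing count visible, so the caller can decide how to handle it instead of losing the item silently.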

* The zip built-in function can be used to iterate over multiple iterators in parallel.
* In Python 3, zip is a lazy generator that produces tuples. In Python 2, zip returns the full result as a list of tuples.
* zip truncates its output silently if you supply it with iterators of different lengths.
* The zip_longest function from the itertools built-in module lets you iterate over multiple iterators in parallel regardless of their lengths (see Item 46: “Use Built-in Algorithms and Data Structures”).


### Item 12: Avoid else Blocks After for and while Loops

Python loops have an extra feature that is not available in most other programming languages: you can put an else block immediately after a loop’s repeated interior block.

In [42]:
for i in range(3):
    print('Loop %d' % i)
else:
    print('Else block!')

Loop 0
Loop 1
Loop 2
Else block!


Surprisingly, the else block runs immediately after the loop finishes. Why is the clause called “else”? Why not “and”? In an if/else statement, else means, “Do this if the block before this doesn’t happen.” In a try/except statement, except has the same definition: “Do this if trying the block before this failed.”

Similarly, else from try/except/else follows this pattern (see Item 13: “Take Advantage of Each Block in try/except/else/finally”) because it means, “Do this if the block before did not fail.” try/finally is also intuitive because it means, “Always do what is final after trying the block before.”

Given all of the uses of else, except, and finally in Python, a new programmer might assume that the else part of for/else means, “Do this if the loop wasn’t completed.” In reality, it does exactly the opposite. Using a break statement in a loop will actually skip the else block.

In [43]:
for i in range(3):
    print('Loop %d' % i)
    if i == 1:
        break
else:
    print('Else block!')

Loop 0
Loop 1
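
One way to avoid the construct is to move the loop into a helper function that returns early. The sketch below is not part of the text above, and the function names are chosen for illustration. It contrasts a for/else search with an equivalent early-return version.

```python
def coprime_with_else(a, b):
    """for/else version: the else block runs only if no break occurred."""
    for i in range(2, min(a, b) + 1):
        if a % i == 0 and b % i == 0:
            is_coprime = False
            break
    else:
        is_coprime = True
    return is_coprime

def coprime(a, b):
    """Early-return version: no else block to misread."""
    for i in range(2, min(a, b) + 1):
        if a % i == 0 and b % i == 0:
            return False    # found a common divisor
    return True             # loop finished without finding one

print(coprime(4, 9))        # True
print(coprime(3, 6))        # False
```

Both functions give the same answers, but the second reads top to bottom without requiring the reader to remember how else interacts with break.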


### Item 13: Take Advantage of Each Block in try/except/else/finally

#### Finally Blocks
Use try/finally when you want exceptions to propagate up, but you also want to run cleanup code even when exceptions occur. One common usage of try/finally is for reliably closing file handles (see Item 43: “Consider contextlib and with Statements for Reusable try/finally Behavior” for another approach).
```
handle = open('/tmp/random_data.txt')  # May raise IOError
try:
    data = handle.read()  # May raise UnicodeDecodeError
finally:
    handle.close()        # Always runs after try:
```
Any exception raised by the read method will always propagate up to the calling code, yet the close method of handle is also guaranteed to run in the finally block. You must call open before the try block because exceptions that occur when opening the file (like IOError if the file does not exist) should skip the finally block.

#### Else Blocks

Use try/except/else to make it clear which exceptions will be handled by your code and which exceptions will propagate up. When the try block doesn’t raise an exception, the else block will run. The else block helps you minimize the amount of code in the try block and improves readability. For example, say you want to load JSON dictionary data from a string and return the value of a key it contains.
```
import json

def load_json_key(data, key):
    try:
        result_dict = json.loads(data)  # May raise ValueError
    except ValueError as e:
        raise KeyError from e
    else:
        return result_dict[key]         # May raise KeyError
```
If the data isn’t valid JSON, then decoding with json.loads will raise a ValueError. The exception is caught by the except block and handled. If decoding is successful, then the key lookup will occur in the else block. If the key lookup raises any exceptions, they will propagate up to the caller because they are outside the try block. The else clause ensures that what follows the try/except is visually distinguished from the except block. This makes the exception propagation behavior clear.

#### Everything Together

Use try/except/else/finally when you want to do it all in one compound statement. For example, say you want to read a description of work to do from a file, process it, and then update the file in place. Here, the try block is used to read the file and process it. The except block is used to handle exceptions from the try block that are expected. The else block is used to update the file in place and to allow related exceptions to propagate up. The finally block cleans up the file handle.
```
import json

UNDEFINED = object()

def divide_json(path):
    handle = open(path, 'r+')   # May raise IOError
    try:
        data = handle.read()    # May raise UnicodeDecodeError
        op = json.loads(data)   # May raise ValueError
        value = (
            op['numerator'] /
            op['denominator'])  # May raise ZeroDivisionError
    except ZeroDivisionError as e:
        return UNDEFINED
    else:
        op['result'] = value
        result = json.dumps(op)
        handle.seek(0)
        handle.write(result)    # May raise IOError
        return value
    finally:
        handle.close()          # Always runs
```
This layout is especially useful because all of the blocks work together in intuitive ways. For example, if an exception gets raised in the else block while rewriting the result data, the finally block will still run and close the file handle.


* The try/finally compound statement lets you run cleanup code regardless of whether exceptions were raised in the try block.
* The else block helps you minimize the amount of code in try blocks and visually distinguish the success case from the try/except blocks.
* An else block can be used to perform additional actions after a successful try block but before common cleanup in a finally block.

# Functions
### Item 14: Prefer Exceptions to Returning None
When writing utility functions, there’s a draw for Python programmers to give special meaning to the return value of None. It seems to make sense in some cases. For example, say you want a helper function that divides one number by another. In the case of dividing by zero, returning None seems natural because the result is undefined.
```
def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        return None
```
Code using this function can interpret the return value accordingly.
```
result = divide(x, y)
if result is None:
    print('Invalid inputs')
```
What happens when the numerator is zero? That will cause the return value to also be zero (if the denominator is non-zero). This can cause problems when you evaluate the result in a condition like an if statement. You may accidentally look for any False equivalent value to indicate errors instead of only looking for None (see Item 4: “Write Helper Functions Instead of Complex Expressions” for a similar situation).

```
x, y = 0, 5
result = divide(x, y)
if not result:
    print('Invalid inputs')  # This is wrong!
```
This is a common mistake in Python code when None has special meaning. This is why returning None from a function is error prone. There are two ways to reduce the chance of such errors.
The first way is to split the return value into a two-tuple. The first part of the tuple indicates that the operation was a success or failure. The second part is the actual result that was computed.

```
def divide(a, b):
    try:
        return True, a / b
    except ZeroDivisionError:
        return False, None
```
Callers of this function have to unpack the tuple. That forces them to consider the status part of the tuple instead of just looking at the result of division.
```
success, result = divide(x, y)
if not success:
    print('Invalid inputs')
```
The problem is that callers can easily ignore the first part of the tuple (using the underscore variable name, a Python convention for unused variables). The resulting code doesn’t look wrong at first glance. This is as bad as just returning None.
```
_, result = divide(x, y)
if not result:
    print('Invalid inputs')
```
The second, better way to reduce these errors is to never return None at all. Instead, raise an exception up to the caller and make them deal with it. Here, I turn a ZeroDivisionError into a ValueError to indicate to the caller that the input values are bad:

```
def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError as e:
        raise ValueError('Invalid inputs') from e
```
Now the caller should handle the exception for the invalid input case (this behavior should be documented; see Item 49: “Write Docstrings for Every Function, Class, and Module”). The caller no longer requires a condition on the return value of the function. If the function didn’t raise an exception, then the return value must be good. The outcome of exception handling is clear.
```
x, y = 5, 2
try:
    result = divide(x, y)
except ValueError:
    print('Invalid inputs')
else:
    print('Result is %.1f' % result)

>>>
Result is 2.5
```
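As a minimal sketch of how the raising behavior might be documented, the version below adds a docstring with a Raises section. The docstring wording and layout are assumptions made for illustration; the function body is the same as above.

```python
def divide(a, b):
    """Divide a by b.

    Raises:
        ValueError: If b is zero (the inputs are invalid).
    """
    try:
        return a / b
    except ZeroDivisionError as e:
        raise ValueError('Invalid inputs') from e


# A zero numerator is a valid result, not an error.
print(divide(0, 5))

# A zero denominator surfaces as an exception the caller must handle.
try:
    divide(5, 0)
except ValueError as e:
    print('Caught:', e)
```

With this shape, a result of zero can no longer be mistaken for a failure, because failure never travels through the return value.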
Things to Remember
* Functions that return None to indicate special meaning are error prone because None and other values (e.g., zero, the empty string) all evaluate to False in conditional expressions.
* Raise exceptions to indicate special situations instead of returning None. Expect the calling code to handle exceptions properly when they’re documented.

### Item 15: Know How Closures Interact with Variable Scope

Say you want to sort a list of numbers but prioritize one group of numbers to come first. This pattern is useful when you’re rendering a user interface and want important messages or exceptional events to be displayed before everything else.

A common way to do this is to pass a helper function as the key argument to a list’s sort method. The helper’s return value will be used as the value for sorting each item in the list. The helper can check whether the given item is in the important group and can vary the sort key accordingly.

In [51]:
def sort_priority(values, group):
    def helper(x):
        if x in group:
            return (0, x)
        return (1, x)
    values.sort(key=helper)

numbers = [8, 3, 1, 2, 5, 4, 7, 6]
group = {2, 3, 5, 7}
sort_priority(numbers, group)
print(numbers)

[2, 3, 5, 7, 1, 4, 6, 8]


There are three reasons why this function operates as expected:
* Python supports closures: functions that refer to variables from the scope in which they were defined. This is why the helper function is able to access the group argument to sort_priority.
* Functions are first-class objects in Python, meaning you can refer to them directly, assign them to variables, pass them as arguments to other functions, compare them in expressions and if statements, etc. This is how the sort method can accept a closure function as the key argument.
* Python has specific rules for comparing tuples. It first compares items in index zero, then index one, then index two, and so on. This is why the return value from the helper closure causes the sort order to have two distinct groups.

It’d be nice if this function returned whether higher-priority items were seen at all so the user interface code can act accordingly. Adding such behavior seems straightforward. There’s already a closure function for deciding which group each number is in. Why not also use the closure to flip a flag when high-priority items are seen? Then the function can return the flag value after it’s been modified by the closure.

In [52]:
def sort_priority2(numbers, group):
    found = False
    def helper(x):
        if x in group:
            found = True  # Seems simple
            return (0, x)
        return (1, x)
    numbers.sort(key=helper)
    return found

In [54]:
found = sort_priority2(numbers, group)
print('Found:', found)
print(numbers)

Found: False
[2, 3, 5, 7, 1, 4, 6, 8]


The sorted results are correct, but the found result is wrong. Items from group were definitely found in numbers, but the function returned False. How could this happen?
When you reference a variable in an expression, the Python interpreter traverses scopes to resolve the reference in this order (often summarized as the LEGB rule: Local, Enclosing, Global, Built-in):
1. The current function’s scope
2. Any enclosing scopes (like other containing functions)
3. The scope of the module that contains the code (also called the global scope)
4. The built-in scope (that contains functions like len and str)


If none of these places have a defined variable with the referenced name, then a NameError exception is raised.

Assigning a value to a variable works differently. If the variable is already defined in the current scope, then it will just take on the new value. If the variable doesn’t exist in the current scope, then Python treats the assignment as a variable definition. The scope of the newly defined variable is the function that contains the assignment.
This assignment behavior explains the wrong return value of the sort_priority2 function. The found variable is assigned to True in the helper closure. The closure’s assignment is treated as a new variable definition within helper, not as an assignment within sort_priority2.

Encountering this problem is sometimes called the scoping bug because it can be so surprising to newbies. But this is the intended result. This behavior prevents local variables in a function from polluting the containing module. Otherwise, every assignment within a function would put garbage into the global module scope. Not only would that be noise, but the interplay of the resulting global variables could cause obscure bugs.


##### Getting Data Out
In Python 3, there is special syntax for getting data out of a closure. The nonlocal statement is used to indicate that scope traversal should happen upon assignment for a specific variable name. The only limit is that nonlocal won’t traverse up to the module-level scope (to avoid polluting globals).
Here, I define the same function again using nonlocal:
```
def sort_priority3(numbers, group):
    found = False
    def helper(x):
        nonlocal found
        if x in group:
            found = True
            return (0, x)
        return (1, x)
    numbers.sort(key=helper)
    return found
```
 
The nonlocal statement makes it clear when data is being assigned out of a closure into another scope. It’s complementary to the global statement, which indicates that a variable’s assignment should go directly into the module scope.
However, much like the anti-pattern of global variables, I’d caution against using nonlocal for anything beyond simple functions. The side effects of nonlocal can be hard to follow. It’s especially hard to understand in long functions where the nonlocal statements and assignments to associated variables are far apart.
When your usage of nonlocal starts getting complicated, it’s better to wrap your state in a helper class. Here, I define a class that achieves the same result as the nonlocal approach. It’s a little longer, but is much easier to read (see Item 23: “Accept Functions for Simple Interfaces Instead of Classes” for details on the __call__ special method).
```
class Sorter(object):
    def __init__(self, group):
        self.group = group
        self.found = False

    def __call__(self, x):
        if x in self.group:
            self.found = True
            return (0, x)
        return (1, x)

sorter = Sorter(group)
numbers.sort(key=sorter)
assert sorter.found is True
```

##### Scope in Python 2
Unfortunately, Python 2 doesn’t support the nonlocal keyword. In order to get similar behavior, you need to use a work-around that takes advantage of Python’s scoping rules. This approach isn’t pretty, but it’s the common Python idiom.
```
# Python 2
def sort_priority(numbers, group):
    found = [False]
    def helper(x):
        if x in group:
            found[0] = True
            return (0, x)
        return (1, x)
    numbers.sort(key=helper)
    return found[0]
```
As explained above, Python will traverse up the scope where the found variable is referenced to resolve its current value. The trick is that the value for found is a list, which is mutable. This means that once retrieved, the closure can modify the state of found to send data out of the inner scope (with found[0] = True).
This approach also works when the variable used to traverse the scope is a dictionary, a set, or an instance of a class you’ve defined.

Things to Remember
* Closure functions can refer to variables from any of the scopes in which they were defined.
* By default, closures can’t affect enclosing scopes by assigning variables.
* In Python 3, use the nonlocal statement to indicate when a closure can modify a variable in its enclosing scopes.
* In Python 2, use a mutable value (like a single-item list) to work around the lack of the nonlocal statement.
* Avoid using nonlocal statements for anything beyond simple functions.



### Item 16: Consider Generators Instead of Returning Lists

The simplest choice for functions that produce a sequence of results is to return a list of items. For example, say you want to find the index of every word in a string. Here, I accumulate results in a list using the append method and return it at the end of the function:
```
def index_words(text):
    result = []
    if text:
        result.append(0)
    for index, letter in enumerate(text):
        if letter == ' ':
            result.append(index + 1)
    return result
```
This works as expected for some sample input.

```
address = 'Four score and seven years ago...'
result = index_words(address)
print(result[:3])

>>>
[0, 5, 11]
```
There are two problems with the index_words function.
The first problem is that the code is a bit dense and noisy. Each time a new result is found, I call the append method. The method call’s bulk (result.append) deemphasizes the value being added to the list (index + 1). There is one line for creating the result list and another for returning it. While the function body contains ~130 characters (without whitespace), only ~75 characters are important.
A better way to write this function is using a generator. Generators are functions that use yield expressions. When called, generator functions do not actually run but instead immediately return an iterator. With each call to the next built-in function, the iterator will advance the generator to its next yield expression. Each value passed to yield by the generator will be returned by the iterator to the caller.
Here, I define a generator function that produces the same results as before:
```
def index_words_iter(text):
    if text:
        yield 0
    for index, letter in enumerate(text):
        if letter == ' ':
            yield index + 1
```
It’s significantly easier to read because all interactions with the result list have been eliminated. Results are passed to yield expressions instead. The iterator returned by the generator call can easily be converted to a list by passing it to the list built-in function (see Item 9: “Consider Generator Expressions for Large Comprehensions” for how this works).
```
result = list(index_words_iter(address))
```
The second problem with index_words is that it requires all results to be stored in the list before being returned. For huge inputs, this can cause your program to run out of memory and crash. In contrast, a generator version of this function can easily be adapted to take inputs of arbitrary length.
Here, I define a generator that streams input from a file one line at a time and yields outputs one word at a time. The working memory for this function is bounded to the maximum length of one line of input.
```
def index_file(handle):
    offset = 0
    for line in handle:
        if line:
            yield offset
        for letter in line:
            offset += 1
            if letter == ' ':
                yield offset
```

Running the generator produces the same results.
```
from itertools import islice

with open('/tmp/address.txt', 'r') as f:
    it = index_file(f)
    results = islice(it, 0, 3)
    print(list(results))

>>>
[0, 5, 11]
```
The only gotcha of defining generators like this is that the callers must be aware that the iterators returned are stateful and can’t be reused (see Item 17: “Be Defensive When Iterating Over Arguments”).
Things to Remember
* Using generators can be clearer than the alternative of returning lists of accumulated results.
* The iterator returned by a generator produces the set of values passed to yield expressions within the generator function’s body.
* Generators can produce a sequence of outputs for arbitrarily large inputs because their working memory doesn’t include all inputs and outputs.

### Item 17: Be Defensive When Iterating Over Arguments

When a function takes a list of objects as a parameter, it’s often important to iterate over that list multiple times. For example, say you want to analyze tourism numbers for the U.S. state of Texas. Imagine the data set is the number of visitors to each city (in millions per year). You’d like to figure out what percentage of overall tourism each city receives.
To do this you need a normalization function. It sums the inputs to determine the total number of tourists per year. Then it divides each city’s individual visitor count by the total to find that city’s contribution to the whole.
```
def normalize(numbers):
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result
```

This function works when given a list of visits.
```
visits = [15, 35, 80]
percentages = normalize(visits)
print(percentages)

>>>
[11.538461538461538, 26.923076923076923, 61.53846153846154]
```
To scale this up, I need to read the data from a file that contains every city in all of Texas. I define a generator to do this because then I can reuse the same function later when I want to compute tourism numbers for the whole world, a much larger data set (see Item 16: “Consider Generators Instead of Returning Lists”).
```
def read_visits(data_path):
    with open(data_path) as f:
        for line in f:
            yield int(line)
```
Surprisingly, calling normalize on the generator’s return value produces no results.
```
it = read_visits('/tmp/my_numbers.txt')
percentages = normalize(it)
print(percentages)

>>>
[]
```

The cause of this behavior is that an iterator only produces its results a single time. If you iterate over an iterator or generator that has already raised a StopIteration exception, you won’t get any results the second time around.
```
it = read_visits('/tmp/my_numbers.txt')
print(list(it))
print(list(it))  # Already exhausted

>>>
[15, 35, 80]
[]
```
What’s confusing is that you also won’t get any errors when you iterate over an already exhausted iterator. for loops, the list constructor, and many other functions throughout the Python standard library expect the StopIteration exception to be raised during normal operation. These functions can’t tell the difference between an iterator that has no output and an iterator that had output and is now exhausted.
To solve this problem, you can explicitly exhaust an input iterator and keep a copy of its entire contents in a list. You can then iterate over the list version of the data as many times as you need to. Here’s the same function as before, but it defensively copies the input iterator:
```
def normalize_copy(numbers):
    numbers = list(numbers)  # Copy the iterator
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result
```
Now the function works correctly on a generator’s return value.
```
it = read_visits('/tmp/my_numbers.txt')
percentages = normalize_copy(it)
print(percentages)

>>>
[11.538461538461538, 26.923076923076923, 61.53846153846154]
```

The problem with this approach is the copy of the input iterator’s contents could be large. Copying the iterator could cause your program to run out of memory and crash. One way around this is to accept a function that returns a new iterator each time it’s called.
```
def normalize_func(get_iter):
    total = sum(get_iter())   # New iterator
    result = []
    for value in get_iter():  # New iterator
        percent = 100 * value / total
        result.append(percent)
    return result
```
To use normalize_func, you can pass in a lambda expression that calls the generator and produces a new iterator each time.
```
percentages = normalize_func(lambda: read_visits(path))
```

Though it works, having to pass a lambda function like this is clumsy. The better way to achieve the same result is to provide a new container class that implements the iterator protocol.
The iterator protocol is how Python for loops and related expressions traverse the contents of a container type. When Python sees a statement like for x in foo it will actually call iter(foo). The iter built-in function calls the foo.__iter__ special method in turn. The __iter__ method must return an iterator object (which itself implements the __next__ special method). Then the for loop repeatedly calls the next built-in function on the iterator object until it’s exhausted (and raises a StopIteration exception).
It sounds complicated, but practically speaking you can achieve all of this behavior for your classes by implementing the __iter__ method as a generator. Here, I define an iterable container class that reads the files containing tourism data:
```
class ReadVisits(object):
    def __init__(self, data_path):
        self.data_path = data_path

    def __iter__(self):
        with open(self.data_path) as f:
            for line in f:
                yield int(line)
```

This new container type works correctly when passed to the original function without any modifications.
```
visits = ReadVisits(path)
percentages = normalize(visits)
print(percentages)

>>>
[11.538461538461538, 26.923076923076923, 61.53846153846154]
```

This works because the sum function in normalize will call ReadVisits.__iter__ to allocate a new iterator object. The for loop to normalize the numbers will also call __iter__ to allocate a second iterator object. Each of those iterators will be advanced and exhausted independently, ensuring that each unique iteration sees all of the input data values. The only downside of this approach is that it reads the input data multiple times.
Now that you know how containers like ReadVisits work, you can write your functions to ensure that parameters aren’t just iterators. The protocol states that when an iterator is passed to the iter built-in function, iter will return the iterator itself. In contrast, when a container type is passed to iter, a new iterator object will be returned each time. Thus, you can test an input value for this behavior and raise a TypeError to reject iterators.

```
def normalize_defensive(numbers):
    if iter(numbers) is iter(numbers):  # An iterator: bad!
        raise TypeError('Must supply a container')
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result
```
This is ideal if you don’t want to copy the full input iterator like normalize_copy above, but you also need to iterate over the input data multiple times. This function works as expected for list and ReadVisits inputs because they are containers. It will work for any type of container that follows the iterator protocol.
```
visits = [15, 35, 80]
normalize_defensive(visits)  # No error
visits = ReadVisits(path)
normalize_defensive(visits)  # No error
```

The function will raise an exception if the input is iterable but not a container.
```
it = iter(visits)
normalize_defensive(it)

>>>
TypeError: Must supply a container
```
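An alternative way to express the same check, not used in the text above, is an isinstance test against the Iterator abstract base class from the standard library's collections.abc module. The function name normalize_defensive_abc is hypothetical, and the sketch assumes that rejecting every Iterator is an acceptable stand-in for the iter(numbers) is iter(numbers) comparison.

```python
from collections.abc import Iterator


def normalize_defensive_abc(numbers):
    # Assumption: rejecting anything that is an Iterator is an
    # acceptable stand-in for the iter(...) is iter(...) test.
    if isinstance(numbers, Iterator):
        raise TypeError('Must supply a container')
    total = sum(numbers)
    return [100 * value / total for value in numbers]


print(normalize_defensive_abc([15, 35, 80]))

try:
    normalize_defensive_abc(iter([15, 35, 80]))
except TypeError as e:
    print('TypeError:', e)
```

Lists pass the check because they define __iter__ but not __next__, while list iterators and generator objects define both and are rejected.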
Things to Remember
* Beware of functions that iterate over input arguments multiple times. If these arguments are iterators, you may see strange behavior and missing values.
* Python’s iterator protocol defines how containers and iterators interact with the iter and next built-in functions, for loops, and related expressions.
* You can easily define your own iterable container type by implementing the __iter__ method as a generator.
* You can detect that a value is an iterator (instead of a container) if calling iter on it twice produces the same result, which can then be progressed with the next built-in function.

### Item 18: Reduce Visual Noise with Variable Positional Arguments

Accepting optional positional arguments (often called star args in reference to the conventional name for the parameter, *args) can make a function call more clear and remove visual noise.
For example, say you want to log some debug information. With a fixed number of arguments, you would need a function that takes a message and a list of values.
```
def log(message, values):
    if not values:
        print(message)
    else:
        values_str = ', '.join(str(x) for x in values)
        print('%s: %s' % (message, values_str))

log('My numbers are', [1, 2])
log('Hi there', [])

>>>
My numbers are: 1, 2
Hi there
```

Having to pass an empty list when you have no values to log is cumbersome and noisy. It’d be better to leave out the second argument entirely. You can do this in Python by prefixing the last positional parameter name with *. The first parameter for the log message is required, whereas any number of subsequent positional arguments are optional. The function body doesn’t need to change, only the callers do.
```
def log(message, *values):  # The only difference
    if not values:
        print(message)
    else:
        values_str = ', '.join(str(x) for x in values)
        print('%s: %s' % (message, values_str))

log('My numbers are', 1, 2)
log('Hi there')  # Much better

>>>
My numbers are: 1, 2
Hi there
```
If you already have a list and want to call a variable argument function like log, you can do this by using the * operator. This instructs Python to pass items from the sequence as positional arguments.
```
favorites = [7, 33, 99]
log('Favorite colors', *favorites)

>>>
Favorite colors: 7, 33, 99
```

There are two problems with accepting a variable number of positional arguments.
The first issue is that the variable arguments are always turned into a tuple before they are passed to your function. This means that if the caller of your function uses the * operator on a generator, it will be iterated until it’s exhausted. The resulting tuple will include every value from the generator, which could consume a lot of memory and cause your program to crash.

```
def my_generator():
    for i in range(10):
        yield i

def my_func(*args):
    print(args)

it = my_generator()
my_func(*it)

>>>
(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
```

Functions that accept *args are best for situations where you know the number of inputs in the argument list will be reasonably small. It’s ideal for function calls that pass many literals or variable names together. It’s primarily for the convenience of the programmer and the readability of the code.
The second issue with *args is that you can’t add new positional arguments to your function in the future without migrating every caller. If you try to add a positional argument in the front of the argument list, existing callers will subtly break if they aren’t updated.
```
def log(sequence, message, *values):
    if not values:
        print('%s: %s' % (sequence, message))
    else:
        values_str = ', '.join(str(x) for x in values)
        print('%s: %s: %s' % (sequence, message, values_str))

log(1, 'Favorites', 7, 33)      # New usage is OK
log('Favorite numbers', 7, 33)  # Old usage breaks

>>>
1: Favorites: 7, 33
Favorite numbers: 7: 33
```
The problem here is that the second call to log used 7 as the message parameter because a sequence argument wasn’t given. Bugs like this are hard to track down because the code still runs without raising any exceptions. To avoid this possibility entirely, you should use keyword-only arguments when you want to extend functions that accept *args (see Item 21: “Enforce Clarity with Keyword-Only Arguments”).
Things to Remember
* Functions can accept a variable number of positional arguments by using *args in the def statement.
* You can use the items from a sequence as the positional arguments for a function with the * operator.
* Using the * operator with a generator may cause your program to run out of memory and crash.
* Adding new positional parameters to functions that accept *args can introduce hard-to-find bugs.

### Item 19: Provide Optional Behavior with Keyword Arguments

Like most other programming languages, calling a function in Python allows for passing arguments by position.

In [79]:
def remainder(number, divisor):
    return number % divisor

remainder(20, 3)
remainder(20, divisor=3)
remainder(number=20, divisor=3)
remainder(divisor=3, number=20)

2

All positional arguments to Python functions can also be passed by keyword, where the name of the argument is used in an assignment within the parentheses of a function call. The keyword arguments can be passed in any order as long as all of the required positional arguments are specified. You can mix and match keyword and positional arguments. The four calls to remainder above are equivalent.
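
Two further rules are worth a sketch. Positional arguments must come before keyword arguments (writing remainder(number=20, 7) is a SyntaxError), and each argument can be specified only once. The second function, remainder_abs, and its absolute parameter are hypothetical, added only to illustrate how a keyword argument with a default value can add behavior without breaking existing callers.

```python
def remainder(number, divisor):
    return number % divisor


# Each argument may be specified only once.
try:
    remainder(20, number=7)
except TypeError as e:
    print('TypeError:', e)


# A hypothetical variant with a default value: existing callers keep
# working, and new callers can opt in to the new behavior by keyword.
def remainder_abs(number, divisor, absolute=False):
    result = number % divisor
    return abs(result) if absolute else result


print(remainder_abs(20, 3))
print(remainder_abs(20, -3, absolute=True))
```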

Things to Remember
* Function arguments can be specified by position or by keyword.
* Keywords make it clear what the purpose of each argument is when it would be confusing with only positional arguments.
* Keyword arguments with default values make it easy to add new behaviors to a function, especially when the function has existing callers.
* Optional keyword arguments should always be passed by keyword instead of by position.

### Item 20: Use None and Docstrings to Specify Dynamic Default Arguments

Sometimes you need to use a non-static type as a keyword argument’s default value. For example, say you want to print logging messages that are marked with the time of the logged event. In the default case, you want the message to include the time when the function was called. You might try the following approach, assuming the default arguments are reevaluated each time the function is called.

In [85]:
import datetime as dt

def log(message, when=dt.datetime.now()):
    print('%s: %s' % (when, message))

log("Hello")
log("World")
log("World2")
# All three calls print the same timestamp, which is wrong. The default
# dt.datetime.now() is evaluated only once, when the function object is
# created at load time, and stored in log.__defaults__, so every call
# reuses the same when value.

2016-03-26 20:10:34.937349: Hello
2016-03-26 20:10:34.937349: World
2016-03-26 20:10:34.937349: World2


The convention for achieving the desired result in Python is to provide a default value of None and to document the actual behavior in the docstring (see Item 49: “Write Docstrings for Every Function, Class, and Module”). When your code sees an argument value of None, you allocate the default value accordingly.

In [88]:
def log(message, when=None):
    """Log a message with a timestamp.

    Args:
        message: Message to print.
        when: datetime of when the message occurred.
            Defaults to the present time.
    """
    when = dt.datetime.now() if when is None else when
    print('%s: %s' % (when, message))

log("Hello")
log("World")
log("World2")

2016-03-26 20:13:04.918648: Hello
2016-03-26 20:13:04.918748: World
2016-03-26 20:13:04.918798: World2


Using None for default argument values is especially important when the arguments are mutable. For example, say you want to load a value encoded as JSON data. If decoding the data fails, you want an empty dictionary to be returned by default. You might try this approach.

In [93]:
import json
def decode(data, default={}):
    try:
        return json.loads(data)
    except ValueError:
        return default

foo = decode('bad data')
foo['stuff'] = 5
bar = decode('also bad')
bar['meep'] = 1
print('Foo:', foo)
print('Bar:', bar)

Foo: {'meep': 1, 'stuff': 5}
Bar: {'meep': 1, 'stuff': 5}


The problem here is the same as the datetime.now example above. The dictionary specified for default will be shared by all calls to decode because default argument values are only evaluated once (at module load time). This can cause extremely surprising behavior.

You’d expect two different dictionaries, each with a single key and value. But modifying one seems to also modify the other. The culprit is that foo and bar are both equal to the default parameter. They are the same dictionary object, so the check assert foo is bar passes.

The fix is to set the keyword argument default value to None and then document the behavior in the function’s docstring.

In [96]:
def decode(data, default=None):
    """Load JSON data from a string.

    Args:
        data: JSON data to decode.
        default: Value to return if decoding fails.
            Defaults to an empty dictionary.
    """
    if default is None:
        default = {}
    try:
        return json.loads(data)
    except ValueError:
        return default
        
foo = decode('bad data')
foo['stuff'] = 5
bar = decode('also bad')
bar['meep'] = 1
print('Foo:', foo)
print('Bar:', bar)

Foo: {'stuff': 5}
Bar: {'meep': 1}


Things to Remember
* Default arguments are only evaluated once: during function definition at module load time. This can cause odd behaviors for dynamic values (like {} or []).
* Use None as the default value for keyword arguments that have a dynamic value. Document the actual default behavior in the function’s docstring.

### Item 21: Enforce Clarity with Keyword-Only Arguments

In [102]:
def safe_division(number, divisor, ignore_overflow=True,
                  ignore_zero_division=False):
    try:
        return number / divisor
    except OverflowError:
        if ignore_overflow:
            return 0
        else:
            raise
    except ZeroDivisionError:
        if ignore_zero_division:
            return float('inf')
        else:
            raise

In [99]:
result = safe_division(1, 10**500, True, False)
print(result)

0.0


In [103]:
safe_division(1, 10**500, ignore_overflow=True)
safe_division(1, 0, ignore_zero_division=True)

inf
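
The cells above still allow the flags to be passed by position, which is easy to misread. As a sketch of the Python 3 keyword-only syntax this item is named for, a bare * in the parameter list ends the positional parameters. The name safe_division_kw and the choice of False for both defaults are assumptions made for this illustration.

```python
def safe_division_kw(number, divisor, *,
                     ignore_overflow=False,
                     ignore_zero_division=False):
    # The bare * ends the positional parameters. The two flags
    # after it can only be passed by keyword.
    try:
        return number / divisor
    except OverflowError:
        if ignore_overflow:
            return 0
        raise
    except ZeroDivisionError:
        if ignore_zero_division:
            return float('inf')
        raise


print(safe_division_kw(1, 0, ignore_zero_division=True))

# Passing the flags by position is now rejected.
try:
    safe_division_kw(1, 10**500, True, False)
except TypeError as e:
    print('TypeError:', e)
```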

Things to Remember
* Keyword arguments make the intention of a function call more clear.
* Use keyword-only arguments to force callers to supply keyword arguments for potentially confusing functions, especially those that accept multiple Boolean flags.
* Python 3 supports explicit syntax for keyword-only arguments in functions.
* Python 2 can emulate keyword-only arguments for functions by using **kwargs and manually raising TypeError exceptions.
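
The Python 2 emulation in the last point can be sketched as follows. The function name safe_division_py2 is hypothetical, and the code is written so that it also runs under Python 3.

```python
def safe_division_py2(number, divisor, **kwargs):
    # pop removes each expected option, falling back to a default.
    ignore_overflow = kwargs.pop('ignore_overflow', False)
    ignore_zero_div = kwargs.pop('ignore_zero_division', False)
    # Anything left over was not an expected keyword argument.
    if kwargs:
        raise TypeError('Unexpected **kwargs: %r' % kwargs)
    try:
        return number / divisor
    except OverflowError:
        if ignore_overflow:
            return 0
        raise
    except ZeroDivisionError:
        if ignore_zero_div:
            return float('inf')
        raise


print(safe_division_py2(1, 0, ignore_zero_division=True))

# Flags passed by position fail because only two positional
# parameters exist.
try:
    safe_division_py2(1, 0, False, True)
except TypeError as e:
    print('TypeError:', e)
```

Because the options live in **kwargs, they cannot be supplied positionally, and the explicit check catches misspelled or unexpected keywords.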

# Classes and Inheritance

As an object-oriented programming language, Python supports a full range of features, such as inheritance, polymorphism, and encapsulation. Getting things done in Python often requires writing new classes and defining how they interact through their interfaces and hierarchies.

Python’s classes and inheritance make it easy to express your program’s intended behaviors with objects. They allow you to improve and expand functionality over time. They provide flexibility in an environment of changing requirements. Knowing how to use them well enables you to write maintainable code.

### Item 22: Prefer Helper Classes Over Bookkeeping with Dictionaries and Tuples

#### Refactoring to Classes
You can start moving to classes at the bottom of the dependency tree: a single grade. A class seems too heavyweight for such simple information. A tuple, though, seems appropriate because grades are immutable. Here, I use the tuple (score, weight) to track grades in a list:
```
grades = []
grades.append((95, 0.45))
#...
total = sum(score * weight for score, weight in grades)
total_weight = sum(weight for _, weight in grades)
average_grade = total / total_weight
```
The problem is that plain tuples are positional. When you want to associate more information with a grade, like a set of notes from the teacher, you’ll need to rewrite every usage of the two-tuple to be aware that there are now three items present instead of two. Here, I use _ (the underscore variable name, a Python convention for unused variables) to capture the third entry in the tuple and just ignore it:
```
grades = []
grades.append((95, 0.45, 'Great job'))
#...
total = sum(score * weight for score, weight, _ in grades)
total_weight = sum(weight for _, weight, _ in grades)
average_grade = total / total_weight
```

This pattern of extending tuples longer and longer is similar to deepening layers of dictionaries. As soon as you find yourself going longer than a two-tuple, it’s time to consider another approach.
The namedtuple type in the collections module does exactly what you need. It lets you easily define tiny, immutable data classes.
```
import collections
Grade = collections.namedtuple('Grade', ('score', 'weight'))
```
These classes can be constructed with positional or keyword arguments. The fields are accessible with named attributes. Having named attributes makes it easy to move from a namedtuple to your own class later if your requirements change again and you need to add behaviors to the simple data containers.
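
As a short illustration of the points above (the sample scores and weights are invented), this sketch builds Grade instances both ways, reads them by attribute name, and checks that they are immutable:

```python
import collections

Grade = collections.namedtuple('Grade', ('score', 'weight'))

# Construct with positional or keyword arguments.
first = Grade(95, 0.45)
second = Grade(score=80, weight=0.55)
grades = [first, second]

# Fields are read through named attributes instead of positions.
total = sum(g.score * g.weight for g in grades)
total_weight = sum(g.weight for g in grades)
average_grade = total / total_weight

# Instances are immutable: assigning to a field raises AttributeError.
try:
    first.score = 100
except AttributeError:
    is_immutable = True
else:
    is_immutable = False
```

Because the calling code already uses g.score and g.weight, replacing Grade with a hand-written class later would not require touching these expressions.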


#### Limitations of namedtuple
Although namedtuple is useful in many circumstances, it’s important to understand when it can cause more harm than good:

* You can’t specify default argument values for namedtuple classes before Python 3.7, which added a defaults argument. This makes them unwieldy when your data may have many optional properties. If you find yourself using more than a handful of attributes, defining your own class may be a better choice.

* The attribute values of namedtuple instances are still accessible using numerical indexes and iteration. Especially in externalized APIs, this can lead to unintentional usage that makes it harder to move to a real class later. If you’re not in control of all of the usage of your namedtuple instances, it’s better to define your own class.

Things to Remember
* Avoid making dictionaries with values that are other dictionaries or long tuples.
* Use namedtuple for lightweight, immutable data containers before you need the flexibility of a full class.
* Move your bookkeeping code to use multiple helper classes when your internal state dictionaries get complicated.

### Item 23: Accept Functions for Simple Interfaces Instead of Classes

Many of Python’s built-in APIs allow you to customize behavior by passing in a function. These hooks are used by APIs to call back your code while they execute. For example, the list type’s sort method takes an optional key argument that’s used to determine each index’s value for sorting. Here, I sort a list of names based on their lengths by providing a lambda expression as the key hook:

In [105]:
names = ['Socrates', 'Archimedes', 'Plato', 'Aristotle']
names.sort(key=lambda x: len(x))
print(names)

['Plato', 'Socrates', 'Aristotle', 'Archimedes']

In other languages, you might expect hooks to be defined by an abstract class. In Python, many hooks are just stateless functions with well-defined arguments and return values. Functions are ideal for hooks because they are easier to describe and simpler to define than classes. Functions work as hooks because Python has first-class functions: functions and methods can be passed around and referenced like any other value in the language.


Things to Remember
* Instead of defining and instantiating classes, functions are often all you need for simple interfaces between components in Python.
* References to functions and methods in Python are first class, meaning they can be used in expressions like any other type.
* The __call__ special method enables instances of a class to be called like plain Python functions.
* When you need a function to maintain state, consider defining a class that provides the __call__ method instead of defining a stateful closure (see Item 15: “Know How Closures Interact with Variable Scope”).
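
To make the last two points concrete, here is a small sketch of a stateful hook. It assumes collections.defaultdict as the API that takes the hook; the class name CountMissing and the sample data are illustrative choices, not something these notes define:

```python
from collections import defaultdict


class CountMissing:
    """Callable hook that counts how many times it has been invoked."""

    def __init__(self):
        self.added = 0

    def __call__(self):
        # defaultdict calls the hook with no arguments for each missing key.
        self.added += 1
        return 0


current = {'green': 12, 'blue': 3}
increments = [('red', 5), ('blue', 17), ('orange', 9)]

counter = CountMissing()
result = defaultdict(counter, current)  # the instance itself is the hook
for key, amount in increments:
    result[key] += amount
```

The defaultdict only sees something it can call; the state (the added counter) lives on the instance, which is easier to read than a closure that mutates an enclosing variable.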

### Item 24: Use @classmethod Polymorphism to Construct Objects Generically

**In Python, not only do the objects support polymorphism, but the classes do as well. What does that mean, and what is it good for?**

Polymorphism is a way for multiple classes in a hierarchy to implement their own unique versions of a method. This allows many classes to fulfill the same interface or abstract base class while providing different functionality (see Item 28: “Inherit from collections.abc for Custom Container Types” for an example).

Things to Remember
* Python only supports a single constructor per class, the __init__ method. 
* Use @classmethod to define alternative constructors for your classes.
* Use class method polymorphism to provide generic ways to build and connect concrete subclasses.
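
A minimal sketch of the idea, with every class and function name below (InputData, TextInputData, UpperInputData, read_all) invented for illustration: the generic helper never names a concrete class, and each subclass supplies its own alternative constructor through @classmethod.

```python
class InputData:
    """Hypothetical base interface: subclasses know how to build themselves."""

    def read(self):
        raise NotImplementedError

    @classmethod
    def generate_inputs(cls, config):
        raise NotImplementedError


class TextInputData(InputData):
    def __init__(self, text):
        self.text = text

    def read(self):
        return self.text

    @classmethod
    def generate_inputs(cls, config):
        # cls is the class the method was called on, so subclasses
        # of TextInputData are constructed correctly as well.
        return [cls(text) for text in config['texts']]


class UpperInputData(TextInputData):
    def read(self):
        return self.text.upper()


def read_all(input_class, config):
    """Generic helper: works with any InputData subclass."""
    return [item.read() for item in input_class.generate_inputs(config)]


config = {'texts': ['spam', 'eggs']}
plain = read_all(TextInputData, config)
upper = read_all(UpperInputData, config)
```

Adding another input type only requires a new subclass; read_all stays unchanged.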

### Item 25: Initialize Parent Classes with super

The old way to initialize a parent class from a child class is to directly call the parent class’s __init__ method with the child instance.

```
class MyBaseClass(object):
    def __init__(self, value):
        self.value = value

class MyChildClass(MyBaseClass):
    def __init__(self):
        MyBaseClass.__init__(self, 5)
```
This approach works fine for simple hierarchies but breaks down in many cases.

If your class is affected by multiple inheritance (something to avoid in general; see Item 26: “Use Multiple Inheritance Only for Mix-in Utility Classes”), calling the superclasses’ __init__ methods directly can lead to unpredictable behavior.
One problem is that the __init__ call order isn’t specified across all subclasses. For example, here I define two parent classes that operate on the instance’s value field:

 
 

In [108]:
class MyBaseClass(object):
    def __init__(self, value):
        self.value = value

class TimesTwo(object):
    def __init__(self):
        self.value *= 2

class PlusFive(object):
    def __init__(self):
        self.value += 5

class OneWay(MyBaseClass, TimesTwo, PlusFive):
    def __init__(self, value):
        MyBaseClass.__init__(self, value)
        TimesTwo.__init__(self)
        PlusFive.__init__(self)

foo = OneWay(5)
print('First ordering is (5 * 2) + 5 =', foo.value)

First ordering is (5 * 2) + 5 = 15


Here’s another class that defines the same parent classes but in a different ordering:

```
class AnotherWay(MyBaseClass, PlusFive, TimesTwo):
    def __init__(self, value):
        MyBaseClass.__init__(self, value)
        TimesTwo.__init__(self)
        PlusFive.__init__(self)
```

However, I left the calls to the parent class constructors PlusFive.__init__ and TimesTwo.__init__ in the same order as before, causing this class’s behavior not to match the order of the parent classes in its definition.

```
bar = AnotherWay(5)
print('Second ordering still is', bar.value)
>>>
Second ordering still is 15
```

Another problem occurs with diamond inheritance. Diamond inheritance happens when a subclass inherits from two separate classes that have the same superclass somewhere in the hierarchy. Diamond inheritance causes the common superclass’s __init__ method to run multiple times, causing unexpected behavior. For example, here I define two child classes that inherit from MyBaseClass.

```
class TimesFive(MyBaseClass):
    def __init__(self, value):
        MyBaseClass.__init__(self, value)
        self.value *= 5

class PlusTwo(MyBaseClass):
    def __init__(self, value):
        MyBaseClass.__init__(self, value)
        self.value += 2
```

Then, I define a child class that inherits from both of these classes, making MyBaseClass the top of the diamond.

```
class ThisWay(TimesFive, PlusTwo):
    def __init__(self, value):
        TimesFive.__init__(self, value)
        PlusTwo.__init__(self, value)

foo = ThisWay(5)
print('Should be (5 * 5) + 2 = 27 but is', foo.value)
>>>
Should be (5 * 5) + 2 = 27 but is 7
```

The output should be 27 because (5 * 5) + 2 = 27. But the call to the second parent class’s constructor, PlusTwo.__init__, causes self.value to be reset back to 5 when MyBaseClass.__init__ gets called a second time.

To solve these problems, Python 2.2 added the super built-in function and defined the method resolution order (MRO). The MRO standardizes which superclasses are initialized before others (e.g., depth-first, left-to-right). It also ensures that common superclasses in diamond hierarchies are only run once.

Here, I create a diamond-shaped class hierarchy again, but this time I use super (in the Python 2 style) to initialize the parent class:

```
# Python 2
class TimesFiveCorrect(MyBaseClass):
    def __init__(self, value):
        super(TimesFiveCorrect, self).__init__(value)
        self.value *= 5

class PlusTwoCorrect(MyBaseClass):
    def __init__(self, value):
        super(PlusTwoCorrect, self).__init__(value)
        self.value += 2
```

Now the top part of the diamond, MyBaseClass.__init__, is only run a single time. The other parent classes are run in the order specified in the class statement.

```
# Python 2
class GoodWay(TimesFiveCorrect, PlusTwoCorrect):
    def __init__(self, value):
        super(GoodWay, self).__init__(value)

foo = GoodWay(5)
print 'Should be 5 * (5 + 2) = 35 and is', foo.value
>>>
Should be 5 * (5 + 2) = 35 and is 35
```

This order may seem backwards at first. Shouldn’t TimesFiveCorrect.__init__ have run first? Shouldn’t the result be (5 * 5) + 2 = 27? The answer is no. This ordering matches what the MRO defines for this class. The MRO ordering is available on a class method called mro.

```
from pprint import pprint
pprint(GoodWay.mro())
>>>
[<class '__main__.GoodWay'>,
 <class '__main__.TimesFiveCorrect'>,
 <class '__main__.PlusTwoCorrect'>,
 <class '__main__.MyBaseClass'>,
 <class 'object'>]
```

When I call GoodWay(5), it in turn calls TimesFiveCorrect.__init__, which calls PlusTwoCorrect.__init__, which calls MyBaseClass.__init__. Once this reaches the top of the diamond, then all of the initialization methods actually do their work in the opposite order from how their __init__ functions were called. MyBaseClass.__init__ assigns the value to 5. PlusTwoCorrect.__init__ adds 2 to make value equal 7. TimesFiveCorrect.__init__ multiplies it by 5 to make value equal 35.

The super built-in function works well, but it still has two noticeable problems in Python 2:

* Its syntax is a bit verbose. You have to specify the class you’re in, the self object, the method name (usually __init__), and all the arguments. This construction can be confusing to new Python programmers.
* You have to specify the current class by name in the call to super. If you ever change the class’s name (a very common activity when improving a class hierarchy), you also need to update every call to super.

Thankfully, Python 3 fixes these issues by making calls to super with no arguments equivalent to calling super with __class__ and self specified. In Python 3, you should always use super because it’s clear, concise, and always does the right thing.

```
class Explicit(MyBaseClass):
    def __init__(self, value):
        super(__class__, self).__init__(value * 2)

class Implicit(MyBaseClass):
    def __init__(self, value):
        super().__init__(value * 2)

assert Explicit(10).value == Implicit(10).value
```

This works because Python 3 lets you reliably reference the current class in methods using the __class__ variable. This doesn’t work in Python 2 because __class__ isn’t defined. You may guess that you could use self.__class__ as an argument to super, but this breaks because of the way super is implemented in Python 2.

Things to Remember
* Python’s standard method resolution order (MRO) solves the problems of superclass initialization order and diamond inheritance.
* Always use the super built-in function to initialize parent classes.
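
As a compact Python 3 restatement of the diamond example from this item, the following sketch uses the zero-argument form of super throughout. The class names are shortened for illustration, and the expected value of 35 follows from the MRO explanation given in this item:

```python
class MyBaseClass:
    def __init__(self, value):
        self.value = value


class TimesFive(MyBaseClass):
    def __init__(self, value):
        super().__init__(value)   # next class in the MRO, not always MyBaseClass
        self.value *= 5


class PlusTwo(MyBaseClass):
    def __init__(self, value):
        super().__init__(value)
        self.value += 2


class GoodWay(TimesFive, PlusTwo):
    def __init__(self, value):
        super().__init__(value)


foo = GoodWay(5)
mro_names = [cls.__name__ for cls in GoodWay.mro()]
```

Here foo.value ends up as 5 * (5 + 2) = 35, and mro_names lists GoodWay, TimesFive, PlusTwo, MyBaseClass, object in that order, so MyBaseClass.__init__ runs exactly once.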

### Item 26: Use Multiple Inheritance Only for Mix-in Utility Classes

Python is an object-oriented language with built-in facilities for making multiple inheritance tractable (see Item 25: “Initialize Parent Classes with super”). However, it’s better to avoid multiple inheritance altogether.

If you find yourself desiring the convenience and encapsulation that comes with multiple inheritance, consider writing a mix-in instead. A mix-in is a small class that only defines a set of additional methods that a class should provide. Mix-in classes don’t define their own instance attributes nor require their __init__ constructor to be called.

Writing mix-ins is easy because Python makes it trivial to inspect the current state of any object regardless of its type. Dynamic inspection lets you write generic functionality a single time, in a mix-in, that can be applied to many other classes. Mix-ins can be composed and layered to minimize repetitive code and maximize reuse.
For example, say you want the ability to convert a Python object from its in-memory representation to a dictionary that’s ready for serialization. Why not write this functionality generically so you can use it with all of your classes?

Here, I define an example mix-in that accomplishes this with a new public method that’s added to any class that inherits from it:

```
class ToDictMixin(object):
    def to_dict(self):
        return self._traverse_dict(self.__dict__)
```

The implementation details are straightforward and rely on dynamic attribute access using hasattr, dynamic type inspection with isinstance, and accessing the instance dictionary __dict__.

```
    def _traverse_dict(self, instance_dict):
        output = {}
        for key, value in instance_dict.items():
            output[key] = self._traverse(key, value)
        return output

    def _traverse(self, key, value):
        if isinstance(value, ToDictMixin):
            return value.to_dict()
        elif isinstance(value, dict):
            return self._traverse_dict(value)
        elif isinstance(value, list):
            return [self._traverse(key, i) for i in value]
        elif hasattr(value, '__dict__'):
            return self._traverse_dict(value.__dict__)
        else:
            return value
```

Here, I define an example class that uses the mix-in to make a dictionary representation of a binary tree:

```
class BinaryTree(ToDictMixin):
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right
```

Translating a large number of related Python objects into a dictionary becomes easy.

```
tree = BinaryTree(10,
    left=BinaryTree(7, right=BinaryTree(9)),
    right=BinaryTree(13, left=BinaryTree(11)))
print(tree.to_dict())
>>>
{'left': {'left': None,
          'right': {'left': None, 'right': None, 'value': 9},
          'value': 7},
 'right': {'left': {'left': None, 'right': None, 'value': 11},
           'right': None,
           'value': 13},
 'value': 10}
```

The best part about mix-ins is that you can make their generic functionality pluggable so behaviors can be overridden when required. For example, here I define a subclass of BinaryTree that holds a reference to its parent. This circular reference would cause the default implementation of ToDictMixin.to_dict to loop forever.

```
class BinaryTreeWithParent(BinaryTree):
    def __init__(self, value, left=None,
                 right=None, parent=None):
        super().__init__(value, left=left, right=right)
        self.parent = parent
```

The solution is to override the ToDictMixin._traverse method in the BinaryTreeWithParent class to only process values that matter, preventing cycles encountered by the mix-in. Here, I override the _traverse method to not traverse the parent and just insert its numerical value:

```
    def _traverse(self, key, value):
        if (isinstance(value, BinaryTreeWithParent) and
                key == 'parent'):
            return value.value  # Prevent cycles
        else:
            return super()._traverse(key, value)
```

Calling BinaryTreeWithParent.to_dict will work without issue because the circular referencing properties aren’t followed.

```
root = BinaryTreeWithParent(10)
root.left = BinaryTreeWithParent(7, parent=root)
root.left.right = BinaryTreeWithParent(9, parent=root.left)
print(root.to_dict())
>>>
{'left': {'left': None,
          'parent': 10,
          'right': {'left': None,
                    'parent': 7,
                    'right': None,
                    'value': 9},
          'value': 7},
 'parent': None,
 'right': None,
 'value': 10}
```

By defining BinaryTreeWithParent._traverse, I’ve also enabled any class that has an attribute of type BinaryTreeWithParent to automatically work with ToDictMixin.

```
class NamedSubTree(ToDictMixin):
    def __init__(self, name, tree_with_parent):
        self.name = name
        self.tree_with_parent = tree_with_parent

my_tree = NamedSubTree('foobar', root.left.right)
print(my_tree.to_dict())  # No infinite loop
>>>
{'name': 'foobar',
 'tree_with_parent': {'left': None,
                      'parent': 7,
                      'right': None,
                      'value': 9}}
```

Mix-ins can also be composed together. For example, say you want a mix-in that provides generic JSON serialization for any class. You can do this by assuming that a class provides a to_dict method (which may or may not be provided by the ToDictMixin class).

```
import json

class JsonMixin(object):
    @classmethod
    def from_json(cls, data):
        kwargs = json.loads(data)
        return cls(**kwargs)

    def to_json(self):
        return json.dumps(self.to_dict())
```

Note how the JsonMixin class defines both instance methods and class methods. Mix-ins let you add either kind of behavior. In this example, the only requirements of the JsonMixin are that the class has a to_dict method and its __init__ method takes keyword arguments (see Item 19: “Provide Optional Behavior with Keyword Arguments”).

This mix-in makes it simple to create hierarchies of utility classes that can be serialized to and from JSON with little boilerplate. For example, here I have a hierarchy of data classes representing parts of a datacenter topology:

```
class DatacenterRack(ToDictMixin, JsonMixin):
    def __init__(self, switch=None, machines=None):
        self.switch = Switch(**switch)
        self.machines = [
            Machine(**kwargs) for kwargs in machines]

class Switch(ToDictMixin, JsonMixin):
    #...

class Machine(ToDictMixin, JsonMixin):
    #...
```

Serializing these classes to and from JSON is simple. Here, I verify that the data is able to be sent round-trip through serializing and deserializing:

```
serialized = """{
    "switch": {"ports": 5, "speed": 1e9},
    "machines": [
        {"cores": 8, "ram": 32e9, "disk": 5e12},
        {"cores": 4, "ram": 16e9, "disk": 1e12},
        {"cores": 2, "ram": 4e9, "disk": 500e9}
    ]
}"""

deserialized = DatacenterRack.from_json(serialized)
roundtrip = deserialized.to_json()
assert json.loads(serialized) == json.loads(roundtrip)
```

When you use mix-ins like this, it’s also fine if the class already inherits from JsonMixin higher up in the object hierarchy. The resulting class will behave the same way.

Things to Remember
* Avoid using multiple inheritance if mix-in classes can achieve the same outcome.
* Use pluggable behaviors at the instance level to provide per-class customization when mix-in classes may require it.
* Compose mix-ins to create complex functionality from simple behaviors.


### Item 27: Prefer Public Attributes Over Private Ones

Things to Remember
* Private attributes aren’t rigorously enforced by the Python compiler.
* Plan from the beginning to allow subclasses to do more with your internal APIs and attributes instead of locking them out by default.
* Use documentation of protected fields to guide subclasses instead of trying to force access control with private attributes.
* Only consider using private attributes to avoid naming conflicts with subclasses that are out of your control.
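
The first and last points rest on name mangling: an attribute written with two leading underscores inside a class body is stored under a name prefixed with the class name. A small sketch (the class and attribute names are invented for illustration):

```python
class MyObject:
    def __init__(self):
        self.public_field = 5
        self.__private_field = 10   # stored as _MyObject__private_field

    def get_private_field(self):
        return self.__private_field


class MyChild(MyObject):
    def try_private(self):
        # Inside MyChild this name is mangled to _MyChild__private_field,
        # which does not exist, so the lookup fails.
        return self.__private_field


obj = MyChild()
try:
    obj.try_private()
except AttributeError:
    subclass_blocked = True
else:
    subclass_blocked = False

# The attribute is still reachable through its mangled name.
leaked = obj._MyObject__private_field
```

The mangling keeps a subclass from clashing with the parent’s name by accident, but it does not stop anyone who knows the mangled name, which is why it works as a naming-conflict tool and not as access control.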

### Item 28: Inherit from collections.abc for Custom Container Types

Much of programming in Python is defining classes that contain data and describing how such objects relate to each other. Every Python class is a container of some kind, encapsulating attributes and functionality together. Python also provides built-in container types for managing data: lists, tuples, sets, and dictionaries.

When you’re designing classes for simple use cases like sequences, it’s natural that you’d want to subclass Python’s built-in list type directly. For example, say you want to create your own custom list type that has additional methods for counting the frequency of its members.

Things to Remember

* Inherit directly from Python’s container types (like list or dict) for simple use cases.
* Beware of the large number of methods required to implement custom container types correctly.
* Have your custom container types inherit from the interfaces defined in collections.abc to ensure that your classes match required interfaces and behaviors.
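
A brief sketch of what the last point buys you, assuming collections.abc.Sequence as the interface. BadType and Countdown are hypothetical classes written for this illustration, and Countdown deliberately skips negative indexes and slices to stay short:

```python
from collections.abc import Sequence


class BadType(Sequence):
    """Declares itself a Sequence but implements none of the required methods."""


try:
    BadType()
except TypeError:
    incomplete_rejected = True
else:
    incomplete_rejected = False


class Countdown(Sequence):
    """Hypothetical read-only sequence: n, n-1, ..., 1."""

    def __init__(self, start):
        self._start = start

    def __len__(self):
        return self._start

    def __getitem__(self, index):
        if not 0 <= index < self._start:
            raise IndexError(index)
        return self._start - index


c = Countdown(3)
items = list(c)            # iteration comes from the Sequence mixin methods
position = c.index(1)      # so do index() and count()
```

The abstract base class refuses to instantiate a subclass that forgets __getitem__ or __len__, and once both exist it supplies the remaining methods such as index, count, and membership tests.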

# Metaclasses and Attributes

Metaclasses are often mentioned in lists of Python’s features, but few understand what they accomplish in practice. The name metaclass vaguely implies a concept above and beyond a class. Simply put, metaclasses let you intercept Python’s class statement and provide special behavior each time a class is defined.

Similarly mysterious and powerful are Python’s built-in features for dynamically customizing attribute accesses. Along with Python’s object-oriented constructs, these facilities provide wonderful tools to ease the transition from simple classes to complex ones.

However, with these powers come many pitfalls. Dynamic attributes enable you to override objects and cause unexpected side effects. Metaclasses can create extremely bizarre behaviors that are unapproachable to newcomers. It’s important that you follow the rule of least surprise and only use these mechanisms to implement well-understood idioms.

### Item 29: Use Plain Attributes Instead of Get and Set Methods

Programmers coming to Python from other languages may naturally try to implement explicit getter and setter methods in their classes.
```  
class OldResistor(object):
    def __init__(self, ohms):
        self._ohms = ohms

    def get_ohms(self):
        return self._ohms

    def set_ohms(self, ohms):
        self._ohms = ohms
```
Using these setters and getters is simple, but it’s not Pythonic.

In Python, however, you almost never need to implement explicit setter or getter methods. Instead, you should always start your implementations with simple public attributes.
```
class Resistor(object):
    def __init__(self, ohms):
        self.ohms = ohms
        self.voltage = 0
        self.current = 0

r1 = Resistor(50e3)
r1.ohms = 10e3
```
These make operations like incrementing in place natural and clear (for example, r1.ohms += 5e3).
Later, if you decide you need special behavior when an attribute is set, you can migrate to the @property decorator and its corresponding setter attribute. Here, I define a new subclass of Resistor that lets me vary the current by assigning the voltage property. Note that in order to work properly the name of both the setter and getter methods must match the intended property name.
```
class VoltageResistance(Resistor):
    def __init__(self, ohms):
        super().__init__(ohms)
        self._voltage = 0

    @property
    def voltage(self):
        return self._voltage

    @voltage.setter
    def voltage(self, voltage):
        self._voltage = voltage
        self.current = self._voltage / self.ohms
```
Now, assigning the voltage property will run the voltage setter method, updating the current property of the object to match.
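
A short usage sketch of that behavior. The two classes are repeated so the block runs on its own, and the sample values (1e3 ohms, 10 volts) are chosen only for illustration:

```python
class Resistor:
    def __init__(self, ohms):
        self.ohms = ohms
        self.voltage = 0
        self.current = 0


class VoltageResistance(Resistor):
    def __init__(self, ohms):
        super().__init__(ohms)
        self._voltage = 0

    @property
    def voltage(self):
        return self._voltage

    @voltage.setter
    def voltage(self, voltage):
        self._voltage = voltage
        self.current = self._voltage / self.ohms


r2 = VoltageResistance(1e3)
before = r2.current
r2.voltage = 10          # plain assignment runs the setter
after = r2.current
```

The caller writes an ordinary attribute assignment, yet current moves from 0 to 10 / 1000 = 0.01 amps as a side effect of the setter.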

The biggest shortcoming of @property is that the methods for an attribute can only be shared by subclasses. Unrelated classes can’t share the same implementation. However, Python also supports descriptors (see Item 31: “Use Descriptors for Reusable @property Methods”) that enable reusable property logic and many other use cases.

Finally, when you use @property methods to implement setters and getters, be sure that the behavior you implement is not surprising. For example, don’t set other attributes in getter property methods.

The best policy is to only modify related object state in @property.setter methods. Be sure to avoid any other side effects the caller may not expect beyond the object, such as importing modules dynamically, running slow helper functions, or making expensive database queries. Users of your class will expect its attributes to be like any other Python object: quick and easy. Use normal methods to do anything more complex or slow.

Things to Remember
* Define new class interfaces using simple public attributes, and avoid set and get methods.
* Use @property to define special behavior when attributes are accessed on your objects, if necessary.
* Follow the rule of least surprise and avoid weird side effects in your @property methods.
* Ensure that @property methods are fast; do slow or complex work using normal methods.



### Item 30: Consider @property Instead of Refactoring Attributes
The built-in @property decorator makes it easy for simple accesses of an instance’s attributes to act smarter (see Item 29: “Use Plain Attributes Instead of Get and Set Methods”). One advanced but common use of @property is transitioning what was once a simple numerical attribute into an on-the-fly calculation. This is extremely helpful because it lets you migrate all existing usage of a class to have new behaviors without rewriting any of the call sites. It also provides an important stopgap for improving your interfaces over time.

For example, say you want to implement a leaky bucket quota using plain Python objects. Here, the Bucket class represents how much quota remains and the duration for which the quota will be available:
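
The Bucket code itself does not appear in these notes. As a rough sketch of the idea only, assuming a simplified bucket with no time-period handling (PlainBucket, the setter logic, and the deduct helper are illustrative and not the original implementation), quota can change from a plain attribute into a property computed from max_quota and quota_consumed without touching existing call sites:

```python
class PlainBucket:
    """Original shape: quota is an ordinary attribute."""

    def __init__(self):
        self.quota = 0


class Bucket:
    """Later shape: quota is computed, but callers use it the same way."""

    def __init__(self):
        self.max_quota = 0
        self.quota_consumed = 0

    @property
    def quota(self):
        return self.max_quota - self.quota_consumed

    @quota.setter
    def quota(self, amount):
        if amount > self.quota:
            # Filling: raise the ceiling by the amount added.
            self.max_quota += amount - self.quota
        else:
            # Deducting: record how much was consumed.
            self.quota_consumed += self.quota - amount


def deduct(bucket, amount):
    """Call site written against the plain attribute; works with both classes."""
    if bucket.quota < amount:
        return False
    bucket.quota -= amount
    return True


bucket = Bucket()
bucket.quota = 100              # fill
ok = deduct(bucket, 30)         # consume through the unchanged call site
too_much = deduct(bucket, 500)  # refused: only 70 remain
```

The deduct function was written for PlainBucket and still works, while new code can read bucket.max_quota and bucket.quota_consumed to see how the remaining quota came about.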

The best part is that the code using Bucket.quota doesn’t have to change or know that the class has changed. New usage of Bucket can do the right thing and access max_quota and quota_consumed directly.

I especially like @property because it lets you make incremental progress toward a better data model over time. Reading the Bucket example above, you may have thought to yourself, “fill and deduct should have been implemented as instance methods in the first place.” Although you’re probably right (see Item 22: “Prefer Helper Classes Over Bookkeeping with Dictionaries and Tuples”), in practice there are many situations in which objects start with poorly defined interfaces or act as dumb data containers. This happens when code grows over time, scope increases, multiple authors contribute without anyone considering long-term hygiene, etc.

@property is a tool to help you address problems you’ll come across in real-world code. Don’t overuse it. When you find yourself repeatedly extending @property methods, it’s probably time to refactor your class instead of further paving over your code’s poor design.

Things to Remember
* Use @property to give existing instance attributes new functionality. 
* Make incremental progress toward better data models by using @property.
* Consider refactoring a class and all call sites when you find yourself using @property too heavily.

### Item 31: Use Descriptors for Reusable @property Methods

Things to Remember
* Reuse the behavior and validation of @property methods by defining your own descriptor classes.
* Use WeakKeyDictionary to ensure that your descriptor classes don’t cause memory leaks.
* Don’t get bogged down trying to understand exactly how __getattribute__ uses the descriptor protocol for getting and setting attributes.
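
As a minimal sketch of the first two points (the Grade and Exam names and the 0 to 100 range are assumptions made for this illustration), a descriptor class holds the validation once and keeps per-instance values in a WeakKeyDictionary so that it does not keep finished objects alive:

```python
from weakref import WeakKeyDictionary


class Grade:
    """Descriptor that validates a 0-100 score, stored per instance."""

    def __init__(self):
        # Weak keys: an entry disappears once its owner is garbage collected.
        self._values = WeakKeyDictionary()

    def __get__(self, instance, instance_type):
        if instance is None:
            return self          # accessed on the class itself
        return self._values.get(instance, 0)

    def __set__(self, instance, value):
        if not (0 <= value <= 100):
            raise ValueError('Grade must be between 0 and 100')
        self._values[instance] = value


class Exam:
    # One descriptor object per attribute, shared by all Exam instances.
    math_grade = Grade()
    writing_grade = Grade()


first = Exam()
second = Exam()
first.math_grade = 82
second.math_grade = 75

try:
    first.writing_grade = 101
except ValueError:
    rejected = True
else:
    rejected = False
```

Keying the stored values by instance is what keeps first and second from overwriting each other, since both share the same Grade object on the class.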

### Item 32: Use __getattr__, __getattribute__, and __setattr__ for Lazy Attributes

Python’s language hooks make it easy to write generic code for gluing systems together. For example, say you want to represent the rows of your database as Python objects. Your database has its schema set. Your code that uses objects corresponding to those rows must also know what your database looks like. However, in Python, the code that connects your Python objects to the database doesn’t need to know the schema of your rows; it can be generic.

How is that possible? Plain instance attributes, @property methods, and descriptors can’t do this because they all need to be defined in advance. Python makes this dynamic behavior possible with the \__getattr\__ special method. If your class defines \__getattr\__, that method is called every time an attribute can’t be found in an object’s instance dictionary.

In [121]:
class LazyDB(object):
       def __init__(self):
           self.exists = 5 
       def __getattr__(self, name):
           value = 'Value for %s' % name
           setattr(self, name, value)
           return value

# Here, I access the missing property foo. This causes Python to call the __getattr__ method above, 
# which mutates the instance dictionary __dict__:

In [124]:
data = LazyDB()
print('Before:', data.__dict__)
print('foo:   ', data.foo)
print('After: ', data.__dict__)
print('val2 ', data.val2)
print('After: ', data.__dict__)

Before: {'exists': 5}
foo:    Value for foo
After:  {'exists': 5, 'foo': 'Value for foo'}
val2  Value for val2
After:  {'exists': 5, 'foo': 'Value for foo', 'val2': 'Value for val2'}


In [127]:
class LoggingLazyDB(LazyDB):
       def __getattr__(self, name):
           print('Called __getattr__(%s)' % name)
           return super().__getattr__(name)
data = LoggingLazyDB()
print('exists:', data.exists)
print('foo:   ', data.foo)
print('foo:   ', data.foo)

exists: 5
Called __getattr__(foo)
foo:    Value for foo
foo:    Value for foo


The exists attribute is present in the instance dictionary, so __getattr__ is never called for it. The foo attribute is not in the instance dictionary initially, so __getattr__ is called the first time. But the call to __getattr__ for foo also does a setattr, which populates foo in the instance dictionary. This is why the second time I access foo there isn’t a call to __getattr__.

This behavior is especially helpful for use cases like lazily accessing schemaless data. __getattr__ runs once to do the hard work of loading a property; all subsequent accesses retrieve the existing result.

Say you also want transactions in this database system. The next time the user accesses a property, you want to know whether the corresponding row in the database is still valid and whether the transaction is still open. The __getattr__ hook won’t let you do this reliably because it will use the object’s instance dictionary as the fast path for existing attributes.

To enable this use case, Python has another language hook called __getattribute__. This special method is called every time an attribute is accessed on an object, even in cases where it does exist in the attribute dictionary. This enables you to do things like check global transaction state on every property access. Here, I define ValidatingDB to log each time __getattribute__ is called:

In [130]:
class ValidatingDB(object):
       def __init__(self):
           self.exists = 5
       def __getattribute__(self, name):
           print('Called __getattribute__(%s)' % name)
           try:
               return super().__getattribute__(name)
           except AttributeError:
               value = 'Value for %s' % name
               setattr(self, name, value)
               return value

In [129]:
data = ValidatingDB()
print('exists:', data.exists)
print('foo:   ', data.foo)
print('foo:   ', data.foo)

Called __getattribute__(exists)
exists: 5
Called __getattribute__(foo)
foo:    Value for foo
Called __getattribute__(foo)
foo:    Value for foo


Things to Remember
* Use __getattr__ and __setattr__ to lazily load and save attributes for an object.
* Understand that __getattr__ only gets called once when accessing a missing attribute, whereas __getattribute__ gets called every time an attribute is accessed.

* Avoid infinite recursion in __getattribute__ and __setattr__ by using methods from super() (i.e., the object class) to access instance attributes directly.
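
The notes above show __getattr__ and __getattribute__ but not __setattr__ or the recursion pitfall. The following is a rough sketch of both, using hypothetical `SavingDB` and `DictionaryDB` classes; the `saved` list is only a stand-in for writing back to a real database.

```python
class SavingDB(object):
    def __init__(self):
        # Bypass our own __setattr__ so the log itself is not recorded.
        super().__setattr__('saved', [])

    def __setattr__(self, name, value):
        # Stand-in for saving the value back to a database.
        self.saved.append((name, value))
        super().__setattr__(name, value)


class DictionaryDB(object):
    def __init__(self, data):
        self._data = data

    def __getattribute__(self, name):
        # Reading self._data here would call __getattribute__ again and
        # recurse forever; use the implementation from object instead.
        data_dict = super().__getattribute__('_data')
        return data_dict[name]


data = SavingDB()
data.foo = 5
data.foo = 7
print('Saved:', data.saved)
print('foo:  ', DictionaryDB({'foo': 3}).foo)
```

Unlike __getattr__, the __setattr__ hook runs on every assignment, which is why both writes to `foo` are recorded.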

### Item 33: Validate Subclasses with Metaclasses
One of the simplest applications of metaclasses is verifying that a class was defined correctly. When you’re building a complex class hierarchy, you may want to enforce style, require overriding methods, or have strict relationships between class attributes. Metaclasses enable these use cases by providing a reliable way to run your validation code each time a new subclass is defined.

Often a class’s validation code runs in the __init__ method, when an object of the class’s type is constructed (see Item 28: “Inherit from collections.abc for Custom Container Types” for an example). Using metaclasses for validation can raise errors much earlier.

Before I get into how to define a metaclass for validating subclasses, it’s important to understand the metaclass action for standard objects. A metaclass is defined by inheriting from type. In the default case, a metaclass receives the contents of associated class statements in its __new__ method. Here, you can modify the class information before the type is actually constructed:
```
class Meta(type):
    def __new__(meta, name, bases, class_dict):
        print((meta, name, bases, class_dict))
        return type.__new__(meta, name, bases, class_dict)

class MyClass(object, metaclass=Meta):
    stuff = 123

    def foo(self):
        pass
```
The metaclass has access to the name of the class, the parent classes it inherits from, and all of the class attributes that were defined in the class’s body.
```
>>>
(<class '__main__.Meta'>,
 'MyClass',
 (<class 'object'>,),
 {'__module__': '__main__',
  '__qualname__': 'MyClass',
  'foo': <function MyClass.foo at 0x102c7dd08>,
  'stuff': 123})
```
Python 2 has slightly different syntax and specifies a metaclass using the __metaclass__ class attribute. The Meta.__new__ interface is the same.
```
# Python 2
class Meta(type):
    def __new__(meta, name, bases, class_dict):
        # ...

class MyClassInPython2(object):
    __metaclass__ = Meta
    # ...
```
You can add functionality to the Meta.__new__ method in order to validate all of the parameters of a class before it’s defined. For example, say you want to represent any type of multisided polygon. You can do this by defining a special validating metaclass and using it in the base class of your polygon class hierarchy. Note that it’s important not to apply the same validation to the base class.
```
class ValidatePolygon(type):
    def __new__(meta, name, bases, class_dict):
        # Don't validate the abstract Polygon class
        if bases != (object,):
            if class_dict['sides'] < 3:
                raise ValueError('Polygons need 3+ sides')
        return type.__new__(meta, name, bases, class_dict)

class Polygon(object, metaclass=ValidatePolygon):
    sides = None  # Specified by subclasses

    @classmethod
    def interior_angles(cls):
        return (cls.sides - 2) * 180

class Triangle(Polygon):
    sides = 3
```
If you try to define a polygon with fewer than three sides, the validation will cause the class statement to fail immediately after the class statement body. This means your program will not even be able to start running when you define such a class.
```
print('Before class')
class Line(Polygon):
    print('Before sides')
    sides = 1
    print('After sides')
print('After class')

>>>
Before class
Before sides
After sides
Traceback ...
ValueError: Polygons need 3+ sides
```
Things to Remember
* Use metaclasses to ensure that subclasses are well formed at the time they are defined, before objects of their type are constructed.
* Metaclasses have slightly different syntax in Python 2 vs. Python 3.
* The __new__ method of metaclasses is run after the class statement’s entire body has been processed.


### Item 34: Register Class Existence with Metaclasses

Another common use of metaclasses is to automatically register types in your program. Registration is useful for doing reverse lookups, where you need to map a simple identifier back to a corresponding class.

For example, say you want to implement your own serialized representation of a Python object using JSON. You need a way to take an object and turn it into a JSON string. Here, I do this generically by defining a base class that records the constructor parameters and turns them into a JSON dictionary:


In [131]:
import json

class Serializable(object):
       def __init__(self, *args):
           self.args = args
       def serialize(self):
           return json.dumps({'args': self.args})

In [132]:
class Point2D(Serializable):
       def __init__(self, x, y):
           super().__init__(x, y)
           self.x = x
           self.y = y
       def __repr__(self):
           return 'Point2D(%d, %d)' % (self.x, self.y)

In [133]:
point = Point2D(5, 3)
print('Object:    ', point)
print('Serialized:', point.serialize())

Object:     Point2D(5, 3)
Serialized: {"args": [5, 3]}


Using metaclasses for class registration ensures that you’ll never miss a class as long as the inheritance tree is right. This works well for serialization, as I’ve shown, and also applies to database object-relationship mappings (ORMs), plug-in systems, and system hooks.

Things to Remember
* Class registration is a helpful pattern for building modular Python programs.
* Metaclasses let you run registration code automatically each time your base class is subclassed in a program.
* Using metaclasses for class registration avoids errors by ensuring that you never miss a registration call.
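
The registration pattern described above could look roughly like this. It is a sketch under assumptions: the names `registry`, `register_class`, `RegisteredSerializable`, `deserialize`, and `Vector3D` are illustrative and do not appear earlier in these notes. The metaclass records each class as soon as its class statement finishes, so no separate registration call is needed.

```python
import json

registry = {}  # Maps a class name back to the class object


def register_class(target_class):
    registry[target_class.__name__] = target_class


class Meta(type):
    def __new__(meta, name, bases, class_dict):
        cls = type.__new__(meta, name, bases, class_dict)
        register_class(cls)  # Runs once per class statement
        return cls


class RegisteredSerializable(object, metaclass=Meta):
    def __init__(self, *args):
        self.args = args

    def serialize(self):
        return json.dumps({
            'class': self.__class__.__name__,
            'args': self.args,
        })


def deserialize(data):
    # Reverse lookup: find the class by name and rebuild the object.
    params = json.loads(data)
    target_class = registry[params['class']]
    return target_class(*params['args'])


class Vector3D(RegisteredSerializable):
    def __init__(self, x, y, z):
        super().__init__(x, y, z)
        self.x, self.y, self.z = x, y, z


before = Vector3D(10, -7, 3)
after = deserialize(before.serialize())
print('Restored:', type(after).__name__, after.args)
```

Defining `Vector3D` is enough to make it deserializable; forgetting to register it is no longer possible.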

# Concurrency and Parallelism

Concurrency is when a computer does many different things seemingly at the same time. For example, on a computer with one CPU core, the operating system will rapidly change which program is running on the single processor. This interleaves execution of the programs, providing the illusion that the programs are running simultaneously.

Parallelism is actually doing many different things at the same time. Computers with multiple CPU cores can execute multiple programs simultaneously. Each CPU core runs the instructions of a separate program, allowing each program to make forward progress during the same instant.

Within a single program, concurrency is a tool that makes it easier for programmers to solve certain types of problems. Concurrent programs enable many distinct paths of execution to make forward progress in a way that seems to be both simultaneous and independent.

The key difference between parallelism and concurrency is speedup. When two distinct paths of execution in a program make forward progress in parallel, the time it takes to do the total work is cut in half; the speed of execution is faster by a factor of two. In contrast, concurrent programs may run thousands of separate paths of execution seemingly in parallel but provide no speedup for the total work.

Python makes it easy to write concurrent programs. Python can also be used to do parallel work through system calls, subprocesses, and C-extensions. But it can be very difficult to make concurrent Python code truly run in parallel. It’s important to understand how to best utilize Python in these subtly different situations.

### Item 36: Use subprocess to Manage Child Processes

Python has battle-hardened libraries for running and managing child processes. This makes Python a great language for gluing other tools together, such as command-line utilities. When existing shell scripts get complicated, as they often do over time, graduating them to a rewrite in Python is a natural choice for the sake of readability and maintainability.

Child processes started by Python are able to run in parallel, enabling you to use Python to consume all of the CPU cores of your machine and maximize the throughput of your programs. Although Python itself may be CPU bound (see Item 37: “Use Threads for Blocking I/O, Avoid for Parallelism”), it’s easy to use Python to drive and coordinate CPU-intensive workloads.

Python has had many ways to run subprocesses over the years, including popen, popen2, and os.exec*. With the Python of today, the best and simplest choice for managing child processes is to use the subprocess built-in module.

Running a child process with subprocess is simple. Here, the Popen constructor starts the process. The communicate method reads the child process’s output and waits for termination.



In [154]:
import subprocess
proc = subprocess.Popen(
    ['echo', 'Hello from the child!'], stdout=subprocess.PIPE)
out, err = proc.communicate()
print(out.decode('utf-8'))

Hello from the child!



Decoupling the child process from the parent means that the parent process is free to run many child processes in parallel. You can do this by starting all the child processes together upfront.

In [156]:
import time as t
def run_sleep(period):
       proc = subprocess.Popen(['sleep', str(period)])
       return proc
start = t.time()
procs = []
for _ in range(10):
    proc = run_sleep(0.1)
    procs.append(proc)
    
for proc in procs:
    proc.communicate()
end = t.time()
print('Finished in %.3f seconds' % (end - start))

Finished in 0.135 seconds


You can also pipe data from your Python program into a subprocess and retrieve its output. This allows you to utilize other programs to do work in parallel. For example, say you want to use the openssl command-line tool to encrypt some data. Starting the child process with command-line arguments and I/O pipes is easy.

In [157]:
import os

def run_openssl(data):
    env = os.environ.copy()
    env['password'] = b'\xe24U\n\xd0Ql3S\x11'
    proc = subprocess.Popen(
        ['openssl', 'enc', '-des3', '-pass', 'env:password'],
        env=env,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE)
    proc.stdin.write(data)
    proc.stdin.flush()  # Ensure the child gets input
    return proc

procs = []
for _ in range(3):
    data = os.urandom(10)
    proc = run_openssl(data)
    procs.append(proc)

for proc in procs:
    out, err = proc.communicate()
    print(out[-10:])

b'\xe6hw\xe60\xf8\xfa\xca\xf6q'
b'(\xf70\xe2 \xe4\x89\xc8$ '
b'9B\x0b\xa9\x9a\xb6-\xad\xf4"'


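The output of one child process can also feed the input of another, much like a UNIX pipe. Here, each openssl child’s stdout is connected to the stdin of a child running the md5 command-line tool:

In [168]: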
def run_md5(input_stdin):
    proc = subprocess.Popen(
        ['md5'],
        stdin=input_stdin,
        stdout=subprocess.PIPE)
    return proc
input_procs = []
hash_procs = []
for _ in range(3):
    data = os.urandom(10)
    proc = run_openssl(data)
    input_procs.append(proc)
    hash_proc = run_md5(proc.stdout)
    hash_procs.append(hash_proc)

for proc in input_procs:
    proc.communicate()
for proc in hash_procs:
    out, err = proc.communicate()
    print(out.strip()) 

b'2581a5453ea15c28c3b7aee7067083c7'
b'd9663f1017a9fc7b9294ef5fde8c1699'
b'f83e17279f8d747a5cc667888431c1f8'


If you’re worried about the child processes never finishing or somehow blocking on input or output pipes, then be sure to pass the timeout parameter to the communicate method. This will cause an exception to be raised if the child process hasn’t responded within a time period, giving you a chance to terminate the misbehaving child.

In [169]:
proc = run_sleep(10)
try:
    proc.communicate(timeout=0.1)
except subprocess.TimeoutExpired:
    proc.terminate()
    proc.wait()
print('Exit status', proc.poll())

Exit status -15


Things to Remember
* Use the subprocess module to run child processes and manage their input and output streams.
* Child processes run in parallel with the Python interpreter, enabling you to maximize your CPU usage.
* Use the timeout parameter with communicate to avoid deadlocks and hanging child processes.
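
The examples above rely on `echo`, `sleep`, `openssl`, and `md5` being installed. As a more portable sketch of the same ideas, the code below assumes the current interpreter (`sys.executable`) as the child program; `run_child` is a hypothetical helper, and the 0.5 second timeout is an arbitrary choice.

```python
import subprocess
import sys


def run_child(code):
    # Assumption: run a short Python snippet in a separate interpreter
    # so the sketch does not depend on any external command.
    return subprocess.Popen(
        [sys.executable, '-c', code],
        stdout=subprocess.PIPE)


# Start all children first so they run in parallel, then collect output.
procs = [run_child('print(%d * %d)' % (i, i)) for i in range(3)]
results = []
for proc in procs:
    out, _ = proc.communicate()
    results.append(int(out.decode('utf-8')))
print(results)

# A child that runs too long is terminated once the timeout expires.
slow = run_child('import time; time.sleep(10)')
try:
    slow.communicate(timeout=0.5)
except subprocess.TimeoutExpired:
    slow.terminate()
    slow.wait()
print('Exited:', slow.poll() is not None)
```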

### Item 37: Use Threads for Blocking I/O, Avoid for Parallelism

The standard implementation of Python is called CPython. CPython runs a Python program in two steps. First, it parses and compiles the source text into bytecode. Then, it runs the bytecode using a stack-based interpreter. The bytecode interpreter has state that must be maintained and coherent while the Python program executes. Python enforces coherence with a mechanism called the global interpreter lock (GIL).

Essentially, the GIL is a mutual-exclusion lock (mutex) that prevents CPython from being affected by preemptive multithreading, where one thread takes control of a program by interrupting another thread. Such an interruption could corrupt the interpreter state if it comes at an unexpected time. The GIL prevents these interruptions and ensures that every bytecode instruction works correctly with the CPython implementation and its C-extension modules.

The GIL has an important negative side effect. With programs written in languages like C++ or Java, having multiple threads of execution means your program could utilize multiple CPU cores at the same time. Although Python supports multiple threads of execution, the GIL causes only one of them to make forward progress at a time. This means that when you reach for threads to do parallel computation and speed up your Python programs, you will be sorely disappointed.

For example, say you want to do something computationally intensive with Python. I’ll use a naive number factorization algorithm as a proxy.

In [172]:
import time as t
def factorize(number):
    for i in range(1, number + 1):
        if number % i == 0:
            yield i
            
numbers = [2139079, 1214759, 1516637, 1852285]
start = t.time()
for number in numbers:
    list(factorize(number))
end = t.time()
print('Took %.3f seconds' % (end - start))

Took 1.277 seconds


Using multiple threads to do this computation would make sense in other languages because you could take advantage of all of the CPU cores of your computer. Let me try that in Python. Here, I define a Python thread for doing the same computation as before:

In [174]:
from threading import Thread

class FactorizeThread(Thread):
    def __init__(self, number):
        super().__init__()
        self.number = number
    def run(self):
        self.factors = list(factorize(self.number))

start = t.time()
threads = []
for number in numbers:
    thread = FactorizeThread(number)
    thread.start()
    threads.append(thread)

for thread in threads:
    thread.join()
end = t.time()
print('Took %.3f seconds' % (end - start))

Took 1.445 seconds


What’s surprising is that this takes even longer than running factorize in serial. With one thread per number, you may expect less than a 4× speedup in other languages due to the overhead of creating threads and coordinating with them. You may expect only a 2× speedup on the dual-core machine I used to run this code. But you would never expect the performance of these threads to be worse when you have multiple CPUs to utilize. This demonstrates the effect of the GIL on programs running in the standard CPython interpreter.

There are ways to get CPython to utilize multiple cores, but it doesn’t work with the standard Thread class (see Item 41: “Consider concurrent.futures for True Parallelism”) and it can require substantial effort. Knowing these limitations you may wonder, why does Python support threads at all? There are two good reasons.

First, multiple threads make it easy for your program to seem like it’s doing multiple things at the same time. Managing the juggling act of simultaneous tasks is difficult to implement yourself (see Item 40: “Consider Coroutines to Run Many Functions Concurrently” for an example). With threads, you can leave it to Python to run your functions seemingly in parallel. This works because CPython ensures a level of fairness between Python threads of execution, even though only one of them makes forward progress at a time due to the GIL.

The second reason Python supports threads is to deal with blocking I/O, which happens when Python does certain types of system calls. System calls are how your Python program asks your computer’s operating system to interact with the external environment on your behalf. Blocking I/O includes things like reading and writing files, interacting with networks, communicating with devices like displays, etc. Threads help you handle blocking I/O by insulating your program from the time it takes for the operating system to respond to your requests.

For example, say you want to send a signal to a remote-controlled helicopter through a serial port. I’ll use a slow system call (select) as a proxy for this activity. This function asks the operating system to block for 0.1 second and then return control to my program, similar to what would happen when using a synchronous serial port.

In [176]:
import select
def slow_systemcall():
    select.select([], [], [], 0.1)

start = t.time()
for _ in range(5):
    slow_systemcall()
end = t.time()
print('Took %.3f seconds' % (end - start))

Took 0.522 seconds


The problem is that while the slow_systemcall function is running, my program can’t make any other progress. My program’s main thread of execution is blocked on the select system call. This situation is awful in practice. You need to be able to compute your helicopter’s next move while you’re sending it a signal, otherwise it’ll crash. When you find yourself needing to do blocking I/O and computation simultaneously, it’s time to consider moving your system calls to threads.

Here, I run multiple invocations of the slow_systemcall function in separate threads. This would allow you to communicate with multiple serial ports (and helicopters) at the same time, while leaving the main thread to do whatever computation is required.

In [184]:
start = t.time()
threads = []
for _ in range(5):
    thread = Thread(target=slow_systemcall)
    thread.start()
    threads.append(thread)

def compute_helicopter_location(index): #...
    pass
for i in range(5):
    compute_helicopter_location(i)
for thread in threads:
    thread.join()
end = t.time()
print('Took %.3f seconds' % (end - start))

Took 0.104 seconds


The parallel time is 5× less than the serial time. This shows that the system calls will all run in parallel from multiple Python threads even though they’re limited by the GIL. The GIL prevents my Python code from running in parallel, but it has no negative effect on system calls. This works because Python threads release the GIL just before they make system calls and reacquire the GIL as soon as the system calls are done.

There are many other ways to deal with blocking I/O besides threads, such as the asyncio built-in module, and these alternatives have important benefits. But these options also require extra work in refactoring your code to fit a different model of execution (see Item 40: “Consider Coroutines to Run Many Functions Concurrently”). Using threads is the simplest way to do blocking I/O in parallel with minimal changes to your program.

Things to Remember
* Python threads can’t run bytecode in parallel on multiple CPU cores because of the global interpreter lock (GIL).
* Python threads are still useful despite the GIL because they provide an easy way to do multiple things at seemingly the same time.
* Use Python threads to make multiple system calls in parallel. This allows you to do blocking I/O at the same time as computation.

### Item 38: Use Lock to Prevent Data Races in Threads

After learning about the global interpreter lock (GIL) (see Item 37: “Use Threads for Blocking I/O, Avoid for Parallelism”), many new Python programmers assume they can forgo using mutual-exclusion locks (mutexes) in their code altogether. If the GIL is already preventing Python threads from running on multiple CPU cores in parallel, it must also act as a lock for a program’s data structures, right? Some testing on types like lists and dictionaries may even show that this assumption appears to hold.

But beware, this is truly not the case. The GIL will not protect you. Although only one Python thread runs at a time, a thread’s operations on data structures can be interrupted between any two bytecode instructions in the Python interpreter. This is dangerous if you access the same objects from multiple threads simultaneously. The invariants of your data structures could be violated at practically any time because of these interruptions, leaving your program in a corrupted state.

For example, say you want to write a program that counts many things in parallel, like sampling light levels from a whole network of sensors. If you want to determine the total number of light samples over time, you can aggregate them with a new class.



In [186]:
class Counter(object):
    def __init__(self):
        self.count = 0
    def increment(self, offset):
        self.count += offset

def worker(sensor_index, how_many, counter):
    for _ in range(how_many):
        # Read from the sensor #... 
        counter.increment(1)

def run_threads(func, how_many, counter):
    threads = []
    for i in range(5):
        args = (i, how_many, counter)
        thread = Thread(target=func, args=args)
        threads.append(thread)
        thread.start()
    for thread in threads:
        thread.join()

how_many = 10**5
counter = Counter()
run_threads(worker, how_many, counter)
print('Counter should be %d, found %d' %(5 * how_many, counter.count))

Counter should be 500000, found 359522


But this result is way off! What happened here? How could something so simple go so wrong, especially since only one Python interpreter thread can run at a time?

The Python interpreter enforces fairness between all of the threads that are executing to ensure they get a roughly equal amount of processing time. To do this, Python will suspend a thread as it’s running and will resume another thread in turn. The problem is that you don’t know exactly when Python will suspend your threads. A thread can even be paused seemingly halfway through what looks like an atomic operation. That’s what happened in this case.

The Counter object’s increment method looks simple. 
```
counter.count += offset
```
But the += operator used on an object attribute actually instructs Python to do three separate operations behind the scenes. The statement above is equivalent to this:

```
value = getattr(counter, 'count')
result = value + offset
setattr(counter, 'count', result)
```
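
To see why those three steps lose updates, here is a single-threaded sketch that replays one possible interleaving by hand. The "Thread A" and "Thread B" labels are only comments; no real threads are involved, and the ordering shown is an assumed schedule rather than one Python guarantees.

```python
class Counter(object):
    def __init__(self):
        self.count = 0


counter = Counter()

# Thread A reads the current value, then is suspended.
value_a = getattr(counter, 'count')

# Thread B runs its whole increment while A is paused.
value_b = getattr(counter, 'count')
result_b = value_b + 1
setattr(counter, 'count', result_b)

# Thread A resumes with its stale value and overwrites B's update.
result_a = value_a + 1
setattr(counter, 'count', result_a)

print('Two increments, count is', counter.count)
```

Both threads believe they added one, yet the counter only advances once. Spread across hundreds of thousands of increments, this kind of lost update accounts for the shortfall seen earlier.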

To prevent data races like these and other forms of data structure corruption, Python includes a robust set of tools in the threading built-in module. The simplest and most useful of them is the Lock class, a mutual-exclusion lock (mutex).

By using a lock, I can have the Counter class protect its current value against simultaneous access from multiple threads. Only one thread will be able to acquire the lock at a time. Here, I use a with statement to acquire and release the lock; this makes it easier to see which code is executing while the lock is held (see Item 43: “Consider contextlib and with Statements for Reusable try/finally Behavior” for details):

In [193]:
from threading import Lock
class LockingCounter(object):
    def __init__(self):
        self.lock = Lock()
        self.count = 0
    def increment(self, offset):
        with self.lock:
            self.count += offset


counter = LockingCounter()
run_threads(worker, how_many, counter)
print('Counter should be %d, found %d' %
      (5 * how_many, counter.count))

Counter should be 500000, found 500000


Things to Remember
* Even though Python has a global interpreter lock, you’re still responsible for protecting against data races between the threads in your programs.
* Your programs will corrupt their data structures if you allow multiple threads to modify the same objects without locks.
* The Lock class in the threading built-in module is Python’s standard mutual exclusion lock implementation.

### Item 39: Use Queue to Coordinate Work Between Threads

Python programs that do many things concurrently often need to coordinate their work.
One of the most useful arrangements for concurrent work is a pipeline of functions.

A pipeline works like an assembly line used in manufacturing. Pipelines have many phases in serial with a specific function for each phase. New pieces of work are constantly added to the beginning of the pipeline. Each function can operate concurrently on the piece of work in its phase. The work moves forward as each function completes until there are no phases remaining. This approach is especially good for work that includes blocking I/O or subprocesses—activities that can easily be parallelized using Python (see Item 37: “Use Threads for Blocking I/O, Avoid for Parallelism”).

For example, say you want to build a system that will take a constant stream of images from your digital camera, resize them, and then add them to a photo gallery online. Such a program could be split into three phases of a pipeline. New images are retrieved in the first phase. The downloaded images are passed through the resize function in the second phase. The resized images are consumed by the upload function in the final phase.

Imagine you had already written Python functions that execute the phases: download, resize, upload. How do you assemble a pipeline to do the work concurrently?

The first thing you need is a way to hand off work between the pipeline phases. This can be modeled as a thread-safe producer-consumer queue (see Item 38: “Use Lock to Prevent Data Races in Threads” to understand the importance of thread safety in Python; see Item 46: “Use Built-in Algorithms and Data Structures” for the deque class).


The producer, your digital camera, adds new images to the end of the list of pending items.
The consumer, the first phase of your processing pipeline, removes images from the front of the list of pending items.

Here, I represent each phase of the pipeline as a Python thread that takes work from one queue like this, runs a function on it, and puts the result on another queue. I also track how many times the worker has checked for new input and how much work it’s completed.


In [202]:
from collections import deque
class MyQueue(object):
    def __init__(self):
        self.items = deque()
        self.lock = Lock()

    def put(self, item):
        with self.lock:
            self.items.append(item)

    def get(self):
        with self.lock:
            return self.items.popleft()

In [205]:
from time import sleep

class Worker(Thread):
    def __init__(self, func, in_queue, out_queue):
        super().__init__()
        self.func = func
        self.in_queue = in_queue
        self.out_queue = out_queue
        self.polled_count = 0
        self.work_done = 0

    def run(self):
        while True:
            self.polled_count += 1
            try:
                item = self.in_queue.get()
            except IndexError:
                sleep(0.01)  # No work to do
            else:
                result = self.func(item)
                self.out_queue.put(result)
                self.work_done += 1

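def download(item):
    return item  # Placeholder for the real download step
def resize(item):
    return item  # Placeholder for the real resize step
def upload(item):
    return item  # Placeholder for the real upload step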

download_queue = MyQueue()
resize_queue = MyQueue()
upload_queue = MyQueue()
done_queue = MyQueue()
threads = [
    Worker(download, download_queue, resize_queue),
    Worker(resize, resize_queue, upload_queue),
    Worker(upload, upload_queue, done_queue),
]

When the worker functions vary in speeds, an earlier phase can prevent progress in later phases, backing up the pipeline. This causes later phases to starve and constantly check their input queues for new work in a tight loop. The outcome is that worker threads waste CPU time doing nothing useful (they’re constantly raising and catching IndexError exceptions).
But that’s just the beginning of what’s wrong with this implementation. There are three more problems that you should also avoid. First, determining that all of the input work is complete requires yet another busy wait on the done_queue. Second, in Worker the run method will execute forever in its busy loop. There’s no way to signal to a worker thread that it’s time to exit.

Third, and worst of all, a backup in the pipeline can cause the program to crash arbitrarily. If the first phase makes rapid progress but the second phase makes slow progress, then the queue connecting the first phase to the second phase will constantly increase in size. The second phase won’t be able to keep up. Given enough time and input data, the program

When the worker functions vary in speeds, an earlier phase can prevent progress in later phases, backing up the pipeline. This causes later phases to starve and constantly check their input queues for new work in a tight loop. The outcome is that worker threads waste CPU time doing nothing useful (they’re constantly raising and catching IndexError exceptions).
But that’s just the beginning of what’s wrong with this implementation. There are three more problems that you should also avoid. First, determining that all of the input work is complete requires yet another busy wait on the done_queue. Second, in Worker the run method will execute forever in its busy loop. There’s no way to signal to a worker thread that it’s time to exit.

Third, and worst of all, a backup in the pipeline can cause the program to crash arbitrarily. If the first phase makes rapid progress but the second phase makes slow progress, then the queue connecting the first phase to the second phase will constantly increase in size. The second phase won’t be able to keep up. Given enough time and input data, the program will eventually run out of memory and die.

The lesson here isn’t that pipelines are bad; it’s that it’s hard to build a good producer-consumer queue yourself.

#### Queue to the Rescue
The Queue class from the queue built-in module provides all of the functionality you need to solve these problems.

Queue eliminates the busy waiting in the worker by making the get method block until new data is available. For example, here I start a thread that waits for some input data on a queue:



In [207]:
from queue import Queue
from threading import Thread

queue = Queue()

def consumer():
    print('Consumer waiting')
    queue.get()                # Runs after put() below
    print('Consumer done')

thread = Thread(target=consumer)
thread.start()

print('Producer putting')
queue.put(object())            # Runs before get() above
thread.join()
print('Producer done')

Consumer waiting
Producer putting
Consumer done
Producer done


* Pipelines are a great way to organize sequences of work that run concurrently using multiple Python threads.
* Be aware of the many problems in building concurrent pipelines: busy waiting, stopping workers, and memory explosion.
* The Queue class has all of the facilities you need to build robust pipelines: blocking operations, buffer sizes, and joining.
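
The example above only shows blocking reads. As a minimal sketch of the other two facilities named in the last point (buffer sizes and joining), the following uses the `maxsize` argument together with `task_done` and `join`. The `None` sentinel used to stop the worker is an assumption made for this illustration, not something Queue provides.

```python
from queue import Queue
from threading import Thread

work_queue = Queue(maxsize=2)   # put() blocks once two items are waiting
results = []

def consumer():
    while True:
        item = work_queue.get()      # Blocks until an item is available
        if item is None:             # Hypothetical sentinel meaning "stop"
            work_queue.task_done()
            return
        results.append(item * 2)
        work_queue.task_done()       # Mark this item as fully processed

thread = Thread(target=consumer)
thread.start()

for i in range(5):
    work_queue.put(i)    # May block while the buffer is full
work_queue.put(None)

work_queue.join()        # Blocks until every item has been marked done
thread.join()
print(results)
```

Because the buffer is bounded, a fast producer cannot grow the queue without limit, and `join` replaces the busy wait on a separate done queue.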

### Item 40: Consider Coroutines to Run Many Functions Concurrently

Threads give Python programmers a way to run multiple functions seemingly at the same time (see Item 37: “Use Threads for Blocking I/O, Avoid for Parallelism”). But there are three big problems with threads:

* They require special tools to coordinate with each other safely (see Item 38: “Use Lock to Prevent Data Races in Threads” and Item 39: “Use Queue to Coordinate Work Between Threads”). This makes code that uses threads harder to reason about than procedural, single-threaded code. This complexity makes threaded code more difficult to extend and maintain over time.

* Threads require a lot of memory, about 8 MB per executing thread. On many computers, that amount of memory doesn’t matter for a dozen threads or so. But what if you want your program to run tens of thousands of functions “simultaneously”? These functions may correspond to user requests to a server, pixels on a screen, particles in a simulation, etc. Running a thread per unique activity just won’t work.

* Threads are costly to start. If you want to constantly be creating new concurrent functions and finishing them, the overhead of using threads becomes large and slows everything down.

Python can work around all these issues with coroutines. Coroutines let you have many seemingly simultaneous functions in your Python programs. They’re implemented as an extension to generators (see Item 16: “Consider Generators Instead of Returning Lists”). The cost of starting a generator coroutine is a function call. Once active, they each use less than 1 KB of memory until they’re exhausted.

Coroutines work by enabling the code consuming a generator to send a value back into the generator function after each yield expression. The generator function receives the value passed to the send function as the result of the corresponding yield expression.

In [209]:
def my_coroutine():
    while True:
        received = yield
        print('Received:', received)

it = my_coroutine()
next(it) # Prime the coroutine
it.send('First')
it.send('Second')


Received: First
Received: Second


The initial call to next is required to prepare the generator for receiving the first send by advancing it to the first yield expression. Together, yield and send provide generators with a standard way to vary their next yielded value in response to external input.

For example, say you want to implement a generator coroutine that yields the minimum value it’s been sent so far. Here, the bare yield prepares the coroutine with the initial minimum value sent in from the outside. Then the generator repeatedly yields the new minimum in exchange for the next value to consider.

In [221]:
def minimize():
    current = yield
    while True:
        value = yield current
        current = min(value, current)
        
it = minimize()
next(it)
print(it.send(10))
print(it.send(4))
print(it.send(22))
print(it.send(-1))

10
4
4
-1


Things to Remember
* Coroutines provide an efficient way to run tens of thousands of functions seemingly at the same time.
* Within a generator, the value of the yield expression will be whatever value was passed to the generator’s send method from the exterior code.
* Coroutines give you a powerful tool for separating the core logic of your program from its interaction with the surrounding environment.
* Python 2 doesn’t support yield from or returning values from generators.
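
As a rough illustration of the last point, here is a sketch of what `yield from` and generator return values allow in Python 3. The helper names `collect_min` and `min_of_two_batches` are hypothetical and exist only for this example.

```python
def collect_min(count):
    """Receive `count` values via send() and return the smallest one."""
    current = yield
    for _ in range(count - 1):
        value = yield
        current = min(current, value)
    return current      # Returning a value from a generator is Python 3 only

def min_of_two_batches():
    # yield from forwards send() calls to the inner generator and
    # evaluates to that generator's return value.
    first = yield from collect_min(2)
    second = yield from collect_min(2)
    yield (first, second)

it = min_of_two_batches()
next(it)                # Prime the outer coroutine
it.send(10)
it.send(4)              # First batch done; the second batch is now waiting
it.send(22)
result = it.send(-1)    # Second batch done; the pair is yielded
print(result)
```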

### Item 41: Consider concurrent.futures for True Parallelism

At some point in writing Python programs, you may hit the performance wall. Even after optimizing your code (see Item 58: “Profile Before Optimizing”), your program’s execution may still be too slow for your needs. On modern computers that have an increasing number of CPU cores, it’s reasonable to assume that one solution would be parallelism. What if you could split your code’s computation into independent pieces of work that run simultaneously across multiple CPU cores?
Unfortunately, Python’s global interpreter lock (GIL) prevents true parallelism in threads (see Item 37: “Use Threads for Blocking I/O, Avoid for Parallelism”), so that option is out. Another common suggestion is to rewrite your most performance-critical code as an extension module using the C language. C gets you closer to the bare metal and can run faster than Python, eliminating the need for parallelism. C-extensions can also start native threads that run in parallel and utilize multiple CPU cores. Python’s API for C-extensions is well documented and a good choice for an escape hatch.

But rewriting your code in C has a high cost. Code that is short and understandable in Python can become verbose and complicated in C. Such a port requires extensive testing to ensure that the functionality is equivalent to the original Python code and that no bugs have been introduced. Sometimes it’s worth it, which explains the large ecosystem of C-extension modules in the Python community that speed up things like text parsing, image compositing, and matrix math. There are even open source tools such as Cython (http://cython.org/) and Numba (http://numba.pydata.org/) that can ease the transition to C.

The problem is that moving one piece of your program to C isn’t sufficient most of the time. Optimized Python programs usually don’t have one major source of slowness, but rather, there are often many significant contributors. To get the benefits of C’s bare metal and threads, you’d need to port large parts of your program, drastically increasing testing needs and risk. There must be a better way to preserve your investment in Python to solve difficult computational problems.

The multiprocessing built-in module, easily accessed via the concurrent.futures built-in module, may be exactly what you need. It enables Python to utilize multiple CPU cores in parallel by running additional interpreters as child processes. These child processes are separate from the main interpreter, so their global interpreter locks are also separate. Each child can fully utilize one CPU core. Each child has a link to the main process where it receives instructions to do computation and returns results.

For example, say you want to do something computationally intensive with Python and utilize multiple CPU cores. I’ll use an implementation of finding the greatest common divisor of two numbers as a proxy for a more computationally intense algorithm, like simulating fluid dynamics with the Navier-Stokes equation.

In [222]:
def gcd(pair):
    a, b = pair
    low = min(a, b)
    for i in range(low, 0, -1):
        if a % i == 0 and b % i == 0:
            return i

In [225]:
gcd((10,3))

1

In [230]:
import time as t
numbers = [(1963309, 2265973), (2030677, 3814172),
           (1551645, 2229620), (2039045, 2020802)]
start = t.time()
results = list(map(gcd, numbers))
end = t.time()
print('Took %.3f seconds' % (end - start))

Took 1.402 seconds


Running this code on multiple Python threads will yield no speed improvement because the GIL prevents Python from using multiple CPU cores in parallel. Here, I do the same computation as above using the concurrent.futures module with its ThreadPoolExecutor class and two worker threads (to match the number of CPU cores on my computer):

In [233]:
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import ProcessPoolExecutor
start = t.time()
pool = ThreadPoolExecutor(max_workers=2)
results = list(pool.map(gcd, numbers))
end = t.time()
print('Took %.3f seconds' % (end - start))

Took 1.488 seconds


Now for the surprising part: By changing a single line of code, something magical happens. If I replace the ThreadPoolExecutor with the ProcessPoolExecutor from the concurrent.futures module, everything speeds up.

In [238]:
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import ProcessPoolExecutor
start = t.time()
pool = ProcessPoolExecutor(max_workers=2)
results = list(pool.map(gcd, numbers))
end = t.time()
print('Took %.3f seconds' % (end - start))

Took 0.881 seconds


Running on my dual-core machine, it’s significantly faster! How is this possible? Here’s what the ProcessPoolExecutor class actually does (via the low-level constructs provided by the multiprocessing module):
1. It takes each item from the numbers input data to map.
2. It serializes it into binary data using the pickle module (see Item 44: “Make pickle Reliable with copyreg”).
3. It copies the serialized data from the main interpreter process to a child interpreter process over a local socket.
4. Next, it deserializes the data back into Python objects using pickle in the child process.
5. It then imports the Python module containing the gcd function.
6. It runs the function on the input data in parallel with other child processes.
7. It serializes the result back into bytes.
8. It copies those bytes back through the socket.
9. It deserializes the bytes back into Python objects in the parent process.
10. Finally, it merges the results from multiple children into a single list to return.
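
The serialization steps in this list can be imitated in a single process to get a feel for the overhead. This is only a sketch of steps 2, 4, 7, and 9: it is not how ProcessPoolExecutor is implemented, the socket transfer is left out, and the small input pair is chosen just to keep the run short.

```python
import pickle

def gcd(pair):
    a, b = pair
    low = min(a, b)
    for i in range(low, 0, -1):
        if a % i == 0 and b % i == 0:
            return i

pair = (48, 18)

# "Parent" side: serialize the argument into bytes (step 2).
request_bytes = pickle.dumps(pair)

# "Child" side: deserialize, compute, serialize the result (steps 4, 6, 7).
received = pickle.loads(request_bytes)
response_bytes = pickle.dumps(gcd(received))

# "Parent" side again: deserialize the result (step 9).
result = pickle.loads(response_bytes)
print(result)
```

Every argument and every result makes this round trip, which is why the approach pays off mainly when little data moves and much computation happens.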

Although it looks simple to the programmer, the multiprocessing module and ProcessPoolExecutor class do a huge amount of work to make parallelism possible. In most other languages, the only touch point you need to coordinate two threads is a single lock or atomic operation. The overhead of using multiprocessing is high because of all of the serialization and deserialization that must happen between the parent and child processes.

This scheme is well suited to certain types of isolated, high-leverage tasks. By isolated, I mean functions that don’t need to share state with other parts of the program. By high-leverage, I mean situations in which only a small amount of data must be transferred between the parent and child processes to enable a large amount of computation. The greatest common divisor algorithm is one example of this, but many other mathematical algorithms work similarly.

If your computation doesn’t have these characteristics, then the overhead of multiprocessing may prevent it from speeding up your program through parallelization. When that happens, multiprocessing provides more advanced facilities for shared memory, cross-process locks, queues, and proxies. But all of these features are very complex. It’s hard enough to reason about such tools in the memory space of a single process shared between Python threads. Extending that complexity to other processes and involving sockets makes this much more difficult to understand.

I suggest avoiding all parts of multiprocessing and using these features via the simpler concurrent.futures module. You can start by using the ThreadPoolExecutor class to run isolated, high-leverage functions in threads. Later, you can move to the ProcessPoolExecutor to get a speedup. Finally, once you’ve completely exhausted the other options, you can consider using the multiprocessing module directly.

* Moving CPU bottlenecks to C-extension modules can be an effective way to improve performance while maximizing your investment in Python code. However, the cost of doing so is high and may introduce bugs.
* The multiprocessing module provides powerful tools that can parallelize certain types of Python computation with minimal effort.
* The power of multiprocessing is best accessed through the concurrent.futures built-in module and its simple ProcessPoolExecutor class.
* The advanced parts of the multiprocessing module should be avoided because they are so complex.

# Built-in Modules

Python takes a “batteries included” approach to the standard library. Many other languages ship with a small number of common packages and require you to look elsewhere for important functionality. Although Python also has an impressive repository of community-built modules, it strives to provide, in its default installation, the most important modules for common uses of the language.

The full set of standard modules is too large to cover in this book. But some of these built-in packages are so closely intertwined with idiomatic Python that they may as well be part of the language specification. These essential built-in modules are especially important when writing the intricate, error-prone parts of programs.

### Item 42: Define Function Decorators with functools.wraps
* Decorators are Python syntax for allowing one function to modify another function at runtime.
* Using decorators can cause strange behaviors in tools that do introspection, such as debuggers.
* Use the wraps decorator from the functools built-in module when you define your own decorators to avoid any issues.

In [243]:
def trace(func):
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        print('%s(%r, %r) -> %r' %
              (func.__name__, args, kwargs, result))
        return result
    return wrapper

@trace
def fibonacci(n):
    if n in (0, 1):
        return n
    return (fibonacci(n - 2) + fibonacci(n - 1))

In [244]:
fibonacci(3)

fibonacci((1,), {}) -> 1
fibonacci((0,), {}) -> 0
fibonacci((1,), {}) -> 1
fibonacci((2,), {}) -> 1
fibonacci((3,), {}) -> 2


2

In [245]:
help(fibonacci)

Help on function wrapper in module __main__:

wrapper(*args, **kwargs)



In [246]:
from functools import wraps
def trace(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        print('%s(%r, %r) -> %r' %
              (func.__name__, args, kwargs, result))
        return result
    return wrapper

@trace
def fibonacci(n):
    if n in (0, 1):
        return n
    return (fibonacci(n - 2) + fibonacci(n - 1))

In [247]:
fibonacci(3)

fibonacci((1,), {}) -> 1
fibonacci((0,), {}) -> 0
fibonacci((1,), {}) -> 1
fibonacci((2,), {}) -> 1
fibonacci((3,), {}) -> 2


2

In [248]:
help(fibonacci)

Help on function fibonacci in module __main__:

fibonacci(n)
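
To see which metadata `wraps` carries over, here is a small side-by-side sketch. The decorator names `plain` and `preserving` and the `add_*` functions are hypothetical and exist only for this comparison.

```python
from functools import wraps

def plain(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

def preserving(func):
    @wraps(func)    # Copies __name__, __doc__, etc. onto wrapper
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@plain
def add_plain(a, b):
    """Return a + b."""
    return a + b

@preserving
def add_wrapped(a, b):
    """Return a + b."""
    return a + b

print(add_plain.__name__, add_plain.__doc__)
print(add_wrapped.__name__, add_wrapped.__doc__)
# wraps also stores the original function on the wrapper.
print(add_wrapped.__wrapped__(1, 2))
```

The undecorated name and docstring are what tools such as `help` and debuggers read, which is why the first version reports `wrapper`.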



### Item 43: Consider contextlib and with Statements for Reusable try/finally Behavior

The with statement in Python is used to indicate when code is running in a special context. For example, mutual exclusion locks (see Item 38: “Use Lock to Prevent Data Races in Threads”) can be used in with statements to indicate that the indented code only runs while the lock is held.

```
from threading import Lock

lock = Lock()
with lock:
    print('Lock is held')
```

The example above is equivalent to this try/finally construction because the Lock class properly enables the with statement.

```
lock.acquire()
try:
    print('Lock is held')
finally:
    lock.release()
```

The with statement version of this is better because it eliminates the need to write the repetitive code of the try/finally construction. It’s easy to make your objects and functions capable of use in with statements by using the contextlib built-in module. 

This module contains the contextmanager decorator, which lets a simple function be used in with statements. This is much easier than defining a new class with the special methods __enter__ and __exit__ (the standard way).

For example, say you want a region of your code to have more debug logging sometimes. Here, I define a function that does logging at two severity levels:
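
The function itself is not shown in these notes. A minimal sketch of what such a function could look like follows; the name `my_function` and the message texts are assumptions for illustration.

```python
import logging

def my_function():
    # Hypothetical messages; only the two severity levels matter here.
    logging.debug('Some debug data')
    logging.error('Error log here')
    logging.debug('More debug data')

my_function()
```

With the default configuration, the root logger is set to the WARNING level, so only the error message is emitted.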

* The with statement allows you to reuse logic from try/finally blocks and reduce visual noise.
* The contextlib built-in module provides a contextmanager decorator that makes it easy to use your own functions in with statements.
* The value yielded by context managers is supplied to the as part of the with statement. It’s useful for letting your code directly access the cause of the special context.

In [251]:
import logging
from contextlib import contextmanager

@contextmanager
def log_level(level, name):
    logger = logging.getLogger(name)
    old_level = logger.getEffectiveLevel()
    logger.setLevel(level)
    try:
        yield logger
    finally:
        logger.setLevel(old_level)  # Restore the previous level on exit

with log_level(logging.DEBUG, 'my-log') as logger:
    logger.debug('This is my message!')
    logging.debug('This will not print')

The yield expression is the point at which the with block’s contents will execute. Any exceptions that happen in the with block will be re-raised by the yield expression for you to catch in the helper function (see Item 40: “Consider Coroutines to Run Many Functions Concurrently” for an explanation of how that works).

Now, I can call the same logging function again, but in the debug_logging context. This time, all of the debug messages are printed to the screen during the with block. The same function running outside the with block won’t print debug messages.
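
The debug_logging context manager mentioned here is not defined in these notes. The following is a sketch of how it might be written with contextmanager, assuming it temporarily raises the verbosity of the root logger; `my_function` is a hypothetical stand-in for the logging function described earlier.

```python
import logging
from contextlib import contextmanager

@contextmanager
def debug_logging(level):
    logger = logging.getLogger()          # Root logger
    old_level = logger.getEffectiveLevel()
    logger.setLevel(level)
    try:
        yield                             # The with block runs here
    finally:
        logger.setLevel(old_level)        # Always restore the old level

def my_function():
    # Hypothetical function that logs at two severity levels.
    logging.debug('Some debug data')
    logging.error('Error log here')
    logging.debug('More debug data')

with debug_logging(logging.DEBUG):
    print('Inside:')
    my_function()     # Debug messages are emitted here
print('After:')
my_function()         # Back to the previous level
```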

### Item 44: Make pickle Reliable with copyreg

The pickle built-in module can serialize Python objects into a stream of bytes and deserialize bytes back into objects. Pickled byte streams shouldn’t be used to communicate between untrusted parties. The purpose of pickle is to let you pass Python objects between programs that you control over binary channels.

The pickle module’s serialization format is unsafe by design. The serialized data contains what is essentially a program that describes how to reconstruct the original Python object. This means a malicious pickle payload could be used to compromise any part of the Python program that attempts to deserialize it.

In contrast, the json module is safe by design. Serialized JSON data contains a simple description of an object hierarchy. Deserializing JSON data does not expose a Python program to any additional risk. Formats like JSON should be used for communication between programs or people that don’t trust each other.

In [257]:
import pickle
class GameState(object):
    def __init__(self):
        self.level = 0
        self.lives = 4

state = GameState()
state.level += 1  # Player beat a level
state.lives -= 1  # Player had to try again

# When the user quits playing, the program can save the state of the game to a file so it 
# can be resumed at a later time. The pickle module makes it easy to do this. Here, 
# I dump the GameState object directly to a file:
state_path = '/tmp/game_state.bin'
with open(state_path, 'wb') as f:
    pickle.dump(state, f)

# Later, I can load the file and get back the GameState object as if it had never been
# serialized.

with open(state_path, 'rb') as f:
    state_after = pickle.load(f)
    print(state_after.__dict__)

{'level': 1, 'lives': 3}


The problem with this approach is what happens as the game’s features expand over time. Imagine you want the player to earn points towards a high score. To track the player’s points, you’d add a new field to the GameState class.

In [258]:
class GameState(object):
    def __init__(self):
        self.level = 0
        self.lives = 4
        self.points = 0

state = GameState()
serialized = pickle.dumps(state)
state_after = pickle.loads(serialized)
print(state_after.__dict__)
    
with open(state_path, 'rb') as f:
    state_after = pickle.load(f)
    print(state_after.__dict__)


{'level': 0, 'points': 0, 'lives': 4}
{'level': 1, 'lives': 3}


The points attribute is missing! This is especially confusing because the returned object is an instance of the new GameState class.

This behavior is a byproduct of the way the pickle module works. Its primary use case is making it easy to serialize objects. As soon as your use of pickle expands beyond trivial usage, the module’s functionality starts to break down in surprising ways.

Fixing these problems is straightforward using the copyreg built-in module. The copyreg module lets you register the functions responsible for serializing Python objects, allowing you to control the behavior of pickle and make it more reliable.


In [259]:
import copyreg
class GameState(object):
    def __init__(self, level=0, lives=4, points=0):
        self.level = level
        self.lives = lives
        self.points = points

def pickle_game_state(game_state):
    kwargs = game_state.__dict__
    return unpickle_game_state, (kwargs,)

def unpickle_game_state(kwargs):
    return GameState(**kwargs)

copyreg.pickle(GameState, pickle_game_state)


state = GameState()
state.points += 1000
serialized = pickle.dumps(state)
state_after = pickle.loads(serialized)
print(state_after.__dict__)

{'level': 0, 'points': 1000, 'lives': 4}



* The pickle built-in module is only useful for serializing and deserializing objects between trusted programs.
* The pickle module may break down when used for more than trivial use cases.
* Use the copyreg built-in module with pickle to add missing attribute values, allow versioning of classes, and provide stable import paths.
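
The versioning use mentioned in the last point is not demonstrated above. One way it could look is sketched below, assuming a second version of the class that drops the lives field and adds a hypothetical magic field; the `version` key is a convention chosen for this example, not something copyreg requires.

```python
import copyreg
import pickle

class GameState:
    def __init__(self, level=0, points=0, magic=5):
        self.level = level
        self.points = points
        self.magic = magic

def pickle_game_state(game_state):
    kwargs = dict(game_state.__dict__)   # Copy so the object is not modified
    kwargs['version'] = 2                # Tag the payload with its version
    return unpickle_game_state, (kwargs,)

def unpickle_game_state(kwargs):
    version = kwargs.pop('version', 1)   # Untagged payloads count as version 1
    if version == 1:
        kwargs.pop('lives', None)        # Field that version 2 no longer has
    return GameState(**kwargs)

copyreg.pickle(GameState, pickle_game_state)

# Round trip with the current version of the class.
state = GameState(level=2, points=1000)
restored = pickle.loads(pickle.dumps(state))
print(restored.__dict__)

# Arguments as an older, untagged payload would have supplied them.
old_style = unpickle_game_state({'level': 1, 'lives': 3, 'points': 0})
print(old_style.__dict__)
```

Because the unpickling function receives the version, each future change to the class only needs one more branch there.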

### Item 45: Use datetime Instead of time for Local Clocks

Coordinated Universal Time (UTC) is the standard, time-zone-independent representation of time. UTC works great for computers that represent time as seconds since the UNIX epoch. But UTC isn’t ideal for humans. Humans reference time relative to where they’re currently located. People say “noon” or “8 am” instead of “UTC 15:00 minus 7 hours.” If your program handles time, you’ll probably find yourself converting time between UTC and local clocks to make it easier for humans to understand.

Python provides two ways of accomplishing time zone conversions. The old way, using the time built-in module, is disastrously error prone. The new way, using the datetime built-in module, works great with some help from the community-built package named pytz.
You should be acquainted with both time and datetime to thoroughly understand why datetime is the best choice and time should be avoided.

The problem here is the platform-dependent nature of the time module. Its actual behavior is determined by how the underlying C functions work with the host operating system. This makes the functionality of the time module unreliable in Python. The time module fails to consistently work properly for multiple local times. Thus, you should avoid the time module for this purpose. If you must use time, only use it to convert between UTC and the host computer’s local time. For all other types of conversions, use the datetime module.

Unlike the time module, the datetime module has facilities for reliably converting from one local time to another local time. However, datetime only provides the machinery for time zone operations with its tzinfo class and related methods. What’s missing are the time zone definitions besides UTC.
Luckily, the Python community has addressed this gap with the pytz module that’s available for download from the Python Package Index (https://pypi.python.org/pypi/pytz/). pytz contains a full database of every time zone definition you might need.
To use pytz effectively, you should always convert local times to UTC first. Perform any datetime operations you need on the UTC values (such as offsetting). Then, convert to local times as a final step.
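
The UTC-first workflow can be sketched with the standard library alone. Since pytz is a third-party package, this example substitutes fixed-offset `timezone` objects for real time zone definitions; the `NYC` and `NEPAL` names and their offsets are assumptions for illustration and ignore daylight saving rules, which a full database such as pytz handles.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed offsets standing in for real time zone definitions.
NYC = timezone(timedelta(hours=-4), 'NYC-fixed')
NEPAL = timezone(timedelta(hours=5, minutes=45), 'Nepal-fixed')

# 1. Attach the local zone to the local time, then convert to UTC.
arrival_nyc = datetime(2014, 5, 1, 23, 33, 24, tzinfo=NYC)
utc_dt = arrival_nyc.astimezone(timezone.utc)

# 2. Do any arithmetic on the UTC value.
later_utc = utc_dt + timedelta(hours=2)

# 3. Convert to another local clock only as the final step.
nepal_dt = later_utc.astimezone(NEPAL)

print(utc_dt)
print(nepal_dt)
```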

* Avoid using the time module for translating between different time zones.
* Use the datetime built-in module along with the pytz module to reliably convert between times in different time zones.
* Always represent time in UTC and do conversions to local time as the final step before presentation.

### Item 46: Use Built-in Algorithms and Data Structures

When you’re implementing Python programs that handle a non-trivial amount of data, you’ll eventually see slowdowns caused by the algorithmic complexity of your code. This usually isn’t the result of Python’s speed as a language (see Item 41: “Consider concurrent.futures for True Parallelism” if it is). The issue, more likely, is that you aren’t using the best algorithms and data structures for your problem.

Luckily, the Python standard library has many of the algorithms and data structures you’ll need to use built in. Besides speed, using these common algorithms and data structures can make your life easier. Some of the most valuable tools you may want to use are tricky to implement correctly. Avoiding reimplementation of common functionality will save you time and headaches.

#### Double-ended Queue
The deque class from the collections module is a double-ended queue. It provides constant time operations for inserting or removing items from its beginning or end. This makes it ideal for first-in-first-out (FIFO) queues.

The list built-in type also contains an ordered sequence of items like a queue. You can insert or remove items from the end of a list in constant time. But inserting or removing items from the head of a list takes linear time, which is much slower than the constant time of a deque.

In [3]:
from collections import deque
fifo = deque()
fifo.append(1)      # Producer
fifo.append(11)
print(fifo.popleft())  # Consumer
print(fifo.popleft())

1
11


#### Ordered Dictionary
Before Python 3.7, standard dictionaries are unordered (from 3.7 on, dict preserves insertion order). That means a dict with the same keys and values can result in different orders of iteration. This behavior is a surprising byproduct of the way the dictionary’s fast hash table is implemented.

The OrderedDict class from the collections module is a special type of dictionary that keeps track of the order in which its keys were inserted. Iterating the keys of an OrderedDict has predictable behavior. This can vastly simplify testing and debugging by making all code deterministic.


In [6]:
from collections import OrderedDict
a = OrderedDict()
a['foo'] = 1
a['bar'] = 2
b = OrderedDict()
b['foo'] = 'red'
b['bar'] = 'blue'
for value1, value2 in zip(a.values(), b.values()):
    print(value1, value2)

1 red
2 blue


#### Default Dictionary
Dictionaries are useful for bookkeeping and tracking statistics. One problem with dictionaries is that you can’t assume any keys are already present. That makes it clumsy to do simple things like increment a counter stored in a dictionary.


In [9]:
stats = {}
key = 'my_counter'
if key not in stats:
    stats[key] = 0
stats[key] += 1

print(stats['my_counter'])

1


The defaultdict class from the collections module simplifies this by automatically storing a default value when a key doesn’t exist. All you have to do is provide a function that will return the default value each time a key is missing. In this example, the int built-in function returns 0 (see Item 23: “Accept Functions for Simple Interfaces Instead of Classes” for another example). Now, incrementing a counter is simple.

In [11]:
from collections import defaultdict
stats = defaultdict(int)
stats['my_counter'] += 1
print(stats['my_counter'])

1


#### Heap Queue

Heaps are useful data structures for maintaining a priority queue. The heapq module provides functions for creating heaps in standard list types with functions like heappush, heappop, and nsmallest.

Items of any priority can be inserted into the heap in any order.

In [18]:
from heapq import heappush, heappop, nsmallest
a = []
heappush(a, 5)
heappush(a, 3)
heappush(a, 7)
heappush(a, 4)

print(heappop(a), heappop(a))

3 4


The resulting list is easy to use outside of heapq. Accessing the 0 index of the heap will always return the smallest item.

In [20]:
assert a[0] == nsmallest(1, a)[0] == 5

In [22]:
heappush(a, 422)
heappush(a, 1)
print('Before:', a)
a.sort()
print('After: ', a)

Before: [1, 5, 422, 7]
After:  [1, 5, 7, 422]


**Each of these heapq operations takes logarithmic time in proportion to the length of the list. Doing the same work with a standard Python list would scale linearly.**

#### Bisection

Searching for an item in a list takes linear time proportional to its length when you call the index method.

In [23]:
x = list(range(10**6))
i = x.index(991234)

In [27]:
from bisect import bisect_left
i = bisect_left(x, 991234)
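
The second cell relies on `bisect_left` performing a binary search, which takes logarithmic time but requires the list to be sorted. Note that `bisect_left` returns an insertion point, not a confirmation that the item exists. A small sketch of a lookup that behaves like `list.index` on sorted data follows; the helper name `index_sorted` is hypothetical.

```python
from bisect import bisect_left

def index_sorted(sorted_items, target):
    """Binary-search lookup that raises ValueError like list.index."""
    i = bisect_left(sorted_items, target)
    # bisect_left only gives the position where target would be inserted,
    # so check that the item at that position is really the target.
    if i != len(sorted_items) and sorted_items[i] == target:
        return i
    raise ValueError('%r is not in list' % (target,))

x = list(range(10**6))
print(index_sorted(x, 991234) == x.index(991234))
```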

#### Iterator Tools
The itertools built-in module contains a large number of functions that are useful for organizing and interacting with iterators (see Item 16: “Consider Generators Instead of Returning Lists” and Item 17: “Be Defensive When Iterating Over Arguments” for background). Not all of these are available in Python 2, but they can easily be built using simple recipes documented in the module. See help(itertools) in an interactive Python session for more details.

The itertools functions fall into three main categories.

Linking iterators together:

* chain: Combines multiple iterators into a single sequential iterator.
* cycle: Repeats an iterator’s items forever.
* tee: Splits a single iterator into multiple parallel iterators.
* zip_longest: A variant of the zip built-in function that works well with iterators of different lengths.

Filtering items from an iterator:

* islice: Slices an iterator by numerical indexes without copying.
* takewhile: Returns items from an iterator while a predicate function returns True.
* dropwhile: Returns items from an iterator once the predicate function returns False for the first time.
* filterfalse: Returns all items from an iterator where a predicate function returns False. The opposite of the filter built-in function.

Combinations of items from iterators:

* product: Returns the Cartesian product of items from an iterator, which is a nice alternative to deeply nested list comprehensions.
* permutations: Returns ordered permutations of length N with items from an iterator.
* combinations: Returns the unordered combinations of length N with unrepeated items from an iterator.
There are even more functions and recipes available in the itertools module that I don’t mention here. Whenever you find yourself dealing with some tricky iteration code, it’s worth looking at the itertools documentation again to see whether there’s anything there for you to use.
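
As a quick tour, the sketch below exercises a few functions from each category on small, made-up inputs.

```python
import itertools

letters = ['a', 'b', 'c']

# Linking iterators together
chained = list(itertools.chain([1, 2], [3, 4]))
cycled = list(itertools.islice(itertools.cycle(letters), 5))

# Filtering items from an iterator
taken = list(itertools.takewhile(lambda n: n < 3, [1, 2, 3, 1]))
dropped = list(itertools.dropwhile(lambda n: n < 3, [1, 2, 3, 1]))

# Combinations of items from iterators
pairs = list(itertools.product([1, 2], ['x', 'y']))
combos = list(itertools.combinations(letters, 2))

for name, value in [('chain', chained), ('cycle', cycled),
                    ('takewhile', taken), ('dropwhile', dropped),
                    ('product', pairs), ('combinations', combos)]:
    print(name, value)
```

Wrapping `cycle` in `islice` is what keeps the infinite iterator from running forever.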

### Item 47: Use decimal When Precision Is Paramount

Python is an excellent language for writing code that interacts with numerical data. Python’s integer type can represent values of any practical size. Its double-precision floating point type complies with the IEEE 754 standard. The language also provides a standard complex number type for imaginary values. However, these aren’t enough for every situation.

For example, say you want to compute the amount to charge a customer for an international phone call. You know the time in minutes and seconds that the customer was on the phone (say, 3 minutes 42 seconds). You also have a set rate for the cost of calling Antarctica from the United States ($1.45/minute). What should the charge be?
With floating point math, the computed charge seems reasonable.

In [None]:
rate = 1.45
seconds = 3*60 + 42
cost = rate * seconds / 60
print(cost)

In [None]:
from decimal import Decimal, ROUND_UP
rate = Decimal('0.05')
seconds = Decimal('5')
cost = rate * seconds / Decimal('60')
print(cost)
rounded = cost.quantize(Decimal('0.01'), rounding=ROUND_UP)
print(rounded)
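To connect the two cells above, here is a sketch that runs the phone-call scenario (3 minutes 42 seconds at $1.45 per minute) through both float and Decimal. The variable names float_cost and exact_cost are my own; the exact digits printed for the float version depend on binary floating point representation, so I don't annotate them.

```python
from decimal import Decimal, ROUND_UP

seconds = 3 * 60 + 42

# Binary floating point cannot represent 1.45 exactly, so this
# result may differ slightly from the true value of 5.365.
float_cost = 1.45 * seconds / 60
print(float_cost)
print(round(float_cost, 2))

# Decimal values built from strings keep the arithmetic exact.
rate = Decimal('1.45')
exact_cost = rate * Decimal(seconds) / Decimal('60')
print(exact_cost)  # 5.365

# Round up to the nearest whole cent so the customer is charged in full.
rounded = exact_cost.quantize(Decimal('0.01'), rounding=ROUND_UP)
print(rounded)  # 5.37
```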

### Item 48: Know Where to Find Community-Built Modules

Python has a central repository of modules (https://pypi.python.org) for you to install and use in your programs. These modules are built and maintained by people like you: the Python community. When you find yourself facing an unfamiliar challenge, the Python Package Index (PyPI) is a great place to look for code that will get you closer to your goal.

To use the Package Index, you’ll need to use a command-line tool named pip. pip is installed by default in Python 3.4 and above (it’s also accessible with python -m pip). For earlier versions, you can find instructions for installing pip on the Python Packaging website (https://packaging.python.org).

Once installed, using pip to install a new module is simple. For example, here I run pip3 install pytz to install the pytz module that I used in another item in this chapter (see Item 45: “Use datetime Instead of time for Local Clocks”).

In the example above, I used the pip3 command-line to install the Python 3 version of the package. The pip command-line (without the 3) is also available for installing packages for Python 2. The majority of popular packages are now available for either version of Python (see Item 1: “Know Which Version of Python You’re Using”). pip can also be used with pyvenv to track sets of packages to install for your projects (see Item 53: “Use Virtual Environments for Isolated and Reproducible Dependencies”).

* The Python Package Index (PyPI) contains a wealth of common packages that are built and maintained by the Python community.
* pip is the command-line tool to use for installing packages from PyPI.
* pip is installed by default in Python 3.4 and above; you must install it yourself for older versions.
* The majority of PyPI modules are free and open source software.

# Collaboration

There are language features in Python to help you construct well-defined APIs with clear interface boundaries. The Python community has established best practices that maximize the maintainability of code over time. There are also standard tools that ship with Python that enable large teams to work together across disparate environments.

Collaborating with others on Python programs requires being deliberate about how you write your code. Even if you’re working on your own, chances are you’ll be using code written by someone else via the standard library or open source packages. It’s important to understand the mechanisms that make it easy to collaborate with other Python programmers.

### Item 49: Write Docstrings for Every Function, Class, and Module

Documentation in Python is extremely important because of the dynamic nature of the language. Python provides built-in support for attaching documentation to blocks of code. Unlike many other languages, the documentation from a program’s source code is directly accessible as the program runs.


The accessibility of documentation makes interactive development easier. You can inspect functions, classes, and modules to see their documentation by using the help built-in function. This makes the Python interactive interpreter (the Python “shell”) and tools like IPython Notebook (http://ipython.org) a joy to use while you’re developing algorithms, testing APIs, and writing code snippets.

A standard way of defining documentation makes it easy to build tools that convert the text into more appealing formats (like HTML). This has led to excellent documentation-generation tools for the Python community, such as Sphinx (http://sphinx-doc.org). It’s also enabled community-funded sites like Read the Docs (https://readthedocs.org) that provide free hosting of beautiful-looking documentation for open source Python projects.

Python’s first-class, accessible, and good-looking documentation encourages people to write more documentation. The members of the Python community have a strong belief in the importance of documentation. There’s an assumption that “good code” also means well-documented code. This means that you can expect most open source Python libraries to have decent documentation.
To participate in this excellent culture of documentation, you need to follow a few guidelines when you write docstrings. The full details are discussed online in PEP 257 (http://www.python.org/dev/peps/pep-0257/). There are a few best practices you should be sure to follow.

In [None]:
def palindrome(word):
    """Return True if the given word is a palindrome."""
    return word == word[::-1]
print(repr(palindrome.__doc__))

#### Documenting Modules

Each module should have a top-level docstring. This is a string literal that is the first statement in a source file. It should use three double quotes ("""). The goal of this docstring is to introduce the module and its contents.

The first line of the docstring should be a single sentence describing the module’s purpose. The paragraphs that follow should contain the details that all users of the module should know about its operation. The module docstring is also a jumping-off point where you can highlight important classes and functions found in the module.

* Write documentation for every module, class, and function using docstrings. Keep them up to date as your code changes.
* For modules: Introduce the contents of the module and any important classes or functions all users should know about.
* For classes: Document behavior, important attributes, and subclass behavior in the docstring following the class statement.
* For functions and methods: Document every argument, returned value, raised exception, and other behaviors in the docstring following the def statement.
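The guidelines for functions and classes can be sketched like this. The find_anagrams function and Counter class are hypothetical examples of my own, and the Args/Returns/Raises layout is one common convention rather than something PEP 257 mandates.

```python
import itertools


def find_anagrams(word, dictionary):
    """Find all anagrams for a word.

    Args:
        word: String of the target word.
        dictionary: Container with all strings that are
            known to be actual words.

    Returns:
        Sorted list of anagrams that were found. Empty if
        none were found.

    Raises:
        TypeError: If word is not a string.
    """
    if not isinstance(word, str):
        raise TypeError('word must be a str')
    candidates = {''.join(p) for p in itertools.permutations(word)}
    return sorted(c for c in candidates if c in dictionary and c != word)


class Counter(object):
    """Counts how many times increment has been called.

    Public attributes:
    - count: The number of calls so far (an integer).
    """

    def __init__(self):
        self.count = 0

    def increment(self):
        """Add one to count and return the new value."""
        self.count += 1
        return self.count


# The docstrings are available at runtime through __doc__ (and help).
print(find_anagrams.__doc__.splitlines()[0])  # Find all anagrams for a word.
print(find_anagrams('tea', {'eat', 'ate', 'tea', 'toe'}))  # ['ate', 'eat']
```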

### Item 50: Use Packages to Organize Modules and Provide Stable APIs

As the size of a program’s codebase grows, it’s natural for you to reorganize its structure. You split larger functions into smaller functions. You refactor data structures into helper classes (see Item 22: “Prefer Helper Classes Over Bookkeeping with Dictionaries and Tuples”). You separate functionality into various modules that depend on each other.

At some point, you’ll find yourself with so many modules that you need another layer in your program to make it understandable. For this purpose, Python provides packages. Packages are modules that contain other modules.

In most cases, packages are defined by putting an empty file named __init__.py into a directory. Once __init__.py is present, any other Python files in that directory will be available for import using a path relative to the directory. For example, imagine that you have the following directory structure in your program.
```
main.py
mypackage/__init__.py
mypackage/models.py
mypackage/utils.py
```
To import the utils module, you use the absolute module name that includes the package directory’s name.
```
# main.py
from mypackage import utils
```
This pattern continues when you have package directories present within other packages (like mypackage.foo.bar).


#### Stable APIs
The second use of packages in Python is to provide strict, stable APIs for external consumers.
When you’re writing an API for wider consumption, like an open source package (see Item 48: “Know Where to Find Community-Built Modules”), you’ll want to provide stable functionality that doesn’t change between releases. To ensure that happens, it’s important to hide your internal code organization from external users. This enables you to refactor and improve your package’s internal modules without breaking existing users.
Python can limit the surface area exposed to API consumers by using the __all__ special attribute of a module or package. The value of __all__ is a list of every name to export from the module as part of its public API. When consuming code does from foo import *, only the attributes in foo.__all__ will be imported from foo. If __all__ isn’t present in foo, then only public attributes, those without a leading underscore, are imported (see Item 27: “Prefer Public Attributes Over Private Ones”).
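To see the effect of __all__ without creating files on disk, here is a small sketch that builds a throwaway module in memory. The module name demo_api and its contents are hypothetical, and registering a module by hand in sys.modules is only a demonstration trick, not how you would ship a real package.

```python
import sys
import types

source = '''
__all__ = ['simulate_collision']

def _dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def simulate_collision(a, b):
    return _dot_product(a, b)

def helper():
    return 'public, but not listed in __all__'
'''

# Create the hypothetical module and make it importable by name.
demo_api = types.ModuleType('demo_api')
exec(source, demo_api.__dict__)
sys.modules['demo_api'] = demo_api


def star_import_names():
    """Return the names that 'from demo_api import *' brings in."""
    namespace = {}
    exec('from demo_api import *', namespace)
    return sorted(n for n in namespace if not n.startswith('__'))


with_all = star_import_names()

# Without __all__, every name lacking a leading underscore is exported.
del demo_api.__all__
without_all = star_import_names()

del sys.modules['demo_api']

print(with_all)     # ['simulate_collision']
print(without_all)  # ['helper', 'simulate_collision']
```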
For example, say you want to provide a package for calculating collisions between moving projectiles. Here, I define the models module of mypackage to contain the representation of projectiles:
```
# models.py
__all__ = ['Projectile']

class Projectile(object):
    def __init__(self, mass, velocity):
        self.mass = mass
        self.velocity = velocity
```

I also define a utils module in mypackage to perform operations on the Projectile instances, such as simulating collisions between them.
```
# utils.py
from . models import Projectile

__all__ = ['simulate_collision']

def _dot_product(a, b):
    # ...

def simulate_collision(a, b):
    # ...
```
Now, I’d like to provide all of the public parts of this API as a set of attributes that are available on the mypackage module. This will allow downstream consumers to always import directly from mypackage instead of importing from mypackage.models or mypackage.utils. This ensures that the API consumer’s code will continue to work even if the internal organization of mypackage changes (e.g., models.py is deleted).
To do this with Python packages, you need to modify the __init__.py file in the mypackage directory. This file actually becomes the contents of the mypackage module when it’s imported. Thus, you can specify an explicit API for mypackage by limiting what you import into __init__.py. Since all of my internal modules already specify __all__, I can expose the public interface of mypackage by simply importing everything from the internal modules and updating __all__ accordingly.
```
# __init__.py
__all__ = []
from . models import *
__all__ += models.__all__
from . utils import *
__all__ += utils.__all__
```


#### Beware of import *
Import statements like from x import y are clear because the source of y is explicitly the x package or module. Wildcard imports like from foo import * can also be useful, especially in interactive Python sessions. However, wildcards make code more difficult to understand.

from foo import * hides the source of names from new readers of the code. If a module has multiple import * statements, you’ll need to check all of the referenced modules to figure out where a name was defined.

Names from import * statements will overwrite any conflicting names within the containing module. This can lead to strange bugs caused by accidental interactions between your code and overlapping names from multiple import * statements.

The safest approach is to avoid import * in your code and explicitly import names with the from x import y style.

* Packages in Python are modules that contain other modules. Packages allow you to organize your code into separate, non-conflicting namespaces with unique absolute module names.
* Simple packages are defined by adding an __init__.py file to a directory that contains other source files. These files become the child modules of the directory’s package. Package directories may also contain other packages.
* You can provide an explicit API for a module by listing its publicly visible names in its __all__ special attribute.
* You can hide a package’s internal implementation by only importing public names in the package’s __init__.py file or by naming internal-only members with a leading underscore.
* When collaborating within a single team or on a single codebase, using __all__ for explicit APIs is probably unnecessary.

### Item 51: Define a Root Exception to Insulate Callers from APIs


* Defining root exceptions for your modules allows API consumers to insulate themselves from your API.
* Catching root exceptions can help you find bugs in code that consumes an API.
* Catching the Python Exception base class can help you find bugs in API implementations.
* Intermediate root exceptions let you add more specific types of exceptions in the future without breaking your API consumers.
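The points above can be sketched as follows. The names Error, InvalidDensityError, determine_weight, and safe_weight are hypothetical and stand in for the contents of your own module and its caller.

```python
import logging


class Error(Exception):
    """Root exception for everything this (hypothetical) module raises."""


class InvalidDensityError(Error):
    """The supplied density value was not positive."""


def determine_weight(volume, density):
    if density <= 0:
        raise InvalidDensityError('Density must be positive')
    return volume * density


def safe_weight(volume, density):
    try:
        return determine_weight(volume, density)
    except InvalidDensityError:
        # A failure the caller anticipated and knows how to handle.
        return 0
    except Error:
        # An API exception the caller forgot to handle: a bug in the
        # calling code.
        logging.exception('Bug in the calling code')
        raise
    except Exception:
        # Anything outside the root exception points at a bug in the
        # API implementation itself.
        logging.exception('Bug in the API code')
        raise


print(safe_weight(1, -1))  # 0
print(safe_weight(2, 3))   # 6
```

Ordering the except clauses from most specific to most general is what lets each layer catch only the failures the earlier layers did not expect.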

### Item 52: Know How to Break Circular Dependencies

Inevitably, while you’re collaborating with others, you’ll find a mutual interdependency between modules. It can even happen while you work by yourself on the various parts of a single program.


#### What happens when you import?

To understand why a circular import can fail, you need to know the details of Python’s import machinery. When a module is imported, here’s what Python actually does in depth-first order:

1. Searches for your module in locations from sys.path
2. Loads the code from the module and ensures that it compiles
3. Creates a corresponding empty module object
4. Inserts the module into sys.modules
5. Runs the code in the module object to define its contents
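The gap between steps 4 and 5 can be observed directly. In this sketch, a temporary module (with the made-up name half_built_demo) looks itself up in sys.modules while its own code is still running. The temp-directory setup is only scaffolding for the demonstration.

```python
import importlib
import os
import sys
import tempfile

source = '''
import sys

# Step 4 has already happened: this module is in sys.modules.
me = sys.modules[__name__]

# Step 5 is still in progress: late_value is not defined yet.
seen_early = hasattr(me, 'late_value')

late_value = 42
'''

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, 'half_built_demo.py'), 'w') as f:
        f.write(source)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        mod = importlib.import_module('half_built_demo')
    finally:
        sys.path.remove(tmp)
        sys.modules.pop('half_built_demo', None)

print(mod.seen_early)  # False
print(mod.late_value)  # 42
```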

```
# app.py
import dialog

class Prefs(object):
    def get(self, name):
        print("Prefs->get")

prefs = Prefs()
dialog.show()

# dialog.py
import app

class Dialog(object):
    def __init__(self, save_dir):
        self.save_dir = save_dir

save_dialog = Dialog(app.prefs.get('save_dir'))

def show():
    print("show")

# main.py
import app

Traceback (most recent call last):
  File "main.py", line 4, in <module>
    import app
  File "app.py", line 4, in <module>
    import dialog
  File "dialog.py", line 16, in <module>
    save_dialog = Dialog(app.prefs.get('save_dir'))
AttributeError: 'module' object has no attribute 'prefs'
```

The problem with a circular dependency is that the attributes of a module aren’t defined until the code for those attributes has executed (after step #5). But the module can be loaded with the import statement immediately after it’s inserted into sys.modules (after step #4).

In the example above, the app module imports dialog before defining anything. Then, the dialog module imports app. Since app still hasn’t finished running—it’s currently importing dialog—the app module is just an empty shell (from step #4). The AttributeError is raised (during step #5 for dialog) because the code that defines prefs hasn’t run yet (step #5 for app isn’t complete).

#### Solution 1 - Bottom of the Dependency Tree
The best solution to this problem is to refactor your code so that the prefs data structure is at the bottom of the dependency tree. Then, both app and dialog can import the same utility module and avoid any circular dependencies. But such a clear division isn’t always possible or could require too much refactoring to be worth the effort.

There are three other ways to break circular dependencies.

#### Solution 2 - Reordering Imports

The first approach is to change the order of imports. For example, if you import the dialog module toward the bottom of the app module, after its contents have run, the AttributeError goes away.
```
# app.py
class Prefs(object):
    # ...

prefs = Prefs()

import dialog  # Moved
dialog.show()
```
This works because, when the dialog module is loaded late, its recursive import of app will find that app.prefs has already been defined (step #5 is mostly done for app).

Although this avoids the AttributeError, it goes against the PEP 8 style guide (see Item 2: “Follow the PEP 8 Style Guide”). The style guide suggests that you always put imports at the top of your Python files. This makes your module’s dependencies clear to new readers of the code. It also ensures that any module you depend on is in scope and available to all the code in your module.

Having imports later in a file can be brittle and can cause small changes in the ordering of your code to break the module entirely. Thus, you should avoid import reordering to solve your circular dependency issues.

#### Solution 3 - Import, Configure, Run

A second solution to the circular imports problem is to have your modules minimize side effects at import time. You have your modules only define functions, classes, and constants.

You avoid actually running any functions at import time. Then, you have each module provide a configure function that you call once all other modules have finished importing. The purpose of configure is to prepare each module’s state by accessing the attributes of other modules.

You run configure after all modules have been imported (step #5 is complete), so all attributes must be defined.

Here, I redefine the dialog module to only access the prefs object when configure is called:
```
# dialog.py
import app

class Dialog(object):
    # ...

save_dialog = Dialog()

def show():
    # ...

def configure():
    save_dialog.save_dir = app.prefs.get('save_dir')
```
I also redefine the app module to not run any activities on import.
```
# app.py
import dialog

class Prefs(object):
    # ...

prefs = Prefs()

def configure():
    # ...
```
Finally, the main module has three distinct phases of execution: import everything, configure everything, and run the first activity.
```
# main.py
import app
import dialog

app.configure()
dialog.configure()

dialog.show()
```
This works well in many situations and enables patterns like dependency injection. But sometimes it can be difficult to structure your code so that an explicit configure step is possible. Having two distinct phases within a module can also make your code harder to read because it separates the definition of objects from their configuration.

#### Solution 4 - Dynamic Import

The third—and often simplest—solution to the circular imports problem is to use an import statement within a function or method. This is called a dynamic import because the module import happens while the program is running, not while the program is first starting up and initializing its modules.

Here, I redefine the dialog module to use a dynamic import. The dialog.show function imports the app module at runtime instead of the dialog module importing app at initialization time.

```
# dialog.py
class Dialog(object):
    # ...

save_dialog = Dialog()

def show():
    import app  # Dynamic import
    save_dialog.save_dir = app.prefs.get('save_dir')
```
The app module can now be the same as it was in the original example. It imports dialog at the top and calls dialog.show at the bottom.
```
# app.py
import dialog

class Prefs(object):
    # ...

prefs = Prefs()
dialog.show()
```

This approach has a similar effect to the import, configure, and run steps from before. The difference is that this requires no structural changes to the way the modules are defined and imported. You’re simply delaying the circular import until the moment you must access the other module. At that point, you can be pretty sure that all other modules have already been initialized (step #5 is complete for everything).

In general, it’s good to avoid dynamic imports like this. The cost of the import statement is not negligible and can be especially bad in tight loops. By delaying execution, dynamic imports also set you up for surprising failures at runtime, such as SyntaxError exceptions long after your program has started running (see Item 56: “Test Everything with unittest” for how to avoid that). However, these downsides are often better than the alternative of restructuring your entire program.
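To compare the failing circular import with the dynamic-import fix in one runnable script, here is a sketch that writes two tiny modules into a temporary directory and imports them. The module names demo_app and demo_dialog, the import_app helper, and the simplified Prefs.get are all stand-ins of my own for the app and dialog modules discussed above.

```python
import importlib
import os
import sys
import tempfile

APP_SOURCE = '''
import demo_dialog

class Prefs(object):
    def get(self, name):
        return 'value-of-' + name

prefs = Prefs()
demo_dialog.show()
'''

# Touches demo_app.prefs while demo_app is still half-initialized.
CIRCULAR_DIALOG = '''
import demo_app
save_dir = demo_app.prefs.get('save_dir')

def show():
    pass
'''

# Delays the import of demo_app until show() is actually called.
DYNAMIC_DIALOG = '''
save_dir = None

def show():
    global save_dir
    import demo_app  # Dynamic import
    save_dir = demo_app.prefs.get('save_dir')
'''


def import_app(dialog_source):
    """Write both modules to a temp directory and import demo_app."""
    files = {'demo_app.py': APP_SOURCE, 'demo_dialog.py': dialog_source}
    with tempfile.TemporaryDirectory() as tmp:
        for filename, text in files.items():
            with open(os.path.join(tmp, filename), 'w') as f:
                f.write(text)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            return importlib.import_module('demo_app')
        finally:
            # Clean up so each call starts from a fresh import state.
            sys.path.remove(tmp)
            sys.modules.pop('demo_app', None)
            sys.modules.pop('demo_dialog', None)


try:
    import_app(CIRCULAR_DIALOG)
    circular_failed = False
except AttributeError as e:
    circular_failed = True
    print('Circular import failed:', e)

app = import_app(DYNAMIC_DIALOG)
print(app.demo_dialog.save_dir)  # value-of-save_dir
```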



### Item 53: Use Virtual Environments for Isolated and Reproducible Dependencies

Building larger and more complex programs often leads you to rely on various packages from the Python community (see Item 48: “Know Where to Find Community-Built Modules”). You’ll find yourself running pip to install packages like pytz, numpy, and many others.

The problem is that, by default, pip installs new packages in a global location. That causes all Python programs on your system to be affected by these installed modules. In theory, this shouldn’t be an issue. If you install a package and never import it, how could it affect your programs?

The trouble comes from transitive dependencies: the packages that the packages you install depend on. For example, you can see what the Sphinx package depends on after installing it by asking pip.

New versions of a library can subtly change behaviors that API-consuming code relies on. Users on a system may upgrade one package to a new version but not others, which could break dependencies. There’s a constant risk of the ground moving beneath your feet.

These difficulties are magnified when you collaborate with other developers who do their work on separate computers. It’s reasonable to assume that the versions of Python and global packages they have installed on their machines will be slightly different than your own. This can cause frustrating situations where a codebase works perfectly on one programmer’s machine and is completely broken on another’s.

The solution to all of these problems is a tool called pyvenv, which provides virtual environments. Since Python 3.4, the pyvenv command-line tool is available by default along with the Python installation (it’s also accessible with python -m venv). Prior versions of Python require installing a separate package (with pip install virtualenv) and using a command-line tool called virtualenv.

pyvenv allows you to create isolated versions of the Python environment. Using pyvenv, you can have many different versions of the same package installed on the same system at the same time without conflicts. This lets you work on many different projects and use many different tools on the same computer.
pyvenv does this by installing explicit versions of packages and their dependencies into completely separate directory structures. This makes it possible to reproduce a Python environment that you know will work with your code. It’s a reliable way to avoid surprising breakages.
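The same machinery behind python -m venv is also exposed as the venv module in the standard library, so the separate directory structure can be inspected from Python. This is only a sketch: the project name myproject is made up, and I pass with_pip=False so that it runs quickly and without network access, which means pip is not installed into this particular environment.

```python
import os
import tempfile
import venv

with tempfile.TemporaryDirectory() as tmp:
    env_dir = os.path.join(tmp, 'myproject')  # Hypothetical project name

    # Create an isolated environment in its own directory tree.
    venv.create(env_dir, with_pip=False)

    # Every environment records its configuration in pyvenv.cfg.
    config_path = os.path.join(env_dir, 'pyvenv.cfg')
    created = os.path.exists(config_path)
    with open(config_path) as f:
        config_text = f.read()
    top_level = sorted(os.listdir(env_dir))

print(created)
print(top_level)
print(config_text)
```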


* Virtual environments allow you to use pip to install many different versions of the same package on the same machine without conflicts.
* Virtual environments are created with pyvenv, enabled with source bin/activate, and disabled with deactivate.
* You can dump all of the requirements of an environment with pip freeze. You can reproduce the environment by supplying the requirements.txt file to pip install -r.
* In versions of Python before 3.4, the pyvenv tool must be downloaded and installed separately. The command-line tool is called virtualenv instead of pyvenv.

### Item 54: Consider Module-Scoped Code to Configure Deployment Environments

A deployment environment is a configuration in which your program runs. Every program has at least one deployment environment, the production environment. The goal of writing a program in the first place is to put it to work in the production environment and achieve some kind of outcome.

Writing or modifying a program requires being able to run it on the computer you use for developing. The configuration of your development environment may be much different from your production environment. For example, you may be writing a program for supercomputers using a Linux workstation.

Tools like pyvenv (see Item 53: “Use Virtual Environments for Isolated and Reproducible Dependencies”) make it easy to ensure that all environments have the same Python packages installed. The trouble is that production environments often require many external assumptions that are hard to reproduce in development environments.

For example, say you want to run your program in a web server container and give it access to a database. This means that every time you want to modify your program’s code, you need to run a server container, the database must be set up properly, and your program needs the password for access. That’s a very high cost if all you’re trying to do is verify that a one-line change to your program works correctly.

The best way to work around these issues is to override parts of your program at startup time to provide different functionality depending on the deployment environment. For example, you could have two different __main__ files, one for production and one for development.

* Programs often need to run in multiple deployment environments that each have unique assumptions and configurations.
* You can tailor a module’s contents to different deployment environments by using normal Python statements in module scope.
* Module contents can be the product of any external condition, including host introspection through the sys and os modules.
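Here is a minimal sketch of module-scoped configuration. The APP_ENV environment variable, the two database classes, and the data directory paths are all assumptions of mine for illustration; a real program would substitute its own switches and implementations.

```python
import os
import sys


class TestingDatabase(object):
    """In-memory stand-in used during development."""
    kind = 'testing'


class RealDatabase(object):
    """Placeholder for the class that talks to the production database."""
    kind = 'production'


# APP_ENV is a hypothetical environment variable; anything other than
# 'production' is treated as a development environment here.
TESTING = os.environ.get('APP_ENV', 'development') != 'production'

# Plain if statements at module scope decide what the module contains.
if TESTING:
    Database = TestingDatabase
else:
    Database = RealDatabase

# Host introspection works the same way.
if sys.platform.startswith('win32'):
    DEFAULT_DATA_DIR = os.path.join(os.environ.get('APPDATA', ''), 'myapp')
else:
    DEFAULT_DATA_DIR = os.path.expanduser('~/.myapp')

print(Database.kind)
print(DEFAULT_DATA_DIR)
```

Code that imports this module only ever refers to Database, so it never needs to know which environment it is running in.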

### Item 55: Use repr Strings for Debugging Output

When debugging a Python program, the print function (or output via the logging built-in module) will get you surprisingly far. Python internals are often easy to access via plain attributes (see Item 27: “Prefer Public Attributes Over Private Ones”). All you need to do is print how the state of your program changes while it runs and see where it goes wrong.

The print function outputs a human-readable string version of whatever you supply it. For example, printing a basic string will print the contents of the string without the surrounding quote characters.
```
print('foo bar')
>>>
foo bar
```
This is equivalent to using the '%s' format string and the % operator.
```
print('%s' % 'foo bar')
>>>
foo bar
```
The problem is that the human-readable string for a value doesn’t make it clear what the actual type of the value is. For example, notice how in the default output of print you can’t distinguish between the types of the number 5 and the string '5'.
```
print(5)
print('5')
>>>
5
5
```
If you’re debugging a program with print, these type differences matter. What you almost always want while debugging is to see the repr version of an object. The repr built-in function returns the printable representation of an object, which should be its most clearly understandable string representation. For built-in types, the string returned by repr is a valid Python expression.
```
a = '\x07'
print(repr(a))
>>>
'\x07'
```
Passing the value from repr to the eval built-in function should result in the same Python object you started with (of course, in practice, you should only use eval with extreme caution).
```
b = eval(repr(a))
assert a == b
```
When you’re debugging with print, you should repr the value before printing to ensure that any difference in types is clear.
```
print(repr(5))
print(repr('5'))
>>>
5
'5'
```
This is equivalent to using the '%r' format string and the % operator.
```
print('%r' % 5)
print('%r' % '5')
>>>
5
'5'
```

Things to Remember
* Calling print on built-in Python types will produce the human-readable string version of a value, which hides type information.
* Calling repr on built-in Python types will produce the printable string version of a value. These repr strings could be passed to the eval built-in function to get back the original value.
* %s in format strings will produce human-readable strings like str. %r will produce printable strings like repr.
* You can define the __repr__ method to customize the printable representation of a class and provide more detailed debugging information.
* You can reach into any object’s __dict__ attribute to view its internals.
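The last two points can be sketched with a pair of small classes. OpaqueClass and BetterClass are illustrative names of my own, not part of any library.

```python
class OpaqueClass(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y


class BetterClass(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Mirror the constructor call so the output is easy to read
        # and could be passed to eval to recreate the object.
        return 'BetterClass(%r, %r)' % (self.x, self.y)


obj = BetterClass(1, '2')
print(obj)  # BetterClass(1, '2')

# When you don't control the class definition, __dict__ still shows
# the instance's attributes.
opaque = OpaqueClass(4, 5)
print(opaque.__dict__)  # {'x': 4, 'y': 5}
```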

### Item 56: Test Everything with unittest

* The only way to have confidence in a Python program is to write tests.
* The unittest built-in module provides most of the facilities you’ll need to write good tests.
* You can define tests by subclassing TestCase and defining one method per behavior you’d like to test. Test methods on TestCase classes must start with the word test.
* It’s important to write both unit tests (for isolated functionality) and integration tests (for modules that interact).
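As a minimal sketch of these points, here is a TestCase for a small helper. The to_str function and the test names are hypothetical, and I run the suite through an explicit TextTestRunner (rather than unittest.main) only so the snippet also works inside a notebook cell without exiting the interpreter.

```python
import unittest


def to_str(data):
    """Return data as a str, decoding bytes as UTF-8."""
    if isinstance(data, str):
        return data
    elif isinstance(data, bytes):
        return data.decode('utf-8')
    else:
        raise TypeError('Must supply str or bytes, found: %r' % data)


class ToStrTestCase(unittest.TestCase):
    # Each method whose name starts with 'test' checks one behavior.
    def test_bytes(self):
        self.assertEqual('hello', to_str(b'hello'))

    def test_str(self):
        self.assertEqual('hello', to_str('hello'))

    def test_bad_type(self):
        self.assertRaises(TypeError, to_str, object())


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ToStrTestCase)
result = unittest.TextTestRunner(verbosity=2).run(suite)
print(result.wasSuccessful())  # True
```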


### Item 57: Consider Interactive Debugging with pdb

Everyone encounters bugs in their code while developing programs. Using the print function can help you track down the source of many issues (see Item 55: “Use repr Strings for Debugging Output”). Writing tests for specific cases that cause trouble is another great way to isolate problems (see Item 56: “Test Everything with unittest”).

But these tools aren’t enough to find every root cause. When you need something more powerful, it’s time to try Python’s built-in interactive debugger. The debugger lets you inspect program state, print local variables, and step through a Python program one statement at a time.

In most other programming languages, you use a debugger by specifying what line of a source file you’d like to stop on, then execute the program. In contrast, with Python the easiest way to use the debugger is by modifying your program to directly initiate the debugger just before you think you’ll have an issue worth investigating. There is no difference between running a Python program under a debugger and running it normally.

To initiate the debugger, all you have to do is import the pdb built-in module and run its set_trace function. You’ll often see this done in a single line so programmers can comment it out with a single # character.
```
def complex_func(a, b, c):
    # ...
    import pdb; pdb.set_trace()
```
As soon as this statement runs, the program will pause its execution. The terminal that started your program will turn into an interactive Python shell.
```
-> import pdb; pdb.set_trace()
(Pdb)
```
At the (Pdb) prompt, you can type in the name of local variables to see their values printed out. You can see a list of all local variables by calling the locals built-in function. You can import modules, inspect global state, construct new objects, run the help built-in function, and even modify parts of the program—whatever you need to do to aid in your debugging. In addition, the debugger has three commands that make inspecting the running program easier.

* bt: Print the traceback of the current execution call stack. This lets you figure out where you are in your program and how you arrived at the pdb.set_trace trigger point.
* up: Move your scope up the function call stack to the caller of the current function. This allows you to inspect the local variables in higher levels of the call stack.
* down: Move your scope back down the function call stack one level.

Once you’re done inspecting the current state, you can use debugger commands to resume the program’s execution under precise control.

* step: Run the program until the next line of execution in the program, then return control back to the debugger. If the next line of execution includes calling a function, the debugger will stop in the function that was called.
* next: Run the program until the next line of execution in the current function, then return control back to the debugger. If the next line of execution includes calling a function, the debugger will not stop until the called function has returned.
* return: Run the program until the current function returns, then return control back to the debugger.
* continue: Continue running the program until the next breakpoint (or set_trace is called again).

Things to Remember
* You can initiate the Python interactive debugger at a point of interest directly in your program with the import pdb; pdb.set_trace() statements.
* The Python debugger prompt is a full Python shell that lets you inspect and modify the state of a running program.
* pdb shell commands let you precisely control program execution, allowing you to alternate between inspecting program state and progressing program execution.

In [None]:
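def gcd(a, b):
    # Try candidates from the smaller argument down to 1.
    for i in range(min(a, b), 0, -1):
        # Uncomment the next line to pause here in the debugger.
        # import pdb; pdb.set_trace()
        if a % i == 0 and b % i == 0:
            return i

print(gcd(3, 99))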

### Item 58: Profile Before Optimizing

* It’s important to profile Python programs before optimizing because the source of slowdowns is often obscure.
* Use the cProfile module instead of the profile module because it provides more accurate profiling information.
* The Profile object’s runcall method provides everything you need to profile a tree of function calls in isolation.
* The Stats object lets you select and print the subset of profiling information you need to see to understand your program’s performance.
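
As a rough illustration of the points above, the sketch below profiles a deliberately slow insertion sort with Profile.runcall and prints a filtered report through Stats. The functions insert_value, insertion_sort, and workload are hypothetical stand-ins chosen for this example; only the cProfile and pstats calls come from the standard library.

```python
import io
from cProfile import Profile
from pstats import Stats


def insert_value(array, value):
    # Linear scan for the insertion point: deliberately slow.
    for i, existing in enumerate(array):
        if existing > value:
            array.insert(i, value)
            return
    array.append(value)


def insertion_sort(data):
    result = []
    for value in data:
        insert_value(result, value)
    return result


def workload():
    # Deterministic input so that repeated runs are comparable.
    data = [(i * 7919) % 1000 for i in range(1000)]
    return insertion_sort(data)


# runcall profiles only the call tree rooted at workload.
profiler = Profile()
sorted_data = profiler.runcall(workload)

report = io.StringIO()
stats = Stats(profiler, stream=report)
stats.strip_dirs()
stats.sort_stats('cumulative')
stats.print_stats('insert')  # restrict the report to matching entries
print(report.getvalue())
```

Sorting by cumulative time tends to surface the function whose whole call tree is expensive, and the string passed to print_stats narrows the output to the entries of interest.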


### Item 59: Use tracemalloc to Understand Memory Usage and Leaks

Memory management in the default implementation of Python, CPython, uses reference counting. This ensures that as soon as all references to an object have expired, the referenced object is also cleared. CPython also has a built-in cycle detector to ensure that self-referencing objects are eventually garbage collected.
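
The difference between reference counting and cycle detection can be observed directly. This is a minimal sketch that assumes CPython and uses a hypothetical Node class; weak references are used only to check whether an object still exists, and automatic collection is switched off temporarily so the cycle detector runs only when asked.

```python
import gc
import weakref


class Node:
    """Hypothetical object used only to observe deallocation."""

    def __init__(self):
        self.other = None


gc.disable()  # keep the cycle detector from running on its own schedule
try:
    # Case 1: no cycle. Dropping the last reference frees the object at once.
    single = Node()
    single_ref = weakref.ref(single)
    del single
    freed_immediately = single_ref() is None

    # Case 2: two objects that refer to each other never reach a count of zero.
    a, b = Node(), Node()
    a.other, b.other = b, a
    cycle_ref = weakref.ref(a)
    del a, b
    alive_before_collect = cycle_ref() is not None

    gc.collect()  # run the cycle detector explicitly
    freed_after_collect = cycle_ref() is None
finally:
    gc.enable()

print(freed_immediately, alive_before_collect, freed_after_collect)
```

On CPython all three flags are expected to be True: the lone object disappears as soon as its name is deleted, while the self-referencing pair survives until the cycle detector runs.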

In theory, this means that most Python programmers don’t have to worry about allocating or deallocating memory in their programs. It’s taken care of automatically by the language and the CPython runtime. However, in practice, programs eventually do run out of memory due to held references. Figuring out where your Python programs are using or leaking memory proves to be a challenge.

The first way to debug memory usage is to ask the gc built-in module to list every object currently known by the garbage collector. Although it’s quite a blunt tool, this approach does let you quickly get a sense of where your program’s memory is being used.

Here, I run a program that wastes memory by keeping references. It prints out how many objects were created during execution and a small sample of allocated objects.

```
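import gc

# Count the objects the garbage collector tracks before the workload runs.
found_objects = gc.get_objects()
print('%d objects before' % len(found_objects))

# waste_memory is the example program that wastes memory by keeping references.
import waste_memory
x = waste_memory.run()

# Count again, then print a small sample of the tracked objects.
found_objects = gc.get_objects()
print('%d objects after' % len(found_objects))
for obj in found_objects[:3]:
    print(repr(obj)[:100])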
```

* It can be difficult to understand how Python programs use and leak memory.
* The gc module can help you understand which objects exist, but it has no information about how they were allocated.
* The tracemalloc built-in module provides powerful tools for understanding the source of memory usage.
* tracemalloc is only available in Python 3.4 and above.
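
To make the tracemalloc point concrete, here is a minimal sketch that takes a snapshot before and after a workload and compares them by source line. The hold_references function is a hypothetical stand-in for a program that keeps references alive; the snapshot and comparison calls are from the standard tracemalloc module.

```python
import tracemalloc


def hold_references(count):
    # Hypothetical stand-in for a leak: every buffer stays reachable.
    return [bytearray(1024) for _ in range(count)]


tracemalloc.start(10)  # keep up to 10 stack frames per allocation
before = tracemalloc.take_snapshot()

leaked = hold_references(1000)

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Group allocations by source line and rank them by growth between snapshots.
top_stats = after.compare_to(before, 'lineno')
for stat in top_stats[:3]:
    print(stat)
```

Each printed entry names the file and line responsible for the allocation along with the size difference, which is the information that gc.get_objects cannot provide.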