# Effective Python: 59 Specific Ways to Write Better Python

This IPython notebook works through the book *Effective Python: 59 Specific Ways to Write Better Python*, with the goal of learning more effective ways to write Python. Each item keeps in mind Chapter 1's main topic, *Pythonic Thinking*: the idioms and style that the Python community widely regards as natural for the language.

## Chapter 1: Pythonic Thinking

<a id="zen"></a>
The Zen of Python, by Tim Peters

Beautiful is better than ugly. <br>
Explicit is better than implicit. <br>
Simple is better than complex. <br>
Complex is better than complicated. <br>
Flat is better than nested. <br>
Sparse is better than dense. <br>
Readability counts. <br>
Special cases aren't special enough to break the rules. <br>
Although practicality beats purity. <br>
Errors should never pass silently. <br>
Unless explicitly silenced. <br>
In the face of ambiguity, refuse the temptation to guess. <br>
There should be one-- and preferably only one --obvious way to do it. <br>
Although that way may not be obvious at first unless you're Dutch.<br>
Now is better than never.<br>
Although never is often better than *right* now.<br>
If the implementation is hard to explain, it's a bad idea.<br>
If the implementation is easy to explain, it may be a good idea.<br>
Namespaces are one honking great idea -- let's do more of those!<br>

### Item 1: Know which version of Python you're using

Many packages and dependencies changed as Python made the transition from Python 2 to Python 3. On many systems, including the macOS machine used for this notebook, typing **python (filename)** into the terminal runs Python 2, while **python3** runs Python 3. Which interpreter `python` points to depends on how your system is configured. Below are a few ways to check the version(s) of Python installed:

In [94]:
# Using bang (!) to run in-shell commands
!python --version 

Python 2.7.10


In [95]:
!python3 --version

Python 3.6.3


In [96]:
import sys
print(sys.version_info)
print(sys.version)

sys.version_info(major=3, minor=6, micro=3, releaselevel='final', serial=0)
3.6.3 (default, Oct  4 2017, 06:09:15) 
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.37)]
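Because `sys.version_info` compares like a tuple, a script can check the interpreter at runtime and fail early. Below is a minimal sketch. The minimum version `(3, 6)` and the function name `check_python_version` are hypothetical choices for illustration:

```python
import sys

# Hypothetical minimum version for illustration; adjust to your project
MIN_VERSION = (3, 6)


def check_python_version(minimum=MIN_VERSION):
    """Raise RuntimeError if the running interpreter is older than `minimum`."""
    # sys.version_info compares element by element, like a tuple
    if sys.version_info < minimum:
        found = '%d.%d' % sys.version_info[:2]
        wanted = '.'.join(str(part) for part in minimum)
        raise RuntimeError('Python %s+ required, found %s' % (wanted, found))
    return True
```

Calling `check_python_version()` near the top of a script makes a wrong interpreter fail with a clear message. Without it, the script may hit a confusing syntax or import error later.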


### Item 2: Follow the PEP 8 style guide

PEP 8 (Python Enhancement Proposal #8) is the style guide for formatting Python code. The full guide is worth reading: http://www.python.org/dev/peps/pep-0008/ <br>
Below are the primary rules worth following immediately (I've selected only the ones I consider clearly relevant): <br>

**Whitespace:** In Python, whitespace is syntactically significant. <br>
<ul>
    <li> Lines should be 79 characters or fewer in length
    <li> Continuations of long expressions onto additional lines should be indented by four extra spaces from their normal indentation level
    <li> In a file, functions and classes should be separated by two blank lines
    <li> In a class, methods should be separated by one blank line
    <li> Put one, and only one, space before and after the assignment operator
    <li> Don't put spaces around list indices, function calls, or keyword argument assignments
</ul>

**Naming:** PEP 8 prescribes a particular naming style for each kind of name in the language: <br>
<ul>
    <li> Functions, variables, and attributes follow `lowercase_underscore`
    <li> Protected instance attributes should follow `_leading_underscore`
    <li> Private instance attributes should follow `__double_leading_underscore`
    <li> Classes and exceptions should be in `CapitalizedWord` style (also called CapWords)
    <li> Module-level constants should follow `ALL_CAPS`
    <li> Instance methods within a class should use `self` as the name of the first parameter (referring to the object)
    <li> Class methods should use `cls` as the name of the first parameter (referring to the class)
</ul>

**Expressions and Statements:** Referring to [The Zen of Python](#zen), "There should be one-- and preferably only one --obvious way to do it." The PEP 8 guide eliminates some ambiguity in decision making, as follows: <br>
<ul>
    <li> Use inline negation (`if a is not b`) rather than negation of positive expressions (`if not a is b`)
    <li> Don't check for empty values (like `[]` or `''`) by checking the length (`if len(somelist) == 0`). Use `if not somelist` and rely on empty values evaluating to `False`
    <li> Similarly, for non-empty values (like `['hello_world']`), rely on `if somelist` evaluating to `True`
    <li> Avoid single-line `if` statements, `for` loops, and `while` loops. Spread them over multiple lines for clarity
    <li> `import` statements belong at the top of the file
    <li> Use absolute module names when importing, not names relative to the current module's path. For example, to import module `foo` from package `bar`, use `from bar import foo` rather than just `import foo`
    <li> Imports should be in sections in the following order: standard library modules, third-party modules, your own modules. Each subsection should have imports in alphabetical order.
</ul>
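The sketch below shows several of these conventions together in one small class. All names in it (`MAX_ITEMS`, `ShoppingCart`, `add_item`, and so on) are hypothetical:

```python
MAX_ITEMS = 10  # module-level constant: ALL_CAPS


class ShoppingCart:  # class name: CapitalizedWord
    def __init__(self):
        self._items = []  # protected attribute: _leading_underscore

    def add_item(self, item_name):  # method name: lowercase_underscore
        if len(self._items) >= MAX_ITEMS:
            raise ValueError('cart is full')
        self._items.append(item_name)

    def is_empty(self):
        # Rely on an empty list being falsy instead of checking len() == 0
        return not self._items

    @classmethod
    def from_names(cls, names):  # class method: first parameter is cls
        cart = cls()
        for name in names:
            cart.add_item(name)
        return cart
```

Note the two blank lines between the top-level constant and the class, and the single blank lines between methods.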

### Item 3: Differences between bytes, str, and unicode
There is a lot to this topic. Below is a summary of the key points:
<ul>
    <li> **Python3:** Sequences of characters can be `bytes` or `str`. Instances of `bytes` contain raw 8-bit values; instances of `str` contain Unicode characters
    <li> **Python2:** Sequences of characters can be `str` or `unicode`. Instances of `str` contain raw 8-bit values; instances of `unicode` contain Unicode characters
</ul>

Be careful about which kind of character sequence you are working with. In Python 3, `bytes` and `str` instances can't be combined with operators like `+`.
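In Python 3, a common approach is to decode `bytes` to `str` as early as possible and encode back to `bytes` only where data leaves the program. Below is a minimal sketch of two helper functions for this. It assumes UTF-8 is the encoding in use, and the names `to_str` and `to_bytes` are illustrative:

```python
def to_str(bytes_or_str):
    """Return a str, decoding bytes as UTF-8 if necessary."""
    if isinstance(bytes_or_str, bytes):
        return bytes_or_str.decode('utf-8')
    return bytes_or_str  # already a str


def to_bytes(bytes_or_str):
    """Return bytes, encoding a str as UTF-8 if necessary."""
    if isinstance(bytes_or_str, str):
        return bytes_or_str.encode('utf-8')
    return bytes_or_str  # already bytes
```

For example, `to_str(b'caf\xc3\xa9')` gives `'café'`. By contrast, mixing the two types directly, as in `b'one' + 'two'`, raises a `TypeError` in Python 3.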
    
  

### Item 5: Slice Sequences

Often we work with sequences, such as lists, strings, and bytes, and need to extract subsets of their data. Python's slicing syntax is designed to make that as easy as possible.


In [97]:
a = ['a','b','c','d','e','f','g','h']
print("First Four:", a[:4])
print("Last Four:", a[-4:])
print("Middle Two:", a[3:-3])

First Four: ['a', 'b', 'c', 'd']
Last Four: ['e', 'f', 'g', 'h']
Middle Two: ['d', 'e']


When slicing from the start of a list, leave out the zero index:

In [98]:
b = a[0:8] # NO
b = a[:8] # YES
b

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

Similarly, when slicing to the end of a list, leave out the final index:

In [99]:
b = a[5:len(a)] # NO
b = a[5:] # YES
b

['f', 'g', 'h']

Here are several ways to slice:

In [100]:
print(a[:])     # ['a','b','c','d','e','f','g','h']
print(a[:5])    # ['a','b','c','d','e'] 
print(a[:-1])   # ['a','b','c','d','e','f','g'] 
print(a[4:])    #                 ['e','f','g','h']
print(a[-3:])   #                     ['f','g','h']
print(a[2:5])   #         ['c','d','e']
print(a[2:-1])  #         ['c','d','e','f','g']
print(a[-3:-1]) #                     ['f','g']

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
['a', 'b', 'c', 'd', 'e']
['a', 'b', 'c', 'd', 'e', 'f', 'g']
['e', 'f', 'g', 'h']
['f', 'g', 'h']
['c', 'd', 'e']
['c', 'd', 'e', 'f', 'g']
['f', 'g']


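**Note:** Slicing tolerates `start` and `end` indices that fall outside the list's bounds and simply returns fewer items. For example, `a[:20]` returns the whole list. Only direct indexing past the end, such as `a[20]`, raises an `IndexError`.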
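The quick sketch below shows how slicing and indexing differ at a list's boundaries. The out-of-range bounds are arbitrary:

```python
a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

# Slices clamp out-of-range bounds instead of raising
first_twenty = a[:20]   # the whole list (only 8 items exist)
last_twenty = a[-20:]   # also the whole list
empty = a[20:]          # an empty list

# Direct indexing past the end does raise IndexError
try:
    a[20]
except IndexError as err:
    error_message = str(err)
```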

The result of slicing a list is a whole new list. References to the objects in the original list are maintained, but modifying the new list doesn't change the original:

In [101]:
b = a[4:]
print("Before: ", b)
b[1] = 99
print("After: ", b)
print("No change: ", a)

Before:  ['e', 'f', 'g', 'h']
After:  ['e', 99, 'g', 'h']
No change:  ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
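Because references are maintained, the copy made by slicing is shallow: the new list holds the same objects as the original. With immutable items like strings this is invisible. With mutable items such as nested lists, it matters, as this small sketch shows:

```python
# A slice copies the outer list, but the elements themselves are shared
matrix = [[1, 2], [3, 4]]
top = matrix[:1]             # a new list holding the same inner list object

top.append([5, 6])           # alters only the new outer list
top[0][0] = 99               # mutates the inner list shared with `matrix`

print(matrix)                # [[99, 2], [3, 4]]
print(top[0] is matrix[0])   # True
```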


### Item 6: Avoid using start, end, and stride in a single slice

Python also has special syntax for the stride of a slice, in the form `somelist[start:end:stride]`, which lets you take every nth item. The example below splits a list into the items at even and odd indices:

In [102]:
# Evens : 0, 2, 4, ...
# Odds  : 1, 3, 5, ...
a = ['red','orange','yellow','green','blue','purple']
evens = a[::2]
odds = a[1::2]
print(evens)
print(odds)

['red', 'yellow', 'blue']
['orange', 'green', 'purple']


The problem is that the `stride` syntax often causes unexpected behavior. For example, a common Python trick is to reverse a byte string by passing -1 as the `stride`:

In [103]:
x = b'mongoose'
y = x[::-1]
print(y)

b'esoognom'


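This works for byte strings and ASCII characters, but it breaks for Unicode characters encoded as UTF-8 byte strings. Strides other than -1 can also be confusing, especially when combined with `start` and `end` values. Consider these cases: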

In [104]:
a = ['a','b','c','d','e','f','g','h']
print(a[::2])
print(a[-2::2])
print(a[-2:2:-2])
print(a[2:2:-2])

['a', 'c', 'e', 'g']
['g']
['g', 'e']
[]
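The sketch below shows the UTF-8 problem concretely. It then shows the alternative of striding and slicing in two separate steps, which avoids combining start, end, and stride in a single slice. The sample values are illustrative:

```python
# Reversing a UTF-8 byte string reverses individual bytes,
# which scrambles characters that take more than one byte
word = 'café'
encoded = word.encode('utf-8')   # b'caf\xc3\xa9': 'é' takes two bytes
reversed_bytes = encoded[::-1]   # b'\xa9\xc3fac'

try:
    reversed_bytes.decode('utf-8')
except UnicodeDecodeError:
    decode_failed = True         # 0xa9 can't start a UTF-8 character
else:
    decode_failed = False

# Reversing the decoded str works, because it operates on characters
reversed_text = word[::-1]       # 'éfac'

# Stride first, then slice in a second, easier-to-read step
a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
evens = a[::2]                   # ['a', 'c', 'e', 'g']
middle = evens[1:-1]             # ['c', 'e']
```

Splitting the operation creates an extra shallow copy. If that is a concern, `itertools.islice` accepts start, stop, and step arguments, though it does not allow negative values.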


### Item 7: Use list comprehensions instead of *map* and *filter*

To derive one list from another, Python has a very useful syntax: the *list comprehension*. Consider the following:

In [105]:
a = [i for i in range(0,10)]

In [106]:
a

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [107]:
b = [elm**2 for elm in a]

In [108]:
b

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

*List comprehensions* are clearer than the `map` built-in function, even for simple cases, because `map` typically requires a `lambda` function for the computation. Compare:

In [109]:
squares = map(lambda x: x**2, a)

*List comprehensions* are also more useful because, unlike `map`, they make it easy to filter items from the input list:

In [110]:
even_squares = [x**2 for x in a if x%2 == 0]

In [111]:
even_squares

[0, 4, 16, 36, 64]

The `filter` built-in function can be combined with `map` to achieve the same result, but the code is noticeably harder to read:

In [112]:
alt = map(lambda x: x**2, filter(lambda x: x%2 == 0, a))
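In Python 3, `map` and `filter` return lazy iterators rather than lists. The `map` expressions above therefore produce values only when consumed, for example by wrapping them in `list()`. The short sketch below checks that both styles yield the same values:

```python
a = list(range(10))

# map returns a lazy iterator; list() materializes its results
squares = list(map(lambda x: x**2, a))

# The filter/map combination and the comprehension produce the same values
via_map_filter = list(map(lambda x: x**2, filter(lambda x: x % 2 == 0, a)))
via_comprehension = [x**2 for x in a if x % 2 == 0]
```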

Dictionaries and sets have their own equivalents: *dictionary comprehensions* and *set comprehensions*. They can be written as follows:

In [113]:
chile_ranks = {'ghost': 1, 'habanero': 2, 'cayenne': 3}
rank_dict = {rank: name for name, rank in chile_ranks.items()}
chile_len_set = {len(name) for name in rank_dict.values()}

chile_len_set

{5, 7, 8}

### Item 8: Avoid more than two expressions in list comprehensions

*List comprehensions* also support multiple levels of looping. Consider flattening a matrix (a list of lists) into a single list:

In [114]:
matrix = [[1,2,3],[4,5,6],[7,8,9]]
flat = [x for row in matrix for x in row]
print(flat)

[1, 2, 3, 4, 5, 6, 7, 8, 9]


Similarly, consider if one wanted to square every element in a matrix and preserve the matrix structure:

In [115]:
squared = [[x**2 for x in row] for row in matrix]

In [116]:
squared

[[1, 4, 9], [16, 25, 36], [49, 64, 81]]

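*List comprehensions* also support multiple conditions. Consecutive `if` clauses at the same loop level have an implicit `and`, so the two comprehensions below are equivalent: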

In [117]:
a = [i for i in range(1,11)]
b = [x for x in a if x > 4 if x%2 == 0]
c = [x for x in a if x > 4 and x%2 == 0]

In [118]:
print(b); print(c)

[6, 8, 10]
[6, 8, 10]
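The advice in this item's heading is about readability. With more than two expressions (loops or conditions), a comprehension becomes hard to follow, and ordinary loops are often clearer. The sketch below compares a three-level comprehension with equivalent loops. The nested data is illustrative:

```python
my_lists = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]

# Three loop expressions in one comprehension: compact but hard to read
flat = [x for sublist1 in my_lists
        for sublist2 in sublist1
        for x in sublist2]

# The same logic with explicit loops is longer but easier to follow
flat_loops = []
for sublist1 in my_lists:
    for sublist2 in sublist1:
        flat_loops.extend(sublist2)
```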


### Item 9: Consider *generator expressions* for large *list comprehensions*

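*List comprehensions* build a whole new list in memory, with up to one item per value in the input sequence, so they can consume a lot of memory for large inputs. For large data sets, use *generator expressions* instead. A *generator expression* evaluates to an iterator that yields one item at a time (loosely similar to an iterator in C++). Its syntax is the same as a list comprehension, but wrapped in parentheses `()`. Consider the following: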

In [119]:
itr = (len(x) for x in open('./tmp/my_file.txt'))

In [120]:
itr

<generator object <genexpr> at 0x10c988518>

In [121]:
type(itr)

generator

Use the built-in `next()` function to advance the *generator* and retrieve its next output:

In [122]:
print(next(itr))
print(next(itr))

7
7


*Generators* may also be composed. The following code uses the iterator returned by one generator expression as the input for another:

In [123]:
roots = ((x,x**0.5) for x in itr)

In [124]:
print(next(roots))

(7, 2.6457513110645907)


Each time the outer iterator is advanced, it also advances the inner iterator. This creates a domino effect of looping, evaluating conditional expressions, and passing around inputs and outputs. Chaining *generators* like this executes very quickly in Python, so for large inputs *generator expressions* are a good default. The main caveat is that the iterators they return are **stateful**: once exhausted, they cannot be reused.
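The self-contained sketch below uses a small in-memory list instead of the file above. It shows both composition and the one-shot nature of generators:

```python
values = [1, 4, 9]

# Compose two generator expressions: roots pulls items from gen lazily
gen = (x for x in values)
roots = ((x, x ** 0.5) for x in gen)

first_pass = list(roots)   # consumes both generators
second_pass = list(roots)  # already exhausted: nothing left to yield
```

To iterate over the values again, recreate the generator expressions or store the results in a list.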

### Item 10: Prefer *enumerate* over *range* where applicable

`range` is useful for loops that iterate over a set of integers. For example, the following builds a random 64-bit integer one bit at a time:

In [125]:
from random import randint

random_bits = 0
for i in range(64):
    if randint(0,1):
        random_bits |= 1 << i

When you have a data structure to iterate over, such as a list, Python lets you loop over it directly:

In [126]:
flavor_list = ['vanilla','chocolate','pecan','strawberry']
for flavor in flavor_list:
    print('%s is delicious' % flavor)

vanilla is delicious
chocolate is delicious
pecan is delicious
strawberry is delicious


Suppose you're interested in iterating over a list and outputting each value **and** its corresponding index. `range` could be used for this as follows:

In [127]:
for i in range(len(flavor_list)):
    flavor = flavor_list[i]
    print('%d: %s' % (i+1, flavor))

1: vanilla
2: chocolate
3: pecan
4: strawberry


Compare the above with the following chunk using *enumerate*,

In [128]:
for i, flavor in enumerate(flavor_list):
    print('%d: %s' % (i+1,flavor))

1: vanilla
2: chocolate
3: pecan
4: strawberry


`enumerate` takes an optional second argument that sets the number to start counting from, as shown below:

In [129]:
for i, flavor in enumerate(flavor_list, 1):
    print('%d: %s' % (i, flavor))

1: vanilla
2: chocolate
3: pecan
4: strawberry
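Under the hood, `enumerate` wraps an iterator and lazily yields `(index, item)` tuples. The generator function below is a minimal sketch of equivalent behavior, written only to illustrate how it works. `my_enumerate` is a hypothetical name and is not meant to replace the built-in:

```python
def my_enumerate(iterable, start=0):
    """Yield (count, item) pairs, like the built-in enumerate."""
    count = start
    for item in iterable:
        yield count, item
        count += 1


flavor_list = ['vanilla', 'chocolate', 'pecan', 'strawberry']
pairs = list(my_enumerate(flavor_list, 1))
builtin_pairs = list(enumerate(flavor_list, 1))
```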


### Item 11: Use *zip* to process iterators in parallel

Often you will have several lists of related objects. *List comprehensions* make it easy to derive one list from another by applying an expression:

In [130]:
names = ['Cecilia', 'Lise', 'Marie']
letters = [len(n) for n in names]

We now have two closely related lists: each entry in `letters` is the length of the name at the same position in `names`. Suppose you want to find the longest name and its number of letters. The following implementation using `range` will suffice:

In [131]:
longest_name = None
max_letters = 0
for i in range(len(names)):
    count = letters[i]
    if count > max_letters:
        longest_name = names[i]
        max_letters = count
print('The longest name is %s, and it contains %d letters' % (longest_name, max_letters))

The longest name is Cecilia, and it contains 7 letters


*Enumerate* makes this a little more visually appealing,

In [132]:
longest_name = None
max_letters = 0
for i, name in enumerate(names):
    count = letters[i]
    if count > max_letters:
        longest_name = name
        max_letters = count
print('The longest name is %s, and it contains %d letters' % (longest_name, max_letters))

The longest name is Cecilia, and it contains 7 letters


For these situations, Python offers the built-in *zip* function. In Python 3, *zip* wraps two or more iterators in a lazy iterator that yields tuples containing the next value from each input. This avoids indexing into the lists entirely and reads more cleanly:

In [133]:
longest_name = None
max_letters = 0
for name, count in zip(names, letters):
    if count > max_letters:
        longest_name = name
        max_letters = count

print('The longest name is %s, and it contains %d letters' % (longest_name, max_letters))

The longest name is Cecilia, and it contains 7 letters


**Warnings/Notes:** In Python 2, *zip* is not lazy. It fully exhausts the supplied iterators and returns a *list* of all the *tuples* it created, which can use a lot of memory for long inputs. If this is a concern in Python 2, consider using *izip* from the standard library `itertools` module. Secondly, *zip* silently truncates its output when the input iterators have different lengths:

In [134]:
names.append('Jordan')

In [135]:
for name, count in zip(names, letters):
    print(name)

Cecilia
Lise
Marie


Notice that Jordan is not included. This is because *zip* stops yielding tuples as soon as any one of its input iterators is exhausted. That is fine when you know the iterables have the same length, for example when one list was derived from the other with a *list comprehension*. If this is a concern, consider using *zip_longest* from the standard library `itertools` module (called *izip_longest* in Python 2).
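The sketch below contrasts the two. It uses the same data as the example above, with one count missing:

```python
from itertools import zip_longest

names = ['Cecilia', 'Lise', 'Marie', 'Jordan']
letters = [7, 4, 5]  # no entry for 'Jordan'

# zip stops at the shortest input; zip_longest pads with fillvalue
short_pairs = list(zip(names, letters))
long_pairs = list(zip_longest(names, letters, fillvalue=None))
```

Here `long_pairs` ends with `('Jordan', None)`. Code consuming it must then handle the padding value explicitly.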

### Item 13: Take advantage of each block in `try/except/else/finally`

There are four distinct times when you may want to take action during exception handling, and each of the `try/except/else/finally` blocks serves its own purpose.

**Finally Blocks:** <br>
Use `try/finally` when you want exceptions to propagate up, but you also want to run cleanup code even when exceptions occur. A common usage of `try/finally` is for file handling:

In [136]:
handle = open('./tmp/my_file.txt') # May raise IOError
try:
    data = handle.read()           # May raise UnicodeDecodeError
finally:
    handle.close()                 # Always runs after try, good because we should always close the file
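The sketch below confirms that `finally` runs even while an exception propagates. It uses a temporary file and a deliberately raised error; the function `read_then_fail` and its failure are hypothetical:

```python
import os
import tempfile


def read_then_fail(path, events):
    """Read a file, then raise; record what the finally block did."""
    handle = open(path)
    try:
        events.append(handle.read())
        raise ValueError('simulated failure')   # hypothetical error
    finally:
        handle.close()                          # runs even though we are raising
        events.append('closed' if handle.closed else 'open')


# Create a throwaway file to read from
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('hello')
    path = tmp.name

events = []
try:
    read_then_fail(path, events)
except ValueError:
    events.append('caught by caller')           # the exception still propagated
finally:
    os.remove(path)                             # clean up the temporary file
```

After running, `events` is `['hello', 'closed', 'caught by caller']`. The file was closed before the caller saw the exception.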

**Else Blocks:** <br>
Use `try/except/else` to be clear about which exceptions will be handled by the code, and which will propagate up. When `try` doesn't raise an exception, the `else` block runs. The point of the `else` block is to minimize the amount of code in the `try` block. For example, imagine you want to load JSON dictionary data from a string and return the value of a key it contains:

In [137]:
import json

def load_json_key(data, key):
    try:
        result_dict = json.loads(data) # May raise ValueError
    except ValueError as e:
        raise KeyError from e
    else:
        return result_dict[key]        # May raise KeyError

**Note:** This is what is meant by propagation. In the code above, if `json.loads()` raises a `ValueError` in the `try` block, the `except` block runs and re-raises it as a `KeyError`, chained to the original error via `from e`. If no `ValueError` is raised, control skips to the `else` block. The dictionary lookup there may raise a `KeyError`, but the `except` clause does *not* handle it: exceptions raised in the `else` block propagate directly up to the caller. Either way, the caller sees a `KeyError`.
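The sketch below exercises `load_json_key` with valid JSON, a missing key, and malformed JSON. The `describe` helper is hypothetical. It tells the two `KeyError` paths apart using the exception's `__cause__` attribute, which `raise ... from` sets:

```python
import json


def load_json_key(data, key):
    try:
        result_dict = json.loads(data)  # may raise ValueError
    except ValueError as e:
        raise KeyError from e           # runs only when the JSON is malformed
    else:
        return result_dict[key]         # a KeyError here goes straight to the caller


def describe(data, key):
    """Report which path through load_json_key was taken (illustrative helper)."""
    try:
        return load_json_key(data, key)
    except KeyError as err:
        # __cause__ is set only when the KeyError came from `raise ... from e`
        return 'bad JSON' if err.__cause__ is not None else 'missing key'
```

For example, `describe('{"a": 1}', 'b')` returns `'missing key'`, while `describe('not json', 'a')` returns `'bad JSON'`.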

**Try/Except/Else/Finally:** <br>
Use `try/except/else/finally` when you'd like to do it all in one. Suppose you'd like to read a description from a file, process it, and then update the file in place. Here, the `try` block is used to read the file and process it. The `except` block is used to handle exceptions from the `try` block that are anticipated. The `else` block is used to update the file in place and to allow related exceptions to propagate up. The `finally` block cleans up the file handle.

In [138]:
import json

UNDEFINED = object()

def divide_json(path):
    handle = open(path, 'r+')              # May raise IOError
    try:
        data = handle.read()               # May raise UnicodeDecodeError
        op = json.loads(data)              # May raise ValueError
        value = (
            op['numerator'] /
            op['denominator'])             # May raise ZeroDivisionError
    except ZeroDivisionError:
        return UNDEFINED
    else:
        op['result'] = value
        result = json.dumps(op)
        handle.seek(0)
        handle.write(result)               # May raise IOError
        return value
    finally:
        handle.close()                     # Always runs, even after return
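The sketch below runs `divide_json` against a temporary file. It repeats the function so the block is self-contained, and it adds a `handle.truncate()` call (not in the cell above) so stale bytes can't remain if the rewritten JSON is shorter. `run_on_temp_file` is a hypothetical helper:

```python
import json
import os
import tempfile

UNDEFINED = object()


def divide_json(path):
    handle = open(path, 'r+')
    try:
        op = json.loads(handle.read())
        value = op['numerator'] / op['denominator']
    except ZeroDivisionError:
        return UNDEFINED                  # file is left untouched
    else:
        op['result'] = value
        handle.seek(0)
        handle.write(json.dumps(op))
        handle.truncate()                 # drop leftover bytes, if any
        return value
    finally:
        handle.close()                    # runs on every path above


def run_on_temp_file(contents):
    """Write `contents` to a temp file, run divide_json, return (value, new text)."""
    with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as tmp:
        tmp.write(contents)
        path = tmp.name
    try:
        value = divide_json(path)
        with open(path) as f:
            return value, f.read()
    finally:
        os.remove(path)
```

With `'{"numerator": 1, "denominator": 4}'` the function returns `0.25` and writes a `"result"` key back to the file. With a zero denominator it returns the `UNDEFINED` sentinel and leaves the file unchanged.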

## Chapter 2: Functions