# Purpose

These are some idiomatic, Pythonic ways to write code, based on this [video](https://www.youtube.com/watch?v=OSGv2VnC0go).

# Looping over a range of numbers

The key is to avoid writing out a list of numbers by hand. Use `range` instead: it is more concise and more memory efficient, because in Python 3 a `range` object computes its values on demand rather than storing them all.

## Don't do this

In [1]:
for i in [0, 1, 2, 3, 4, 5]:
    print(i ** 2)

0
1
4
9
16
25


## Do this

In [2]:
for i in range(6):
    print(i ** 2)

0
1
4
9
16
25
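
As a rough sketch of the memory point above (the sizes are CPython-specific and the variable names are only illustrative), a `range` object stays small however many numbers it covers, while the equivalent list grows with its length:

```python
import sys

# A range stores only start, stop, and step; a list stores every element.
lazy_numbers = range(1_000_000)
stored_numbers = list(lazy_numbers)

range_size = sys.getsizeof(lazy_numbers)
list_size = sys.getsizeof(stored_numbers)
print(range_size < list_size)

# range also accepts an explicit start and step.
evens = list(range(0, 10, 2))
print(evens)
```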


# Looping over a collection

Avoid using an index to access the elements of a list. Iterate over the elements directly instead.

## Don't do this

In [3]:
names = ['john', 'jane', 'jeremy', 'janice', 'joyce', 'jonathan']

for i in range(len(names)):
    print(names[i])

john
jane
jeremy
janice
joyce
jonathan


## Do this

In [4]:
for name in names:
    print(name)

john
jane
jeremy
janice
joyce
jonathan


# Looping backwards

The key here is to avoid the awkward `-1` values and nested function calls (look at how many pairs of parentheses are involved). Use `reversed` to make your code more elegant.

## Don't do this

In [5]:
for i in range(len(names) - 1, -1, -1):
    print(names[i])

jonathan
joyce
janice
jeremy
jane
john


## Do this

In [6]:
for name in reversed(names):
    print(name)

jonathan
joyce
janice
jeremy
jane
john
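
Note that the built-in `reversed` is distinct from the list method `reverse`. A minimal sketch of the difference, using a shortened, illustrative list:

```python
names = ['john', 'jane', 'jeremy']

# reversed() returns an iterator and leaves the original list untouched.
backwards = list(reversed(names))
print(backwards)
print(names)

# list.reverse() reverses the list in place and returns None.
in_place = names.copy()
result = in_place.reverse()
print(in_place, result)
```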


# Looping over a collection and indices

The key here is to use `enumerate`, which yields each index together with its element.

## Don't do this

In [7]:
for i in range(len(names)):
    print(i, names[i])

0 john
1 jane
2 jeremy
3 janice
4 joyce
5 jonathan


## Do this

In [8]:
for i, name in enumerate(names):
    print(i, name)

0 john
1 jane
2 jeremy
3 janice
4 joyce
5 jonathan
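
If the numbering should start somewhere other than 0, `enumerate` takes an optional `start` argument. A small sketch (the `numbered` list is just for illustration):

```python
names = ['john', 'jane', 'jeremy']

# start=1 gives human-friendly numbering without writing i + 1 everywhere.
numbered = [f'{i}. {name}' for i, name in enumerate(names, start=1)]
for line in numbered:
    print(line)
```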


# Looping over two collections

The key is to avoid accessing elements by indices and having to work out which list is shorter. Use `zip` to iterate over the two lists together; the iteration stops at the end of the shorter list.

## Don't do this

In [9]:
names = ['john', 'jane', 'jeremy', 'janice', 'joyce', 'jonathan']
colors = ['red', 'green', 'blue', 'orange', 'purple', 'pink']

n = min(len(names), len(colors))
for i in range(n):
    print(names[i], colors[i])

john red
jane green
jeremy blue
janice orange
joyce purple
jonathan pink


## Do this

In [10]:
for name, color in zip(names, colors):
    print(name, color)

john red
jane green
jeremy blue
janice orange
joyce purple
jonathan pink
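
The lists above happen to be the same length, so the truncation is not visible. A sketch with deliberately unequal, illustrative lists shows it, along with `itertools.zip_longest` as one option when the leftover items should be kept:

```python
from itertools import zip_longest

names = ['john', 'jane', 'jeremy']
colors = ['red', 'green']

# zip stops as soon as the shorter list is exhausted.
pairs = list(zip(names, colors))
print(pairs)

# zip_longest keeps going and pads the shorter list with fillvalue.
padded = list(zip_longest(names, colors, fillvalue='unknown'))
print(padded)
```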


# defaultdict

The key is to avoid checking whether a key exists in the dictionary and initializing its value if it does not. A `defaultdict` initializes the value for a missing key the first time that key is accessed.

## Don't do this

In [11]:
d = {}
for name in names:
    key = len(name)
    if key not in d:
        d[key] = []
    d[key].append(name)
    
print(d)

{4: ['john', 'jane'], 6: ['jeremy', 'janice'], 5: ['joyce'], 8: ['jonathan']}


## Do this

In [12]:
from collections import defaultdict

d = defaultdict(list)
for name in names:
    key = len(name)
    d[key].append(name)

print(d)

defaultdict(<class 'list'>, {4: ['john', 'jane'], 6: ['jeremy', 'janice'], 5: ['joyce'], 8: ['jonathan']})
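
One consequence worth keeping in mind: merely reading a missing key from a `defaultdict` creates it. A short sketch with made-up keys:

```python
from collections import defaultdict

groups = defaultdict(list)
groups['a'].append(1)

# Reading a missing key calls the factory (list) and stores the result.
missing = groups['b']
print(missing)
print('b' in groups)

# Use .get() or `in` to look up a key without inserting it.
print(groups.get('c'))
print('c' in groups)

# Convert to a plain dict once grouping is done, e.g. for cleaner printing.
plain = dict(groups)
print(plain)
```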


# ChainMap

The key is to avoid copying and updating dictionaries just to override values; `ChainMap` takes care of this. Notice how the discouraged approach copies `d1` and then updates it with `d2`, while `ChainMap` lists `d2` first, followed by `d1`. The reversed order can feel awkward, but it follows from how `ChainMap` works: lookups search the mappings from left to right, so the overriding dictionary must come first.

## Don't do this

In [13]:
d1 = {'color': 'red', 'user': 'jdoe'}
d2 = {'color': 'blue', 'first_name': 'john', 'last_name': 'doe'}

d = d1.copy()
d.update(d2)

for k, v in d.items():
    print(k, v)

color blue
user jdoe
first_name john
last_name doe


## Do this

In [14]:
from collections import ChainMap

d1 = {'color': 'red', 'user': 'jdoe'}
d2 = {'color': 'blue', 'first_name': 'john', 'last_name': 'doe'}

d = ChainMap(d2, d1)
for k, v in d.items():
    print(k, v)

color blue
user jdoe
first_name john
last_name doe
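
Unlike the copy-and-update approach, a `ChainMap` is a view over the underlying dictionaries rather than a snapshot. A minimal sketch, with dictionary names chosen only for illustration:

```python
from collections import ChainMap

defaults = {'color': 'red', 'user': 'jdoe'}
overrides = {'color': 'blue'}

# Lookups search left to right, so overrides wins over defaults.
settings = ChainMap(overrides, defaults)
print(settings['color'])

# Later changes to an underlying dict are visible through the ChainMap.
defaults['user'] = 'guest'
print(settings['user'])

# Writes go to the first mapping only; defaults is left alone.
settings['theme'] = 'dark'
print(overrides)
print('theme' in defaults)
```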


# Counter

Like `defaultdict(int)`, `Counter` treats a missing key as having a count of 0, so there is no need to check whether a key exists before incrementing it.

## Don't do this

In [15]:
d = {}
for name in names:
    key = len(name)
    if key not in d:
        d[key] = 0
    d[key] = d[key] + 1
    
print(d)

{4: 2, 6: 2, 5: 1, 8: 1}


## Do this

In [16]:
d = defaultdict(int)

for name in names:
    key = len(name)
    d[key] = d[key] + 1

print(d)

defaultdict(<class 'int'>, {4: 2, 6: 2, 5: 1, 8: 1})


In [17]:
from collections import Counter

d = Counter()
for name in names:
    key = len(name)
    d[key] = d[key] + 1
    
print(d)

Counter({4: 2, 6: 2, 5: 1, 8: 1})
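
`Counter` can also do the counting itself when handed an iterable, which removes the loop entirely. A sketch of that variant on the same `names` list:

```python
from collections import Counter

names = ['john', 'jane', 'jeremy', 'janice', 'joyce', 'jonathan']

# Counter tallies each item produced by the generator expression.
counts = Counter(len(name) for name in names)
print(counts)

# Missing keys read as 0 and, unlike defaultdict, are not inserted.
print(counts[7])
print(7 in counts)

# most_common() returns (key, count) pairs sorted by descending count.
print(counts.most_common())
```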


# namedtuple

The key here is to avoid accessing tuples by indices, since bare indices carry no meaning. Instead, use `namedtuple` and access the elements of the tuple by a meaningful name.

## Don't do this

In [18]:
scores = [80, 90, 95, 88, 99, 93]

students = [(name, score) for name, score in zip(names, scores)]
for student in students:
    print('{} {}'.format(student[0], student[1]))

john 80
jane 90
jeremy 95
janice 88
joyce 99
jonathan 93


## Do this

In [19]:
from collections import namedtuple

Student = namedtuple('Student', 'name score')

students = [Student(name, score) for name, score in zip(names, scores)]
for student in students:
    print('{} {}'.format(student.name, student.score))

john 80
jane 90
jeremy 95
janice 88
joyce 99
jonathan 93
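
A `namedtuple` is still a tuple, so indexing and unpacking keep working. A brief sketch of that, plus the `_replace` and `_asdict` helper methods:

```python
from collections import namedtuple

Student = namedtuple('Student', 'name score')
student = Student('john', 80)

# Still an ordinary tuple: indexing and unpacking work as before.
name, score = student
print(student[0], name, score)

# Instances are immutable; _replace returns a modified copy.
updated = student._replace(score=85)
print(student.score, updated.score)

# _asdict converts the fields to a dictionary.
as_dict = student._asdict()
print(as_dict['name'])
```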


# Unpacking sequences

The key is to avoid verbose code that obscures a single, coherent intention. In the discouraged approach, we store the returned tuple in `s` and then use a separate line to extract each element. In the encouraged approach, the tuple is unpacked in one line.


## Don't do this

In [20]:
def get_student():
    return 'john', 'doe', 88

s = get_student()
first_name = s[0]
last_name = s[1]
score = s[2]

print(first_name, last_name, score)

john doe 88


## Do this

In [21]:
first_name, last_name, score = get_student()

print(first_name, last_name, score)

john doe 88
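
Unpacking has a few related forms that may be useful. The sketch below reuses `get_student` from above; the other names are illustrative:

```python
def get_student():
    return 'john', 'doe', 88

# A starred name collects the remaining elements into a list.
first_name, *rest = get_student()
print(first_name, rest)

# By convention, _ marks a value that is not needed.
_, last_name, score = get_student()
print(last_name, score)

# Tuple unpacking also swaps two variables without a temporary.
a, b = 1, 2
a, b = b, a
print(a, b)
```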


# String concatenation

The key here is to avoid writing too much code just to concatenate strings. In the discouraged approach, note how we have to add logic to decide when to append a comma. In the encouraged approach, the `for` loop is gone and there is no need to track when to add a comma.

## Don't do this

In [22]:
s = ''
for i, name in enumerate(names):
    s += name
    if i < len(names) - 1:
        s += ', '

s

'john, jane, jeremy, janice, joyce, jonathan'

## Do this

In [23]:
', '.join(names)

'john, jane, jeremy, janice, joyce, jonathan'
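
`join` only accepts strings, so other values need converting first. A small sketch using the `scores` list from the `namedtuple` section:

```python
scores = [80, 90, 95, 88, 99, 93]

# ', '.join(scores) would raise TypeError because the items are ints;
# convert each one to str on the way in.
joined = ', '.join(str(score) for score in scores)
print(joined)
```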

# Updating sequences

There is not much difference between the discouraged and encouraged approaches here. However, removing an element by value rather than by index makes the intent clearer.

## Don't do this

In [24]:
names = ['john', 'jane', 'jeremy', 'janice', 'joyce', 'jonathan']

del names[0]
print(names)

names.pop(0)
print(names)

names.insert(0, 'jerry')
print(names)

['jane', 'jeremy', 'janice', 'joyce', 'jonathan']
['jeremy', 'janice', 'joyce', 'jonathan']
['jerry', 'jeremy', 'janice', 'joyce', 'jonathan']


## Do this

In [25]:
names = ['john', 'jane', 'jeremy', 'janice', 'joyce', 'jonathan']

names.remove('john')
print(names)

names.pop(0)
print(names)

names.insert(0, 'jerry')
print(names)

['jane', 'jeremy', 'janice', 'joyce', 'jonathan']
['jeremy', 'janice', 'joyce', 'jonathan']
['jerry', 'jeremy', 'janice', 'joyce', 'jonathan']
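
On a list, `del names[0]`, `names.pop(0)`, and `names.insert(0, ...)` all shift every remaining element. If updates at the front are frequent, `collections.deque` may be a better fit, since it supports fast appends and pops at both ends. A sketch of the same updates, assuming a deque suits the rest of your code:

```python
from collections import deque

names = deque(['john', 'jane', 'jeremy', 'janice', 'joyce', 'jonathan'])

# Remove by value, exactly as with a list.
names.remove('john')

# popleft and appendleft replace pop(0) and insert(0, ...).
names.popleft()
names.appendleft('jerry')
print(names)
```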


# Decorators

The key here is to use the `lru_cache` decorator to cache the results of functions that are [idempotent](https://en.wikipedia.org/wiki/Idempotence) (here meaning that the same arguments always produce the same result), especially if they are expensive to call. Note how each call to `add` takes about 700 milliseconds. With the `lru_cache` decorator, a repeated call with the same argument takes on the order of microseconds.

## Don't do this

In [26]:
def add(n):
    return sum([i for i in range(n)])

In [27]:
%%time
add(10000000)

CPU times: user 530 ms, sys: 188 ms, total: 718 ms
Wall time: 720 ms


49999995000000

In [28]:
%%time
add(10000000)

CPU times: user 498 ms, sys: 162 ms, total: 660 ms
Wall time: 659 ms


49999995000000

## Do this

In [29]:
from functools import lru_cache

@lru_cache(maxsize=32)
def add(n):
    return sum([i for i in range(n)])

In [30]:
%%time
add(10000000)

CPU times: user 487 ms, sys: 162 ms, total: 649 ms
Wall time: 649 ms


49999995000000

In [31]:
%%time
add(10000000)

CPU times: user 3 µs, sys: 0 ns, total: 3 µs
Wall time: 5.96 µs


49999995000000
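
To check that the cache is actually being hit, a function wrapped with `lru_cache` exposes `cache_info()` and `cache_clear()`. A sketch with a smaller, illustrative function (`square_sum` is not part of the notebook above):

```python
from functools import lru_cache

@lru_cache(maxsize=32)
def square_sum(n):
    return sum(i * i for i in range(n))

square_sum(1000)  # miss: computed and stored
square_sum(1000)  # hit: returned from the cache
square_sum(2000)  # miss: different argument

info = square_sum.cache_info()
print(info)

# cache_clear() empties the cache and resets the statistics.
square_sum.cache_clear()
cleared = square_sum.cache_info()
print(cleared)
```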

# Reading a file

The key here is to use a context manager to manage resources, so that the file is closed automatically when the block exits, even if an exception is raised.

## Don't do this

In [32]:
f = open('README.md')
try:
    data = f.read()
    print(data)
finally:
    f.close()

# Purpose

Some useful Jupyter notebook recipes.


## Do this

In [33]:
with open('README.md') as f:
    data = f.read()
    print(data)

# Purpose

Some useful Jupyter notebook recipes.
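
A quick way to confirm that the `with` block closes the file is to inspect `f.closed` afterwards. The sketch below writes to a temporary directory so that it does not depend on `README.md` existing; the file name is illustrative:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.txt')

    with open(path, 'w') as f:
        f.write('hello\n')

    with open(path) as f:
        data = f.read()
        open_inside = not f.closed

    # Once the with block exits, the file has been closed for us.
    closed_after = f.closed

print(data, open_inside, closed_after)
```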


# Deleting a file

The key here is to replace the `try`/`except`/`pass` boilerplate with the `suppress` context manager from `contextlib`.

## Don't do this

In [34]:
import os

try:
    os.remove('test.tmp')
except OSError:
    pass

## Do this

In [35]:
from contextlib import suppress

with suppress(OSError):
    os.remove('test.tmp')
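
`OSError` also covers problems such as permission errors. If only a missing file should be ignored, one option is to suppress the narrower `FileNotFoundError`. A sketch, with a made-up file name:

```python
import os
import tempfile
from contextlib import suppress

# A path that is not expected to exist (the name is purely illustrative).
missing_path = os.path.join(tempfile.gettempdir(), 'no_such_file_example.tmp')

# Only a missing file is ignored; other OSErrors would still propagate.
with suppress(FileNotFoundError):
    os.remove(missing_path)

still_missing = not os.path.exists(missing_path)
print(still_missing)
```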

# List vs generator comprehensions

The key here is to avoid looping over elements and storing intermediate results. Instead, use a list comprehension or a generator expression. Note that a list comprehension (note the brackets) evaluates its expressions eagerly and returns a list, while a generator expression (note the parentheses) evaluates them lazily, one at a time.

## Don't do this

In [36]:
results = []
for i in range(10):
    s = i ** 2
    results.append(s)
total = sum(results)
print(total)

285


## Do this

In [37]:
total = sum([i ** 2 for i in range(10)])
print([i ** 2 for i in range(10)])
print(total)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
285


In [38]:
total = sum((i ** 2 for i in range(10)))
print((i ** 2 for i in range(10)))
print(total)

<generator object <genexpr> at 0x104d4ed68>
285
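
One practical consequence of lazy evaluation is that a generator can be consumed only once, whereas a list can be reused. A minimal sketch:

```python
squares = (i ** 2 for i in range(10))

# The first pass consumes every value.
first_total = sum(squares)

# The generator is now exhausted, so a second pass sees nothing.
second_total = sum(squares)
print(first_total, second_total)

# A list comprehension can be iterated as many times as needed.
square_list = [i ** 2 for i in range(10)]
print(sum(square_list), sum(square_list))
```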
