# Purpose

These are some idiomatic, `Pythonic` ways to write code, based on this [video](https://www.youtube.com/watch?v=OSGv2VnC0go) by [Raymond Hettinger](https://twitter.com/raymondh). Under each major section, you will see two sub-sections: `Don't do this` and `Do this`. Code under `Don't do this` is discouraged and, to borrow an adjective from [Jeff Knupp](https://jeffknupp.com/writing-idiomatic-python-ebook/), `harmful`. Code under `Do this` is the encouraged, `beautiful`, and `idiomatic` Pythonic way to write the same thing.

Some additional idioms have been added, while some from the original video were left out (we will try to find alternative working examples).

# Looping over a range of numbers

The key is to avoid writing out a list by hand. Use the `range` function instead: it is more concise and more memory efficient, because a `range` object produces its values on demand instead of storing them all.

## Don't do this

In [1]:
for i in [0, 1, 2, 3, 4, 5]:
    print(i ** 2)

0
1
4
9
16
25


## Do this

In [2]:
for i in range(6):
    print(i ** 2)

0
1
4
9
16
25
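
To make the memory claim concrete, here is a minimal sketch that compares the size of a `range` object with the size of the equivalent list using `sys.getsizeof`. The exact byte counts depend on the Python build, so only the comparison is meaningful; the variable names are illustrative.

```python
import sys

n = 1_000_000

lazy = range(n)          # stores only start, stop, and step
eager = list(range(n))   # stores a reference to every element

# getsizeof reports the container's own size, not the integers it refers to
range_size = sys.getsizeof(lazy)
list_size = sys.getsizeof(eager)
print(range_size < list_size)  # True

# range also accepts start, stop, and step
evens = list(range(0, 10, 2))
print(evens)  # [0, 2, 4, 6, 8]
```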


# Looping over a collection

Avoid using an index to access the elements of a list; iterate over the elements directly.

## Don't do this

In [3]:
names = ['john', 'jane', 'jeremy', 'janice', 'joyce', 'jonathan']

for i in range(len(names)):
    print(names[i])

john
jane
jeremy
janice
joyce
jonathan


## Do this

In [4]:
for name in names:
    print(name)

john
jane
jeremy
janice
joyce
jonathan


# Looping backwards

The key here is to avoid the awkward `-1` values and nested function calls (look at how many pairs of parentheses are involved). Use `reversed` to make your code more elegant.

## Don't do this

In [5]:
for i in range(len(names) - 1, -1, -1):
    print(names[i])

jonathan
joyce
janice
jeremy
jane
john


## Do this

In [6]:
for name in reversed(names):
    print(name)

jonathan
joyce
janice
jeremy
jane
john


# Looping over a collection and indices

The key here is to use `enumerate` which will return the index with the element.

## Don't do this

In [7]:
for i in range(len(names)):
    print(i, names[i])

0 john
1 jane
2 jeremy
3 janice
4 joyce
5 jonathan


## Do this

In [8]:
for i, name in enumerate(names):
    print(i, name)

0 john
1 jane
2 jeremy
3 janice
4 joyce
5 jonathan
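
When the numbering should start at something other than 0, `enumerate` accepts a `start` argument. A small sketch, using a shortened list of names for illustration:

```python
names = ['john', 'jane', 'jeremy']

# start=1 shifts the counter only; the elements are unchanged
numbered = [f'{i}. {name}' for i, name in enumerate(names, start=1)]
print(numbered)  # ['1. john', '2. jane', '3. jeremy']
```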


# Looping over two collections

The key is to avoid accessing elements by indices and having to work out which list is shorter. Use `zip` to iterate over the two lists; the iteration stops at the end of the shorter list.

## Don't do this

In [9]:
names = ['john', 'jane', 'jeremy', 'janice', 'joyce', 'jonathan']
colors = ['red', 'green', 'blue', 'orange', 'purple', 'pink']

n = min(len(names), len(colors))
for i in range(n):
    print(names[i], colors[i])

john red
jane green
jeremy blue
janice orange
joyce purple
jonathan pink


## Do this

In [10]:
for name, color in zip(names, colors):
    print(name, color)

john red
jane green
jeremy blue
janice orange
joyce purple
jonathan pink
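
The lists above happen to have the same length, so the truncation is not visible. The sketch below uses lists of different lengths (shortened here for illustration) and also shows `itertools.zip_longest`, which pads instead of truncating; the fill value `'unknown'` is an arbitrary choice.

```python
from itertools import zip_longest

names = ['john', 'jane', 'jeremy']
colors = ['red', 'green']

# zip stops at the end of the shorter list, so 'jeremy' is dropped
pairs = list(zip(names, colors))
print(pairs)  # [('john', 'red'), ('jane', 'green')]

# zip_longest pads the shorter list with fillvalue instead
padded = list(zip_longest(names, colors, fillvalue='unknown'))
print(padded)  # [('john', 'red'), ('jane', 'green'), ('jeremy', 'unknown')]
```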


# defaultdict

The key is to avoid checking whether a key exists in the dictionary and, if not, initializing its associated value. A `defaultdict` initializes the value for a key that does not yet exist upon first access.

## Don't do this

In [11]:
d = {}
for name in names:
    key = len(name)
    if key not in d:
        d[key] = []
    d[key].append(name)
    
print(d)

{4: ['john', 'jane'], 6: ['jeremy', 'janice'], 5: ['joyce'], 8: ['jonathan']}


## Do this

In [12]:
from collections import defaultdict

d = defaultdict(list)
for name in names:
    key = len(name)
    d[key].append(name)

print(d)

defaultdict(<class 'list'>, {4: ['john', 'jane'], 6: ['jeremy', 'janice'], 5: ['joyce'], 8: ['jonathan']})
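
One consequence of "initialize on first access" is worth seeing in isolation: merely reading a missing key with `d[key]` inserts it. The sketch below illustrates this, along with the lookups that do not trigger the default; the keys `99` and `100` are arbitrary.

```python
from collections import defaultdict

d = defaultdict(list)
d[4].append('john')

# Reading a missing key calls list() and stores the result
value = d[99]
print(value)     # []
print(99 in d)   # True

# .get and `in` do not trigger the default factory
print(d.get(100))  # None
print(100 in d)    # False

# dict() gives a plain dictionary for printing or serializing
plain = dict(d)
print(plain)  # {4: ['john'], 99: []}
```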


# Dictionary default values

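Use the dictionary `.get` method with a supplied default value. Note that `.get` returns the value stored under the key when the key exists, so `is_authorized` below is `False` only because `auth_token` is missing; for a strict boolean, test membership with `'auth_token' in d`.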

## Don't do this

In [13]:
d = {
    'username': 'jdoe'
}

is_authorized = False
if 'auth_token' in d:
    is_authorized = True
    
print(is_authorized)

False


## Do this

In [14]:
is_authorized = d.get('auth_token', False)

print(is_authorized)

False
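
To see what `.get` returns when the key is present, here is a small sketch. The token value `'abc123'` and the `role` key are made up for illustration.

```python
d = {'username': 'jdoe', 'auth_token': 'abc123'}  # 'abc123' is a made-up token

# .get returns the stored value, not a boolean
token = d.get('auth_token', False)
print(token)  # abc123

# A membership test gives a strict True/False
is_authorized = 'auth_token' in d
print(is_authorized)  # True

# .get is most useful when a fallback value is wanted
role = d.get('role', 'guest')
print(role)  # guest
```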


# ChainMap

The key is to avoid copying and updating dictionaries just to override values. `ChainMap` takes care of this. Notice how the discouraged approach copies `d1` and then updates it with `d2`, while `ChainMap` lists `d2` first, followed by `d1`: lookups search the mappings from left to right, so the overriding dictionary must come first. This ordering can feel awkward.

## Don't do this

In [15]:
d1 = {'color': 'red', 'user': 'jdoe'}
d2 = {'color': 'blue', 'first_name': 'john', 'last_name': 'doe'}

d = d1.copy()
d.update(d2)

for k, v in d.items():
    print(k, v)

color blue
user jdoe
first_name john
last_name doe


## Do this

In [16]:
from collections import ChainMap

d1 = {'color': 'red', 'user': 'jdoe'}
d2 = {'color': 'blue', 'first_name': 'john', 'last_name': 'doe'}

d = ChainMap(d2, d1)
for k, v in d.items():
    print(k, v)

color blue
user jdoe
first_name john
last_name doe
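
Because `ChainMap` keeps references to the original dictionaries rather than copying them, it behaves as a live view. A minimal sketch of that behavior, with hypothetical `defaults` and `overrides` dictionaries:

```python
from collections import ChainMap

defaults = {'color': 'red', 'user': 'jdoe'}
overrides = {'color': 'blue'}

settings = ChainMap(overrides, defaults)
print(settings['color'])  # blue, found in overrides first
print(settings['user'])   # jdoe, falls through to defaults

# No copy was made, so later changes to the underlying dicts are visible
defaults['user'] = 'jsmith'
print(settings['user'])   # jsmith

# Writes through the ChainMap go to the first mapping only
settings['theme'] = 'dark'
print(overrides)  # {'color': 'blue', 'theme': 'dark'}
print(defaults)   # {'color': 'red', 'user': 'jsmith'}
```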


# Counter

Like `defaultdict(int)`, `Counter` treats the count for a missing key as 0. Note how we get rid of the check for whether a key exists.

## Don't do this

In [17]:
d = {}
for name in names:
    key = len(name)
    if key not in d:
        d[key] = 0
    d[key] = d[key] + 1
    
print(d)

{4: 2, 6: 2, 5: 1, 8: 1}


## Do this

In [18]:
d = defaultdict(int)

for name in names:
    key = len(name)
    d[key] = d[key] + 1

print(d)

defaultdict(<class 'int'>, {4: 2, 6: 2, 5: 1, 8: 1})


In [19]:
from collections import Counter

d = Counter()
for name in names:
    key = len(name)
    d[key] = d[key] + 1
    
print(d)

Counter({4: 2, 6: 2, 5: 1, 8: 1})
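
`Counter` can also do the counting itself when handed an iterable, which removes the loop entirely. A short sketch of that form, plus `most_common`:

```python
from collections import Counter

names = ['john', 'jane', 'jeremy', 'janice', 'joyce', 'jonathan']

# Counter counts the items of any iterable directly
counts = Counter(len(name) for name in names)
print(counts[4])   # 2
print(counts[99])  # 0, missing keys count as zero and are not inserted

# most_common(n) returns the n highest counts as (item, count) pairs
top = counts.most_common(2)
print(top)  # [(4, 2), (6, 2)]
```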


# Ignoring tuples

When unpacking a tuple, do not invent a named variable for a value you will not use; by convention, assign it to `_` instead.

## Don't do this

In [20]:
def get_info():
    return 'John', 'Doe', 28

fname, lname, tmp = get_info()
print(fname, lname)

John Doe


## Do this

In [21]:
def get_info():
    return 'John', 'Doe', 28

fname, lname, _ = get_info()
print(fname, lname)

John Doe


# namedtuple

The key here is to avoid accessing tuples by indices, since those indices carry no meaning. Instead, use `namedtuple` and access the elements of the tuple by a meaningful name.

## Don't do this

In [22]:
scores = [80, 90, 95, 88, 99, 93]

students = [(name, score) for name, score in zip(names, scores)]
for student in students:
    print('{} {}'.format(student[0], student[1]))

john 80
jane 90
jeremy 95
janice 88
joyce 99
jonathan 93


## Do this

In [23]:
from collections import namedtuple

Student = namedtuple('Student', 'name score')

students = [Student(name, score) for name, score in zip(names, scores)]
for student in students:
    print('{} {}'.format(student.name, student.score))

john 80
jane 90
jeremy 95
janice 88
joyce 99
jonathan 93
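
A `namedtuple` is still a regular tuple underneath, and it comes with a few helper methods. The sketch below shows the ones that tend to be useful in practice; the score of 85 is an arbitrary value.

```python
from collections import namedtuple

Student = namedtuple('Student', 'name score')
student = Student(name='john', score=80)

# Still a tuple: indexing and unpacking keep working
name, score = student
print(student[0], score)  # john 80

# Tuples are immutable; _replace returns a modified copy
curved = student._replace(score=85)
print(curved)  # Student(name='john', score=85)

# _asdict converts to a mapping keyed by field name
as_dict = dict(curved._asdict())
print(as_dict)  # {'name': 'john', 'score': 85}
```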


# Unpacking sequences

The key is to avoid long code that breaks up a coherent intention. In the discouraged approach, we receive a tuple, store it in `s`, and then use a separate line to access each element of `s`. In the encouraged approach, the tuple is unpacked neatly in one line.


## Don't do this

In [24]:
def get_student():
    return 'john', 'doe', 88

s = get_student()
first_name = s[0]
last_name = s[1]
score = s[2]

print(first_name, last_name, score)

john doe 88


## Do this

In [25]:
first_name, last_name, score = get_student()

print(first_name, last_name, score)

john doe 88
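
Unpacking also works when the number of values is not fixed, by marking one name with `*`. A small sketch; the `get_student` below is a hypothetical variant that returns several scores.

```python
def get_student():
    # hypothetical variant: two names followed by any number of scores
    return 'john', 'doe', 88, 92, 79

# The starred name collects any remaining values into a list
first_name, last_name, *scores = get_student()
print(first_name, last_name, scores)  # john doe [88, 92, 79]

# Combine with _ to skip values that are not needed
first, *_, last_score = get_student()
print(first, last_score)  # john 79
```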


# String concatenation

The key here is to avoid writing too much code just to concatenate strings. In the discouraged approach, note how we have to add logic to decide when to append a comma `,`. In the encouraged approach, the for loop is gone and there is no need to work out when to add a comma.

## Don't do this

In [26]:
s = ''
for i, name in enumerate(names):
    s += name
    if i < len(names) - 1:
        s += ', '

s

'john, jane, jeremy, janice, joyce, jonathan'

## Do this

In [27]:
', '.join(names)

'john, jane, jeremy, janice, joyce, jonathan'
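
`join` only accepts strings, so other values need converting first. A brief sketch with an illustrative list of scores:

```python
scores = [80, 90, 95]

# join raises TypeError on non-string items, so convert them first
line = ', '.join(str(score) for score in scores)
print(line)  # 80, 90, 95

# An empty list yields an empty string, with no special case needed
empty = ', '.join([])
print(repr(empty))  # ''
```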

# Updating sequences

There is not much difference between the discouraged and encouraged approaches here. However, removing an element by value rather than by index is more meaningful to the reader.

## Don't do this

In [28]:
names = ['john', 'jane', 'jeremy', 'janice', 'joyce', 'jonathan']

del names[0]
print(names)

names.pop(0)
print(names)

names.insert(0, 'jerry')
print(names)

['jane', 'jeremy', 'janice', 'joyce', 'jonathan']
['jeremy', 'janice', 'joyce', 'jonathan']
['jerry', 'jeremy', 'janice', 'joyce', 'jonathan']


## Do this

In [29]:
names = ['john', 'jane', 'jeremy', 'janice', 'joyce', 'jonathan']

names.remove('john')
print(names)

names.pop(0)
print(names)

names.insert(0, 'jerry')
print(names)

['jane', 'jeremy', 'janice', 'joyce', 'jonathan']
['jeremy', 'janice', 'joyce', 'jonathan']
['jerry', 'jeremy', 'janice', 'joyce', 'jonathan']
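
When most updates happen at the front of the sequence, `collections.deque` may be a better fit than a list, since it is designed for fast appends and pops at both ends. A minimal sketch under that assumption:

```python
from collections import deque

names = deque(['john', 'jane', 'jeremy', 'janice', 'joyce', 'jonathan'])

# popleft and appendleft are O(1); list.pop(0) and list.insert(0, x) are O(n)
first = names.popleft()
print(first)  # john

names.appendleft('jerry')
print(names[0])  # jerry

# Removal by value works the same way as on a list
names.remove('joyce')
remaining = list(names)
print(remaining)  # ['jerry', 'jane', 'jeremy', 'janice', 'jonathan']
```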


# decorators

The key here is to use the `lru_cache` decorator to cache the results of functions that always return the same result for the same arguments (loosely, [idempotent](https://en.wikipedia.org/wiki/Idempotence) functions), especially if they are expensive to call. Note how each call to `add` takes about 700 milliseconds. With the `lru_cache` decorator, the first call still takes that long, but subsequent calls with the same argument are on the order of microseconds.

## Don't do this

In [30]:
def add(n):
    return sum([i for i in range(n)])

In [31]:
%%time
add(10000000)

CPU times: user 523 ms, sys: 168 ms, total: 690 ms
Wall time: 691 ms


49999995000000

In [32]:
%%time
add(10000000)

CPU times: user 493 ms, sys: 151 ms, total: 644 ms
Wall time: 643 ms


49999995000000

## Do this

In [33]:
from functools import lru_cache

@lru_cache(maxsize=32)
def add(n):
    return sum([i for i in range(n)])

In [34]:
%%time
add(10000000)

CPU times: user 526 ms, sys: 177 ms, total: 703 ms
Wall time: 706 ms


49999995000000

In [35]:
%%time
add(10000000)

CPU times: user 3 µs, sys: 0 ns, total: 3 µs
Wall time: 6.2 µs


49999995000000
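
Timings vary from machine to machine, but the cache itself can be inspected directly. The sketch below uses `cache_info()` and `cache_clear()`, both of which `lru_cache` attaches to the decorated function; a smaller argument is used so that it runs instantly.

```python
from functools import lru_cache

@lru_cache(maxsize=32)
def add(n):
    return sum(range(n))

add(1000)  # computed and stored (a miss)
add(1000)  # served from the cache (a hit)

info = add.cache_info()
print(info.hits, info.misses)  # 1 1

# Arguments must be hashable, since they form the cache key
try:
    add([1000])
except TypeError:
    print('unhashable argument')

# cache_clear empties the cache and resets the counters
add.cache_clear()
size_after_clear = add.cache_info().currsize
print(size_after_clear)  # 0
```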

# Reading a file

The key here is to use a context manager to manage resources. 

## Don't do this

In [36]:
f = open('README.md')
try:
    data = f.read()
    print(data)
finally:
    f.close()

# Purpose

Some useful Jupyter notebook recipes.

* [do-this](https://nbviewer.jupyter.org/github/oneoffcoder/jupyter/blob/master/recipe/do-this.ipynb?flush_cache=true)
* [generate-multivariate-gaussian-data](https://nbviewer.jupyter.org/github/oneoffcoder/jupyter/blob/master/recipe/generate-multivariate-guassian-data.ipynb?flush_cache=true)
* [subplots-off-by-one](https://nbviewer.jupyter.org/github/oneoffcoder/jupyter/blob/master/recipe/subplots-off-by-one.ipynb?flush_cache=true)



## Do this

In [37]:
with open('README.md') as f:
    data = f.read()
    print(data)

# Purpose

Some useful Jupyter notebook recipes.

* [do-this](https://nbviewer.jupyter.org/github/oneoffcoder/jupyter/blob/master/recipe/do-this.ipynb?flush_cache=true)
* [generate-multivariate-gaussian-data](https://nbviewer.jupyter.org/github/oneoffcoder/jupyter/blob/master/recipe/generate-multivariate-guassian-data.ipynb?flush_cache=true)
* [subplots-off-by-one](https://nbviewer.jupyter.org/github/oneoffcoder/jupyter/blob/master/recipe/subplots-off-by-one.ipynb?flush_cache=true)
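
For large files, reading line by line inside the `with` block avoids loading the whole file at once. The sketch below writes a small temporary file first so that it is self-contained; the file contents are made up.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('first\nsecond\nthird\n')
    path = tmp.name

# The file object is an iterator over lines, so large files
# can be processed without loading everything with read()
with open(path) as f:
    lines = [line.rstrip('\n') for line in f]

print(lines)     # ['first', 'second', 'third']
print(f.closed)  # True, the with block closed the file on exit

os.remove(path)
```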



# Deleting a file

The key here is to avoid the `try/except` code and favor a context manager approach.

## Don't do this

In [38]:
import os

try:
    os.remove('test.tmp')
except OSError:
    pass

## Do this

In [39]:
from contextlib import suppress

with suppress(OSError):
    os.remove('test.tmp')
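
`OSError` covers many failures, including permission problems. If the intent is only to ignore a file that is already gone, a narrower exception may be safer. A sketch of two ways to express that, assuming Python 3.8 or later for `missing_ok`:

```python
import os
from contextlib import suppress
from pathlib import Path

# FileNotFoundError is a subclass of OSError; suppressing only it
# lets other problems, such as permission errors, still surface
with suppress(FileNotFoundError):
    os.remove('test.tmp')

# pathlib offers the same behavior as a flag (Python 3.8+)
Path('test.tmp').unlink(missing_ok=True)
```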

# List vs generator comprehensions

The key here is to avoid looping over elements and storing results by hand. Instead, use a list comprehension or a generator expression. Note that the list comprehension (note the brackets) eagerly evaluates the expressions and returns a list, while the generator expression (note the parentheses) evaluates them lazily.

## Don't do this

In [40]:
results = []
for i in range(10):
    s = i ** 2
    results.append(s)
total = sum(results)
print(total)

285


## Do this

In [41]:
total = sum([i ** 2 for i in range(10)])
print([i ** 2 for i in range(10)])
print(total)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
285


In [42]:
total = sum((i ** 2 for i in range(10)))
print((i ** 2 for i in range(10)))
print(total)

<generator object <genexpr> at 0x105e14048>
285
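
Two practical details about generator expressions are easy to miss: the extra parentheses are optional when the generator is the sole argument of a call, and a generator can be consumed only once. A short sketch of both:

```python
# When a generator expression is the only argument, the extra
# pair of parentheses can be dropped
total = sum(i ** 2 for i in range(10))
print(total)  # 285

# A generator can be consumed only once
squares = (i ** 2 for i in range(10))
first_pass = sum(squares)
second_pass = sum(squares)  # already exhausted, so nothing is left to add
print(first_pass, second_pass)  # 285 0
```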


# Filtering lists

Use a list comprehension with an `if` clause to filter out values, not a for loop.

## Don't do this

In [43]:
nums = []
for i in range(100):
    if i % 2 == 0:
        nums.append(i)
print(nums)

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98]


## Do this

In [44]:
nums = [i for i in range(100) if i % 2 == 0]
print(nums)

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98]


# Clarify function calls with keyword arguments

When passing values to a function, try to associate each value with its argument name.

## Don't do this

In [45]:
def format_information(first_name, last_name, age):
    return '{} {} is {} years old'.format(first_name, last_name, age)

format_information('John', 'Doe', 28)

'John Doe is 28 years old'

## Do this

In [46]:
format_information(first_name='John', last_name='Doe', age=28)

'John Doe is 28 years old'

In [47]:
format_information(**{
    'first_name': 'John',
    'last_name': 'Doe',
    'age': 28
})

'John Doe is 28 years old'
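
A function can also require callers to use keywords by placing a bare `*` in its parameter list. The sketch below redefines `format_information` that way purely for illustration:

```python
# Parameters after the bare * can only be passed by keyword
def format_information(*, first_name, last_name, age):
    return f'{first_name} {last_name} is {age} years old'

print(format_information(first_name='John', last_name='Doe', age=28))
# John Doe is 28 years old

try:
    format_information('John', 'Doe', 28)
except TypeError:
    print('positional arguments are rejected')
```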

# Simultaneous state updates

The key here is to make your code more concise and avoid nuisance variables. In the discouraged approach, you create temporary variables to avoid mutating `x` and `y`. In the encouraged approach, all mutations occur in one coherent line.

## Don't do this

In [48]:
def update_x(x):
    return x + 1

def update_y(y):
    return y + 1

x = 3
y = 4
dx = 4
dy = 5

tmp_x = x + dx
tmp_y = y + dy
tmp_dx = update_x(x)
tmp_dy = update_y(y)

x = tmp_x
y = tmp_y
dx = tmp_dx
dy = tmp_dy

print(x, y, dx, dy)

7 9 4 5


## Do this

In [49]:
x = 3
y = 4
dx = 4
dy = 5

x, y, dx, dy = (x + dx, y + dy, update_x(x), update_y(y))

print(x, y, dx, dy)

7 9 4 5
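
This works because the whole right-hand side is evaluated before any name on the left is rebound. Two familiar uses of the same idea, sketched with illustrative names:

```python
# Swapping needs no temporary variable
a, b = 1, 2
a, b = b, a
print(a, b)  # 2 1

# The right-hand side is evaluated in full before any name is rebound,
# so each step below uses the old values of x and y
def fibonacci(n):
    x, y = 0, 1
    for _ in range(n):
        x, y = y, x + y
    return x

sequence = [fibonacci(i) for i in range(8)]
print(sequence)  # [0, 1, 1, 2, 3, 5, 8, 13]
```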


# Single line function declarations

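If you have one-liner functions, you can replace the `def` declaration with a `lambda`. Keep in mind that PEP 8 recommends `def` whenever the function is bound to a name, so this style fits best when the lambda is passed inline, for example as a `key` argument.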

## Don't do this

In [50]:
def add_one(x):
    return x + 1

add_one(3)

4

## Do this

In [51]:
add_one = lambda x: x + 1

add_one(3)

4
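
Lambdas are most at home as short inline arguments. The sketch below shows one such use, a sort key, and also one thing that is lost when a lambda is bound to a name: the function no longer has a descriptive `__name__`, which is what appears in tracebacks. The name `add_one_lambda` is illustrative.

```python
names = ['jonathan', 'john', 'jeremy']

# An inline lambda as the sort key: shortest name first, ties alphabetical
by_length = sorted(names, key=lambda name: (len(name), name))
print(by_length)  # ['john', 'jeremy', 'jonathan']

# A def gives the function a real name, which shows up in tracebacks
def add_one(x):
    return x + 1

add_one_lambda = lambda x: x + 1
print(add_one.__name__, add_one_lambda.__name__)  # add_one <lambda>
```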

# Generator functions

Avoid building a large collection of values or objects up front, as it may take up a lot of memory. Use `yield` inside a function to generate values or objects as they are needed. Functions that produce their values with `yield` are more space efficient and, as the timings below suggest, are often faster.

## Don't do this

In [52]:
%%time
def generate_sequential_numbers(n):
    nums = []
    for i in range(n):
        nums.append(i)
    return nums

sum(generate_sequential_numbers(10000000))

CPU times: user 933 ms, sys: 194 ms, total: 1.13 s
Wall time: 1.13 s


49999995000000

## Do this

In [53]:
%%time
def generate_sequential_numbers(n):
    for i in range(n):
        yield i

sum(generate_sequential_numbers(10000000))

CPU times: user 511 ms, sys: 4.58 ms, total: 515 ms
Wall time: 518 ms


49999995000000

## Or do this

In [54]:
%%time

generate_sequential_numbers = lambda n: (i for i in range(n))
sum(generate_sequential_numbers(10000000))

CPU times: user 522 ms, sys: 4.81 ms, total: 527 ms
Wall time: 530 ms


49999995000000
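
Because values are produced only on request, a generator function can even describe a sequence with no end. A minimal sketch, using a hypothetical `naturals` generator together with `itertools.islice` to take a finite prefix:

```python
from itertools import islice

def naturals():
    """Yield 0, 1, 2, ... without end."""
    n = 0
    while True:
        yield n
        n += 1

# Values are produced only when requested, so an endless
# sequence is fine as long as the consumer stops
first_five = list(islice(naturals(), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```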

# Dictionary comprehension

Here, we want to create two dictionaries: index-to-word `i2w` and word-to-index `w2i`. In the discouraged approach, we create two empty dictionaries, use a for loop, and set each key-value pair with the help of `enumerate`; that takes 5 lines of code. In the encouraged approach, two lines of code declare and populate the dictionaries with dictionary comprehensions.

## Don't do this

In [55]:
words = ['i', 'like', 'to', 'eat', 'pizza', 'and', 'play', 'tennis']

i2w = {}
w2i = {}
for i, word in enumerate(words):
    i2w[i] = word
    w2i[word] = i
    
print(i2w)
print(w2i)

{0: 'i', 1: 'like', 2: 'to', 3: 'eat', 4: 'pizza', 5: 'and', 6: 'play', 7: 'tennis'}
{'i': 0, 'like': 1, 'to': 2, 'eat': 3, 'pizza': 4, 'and': 5, 'play': 6, 'tennis': 7}


## Do this

In [56]:
i2w = {i: word for i, word in enumerate(words)}
w2i = {word: i for i, word in enumerate(words)}

print(i2w)
print(w2i)

{0: 'i', 1: 'like', 2: 'to', 3: 'eat', 4: 'pizza', 5: 'and', 6: 'play', 7: 'tennis'}
{'i': 0, 'like': 1, 'to': 2, 'eat': 3, 'pizza': 4, 'and': 5, 'play': 6, 'tennis': 7}
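
The two dictionaries are exact inverses only when every word is unique. The sketch below uses a word list with repeats (made up for illustration) to show what a dictionary comprehension does with duplicate keys:

```python
words = ['i', 'like', 'pizza', 'i', 'like', 'tennis']

i2w = dict(enumerate(words))

# Inverting a dictionary: duplicate words collapse to one key,
# and the last index seen wins
w2i = {word: i for i, word in i2w.items()}

print(len(i2w), len(w2i))  # 6 4
print(w2i['i'])            # 3
```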


# Set comprehension

A set comprehension avoids an explicit for loop.

## Don't do this

In [57]:
words = ['i', 'like', 'to', 'eat', 'pizza', 'and', 'play', 'tennis']

vocab = set()
for word in words:
    vocab.add(word)
    
print(vocab)

{'play', 'tennis', 'and', 'like', 'to', 'eat', 'i', 'pizza'}


## Do this

In [58]:
vocab = {word for word in words}
print(vocab)

{'play', 'tennis', 'and', 'like', 'to', 'eat', 'i', 'pizza'}
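
When the elements are copied unchanged, `set(words)` is shorter still; a set comprehension earns its keep once each element is transformed or filtered. A small sketch with a made-up word list:

```python
words = ['I', 'like', 'Pizza', 'i', 'LIKE', 'pizza']

# With no transformation, set(words) is the shortest spelling
distinct = set(words)
print(len(distinct))  # 6

# A comprehension pays off when each element is transformed or filtered
vocab = {word.lower() for word in words if len(word) > 1}
print(sorted(vocab))  # ['like', 'pizza']
```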


# Chained comparison operators

Comparisons joined with `and`, like the one below, can be written as a single chained comparison. Notice how the `and` and the repeated `y` disappear.

## Don't do this

In [59]:
x = 10
y = 15
z = 20

if x <= y and y <= z:
    print('hi')

hi


## Do this

In [60]:
if x <= y <= z:
    print('hi')

hi


# Falsy and truthy 

It's enough to use the variable itself to test for truthiness; there is no need to compare it against `True`.

## Don't do this

In [61]:
is_male = True

if is_male == True:
    print('is male is true')

is male is true


## Do this

In [62]:
if is_male:
    print('is male is true')

is male is true
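
Truthiness applies to more than booleans. The sketch below lists the common falsy values and shows the one case where an explicit comparison is still appropriate, namely testing for `None` with `is`; the `describe` function is hypothetical.

```python
# Empty containers, zero, the empty string, and None are all falsy
falsy_values = [[], {}, set(), '', 0, 0.0, None]
print(any(falsy_values))  # False

names = []
if not names:
    print('no names')  # preferred over len(names) == 0

# To tell None apart from other falsy values, compare with `is`
def describe(count=None):
    if count is None:
        return 'not provided'
    return f'count is {count}'

print(describe())   # not provided
print(describe(0))  # count is 0
```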


# Ternary operator

Python has no `?:` operator, but its conditional expression, `value_if_true if condition else value_if_false`, serves as the ternary operator and can replace a simple `if/else` statement.

## Don't do this

In [63]:
is_male = True

if is_male:
    gender = 'male'
else:
    gender = 'female'
    
print(gender)

male


## Do this

In [64]:
gender = 'male' if is_male else 'female'
print(gender)

male


# String interpolation

Note how we have to pass `name` in twice. If we use variable names inside the substitution placeholders, we only have to pass it in once. Also, note the use of an `f-string` and a `Template`.

## Don't do this

In [65]:
name = 'John'
food = 'pizza'
sport = 'tennis'

sentence = '{} likes to eat {}. {} likes to play {}.'.format(name, food, name, sport)
print(sentence)

John likes to eat pizza. John likes to play tennis.


## Do this

In [66]:
name = 'John'
food = 'pizza'
sport = 'tennis'

# variable substitution
sentence = '{name} likes to eat {}. {name} likes to play {}.'.format(food, sport, name=name)
print(sentence)

# f-string
sentence = f'{name} likes to eat {food}. {name} likes to play {sport}.'
print(sentence)

# string template
from string import Template
sentence = Template('$name likes to eat $food. $name likes to play $sport.')
print(sentence.substitute(name=name, food=food, sport=sport))

John likes to eat pizza. John likes to play tennis.
John likes to eat pizza. John likes to play tennis.
John likes to eat pizza. John likes to play tennis.
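
f-strings also accept format specifiers and conversions inside the braces. A short sketch with made-up values:

```python
name = 'John'
score = 88.456
rank = 3

# Format specifiers follow a colon inside the braces
rounded = f'{name} scored {score:.1f}'
padded = f'rank {rank:03d}'
# !r applies repr() to the value before formatting
quoted = f'{name!r} has {len(name)} letters'

print(rounded)  # John scored 88.5
print(padded)   # rank 003
print(quoted)   # 'John' has 4 letters
```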


# Don't Repeat Yourself (DRY)

It's easier to do `'-'*15` to produce 15 consecutive dashes, than to type them all out.

## Don't do this

In [67]:
print('---------------')

---------------


## Do this

In [68]:
print('-'*15)

---------------


# Double underscores, dunders, `__str__`

Exploit dunders when doing object-oriented programming in Python. In particular, override the `__str__` dunder to give the object a human-readable representation when it is printed.

## Don't do this

In [69]:
class Student():
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name
        
student = Student('John', 'Doe')
print(student)

<__main__.Student object at 0x105e233c8>


## Do this 

In [70]:
class Student():
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name
        
    def __str__(self):
        return f'{self.first_name} {self.last_name}'
        
student = Student('John', 'Doe')
print(student)

John Doe
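
`__str__` is commonly paired with `__repr__`, which Python uses in the interactive prompt and when the object appears inside a container. A sketch that adds one to the class above; the exact `__repr__` format is a matter of taste.

```python
class Student:
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    def __str__(self):
        # Readable form, used by print() and str()
        return f'{self.first_name} {self.last_name}'

    def __repr__(self):
        # Unambiguous form, used by repr() and when shown inside containers
        return f'Student({self.first_name!r}, {self.last_name!r})'

student = Student('John', 'Doe')
print(student)    # John Doe
print([student])  # [Student('John', 'Doe')]
```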
