# Coding Best Practices with Python

In [1]:
import pandas as pd
import numpy as np
import sys

# 1. Writing Efficient Python Code

Defining "efficient":      

Efficient code satisfies two key criteria:  
1) minimal completion time (fast runtime)  
2) minimal resource consumption (small memory footprint)  
In other words, efficient code reduces latency and memory overhead.  

Defining "Pythonic":       

Pythonic code tends to be less verbose and easier to interpret (e.g. using a list comprehension rather than a for loop with append). Pythonic code is usually efficient code.  


Suppose you wanted to collect the names in the list below that have six letters or more. In other programming languages, the typical approach is to create an index variable (i), use i to iterate over the list, and use an if statement to collect the names with six letters or more.

In [2]:
names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']

In [3]:
# Print the list created using the Non-Pythonic approach
i = 0
new_list= []
while i < len(names):
    if len(names[i]) >= 6:
        new_list.append(names[i])
    i += 1
print(new_list)

['Kramer', 'Elaine', 'George', 'Newman']


In [4]:
# more pythonic
# Print the list created by looping over the contents of names
better_list = []
for name in names:
    if len(name) >= 6:
        better_list.append(name)
print(better_list)

['Kramer', 'Elaine', 'George', 'Newman']


In [5]:
# best pythonic
# Print the list created by using list comprehension
best_list = [name for name in names if len(name) >= 6]
print(best_list)

['Kramer', 'Elaine', 'George', 'Newman']
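
The three versions above produce the same list, so the choice between them comes down to readability and speed. A minimal sketch, using the standard-library `timeit` module rather than notebook magics, checks that the results agree and times each version. The helper names (`with_while`, `with_for`, `with_comprehension`) are made up for illustration, and the measured times will vary by machine.

```python
import timeit

names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']

def with_while(names):
    # Index-based loop, as in the non-Pythonic version
    i = 0
    result = []
    while i < len(names):
        if len(names[i]) >= 6:
            result.append(names[i])
        i += 1
    return result

def with_for(names):
    # Loop directly over the items
    result = []
    for name in names:
        if len(name) >= 6:
            result.append(name)
    return result

def with_comprehension(names):
    # List comprehension
    return [name for name in names if len(name) >= 6]

# All three approaches should agree on the result
assert with_while(names) == with_for(names) == with_comprehension(names)

# Time 10,000 calls of each version (seconds; machine dependent)
for func in (with_while, with_for, with_comprehension):
    elapsed = timeit.timeit(lambda: func(names), number=10_000)
    print(f"{func.__name__}: {elapsed:.4f} s")
```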


In [6]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


## 1.1 Building with built-ins

Built-in components ship with every standard Python installation; the bundled modules make up the Python Standard Library.  
Built-in types: list, tuple, set, dict, and others.  
Built-in functions: print(), len(), range(), round(), enumerate(), map(), zip(), etc.  
Standard-library modules: os, sys, itertools, collections, math, etc.  
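
`zip()` and the `collections` module are listed above but not demonstrated in this section. As a small sketch (the `arrival_minutes` values and the `length_counts` counter are invented for illustration), `zip()` pairs items from two iterables and `collections.Counter` tallies hashable items:

```python
from collections import Counter

names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']
arrival_minutes = [10, 20, 30, 40, 50]  # hypothetical data for illustration

# zip() returns a lazy zip object; convert it to a list to see the pairs
pairs = list(zip(names, arrival_minutes))
print(pairs)

# Counter tallies how often each name length occurs
length_counts = Counter(len(name) for name in names)
print(length_counts)
```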


In [7]:
# range()
# Explicitly typing a list of numbers:
# nums = [0,1,2,3,4,5,6,7,8,9,10]

# using range(start,stop) and list
nums = range(0,11)
nums_list = list(nums)
print(nums_list)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


In [8]:
# range(stop)
nums = range(11)
nums_list = list(nums)
print(nums_list)

# note that range func returns a range object, which we can convert into a list

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
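
One reason to keep a range object as it is, rather than converting it to a list straight away, is that a range stores only its start, stop, and step and produces values on demand. A rough sketch using `sys.getsizeof` (exact byte counts depend on the Python build, so none are shown):

```python
import sys

big_range = range(1_000_000)
big_list = list(big_range)

# The range object's size does not grow with the number of values it covers
print(sys.getsizeof(range(10)), sys.getsizeof(big_range))

# The list holds a reference to every element, so it is far larger
print(sys.getsizeof(big_list))

# A range still supports len(), indexing, and membership tests without a list
print(len(big_range), big_range[-1], 999_999 in big_range)
```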


In [9]:
# range() with a step value
even_nums = range(2,11,2)
even_nums_list = list(even_nums)
print(even_nums_list)

[2, 4, 6, 8, 10]


In [10]:
# enumerate()
# enumerate() creates an index item pair for each item in the object provided.
letters = ["a", "b", "c", "d"]
indexed_letters = enumerate(letters)

indexed_letters_list = list(indexed_letters)
print(indexed_letters_list)

# enumerate() returns an enumerate object, which can then be converted into a list.

[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]


In [11]:
# enumerate with starting index
letters = ["a", "b", "c", "d"]
indexed_letters2 = enumerate(letters, start=5)

indexed_letters2_list = list(indexed_letters2)
print(indexed_letters2_list)

[(5, 'a'), (6, 'b'), (7, 'c'), (8, 'd')]


In [12]:
# map()
# map applies a function to each element in an object

nums = [1.5, 2.3, 3.4, 4.6, 5.0]

rnd_nums = map(round, nums)
print(list(rnd_nums))

[2, 2, 3, 5, 5]
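
In the output above, 1.5 rounds to 2, but this does not mean every half rounds up: Python 3 rounds exact halves to the nearest even integer. A short illustration, which also shows how a lambda lets `map()` pass a second argument to `round()`:

```python
halves = [0.5, 1.5, 2.5, 3.5]

# Exact halves go to the nearest even integer in Python 3
rounded_halves = list(map(round, halves))
print(rounded_halves)  # [0, 2, 2, 4]

# round() also accepts a number of decimal places as a second argument
nums = [1.234, 5.678]
one_decimal = list(map(lambda x: round(x, 1), nums))
print(one_decimal)  # [1.2, 5.7]
```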


In [13]:
# map() with lambda func.
nums = [1,2,3,4,5]
sqrd_nums = map(lambda x: x**2, nums)

print(list(sqrd_nums))

[1, 4, 9, 16, 25]
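
A map object is an iterator, so it can be consumed only once. A small sketch of that behaviour, alongside the equivalent list comprehension:

```python
nums = [1, 2, 3, 4, 5]
sqrd_nums = map(lambda x: x**2, nums)

first_pass = list(sqrd_nums)
second_pass = list(sqrd_nums)  # the iterator is already exhausted

print(first_pass)   # [1, 4, 9, 16, 25]
print(second_pass)  # []

# A list comprehension gives the same values as a reusable list
sqrd_comp = [x**2 for x in nums]
print(sqrd_comp == first_pass)  # True
```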


In [14]:
# Create a range object that goes from 0 to 5
nums = range(0,6)
print(type(nums))

# Convert nums to a list
nums_list = list(nums)
print(nums_list)

# Create a new list of odd numbers from 1 to 11 by unpacking a range object
nums_list2 = [*range(1,12,2)]
print(nums_list2)

<class 'range'>
[0, 1, 2, 3, 4, 5]
[1, 3, 5, 7, 9, 11]


Suppose you had a list of people who arrived at a party you are hosting. The list is ordered by arrival (Jerry was the first to arrive, followed by Kramer, etc.).

In [15]:
names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']
# non-pythonic way
indexed_names = []
for i in range(len(names)):
    index_name = (i, names[i])
    indexed_names.append(index_name)
    
print(indexed_names)

[(0, 'Jerry'), (1, 'Kramer'), (2, 'Elaine'), (3, 'George'), (4, 'Newman')]


In [16]:
# more pythonic
# Rewrite the for loop to use enumerate
indexed_names = []
for i,name in enumerate(names):
    index_name = (i,name)
    indexed_names.append(index_name) 
print(indexed_names)

# even more pythonic
# Rewrite the above for loop using list comprehension
indexed_names_comp = [(i,name) for i,name in enumerate(names)]
print(indexed_names_comp)

# very pythonic
# Unpack an enumerate object with a starting index of one
indexed_names_unpack = [*enumerate(names, start=1)]
print(indexed_names_unpack)

[(0, 'Jerry'), (1, 'Kramer'), (2, 'Elaine'), (3, 'George'), (4, 'Newman')]
[(0, 'Jerry'), (1, 'Kramer'), (2, 'Elaine'), (3, 'George'), (4, 'Newman')]
[(1, 'Jerry'), (2, 'Kramer'), (3, 'Elaine'), (4, 'George'), (5, 'Newman')]


Suppose you wanted to create a new list (called names_uppercase) that converts all the letters in each name to uppercase. You could accomplish this with the for loop below:

In [17]:
names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']

# not so pythonic
names_uppercase = []

for name in names:
  names_uppercase.append(name.upper())

print(names_uppercase)

['JERRY', 'KRAMER', 'ELAINE', 'GEORGE', 'NEWMAN']


In [18]:
# Use map to apply str.upper to each element in names
names_map  = map(str.upper, names)

# Print the type of the names_map
print(type(names_map))

# Unpack names_map into a list
names_uppercase = [*names_map]

# Print the list created above
print(names_uppercase)

<class 'map'>
['JERRY', 'KRAMER', 'ELAINE', 'GEORGE', 'NEWMAN']


## 1.2 The power of NumPy arrays

NumPy arrays provide a fast and memory-efficient alternative to Python lists.  
NumPy arrays are homogeneous, meaning all elements must be of the same type.  


In [19]:
nums_list = list(range(5))
print(nums_list)

[0, 1, 2, 3, 4]


In [20]:
nums_np = np.array(range(5))
print(nums_np)

[0 1 2 3 4]


In [21]:
# np array homogeneity
nums_np_ints = np.array([1,2,3])
print(nums_np_ints)

print(nums_np_ints.dtype)

nums_np_floats = np.array([1,2.5,3])
print(nums_np_floats)

print(nums_np_floats.dtype)

[1 2 3]
int64
[1.  2.5 3. ]
float64
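
Homogeneity means NumPy picks one dtype that can represent every element. A brief sketch of how mixed inputs are coerced, and how `astype` converts explicitly (the variable names are illustrative):

```python
import numpy as np

# Booleans and ints are upcast to float when a float is present
mixed = np.array([True, 2, 3.5])
print(mixed, mixed.dtype)

# astype() returns a converted copy; float-to-int conversion truncates
as_ints = mixed.astype(int)
print(as_ints)
```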


In [22]:
# np array broadcasting
nums_np = np.array([-2,-1,0,1,2])
nums_np**2

array([4, 1, 0, 1, 4])
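
For contrast, a plain Python list does not support element-wise arithmetic, so the same operation needs a comprehension (or a loop). A minimal sketch:

```python
nums = [-2, -1, 0, 1, 2]

# Lists do not define ** with an integer, so this raises TypeError
try:
    nums ** 2
    raised = False
except TypeError as err:
    raised = True
    print("TypeError:", err)

# The list equivalent of the broadcast operation
squared = [num ** 2 for num in nums]
print(squared)  # [4, 1, 0, 1, 4]
```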

In [23]:
# 2-D list/array comparison

# list
nums2 = [[1,2,3],
       [4,5,6]]

# array 
nums_np = np.array(nums2)

# list indexing
print(nums2[0][1])
# array indexing
print(nums_np[0,1])

#return first col of list
print([row[0] for row in nums2])

# return first col of array
print(nums_np[:,0])

2
2
[1, 4]
[1 4]


In [24]:
# np boolean indexing
nums = [-2, -1, 0, 1, 2]
nums_np = np.array(nums)

# bool mask
print(nums_np > 0)

print(nums_np[nums_np > 0])


[False False False  True  True]
[1 2]
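
Masks can also be combined. As a sketch, element-wise `&` joins two conditions (each wrapped in parentheses, because `&` binds more tightly than the comparison operators), and `np.where` chooses between two values based on a mask:

```python
import numpy as np

nums_np = np.array([-2, -1, 0, 1, 2])

# Keep values strictly between -2 and 2
inner = nums_np[(nums_np > -2) & (nums_np < 2)]
print(inner)

# Replace negative values with 0 and keep the rest
clipped = np.where(nums_np < 0, 0, nums_np)
print(clipped)
```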


You have a list of guests (the names list). Each guest, for whatever reason, has decided to show up to the party in 10-minute increments. For example, Jerry shows up to Festivus 10 minutes into the party's start time, Kramer shows up 20 minutes into the party, and so on and so forth.        

We want to write a few simple lines of code, using the built-ins we have covered, to welcome each guest and tell them how many minutes late they are to the party.

In [25]:
names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']

In [26]:
# Create a list of arrival times
arrival_times = [*range(10,60,10)]

print(arrival_times)

# the clock runs 3 minutes fast, so subtract 3 from each time
# Convert arrival_times to an array and update the times
arrival_times_np = np.array(arrival_times)
new_times = arrival_times_np - 3
print(new_times)

# Use list comprehension and enumerate to pair guests to new times
guest_arrivals = [(names[i],time) for i,time in enumerate(new_times)]
print(guest_arrivals)



[10, 20, 30, 40, 50]
[ 7 17 27 37 47]
[('Jerry', 7), ('Kramer', 17), ('Elaine', 27), ('George', 37), ('Newman', 47)]
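
The pairing step can also be written with `zip()`, which avoids indexing into `names`, and the pairs can then be turned into the welcome messages described above. A sketch on the same data, using a plain list instead of a NumPy array; the `welcome_guest` helper and its wording are invented here:

```python
names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']
arrival_times = [*range(10, 60, 10)]

# Same 3-minute adjustment as above, done with a comprehension
new_times = [time - 3 for time in arrival_times]

# zip() pairs each guest with the corresponding adjusted time
guest_arrivals = list(zip(names, new_times))

def welcome_guest(guest_and_time):
    """Build a welcome message from a (name, minutes_late) tuple."""
    guest, minutes_late = guest_and_time
    return f"Welcome to Festivus {guest}... You're {minutes_late} min late."

# map() applies the helper to every (guest, time) pair
for message in map(welcome_guest, guest_arrivals):
    print(message)
```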


## 1.3 Runtime and profiling code 

Magic commands are IPython enhancements on top of normal Python syntax. They are prefixed with "%" (line magics) or "%%" (cell magics).  

IPython provides the magic command %timeit for measuring runtime.

In [27]:
%timeit rand_nums = np.random.rand(1000)
# -r sets the number of runs: how many timing repetitions are used to estimate runtime
# -n sets the number of loops: how many times the code is executed per run

9.42 µs ± 107 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [28]:
%timeit -r2 -n10 rand_nums = np.random.rand(1000)
# 2 runs of 10 loops each, i.e. the statement is executed 20 times in total

The slowest run took 7.41 times longer than the fastest. This could mean that an intermediate result is being cached.
42.9 µs ± 32.7 µs per loop (mean ± std. dev. of 2 runs, 10 loops each)
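
Outside IPython the `%timeit` magic is unavailable, but the standard-library `timeit` module offers comparable control. A rough sketch, where `repeat` plays the role of `-r` and `number` the role of `-n`; it times a list comprehension instead of `np.random.rand` to stay dependency-free:

```python
import timeit

# 2 runs, each executing the statement 10 times
run_totals = timeit.repeat(
    stmt="nums = [x for x in range(1000)]",
    repeat=2,
    number=10,
)

# Each entry is the total time for one run; divide by number for per-loop time
per_loop = [total / 10 for total in run_totals]
print(per_loop)
```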


In [29]:
# single line of code
%timeit nums = [x for x in range(10)]

646 ns ± 12.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [30]:
# multiple lines of code
# %%timeit
# nums=[]
# for x in range(10):
#     nums.append(x)

In [31]:
# saving the timeit output
times = %timeit -o rand_nums = np.random.rand(1000)
print(times)

9.23 µs ± 73.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
9.23 µs ± 73.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [32]:
times.timings

[9.228286860000025e-06,
 9.172768669999982e-06,
 9.227165070000005e-06,
 9.206729550000006e-06,
 9.321932890000007e-06,
 9.340711689999992e-06,
 9.114619270000013e-06]

In [33]:
times.best

9.114619270000013e-06

In [34]:
times.worst

9.340711689999992e-06

In [35]:
# python data structure creation

# using formal name
formal_list = list()
formal_dict = dict()
formal_tuple = tuple()

# using (shorthand) literal syntax
literal_list = []
literal_dict = {}
literal_tuple = ()

# compare timing of creation
# formal
f_time = %timeit -o formal_dict = dict()

# literal
l_time = %timeit -o literal_dict = {}

103 ns ± 1.48 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
32.8 ns ± 0.492 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
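
One way to see why the literal is faster is to inspect the bytecode with the standard-library `dis` module. In this sketch (the `opnames` helper is made up for illustration), `{}` compiles to a `BUILD_MAP` instruction, whereas `dict()` has to look up the name `dict` and call it. The exact opcode names for the call vary between Python versions.

```python
import dis

def opnames(source):
    """Return the bytecode instruction names for a source expression."""
    code = compile(source, "<example>", "eval")
    return [instr.opname for instr in dis.get_instructions(code)]

literal_ops = opnames("{}")
formal_ops = opnames("dict()")

print(literal_ops)
print(formal_ops)

# The literal builds the dict directly; the formal name goes through a call
print("BUILD_MAP" in literal_ops)  # True
print("BUILD_MAP" in formal_ops)   # False
```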


In [36]:
# Create a list of integers (0-50) using list comprehension
%timeit nums_list_comp = [num for num in range(51)]

# Create a list of integers (0-50) by unpacking range
%timeit nums_unpack = [*range(51)]

1.67 µs ± 26.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
553 ns ± 8.34 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


## 1.4 Code profiling for runtime

What if we want to time a large code base, or see line-by-line timings within a function? This calls for code profiling.  

Code profiling is a technique for describing how long, and how often, various parts of a program are executed.  
It supports line-by-line analysis and provides statistics on individual pieces of code without wrapping each one in the %timeit magic command.  

Package used: line_profiler

In [37]:
heroes = ["Batman", "Superman", " Wonder Woman"]

hts = np.array([188.0, 191.0, 183.0])
wts = np.array([95.0, 101.0, 74.0])

In [38]:
def convert_units(heroes, heights, weights):
    new_hts = [ht*0.39370 for ht in heights]
    new_wts = [wt*0.39370 for wt in weights]
    
    hero_data = {}
    
    for i,hero in enumerate(heroes):
        hero_data[hero] = (new_hts[i], new_wts[i])
        
    return hero_data

In [39]:
convert_units(heroes, hts, wts)

{'Batman': (74.01559999999999, 37.4015),
 'Superman': (75.19669999999999, 39.7637),
 ' Wonder Woman': (72.0471, 29.1338)}

Note: `convert_units` as written applies the cm-to-inch factor 0.39370 to the weights as well. A kg-to-lb conversion would use 2.20462, as `convert_units_broadcast` in In [43] does. The outputs and profiles shown for `convert_units` reflect the code as written.

In [40]:
# use %timeit
%timeit convert_units(heroes, hts, wts)
# this only gives us the total execution time
# we could technically run %timeit on each line of the function separately

3.81 µs ± 83.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [41]:
# use line_profiler extension
%load_ext line_profiler

In [42]:
# the magic command %lprun comes from line_profiler
# -f: indicates that we want to profile a function
# followed by the name of the function, without parentheses,
# and then the exact function call, with arguments
%lprun -f convert_units convert_units(heroes, hts, wts)
# Hits: how many times the line was executed
# Time: total time spent on the line, in timer units
# Per Hit: average time spent per execution of the line (Time / Hits)
# % Time: percentage of the function's total time spent on the line

Timer unit: 1e-06 s

Total time: 2.3e-05 s
File: <ipython-input-38-57a50ec7a699>
Function: convert_units at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
     1                                           def convert_units(heroes, heights, weights):
     2         1         11.0     11.0     47.8      new_hts = [ht*0.39370 for ht in heights]
     3         1          4.0      4.0     17.4      new_wts = [wt*0.39370 for wt in weights]
     4                                               
     5         1          1.0      1.0      4.3      hero_data = {}
     6                                               
     7         4          4.0      1.0     17.4      for i,hero in enumerate(heroes):
     8         3          3.0      1.0     13.0          hero_data[hero] = (new_hts[i], new_wts[i])
     9                                                   
    10         1          0.0      0.0      0.0      return hero_data
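
When `line_profiler` is not installed, the standard library's `cProfile` gives a coarser, function-level view: call counts and cumulative time per function rather than per line. A minimal sketch, swapping plain lists in for the NumPy arrays; it uses 2.20462 for the weights, the factor that appears in `convert_units_broadcast`, which is an assumption about the intended conversion:

```python
import cProfile
import io
import pstats

def convert_units(heroes, heights, weights):
    new_hts = [ht * 0.39370 for ht in heights]  # cm -> inches
    new_wts = [wt * 2.20462 for wt in weights]  # kg -> lbs (assumed factor)
    hero_data = {}
    for i, hero in enumerate(heroes):
        hero_data[hero] = (new_hts[i], new_wts[i])
    return hero_data

heroes = ["Batman", "Superman", "Wonder Woman"]
hts = [188.0, 191.0, 183.0]
wts = [95.0, 101.0, 74.0]

# Profile a single call to the function
profiler = cProfile.Profile()
result = profiler.runcall(convert_units, heroes, hts, wts)

# Collect the report in a string buffer and show the top entries
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(5)
print(buffer.getvalue())
```

Unlike `%lprun`, this report has one row per function, so it is better suited to finding which function is slow than which line inside it.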

In [43]:
def convert_units_broadcast(heroes, heights, weights):

    # Array broadcasting instead of list comprehension
    new_hts = heights * 0.39370
    new_wts = weights * 2.20462

    hero_data = {}

    for i,hero in enumerate(heroes):
        hero_data[hero] = (new_hts[i], new_wts[i])

    return hero_data

In [44]:
%lprun -f convert_units_broadcast convert_units_broadcast(heroes, hts, wts)

Timer unit: 1e-06 s

Total time: 3.4e-05 s
File: <ipython-input-43-097b3089decf>
Function: convert_units_broadcast at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
     1                                           def convert_units_broadcast(heroes, heights, weights):
     2                                           
     3                                               # Array broadcasting instead of list comprehension
     4         1         21.0     21.0     61.8      new_hts = heights * 0.39370
     5         1          3.0      3.0      8.8      new_wts = weights * 2.20462
     6                                           
     7         1          1.0      1.0      2.9      hero_data = {}
     8                                           
     9         4          4.0      1.0     11.8      for i,hero in enumerate(heroes):
    10         3          5.0      1.7     14.7          hero_data[hero] = (new_hts[i], new_wts[i])
    11                               

## 1.5 Code profiling for memory usage

We can use the built-in module sys.  
This module contains system-specific functions, including sys.getsizeof(), which returns the size of an object in bytes, e.g. sys.getsizeof(nums_list).

In [45]:
# single obj size
nums_np = np.array(range(1000))
sys.getsizeof(nums_np)

8096
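
`sys.getsizeof` reports only the object's own size, not the size of the objects it refers to. For containers this can understate the real footprint, as this sketch with a nested list suggests (byte counts vary by platform, so only comparisons are shown):

```python
import sys

inner = list(range(1000))
outer = [inner]  # a list holding one reference to a much larger list

# outer's reported size covers a single reference, not inner's contents
print(sys.getsizeof(outer) < sys.getsizeof(inner))  # True

# A rough deep size: the container plus each element it references
deep_size = sys.getsizeof(inner) + sum(sys.getsizeof(n) for n in inner)
print(deep_size > sys.getsizeof(inner))  # True
```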

In [46]:
# line-by-line memory footprint
%load_ext memory_profiler

%mprun -f convert_units convert_units(heroes, hts, wts)
# any function profiled for memory must be defined in a physical file and imported,
# so this fails here; the function has to be saved in a .py file first

ERROR: Could not find file <ipython-input-38-57a50ec7a699>
NOTE: %mprun can only be used on functions defined in physical files, and not in the IPython environment.
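
One way to satisfy this requirement is to write the function to a `.py` file and import it. The sketch below does so with `pathlib` and a temporary directory (both choices are assumptions for illustration; in a notebook, the `%%writefile` cell magic is a common shortcut for the same step). The file name `hero_funcs.py` matches the import used in the next cell.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source of the function to be profiled, as defined earlier in the notebook
source = '''
def convert_units(heroes, heights, weights):
    new_hts = [ht*0.39370 for ht in heights]
    new_wts = [wt*0.39370 for wt in weights]

    hero_data = {}

    for i,hero in enumerate(heroes):
        hero_data[hero] = (new_hts[i], new_wts[i])

    return hero_data
'''

# Write the module to a directory and make that directory importable
module_dir = Path(tempfile.mkdtemp())
(module_dir / "hero_funcs.py").write_text(source)
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()

hero_funcs = importlib.import_module("hero_funcs")

# The function now points at a real file on disk
print(hero_funcs.convert_units.__code__.co_filename)
```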





In [47]:
from hero_funcs import convert_units

%load_ext memory_profiler

%mprun -f convert_units convert_units(heroes, hts, wts)

# results will differ across platforms and runs

The memory_profiler extension is already loaded. To reload it, use:
  %reload_ext memory_profiler



Filename: /Users/XavierTang/Documents/Data Science/Python/python_basics/hero_funcs.py

Line #    Mem usage    Increment   Line Contents
     1     96.2 MiB     96.2 MiB   def convert_units(heroes, heights, weights):
     2     96.2 MiB      0.0 MiB       new_hts = [ht*0.39370 for ht in heights]
     3     96.2 MiB      0.0 MiB       new_wts = [wt*0.39370 for wt in weights]
     4                                 
     5     96.2 MiB      0.0 MiB       hero_data = {}
     6                                 
     7     96.2 MiB      0.0 MiB       for i,hero in enumerate(heroes):
     8     96.2 MiB      0.0 MiB           hero_data[hero] = (new_hts[i], new_wts[i])
     9                                     
    10     96.2 MiB      0.0 MiB       return hero_data
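
With inputs this small, every line shows a 0.0 MiB increment. As an alternative that needs no extra package, the standard-library `tracemalloc` module can report current and peak traced memory around a block of code. A sketch with a larger, made-up workload so that the allocation becomes visible:

```python
import tracemalloc

tracemalloc.start()

# Hypothetical workload: convert 100,000 heights from cm to inches
heights = [float(h) for h in range(100_000)]
new_hts = [ht * 0.39370 for ht in heights]

# Both values are in bytes and cover allocations made since start()
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"current: {current / 1024**2:.1f} MiB, peak: {peak / 1024**2:.1f} MiB")
```

Note that `tracemalloc` tracks only memory allocated by Python itself, so its numbers are not directly comparable to the process-level figures reported by `%mprun`.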