Accelerate your Python code: a path to more efficient programming

Meng Lu

Of course, it takes a long time to gain skills and develop intuition, but this tutorial can already help you build some good habits and program more efficiently.

Main message: 

1. Use %timeit (a single % times one line, a double %% times the whole cell, e.g. %%timeit) and %prun to measure running time. To profile memory use, use %memit and %mprun (these require installing the memory_profiler package and loading its IPython extension).



In [52]:
#! pip install memory_profiler
#%load_ext memory_profiler

Collecting memory_profiler
  Downloading memory_profiler-0.59.0.tar.gz (38 kB)
Building wheels for collected packages: memory-profiler
  Building wheel for memory-profiler (setup.py) ... done
  Created wheel for memory-profiler: filename=memory_profiler-0.59.0-py3-none-any.whl size=31307 sha256=63a1e820c98aa05489156a201d5e28450093079f283beffbae40de2938b2052f
  Stored in directory: /Users/meng/Library/Caches/pip/wheels/5d/66/23/1e7f1719b959ee9093d5025dbdcbe4c43a548ca510997f318f
Successfully built memory-profiler
Installing collected packages: memory-profiler
Successfully installed memory-profiler-0.59.0


In [43]:
%timeit x = map(lambda x: x**2, range(1,10))

318 ns ± 5.61 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [57]:
# Peak memory is the highest memory usage of the Python process (here, the IPython kernel) while the statement runs.

# Increment is the rise in memory usage relative to the level just before the statement runs (i.e. increment = peak memory - starting memory).
%memit map(lambda x: x**2, range(1,10))

peak memory: 89.45 MiB, increment: 0.20 MiB
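
One caveat about the two cells above: in Python 3, `map` returns a lazy iterator, so timing or memory-profiling a bare `map(...)` call measures only the creation of the iterator, not the squaring itself. The following is a minimal sketch using the standard-library `timeit` module (the variable names are illustrative and not part of the original notebook) that shows the difference once the result is materialised with `list`:

```python
import timeit

# Creating the map object does no work yet: map is lazy in Python 3.
lazy = map(lambda x: x**2, range(1, 10))

# Wrapping it in list() forces every element to be computed.
squares = list(map(lambda x: x**2, range(1, 10)))
print(squares)

# Time both variants; only the second one includes the actual squaring.
t_lazy = timeit.timeit(lambda: map(lambda x: x**2, range(1, 10)), number=10_000)
t_eager = timeit.timeit(lambda: list(map(lambda x: x**2, range(1, 10))), number=10_000)
print(f"map only: {t_lazy:.4f} s, list(map): {t_eager:.4f} s")
```

In a notebook, the equivalent would be `%timeit list(map(lambda x: x**2, range(1, 10)))`.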


2. pandas and NumPy already optimise many operations. Use their built-in functions instead of writing your own.

3. Replace for loops with vectorisation or list comprehensions where possible, or at least with apply/map/filter.

4. Use numpy.where and numpy.select to replace if/else.



5. Useful to know: pandas is built on NumPy, so NumPy optimisations also apply to pandas.

In [1]:
import pandas as pd
import numpy as np

#download data
spreadurl = 'https://raw.githubusercontent.com/mengluchu/uncertainty/master/data_vis_exp/DENL17_uc.csv'
# load the data
ap = pd.read_csv(spreadurl)

ap_road = ap.filter (regex="road_class_2_50|wkd_day_value")

#### pandas and NumPy already optimise many operations. Use their built-in functions instead of writing your own.


"The practice of replacing explicit loops with array expressions is commonly referred to as vectorisation." 


For example, to add 1 to each element of a column, we can proceed in three ways:
1. a for loop or while loop (code skipped here),
2. apply,
3. the built-in vectorised operation.

In [5]:
%timeit ap_road.apply(lambda x: x+1, axis=1)

68.3 ms ± 2.57 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [6]:
%timeit ap_road+1

77 µs ± 1.48 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
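
To confirm that the two approaches give the same values and differ only in speed, here is a small self-contained sketch. The DataFrame `df` is a hypothetical stand-in for `ap_road` with made-up numbers, so it runs without downloading the data:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for ap_road: two numeric columns with made-up values.
df = pd.DataFrame({
    "road_class_2_50": np.arange(5, dtype=float),
    "wkd_day_value": np.linspace(10.0, 30.0, 5),
})

# Row-wise apply: the Python lambda is called once per row.
via_apply = df.apply(lambda row: row + 1, axis=1)

# Vectorised: a single arithmetic operation on the whole frame.
via_vector = df + 1

# Both routes should produce the same numbers.
same_values = np.allclose(via_apply.to_numpy(), via_vector.to_numpy())
print(same_values)
```

The vectorised form runs the arithmetic in compiled code, while `apply` with `axis=1` calls a Python function for every row, which is why the timings above differ so much.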


In [46]:
%%prun -s cumulative -l 6 
# b is the row-wise function defined in the numpy.where section below.
ap_road.apply(b, axis = 1) # -s cumulative sorts by cumulative time; -l 6 limits the output to the first six entries
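
Outside IPython, similar information can be obtained with the standard-library `cProfile` and `pstats` modules. The sketch below is only an illustration: `slow_sum` is a hypothetical workload, not a function from this tutorial. It sorts by cumulative time and limits the report to six entries, mirroring `-s cumulative -l 6`:

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical workload: sum of squares with an explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


# Profile a single call to the workload.
profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100_000)
profiler.disable()

# Sort by cumulative time and keep only the first six entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(6)
report = stream.getvalue()
print(report)
```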

 

#### Whenever possible, avoid "for loops"

Replace for loops with vectorisation or list comprehensions where possible, or at least with apply/map/filter.

In [62]:
# Your homework
# Given a list, list(range(100)),
# replace the last ten values by adding 20 to each of them, e.g. 91 becomes 111.

In [63]:
%%timeit
#for loop
nums = list(range(100))

counter=10
 
for i in nums[-10:]:
    nums[ -counter] = i + 20
    counter -= 1


1.87 µs ± 29.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [64]:
%%timeit

nums = list(range(100))

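# for loop with append: collect the updated values in a new list
last_ten = nums[90:100]
add = []
for x in last_ten:
    add.append(x + 20)
# write the updated values back so that nums itself is changed
nums[90:100] = add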


1.77 µs ± 12.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [65]:
%%timeit
#list comprehension

nums = list(range(100))
nums[-10:] = [x + 20 for x in nums[-10:]]

1.74 µs ± 70.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
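
The homework can also be solved with a NumPy array instead of a list, which removes the Python-level loop entirely. This is a minimal sketch and not one of the original timed solutions. For only 100 elements the array overhead may outweigh the gain, so the benefit is expected mainly on much larger arrays:

```python
import numpy as np

nums = np.arange(100)

# Add 20 to the last ten elements in place, with no explicit Python loop.
nums[-10:] += 20

print(nums[-10:])
```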


In [81]:
a = np.random.random([100])
b = np.random.random([100])

Another example: for loop vs. vectorisation.

In [82]:
%%timeit
# elementwise product with an explicit loop
product = np.zeros([100])
for i in range(len(a)):
    product[i] = a[i] * b[i]



29.3 µs ± 760 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [83]:
%timeit np.multiply(a, b)

419 ns ± 7.69 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
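
Note that the loop above multiplies matching elements, which is an elementwise product rather than an outer product. The sketch below checks that the loop and the vectorised form agree, and contrasts them with `np.outer`. The seeded random generator is an assumption added here for reproducibility and is not part of the original notebook:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator (assumption, for reproducibility)
a = rng.random(100)
b = rng.random(100)

# Explicit loop: one multiplication per index.
looped = np.zeros(100)
for i in range(len(a)):
    looped[i] = a[i] * b[i]

# Vectorised: the * operator is equivalent to np.multiply(a, b).
vectorised = a * b

# An outer product, by contrast, pairs every a[i] with every b[j].
outer = np.outer(a, b)

print(np.allclose(looped, vectorised), vectorised.shape, outer.shape)
# prints: True (100,) (100, 100)
```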


#### Use numpy.where and numpy.select to replace if/else.


In [2]:
def b(df):
    # df is a single row here because apply is called with axis=1
    if df["road_class_2_50"] > 1:
        return 1
    else:
        return 0

%timeit ap_road.apply(b, axis = 1)

# this is better than iterrows

3.36 ms ± 165 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [3]:
%timeit a = np.where (ap_road["road_class_2_50"]>1, 1, 0) 

131 µs ± 1.78 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [47]:
%%prun -s cumulative -l 6 
a = np.where (ap_road["road_class_2_50"]>1, 1, 0)

 

For multiple conditions (if ... elif ... else), use np.select().
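
`np.select` takes a list of conditions and a list of matching choices, and works like a chain of if/elif/else. A minimal sketch follows. The Series `road` and the thresholds 10 and 1 are made up for illustration and are not taken from the dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical values standing in for ap_road["road_class_2_50"].
road = pd.Series([0.0, 0.5, 1.5, 3.0, 12.0])

# Conditions are checked in order; the first match wins.
conditions = [road > 10, road > 1]
choices = [2, 1]

# Rows matching no condition get the default value.
road_class = np.select(conditions, choices, default=0)
print(road_class)
# prints: [0 0 1 1 2]
```

Because the first matching condition wins, the stricter condition (`road > 10`) has to come before the looser one (`road > 1`).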

In [4]:
%timeit np.where (ap_road["road_class_2_50"].values>1, 1, 0) 

# ap_road["road_class_2_50"].values returns the column's underlying NumPy array; passing the array instead of the Series speeds this up further.


6.2 µs ± 171 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
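
The pandas documentation recommends `Series.to_numpy()` over `.values` for getting a NumPy array from a column. The sketch below uses a hypothetical four-row frame in place of `ap_road` to show the same `np.where` call on the extracted array:

```python
import numpy as np
import pandas as pd

# Hypothetical small frame standing in for ap_road.
df = pd.DataFrame({"road_class_2_50": [0.0, 0.5, 1.5, 3.0]})

# .to_numpy() returns a NumPy array, like .values does for this column.
arr = df["road_class_2_50"].to_numpy()

# Same vectorised if/else as before, now on a plain array.
flags = np.where(arr > 1, 1, 0)
print(type(arr).__name__, flags)
# prints: ndarray [0 0 1 1]
```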


Exercise: 


1. Given two numpy arrays, compute their dot product: 1) with a for loop, 2) with vectorisation.

2. Using the dataframe ap, add a column "wkd_day_class" that is 1 where wkd_day_value > 20 and 0 otherwise. Don't use if/else.
        
        
        