<img src="img/uomlogo.png" align="left"/><br><br>
# PHYS20762 - Timing Your Code

(c) Hywel Owen  
University of Manchester  
29th April 2020

In this notebook, I show some hints and tricks for timing your code and improving its speed.

## Timing Code

First, we load the usual packages into Python:

In [18]:
# Uncomment the line below to be able to spin all the plots.
# %matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt
import math
from mpl_toolkits.mplot3d import Axes3D

plt.rcParams.update({'font.size': 14})
plt.style.use('default')

Jupyter provides so-called *magic* commands, which work inside a Jupyter notebook but are not part of the Python language itself. The most important ones for timing code are:

* The line-magic timing commands **%timeit** and **%time**
* The cell-magic timing commands **%%timeit** and **%%time**
* The line-magic profiler command **%prun**

Line-magic commands operate on a single line of code (i.e. a command). Cell-magic commands operate on a whole Jupyter cell. Let's look at a simple example:

In [1]:
%timeit sum(range(1000))

13.7 µs ± 69.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


The **%timeit** command times a line of code, and automatically runs that line a number of times to give an average execution time. The number of times (the *loops* value) is adjusted automatically, so that quick statements are repeated often enough to be measured reliably.
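Outside a notebook, the standard-library `timeit` module offers comparable functionality. The cell below is only a rough sketch of the idea behind **%timeit**, not its actual implementation: `Timer.autorange()` chooses a loop count, and `Timer.repeat()` then performs 7 runs. The variable names (`timer`, `loops`, `per_loop` and so on) are my own.

```python
import statistics
import timeit

# The statement to time: the same one used with %timeit above.
timer = timeit.Timer("sum(range(1000))")

# autorange() raises the loop count until a run takes at least 0.2 s.
# It returns (number_of_loops, total_time_for_those_loops).
loops, _ = timer.autorange()

# Repeat the measurement 7 times, mirroring the "7 runs" that %timeit reports.
run_times = timer.repeat(repeat=7, number=loops)

# Convert the total time of each run into a time per loop.
per_loop = [t / loops for t in run_times]
mean_time = statistics.mean(per_loop)
std_time = statistics.pstdev(per_loop)

print(f"{mean_time * 1e6:.2f} µs ± {std_time * 1e6:.2f} µs per loop "
      f"(mean ± std. dev. of 7 runs, {loops} loops each)")
```

The numbers will differ from machine to machine, and need not match the **%timeit** output exactly.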

The cell-magic command **%%timeit** times the execution of a whole cell:

In [5]:
%%timeit
total = 0
for i in range(1000):
    for j in range(1000):
        total += i * j

81.8 ms ± 2.61 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


If we have a single line of code that takes a fairly long time to execute, then we can use the **%time** command to make a single time measurement:

In [12]:
import random
L = [random.random() for i in range(1000000)]
%time L.sort()

CPU times: user 288 ms, sys: 7.48 ms, total: 295 ms
Wall time: 309 ms


Similarly, if we have a cell that we want to time once, then we can use **%%time**:

In [14]:
%%time
total = 0
for i in range(5000):
    for j in range(5000):
        total += i * j

CPU times: user 3.49 s, sys: 15.9 ms, total: 3.51 s
Wall time: 3.55 s
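The two figures reported by **%time** and **%%time** measure different things: the CPU time is the processor time used by the Python process, while the wall time is the elapsed real time. In a plain Python script the same distinction can be made with the standard-library functions `time.process_time()` and `time.perf_counter()`. A minimal sketch, repeating the list sort from above (the variable names are my own, and this version does not split CPU time into user and sys):

```python
import random
import time

values = [random.random() for _ in range(1_000_000)]

wall_start = time.perf_counter()   # high-resolution wall-clock timer
cpu_start = time.process_time()    # CPU time (user + system) of this process

values.sort()

cpu_elapsed = time.process_time() - cpu_start
wall_elapsed = time.perf_counter() - wall_start

print(f"CPU time:  {cpu_elapsed * 1e3:.0f} ms")
print(f"Wall time: {wall_elapsed * 1e3:.0f} ms")
```

If the wall time is much larger than the CPU time, the code is probably spending its time waiting (for example on disk or network access) rather than computing.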


We can get more detail about where a piece of code (including the functions it calls) spends its time by using the code profiler **%prun**. Let's try it on the standard **Numba** example that calculates $\pi$:

In [59]:
def monte_carlo_pi(nsamples):
    acc = 0
    for i in range(nsamples):
        x = np.random.random()
        y = np.random.random()
        if (x ** 2 + y ** 2) < 1.0:
            acc += 1
    return 4.0 * acc / nsamples

%prun monte_carlo_pi(100000)
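**%prun** runs the statement under Python's standard profiler, so a similar report can be produced outside a notebook with the `cProfile` and `pstats` modules. The following is a minimal sketch under that assumption; the names `profiler`, `stream`, `stats` and `report` are my own, and the report is limited to the five most expensive entries.

```python
import cProfile
import io
import pstats

import numpy as np


def monte_carlo_pi(nsamples):
    acc = 0
    for i in range(nsamples):
        x = np.random.random()
        y = np.random.random()
        if (x ** 2 + y ** 2) < 1.0:
            acc += 1
    return 4.0 * acc / nsamples


# Collect profiling data only while the function of interest is running.
profiler = cProfile.Profile()
profiler.enable()
estimate = monte_carlo_pi(100000)
profiler.disable()

# Format the statistics into a string, sorted by cumulative time,
# showing at most 5 entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()

print(estimate)
print(report)
```

Each row of the report gives the number of calls to a function and the time spent in it, which shows how the run time is divided between the loop itself and the calls to `np.random.random`.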

 

## Improving Execution Speed

Let's look at some basic methods of improving execution speed.

### Method 1 - Optimising Loops

We imagine carrying out some integral in 2 dimensions, where the integral depends on some parameters (here, **a**, **b** and **c**). We time the code as usual:

In [34]:
%%timeit
# Integrating a function over some x,y domain
cum_int = 0
for x in np.arange(0,2*np.pi,0.01):
    for y in np.arange(0,2*np.pi,0.01):
        a = 5.2 # Define some parameters to be used in the integral function
        b = np.exp(a)
        c = a*b
        cum_int = cum_int + np.sin(c*x)*np.cos(c*y)
print(cum_int)

0.31358747187644437
0.31358747187644437
0.31358747187644437
0.31358747187644437
0.31358747187644437
0.31358747187644437
0.31358747187644437
0.31358747187644437
1.75 s ± 44.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


A basic way to speed things up is to move the calculation of **a**, **b** and **c** out of the loop. They are constants, so there is no need to recalculate them on every iteration:

In [36]:
%%timeit
# Integrating a function over some x,y domain
cum_int = 0
a = 5.2 # Define some parameters to be used in the integral function
b = np.exp(a)
c = a*b
for x in np.arange(0,2*np.pi,0.01):
    for y in np.arange(0,2*np.pi,0.01):
        cum_int = cum_int + np.sin(c*x)*np.cos(c*y)
print(cum_int)

0.31358747187644437
0.31358747187644437
0.31358747187644437
0.31358747187644437
0.31358747187644437
0.31358747187644437
0.31358747187644437
0.31358747187644437
1.28 s ± 8.74 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
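The same idea can be taken a little further. In the cell above, `np.arange` is still called afresh for every value of **x**, and `np.sin(c*x)` is recalculated for every **y** even though it depends only on **x**. Here is a sketch of hoisting those as well; `grid` and `sin_cx` are names I have introduced, and I have not timed this version, so treat any speed-up as something to measure rather than assume.

```python
import numpy as np

# Constants, calculated once.
a = 5.2
b = np.exp(a)
c = a * b

# The grid of points is the same for x and y, so build it only once.
grid = np.arange(0, 2 * np.pi, 0.01)

cum_int = 0.0
for x in grid:
    sin_cx = np.sin(c * x)   # depends only on x: once per outer iteration
    for y in grid:
        cum_int = cum_int + sin_cx * np.cos(c * y)
print(cum_int)
```

Putting **%%timeit** at the top of the cell, as before, gives a direct comparison with the two versions above.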


### Method 2 - Compiling code with Numba

We can drastically speed up code by compiling it. The **Numba** package provides an easy way to do this in Python. First, we create a compiled function that does the integral:

In [44]:
from numba import jit

@jit(nopython=True)
def do_integral():
    cum_int = 0
    a = 5.2 # Define some parameters to be used in the integral function
    b = np.exp(a)
    c = a*b
    for x in np.arange(0,2*np.pi,0.01):
        for y in np.arange(0,2*np.pi,0.01):
            cum_int = cum_int + np.sin(c*x)*np.cos(c*y)
    return(cum_int)
    
answer = do_integral()
print(answer)

0.31358747186603225


The first call above also triggered the compilation, so when we now time the function we measure only the compiled code:

In [45]:
%%timeit
do_integral()

5.02 ms ± 51.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


This is a *lot* faster: roughly 250 times faster than the optimised loop in Method 1 (5.02 ms compared with 1.28 s).

### Method 3 - Using NumPy

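We can also speed up code by using NumPy. Here, we compare two ways of numerically integrating $\cos (x)$ from $0$ to $2 \pi$. (As in Method 1, the sum is not multiplied by the step size; this does not affect the timing.) First, we use a loop: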

In [49]:
# Using a for loop
def int_cos_x_1():
    cum_int = 0
    for x in np.arange(0,2*np.pi,0.0001):
        cum_int = cum_int + np.cos(x)
    return(cum_int)

%timeit int_cos_x_1()

91.4 ms ± 1.02 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


Second, we use NumPy's array operations, and remove the loop along with the definition of any temporary variables:

In [58]:
# Using NumPy instead of a for loop
def int_cos_x_2():
    return np.sum(np.cos(np.arange(0,2*np.pi,0.0001)))

%timeit int_cos_x_2()

821 µs ± 85.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


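By the way, both methods give almost the same answer. The small difference comes from floating-point rounding, since the two versions add up the terms in a different order: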

In [57]:
print(int_cos_x_1())
print(int_cos_x_2())

0.14692820412840502
0.14692820406344254
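The same approach can be applied to the two-dimensional integral from Method 1. The sketch below uses NumPy broadcasting to evaluate the integrand at every (x, y) pair at once; `grid` and `integrand` are names I have introduced, and I have not timed this version here.

```python
import numpy as np

# Constants, as in Method 1.
a = 5.2
b = np.exp(a)
c = a * b
grid = np.arange(0, 2 * np.pi, 0.01)

# Broadcasting: a column of sin(c*x) values times a row of cos(c*y) values
# gives the integrand at every (x, y) pair as a 2-D array.
integrand = np.sin(c * grid)[:, np.newaxis] * np.cos(c * grid)[np.newaxis, :]

# Summing the whole array replaces both for loops.
cum_int = integrand.sum()
print(cum_int)
```

The price of removing the loops is memory: the whole 2-D array of integrand values is held at once, which is fine for a grid of this size but may not be for a much finer one.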
