# Performance Python

In physics, we want to do calculations either to analyze existing data or to model physical processes. Numerical methods courses very often use a single programming language to teach the methods for these calculations. The problem is that research groups use a variety of different languages, and the specifics of any particular numerical technique do not translate directly from one language to another.

In this short talk, we will take a look at an easy problem. We will use matrix multiplication as our sample problem to see what happens when you don't take the idiosyncrasies of your language into account.

## Naive code reuse

Many texts present code examples for various techniques in older, lower-level languages such as C or FORTRAN. These languages are statically typed and have no syntax that easily supports techniques like object-oriented programming. When you carry these numerical techniques over unchanged into more modern languages, you may see some unintended consequences.

We will start by creating our initial two matrices for the rest of the talk. We will use the numpy module to generate a couple of large matrices with random floating point numbers in them.

In [12]:
import numpy as np
import math
rows = 200
cols = 200
A = np.random.random((rows,cols))
B = np.random.random((rows,cols))

From here, we might be tempted to just do the naive copy-paste of some matrix multiplication routine from C. When we do that, we might get something that looks like the following:

In [10]:
%%timeit
C = np.zeros((rows,cols))

i = 0
while (i < rows):
    j = 0
    while (j < cols):
        k = 0
        while (k < rows):
            C[i,j] += A[i,k] * B[k,j]
            k += 1
        j += 1
    i += 1


10.6 s ± 18.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [65]:
C[1,1]

8297.66274749814

In Jupyter, we have access to built-in magic commands for timing code. This is the '%%timeit' at the top of the code block above.
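
Outside of Jupyter, a similar measurement can be made with the standard library's `timeit` module. The following is a minimal sketch, not what `%%timeit` does internally. The function `sum_of_squares` is a hypothetical workload invented for this illustration, and the choice of 7 runs of 10 loops each is an assumption made to mirror the format of the output above.

```python
import timeit


def sum_of_squares(n):
    """Hypothetical workload, used only to have something to time."""
    total = 0
    for i in range(n):
        total += i * i
    return total


loops = 10   # calls per run (assumed value)
runs = 7     # number of runs, matching the "7 runs" reported above

# timeit.repeat returns one total time (in seconds) per run.
timings = timeit.repeat(lambda: sum_of_squares(10_000), repeat=runs, number=loops)

# Convert each run's total into a time per loop, then average over the runs.
per_loop = [t / loops for t in timings]
mean = sum(per_loop) / len(per_loop)
print(f"{mean * 1e3:.3f} ms per loop (mean of {runs} runs, {loops} loops each)")
```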

This is far from performant. The issue with this type of code is Python's object model. Core Python is a dynamically typed language. This means that any variable name can be used to refer to any type of object. So, whenever you use a variable anywhere in your code, Python needs to check what is referenced by that name and whether the operation you want to execute can be applied to that type of object. As a result, Python repeats these checks on every iteration of the loops above.
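
To make the point about names and types concrete, here is a small sketch. The helper `add` is a hypothetical function written only for this illustration. The same name is rebound to objects of different types, and Python works out what `+` means each time the line runs.

```python
def add(x, y):
    # Python decides what "+" means only when this line executes,
    # based on the types of the objects x and y refer to at that moment.
    return x + y


value = 3                    # the name refers to an int
print(type(value).__name__, add(value, 2))

value = 2.5                  # the same name now refers to a float
print(type(value).__name__, add(value, 2))

value = "ab"                 # and now to a str, where "+" means concatenation
print(type(value).__name__, add(value, "cd"))

# The check can also fail, and it only fails when the operation is attempted.
try:
    add(1, "a")
except TypeError as err:
    print("TypeError:", err)
```

In the triple loop, every `+=` and `*` goes through this kind of lookup, once per iteration.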

We can remove some of these checks by doing the loops in a more Pythonic way.

In [11]:
%%timeit
C = np.zeros((rows,cols))

for i in range(rows):
    for j in range(cols):
        for k in range(rows):
            C[i,j] += A[i,k] * B[k,j]


9.8 s ± 18.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [67]:
C[1,1]

8297.66274749814

This speeds things up a bit, but not as much as we could.
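
As a hint of where the remaining speed-up comes from, the sketch below checks the triple loop against numpy's matrix multiplication operator `@` on a deliberately small pair of matrices. The names `A_small`, `B_small`, the size `n = 20` and the seeded generator are assumptions made for this illustration so that the loop finishes quickly. No timings are claimed here.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded generator (assumption, for reproducibility)
n = 20                           # small size so the Python loop runs quickly
A_small = rng.random((n, n))
B_small = rng.random((n, n))

# The same triple loop as above, on the small matrices.
C_loop = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        for k in range(n):
            C_loop[i, j] += A_small[i, k] * B_small[k, j]

# numpy performs the whole multiplication in compiled code.
C_fast = A_small @ B_small

# The two results agree to within floating point round-off.
print(np.allclose(C_loop, C_fast))
```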


## Using modules

A huge advantage of Python is the large ecosystem of modules available. We already cheated a bit by using numpy to create our matrices above. Below, we can see the effects of including more and more of the functionality of numpy. As an example, we'll look at applying a trigonometric function to an array of values. The pure Python version looks like this:

In [23]:
%%timeit

size = 10000000
vals = list(range(size))
ans = list(range(size))
for i in range(size):
    ans[i] = math.tan(vals[i])

5.22 s ± 82.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


It doesn't seem too bad in this case, since the work being done isn't too complicated. But it could be much worse for a more computationally intensive job. Python spends time on every call to tan working out what type of object it has been given and whether the function can be applied to it. We can get fixed typing in Python by using numpy arrays, where every element of an array has the same declared type. numpy also provides functions that operate on entire arrays at once.
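
The fixed typing of numpy arrays can be seen directly through the `dtype` attribute. The sketch below uses illustrative names (`ints`, `tans`) that do not appear elsewhere in the talk. It also shows one consequence worth knowing: storing a float into an integer array silently drops the fractional part.

```python
import numpy as np

ints = np.arange(5)      # integer array; the exact integer width is platform dependent
print(ints.dtype)

# Every element must have the array's dtype, so the float is cast to an integer.
ints[0] = 2.7
print(ints[0])           # the fractional part has been dropped

# np.tan is applied to the whole array at once and returns a new float array.
tans = np.tan(ints)
print(tans.dtype)
```

Because the element type is known in advance, numpy can run the loop over the elements in compiled code without checking the type of each value.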

In [31]:
%%timeit
size = 10000000
nd = np.arange(size)
nd_ans = np.tan(nd)

740 ms ± 14.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [25]:
size = 1000000
vals = list(range(size))
ans = list(range(size))
nd = np.arange(size)
nd_ans = np.zeros(size)  # float array, so the tan results are not truncated to integers

In [28]:
%%timeit
for i in range(size):
    ans[i] = math.tan(vals[i])

473 ms ± 7.33 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [29]:
%%timeit
for i in range(size):
    nd_ans[i] = math.tan(nd[i])

803 ms ± 8.41 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
