# Numpy/list/tuple speed benchmark

This is a quick example of how to fulfil your need for speed in Python!

Python has a reputation for being slow, but a few tricks can make it much faster.

Usually, it boils down to avoiding certain mistakes, in particular looping over arrays element by element: numpy is meant to solve exactly this problem.

Let's start by importing numpy.

In [1]:
import numpy as np

This is the function we will use: a simple sum over the elements of any iterable.

For comparison, we create a second function that converts each element to float before adding it.

In [17]:
def sum_(x):
    su = 0.0
    for i in x:
        su += i
    return su

def sum_float(x):
    su = 0.0
    for i in x:
        su += float(i)
    return su

Let's do a first test with a list. We define it outside the timing cell so that its creation time is not counted.

In [22]:
data = [float(i) for i in range(100)]
print(type(data))

<class 'list'>


We start with the first function (the elements already have the right type):

In [24]:
%%timeit
sum_(data)

2.67 µs ± 37.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


Now with the function that converts the type of each element:

In [25]:
%%timeit
sum_float(data)

11 µs ± 186 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


So summing with a loop over a list is 1) slow, and 2) about four times slower (11 µs vs 2.67 µs) when each element is converted to float.
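
Outside a notebook, the `%%timeit` magic is not available. A minimal sketch of the same comparison with the standard-library `timeit` module could look like this. The helper name `best_time_us` and the repeat counts are illustrative choices, not part of the original benchmark, and the timings will differ from machine to machine.

```python
import timeit

def sum_(x):
    su = 0.0
    for i in x:
        su += i
    return su

def sum_float(x):
    su = 0.0
    for i in x:
        su += float(i)  # converts every element, even if it is already a float
    return su

def best_time_us(func, arg, number=2000, repeat=5):
    """Best per-call time in microseconds (hypothetical helper)."""
    timer = timeit.Timer(lambda: func(arg))
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e6

data = [float(i) for i in range(100)]
t_plain = best_time_us(sum_, data)
t_float = best_time_us(sum_float, data)
print(f"sum_:      {t_plain:.2f} us per call")
print(f"sum_float: {t_float:.2f} us per call")
```

Taking the minimum over several repeats reduces the influence of other processes running on the machine.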

Now let's try our first function with a tuple:

In [26]:
data = tuple(data)
print(type(data))

<class 'tuple'>


In [27]:
%%timeit
sum_(data)

2.67 µs ± 68.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


Apparently there is no difference between tuples and lists... What if we pass a numpy array and loop over it?

In [28]:
data = np.array(list(data))
print(type(data))

<class 'numpy.ndarray'>


In [29]:
%%timeit
sum_(data)

9.89 µs ± 465 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


Wow. This is actually slow: about 3.7 times slower than the same loop over a list (9.89 µs vs 2.67 µs). So bottom line: don't do that!
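
One likely reason for the slowdown: a list already stores ready-made Python floats, whereas every step of a loop over an array has to wrap a raw value from the array's buffer in a new numpy scalar object. A small sketch to check the types involved (the variable names are illustrative):

```python
import numpy as np

values = [float(i) for i in range(100)]
arr = np.array(values)

# Iterating over a list yields the float objects the list already stores.
print(type(next(iter(values))))   # <class 'float'>

# Iterating over an array creates a numpy scalar for each element.
print(type(next(iter(arr))))      # <class 'numpy.float64'>

# tolist() converts the whole array back to plain Python floats in one go.
print(type(arr.tolist()[0]))      # <class 'float'>
```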

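Let's compare with the built-in `sum` method of numpy. Mind the parentheses: `data.sum` on its own only looks up the method without summing anything, which is what the 38.3 ns ± 0.709 ns originally measured for this cell corresponds to.

In [30]:
%%timeit
data.sum()


There it is. **This is the way to go!** The call has a fixed overhead, so on a 100-element array the gain over the loop is well below a factor of 100; it grows with the size of the array, because the summation itself runs in compiled code.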
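
A quick way to check that the timed expression really computes the sum, and to see how the gain depends on the array size. This is only a sketch: the sizes and the helper `time_call` are arbitrary choices, and no timings are quoted here because they depend on the machine.

```python
import timeit
import numpy as np

def sum_(x):
    su = 0.0
    for i in x:
        su += i
    return su

def time_call(func, number=200):
    """Average time per call in seconds (hypothetical helper)."""
    return timeit.timeit(func, number=number) / number

small = np.arange(100, dtype=float)

# Without parentheses we only get the bound method; nothing is summed.
print(callable(small.sum))   # True
print(small.sum())           # 4950.0

# Compare the Python loop with the numpy method for two array sizes.
for n in (100, 10_000):
    arr = np.arange(n, dtype=float)
    t_loop = time_call(lambda: sum_(arr))
    t_numpy = time_call(lambda: arr.sum())
    print(f"n={n}: loop/numpy time ratio = {t_loop / t_numpy:.0f}")
```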

The only thing to notice is that creating an array is slower than creating a tuple or a list, by a factor of roughly 13 for the array and list timed below (887 ns vs 69.5 ns):

In [31]:
%%timeit
data = np.array([0.,1.,2.,3.,4.,5.,6.,7.,8.,9.])

887 ns ± 10.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [32]:
%%timeit
data = [0.,1.,2.,3.,4.,5.,6.,7.,8.,9.]

69.5 ns ± 2.6 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [33]:
%%timeit
data = np.array([0.,1.,2.,3.,4.,5.,6.,7.,8.,9.])

899 ns ± 7.28 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
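
Because creating an array costs more than creating a list, it pays to create it once and reuse it. A minimal sketch, assuming a hypothetical function `step` that is called many times: the work buffer is allocated once outside the loop and overwritten in place through the `out=` argument of `np.multiply`.

```python
import numpy as np

x = np.arange(10, dtype=float)
buf = np.empty_like(x)  # allocated once, outside the repeated calls

def step(x, factor, out):
    """Write factor * x into the preallocated array `out` (illustrative)."""
    np.multiply(x, factor, out=out)
    return out

for factor in (1.0, 2.0, 3.0):
    result = step(x, factor, buf)

print(result is buf)  # True: no new array was created
print(buf[:3])        # [0. 3. 6.]
```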


# Concluding remarks:

- use numpy when possible
- create your arrays outside of loops or functions that will be called during iterative processes, to gain speed
- make sure that everything has the right type BEFORE providing arrays to functions. This is a weakness of a language like Python, where type is not usually something we worry about. Declare all your arrays as float arrays (float64 is the default for constructors such as `np.zeros`, although `np.array` infers the type from its input) and use float numbers in functions (e.g. 1.0 * myarray), so that no type conversion is needed inside them
- avoid looping over numpy arrays if possible, particularly in functions that will be called by solvers, etc.
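
The advice on types can be sketched as follows: convert the input once, at the boundary, and then work on the whole array. `np.asarray` with `dtype=float` returns its input unchanged when it is already a float64 array; the function name `total` is just an illustration.

```python
import numpy as np

def total(x):
    """Sum the elements of x after a single conversion to a float array."""
    arr = np.asarray(x, dtype=float)  # no copy if x is already float64
    return arr.sum()

print(np.array([1, 2, 3]).dtype.kind)    # i: integers in, integer array out
print(np.array([1., 2., 3.]).dtype)      # float64
print(total([1, 2, 3]))                  # 6.0

a = np.arange(5, dtype=float)
print(np.asarray(a, dtype=float) is a)   # True: nothing was copied
```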
