## Numpy arrays are great, but not always faster than lists.

Numpy arrays get their speed mainly from a compact memory layout and from avoiding redundant type checks in whole-array operations. Constructing an array and accessing a single element, however, are both slow. If you need to read or write individual elements frequently, a list may be a better choice than a Numpy array.

In [73]:
import numpy as np

n = 50

def fibonacci_numpy_array(n):
    array = np.ones(n, dtype=int)
    for i in range(2, n):
        array[i] = array[i - 1] + array[i - 2]
    return array


def fibonacci_list(n):
    # Convert the array to a list, fill it element by element, then convert back.
    # The type of array changes twice in this function, but this is still faster.
    array = np.ones(n, dtype=int).tolist()
    for i in range(2, n):
        array[i] = array[i - 1] + array[i - 2]
    return np.array(array)

assert all(fibonacci_numpy_array(n) == fibonacci_list(n))

print('Numpy array')
%timeit -n1000 fibonacci_numpy_array(n)
print('List')
%timeit -n1000 fibonacci_list(n)

Numpy array
1000 loops, best of 3: 24.9 µs per loop
List
1000 loops, best of 3: 13.3 µs per loop
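
The gap above comes largely from per-element access. The following is a minimal sketch, not part of the original benchmark, that isolates a single-element read from each container; the names `arr`, `lst`, `t_arr` and `t_lst` are illustrative, and the measured times will vary by machine and Numpy version.

```python
import timeit

import numpy as np

arr = np.ones(50, dtype=int)
lst = arr.tolist()

# Indexing an array builds a new Numpy scalar object on every access,
# whereas indexing a list returns a reference to an existing Python int.
print(type(arr[10]))
print(type(lst[10]))

# Time 100000 single-element reads from each container.
t_arr = timeit.timeit(lambda: arr[10], number=100000)
t_lst = timeit.timeit(lambda: lst[10], number=100000)
print('array element read: %.4f s' % t_arr)
print('list element read:  %.4f s' % t_lst)
```

If the loop can be expressed as a whole-array operation instead, the array is usually the better choice again, since the per-element overhead is then paid only once.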


## In-place operations

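Again, avoid unnecessary copies of objects. The statement a = a * 2 allocates a new array for the result, whereas a *= 2 writes the result into the existing array.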

In [74]:
def multiplication():
    a = np.ones(10000000)
    a = a * 2 
    return a

def inplace_multiplication():
    a = np.ones(10000000)
    a *= 2     # In-place; mathematically equivalent to a = a * 2
    return a

assert all(multiplication() == inplace_multiplication())

print('Multiplication')
%timeit multiplication()
print('In-place multiplication')
%timeit inplace_multiplication()

Multiplication
10 loops, best of 3: 31.8 ms per loop
In-place multiplication
10 loops, best of 3: 23.9 ms per loop
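
To make the difference between the two forms visible, here is a small sketch (the variable names are illustrative, not from the original notebook) that checks whether the result is the same array object as the input. It also shows `np.multiply` with the `out=` argument, which is an explicit way to request the same in-place behaviour.

```python
import numpy as np

a = np.ones(5)
alias = a                 # second name bound to the same array object
a = a * 2                 # builds a new array and rebinds the name a
print(a is alias)         # False: a now refers to a different array
print(alias)              # the original array still holds ones

b = np.ones(5)
alias_b = b
b *= 2                    # writes the result into the existing buffer
print(b is alias_b)       # True: no new array was created

c = np.ones(5)
result = np.multiply(c, 2, out=c)   # explicit in-place form via out=
print(result is c)
```

Note that an in-place update is visible through every reference to the array, so it is only safe when the original values are no longer needed elsewhere.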


## Multiplication versus division
Division is one of the slowest arithmetic operations on a CPU. We can sometimes gain speed by replacing a division with a multiplication by the reciprocal.

In [75]:
a = np.random.normal(size=100000)

def division():
    return a / 2

def multiplication():
    return a * (1.0 / 2) 

assert all(division() == multiplication())

print('Divided by 2')
%timeit division()              # 100000 divisions
print('Multiplied by 0.5')
%timeit multiplication()        # 100000 multiplications and one division

Divided by 2
10000 loops, best of 3: 149 µs per loop
Multiplied by 0.5
10000 loops, best of 3: 51.5 µs per loop
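
One caveat, sketched below with illustrative values that are not from the original notebook: the exact equality asserted above holds because 0.5 is exactly representable in binary floating point. For a divisor such as 3 the reciprocal is rounded, so the two results can differ in the last bit, and a tolerance-based comparison like `np.allclose` is the safer check.

```python
import numpy as np

x = np.array([1.0, 5.0, 7.0])

# Dividing by 2 and multiplying by 0.5 give bit-identical results.
exact_for_2 = np.array_equal(x / 2, x * (1.0 / 2))

# With a divisor of 3 the reciprocal 1.0 / 3 is rounded, so the
# products may differ from the quotients in the last bit.
exact_for_3 = np.array_equal(x / 3, x * (1.0 / 3))
close_for_3 = np.allclose(x / 3, x * (1.0 / 3))

print(exact_for_2, exact_for_3, close_for_3)
```

Whether the tiny rounding difference matters depends on the application; for most numerical work it is well below other sources of error.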


## Get it even faster
Python is not designed for heavy numerical computation, but why not make scripts run faster if we can?

1. **Check out the "Cython" package.** 
Cython provides a way to make Python scripts more C/C++-like. We may gain speed by giving up some Python/Numpy features such as duck typing and bounds checking.

2. **Get a better BLAS.**
When we call many linear algebra operations (matrix multiplication, spectral decomposition, SVD, etc.) in Numpy, it actually calls external BLAS and LAPACK libraries to do these tasks (https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms). To speed up matrix operations, we can compile Numpy against another BLAS library, e.g. ATLAS, OpenBLAS (both are BSD licensed), or Intel MKL (free for non-commercial use on Linux). Compiling Numpy against a different BLAS takes some effort, but it is well worth your time. See the comparison below.

In [76]:
n = 2000
A = np.random.normal(size=(n, n))
B = np.random.normal(size=(n, n))

print('Default BLAS')
%timeit A.dot(B)

Default BLAS
1 loops, best of 3: 17.8 s per loop


In [2]:
import numpy_openblas as np
n = 2000
A = np.random.normal(size=(n, n))
B = np.random.normal(size=(n, n))

print('Numpy with OpenBLAS')
%timeit A.dot(B)

Numpy with OpenBLAS
1 loops, best of 3: 693 ms per loop
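
Before rebuilding anything, it can help to check which BLAS the installed Numpy is already linked against. The sketch below uses `np.show_config()`, which prints the build configuration, together with a hypothetical helper `time_matmul` (not part of the original notebook) that times a matrix product outside IPython. The matrix size is kept small here so it runs quickly; use a larger `n` for a meaningful comparison.

```python
import timeit

import numpy as np


def time_matmul(n, repeats=3):
    """Return the best wall-clock time in seconds of an n-by-n matrix product."""
    rng = np.random.RandomState(0)
    A = rng.normal(size=(n, n))
    B = rng.normal(size=(n, n))
    best = float('inf')
    for _ in range(repeats):
        start = timeit.default_timer()
        A.dot(B)
        best = min(best, timeit.default_timer() - start)
    return best


# Print the libraries this Numpy build was compiled against.
np.show_config()

elapsed = time_matmul(200)
print('200 x 200 product: %.6f s' % elapsed)
```

Running the same script under two different Numpy builds gives a like-for-like comparison similar to the one above.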
