This notebook uses Cython, which requires a C compiler. Most Linux distributions include one or offer it through their package manager. Install Xcode on macOS and Visual Studio on Windows.
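
Before loading the extension, it can help to see whether a compiler is visible at all. The sketch below is only a rough check: `find_c_compilers` is a hypothetical helper, not part of Cython, and the list of executable names is an assumption. A missing entry does not necessarily mean the build will fail, since on Windows the build tools usually locate Visual Studio without `cl` being on PATH.

```python
import shutil

# Hypothetical helper (not a Cython API): map a few common C compiler
# executable names to their location on PATH, or None if not found.
def find_c_compilers(candidates=("cc", "gcc", "clang", "cl")):
    return {name: shutil.which(name) for name in candidates}

found = find_c_compilers()
for name, path in found.items():
    print(f"{name}: {path or 'not found'}")
```

If every entry prints "not found" on Linux or macOS, installing a compiler first is probably the next step.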

In [1]:
%load_ext cython

In [2]:
import array
a = array.array('l', range(100))  # typecode 'l': signed C long
s = 0  # module-level accumulator used by python_sum and cython_sum1

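Sum an array of numbers in pure Python. The inner loop adds each element 10000 times, so the expected result is 10000 * sum(range(100)) = 49500000.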

In [3]:
def python_sum(a):
    global s
    s = 0
    for i in range(len(a)):
        for j in range(10000):
            s = s + a[i]
    return s

In [4]:
%timeit python_sum(a)

1 loop, best of 3: 142 ms per loop
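
`%timeit` only works inside IPython. As a rough equivalent for a plain script, the standard `timeit` module can take the best of three single runs. This is a sketch that mirrors the "1 loop, best of 3" report, not a reproduction of how `%timeit` chooses its loop count, and the numbers it prints will differ by machine.

```python
import array
import timeit

a = array.array('l', range(100))
s = 0

def python_sum(a):
    global s
    s = 0
    for i in range(len(a)):
        for j in range(10000):
            s = s + a[i]
    return s

# repeat=3, number=1: three independent runs of one call each.
times = timeit.repeat(lambda: python_sum(a), repeat=3, number=1)
best = min(times)  # the minimum is the least disturbed by other processes
print(f"1 loop, best of 3: {best * 1e3:.0f} ms per loop")
```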


Compile the same code with Cython, without changing it

In [5]:
%%cython --annotate
def cython_sum1(a):
    global s
    s = 0
    for i in range(len(a)):
        for j in range(10000):
            s = s + a[i]
    return s

In [6]:
print('python sum: ',python_sum(a))
print('cython sum1: ',cython_sum1(a))
print('python sum')
%timeit python_sum(a)
print('cython sum1')
%timeit cython_sum1(a)

python sum:  49500000
cython sum1:  49500000
python sum
1 loop, best of 3: 134 ms per loop
cython sum1
10 loops, best of 3: 63.2 ms per loop


Does making s a local variable help?

In [7]:
%%cython --annotate
def cython_sum2(a):
    s = 0
    for i in range(len(a)):
        for j in range(10000):
            s = s + a[i]
    return s

In [8]:
print('python sum: ',python_sum(a))
print('cython sum1: ',cython_sum1(a))
print('cython sum2: ',cython_sum2(a))
print('python sum')
%timeit python_sum(a)
print('cython sum1')
%timeit cython_sum1(a)
print('cython sum2')
%timeit cython_sum2(a)

python sum:  49500000
cython sum1:  49500000
cython sum2:  49500000
python sum
1 loop, best of 3: 393 ms per loop
cython sum1
10 loops, best of 3: 56.7 ms per loop
cython sum2
10 loops, best of 3: 39.5 ms per loop
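
One plausible reason the local version is faster is that a global name lives in the module dictionary, while a local lives in a fixed slot of the function. The sketch below shows that difference in CPython bytecode using `dis`. Cython generates C rather than bytecode, so this is an analogy for the lookup cost, not a view of what Cython emits. The two small functions are illustrative and not part of the notebook.

```python
import dis

s = 0

def add_global(x):
    global s
    s = s + x          # reads and writes the module-level name

def add_local(x):
    s = 0
    s = s + x          # reads and writes a local slot
    return s

def opnames(func):
    """Return the opcode names CPython compiled for func."""
    return [ins.opname for ins in dis.get_instructions(func)]

global_ops = opnames(add_global)
local_ops = opnames(add_local)
print(global_ops)
print(local_ops)
```

The exact opcode names vary between Python versions, but the global version should contain `LOAD_GLOBAL` and `STORE_GLOBAL`, and the local version should contain no global opcodes.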


Declare C types and read the array through a raw C pointer

In [9]:
%%cython --annotate
from cpython cimport array

def cython_sum3(a):
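    cdef long s = 0                    # accumulator is a C long, not a Python int
    cdef array.array ta = a            # typed reference to the array.array object
    cdef long * ap = ta.data.as_longs  # raw pointer to the buffer; assumes typecode 'l'
    for i in range(len(ta)):
        for j in range(10000):
            s = s + ap[i]              # index the C pointer directly
    return s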
    return s
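
`cython_sum3` reads the buffer through `as_longs`, which is only meaningful when the array really stores C longs. As far as I can tell, the typed assignment checks that the object is an `array.array` but not its typecode, so a guard in Python may be worthwhile. The sketch below inspects the array with standard `array` attributes; `check_longs` is a hypothetical helper, not part of the notebook.

```python
import array

a = array.array('l', range(100))

print(a.typecode)                   # typecode of the stored elements
print(a.itemsize)                   # bytes per element; platform dependent
address, length = a.buffer_info()   # memory address and element count
print(length)

# Hypothetical guard to call before handing an array to cython_sum3.
def check_longs(arr):
    if not isinstance(arr, array.array) or arr.typecode != 'l':
        raise TypeError("expected array.array with typecode 'l'")
    return arr

check_longs(a)
```

Note that a C long is 8 bytes on most 64-bit Linux and macOS systems and 4 bytes on Windows, so the range of values that fits depends on the platform.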

In [10]:
print('python sum: ',python_sum(a))
print('cython sum1: ',cython_sum1(a))
print('cython sum2: ',cython_sum2(a))
print('cython sum3: ',cython_sum3(a))
print('python sum')
%timeit python_sum(a)
print('cython sum1')
%timeit cython_sum1(a)
print('cython sum2')
%timeit cython_sum2(a)
print('cython sum3')
%timeit cython_sum3(a)

python sum:  49500000
cython sum1:  49500000
cython sum2:  49500000
cython sum3:  49500000
python sum
1 loop, best of 3: 415 ms per loop
cython sum1
10 loops, best of 3: 58.8 ms per loop
cython sum2
10 loops, best of 3: 50.8 ms per loop
cython sum3
The slowest run took 31.76 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 193 ns per loop
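
The 193 ns figure deserves a sanity check. The rough calculation below uses only the numbers printed above and divides the reported time by the number of additions the loops ask for. The result is far below one clock cycle per addition (a cycle on a few-GHz CPU is some hundreds of picoseconds), which suggests the C compiler did not execute every addition. One likely explanation, which the notebook does not verify, is that it reduced the inner loop to a multiplication.

```python
import array

a = array.array('l', range(100))

additions = len(a) * 10000            # additions requested by the two loops
reported = 193e-9                     # seconds per call, from the output above
per_addition = reported / additions   # seconds per addition if all were executed

print(f"{additions} additions")
print(f"{per_addition * 1e12:.3f} ps per addition")

# Closed form an optimizer could reduce the loops to. This is an assumption
# about what the compiler did, not something measured here.
closed_form = 10000 * sum(a)
print(closed_form)
```

If the goal is to time the additions themselves, the benchmark would probably need work the compiler cannot fold away.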


Use numba's jit on the local-variable version

In [11]:
from numba import jit
@jit
def numba_sum(a):
    s = 0
    for i in range(len(a)):
        for j in range(10000):
            s = s + a[i]
    return s

In [13]:
print('python sum: ',python_sum(a))
print('cython sum1: ',cython_sum1(a))
print('cython sum2: ',cython_sum2(a))
print('cython sum3: ',cython_sum3(a))
print('numba sum: ', numba_sum(a))
print('python sum')
%timeit python_sum(a)
print('cython sum1')
%timeit cython_sum1(a)
print('cython sum2')
%timeit cython_sum2(a)
print('cython sum3')
%timeit cython_sum3(a)
print('numba sum')
%timeit numba_sum(a)

python sum:  49500000
cython sum1:  49500000
cython sum2:  49500000
cython sum3:  49500000
numba sum:  49500000
python sum
1 loop, best of 3: 165 ms per loop
cython sum1
10 loops, best of 3: 69 ms per loop
cython sum2
10 loops, best of 3: 36.5 ms per loop
cython sum3
The slowest run took 27.89 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 180 ns per loop
numba sum
The slowest run took 20.09 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 765 ns per loop
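
To compare the versions at a glance, the timings from this last run can be turned into speedup factors relative to the pure Python version. The values below are copied from the output above. Timings for the same function vary between cells in this notebook, so the factors are indicative only.

```python
# Seconds per call, copied from the final %timeit output above.
timings = {
    'python sum': 165e-3,
    'cython sum1': 69e-3,
    'cython sum2': 36.5e-3,
    'cython sum3': 180e-9,
    'numba sum': 765e-9,
}

baseline = timings['python sum']
speedups = {name: baseline / t for name, t in timings.items()}

for name, factor in speedups.items():
    print(f"{name:12s} {factor:12.1f}x")
```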
