In [1]:
import numpy as np


# Efficient Computing with NumPy
with Jake Vanderplas

**Topics**
* NumPy (efficient Python)

In [2]:
def func(N):
    d = 0.
    for i in range(N):
        d += (i % 3 - 1) * i
    return d

%timeit func(10000)

1000 loops, best of 3: 1.56 ms per loop


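For comparison, the equivalent loop in Fortran takes about 17.9 µs. The Python loop is slow because:

- each operation comes with a small type-checking overhead
- each operation also works on object references rather than raw values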

NumPy is designed to combine:

- the fast development time of Python
- the fast execution time of C/Fortran

by pushing repeated operations into a statically typed, compiled layer.

Four strategies:

- use **ufuncs**
- use **aggregations**
- use **broadcasting**
- use **slicing, masking and fancy indexing**

Overall goal:
push repeated operations into compiled code and **get rid of slow loops**.
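
As an illustration of this goal, the loop in `func` above can be expressed with whole-array operations. This is only a minimal sketch: the names `func_loop` and `func_vectorized` are introduced here for illustration and are not part of the original notebook.

```python
import numpy as np

def func_loop(N):
    # Same pure-Python loop as `func` above, repeated so this block is self-contained
    d = 0.
    for i in range(N):
        d += (i % 3 - 1) * i
    return d

def func_vectorized(N):
    # Build every value of i at once, then let NumPy do the arithmetic and the sum
    i = np.arange(N)
    return float(((i % 3 - 1) * i).sum())

# Both versions should agree on the result
print(func_loop(10000) == func_vectorized(10000))
```

Timing the two versions with `%timeit` should show the gap; the exact numbers depend on the machine.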

## Ufunc

Ufuncs (universal functions) apply an operation element-wise to whole arrays.

In [3]:
import numpy as np

a = list(range(100000))
%timeit [val+5 for val in a]

a = np.array(a)
%timeit a+5

100 loops, best of 3: 4.85 ms per loop
The slowest run took 5.09 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 74.4 µs per loop


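Available ufuncs include:

- arithmetic: `+` `-` `*` `/` `//` `%` `**`
- bitwise: `&` `|` `~` `^` `>>` `<<`
- comparison: `>` `<` `>=` `<=` `==` `!=`
- trigonometric functions
- exponential functions
- special functions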
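
A few of these families in action, as a minimal sketch (the variable names are chosen here for illustration):

```python
import numpy as np

x = np.arange(1, 6)           # array([1, 2, 3, 4, 5])

doubled = x * 2 + 1           # arithmetic ufuncs
is_even = (x % 2 == 0)        # arithmetic followed by a comparison
inside = (x > 1) & (x < 5)    # bitwise AND combines two boolean arrays
sines = np.sin(x)             # trigonometric ufunc
exps = np.exp(x)              # exponential ufunc

print(doubled)
print(is_even)
print(inside)
```

Each expression returns a new array of the same shape as `x`, with no explicit Python loop.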

## Aggregations

Aggregations are functions that summarize the values in an array:
`min`, `max`, `sum`, `mean`, ...

In [4]:
from random import random
c = [random() for i in range(100000)]

%timeit min(c)

c = np.array(c)
%timeit c.min()

100 loops, best of 3: 2.58 ms per loop
The slowest run took 17.11 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 43.6 µs per loop


In [5]:
M = np.random.randint(1, 10, (3, 5))
print(M)
print(' ')
print(M.sum())            # sum of all elements
print(' ')
print(M.sum(axis=0))      # one sum per column
print(M.sum(axis=1))      # one sum per row

[[9 7 3 8 4]
 [5 6 7 2 3]
 [9 7 1 2 2]]
 
75
 
[23 20 11 12  9]
[31 23 21]
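
The `axis` argument names the dimension that is collapsed by the aggregation. The sketch below rebuilds the matrix printed above as a fixed array (an assumption made here so the numbers are reproducible, since the original comes from `np.random.randint`) and shows the shape of each result.

```python
import numpy as np

# Fixed copy of the random matrix printed above
M = np.array([[9, 7, 3, 8, 4],
              [5, 6, 7, 2, 3],
              [9, 7, 1, 2, 2]])

total = M.sum()              # aggregate over every element
col_sums = M.sum(axis=0)     # collapse axis 0 (the rows): one sum per column
row_sums = M.sum(axis=1)     # collapse axis 1 (the columns): one sum per row

print(total, col_sums.shape, row_sums.shape)
```

A `(3, 5)` array therefore gives a length-5 result for `axis=0` and a length-3 result for `axis=1`.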


## Broadcasting

Broadcasting is the set of rules by which ufuncs operate on arrays of different sizes and/or dimensions.

In [6]:
a = np.arange(3) + 1
b = np.ones((3,3)) + np.arange(3)
c = np.arange(3)
c = c.reshape((3,1)) + c

print(a)
print(' ')
print(b)
print(' ')
print(c)

[1 2 3]
 
[[ 1.  2.  3.]
 [ 1.  2.  3.]
 [ 1.  2.  3.]]
 
[[0 1 2]
 [1 2 3]
 [2 3 4]]


Rules of broadcasting:

- 1: if the array shapes differ in length, left-pad the shorter shape with 1s
- 2: if the sizes along a dimension do not match, stretch the array whose size is 1 along that dimension
- 3: if the sizes do not match and neither is 1, raise an error
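
The three rules can be sketched as a small pure-Python function that computes the resulting shape. `broadcast_shape` is a hypothetical helper written for this illustration, not a NumPy function; the last lines compare it against what NumPy itself reports through `np.broadcast`.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the three broadcasting rules to two shapes."""
    ndim = max(len(shape_a), len(shape_b))
    # Rule 1: left-pad the shorter shape with 1s
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for da, db in zip(a, b):
        if da == db:
            result.append(da)
        elif da == 1:
            result.append(db)      # Rule 2: stretch the size-1 dimension
        elif db == 1:
            result.append(da)      # Rule 2, other side
        else:
            # Rule 3: sizes differ and neither is 1
            raise ValueError("shapes %r and %r do not broadcast" % (shape_a, shape_b))
    return tuple(result)

print(broadcast_shape((3, 3), (3,)))   # same shapes as the `b` example above
print(broadcast_shape((3, 1), (3,)))   # same shapes as the `c` example above

# Cross-check against NumPy's own broadcasting
numpy_shape = np.broadcast(np.empty((3, 1)), np.empty((3,))).shape
print(numpy_shape == broadcast_shape((3, 1), (3,)))
```

Calling `broadcast_shape((3,), (4,))` raises `ValueError`, which mirrors rule 3.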

## Slicing, masking, and fancy indexing

In [7]:
L = [1, 2, 3, 4, 5]
print(L[0], L[1:-2])

1 [2, 3]


In [18]:
l = np.arange(12)
mask = (l < 4) | (l > 8)
print(mask)
print(l[mask])

[ True  True  True  True False False False False False  True  True  True]
[ 0  1  2  3  9 10 11]


In [9]:
l[[0,2,4,6,8]]

array([0, 2, 4, 6, 8])

In [29]:
l = np.arange(9).reshape(3,3)
print(l)

[[0 1 2]
 [3 4 5]
 [6 7 8]]


Aggregations along an axis can be turned into boolean masks:

In [36]:
print(l.sum(axis=0))
print(l.sum(axis=0) > 4)
print(l.sum(axis=1) > 4)

[ 9 12 15]
[ True  True  True]
[False  True  True]


In [37]:
print(l[l.sum(axis=0) > 4])
print(l[l.sum(axis=1) > 4])
print(l[l.sum(axis=1) > 4, 1:])

[[0 1 2]
 [3 4 5]
 [6 7 8]]
[[3 4 5]
 [6 7 8]]
[[4 5]
 [7 8]]
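
The last expression combines a row mask with a column slice in a single indexing step. The sketch below spells out the intermediate values; the names `grid`, `row_sums`, `row_mask` and `selected` are chosen here for illustration.

```python
import numpy as np

grid = np.arange(9).reshape(3, 3)

row_sums = grid.sum(axis=1)      # one sum per row: [3, 12, 21]
row_mask = row_sums > 4          # [False, True, True]

# Keep the rows where the mask is True, then drop the first column
selected = grid[row_mask, 1:]
print(selected)
```

Note that a mask built from `sum(axis=0)` has one entry per column. Using it as a row index, as in the first line of the cell above, only works here because the array is square; on a non-square array the mask length would not match the number of rows and NumPy would be expected to raise an `IndexError`.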
