<img src="python_ecosystem.png">

(image: Ondřej Čertík)

In [2]:
import numpy as np

In [3]:
lst = list(range(10))

a = np.asarray(lst)

a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Arithmetic operations with arrays typically work elementwise

In [4]:
a + 1

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [5]:
a * 2

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

... which differs from Python lists and other sequences!

In [6]:
lst * 2

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
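For comparison, here is a minimal sketch of what it takes to get the elementwise result from a plain list. The names `repeated`, `doubled_list` and `doubled_arr` are illustrative only.

```python
import numpy as np

lst = list(range(5))
arr = np.asarray(lst)

repeated = lst * 2                   # list repetition: the length doubles
doubled_list = [2 * x for x in lst]  # elementwise needs an explicit loop
doubled_arr = arr * 2                # elementwise by default for arrays

print(len(repeated), doubled_list, doubled_arr.tolist())
```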

In [7]:
a**2

array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

In [31]:
a + 1/a


array([       inf, 2.5       , 4.25      , 6.16666667, 8.125     ])

In [44]:
np.nan+1,np.inf+1,np.inf*0,1./np.inf,np.inf/np.inf

(nan, inf, nan, 0.0, nan)

In [48]:
np.nan==np.nan,np.inf==np.inf

(False, True)
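Because `np.nan == np.nan` is `False`, an equality test cannot locate NaN entries. A short sketch of the usual alternative, `np.isnan` (the array `x` and the mask names are made up for illustration):

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0, np.nan])

wrong_mask = (x == np.nan)   # all False: NaN never compares equal
nan_mask = np.isnan(x)       # the reliable elementwise test

print(wrong_mask.any())      # no NaN is found this way
print(nan_mask.sum())        # number of NaN entries
print(x[~nan_mask])          # the remaining finite values
```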

In [8]:
a[1]

1

In [83]:
a-=12
a

array([-24, -23, -22, -21, -20, -19, -18, -17, -16, -15])

In [82]:
a[[5,2,0,0,0,0]]

array([ -7, -10, -12, -12, -12, -12])

In [9]:
a[1]=3
print(a)

[0 3 2 3 4 5 6 7 8 9]


In [84]:
a[1:3]

array([-23, -22])

In [89]:
a[4:1:-2]

array([-20, -22])

In [90]:
a[::-1]

array([-15, -16, -17, -18, -19, -20, -21, -22, -23, -24])

In [91]:
a[3:5] = a[2:4]

In [92]:
a

array([-24, -23, -22, -22, -21, -19, -18, -17, -16, -15])

In [96]:
b=a[:]
b[1]=0
print(a)

[-24   0 -22 -22 -21 -19 -18 -17 -16 -15]


In [97]:
b=a.copy()
b[2]=0
print(b)
print(a)

[-24   0   0 -22 -21 -19 -18 -17 -16 -15]
[-24   0 -22 -22 -21 -19 -18 -17 -16 -15]


### Creating

In [16]:
def f(i):
    print('index=', i)
    return i**2
a=np.fromfunction(f,(5,),dtype=np.int64)
print(a)

index= [0 1 2 3 4]
[ 0  1  4  9 16]


In [18]:
a=np.zeros(3)
print(a)

[0. 0. 0.]


In [20]:
b=np.ones(3,dtype=np.int64)
print(b)

[1 1 1]


In [22]:
a=np.arange(0,9,2)
print(a)

[0 2 4 6 8]


In [24]:
a=np.linspace(0,8,5)
print(a)

[0. 2. 4. 6. 8.]


In [26]:
b=np.logspace(0,1,5)
print(b)

[ 1.          1.77827941  3.16227766  5.62341325 10.        ]


In [29]:
from numpy.random import random,normal
print(random(5))

[0.24789833 0.06625408 0.80727108 0.08520809 0.77777608]


In [30]:
print(normal(size=5))

[-0.03162672  0.64102851 -0.28263685  0.81379757 -0.48214996]


In [42]:
i=np.ones(5,dtype=np.int64)
a +=i

##  Boolean arrays

In [46]:
a**2 < 42

array([ True,  True,  True,  True,  True,  True,  True, False, False,
       False])

In [34]:
any(a**2 < 42), all(a**2 < 42)

(True, False)

In [16]:
lst < 42

TypeError: '<' not supported between instances of 'list' and 'int'

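If for you the result of `lst < 42` is `False`, it means you are running Python 2, where objects of different types are ordered by an arbitrary but consistent rule instead of raising an error (in CPython 2, a number compares smaller than a list).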

**Boolean arrays can be used as indexing masks:**

In [47]:
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [48]:
a[a**2 < 42]

array([0, 1, 2, 3, 4, 5, 6])

In [50]:
3 < 5 < 6

True

In [51]:
10 < a**2 < 42

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

In [52]:
(10 < a**2) & (a**2 < 42)

array([False, False, False, False,  True,  True,  True, False, False,
       False])

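Note the need to use the bitwise-AND operator `&` and parentheses: `&` binds more tightly than the comparison operators, so without parentheses the expression is parsed as a chained comparison and fails as above.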

Alternatively, use the `np.logical_and` function:

In [None]:
np.logical_and(10 < a**2, a**2 < 42)
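As a small sketch, the operator form and the function form can be checked against each other, together with the elementwise NOT (`~`) and OR (`|`). The variable names are illustrative.

```python
import numpy as np

a = np.arange(10)
sq = a**2

mask_op = (10 < sq) & (sq < 42)             # parentheses are required
mask_fn = np.logical_and(10 < sq, sq < 42)  # same mask, no precedence issue
outside = ~mask_op                          # elementwise NOT
either = (sq <= 10) | (sq >= 42)            # elementwise OR

print(a[mask_op])
print(np.array_equal(mask_op, mask_fn), np.array_equal(outside, either))
```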

**Caveat**: Assignments with boolean indexing modify the array in-place

In [53]:
a[(10 < a**2) & (a**2 < 42)] = -42

In [54]:
a

array([  0,   1,   2,   3, -42, -42, -42,   7,   8,   9])

More info: read up on *fancy indexing* vs *basic indexing*
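As a starting point, a minimal sketch of the practical difference: basic indexing (slices) returns a view, while fancy indexing (index lists or boolean masks) returns a copy. The names `basic`, `fancy` and `masked` are illustrative.

```python
import numpy as np

a = np.arange(10)

basic = a[2:5]           # basic indexing (a slice): a view
fancy = a[[2, 3, 4]]     # fancy indexing (an index list): a copy
masked = a[a < 3]        # a boolean mask is fancy indexing too: a copy

basic[0] = -1            # writes through to `a`
fancy[1] = -2            # leaves `a` untouched

print(a)
print(np.shares_memory(a, basic),
      np.shares_memory(a, fancy),
      np.shares_memory(a, masked))
```

Assignment through a fancy index on the left-hand side, as in `a[mask] = value`, still modifies `a` itself; only the result of reading with a fancy index is a copy.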

## Arrays have useful methods

In [55]:
len(dir(a))

162

In [54]:
am = (a - a.mean()) / a.std()

In [55]:
am

array([-1.41421356, -0.70710678,  0.        ,  0.70710678,  1.41421356])

In [28]:
# compare to
from __future__ import division

a_mean = sum(x for x in a) / len(a)

from math import sqrt
a_std = sqrt(sum((x - a_mean)**2 for x in a) / len(a))

am_list = [(x - a_mean) / a_std for x in a]

In [29]:
am_list - am

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
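The agreement relies on `a.std()` dividing by `N`, just like the hand-written version. A short sketch of how the `ddof` argument switches to the `N - 1` normalization (the array here is a made-up example):

```python
import numpy as np

a = np.arange(5, dtype=float)

pop_std = a.std()            # divides by N (ddof=0), the default
sample_std = a.std(ddof=1)   # divides by N - 1

print(pop_std, sample_std)
print(np.isclose(pop_std**2, 2.0), np.isclose(sample_std**2, 2.5))
```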

In [31]:
np.median(a)

1.5

In [61]:
a *= am

In [62]:
np.sort(a)

array([-11.3137085 ,  -5.65685425,   0.        ,  45.254834  ,
       141.42135624])

In [63]:
a.sort()
a

array([-11.3137085 ,  -5.65685425,   0.        ,  45.254834  ,
       141.42135624])

## Arrays can be reshaped in $O(1)$ time and memory

In [71]:
a = np.arange(0,10,1)

In [72]:
a.shape

(10,)

In [73]:
a.reshape(2, -1)

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
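The constant cost comes from `reshape` returning a view whenever the memory layout allows it. A minimal sketch that checks this with `np.shares_memory`, plus one case where a copy is unavoidable:

```python
import numpy as np

a = np.arange(10)
b = a.reshape(2, -1)         # contiguous input: a view, no data copied

b[0, 0] = 99                 # visible through `a` as well
print(a[0], np.shares_memory(a, b))

# A transposed array is not contiguous in memory, so flattening it
# cannot be expressed as a view and NumPy has to make a copy.
c = b.T.reshape(-1)
print(np.shares_memory(b, c))
```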

In [100]:
b = a.reshape(-1, 2)

In [101]:
b

array([[-24,   0],
       [-22, -22],
       [-21, -19],
       [-18, -17],
       [-16, -15]])

In [102]:
b.T @ b

array([[2081, 1429],
       [1429, 1359]])

In [104]:
b @ b.T

array([[576, 528, 504, 432, 384],
       [528, 968, 880, 770, 682],
       [504, 880, 802, 701, 621],
       [432, 770, 701, 613, 543],
       [384, 682, 621, 543, 481]])

In [107]:
v=np.array([1,-1],dtype=np.float64)
print(b@v)

[-24.   0.  -2.  -1.  -1.]


In [109]:
u=np.linspace(1,2,2)
v=np.linspace(2,4,3)
print(u)
print(v)

[1. 2.]
[2. 3. 4.]


$a_{ij} = u_i v_j$ -- Outer product

In [111]:
a=np.outer(u,v)
print(a)

[[2. 3. 4.]
 [4. 6. 8.]]


$x_{ij} = u_j$ - varies horizontally (along columns)

$y_{ij} = v_i$ - varies vertically (along rows)

In [113]:
x,y=np.meshgrid(u,v)
print(x)
print(y)

[[1. 2.]
 [1. 2.]
 [1. 2.]]
[[2. 2.]
 [3. 3.]
 [4. 4.]]
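Both the grids and the outer product can also be written with explicit broadcasting. The sketch below checks the equivalence; the `_b` names are illustrative.

```python
import numpy as np

u = np.linspace(1, 2, 2)
v = np.linspace(2, 4, 3)

x, y = np.meshgrid(u, v)

# The same grids via broadcasting: u runs along columns, v along rows.
x_b = np.broadcast_to(u[None, :], (len(v), len(u)))
y_b = np.broadcast_to(v[:, None], (len(v), len(u)))

# The outer product is a broadcast product of a column and a row.
outer_b = u[:, None] * v[None, :]

print(np.array_equal(x, x_b), np.array_equal(y, y_b))
print(np.array_equal(outer_b, np.outer(u, v)))
```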


In [116]:
np.eye(4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In [119]:
def f(i,j):
    print(i)
    print(j)
    return 10*i+j
t = np.fromfunction(f,(4,4),dtype=np.int64)
print(t.shape)
t

[[0 0 0 0]
 [1 1 1 1]
 [2 2 2 2]
 [3 3 3 3]]
[[0 1 2 3]
 [0 1 2 3]
 [0 1 2 3]
 [0 1 2 3]]
(4, 4)


array([[ 0,  1,  2,  3],
       [10, 11, 12, 13],
       [20, 21, 22, 23],
       [30, 31, 32, 33]])

In [131]:
a = np.random.rand(5,5)
from numpy.linalg import det,inv,solve,eig
det(a)
a1=inv(a)
print(a1)

[[-1.58516762  1.63580416  2.31054329  3.078005   -4.96854836]
 [-0.05419368 -0.97952298 -1.33741715 -2.83511691  5.44336628]
 [ 2.10733537  0.51048428 -2.15758197 -1.98006179  1.67288379]
 [ 0.47923682 -1.22950113 -0.66149778  0.77684929  1.81933974]
 [ 0.01484029 -0.39291703  3.55700431  2.94502574 -5.37518922]]


In [133]:
a@a1

array([[ 1.00000000e+00,  0.00000000e+00,  2.22044605e-16,
        -2.22044605e-16,  0.00000000e+00],
       [ 8.08814821e-17,  1.00000000e+00,  2.22044605e-16,
         5.55111512e-17,  0.00000000e+00],
       [ 9.62771529e-17,  1.66533454e-16,  1.00000000e+00,
        -2.22044605e-16,  0.00000000e+00],
       [ 2.66415579e-16,  4.70543743e-17, -7.80625564e-17,
         1.00000000e+00,  5.27355937e-16],
       [ 1.70436582e-16,  1.87350135e-16,  0.00000000e+00,
         1.66533454e-16,  1.00000000e+00]])

In [134]:
v=np.random.rand(5)
print(a1@v)

[ 2.96984271 -1.35571623 -1.77004811  0.20860948  2.36400912]


In [135]:
u=solve(a,v)
print(u)

[ 2.96984271 -1.35571623 -1.77004811  0.20860948  2.36400912]


In [136]:
print(a@u-v)

[1.11022302e-16 0.00000000e+00 0.00000000e+00 0.00000000e+00
 0.00000000e+00]
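The cells above reach the same solution through `inv` and through `solve`. `solve` is generally the better choice, since it never forms the inverse explicitly. A sketch on a made-up, well-conditioned matrix (the diagonal shift is an assumption chosen to keep the example numerically benign):

```python
import numpy as np
from numpy.linalg import inv, solve

rng = np.random.RandomState(0)
A = rng.rand(5, 5) + 5 * np.eye(5)   # diagonally dominant, hence invertible
rhs = rng.rand(5)

x_inv = inv(A) @ rhs     # forms the full inverse first
x_solve = solve(A, rhs)  # factorizes A and solves directly

print(np.allclose(x_inv, x_solve))
print(np.allclose(A @ x_solve, rhs))
```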


## Reductions can be applied along an axis

In [124]:
b

array([[-24,   0],
       [-22, -22],
       [-21, -19],
       [-18, -17],
       [-16, -15]])

In [120]:
np.mean(b, axis=1)

array([-12. , -22. , -20. , -17.5, -15.5])

In [127]:
b.mean(0)

array([-20.2, -14.6])

In [38]:
np.mean(b, axis=1, keepdims=True)

array([[-12. ],
       [-22. ],
       [-20. ],
       [-17.5],
       [-15.5]])
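`keepdims=True` is mostly useful because the result still broadcasts against the original array. A minimal sketch, with a made-up array, that subtracts each row's mean:

```python
import numpy as np

b = np.arange(10).reshape(5, 2)

row_mean = b.mean(axis=1)                    # shape (5,)
row_mean_kd = b.mean(axis=1, keepdims=True)  # shape (5, 1)

centered = b - row_mean_kd     # (5, 2) - (5, 1) broadcasts row by row
print(row_mean.shape, row_mean_kd.shape)
print(centered.mean(axis=1))   # every row now averages to zero

# b - row_mean would fail: shapes (5, 2) and (5,) do not broadcast.
```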

## Slicing

A worked example: Neighborhood average of a two-dimensional array

From a student's email:

> A general issue of speed for the overall program. A single run with sufficient data points is taking about 2-3 weeks.


In [60]:
m, n = 4, 4
a = np.arange(m*n, dtype=float).reshape((m, n))
a

array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.],
       [12., 13., 14., 15.]])

In [61]:
# a non-vectorized way

b = np.zeros((m-1, n-1))
for i in range(m-1):
    for j in range(n-1):
        b[i, j] = a[i, j] + a[i+1, j] + a[i, j+1] + a[i+1, j+1]
b

array([[10., 14., 18.],
       [26., 30., 34.],
       [42., 46., 50.]])

In [62]:
# the syntax for a slice is `start:stop:step`

a[1:3, 0]

array([4., 8.])

In [63]:
a[1:-1, ...]

array([[ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.]])

In [64]:
a[1:, ...]

array([[ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.],
       [12., 13., 14., 15.]])

In [65]:
# a vectorized expression

b_vect = a[:-1, :-1] + a[1:, :-1] + a[:-1, 1:] + a[1:, 1:]

In [66]:
np.all(b_vect == b)

True

In [67]:
N = 1000
np.random.seed(1234)
r = np.random.random((N, N))

In [68]:
%%timeit 

r_av = np.zeros((N-1, N-1))
for i in range(N-1):
    for j in range(N-1):
        r_av[i, j] = r[i, j] + r[i+1, j] + r[i, j+1] + r[i+1, j+1]

1.91 s ± 14.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [69]:
%%timeit 

r_av = r[:-1, :-1] + r[1:, :-1] + r[:-1, 1:] + r[1:, 1:]

5.98 ms ± 256 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [70]:
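# speedup measured in an earlier run (1.31 s vs 8.43 ms);
# the timings shown above give 1.91 / 5.98e-3, roughly 320
1.31 / 8.43e-3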

155.3973902728351

#### Conway's game of life

Cells live on a square grid. Each cell can be in either of two states: alive or dead. Cells interact with nearest neighbors.

At each *tick*, 

- Any live cell with `<2` neighbors dies, as if of underpopulation.
- Any live cell with `2` or `3` neighbors lives on to the next tick.
- Any live cell with `>3` neighbors dies, as if of overpopulation.
- Any dead cell with `=3` neighbors becomes a live cell, as if by reproduction.

From a cell-centric view to a whole-array formulation: for each cell, consider the sum over the 3×3 block of nine cells centered on it (the cell itself included).

- If the sum `= 3`, the central cell is alive at the next tick.
- If the sum `= 4`, the state of the central cell does not change.
- Otherwise, it dies.

In [71]:
def step(X):
    """Given a game board ``X``, make a time step and return the result.
    
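    NB: In this implementation the game field is finite: border cells are
    never updated. The board ``X`` is modified in place and returned.
    ``num_neighb`` below sums all nine cells, the central one included.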

    """
    num_neighb = (X[:-2, :-2]  + X[1:-1, :-2]  + X[2:, :-2] +
                  X[:-2, 1:-1] + X[1:-1, 1:-1] + X[2:, 1:-1] +
                  X[:-2, 2:]   + X[1:-1, 2:]   + X[2:, 2:])
    
    X[1:-1, 1:-1][num_neighb == 3] = 1
    X[1:-1, 1:-1][(num_neighb != 4) & (num_neighb != 3)] = 0
    return X
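To see the nine-cell rule in action, here is a hedged sketch that applies it to a "blinker", a row of three live cells that flips between horizontal and vertical. `step_copy` is a hypothetical variant written for this example: it uses the same slicing but works on a copy of the board.

```python
import numpy as np

def step_copy(X):
    """Hypothetical non-mutating variant of the finite-board update."""
    X = X.copy()
    # Sum over each 3x3 block, the central cell included.
    total = (X[:-2, :-2]  + X[1:-1, :-2]  + X[2:, :-2] +
             X[:-2, 1:-1] + X[1:-1, 1:-1] + X[2:, 1:-1] +
             X[:-2, 2:]   + X[1:-1, 2:]   + X[2:, 2:])
    inner = X[1:-1, 1:-1]                    # a view into the copy
    inner[total == 3] = 1                    # alive at the next tick
    inner[(total != 3) & (total != 4)] = 0   # dies or stays dead
    return X

blinker = np.zeros((5, 5), dtype=int)
blinker[2, 1:4] = 1                # horizontal bar

after_one = step_copy(blinker)     # becomes a vertical bar
after_two = step_copy(after_one)   # and back again

print(after_one)
print(np.array_equal(after_two, blinker))
```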

https://jakevdp.github.io/blog/2013/08/07/conways-game-of-life

In [72]:
from scipy.signal import convolve2d

window = np.ones((3, 3))

def step_alternative(X):
    nbrs_count = convolve2d(X, window, mode='same', boundary='wrap') - X
    return (nbrs_count == 3) | (X & (nbrs_count == 2))

## Broadcasting

Overheard on the numpy-discussion mailing list at some point:

OP:

> I personally think that silent Broadcasting is not a good thing. I had recently a lot of trouble with row and column vectors which got bradcastet toghether ...

Chuck Harris (numpy RM):

> It's how numpy works. 

In [73]:
a = np.arange(5)
a

array([0, 1, 2, 3, 4])

In [74]:
a[:, None]

array([[0],
       [1],
       [2],
       [3],
       [4]])

In [75]:
a[:, None].shape

(5, 1)

In [76]:
a + a[:, None]

array([[0, 1, 2, 3, 4],
       [1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6],
       [3, 4, 5, 6, 7],
       [4, 5, 6, 7, 8]])

<img src="fig_broadcast_visual_1.png">

http://www.astroml.org/book_figures/appendix/fig_broadcast_visual.html

Given two arrays, `S` and `P`, with shapes

$$
    \begin{aligned}
        \mathrm{S.shape} &= (\cdots, s_3, s_2, s_1) \\
        \mathrm{P.shape} &= (\cdots, p_3, p_2, p_1)
    \end{aligned}
$$

broadcasting works from the right backwards:

* If the number of dimensions of `S` and `P` is different, left-pad the smaller shape with ones.

* If $s_j = 1$ and $p_j \neq 1$, the corresponding axis of the `S` array is treated as if it were expanded to have $p_j$ elements (and likewise with `S` and `P` swapped).

* If $s_j \neq p_j$ and neither of them equals 1, it's an error.

In [None]:
a + np.ones((6, 2))
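The three rules can be spelled out in a few lines of plain Python. This is only an illustrative sketch: `broadcast_shape` is a hypothetical helper, not a NumPy function, and it is checked against what NumPy itself does.

```python
import numpy as np

def broadcast_shape(s, p):
    """Hypothetical helper: shape obtained by broadcasting shapes s and p."""
    ndim = max(len(s), len(p))
    s = (1,) * (ndim - len(s)) + tuple(s)   # left-pad the shorter shape
    p = (1,) * (ndim - len(p)) + tuple(p)
    out = []
    for sj, pj in zip(s, p):
        if sj != pj and sj != 1 and pj != 1:
            raise ValueError(f"shapes {s} and {p} do not broadcast")
        out.append(pj if sj == 1 else sj)   # a length-1 axis stretches
    return tuple(out)

print(broadcast_shape((5,), (5, 1)))
print(broadcast_shape((3, 1, 4), (2, 1)))

# NumPy agrees on the compatible case ...
numpy_shape = (np.empty((3, 1, 4)) + np.empty((2, 1))).shape

# ... and rejects the incompatible one, shapes (5,) and (6, 2).
try:
    np.arange(5) + np.ones((6, 2))
    incompatible = False
except ValueError:
    incompatible = True
print(numpy_shape, incompatible)
```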

## Universal functions, `ufuncs`

Universal functions of a single argument receive an array and work elementwise. Binary functions broadcast their arguments against each other and work elementwise.

In [77]:
a = np.array(list(range(10)), dtype=float) * np.pi / 10

In [78]:
np.sin(a)

array([0.        , 0.30901699, 0.58778525, 0.80901699, 0.95105652,
       1.        , 0.95105652, 0.80901699, 0.58778525, 0.30901699])

In [79]:
am = (a - a.mean()) / a.std()

1 / (1. + np.exp(am))

array([0.8273125 , 0.77180715, 0.70482648, 0.62766976, 0.54340985,
       0.45659015, 0.37233024, 0.29517352, 0.22819285, 0.1726875 ])

In [80]:
a = np.arange(10)

np.add(a, a)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [81]:
np.multiply(a, a)

array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

**NB**: `np.multiply` is *elementwise* multiplication. 

For linear algebra operations, use `np.dot` or the matrix-multiply operator `@` (Python >= 3.5 only).

In [83]:
a = np.ones((3, 4))
a

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [84]:
a.T @ a

array([[3., 3., 3., 3.],
       [3., 3., 3., 3.],
       [3., 3., 3., 3.],
       [3., 3., 3., 3.]])

### In-place operations

In [85]:
a = np.array(list(range(10)), dtype=float)
a

array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

In [86]:
a

array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

In [88]:
# pretty up printing 

np.set_printoptions(precision=3)   # returns None, so there is nothing to keep

In [89]:
# note the use of the `out=` argument 

np.exp(a, out=a)

array([1.000e+00, 2.718e+00, 7.389e+00, 2.009e+01, 5.460e+01, 1.484e+02,
       4.034e+02, 1.097e+03, 2.981e+03, 8.103e+03])

In [90]:
a

array([1.000e+00, 2.718e+00, 7.389e+00, 2.009e+01, 5.460e+01, 1.484e+02,
       4.034e+02, 1.097e+03, 2.981e+03, 8.103e+03])
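Two properties of in-place operations are worth checking explicitly: `out=` hands back the very same array object, and the target array keeps its dtype. A minimal sketch; the flag `refused` exists only for this illustration.

```python
import numpy as np

a = np.arange(5, dtype=float)
result = np.exp(a, out=a)     # writes into `a`, no new array is allocated
print(result is a)

# In-place operations keep the dtype of the target array, so a float
# result cannot be stored into an integer array.
ints = np.arange(5)
refused = False
try:
    ints += 0.5
except TypeError as exc:      # NumPy raises a TypeError subclass here
    refused = True
    print("refused:", type(exc).__name__)
```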

#### `np.add.<TAB>`

- `np.add.reduce` is `np.sum`

- `np.add.accumulate` is `np.cumsum`

- `np.add.outer` has the outer-product semantics
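A short sketch of the three methods side by side, on a made-up array. The same methods exist on other binary ufuncs such as `np.multiply`.

```python
import numpy as np

a = np.arange(1, 5)               # the integers 1 to 4

total = np.add.reduce(a)          # same as a.sum()
running = np.add.accumulate(a)    # same as a.cumsum()
table = np.add.outer(a, a)        # table[i, j] = a[i] + a[j]

print(total, running)
print(np.array_equal(table, a[:, None] + a[None, :]))

product = np.multiply.reduce(a)   # product of all elements, like a.prod()
print(product)
```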

In [None]:
# np.add.accumulate?

#### Cauchy matrix

Given two arrays $u_i$ and $v_i$, construct

$$
A_{ij} = \frac{1}{u_i - v_j}
$$

In [None]:
u = np.arange(3)
v = np.arange(3) + 0.5

A_ = np.zeros((len(u), len(v)))
for i in range(len(u)):
    for j in range(len(v)):
        A_[i, j] = 1. / (u[i] - v[j])
A_

In [None]:
A = 1. / np.subtract.outer(u, v)
A

Exercise: construct the Cauchy matrix from two 1D arrays without `np.subtract.outer`, using broadcasting only.


In [None]:
# Enter your code here

## Random number generation with [numpy.random](https://docs.scipy.org/doc/numpy/reference/routines.random.html)

In [None]:
np.random.seed(1234)    # For reproducibility, consider seeding your generator
np.random.random(size=10)

`np.random` provides a global stream of pseudo-random numbers. Under the hood, there is a single, global `RandomState` object. In real code, prefer explicit `RandomState` objects, so that each part of the program controls its own stream.

In [None]:
rndm = np.random.RandomState(1234)
rndm.uniform(size=11)

In [None]:
rndm.normal(loc=0, scale=8, size=11)
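Newer NumPy versions also offer the `Generator` interface, created with `np.random.default_rng`. The sketch below assumes NumPy 1.17 or later. The helper `noisy` is hypothetical and only illustrates passing a generator around explicitly.

```python
import numpy as np

# Assumption: NumPy 1.17 or newer, where default_rng is available.
rng_a = np.random.default_rng(1234)
rng_b = np.random.default_rng(1234)

sample_a = rng_a.normal(loc=0, scale=8, size=11)
sample_b = rng_b.normal(loc=0, scale=8, size=11)

# Two generators with the same seed produce the same stream,
# independently of the global np.random state.
print(np.array_equal(sample_a, sample_b))

def noisy(values, rng):
    """Hypothetical helper: add unit Gaussian noise drawn from `rng`."""
    return values + rng.normal(size=np.shape(values))

noisy_zeros = noisy(np.zeros(3), np.random.default_rng(0))
print(noisy_zeros.shape)
```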

## (Some) `numpy` gotchas

#### `numpy` gotchas for python users

`lst[:]` is a copy of `lst`

`arr[:]` is a *view* into `arr` (for copying use `arr.copy()`)
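A minimal sketch of the difference, with made-up data. The `.base` attribute shows which array, if any, owns the memory.

```python
import numpy as np

lst = [0, 1, 2]
arr = np.array(lst)

lst_copy = lst[:]       # a new list
arr_view = arr[:]       # a view onto the same data
arr_copy = arr.copy()   # an independent array

lst_copy[0] = 99
arr_view[0] = 99
arr_copy[1] = -1

print(lst, arr)         # the list is unchanged, the array is not
print(arr_view.base is arr, arr_copy.base is None)
```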

#### `numpy` gotchas for pandas users

In [None]:
a = np.array([np.nan, 2., 3.])

import pandas as pd
s = pd.Series(a)

In [None]:
s.sum()

In `numpy`, NaN means "invalid", not "missing": `pandas` skips NaN entries when summing by default, while a plain `numpy` sum propagates them.

In [None]:
a.sum()

In [None]:
np.nansum(a)

#### never do `from numpy import *`

See [Exercise 26](https://github.com/rougier/numpy-100/blob/master/100%20Numpy%20exercises.md#26-what-is-the-output-of-the-following-script-) of Numpy 100 exercises.
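The effect behind that exercise can be reproduced without the star import by calling the two functions explicitly. In this sketch, `builtins.sum` stands for the name `sum` before the import, and `np.sum` for what the name may be rebound to afterwards.

```python
import builtins
import numpy as np

# The second positional argument means different things:
# a start value for the builtin, an axis for NumPy.
builtin_result = builtins.sum(range(5), -1)   # 0+1+2+3+4 plus start=-1
numpy_result = np.sum(range(5), -1)           # sum along the last axis

print(builtin_result, numpy_result)
```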

### Further reading

Numpy docs http://docs.scipy.org/doc/numpy/reference/index.html

Jake Vanderplas' *Numpy Intro*
http://nbviewer.ipython.org/github/jakevdp/2013_fall_ASTR599/blob/master/notebooks/05_NumpyIntro.ipynb

and *Efficient Numpy* http://nbviewer.ipython.org/github/jakevdp/2013_fall_ASTR599/blob/master/notebooks/11_EfficientNumpy.ipynb

or *Losing your loops*
https://speakerdeck.com/jakevdp/losing-your-loops-fast-numerical-computing-with-numpy-pycon-2015

Scipy lecture notes, incl Pauli Virtanen's *Advanced Numpy*
https://scipy-lectures.github.io/

Nicolas Rougier's *100 Numpy exercises*
https://github.com/rougier/numpy-100/