# NumPy
NumPy is a library which provides a multidimensional array object
to store homogeneous or heterogeneous data arrays and supports vectorization of
code.
The chapter covers the following data structures:
  

| Object type  | Meaning  |  Used for |   
|---|---|---|
|  ndarray (regular)  | n-dimensional array object  | Large arrays of numerical data|  
|  ndarray (record)  | 2-dimensional array object  |  Tabular data organized in columns |  


We will cover:
* Arrays of data: handling arrays with pure Python
* Regular NumPy arrays: numerical data handled with the NumPy ndarray class
* Structured NumPy arrays: introduction to structured or record ndarray objects for handling tabular data with columns
* Vectorization of code: how to vectorize functions
 

## Arrays of data
An array can be considered a vector if we are handling 1-dim objects, a matrix if we are working with 2-dim objects. This can be generalized to n-dimensional objects which in math are called tensors. In numpy we call them n-dimensional arrays or just ndarrays for short.

n-dimensional arrays are modelled in NumPy by the <i>ndarray class</i>.
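As a small illustration of the vector, matrix and n-dimensional cases described above, the sketch below builds one array of each kind and prints its number of dimensions and shape. The variable names are chosen for this example only.

```python
import numpy as np

vector = np.array([1.0, 2.0, 3.0])            # 1-dim object: a vector
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])   # 2-dim object: a matrix
cube = np.zeros((2, 3, 4))                    # 3-dim object: a "cube" of numbers

# all three are instances of the same ndarray class
for name, arr in [('vector', vector), ('matrix', matrix), ('cube', cube)]:
    print(name, type(arr).__name__, arr.ndim, arr.shape)
```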

## Arrays with python lists

In [1]:
v = [0.5, 0.75, 1.0, 1.5, 2.0]

In [2]:
# construct 2-dim array with lists
m = [v, v, v]
m

In [3]:
m[1]

In [4]:
m[1][0]

0.5

In [5]:
v1 = [0.5, 1.5]
v2 = [1, 2]
m = [v1, v2]
c = [m, m]
c

[[[0.5, 1.5], [1, 2]], [[0.5, 1.5], [1, 2]]]

In [6]:
c[1][1][0]

1

## Cube of numbers
The nested lists above hold references (pointers) to the original objects, not copies of them, so a change to an original list shows up in every place that refers to it.
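One way to see that the rows are references rather than copies is to compare object identities with `is`. This is a minimal sketch with a shortened list; `rows_are_v` and `distinct_ids` are names introduced only for the illustration.

```python
v = [0.5, 0.75, 1.0]
m = [v, v, v]

# every row of m is the very same list object as v
rows_are_v = [row is v for row in m]
distinct_ids = {id(row) for row in m}

print(rows_are_v)         # [True, True, True]
print(len(distinct_ids))  # 1
```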

In [7]:
v = [0.5, 0.75, 1.0, 1.5, 2.0]
m = [v, v, v]
m

[[0.5, 0.75, 1.0, 1.5, 2.0],
 [0.5, 0.75, 1.0, 1.5, 2.0],
 [0.5, 0.75, 1.0, 1.5, 2.0]]

In [8]:
v[0] = 'Python'
m

[['Python', 0.75, 1.0, 1.5, 2.0],
 ['Python', 0.75, 1.0, 1.5, 2.0],
 ['Python', 0.75, 1.0, 1.5, 2.0]]

## Use deepcopy()

In [9]:
from copy import deepcopy

In [10]:
# build three independent copies of v instead of three references to it;
# 3 * [deepcopy(v)] would repeat one single copy three times

m = [deepcopy(v) for _ in range(3)]

m


[['Python', 0.75, 1.0, 1.5, 2.0],
 ['Python', 0.75, 1.0, 1.5, 2.0],
 ['Python', 0.75, 1.0, 1.5, 2.0]]

In [11]:
a = ['a', 'b', 'c']
print(a)
b = a
b[0] = 'mylist'
print(b)
print(a)

['a', 'b', 'c']
['mylist', 'b', 'c']
['mylist', 'b', 'c']


In [12]:
c = deepcopy(a)
c[0] = 'another string'
print(a)
print(c)

['mylist', 'b', 'c']
['another string', 'b', 'c']


In [13]:
v[0] = 'Python'
m

[['Python', 0.75, 1.0, 1.5, 2.0],
 ['Python', 0.75, 1.0, 1.5, 2.0],
 ['Python', 0.75, 1.0, 1.5, 2.0]]

### Deep Copy
`deepcopy()` makes a complete copy of a list and of all the objects nested inside it, so later changes to the original do not affect the copy.
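The difference matters most for nested lists. The sketch below contrasts `copy.copy` (a shallow copy, which is not used elsewhere in this chapter) with `copy.deepcopy`; the variable names are illustrative.

```python
from copy import copy, deepcopy

nested = [[1, 2], [3, 4]]
shallow = copy(nested)     # new outer list, but the inner lists are shared
deep = deepcopy(nested)    # new outer list and new inner lists

nested[0][0] = 'changed'

print(shallow[0][0])   # 'changed': the shallow copy still sees the inner list
print(deep[0][0])      # 1: the deep copy is unaffected
```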

## Python array class

In [14]:
import array

In [15]:
v = [0.5, 0.75, 1.0, 1.5, 2.0]
a = array.array('f', v)
a

array('f', [0.5, 0.75, 1.0, 1.5, 2.0])

In [16]:
a.append(0.5)
a

array('f', [0.5, 0.75, 1.0, 1.5, 2.0, 0.5])

In [17]:
a.extend([5.0, 6.75])
a

array('f', [0.5, 0.75, 1.0, 1.5, 2.0, 0.5, 5.0, 6.75])

In [18]:
a * 2

array('f', [0.5, 0.75, 1.0, 1.5, 2.0, 0.5, 5.0, 6.75, 0.5, 0.75, 1.0, 1.5, 2.0, 0.5, 5.0, 6.75])

In [19]:
try:
    a.append('hi')
except TypeError:
    print('must be a real number, not str')

must be a real number, not str


In [20]:
a.tolist()

[0.5, 0.75, 1.0, 1.5, 2.0, 0.5, 5.0, 6.75]

## Numpy arrays

In [21]:
import numpy as np

In [22]:
a = np.array([0, 0.5, 1.0, 1.5, 2.0])
a, type(a), a.shape

(array([0. , 0.5, 1. , 1.5, 2. ]), numpy.ndarray, (5,))

In [23]:
a = np.arange(2, 20, 2)
a

array([ 2,  4,  6,  8, 10, 12, 14, 16, 18])

In [24]:
a = np.array(['a', 'b', 'c'])
a

array(['a', 'b', 'c'], dtype='<U1')

In [25]:
a = np.arange(8, dtype=float)  # the builtin float gives float64
a[5:]

array([5., 6., 7.])

In [26]:
a[:2]

array([0., 1.])

In [27]:
a.sum()

28.0

In [28]:
np.sum(a)

28.0

In [29]:
a.std()

2.29128784747792

In [30]:
a.cumsum()

array([ 0.,  1.,  3.,  6., 10., 15., 21., 28.])

In [31]:
a.mean()

3.5

In [32]:
l = [0., 0.5, 1.5, 3., 5.]
l * 2

[0.0, 0.5, 1.5, 3.0, 5.0, 0.0, 0.5, 1.5, 3.0, 5.0]

In [33]:
2 * a

array([ 0.,  2.,  4.,  6.,  8., 10., 12., 14.])

In [34]:
a ** 2

array([ 0.,  1.,  4.,  9., 16., 25., 36., 49.])

In [35]:
a ** a

array([1.00000e+00, 1.00000e+00, 4.00000e+00, 2.70000e+01, 2.56000e+02,
       3.12500e+03, 4.66560e+04, 8.23543e+05])

In [36]:
np.exp(a)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03])

In [37]:
np.exp(l), type(l)

(array([  1.        ,   1.64872127,   4.48168907,  20.08553692,
        148.4131591 ]), list)

In [38]:
np.sqrt(a)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131])

In [39]:
%timeit np.sqrt(2.5) # much slower for a python float than math.sqrt()

882 ns ± 43.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [40]:
import math
%timeit math.sqrt(2.5)

155 ns ± 4.12 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


## Multiple dimensions
The same operations as those performed on 1-dim objects carry over to n dimensions.

In [41]:
b = np.array([a, a * 2])
b

array([[ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.],
       [ 0.,  2.,  4.,  6.,  8., 10., 12., 14.]])

In [42]:
b[0]

array([0., 1., 2., 3., 4., 5., 6., 7.])

In [43]:
b[0, 2]  # row 0, column 2; same element as b[0][2]

2.0

In [130]:
b[:, 1], b[:, 1].shape

(array([1., 2.]), (2,))

In [133]:
b[1, :]

array([ 0.,  2.,  4.,  6.,  8., 10., 12., 14.])

In [45]:
b.sum()

84.0

In [46]:
b.sum(axis=0)

array([ 0.,  3.,  6.,  9., 12., 15., 18., 21.])

In [47]:
b.sum(axis=1)

array([28., 56.])

## Initializing arrays

In [48]:
c = np.zeros((2, 3), dtype='i', order='C')
c

array([[0, 0, 0],
       [0, 0, 0]], dtype=int32)

In [49]:
c = np.ones((2, 3, 4), dtype='i', order='c')
c

array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]], dtype=int32)

In [50]:
d = np.zeros_like(c, dtype='f', order='C')
d

array([[[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]], dtype=float32)

In [51]:
d = np.ones_like(c, dtype='f', order='C')
d

array([[[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]],

       [[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]]], dtype=float32)

In [52]:
e = np.empty((2, 3, 2))
e
# try the attribute e.itemsize to get the size in bytes of one element

array([[[0.0078125, 0.0078125],
        [0.0078125, 0.0078125],
        [0.0078125, 0.0078125]],

       [[0.0078125, 0.0078125],
        [0.0078125, 0.0078125],
        [0.0078125, 0.0078125]]])

In [53]:
f = np.empty_like(c)
f

array([[[1065353216, 1065353216, 1065353216, 1065353216],
        [1065353216, 1065353216, 1065353216, 1065353216],
        [1065353216, 1065353216, 1065353216, 1065353216]],

       [[1065353216, 1065353216, 1065353216, 1065353216],
        [1065353216, 1065353216, 1065353216, 1065353216],
        [1065353216, 1065353216, 1065353216, 1065353216]]], dtype=int32)

In [54]:
np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [55]:
g = np.linspace(5, 15, 12) # 12 evenly spaced points from 5 to 15, both endpoints included
g

array([ 5.        ,  5.90909091,  6.81818182,  7.72727273,  8.63636364,
        9.54545455, 10.45454545, 11.36363636, 12.27272727, 13.18181818,
       14.09090909, 15.        ])

## Meta information about array

In [56]:
print('Number of elements:', g.size)
print('Bytes per element:', g.itemsize)
print('Number of dimensions:', g.ndim)
print('Shape:', g.shape)
print('dtype:', g.dtype)

Number of elements: 12
Bytes per element: 8
Number of dimensions: 1
Shape: (12,)
dtype: float64


## Reshaping and Resizing
An ndarray has a fixed number of elements once it is created, but `reshape` and `np.resize` return new objects with a different shape or size and leave the original array as it was.
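A point worth checking is whether the result shares data with the original. In the sketch below, reshaping a freshly created array gives a view on the same memory, while `np.resize` builds a new array. The names `base`, `view` and `resized` are used only for this illustration.

```python
import numpy as np

base = np.arange(15)
view = base.reshape((3, 5))         # same data seen with a different shape
resized = np.resize(base, (2, 3))   # new array with a different number of elements

print(np.shares_memory(base, view))      # True in this case
print(np.shares_memory(base, resized))   # False

view[0, 0] = 99   # writing through the view changes the original as well
print(base[0])    # 99
print(base.shape) # (15,): the shape of the original is untouched
```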

In [57]:
g = np.arange(15)
g

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [58]:
g.shape

(15,)

In [59]:
g.reshape((3, 5))  # g.reshape((4, 5)) would raise a ValueError, because 15 elements cannot fill 4 x 5 = 20 slots

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [136]:
g.reshape((-1, 5))

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [61]:
g.reshape((3, -1))

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [62]:
g

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [63]:
h = g.reshape((3, 5))
h

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [64]:
h.T

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

In [65]:
h.transpose()

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

In [66]:
h

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

Reshape does not change the number of elements in the array. Resizing does.

In [67]:
g

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [68]:
np.resize(g, (3, 1))

array([[0],
       [1],
       [2]])

In [69]:
np.resize(g, (1, 5))

array([[0, 1, 2, 3, 4]])

In [70]:
np.resize(g, (5, 4))

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14,  0],
       [ 1,  2,  3,  4]])

## Stacking requires the connecting dimension to be the same

In [71]:
h

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [72]:
np.hstack((h, 2 * h))

array([[ 0,  1,  2,  3,  4,  0,  2,  4,  6,  8],
       [ 5,  6,  7,  8,  9, 10, 12, 14, 16, 18],
       [10, 11, 12, 13, 14, 20, 22, 24, 26, 28]])

In [73]:
np.vstack((h, 2 * h))

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [ 0,  2,  4,  6,  8],
       [10, 12, 14, 16, 18],
       [20, 22, 24, 26, 28]])
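The requirement named in the heading of this section can be checked with two arrays that agree in the number of rows but not in the number of columns. This is a minimal sketch; the arrays `a` and `b` and the flag `vstack_failed` are introduced for the example.

```python
import numpy as np

a = np.zeros((3, 5))
b = np.ones((3, 2))

# horizontal stacking joins along the columns, so only the row counts must agree
stacked = np.hstack((a, b))
print(stacked.shape)   # (3, 7)

# vertical stacking joins along the rows, so the column counts must agree
vstack_failed = False
try:
    np.vstack((a, b))
except ValueError as err:
    vstack_failed = True
    print('vstack failed:', err)
```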

## Flattening

In [74]:
h

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [75]:
h.flatten()

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [76]:
h

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [77]:
for i in h.flat:
    print(i, end=',')

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,

In [78]:
for i in h.ravel(order='C'):
    print(i, end=',')

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,

In [79]:
for i in h.ravel(order='F'):
    print(i, end=',')

0,5,10,1,6,11,2,7,12,3,8,13,4,9,14,

## Boolean arrays

In [80]:
h

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [81]:
h > 8

array([[False, False, False, False, False],
       [False, False, False, False,  True],
       [ True,  True,  True,  True,  True]])

In [82]:
h <= 7

array([[ True,  True,  True,  True,  True],
       [ True,  True,  True, False, False],
       [False, False, False, False, False]])

In [83]:
h == 5

array([[False, False, False, False, False],
       [ True, False, False, False, False],
       [False, False, False, False, False]])

In [84]:
(h == 5) & (h <= 12)

array([[False, False, False, False, False],
       [ True, False, False, False, False],
       [False, False, False, False, False]])

In [85]:
h[h > 8] # selection of objects

array([ 9, 10, 11, 12, 13, 14])

In [86]:
h[(h > 4) & (h <= 12)]

array([ 5,  6,  7,  8,  9, 10, 11, 12])

In [87]:
h[(h < 4) | (h >= 12)]

array([ 0,  1,  2,  3, 12, 13, 14])

In [88]:
h

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [89]:
np.where(h > 7, 1, 0)

array([[0, 0, 0, 0, 0],
       [0, 0, 0, 1, 1],
       [1, 1, 1, 1, 1]])

In [137]:
np.where(h % 2 == 0, 'even', 'odd')

array([['even', 'odd', 'even', 'odd', 'even'],
       ['odd', 'even', 'odd', 'even', 'odd'],
       ['even', 'odd', 'even', 'odd', 'even']], dtype='<U4')

In [91]:
np.where(h <= 7, h * 2, h/2)

array([[ 0. ,  2. ,  4. ,  6. ,  8. ],
       [10. , 12. , 14. ,  4. ,  4.5],
       [ 5. ,  5.5,  6. ,  6.5,  7. ]])

## Speed comparison
The next example generates a matrix of 5000 times 5000 elements drawn from the standard normal distribution, first with nested Python lists and then with NumPy, and compares the time and memory needed by both.

In [92]:
import random

In [93]:
I = 5000

In [94]:
%time mat = [[random.gauss(0, 1) for j in range(I)] for i in range(I)]

Wall time: 22 s


In [95]:
mat[0][:5]

[-1.0150145685704535,
 -1.930265091823939,
 -1.324858551838952,
 -1.525607825474165,
 0.8312645728179264]

In [96]:
%time sum([sum(i) for i in mat])

Wall time: 132 ms


-707.6726131824801

In [97]:
import sys

In [98]:
# counts only the 5000 row lists (headers and pointers), not the float objects they refer to
sum([sys.getsizeof(i) for i in mat])

215200000

In [99]:
%time mat = np.random.standard_normal((I, I))

Wall time: 1.27 s


In [100]:
%time mat.sum()

Wall time: 34.9 ms


-4516.366867518507

In [101]:
mat.nbytes

200000000

In [102]:
sys.getsizeof(mat)

200000112

## Structured numpy arrays

In [103]:
# initialize array, similar to initializing a table in a SQL-database
dt = np.dtype([('Name', 'S10'), ('Age', 'i4'),
               ('Height', 'f'), ('Children/Pets', 'i4', 2)])
dt

dtype([('Name', 'S10'), ('Age', '<i4'), ('Height', '<f4'), ('Children/Pets', '<i4', (2,))])

In [104]:
s = np.array([('Smith', 45, 1.83, (0, 1)),
              ('Jones', 53, 1.72, (2, 2))], dtype=dt)
s

array([(b'Smith', 45, 1.83, [0, 1]), (b'Jones', 53, 1.72, [2, 2])],
      dtype=[('Name', 'S10'), ('Age', '<i4'), ('Height', '<f4'), ('Children/Pets', '<i4', (2,))])

In [105]:
s['Name']

array([b'Smith', b'Jones'], dtype='|S10')

In [106]:
s['Height'].mean()

1.7750001

In [107]:
s[0]

(b'Smith', 45, 1.83, [0, 1])

In [108]:
s[1]['Age']

53

# Vectorization of code
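Vectorized code applies an operation to a whole array at once instead of looping over its elements in Python. The code becomes more compact and is usually faster.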
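To make the idea concrete, the sketch below computes the same result twice, once with an explicit Python loop and once with a single array expression. The array size is an arbitrary choice for the example; wrapping each variant in `%timeit` in a notebook gives timings for a particular machine.

```python
import numpy as np

x = np.arange(100_000, dtype=float)

# explicit loop: one Python-level operation per element
looped = np.empty_like(x)
for i in range(x.size):
    looped[i] = 3 * x[i] + 5

# vectorized: one expression for the whole array
vectorized = 3 * x + 5

print(np.array_equal(looped, vectorized))   # True
```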

In [109]:
np.random.seed(100)
r = np.arange(12).reshape((4, 3))
s = np.arange(12).reshape((4, 3)) * 0.5

In [110]:
r

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

In [111]:
s

array([[0. , 0.5, 1. ],
       [1.5, 2. , 2.5],
       [3. , 3.5, 4. ],
       [4.5, 5. , 5.5]])

In [112]:
r + s

array([[ 0. ,  1.5,  3. ],
       [ 4.5,  6. ,  7.5],
       [ 9. , 10.5, 12. ],
       [13.5, 15. , 16.5]])

In [113]:
r + 3 # numpy uses broadcasting

array([[ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [114]:
2 * r

array([[ 0,  2,  4],
       [ 6,  8, 10],
       [12, 14, 16],
       [18, 20, 22]])

In [115]:
2 * r + 3

array([[ 3,  5,  7],
       [ 9, 11, 13],
       [15, 17, 19],
       [21, 23, 25]])

# Broadcasting works as long as dimensions match

## General Broadcasting Rules
When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions and works its way forward. Two dimensions are compatible when

* they are equal, or
* one of them is 1

If these conditions are not met, a ValueError: operands could not be broadcast together exception is thrown, indicating that the arrays have incompatible shapes. The size of the resulting array is the maximum size along each dimension of the input arrays.
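The rules above can be written down as a small pure-Python function. This is only a sketch for illustration, not how NumPy implements broadcasting internally; `broadcast_shape` is a hypothetical helper, and it relies on the additional fact that a missing leading dimension is treated as size 1.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Return the shape two arrays would broadcast to, or raise ValueError."""
    result = []
    # compare the shapes starting with the trailing dimensions
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        dim_a = shape_a[-i] if i <= len(shape_a) else 1   # missing dimension counts as 1
        dim_b = shape_b[-i] if i <= len(shape_b) else 1
        if dim_a != dim_b and dim_a != 1 and dim_b != 1:
            raise ValueError(f'incompatible shapes {shape_a} and {shape_b}')
        result.append(dim_b if dim_a == 1 else dim_a)
    return tuple(reversed(result))

print(broadcast_shape((4, 3), (3,)))     # (4, 3)
print(broadcast_shape((4, 3), (4, 1)))   # (4, 3)

# NumPy agrees for the first case
print((np.zeros((4, 3)) + np.zeros(3)).shape)   # (4, 3)
```

Calling `broadcast_shape((4, 3), (4,))` raises a `ValueError`, because the trailing dimensions 3 and 4 differ and neither of them is 1.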

Arrays do not need to have the same number of dimensions. For example, if you have a 256x256x3 array of RGB values, and you want to scale each color in the image by a different value, you can multiply the image by a one-dimensional array with 3 values. Lining up the sizes of the trailing axes of these arrays according to the broadcast rules, shows that they are compatible:
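The RGB case described above can be sketched as follows. The image is a placeholder filled with ones and the scale factors are made-up values.

```python
import numpy as np

image = np.ones((256, 256, 3))       # placeholder for an RGB image
scale = np.array([0.5, 1.0, 2.0])    # one (made-up) factor per color channel

# shapes (256, 256, 3) and (3,): the trailing dimensions are both 3
scaled = image * scale

print(scaled.shape)   # (256, 256, 3)
```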

In [116]:
print(r)
print()
print(r.shape)

[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]

(4, 3)


In [117]:
s = np.arange(0, 12, 4)
s.shape, r.shape

((3,), (4, 3))

In [118]:
r + s

array([[ 0,  5, 10],
       [ 3,  8, 13],
       [ 6, 11, 16],
       [ 9, 14, 19]])

In [119]:
s = np.arange(0, 12, 3)
s.shape

(4,)

In [120]:
try:
    r + s
except ValueError:
    print('operands could not be broadcast together with shapes (4,3) (4,)')

operands could not be broadcast together with shapes (4,3) (4,)


In [121]:
r + s.reshape(-1, 1)

array([[ 0,  1,  2],
       [ 6,  7,  8],
       [12, 13, 14],
       [18, 19, 20]])

# Functions in Python
Functions in Python are defined with the **def** reserved word, a name for the function, a set of parentheses, possibly containing a set of arguments, and a colon followed by commands that are indented. In Python, multiple statements that are part of the same logic or set of instructions must have the same indentation.

In [122]:
def f(x):
    return 3 * x + 5

In [123]:
f(0.5)

6.5

In [124]:
f(r)

array([[ 5,  8, 11],
       [14, 17, 20],
       [23, 26, 29],
       [32, 35, 38]])

In [125]:
def say_hello():
    print('Hello')

In [126]:
say_hello()

Hello


In [127]:
def square(x=2):
    return x ** 2
    

In [128]:
square(9)

81
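The function `f` above works on arrays because it uses only arithmetic operators. A function that contains an `if` on its argument does not, since the condition would have to be a single truth value. The sketch below shows two possible ways around this: `np.where`, and `np.vectorize`, which is a convenience wrapper around a Python loop rather than a speed-up. `clip_negative` is a hypothetical function written for this example.

```python
import numpy as np

def clip_negative(x):
    # scalar logic: the condition needs a single True or False
    return x if x > 0 else 0.0

values = np.array([-1.5, 0.0, 2.0])

try:
    clip_negative(values)
except ValueError:
    print('the scalar version fails on arrays')

# option 1: express the condition with np.where
with_where = np.where(values > 0, values, 0.0)

# option 2: wrap the scalar function; convenient, but still loops in Python
with_vectorize = np.vectorize(clip_negative)(values)

print(with_where)
print(with_vectorize)
```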