# Intro to NumPy

## Greg Teichert
### Consulting for Statistics, Computation, and Analytics Research (CSCAR)

## NumPy
- is a Python package developed for scientific computing,
- is built largely around an n-dimensional array (ndarray) object
- includes various other capabilities, including linear algebra and random number generation. (see numpy.org)

### Resources
This workshop is based in part on the following:
- https://docs.scipy.org/doc/numpy/user/index.html (NumPy User Guide: Quickstart Tutorial, and NumPy Basics)
- https://github.com/kshedden/numpy_workshop (Previous NumPy workshop materials, by Kerby Shedden)

To get started, import the NumPy package.

In [2]:
import numpy as np

## Data types

- NumPy has multiple data types.
- In some cases, the number of bits (e.g. 16, 32, 64) can be specified, as in `np.int64` or `np.float32`.
- A trailing underscore (e.g. `np.int_`, `np.bool_`) denotes the default version of the type.

For example:

In [3]:
np.int64(4.3)

4

In [4]:
np.float64(4.3)

4.3

In [5]:
np.bool_(1)

True
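As a small sketch of how the bit width shows up in practice (the types checked here are chosen only for illustration), the storage size and range of a type can be inspected with `np.dtype` and `np.iinfo`:

```python
import numpy as np

# Bytes used per element for a few fixed-width types
size_int16 = np.dtype(np.int16).itemsize      # 2 bytes
size_float32 = np.dtype(np.float32).itemsize  # 4 bytes
size_float64 = np.dtype(np.float64).itemsize  # 8 bytes

# Largest value an 8-bit signed integer can hold
max_int8 = np.iinfo(np.int8).max

# Converting floats to an integer type truncates toward zero
truncated = np.array([4.3, -4.7]).astype(np.int64)

print(size_int16, size_float32, size_float64, max_int8, truncated)
```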

## Array creation

Create an ndarray of zeros. The input is a tuple defining the shape.

In [6]:
np.zeros((3,2))

array([[0., 0.],
       [0., 0.],
       [0., 0.]])

Create an ndarray from Python lists. You can specify the element type with `dtype`; if it is omitted, the type is inferred from the data (`np.float64` for the floats used here).

In [7]:
np.array([[1.,2.,3.],[4.,5.,6.]],dtype=np.int_)

array([[1, 2, 3],
       [4, 5, 6]])

In [8]:
np.array([[1.,2.,3.],[4.,5.,6.]],dtype=np.float64)  # np.float_ was removed in NumPy 2.0

array([[1., 2., 3.],
       [4., 5., 6.]])
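A minimal sketch of how the element type is chosen when `dtype` is omitted versus given explicitly (the array contents are illustrative):

```python
import numpy as np

ints = np.array([1, 2, 3])                       # all integers -> an integer dtype
floats = np.array([1., 2., 3.])                  # floats -> float64
mixed = np.array([1, 2.5])                       # integers are promoted to float64
forced = np.array([1, 2, 3], dtype=np.float64)   # explicit dtype overrides inference

print(ints.dtype, floats.dtype, mixed.dtype, forced.dtype)
```

The exact integer type chosen for `ints` can depend on the platform, so only its kind (integer) should be relied on.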

In [9]:
np.array([[[1.,2.],[3.,4.]],[[5.,6.],[7.,8.]]],dtype=np.float64)  # np.float_ was removed in NumPy 2.0

array([[[1., 2.],
        [3., 4.]],

       [[5., 6.],
        [7., 8.]]])

Use the `np.arange` function to create an ndarray with a given start, end (not inclusive), and stride:

In [10]:
np.arange(1.1,3.4,0.3)

array([1.1, 1.4, 1.7, 2. , 2.3, 2.6, 2.9, 3.2])
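Because the end value is excluded and the stride here is a float, the number of elements returned by `np.arange` depends on floating-point rounding. As a hedged alternative (an addition, not part of the original workshop), `np.linspace` takes the number of points instead of a stride and includes the endpoint by default:

```python
import numpy as np

by_stride = np.arange(1.1, 3.4, 0.3)   # start, end (exclusive), stride
by_count = np.linspace(1.1, 3.2, 8)    # start, end (inclusive), number of points

print(by_stride)
print(by_count)
print(np.allclose(by_stride, by_count))
```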

Create an ndarray of random numbers. We'll look more at NumPy's random number capabilities later.

In [11]:
np.random.rand(4,4)

array([[0.7574279 , 0.49612518, 0.99195377, 0.9888221 ],
       [0.45154743, 0.07085729, 0.21602362, 0.19455165],
       [0.98306179, 0.66897983, 0.36486491, 0.00692022],
       [0.48973467, 0.94011161, 0.22750853, 0.51504384]])

## Input/Output

Save an ndarray to a file:

In [12]:
a = np.random.rand(4,4)
np.savetxt('out.txt',a, fmt='%.4e',delimiter='\t',header='4 x 4 random array')

Load the file into an ndarray:

In [13]:
np.loadtxt('out.txt')

array([[0.77986 , 0.8971  , 0.62501 , 0.42358 ],
       [0.56674 , 0.67472 , 0.21924 , 0.043458],
       [0.23973 , 0.51787 , 0.59153 , 0.014791],
       [0.99883 , 0.83486 , 0.54268 , 0.44902 ]])
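The loaded values differ slightly from the original array because `fmt='%.4e'` writes only five significant digits. A small round-trip sketch, assuming a temporary directory is acceptable for the output file:

```python
import os
import tempfile
import numpy as np

a = np.random.rand(4, 4)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'out.txt')   # temporary file, removed afterwards
    np.savetxt(path, a, fmt='%.4e', delimiter='\t', header='4 x 4 random array')
    loaded = np.loadtxt(path)                # the header line starts with '#', so it is skipped

# Only five significant digits were written, so the match is approximate
max_error = np.max(np.abs(loaded - a))
print(loaded.shape, max_error)
```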

## Indexing/Slicing

One nice feature of NumPy arrays is the versatility in accessing their components.

### Indexing

In [14]:
b = np.array([[1.,2.,3.],[4.,5.,6.],[7.,8.,9.]])
print(b)

[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]


Array elements and lower-dimensional subarrays can be accessed with square brackets:

In [15]:
b[0]

array([1., 2., 3.])

In [16]:
b[0][1]

2.0

It's actually more efficient to use the following syntax for multiple indices, because `b[0][1]` first creates the temporary array `b[0]` and then indexes it:

In [17]:
b[0,1]

2.0

In [18]:
print(b)

[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]


Lists of indices can also be used:

In [19]:
b[0,[0,2]]

array([1., 3.])

In [20]:
b[[0,2],[0,1]]

array([1., 8.])
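When a list is given for each axis, the lists are paired elementwise, so `b[[0,2],[0,1]]` picks `b[0,0]` and `b[2,1]` rather than a 2 x 2 block. A brief sketch contrasting this with `np.ix_`, which is one way (an addition here, not from the original workshop) to select the full block of rows and columns:

```python
import numpy as np

b = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])

paired = b[[0, 2], [0, 1]]          # elements (0,0) and (2,1)
block = b[np.ix_([0, 2], [0, 1])]   # rows 0 and 2, columns 0 and 1

print(paired)   # [1. 8.]
print(block)    # [[1. 2.]
                #  [7. 8.]]
```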

A list or array of booleans can also be used:

In [21]:
c = np.arange(0,.6,.1)
print(c)

[0.  0.1 0.2 0.3 0.4 0.5]


In [22]:
c[[True,False,False,True,False,True]]

array([0. , 0.3, 0.5])
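In practice the boolean array is usually produced by a comparison rather than typed by hand. A minimal sketch, with the threshold 0.25 chosen only for illustration:

```python
import numpy as np

c = np.array([0., 0.1, 0.2, 0.3, 0.4, 0.5])

mask = c > 0.25        # boolean array, True where the condition holds
selected = c[mask]     # keeps only the elements where mask is True

print(mask)
print(selected)        # [0.3 0.4 0.5]
```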

### Slicing

In [23]:
print(b)

[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]


The colon (:) is used to specify slicing and striding.

Initial and final indices can be given (the final index is not inclusive).

In [24]:
b[1:3,0:2]

array([[4., 5.],
       [7., 8.]])

In [25]:
print(b)

[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]


The initial index can be dropped to start at the beginning.
The final index can be dropped to go to the end.

In [26]:
b[1:,:2]

array([[4., 5.],
       [7., 8.]])

If both indices are dropped, all elements along that axis are taken:

In [30]:
b[0,:]

array([1., 2., 3.])

A second colon can be used to define a stride:

In [31]:
c = np.arange(0,1,.1)
print(c)

[0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]


In [32]:
c[1:9:2]

array([0.1, 0.3, 0.5, 0.7])
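The start, stop, and stride can each be omitted independently. A short sketch using an integer array so the results are easy to read (the negative stride is an extra illustration beyond the example above):

```python
import numpy as np

c = np.arange(10)          # integers 0 through 9

every_other = c[1:9:2]     # start 1, stop 9 (exclusive), stride 2
every_third = c[::3]       # whole array, stride 3
reversed_c = c[::-1]       # a negative stride walks backwards

print(every_other)         # [1 3 5 7]
print(every_third)         # [0 3 6 9]
print(reversed_c)          # [9 8 7 6 5 4 3 2 1 0]
```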

## Copies and views

When assigning data to an ndarray from a previous ndarray, three things can happen:
- No copy is made
- A shallow copy (a "view") is made
- A deep copy is made

- A simple assignment from one ndarray to another results in no copy being made.
- The original ndarray is simply given an additional name.

In [33]:
a = np.random.rand(2,4)
b = a
a is b

True

- Changes can be made using either name:

In [34]:
print(a)

[[0.63871354 0.4689623  0.11469522 0.9943418 ]
 [0.71834694 0.42662878 0.85770796 0.29991994]]


In [35]:
b[0,0] = 0.
print(a)

[[0.         0.4689623  0.11469522 0.9943418 ]
 [0.71834694 0.42662878 0.85770796 0.29991994]]


- A shallow copy or "view" creates a new ndarray.
- Data is still shared with the original array.
- A view can be explicitly created using the `view()` method.

In [36]:
c = a.view()
c is a

False

In [37]:
c.base is a

True

In [38]:
print(a)

[[0.         0.4689623  0.11469522 0.9943418 ]
 [0.71834694 0.42662878 0.85770796 0.29991994]]


In [39]:
c[0,1] = 1.
print(a)

[[0.         1.         0.11469522 0.9943418 ]
 [0.71834694 0.42662878 0.85770796 0.29991994]]


Simple slicing also returns a view:

In [40]:
d = a[:,1]
print(d)
print(a)

[1.         0.42662878]
[[0.         1.         0.11469522 0.9943418 ]
 [0.71834694 0.42662878 0.85770796 0.29991994]]


In [41]:
d[1] = 2.
print(d)
print(a)

[1. 2.]
[[0.         1.         0.11469522 0.9943418 ]
 [0.71834694 2.         0.85770796 0.29991994]]


- A deep copy creates a new ndarray.
- Data is NOT shared with the original ndarray.
- A deep copy can be forced by using the `copy` method:

In [31]:
e = a.copy()
e is a

False

In [32]:
e.base is a

False

In [33]:
print(a)

[[0.77986147 0.89709782 0.62501241 0.42358489]
 [0.56674392 0.6747183  0.21924444 0.04345819]
 [0.23972711 0.51786984 0.59153272 0.01479142]
 [0.99883436 0.8348612  0.54268048 0.44902392]]


In [34]:
e[0,0] = 1.5
print(e)

[[1.5        0.89709782 0.62501241 0.42358489]
 [0.56674392 0.6747183  0.21924444 0.04345819]
 [0.23972711 0.51786984 0.59153272 0.01479142]
 [0.99883436 0.8348612  0.54268048 0.44902392]]


In [35]:
print(a)

[[0.77986147 0.89709782 0.62501241 0.42358489]
 [0.56674392 0.6747183  0.21924444 0.04345819]
 [0.23972711 0.51786984 0.59153272 0.01479142]
 [0.99883436 0.8348612  0.54268048 0.44902392]]


- "Complicated" (advanced) indexing, such as indexing with a list of indices, returns a deep copy, not a view:

In [46]:
f = a[1,[0,2,3]]
f.base is a

False
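Besides inspecting `.base`, one way to test whether two arrays share data is `np.shares_memory`. A minimal sketch covering the cases discussed above (the array contents are illustrative):

```python
import numpy as np

a = np.arange(8.).reshape(2, 4)

same_object = a              # no copy: just another name
view = a[:, 1]               # simple slicing: a view
deep = a.copy()              # explicit deep copy
fancy = a[1, [0, 2, 3]]      # indexing with a list: a copy

for name, arr in [('same_object', same_object), ('view', view),
                  ('deep', deep), ('fancy', fancy)]:
    print(name, np.shares_memory(a, arr))
```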

## Exercise
- Some methods of indexing return a view and others return a deep copy. Check the following indexing methods to see which returns what:

In [41]:
a[1:,:]

array([[0.56674392, 0.6747183 , 0.21924444, 0.04345819],
       [0.23972711, 0.51786984, 0.59153272, 0.01479142],
       [0.99883436, 0.8348612 , 0.54268048, 0.44902392]])

In [42]:
a[[0,1,2,3],0]

array([0.77986147, 0.56674392, 0.23972711, 0.99883436])

In [43]:
a[2,0]

0.23972710883642234

## Combining arrays

NumPy has multiple functions that combine a sequence of ndarrays into a single ndarray.

### `np.concatenate`


`np.concatenate` combines ndarrays while maintaining the dimensionality of the original ndarrays.
- e.g. 2D arrays combine to form a larger 2D array

In [47]:
a = np.random.rand(2,2)
b = np.random.rand(2,2)
np.concatenate((a,b))

array([[0.72067581, 0.79654029],
       [0.82771505, 0.52877691],
       [0.12533318, 0.08123625],
       [0.604609  , 0.33760617]])

Arrays must have the same shape, except in the dimension along which they are being combined.
- e.g., two arrays with shapes (3,4,4) and (2,4,4) can be concatenated along the 0-axis, but not the 1-axis or 2-axis.

The `axis` argument is used to define the axis along which the arrays will be concatenated.
- e.g., `axis=0` combines the two arrays as concatenated rows, with a final shape of (4,2).
- This is the default behavior.

In [48]:
np.concatenate((a,b),axis=0)

array([[0.72067581, 0.79654029],
       [0.82771505, 0.52877691],
       [0.12533318, 0.08123625],
       [0.604609  , 0.33760617]])

Using `axis=1` combines the arrays as concatenated columns, with a final shape of (2,4).

In [49]:
np.concatenate((a,b),axis=1)

array([[0.72067581, 0.79654029, 0.12533318, 0.08123625],
       [0.82771505, 0.52877691, 0.604609  , 0.33760617]])
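The shape rule for concatenation can be checked directly. A brief sketch using the (3,4,4) and (2,4,4) shapes mentioned earlier, with arrays of zeros and ones as placeholder data:

```python
import numpy as np

a = np.zeros((3, 4, 4))
b = np.ones((2, 4, 4))

joined = np.concatenate((a, b), axis=0)
print(joined.shape)          # (5, 4, 4)

# Along axis 1 the remaining dimensions (3 vs 2 on axis 0) do not match
try:
    np.concatenate((a, b), axis=1)
    axis1_ok = True
except ValueError as err:
    axis1_ok = False
    print('axis=1 failed:', err)
```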

### `np.vstack` and `np.hstack`

The `np.vstack` function ("vertical stack") is, for arrays with two or more dimensions, a shortcut for `np.concatenate` with `axis=0`:

In [50]:
np.vstack((a,b))

array([[0.72067581, 0.79654029],
       [0.82771505, 0.52877691],
       [0.12533318, 0.08123625],
       [0.604609  , 0.33760617]])

The `np.hstack` function ("horizontal stack") is, for arrays with two or more dimensions, a shortcut for `np.concatenate` with `axis=1`:

In [51]:
np.hstack((a,b))

array([[0.72067581, 0.79654029, 0.12533318, 0.08123625],
       [0.82771505, 0.52877691, 0.604609  , 0.33760617]])
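One-dimensional inputs behave a little differently from the 2D case shown above. A small sketch with illustrative data:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

v = np.vstack((x, y))   # each 1D array becomes a row
h = np.hstack((x, y))   # 1D arrays are joined end to end

print(v.shape)          # (2, 3)
print(h.shape)          # (6,)
```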

### `np.stack`

- The function `np.stack` combines a sequence of arrays while increasing the dimensionality by one.
- The arrays being combined must all have the same shape.

In [52]:
np.stack((a,b))

array([[[0.72067581, 0.79654029],
        [0.82771505, 0.52877691]],

       [[0.12533318, 0.08123625],
        [0.604609  , 0.33760617]]])
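`np.stack` also accepts an `axis` argument that controls where the new dimension is placed. A minimal sketch, using non-square placeholder arrays so the position of the new axis is visible in the shape:

```python
import numpy as np

a = np.zeros((3, 4))
b = np.ones((3, 4))

first = np.stack((a, b))             # new axis in front (the default, axis=0)
middle = np.stack((a, b), axis=1)    # new axis in the middle
last = np.stack((a, b), axis=-1)     # new axis at the end

print(first.shape, middle.shape, last.shape)   # (2, 3, 4) (3, 2, 4) (3, 4, 2)
```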

## Manipulating array shape

The dimensions of an ndarray are called its "shape."

In [53]:
a = np.random.rand(3,2)
print(a)

[[0.97961457 0.43216571]
 [0.55713279 0.86060086]
 [0.25535308 0.18484114]]


In [54]:
a.shape

(3, 2)

The ndarray can be flattened into a 1D array:

In [55]:
a.flatten()

array([0.97961457, 0.43216571, 0.55713279, 0.86060086, 0.25535308,
       0.18484114])

In [56]:
a.ravel()

array([0.97961457, 0.43216571, 0.55713279, 0.86060086, 0.25535308,
       0.18484114])

`flatten()` always returns a deep copy, whereas `ravel()` will return a view, if possible.
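The difference can be observed by writing to each result. A short sketch, assuming a contiguous array (for which `ravel()` can return a view):

```python
import numpy as np

a = np.arange(6.).reshape(3, 2)

flat_copy = a.flatten()
flat_view = a.ravel()

print(np.shares_memory(a, flat_copy))   # False
print(np.shares_memory(a, flat_view))   # True for this contiguous array

flat_view[0] = -1.      # writes through to a
flat_copy[1] = -2.      # does not affect a
print(a[0, 0], a[0, 1]) # -1.0 1.0
```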

The dimensions can be modified more generally, as long as the total number of elements remains the same:

In [57]:
a.reshape(2,3)

array([[0.97961457, 0.43216571, 0.55713279],
       [0.86060086, 0.25535308, 0.18484114]])

In [58]:
a.reshape(2,1,3)

array([[[0.97961457, 0.43216571, 0.55713279]],

       [[0.86060086, 0.25535308, 0.18484114]]])

In [59]:
print(a)

[[0.97961457 0.43216571]
 [0.55713279 0.86060086]
 [0.25535308 0.18484114]]


Note that `reshape` returns a new array object (a view of the same data when possible) and leaves the shape of `a` unchanged. To modify the ndarray in place, use `resize`:

In [60]:
a.resize(2,1,3)
print(a)

[[[0.97961457 0.43216571 0.55713279]]

 [[0.86060086 0.25535308 0.18484114]]]
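Since the total number of elements is fixed, one dimension passed to `reshape` can be given as `-1` and NumPy will infer it (this shorthand is an addition to the material above). The sketch below also shows that the reshaped array shares data with the original:

```python
import numpy as np

a = np.arange(6.).reshape(3, 2)

r = a.reshape(2, -1)      # -1 lets NumPy infer the remaining dimension (3 here)
print(r.shape, a.shape)   # (2, 3) (3, 2)

r[0, 0] = 99.             # r is a view here, so a sees the change
print(a[0, 0])            # 99.0
```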


## Arithmetic operators

Basic arithmetic operators act elementwise between two arrays of the same shape:

In [61]:
a = np.zeros((2,1))
b = np.ones((2,1))
a+b

array([[1.],
       [1.]])

In [62]:
a*b

array([[0.],
       [0.]])

### Broadcasting
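- Operations can take place between arrays of different shapes under certain conditions.
- This is called "broadcasting".
- The shapes are compared dimension by dimension, starting from the trailing (last) dimension. For each pair, either:
    - the two dimensions must be equal, or
    - one dimension must be 1.
- If one array has fewer dimensions, its missing leading dimensions are treated as 1.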

In [63]:
a = np.array([[1.,2.],[3.,4.]])
b = np.array([[1.],[2.]])
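For these two arrays the trailing dimensions are 2 and 1, and the leading dimensions are both 2, so they can be combined. A brief sketch of the result of adding them:

```python
import numpy as np

a = np.array([[1., 2.], [3., 4.]])   # shape (2, 2)
b = np.array([[1.], [2.]])           # shape (2, 1)

c = a + b    # b's single column is stretched across both columns of a
print(c)     # [[2. 3.]
             #  [5. 6.]]
```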

Other examples that work:

In [64]:
a = 5.
b = np.ones((2,3))
c = a*b

In [65]:
a = np.random.rand(1,3,2,1,5)
b = np.random.rand(    2,4,5)
c = a+b

Examples that don't work:

In [66]:
a = np.ones((2,2))
b = np.ones((3,2))
c = a+b

ValueError: operands could not be broadcast together with shapes (2,2) (3,2) 

In [67]:
a = np.random.rand(1,3,2,2,5)
b = np.random.rand(    2,4,5)
c = a+b

ValueError: operands could not be broadcast together with shapes (1,3,2,2,5) (2,4,5) 
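The broadcasting rule can be written out as a short function. The helper below, `can_broadcast`, is hypothetical (it is not part of NumPy) and is only a sketch of the rule applied to shape tuples:

```python
from itertools import zip_longest

def can_broadcast(shape_a, shape_b):
    """Hypothetical helper: apply the trailing-dimension rule to two shapes."""
    # Walk both shapes from the last dimension backwards; a missing
    # leading dimension is treated as 1.
    pairs = zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1)
    for dim_a, dim_b in pairs:
        if dim_a != dim_b and dim_a != 1 and dim_b != 1:
            return False
    return True

print(can_broadcast((1, 3, 2, 1, 5), (2, 4, 5)))   # True
print(can_broadcast((2, 2), (3, 2)))               # False
print(can_broadcast((1, 3, 2, 2, 5), (2, 4, 5)))   # False
```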

### Exercise
Which of the following pairs of arrays can be broadcast together?

In [68]:
a = np.random.rand(3,4,2,4)
b = np.random.rand(1)

In [69]:
a = np.random.rand(3,3,2,1)
b = np.random.rand(1,2,1,1)

In [70]:
a = np.random.rand(3,1,2,1)
b = np.random.rand(2,2,11)

In [71]:
a = np.random.rand(3,4)
b = np.random.rand(3,4,2)

## Linear algebra

2D NumPy arrays can be treated as matrices for basic linear algebra operations:

In [72]:
a = np.array([[1.,2.],[3.,4.]])
print(a)

[[1. 2.]
 [3. 4.]]


Transpose:

In [73]:
a.T

array([[1., 3.],
       [2., 4.]])

In [74]:
a.transpose()

array([[1., 3.],
       [2., 4.]])

Inverse:

In [75]:
inva = np.linalg.inv(a)
print(inva)

[[-2.   1. ]
 [ 1.5 -0.5]]


Matrix multiplication:
- Remember that the `*` operator performs elementwise multiplication.
- The `@` operator performs matrix multiplication (Python 3.5 and later).

In [76]:
inva.dot(a)

array([[1.00000000e+00, 0.00000000e+00],
       [1.11022302e-16, 1.00000000e+00]])

In [77]:
inva @ a

array([[1.00000000e+00, 0.00000000e+00],
       [1.11022302e-16, 1.00000000e+00]])

Matrix/vector solve:

In [78]:
b = np.array([[3],[7]])
np.linalg.solve(a,b)

array([[1.],
       [1.]])
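Because floating-point results are rarely exact (note the `1.11022302e-16` entry in the product above), solutions are best checked with `np.allclose` rather than `==`. A minimal sketch that verifies the solve and compares it with the explicit inverse:

```python
import numpy as np

a = np.array([[1., 2.], [3., 4.]])
b = np.array([[3.], [7.]])

x = np.linalg.solve(a, b)            # solves a @ x = b directly
x_via_inv = np.linalg.inv(a) @ b     # same result, via the explicit inverse

print(np.allclose(a @ x, b))         # True
print(np.allclose(x, x_via_inv))     # True
```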

Eigenvalues/eigenvectors:

In [79]:
np.linalg.eig(a)

(array([-0.37228132,  5.37228132]), array([[-0.82456484, -0.41597356],
        [ 0.56576746, -0.90937671]]))
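`np.linalg.eig` returns the eigenvalues first and then a matrix whose columns are the corresponding eigenvectors. A short sketch checking the defining relation for each pair:

```python
import numpy as np

a = np.array([[1., 2.], [3., 4.]])
vals, vecs = np.linalg.eig(a)

# Column j of vecs is the eigenvector for vals[j], so a @ v equals vals[j] * v
for j in range(len(vals)):
    v = vecs[:, j]
    print(np.allclose(a @ v, vals[j] * v))   # True

# The same check for all columns at once, using broadcasting
all_ok = np.allclose(a @ vecs, vecs * vals)
```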

Identity matrix:

In [80]:
np.eye(2)

array([[1., 0.],
       [0., 1.]])

## Random sampling

NumPy has several functions for random sampling.

Uniformly sampled from $[0,1)$ for a given shape:

In [81]:
np.random.rand(2,3)

array([[0.74525264, 0.48067078, 0.74492236],
       [0.92188655, 0.25579643, 0.47999468]])

Random sample from standard normal distribution (0 mean, 1 std. dev.):

In [82]:
np.random.randn(1,2)

array([[0.66299574, 2.48012976]])

Random integers, specifying lower (inclusive) and upper (exclusive) bounds:

In [83]:
np.random.randint(1,11,(1,3))

array([[6, 6, 4]])

Shuffle contents of array:
- Shuffling happens only along the first axis, e.g. shuffling a matrix reorders its rows but keeps the contents of each row intact
- `shuffle` modifies the original array
- `permutation` returns a new, shuffled array

In [84]:
a = np.arange(0,12)
np.random.permutation(a)

array([ 2,  6,  8,  7,  9,  0,  3, 10,  5, 11,  4,  1])

In [85]:
b = np.array([[1.,2.],[3.,4.],[5.,6.]])
np.random.shuffle(b)
print(b)

[[1. 2.]
 [3. 4.]
 [5. 6.]]
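Random results change from run to run (and a shuffle can occasionally return the original order). As a hedged addition to the material above, `np.random.seed` can be used to make results repeatable; the sketch also checks that rows stay intact when a matrix is shuffled:

```python
import numpy as np

np.random.seed(0)                 # fix the seed so results repeat
first = np.random.rand(3)
np.random.seed(0)
second = np.random.rand(3)
print(np.array_equal(first, second))   # True

b = np.array([[1., 2.], [3., 4.], [5., 6.]])
p = np.random.permutation(b)      # new array; b is unchanged at this point
np.random.shuffle(b)              # reorders the rows of b in place

# Rows are moved as units: in every row the second entry is still first + 1
rows_intact = bool(np.all(b[:, 1] - b[:, 0] == 1.))
print(rows_intact)                # True
```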


## Other useful functions and methods

### Sorting

NumPy's `sort` function sorts elements from low to high:
- Each row (or column) is sorted independently, so whole rows or columns are not kept together
- The axis can be specified (the default is to sort along the last axis)

In [86]:
a = np.array([[1.,5.],[3.,1.],[2,0.]])
print(a)

[[1. 5.]
 [3. 1.]
 [2. 0.]]


In [87]:
np.sort(a)

array([[1., 5.],
       [1., 3.],
       [0., 2.]])

In [88]:
np.sort(a,axis=0)

array([[1., 0.],
       [2., 1.],
       [3., 5.]])

This is also available as a method that modifies the array in place:

In [89]:
a.sort()
print(a)

[[1. 5.]
 [1. 3.]
 [0. 2.]]


The indices of the sorted elements are given using `argsort`:
- This is useful if you want to, for example, sort an array by the first column and maintain rows

In [90]:
ind = np.argsort(a,axis=0)
print(ind)

[[2 2]
 [0 1]
 [1 0]]
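For a 1D array, the indices returned by `argsort` can be used to reorder that array or any companion array of the same length. A minimal sketch, where `scores` and `labels` are hypothetical data:

```python
import numpy as np

scores = np.array([0.7, 0.2, 0.9])
labels = np.array(['b', 'c', 'a'])   # hypothetical companion data

order = np.argsort(scores)           # indices that would sort scores
print(order)                         # [1 0 2]
print(scores[order])                 # [0.2 0.7 0.9]
print(labels[order])                 # ['c' 'b' 'a']
```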


### Exercise

Use `argsort` to sort the following matrix by the first column, while keeping the contents of each row unchanged:

In [91]:
a = np.array([[3.,1.],[0.,4.],[1.,8.]])
print(a)

[[3. 1.]
 [0. 4.]
 [1. 8.]]


The correct answer should give:

In [92]:
a = np.array([[0.,4.],[1.,8.],[3.,1.]])
print(a)

[[0. 4.]
 [1. 8.]
 [3. 1.]]


### Reduction/summarization functions

NumPy has several functions that "reduce" an array to a single value or smaller set of values, such as `min`, `max`, `sum`, and `prod`:

In [103]:
a = np.random.randint(2,11,5)
print(a)

[5 9 8 4 7]


In [104]:
np.min(a)

4

In [105]:
np.max(a)

9

In [106]:
np.sum(a)

33

In [107]:
np.prod(a)

10080
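The "smaller set of values" case arises when an `axis` argument is given, so that the reduction is applied along one axis only. A short sketch with a small fixed array:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

total = np.sum(a)              # reduce over all elements
col_sums = np.sum(a, axis=0)   # collapse the rows: one sum per column
row_max = np.max(a, axis=1)    # collapse the columns: one max per row

print(total)      # 21
print(col_sums)   # [5 7 9]
print(row_max)    # [3 6]
```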

### "Universal" functions

NumPy has several standard mathematical functions that apply elementwise to ndarrays, e.g.:

In [108]:
a = np.random.rand(2,2)

In [109]:
np.exp(a)

array([[1.15711605, 1.25250497],
       [2.09738971, 1.23158021]])

In [110]:
np.sqrt(a)

array([[0.38200882, 0.47449502],
       [0.86063557, 0.45639683]])

In [111]:
np.sin(a)

array([[0.14541334, 0.22324822],
       [0.67479993, 0.20679506]])

In [114]:
np.arctan(a)

array([[0.14490788, 0.22145295],
       [0.63751834, 0.20536159]])
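To make "elementwise" concrete, the sketch below compares `np.sin` applied to a whole array with an explicit Python loop over the same elements (the input values are illustrative):

```python
import math
import numpy as np

a = np.array([[0.1, 0.2], [0.3, 0.4]])

vectorized = np.sin(a)               # one call handles every element

looped = np.empty_like(a)            # same shape and dtype as a
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        looped[i, j] = math.sin(a[i, j])

print(np.allclose(vectorized, looped))   # True
```

The vectorized call is shorter and, for large arrays, typically much faster than the loop.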