# Demo of some `numpy` features

## MCS 275 Spring 2024 - Emily Dumas

This is a quick tour of some `numpy` features.  For more detail see:
* [Chapter 2 of VanderPlas](https://jakevdp.github.io/PythonDataScienceHandbook/02.00-introduction-to-numpy.html)
* [The numpy documentation](https://numpy.org/doc/stable/)

## Importing the module

And checking the version.

In [248]:
import numpy as np
np.__version__

'1.21.5'

## Creating arrays

Numpy arrays are iterable and type-homogeneous (every element has the same data type, or *dtype*).  You can make one from a list, tuple, or other suitable sequence.

[List of built-in dtypes](https://numpy.org/doc/stable/reference/arrays.scalars.html#arrays-scalars-built-in).

In [3]:
# `np.array` will convert from an iterable
x = np.array([2,4,8,16,32])

In [8]:
# List of lists -> 4-row, 3-column matrix (2-dimensional array)
A = np.array([[1,2,3],[4,5,6],[7,8,9],[0,2,0]])

In [11]:
# nice display
print(x)
print()
print(A)

[ 2  4  8 16 32]

[[1 2 3]
 [4 5 6]
 [7 8 9]
 [0 2 0]]


In [13]:
# ndarray class
type(x)

numpy.ndarray

Check number of dimensions

In [6]:
x.ndim # how many dimensions?

1

In [14]:
A.ndim

2

Check shape (size in each dimension)

In [7]:
x.shape # size in each dimension, as a tuple

(5,)

In [15]:
A.shape

(4, 3)

Check "length" (the first element of the shape)

In [16]:
len(x) # number of items in the vector

5

In [17]:
len(A) # number of rows in the matrix = A.shape[0]

4

In general, if `m` is a numpy array with at least one dimension, `len(m)` is `m.shape[0]`.

Check data type

In [18]:
x.dtype # int64 means (signed) integer, 64 bits

dtype('int64')

The data type is usually inferred from the values, but it can also be specified explicitly.  Specifying it can lose information if the values don't fit the requested type.

In [19]:
# Given a mix of integers and floats, numpy
# will choose a floating point dtype
y = np.array([5,6,7,7.289])

In [20]:
y

array([5.   , 6.   , 7.   , 7.289])

In [21]:
y.dtype # float64 means float, 64 bits (double)

dtype('float64')

In [22]:
y_force_int = np.array([5,6,7,7.289], dtype="int")

In [25]:
# Notice we lost precision by specifying dtype int
y_force_int

array([5, 6, 7, 7])

In [24]:
# Notice numpy chose a specific type (int64 here)
# for the generic request "int"
y_force_int.dtype

dtype('int64')

In [26]:
# uint8 means UNSIGNED integer, 8 bits
# UNSIGNED = only 0 and positive values
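# range is 0...255
# (this cell ran on numpy 1.21; numpy 1.24+ warns about the
# out-of-range values below, and numpy 2.0 raises OverflowError)
z = np.array([1,-1,2,100,300,500,800,16384], dtype="uint8")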

In [27]:
z

array([  1, 255,   2, 100,  44, 244,  32,   0], dtype=uint8)

In [28]:
# Why did 300 appear as 44 in the array above?
300 % 256

44
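The wraparound is arithmetic modulo 256: only the lowest 8 bits of each value are kept.  A version-independent way to see this is to start from an `int64` array and cast it with `.astype`, which converts without a range check.  A minimal sketch:

```python
import numpy as np

# Start from a 64-bit integer array, where every value fits
big = np.array([1, -1, 2, 100, 300, 500, 800, 16384], dtype="int64")

# astype casts without a range check, keeping only the low 8 bits,
# i.e. each value is reduced modulo 256
z = big.astype("uint8")
print(z)  # [  1 255   2 100  44 244  32   0]

# The same numbers computed with Python's % operator
print([n % 256 for n in big.tolist()])  # [1, 255, 2, 100, 44, 244, 32, 0]
```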

## Filled arrays

Can fill with zeros, ones, or make an array full of a general value.

In [29]:
# Filled with zeros
np.zeros( (3,12), dtype="int64")

array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

In [30]:
# Filled with ones
np.ones( (6,2), dtype="float64")

array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])

In [31]:
# Filled with one value
np.full( (7,4), 42, dtype="uint8" )

array([[42, 42, 42, 42],
       [42, 42, 42, 42],
       [42, 42, 42, 42],
       [42, 42, 42, 42],
       [42, 42, 42, 42],
       [42, 42, 42, 42],
       [42, 42, 42, 42]], dtype=uint8)

Can also ask for an array filled with random values (uniformly distributed floats in the interval [0, 1), so 0 is possible but 1 is not).  Note that `np.random` is a submodule, so the function you want is `np.random.random(...)`.

In [32]:
# Filled with random numbers between 0 and 1
np.random.random( (4,5) )  # argument is the shape

array([[0.84985369, 0.52904623, 0.06757972, 0.60749338, 0.44579558],
       [0.94276045, 0.22398809, 0.99290933, 0.9505469 , 0.65531365],
       [0.0944089 , 0.67245052, 0.25851387, 0.81402906, 0.02156303],
       [0.16531779, 0.87858936, 0.60636781, 0.24837322, 0.67221406]])
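`np.random.random` draws from numpy's legacy global generator.  Since numpy 1.17 the documentation recommends creating a `Generator` with `np.random.default_rng` instead, which also makes results reproducible when you pass a seed.  A short sketch (the seed `275` is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(275)  # seeded generator: same numbers every run
R = rng.random((4, 5))            # uniform floats in [0, 1), shape (4, 5)
print(R.shape)  # (4, 5)

# A second generator with the same seed produces identical values
R_again = np.random.default_rng(275).random((4, 5))
print(np.array_equal(R, R_again))  # True
```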

## Special things about 2D arrays

Identity (eye-dentity) matrix

In [3]:
# Identity matrix
np.eye(5)  # identity matrix of size (5,5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [5]:
np.eye(6,dtype="int") # 6x6 identity matrix, but integer entries

array([[1, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0],
       [0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 0, 1]])

In [7]:
np.eye(3,dtype="bool")  # 0 -> False, 1-> True

array([[ True, False, False],
       [False,  True, False],
       [False, False,  True]])

Transpose

In [8]:
A = np.array([[1,2,3],[9,8,7]])
A

array([[1, 2, 3],
       [9, 8, 7]])

In [9]:
# The transpose of A, which switches row and column roles
A.T

array([[1, 9],
       [2, 8],
       [3, 7]])

In [10]:
# defining property
i=0
j=2
print(A[i,j])
print(A.T[j,i])
# these agree for every pair of integers i,j
# that is a valid index into A

3
3


## Vector algebra

In [33]:
# two 3-dimensional vectors
v = np.array([1,2,5])
w = np.array([4,-8,0])

Dot product (and vector length)

In [34]:
v.dot(w) # dot product
#   1*4 + 2*(-8) + 5*0

-12

In [35]:
v.dot(v)**0.5 # length

5.477225575051661
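Two common shorthands for the same computations: the `@` operator computes the dot product of 1D arrays, and `np.linalg.norm` computes the (Euclidean) length directly.  A quick check with the same `v` and `w`:

```python
import numpy as np

v = np.array([1, 2, 5])
w = np.array([4, -8, 0])

print(v @ w)              # -12, same as v.dot(w)
print(np.linalg.norm(v))  # 5.477225575051661, same as v.dot(v)**0.5
```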

Scalar multiplication

In [36]:
1.8 * v # scalar multiplication

array([1.8, 3.6, 9. ])

Elementwise sum

In [37]:
v+w # elementwise sum

array([ 5, -6,  5])

Elementwise product (note that `*` is *not* the dot product!)

In [38]:
v*w # elementwise product

array([  4, -16,   0])

## Arithmetic progressions

* `np.arange` is `start`, `stop`, `step`
* `np.linspace` is `first`,`last`,`number`

In [39]:
# Recall how you get a list of integer values
# in arithmetic progression using built-in stuff
list(range(3,20,2))

[3, 5, 7, 9, 11, 13, 15, 17, 19]

The similarly named `arange` from `numpy` does all this and more.

In [40]:
# From 2 up to but not including 3 in steps of size 0.1
np.arange(2,3,0.1)   # start, stop (not included), step

array([2. , 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9])

When you know how many points you want, rather than the spacing, it's better to use `np.linspace`.  It takes the first and last elements, then the total number of evenly spaced points (counting both endpoints).

In [41]:
# 6 evenly spaced points from 12 to 14, both included
np.linspace( 12, 14, 6 )   # first, last, number of elements

array([12. , 12.4, 12.8, 13.2, 13.6, 14. ])
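Since `np.linspace(first, last, n)` includes both endpoints, the spacing is `(last - first) / (n - 1)`.  Passing `endpoint=False` leaves out the last point, which reproduces the earlier `np.arange(2,3,0.1)` example up to floating-point rounding.  A small sketch:

```python
import numpy as np

pts = np.linspace(12, 14, 6)
# 6 points including both ends -> 5 gaps of size (14 - 12) / (6 - 1) = 0.4
print(np.diff(pts))  # approximately [0.4 0.4 0.4 0.4 0.4]

# endpoint=False drops the last point, matching arange with step 0.1
a = np.arange(2, 3, 0.1)
b = np.linspace(2, 3, 10, endpoint=False)
print(np.allclose(a, b))  # True
```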

## Accessing items

Zero-based indexing.  For multi-dimensional arrays, give several integer indices separated by commas.

In [47]:
v = np.arange(8,24,3)
v

array([ 8, 11, 14, 17, 20, 23])

In [44]:
A = np.array([[1,2,3],[4,5,6],[7,8,9],[0,-2,16]])
A

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [ 0, -2, 16]])

### Vector indexing: Just like lists

In [48]:
v[0]

8

In [49]:
v[4]

20

In [51]:
v[-1]

23

### Multidimensional indexing: use a tuple of indices

For matrices, it's `[row, col]`

In [53]:
print(A)
print()
print(A[0,1])  # row 0 column 1

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [ 0 -2 16]]

2


Omitted indices at the end mean "everything from those dimensions"

In [54]:
A[2] # Row 2

array([7, 8, 9])

Using `:` as an index means "everything from that dimension"

In [56]:
print(A)
print()
# All rows, column 1; that is, get column 1 as a vector
print(A[:,1])  

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [ 0 -2 16]]

[ 2  5  8 -2]
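Note that `A[:,1]` is one-dimensional: the column comes back as a plain vector.  If you want to keep it as a matrix of shape `(4, 1)`, index with a length-1 slice such as `1:2` instead of the integer `1`:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [0, -2, 16]])

col_vec = A[:, 1]    # integer index drops that dimension
col_mat = A[:, 1:2]  # slice of length 1 keeps it

print(col_vec.shape)  # (4,)
print(col_mat.shape)  # (4, 1)
```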


## Assigning items

**`numpy` arrays are mutable** 😱

In [12]:
B = np.array([[2,4,8,16,32],[1,-1,1,-1,1],[0,0,0,5,6],[7,6,5,4,3]])
B

array([[ 2,  4,  8, 16, 32],
       [ 1, -1,  1, -1,  1],
       [ 0,  0,  0,  5,  6],
       [ 7,  6,  5,  4,  3]])

In [13]:
# item assignment supported
B[2,0] = 275   # row 2, col 0 entry becomes 275

In [14]:
B # now changed

array([[  2,   4,   8,  16,  32],
       [  1,  -1,   1,  -1,   1],
       [275,   0,   0,   5,   6],
       [  7,   6,   5,   4,   3]])

In [15]:
B[3] = 100 # everything in row 3 becomes 100

In [16]:
B

array([[  2,   4,   8,  16,  32],
       [  1,  -1,   1,  -1,   1],
       [275,   0,   0,   5,   6],
       [100, 100, 100, 100, 100]])

In [20]:
# Single column replace
B[:,2] = [5,50,500,5000] # replace column 2 with the given vector

In [19]:
B # Note column 2 has changed

array([[   2,    4,    5,   16,   32],
       [   1,   -1,   50,   -1,    1],
       [ 275,    0,  500,    5,    6],
       [ 100,  100, 5000,  100,  100]])

## Slices

Can combine slice notation with multiple indices.

In [21]:
C = B[ 1:3 , 1:4 ]  # the submatrix from rows 1 and 2 and columns 1,2,3

In [22]:
C

array([[ -1,  50,  -1],
       [  0, 500,   5]])

In [23]:
C[:,0] = 87  # everything in column 0 of C becomes 87

In [24]:
C

array([[ 87,  50,  -1],
       [ 87, 500,   5]])

Slices return **views**, not copies.

In [27]:
# C was a slice of B so changing C changed B as well!
B # note the presence of the value 87

array([[   2,    4,    5,   16,   32],
       [   1,   87,   50,   -1,    1],
       [ 275,   87,  500,    5,    6],
       [ 100,  100, 5000,  100,  100]])

What if you wanted a slice that is a copy, not a view?

In [28]:
C2 = B[ 1:3 , 1:4 ].copy()  # Take a slice (view) but then make a copy

In [29]:
C2[:,:] = 0 # Zero it out
B # check B isn't changed!

array([[   2,    4,    5,   16,   32],
       [   1,   87,   50,   -1,    1],
       [ 275,   87,  500,    5,    6],
       [ 100,  100, 5000,  100,  100]])

In [31]:
# Can use extended slice syntax as well
B[::2,3]  # elements in column 3 that have EVEN row number

array([16,  5])
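One way to check whether two arrays overlap in memory is `np.shares_memory`.  The sketch below uses a small stand-in array `B` (not the one above) and also shows a related rule: indexing with a list of indices (or a boolean mask) is *advanced indexing*, which always returns a copy, unlike basic slicing.

```python
import numpy as np

B = np.arange(20).reshape((4, 5))

view = B[1:3, 1:4]          # basic slicing -> view
copy = B[1:3, 1:4].copy()   # explicit copy
fancy = B[[1, 2], 1:4]      # list of row indices -> advanced indexing -> copy

print(np.shares_memory(B, view))   # True
print(np.shares_memory(B, copy))   # False
print(np.shares_memory(B, fancy))  # False

fancy[:, :] = 0   # modifies only the copy
print(B[1, 1])    # 6, unchanged
```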

## Equality and bool

`.all()` checks if an array of booleans is all `True`.

In [32]:
v = np.array([2,4,6,8,10])
w = np.array([2,4,8,16,32])

In [33]:
v == w  # gives an array of booleans

array([ True,  True, False, False, False])

In [34]:
# VERY COMMON MISTAKE (will raise exception)
if v == w:
    print("v and w are equal")
else:
    print("v and w are not equal")

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

In [36]:
# Best way to fix: np.array_equal compares the entries
# and also returns False if the shapes differ
if np.array_equal(v,w):
    print("v and w are equal")
else:
    print("v and w are not equal")

v and w are not equal


In [37]:
# Also works when v and w have the same shape, but
# v==w may broadcast (or fail) if the shapes differ
if np.all(v==w):
    print("v and w are equal")
else:
    print("v and w are not equal")

v and w are not equal


In [39]:
(v==w).all() # same as applying np.all to v==w

False
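The difference between `np.all(v==w)` and `np.array_equal(v,w)` shows up when the shapes differ but can still be broadcast against each other.  A minimal example with two hypothetical arrays `p` and `q`:

```python
import numpy as np

p = np.array([2, 2, 2])
q = np.array([2])        # shape (1,) broadcasts against shape (3,)

print(p == q)                # [ True  True  True]
print(np.all(p == q))        # True, even though p and q differ in shape
print(np.array_equal(p, q))  # False, shapes (3,) and (1,) do not match
```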

## Ufuncs

Universal functions ("ufuncs") automatically apply to each entry of an array.

### Some arrays to operate on

In [40]:
v = np.arange(-5,6,1)
v

array([-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5])

In [41]:
A = np.array(range(1,16)).reshape((3,5)) # (15,) -> (3,5)
A

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15]])

### Examples of numpy ufuncs

In [42]:
np.exp(v) # apply e^x to each entry

array([6.73794700e-03, 1.83156389e-02, 4.97870684e-02, 1.35335283e-01,
       3.67879441e-01, 1.00000000e+00, 2.71828183e+00, 7.38905610e+00,
       2.00855369e+01, 5.45981500e+01, 1.48413159e+02])

In [43]:
np.cos(A) # apply cosine to each entry

array([[ 0.54030231, -0.41614684, -0.9899925 , -0.65364362,  0.28366219],
       [ 0.96017029,  0.75390225, -0.14550003, -0.91113026, -0.83907153],
       [ 0.0044257 ,  0.84385396,  0.90744678,  0.13673722, -0.75968791]])

In [44]:
v**3 # cube each entry

array([-125,  -64,  -27,   -8,   -1,    0,    1,    8,   27,   64,  125])

In [46]:
67*A # multiply each element by 67

array([[  67,  134,  201,  268,  335],
       [ 402,  469,  536,  603,  670],
       [ 737,  804,  871,  938, 1005]])

In [48]:
1/(v + 30)  # apply the function 1/(x+30) to each element of v

array([0.04      , 0.03846154, 0.03703704, 0.03571429, 0.03448276,
       0.03333333, 0.03225806, 0.03125   , 0.03030303, 0.02941176,
       0.02857143])

Let $f(x) = 3x^2 - 8x + 14$.  Apply $f$ to each element of array `v`.

In [50]:
def f(x):
    return 3*(x**2) - 8*x + 14

In [51]:
# slow!
np.array([f(x) for x in v])

array([129,  94,  65,  42,  25,  14,   9,  10,  17,  30,  49])

In [52]:
# faster, easier to read
f(v)

array([129,  94,  65,  42,  25,  14,   9,  10,  17,  30,  49])
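Calling `f(v)` works because `f` only uses arithmetic, which numpy applies elementwise.  A function that tests its argument with `if` would hit the same "truth value ... is ambiguous" error seen earlier.  One common fix is `np.where(condition, value_if_true, value_if_false)`.  The functions `relu_scalar` and `relu` below are illustrative examples, not from the original notes:

```python
import numpy as np

v = np.arange(-5, 6, 1)

def relu_scalar(x):
    # fine for a single number, but `if x > 0` fails for an array
    return x if x > 0 else 0

def relu(x):
    # elementwise version: keep x where x > 0, otherwise use 0
    return np.where(x > 0, x, 0)

try:
    relu_scalar(v)
except ValueError as err:
    print("ValueError:", err)

print(relu(v))  # [0 0 0 0 0 0 1 2 3 4 5]
```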

## Broadcasting

In [273]:
A = np.array([[1,2,3],[4,5,6],[7,8,9],[8,2,1]])
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9],
       [8, 2, 1]])

In [266]:
A + 1  # acts on every element

array([[ 2,  3,  4],
       [ 5,  6,  7],
       [ 8,  9, 10],
       [ 9,  3,  2]])

In [267]:
A + np.array([5,50,500])

array([[  6,  52, 503],
       [  9,  55, 506],
       [ 12,  58, 509],
       [ 13,  52, 501]])

In [274]:
np.array([[1],
          [2],
          [3]]) + np.array([4,5,6])   # "bi-broadcast"

array([[5, 6, 7],
       [6, 7, 8],
       [7, 8, 9]])
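The rule behind these examples: numpy compares shapes starting from the last dimension, two sizes are compatible when they are equal or one of them is 1, and missing leading dimensions count as 1.  `np.broadcast_shapes` (available since numpy 1.20) reports the resulting shape without doing any arithmetic, as in this sketch:

```python
import numpy as np

print(np.broadcast_shapes((4, 3), ()))    # (4, 3)  like A + 1
print(np.broadcast_shapes((4, 3), (3,)))  # (4, 3)  like A + np.array([5,50,500])
print(np.broadcast_shapes((3, 1), (3,)))  # (3, 3)  column + row

# A length-4 vector does not line up with the last axis (size 3) of A
try:
    np.broadcast_shapes((4, 3), (4,))
except ValueError as err:
    print("incompatible:", err)
```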

## Aggregations

`sum`, `max`, `min`, `argmax`, `argmin`, `mean`, `all`, `any`, `array_equal`

In [264]:
np.argmax( [1,2,1,5,4,7,3,2,6] )  # WHERE the maximum is found

5

In [276]:
A = np.array([[1,2,3],[4,5,6],[7,8,9],[8,2,1]])
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9],
       [8, 2, 1]])

In [250]:
np.sum(A)

56

In [277]:
sum(A)  # built-in sum adds the rows together (one total per column)

array([20, 17, 19])

In [251]:
np.max(A)

9

In [253]:
np.mean(A)

4.666666666666667

In [278]:
np.sum(A,axis=0) # add the rows together: one total per column

array([20, 17, 19])

In [280]:
np.sum(A,axis=1) # add the columns together: one total per row

array([ 6, 15, 24, 11])

In [281]:
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9],
       [8, 2, 1]])
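The `axis` argument combines well with broadcasting.  For instance, `A.mean(axis=0)` has one mean per column (shape `(3,)`), and subtracting it from `A` (shape `(4, 3)`) centers every column at zero.  A short sketch:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [8, 2, 1]])

col_means = A.mean(axis=0)    # 5, 4.25, 4.75: one mean per column
centered = A - col_means      # shape (3,) broadcasts across the 4 rows

print(centered.mean(axis=0))  # each entry is 0 (up to rounding)
```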

## Masks

In [60]:
# Let's make a nice matrix of values
tmp = np.zeros( (6,6), dtype="int" ) + [1,2,5,8,9,14]
A = tmp.T**2 - tmp
A


array([[  0,  -1,  -4,  -7,  -8, -13],
       [  3,   2,  -1,  -4,  -5, -10],
       [ 24,  23,  20,  17,  16,  11],
       [ 63,  62,  59,  56,  55,  50],
       [ 80,  79,  76,  73,  72,  67],
       [195, 194, 191, 188, 187, 182]])

In [61]:
is_negative = A < 0  # array of booleans answering the question
is_negative

array([[False,  True,  True,  True,  True,  True],
       [False, False,  True,  True,  True,  True],
       [False, False, False, False, False, False],
       [False, False, False, False, False, False],
       [False, False, False, False, False, False],
       [False, False, False, False, False, False]])

In [62]:
A[is_negative]  # vector of all the negative values in A
# In general, if M is a boolean array with the same shape as A,
# A[M] is a 1D array of the entries of A where M is True

array([ -1,  -4,  -7,  -8, -13,  -1,  -4,  -5, -10])
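Conditions can be combined elementwise with `&` (and), `|` (or), and `~` (not).  Python's `and`/`or` don't work here, for the same "ambiguous truth value" reason as before, and each comparison needs its own parentheses because `&` binds more tightly than `<`.  A small sketch on a stand-in array `M`:

```python
import numpy as np

M = np.arange(-3, 9).reshape((3, 4))
# [[-3 -2 -1  0]
#  [ 1  2  3  4]
#  [ 5  6  7  8]]

in_range = (M > 0) & (M < 5)       # strictly between 0 and 5
print(M[in_range])                 # [1 2 3 4]
print(np.count_nonzero(in_range))  # 4

M[~in_range] = 0                   # zero out everything else
print(M)
```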

In [63]:
A[is_negative] = 999 # replace everything negative in A with 999

In [64]:
A

array([[  0, 999, 999, 999, 999, 999],
       [  3,   2, 999, 999, 999, 999],
       [ 24,  23,  20,  17,  16,  11],
       [ 63,  62,  59,  56,  55,  50],
       [ 80,  79,  76,  73,  72,  67],
       [195, 194, 191, 188, 187, 182]])

Masks like this make it easy to select parts of an array based on inequalities or conditions on each element, then modify.

For example, here we'll clamp all elements of A between 20 and 190.  Anything under 20 becomes 20, and anything over 190 becomes 190.

In [65]:
A[A<20] = 20  # every entry of A less than 20 becomes 20
A[A>190] = 190 # every entry of A greater than 190 becomes 190

In [67]:
A # now clamped

array([[ 20, 190, 190, 190, 190, 190],
       [ 20,  20, 190, 190, 190, 190],
       [ 24,  23,  20,  20,  20,  20],
       [ 63,  62,  59,  56,  55,  50],
       [ 80,  79,  76,  73,  72,  67],
       [190, 190, 190, 188, 187, 182]])
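For this particular pattern numpy also has `np.clip`, which clamps in one call and returns a new array instead of modifying its input.  The sketch below rebuilds the matrix from the start of this section (before the `999` replacement) and checks that `np.clip` agrees with the two mask assignments applied to a copy:

```python
import numpy as np

# Rebuild the matrix from the start of the "Masks" section
tmp = np.zeros((6, 6), dtype="int") + [1, 2, 5, 8, 9, 14]
A = tmp.T**2 - tmp

# Clamp in one call; A itself is left unchanged
clamped = np.clip(A, 20, 190)

# Same clamping done with masks, on a copy
B = A.copy()
B[B < 20] = 20
B[B > 190] = 190
print(np.array_equal(clamped, B))  # True
```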

## Pillow integration

* `np.array(img)` just works if `img` is a `PIL.Image.Image` object
* Use `PIL.Image.fromarray(A)` to make an image from an array
    * Shape `(height,width)` and dtype `uint8` for grayscale
    * Shape `(height,width,3)` and dtype `uint8` for color (last axis is red, green, blue)
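A minimal sketch of both directions, assuming Pillow is installed (imported as `PIL`); the image sizes and colors are arbitrary choices for illustration:

```python
import numpy as np
from PIL import Image

height, width = 100, 256

# Grayscale: shape (height, width), dtype uint8; brightness increases left to right
gray = np.zeros((height, width), dtype="uint8") + np.arange(width, dtype="uint8")
img = Image.fromarray(gray)
print(img.mode, img.size)  # L (256, 100); PIL reports (width, height)

# Color: shape (height, width, 3); here a solid orange image
rgb = np.zeros((height, width, 3), dtype="uint8")
rgb[:, :, 0] = 255  # red channel
rgb[:, :, 1] = 128  # green channel
color_img = Image.fromarray(rgb)
print(color_img.mode)  # RGB

# Converting back gives the same array
back = np.array(img)
print(np.array_equal(back, gray))  # True
```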