# NumPy

Today we'll introduce NumPy, an essential Python package for data science (and more generally for any kind of numerical computing).

NumPy stands for *Num*erical *Py*thon and contains many features that are helpful when working with data.

We'll demonstrate some of the most important features in the lecture. See the NumPy [documentation](https://numpy.org) for full package details.

## Importing NumPy

While we could just write `import numpy`, this would require us to type out "numpy" before every function call from the module.  The conventional abbreviation `np` requires less typing.  Thus:


In [None]:
import numpy as np

##  Lists vs NumPy arrays

- Lists are **general purpose**, arrays are **for math**.
- Lists are **untyped**, but array elements **must all be the same type**.
- Lists are **resizable**, arrays have a size **fixed on creation.**
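
The three differences above can be seen in a short sketch. The variable names here are made up for illustration; the point is only to show how the same-type rule, the arithmetic behavior, and the fixed size play out in practice.

```python
import numpy as np

mixed_list = [1, "two", 3.0]         # a list can hold mixed types
arr = np.array([1, 2, 3.5])          # the ints are converted to floats so all elements share one type
print(arr.dtype)                     # float64

nums = [1, 2, 3]
print(nums * 2)                      # list repetition: [1, 2, 3, 1, 2, 3]
doubled = np.array(nums) * 2         # arithmetic on each element
print(doubled)                       # [2 4 6]

nums.append(4)                       # a list grows in place
bigger = np.append(np.array([1, 2, 3]), 4)   # np.append builds and returns a new array
print(bigger)                        # [1 2 3 4]
```

Note that `np.append` does not resize the original array; it copies the data into a new, larger one.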

## Creating 1D arrays

In [None]:
v = np.array([1, 2, 3])
print(v)

[1 2 3]


## Creating 2D Arrays (Matrices)

In [None]:
A = np.array([[1, 0, 0],
              [0, 2, 0],
              [0, 0, 3]])  # 3x3 with 1,2,3 along the diagonal
print(A)

[[1 0 0]
 [0 2 0]
 [0 0 3]]


## The shape of an array

It's very common to want to know the dimensions of an array of data; for example, you may have just loaded a dataset without knowing how many rows it contains.  The `shape` *attribute* of an array (such as a matrix) is a *tuple* that tells you its dimensions.

In [None]:
print(A.shape)  # Tuples: like lists, but use () instead of []
print(v.shape)  # 1d outputs a comma to indicate it's still a tuple

(3, 3)
(3,)
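
Because `shape` is a tuple, it can be unpacked directly into row and column counts. The following is a small sketch with a made-up `data` array; `ndim` and `size` are related array attributes.

```python
import numpy as np

# Made-up data for illustration: 2 rows, 3 columns
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

n_rows, n_cols = data.shape   # unpack the tuple
print(n_rows, n_cols)         # 2 3
print(data.ndim)              # number of dimensions: 2
print(data.size)              # total number of elements: 6
```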


## Operations with arrays

We can add, subtract, multiply, and divide NumPy arrays of the same dimensions. These default operators work element by element.


In [None]:
v1 = v
print(v1)
v2 = np.array([4, 5, 6])
print(v2)
print("Adding 1D arrays: ",  v1 + v2)
print("Subtracting 1D arrays: ",  v1 - v2)
print("Multiplying 1D arrays: ", v1 * v2)
print("Dividing 1D arrays: ", v1 / v2)

[1 2 3]
[4 5 6]
Adding 1D arrays:  [5 7 9]
Subtracting 1D arrays:  [-3 -3 -3]
Multiplying 1D arrays:  [ 4 10 18]
Dividing 1D arrays:  [0.25 0.4  0.5 ]


### Broadcasting

The term [broadcasting](https://numpy.org/doc/stable/user/basics.broadcasting.html) describes how NumPy treats arrays with different shapes during arithmetic operations. If you add a scalar to a vector, the scalar's value is (conceptually) replicated until it forms a vector of the same size, and then the addition takes place element by element.  The same is true of other arithmetic operations and of arrays of other sizes: if a dimension is 1 in one array but not in the other, broadcasting will "stretch" the smaller array along that dimension by replicating its values to make the dimensions match.

In [None]:
print(v1)
print("Adding a scalar to a vector: ", v1 + 2)
print("Multiplying by a scalar: ", v1 * 2)
zeros = np.array([[0,0,0],[0,0,0],[0,0,0]])
print("Adding a scalar to a matrix: ")
print(zeros + 2)
print("Broadcast 1x3 to add to a 3x3 matrix:")
print(v1 + zeros)

[1 2 3]
Adding a scalar to a vector:  [3 4 5]
Multiplying by a scalar:  [2 4 6]
Adding a scalar to a matrix: 
[[2 2 2]
 [2 2 2]
 [2 2 2]]
Broadcast 1x3 to add to a 3x3 matrix:
[[1 2 3]
 [1 2 3]
 [1 2 3]]
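
As a further sketch of the "stretching" rule, a 3x1 column and a length-3 row can be combined into a 3x3 grid, while shapes that cannot be matched raise an error. The names `row`, `col`, and `grid` are made up for this example.

```python
import numpy as np

row = np.array([1, 2, 3])            # shape (3,)
col = np.array([[10], [20], [30]])   # shape (3, 1)

grid = col + row                     # both are stretched to shape (3, 3)
print(grid)
# [[11 12 13]
#  [21 22 23]
#  [31 32 33]]

try:
    np.array([1, 2, 3]) + np.array([1, 2])   # lengths 3 and 2: neither is 1, so no stretching
except ValueError as err:
    print("Cannot broadcast:", err)
```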


## Matrix-vector multiplication

NumPy also makes matrix-vector multiplication easy: the `@` operator carries out both matrix-vector and matrix-matrix multiplication. Multiplication by a matrix is a common operation in fields ranging from graphics to machine learning.

If we want to multiply a matrix by a vector by hand, the entries of each matrix row serve as the weights in a weighted sum of the vector elements.  For example, let's consider a $2 \times 2$ matrix $B$ defined as

$$
B =
\begin{bmatrix}
3 & 2 \\
4 & -1
\end{bmatrix}.
$$

One interpretation of this matrix is that each row holds the coefficients of a formula in two variables $x$ and $y$:

$$
B
\begin{bmatrix}
x \\
y
\end{bmatrix}
=
\begin{bmatrix}
3x + 2y \\
4x + (-1)y
\end{bmatrix}.
$$

Multiplying by the matrix $B$ on the left is equivalent to substituting the vector's values into those formulas.  Let $w$ represent the values $x = 1, y = -10$ with

$$
w =
\begin{bmatrix}
1 \\
-10
\end{bmatrix}.
$$

So if we multiply $B$ by the $2 \times 1$ vector $w$, we get

$$
Bw =
\begin{bmatrix}
3 & 2 \\
4 & -1
\end{bmatrix}
\begin{bmatrix}
1 \\
-10 \\
\end{bmatrix}
=
\begin{bmatrix}
3(1) + 2(-10) \\
4(1) + (-1)(-10)
\end{bmatrix}
=
\begin{bmatrix}
-17 \\
14
\end{bmatrix}
$$

In NumPy, this multiplication can be carried out just by using the `@` operator.

In [None]:
B = np.array([[3, 2],
              [4, -1]])
w = np.array([1, -10])
z = B @ w  # Notice the matrix goes on the left when multiplying a vector
print(z)

[-17  14]


More functions for dealing with matrices and vectors can be found in the numpy.linalg module.
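
To make the distinction concrete, here is a brief sketch contrasting `*` (element by element, with broadcasting) and `@` (matrix multiplication) on the same `B` and `w`. It also uses `np.linalg.solve`, one of the functions in `numpy.linalg`, to recover `w` from the product; the name `w_recovered` is made up for this example.

```python
import numpy as np

B = np.array([[3, 2],
              [4, -1]])
w = np.array([1, -10])

elementwise = B * w      # broadcasting: each row of B is multiplied entry by entry with w
print(elementwise)       # rows are [3, -20] and [4, 10]

product = B @ w          # true matrix-vector product
print(product)           # [-17  14]

squared = B @ B          # matrix-matrix multiplication uses the same operator
print(squared)           # rows are [17, 4] and [8, 9]

# Solve B x = product for x; this should give back w (as floats)
w_recovered = np.linalg.solve(B, product)
print(w_recovered)
```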

## Slicing

One very convenient thing that you can do with NumPy arrays (which you can also do with lists and strings) is "slicing," or grabbing values between particular indices.  The syntax for a 1D array is `my_array[first_included_index:first_excluded_index]`, so the element at the second index isn't in the result.


In [None]:
my_array = np.array([8, 6, 7, 5, 3, 0, 9])
print(my_array[1:3]) # prints index 1 and 2, not 3

[6 7]


Leaving off the second number means all elements from the first index to the end will be included.  Leaving off the first number means all elements from the start up to (but not including) the second index will be included.

In [None]:
print(my_array)
print(my_array[1:])

[8 6 7 5 3 0 9]
[6 7 5 3 0 9]


In [None]:
my_array[:3]

array([8, 6, 7])

This all works in a similar way for the 2D case.  Slicing 2D arrays is very common when dealing with datasets.  You may be interested in only certain rows, representing certain datapoints, or certain columns, representing certain features.

One detail is that the row range and the column range are separated by a comma inside a single pair of square brackets.  (You might be tempted to use double square brackets if you think of the matrix as a list of lists.)

In [None]:
# Suppose this is data where each row is an observation
# of latitude, longitude, and temperature
my_matrix = np.array([[42.3, 71.1, 92],
                      [40.7, 70.0, 85],
                      [47.6, 122.0, 82]])
print(my_matrix)
# Now we decide to drop the first datapoint
# and only care about latitude/longitude
two_by_two_square = my_matrix[1:, :2]
print(two_by_two_square)

[[ 42.3  71.1  92. ]
 [ 40.7  70.   85. ]
 [ 47.6 122.   82. ]]
[[ 40.7  70. ]
 [ 47.6 122. ]]
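
The following sketch shows why the comma matters. Chaining two pairs of brackets does not raise an error, but it slices the rows twice instead of slicing rows and then columns. The names `with_comma` and `chained` are made up for this example.

```python
import numpy as np

my_matrix = np.array([[42.3, 71.1, 92],
                      [40.7, 70.0, 85],
                      [47.6, 122.0, 82]])

with_comma = my_matrix[1:, :2]   # rows 1 onward, columns 0 and 1
chained = my_matrix[1:][:2]      # rows 1 onward, then the first 2 rows of that result

print(with_comma.shape)          # (2, 2)
print(chained.shape)             # (2, 3): the columns were never sliced
```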


A colon by itself selects all rows or all columns.

In [None]:
no_last_column = my_matrix[:, :2] # no temperature
print(no_last_column)

[[ 42.3  71.1]
 [ 40.7  70. ]
 [ 47.6 122. ]]
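
A colon can also be combined with a single index to pull out one row or one column. In this sketch (the variable names are made up), note that a single index drops that dimension, while a one-element slice keeps the result 2D.

```python
import numpy as np

my_matrix = np.array([[42.3, 71.1, 92],
                      [40.7, 70.0, 85],
                      [47.6, 122.0, 82]])

temperatures = my_matrix[:, 2]     # all rows, column 2: a 1D array
print(temperatures)                # [92. 85. 82.]
print(temperatures.shape)          # (3,)

first_observation = my_matrix[0, :]   # row 0, all columns: also 1D
print(first_observation.shape)        # (3,)

temp_column = my_matrix[:, 2:3]    # slicing instead of indexing keeps two dimensions
print(temp_column.shape)           # (3, 1)
```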


## Zero arrays

If your code is going to count things, it's common to want to create a matrix or vector that starts out as all zeros, so that other values can be added to it.  The `np.zeros()` function handily creates arrays of any size that are all zeros.

In [None]:
print(np.zeros(3))  # create an array of zeros with length 3
print(np.zeros((2, 3)))  # create a 2x3 matrix of zeros

[0. 0. 0.]
[[0. 0. 0.]
 [0. 0. 0.]]
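
As a sketch of the counting use case, the code below tallies made-up die rolls into a zero array. By default `np.zeros` produces floats (hence the `0.` in the output above), so this example passes `dtype=int` to get integer counts; the `rolls` data and the choice to leave index 0 unused are assumptions made for illustration.

```python
import numpy as np

rolls = [3, 1, 3, 6, 2, 3, 6]        # made-up die rolls
counts = np.zeros(7, dtype=int)      # indices 1 through 6 hold the counts; index 0 is unused

for roll in rolls:
    counts[roll] += 1                # add one to the tally for this face

print(counts)                        # [0 1 1 3 0 0 2]
```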
