<a href="https://colab.research.google.com/github/jbpost2/ST-554-Big-Data-With-Python-Course-Notes/blob/main/01_Programming_in_python/12_Numpy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# `NumPy`

One of the most widely used modules for statistics is called `numpy`.

- A library for scientific computing

    + Provides the multidimensional array object (such as vectors and matrices) and methods for manipulating them

- Bring in the module with `import` and the convention is to reference it as `np`

In [None]:
import numpy as np

As with our other data types, let's go through and...

- Learn how to create
- Consider commonly used functions and methods
- See control flow and other tricks along the way

This topic, compound objects:
- **numpy array**

Recall: **functions** & **methods** act on objects. We'll see how to obtain **attributes** here as well!

---

## Creating an Array

- Arrays are like lists but are processed much faster
- They also require that the data be of the same type
- They can be multidimensional (like a matrix or even higher dimensions)

The picture below from <https://predictivehacks.com/tips-about-numpy-arrays/> shows a 1D, 2D, and 3D array visually.

![](https://drive.google.com/uc?export=view&id=10NhJk2BlGhzcXdPCeWw2iKXO9NH5EMxj)


- To create an `ndarray` object, pass a list, tuple, or any array-like object to `np.array()`


In [None]:
a = np.array(1)
a

In [None]:
type(a)

- ndarrays have a shape attribute
- Attributes can be accessed like methods except we don't use `()` at the end
- We did this with the `.__doc__` attribute on functions

In [None]:
a.shape

In [None]:
b = np.array([1, 2, 3])
print(b)
print(type(b))
print(b.shape)

---

### Array Dimension

- 0D arrays are scalars (sort of... [see here for discussion](https://stackoverflow.com/questions/773030/why-are-0d-arrays-in-numpy-not-considered-scalar))
- 1D arrays are vectors
- 2D arrays are matrices
- 3D and up are just called arrays

- `.shape` attribute returns the dimensions of an array as a tuple

In [None]:
c = np.array([1, "a", True]) #mixed types are coerced to one common type (here, strings)
print(c)
c.shape

In [None]:
d = np.array([
  [1, 2, 3],
  [4, 5, 6]]
  )
print(d)
d.shape
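
As a rough sketch of the points above, the cell below builds a 0D, 1D, and 2D array and prints the `.ndim`, `.shape`, and `.dtype` attributes of each. The variable names (`scalar_like`, `vector`, `matrix`, `mixed`) are illustrative only and are not used elsewhere in these notes.

```python
import numpy as np

# Illustrative arrays of increasing dimension (names are hypothetical)
scalar_like = np.array(1)                    # 0D array
vector = np.array([1, 2, 3])                 # 1D array
matrix = np.array([[1, 2, 3], [4, 5, 6]])    # 2D array

for arr in (scalar_like, vector, matrix):
    # .ndim is the number of dimensions; .shape is a tuple with one entry per dimension
    print(arr.ndim, arr.shape, arr.dtype)

# Mixed input types are coerced to a single common type (strings here)
mixed = np.array([1, "a", True])
print(mixed, mixed.dtype.kind)
```

Because every element must share one type, the integer `1` and the boolean `True` in `mixed` are stored as strings.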

### Functions for Filling/Creating Arrays

Creating a vector or matrix of all zeros

- Row vector

In [None]:
A0 = np.zeros(4) #row vector of length 4
A0

- Column vector

In [None]:
A0 = np.zeros((4,1)) #column vector of length 4
A0

- Matrix of zeros

In [None]:
A = np.zeros((4,2)) #matrix with dimension 4, 2, given as a tuple
A

In [None]:
A.shape


- Row of all ones

In [None]:
b = np.ones(4) #row vector
b

- Matrix of all ones

In [None]:
B = np.ones((2,3))
B

- Matrix of 10's

In [None]:
C = np.ones((2, 3)) * 10
C

- `np.full()` does this automatically

In [None]:
C = np.full((2,3), 10) #specify the value to fill with after the tuple giving dimension
C

- Be careful! `C` is an integer-valued array, so a float assigned to it is truncated (6.5 is stored as 6)

In [None]:
C = np.full((2,3), 10)
C[0,0] = 6.5                 #replace the top left element
C

- Avoid this by creating the matrix with a float instead

In [None]:
C = np.full((2,3), 10.0)  #or C = np.ones((2, 3)) * 10.0
C[0,0] = 6.5
C
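
A minimal sketch of the truncation issue, checking the `.dtype` attribute before assigning. The names `C_int` and `C_float` are illustrative; passing `dtype=float` to `np.full()` is another way (an alternative to writing `10.0`) to request a float array.

```python
import numpy as np

# Filling with the integer 10 gives an integer-valued array
C_int = np.full((2, 3), 10)
print(C_int.dtype.kind)      # 'i' means an integer type
C_int[0, 0] = 6.5            # the fractional part is silently dropped
print(C_int[0, 0])

# Requesting floats explicitly keeps the fractional part
C_float = np.full((2, 3), 10, dtype=float)
C_float[0, 0] = 6.5
print(C_float[0, 0])
```

Checking `.dtype` before modifying an array is a cheap way to catch this kind of surprise.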

- Create an identity matrix with `np.eye()` (this has 1's on the diagonal and 0's elsewhere)

In [None]:
D = np.eye(3)
D

- Create a random matrix (values between 0 and 1) with `np.random.random()`

In [None]:
E = np.random.random((3,5))
E
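
If reproducible random values are needed, one option (not used elsewhere in these notes) is NumPy's `np.random.default_rng()` with a seed. The sketch below assumes an arbitrary seed value of `554`; the names `rng`, `rng_again`, `E1`, and `E2` are illustrative.

```python
import numpy as np

# Two generators created with the same (arbitrary) seed
rng = np.random.default_rng(seed=554)
rng_again = np.random.default_rng(seed=554)

E1 = rng.random((3, 5))        # 3 x 5 matrix of values in [0, 1)
E2 = rng_again.random((3, 5))  # same seed, so the same values

print(E1.shape)
print(np.array_equal(E1, E2))
```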

- [Many more ways to create!](https://numpy.org/doc/stable/reference/routines.array-creation.html)

---

## Reshaping an Array

- Reshape an array with the `.reshape()` method
- Changes the dimension in some way
- We'll need to do this type of thing when fitting models!

In [None]:
F = np.random.random((10,1))
F

In [None]:
F.shape

In [None]:
G = F.reshape(1, -1) #-1 tells NumPy to infer that dimension; the result is 2D with shape (1, 10)
G

In [None]:
G.shape

In [None]:
G = F.reshape(2, 5)
G
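
A short sketch of how `-1` behaves in `.reshape()`: NumPy infers that one dimension from the total number of elements. The example uses `np.arange()` instead of random values so the result is predictable; the names `row`, `flat`, and `five_rows` are illustrative.

```python
import numpy as np

F = np.arange(10).reshape(10, 1)   # column of the values 0 through 9

row = F.reshape(1, -1)        # 1 row, columns inferred: shape (1, 10), still 2D
flat = F.reshape(-1)          # a single inferred dimension: shape (10,), truly 1D
five_rows = F.reshape(5, -1)  # 5 rows, columns inferred: shape (5, 2)

print(row.shape, flat.shape, five_rows.shape)
```

Only one dimension can be given as `-1`, and the other dimensions must divide the total size evenly.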

- Careful!  This is a view of the original array
- View means that we haven't created a new array, just a different way of viewing the values (essentially). The data is still stored in the same memory
- `.base` attribute tells you whether you are referencing another array (it is `None` when the array owns its own data)

In [None]:
G.base is None

In [None]:
G.base

---

### Copying an Array

- To avoid getting a view, copy the array with `.copy()` method

In [None]:
H = F.reshape(2, 5).copy()
H.base is None

In [None]:
H.base
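
To make the view versus copy distinction concrete, here is a small sketch that modifies both and checks the original. It uses `np.arange()` so the values are predictable, and `np.shares_memory()` as one way to test whether two arrays overlap in memory.

```python
import numpy as np

F = np.arange(10)             # owns its data
G = F.reshape(2, 5)           # a view of F
H = F.reshape(2, 5).copy()    # an independent copy

G[0, 0] = 100                 # writes through to F
H[0, 1] = -1                  # leaves F untouched

print(F)
print(G.base is F, H.base is None)
print(np.shares_memory(F, G), np.shares_memory(F, H))
```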

---

## Indexing an Array

- Access elements in the same way as lists, with `[]`
- With multiple dimensions, separate the indices you want with a `,`

In [None]:
b = np.array([1, 2, 3]) #row vector
b

In [None]:
print(b[0], b[1], b[2])

In [None]:
b[0] = 5 #overwrite element 0 (the first element)
b

- Depending on the dimensions, you add the required commas
- Here we have a 3D array so we have three slots
- Notation: `array[1stD, 2ndD, 3rdD]`

In [None]:
E = np.random.random((3, 2, 2))
E

In [None]:
E[0, 0, 0]

In [None]:
E[0, 1, 0]

In [None]:
E[1, 0, 1]
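
Random values make it hard to see which element an index picks out. As an illustration, the sketch below fills a 3D array of the same shape with the integers 0 through 11 (the name `E_demo` is illustrative), so each index can be checked by eye.

```python
import numpy as np

# Same shape as above, but with predictable values 0, 1, ..., 11
E_demo = np.arange(12).reshape(3, 2, 2)
print(E_demo)

print(E_demo[0, 0, 0])   # first block, first row, first column
print(E_demo[0, 1, 0])   # first block, second row, first column
print(E_demo[1, 0, 1])   # second block, first row, second column

# Giving fewer indices returns a lower-dimensional piece of the array
print(E_demo[1].shape)   # the whole second 2 x 2 block
```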

---

### Slicing an Array

- Recall `[start:end]` for slicing sequence type objects. We can do that with arrays as well

    + Returns everything from start up to and **excluding** end
    + Leaving start blank implies a 0
    + Leaving end blank returns everything from start through the end of the array


In [None]:
A = np.array([
  [1,2,3,4],
  [5,6,7,8],
  [9,10,11,12]])
A

In [None]:
B = A[:2, 1:3]
B

- Careful with modifying! We have a view here so the values in both A and B are referencing the same computer memory
- Changing an element of `B` changes `A`!

In [None]:
B[0, 0] = 919
A

- Returning All of One Index

- Use a `:` with nothing else

In [None]:
A = np.array([
  [1,2,3,4],
  [5,6,7,8],
  [9,10,11,12]])
A1 = A[1, :]
A1

In [None]:
A1.shape

In [None]:
A2 = A[1:3, :]
A2

In [None]:
A2.shape
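
The shapes above differ because a single integer index drops that dimension, while a slice keeps it. A brief sketch, using illustrative names `row_1d` and `row_2d`:

```python
import numpy as np

A = np.array([
  [1, 2, 3, 4],
  [5, 6, 7, 8],
  [9, 10, 11, 12]])

row_1d = A[1, :]     # integer index: the row dimension is dropped
row_2d = A[1:2, :]   # slice of length one: the row dimension is kept

print(row_1d.shape)  # 1D
print(row_2d.shape)  # 2D with a single row
```

This matters when a function expects a 2D input, which is one reason `.reshape()` comes up when fitting models.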

---

## Operations on Arrays

- We saw that multiplying by a constant was performed elementwise
- Basic arithmetic operators and functions act elementwise

In [None]:
x = np.array([
  [1,2],
  [3,4]])
y = np.array([
  [5,6],
  [7,8]])

x

In [None]:
y

In [None]:
x + 10

- Lots of functions exist, such as `np.add()` for adding arrays elementwise

In [None]:
np.add(x, y)


- If we just do something like `x * y` we get elementwise multiplication

In [None]:
x * y

- The `np.multiply()` function does elementwise multiplication too
- Can also add in conditions on when to multiply though!
  + `where =` argument gives the condition on when to do the multiplication
  + `out =` gives the array the results are written into; where the condition is `False`, that array keeps the values it already had

In [None]:
np.multiply(x, y, where = (x >= 3), out = x.copy()) #x.copy() keeps x itself from being overwritten
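
One way to use `where` and `out` without overwriting an input is to write into a copy. This is a sketch of that approach; the name `result` is illustrative.

```python
import numpy as np

x = np.array([
  [1, 2],
  [3, 4]])
y = np.array([
  [5, 6],
  [7, 8]])

# Multiply only where x >= 3; elsewhere keep the values already in the output array.
# Starting the output as a copy of x leaves x itself unchanged.
result = np.multiply(x, y, where=(x >= 3), out=x.copy())

print(result)
print(x)   # still the original values
```

Passing `out = x` directly would store the result in `x`, changing it for every later cell.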

- Elementwise division

In [None]:
x / y

- We can do matrix multiplication (if you are familiar with that) using the `np.matmul()` function

In [None]:
np.matmul(x, y)
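
For 2D arrays, Python's `@` operator gives the same result as `np.matmul()`. The sketch below compares the two and contrasts them with elementwise `*`; the names `product_fn` and `product_op` are illustrative.

```python
import numpy as np

x = np.array([
  [1, 2],
  [3, 4]])
y = np.array([
  [5, 6],
  [7, 8]])

product_fn = np.matmul(x, y)   # matrix multiplication via the function
product_op = x @ y             # same thing via the @ operator

print(product_op)
print(np.array_equal(product_fn, product_op))
print(x * y)                   # elementwise, a different result
```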

- `np.sqrt()` function can be used to find the square roots of the elements of a matrix

In [None]:
np.sqrt(x)

- `np.linalg.inv()` will provide the inverse of a square matrix (if you're familiar with that type of thing!)

In [None]:
np.linalg.inv(x)
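
A quick way to sanity-check an inverse is to multiply it by the original matrix and compare with the identity. Because of floating point rounding, the sketch uses `np.allclose()` rather than exact equality; the name `x_inv` is illustrative.

```python
import numpy as np

x = np.array([
  [1.0, 2.0],
  [3.0, 4.0]])

x_inv = np.linalg.inv(x)

# x times its inverse should be (numerically) the 2 x 2 identity matrix
print(x @ x_inv)
print(np.allclose(x @ x_inv, np.eye(2)))
```

`np.linalg.inv()` raises an error if the matrix is singular (not invertible).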

### Computations on Arrays

- NumPy has some useful functions for performing basic computations on arrays

In [None]:
x = np.array([
  [1,2,10],
  [3,4,11]])
np.sum(x)

- Column-wise (`axis=0`) and row-wise (`axis=1`) sums

In [None]:
x.shape

In [None]:
np.sum(x, axis=0)

In [None]:
np.sum(x, axis=1)
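
One way to remember the `axis` argument is that it names the dimension that gets collapsed. A sketch with the same 2 x 3 array; the names `col_sums`, `row_sums`, and `row_sums_2d` are illustrative.

```python
import numpy as np

x = np.array([
  [1, 2, 10],
  [3, 4, 11]])

col_sums = np.sum(x, axis=0)   # collapse the rows: one sum per column
row_sums = np.sum(x, axis=1)   # collapse the columns: one sum per row

print(col_sums)   # [ 4  6 21]
print(row_sums)   # [13 18]

# keepdims=True keeps the collapsed dimension with length 1
row_sums_2d = np.sum(x, axis=1, keepdims=True)
print(row_sums_2d.shape)
```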

- Combine arrays (appropriately sized)

In [None]:
x = np.array([
  [1,2],
  [3,4]])
y = np.array([
  [5,6],
  [7,8]])

np.hstack((x, y))

In [None]:
np.vstack((x, y))
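
For 2D arrays like these, `np.hstack()` and `np.vstack()` behave like `np.concatenate()` along `axis=1` and `axis=0` respectively. A small sketch comparing them; the names `side_by_side` and `stacked` are illustrative.

```python
import numpy as np

x = np.array([
  [1, 2],
  [3, 4]])
y = np.array([
  [5, 6],
  [7, 8]])

side_by_side = np.hstack((x, y))   # columns are appended: shape (2, 4)
stacked = np.vstack((x, y))        # rows are appended: shape (4, 2)

print(side_by_side.shape, stacked.shape)
print(np.array_equal(side_by_side, np.concatenate((x, y), axis=1)))
print(np.array_equal(stacked, np.concatenate((x, y), axis=0)))
```

The arrays must agree in the dimension that is not being joined, otherwise NumPy raises an error.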

- [Lots of other operations!](https://numpy.org/doc/stable/reference/routines.array-manipulation.html)


---

# Quick Video

This video shows the creation of numpy arrays, the use of `np.nditer()` for iterating over an array, and the use of `ufuncs`, which act on the elements of an array. (Coming Soon)

---

# Recap

- `NumPy` is a widely used library that provides arrays

    + Lots of functions to create arrays
    
    + Very fast computation and many useful functions for operating on arrays!