<a href="https://colab.research.google.com/github/jbpost2/ST-554-Big-Data-with-Python/blob/main/01_Programming_in_python/12-Numpy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# `NumPy`

One of the most famous modules for statistics is called `numpy`

- A library for scientific computing

    + Provides the multidimensional array object (such as vectors and matrices) and methods for manipulating them

- We can bring in the module with `import` and the convention is to reference it as `np`

In [1]:
import numpy as np

As with our other data types, let's go through and...

- Learn how to create
- Consider commonly used functions and methods
- See control flow and other tricks along the way

Compound object covered in this topic:
- **numpy array**

Recall: **functions** & **methods** act on objects. We'll see how to obtain **attributes** here as well!

Note: These types of webpages are built from Jupyter notebooks (`.ipynb` files). You can access your own versions of them by [clicking here](https://colab.research.google.com/github/jbpost2/ST-554-Big-Data-with-Python/blob/main/01_Programming_in_python/12-Numpy.ipynb). **It is highly recommended that you go through and run the notebooks yourself, modifying and rerunning things where you'd like!**

---

## Creating an Array

- Arrays are like lists, but computations on them run much faster
- They also require that the data be of the same type
- They can be multidimensional (like a matrix, or even higher dimensional)

The picture below from <https://predictivehacks.com/tips-about-numpy-arrays/> shows a 1D, 2D, and 3D array visually.

<img src = "https://www4.stat.ncsu.edu/online/datasets/nparrays.png" alt = "A 1D array is shown. This has only a '0' axis and has four values in an order. A 2D array is shown. This has a '0' axis and a '1' axis. The values are laid out in a rectangular grid. A 3D array is shown. This has a '0', '1', and '2' axis and the values are laid out in a rectangular prism">


- To create an `ndarray` object, pass a `list`, `tuple`, or any array-like object to `np.array()`
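A minimal sketch of the point above. The variable names `from_list`, `from_tuple`, and `mixed` are illustrative. It shows that a list and a tuple with the same values give the same array, and that the single shared data type is stored in the `.dtype` attribute.

```python
import numpy as np

# The same values passed as a list or as a tuple give the same 1D array
from_list = np.array([1, 2, 3])
from_tuple = np.array((1, 2, 3))
print(np.array_equal(from_list, from_tuple))  # True

# All elements share one data type, stored in the .dtype attribute;
# here the integers are upcast to floats to match 2.5
mixed = np.array([1, 2.5, 3])
print(mixed)
print(mixed.dtype)  # float64
```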


In [2]:
a = np.array(1)
a

array(1)

In [3]:
type(a)

numpy.ndarray

- `ndarrays` have a `shape` attribute
- Attributes can be accessed like methods except we don't use `()` at the end
- We did this with the `.__doc__` attribute on functions

In [4]:
a.shape

()

In [5]:
b = np.array([1, 2, 3])
print(b)
print(type(b))
print(b.shape)

[1 2 3]
<class 'numpy.ndarray'>
(3,)


---

### Array Dimension

- 0D arrays are a scalar (sort of... <a href = "https://stackoverflow.com/questions/773030/why-are-0d-arrays-in-numpy-not-considered-scalar" target = "_blank">see here for discussion</a>)
- 1D arrays are vectors
- 2D arrays are matrices
- 3D and up are just called arrays

- `.shape` attribute returns the dimensions of an array as a tuple

In [6]:
c = np.array([1, "a", True])
print(c)
c.shape

['1' 'a' 'True']


(3,)

In [7]:
d = np.array([
  [1, 2, 3],
  [4, 5, 6]]
  )
print(d)
d.shape

[[1 2 3]
 [4 5 6]]


(2, 3)
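Alongside `.shape`, arrays carry `.ndim` (the number of dimensions) and `.size` (the total number of elements) attributes. A short sketch with illustrative names that prints all three for a 0D, 1D, 2D, and 3D array:

```python
import numpy as np

scalar = np.array(1)
vector = np.array([1, 2, 3])
matrix = np.array([[1, 2, 3], [4, 5, 6]])
cube = np.zeros((2, 3, 4))

for arr in (scalar, vector, matrix, cube):
    # .ndim counts the dimensions (it equals len(arr.shape));
    # .size counts the elements (the product of the shape entries)
    print(arr.ndim, arr.shape, arr.size)
```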

### Functions for Filling/Creating Arrays

Creating a vector or matrix of all zeros

- Row vector

In [8]:
A0 = np.zeros(4) #row vector of length 4
A0

array([0., 0., 0., 0.])

- Column vector

In [9]:
A0 = np.zeros((4,1)) #column vector of length 4
A0

array([[0.],
       [0.],
       [0.],
       [0.]])

- Matrix of zeros

In [10]:
A = np.zeros((4,2)) #matrix with dimension 4, 2, given as a tuple
A

array([[0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.]])

In [11]:
A.shape

(4, 2)
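The comments above call `np.zeros(4)` a row vector; strictly, it is a 1D array with shape `(4,)`, while an explicit row vector is 2D with shape `(1, 4)`. A small sketch (names are illustrative) comparing the three shapes and using the `.T` transpose attribute:

```python
import numpy as np

v = np.zeros(4)          # 1D array
col = np.zeros((4, 1))   # 2D column vector: 4 rows, 1 column
row = np.zeros((1, 4))   # 2D row vector: 1 row, 4 columns
print(v.shape, col.shape, row.shape)  # (4,) (4, 1) (1, 4)

# .T transposes a 2D array, turning the column into a row
print(col.T.shape)  # (1, 4)
# Transposing a 1D array leaves its shape unchanged
print(v.T.shape)    # (4,)
```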


- Row of all ones

In [12]:
b = np.ones(4) #row vector
b

array([1., 1., 1., 1.])

- Matrix of all ones

In [13]:
B = np.ones((2,3))
B

array([[1., 1., 1.],
       [1., 1., 1.]])

- Matrix of 10's

In [14]:
C = np.ones((2, 3)) * 10
C

array([[10., 10., 10.],
       [10., 10., 10.]])

- `np.full()` does this automatically

In [15]:
C = np.full((2,3), 10) #specify the value to fill with after the tuple giving dimension
C

array([[10, 10, 10],
       [10, 10, 10]])

- Be careful! Since we filled with the integer `10`, `C` is an integer-valued array, so assigning `6.5` truncates it to `6`

In [16]:
C = np.full((2,3), 10)
C[0,0] = 6.5                 #replace the top left element
C

array([[ 6, 10, 10],
       [10, 10, 10]])

- Avoid this by filling with a float (`10.0`) instead

In [17]:
C = np.full((2,3), 10.0)  #or C = np.ones((2, 3)) * 10.0
C[0,0] = 6.5
C

array([[ 6.5, 10. , 10. ],
       [10. , 10. , 10. ]])
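Another way to avoid the truncation is to request the data type directly. A minimal sketch using the `dtype` argument of `np.full()` and the `.astype()` method (the names `C_int`, `C_float`, `C_conv` are illustrative). The exact integer type, often `int64`, can depend on the platform, so the code only checks that it is an integer type.

```python
import numpy as np

# Integer fill value -> integer array, so assigning 6.5 is truncated to 6
C_int = np.full((2, 3), 10)
C_int[0, 0] = 6.5
print(C_int.dtype, C_int[0, 0])

# Ask for a float array explicitly with the dtype argument
C_float = np.full((2, 3), 10, dtype=float)
C_float[0, 0] = 6.5
print(C_float.dtype, C_float[0, 0])  # float64 6.5

# An existing array can be converted with .astype(), which returns a new array
C_conv = C_int.astype(float)
print(C_conv.dtype)  # float64
```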

- Create an identity matrix with `np.eye()` (this has 1's on the diagonal and 0's elsewhere)

In [18]:
D = np.eye(3)
D

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

- Create a random matrix with `np.random.random()` (values are drawn uniformly from 0 up to, but not including, 1)

In [29]:
np.random.seed(11) #setting the 'seed' allows us to get the same 'random' values each time we run this code
E = np.random.random((3,5))
E

array([[0.18026969, 0.01947524, 0.46321853, 0.72493393, 0.4202036 ],
       [0.4854271 , 0.01278081, 0.48737161, 0.94180665, 0.85079509],
       [0.72996447, 0.10873607, 0.89390417, 0.85715425, 0.16508662]])
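To see what the seed does, reset it and draw again: the same "random" values come back. A short sketch with illustrative names; the last lines show `np.random.default_rng()`, the Generator-based interface that newer NumPy code often uses instead of the global seed (its values differ from those of `np.random.seed()`).

```python
import numpy as np

np.random.seed(11)
first = np.random.random((3, 5))
np.random.seed(11)                # reset the seed...
second = np.random.random((3, 5))
print(np.array_equal(first, second))  # True: the same values again

# Every value lies in [0, 1)
print(first.min() >= 0, first.max() < 1)  # True True

# Generator-based alternative: the seed belongs to the rng object
rng = np.random.default_rng(11)
print(rng.random((3, 5)).shape)  # (3, 5)
```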

- <a href = "https://numpy.org/doc/stable/reference/routines.array-creation.html" target = "_blank">Many more ways to create!</a>
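Two creation functions from that page that come up often are `np.arange()` and `np.linspace()`. A brief sketch (variable names are illustrative):

```python
import numpy as np

# np.arange(start, stop, step): like range(), the stop value is excluded
evens = np.arange(0, 10, 2)
print(evens)  # [0 2 4 6 8]

# np.linspace(start, stop, num): num evenly spaced values, stop included
grid = np.linspace(0, 1, 5)
print(grid)   # the values 0, 0.25, 0.5, 0.75, 1.0
```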

---

## Reshaping an Array

- Reshape an array with the `.reshape()` method
- Returns the same values arranged in a new shape (the total number of elements must stay the same)
- We'll need to do this type of thing when fitting models!

In [30]:
F = np.random.random((10,1))
F

array([[0.63233401],
       [0.02048361],
       [0.11673727],
       [0.31636731],
       [0.15791231],
       [0.75897959],
       [0.81827536],
       [0.34462449],
       [0.3187988 ],
       [0.11166123]])

In [31]:
F.shape

(10, 1)

In [32]:
G = F.reshape(1, -1) #-1 lets NumPy infer this dimension (here 10), giving a 2D 1 x 10 row vector
G

array([[0.63233401, 0.02048361, 0.11673727, 0.31636731, 0.15791231,
        0.75897959, 0.81827536, 0.34462449, 0.3187988 , 0.11166123]])

In [33]:
G.shape

(1, 10)

In [34]:
G = F.reshape(2, 5)
G

array([[0.63233401, 0.02048361, 0.11673727, 0.31636731, 0.15791231],
       [0.75897959, 0.81827536, 0.34462449, 0.3187988 , 0.11166123]])
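A small sketch of how `-1` works in `.reshape()`: NumPy fills in whichever dimension is given as `-1` so the total size matches. Here `F` is rebuilt from `np.arange()` rather than random values so the example is reproducible.

```python
import numpy as np

F = np.arange(10).reshape(10, 1)  # a 10 x 1 column of the values 0..9

print(F.reshape(1, -1).shape)  # (1, 10): -1 is inferred as 10
print(F.reshape(2, -1).shape)  # (2, 5): -1 is inferred as 5
print(F.reshape(-1).shape)     # (10,): a true 1D array

# The new shape must hold exactly as many elements as the original
try:
    F.reshape(3, 4)
except ValueError as err:
    print("ValueError:", err)
```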

- Careful! `G` is actually a **view** of the original array
- A view is a new array object that shares the original array's data, so no values are copied. Both arrays point to the same memory, so changing an element through one changes it in the other
- `.base` attribute will tell you whether you are referencing another array

In [35]:
G.base

array([[0.63233401],
       [0.02048361],
       [0.11673727],
       [0.31636731],
       [0.15791231],
       [0.75897959],
       [0.81827536],
       [0.34462449],
       [0.3187988 ],
       [0.11166123]])

In [36]:
G.base is None #a way to return a bool based on whether it is a view or not

False
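A minimal sketch of why views matter: modifying the reshaped view changes the original. `F` is filled with zeros here (an assumption made for readability) and `np.shares_memory()` confirms the two arrays use the same data.

```python
import numpy as np

F = np.zeros((10, 1))
G = F.reshape(2, 5)       # a view: same data, new shape

G[0, 0] = 99.0            # modify the view...
print(F[0, 0])            # ...and the original sees the change: 99.0

print(G.base is None)          # False: G references another array
print(np.shares_memory(F, G))  # True
```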

---

### Copying an Array

- To avoid getting a view, copy the array with `.copy()` method

In [37]:
H = F.reshape(2, 5).copy()
H.base is None

True

In [38]:
H.base
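The `H.base` cell displays nothing because a copy owns its data, so `H.base` is `None`. A short sketch (with `F` filled with zeros for readability) showing that changing the copy leaves the original alone:

```python
import numpy as np

F = np.zeros((10, 1))
H = F.reshape(2, 5).copy()  # an independent copy with its own memory

H[0, 0] = 99.0
print(H[0, 0], F[0, 0])        # 99.0 0.0: F is unchanged
print(H.base is None)          # True
print(np.shares_memory(F, H))  # False
```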

---

## Indexing an Array

- Access elements the same way as with lists, using `[]`
- With multiple dimensions, separate the indices you want with a `,`

In [39]:
b = np.array([1, 2, 3]) #row vector
b

array([1, 2, 3])

In [40]:
print(b[0], b[1], b[2])

1 2 3


In [41]:
b[0] = 5 #overwrite the 0 element
b

array([5, 2, 3])

- Depending on the dimensions, you add the required commas
- Here we have a 3D array so we have three slots
- Notation: `array[1stD, 2ndD, 3rdD]`

In [42]:
E = np.random.random((3, 2, 2))
E

array([[[0.08395314, 0.71272594],
        [0.5995434 , 0.05567368]],

       [[0.47979728, 0.40167648],
        [0.847979  , 0.71784918]],

       [[0.60206405, 0.55238382],
        [0.9491024 , 0.98667333]]])

In [43]:
E[0, 0, 0]

np.float64(0.08395314332205706)

In [44]:
E[0, 1, 0]

np.float64(0.5995433962576555)

In [45]:
E[1, 0, 1]

np.float64(0.4016764806306522)
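List-style tricks carry over to arrays. A sketch using negative indices and chained indexing; `E` is rebuilt from `np.arange()` here (an assumption, so the values are predictable) instead of random numbers.

```python
import numpy as np

b = np.array([1, 2, 3])
print(b[-1])  # 3: negative indices count from the end, as with lists

E = np.arange(12).reshape(3, 2, 2)  # a 3 x 2 x 2 array holding 0..11
print(E[1, 0, 1])     # 5
print(E[1][0][1])     # 5: chained indexing reaches the same element
print(E[-1, -1, -1])  # 11: the last element
```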

---

### Slicing an Array

- Recall `[start:end]` for slicing sequence type objects. We can do that with arrays as well

    + Returns everything from start up to and **excluding** end
    + Leaving start blank implies a 0
    + Leaving end blank returns everything from start through the end of the array


In [46]:
A = np.array([
  [1,2,3,4],
  [5,6,7,8],
  [9,10,11,12]])
A

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [47]:
B = A[:2, 1:3]
B

array([[2, 3],
       [6, 7]])

- Careful with modifying! We have a view here, so `A` and `B` reference the same computer memory
- Changing an element of `B` changes `A`!

In [48]:
B[0, 0] = 919
A

array([[  1, 919,   3,   4],
       [  5,   6,   7,   8],
       [  9,  10,  11,  12]])
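A minimal sketch of avoiding the shared memory with `.copy()`, plus the optional third slice value, `[start:end:step]`, which sets the step size:

```python
import numpy as np

A = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

# .copy() turns the slice into an independent array
B = A[:2, 1:3].copy()
B[0, 0] = 919
print(A[0, 1])    # 2: A is unchanged this time

# A step of 2 keeps every other column (columns 0 and 2)
print(A[:, ::2])
```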

- To return all of one index (e.g., an entire row), use a `:` with nothing else

In [49]:
A = np.array([
  [1,2,3,4],
  [5,6,7,8],
  [9,10,11,12]])
A1 = A[1, :]
A1

array([5, 6, 7, 8])

In [50]:
A1.shape

(4,)

In [51]:
A2 = A[1:3, :]
A2

array([[ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [52]:
A2.shape

(2, 4)
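The same idea works for columns. A sketch (names are illustrative) showing that an integer index drops that dimension, while a length-one slice keeps it; this difference matters when a function expects a 2D input:

```python
import numpy as np

A = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

col = A[:, 1]       # integer index: the result is 1D
col_2d = A[:, 1:2]  # length-one slice: the result stays 2D
print(col, col.shape)  # [ 2  6 10] (3,)
print(col_2d.shape)    # (3, 1)
```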

---

## Operations on Arrays

- We saw that multiplying by a constant was performed elementwise  
- The basic arithmetic operators (`+`, `-`, `*`, `/`) all act elementwise

In [53]:
x = np.array([
  [1,2],
  [3,4]])
y = np.array([
  [5,6],
  [7,8]])

x

array([[1, 2],
       [3, 4]])

In [54]:
y

array([[5, 6],
       [7, 8]])

In [55]:
x + 10

array([[11, 12],
       [13, 14]])

- Lots of functions exist, such as `np.add()` for adding arrays elementwise

In [56]:
np.add(x, y)


array([[ 6,  8],
       [10, 12]])
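The arithmetic operators and their NumPy function counterparts give the same elementwise results. A quick sketch checking this:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

# Each operator matches the corresponding elementwise NumPy function
print(np.array_equal(x + y, np.add(x, y)))       # True
print(np.array_equal(x - y, np.subtract(x, y)))  # True
print(np.array_equal(x * y, np.multiply(x, y)))  # True
```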

- If we just do something like `x * y` we get elementwise multiplication

In [57]:
x * y

array([[ 5, 12],
       [21, 32]])

- The `np.multiply()` function does elementwise multiplication too
- It also accepts arguments that control when to multiply!
  + `where =` gives the condition for when to do the multiplication
  + `out =` gives the array the results are written into; wherever the condition is `False`, that array keeps its existing values. Here `out = x`, so `x` itself is overwritten, and the cells below use this modified `x`

In [58]:
np.multiply(x, y, where = (x >= 3), out = x)

array([[ 1,  2],
       [21, 32]])
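Because the cell above passes `out = x`, the result is written into `x` itself. A minimal sketch, assuming you want to keep `x` unchanged: write into a copy instead (the name `result` is illustrative), and compare with `np.where()`, which picks each value from one of two arrays based on a condition.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

# Write into a fresh copy so x itself is not overwritten
result = x.copy()
np.multiply(x, y, where=(x >= 3), out=result)
print(result)  # [[ 1  2] [21 32]] (printed over two lines)
print(x)       # unchanged

# np.where(condition, value_if_true, value_if_false) gives the same result
alt = np.where(x >= 3, x * y, x)
print(np.array_equal(result, alt))  # True
```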

- Elementwise division

In [59]:
x / y

array([[0.2       , 0.33333333],
       [3.        , 4.        ]])

- We can do matrix multiplication (if you are familiar with that) using the `np.matmul()` function or the `@` operator

In [60]:
np.matmul(x, y)

array([[ 19,  22],
       [329, 382]])
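A sketch confirming that `@` matches `np.matmul()` and checking one entry by hand. `x` is set to the values left behind by the `np.multiply(..., out = x)` cell.

```python
import numpy as np

x = np.array([[1, 2], [21, 32]])  # x after the np.multiply(..., out = x) cell
y = np.array([[5, 6], [7, 8]])

print(np.array_equal(x @ y, np.matmul(x, y)))  # True

# Entry (0, 0) is row 0 of x times column 0 of y: 1*5 + 2*7 = 19
print((x @ y)[0, 0], x[0, 0] * y[0, 0] + x[0, 1] * y[1, 0])  # 19 19
```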

- The `np.sqrt()` function finds the square root of each element of an array

In [61]:
np.sqrt(x)

array([[1.        , 1.41421356],
       [4.58257569, 5.65685425]])

- `np.linalg.inv()` will provide the inverse of a square matrix (if you're familiar with that type of thing!)

In [62]:
np.linalg.inv(x)

array([[-3.2,  0.2],
       [ 2.1, -0.1]])
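A quick way to check an inverse is to multiply it back: the product should be the identity matrix, up to floating point rounding. A sketch using `np.allclose()` for the approximate comparison (`x_inv` is an illustrative name):

```python
import numpy as np

x = np.array([[1, 2], [21, 32]])  # x after the np.multiply(..., out = x) cell
x_inv = np.linalg.inv(x)

# x times its inverse gives the identity, up to rounding error
print(np.allclose(x @ x_inv, np.eye(2)))  # True
```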

### Computations on Arrays

- `NumPy` has some useful functions for performing basic computations on arrays

In [63]:
x = np.array([
  [1,2,10],
  [3,4,11]])
np.sum(x)

np.int64(31)

- Column-wise sums (`axis=0`, summing down the rows) and row-wise sums (`axis=1`, summing across the columns)

In [64]:
x.shape

(2, 3)

In [65]:
np.sum(x, axis=0)

array([ 4,  6, 21])

In [66]:
np.sum(x, axis=1)

array([13, 18])
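The `axis` argument works the same way for other summaries, and most of these functions also exist as array methods. A brief sketch with `np.mean()` and `np.max()`:

```python
import numpy as np

x = np.array([[1, 2, 10],
              [3, 4, 11]])

print(x.sum(axis=0))       # [ 4  6 21]: the method form of np.sum(x, axis=0)
print(np.mean(x, axis=1))  # row means: 13/3 and 18/3
print(np.max(x, axis=0))   # [ 3  4 11]: column maxima
```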

- Combine (appropriately sized) arrays side by side with `np.hstack()` or stack them on top of each other with `np.vstack()`

In [67]:
x = np.array([
  [1,2],
  [3,4]])
y = np.array([
  [5,6],
  [7,8]])

np.hstack((x, y))

array([[1, 2, 5, 6],
       [3, 4, 7, 8]])

In [68]:
np.vstack((x, y))

array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])
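`np.concatenate()` with an `axis` argument does the same job, and the shapes must line up along the other dimension. A sketch (the array `z` is an illustrative mismatched example):

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

print(np.array_equal(np.hstack((x, y)), np.concatenate((x, y), axis=1)))  # True
print(np.array_equal(np.vstack((x, y)), np.concatenate((x, y), axis=0)))  # True

# A 2 x 2 and a 1 x 3 array have different row counts,
# so they cannot be placed side by side
z = np.array([[9, 9, 9]])
try:
    np.hstack((x, z))
except ValueError as err:
    print("ValueError:", err)
```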

- <a href = "https://numpy.org/doc/stable/reference/routines.array-manipulation.html" target = "_blank">Lots of other operations!</a>


---

# Quick Video

This video shows the creation of numpy arrays, the use of `np.nditer()` for iterating over an array, and the use of `ufuncs` which act on the elements of an array. Remember to pop the video out into the full player.

The notebook written in the video is <a href = "https://colab.research.google.com/github/jbpost2/ST-554-Big-Data-with-Python/blob/main/01_Programming_in_python/Learning_Python.ipynb" target = "_blank">available here</a>.

In [69]:
from IPython.display import IFrame
IFrame(src="https://ncsu.hosted.panopto.com/Panopto/Pages/Embed.aspx?id=d84d2c18-c4ba-4107-bc7f-b0f800f76b2e&autoplay=false&offerviewer=true&showtitle=true&showbrand=true&captions=false&interactivity=all", height="405", width="720")
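A minimal sketch of the two ideas from the video: `np.nditer()` visits every element regardless of the number of dimensions, and functions like `np.sqrt()` are ufuncs (objects of type `np.ufunc`) that act on each element. The variable names are illustrative.

```python
import numpy as np

d = np.array([[1, 2, 3], [4, 5, 6]])

total = 0
for value in np.nditer(d):  # each value is a 0D array
    total += int(value)
print(total)  # 21

print(isinstance(np.sqrt, np.ufunc))  # True
```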

---

# Recap

- `NumPy` is a widely used library that provides arrays

    + Lots of functions to create arrays
    
    + Very fast computation and many useful functions for operating on arrays!


This wraps up the content for week 2. I know that was a lot about these objects. Now we require some more practice! You should head back to our <a href = "https://wolfware.ncsu.edu/" target = "_blank">Moodle site</a> to check out your homework assignment for this week.

Otherwise, if you are on the course website, use the table of contents on the left or the arrows at the bottom of this page to navigate to the next learning material!

If you are on Google Colab, head back to our course website for [our next lesson](https://jbpost2.github.io/ST-554-Big-Data-with-Python/01_Programming_in_python/13-EDA_Landing.html)!