In [None]:
%reload_ext postcell
%postcell register

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Introduction to Numpy

Numpy is a matrix/linear algebra library for Python. In _core_ Python (Python without external libraries), working with collections of numbers requires the use of loops, list comprehensions or map/filter/reduce functions. In Numpy, however, collections of numbers are the default and are easy to work with.

Extremely helpful cheatsheet on numpy: https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf (check the web, there are many more)

Let's take a look at some examples

In [None]:
%matplotlib inline
#plt.figure(figsize=(50,50))
np.random.seed(seed=42)

In [None]:
a = np.array([1,2,3,4,5,6,7,8,9])
a

In [None]:
b = np.array([9,8,7,6,5,4,3,2, 1])
b

In [None]:
a + b

In [None]:
a - b

In [None]:
b - a

In [None]:
a * b

Notice that we are able to add whole arrays, element by element, just by using the normal arithmetic operators. No need to use loops!

Recall that attempting to add two Python lists just concatenates them:

In [None]:
[1,2,3] + [4,5,6]
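To make the contrast concrete, here is a minimal sketch of the same element-by-element addition done both ways. The names `list_a`, `list_b`, `list_sum` and `array_sum` are made up for this illustration.

```python
import numpy as np

list_a = [1, 2, 3]
list_b = [4, 5, 6]

# Core Python: + concatenates lists, so pair the elements up explicitly
list_sum = [x + y for x, y in zip(list_a, list_b)]

# Numpy: + adds element by element
array_sum = np.array(list_a) + np.array(list_b)

print(list_sum)
print(array_sum)
```

Both approaches give the same three sums; the Numpy version simply leaves the looping to the library.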

**Exercise** Create array `c` of 5 values, another `d` of 5 values
1. Add them
2. Subtract them
3. Multiply them

In [None]:
%%postcell exercise_045_b

#type your answer here

### Numerical programming in Python

In the early 90s, some programmers wanted to use Python for their scientific work, but couldn't do so for several reasons:
1. Python is extremely slow[1], compared to faster languages. Numpy mitigates this problem by compiling important code in C, but exposing a Python API
2. Numeric code, which can consist of multi-dimensional matrices, had no counterpart in Python. As we have seen above, Numpy can handle an array of numbers just fine (later we will see examples of multiple dimensions)
3. Python, at the time, had no collection of high quality functions to operate on arrays or matrices of numbers. Numpy is that collection of functions.


[1] https://benchmarksgame-team.pages.debian.net/benchmarksgame/performance/nbody.html


#### Quick performance comparison between core Python and numpy

In [None]:
core_python = list(range(10_000))
numpy_python = np.arange(10_000)

In [None]:
%timeit sum(core_python)
%timeit np.sum(numpy_python)

For this sum, Numpy is roughly an _order of magnitude_ faster than core Python (the exact ratio depends on your machine)!
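Outside a notebook, where the `%timeit` magic is not available, a similar comparison can be sketched with the standard library's `timeit` module. The variable names below are illustrative and the timings will differ from machine to machine, so treat the printed numbers as a rough guide only.

```python
import timeit
import numpy as np

core_python = list(range(10_000))
numpy_python = np.arange(10_000)

# Both approaches compute the same total
python_total = sum(core_python)
numpy_total = int(np.sum(numpy_python))

# Time 1,000 repetitions of each; exact numbers depend on the machine
python_seconds = timeit.timeit(lambda: sum(core_python), number=1_000)
numpy_seconds = timeit.timeit(lambda: np.sum(numpy_python), number=1_000)

print(python_total, numpy_total)
print(f"core Python: {python_seconds:.4f}s, numpy: {numpy_seconds:.4f}s")
```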

Modern GPUs make this _even_ faster (ask me about them)

### Multi-dimensional arrays (matrices and tensors)

Earlier we saw some examples of arrays. Numpy can just as easily create 2, 3 or any dimensional arrays:

In [None]:
np.array([1,2,3,4,5,6,7,8,9])

In [None]:
np.array([1,2,3,4,5,6,7,8,9]).shape

In [None]:
np.array([[1,2,3], [4,5,6]])

In [None]:
np.array([[1,2,3], [4,5,6]]).shape

In [None]:
# Generate a 2-dimensional matrix (5 rows by 3 columns) of random numbers
a2d = np.random.random((5,3))
a2d

In [None]:
b2d = np.random.random((5,3))
b2d

In [None]:
a2d + b2d

### Vector and matrix math is as easy as scalar math

In [None]:
ones = np.ones((5,5))
ones

In [None]:
twos = ones + ones
twos

We have seen many examples of vector and matrix arithmetic using standard arithmetic operators. We can do math between matrices and scalars just as easily

In [None]:
ones + 5

It is sometimes very useful to use the comparison operator

In [None]:
ten_vals = np.random.random(10)
ten_vals

In [None]:
ten_vals > 0.5

### Selecting rows and columns

#### Slicing
Selecting subsets of a numpy array or matrix should look somewhat familiar from similar exercises done while studying lists and strings.
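As a quick sketch of the syntax, the example below slices a small made-up matrix `m` (not one of the arrays used in this notebook). The part before the comma selects rows and the part after it selects columns, each using the same `start:stop` notation as lists.

```python
import numpy as np

# A small 3x4 matrix holding the numbers 0 through 11
m = np.arange(12).reshape((3, 4))

first_row = m[0, :]      # row 0, every column
last_column = m[:, 3]    # every row, column 3
top_left = m[:2, :2]     # rows 0-1 and columns 0-1

print(first_row)
print(last_column)
print(top_left)
```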

Select the first row from array `a2d`

In [None]:
a2d

In [None]:
a2d[0, :]

Select the second column from array `a2d`

In [None]:
a2d

In [None]:
a2d[:, 1]

In [None]:
a2d[1, :]

**Exercise** Given the following matrix, select the bottom right corner (the resulting matrix should have 4 values) - hint: recall that we can use negative numbers to access items from the end

In [None]:
exercise1 = np.array([[1,2,3],[4,5,6], [7,8,9]])
exercise1

In [None]:
%%postcell exercise_045_a

#type your answer here

#### Boolean Indexing
Values can also be selected using boolean operators (somewhat like SQL's where clause).

In the example below, we generate an array of 10 values, then select a handful of specific values using lists of booleans:

In [None]:
ten_vals = np.random.random((10,))
ten_vals

In [None]:
mask = [ True, False, False,  False,  False,  False, False, False,  False, False]
ten_vals[mask]

In [None]:
ten_vals[[ True, False, False,  False,  False,  False, False, False,  False, False]]

In [None]:
ten_vals[[ True, True, False,  False,  False,  False, False, False,  False, False]]

In [None]:
ten_vals[[ True, True, False,  False,  False,  False, False, False,  True, True]]

Recall from an earlier section that we can generate a vector of booleans with a simple comparison operator

In [None]:
ten_vals > .5

**Important** We can combine boolean indexing and a comparison operator to get an extremely common usage pattern

In [None]:
ten_vals

In [None]:
ten_vals[ten_vals > .5]

In [None]:
ten_vals[ten_vals <= .5]

We were just able to query numpy, almost as if it were an SQL table! All we did was combine two simple features of numpy.

You can even combine several boolean expressions, like a normal SQL statement. However, keep in mind that when you combine them, you can't just use Python's _and_ and _or_ operators. You have to use `&` for _and_ and `|` for _or_. You also have to wrap each boolean expression in parentheses.

In [None]:
ten_vals[(ten_vals > .1) & (ten_vals < .9)]

In [None]:
ten_vals[(ten_vals < .1) | (ten_vals > .9)]
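The sketch below shows what goes wrong if Python's `and` is used instead of `&`. It uses a small hand-written array `vals` (an assumption for this illustration) so the result is easy to check by eye.

```python
import numpy as np

vals = np.array([0.05, 0.3, 0.6, 0.95])

# & compares element by element and returns one boolean per value
in_range = (vals > .1) & (vals < .9)

# Python's `and` asks for a single True/False for the whole array,
# which numpy refuses to guess
try:
    (vals > .1) and (vals < .9)
    error_message = None
except ValueError as err:
    error_message = str(err)

print(in_range)
print(error_message)
```

The parentheses matter too: `&` binds more tightly than `>` and `<`, so without them Python would try to evaluate `.1 & vals` first.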

**Exercise** From the array `np.random.random((25))`, find all values between .1 and .5 (inclusive)

In [None]:
%%postcell exercise_045_c

#type your answer here

### Columns of data as a matrix

Take a look at the table of numbers below:

In [None]:
grades = np.array([[72, 89, 14],
 [43, 65, 74],
 [38, 71, 62],
 [82, 66, 49],
 [31, 95, 65],
 [42, 58, 51],
 [15, 54, 85],
 [60, 21, 15],
 [79, 23, 58],
 [63, 87, 67]])
grades

These numbers represent grades of 10 students (each is out of a hundred). The first two are assignments and the last is the final exam.

**Exercise** Show all grades for the first assignment (use the slicing syntax we have used often)

In [None]:
%%postcell exercise_045_d

#type your answer here

### Dot product

What is the final grade when only a single column (the final exam, the second assignment or the first assignment) is considered?

In [None]:
grades.dot([0,0,1])

In [None]:
grades.dot([0,1,0])

In [None]:
grades.dot([1,0,0])

In [None]:
np.dot(grades, [1, 0, 0])

In [None]:
grades @ [1, 0, 0]

Notice that this is the same as using slicing syntax to get the first column. Using the dot product, we are saying, _get 100% of the first column and zero percent of the second and third columns_

What is the final grade if assignments 1 and 2 are considered equally and the final exam is ignored (so the first column contributes half and the second column contributes half)?

In [None]:
grades.dot([.5, .5, 0])

In [None]:
np.dot(grades, [.5, .5, 0])
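To see what the dot product is doing, the sketch below writes the same weighted sum out by hand. The `scores` matrix is hypothetical (three students, two assignments and a final exam) and is kept small so the numbers can be checked mentally.

```python
import numpy as np

# Hypothetical scores for three students: two assignments and a final exam
scores = np.array([[80, 90, 70],
                   [60, 40, 50],
                   [100, 80, 90]])
weights = np.array([.5, .5, 0])

# The dot product ...
with_dot = scores.dot(weights)

# ... is a weighted sum of the columns
by_hand = .5 * scores[:, 0] + .5 * scores[:, 1] + 0 * scores[:, 2]

print(with_dot)
print(by_hand)
```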

The following array represents the final grade for each student (think of this as a vertical array, with 47.25 at the top and 71 at the bottom, might make it easier to visualize)

In [None]:
final_grade = np.array([47.25, 64., 58.25, 61.5, 64., 50.5, 59.75, 27.75, 54.5 ,71.])
final_grade

What if you are the professor and already know the final grade for each student (from a previous quarter), but have forgotten how you weighted the assignments and the final exam? How can you figure out the parameters for the dot product function?

In [None]:
np.linalg.lstsq(grades, final_grade, rcond=None)[0]

In [None]:
grades.dot([0.25, 0.25, 0.5 ])

The above method tells us that the first two assignments each contributed 25% to the final grade and the final exam contributed 50%!
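As a check, the sketch below repeats the least-squares fit in one self-contained cell and confirms that the recovered weights reproduce the known final grades. `np.linalg.lstsq` returns four values; only the first (the weight vector) is needed here, and the names given to the other three are just labels for this example.

```python
import numpy as np

grades = np.array([[72, 89, 14], [43, 65, 74], [38, 71, 62], [82, 66, 49],
                   [31, 95, 65], [42, 58, 51], [15, 54, 85], [60, 21, 15],
                   [79, 23, 58], [63, 87, 67]])
final_grade = np.array([47.25, 64., 58.25, 61.5, 64., 50.5, 59.75, 27.75, 54.5, 71.])

# lstsq returns several values; the first is the best-fit weight vector
weights, residuals, rank, singular_values = np.linalg.lstsq(grades, final_grade, rcond=None)

# Applying the recovered weights should give back the known final grades
reconstructed = grades @ weights

print(weights.round(4))
print(np.allclose(reconstructed, final_grade))
```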

### A picture as a matrix
A black and white image is essentially a matrix. Each pixel corresponds to a cell in a matrix and the value in that cell corresponds to how dark or light the pixel is.

Color images can be broken down into red, blue and green matrices (so a 3-dimensional matrix), but we'll hold off on such matrices for now.

In [None]:
image = plt.imread("images/chicago.png")

In [None]:
type(image)

In [None]:
plt.figure(figsize=(10,10))
plt.imshow(image)

In [None]:
image.shape

Confession: the red, green and blue images below are a bit fake. Once we extract a single color channel, matplotlib just sees it as a grayscale image. We have to force it to display that image in the color we want.

In [None]:
fig, axes = plt.subplots(1,3, figsize=(15,13))

axes[0].imshow(image[:, :, 0], cmap="Reds_r")
axes[1].imshow(image[:, :, 1], cmap="Greens_r")
axes[2].imshow(image[:, :, 2], cmap="Blues_r")

In [None]:
image

According to [Wikipedia](https://en.wikipedia.org/wiki/Grayscale), a standard ratio to convert color images to black & white is:

$Y'=0.299R'+0.587G'+0.114B'$

In [None]:
chicagobw = np.dot(image, [0.299, 0.587, 0.114])

In [None]:
plt.figure(figsize=(10,10))
plt.imshow(chicagobw, cmap="gray")

In [None]:
chicagobw.shape
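The grayscale conversion above is just a dot product over the color channels. The sketch below applies the same formula to a made-up 2x2 RGB image (`tiny`, with values between 0 and 1) so it can be run without the photo, and compares the result with the formula written out channel by channel.

```python
import numpy as np

# A made-up 2x2 RGB image: one red, one green, one blue and one white pixel
tiny = np.array([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
                 [[0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]])
luma_weights = [0.299, 0.587, 0.114]

# np.dot multiplies along the last axis (the color channels) of every pixel
gray = np.dot(tiny, luma_weights)

# The same calculation, channel by channel
by_hand = 0.299 * tiny[:, :, 0] + 0.587 * tiny[:, :, 1] + 0.114 * tiny[:, :, 2]

print(gray.shape)
print(gray)
```

Because the three weights add up to 1, a pure white pixel stays at full brightness.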

**Example**
 Get rid of the building on the left (it looks slanted!!)

In [None]:
plt.figure(figsize=(10,10))
plt.imshow(chicagobw[:, 400:], cmap="gray")

In [None]:
# Forget the buildings, zoom in on the cracked ice, say height 1000-1600

plt.figure(figsize=(10,10))
plt.imshow(chicagobw[1000:1600, :], cmap="gray")

**Exercise** Where is the _origin_ (0,0) of the image: the lower left corner, the upper left corner, the lower right corner, the upper right corner or the middle?

**Exercise** Zoom in to the pair of awnings in front of the building on the left.

In [None]:
%%postcell exercise_045_e

#type your answer here

**Exercise** Zoom in to the circular structure near the bottom of the photo, next to the building on the right


In [None]:
%%postcell exercise_045_f

#type your answer here

**Exercise** See if your professor will let you use Numpy in your linear algebra class

### Broadcasting
Numpy provides a fantastic utility by allowing you to do math on matrices whose dimensions do not match.

In classical math textbooks, it makes no sense to add a matrix and a scalar. Numpy makes it trivial:

In [None]:
np.ones((5,5))

In [None]:
np.ones((5,5)) + 1

A matrix and a vector can also be added:

In [None]:
np.ones((5,1))

In [None]:
np.ones((5,5)) + np.ones((5,1))

See this web page for an explanation and diagrams: https://numpy.org/devdocs/user/theory.broadcasting.html
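A small sketch of how the shapes line up may help. The names `matrix`, `column` and `row` are made up for this example: a `(5, 1)` column is repeated across the columns, a `(5,)` row is repeated down the rows, and shapes that cannot be stretched to match raise an error.

```python
import numpy as np

matrix = np.ones((5, 5))
column = np.arange(5).reshape((5, 1))   # shape (5, 1): one value per row
row = np.arange(5)                      # shape (5,): one value per column

plus_column = matrix + column   # the column is repeated across all 5 columns
plus_row = matrix + row         # the row is repeated down all 5 rows

# Shapes that cannot be stretched to match raise an error
try:
    matrix + np.ones(3)
    broadcast_error = None
except ValueError as err:
    broadcast_error = str(err)

print(plus_column)
print(plus_row)
print(broadcast_error)
```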

### Common numpy functions

**`arange`**

Similar to Python's `range` function: create an array of the numbers 0 through 9, then an array of the numbers from 10 up to (but not including) 30

In [None]:
np.arange(10)

In [None]:
np.arange(10, 30)

Create a similar array, but with a step between values. Unlike `range`, `arange` also accepts non-integer start and step values

In [None]:
np.arange(10.5, 30, 3.34)

**`linspace`**

Create an array of linearly spaced numbers between a `start` number and a `stop` number, both included (50 values, by default)

In [None]:
np.linspace(10, 100)

Create 10 linearly spaced numbers, between 25 and 55

In [None]:
np.linspace(25, 55, 10)
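One difference between the two functions is easy to miss: `arange` stops before its upper limit, while `linspace` includes it by default. A minimal sketch, with variable names chosen just for this example:

```python
import numpy as np

stepped = np.arange(10, 30)           # stops before 30
spaced = np.linspace(25, 55, 10)      # includes both 25 and 55
default_count = np.linspace(10, 100)  # no count given, so the default is used

print(stepped[-1])
print(spaced[0], spaced[-1])
print(len(default_count))
```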

**`ones`, `zeros`**

In [None]:
np.ones((3,5))

In [None]:
np.zeros((3,5))

In [None]:
np.zeros((3,5)) + 27

**Identity matrix**

In [None]:
np.eye(5,5) #<= notice there are two arguments, not a single tuple

In [None]:
np.eye((5,5)) #<= raises a TypeError: eye expects separate row and column counts, not a tuple

In [None]:
np.zeros(3,5) #<= raises a TypeError: zeros expects the shape as a single tuple, np.zeros((3,5))

**`transpose`**

In [None]:
arr = np.array([[1,2,3,4], [5,6,7,8]])
arr

In [None]:
arr.T #<== converts rows to columns

**Display as an image**

In [None]:
plt.imshow(chicagobw, cmap="gray")

**Exercise** Rotate the image by 90 degrees

In [None]:
%%postcell exercise_045_g

#type your answer here

**Aggregation**

In [None]:
arr = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
arr

In [None]:
arr.min()

In [None]:
arr.max()

Sum all values, then sum across each row (`axis=1`) and down each column (`axis=0`)

In [None]:
arr.sum()

In [None]:
arr.sum(axis=1)

In [None]:
arr.sum(axis=0)
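The `axis` argument names the dimension that gets collapsed, which can feel backwards at first. The sketch below repeats the three sums with descriptive (made-up) variable names.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

total = arr.sum()              # every value, a single number
column_sums = arr.sum(axis=0)  # collapse the rows: one sum per column
row_sums = arr.sum(axis=1)     # collapse the columns: one sum per row

print(total)
print(column_sums)
print(row_sums)
```

A 3x4 matrix therefore gives 4 column sums and 3 row sums.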

Note that many methods on vectors and matrices can also be called via `np.`

In [None]:
np.sum(arr)

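Notice that you can do unexpectedly complex calculations without using loops. The two cells after the next one compute the same percentages, first with a loop and then without one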

In [None]:
cost_of_parts = np.random.randint(0, 100, 5)
cost_of_parts

In [None]:
percent_of_total = list()
total = cost_of_parts.sum()

for p in cost_of_parts:
    percent_of_total.append(p/total)

np.array(percent_of_total)

In [None]:
percent_of_total = cost_of_parts / cost_of_parts.sum() 
percent_of_total

In [None]:
percent_of_total.sum()

**Exercise** Recall the vector `ten_vals` from above, how many values are above 0.5 (recall that `True` corresponds to `1` and `False` corresponds to `0`)? You can recreate `ten_vals` via `np.random.random(10)`.

In [None]:
%%postcell exercise_045_h

#type your answer here

In some instances, particularly time series, you need to calculate a running _sum_ or _prod_

In [None]:
np.array([1,2,3,4,5,6,7,8,9])

In [None]:
np.array([1,2,3,4,5,6,7,8,9]).cumsum()
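The running product works the same way through `cumprod`. A short sketch on a smaller array (the names are illustrative):

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5])

running_sum = values.cumsum()       # each entry is the sum of everything so far
running_product = values.cumprod()  # each entry is the product of everything so far

print(running_sum)
print(running_product)
```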

**`reshape`, `flatten`**

In [None]:
arr = np.array([0,1,2,3,4,5,6,7,8,9])
arr

In [None]:
arr.reshape((2,5))

In [None]:
arr.reshape((5,2))

In [None]:
arr.reshape((5,3)) #<=== Why can't we do this?
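A reshape only rearranges the existing values, so the new shape has to hold exactly as many values as the array contains. The sketch below catches the error from the mismatched shape and also shows `-1`, which asks numpy to work out one dimension for you. The names `reshape_error` and `inferred` are just for this example.

```python
import numpy as np

arr = np.arange(10)

# 10 values cannot fill a 5x3 grid, which needs 15
try:
    arr.reshape((5, 3))
    reshape_error = None
except ValueError as err:
    reshape_error = str(err)

# -1 asks numpy to work out that dimension from the number of values
inferred = arr.reshape((2, -1))

print(reshape_error)
print(inferred.shape)
```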

In [None]:
arr = np.array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
arr

In [None]:
arr.flatten()

In [None]:
image.flatten().shape

Common mistake

In [None]:
arr = np.array([[1,2,3,4,5,6,7,8,9,0]])
arr #notice the double brackets

In [None]:
arr.flatten() #notice the single brackets

**Combine arrays: `hstack`, `vstack`**

In [None]:
ones = np.ones((1,5))
ones

In [None]:
arr = np.random.randint(0,100, (5,5))
arr

In [None]:
np.vstack((ones, arr))

In [None]:
np.hstack((ones.T,arr))
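Stacking only works when the shapes agree along the dimension that is not being extended. The sketch below uses made-up arrays `top` and `body` to show the resulting shapes: `vstack` needs matching column counts and `hstack` needs matching row counts, which is why the transpose is required in the second call.

```python
import numpy as np

top = np.ones((1, 5))
body = np.zeros((5, 5))

stacked_rows = np.vstack((top, body))       # shapes (1, 5) and (5, 5) -> (6, 5)
stacked_columns = np.hstack((top.T, body))  # shapes (5, 1) and (5, 5) -> (5, 6)

print(stacked_rows.shape)
print(stacked_columns.shape)
```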