In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("lab04.ipynb")

# Intro to Numpy 
Welcome to Lab 4 of DATA 271! In this lab we will get practice with the Numpy module. 


## Overview
Numpy (short for Numerical Python) is one of Python's most vital libraries for data science.  Its key data structure is the array (ndarray), a multidimensional array that supports fast array-oriented arithmetic without having to write loops.  Computations performed this way are called vectorized.  Vectorized code is concise and easy to read, and it runs much faster than element-by-element computation in pure Python.
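
As a small illustration of what "vectorized" means, the sketch below squares the same values twice: once with an explicit loop and once with a single array expression. The variable names are made up for this example.

```python
import numpy as np

values = np.arange(1, 6)          # array([1, 2, 3, 4, 5])

# Element-by-element version: an explicit Python loop
squares_loop = np.empty_like(values)
for i in range(values.size):
    squares_loop[i] = values[i] ** 2

# Vectorized version: one expression applied to the whole array
squares_vectorized = values ** 2

print(squares_vectorized)         # [ 1  4  9 16 25]
print(np.array_equal(squares_loop, squares_vectorized))  # True
```

Both versions give the same result; the vectorized one is shorter and pushes the loop into compiled code.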


Having familiarity with array-oriented semantics will help us use future tools (like Pandas) more effectively.  Numpy is the foundation for nearly all numerical libraries for Python. 

The main areas of functionality we will focus on are
- fast vectorized array operations for data munging and cleaning, subsetting, filtering, and transforming
- common array algorithms such as sorting, unique and set operations
- using descriptive statistics and aggregating/summarizing data
- group-wise data manipulations

Numpy also has statistical functions, random number functions, a linear algebra library, and other functionality.

### ndarray
The N-dimensional array (ndarray) is the key object in Numpy, and the main data structure.  It is a fast, flexible "container" for large numerical data sets where all entries are of the same type ("homogeneous" data).  The data structure also contains important metadata about the array (such as its shape, size, and data type).

Its beauty resides in the ability to perform mathematical operations on whole blocks of data using similar syntax to what we use for scalar operations.  Arrays are of fixed size (they cannot be resized without creating a new array).
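
The following sketch, using a made-up array called `grid`, shows the metadata mentioned above and what "fixed size" means in practice: adding a row produces a new array rather than resizing the old one.

```python
import numpy as np

grid = np.array([[1.5, 2.0, 3.0],
                 [4.0, 5.5, 6.0]])

# Metadata stored alongside the data
print(grid.shape)   # (2, 3)
print(grid.size)    # 6
print(grid.dtype)   # float64

# Arithmetic on the whole block uses the same syntax as for a scalar
doubled = grid * 2

# "Growing" an array returns a new array; the original keeps its size
extended = np.append(grid, [[7.0, 8.0, 9.0]], axis=0)
print(grid.shape, extended.shape)   # (2, 3) (3, 3)
```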

### Broadcasting
Another trait which is extremely convenient in Numpy is **broadcasting**.  The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. It does this without making needless copies of data and usually leads to efficient algorithm implementations. 
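
A minimal sketch of the idea, with illustrative names: a column of shape (3, 1) and a row of shape (3,) are stretched to a common shape of (3, 3) before being added.

```python
import numpy as np

column = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3])              # shape (3,)

# The column is stretched across 3 columns and the row across 3 rows,
# so the result has shape (3, 3)
total = column + row
print(total)
# [[ 1  2  3]
#  [11 12 13]
#  [21 22 23]]
```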



In [None]:
import numpy as np

### Creating arrays explicitly
We can explicitly generate small arrays using the `np.array` function.  The function takes a nested list where each element in the outer list contains the entries for a row in the array.  For larger arrays, it would be tedious to enter data by hand.

In [None]:
data = np.array([[1, 2], [3, 4], [5, 6]]) # create an array with 3 rows and 2 columns by using nested list
data
type(data) # numpy.ndarray
data.ndim # number of dimensions/axes
data.shape # number of rows, then number of columns
data.size # number of elements; for a 2d array this is number of rows * number of columns

**Question 1:** Create an array \begin{bmatrix}
1 & 6 & 3\\
0 & 2 & 1
\end{bmatrix}
and determine the number of dimensions, the shape, and the size.

In [None]:
my_array = ...

number_of_dimensions = ...
shape_of_my_array = ...
size_of_my_array = ...

In [None]:
grader.check("q1")

### Creating arrays that follow specific rules
Numpy has a number of built-in functions to create arrays of a specific type:
- `np.zeros((3, 4))` creates an array of size 3 x 4 with all zeros
- `np.ones((3, 4))` creates an array of size 3 x 4 with all ones (this could be multiplied by a scalar to create an array of all constants)
- `np.full((3, 4), 8)` creates a 3 x 4 array with 8 in each entry
- `np.eye(5)` creates a 5 x 5 array with ones on the diagonal and zeros elsewhere (the identity matrix)
- `np.linspace(0, 100, 6)` creates an array of 6 evenly spaced values from 0 to 100, with both endpoints included
- `np.arange(0, 10, 3)` creates an array of values from 0 up to but not including 10 with step size of 3
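
To see how a few of these behave, here is a short sketch that runs some of the calls listed above. The variable names are chosen for this example only.

```python
import numpy as np

spaced = np.linspace(0, 100, 6)   # 0, 20, 40, 60, 80, 100 (both endpoints included)
stepped = np.arange(0, 10, 3)     # 0, 3, 6, 9 (the stop value 10 is excluded)
eights = np.full((2, 2), 8)       # 2 x 2 array with 8 in every entry
fives = np.ones((2, 2)) * 5       # all ones scaled to a constant 5.0

print(spaced)
print(stepped)
print(eights)
print(fives)
print(np.zeros((3, 4)).dtype)     # float64, the default for zeros, ones, and eye
```

Note that `linspace` takes the number of points, while `arange` takes the step size.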

Numpy also has a number of functions which create arrays of random numbers.  This is especially useful for simulation (e.g., Monte Carlo).  See the $\pi$ estimation example below.
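
One possible sketch of a Monte Carlo estimate of $\pi$ is shown here. It assumes the newer `np.random.default_rng` generator with an arbitrary seed and a sample size of 100,000; both choices are illustrative and not part of the lab. Points are drawn uniformly in the unit square, and the fraction landing inside the quarter circle of radius 1 approximates $\pi/4$.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # seeded generator (assumption) for repeatability
n_points = 100_000                    # hypothetical sample size

# Draw points uniformly in the unit square [0, 1) x [0, 1)
x = rng.random(n_points)
y = rng.random(n_points)

# Fraction of points inside the quarter circle of radius 1 approximates pi / 4
inside = x**2 + y**2 <= 1.0
pi_estimate = 4 * inside.mean()
print(pi_estimate)
```

The whole simulation is vectorized: no Python loop over the individual points is needed.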

**Question 2:** Complete the following:
- Assign `A` to a 2 x 3 matrix filled with the number 7
- Assign `B` to a 4 x 4 identity matrix (ones on the diagonal and zeros everywhere else)
- Assign `C` to a one-dimensional array with 5 numbers total which are evenly spaced and with first entry a 2 and last entry a 3
- Assign `D` to a one-dimensional array with the numbers 0, 1, 2
- Assign `E` to a 4 x 4 array with the numbers 0, 5, 10, 15 on the diagonal and zeros elsewhere. Make the elements of the array type int.  
Feel free to use online documentation.  When possible, if there are multiple ways to create the same array, demonstrate several.

In [None]:
# option 1
A = ...
A
# option 2
A = ...
A

In [None]:
B = ...
B

In [None]:
C = ...
C

In [None]:
# option 1
D = ...
D
# option 2
D = ...
D

In [None]:
# option 1
E = ...
E
# option 2
E = ...
E

In [None]:
grader.check("q2")

### Indexing and Slicing
Elements and subarrays of NumPy arrays are accessed using the standard square bracket notation that is also used with Python lists.  In general, the expression in the bracket is a tuple where each item in the tuple is a specification of which elements to access from each axis (dimension) of the array.

Subarrays extracted from arrays using slicing and indexing are alternative views of the same underlying array data (they are arrays that refer to the same data in the memory as the original array).  If elements in views are assigned new values, the values of the original array are updated.  Be aware of this.

If you would prefer to have a copy rather than a view (so you don't overwrite original data), you can use the copy() method.  Then changes to the copy do not affect the original array.
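
The difference between a view and a copy can be sketched as follows. The names are illustrative, and `np.shares_memory` is used only to confirm which arrays refer to the same data.

```python
import numpy as np

original = np.arange(5)        # array([0, 1, 2, 3, 4])

view = original[1:4]           # slicing returns a view
view[0] = 99                   # writes through to `original`
print(original)                # [ 0 99  2  3  4]

independent = original[1:4].copy()   # an explicit copy
independent[0] = -1                  # does not touch `original`
print(original)                # [ 0 99  2  3  4]

print(np.shares_memory(original, view))         # True
print(np.shares_memory(original, independent))  # False
```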


In [None]:
# 1 dimensional example
a = np.arange(0, 11)
a

In [None]:
# select the first element
a[0]

In [None]:
# select the last element
a[-1]

In [None]:
# select the 5th element, at index 4
a[4]

In [None]:
# select from the second element through the second-to-last element
a[1:-1]

In [None]:
# select first 5 elements
a[:5]

In [None]:
# select last 5 elements
a[-5:]

In [None]:
# reverse the array and select every second value
a[:: -2]

**Question 3.1:** Create a one dimensional array `array1` with 5 evenly spaced entries starting at 2 and ending at 14. 

In [None]:
array1 = ...
array1

In [None]:
grader.check("q3_1")

**Question 3.2:** Select the third element of `array1`.

In [None]:
third_element = ...
third_element

In [None]:
grader.check("q3_2")

**Question 3.3:** Select the second to last element of `array1`.

In [None]:
second_to_last = ...
second_to_last

In [None]:
grader.check("q3_3")

**Question 3.4:** Select from the second element through the last element of `array1`.

In [None]:
second_element_to_last = ...
second_element_to_last

In [None]:
grader.check("q3_4")

**Question 3.5:** Reverse the order of `array1`.

In [None]:
reversed_array1 = ...
reversed_array1

In [None]:
grader.check("q3_5")

### Extracting Columns, Rows, and Subarrays

The cell below contains a NumPy function we have not covered. You might find [this documentation](https://numpy.org/doc/stable/reference/generated/numpy.fromfunction.html) helpful as a reference for understanding it, but it is not necessary to complete the exercise.

In [None]:
# lambda function we will use to populate entries as column number + 10 times row number
f = lambda m, n: n + 10 * m
A = np.fromfunction(f, (6,6), dtype = int)
A

In [None]:
# extract second column
A[:, 1]

In [None]:
# extract 3rd row
A[2, :]

In [None]:
# extract the upper-left quadrant (first 3 rows, first 3 columns)
A[:3, :3] 

In [None]:
# extract subarray with every second element
A[::2, ::2]

**Question 4.1:** Create an array `array2`, \begin{bmatrix}
1 & 2 & 3 & 4 & 5\\
6 & 7 & 8 & 9 & 10\\
11 & 12 & 13 & 14 & 15
\end{bmatrix}

In [None]:
array2 = ...
array2

In [None]:
grader.check("q4_1")

**Question 4.2:** Make a copy of `array2` called `array2_copy`. 

In [None]:
grader.check("q4_2")

**Question 4.3:** In `array2_copy`, replace the element in the first row and first column with the number $27$

In [None]:
grader.check("q4_3")

**Question 4.4:** In `array2_copy`, extract a subarray by taking every other element in the second and third rows.

In [None]:
subarray = ...
subarray

In [None]:
grader.check("q4_4")

**Question 4.5:** Verify that the original array has not been modified. 

In [None]:
grader.check("q4_5")

### Fancy and Boolean Valued Indexing
Fancy indexing allows us to index an array with another NumPy array or a Python list.  We can also index with boolean values.
In these instances, the array returned is not a view but a new, independent array.
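
A brief sketch of both ideas, with illustrative names: the result of fancy indexing can be changed without touching the original, and Boolean conditions can be combined with `&` (and) or `|` (or), provided each condition is wrapped in parentheses.

```python
import numpy as np

A = np.linspace(0, 1, 11)

picked = A[[1, 3, 7]]          # fancy indexing returns a new array
picked[0] = -5.0               # A is unaffected
print(A[1])                    # still about 0.1

# Combine conditions with & (and) or | (or); each condition needs parentheses
middle = A[(A > 0.25) & (A < 0.65)]
print(middle)                  # the values 0.3, 0.4, 0.5, 0.6 (up to rounding)
```

Python's `and` and `or` keywords do not work elementwise on arrays, which is why the `&` and `|` operators are used here.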

In [None]:
# create 1d array
A = np.linspace(0, 1, 11) # array with 11 elements evenly spaced between 0 and 1, inclusive
A[np.array([1, 3, 7])] # create a one dimensional array with elements 1, 3, 7 and use this to index values


In [None]:
# we can accomplish the same thing by indexing with a Python list
A[[1, 3, 7]]

In [None]:
# we can also index using a Boolean-valued index array
# extract all values in the original array which are > .5
A[A > .5]
# note: A > .5 returns an array with boolean values true and false

**Question 5.1:** Create the following one-dimensional array `array3` containing $3,4,6,10,24,89,45,43,46,99,100$. Use Boolean masking to make an array containing all the numbers from `array3` that are not divisible by 3. 

In [None]:
array3 = ...
not_div_3 = ...
not_div_3

In [None]:
grader.check("q5_1")

**Question 5.2:** Using `array3` and Boolean masking, make an array containing all the numbers from `array3` that are divisible by 5. 

In [None]:
div5 = ...
div5

In [None]:
grader.check("q5_2")

**Question 5.3:** Using `array3` and Boolean masking, make an array containing all the numbers from `array3` that are divisible by 3 and by 5. 

In [None]:
div3_and_5 = ...
div3_and_5

In [None]:
grader.check("q5_3")

**Question 5.4:** Using `array3` and Boolean masking, set the values in the original array that are divisible by 3 to 42.

In [None]:
array3

In [None]:
grader.check("q5_4")

### Reshaping and Resizing Arrays
When working with data in arrays, it can be useful to rearrange the arrays and alter the way they are interpreted.  For example, an $N \times N$ array can be rearranged into a vector of length $N^2$, or several vectors can be concatenated into a longer vector or stacked into a matrix.  Reshaping an array does not modify the underlying array data; whenever possible it produces a view of the array (if a copy is needed, use `np.copy()`).  The requested new shape must contain exactly as many elements as the original array.
The `ravel()` function is a special case of reshape which returns a flattened one dimensional array.
The functions `vstack()` and `hstack()` allow arrays to be joined either vertically or horizontally.
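
The element-count rule can be sketched with a small made-up array. The `-1` argument, which asks NumPy to infer one dimension, is not used elsewhere in this lab and is shown only as a convenience.

```python
import numpy as np

flat = np.arange(6)                 # 6 elements

grid = flat.reshape(2, 3)           # 2 * 3 == 6, so this works
auto = flat.reshape(3, -1)          # -1 asks NumPy to infer the missing dimension (here 2)
print(grid.shape, auto.shape)       # (2, 3) (3, 2)

print(np.shares_memory(flat, grid)) # True: the reshape is a view here

try:
    flat.reshape(4, 2)              # 4 * 2 == 8, which does not equal 6
except ValueError as err:
    print("reshape failed:", err)
```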

In [None]:
# reshapes with function reshape()
data = np.array([[1, 2], [3, 4]]) # 2 x 2 array
np.reshape(data, (1, 4)) # reshapes into a 1 x 4 vector

In [None]:
# reshapes with method reshape()
data = np.array([[1, 2], [3, 4]]) # 2 x 2 array
data.reshape(4) # reshapes into a 1d array of shape (4,)

In [None]:
# reshaping
A = np.array([1, 2, 3, 4, 5, 6])
B = np.reshape(A, (2,3)) # note, product of dimensions must equal # of entries
print(B)

In [None]:
x = np.array([[1, 2, 3], [4, 5, 6]])
np.ravel(x)

In [None]:
data = np.arange(5)
data
np.vstack((data, data, data)) # stacks data as rows vertically

In [None]:
np.hstack((data, data, data)) # stacks data horizontally
# equivalent to concatenating 1d array three times

In [None]:
# ask hstack to treat input as columns to stack horizontally
data = data[:, np.newaxis] # make input arrays 2d with shape (5, 1) rather than 1d arrays of shape (5,)
np.hstack((data, data, data))

**Question 6.1:** Reshape the array  [1,2,3,4,5,6,7,8,9,10,11,12] into an array with 4 rows and 3 columns.

In [None]:
orig_array = np.array([1,2,3,4,5,6,7,8,9,10,11,12])
reshaped_array = ...
reshaped_array

In [None]:
grader.check("q6_1")

<!-- BEGIN QUESTION -->

**Question 6.2:** Is it possible to reshape the previous array into an array with 2 rows and 5 columns? Why or why not?

_Type your answer here, replacing this text._

<!-- END QUESTION -->

**Question 6.3:** Turn this array into a one dimensional array \begin{bmatrix}
1 & 2 & 3 & 4 & 5\\
6 & 7 & 8 & 9 & 10\\
11 & 12 & 13 & 14 & 15
\end{bmatrix}

In [None]:
big_ndarray = np.array([[1,2,3,4,5],[6,7,8,9,10],[11,12,13,14,15]])
flattened_array = ...
flattened_array

In [None]:
grader.check("q6_3")

**Question 6.4:** Take the row $[1, 2, 3]$ and use array operations to build the following matrices `m1` and `m2`:
    $\begin{bmatrix}
1 & 2 & 3\\
1 & 2 & 3 \\
1 & 2 & 3 
\end{bmatrix}$ and 
$\begin{bmatrix}
1 & 1 & 1\\
2 & 2 & 2 \\
3 & 3 & 3 
\end{bmatrix}$

respectively.

In [None]:
base_array = np.array([1,2,3])
m1 = ...
print(m1)
m2 = ...
print(m2)

In [None]:
grader.check("q6_4")


### Broadcasting
Numpy allows the user to perform element-wise operations on arrays of different shapes by broadcasting them to a common shape.  This lets us add a scalar value to each element of an array, or add two arrays of different shapes by automatically expanding the dimensions of the smaller array.

In [None]:
A = np.array([1, 2, 3, 0])
B = 2
print(A+B) # note that the scalar 2 was added to each element of the array A

In [None]:
a = np.array([[ 0.0,  0.0,  0.0],
              [10.0, 10.0, 10.0],
              [20.0, 20.0, 20.0],
              [30.0, 30.0, 30.0]])
b = np.array([1.0, 2.0, 3.0])
# b is added to each row of a
a + b
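
Not every pair of shapes can be broadcast. The sketch below, with illustrative names, shows a case that fails and one way to repair it by turning the 1d array into a column with `np.newaxis`.

```python
import numpy as np

a = np.zeros((4, 3))
row = np.array([1.0, 2.0, 3.0])         # shape (3,): matches the last axis of a
col = np.array([1.0, 2.0, 3.0, 4.0])    # shape (4,): does not match the last axis

print((a + row).shape)                  # (4, 3)

try:
    a + col                             # trailing dimensions 3 and 4 are incompatible
except ValueError as err:
    print("broadcast failed:", err)

# Turning col into a (4, 1) column makes the shapes compatible
shifted = a + col[:, np.newaxis]
print(shifted)                          # row i is filled with col[i]
```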

**Question 7.1:** In the Fibonacci Series, each number in the sequence is the sum of the two numbers that precede it. So, the sequence goes: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34,...  

Binet's formula (derived by mathematician Jacques Philippe Marie Binet) is an explicit formula used to find the $n$th term of the Fibonacci sequence. It is given by:

$$F_n=\frac{1}{\sqrt{5}}\left(\left(\frac{1+\sqrt{5}}{2}\right)^n-\left(\frac{1-\sqrt{5}}{2}\right)^n\right)$$

Use NumPy and Binet's formula to make a 1d array containing the first 15 numbers (ints) in the Fibonacci Series. 

*HINT:* You may have to round your result from Binet's formula and then change the type. 

In [None]:
n = ...
Fn = ...
Fn

In [None]:
grader.check("q7_1")

### Sorting

In [None]:
# sort rows of array
arr = np.array([[12, 15, 7], [13, 5,11], [8, 6, 10],[45,54,70]]) 
arr2 = np.sort(arr)  
arr2

In [None]:
# Sort all elements in a multi-dimensional array
arr = np.array([[12, 15, 7], [13, 5,11], [8, 6, 10],[45,54,70]]) 
arr2 = np.sort(arr, axis= None) 
arr2

In [None]:
# sort columns of array/first (0th) axis
arr2 = np.sort(arr, axis = 0)
arr2

In [None]:
# Sort each row of the array alphabetically
arr = np.array([['orange','mango','grapes'], ['banana','cherry','apple'], ['papaya','watermelon','jackfruit']]) 
arr2 = np.sort(arr)  
arr2

In [None]:
# sort in descending order
array_a = np.array([5,8,6,12])
array_a[::-1].sort() # descending order
print(array_a)
# or
array_b = np.array([5,8,6,12])
print(np.sort(array_b)[::-1]) # sort a copy in ascending order, then reverse it
print(array_b) # The original array is not changed
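
The difference between the `np.sort` function and the `sort` method can be summarized in a short sketch; `scores` is a made-up array.

```python
import numpy as np

scores = np.array([5, 8, 6, 12])

ascending = np.sort(scores)          # returns a sorted copy
descending = np.sort(scores)[::-1]   # reverse the sorted copy for descending order
print(ascending)                     # [ 5  6  8 12]
print(descending)                    # [12  8  6  5]
print(scores)                        # [ 5  8  6 12]  (unchanged)

scores.sort()                        # the method sorts in place and returns None
print(scores)                        # [ 5  6  8 12]
```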

**Question 8.1:** Sort the columns of array

$\begin{bmatrix}
4 & 2 & 3 & 4 & 5\\
9 & 7 & 8 & 1 & 10\\
11 & 12 & 13 & 15 & 14
\end{bmatrix}$



In [None]:
given_array = ...
sorted_by_columns = ...
sorted_by_columns

In [None]:
grader.check("q8_1")

**Question 8.2:** Sort the rows of array

$\begin{bmatrix}
4 & 2 & 3 & 4 & 5\\
9 & 7 & 8 & 1 & 10\\
11 & 12 & 13 & 15 & 14
\end{bmatrix}$

In [None]:
sorted_by_rows = ...
sorted_by_rows

In [None]:
grader.check("q8_2")

**Question 8.3:** Sort the array

$\begin{bmatrix}
4 & 2 & 3 & 4 & 5\\
9 & 7 & 8 & 1 & 10\\
11 & 12 & 13 & 15 & 14
\end{bmatrix}$

elementwise. (Return a 1d array)

In [None]:
sorted_elementwise = ...
sorted_elementwise

In [None]:
grader.check("q8_3")

**Question 8.4:** Sort the array

$\begin{bmatrix}
4 & 2 & 3 & 4 & 5\\
9 & 7 & 8 & 1 & 10\\
11 & 12 & 13 & 15 & 14
\end{bmatrix}$

elementwise in descending order. (Return a 1d array)

In [None]:
sorted_elementwise_descending = ...
sorted_elementwise_descending

In [None]:
grader.check("q8_4")

### Random Numbers

Numpy provides the random module to work with random numbers from various distributions (e.g., uniform, normal, etc.).

In [None]:
from numpy import random
# Create a 2 x 3 array of random data from a standard normal distribution
data = np.random.randn(2,3) # n is for normal
data

In [None]:
# multiply each element in the array by 10
data * 10

In [None]:
# add two arrays together element-wise
data + data

In [None]:
# note, the original array has not changed
data

In [None]:
random_normal = np.random.normal(0, 2, 30) # generates 30 random numbers from the normal distribution with mean 0 and standard deviation 2
print(random_normal)
random_normal2 = np.random.normal(0, 2, size = (2, 4)) # can control size of array with optional parameter
print(random_normal2)
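
As a quick sanity check on what the parameters mean, the sketch below draws a large sample and compares its mean and standard deviation with the requested values. It assumes the newer `np.random.default_rng` generator with an arbitrary seed, which is not used elsewhere in this lab.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # Generator API (assumption); seed chosen arbitrarily
samples = rng.normal(loc=0, scale=2, size=100_000)

print(samples.mean())   # close to the requested mean, 0
print(samples.std())    # close to the requested standard deviation, 2
```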

### Task (Main Course)
In this task, we will practice simulating random events like flipping a coin.

**Question 9.1:** Write a simulation for flipping a fair coin 5000 times to estimate $P(tails)$. 

In [None]:
def coin_flip(...):
    results = ... # this will hold the total # of tails
    for ... in ...:
        flip_result = ... # flip the coin 
        results += ... 
    ...
    
prob_tail = ...

        

prob_tail

In [None]:
grader.check("q9_1")

<!-- BEGIN QUESTION -->

**Question 9.2:** In the previous part, do you get the same answer every time? Explain. 

_Type your answer here, replacing this text._

<!-- END QUESTION -->

**Question 9.3:** Repeat the first part of this problem but include `np.random.seed(1)` at the beginning of your cell.  Run your code several times and revisit the question.

In [None]:
        

prob_tail_seed

In [None]:
grader.check("q9_3")

<!-- BEGIN QUESTION -->

**Question 9.4:** What happens if you increase the number of coin flips to 50000 or 500000? Explain what you see and why you think it happens. 

_Type your answer here, replacing this text._

<!-- END QUESTION -->

### Task (Optional Dessert)
- choose an image for this task (a photo you took, a photo from the web, etc.).
- display the image.
- display its negative.
- display the image rotated 90 degrees clockwise (different than example).
- display the image as a grayscale image.
- display the image in each of its color channels.
- crop part of the image and display it.  (Choose a cropping that is visually interesting based on your image and experimentation.)
- choose one other image and blend the two images.  Play with weights until you are happy with the results.

# You're done!

Gus is so happy you made it to the end! Run the cell below to download the zip and submit to Canvas. 

<img src="gus_a_loaf_of_bread.JPG" alt="drawing" width="300"/>

### References
- Numerical Python: Scientific Computation and Data Science Applications with Numpy, SciPy and Matplotlib by Robert Johansson
- Data Science with Python: Probabilistic Modeling.  https://www.cdslab.org/python/notes/probabilistic-modeling/random-numbers/random-numbers.html

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False, run_tests=True)