# Hw 2: More NumPy 📈

Name: Riley Hager

Student ID: rhager


## Instructions

Work your way through these problems. They will provide some more practice working with NumPy arrays.

### Submission instructions
* Submit this Python notebook, including your answers in the code cells, as your homework submission.
* **Do not change the number of cells!** Your submission notebook should have exactly one code cell per problem. 
* Do **not** remove the `# your code here` line; add your solution after that line.

## 1. Let's Get Started!

In [1]:
# run me!
import numpy as np

## 2. Timing Comparison

There were some questions about why we emphasize NumPy functions and indexing so intensely. To help explain and motivate this, let's do some performance comparisons.

For this experiment, we will be testing various operations with random arrays. Here is an example of how you can generate a random array using `np.random.rand`.

In [2]:
np.random.rand(10)

array([0.56621448, 0.04253629, 0.02874651, 0.68793891, 0.12004592,
       0.33842107, 0.82144161, 0.34007526, 0.06004973, 0.871457  ])

### Problem 1
**Implement this!** Complete the function below so that it returns a random array of size `n`. Assign your new array to the `result` variable.

In [3]:
def generate_random_array(n):
    '''Returns a random array of a given shape.'''
    
    # your code here
    result = np.random.rand(n)
    return result

Let's try it out! You should see an array similar to the one from earlier.

In [4]:
generate_random_array(10)

array([0.33821796, 0.19235371, 0.30387634, 0.05601109, 0.1213902 ,
       0.30458489, 0.06956751, 0.57644071, 0.73416271, 0.5104259 ])

### 2.1 Summing Arrays

Let's try a sum operation. There are two ways of computing the sum of an array using built-in features. The first is to use a `for` loop and iterate through the values in an array. The second is to use the `sum` function.

### Problem 2

**Implement this!** Complete the function below so that it returns the sum of an array `a`. Again, assign your result to the `result` variable.

In [5]:
def loop_sum(a):
    '''Computes the sum of array A'''
    
    # your code here
    result = 0
    for i in a:
        result += i
    return result

Let's check that your function works.

In [6]:
an_array = generate_random_array(100)

'Nice!' if loop_sum(an_array) == sum(an_array) else 'Something went wrong...'

'Nice!'

### Problem 3

Now that we have a working `for` loop sum implementation, `loop_sum`, let's compare its performance to both the Python built-in `sum` function and the NumPy `np.sum` function. We will use an [IPython magic command 🧙‍♀️](https://ipython.readthedocs.io/en/stable/interactive/tutorial.html#magic-functions) (I'm not kidding) called `%timeit`. This function will run a given command repeatedly and report back the mean runtime and standard deviation.

In [7]:
another_array = generate_random_array(10000)
%timeit loop_sum(another_array)
%timeit sum(another_array)
%timeit np.sum(another_array)

1.49 ms ± 38.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.22 ms ± 61.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
8.99 µs ± 1.42 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
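
Outside of IPython, a similar comparison can be sketched with the standard-library `timeit` module. This is only a rough illustration: the repetition count and the output formatting are arbitrary choices, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np


def loop_sum(a):
    """Sum the values of `a` with an explicit Python loop."""
    total = 0.0
    for value in a:
        total += value
    return total


test_array = np.random.rand(10_000)
repeats = 20  # arbitrary choice; %timeit picks its own repetition count

for label, fn in [("loop_sum", loop_sum), ("built-in sum", sum), ("np.sum", np.sum)]:
    # timeit.timeit returns the total time in seconds for `number` calls
    seconds = timeit.timeit(lambda: fn(test_array), number=repeats)
    print(f"{label:>12}: {seconds / repeats * 1e6:.1f} microseconds per call")

# All three approaches should agree up to floating-point rounding.
results_agree = bool(np.isclose(loop_sum(test_array), np.sum(test_array)))
```

The first two approaches visit every element through the Python interpreter, while `np.sum` does the loop in compiled code, which is the likely source of the gap in the timings above.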


**Write up!** What do you notice about these outputs? What would happen if you added more dimensions to your array? What do these results tell us?

### 2.2 Finding a Value

In Lab 0 and HW 1, we needed to find the age and class of the student from a roster who would graduate first. Let's use this setup to do another comparison. Here is the data that we worked with.

In [8]:
names = ['Billy','Meghan','Jeff', 'Alex','Cate']
roster = np.array([
    [50, 2021],
    [18, 2020],
    [21, 2019],
    [21, 2021],
    [21, 2020]
])

In Problem 7 from HW 1, we used the Python built-in `min` function with a lambda to accomplish this for a `List` version of the roster. What would happen if we had used the same method to do the same for a NumPy array?

In [9]:
min(roster, key=lambda student: student[1])

array([  21, 2019])

An appropriate NumPy equivalent of the code is this:

In [10]:
roster[np.argmin(roster[:, 1])]

array([  21, 2019])
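
To see why the two expressions agree, here is a step-by-step sketch on the small roster. The intermediate variable names (`years`, `first_idx`, and so on) are made up for illustration.

```python
import numpy as np

names = ['Billy', 'Meghan', 'Jeff', 'Alex', 'Cate']
roster = np.array([
    [50, 2021],
    [18, 2020],
    [21, 2019],
    [21, 2021],
    [21, 2020],
])

years = roster[:, 1]             # the graduation-year column
first_idx = np.argmin(years)     # index of the first smallest year
numpy_row = roster[first_idx]    # the whole row at that index

# The pure-Python approach iterates over the rows one at a time.
python_row = min(roster, key=lambda student: student[1])

print(names[first_idx], numpy_row)
```

Because `np.argmin` returns an index rather than a value, the same index can also be used to look up the matching entry in `names`.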

### Problem 4

Before we check out the performance of each of these implementations, let's expand our roster a bit. The following cell generates a new roster with entries for 1000 students.

In [11]:
roster = np.array([np.random.randint(16, 100, size=1000),
                   np.random.randint(2018, 2022, size=1000)]).T
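
The cell above can be unpacked as follows. This is a sketch with made-up intermediate names (`ages`, `years`, `stacked`). Note that the upper bound of `np.random.randint` is exclusive, and that `np.column_stack` is shown only as one possible alternative to the transpose.

```python
import numpy as np

n_students = 1000
ages = np.random.randint(16, 100, size=n_students)      # integers from 16 to 99
years = np.random.randint(2018, 2022, size=n_students)  # integers from 2018 to 2021

stacked = np.array([ages, years])  # shape (2, 1000): one row per attribute
roster = stacked.T                 # shape (1000, 2): one row per student

# An alternative that builds the (1000, 2) layout directly.
alt_roster = np.column_stack([ages, years])

print(stacked.shape, roster.shape)
```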

Here's a preview of the new roster containing the first ten rows in the array:

In [12]:
roster[:10, :]

array([[  29, 2020],
       [  48, 2018],
       [  70, 2018],
       [  19, 2018],
       [  77, 2020],
       [  42, 2021],
       [  32, 2018],
       [  19, 2019],
       [  62, 2019],
       [  19, 2020]])

Now we are ready to evaluate these implementations using the `%timeit` magic command from earlier. 

In [13]:
%timeit min(roster, key=lambda student: student[1])
%timeit roster[np.argmin(roster[:, 1])]

529 µs ± 87.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
The slowest run took 5.08 times longer than the fastest. This could mean that an intermediate result is being cached.
11.7 µs ± 4.73 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)


**Write up!** What do you notice about these outputs? What would happen if you added more dimensions to your array? What do these results tell us?

## 3. Indexing Review

Indexing, especially with NumPy, can be a tricky feature to truly wrap one's head around, but the benefits of (working towards) mastering it make it a worthy endeavor. The more practice you get, the easier it will become; eventually you won't be able to even imagine how else you would do things.

In this section, we will start with some review and then move on to more complex features.

### 3.1 Basic Indexing

At the risk of belaboring this topic, let's quickly do some practicing. The following cell produces `yet_another_array` using the `arange` [(array range) NumPy function](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.arange.html), which we will use in the next few problems.

In [14]:
yet_another_array = np.arange(10)
yet_another_array

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

### Problem 5

**Implement this!** Retrieve the 4th element of `yet_another_array`.

In [15]:
# your code here
yet_another_array[3]

3

### Problem 6

Here's something new: specifically for NumPy arrays, you can also pass in a Python list or NumPy array of indices to retrieve.

For example, `some_array[[0, 2, 4, 6]]`.
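
As a small illustration, here is that expression on a made-up array (`some_array` and its values are assumptions chosen so that positions and values are easy to tell apart):

```python
import numpy as np

some_array = np.arange(10) * 10       # 0, 10, 20, ..., 90

picked = some_array[[0, 2, 4, 6]]     # index with a Python list
print(picked)

idx = np.array([6, 0, 0])             # a NumPy array of indices works too
repeated = some_array[idx]            # order is free and indices may repeat
print(repeated)

# Indexing with a list or array returns a copy, so changing `picked`
# leaves `some_array` untouched.
picked[0] = -1
```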


**Implement this!** Retrieve the 2nd, 5th, and 9th elements of `yet_another_array`.

In [19]:
# your code here
index = [1, 4, 8]
yet_another_array[index]

array([1, 4, 8])

### Problem 7

Let's do this with a 2D array, too. Here we will generate `a_2d_array` using the `np.random.rand` function.

In [20]:
a_2d_array = np.random.rand(5, 5)
a_2d_array

array([[0.56754086, 0.5099704 , 0.96914677, 0.39276426, 0.10802423],
       [0.95095915, 0.60158841, 0.55307174, 0.9720697 , 0.26290719],
       [0.65667373, 0.61580384, 0.82029209, 0.6463334 , 0.23058473],
       [0.10846001, 0.25015906, 0.93812701, 0.45591671, 0.85277474],
       [0.77582225, 0.89822228, 0.03083841, 0.75859204, 0.24514656]])

Remember that you can index into a 2D array like this: `another_2D_array[row, column]`, where `row` and `column` are indices, slices, `:`s, or some mix of these three.
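
A short sketch of those combinations, using a made-up 5×5 array named `grid` whose values are easy to trace back to their positions:

```python
import numpy as np

grid = np.arange(25).reshape(5, 5)  # row r, column c holds the value 5*r + c

single = grid[1, 2]          # index, index  -> one value
first_row = grid[0, :]       # index, `:`    -> a whole row
second_col = grid[:, 1]      # `:`, index    -> a whole column
block = grid[1:3, 2:4]       # slice, slice  -> a 2x2 sub-array

print(single)
print(first_row)
print(second_col)
print(block)
```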

**Implement this!** Retrieve the value at position (3, 4) from `a_2d_array`.

In [21]:
# your code here
a_2d_array[3, 4]  # I am assuming that position (3, 4) was meant literally (rather than as the 3rd row and 4th column, which would be indexed with [2, 3]).

0.8527747374898264

### 3.2 Slice Indexing

You've already gotten familiar with slices, but here is some more practice.

### Problem 8

**Implement this!** Retrieve the 6th, 7th, and 8th elements of `yet_another_array`.

In [23]:
# your code here
yet_another_array[5:8]

array([5, 6, 7])

### Problem 9

Again, we can do this with a 2D array, too. Here is a reminder of what `a_2d_array` looks like.

In [24]:
a_2d_array

array([[0.56754086, 0.5099704 , 0.96914677, 0.39276426, 0.10802423],
       [0.95095915, 0.60158841, 0.55307174, 0.9720697 , 0.26290719],
       [0.65667373, 0.61580384, 0.82029209, 0.6463334 , 0.23058473],
       [0.10846001, 0.25015906, 0.93812701, 0.45591671, 0.85277474],
       [0.77582225, 0.89822228, 0.03083841, 0.75859204, 0.24514656]])

**Implement this!** Retrieve the values in the 3rd column of `a_2d_array`.

In [25]:
# your code here
a_2d_array[:,2]

array([0.96914677, 0.55307174, 0.82029209, 0.93812701, 0.03083841])

### 3.3 Logical Indexing

Now for something completely new! You might be interested in finding all of the values in an array that fulfill some condition like _get all values greater than 5_. NumPy enables this by supporting [**logical indexing**](https://docs.scipy.org/doc/numpy-1.13.0/user/basics.indexing.html#boolean-or-mask-index-arrays).

This idea is also referred to as **masking**. You can use an array of boolean values _of the same shape_ as the target array as the "index" into the target array. This will return all of the values that are in the same position as the `True` values from the boolean arrays. These logical arrays are called "masks" because they are analogous to masks, which let some of the underlying surface show through but hide the rest.

Let's try making our first logical array. In the cell below, we use a comparison operation on `yet_another_array`, returning an array of the same size with `True` in the positions where the values meet the condition.

In [26]:
yet_another_array > 5

array([False, False, False, False, False, False,  True,  True,  True,
        True])

Now let's see how this array "lines up" with `yet_another_array`.

In [27]:
yet_another_array

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Here is another example using an array that is not sorted already.

In [28]:
shuffled_array = np.random.permutation(yet_another_array)
shuffled_array

array([7, 6, 3, 0, 1, 2, 9, 8, 5, 4])

In [29]:
shuffled_array > 5

array([ True,  True, False, False, False, False,  True,  True, False,
       False])

Is this what you would expect? Let's actually get those values.

In [30]:
shuffled_array[shuffled_array > 5]

array([7, 6, 9, 8])
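
The "same shape" requirement also applies to 2D arrays. The sketch below uses a made-up 3×4 array named `grid`, and it also shows that a mask can be written out by hand instead of coming from a comparison.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)
mask = grid > 5              # boolean array with the same (3, 4) shape
selected = grid[mask]        # matching values, flattened to 1D

print(mask.shape)
print(selected)

# A mask does not have to come from a comparison.
small = np.array([10, 20, 30, 40])
explicit_mask = np.array([True, False, True, False])
kept = small[explicit_mask]
print(kept)
```

Note that masking a 2D array returns a 1D array, since the selected values generally do not form a rectangle.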

### Problem 10

**Implement this!** Retrieve the elements of `yet_another_array` that are even (hint: `%`).

In [32]:
# your code here
yet_another_array[yet_another_array%2 == 0]

array([0, 2, 4, 6, 8])

And that's it!