# Day 8: NumPy Arrays and Efficiency

### &#9989; Write your name here

Last class, we started to use libraries, including NumPy, which is a collection of numerical functions that center around creating and manipulating NumPy arrays, or just arrays for short. Arrays are like lists in their structure, but they offer the advantage of using the many numerical functionalities in the NumPy library.

For example, we can apply arithmetic operations to entire NumPy arrays at once. This enables us to avoid using *some* (not all) loops, specifically when the same operation can be applied to every element independently (element-wise, using NumPy) rather than one element at a time (like in a loop). The pseudocode below shows the comparison.

```
L = {a list}
A = {an array}
operate = {a numerical function}

M = []
for i in range(len(L)):
    M.append(operate(L[i]))

B = operate(A)
```

The example function `operate` could be as simple as multiplying by 2, or more complex, like converting values from mass to period using some physics formula. The new list and array, `M` and `B`, hold the values that `operate` returned. The code for the array manipulation is much simpler than the code for the list, which required a loop.
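As a concrete, runnable version of the pseudocode above, here is a minimal sketch. It assumes that `operate` is the "multiply by 2" case, and the input values are made up purely for illustration.

```python
import numpy as np


def operate(x):
    # hypothetical example operation: double the input
    return 2 * x


L = [1.0, 2.0, 3.0]   # a list (made-up values)
A = np.array(L)       # an array holding the same values

# list version: apply the operation one element at a time
M = []
for i in range(len(L)):
    M.append(operate(L[i]))

# array version: apply the operation to the whole array at once
B = operate(A)

print(M)
print(B)
```

Both versions produce the same numbers; only the structure of the code differs.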

In this assignment, you will explore not only the differences in **syntax and structure**, but also the **computational efficiency** of using NumPy.

---

### Part 1: Creating NumPy arrays

The exercises in this part are designed to give you practice creating NumPy arrays. Functions like `arange` and `linspace` can be used to create arrays of equally spaced numbers (like advanced versions of `range`). You can also create arrays with no elements (like the empty list `[]`), and arrays pre-allocated with a fixed number of elements (like a list of 50 zeros: `50 * [0]`).
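To preview how the two spacing functions differ, here is a small sketch with made-up start and end points (0 and 1). Note that `arange` stops before the end point, while `linspace` includes it by default.

```python
import numpy as np

# arange: you choose the step size; the end point (1) is excluded
stepped = np.arange(0, 1, 0.25)

# linspace: you choose the number of elements; the end point (1) is included
counted = np.linspace(0, 1, 5)

print(stepped)
print(counted)
```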

**&#9989; Task 1.1:** `arange` is used to create an array with a **specified step-size** between each number. Create an array with a step-size of 0.01 between elements, and print it out (you can choose the start/end points).

In [None]:
# your answer here

**&#9989; Task 1.2:** `linspace` is used to create an array with a **specified number** of equally spaced elements. Create an array with 100 equally spaced elements, using the same start/end point you chose above, and print it out.

In [None]:
# your answer here

**&#9989; Task 1.3:** `array([])` can be used to create an "empty" array, analogous to Python's empty list (`[]`). You can also use a NumPy version of `append`, called `np.append` (*the `np` prefix is only included here to distinguish it from the list method `append`*). Using these NumPy functions, create an empty array and append an element to it. Print out the resulting array.

*Hint: `np.append` uses different syntax than `append`. If you are getting errors, look up how the functions work to see the difference.*
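As a sketch of the difference the hint points at (using made-up values and starting from non-empty containers): the list method `append` changes the list in place, whereas `np.append` returns a new array and leaves the original untouched, so its result has to be assigned to a name.

```python
import numpy as np

my_list = [1, 2]
my_list.append(3)                    # modifies my_list in place

my_array = np.array([1, 2])
new_array = np.append(my_array, 3)   # returns a new array; my_array is unchanged

print(my_list)
print(my_array)
print(new_array)
```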

In [None]:
# your answer here

Another way to initialize a NumPy array is to *pre-allocate* (as opposed to starting with a completely empty array without any elements). Pre-allocating can be done using `zeros` and `zeros_like`. Look them up to see how they work and what syntax is required to use them.
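For orientation before you look them up, here is a minimal sketch with made-up sizes. The names `template`, `four_zeros`, and `same_shape` are illustrative only.

```python
import numpy as np

# a small made-up array to use as a template
template = np.array([[1.5, 2.5], [3.5, 4.5]])

# zeros: give the number of elements (or a shape)
four_zeros = np.zeros(4)

# zeros_like: copy the shape and data type of an existing array
same_shape = np.zeros_like(template)

print(four_zeros)
print(same_shape)
```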

**&#9989; Task 1.4:** Create an array with 50 zeros, and print it out.

In [None]:
# your answer here

**&#9989; Task 1.5:** Create an array of zeros with the same structure as the provided array `array15`, and print it out.

In [5]:
import numpy as np

array15 = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]])

In [None]:
# your answer here

---

### Part 2: Testing efficiency using `time`

We can use another library, `time`, to measure how long different chunks of code take to run. After importing `time`, calling `time.time()` returns the current time according to the computer's system clock, in seconds. By calling this function before and after a chunk of code and subtracting the two values, we can compute how long the computer took to run that chunk. See below for some examples, which are explained in the comments and the printed output.

In [19]:
import time

# we can use time.time() to grab the computer's time when that line of code runs

t0 = time.time()
# no computations here
t1 = time.time()

# we can print the difference to see how long it took to run the code in between
print("time to do no computations:", t1 - t0)

# timing a small loop
# we don't include list initialization in the timing
test_list0 = []

t2 = time.time()
for i in range(1):
    test_list0.append(i)
t3 = time.time()

print("time to loop and append 1 time:", t3 - t2)

# timing a longer loop
test_list1 = []

t4 = time.time()
for i in range(10000):
    test_list1.append(i)
t5 = time.time()

print("time to loop and append 10,000 times:", t5 - t4)

# we can compare timings directly by computing a ratio
ratio = (t5 - t4) / (t3 - t2)
print("looping and appending 10,000 times took", ratio, "times longer")

time to do no computations: 5.888938903808594e-05
time to loop and append 1 time: 0.00017404556274414062
time to loop and append 10,000 times: 0.0015931129455566406
looping and appending 10,000 times took 9.153424657534247 times longer
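
Single timings of very short chunks of code tend to be noisy, which is one reason the ratio above is far from 10,000. One possible way to get steadier numbers is sketched below. The helper name `time_it` is hypothetical, and the sketch uses `time.perf_counter`, a standard-library clock intended for measuring short durations, instead of `time.time`.

```python
import time


def time_it(func, repeats=5):
    """Run func() several times and return the fastest run, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


def append_many():
    # the same loop-and-append pattern as in the cell above
    values = []
    for i in range(10000):
        values.append(i)
    return values


print("fastest of 5 runs:", time_it(append_many))
```

Taking the fastest of several runs is a design choice, not the only option; averaging the runs is another reasonable approach.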


**&#9989; Task 2.1:** Write your own explanation of computing time efficiency by answering these questions:
- What is the point of using the `time.time` function multiple times in one chunk of code?
- How does the placement of the `time.time` function calls affect the result of computing time-efficiency?
- How could `time.time` be used to compare two different equivalent computations, one using lists, and the other using NumPy arrays?

**/your answer here/**

**&#9989; Task 2.2:** The code below is a solution to Day 5, Task 3.1, when you were asked to calculate height values from a kinematic equation. The code **uses a loop** to compute the values one by one, and then **appends them** to a list, before printing them out.

Using `time.time`, measure the time it takes for the computer to create the full list of heights from scratch. Do **not** include the time it takes for printing, just the computation itself.

In [21]:
# add code to this cell

g = 9.81

# set initial values and time-step
y0 = 2
vy0 = 44
t = 0
delta_t = 0.1

y_vals = []

# run the loop for 50 time-steps
for i in range(50):
    # calculate the next value for y using kinematic equation
    y = y0 + vy0 * t - 0.5 * g * t ** 2

    # append the next y value to the list
    y_vals.append(y)

    t += delta_t

print(y_vals)

[2.0, 6.35095, 10.603800000000001, 14.758550000000003, 18.8152, 22.77375, 26.6342, 30.396549999999998, 34.06079999999999, 37.626949999999994, 41.09499999999999, 44.464949999999995, 47.736799999999995, 50.91055, 53.986200000000004, 56.96375000000001, 59.84320000000002, 62.62455, 65.30780000000001, 67.89295000000001, 70.38000000000001, 72.76895000000002, 75.05980000000001, 77.25255000000001, 79.34720000000002, 81.34375000000001, 83.24220000000003, 85.04255, 86.74480000000001, 88.34895000000002, 89.85500000000002, 91.26295000000002, 92.57280000000003, 93.78455000000002, 94.89820000000003, 95.91375000000002, 96.83120000000002, 97.65055000000002, 98.37180000000002, 98.99494999999999, 99.52000000000001, 99.94695, 100.2758, 100.50655000000002, 100.6392, 100.67375, 100.61019999999999, 100.44854999999998, 100.18880000000001, 99.83094999999999]


**&#9989; Task 2.3:** Copy your code and change it to use NumPy arrays and the `np.append` function. Once again, measure the time it takes for the computer to create the full array of heights from scratch.

In [None]:
# your answer here

**&#9989; Task 2.4:** Copy the code from Task 2.2 again, and change it to use a **pre-allocated** list (a line of code has been added below to help with this). Once again, measure the time it takes for the computer to create the full list of heights from scratch.

In [None]:
# your answer here

y_vals = 50 * [0]

**&#9989; Task 2.5:** Copy your code and change it to use a pre-allocated NumPy array instead of a list. Again, measure the time it takes for the computer to create the full array of heights from scratch.

In [None]:
# your answer here

**&#9989; Task 2.6:** Finally, re-write the solution **without using loops**, and using NumPy operations instead. Your solution structure could be slightly different this time. One more time, measure the time it takes for the computer to create the full array of heights from scratch.

In [45]:
# your answer here

**&#9989; Task 2.7:** Compare the timings across all different solutions -- what do you notice?

**/your answer here/**

#### &#128721; **Stop here and check your progress with an instructor.**

---

### Part 3: Comparing efficiencies at different scales

At larger scales, the differences between approaches stand out much more. In this part, you will explore what that can look like.
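
As a rough illustration of timing at more than one scale, the sketch below uses a different, made-up computation (doubling the integers from 0 to n - 1) rather than the height calculation. The actual times depend on your computer, so no expected output is shown.

```python
import time
import numpy as np

for n in (1_000, 100_000):
    # list version: loop and append
    t0 = time.time()
    doubled_list = []
    for i in range(n):
        doubled_list.append(2 * i)
    t1 = time.time()

    # array version: one NumPy operation, no loop
    doubled_array = 2 * np.arange(n)
    t2 = time.time()

    print(n, "values: list took", t1 - t0, "s, array took", t2 - t1, "s")
```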

**&#9989; Task 3.1:** Take the same code from Task 2.2, and scale it up to compute 10,000 values of height. Time how long it takes for the computer to compute the heights into a list.

In [None]:
# your answer here

**&#9989; Task 3.2:** Repeat the four other solution approaches for 10,000 values of height: using NumPy's version of append, pre-allocating a list, pre-allocating a NumPy array, and directly using NumPy operations without loops. Time all four approaches.

In [None]:
# your answer here

**&#9989; Task 3.3:** Continue testing and timing different scales of computation. Fill out the table below with the times it takes to do each approach at each scale. You can double-click to edit the information in the table.

**Fill this table out:**

| Solution Approach / Number of Values         | 50 | 10,000 | 500,000 | 10,000,000 |
| -------------------------------------------- | -- | ------ | ------- | ---------- |
| Appending to a list with a loop              |    |        |         |            |
| Appending to an array with a loop            |    |        |         |            |
| Filling a pre-allocated list with a loop     |    |        |         |            |
| Filling a pre-allocated array with a loop    |    |        |         |            |
| Using NumPy operations directly with no loop |    |        |         |            |

**&#9989; Task 3.4:** Compare the different approaches along these categories:
- Efficiency
- Syntax and structure of code
- Your conceptualization of how the computer constructs the list/array

For the last one, it may be helpful to draw out ideas on a whiteboard with other students.

**/take discussion notes here/**

#### &#128721; **Stop here and check your progress with an instructor.**