# Lecture 4, Part 3 – NumPy

## CSS Summer Bootcamp, Week 1 🥾

#### Suraj Rampure

## Review: lists

### Some questions...

**Question:** Can a list have values of different types?

**Answer:** Sure!

In [None]:
['ucsd', 23, 1998]

**Question:** What happens if I use the `+` symbol between two lists?

**Answer:** The lists are concatenated.

In [None]:
[3, 4, 5] + [6, 7, 8]

**Question:** How do I apply some function or operation (e.g. add 5) to every element of a list?

**Answer:** Use a `for`-loop and add values one-by-one.

In [None]:
numbers = [5, 4, 9, 12, 18]
new_numbers = []
for num in numbers:
    new_numbers.append(num + 5)
    
new_numbers

## NumPy

### NumPy

<center><img src='images/numpy.png' width=20%></center>

- NumPy (pronounced "num pie") is a Python library (module) that provides support for **arrays** and operations on them.
- The pandas library, which you will learn about next week, goes hand-in-hand with NumPy.
    - NumPy is used heavily in the real world.

In [None]:
import numpy as np

### Arrays

Think of NumPy arrays (just "arrays" from now on) as fancy lists.

<center><img src='images/squid.png' width=30%></center>

To create a new NumPy array from scratch, we first create a list containing the elements we want, and then pass it into the `np.array` function.

In [None]:
np.array([4, 9, 1, 2])

In [None]:
np.array(['how', 'are', 'you'])

In [None]:
np.array([])

<center><img src='images/np-array.png' width=70%></center>

In [None]:
numbers = [5, 4, 9, 12, 18]
numbers

In [None]:
type(numbers)

In [None]:
numbers_arr = np.array(numbers)
numbers_arr

In [None]:
type(numbers_arr)

## Operations on every element

### Operations on every element

Arrays make it easy to apply an operation to every element, **without needing a `for`-loop**. This behavior is formally known as "broadcasting".

In [None]:
numbers_arr

In [None]:
numbers_arr * 2

In [None]:
numbers_arr - 5

In [None]:
numbers_arr // 2

In [None]:
numbers_arr ** 2 - 1

In [None]:
numbers_arr > 4

When we write an arithmetic expression involving an array and a single number, the result is a new array, containing the result of that arithmetic expression for each element of the array.


Not only is this faster for us – we don't have to write out an entire `for`-loop – but it's also faster for the computer. [Read more here](https://towardsdatascience.com/how-fast-numpy-really-is-e9111df44347) (and you'll see more in the lab).
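
To make the comparison concrete, here is a small sketch showing that the `for`-loop from the review and a single array expression produce the same values. The names `loop_result` and `array_result` are ours, chosen only for illustration.

```python
import numpy as np

numbers = [5, 4, 9, 12, 18]

# List approach: visit each element and build up a new list.
loop_result = []
for num in numbers:
    loop_result.append(num + 5)

# Array approach: one expression, applied to every element.
array_result = np.array(numbers) + 5

print(loop_result)
print(array_result)
```

The values match; the only difference is that the first result is a list and the second is an array.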

### More examples: `np.log`, `np.sqrt`, `np.sin`, etc.

NumPy has many, many more functions built-in that can be applied to every element of an array.

In [None]:
numbers_arr

In [None]:
np.sqrt(numbers_arr)

In [None]:
# np.mean is different: it combines all elements into a single number
np.mean(numbers_arr)

In [None]:
np.log(numbers_arr)

In [None]:
np.sin(numbers_arr)

In [None]:
# np.sqrt can work on individual numbers or arrays
np.sqrt(144)

<h3><span style='color:purple'>Activity</span></h3>

What is the largest value in the `tips_pct` array after running the following lines of code?

```py
my_tips = [0.15, 0.16, 0.22, 0.39]
your_tips = [0.25, 0.19, 0.08]
tips = np.array(my_tips + your_tips)
tips_pct = 100 * tips
```

**Try and answer WITHOUT running any code.**

## Element-wise operations

### Element-wise operations

- We can apply arithmetic operations to multiple arrays, provided they have the same length. 
- The result is computed **element-wise**, which means that the arithmetic operation is applied to one pair of elements from each array at a time.
- For example, `a + b` is an array whose first element is the sum of the first element of `a` and first element of `b`.

In [None]:
a = np.array([1, 2, 3])
b = np.array([-4, 5, 9])

In [None]:
a + b

In [None]:
a - 2 * b

In [None]:
a ** 2 + b ** 2
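
The "same length" requirement matters. As a rough sketch, here is what happens when the lengths differ; the array `short` is a hypothetical example of ours, and the `try`/`except` is only there so the cell finishes without stopping.

```python
import numpy as np

a = np.array([1, 2, 3])
short = np.array([10, 20])  # hypothetical array with a different length

try:
    a + short
    outcome = 'no error'
except ValueError as err:
    # NumPy refuses to pair up 3 elements with 2 elements.
    outcome = 'ValueError'
    print(err)
```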

### Example: Population growth

For example, suppose we have two arrays containing the populations of several states in 2021 and 2022. Assume each array contains information about the same states in the same order.

We can compute the percent change from 2021 to 2022 like so:

In [None]:
pop_2021 = np.array([100, 55, 23, 91, 121])
pop_2022 = np.array([101, 45, 23, 93, 118])

In [None]:
# Change from 2021 to 2022
pop_2022 - pop_2021

In [None]:
# Percent change
100 * (pop_2022 - pop_2021) / pop_2021

<h3><span style='color:purple'>Activity</span></h3>

Suppose we run the following three lines of code. Will we run into an error? Why or why not?
```py
a = np.array([3, 4, 5])
b = np.array([7, -4.0, 5])
c = (a + b) / (a - b)
```

**Try and answer WITHOUT running any code.**

### Example: Harmonic mean

On Monday, we defined a function, `harmonic_mean`, that takes in two numbers as input and returns their harmonic mean.

In [None]:
def harmonic_mean(a, b):
    return 2 / (1 / a + 1 / b)

In [None]:
harmonic_mean(60, 80)

What happens if we provide arrays as inputs to `harmonic_mean`?

In [None]:
speed_1 = np.array([60, 20, 30])
speed_2 = np.array([40, 50, 60])

In [None]:
harmonic_mean(speed_1, speed_2)

Answer: we get back a new array containing 3 harmonic means.

### Generalized harmonic mean

Previously, we saw that the formula for the harmonic mean of two (positive) numbers is

$$H(a, b) = \frac{2}{\frac{1}{a} + \frac{1}{b}}$$

To compute the harmonic mean of $n$ (positive) numbers, a more general formula is

$$H(a_1, a_2, ..., a_n) = \frac{n}{\frac{1}{a_1} + \frac{1}{a_2} + ... + \frac{1}{a_n}}$$

In [None]:
numbers_arr

In [None]:
1 / numbers_arr

In [None]:
np.sum(1 / numbers_arr)

In [None]:
len(numbers_arr) / np.sum(1 / numbers_arr)

Let's define a function that can take in any array of (positive) numbers and return their harmonic mean.

In [None]:
def harmonic_mean_arr(arr):
    return len(arr) / np.sum(1 / arr)

In [None]:
harmonic_mean_arr(np.array([60, 80]))

In [None]:
harmonic_mean_arr(np.array([60, 80, 70, 90, 50, 20, 30, 10]))
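
As a quick sanity check, the general formula with $n = 2$ should agree with the two-number formula. The sketch below repeats both definitions so it runs on its own; `two_number_result` and `array_result` are names we made up, and `np.isclose` is used because floats may differ by tiny rounding errors.

```python
import numpy as np

def harmonic_mean(a, b):
    return 2 / (1 / a + 1 / b)

def harmonic_mean_arr(arr):
    return len(arr) / np.sum(1 / arr)

two_number_result = harmonic_mean(60, 80)
array_result = harmonic_mean_arr(np.array([60, 80]))

# Both should equal 480 / 7, up to floating-point rounding.
print(np.isclose(two_number_result, array_result))
```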

## Other features of arrays

### Accessing individual elements

To access individual elements of an array, we can use the same indexing notation that we’re used to with lists.

```py
arr[i]
```

In [None]:
pop_2022

In [None]:
pop_2022[0]

In [None]:
pop_2022[-2]

Arrays also have an `.item` method, which lists do not have. It works (mostly) the same as "regular" indexing.

In [None]:
pop_2022.item(0)
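
One place where the two differ is the type of the value you get back. A minimal sketch, using the names `by_index` and `by_item` for illustration:

```python
import numpy as np

pop_2022 = np.array([101, 45, 23, 93, 118])

by_index = pop_2022[0]      # a NumPy integer
by_item = pop_2022.item(0)  # a plain Python int

print(type(by_index))
print(type(by_item))
print(by_index == by_item)  # the values themselves are equal
```

For most calculations this difference does not matter.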

### Automatic type conversion

Unlike lists, all elements in an array must be the same type (e.g. all ints, all floats, all strings, all bools).

If you create an array from a list whose elements are of different types, NumPy converts all values to the same type when creating the array.

In [None]:
some_values = np.array([2, 3, 3.5, 4, False])
some_values

In [None]:
some_values[0]

In [None]:
# All converted to strings!
other_values = np.array([9, 8, 'hello', -14.5])
other_values
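
To see which type NumPy settled on, one option is the `.dtype` attribute of an array. It is not covered elsewhere in this lecture, so treat this as an optional sketch.

```python
import numpy as np

some_values = np.array([2, 3, 3.5, 4, False])
other_values = np.array([9, 8, 'hello', -14.5])

print(some_values.dtype)   # float64: the ints and the bool became floats
print(other_values.dtype)  # a string type: everything became a string

print(some_values[4])      # False was stored as 0.0
```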

### Array methods

Remember, methods are functions that we call with dot syntax. There are several array methods that make it easy to calculate values of interest.

Here are some examples. Note that in these examples (but not generally), we can use the equivalent forms

```py
arr.func_name()
```

or

```py
np.func_name(arr)
```

In [None]:
pop_2022

In [None]:
np.sum(pop_2022)

In [None]:
pop_2022.sum()

In [None]:
np.prod(pop_2022)

In [None]:
pop_2022.mean()

In [None]:
np.mean(pop_2022)
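
The "but not generally" caveat is worth a quick illustration. In the sketch below, `sum` exists in both forms, while `sqrt` exists only as a function; `has_sqrt_method` is a name we introduce for the check.

```python
import numpy as np

pop_2022 = np.array([101, 45, 23, 93, 118])

# Both forms exist for sum, and they agree.
print(np.sum(pop_2022) == pop_2022.sum())

# np.sqrt exists only as a function; arrays have no .sqrt method.
has_sqrt_method = hasattr(pop_2022, 'sqrt')
print(has_sqrt_method)
```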

### Ranges

`np.arange` works very similarly to the `range` function you're already used to, except that it returns an array.

In [None]:
np.arange(10)

In [None]:
np.arange(3, 13, 3)

In [None]:
# 2^0, 2^1, 2^2, ..., 2^9
2 ** np.arange(10)

In [None]:
# 1^2 + 2^2 + 3^2 + ... + 100^2
np.sum(np.arange(101) ** 2)
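
As a check on the last cell, the sum of the first $n$ squares has a well-known closed form, $\frac{n(n+1)(2n+1)}{6}$. The sketch below compares the two; `by_array` and `by_formula` are illustrative names. Starting the range at 0 or at 1 makes no difference here, because $0^2 = 0$.

```python
import numpy as np

n = 100

# Squares of 1, 2, ..., n, added up with NumPy.
by_array = np.sum(np.arange(1, n + 1) ** 2)

# The closed-form formula for the same sum.
by_formula = n * (n + 1) * (2 * n + 1) // 6

print(by_array, by_formula)
```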

### Append

There is no `append` **method** for arrays:

In [None]:
pop_2022

In [None]:
pop_2022.append(500)

However, there is a `numpy` **function** called `append` that we can use to add elements to the end of an array:

In [None]:
np.append(pop_2022, 500)

Unlike the list `append` method, `np.append` **doesn't modify** the original array – if you want to make any changes, you need to save them yourself.

In [None]:
pop_2022

In [None]:
np.append(pop_2022, 500)

In [None]:
pop_2022

In [None]:
pop_2022 = np.append(pop_2022, 500)
pop_2022

### Even more functions...

There are far too many functions built into NumPy for us to cover them all in this lecture (or any lecture).

Google and the [documentation](https://numpy.org/doc/1.23/user/whatisnumpy.html) are your friends!

In [None]:
pop_2022

In [None]:
# Cumulative sum: for each element, add all elements so far
np.cumsum(pop_2022)

In [None]:
# Difference: takes the differences of consecutive elements
np.diff(pop_2022)

In [None]:
# count_nonzero: counts the number of elements that are not 0
np.count_nonzero(np.array([2, 3, 0, -5, 0, 4]))
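
Here is a small worked example of `np.cumsum` and `np.diff` with the arithmetic spelled out in comments. The array `populations` is just sample data for this sketch.

```python
import numpy as np

populations = np.array([101, 45, 23, 93, 118])

running_total = np.cumsum(populations)  # 101, 101+45, 101+45+23, ...
step_changes = np.diff(populations)     # 45-101, 23-45, 93-23, 118-93

print(running_total.tolist())  # [101, 146, 169, 262, 380]
print(step_changes.tolist())   # [-56, -22, 70, 25]
```

Note that the result of `np.diff` has one fewer element than the input, since there is one difference per pair of neighbors.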

## Randomness

### Randomness

The `np.random` submodule contains functions that allow us to generate "random" numbers.

One such function is `np.random.random`, which returns a random `float` that is at least 0 and less than 1.

In [None]:
# A single random float
np.random.random()

In [None]:
# 10 random floats
np.random.random(10)

Another such function is `np.random.randint`, which returns a random `int` between the two specified endpoints. The lower endpoint is included and the upper endpoint is excluded, just like `range`.

In [None]:
# Like rolling a die!
np.random.randint(1, 7)

In [None]:
# Like rolling a die 10 times!
np.random.randint(1, 7, size=10)
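
The die cells pass 7, not 6, as the second argument because `np.random.randint` never returns its upper endpoint. A quick sketch to convince yourself, using many rolls (the name `rolls` is ours):

```python
import numpy as np

rolls = np.random.randint(1, 7, size=1000)

print(rolls.min())  # never below 1
print(rolls.max())  # never above 6
```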

Another relevant function is `np.random.choice`, which returns a randomly selected element from an array.

In [None]:
# Like flipping a coin!
np.random.choice(['Heads', 'Tails'])

In [None]:
# Like flipping a coin, 10 times!
np.random.choice(['Heads', 'Tails'], size=10)

In [None]:
# Like flipping a biased coin, 10 times!
np.random.choice(['Heads', 'Tails'], size=10, p=[0.65, 0.35])
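
With only 10 flips it is hard to tell that the coin is biased. As a rough sketch, flipping many more times should give a proportion of heads close to the 0.65 we asked for. The seed below is arbitrary and is there only so the sketch is repeatable; `prop_heads` is an illustrative name.

```python
import numpy as np

np.random.seed(23)  # arbitrary seed, only for repeatability

flips = np.random.choice(['Heads', 'Tails'], size=100_000, p=[0.65, 0.35])

# Comparing gives an array of True/False; its mean is the proportion of True.
prop_heads = np.mean(flips == 'Heads')
print(prop_heads)
```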

### Example: flipping coins

Suppose you flip a fair coin 100 times. How many heads would you expect to see?

<center>50.</center>

But 45 or 57 wouldn’t be that crazy.

20? 95? That would be shocking 😱.


**Idea:**
1. Flip a coin 100 times. Write down the number of heads.
2. Repeat step 1 many times – say, 10,000 times. (How?)
3. Look at the range of values we saw in our simulation.

<center><img src='images/coin-flip.png' width=50%></center>

### Simulating 100 coin flips, 10000 times

In [None]:
num_heads_arr = np.array([])

for _ in np.arange(10000):
    flips = np.random.choice(['Heads', 'Tails'], size=100)
    heads = np.count_nonzero(flips == 'Heads')
    num_heads_arr = np.append(num_heads_arr, heads)

In [None]:
num_heads_arr

In [None]:
num_heads_arr.mean()

In [None]:
num_heads_arr.std()

Let's draw a picture!

In [None]:
import plotly.express as px
px.histogram(num_heads_arr)

**Observation 👀:** Usually, the number of heads in 100 flips ranged from 40 to 60. Anything outside of that range was rare. Anything lower than 35 or above 65 was extremely rare.

In [None]:
num_heads_arr.max()

In [None]:
num_heads_arr.min()
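
The simulation above grows `num_heads_arr` one value at a time inside a `for`-loop. As an alternative sketch, and not the approach used in this lecture, we could ask for all the flips at once as a grid with one row per repetition. Two-dimensional arrays and the `axis` argument are not covered here, so treat this as optional; `all_flips` and `num_heads_alt` are names we made up.

```python
import numpy as np

# One row per repetition, one column per flip: 10,000 rows of 100 flips each.
all_flips = np.random.choice(['Heads', 'Tails'], size=(10000, 100))

# Count the heads within each row; axis=1 means "across the columns of a row".
num_heads_alt = np.count_nonzero(all_flips == 'Heads', axis=1)

print(num_heads_alt.mean())
print(num_heads_alt.std())
```

The mean and standard deviation should land near the values from the loop version, though not exactly on them, because the flips are random.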

### Rare events

**Question:** Of the 10,000 repetitions of our "experiment", in how many did we see 65 or more heads?

**Answer:** Very few!

In [None]:
num_heads_arr

In [None]:
num_heads_arr >= 65

In [None]:
np.sum(num_heads_arr >= 65)

In [None]:
num_heads_arr[num_heads_arr >= 65]

### Boolean indexing

What we saw on the previous slide was an example of Boolean indexing:

In [None]:
values = np.array([3, 4, 5, 2, 9])

In [None]:
values >= 4

This allows us to answer questions like...

Which values in `values` were at least 4?

In [None]:
values[values >= 4]

**How many** values in `values` were at least 4?

In [None]:
np.sum(values >= 4)

In [None]:
np.count_nonzero(values >= 4)
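
`np.sum` works for counting because `True` is treated as 1 and `False` as 0 in arithmetic. The same idea gives proportions through `np.mean`. A minimal sketch, where `at_least_4` is a name we introduce for the Boolean array:

```python
import numpy as np

values = np.array([3, 4, 5, 2, 9])
at_least_4 = values >= 4  # [False, True, True, False, True]

print(values[at_least_4].tolist())  # [4, 5, 9]
print(np.sum(at_least_4))           # 3
print(np.mean(at_least_4))          # 0.6
```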

**No `for`-loop necessary!**