# Numpy
## Objectives
- Understand how numpy arrays are different from python lists.
- Create arrays in numpy of a given shape.
- Use broadcasting to combine arrays.
- Index into numpy arrays:
    - List style indexing and slicing.
    - Fancy indexing.
    - Boolean indexing.
- Use some numpy methods to process arrays:
    - Mathematical functions.
    - Aggregations.
    - Aggregation across axes.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Numpy: Efficient Array Computation in Python

**Numpy** is the library behind almost all of the numerical and scientific computing in Python.  Without numpy, Python would probably not be a serious player in the data science world.

**Note**: It's pronounced **num-pie** as in **pie-thon**, not **num-pee**.

Numpy's major feature is its `array` data type (technically, it's called an `ndarray`, but everyone just calls them arrays).

Numpy arrays, on the face of it, look a lot like python lists:

In [None]:
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

And you can do pretty much anything to a numpy array that you could do to a list:

In [None]:
x[2]

In [None]:
x[:5]

In [None]:
x[1:8:2]

In [None]:
x[0] = 100
x

But, under the hood, very different things are going on...

  - Numpy arrays can hold one and only one type of data.
  - Numpy arrays are **super efficient**, both in memory footprint **and** in computation speed.
  - Numpy arrays have a size, and the size cannot be changed.
  - Numpy arrays have a **shape**, which allows them to be multi-dimensional (examples forthcoming).
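
The first two points can be checked directly. This is a minimal sketch with illustrative variable names, assuming numpy's usual 64-bit float default:

```python
import numpy as np

# Mixing ints and floats in one array upcasts everything to a single dtype.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)    # float64

# An array stores raw values back to back, so its data size is predictable.
arr = np.arange(1000, dtype=np.int64)
print(arr.itemsize)   # 8 bytes per element
print(arr.nbytes)     # 8000 bytes of data in total
```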

## Constraints on Arrays

One major difference between arrays and lists is that arrays **cannot be extended**.

In [None]:
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
x.append(10)  # Raises AttributeError: arrays have no append method.

Numpy arrays have a fixed size, which cannot be changed.

In [None]:
x.size

So as a consequence, if you want to create a numpy array to hold some data, you **need to know how much array you need at the time the array is created**.
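
If the final size is not known up front, a function such as `np.append` can help, but it builds a brand new array rather than growing the existing one. A small sketch (the name `longer` is illustrative):

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# np.append returns a new, longer array; the original is left untouched.
longer = np.append(x, 10)

print(x.size)       # 10
print(longer.size)  # 11
```

Because each call copies the data, appending inside a loop tends to be slow; creating the array at its final size is usually the better habit.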

## Reshaping Arrays

Although the total size of an array **cannot** be changed, the **shape** of the array can be changed, **as long as this change of shape does not create or destroy elements** (i.e., as long as the reshaping does not change the **size** of the array).

In [None]:
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
x

In [None]:
x.shape

In [None]:
x.reshape(5, 2)

In [None]:
x.reshape(2, 5)

In [None]:
x.reshape(10, 1)

In [None]:
x.reshape(1, 10)

## Use `-1` to make the computer figure it out.

In [None]:
w = np.array([1,2,3,4,5,6,7,8,9,10,11,12])
w.reshape(-1, 2)

In [None]:
w.reshape(-1, 3)

In [None]:
w.reshape(4, -1)

In [None]:
w.reshape(-1, 2, 3)

In [None]:
w.reshape(6, -1)

## But it doesn't work if the size is not divisible:

In [None]:
w.reshape(-1, 5)  # Raises ValueError: 12 elements cannot fill rows of 5.

## And you can't have more than one `-1`

In [None]:
w.reshape(-1, 2, -1)  # Raises ValueError: only one dimension can be inferred.
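
The rule behind `-1` is plain arithmetic: the missing dimension is the total size divided by the product of the dimensions you did supply. A quick sketch, with `known` and `inferred` as illustrative names:

```python
import numpy as np

w = np.arange(1, 13)  # 12 elements

# For reshape(-1, 2, 3) the known dimensions multiply to 6,
# so the unknown one must be 12 // 6 = 2.
known = 2 * 3
inferred = w.size // known

print(inferred)                    # 2
print(w.reshape(-1, 2, 3).shape)   # (2, 2, 3)
```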

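Reshaping is very efficient because it does **not** make a copy of the array!  In cases like this one, the reshaped array is a view that shares its data with the original, so changing one changes the other.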

In [None]:
y = x.reshape(2, 5)
y

In [None]:
y[1, 0] = 999
y

In [None]:
x

To make a copy of an array, use the `copy` method:

In [None]:
x = np.arange(10)
y = x.copy()
y[0] = 999

In [None]:
y

In [None]:
x
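
If you are ever unsure whether two arrays share data, `np.shares_memory` can tell you. A short sketch comparing a reshape with a copy (the names `view` and `dup` are illustrative):

```python
import numpy as np

x = np.arange(10)
view = x.reshape(2, 5)  # shares x's data
dup = x.copy()          # owns its own data

print(np.shares_memory(x, view))  # True
print(np.shares_memory(x, dup))   # False
```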

## Creating Numpy Arrays

### np.zeros

`np.zeros` creates a new array of a given size, and fills it with zeros.

In [None]:
np.zeros(10)

You can also create a **multi-dimensional** array by passing a tuple as the shape:

In [None]:
np.zeros((5, 3))

## And you can go well beyond two dimensions (NumPy caps the count at 32, or 64 from version 2.0):

In [None]:
np.zeros((2,3,4,5,4,3,2))

### np.ones

This is just like `np.zeros`, but it fills the array with ones.

In [None]:
x = np.ones(10)
x

In [None]:
for idx, n in enumerate(range(10)):
    # Some expensive computation...
    x[idx] = n

In [None]:
x

### np.full

If you want another constant in your array, use `np.full`.

In [None]:
np.full(shape=(3, 4), fill_value=np.pi)

In [None]:
np.full((3, 4), np.pi)

### Or multiply ones by a constant:

In [None]:
np.ones((3,4))*np.pi

### np.linspace

`np.linspace` creates an equally spaced grid of numbers between two endpoints.

(An odd number of points usually works well: both endpoints are included, so `num=11` yields ten equal intervals.)

In [None]:
np.linspace(0, 1, num=11)

In [None]:
np.linspace(0, 10, num=11)

In [None]:
np.linspace(0, 1, num=21)
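
The spacing follows from the fact that both endpoints are included. A small sketch that checks the step size by hand (`grid` and `step` are illustrative names):

```python
import numpy as np

start, stop, num = 0, 1, 11
grid = np.linspace(start, stop, num=num)

# With both endpoints included there are num - 1 gaps between points.
step = (stop - start) / (num - 1)

print(step)                              # 0.1
print(np.allclose(np.diff(grid), step))  # True
```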

### np.logspace
`np.logspace` creates a set of points that are evenly spaced on a logarithmic scale, including both endpoints.

In [None]:
# Arguments: start exponent, stop exponent, number of points.
# The points run from 10**0 to 10**7.
np.logspace(0, 7, 8)
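
One way to think about `np.logspace` is as `np.linspace` applied to the exponents. A sketch of that equivalence, assuming the default base of 10:

```python
import numpy as np

powers = np.logspace(0, 7, 8)

# Space the exponents evenly, then raise 10 to each of them.
by_hand = 10.0 ** np.linspace(0, 7, 8)

print(np.allclose(powers, by_hand))  # True
```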

### np.arange

`np.arange` is just like the built-in `range`, but it makes an array.

In [None]:
np.arange(10)

In [None]:
np.arange(2, 10)

In [None]:
np.arange(0, 10, 2)
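
A quick sketch comparing the two, plus one difference worth knowing: unlike `range`, `np.arange` accepts non-integer steps. The names `evens` and `quarters` are illustrative.

```python
import numpy as np

evens = np.arange(0, 10, 2)
print(np.array_equal(evens, list(range(0, 10, 2))))  # True

# range(0, 1, 0.25) would raise a TypeError; arange handles it.
quarters = np.arange(0, 1, 0.25)
print(quarters)  # 0.0, 0.25, 0.5, 0.75
```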

### np.random.uniform & np.random.normal

The `np.random` subpackage contains some functions for creating arrays of random numbers.  These two are the most useful, but there are more!

In [None]:
np.random.uniform(low=0.0, high=10.0, size=10).round(2)

In [None]:
unif = np.random.uniform(low=0.0, high=10.0, size=10**4)

fig, ax = plt.subplots(figsize=(10, 4))
_ = ax.hist(unif, bins=100, color="green")

In [None]:
np.random.normal(loc=0.0, scale=1.0, size=10)#.round(2)

In [None]:
normal = np.random.normal(loc=0.0, scale=1.0, size=10**7)

fig, ax = plt.subplots(figsize=(10, 4))
_ = ax.hist(normal, bins=100, color="green")
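
Random draws change on every run unless you fix a seed. One possible approach, sketched here, uses `np.random.default_rng`, numpy's newer generator interface; the seed value 123 is arbitrary.

```python
import numpy as np

# Two generators built from the same seed produce identical draws.
rng_a = np.random.default_rng(123)
rng_b = np.random.default_rng(123)

draws_a = rng_a.uniform(low=0.0, high=10.0, size=5)
draws_b = rng_b.uniform(low=0.0, high=10.0, size=5)

print(np.array_equal(draws_a, draws_b))  # True
```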

## Broadcasting

### Broadcasting: Equal Length Arrays

One of the most useful features of arrays is called **broadcasting**.  At its most basic, broadcasting means that any arithmetic operation applied to an array is interpreted as being applied **element by element**.

In [None]:
x = np.array([1, 2, 3, 4, 5,  6])
y = np.array([2, 4, 6, 8, 10, 12])

In [None]:
x + y

In [None]:
x - y

In [None]:
x * y

In [None]:
y / x

In [None]:
y % x
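
To make "element by element" concrete, here is a sketch that writes the same sum as an explicit Python loop and checks that the results agree (`looped` is an illustrative name):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([2, 4, 6, 8, 10, 12])

# The same sum written as an explicit loop over pairs of elements.
looped = np.array([a + b for a, b in zip(x, y)])

print(np.array_equal(x + y, looped))  # True
```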

### Broadcasting With a Constant

This is all pretty clear cut when the arrays have the **same shape**, but it's more powerful than that.

You can replace one of the arrays with a **constant** and that constant will be treated as if it were an entire array:

In [None]:
2 * x

In [None]:
y / 2

In [None]:
x % 2

You can also broadcast with comparison operators.  This is **very useful** when combined with boolean indexing, which we'll talk about later:

In [None]:
x <= 3

In [None]:
y / 2 == x

If you need to combine boolean arrays with logical operators, broadcasting also applies to the `&` (and) and `|` (or) operators:

In [None]:
(x <= 2) | (x >= 5)

In [None]:
(x >= 2) & (x <= 5)
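
The parentheses in these expressions matter, because `&` and `|` bind more tightly than the comparison operators. A sketch that also shows `np.logical_and` as an equivalent, more explicit spelling:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])

# Parentheses are required: & binds more tightly than >= and <=.
mask = (x >= 2) & (x <= 5)

# np.logical_and spells out the same element-wise operation.
same = np.logical_and(x >= 2, x <= 5)

print(mask)                        # [False  True  True  True  True False]
print(np.array_equal(mask, same))  # True
```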

### Broadcasting Multi-dimensional Arrays

Broadcasting also works for multi-dimensional arrays, but it takes some practice and getting used to.

#### Restrictions
Two arrays can be broadcast together only when, comparing their shapes from the last dimension backwards, each pair of dimensions is either equal or one of them is 1.
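
You can test whether shapes are compatible without building any arrays. A sketch using `np.broadcast_shapes`, which is assumed to be available here (it was added in NumPy 1.20):

```python
import numpy as np

# Shapes are compared from the last dimension backwards.
print(np.broadcast_shapes((5, 4), (4,)))    # (5, 4)
print(np.broadcast_shapes((5, 4), (5, 1)))  # (5, 4)

# (5, 4) against (5,) fails: the trailing dimensions are 4 and 5.
try:
    np.broadcast_shapes((5, 4), (5,))
    compatible = True
except ValueError:
    compatible = False

print(compatible)  # False
```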

In [None]:
x = np.ones((5, 4))
x

The constant (scalar) case is the same as it ever was:

In [None]:
2 * x

But now things get pretty interesting.  If you broadcast with a one dimensional array, cool stuff happens:

In [None]:
b = np.array([1, 2, 3, 4])
x * b

Note that `b` must have the correct shape for this to happen; refer to the restrictions above.

In [None]:
print("Shape of x:", x.shape)
print("Shape of b:", b.shape)

To get the same behaviour, but with the **rows** scaled, we need to do some gymnastics.

In [None]:
b = np.array([1, 2, 3, 4, 5])
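# x * b  # Uncommenting this raises ValueError: shapes (5, 4) and (5,) do not broadcast.

This doesn't work because broadcasting lines shapes up from the last dimension: `b` has length 5, but the last dimension of `x` has length 4.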

In [None]:
print("Shape of x:", x.shape)
print("Shape of b:", b.shape)

To get this to work, we need to reshape `b`.

In [None]:
b_reshaped = b.reshape((5, 1))

Note how both the first dimensions **and the number of dimensions** of the arrays now match:

In [None]:
print("Shape of x         :", x.shape)
print("Shape of b_reshaped:", b_reshaped.shape)

When two arrays have the same number of dimensions, they will broadcast as long as each pair of dimensions **either** matches, or one of them is **exactly one**:

In [None]:
x * b_reshaped

The unit length dimensions are **stretched** until the arrays have the same shape, then they are broadcast.
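
The stretching can be made visible with `np.broadcast_to`, which performs it explicitly. A minimal sketch (the name `stretched` is illustrative):

```python
import numpy as np

x = np.ones((5, 4))
b_reshaped = np.array([1, 2, 3, 4, 5]).reshape((5, 1))

# Stretch the length-one dimension explicitly to match x.
stretched = np.broadcast_to(b_reshaped, x.shape)

print(stretched.shape)                                # (5, 4)
print(np.array_equal(x * b_reshaped, x * stretched))  # True
```

Note that `np.broadcast_to` returns a read-only view, so no data is actually duplicated.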

**Note:** We could also accomplish the first example with the same reshape principle.

In [None]:
b = np.array([1, 2, 3, 4])
b_reshaped = b.reshape((1, 4))

Note how all non-unit dimensions match:

In [None]:
print("Shape of x         :", x.shape)
print("Shape of b_reshaped:", b_reshaped.shape)

So we can broadcast:

In [None]:
x * b_reshaped
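
As an alternative to `reshape`, you will often see `np.newaxis` used inside an index to add a length-one dimension. A sketch of both spellings side by side (`row` and `col` are illustrative names):

```python
import numpy as np

b = np.array([1, 2, 3, 4])

row = b[np.newaxis, :]  # shape (1, 4), same as b.reshape((1, 4))
col = b[:, np.newaxis]  # shape (4, 1), same as b.reshape((4, 1))

print(row.shape, col.shape)  # (1, 4) (4, 1)
```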

## Indexing Numpy Arrays

Numpy arrays support all the usual indexing shenanigans that lists do, so we won't comment on those any more.

### Indexing Multi-dimensional Arrays

The usual list style indexing extends to multi-dimensional arrays:

In [None]:
x = np.arange(20).reshape((5, 4))
x

In [None]:
# First two rows.
x[:2, :]

In [None]:
# First two columns.
x[:, :2]

In [None]:
# First two rows and first two columns
x[:2, :2]

In [None]:
# Even numbered rows and columns
x[::2, ::2]

You can combine this with assignment to surgically alter arrays:

In [None]:
# x holds integers, so the mean (9.5) is truncated to 9 on assignment.
x[:2, :2] = np.mean(x)
x

### Fancy Indexing

You can index an array with **another array** (or a list), and this is often referred to as **fancy indexing**.

In [None]:
x = np.arange(0, 20, 2)
x

In [None]:
x[[0, 0, 4, 4, 2, 2]]

In [None]:
colors = np.array(["red", "blue"])
idx = np.array([0, 0, 1, 1, 0, 0, 1, 1])
colors[idx]

This also works for multi-dimensional arrays, but the results can be confusing.

In [None]:
x = np.arange(20).reshape((5, 4))
x

In [None]:
x[[0, 1, 2, 3], [0, 1, 2, 3]]

Again, you can use this to do surgery:

In [None]:
x[[0, 1, 2, 3], [0, 1, 2, 3]] = 999
x
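
The confusing part is that the two index lists are paired up element by element, rather than selecting a block of rows and columns. A sketch that spells out the pairing (`rows`, `cols`, `picked` and `by_hand` are illustrative names):

```python
import numpy as np

x = np.arange(20).reshape((5, 4))
rows = [0, 1, 2, 3]
cols = [0, 1, 2, 3]

# Index lists are paired up: (0, 0), (1, 1), (2, 2), (3, 3).
picked = x[rows, cols]
by_hand = np.array([x[r, c] for r, c in zip(rows, cols)])

print(picked)                           # [ 0  5 10 15]
print(np.array_equal(picked, by_hand))  # True
```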

### Boolean Indexing

If you have an array of booleans (i.e., `True`s and `False`s), you can use these as indexes too.  You'll get a new array containing only those elements that line up with a `True` in your index array.

In [None]:
x = np.arange(10)
x

In [None]:
x[[True, True, False, False, True, True, False, False, True, True]]

This is **very** useful when combined with broadcasting to create boolean index arrays.

In [None]:
x % 2 == 0

In [None]:
# Subset to the even entries.
x[x % 2 == 0]

Note that it's easy to do the wrong thing here!

In [None]:
# Not a boolean mask: x % 2 holds the integers 0 and 1, so this is fancy indexing.
x[x % 2]

Both of these are useful, depending on your intention.
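
A side-by-side sketch of the two forms, to show how differently they behave (`mask` and `idx` are illustrative names):

```python
import numpy as np

x = np.arange(10)

mask = x % 2 == 0  # booleans: keep the elements where the mask is True
idx = x % 2        # integers 0 and 1: look up x[0] or x[1] each time

print(x[mask])  # [0 2 4 6 8]
print(x[idx])   # [0 1 0 1 0 1 0 1 0 1]
```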

You can also use this technique to select specific rows or columns in a multi-dimensional array:

In [None]:
x = np.arange(20).reshape((5, 4))
x

In [None]:
x[[True, False, True, True, False], :]

In [None]:
x[:, [False, True, False, True]]

And, of course, this allows for some surgical operations:

In [None]:
x[[True, False, True, True, False], :] = 999
x

When combined with broadcasting, this can get you to some neat places.

In [None]:
np.random.seed(123)
x = np.random.randint(0, 10, size=(6, 10))
x

Only the columns where the value in the first row is bigger than five.

In [None]:
x[:, x[0, :] > 5]

Only the rows where the value in the first column is bigger than five.

In [None]:
x[x[:, 0] > 5, :]

## Array Methods and Axes
Arrays have many, many useful methods, and some of them have an argument called `axis` that increases their utility.
### Math Stuff
Most of the day-to-day mathematical functions have representatives in numpy.

In [None]:
x = np.linspace(0, 1, 11)
x

In [None]:
np.exp(x)

In [None]:
np.log(x)  # x[0] is 0, so the first entry is -inf and numpy emits a RuntimeWarning.

In [None]:
np.sin(2 * np.pi * x)

In [None]:
np.cos(2 * np.pi * x)

In [None]:
np.sqrt(x)#.round(2)

### Sums and Averages

It's easy and efficient to take the sum or average (arithmetic mean) of an array:

In [None]:
x = np.arange(10)
x

In [None]:
np.sum(x)

In [None]:
np.mean(x)

When dealing with a multi-dimensional array, the default behaviour is to consume the entire thing:

In [None]:
x = np.arange(16).reshape((4, 4))
x

In [None]:
np.sum(x)

But you can also do **column sums** and **row sums** by supplying an axis argument: `axis=0` sums down each column, and `axis=1` sums across each row.

In [None]:
np.sum(x, axis=0)

In [None]:
np.sum(x, axis=1)
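
It helps to think of `axis` as the dimension that gets collapsed. A sketch on the same 4 by 4 array, with the results worked out by hand (`col_sums` and `row_sums` are illustrative names):

```python
import numpy as np

x = np.arange(16).reshape((4, 4))

col_sums = np.sum(x, axis=0)  # collapses the rows: one sum per column
row_sums = np.sum(x, axis=1)  # collapses the columns: one sum per row

print(col_sums)  # [24 28 32 36]
print(row_sums)  # [ 6 22 38 54]
```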

### Maximums and Minimums

Maximums and minimums work pretty much the same way as sums and averages.

In [None]:
np.random.seed(123)
x = np.random.randint(0, 100, size=10)
x

In [None]:
np.max(x)

In [None]:
np.min(x)

The behaviour for multi-dimensional arrays is the same as before:

In [None]:
np.random.seed(1234)

x = np.random.randint(0, 100, size=16).reshape((4, 4))
x

In [None]:
# Column maximums
np.max(x, axis=0)

In [None]:
# Row minimums
np.min(x, axis=1)

#### Argument Maximums and Minimums

Sometimes you don't need to know the maximum (or minimum), but **where the maximum occurs**.  This operation is called **argument maximum**.

In [None]:
x

In [None]:
np.argmax(x, axis=0)  # The row index of each column's maximum.

In [None]:
np.argmin(x, axis=1)  # The column index of each row's minimum.
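
The indices from `argmax` can be fed back into fancy indexing to recover the maximums themselves. A sketch of that round trip; it uses `np.random.default_rng` with an arbitrary seed of 0 rather than the array above, and `rows` and `cols` are illustrative names:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
x = rng.integers(0, 100, size=(4, 4))

rows = np.argmax(x, axis=0)   # row index of each column's maximum
cols = np.arange(x.shape[1])  # every column index, in order

# Pair each column with the row that holds its maximum.
print(np.array_equal(x[rows, cols], np.max(x, axis=0)))  # True
```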

This can be very useful.  For example, suppose we want to find the minimum of a graph and then plot where it occurs.  First, the graph itself:

In [None]:
f = lambda x: x**4 + 2*x**3 - 12*x**2 - 2*x + 6
x = np.linspace(-5, 4, num=250)

fig, ax = plt.subplots()
ax.plot(x, f(x))
_ = ax.set_ylim(-100, 100)

Now let's calculate the minimum and argmin and highlight it:

In [None]:
xmn, mn = x[np.argmin(f(x))], np.min(f(x))

fig, ax = plt.subplots()
ax.plot(x, f(x))
ax.scatter(xmn, mn, s=100)
_ = ax.set_ylim(-100, 100)

# Assignment
* visit course-outline to get link to repo
* fork
* clone
* complete assignment