In [2]:
import numpy as np

# Numpy Arrays vs Python Lists

Like Python lists, Numpy arrays are ordered sequences of data.  However, their properties differ from those of lists in a few ways:

| Property | Lists | Arrays |
| :--:     | :--:  | :--:   |
| Ordered  | ✔️    | ✔️ |
| Mutable | ✔️    | ✔️ |
| Can Mix Data Types | ✔️ |   |
| Append Data without Copying Whole Structure | ✔️  |  |
| Broadcastable |  | ✔️ |
| Fast Calculations |   | ✔️ |



### Exercises 
Let's explore each property and compare lists and arrays.

#### Ordered Index

Index the third element of these data collections:

List:

In [None]:
x = [10, 20, 30, 40, 50]

Array:

In [3]:
x = np.array([10, 20, 30, 40, 50])

#### Ordered Slicing

Slice out the second through fourth elements of these data collections:

List:

In [None]:
x = [10, 20, 30, 40, 50]

Array:

In [3]:
x = np.array([10, 20, 30, 40, 50])

#### "Mutate" a Value inside a Collection

Change the third element of these data collections to the value "A":
```python
data[index] = value
```

List:

In [None]:
x = ["A", "C", "G", "G", "C", "T"]

Array:

In [3]:
x = np.array(["A", "C", "G", "G", "C", "T"])

#### Mixing Data Types

Change the third element of these data collections to the value 40:
```python
data[index] = value
```

List:

In [None]:
x = ["A", "C", "G", "G", "C", "T"]

Array:

In [3]:
x = np.array(["A", "C", "G", "G", "C", "T"])
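As a rough sketch of what "Can Mix Data Types" means in the table above: a list stores whatever you put into it, while an array keeps a single dtype and converts new values to fit. The variable names below are made up for illustration and use different data from the exercise.

```python
import numpy as np

# A list can hold values of different types side by side.
mixed_list = ["A", "C", "G"]
mixed_list[2] = 40
print(mixed_list)          # ['A', 'C', 40]

# An array has one dtype for all elements; new values are converted to fit it.
int_array = np.array([10, 20, 30])
int_array[2] = 2.7         # the float is truncated to fit the integer dtype
print(int_array)           # [10 20  2]

# Mixing types at creation time makes Numpy pick one common dtype (here: text).
coerced = np.array([1, "A"])
print(coerced.dtype.kind)  # U (a unicode string dtype)
```

Checking an array's `dtype` attribute is a quick way to see which type Numpy settled on.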

#### Append Values

Append a new value to the end of these data:

List:
```python
data.append(value)
```

In [None]:
x = ["A", "C", "G", "G", "C", "T"]

Array:
```python
np.append(data, value)
```

In [7]:
x = np.array(["A", "C", "G", "G", "C", "T"])

### Broadcasting

Run the following code, which multiplies every value in the collection by 10:

List:

In [9]:
data = [1, 2, 3, 4, 5]
data

[1, 2, 3, 4, 5]

In [10]:
data10 = [x * 10 for x in data]
data10

[10, 20, 30, 40, 50]

Array:

In [11]:
data = np.array([1, 2, 3, 4, 5])
data

array([1, 2, 3, 4, 5])

In [12]:
data10 = data * 10
data10

array([10, 20, 30, 40, 50])
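To see why the list version needs a comprehension, here is a small sketch (with made-up variable names) of what `*` does to a list versus an array, plus one example of broadcasting between arrays of different shapes.

```python
import numpy as np

values = [1, 2, 3]

# Multiplying a list repeats it; it does not multiply the elements.
repeated = values * 2
print(repeated)                 # [1, 2, 3, 1, 2, 3]

# Multiplying an array applies the operation to every element.
scaled = np.array(values) * 2
print(scaled)                   # [2 4 6]

# Broadcasting also stretches a 1-D array across the rows of a 2-D array.
table = np.array([[1, 2, 3],
                  [4, 5, 6]])
stretched = table * np.array([1, 10, 100])
print(stretched)
# [[  1  20 300]
#  [  4  50 600]]
```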

### Fast Calculations

Run the following code, which multiplies every value in the collection by 10.  This time, take a look at how long it takes to run.  Which is faster?

Note: the `%%timeit` magic command runs the cell many times and reports the mean time (± standard deviation) that each run of the cell's code took.

List:

In [22]:
data = list(range(0, 1_000_000))
len(data)

1000000

In [23]:
%%timeit
data10 = [x * 10 for x in data]

85.3 ms ± 1.88 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


Array:

In [24]:
data = np.arange(0, 1_000_000)
data

array([     0,      1,      2, ..., 999997, 999998, 999999])

In [25]:
%%timeit
data10 = data * 10

1.56 ms ± 40.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
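`%%timeit` only works inside IPython or Jupyter. If you want to repeat the comparison in a plain Python script, one possible sketch uses the standard-library `timeit` module. The collection size and the `number=10` repeat count here are arbitrary choices for illustration, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np

data_list = list(range(100_000))
data_array = np.arange(100_000)

# Time 10 runs of each approach; timeit.timeit returns the total seconds taken.
list_seconds = timeit.timeit(lambda: [x * 10 for x in data_list], number=10)
array_seconds = timeit.timeit(lambda: data_array * 10, number=10)

print(f"list:  {list_seconds / 10 * 1000:.3f} ms per run")
print(f"array: {array_seconds / 10 * 1000:.3f} ms per run")
```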


### Fast Append

Run the following code, which appends a new value to a collection 1000 times.  Which is faster?


List:

In [27]:
%%timeit
data = []  # an empty list
for _ in range(1000):  # repeat the same code 1000 times
    data.append("A")


72.5 µs ± 2.21 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


Array:

In [28]:
%%timeit
data = np.array([], dtype=str)  # an empty array
for _ in range(1000):  # repeat the same code 1000 times
    data = np.append(data, "A")

5.76 ms ± 143 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
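The array version is slower because `np.append` returns a new array each time rather than growing the existing one in place. Two common workarounds are sketched below; the variable names are made up, and which option suits you depends on whether you know the final length in advance.

```python
import numpy as np

# Option 1: grow a list (cheap appends), then convert once at the end.
collected = []
for _ in range(1000):
    collected.append("A")
from_list = np.array(collected)

# Option 2: if the final length is known, create the array up front and fill it in place.
prefilled = np.empty(1000, dtype="<U1")   # room for 1000 one-character strings
for i in range(1000):
    prefilled[i] = "A"

print(from_list.shape, prefilled.shape)      # (1000,) (1000,)
print(np.array_equal(from_list, prefilled))  # True
```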


### Summary
So long as your data is complete and well-organized, arrays are quite handy!  They are simple to work with and can crunch a lot of numbers in a short time.  On the other hand, if you need to mix data types or grow a collection one item at a time, lists are the better fit.

## Multidimensional Arrays with Numpy


Numpy arrays can be multidimensional: they can be squares, cubes, hypercubes, etc!  When choosing data structures, arrays are the best choice when all of the values in the structure represent the same variable.

With multidimensional arrays, everything is pretty much the same as in the 1-dimensional case, with the addition of a few options for specifying which order the dimensions should be in and which dimension an operation should work on.

### Creating Multidimensional Arrays

Most of the array-generation functions take an optional **shape** or **size** argument.  If you pass a tuple giving the number of elements along each dimension (e.g. (5, 3) for a matrix with 5 rows and 3 columns), the function returns a multidimensional array!

```python
>>> data = np.random.randint(1, 10, size=(4, 5))
>>> data
array([[9, 7, 4, 2, 3],
       [3, 6, 7, 4, 8],
       [3, 6, 8, 7, 3],
       [6, 9, 4, 2, 2]])
```
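A few other creation functions that accept a shape tuple are sketched here. Which keyword each one uses (`shape` or `size`) varies by function, so check the function's signature if unsure.

```python
import numpy as np

zeros = np.zeros((2, 3))                # shape passed positionally
ones = np.ones(shape=(3, 2))            # or by keyword
filled = np.full((2, 2), 7)             # every element set to 7
normal = np.random.normal(size=(4, 5))  # random functions use `size`

print(zeros.shape, ones.shape, filled.shape, normal.shape)
# (2, 3) (3, 2) (2, 2) (4, 5)
```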

For cases where there is no such option, all arrays have a **reshape()** method that returns the same data arranged in a new shape.  To collapse an array to a single dimension, you can use the **flatten()** method.

```python
>>> data.reshape(2, 10)
array([[9, 7, 4, 2, 3, 3, 6, 7, 4, 8],
       [3, 6, 8, 7, 3, 6, 9, 4, 2, 2]])

>>> data.flatten()
array([9, 7, 4, 2, 3, 3, 6, 7, 4, 8, 3, 6, 8, 7, 3, 6, 9, 4, 2, 2])
```

Numpy also has some auto-calculation features to make it a bit easier to get the shape you need:

```python
>>> data.reshape(-1, 5)  # -1 tells the reshape() method to calculate the value in that spot
array([[9, 7, 4, 2, 3],
       [3, 6, 7, 4, 8],
       [3, 6, 8, 7, 3],
       [6, 9, 4, 2, 2]])

>>> data.flatten()[np.newaxis, :]  # Makes a 1xN array
>>> data.flatten()[None, :]  # Also Makes a 1xN array
>>> data.flatten()[:, None]  # Makes an Nx1 array
```
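To check what those indexing tricks produce, you can print the resulting shapes. This sketch uses a stand-in 1-D array named `flat` (not from the original) in place of `data.flatten()`.

```python
import numpy as np

flat = np.arange(20)                 # stand-in for data.flatten(): 20 values in 1-D

print(flat.shape)                    # (20,)
print(flat[np.newaxis, :].shape)     # (1, 20)
print(flat[None, :].shape)           # (1, 20)
print(flat[:, None].shape)           # (20, 1)
print(np.newaxis is None)            # True: np.newaxis is just a readable alias for None
print(flat.reshape(4, -1).shape)     # (4, 5): the -1 was calculated as 5
```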

And if an array has some extra dimensions you don't care about (like a 32x1x1 array, when you just want a 1-dimensional array of 32 elements), you can use the **squeeze()** method to squeeze out those extra dimensions!
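A small sketch of **squeeze()** on the 32x1x1 case, using a made-up array named `padded`:

```python
import numpy as np

padded = np.zeros((32, 1, 1))

print(padded.shape)                  # (32, 1, 1)
print(padded.squeeze().shape)        # (32,): every length-1 dimension is removed
print(padded.squeeze(axis=2).shape)  # (32, 1): only the named axis is removed
```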

Finally, you can find out the shape of an array by getting its **shape** attribute.  And to get the total number of elements, check its **size** attribute.

```python
>>> data.shape
(4, 5)

>>> data.size
20
```

#### Exercises

Generate a 3 x 10 array of random integers between 1 and 4.

Make an array with all the integers from 0 to 11, and reshape it into a 3 x 4 matrix...

...Reshape the previous array into a 4 x 3 matrix...

...Reshape that array into a 2 x 6 matrix...

...Then flatten it.

Confirm its shape.  Is it the same as its size?

### Reordering Dimensions

There are many ways to transpose matrices:
  - array.T
  - array.transpose()
  - np.transpose(array)
  - array.swapaxes(0, 1)

Try using each of them on the array **x**.

In [16]:
x = np.arange(12).reshape(3, 4)
x

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

**Discussion**: Let's try out each of them.  Why does Numpy have these options?  What's the benefit?
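One way to start the discussion is to compare the options on an array with more than two dimensions, where they stop being interchangeable. This is only a sketch; the 3-D array `cube` is made up for illustration.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

# For a 2-D array, all four spellings give the same 4 x 3 result.
print(x.T.shape)                 # (4, 3)
print(x.transpose().shape)       # (4, 3)
print(np.transpose(x).shape)     # (4, 3)
print(x.swapaxes(0, 1).shape)    # (4, 3)

# With more dimensions, the options start to differ.
cube = np.zeros((2, 3, 4))
print(cube.T.shape)                    # (4, 3, 2): reverses all axes
print(cube.transpose(1, 0, 2).shape)   # (3, 2, 4): any axis order you choose
print(cube.swapaxes(0, 2).shape)       # (4, 3, 2): exchanges exactly two axes
```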

### Aggregating Across Axes

Almost all of the Numpy aggregation functions have an **axis** option, which names the axis that the operation is applied along (and that gets collapsed in the result).

For example, to get the mean of each column (averaging across the rows, axis 0):

```python
>>> array = np.arange(12).reshape(3, 4)
>>> array.mean(axis=0)
array([4., 5., 6., 7.])
```

And the mean of each row (averaging across the columns, axis 1):

```python
>>> array.mean(axis=1)
array([1.5, 5.5, 9.5])
```

Notice that the number of dimensions goes down by default whenever you aggregate across an axis.  If you'd like to keep the number of dimensions the same, you can also use the **keepdims=True** option:

```python
>>> array.mean(axis=1, keepdims=True)
array([[1.5],
       [5.5],
       [9.5]])
```
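One place where **keepdims=True** pays off is broadcasting the result back against the original array. The sketch below subtracts each row's mean from that row; without `keepdims=True` the mean would have shape (3,), which does not line up with the (3, 4) array.

```python
import numpy as np

array = np.arange(12).reshape(3, 4)

row_means = array.mean(axis=1, keepdims=True)   # shape (3, 1)
centered = array - row_means                    # (3, 4) - (3, 1) broadcasts row by row
print(centered)
# [[-1.5 -0.5  0.5  1.5]
#  [-1.5 -0.5  0.5  1.5]
#  [-1.5 -0.5  0.5  1.5]]
```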

**Exercises**: Try it out for yourself, with the provided array "data":

In [8]:
np.random.seed(42)
data = np.random.randint(0, 10, size=(5, 3)) * [1, 10, 100]
data

array([[  6,  30, 700],
       [  4,  60, 900],
       [  2,  60, 700],
       [  4,  30, 700],
       [  7,  20, 500]])

1. What is the mean of each column?

2. What is the standard deviation of each row?

3. What is the mean of the column medians?

## Indexing Exercises

Numpy arrays are indexed and sliced the same way as other sequences, but they can have multiple dimensions (rows, columns, etc.), each with its own index or slice, separated by commas.

```python
data = np.array([[0, 1, 2,  3],
                 [4, 5, 6,  7],
                 [8, 9, 10, 11]])
second_row = data[1, :]
third_column = data[:, 2]
```
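Besides integers and slices, an array can also be indexed with a boolean array of the same shape, which selects the elements where the condition is true. Here is a sketch on the small `data` array from above; the name `mask` is made up.

```python
import numpy as np

data = np.array([[0, 1, 2,  3],
                 [4, 5, 6,  7],
                 [8, 9, 10, 11]])

mask = data > 6            # a boolean array with the same shape as data
print(data[mask])          # [ 7  8  9 10 11]

print(data[-1, -1])        # 11: negative indices count from the end
print(data[0:2, 1:3])      # a block: rows 0-1, columns 1-2
# [[1 2]
#  [5 6]]
```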

Using the example dataset *scores*, select only the described elements from the array:

In [49]:
scores = np.arange(1, 49).reshape(6, 8)
scores

array([[ 1,  2,  3,  4,  5,  6,  7,  8],
       [ 9, 10, 11, 12, 13, 14, 15, 16],
       [17, 18, 19, 20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29, 30, 31, 32],
       [33, 34, 35, 36, 37, 38, 39, 40],
       [41, 42, 43, 44, 45, 46, 47, 48]])

The first score in the 2nd row:

The third-through-fifth columns:

The last score:

The 2nd through 5th score, in the 6th column:

All the scores greater than 20:

The rectangle with corners at scores 19, 22, 35, and 38:

The rectangle with corners at scores 42, 44, 12, and 10:

### Setting New Values

For arrays, indexing can also be used to assign a new value.  Let's try it out, using the following pattern:

```python
data[0, :] = 10  # changes all values in the first row to 10
data
```

In [11]:
scores = np.arange(1, 49).reshape(6, 8)
scores

array([[ 1,  2,  3,  4,  5,  6,  7,  8],
       [ 9, 10, 11, 12, 13, 14, 15, 16],
       [17, 18, 19, 20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29, 30, 31, 32],
       [33, 34, 35, 36, 37, 38, 39, 40],
       [41, 42, 43, 44, 45, 46, 47, 48]])

Change the 3rd column to all 10s:

Change the last score to 999:

Change the 4th row to 0:

Change the 5th column to NaN (i.e. np.nan):
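A hint for the last exercise: NaN is a floating-point value, so an integer array cannot store it. The sketch below shows this on a small made-up array named `ints`, assuming that converting with `astype(float)` is acceptable for your data.

```python
import numpy as np

ints = np.arange(6).reshape(2, 3)     # integer dtype

try:
    ints[:, 0] = np.nan               # NaN has no integer representation
except ValueError as error:
    print("integer array:", error)

floats = ints.astype(float)           # a float copy can hold NaN
floats[:, 0] = np.nan
print(floats)
# [[nan  1.  2.]
#  [nan  4.  5.]]
```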