In [1]:
import numpy as np

## Arrays and Vectorization

**Arrays** are sequences of same-type values (most often numbers).  Numpy lets us operate on the whole sequence at once, without writing a for-loop, using a technique called **vectorization**.
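
As a minimal sketch of the difference, the loop and the vectorized expression below compute the same doubled values. The names `numbers`, `doubled_loop`, and `doubled_vec` are illustrative only and are not used elsewhere in this notebook.

```python
import numpy as np

numbers = [0, 1, 2, 3, 4]

# Loop version: visit each element explicitly.
doubled_loop = []
for n in numbers:
    doubled_loop.append(n * 2)

# Vectorized version: one expression is applied to every element.
values = np.array(numbers)
doubled_vec = values * 2

print(doubled_loop)   # [0, 2, 4, 6, 8]
print(doubled_vec)    # [0 2 4 6 8]
```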

Besides the array type (created with **np.array()**), Numpy also includes many math functions, which make analysis much easier.  Let's try some out!

## Numpy Exercises

### Building Arrays

Numpy has some convenient array-building functions as well.  Some commonly used examples are **arange()**, **linspace()**, **zeros()**, and the random number generation functions in **random**.

| Function | Purpose |  Example |
| :-----------: | :-------------: | :-------------: |
| **np.arange()**                  | Makes an array of the integers from a start value up to, but not including, a stop value | np.arange(2, 7) |
| **np.linspace()**               | Makes an array of a given length, evenly spaced between two values (both included) |  np.linspace(2, 3, 10) |
| **np.zeros()**                    | Makes an array of all zeros | np.zeros(5) |
| **np.ones()**                     | Makes an array of all ones | np.ones(3) |
| **np.random.random()** | Makes an array of random numbers drawn uniformly from [0, 1) | np.random.random(100) |
| **np.random.randn()**     | Makes an array of random numbers drawn from the standard normal distribution | np.random.randn(100) |
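
The endpoint behavior of the first two functions is easy to mix up, so here is a short sketch using the same calls as the table. The variable names `ints` and `points` are just for illustration.

```python
import numpy as np

ints = np.arange(2, 7)            # the stop value 7 is excluded
points = np.linspace(2, 3, 10)    # 10 values, both endpoints included

print(ints)                       # [2 3 4 5 6]
print(len(points))                # 10
print(points[0], points[-1])      # 2.0 3.0
```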


1. Make an array containing the numbers 1 to 15.

In [2]:
np.arange(1, 16)

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

In [5]:
np.array(list(range(1, 16)))

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

2. Make an array containing 20 zeros.

In [7]:
np.zeros(20)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0.])

3. Make an array containing 20 ones!

In [8]:
np.ones(20)

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1.])

In [9]:
np.ones(20) + 1

array([2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       2., 2., 2.])

In [16]:
np.full(20, 2)

array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

4. Generate an array of 10 random numbers from Numpy's **random** submodule, using any function you want.

In [17]:
np.random.random(10)

array([0.71689278, 0.77565895, 0.44661904, 0.97767436, 0.00450631,
       0.90676995, 0.96539536, 0.52741919, 0.56766366, 0.23340453])

In [18]:
np.random.seed(42)
np.random.random(10)

array([0.37454012, 0.95071431, 0.73199394, 0.59865848, 0.15601864,
       0.15599452, 0.05808361, 0.86617615, 0.60111501, 0.70807258])

In [20]:
np.random.randint(1, 10, size=100)

array([5, 2, 8, 6, 2, 5, 1, 6, 9, 1, 3, 7, 4, 9, 3, 5, 3, 7, 5, 9, 7, 2,
       4, 9, 2, 9, 5, 2, 4, 7, 8, 3, 1, 4, 2, 8, 4, 2, 6, 6, 4, 6, 2, 2,
       4, 8, 7, 9, 8, 5, 2, 5, 8, 9, 9, 1, 9, 7, 9, 8, 1, 8, 8, 3, 1, 8,
       3, 3, 1, 5, 7, 9, 7, 9, 8, 2, 1, 7, 7, 8, 5, 3, 8, 6, 3, 1, 3, 5,
       3, 1, 5, 7, 7, 9, 3, 7, 1, 4, 4, 5])

In [22]:
np.random.randint(1, 10, 100)

array([7, 7, 4, 7, 3, 6, 2, 9, 5, 6, 4, 7, 9, 7, 1, 1, 9, 9, 4, 9, 3, 7,
       6, 8, 9, 5, 1, 3, 8, 6, 8, 9, 4, 1, 1, 4, 7, 2, 3, 1, 5, 1, 8, 1,
       1, 2, 2, 6, 7, 5, 1, 1, 3, 2, 5, 6, 7, 4, 7, 8, 1, 6, 8, 5, 4, 2,
       6, 6, 1, 9, 6, 3, 4, 4, 3, 3, 3, 4, 7, 4, 9, 1, 8, 7, 2, 8, 1, 9,
       9, 2, 7, 3, 7, 9, 4, 1, 2, 1, 5, 5])

In [24]:
np.arange(1, 10, step=5)

array([1, 6])

In [26]:
np.linspace(1, 10, 5)

array([ 1.  ,  3.25,  5.5 ,  7.75, 10.  ])

## Statistics Methods on Arrays

Arrays have many useful math methods.  For example, to get the mean of an array of numbers:

```python
data = np.random.random(100)
data.mean()
```

**Exercise**: Calculate the statistics on the following numbers:

In [27]:
data = np.arange(2, 7)
data

array([2, 3, 4, 5, 6])

1. Get the mean of the data.

In [28]:
np.mean(data)

4.0

In [29]:
data.mean()

4.0

2. What is the sum of the data?

In [30]:
data.sum()

20

In [31]:
np.sum(data)

20

In [32]:
sum(data)

20

3. What is the maximum of the data?

In [33]:
data.max()

6

In [35]:
np.max(data)

6

4. What is the standard deviation of the data?

In [36]:
np.std(data)

1.4142135623730951

In [37]:
data.std()

1.4142135623730951

In [39]:
from scipy import stats
stats.sem(data)

0.7071067811865476
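
For context, `stats.sem` reports the standard error of the mean, which by default is the sample standard deviation (`ddof=1`) divided by the square root of the number of values. A rough check with Numpy alone, assuming the same `data` as above:

```python
import numpy as np

data = np.arange(2, 7)

# Sample standard deviation (divides by n - 1 instead of n).
sample_std = np.std(data, ddof=1)

# Standard error of the mean: sample std divided by sqrt(n).
sem = sample_std / np.sqrt(len(data))
print(sem)   # approximately 0.7071
```

Note that `np.std(data)` without `ddof=1` divides by n, which is why it gives a different number (about 1.414) than the sample standard deviation used here.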

In [45]:
np.sum(np.square(data))

90

In [None]:
np.square(data).sum()  # arrays have no .square() method, so call the np.square() function first

## Statistics Functions on Arrays

Numpy also has many useful statistics **functions**.  These take an array as input and can be found inside the **np** library.  Often the same functionality is available both as a Numpy function and as an array method, giving you the choice of how you'd like to use it.


For example, the following **method**:


```python
data = np.random.random(100)
data.mean()
```

can also be used as a **function**:


```python
data = np.random.random(100)
np.mean(data)
```

**Exercise**: Calculate the statistics on the following numbers, this time using the **function** version of the previous methods.

In [None]:
data = np.arange(2, 7)
data

array([2, 3, 4, 5, 6])

1. Get the mean of the data.

2. What is the sum of the data?

3. What is the minimum of the data?

4. What is the standard deviation of the data?
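
One possible solution, sketched with the function versions only. The result names (`mean_value`, `total`, `minimum`, `spread`) are illustrative choices.

```python
import numpy as np

data = np.arange(2, 7)        # array([2, 3, 4, 5, 6])

mean_value = np.mean(data)    # 4.0
total = np.sum(data)          # 20
minimum = np.min(data)        # 2
spread = np.std(data)         # about 1.414

print(mean_value, total, minimum, spread)
```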

## Arithmetic with Arrays

Arrays can also be added, subtracted, multiplied, and divided.  

For example, to add 10 to all values in an array:

```python
data = np.random.randn(5)
print(data)
print(data + 10)
```

Here is multiplying two arrays together: 

```python
print(data)
print(data * data)
```



**Exercises**: Modify the following arrays using the math operators (+, -, *, /).

In [46]:
data = np.arange(-3, 5)
data

array([-3, -2, -1,  0,  1,  2,  3,  4])

1. Multiply the data by 100

In [47]:
data * 100

array([-300, -200, -100,    0,  100,  200,  300,  400])

2. Add 40 to each value in the array.

In [48]:
data + 40

array([37, 38, 39, 40, 41, 42, 43, 44])

3. Divide the numbers by 100

In [49]:
data / 100

array([-0.03, -0.02, -0.01,  0.  ,  0.01,  0.02,  0.03,  0.04])

4. Subtract the data from itself.

In [51]:
data - data

array([0, 0, 0, 0, 0, 0, 0, 0])

In [52]:
np.array([1, 2, 3]) - np.array([2, 2])

ValueError: operands could not be broadcast together with shapes (3,) (2,) 
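
The error above comes from Numpy's broadcasting rules: element-wise operations need shapes that match, or a dimension of length 1 (or a plain scalar) that can be stretched to fit. A small sketch of what works and what does not, using an illustrative array `a`:

```python
import numpy as np

a = np.array([1, 2, 3])

print(a - 2)                  # a scalar is stretched: [-1  0  1]
print(a - np.array([2]))      # a length-1 array is stretched too: [-1  0  1]

try:
    a - np.array([2, 2])      # lengths 3 and 2 cannot be matched
except ValueError as err:
    print("ValueError:", err)
```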

## Extra Exercises

### Exercise: Other math functions

1. Calculate the square of all the numbers from 0 to 8.

In [53]:
data = np.arange(0, 9)
data * data

array([ 0,  1,  4,  9, 16, 25, 36, 49, 64])

In [54]:
np.square(data)

array([ 0,  1,  4,  9, 16, 25, 36, 49, 64])

In [55]:
data ** 2

array([ 0,  1,  4,  9, 16, 25, 36, 49, 64])

2. Calculate the square roots of all the numbers from 0 to 8.

In [57]:
np.sqrt(data)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712])

3. Make an array of 20 values, all of them 2's.

In [58]:
np.linspace(2, 2, 20)

array([2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       2., 2., 2.])

In [59]:
np.ones(20) * 2

array([2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       2., 2., 2.])

In [60]:
np.ones(20) + 1

array([2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       2., 2., 2.])

In [63]:
np.repeat(np.arange(6), 2, axis=0)

array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5])

In [67]:
np.repeat([np.arange(6)], 2, axis=0).T.flatten()

array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5])

4. Subtract the mean of the array from each element in the array (a.k.a. "mean-centering" the values)

In [87]:
data = np.random.randn(20)
data

array([-7.03244284e-01,  8.66577431e-01, -1.30590688e+00,  9.59273919e-01,
        1.75836071e+00,  1.45235640e+00,  1.32824234e-01, -1.03281344e+00,
        7.71233752e-04, -3.10889307e-01, -4.46691439e-01,  2.06549852e-01,
       -6.65464408e-02,  7.07843858e-01,  7.14344715e-01, -6.27985641e-02,
        4.32257285e-01,  1.32138525e+00, -5.31600554e-01,  1.58926419e-02])

In [89]:
data2 = data - np.mean(data)
data2.mean()

-6.938893903907228e-18

In [94]:
type(data2.mean())

numpy.float64

In [90]:
np.set_printoptions(precision=2, suppress=True)

In [101]:
"Results: {:.2f}".format(np.abs(data2.mean()))

'Results: 0.00'
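
The mean of the centered data is not exactly zero because of floating-point rounding. Rather than formatting the number, one option is to compare it to zero with a tolerance using `np.isclose`. This is a sketch with its own seeded data; `centered` is an illustrative name.

```python
import numpy as np

np.random.seed(0)                 # seed only so the sketch is repeatable
data = np.random.randn(20)
centered = data - data.mean()

# Tiny rounding error is expected, so test "close to zero" instead of "== 0".
print(np.isclose(centered.mean(), 0.0))   # True
```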

### Translating Algorithms into Code

Calculate the standard deviation of an array's values, without using the numpy.std() function.  (Formula can be found here: http://www.mathsisfun.com/data/standard-deviation-formulas.html)

1. Work out the Mean (the simple average of the numbers)
2. Then for each number: subtract the Mean and square the result
3. Then work out the mean of those squared differences.
4. Take the square root of that and we are done!
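
The four steps above can be sketched one line at a time. This version uses a small fixed array so the intermediate values are easy to check by hand; the variable names are illustrative.

```python
import numpy as np

data = np.array([2, 3, 4, 5, 6])

mean = data.mean()                     # step 1: the mean (4.0)
squared_diffs = (data - mean) ** 2     # step 2: [4. 1. 0. 1. 4.]
variance = squared_diffs.mean()        # step 3: mean of squared differences (2.0)
std_dev = np.sqrt(variance)            # step 4: square root

print(std_dev)
print(np.isclose(std_dev, np.std(data)))   # True
```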


In [128]:
data = np.random.randn(10000000)
np.sqrt(np.sum((data - data.mean()) ** 2) / len(data))

0.9998051434512345

In [130]:
np.std(data)

0.9998051434512345

In [129]:
%%timeit
np.sqrt(np.sum((data - data.mean()) ** 2) / len(data))

56.4 ms ± 884 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [131]:
%%timeit
np.std(data)

57.8 ms ± 1.31 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [132]:
def std(data):
    return np.sqrt(np.sum((data - data.mean()) ** 2) / len(data))

std(data)

0.9998051434512345