In [1]:
%matplotlib inline

In [2]:
import numpy as np

## Arrays and Vectorization

**Arrays** are sequences of same-type data points (most often numbers).  Numpy lets us operate on the whole sequence at once, without writing a for-loop, using a technique called **vectorization**.

So, instead of writing:

```python
values = [1, 3, 6, 8, 2]
squares = [value ** 2 for value in values]
```

We can instead write:

```python
values = np.array([1, 3, 6, 8, 2])
squares = values ** 2
```

Besides the **array()** function for creating arrays, Numpy also includes many math functions, which make analysis much easier.  Let's try some out!
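As a quick sketch of what those math functions look like in practice, the block below reuses the `values` array from the example above. The names `roots`, `total`, and `largest` are only illustrative.

```python
import numpy as np

values = np.array([1, 3, 6, 8, 2])

# Element-wise functions return a new array with the same shape.
roots = np.sqrt(values)

# Reductions collapse the whole array into a single number.
total = np.sum(values)      # 1 + 3 + 6 + 8 + 2
largest = np.max(values)
```

Neither call needs an explicit loop; the iteration happens inside Numpy.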

## Numpy Exercises

Turn the following list into an array:

```python
my_list = [3, 6, 2, 10]
```

In [4]:
my_list = [3, 6, 2, 10]
my_array = np.array(my_list)
my_array

array([ 3,  6,  2, 10])

Add 20 to each element of the array [-3, 5, 2].

In [8]:
xx = np.array([-3, 5, 2])
yy = xx + 20
yy

array([17, 25, 22])

Get the absolute value of all numbers from -6 to 6

In [12]:
abs(np.arange(-6, 7))
np.abs(np.arange(-6, 7))

array([6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6])

Get the first 5 values in the array.

In [15]:
xx = np.arange(10, 100, 10)
xx[:5]

array([10, 20, 30, 40, 50])

Reverse-order the array below.

In [16]:
forward = np.array([1, 10, 100, 1000])
forward

array([   1,   10,  100, 1000])

In [17]:
forward[::-1]

array([1000,  100,   10,    1])

In [24]:
np.array([a for a in reversed(forward)])

array([1000,  100,   10,    1])

In [21]:
np.flip(forward, 0)

array([1000,  100,   10,    1])

### Building Arrays

Numpy has some convenient array-building functions as well.  Some commonly used examples are **arange()**, **linspace()**, **zeros()**, and the random number generation functions in **random**.
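A minimal sketch of how these builders differ is shown below. The variable names and the seed value are arbitrary choices for illustration, not part of the exercises.

```python
import numpy as np

# arange(start, stop, step): the stop value is excluded, the spacing is fixed.
steps = np.arange(0, 1, 0.25)

# linspace(start, stop, num): the stop value is included, the count is fixed.
points = np.linspace(0, 1, 5)

# zeros(shape) also accepts a tuple to build multi-dimensional arrays.
grid = np.zeros((2, 3))

# Seeding the global generator makes the random draws repeatable.
np.random.seed(0)
draws = np.random.rand(3)   # 3 uniform values in [0, 1)
```

Roughly speaking, **arange()** is convenient when the step size matters, and **linspace()** when the number of points matters.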

Make an array containing the numbers 1 to 15.

In [25]:
np.arange(1, 16)

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

Make an array containing 20 zeros.

In [26]:
np.zeros(20)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0.])

Make an array containing 20 ones!

In [27]:
np.ones(20)

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1.])

In [28]:
np.repeat(2, 20)

array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [29]:
np.linspace(2, 2, 20)

array([2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       2., 2., 2.])

In [30]:
np.ones(20) * 2

array([2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       2., 2., 2.])

In [41]:
np.ones(20, dtype=np.uint8) + 400

array([401, 401, 401, 401, 401, 401, 401, 401, 401, 401, 401, 401, 401,
       401, 401, 401, 401, 401, 401, 401], dtype=uint16)

In [36]:
ss = np.ones(20) + np.ones(20)
ss.dtype

dtype('float64')

In [35]:
xx.dtype

dtype('int64')

Generate an array of 10 random numbers from Numpy's **random** submodule, using any function you want.

In [42]:
norm = np.random.randn(10)
norm

array([ 1.10068415, -0.2211021 ,  1.43710742,  0.5342659 ,  0.3977355 ,
        0.77246039, -2.41138357,  0.06108842, -0.20743841, -0.42809849])

Get the mean of these numbers.

In [43]:
np.mean(norm)

0.10353192143399643

What is the sum of these numbers?

In [44]:
np.sum(norm)

1.0353192143399643

In [46]:
norm.sum()

1.0353192143399643

The standard deviation?

In [47]:
norm.std()

1.0150670445736287

In [48]:
np.std(norm)

1.0150670445736287

Subtract the mean of the array from each element in the array (a.k.a. "mean-centering" the values)

In [51]:
np.mean(norm - np.mean(norm))

-3.3306690738754695e-17

### Slicing Arrays

Make an array of boolean values that says which elements of the array [4, 2, 10, 6, 1, 7] are even. (Hint: The result should be [True, True, True, True, False, False])

In [54]:
xx = np.array([4, 2, 10, 6, 1, 7])
xx % 2 == 0

array([ True,  True,  True,  True, False, False])

Make an array that says which elements in the array [4, 2, 10, 6, 1, 7] are equal to 10.

In [55]:
xx = np.array( [4, 2, 10, 6, 1, 7])
xx == 10

array([False, False,  True, False, False, False])

Select only the values that are above the mean.

In [57]:
xx[xx > np.mean(xx)]

array([10,  6,  7])

Multiply all the values in the array [2, 5, 1, 10, 6] by 100, but only if they are less than 6.

In [63]:
xx = np.array([2, 5, 1, 10, 6])
xx[xx < 6] *= 100
xx

array([200, 500, 100,  10,   6])
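The masked assignment above changes `xx` in place. If the original array needs to stay untouched, one possible alternative is `np.where`, sketched here with the same values (`scaled` is an illustrative name):

```python
import numpy as np

xx = np.array([2, 5, 1, 10, 6])

# Where the condition holds take xx * 100, otherwise keep the original value.
scaled = np.where(xx < 6, xx * 100, xx)
```

Here `scaled` holds the multiplied values while `xx` keeps its original contents.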

### Translating Algorithms into Code

Calculate the standard deviation of an array's values, without using the numpy.std() function.  (Formula can be found here: http://www.mathsisfun.com/data/standard-deviation-formulas.html)

1. Work out the Mean (the simple average of the numbers)
2. Then for each number: subtract the Mean and square the result
3. Then work out the mean of those squared differences.
4. Take the square root of that and we are done!
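The four steps can be sketched one line at a time. The `data` values are made up for illustration, chosen so the arithmetic is easy to check by hand. Because step 3 divides by the number of values, this is the population standard deviation, which is also what `np.std` computes by default.

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # illustrative values

mean = np.mean(data)                  # step 1: the mean is 5.0
squared_diffs = (data - mean) ** 2    # step 2: subtract the mean, then square
variance = np.mean(squared_diffs)     # step 3: mean of the squared differences
std_dev = np.sqrt(variance)           # step 4: square root gives 2.0
```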


In [154]:
xx = np.random.randn(1000000)
%timeit np.sqrt(np.mean((xx - np.mean(xx)) ** 2))

5.34 ms ± 59 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [153]:
%timeit np.std(xx)

5.43 ms ± 46.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [160]:
from numba import jit

@jit
def std(x):
    return np.sqrt(np.mean((x - np.mean(x)) ** 2))

%timeit std(xx)

4.12 ms ± 11.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [151]:
np.array([1, 2, 3, None, 4, None], dtype=float)

array([ 1.,  2.,  3., nan,  4., nan])

In [148]:
np.nan

nan

In [158]:
np.mean

<function numpy.core.fromnumeric.mean(a, axis=None, dtype=None, out=None, keepdims=<class 'numpy._globals._NoValue'>)>

In [144]:
import random
import math

In [147]:
def myfun(x):
    print('hi, everyone')
    
myfun??

Signature: myfun(x)
Docstring: <no docstring>
Source:   
def myfun(x):
    print('hi, everyone')
File:      ~/Dropbox/Teaching/MannWorkshop/Day1/<ipython-input-147-544abdbadc0b>
Type:      function
