## Numerical Python

In [1]:
cd ..

/home/jovyan/projects/dsi/09-unsupervised_learning-a_tutorial_on_pca


In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

This lesson will make heavy use of the numerical Python library, `numpy`. Remember: when working in `numpy`, it is very important not to "drop out of `numpy`", that is, not to convert your data into regular Python lists.

In [3]:
type([1,2])

list

In [4]:
type(np.array([1,2]))

numpy.ndarray

The most common way to "drop out of `numpy`" is to use a list comprehension on a `numpy` array.

In [5]:
type([v for v in np.array([1,2])])

list
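A minimal sketch of the alternative: express the operation on the whole array so the result stays an `ndarray`, and if a list does sneak in, bring it back with `np.asarray`. The doubling operation here is only an illustration.

```python
import numpy as np

arr = np.array([1, 2, 3])

# Dropping out of numpy: the comprehension builds a plain Python list
doubled_list = [v * 2 for v in arr]

# Staying in numpy: arithmetic on the array is applied elementwise
doubled_arr = arr * 2

print(type(doubled_list))  # <class 'list'>
print(type(doubled_arr))   # <class 'numpy.ndarray'>

# A list can be converted back into an array with np.asarray
restored = np.asarray(doubled_list)
print(np.array_equal(restored, doubled_arr))  # True
```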

#### `numpy` vs `math`

Python also has a standard-library `math` module. The main difference is that `numpy` functions operate elementwise on whole arrays (vectors), whereas `math` functions accept only single scalar values.

In [6]:
import math
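To make the scalar/vector distinction concrete, here is a small sketch: on a single number the two libraries agree, but only `np.cos` accepts a list or an array and returns one value per element.

```python
import math
import numpy as np

x = 0.5

# On a scalar, both libraries compute the same value
print(math.isclose(math.cos(x), np.cos(x)))  # True

# np.cos is applied elementwise to a list or an array
values = np.cos([0.0, math.pi, 2 * math.pi])
print(values)

# math.cos only accepts a single real number
try:
    math.cos([0.0, math.pi])
except TypeError as e:
    print("TypeError:", e)
```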

We will need cosine and sine functions to define our true function. Because we will be performing vector calculations, we must use the `numpy` trigonometric functions rather than their `math` counterparts.

In [7]:
vv = np.linspace(1,1000,1000)
np.cos(vv)
try:
    math.cos(vv)
except TypeError as e:
    print(e)

only length-1 arrays can be converted to Python scalars

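As a sketch of what vectorized trigonometry buys us: `np.sin` and `np.cos` both return arrays with the same shape as their input, so expressions combining them are evaluated elementwise without a Python loop. The check below uses the identity $\sin^2 x + \cos^2 x = 1$ on the same `vv` grid; it is only an illustration, not the lesson's true function.

```python
import numpy as np

vv = np.linspace(1, 1000, 1000)

# Each call returns an array with the same shape as vv
s = np.sin(vv)
c = np.cos(vv)

# The combination is also computed elementwise, with no Python loop
identity = s**2 + c**2

print(s.shape, c.shape, identity.shape)  # (1000,) (1000,) (1000,)
print(np.allclose(identity, 1.0))         # True
```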

We could instead apply the `math` function to each element with a list comprehension.

In [8]:
cos_vv = [math.cos(v) for v in vv]
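The list comprehension does produce the same numbers, just as a Python list. A quick sanity check, converting the list back to an array and comparing with `np.allclose`; a tolerance is used because the `math` and `numpy` implementations may differ in the last bit on some platforms.

```python
import math
import numpy as np

vv = np.linspace(1, 1000, 1000)

# Scalar function applied one element at a time -> Python list
cos_vv = [math.cos(v) for v in vv]

# Vectorized numpy call -> ndarray
np_cos_vv = np.cos(vv)

print(type(cos_vv).__name__, type(np_cos_vv).__name__)  # list ndarray

# Compare with a tolerance rather than exact equality
print(np.allclose(np.asarray(cos_vv), np_cos_vv))  # True
```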

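The issue is speed. The list comprehension calls `math.cos` once per element from the Python interpreter, whereas `np.cos` processes the whole array in a single call to compiled code.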

In [9]:
%%timeit
np.cos(vv)

29.5 µs ± 2.88 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [10]:
%%timeit
[math.cos(v) for v in vv]

229 µs ± 8.36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
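`%%timeit` is an IPython cell magic, so it only works inside IPython/Jupyter. In a plain Python script, the standard-library `timeit` module gives a comparable measurement. A minimal sketch follows; the repeat and loop counts are arbitrary choices, absolute numbers will vary by machine, and it reports the minimum over repeats (as the `timeit` documentation suggests) rather than the mean ± std. dev. that `%%timeit` prints.

```python
import math
import timeit

import numpy as np

vv = np.linspace(1, 1000, 1000)


def numpy_cos():
    # One vectorized call over the whole array
    return np.cos(vv)


def list_comp_cos():
    # One math.cos call per element, driven by the Python interpreter
    return [math.cos(v) for v in vv]


# Minimum over 5 repeats, each running the function n_loops times
n_loops = 200
t_numpy = min(timeit.repeat(numpy_cos, repeat=5, number=n_loops)) / n_loops
t_list = min(timeit.repeat(list_comp_cos, repeat=5, number=n_loops)) / n_loops

print(f"np.cos:             {t_numpy * 1e6:.1f} µs per call")
print(f"list comprehension: {t_list * 1e6:.1f} µs per call")
print(f"ratio:              {t_list / t_numpy:.1f}x")
```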


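The absolute gap grows with $n$. Note that in the cells below, both versions also pay for building the input with `np.linspace`, so these timings are not directly comparable with the two above.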

In [11]:
%%timeit 
np.cos(np.linspace(1,1000,10000))

425 µs ± 54.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [12]:
%%timeit 
[math.cos(v) for v in np.linspace(1,1000,10000)]

2.16 ms ± 131 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [13]:
%%timeit 
np.cos(np.linspace(1,1000,1000000))

44.4 ms ± 4.13 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [14]:
%%timeit 
[math.cos(v) for v in np.linspace(1,1000,1000000)]

264 ms ± 27 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
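The last four timing cells include `np.linspace` in the timed code. To see how the gap scales with $n$ for the cosine step alone, one option is to build each input array once and time only the cosine. A sketch, with arbitrarily chosen sizes and loop counts:

```python
import math
import timeit

import numpy as np

# Sizes to compare; each input array is built once, outside the timed code
sizes = [1_000, 10_000, 100_000]
results = []

for n in sizes:
    x = np.linspace(1, 1000, n)
    # Per-call time: minimum over 3 repeats of 5 calls each
    t_np = min(timeit.repeat(lambda: np.cos(x), repeat=3, number=5)) / 5
    t_py = min(timeit.repeat(lambda: [math.cos(v) for v in x], repeat=3, number=5)) / 5
    results.append((n, t_np, t_py))

for n, t_np, t_py in results:
    print(
        f"n={n:>7}: np.cos {t_np * 1e3:8.3f} ms | list comp {t_py * 1e3:8.3f} ms | "
        f"gap {(t_py - t_np) * 1e3:8.3f} ms | ratio {t_py / t_np:5.1f}x"
    )
```

Printing both the absolute gap and the ratio keeps the two effects separate: the gap is what grows with $n$, while the ratio shows how many times faster the vectorized call is at each size.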
