# Numerical Computing with NumPy
## Why?
In Python we can use `lists` to store and manipulate sequences of objects, any objects.<br>
While that is very convenient for us it comes at a cost of time and memory.

In this example we create 1,000,000 integers

In [1]:
import random 
measurements = [random.randint(150, 200) for _ in range(1_000_000)]
measurements[:10]

[192, 187, 192, 167, 153, 196, 200, 171, 158, 179]

and compute their mean


In [2]:
list_time = %timeit -o sum(measurements) / len(measurements)

6.85 ms ± 119 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


Because Python doesn't know that our list contains only integers, it has to check every time it adds two values whether the objects actually support addition.<br>
That's why `sum` takes "so long".

If we could tell the interpreter that we are only adding integers, we could skip all that typechecking and speed up the operation.<br> 
For this purpose, `numpy` was invented.  
<br>

To use numpy we have to import it. The import is usually aliased as `np` so we have to type less later on. Aliasing things is only recommended if it is well established in the community of the respective package.

In [3]:
import numpy as np

NumPy's standard data type is the `ndarray` (which stands for n-dimensional array). In the simplest case, a NumPy array can be created from a list.

In [4]:
measurements_array = np.array(measurements)
measurements_array

array([192, 187, 192, ..., 180, 165, 184])
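
The memory cost mentioned at the beginning can be made visible as well. The following is a minimal sketch, assuming a 64-bit CPython build; the variable names are only illustrative, and the list figure is a rough estimate (the list object plus one `int` object per element).

```python
import sys
import numpy as np

n = 1_000
values = list(range(n))

# A list stores references to separate Python int objects,
# so we count the list itself plus every element it points to.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# An array stores the raw 8-byte integers in one contiguous buffer.
array = np.array(values, dtype=np.int64)
array_bytes = array.nbytes

print(f"list:  ~{list_bytes} bytes")
print(f"array: {array_bytes} bytes")
```

`nbytes` only counts the data buffer, not the small fixed overhead of the array object itself.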

<br>

Now we can use the `mean` function provided by Numpy

In [5]:
numpy_time = %timeit -o np.mean(measurements_array)

875 µs ± 31.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


As we can see, using Numpy significantly speeds up our computation, making it

In [6]:
print(f"{list_time.average / numpy_time.average} times faster")

7.832971216504515 times faster


# Numpy Arrays


In [7]:
import numpy as np

<br>

# How to create Arrays
<br>

## from lists
As we already saw we can create Numpy Arrays from lists

In [8]:
np.array([10,2,35])

array([10,  2, 35])

<br>

## using Numpy functions
Whenever we don't want to create an array from specific values like `10, 2, 35`, we can use NumPy's utility functions.<br>
These are also faster than building the array from Python's built-in functions.

#### `np.arange`
works like `range`

In [9]:
list(range(5))

[0, 1, 2, 3, 4]

In [10]:
np.arange(start=2, stop=14, step=2)

array([ 2,  4,  6,  8, 10, 12])

but is faster

In [11]:
built_in_time = %timeit -o np.array(range(1_000_000))

80.4 ms ± 1.35 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [12]:
numpy_time = %timeit -o np.arange(1_000_000)

1.06 ms ± 13 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [13]:
# print(f"Numpy is {built_in_time.average/numpy_time.average} times faster")

<br>

#### `np.linspace`
Creating an array with a certain number of values in a certain interval.

In [14]:
np.linspace(start=-5, stop=5, num=9)

array([-5.  , -3.75, -2.5 , -1.25,  0.  ,  1.25,  2.5 ,  3.75,  5.  ])
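
As a small illustration of how the interval is divided, `np.linspace` can also return the spacing it used (`retstep=True`) and can leave out the stop value (`endpoint=False`). This is only a sketch; the variable names are made up for the example.

```python
import numpy as np

# retstep=True additionally returns the spacing between neighbouring values.
# Nine points on [-5, 5] leave eight gaps, so the step is 10 / 8.
points, step = np.linspace(start=-5, stop=5, num=9, retstep=True)
print(step)

# With endpoint=False the stop value is excluded, similar to np.arange.
half_open = np.linspace(start=0, stop=1, num=4, endpoint=False)
print(half_open)
```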

<br>

#### `np.zeros`
Creating an array with only zeros.

In [15]:
np.zeros(3)

array([0., 0., 0.])

`np.zeros` takes a `shape` argument that lets us create multidimensional arrays.

In [16]:
np.zeros((2, 4, 3))

array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]]])

<br>

#### the same goes for `np.ones`, `np.empty` and `np.full`
#### `np.ones`

In [17]:
np.ones(shape=(2, 4, 3))

array([[[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]]])

#### `np.empty`
Creates an array without initializing its entries, so the contents are whatever was left in memory (here it happens to be ones). Faster than `np.ones` and `np.zeros`.

In [18]:
np.empty(shape=(2, 4, 3))

array([[[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]]])

In [19]:
empty_time = %timeit -o np.empty(shape=100_000)
ones_time = %timeit -o np.ones(shape=100_000)

print(f"\nempty is {ones_time.average/empty_time.average} times faster")

784 ns ± 8.71 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
17.8 µs ± 252 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

empty is 22.735242449907876 times faster
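
Because the contents of an `np.empty` array are arbitrary, it only makes sense when every element is overwritten before it is read. A minimal sketch of that pattern (the squares are just a placeholder computation):

```python
import numpy as np

# Allocate without initialising; the contents are arbitrary at this point.
buffer = np.empty(shape=5)

# Overwrite every element before reading any of them.
for i in range(buffer.size):
    buffer[i] = i ** 2

print(buffer)
```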


<br>

#### `np.full`
creates an array filled with the same value

In [20]:
np.full(shape=(2, 4, 3), fill_value=42)

array([[[42, 42, 42],
        [42, 42, 42],
        [42, 42, 42],
        [42, 42, 42]],

       [[42, 42, 42],
        [42, 42, 42],
        [42, 42, 42],
        [42, 42, 42]]])
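
Note that the output above shows integers, whereas `np.zeros` and `np.ones` produced floats. As far as this sketch assumes, `np.full` derives the data type (see `dtype` in the next section) from `fill_value` unless one is passed explicitly:

```python
import numpy as np

int_full = np.full(shape=(2, 3), fill_value=42)      # integer fill value
float_full = np.full(shape=(2, 3), fill_value=42.0)  # float fill value
# An explicit dtype overrides the type of the fill value.
forced = np.full(shape=(2, 3), fill_value=42, dtype=np.float32)

print(int_full.dtype, float_full.dtype, forced.dtype)
```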

<br>

## Anatomy of arrays

#### `dtype`
Returns the data type of the array. Arrays can contain bools, ints, unsigned ints, floats or complex numbers of various byte sizes.<br>
They can also store strings or Python objects, but that has very few use cases.

In [21]:
values = [0, 1, 2, 3, 4]
int_arr = np.array(values, dtype='int')
int_arr, int_arr.dtype

(array([0, 1, 2, 3, 4]), dtype('int32'))

If the dtype does not match the given values, numpy will cast everything to that data type.

In [22]:
bool_arr = np.array(values, dtype='bool')
bool_arr, bool_arr.dtype

(array([False,  True,  True,  True,  True]), dtype('bool'))

If no explicit data type is given, NumPy will choose the smallest data type that can represent all given values. <br>
In the following example, everything becomes a float, as ints can be represented as floats, but not vice versa.

In [23]:
values = [0, 1, 2.5, 3, 4]
float_arr = np.array(values)
float_arr, float_arr.dtype

(array([0. , 1. , 2.5, 3. , 4. ]), dtype('float64'))
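
The same rule applies to other mixtures. The sketch below uses a few made-up inputs and `np.result_type`, which reports the type that NumPy would pick when combining two data types:

```python
import numpy as np

mixed_int_float = np.array([0, 1, 2.5])        # ints and a float -> float
mixed_float_complex = np.array([1.0, 2 + 3j])  # float and complex -> complex
mixed_bool_int = np.array([True, 2])           # bool and int -> int

# Ask NumPy directly which type results from combining two dtypes.
promoted = np.result_type(np.int32, np.float64)

print(mixed_int_float.dtype, mixed_float_complex.dtype,
      mixed_bool_int.dtype, promoted)
```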

However, once the data type is set, every value assigned to the array later will be coerced to that type.

In [24]:
values = [0, 1, 2, 3, 4]
int_arr = np.array(values, dtype='int')

int_arr[0] = 1.5
int_arr

array([1, 1, 2, 3, 4])
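
As the output shows, the fractional part is simply dropped rather than rounded. A short sketch of the difference, using `astype` for the explicit conversion (the example values are arbitrary):

```python
import numpy as np

floats = np.array([1.5, 2.7, -1.5])

# Converting to int drops the fractional part (truncation toward zero).
truncated = floats.astype(int)

# Rounding first gives the nearest integer; np.round rounds halves to even.
rounded = np.round(floats).astype(int)

print(truncated)
print(rounded)
```

If rounding is what is wanted, it has to be done explicitly before the values are stored in an integer array.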

NumPy's fixed-size data types force us to think again about problems such as overflow, which Python's own integers never run into.

In [25]:
values = [0, 1, 2, 3, 4]
uint_arr = np.array(values, dtype='uint8')
uint_arr, uint_arr.dtype

(array([0, 1, 2, 3, 4], dtype=uint8), dtype('uint8'))

In [26]:
uint_arr[1] += 255
uint_arr

array([0, 0, 2, 3, 4], dtype=uint8)
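
The wrap-around follows from the value range of the type: a `uint8` can only hold 0 to 255. The following sketch looks up that range with `np.iinfo` and shows one possible way around the problem, namely converting to a wider type before doing the arithmetic (the arrays are made up for the example):

```python
import numpy as np

limits = np.iinfo(np.uint8)
print(limits.min, limits.max)

a = np.array([250, 5], dtype=np.uint8)
b = np.array([10, 10], dtype=np.uint8)

# 250 + 10 = 260 does not fit into 8 bits and wraps around to 260 - 256 = 4.
wrapped = a + b

# Widening one operand first leaves enough room for the result.
widened = a.astype(np.uint16) + b

print(wrapped)
print(widened)
```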

...and can lead to some problems when comparing them to standard Python types.

In [27]:
val = 1.2 - 1.0
arr = np.array([val], dtype=np.float32)
print(f'{val} == {arr[0]} -> {val == arr[0]}')

print(type(val))
print(type(arr[0]))

0.19999999999999996 == 0.20000000298023224 -> False
<class 'float'>
<class 'numpy.float32'>


Numpy provides a function for these cases

In [28]:
np.isclose(val, arr[0])

True
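
`np.isclose` compares within a tolerance instead of testing exact equality. The sketch below assumes the documented defaults (`rtol=1e-05`, `atol=1e-08`) and shows how the absolute tolerance matters when comparing against zero; `np.allclose` is the variant that reduces a whole array to a single answer.

```python
import numpy as np

total = 0.1 + 0.2
exactly_equal = total == 0.3       # exact comparison fails
close = np.isclose(total, 0.3)     # comparison within tolerance succeeds

# Near zero only the absolute tolerance (atol) matters.
tiny_default = np.isclose(1e-9, 0.0)           # within the default atol
tiny_strict = np.isclose(1e-9, 0.0, atol=0.0)  # no absolute tolerance

# allclose checks all elements at once.
all_close = np.allclose([1.0, 2.0], [1.0, 2.0 + 1e-9])

print(exactly_equal, close, tiny_default, tiny_strict, all_close)
```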

[Why are floating point calculations inaccurate](http://effbot.org/pyfaq/why-are-floating-point-calculations-so-inaccurate.htm)

<br>

### `shape` and `ndim`
`.shape` is very important for keeping track of arrays with more than one dimension. It is a tuple with the number of elements in each dimension.<br>
`.ndim` is just the number of dimensions in total. 

### 1D

In [29]:
values = [1, 2, 3, 4]
one_dim_arr = np.array(values)
one_dim_arr

array([1, 2, 3, 4])

In [30]:
one_dim_arr.shape

(4,)

In [31]:
one_dim_arr.ndim

1

### 2D

In [32]:
values = [[1, 2, 3, 4, 5],
          [1, 2, 3, 4, 1],
          [1, 2, 3, 4, 2]]
two_dim_arr = np.array(values)
two_dim_arr

array([[1, 2, 3, 4, 5],
       [1, 2, 3, 4, 1],
       [1, 2, 3, 4, 2]])

In [33]:
two_dim_arr.shape

(3, 5)

In [34]:
two_dim_arr.ndim

2

In [35]:
two_dim_arr[1,1]

2

In [36]:
two_dim_arr[1,1] = 10
two_dim_arr

array([[ 1,  2,  3,  4,  5],
       [ 1, 10,  3,  4,  1],
       [ 1,  2,  3,  4,  2]])

### 3D

In [37]:
values = [[[1, 2, 3, 4]] * 3] * 6
three_dim_arr = np.array(values)
three_dim_arr

array([[[1, 2, 3, 4],
        [1, 2, 3, 4],
        [1, 2, 3, 4]],

       [[1, 2, 3, 4],
        [1, 2, 3, 4],
        [1, 2, 3, 4]],

       [[1, 2, 3, 4],
        [1, 2, 3, 4],
        [1, 2, 3, 4]],

       [[1, 2, 3, 4],
        [1, 2, 3, 4],
        [1, 2, 3, 4]],

       [[1, 2, 3, 4],
        [1, 2, 3, 4],
        [1, 2, 3, 4]],

       [[1, 2, 3, 4],
        [1, 2, 3, 4],
        [1, 2, 3, 4]]])

In [38]:
three_dim_arr.shape

(6, 3, 4)

In [39]:
three_dim_arr.ndim

3
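
The relationship between these attributes can be checked directly. A brief sketch, additionally using the `size` attribute (the total number of elements), which has not been shown above:

```python
import numpy as np

arr = np.zeros((6, 3, 4))

n_dims = arr.ndim            # number of dimensions
n_elements = arr.size        # product of all entries in shape: 6 * 3 * 4
first_dim = len(arr)         # len() only reports the first dimension

# ndim is always the length of the shape tuple.
consistent = len(arr.shape) == arr.ndim

print(arr.shape, n_dims, n_elements, first_dim, consistent)
```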

<br>

### Other attributes

In [40]:
two_dim_arr

array([[ 1,  2,  3,  4,  5],
       [ 1, 10,  3,  4,  1],
       [ 1,  2,  3,  4,  2]])

In [41]:
two_dim_arr.T

array([[ 1,  1,  1],
       [ 2, 10,  2],
       [ 3,  3,  3],
       [ 4,  4,  4],
       [ 5,  1,  2]])
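
`.T` swaps rows and columns, so the shape is reversed and element `[i, j]` of the transpose is element `[j, i]` of the original. The sketch below (with a small made-up array) also illustrates that the transpose shares its data with the original array, so writing to one changes the other:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
t = a.T

print(a.shape, t.shape)

# The transpose does not copy the data.
shares = np.shares_memory(a, t)

# Writing through the transpose therefore also changes the original.
t[0, 1] = 99
print(a)
```

If an independent transposed array is needed, `a.T.copy()` is one way to get it.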

In [42]:
print(dir(two_dim_arr))

['T', '__abs__', '__add__', '__and__', '__array__', '__array_finalize__', '__array_function__', '__array_interface__', '__array_prepare__', '__array_priority__', '__array_struct__', '__array_ufunc__', '__array_wrap__', '__bool__', '__class__', '__complex__', '__contains__', '__copy__', '__deepcopy__', '__delattr__', '__delitem__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__iand__', '__ifloordiv__', '__ilshift__', '__imatmul__', '__imod__', '__imul__', '__index__', '__init__', '__init_subclass__', '__int__', '__invert__', '__ior__', '__ipow__', '__irshift__', '__isub__', '__iter__', '__itruediv__', '__ixor__', '__le__', '__len__', '__lshift__', '__lt__', '__matmul__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift_