Unit 3: Array Programming

In [2]:
import numpy as np

# Linear Algebra in NumPy

### Summary:

* NumPy has a module for linear algebra, [linalg](https://docs.scipy.org/doc/numpy/reference/routines.linalg.html).
* You can solve a system of linear equations using the [solve function](https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.solve.html#numpy.linalg.solve)
    * Create a square 2-dimensional coefficient matrix and a 1-dimensional vector of constants, then solve for the unknown that each column represents.
    * You can double-check the answer using the inner product, or [dot](https://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html#numpy.dot).
* You can use the `@` operator to compute the dot product (matrix product) of two arrays.


### `np.linalg.solve(A, B)`

In [68]:
orders = np.array([
    [2, 0, 0, 0],
    [4, 1, 2, 2],
    [0, 1, 0, 1],
    [6, 0, 1, 2]
])

totals = np.array([3, 20, 10, 15])

prices = np.linalg.solve(orders, totals)
prices

array([1.5       , 7.33333333, 0.66666667, 2.66666667])
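`np.linalg.solve` only works when the coefficient matrix is square and non-singular (its rows are linearly independent). As a minimal sketch, using a hypothetical 2x2 system (`singular`, `rhs`) whose second row is twice the first, the call raises `np.linalg.LinAlgError` instead of returning an answer:

```python
import numpy as np

# Hypothetical system: row 2 is exactly 2 * row 1, so the rows are
# linearly dependent and there is no unique solution.
singular = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
])
rhs = np.array([3.0, 6.0])

try:
    np.linalg.solve(singular, rhs)
except np.linalg.LinAlgError as err:
    print("solve failed:", err)
    solve_failed = True
else:
    solve_failed = False
```

For systems without a unique solution, `np.linalg.lstsq` is the usual fallback, since it returns a least-squares answer rather than raising.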

## Dot Product

### Shortcut: the `@` operator

In [58]:
# A • B
orders @ prices

array([ 3., 20., 10., 15.])

...and this equals `totals`, confirming the solution!
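Because `prices` holds floating-point values, the reconstructed totals can differ from `totals` in the last few bits, so an exact `==` comparison is fragile. A sketch of a tolerance-based check with `np.allclose`, rebuilding the same `orders` and `totals` arrays:

```python
import numpy as np

orders = np.array([
    [2, 0, 0, 0],
    [4, 1, 2, 2],
    [0, 1, 0, 1],
    [6, 0, 1, 2],
])
totals = np.array([3, 20, 10, 15])
prices = np.linalg.solve(orders, totals)

# Compare with a small tolerance instead of exact equality
reconstructed = orders @ prices
matches = np.allclose(reconstructed, totals)
print(matches)  # True
```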

### `A.dot(B)`

In [59]:
# equivalent
orders.dot(prices)

array([ 3., 20., 10., 15.])
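The same `np.dot` function also covers two 1-D arrays, where it returns the inner product (the sum of element-wise products). This sketch uses hypothetical vectors `u`, `v` and matrices `m`, `n` to compare `np.dot`, `@`, and `np.matmul`:

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

# 1-D inner product: 1*4 + 2*5 + 3*6 = 32
inner = np.dot(u, v)
by_hand = np.sum(u * v)   # the same computation, spelled out

# 2-D: np.dot, @, and np.matmul all perform matrix multiplication
m = np.array([[1, 2], [3, 4]])
n = np.array([[5, 6], [7, 8]])
product = m @ n           # [[19, 22], [43, 50]]
```

For 1-D and 2-D inputs these agree; they differ for arrays with more than two dimensions, and `@`/`np.matmul` do not accept scalar operands.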

# Universal Functions

### Summary:
* [ufuncs](https://docs.scipy.org/doc/numpy/reference/ufuncs.html) (universal functions) are commonly needed functions that are vectorized.
  * Vectorized functions operate element by element without requiring an explicit Python loop.
* The standard arithmetic and comparison operators are overloaded to call the corresponding ufuncs (for example, `+` calls `np.add`), so they are vectorized too.
* Arrays of different shapes can be [broadcast](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html), or virtually stretched to a common shape, so that a ufunc can combine them.
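As a quick sketch of the operator overloading described above, each operator gives the same result as calling the named ufunc directly (`np.add`, `np.multiply`, `np.greater`). The arrays hold the same values as `a` and `b` in the `np.split` example:

```python
import numpy as np

a = np.arange(1, 6)    # [1, 2, 3, 4, 5]
b = np.arange(6, 11)   # [6, 7, 8, 9, 10]

# Arithmetic operators dispatch to the matching ufunc
add_matches = np.array_equal(a + b, np.add(a, b))
mul_matches = np.array_equal(a * b, np.multiply(a, b))

# Comparison operators are ufuncs too and return boolean arrays
above_three = a > 3    # [False, False, False, True, True]
cmp_matches = np.array_equal(above_three, np.greater(a, 3))

# np.add and np.greater are both instances of np.ufunc
print(isinstance(np.add, np.ufunc), isinstance(np.greater, np.ufunc))  # True True
```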


### `np.split()`

In [66]:
a, b = np.split(np.arange(1, 11), 2)

a, b

(array([1, 2, 3, 4, 5]), array([ 6,  7,  8,  9, 10]))

In [67]:
a + b, a - b, b - a

(array([ 7,  9, 11, 13, 15]),
 array([-5, -5, -5, -5, -5]),
 array([5, 5, 5, 5, 5]))

In [70]:
# element-wise operation
a * b

array([ 6, 14, 24, 36, 50])
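To make "without using a loop" concrete, here is a sketch comparing an explicit Python loop with the vectorized `a * b`. Both produce the same array; the ufunc version handles every element in a single call:

```python
import numpy as np

a = np.arange(1, 6)
b = np.arange(6, 11)

# Loop version: one Python-level multiplication per element
looped = np.array([x * y for x, y in zip(a, b)])

# Vectorized version: one ufunc call covers all elements
vectorized = a * b    # [6, 14, 24, 36, 50]
```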

## Key concept: Broadcasting

In [71]:
a + 2

array([3, 4, 5, 6, 7])

In [72]:
a + np.repeat(2, 5)

array([3, 4, 5, 6, 7])
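Broadcasting also works across dimensions. In this sketch (the arrays `column` and `row` are hypothetical), a `(3, 1)` array and a `(4,)` array are stretched to a common `(3, 4)` shape, while shapes whose sizes differ and are not 1 raise a `ValueError`:

```python
import numpy as np

column = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3, 4])           # shape (4,), treated as (1, 4)

# Trailing dimensions are compared: sizes must match or one must be 1.
# The size-1 axes are stretched, giving a (3, 4) result.
grid = column + row
# [[ 1,  2,  3,  4],
#  [11, 12, 13, 14],
#  [21, 22, 23, 24]]

# Sizes 3 and 4 neither match nor equal 1, so broadcasting fails
broadcast_error = None
try:
    np.arange(3) + np.arange(4)
except ValueError as err:
    broadcast_error = str(err)
print(broadcast_error)
```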

# Routines in Action: Reduction

### Summary:
* Common [mathematical](https://docs.scipy.org/doc/numpy-1.14.0/reference/routines.math.html) [routines](https://docs.scipy.org/doc/numpy-1.14.0/reference/routines.html) are provided so you do not have to implement the formulas yourself (e.g., *mean* is a statistics routine).
* Reduction functions take a dimension of an array and collapse it, producing a single value for each position along the remaining dimensions.
    * These functions accept an `axis` parameter. The reduction runs along that axis, so the axis you pass is the one that disappears from the result.
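A sketch of how the `axis` argument changes the result's shape, using a hypothetical 3x4 array `data`. The optional `keepdims=True` argument (not used elsewhere in this notebook) keeps the reduced axis with length 1, so the result can broadcast back against the original array:

```python
import numpy as np

data = np.arange(12).reshape(3, 4)     # 3 rows, 4 columns

# The axis you pass is the one that disappears from the result's shape
col_sums = data.sum(axis=0)            # shape (4,): one value per column
row_sums = data.sum(axis=1)            # shape (3,): one value per row

# keepdims=True leaves a length-1 axis in place of the reduced one
row_means = data.mean(axis=1, keepdims=True)   # shape (3, 1)
centered = data - row_means                    # broadcasts across each row
```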


In [75]:
students_gpas = np.array([[4, 3.3, 3.5, 4], [3.2, 3.8, 4, 4], [3.9, 3.9, 4, 4]], dtype=float)
students_gpas

array([[4. , 3.3, 3.5, 4. ],
       [3.2, 3.8, 4. , 4. ],
       [3.9, 3.9, 4. , 4. ]])

In [76]:
students_gpas = np.array([[4, 3.3, 3.5, 4], [3.2, 3.8, 4, 4], [3.9, 3.9, 4, 4]], dtype=np.float16)
students_gpas

array([[4. , 3.3, 3.5, 4. ],
       [3.2, 3.8, 4. , 4. ],
       [3.9, 3.9, 4. , 4. ]], dtype=float16)
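Switching to `np.float16` trades precision for memory. This sketch, with one hypothetical row of GPAs, shows both sides: 2 bytes per value instead of 8, and 3.3 stored as the nearest representable half-precision value:

```python
import numpy as np

gpas64 = np.array([4, 3.3, 3.5, 4], dtype=float)
gpas16 = np.array([4, 3.3, 3.5, 4], dtype=np.float16)

# float16 uses 2 bytes per element, float64 uses 8
bytes64 = gpas64.nbytes    # 32
bytes16 = gpas16.nbytes    # 8

# float16 keeps only about 3 significant decimal digits,
# so 3.3 is stored as the closest half-precision value
stored = float(gpas16[1])  # 3.30078125
```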

We do *not* want a single mean across all GPAs; we want one mean per student (one per row).

In [14]:
# unwanted output
round(students_gpas.mean(), 2)

3.8

## Reduction Operation

In [15]:
students_gpas.mean(axis=1)

array([3.7 , 3.75, 3.95])

In [20]:
students_gpas.mean(axis=0)

array([3.7       , 3.66666667, 3.83333333, 4.        ])

`mean()` is known as a **reduction operation**.

Binary *ufuncs* (such as `np.add`) have built-in reduction methods, such as `.reduce()` and `.accumulate()`.
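A sketch of `.reduce()` on a few binary ufuncs, using a hypothetical `values` array. Reduction combines elements pairwise, so calling it on a unary ufunc such as `np.sqrt` raises a `ValueError`:

```python
import numpy as np

values = np.array([3, 7, 2, 9, 4])

total = np.add.reduce(values)          # 25, same as values.sum()
largest = np.maximum.reduce(values)    # 9, same as values.max()
product = np.multiply.reduce(values)   # 3 * 7 * 2 * 9 * 4 = 1512

# Unary ufuncs have no pairwise operation to reduce with
try:
    np.sqrt.reduce(values)
except ValueError:
    unary_reduce_failed = True
else:
    unary_reduce_failed = False
```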

In [42]:
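# 100 study-time entries in minutes, stored as unsigned 16-bit integers
study_minutes = np.zeros(100, np.uint16)
study_minutes[0] = 150
study_minutes[1] = 60
study_minutes[2:6] = [80, 60, 30, 90]
# every 9th entry from index 9 onward (9, 18, ..., 99) gets 180 minutes
study_minutes[9::9] = 180
# view the 100 entries as a 10 x 10 grid
study_minutes = study_minutes.reshape(10, 10)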

In [51]:
np.add.reduce(study_minutes[0])

650

In [52]:
np.add.accumulate(study_minutes[0])

array([150, 210, 290, 350, 380, 470, 470, 470, 470, 650], dtype=uint64)
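`np.add.accumulate` keeps every intermediate result of the reduction, which is exactly the running total that `np.cumsum` computes. This sketch rebuilds the first row of `study_minutes` from the printed running totals above and checks both relationships:

```python
import numpy as np

# First row of study_minutes, recovered from the accumulate output
row = np.array([150, 60, 80, 60, 30, 90, 0, 0, 0, 180], dtype=np.uint16)

running = np.add.accumulate(row)

# np.cumsum computes the same running totals
same_as_cumsum = np.array_equal(running, np.cumsum(row))

# The last running total equals the full reduction
last_matches_total = running[-1] == np.add.reduce(row)
```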

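On multi-dimensional arrays you will nearly always want to pass an `axis` argument to a reduction; otherwise `np.sum` (and methods such as `.mean()`) collapse every element into a single value.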

In [53]:
np.sum(study_minutes, axis=1)

array([650, 180, 180, 180, 180, 180, 180, 180, 180, 360], dtype=uint64)
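The default axis also differs between functions, which is one more reason to pass `axis` explicitly. In this sketch (the `grid` array is hypothetical), `np.sum` reduces over every element when `axis` is omitted, while `np.add.reduce` defaults to `axis=0`:

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)      # [[0, 1, 2], [3, 4, 5]]

everything = np.sum(grid)              # axis=None by default: 15
per_column = np.add.reduce(grid)       # ufunc.reduce defaults to axis=0: [3, 5, 7]
per_row = np.sum(grid, axis=1)         # [3, 12]
```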

# Plotting and Visualization

### Summary: