# Numpy

" NumPy is the fundamental package for scientific computing with Python.  It contains among other things:

* a powerful N-dimensional array object
* sophisticated (broadcasting) functions
* useful linear algebra, Fourier transform, and random number capabilities "


-- From the [NumPy](http://www.numpy.org/) landing page.



Before learning about numpy, we introduce...

### The NXOR Function

Many of the exercises involve working with the function $\mathrm{NXOR} \colon \; [-1, 1]^2 \rightarrow \{-1, +1\}$, defined as

$$ (x_1, x_2) \longmapsto \mathrm{sgn}(x_1 \cdot x_2), $$

where for $x_1 \cdot x_2 = 0$ we let $\mathrm{NXOR}(x_1, x_2) = -1$.

We can visualize this function as

![A set of points in \[-1, +1\]^2 with green and red markers denoting the value assigned to them by the NXOR function](https://github.com/tmlss2018/PracticalSessions/blob/master/assets/nxor_labels.png?raw=true)

where each point in $[-1, 1]^2$ is marked by green (+1) or red (-1) according to the value assigned to it by the NXOR function.





Over the course of the intro lab exercises we will

1. Generate such data with numpy.
2. Create the plot above with matplotlib.
3. Train a model to learn this function.


### Setup and imports. Run the following cell.

In [2]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np


### Random numbers in numpy

In [6]:
np.random.random((3, 2))  # Array of shape (3, 2), entries uniform in [0, 1).

array([[0.60276338, 0.54488318],
       [0.4236548 , 0.64589411],
       [0.43758721, 0.891773  ]])

Note that (as usual in computing) numpy produces pseudo-random numbers based on a seed, or more precisely a random state. To make random sequences, and any calculations based on them, reproducible, use

* the [`np.random.seed()`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.seed.html) function to set the default global seed, or
* the [`np.random.RandomState`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.RandomState.html) class which is a container for a pseudo-random number generator and exposes methods for generating random numbers.


In [5]:
np.random.seed(0)
print(np.random.random(2))
# Reset the global random state to the same state.
np.random.seed(0)
print(np.random.random(2))

[0.5488135  0.71518937]
[0.5488135  0.71518937]
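The `np.random.RandomState` class mentioned above can be used in a similar way. The following is a minimal sketch rather than part of the original lab; the names `rng_a` and `rng_b` are made up for illustration. Each instance carries its own state, so drawing from the global generator should not affect it.

```python
import numpy as np

# Two independent generators created with the same seed.
rng_a = np.random.RandomState(0)
rng_b = np.random.RandomState(0)

sample_a = rng_a.random_sample(2)

# Drawing from the global generator does not disturb rng_b.
np.random.random(5)
sample_b = rng_b.random_sample(2)

print(sample_a)
print(sample_b)
print(np.array_equal(sample_a, sample_b))  # Same seed, same sequence.
```

Passing such an object around explicitly is one way to keep a computation reproducible without relying on global state.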


### Numpy Array Operations 1

There are a large number of operations you can run on any numpy array. Here we showcase some common ones.

In [0]:
# Create one from hard-coded data:
ar = np.array([
    [0.0, 0.2],
    [0.9, 0.5],
    [0.3, 0.7],
], dtype=np.float64)  # float64 is the default.

print('The array:\n', ar)
print()

print('data type', ar.dtype)
print('transpose\n', ar.T)
print('shape', ar.shape)
print('reshaping an array', ar.reshape((6,)))



The array:
 [[ 0.   0.2]
 [ 0.9  0.5]
 [ 0.3  0.7]]

data type float64
transpose
 [[ 0.   0.9  0.3]
 [ 0.2  0.5  0.7]]
shape (3, 2)
reshaping an array [ 0.   0.2  0.9  0.5  0.3  0.7]


Many numpy operations are available both as `np` module functions and as array methods. For example, we can also reshape as

In [0]:
print('reshape v2', np.reshape(ar, (6, 1)))

reshape v2 [[ 0. ]
 [ 0.2]
 [ 0.9]
 [ 0.5]
 [ 0.3]
 [ 0.7]]
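As a small illustration of the function/method duality described above, the sketch below compares a few pairs on the same array. The variable names `same_sum`, `same_transpose` and `same_reshape` are chosen just for this example.

```python
import numpy as np

ar = np.array([
    [0.0, 0.2],
    [0.9, 0.5],
    [0.3, 0.7],
])

# Each pair computes the same result, once as a method or attribute
# and once as a module-level function.
same_sum = np.isclose(ar.sum(), np.sum(ar))
same_transpose = np.array_equal(ar.T, np.transpose(ar))
same_reshape = np.array_equal(ar.reshape((6, 1)), np.reshape(ar, (6, 1)))

print(same_sum, same_transpose, same_reshape)
```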


### Numpy Indexing and selectors

Here are some basic indexing examples from numpy.

In [0]:
ar

array([[ 0. ,  0.2],
       [ 0.9,  0.5],
       [ 0.3,  0.7]])

In [0]:
ar[0, 1]  # row, column

0.20000000000000001

In [0]:
ar[:, 1]  # slices: select all elements across the first (0th) axis.

array([ 0.2,  0.5,  0.7])

In [0]:
ar[1:2, 1]  # slices with syntax from:to, selecting [from, to).

array([ 0.5])

In [0]:
ar[1:, 1]  # Omit `to` to go all the way to the end

array([ 0.5,  0.7])

In [0]:
ar[:2, 1]  # Omit `from` to start from the beginning

array([ 0.2,  0.5])

In [0]:
ar[0:-1, 1]  # Use negative indexing to count elements from the back.

array([ 0.2,  0.5])

We can also pass boolean arrays as indices. These will exactly define which elements to select.

In [0]:
ar[np.array([
    [True, False],
    [False, True],
    [True, False],
])]

array([ 0. ,  0.5,  0.3])

Boolean arrays can be created with logical operations, then used as selectors. Logical operators apply elementwise.

In [0]:
ar_2 = np.array([   # Nearly the same as ar
    [0.0, 0.1],
    [0.9, 0.5],
    [0.0, 0.7],
])

# Where ar_2 is smaller than ar, let ar_2 be -inf.
ar_2[ar_2 < ar] = -np.inf
ar_2

array([[ 0. , -inf],
       [ 0.9,  0.5],
       [-inf,  0.7]])
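The same idea can be split into two explicit steps: first build the boolean array, then use it as a selector. This is only a sketch; the threshold `0.4` and the names `mask` and `selected` are arbitrary choices for illustration.

```python
import numpy as np

ar = np.array([
    [0.0, 0.2],
    [0.9, 0.5],
    [0.3, 0.7],
])

mask = ar > 0.4      # Boolean array with the same shape as ar.
selected = ar[mask]  # 1d array of the elements where mask is True.

print(mask)
print(selected)      # The elements 0.9, 0.5 and 0.7, in row-major order.
```

Note that selecting with a boolean array of the full shape returns a flat (1d) array, as in the hard-coded example earlier.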

### Numpy Operations 2

In [0]:
print('array:\n', ar)
print()

print('sum across axis 0 (rows):', ar.sum(axis=0))
print('mean', ar.mean())
print('min', ar.min())
print('row-wise min', ar.min(axis=1))


array:
 [[ 0.   0.2]
 [ 0.9  0.5]
 [ 0.3  0.7]]

sum across axis 0 (rows): [ 1.2  1.4]
mean 0.433333333333
min 0.0
row-wise min [ 0.   0.5  0.3]
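To see what the `axis` argument does, it can help to look at the shapes of the results. The sketch below assumes the same `ar` as above; `col_sums` and `row_sums` are names introduced only for this example.

```python
import numpy as np

ar = np.array([
    [0.0, 0.2],
    [0.9, 0.5],
    [0.3, 0.7],
])

col_sums = ar.sum(axis=0)  # Collapses the 3 rows: one sum per column.
row_sums = ar.sum(axis=1)  # Collapses the 2 columns: one sum per row.

print(col_sums.shape)  # (2,)
print(row_sums.shape)  # (3,)
```

In both cases the axis that is reduced over disappears from the shape of the result.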


We can also take element-wise minimums between two arrays.

We may want to do this when "clipping" values in a matrix, that is, setting any values larger than, say, 0.6, to 0.6. We would do this in numpy with `np.minimum`, as shown in the next section.

### Broadcasting (and selectors)

In [0]:
np.minimum(ar, 0.6)

array([[ 0. ,  0.2],
       [ 0.6,  0.5],
       [ 0.3,  0.6]])

Numpy automatically treats the scalar 0.6 as if it were an array of the same shape as `ar` in order to take the element-wise minimum.
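To make the scalar-to-array step concrete, the sketch below builds the array of 0.6 values by hand with `np.full` and checks that it gives the same result as passing the scalar directly. The names `full`, `explicit` and `broadcast` are illustrative only.

```python
import numpy as np

ar = np.array([
    [0.0, 0.2],
    [0.9, 0.5],
    [0.3, 0.7],
])

full = np.full(ar.shape, 0.6)    # Array of shape (3, 2) filled with 0.6.
explicit = np.minimum(ar, full)  # Element-wise minimum of two equal-shape arrays.
broadcast = np.minimum(ar, 0.6)  # The scalar is broadcast for us.

print(np.array_equal(explicit, broadcast))
```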



Broadcasting can save us a lot of typing, but in complicated cases it may require a good understanding of the exact rules followed.

Some references:

* [Numpy page that explains broadcasting](https://docs.scipy.org/doc/numpy-1.13.0/user/basics.broadcasting.html)
* [Similar content with some visualizations](http://scipy.github.io/old-wiki/pages/EricsBroadcastingDoc)

Here we follow with a selection of other useful broadcasting examples.


In [0]:
# Centering our array.
print('centered array:\n', ar - np.mean(ar)) 

centered array:
 [[-0.43333333 -0.23333333]
 [ 0.46666667  0.06666667]
 [-0.13333333  0.26666667]]


Note that the result of `np.mean(ar)` is a scalar, but it is automatically subtracted from every element.


We can write the minimum function ourselves, as well.

In [0]:
clipped_ar = ar.copy()  # So that ar is not modified.
clipped_ar[clipped_ar > 0.6] = 0.6
clipped_ar

array([[ 0. ,  0.2],
       [ 0.6,  0.5],
       [ 0.3,  0.6]])

A few things happened here:

1. 0.6 was broadcast in for the greater than (>) operation
2. The greater than operation defined a selector, selecting a subset of the elements of the array
3. 0.6 was broadcast to the right number of elements for assignment.

Vectors may also be broadcast into matrices.

In [0]:
vec = np.array([1, 2])
ar + vec

array([[ 1. ,  2.2],
       [ 1.9,  2.5],
       [ 1.3,  2.7]])

Here the shapes of the involved arrays are:
```
ar     (2d array):  3 x 2
vec    (1d array):      2
Result (2d array):  3 x 2
```

When either of the dimensions compared is one (even implicitly, like in the case of `vec`), the other is used. In other words, dimensions with size 1 are stretched or “copied” to match the other.

Here, this meant that the `[1, 2]` row was repeated to match the number of rows in `ar`, and then added to `ar` element-wise.
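The stretching described above can be made visible with `np.broadcast_to`, which returns a read-only view of `vec` expanded to a given shape. This is a sketch for illustration; the name `stretched` is not part of the lab.

```python
import numpy as np

ar = np.array([
    [0.0, 0.2],
    [0.9, 0.5],
    [0.3, 0.7],
])
vec = np.array([1, 2])

# Repeat the row [1, 2] to match the shape of ar, without copying data.
stretched = np.broadcast_to(vec, ar.shape)

print(stretched)
print(np.array_equal(ar + vec, ar + stretched))
```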


If there is a shape mismatch, you will be informed. To try, uncomment the line below and run it.

In [0]:
#ar + np.array([[1, 2, 3]])
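If you prefer not to interrupt the notebook with an exception, the mismatch can also be caught explicitly. The sketch below assumes the same `ar` as above; `error_message` is a name introduced here.

```python
import numpy as np

ar = np.array([
    [0.0, 0.2],
    [0.9, 0.5],
    [0.3, 0.7],
])

try:
    # Shapes (3, 2) and (1, 3): the last dimensions are 2 and 3,
    # neither of which is 1, so they cannot be broadcast together.
    ar + np.array([[1, 2, 3]])
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)
```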

#### Exercise

Broadcast and add the vector `[10, 20, 30]` across the columns of `ar`, so that 10 is added to every entry of the first row, 20 to the second row, and 30 to the third.

You should get 
```
array([[10. , 10.2],
       [20.9, 20.5],
       [30.3, 30.7]])
```


In [0]:
#@title Code

# Recall that you can use vec.shape to verify that your array has the
# shape you expect.

### Your code here ###

In [0]:
#@title Solution

vec = np.array([[10], [20], [30]])
ar + vec

array([[ 10. ,  10.2],
       [ 20.9,  20.5],
       [ 30.3,  30.7]])

### `np.newaxis`

We can use another numpy feature, `np.newaxis`, to easily form the column vector that was required in the exercise above. It adds a singleton dimension to an array at the desired location:

In [0]:
vec = np.array([1, 2])
vec.shape

(2,)

In [0]:
vec[np.newaxis, :].shape

(1, 2)

In [0]:
vec[:, np.newaxis].shape

(2, 1)
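Applied to the earlier broadcasting exercise, `np.newaxis` lets us start from a flat vector instead of typing nested lists. This is one possible sketch; the names `column` and `result` are illustrative.

```python
import numpy as np

ar = np.array([
    [0.0, 0.2],
    [0.9, 0.5],
    [0.3, 0.7],
])
vec = np.array([10, 20, 30])

column = vec[:, np.newaxis]  # Shape (3, 1): a column vector.
result = ar + column         # (3, 2) + (3, 1) broadcasts to (3, 2).

print(column.shape)
print(result)
```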

Now you know more than enough to generate some example data for our `NXOR` function.


### Exercise: Generate Data for NXOR

Write a function `get_data(num_examples)` that returns two numpy arrays

* `inputs` of shape  `num_examples x 2` with points selected uniformly from the $[-1, 1]^2$ domain.
* `labels` of shape `num_examples` with the associated output of `NXOR`.

In [0]:
#@title Code

def get_data(num_examples):
  # Replace with your code.
  return np.zeros((num_examples, 2)), np.zeros((num_examples))


In [0]:
#@title Solution

# Solution 1.
def get_data(num_examples):
  inputs = 2*np.random.random((num_examples, 2)) - 1
  labels = np.prod(inputs, axis=1)
  labels[labels <= 0] = -1 
  labels[labels > 0] = 1 
  return inputs, labels

# Solution 2.
# def get_data(num_examples):
#   inputs = 2*np.random.random((num_examples, 2)) - 1
#   labels = np.sign(np.prod(inputs, axis=1))
#   labels[labels == 0] = -1 
#   return inputs, labels


In [0]:
get_data(4)

(array([[ 0.08875228,  0.85639209],
        [ 0.52213802,  0.48106992],
        [-0.53564626, -0.99373936],
        [-0.90605549, -0.02954708]]), array([ 1.,  1.,  1.,  1.]))
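A quick way to sanity-check generated data is to verify the shapes, the domain, and the labels against the NXOR definition. The helper `check_nxor_data` below is hypothetical and not part of the lab; the stand-in data is generated inline with a seeded `np.random.RandomState` so that the sketch runs on its own.

```python
import numpy as np

def check_nxor_data(inputs, labels):
  """Hypothetical helper: True if (inputs, labels) look like valid NXOR data."""
  num_examples = inputs.shape[0]
  if inputs.shape != (num_examples, 2) or labels.shape != (num_examples,):
    return False
  # All points must lie in [-1, 1]^2.
  in_domain = np.all((inputs >= -1) & (inputs <= 1))
  # Label is +1 where the product is positive, -1 otherwise (including 0).
  expected = np.where(inputs[:, 0] * inputs[:, 1] > 0, 1.0, -1.0)
  return bool(in_domain and np.array_equal(labels, expected))

# Stand-in data; in the notebook you would pass the output of get_data.
rng = np.random.RandomState(0)
inputs = 2 * rng.random_sample((100, 2)) - 1
labels = np.where(np.prod(inputs, axis=1) > 0, 1.0, -1.0)

print(check_nxor_data(inputs, labels))
```

In the notebook, a call such as `check_nxor_data(*get_data(100))` would apply the same check to your own implementation.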

## That's all, folks!

For now.