# Numpy

## Getting started with Numpy

Start by loading the `numpy` library (with alias `np`).

In [None]:
#import the library
import numpy as np


## The array object

Create our first array `a1` corresponding to the following matrix:

$a_1 = \left(\begin{array}{ccccc}1&2&3&4&5\\5&4&3&2&1\end{array}\right)$

In [None]:
# create and display the array
a1 = np.array(
    [
        [1, 2, 3, 4, 5],
        [5, 4, 3, 2, 1]
    ])
a1


Using the array's `dtype` (data type) attribute, show that Numpy inferred `a1`'s data type as `int64` (64-bit integer). On some platforms the default integer type is `int32` instead.

In [None]:
# display the type inferred
a1.dtype


* What is the shape of the array? 
* Can you reshape it into a 3x3 array? 
* How about a 5x2 array? 
* In what order does the reshape rearrange elements?

In [None]:
# your code here
print("initial shape: {}".format(a1.shape))
a2 = np.reshape(a1, (5, 2))
print(a2)
a1.reshape((5,2))  # another way to do it
# a1.reshape((3, 3))  # this causes error as expected
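
To make the ordering question concrete, here is a minimal sketch on the same values as `a1`. It relies on `reshape` defaulting to row-major ("C") order; the name `a1_demo` and the comparison with `order="F"` are illustrative additions, not part of the exercise.

```python
import numpy as np

a1_demo = np.array([[1, 2, 3, 4, 5],
                    [5, 4, 3, 2, 1]])

# Default (order="C"): elements are read row by row and written row by row.
c_order = a1_demo.reshape((5, 2))
print(c_order)

# order="F": elements are read and written column by column instead.
f_order = a1_demo.reshape((5, 2), order="F")
print(f_order)

# A 3x3 target needs 9 elements, but the array has 10, so reshape raises.
try:
    a1_demo.reshape((3, 3))
    reshape_failed = False
except ValueError:
    reshape_failed = True
```

With the default order, the first row of the result is `[1, 2]`, the first two entries of the original first row.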


Numpy tries to infer the most general type that works with the input you give it. What is the `dtype` of each of the following examples? Run and check!

In [None]:
print(np.array([True, False, False, True]).dtype)
print(np.array([1, 2, 5, 1, 2]).dtype)
print(np.array([1, 2, 5, 1, 2, 3.0]).dtype)
print(np.array([1, 2, 'a', 1, 2]).dtype)

What do you think will happen for elementwise operations? 
Create a simple vector with integer values and another one with floating point values. 

* make sure the shapes match
* add them and check the dtype of the result
* multiply them and check the dtype of the result

In [None]:
# your code here
a = np.array([1, 2, 3])
b = np.array([1.0, 2.0, 3.0])

# a very basic way of checking things are as you expect is an "assert" statement
# if the condition fails, it will throw an error.
assert a.shape == b.shape

print("{} -- type: {}".format(a + b, (a + b).dtype))
print("{} -- type: {}".format(a * b, (a * b).dtype))
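
The promotion seen above can also be queried without performing the operation. The sketch below uses `np.result_type`, which returns the dtype Numpy would pick when combining its arguments; the variable names are illustrative only.

```python
import numpy as np

ints = np.array([1, 2, 3])          # integer dtype (width depends on the platform)
floats = np.array([1.0, 2.0, 3.0])  # float64
bools = np.array([True, False, True])

# dtype that an elementwise operation between the two arrays would produce
promoted = np.result_type(ints, floats)
print(promoted)

# the same rule applies to booleans and integers: the result is an integer
bool_plus_int = bools + ints
print(bool_plus_int, bool_plus_int.dtype)
```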


### Array indexing and retrieving entries

To access entries, give one index per dimension, separated by commas. For a dimension of size `d`, you can:

* give an integer index between `0` and `d-1`, or between `-d` and `-1` to count from the end
* give a list or array of valid integers
* give a range
  * `i:j` from `i` to `j` not included
  * `i:` all from `i`
  * `:i` all to `i` not included
  * `:` everything in that dimension
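
As a quick illustration of these rules, here is a sketch on a small hypothetical array `m` (not the `a1` of the exercise), with one line per indexing style.

```python
import numpy as np

# Hypothetical 3x4 array used only for illustration:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
m = np.arange(12).reshape((3, 4))

single = m[2, -1]     # integer indices: last entry of the last row
rows = m[[0, 2], :]   # list of indices: first and third rows, all columns
block = m[0:2, 1:3]   # ranges: rows 0 and 1, columns 1 and 2
tail = m[1:, :2]      # open-ended ranges: rows from 1 on, columns before 2

print(single)
print(rows)
print(block)
print(tail)
```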

For the array `a1` that you created earlier:

1. retrieve the first entry of the second row $\rightarrow$`5`

1. retrieve the last entry of the second row $\rightarrow$ `1`

1. retrieve the first column $\rightarrow$ `[1, 5]`

1. retrieve the rightmost 2x2 submatrix $\rightarrow$ `[[4, 5], [2, 1]]`

1. retrieve every second column $\rightarrow$ `[[1, 3, 5], [5, 3, 1]]` (use `np.arange`)

Does the output correspond to what you were expecting?

**Remember**: indexing starts at 0 in Python.


In [None]:
# add your code here to display the solutions
print("1> sol: ", a1[1, 0])
print("2> sol: ", a1[1, -1])
print("3> sol: ", a1[:, 0])
print("4> sol:\n", a1[:, -2:])
# a1.shape[1] is the number of columns
print("   shape: ", a1.shape)
print("5> sol:\n", a1[:, np.arange(0, a1.shape[1], 2)])


### Boolean indexing and functions on arrays

Still considering the same array `a1`,

* compute the mean of all the values that are smaller than 3
  * using the `mean()` method
  * using the `sum()` method and the `len()` function
* compute the variance of all values smaller than 3
  * using the `var()` method
  * remembering that the variance (dividing by $n$, which is what `var()` does by default) can be computed as $\sum_i x_i^2/n - (\sum_i x_i/n)^2$

In [None]:
# add your code here to display your solutions
# use boolean indexing
als3 = a1[a1 < 3]
l3   = len(als3)

print(als3.mean())
print(als3.sum() / l3)

print(als3.var())
print((als3 ** 2).sum() / l3 - als3.mean() ** 2)
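
The formula above divides by $n$, which matches the default `ddof=0` of `var()`. The sketch below works through the arithmetic on hypothetical data `x` and contrasts it with `ddof=1`, which divides by $n-1$ instead.

```python
import numpy as np

# Hypothetical data, chosen so the arithmetic is easy to follow.
x = np.array([1.0, 2.0, 3.0, 4.0])
n = len(x)

mean_of_squares = (x ** 2).sum() / n               # (1 + 4 + 9 + 16) / 4 = 7.5
square_of_mean = (x.sum() / n) ** 2                # 2.5 ** 2 = 6.25
var_by_formula = mean_of_squares - square_of_mean  # 1.25

var_default = x.var()          # ddof=0 (default): divides by n
var_unbiased = x.var(ddof=1)   # ddof=1: divides by n - 1

print(var_by_formula, var_default, var_unbiased)
```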


### The case of vectors, columns and rows

We mentioned that a vector is not the same as a matrix column or a matrix row.
Let's see this in practice.

* create a simple vector `[1, 2, 3]`
* create a column matrix with the same elements
* create a row matrix with the same elements

Try a simple operation (such as addition) between these objects. What do you see?
Fix it using `reshape` or `squeeze`.

In [None]:
# declare the vector, column and row
vec = np.array([1, 2, 3])
row = np.array([[1, 2, 3]])
col = np.array([[1], [2], [3]])


In [None]:
# do a simple operation between those and show the results + its shape
quickf = lambda mess, sol: print(mess, "gives...\n{} -- shape: {}\n".format(sol, sol.shape))
quickf("vec + col", vec + col)
quickf("vec + row", vec + row)
quickf("col + row", col + row)


In [None]:
# use reshape and make all operations output the same result
print("reshaping...")
col = np.reshape(col, (3, ))
row = np.reshape(row, (3, ))
quickf("vec + col", vec + col)
quickf("vec + row", vec + row)
quickf("col + row", col + row)
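
The differing output shapes come from broadcasting. As a sketch, `np.broadcast` reports the shape an elementwise operation would produce without computing it, and `squeeze` is an alternative to `reshape` for dropping length-1 axes. The names `row_m` and `col_m` are illustrative, chosen so they do not overwrite the variables above.

```python
import numpy as np

vec = np.array([1, 2, 3])          # shape (3,)
row_m = np.array([[1, 2, 3]])      # shape (1, 3)
col_m = np.array([[1], [2], [3]])  # shape (3, 1)

# Shape each elementwise operation would produce
shape_vec_row = np.broadcast(vec, row_m).shape
shape_vec_col = np.broadcast(vec, col_m).shape
shape_col_row = np.broadcast(col_m, row_m).shape
print(shape_vec_row, shape_vec_col, shape_col_row)

# squeeze removes the length-1 axes, leaving plain vectors of shape (3,)
flat_sum = col_m.squeeze() + row_m.squeeze()
print(flat_sum, flat_sum.shape)
```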


### Vector, Matrix, ... Tensors!

* The shape of a vector is `(p, )`
* The shape of a matrix is `(n, p)`
* The shape of a tensor is `(n, p, q, ...)` but where is this useful?

Images usually can be thought of as matrices where every entry corresponds to a pixel value.
For a colour image, each pixel has three "channels": red, green and blue.
Let's check. 

Use `imread` from `imageio` to load `data/panda.jpg` and check the shape. You can see a documented example [here](https://imageio.readthedocs.io/en/latest/examples.html#imageio-usage-examples).

We will come back to this when we use neural networks to handle images.

In [None]:
from imageio import imread
# add your code to load the image, use ?imread if you want to see the doc
image = imread("data/panda.jpg")
print("Shape: {}".format(image.shape))
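
If the image file is not at hand, the same layout can be explored on a synthetic array. The sketch below assumes the common (height, width, channels) layout with 8-bit values; `fake_image` and its 4x6 size are made up for illustration.

```python
import numpy as np

# Hypothetical 4x6 colour image: (height, width, channels), 8-bit values
fake_image = np.zeros((4, 6, 3), dtype=np.uint8)
fake_image[:, :, 0] = 255           # fill the first (red) channel

red_channel = fake_image[:, :, 0]   # one channel is an ordinary matrix
top_left_pixel = fake_image[0, 0]   # one pixel is a vector of 3 channel values

print(fake_image.shape, red_channel.shape, top_left_pixel)
```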


Use matplotlib's function [plt.imshow()](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.imshow.html) to view the image you loaded.

In [None]:
import matplotlib.pyplot as plt
# use plt.imshow() to show the image
plt.imshow(image)


## Other numpy methods

We'll now explore a few more functions for creating numpy arrays and doing mathematics, such as `np.linspace` and `np.exp`. As ever, we expect you to refer to the documentation; for numpy, you'll find the API reference here: https://docs.scipy.org/doc/numpy/reference/index.html (though simply googling the function will normally get you what you want).

### Plotting data
As a quick aside, above we imported `matplotlib.pyplot` as `plt`.

**Important Note**:

* In Jupyter, if you want your plots to display directly in the notebook, you may need one cell somewhere with the line `%matplotlib inline`, depending on your default settings
* In "plain Python" (outside Jupyter), you need to call `plt.show()` every time you have defined a plot and want it displayed

### plt.plot(), np.linspace(), and mathematical functions

Use [np.linspace](https://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html) to create 100 evenly spaced points between -1 and 3. Call the result `x`, then compute the function $y=\sin(\exp(-x^2))$ (use [np.sin](https://docs.scipy.org/doc/numpy-1.10.4/reference/generated/numpy.sin.html) and [np.exp](https://docs.scipy.org/doc/numpy/reference/generated/numpy.exp.html)) and display it using [plt.plot](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.plot.html). 

Additionally:
* using the option `ls` of `plt.plot` specify you want a dashed line
* using the option `color` of `plt.plot` specify you want the line to be red
* the maximum is attained at $x=0$, show that point on the curve with a black circle (use `marker`)

In [None]:
# add your code here to create the x values and the corresponding y values
# create 100 values between -1 and 3
x = np.linspace(-1, 3, 100)
# compute the value of the function at those points
y = np.sin(np.exp(-x ** 2))


In [None]:
# add your code here to display the line. Make it dashed and red.
# in the same cell add your code here to display the maximum, make it a black circle
plt.plot(x, y, ls='dashed', color='red')
plt.plot(0.0, np.sin(1.0), marker='o', color='black')
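
The solution above uses the known location of the maximum. As a sketch of a numerical alternative, `np.argmax` returns the index of the largest sampled value. Note that $x=0$ is not one of the 100 grid points, so the sampled maximum sits close to 0 rather than exactly on it.

```python
import numpy as np

x = np.linspace(-1, 3, 100)
y = np.sin(np.exp(-x ** 2))

i_max = np.argmax(y)                # index of the largest sampled value
x_max, y_max = x[i_max], y[i_max]   # coordinates of the sampled maximum

print(i_max, x_max, y_max)
```

Passing `x_max, y_max` to `plt.plot` with `marker='o'` would mark the sampled maximum instead of the exact one.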


### A second plot

Generate 500 random points in 2D using `np.random.randn`, then display them without connecting lines (scatter plot). 

* display the points as small blue dots
* make the axis "equal" using `plt.axis`
* display the mean as a big red dot
* Add as title (`plt.title`) "Visualising random numbers", make the font size 14 using the option `fontsize`
* Label the x-axis "x-axis" and the y-axis "y-axis" (`plt.xlabel`, `plt.ylabel`)
* Add a legend (`plt.legend`) indicating what the different elements are, put it in the bottom-left corner using the `loc` option


In [None]:
# generate the points, compute the mean
points = np.random.randn(500,2)
mean = points.mean(axis=0)
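
Because the points are random, each run gives a different cloud. If reproducible output is wanted, one option is a seeded generator, as sketched below with `np.random.default_rng`. This is an alternative to the `np.random.randn` call the exercise asks for; the seed value and the `_demo` names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=0)           # the seed value 0 is arbitrary
points_demo = rng.standard_normal((500, 2))   # 500 rows, one column per coordinate
mean_demo = points_demo.mean(axis=0)          # average over the rows -> shape (2,)

print(points_demo.shape, mean_demo.shape)
```

Using `axis=0` averages down the rows, which gives one mean per coordinate rather than a single number.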


In [None]:
# display the points, make the axis equal
plt.plot(points[:, 0], points[:, 1], '.', color='blue', ls='none')
plt.axis("equal")
# display the mean as a big red dot
plt.plot(mean[0], mean[1], 'o', color='red', markersize=10)
# add the title and the x/y labels and the legend
plt.title("Visualising random numbers", fontsize=14)
plt.xlabel("x-axis", fontsize=14)
plt.ylabel("y-axis", fontsize=14)
plt.legend(["Random points", "Mean"], loc='lower left')
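
Passing a list of names to `plt.legend` relies on the order in which the elements were plotted. A sketch of an alternative is to attach a `label` to each plot call and let the legend collect them. The version below also uses matplotlib's axes-based interface (`plt.subplots`) and a seeded generator, both of which are choices made for this illustration rather than requirements of the exercise.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed, for a repeatable figure
points = rng.standard_normal((500, 2))
mean = points.mean(axis=0)

fig, ax = plt.subplots()
# each element carries its own label, so the legend does not depend on call order
ax.plot(points[:, 0], points[:, 1], ".", color="blue", label="Random points")
ax.plot(mean[0], mean[1], "o", color="red", markersize=10, label="Mean")
ax.axis("equal")
ax.set_title("Visualising random numbers", fontsize=14)
ax.set_xlabel("x-axis")
ax.set_ylabel("y-axis")
legend = ax.legend(loc="lower left")

legend_labels = [text.get_text() for text in legend.get_texts()]
print(legend_labels)
```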
