# Intro to NumPy

We're going to meet the NumPy library for maths, numerics, ndarrays, linear algebra, and more.

In [None]:
import numpy as np

Let's load a bit of data.

In [None]:
dt = np.load("../data/B-41_DT.npy")
rhob = np.load("../data/B-41_RHOB.npy")

In [None]:
dt[300]

In [None]:
dt[300:400]

These are NumPy arrays, which we'll meet properly in a minute. For now, just notice that they look a lot like lists... which might mean that even 'naive' functions that were written for scalar quantities (i.e. not for sequences) work on them.

In [None]:
def vp_from_dt(dt):
    return 1e6 / dt

vp_from_dt(dt/0.3048)[300:320]

Our functions do work on them!

## What is NumPy?

NumPy provides two fundamental objects: an _n_-dimensional array object (`ndarray`) and a universal function object (`ufunc`). The `ndarray` is a data structure, and the `ufunc` is a protocol for performing very fast elementwise operations on those data structures.

As we have seen, NumPy's `ndarray` data structures are a lot like lists. As we'll see, however, they have a big advantage over lists.

We instantiate an `ndarray` with a list, or any sequence:

In [None]:
a = np.array([1, 2, 3, 4, 5])

In [None]:
a.append(6)

OK, so they're not exactly like lists. Indeed, there's one very big difference. 

Recall that trying to multiply a list doesn't do what you want it to do:

In [None]:
b = [1, 2, 3, 4, 5]
print(10 * b)

Instead, to multiply the numbers in a list by 10, we have to do something like this:

In [None]:
[10 * n for n in b]

But NumPy has a superpower: the `ufunc`. What the heck is a `ufunc`? It doesn't really matter; the point is what it enables: elementwise arithmetic.

In [None]:
a

In [None]:
1000 * a

Specifically, an N-dimensional array is a homogeneous collection of 'items' indexed using N integers. With a 1-D array, you can index into a single element using one integer; with a 2-D array you need two integers, and so on.

This proves to be A Very Powerful Thing.

NumPy contains lots of other tools, including convolution, interpolation, and linear algebra operators, but most of what we do with it every day revolves around the `ndarray`, so we're going to spend a bit of time getting to know them.

## The `ndarray`

Two essential pieces of information define an _n_-dimensional array: the shape of the array, and the kind of item that the array is composed of.

### shape

    >>> a = np.arange(5)
    >>> a.shape
    (5,)

The shape is a `tuple` of _n_ integers (one for each dimension) that tells you how far the index can vary in each dimension.
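As a small illustration of how the shape bounds the indices, here is a sketch using a made-up 2-D array (the name `m` is only for this example):

```python
import numpy as np

# A 2-D array with 3 rows and 4 columns.
m = np.arange(12).reshape(3, 4)

print(m.shape)   # (3, 4)
print(m.ndim)    # 2

# Two dimensions means two integers to reach one element; valid
# indices run from 0 to shape[i] - 1 along each axis.
last = m[2, 3]
print(last)      # 11
```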

### dtype

    >>> a.dtype
    dtype('int64')

Because the `ndarray` is a homogeneous collection of exactly the same data type, NumPy code can be very fast.
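To see what 'homogeneous' means in practice, the sketch below builds arrays from mixed inputs. The variable names are illustrative, and the exact integer width you see depends on your platform.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])   # one float makes every item a float

print(ints.dtype.kind)   # i (a signed integer type; the width varies by platform)
print(mixed.dtype)       # float64

# Convert explicitly with astype(), which returns a new array.
as_float = ints.astype(float)
print(as_float.dtype)    # float64
```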

## Array creation

There are plenty of [array creation functions](https://docs.scipy.org/doc/numpy/reference/routines.array-creation.html) in NumPy, allowing you to generate data from scratch or from existing data.

- `np.ones`, `np.zeros`, `np.ones_like`

- `np.arange`, `np.linspace`

- `np.random` module

Let's look at a few of them:
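Here is one possible tour of the functions listed above. The shapes, ranges, and the random seed are arbitrary choices for illustration, and `np.random.default_rng` is just one of several ways into the `np.random` module.

```python
import numpy as np

zeros = np.zeros(5)                 # five 0.0 values
ones = np.ones((2, 3))              # 2 rows, 3 columns of 1.0
like = np.ones_like(zeros)          # same shape and dtype as `zeros`

steps = np.arange(0, 10, 2)         # start, stop (exclusive), step
points = np.linspace(0, 1, 5)       # start, stop (inclusive), number of points

rng = np.random.default_rng(42)     # seeded generator, for repeatability
samples = rng.normal(loc=0, scale=1, size=(2, 3))

print(steps)    # [0 2 4 6 8]
print(points)   # [0.   0.25 0.5  0.75 1.  ]
```

Note that `np.arange` excludes the stop value while `np.linspace` includes it by default.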

## Broadcasting

## Slicing

## Plotting

## NaNs

## Masking with boolean arrays

In [None]:
a
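A minimal sketch of boolean masking follows. It uses its own small array (`values`, a name made up here) so that it does not depend on the current state of `a`.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5])

# A comparison is elementwise, so it yields a boolean array.
mask = values > 2
print(mask)            # [False False  True  True  True]

# Indexing with the mask keeps only the items where it is True.
print(values[mask])    # [3 4 5]

# Masks can also select items for assignment.
clipped = values.copy()
clipped[clipped > 4] = 4
print(clipped)         # [1 2 3 4 4]
```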

## Indexing with arrays

## Vector and matrix operations

NumPy chooses the correct 'orientation' for a 1D vector when performing multiplication with a 2D array:
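For instance, with a small made-up matrix `M` and vector `v` (both hypothetical, chosen so the arithmetic is easy to check by hand), the same 1D vector can sit on either side of the `@` operator:

```python
import numpy as np

M = np.array([[1, 2],
              [3, 4]])
v = np.array([1, 1])

# v acts like a column vector when it is on the right...
print(M @ v)   # [3 7]

# ...and like a row vector when it is on the left.
print(v @ M)   # [4 6]
```

In both cases the result is again a 1D array.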

Matrix multiply:
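A short sketch with two made-up 2 × 2 arrays, contrasting the `@` operator with elementwise `*`:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

# The @ operator performs matrix multiplication...
print(A @ B)
# [[2 1]
#  [4 3]]

# ...whereas * multiplies element by element.
print(A * B)
# [[0 2]
#  [3 0]]
```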

No need to 'orientate' when performing the dot (inner or scalar) product between two vectors:
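For example, using two illustrative vectors `u` and `w`:

```python
import numpy as np

u = np.array([1, 2, 3])
w = np.array([4, 5, 6])

# 1*4 + 2*5 + 3*6 = 32, with no transposes needed.
print(u @ w)          # 32
print(np.dot(u, w))   # 32
```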

In [None]:
# Norm of a vector.
np.linalg.norm([3, 4])

In [None]:
# Distance between two points.
p = np.array([3, 4])
q = np.array([6, 8])
np.linalg.norm(p - q)

Solve a linear system of equations:

$$ 3x + y/2 = 9.5 $$
$$ x - 2y = -12$$

Let's solve $\mathrm{G}\mathrm{m} = \mathrm{d}$ with the closed-form solution, first via the standard equation...
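The sketch below assumes that 'the standard equation' means the normal equations, $\mathrm{m} = (\mathrm{G}^\mathrm{T}\mathrm{G})^{-1}\mathrm{G}^\mathrm{T}\mathrm{d}$. The matrix `G` and vector `d` are read off the two equations above.

```python
import numpy as np

# Coefficients of x and y in each equation, and the right-hand sides.
G = np.array([[3.0, 0.5],
              [1.0, -2.0]])
d = np.array([9.5, -12.0])

# Normal equations: m = (G^T G)^-1 G^T d
m = np.linalg.inv(G.T @ G) @ G.T @ d
print(m)   # approximately x = 2, y = 7

# For a square, well-conditioned system, np.linalg.solve avoids the
# explicit inverse and is usually the better choice.
m_direct = np.linalg.solve(G, d)
```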

And with NumPy's least squares solver:
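One way to call the solver is sketched here, with `G` and `d` redefined so the cell stands alone. `np.linalg.lstsq` returns the solution along with the residuals, the rank of `G`, and its singular values.

```python
import numpy as np

G = np.array([[3.0, 0.5],
              [1.0, -2.0]])
d = np.array([9.5, -12.0])

# rcond=None selects NumPy's current default cutoff for small singular values.
m, residuals, rank, singular_values = np.linalg.lstsq(G, d, rcond=None)

print(m)      # approximately x = 2, y = 7
print(rank)   # 2
```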

## Random variables

<div style="background: #e0ffe0; border: solid 2px #d0f0d0; border-radius:3px; padding: 1em; color: darkgreen">
<h3>Exercise</h3>

Can you make and then plot 20 noisy perturbations of the `dt` log we loaded earlier, given a standard deviation of 5 microseconds per metre?
</div>

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

In [None]:
# Use a function from the np.random module to make an
# array with shape (20, s), where s is the size of dt.
noise = 

# Then add it to dt to make dt_noisy.
dt_noisy = 

Use this code to plot your result:

In [None]:
depth = np.load('../data/B-41_DEPTH.npy')[200:300]

plt.figure(figsize=(15, 3))
_ = plt.plot(depth, dt_noisy.T, color='k', alpha=0.1)

<div style="background: #e0ffe0; border: solid 2px #d0f0d0; border-radius:3px; padding: 1em; color: darkgreen">
<h3>Exercise</h3>

Can you use simultaneous assignment and the matrix transpose operation to get separate arrays for `x` and `y`?
</div>

<div style="background: #e0ffe0; border: solid 2px #d0f0d0; border-radius:3px; padding: 1em; color: darkgreen">
<h3>Exercise</h3>

Make a 2D velocity model of shape (120, 100) with 3 equal-thickness layers:

- 1486 m/s
- 2000 m/s
- 2400 m/s

Plot the result with `plt.imshow`.
</div>

Try giving this array to your `rc_series()` function. Plot the result with `imshow`.

## Fancy indexing

----

### INTRO STUDENTS TURN BACK NOW

## Broadcasting example

Broadcasting is a powerful idea. Here's a function to compute a waveform:

In [None]:
def wave(f):
    t = np.linspace(0, 1, 100)
    return np.sin(2 * np.pi * f * t)

We can of course pass a scalar for `f`:

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

f = 20
plt.plot(wave(f))

But with a small modification, we can allow the function to accept vectors for `f` too:

In [None]:
def wave(f):
    f = np.asanyarray(f).reshape(-1, 1)
    t = np.linspace(0, 1, 100)
    return np.squeeze(np.sin(2 * np.pi * f * t))

In [None]:
f = range(1, 31)
plt.imshow(wave(f))
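To see why the reshape makes this work, it can help to look at the shapes involved. This sketch repeats the arithmetic inside `wave()` without the sine, using stand-in names:

```python
import numpy as np

freqs = np.arange(1, 31).reshape(-1, 1)   # shape (30, 1): one row per frequency
t = np.linspace(0, 1, 100)                # shape (100,): one entry per time sample

# Broadcasting stretches both operands to a common shape of (30, 100).
product = freqs * t
print(freqs.shape, t.shape, product.shape)   # (30, 1) (100,) (30, 100)

# A scalar frequency gives shape (1, 100); squeeze drops the length-1 axis.
single = np.asanyarray(20).reshape(-1, 1) * t
print(single.shape, np.squeeze(single).shape)   # (1, 100) (100,)
```

The `np.squeeze` call is what lets the vectorized function still return a plain 1D array for a scalar input.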

Our library `bruges` implements most of its functions this way. So, for example, you can pass a range of frequencies to the Ricker-wavelet-generating function `ricker()`:

In [None]:
import bruges

plt.imshow(bruges.filters.ricker(0.2, 0.001, range(40)))

## Array manipulation

There are a number of [array manipulation](https://docs.scipy.org/doc/numpy/reference/routines.array-manipulation.html) routines in NumPy. The ones we use the most are:

#### Changing shape

- `np.reshape(a, newshape)`
- `a.flatten()` &mdash; an array method; the function form is `np.ravel(a)`
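A quick sketch of changing shape, using a made-up six-element array. Note that flattening is done here with the array method `flatten()` and the function `np.ravel()`.

```python
import numpy as np

a = np.arange(6)                  # [0 1 2 3 4 5]
grid = np.reshape(a, (2, 3))      # same as a.reshape(2, 3)
print(grid)
# [[0 1 2]
#  [3 4 5]]

flat = grid.flatten()             # always returns a copy
print(flat)                       # [0 1 2 3 4 5]

raveled = np.ravel(grid)          # returns a view where possible
print(raveled.shape)              # (6,)
```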

#### Transpose-like operations

- `np.transpose(a[, axes])` &mdash; same as `a.T` when `axes` is omitted
- `np.moveaxis(a, source, destination)`
- `np.rollaxis(a, axis[, start])`
- `np.swapaxes(a, axis1, axis2)`

#### Change the number of dimensions

- `a[:, None]` &mdash; treats an array as if it has the new dimension already, effectively adding it
- `np.reshape(a, newshape)` &mdash; where newshape contains a new dimension of size 1
- `np.expand_dims(a, axis)` &mdash; adds dimensions
- `np.squeeze(a[, axis])` &mdash; removes dimensions of length 1
- `np.atleast_1d(*arys)`, `np.atleast_2d(*arys)`, `np.atleast_3d(*arys)` &mdash; treats arys as if they have at least the specified number of dimensions
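The routines above can be compared by looking at the shapes they produce. This is a sketch on a three-element example array:

```python
import numpy as np

a = np.arange(3)                       # shape (3,)

col = a[:, None]                       # shape (3, 1)
row = np.expand_dims(a, axis=0)        # shape (1, 3)
same = np.reshape(a, (3, 1))           # shape (3, 1)
back = np.squeeze(col)                 # shape (3,)
at_least = np.atleast_2d(a)            # shape (1, 3)

print(col.shape, row.shape, same.shape, back.shape, at_least.shape)
# (3, 1) (1, 3) (3, 1) (3,) (1, 3)
```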

#### Joining arrays

- `np.concatenate((a1, a2, ...)[, axis, out])` – join a sequence of arrays along an existing axis
- `np.stack(arrays[, axis, out])` – join a sequence of arrays along a new axis.
- `np.hstack(tup)` – stack arrays in sequence horizontally (column-wise).
- `np.vstack(tup)` – stack arrays in sequence vertically (row-wise).
- `np.block(arrays)` – assemble an `ndarray` from nested lists of blocks.
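The difference between joining along an existing axis and along a new one shows up in the result shapes. A small sketch with two illustrative 1D arrays:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(np.concatenate((a, b)).shape)   # (6,)   joined along the existing axis
print(np.stack((a, b)).shape)         # (2, 3) joined along a new first axis
print(np.hstack((a, b)).shape)        # (6,)
print(np.vstack((a, b)).shape)        # (2, 3)
```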

#### Splitting arrays

- `np.split(ary, indices_or_sections[, axis])`
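The second argument of `np.split` can be either a number of sections or a list of cut points, as this small sketch shows:

```python
import numpy as np

a = np.arange(6)

# An integer asks for that many equal-sized pieces.
thirds = np.split(a, 3)
print(thirds)       # [array([0, 1]), array([2, 3]), array([4, 5])]

# A list gives the indices at which to cut.
head, tail = np.split(a, [2])
print(head, tail)   # [0 1] [2 3 4 5]
```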

#### Tiling arrays

- `np.tile(A, reps)` – construct an array by repeating A the number of times given by reps
- `np.repeat(a, repeats[, axis])` – repeat elements of an array
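The two are easy to confuse: `np.tile` repeats the whole array, while `np.repeat` repeats each element in turn. A minimal sketch:

```python
import numpy as np

a = np.array([1, 2, 3])

print(np.tile(a, 2))              # [1 2 3 1 2 3]
print(np.repeat(a, 2))            # [1 1 2 2 3 3]

# A tuple of reps tiles along more than one axis.
print(np.tile(a, (2, 1)).shape)   # (2, 3)
```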

#### Rearranging elements

- `np.flip(m[, axis])` – reverse the order of elements in an array along the given axis
- `np.fliplr(m)` – flip an array in the left-right direction
- `np.flipud(m)` – flip an array in the up-down direction
- `np.reshape(a, newshape[, order])` – give a new shape to an array without changing its data
- `np.roll(a, shift[, axis])` – roll array elements along a given axis
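A brief sketch of flipping and rolling on small example arrays:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(np.fliplr(m))
# [[3 2 1]
#  [6 5 4]]

print(np.flipud(m))
# [[4 5 6]
#  [1 2 3]]

# np.flip along axis 1 matches fliplr for a 2D array.
print(np.array_equal(np.flip(m, axis=1), np.fliplr(m)))   # True

# Elements that roll off the end reappear at the start.
print(np.roll(np.arange(5), 2))   # [3 4 0 1 2]
```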

#### Adding and removing elements

- `np.trim_zeros(filt[, trim])` – trim the leading and/or trailing zeros from a 1-D array or sequence
- `np.unique(ar[, return_index, return_inverse, ...])` – Find the unique elements of an array

<hr />

<div>
<img src="https://avatars1.githubusercontent.com/u/1692321?s=50"><p style="text-align:center">© Agile Geoscience 2018</p>
</div>