## Introduction to NumPy

NumPy provides:

  1. An array object of arbitrary homogeneous items
  2. Fast mathematical operations over arrays
  3. Linear Algebra, Fourier Transforms, Random Number Generation
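
As a rough illustration of the three points above, the following sketch touches each one in turn. The array values and the seed are arbitrary choices made for this example.

```python
import numpy as np

# 1. A homogeneous array: every item has the same dtype.
a = np.array([[2.0, 0.0], [0.0, 4.0]])

# 2. Fast element-wise math, with no explicit Python loop.
doubled = a * 2

# 3. Linear algebra, Fourier transforms, random numbers.
inv = np.linalg.inv(a)              # matrix inverse
spectrum = np.fft.fft(np.ones(4))   # FFT of a constant signal
rng = np.random.RandomState(0)      # seeded generator (arbitrary seed)
sample = rng.random_sample(3)       # three floats in [0, 1)

print(a.dtype, doubled, inv, spectrum, sample, sep="\n")
```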

In [None]:
import numpy as np
np.__version__

## Where to get help

- http://docs.scipy.org
- Forums: mailing list, http://stackoverflow.com

## Where do I learn more?

- <a href="http://mentat.za.net/numpy/intro/intro.html">NumPy introductory tutorial</a>
- <a href="http://scipy-lectures.github.com">SciPy Lectures</a>

## Familiarize yourself with the notebook environment

- Tab completion
- Docstring inspection
- Magic commands: %timeit, %run
- Two modes: navigate & edit (use the keyboard!)

## NumPy vs pure Python—a speed comparison

In [None]:
x = np.random.random(1024)

%timeit [t**2 for t in x]

In [None]:
%timeit x**2
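
Outside the notebook, where the `%timeit` magic is not available, the same comparison can be sketched with the standard-library `timeit` module. The repetition count of 200 is an arbitrary choice, and the absolute timings depend on the machine.

```python
import timeit
import numpy as np

x = np.random.random(1024)

# Pure Python: one interpreter-level multiplication per element.
squares_list = [t**2 for t in x]
# NumPy: a single expression whose loop runs in compiled code.
squares_array = x**2

# Both approaches compute the same values.
same = np.allclose(squares_list, squares_array)

# Time 200 repetitions of each approach.
t_list = timeit.timeit(lambda: [t**2 for t in x], number=200)
t_array = timeit.timeit(lambda: x**2, number=200)
print(f"list comprehension: {t_list:.4f} s, array expression: {t_array:.4f} s")
```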

<img src="array_vs_list.png" width="60%"/>

From: Python Data Science Handbook by Jake VanderPlas (https://github.com/jakevdp/PythonDataScienceHandbook), licensed CC-BY-NC-ND

## The structure of a NumPy array

<img src="ndarray_struct.png"/>

In [None]:
x = np.array([[1, 4], [2, 8]], dtype=np.uint8)
x

In [None]:
x.shape, x.dtype, x.strides, x.size, x.ctypes.data

In [None]:
def memory_at(arr):
    import ctypes
    return list(ctypes.string_at(arr.ctypes.data, arr.size * arr.itemsize))

In [None]:
memory_at(x)

In [None]:
y = x.T
y.shape, y.dtype, y.strides, y.size, y.ctypes.data

In [None]:
memory_at(y)
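
The cells above suggest that transposing changes only the metadata, not the bytes in memory. A minimal sketch of how strides turn an index into a byte offset follows; `byte_offset` is a hypothetical helper written for this example, not part of NumPy.

```python
import numpy as np

x = np.array([[1, 4], [2, 8]], dtype=np.uint8)
y = x.T  # a view: same memory, swapped shape and strides

def byte_offset(arr, index):
    """Offset in bytes of arr[index] from the start of arr's data."""
    return sum(i * s for i, s in zip(index, arr.strides))

raw = x.tobytes()  # the four bytes of x in memory order

print(x.strides, y.strides)
# Look up element [1, 0] of each array directly in the raw bytes.
print(raw[byte_offset(x, (1, 0))], x[1, 0])
print(raw[byte_offset(y, (1, 0))], y[1, 0])
```

Both lookups read from the same four bytes; only the strides differ.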

## Constructing arrays

In [None]:
np.zeros((3,3))

In [None]:
np.ones((2, 2))

In [None]:
np.array([[1, 2], [-1, 5]])

In [None]:
np.zeros_like(x)

In [None]:
np.diag([1, 2, 3])

In [None]:
np.eye(3)

In [None]:
rng = np.random.RandomState(42)

rng.random_sample((3, 3))

In [None]:
x = rng.random_sample((2,2,3,2,2))

In [None]:
x.shape

## Shape

In [None]:
x = np.arange(12)
x

In [None]:
x.reshape((3, 4))
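
Two details of `reshape` are easy to miss: one dimension may be given as `-1` so that NumPy infers it, and reshaping a contiguous array returns a view rather than a copy. A short sketch, with the sentinel value 100 chosen arbitrarily:

```python
import numpy as np

x = np.arange(12)

a = x.reshape((3, -1))    # -1 asks NumPy to infer this dimension
b = x.reshape((2, 2, 3))  # the total number of items must stay 12

# For a contiguous array such as x, reshape returns a view,
# so writing through `a` also changes `x`.
a[0, 0] = 100

print(a.shape, b.shape, x[0])
```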

## Indexing

In [None]:
x = np.array([[1, 2, 3], [3, 2, 1]])
x

In [None]:
x[0, 1]

In [None]:
x[1]

In [None]:
x[:, 1:3]

### Fancy indexing—indexing with arrays

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

In [None]:
x = np.arange(100).reshape((10, 10))
plt.imshow(x);

In [None]:
# y is still the transposed 2x2 array from "The structure of a NumPy array"
print(y)
y < 2

In [None]:
mask = (x < 50)
mask[:5, :5]

In [None]:
mask.shape

In [None]:
x[mask]

In [None]:
x[mask] = 0
plt.imshow(x);
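
The cells above index with a boolean mask. Indexing with integer arrays works along the same lines; the sketch below uses arbitrarily chosen row and column indices.

```python
import numpy as np

x = np.arange(100).reshape((10, 10))

rows = np.array([0, 2, 4])
cols = np.array([1, 3, 5])

# Paired indices: picks x[0, 1], x[2, 3] and x[4, 5].
picked = x[rows, cols]

# A single index array selects whole rows.
sub = x[rows]

# np.ix_ builds an open mesh, selecting the 3x3 block
# at the intersection of the chosen rows and columns.
block = x[np.ix_(rows, cols)]

# Fancy indexing returns a copy, so this leaves x untouched.
picked[0] = -1

print(picked, sub.shape, block, sep="\n")
```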

### Views

In [None]:
x = np.arange(10)
y = x[0:3]

print(x, y)

In [None]:
y.fill(8)

In [None]:
print(x, y)
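
To check whether two arrays share memory, and to avoid the sharing when it is unwanted, something like the following can be used. The variable names are chosen for this example.

```python
import numpy as np

x = np.arange(10)

view = x[0:3]          # basic slicing returns a view
copy = x[0:3].copy()   # an explicit, independent copy

view.fill(8)    # also changes x[0:3]
copy.fill(-1)   # leaves x untouched

print(x)
print(np.shares_memory(x, view), np.shares_memory(x, copy))
print(view.base is x)
```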

## Data types

In [None]:
x = np.array([1,2,3])
x.dtype

In [None]:
np.iinfo(x.dtype)

In [None]:
x = np.array([1,2,3], dtype=np.uint64)
np.iinfo(x.dtype)

In [None]:
x = np.array([1.5, 2, 3])
x.dtype

In [None]:
x = np.array([1, 2, 3], dtype=float)
x.dtype
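
The limits reported by `np.iinfo` matter in practice: integer arithmetic on arrays wraps around silently once the range of the dtype is exceeded. A small sketch with arbitrarily chosen values near the `uint8` maximum:

```python
import numpy as np

small = np.array([250, 251, 252], dtype=np.uint8)
info = np.iinfo(small.dtype)  # uint8 covers 0..255

# uint8 + uint8 stays uint8, so results wrap around modulo 256.
wrapped = small + np.array([10, 10, 10], dtype=np.uint8)

# Converting to a wider type first avoids the wrap-around.
widened = small.astype(np.uint16) + 10

print(info.min, info.max)
print(wrapped, wrapped.dtype)
print(widened, widened.dtype)
```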

## Broadcasting


### 1D

<img src="broadcast_scalar.svg" width="50%"/>

### 2D

<img src="broadcast_2D.png"/>

### 3D (showing sum of 3 arrays)

<img src="broadcast_3D.png"/>

In [None]:
x, y = np.ogrid[:5:0.5, :5:0.5]

print(x)
print(y)
print()
print(x.shape)
print(y.shape)

In [None]:
plt.imshow(x**2 + y**2);
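
The `ogrid` example combines a column of shape (10, 1) with a row of shape (1, 10). The same mechanism on a smaller scale, with shapes chosen for readability, might look like this:

```python
import numpy as np

col = np.arange(3)[:, np.newaxis]  # shape (3, 1)
row = np.arange(4)[np.newaxis, :]  # shape (1, 4)

# Axes of length 1 are stretched to match the other operand,
# so the result has shape (3, 4).
table = col * 10 + row

print(np.broadcast(col, row).shape)
print(table)
```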

## Expressions and universal functions

In [None]:
x = np.linspace(0, 2 * np.pi, 1000)
y = np.sin(x) ** 3

plt.plot(x, y);

In [None]:
θ = np.deg2rad(1)

cos = np.cos
sin = np.sin

R = np.array([[cos(θ), -sin(θ)],
              [sin(θ),  cos(θ)]])

v = rng.random_sample((100, 2))

print(R.shape, v.shape)
print(R.shape, v.T.shape)

v_ = (R @ v.T).T

plt.plot(v[:, 0], v[:, 1], 'r.')
plt.plot(v_[:, 0], v_[:, 1], 'b.');

In [None]:
v = np.random.random((100, 2))
plt.plot(v[:, 0], v[:, 1], 'r.')
v_ = (R @ v.T).T

for i in range(100):
    v_ = (R @ v_.T).T
    plt.plot(v_[:, 0], v_[:, 1], 'b.', markersize=3, alpha=0.1)
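
A few numerical checks can back up what the plots show. This sketch assumes the same 1-degree rotation as above and uses plain ASCII names (`theta`, `v_rot`) in place of the notebook's.

```python
import numpy as np

theta = np.deg2rad(1)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

rng = np.random.RandomState(42)
v = rng.random_sample((100, 2))
v_rot = (R @ v.T).T  # rotate each row of v by theta

# A rotation matrix is orthogonal: R @ R.T is the identity.
is_orthogonal = np.allclose(R @ R.T, np.eye(2))

# Rotating leaves the length of every vector unchanged.
keeps_length = np.allclose(np.linalg.norm(v, axis=1),
                           np.linalg.norm(v_rot, axis=1))

# Ninety 1-degree rotations amount to a quarter turn.
R90 = np.linalg.matrix_power(R, 90)

print(is_orthogonal, keeps_length)
print(np.round(R90, 6))
```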

## Input/output

In [None]:
!cat hand.txt

In [None]:
hand = np.loadtxt('hand.txt')
hand[:5]

In [None]:
plt.plot(hand[:, 0], hand[:, 1]);

In [None]:
# Use the NumPy binary format--do not pickle!
# np.save and np.savez
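
A minimal round trip through both functions could look like the sketch below. The file names and the temporary directory are choices made for this example.

```python
import os
import tempfile
import numpy as np

a = np.arange(12).reshape((3, 4))
b = np.linspace(0, 1, 5)

with tempfile.TemporaryDirectory() as tmp:
    # np.save writes a single array to a .npy file.
    single = os.path.join(tmp, "a.npy")
    np.save(single, a)
    a_loaded = np.load(single)

    # np.savez bundles several named arrays into one .npz file.
    bundle = os.path.join(tmp, "arrays.npz")
    np.savez(bundle, a=a, b=b)
    with np.load(bundle) as data:
        names = sorted(data.files)
        b_loaded = data["b"]

print(names, a_loaded.dtype, a_loaded.shape)
```

Both formats store the dtype and shape alongside the data, so the arrays come back exactly as they were saved.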

## Reductions

In [None]:
a = np.arange(12).reshape((3, 4))

In [None]:
np.mean(a)

In [None]:
a

In [None]:
np.mean(a, axis=0)

In [None]:
np.mean(a, axis=1)

In [None]:
a.sum()
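
Reducing along an axis removes that axis from the result. Passing `keepdims=True` keeps it with length 1, which makes the result broadcast cleanly against the original array. A sketch that centers each row, using the same `a` as above:

```python
import numpy as np

a = np.arange(12).reshape((3, 4))

col_means = a.mean(axis=0)  # collapses the rows: shape (4,)
row_means = a.mean(axis=1)  # collapses the columns: shape (3,)

# keepdims=True keeps the reduced axis: shape (3, 1).
row_means_2d = a.mean(axis=1, keepdims=True)

# (3, 4) minus (3, 1) broadcasts row by row.
centered = a - row_means_2d

print(col_means, row_means)
print(centered)
```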

In [None]:
x = np.array([1 + 1j, 2 + 2j])

In [None]:
x.real

In [None]:
y = np.array([-0.1, -0.05, 0.35, 0.5, 0.9, 1.1])

In [None]:
y.clip(0, 0.5)

## Exercises

Try the three exercises at http://www.scipy-lectures.org/intro/numpy/exercises.html#array-manipulations

## Structured arrays

In [None]:
!cat rainfall.txt

In [None]:
dt = np.dtype([('station', 'S4'), ('year', int), ('level', (float, 12))])

In [None]:
x = np.zeros((3,), dtype=dt)
x

In [None]:
r = np.loadtxt('rainfall.txt', dtype=dt)

In [None]:
r['station']

In [None]:
mask = (r['station'] == b'AAEF')
r[mask]

In [None]:
r[mask]['level']
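
Without `rainfall.txt` at hand, the same pattern can be tried on a small in-memory array. All values below are invented for illustration, the station `XXXX` is a placeholder, and `level` holds 3 numbers per record instead of 12 to keep the example short.

```python
import numpy as np

dt = np.dtype([('station', 'S4'), ('year', int), ('level', (float, 3))])

# Hypothetical records, not taken from rainfall.txt.
r = np.array([(b'AAEF', 1995, (1.0, 2.0, 3.0)),
              (b'XXXX', 1995, (4.0, 5.0, 6.0)),
              (b'AAEF', 1996, (7.0, 8.0, 9.0))], dtype=dt)

# Select the records of one station, then pull out single fields.
mask = (r['station'] == b'AAEF')
years = r[mask]['year']
mean_level = r[mask]['level'].mean(axis=1)  # one mean per record

print(r['level'].shape)
print(years, mean_level)
```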

If you're heading in this direction, you may want to involve Pandas:

In [None]:
import pandas as pd
df = pd.read_csv('rainfall.txt', header=None, sep=' ',
                 names=('station', 'year',
                        'jan', 'feb', 'mar', 'apr', 'may', 'jun',
                        'jul', 'aug', 'sep', 'oct', 'nov', 'dec'))
df

In [None]:
df['station']

In [None]:
aaef_data = df[df['station'] == 'AAEF']
aaef_data

In [None]:
aaef_data.loc[:, 'jan':'dec']

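If you look at the DataFrame values, what do you see? A plain NumPy array! Because the columns have mixed types, its dtype is `object`; `aaef_data.to_records()` gives a structured (record) array instead.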

In [None]:
aaef_data.values
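
To see what `.values` returns for columns of mixed types, and how that differs from a record array, a small check along these lines may help. The DataFrame contents are invented for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one string, one integer and one float column.
df = pd.DataFrame({'station': ['AAEF', 'XXXX'],
                   'year': [1995, 1995],
                   'jan': [1.0, 4.0]})

# .values is a single 2-D array; mixed column types force dtype object.
values = df.values

# to_records returns a record array with one named field per column.
records = df.to_records(index=False)

print(values.dtype, values.shape)
print(records.dtype.names)
print(records['year'])
```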

Pandas makes some things a lot easier, but its API and underlying model are
much more complex than NumPy's, so YMMV.