# Chapter 4

Chapter 4 provides an introduction to NumPy. It's not useful to save any of the small examples in source control.

A couple of important Jupyter discoveries:

- I can use ESC to enter command mode, at which point I can navigate with Vim-like bindings (e.g. j/k to move between cells, dd to delete a cell).
- I can use ENTER to enter edit mode, at which point I can edit the contents of the selected cell.
- I can use SHIFT+ENTER to execute an individual cell.
- I can use m in command mode to convert a cell type to markdown.

In [None]:
import numpy as np

## 4.1 The NumPy ndarray

`ndarray` is an N-dimensional array, a fast, flexible container for large data sets in Python. Arrays let you perform mathematical operations on whole blocks of data.
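As a minimal sketch of what operating on "whole blocks of data" looks like, the cell below applies arithmetic to every element at once, with no loop. The array values and the names `block`, `scaled`, and `summed` are illustrative only and do not come from the chapter.

```python
import numpy as np

# Illustrative 2x3 array; the values are arbitrary.
block = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

scaled = block * 10      # multiplies every element by 10
summed = block + block   # adds matching elements pairwise

print(block.shape, block.dtype)
print(scaled)
print(summed)
```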

In [None]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

In [None]:
data = np.random.randn(7, 4)

In [None]:
mask = (names == 'Bob') | (names == 'Will')  # Select when name is equal to Bob or Will

In [None]:
data[mask]

In [None]:
data[data < 0] = 0

In [None]:
data

### Transposing Arrays and Swapping Axes

Arrays have the `transpose` method and the `T` attribute.
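A quick check of how the two relate, as a sketch with illustrative names (`grid`, `view`): called with no arguments, `transpose` gives the same result as `T`, and the result is a view of the original data rather than a copy.

```python
import numpy as np

grid = np.arange(6).reshape((2, 3))

# With no arguments, transpose() reverses the axes, the same as .T
assert np.array_equal(grid.T, grid.transpose())
print(grid.T.shape)  # (3, 2)

# The transpose is a view: writing through it changes the original array
view = grid.T
view[0, 1] = 99
print(grid[1, 0])  # 99
```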

In [None]:
arr = np.arange(15).reshape((3, 5))

In [None]:
arr

In [None]:
arr.T

In [None]:
np.dot(arr.T, arr)

## 4.2 Universal Functions

A universal function, or ufunc, is a function that performs element-wise operations on data in ndarrays. Ufuncs are fast, vectorized wrappers for simple functions.
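To make "element-wise" concrete, the sketch below compares a ufunc against the equivalent explicit Python loop. The names `values`, `looped`, and `vectorized` are illustrative, and the loop is there only for comparison.

```python
import math

import numpy as np

values = np.array([1.0, 4.0, 9.0, 16.0])

# Explicit loop: call the scalar function once per element
looped = np.array([math.sqrt(v) for v in values])

# ufunc: one call applies the same operation to every element
vectorized = np.sqrt(values)

assert np.allclose(looped, vectorized)
print(isinstance(np.sqrt, np.ufunc))  # True
print(vectorized)
```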

In [None]:
arr = np.arange(10)

In [None]:
np.sqrt(arr)

In [None]:
np.exp(arr)

In [None]:
x, y = np.random.randn(8), np.random.randn(8)

In [None]:
np.maximum(x, y)

## 4.3 Array-Oriented Programming with Arrays

Use NumPy to perform data processing tasks without explicit loops, using array expressions instead. This practice is called vectorization.
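A small sketch of the same computation written both ways, using made-up inputs (`xs_demo` and `ys_demo` are illustrative names): the loop handles one pair of values at a time, while the array expression handles all of them in a single statement.

```python
import numpy as np

xs_demo = np.arange(5, dtype=float)        # 0.0, 1.0, 2.0, 3.0, 4.0
ys_demo = np.arange(5, dtype=float)[::-1]  # 4.0, 3.0, 2.0, 1.0, 0.0

# Explicit loop: one pair of values per iteration
loop_result = []
for x_val, y_val in zip(xs_demo, ys_demo):
    loop_result.append(x_val ** 2 + y_val ** 2)

# Vectorized: the same expression applied to the whole arrays
vector_result = xs_demo ** 2 + ys_demo ** 2

assert np.allclose(loop_result, vector_result)
print(vector_result)
```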

In [None]:
points = np.arange(-5, 5, 0.01) # 1000 equally spaced points

In [None]:
# meshgrid takes two 1D arrays and produces two 2D matrices
# corresponding to all pairs of (x, y) in the two arrays
xs, ys = np.meshgrid(points, points)
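With 1000 points the result of `meshgrid` is hard to inspect, so here is a sketch on tiny inputs. The names `gx` and `gy` and the input values are illustrative. With the default indexing, the first output repeats the x values along each row and the second repeats the y values down each column.

```python
import numpy as np

gx, gy = np.meshgrid(np.array([1, 2, 3]), np.array([10, 20]))

print(gx)
# [[1 2 3]
#  [1 2 3]]
print(gy)
# [[10 10 10]
#  [20 20 20]]

# Both outputs have shape (len(y), len(x))
print(gx.shape, gy.shape)
```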

In [None]:
z = np.sqrt(xs ** 2 + ys ** 2)

In [None]:
import matplotlib.pyplot as plt

In [None]:
plt.imshow(z, cmap=plt.cm.gray); plt.colorbar()
# Raw string so that \s in the LaTeX markup is not treated as an escape sequence
plt.title(r"Image plot of $\sqrt{x^2 + y^2}$ for a grid of values")

### Mathematical and Statistical Methods

In [None]:
arr = np.random.randn(5, 4)

In [None]:
arr.mean()

In [None]:
np.sum(arr)

## 4.5 Linear Algebra

In [None]:
x = np.array([[1., 2., 3.], [4., 5., 6.]])
x

In [None]:
y = np.array([[6., 23.], [-1, 7], [8, 9]])
y

In [None]:
x.dot(y)  # np.dot(x, y)

In [None]:
np.dot(x, np.ones(3))

In [None]:
from numpy.linalg import inv, qr

In [None]:
X = np.random.randn(5, 5)
X

In [None]:
mat = X.T.dot(X)
mat

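In [None]:
inv(mat)  # Inverse of the symmetric matrix X.T.dot(X) computed above

In [None]:
mat.dot(inv(mat))  # Approximately the identity matrix, up to floating-point error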
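The `qr` function is imported above but not used in these cells. As a sketch of what it could be used for (an assumption about the intent; the names `base` and `sym` and the fixed seed are illustrative), the cell below computes a QR decomposition and checks its defining properties.

```python
import numpy as np
from numpy.linalg import qr

rng = np.random.default_rng(0)   # seeded only so the sketch is repeatable
base = rng.standard_normal((5, 5))
sym = base.T.dot(base)           # same construction as `mat` above

q, r = qr(sym)                   # factor sym into q (orthogonal) times r (upper triangular)

print(np.allclose(q.dot(r), sym))          # the product reconstructs sym
print(np.allclose(q.T.dot(q), np.eye(5)))  # the columns of q are orthonormal
print(np.allclose(r, np.triu(r)))          # r has zeros below the diagonal
```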

## 4.7 Example: Random Walks

### Pure Python Approach

In [None]:
import random

position = 0
walk = [position]
steps = 1000

for i in range(steps):
    step = 1 if random.randint(0, 1) else -1
    position += step
    walk.append(position)
    
plt.plot(walk[:100])

### Array Approach

In [None]:
nsteps = 1000
draws = np.random.randint(0, 2, size=nsteps)
steps = np.where(draws > 0, 1, -1)  # Generate array with 1 if condition is true and -1 if false
walk = steps.cumsum()  # Get sum up to this point in the array
plt.plot(walk[:100])

### Simulating Many Random Walks

In [None]:
nwalks = 5000
nsteps = 1000
draws = np.random.randint(0, 2, size=(nwalks, nsteps))
steps = np.where(draws > 0, 1, -1)
walks = steps.cumsum(axis=1)  # Cumulative sum along each row, so each row is one walk
walks

In [None]:
hits30 = (np.abs(walks) >= 30).any(axis=1)  # True for each walk that reaches 30 or -30
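One possible use of a mask like `hits30`, sketched under the assumption that the goal is to count the walks that get 30 steps away from the origin and to find when each first does so. The names `reached` and `crossing_times`, the smaller number of walks, and the fixed seed are illustrative choices, not taken from the cells above.

```python
import numpy as np

rng = np.random.default_rng(12345)  # seeded only so the sketch is repeatable
nwalks, nsteps = 500, 1000
draws = rng.integers(0, 2, size=(nwalks, nsteps))
steps = np.where(draws > 0, 1, -1)
walks = steps.cumsum(axis=1)

# One boolean per walk: did it ever get 30 or more steps from the origin?
reached = (np.abs(walks) >= 30).any(axis=1)
print(reached.sum())  # number of walks that reached 30 or -30

# argmax on a boolean array returns the index of the first True,
# which here is the first step at which each selected walk crossed
crossing_times = (np.abs(walks[reached]) >= 30).argmax(axis=1)
print(crossing_times.mean())
```

Only the rows selected by `reached` are passed to `argmax`, because for a walk that never crosses `argmax` would return 0, which would be misleading.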