### Numpy Array Operations : Axes and Broadcasting

In [2]:
import numpy as np

### Axis Parameter

Many operations in the numpy package take an optional axis parameter that specifies the dimension along which the operation is applied. This is extremely useful for multi-dimensional data. To illustrate the axis parameter, consider the (3, 2) array X defined as:

In [3]:
X = np.arange(6).reshape(3, 2)
print(X)

[[0 1]
 [2 3]
 [4 5]]


An operation like np.mean or np.sum, called without an axis argument, takes the mean or sum of all elements in the array:

In [4]:
print(np.mean(X))
print(np.sum(X))

2.5
15


To take the sum of each column (that is, to sum over the rows), we can use the axis parameter:

In [5]:
print(np.sum(X, axis = 0))

[6 9]


Since X has shape (3, 2), the output np.sum(X, axis = 0) has shape (2,): the axis being summed over is removed. Similarly, we can take the sum of each row:

In [6]:
print(np.sum(X, axis = 1))

[1 5 9]


You can apply this to higher-order arrays:

In [7]:
X = np.arange(24).reshape(2, 3, 4)
Y1 = np.sum(X, axis = 0)
Y2 = np.sum(X, axis = 1)
print('Y1 = {}'.format(Y1))
print('Y2 = {}'.format(Y2))

Y1 = [[12 14 16 18]
 [20 22 24 26]
 [28 30 32 34]]
Y2 = [[12 15 18 21]
 [48 51 54 57]]
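
The shape rule behind these results can be checked directly. The following is a small sketch (the names `A`, `shape_axis0`, and so on are illustrative and not part of the original notebook): reducing over an axis removes that axis from the shape, unless `keepdims=True` is passed.

```python
import numpy as np

# Same values as the (2, 3, 4) array above, under an illustrative name
A = np.arange(24).reshape(2, 3, 4)

# Summing over an axis removes that axis from the shape
shape_axis0 = np.sum(A, axis=0).shape   # (3, 4)
shape_axis1 = np.sum(A, axis=1).shape   # (2, 4)
shape_axis2 = np.sum(A, axis=2).shape   # (2, 3)

# keepdims=True keeps the reduced axis with length 1
shape_keep = np.sum(A, axis=1, keepdims=True).shape   # (2, 1, 4)

# A tuple of axes reduces over several axes at once
shape_multi = np.sum(A, axis=(0, 2)).shape   # (3,)

print(shape_axis0, shape_axis1, shape_axis2, shape_keep, shape_multi)
```

Keeping the reduced axis with `keepdims=True` is convenient when the result is later combined with the original array, since the shapes then line up for broadcasting.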


### Broadcasting

#### Example 1 : Mean Removal

Suppose that X is a data matrix of shape (n,p). That is, there are n data points and p features per point. Often, we have to remove the mean from each feature. That is, we want to compute the mean for each feature and then remove the mean from each column. We could do this with a for-loop as:

```python
# Assumes X is an (n, p) array
Xm = np.zeros(p)    # Mean for each feature
X_demean = np.zeros((n, p))   # Transformed features with the means removed
for j in range(p):
    Xm[j] = np.mean(X[:, j])
    for i in range(n):
        X_demean[i, j] = X[i, j] - Xm[j]
```

The code below does this without a for loop, using the axis parameter and broadcasting:

In [8]:
# Generate some random data
n = 100
p = 5
X = np.random.rand(n, p)

# Compute the mean per column using the axis parameter
Xm = np.mean(X, axis = 0)   # This is a 1-D array of length p

# Subtract the mean
X_demean = X - Xm[None, :]

The command Xm = np.mean(X, axis = 0) computes the mean of each column, giving a 1-D array of shape (p,). Then Xm[None, :] converts this to an array of shape (1, p). NumPy broadcasting stretches the (1, p) array across the n rows of X, so Xm[None, :] can be subtracted from X directly.
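
To make the shapes concrete, here is a minimal sketch that compares the broadcast result with an explicit double loop. The names `A`, `A_mean`, and `A_loop` are illustrative and chosen so they do not overwrite the notebook's variables. It also shows that, because NumPy aligns shapes from the trailing dimension, subtracting the shape (p,) array gives the same result as subtracting the (1, p) array.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 3
A = rng.random((n, p))

A_mean = np.mean(A, axis=0)      # shape (p,)
A_mean_row = A_mean[None, :]     # shape (1, p)

# (n, p) - (1, p) broadcasts to (n, p)
A_demean = A - A_mean_row

# Shapes are aligned from the last axis, so (p,) broadcasts the same way
A_demean_implicit = A - A_mean

# Explicit double loop for comparison
A_loop = np.zeros((n, p))
for j in range(p):
    for i in range(n):
        A_loop[i, j] = A[i, j] - A_mean[j]

print(A_mean.shape, A_mean_row.shape, A_demean.shape)
print(np.allclose(A_demean, A_loop))
print(np.allclose(np.mean(A_demean, axis=0), 0.0))
```

Writing the `[None, :]` explicitly is a matter of style here; it documents which axis is being stretched.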

#### Example 2 : Standardizing variables

A variant of the above example is to standardize the features, where we compute the transformed variables,
```python
Z[i,j] = (X[i,j] - Xm[j]) / Xstd[j]
```
where Xstd[j] is the standard deviation of feature j. This can be done as follows:

In [9]:
Xstd = np.std(X, axis = 0)
Z = (X - Xm[None,:]) / Xstd[None,:]
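
A quick sanity check on a standardization like this is that every column of the result has mean 0 and standard deviation 1. The sketch below uses illustrative names (`A`, `Z_check`) rather than the notebook's variables. Note that np.std uses `ddof=0` by default, i.e. the population standard deviation.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((50, 4))

A_mean = np.mean(A, axis=0)
A_std = np.std(A, axis=0)     # ddof=0 by default (population standard deviation)

# Standardize each column with broadcasting
Z_check = (A - A_mean[None, :]) / A_std[None, :]

col_means = np.mean(Z_check, axis=0)
col_stds = np.std(Z_check, axis=0)

print(np.allclose(col_means, 0.0))
print(np.allclose(col_stds, 1.0))
```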

#### Example 3 : Distances

Here is a more complicated example. Suppose we have a data matrix X of shape (nx, p) and a second set of points, Y of shape (ny, p). For each i and j, we want to compute the squared distances,
```python
d[i,j] = np.sum((X[i,:] - Y[j,:]) ** 2)
```

This represents the squared Euclidean distance between the vectors X[i,:] and Y[j,:]. This sort of computation is used for clustering and nearest neighbors. We can do this without a for loop as follows:

In [10]:
# Some random data
nx = 100
ny = 10
p = 2
X = np.random.rand(nx, p)
Y = np.random.rand(ny, p)

# Computing the squared distances in two lines.
DXY = X[:, None, :] - Y[None, :, :]
d = np.sum(DXY ** 2, axis = 2)

How does this work? First, we use the None keyword (equivalent to np.newaxis) to insert length-1 axes so that X and Y have compatible shapes:
```python
    X[:,None,:]     # Shape (nx, 1, p)
    Y[None,:,:]     # Shape (1, ny, p)
```

The two arrays can then be subtracted with broadcasting, giving an array DXY of shape (nx, ny, p) with
```python
    DXY[i,j,k] = X[i,k] - Y[j,k]
```

Summing over the last axis (axis = 2) then gives d[i,j] = sum_k (X[i,k] - Y[j,k]) ** 2, which is the squared norm of the difference between X[i,:] and Y[j,:].
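
As a check on the reasoning above, the following sketch compares the broadcast computation with a direct double loop on a small example. The names `A`, `B`, `d_fast`, and `d_loop` are illustrative, not from the original notebook.

```python
import numpy as np

rng = np.random.default_rng(2)
nx, ny, p = 5, 3, 2
A = rng.random((nx, p))
B = rng.random((ny, p))

# Broadcast version: (nx, 1, p) - (1, ny, p) -> (nx, ny, p)
diff = A[:, None, :] - B[None, :, :]
d_fast = np.sum(diff ** 2, axis=2)      # shape (nx, ny)

# Direct double loop over all pairs of points
d_loop = np.zeros((nx, ny))
for i in range(nx):
    for j in range(ny):
        d_loop[i, j] = np.sum((A[i, :] - B[j, :]) ** 2)

print(diff.shape, d_fast.shape)
print(np.allclose(d_fast, d_loop))
```

One design consideration: the intermediate array has nx * ny * p entries, so for large point sets the broadcast version trades memory for speed.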

#### Example 4 :  Outer Product

The outer product of vectors x and y is the matrix Z[i,j] = x[i] * y[j]. This can be performed in one line as follows:

In [11]:
# Some random data
nx = 100
ny = 10
x = np.random.rand(nx)
y = np.random.rand(ny)

# Compute the outer product in one line
Z = x[:,None] * y[None,:]

Here:
```python
    x[:,None]  # Has shape (nx, 1)
    y[None,:]  # Has shape (1, ny)
```
So with NumPy broadcasting:

```python
    Z = x[:,None] * y[None,:]  # has shape (nx, ny)
```
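
The broadcast product can be compared against NumPy's built-in np.outer. The short sketch below uses illustrative names (`u`, `v`) and also shows why the added axes are needed: multiplying the two 1-D arrays directly fails when their lengths differ.

```python
import numpy as np

rng = np.random.default_rng(3)
u = rng.random(4)
v = rng.random(3)

# Outer product via broadcasting: (4, 1) * (1, 3) -> (4, 3)
Z_bcast = u[:, None] * v[None, :]

# Built-in equivalent for 1-D inputs
Z_outer = np.outer(u, v)

# Without the extra axes, shapes (4,) and (3,) are not broadcast-compatible
try:
    u * v
    direct_product_failed = False
except ValueError:
    direct_product_failed = True

print(Z_bcast.shape)
print(np.allclose(Z_bcast, Z_outer))
print(direct_product_failed)
```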

**Exercise 1:** Given a matrix X, compute the matrix Y in which each row of X is normalized to sum to one. That is:
```python
        Y[i,j] = X[i,j] / sum_k X[i,k]
```

In [14]:
X = np.random.rand(4,3)
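# Sum over the columns (axis = 1) to get one total per row, then
# add a trailing axis so the (4, 1) row sums broadcast across each row
Y = X / np.sum(X, axis = 1)[:, None]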
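
Row normalization is easy to get wrong by picking the wrong axis, so it is worth verifying that every row of the result sums to one. The following sketch uses illustrative names (`M`, `Y_rows`, `Y_cols`) and `keepdims=True` as one possible way to keep the row sums in shape (4, 1). For contrast, it also shows that dividing by the axis = 0 sums normalizes the columns instead.

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.random((4, 3))

# One sum per row, kept as shape (4, 1) so it broadcasts across columns
row_sums = np.sum(M, axis=1, keepdims=True)
Y_rows = M / row_sums

# Dividing by the axis=0 sums (shape (3,)) normalizes each column instead
Y_cols = M / np.sum(M, axis=0)

print(np.sum(Y_rows, axis=1))   # row sums of the row-normalized matrix
print(np.sum(Y_cols, axis=0))   # column sums of the column-normalized matrix
```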