### Subtracting the mean

We often want to subtract the mean from each column or each row of a 2D array. For example, here is a 4 by 3 array:

In [2]:
import numpy as np

In [3]:
arr = np.array([[3., 1, 4], [1, 5, 9], [2, 6, 5], [3, 5, 8]])
arr

array([[3., 1., 4.],
       [1., 5., 9.],
       [2., 6., 5.],
       [3., 5., 8.]])

Let's say I want to remove the mean across the columns, that is, the mean of each row. Here are the row means:

In [4]:
row_means = np.mean(arr, axis=1)  # mean across the second (column) axis
row_means

array([2.66666667, 5.        , 4.33333333, 5.33333333])

The result is a 1D array, with one element per row of `arr`:

In [5]:
row_means.shape

(4,)
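To make the `axis` convention concrete, here is a small sketch comparing the two choices on the same array. The name `col_means` is mine, for illustration only; it does not appear elsewhere in this section.

```python
import numpy as np

arr = np.array([[3., 1, 4], [1, 5, 9], [2, 6, 5], [3, 5, 8]])

# axis=0 collapses the first (row) axis, leaving one mean per column.
col_means = np.mean(arr, axis=0)
# axis=1 collapses the second (column) axis, leaving one mean per row.
row_means = np.mean(arr, axis=1)

print(col_means.shape)  # one value for each of the 3 columns
print(row_means.shape)  # one value for each of the 4 rows
```

In both cases the axis we average over disappears from the shape, which is why the row means come back as shape (4,) rather than (4, 1).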

I want to do something like the following, but in a neater and faster way:

In [6]:
de_meaned = arr.copy()
for i in range(arr.shape[0]):  # iterate over rows
    de_meaned[i] = de_meaned[i] - row_means[i]
# The rows now have very near 0 mean
de_meaned.mean(axis=1)

array([1.48029737e-16, 0.00000000e+00, 2.96059473e-16, 2.96059473e-16])
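The means above are tiny but not exactly zero, because of floating point rounding. As a sketch of how I might check "very near 0" in code, assuming the default tolerances of `np.allclose` are acceptable for this data:

```python
import numpy as np

arr = np.array([[3., 1, 4], [1, 5, 9], [2, 6, 5], [3, 5, 8]])
row_means = np.mean(arr, axis=1)

# Same loop as above: subtract each row's mean from that row.
de_meaned = arr.copy()
for i in range(arr.shape[0]):
    de_meaned[i] = de_meaned[i] - row_means[i]

residual_means = de_meaned.mean(axis=1)
# Compare with a tolerance rather than testing for exact equality with 0.
is_near_zero = np.allclose(residual_means, 0)
print(is_near_zero)
```

An exact test such as `residual_means == 0` would be fragile here, because some of the residual means are around 1e-16 rather than exactly 0.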

One way to avoid the loop is to expand the 1D shape (4,) mean vector out to a shape (4, 3) array, where each new column is a copy of the (4,) mean vector. In fact you can do this with `np.outer` and a vector of ones:

In [7]:
means_expanded = np.outer(row_means, np.ones(3))
means_expanded

array([[2.66666667, 2.66666667, 2.66666667],
       [5.        , 5.        , 5.        ],
       [4.33333333, 4.33333333, 4.33333333],
       [5.33333333, 5.33333333, 5.33333333]])
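To see why this works: element `[i, j]` of `np.outer(a, b)` is `a[i] * b[j]`, so with `b` all ones, every element in row `i` is just `a[i]`. Here is a sketch that spells this out with explicit loops; the names `ones` and `expanded_loop` are mine, for illustration only.

```python
import numpy as np

arr = np.array([[3., 1, 4], [1, 5, 9], [2, 6, 5], [3, 5, 8]])
row_means = np.mean(arr, axis=1)
ones = np.ones(3)

# Build the outer product by hand, one element at a time.
expanded_loop = np.zeros((len(row_means), len(ones)))
for i in range(len(row_means)):
    for j in range(len(ones)):
        expanded_loop[i, j] = row_means[i] * ones[j]

# The hand-built array matches what np.outer returns.
print(np.array_equal(expanded_loop, np.outer(row_means, ones)))
```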

Now we can subtract this expanded array to remove the row means:

In [8]:
re_de_meaned = arr - means_expanded
# The row means are now very close to zero
re_de_meaned.mean(axis=1)

array([1.48029737e-16, 0.00000000e+00, 2.96059473e-16, 2.96059473e-16])
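As a sketch of how the `np.outer` steps could be wrapped up for any 2D array, here is a small helper. The function name `remove_row_means` is hypothetical, not something NumPy provides, and I check it against the loop version from earlier.

```python
import numpy as np


def remove_row_means(a):
    """Hypothetical helper: subtract each row's mean from a 2D array."""
    row_means = np.mean(a, axis=1)
    # Expand the (n_rows,) means to the (n_rows, n_cols) shape of `a`.
    means_expanded = np.outer(row_means, np.ones(a.shape[1]))
    return a - means_expanded


arr = np.array([[3., 1, 4], [1, 5, 9], [2, 6, 5], [3, 5, 8]])

# Loop version, for comparison.
row_means = np.mean(arr, axis=1)
loop_result = arr.copy()
for i in range(arr.shape[0]):
    loop_result[i] = loop_result[i] - row_means[i]

outer_result = remove_row_means(arr)
print(np.allclose(outer_result, loop_result))
```

Using `a.shape[1]` for the length of the ones vector means the helper is not tied to arrays with 3 columns.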