
The training data is normalised by shifting and rescaling each feature to a mean of 0 and a standard deviation of 1. For consistency, the test data is normalised using the mean and standard deviation of the *training* set. The workflow should never use any quantity derived from the test set: the test set stands in for unseen data, so statistics computed from it would leak information into the model.
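A minimal sketch of this fit-on-train, apply-to-both discipline, assuming NumPy and made-up random data in place of a real dataset. `fit_normaliser` and `apply_normaliser` are hypothetical helper names, not part of this notebook.

```python
import numpy as np

# Hypothetical synthetic data standing in for a real train/test split.
rng = np.random.default_rng(0)
train = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
test = rng.normal(loc=50.0, scale=10.0, size=(20, 3))

def fit_normaliser(data):
    """Return the per-column mean and standard deviation of `data`."""
    return data.mean(axis=0), data.std(axis=0)

def apply_normaliser(data, mean, std):
    """Shift and rescale `data` using statistics supplied by the caller."""
    return (data - mean) / std

mean, std = fit_normaliser(train)            # statistics come from the training set only
train_n = apply_normaliser(train, mean, std)
test_n = apply_normaliser(test, mean, std)   # the test set reuses the training statistics

print(train_n.mean(axis=0), train_n.std(axis=0))  # 0 and 1 up to rounding
print(test_n.mean(axis=0), test_n.std(axis=0))    # close to, but not exactly, 0 and 1
```

The training columns come out at mean 0 and standard deviation 1 up to floating-point rounding; the test columns only approximately, which is expected and harmless.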

In [1]:
from tensorflow.keras.datasets import boston_housing

(train_data, train_targets), (test_data, test_targets) = boston_housing.load_data()
print(train_data.shape, test_data.shape)

(404, 13) (102, 13)


In [2]:
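mean = train_data.mean(axis=0)  # one mean per feature, from the training data only
train_data -= mean              # shift
std = train_data.std(axis=0)    # one standard deviation per feature
train_data /= std               # rescale

test_data -= mean               # reuse the training statistics for the test set
test_data /= std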

Normalisation. The mean is calculated along the samples axis (`axis=0`), i.e. column by column in the sample-feature table, giving one mean per feature. Subtracting it from each sample shifts every column to a mean of zero, and dividing by each column's standard deviation rescales it to a standard deviation of one.
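A small sketch of the column-by-column behaviour on a made-up 3 by 2 table whose two features have very different scales. It also checks that computing the standard deviation after shifting, as In [2] does, gives the same result as computing it on the raw data, because shifting does not change spread.

```python
import numpy as np

# Made-up sample-feature table: 3 samples (rows), 2 features (columns).
data = np.array([[1.0, 200.0],
                 [2.0, 400.0],
                 [3.0, 600.0]])

mean = data.mean(axis=0)   # one mean per column: [2., 400.]
shifted = data - mean      # broadcasting subtracts each column's own mean
std = shifted.std(axis=0)  # one standard deviation per column

# Shifting leaves the spread unchanged.
print(np.allclose(std, data.std(axis=0)))  # True

normalised = shifted / std
print(normalised)
```

After normalisation both columns become approximately -1.22, 0 and 1.22: the difference in scale between the features has been removed.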

In [3]:
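import numpy as np

def stats(x):
    """Return the mean and (population) standard deviation of a 1-D NumPy array."""
    mean = sum(x) / len(x)               # total divided by the number of items
    devs = x - mean                      # deviation of each value from the mean
    ave_sq_devs = sum(devs**2) / len(x)  # average squared deviation (the variance)
    std = np.sqrt(ave_sq_devs)
    return mean, std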

Let's have a look at what all that means. This function returns the mean and standard deviation of a 1-D NumPy array of values (a plain Python list would fail at `x - mean`). The mean is exactly the same as the average, the total divided by the number of items, and the standard deviation is the square root of the average squared deviation from the mean. The deviations are squared because they can be negative as well as positive: left unsquared they cancel out, their average is always zero, and so they would say nothing about spread.
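A quick check of that reasoning with NumPy on the first example list used below. It also shows that `np.std` with its default `ddof=0` divides by `len(x)`, exactly as `stats` does, while `ddof=1` divides by `len(x) - 1` (the "sample" standard deviation) and gives a slightly larger value.

```python
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
devs = x - x.mean()

# Raw deviations cancel out: their average is zero.
print(devs.mean())                        # 0.0

# Squaring first gives the average squared deviation (the variance).
print((devs**2).mean())                   # 200.0

# np.std defaults to ddof=0, matching stats().
print(np.std(x), np.sqrt((devs**2).mean()))

# ddof=1 divides by len(x) - 1 instead.
print(np.std(x, ddof=1))                  # about 15.81
```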

In [4]:
def normalise(x):
    mean, std = stats(x)
    return (x - mean) / std

Here is a normalisation function.
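One caveat: if every value is the same, the standard deviation is zero and `normalise` produces `nan` values (0 divided by 0). The sketch below is a hypothetical variant, `safe_normalise`, that works column-wise on 2-D arrays via an `axis` parameter and leaves constant columns at zero instead. Treating a constant feature as all zeros is a design choice assumed here, not something the notebook prescribes.

```python
import numpy as np

def safe_normalise(x, axis=0):
    """Hypothetical variant of normalise(): normalises along `axis` and maps
    constant columns (std == 0) to zeros instead of dividing by zero."""
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=axis, keepdims=True)
    std = x.std(axis=axis, keepdims=True)
    std = np.where(std == 0, 1.0, std)  # avoid 0/0 for constant columns
    return (x - mean) / std

data = np.array([[1.0, 7.0],
                 [2.0, 7.0],
                 [3.0, 7.0]])          # the second column is constant
print(safe_normalise(data))
```

`keepdims=True` keeps the statistics as a 2-D row (shape `(1, 2)` here) so that they broadcast back against `x` for either axis.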

In [5]:
def print_stats(x):
    mean, std = stats(x)
    print('x =', x, '\nmean = ', mean, ', std =', std)

A small helper that prints an array together with its mean and standard deviation, to declutter the cells below.

In [6]:
import numpy as np
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
print_stats(x)

x = normalise(x)
print_stats(x)

x = [10. 20. 30. 40. 50.] 
mean =  30.0 , std = 14.142135623730951
x = [-1.41421356 -0.70710678  0.          0.70710678  1.41421356] 
mean =  0.0 , std = 0.9999999999999999


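The mean of the data held in `x` is clearly 30 and the standard deviation is... about 14.1. Three of the five values (20, 30 and 40) lie within one standard deviation of the mean, i.e. between about 16 and 44, and the data has a range of 40. The normalised data is centred on 0 and has a range of about 2.8 (40 divided by 14.14). The reported standard deviation of 0.9999999999999999 rather than exactly 1 is floating-point rounding.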
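Written out by hand, the arithmetic for this first example is:

$$
\mu = \frac{10 + 20 + 30 + 40 + 50}{5} = 30,
\qquad
\sigma = \sqrt{\frac{(-20)^2 + (-10)^2 + 0^2 + 10^2 + 20^2}{5}} = \sqrt{200} = 10\sqrt{2} \approx 14.14
$$

$$
z_i = \frac{x_i - \mu}{\sigma},
\qquad
z_{\max} = \frac{50 - 30}{10\sqrt{2}} = \sqrt{2} \approx 1.414,
\qquad
z_{\max} - z_{\min} = \frac{50 - 10}{10\sqrt{2}} = 2\sqrt{2} \approx 2.83
$$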

In [7]:
x = np.array([5.0, 10.0, 30.0, 50.0, 55.0])
print_stats(x)

x = normalise(x)
print_stats(x)

x = [ 5. 10. 30. 50. 55.] 
mean =  30.0 , std = 20.248456731316587
x = [-1.234662  -0.9877296  0.         0.9877296  1.234662 ] 
mean =  0.0 , std = 1.0


The mean of this list is still 30, but the data is more spread out. Indeed, the standard deviation is... about 20.2. The normalised data is again centred on 0, and three of the five values lie within 1 unit (one standard deviation) of the mean.
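A short comparison of the two example lists, assuming NumPy. `count_within_one_std` is a hypothetical helper that normalises a list and counts the values whose magnitude is at most 1, i.e. those within one standard deviation of the mean.

```python
import numpy as np

def count_within_one_std(x):
    """Count values lying within one (population) standard deviation of the mean."""
    z = (x - x.mean()) / x.std()
    return int(np.sum(np.abs(z) <= 1.0))

a = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
b = np.array([5.0, 10.0, 30.0, 50.0, 55.0])

# Same mean, different spread.
print(a.mean(), b.mean())   # 30.0 30.0
print(a.std(), b.std())     # about 14.14 and 20.25
print(count_within_one_std(a), count_within_one_std(b))  # 3 3
```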

In [8]:
x = np.array([[1, 2], [3, 4], [5, 6]])
print(x)
print()
print(np.sum(x, axis=0))
print()
print(np.sum(x, axis=1)) 

[[1 2]
 [3 4]
 [5 6]]

[ 9 12]

[ 3  7 11]


The `axis=0` parameter tells `mean()` to work down the columns, producing one value per column. The effect of summing along different axes is illustrated with this example: `x` is a 3 by 2 matrix, its column sums (axis 0) are 9 and 12, and its row sums (axis 1) are 3, 7 and 11.
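The same matrix with `mean()` instead of `sum()`, plus the broadcasting step that `train_data -= mean` in In [2] relies on: subtracting a shape-`(2,)` array of column means from a shape-`(3, 2)` matrix applies it to every row.

```python
import numpy as np

x = np.array([[1, 2], [3, 4], [5, 6]])

col_means = x.mean(axis=0)   # down the columns: [3. 4.]
row_means = x.mean(axis=1)   # along the rows:   [1.5 3.5 5.5]
print(col_means, row_means)

# Broadcasting subtracts the column means from every row.
centred = x - col_means
print(centred)               # rows [-2, -2], [0, 0], [2, 2]
print(centred.mean(axis=0))  # [0. 0.]
```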