This type of machine-learning problem is called **regression**, which consists of *predicting a continuous value* instead of a discrete label.

You’ll attempt to predict **the median price of homes** in a given Boston suburb in the mid-1970s, given data points about the suburb at the time, such as the crime rate, the local property tax rate, and so on.

The dataset has relatively few data points: **only 506, split between 404 training samples and 102 test samples**. Each feature in the input data (for example, the crime rate) has a *different scale*. For instance, some values are proportions, which take values between 0 and 1; others take values between 1 and 12, others between 0 and 100, and so on.


LOADING THE DATA

In [1]:
from keras.datasets import boston_housing # type: ignore
(train_data, train_labels), (test_data, test_labels) = boston_housing.load_data()
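
A quick way to see the scale differences described above is to print the shape and the per-feature minimum, maximum, and mean. The sketch below is illustrative only: `summarize_features` is a hypothetical helper, and `toy_data` is a made-up array standing in for the real dataset so that the snippet runs without a download.

```python
import numpy as np

def summarize_features(data):
    """Return per-column (min, max, mean) for a 2D array of shape (samples, features)."""
    data = np.asarray(data, dtype="float64")
    return data.min(axis=0), data.max(axis=0), data.mean(axis=0)

# Synthetic stand-in (an assumption, not the real data):
# 4 samples, 3 features on deliberately different scales.
toy_data = np.array([
    [0.10,  1.0,  20.0],
    [0.50,  4.0,  65.0],
    [0.90, 12.0, 100.0],
    [0.30,  8.0,   0.0],
])

mins, maxs, means = summarize_features(toy_data)
print("shape:", toy_data.shape)
for i, (lo, hi, mu) in enumerate(zip(mins, maxs, means)):
    print(f"feature {i}: min={lo:.2f} max={hi:.2f} mean={mu:.2f}")
```

Passing `train_data` and `test_data` to the same helper should report 404 and 102 rows respectively, with one column per feature.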



It would be problematic to feed a neural network values that take wildly different ranges. The network might be able to adapt to such heterogeneous data automatically, but it would definitely make learning more difficult. So we need to make sure the values of each feature (column) are on a comparable scale; for instance, some features may be expressed in percent and others in thousands. To do this, we use **standardization (or unit variance normalization)**, which gives each feature a **mean of 0 and a standard deviation of 1**. This keeps any single feature from dominating the others simply because of its units, and it tends to make gradient-based training converge more easily.
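
Written as a formula, standardization of feature (column) $j$ looks like the following. The symbols $\mu_j$ and $\sigma_j$ are notation introduced here for the mean and standard deviation of that column over the training samples; they do not appear elsewhere in this notebook.

$$
x'_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j},
\qquad
\mu_j = \frac{1}{N}\sum_{i=1}^{N} x_{ij},
\qquad
\sigma_j = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_{ij} - \mu_j\right)^2}
$$

Here $N$ is the number of training samples and $x_{ij}$ is the value of feature $j$ for sample $i$.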


NORMALIZING THE DATA

In [None]:
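# Compute per-feature statistics on the training data only.
mean = train_data.mean(axis=0)
train_data -= mean
std = train_data.std(axis=0)
train_data /= std
# Reuse the training mean and std for the test data,
# so that nothing computed on the test set enters the workflow.
test_data -= mean
test_data /= std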
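
To check that this kind of normalization behaves as expected, the same steps can be sketched on synthetic data. This is a minimal sketch under stated assumptions: `standardize` is a hypothetical helper that returns new arrays instead of modifying its inputs in place, and `toy_train` and `toy_test` are random arrays, not the housing data.

```python
import numpy as np

def standardize(train, test):
    """Standardize both arrays using statistics computed on `train` only."""
    train = np.asarray(train, dtype="float64")
    test = np.asarray(test, dtype="float64")
    mean = train.mean(axis=0)  # per-feature mean of the training set
    std = train.std(axis=0)    # per-feature standard deviation of the training set
    return (train - mean) / std, (test - mean) / std

# Synthetic features on three different scales (assumed values for illustration).
rng = np.random.default_rng(0)
scales = np.array([1.0, 12.0, 100.0])
toy_train = rng.random((50, 3)) * scales
toy_test = rng.random((10, 3)) * scales

train_std, test_std = standardize(toy_train, toy_test)

print("train mean per feature:", train_std.mean(axis=0))
print("train std per feature: ", train_std.std(axis=0))
print("test mean per feature: ", test_std.mean(axis=0))
```

The standardized training features come out with a mean of approximately 0 and a standard deviation of approximately 1. The test features generally do not, because they are scaled with the training statistics rather than their own.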
