# Regression Of Boston House Prices
In this project tutorial you will discover how to develop and evaluate neural network models
using Keras for a regression problem. After completing this step-by-step tutorial, you will know:
1. How to load a CSV dataset and make it available to Keras.
2. How to create a neural network model with Keras for a regression problem.
3. How to use scikit-learn with Keras to evaluate models using cross validation.
4. How to perform data preparation in order to improve skill with Keras models.
5. How to tune the network topology of models with Keras.

Let’s get started.

## 1.1 Boston House Price Dataset

The problem that we will look at in this tutorial is the Boston house price dataset. The dataset
describes properties of houses in Boston suburbs and is concerned with modeling the price of
houses in those suburbs in thousands of dollars. As such, this is a regression predictive modeling
problem. There are 13 input variables that describe the properties of a given Boston suburb.
The full list of attributes in this dataset, including the output variable MEDV, is as follows:
1. CRIM: per capita crime rate by town.
2. ZN: proportion of residential land zoned for lots over 25,000 sq.ft.
3. INDUS: proportion of non-retail business acres per town.
4. CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).
5. NOX: nitric oxides concentration (parts per 10 million).
6. RM: average number of rooms per dwelling.
7. AGE: proportion of owner-occupied units built prior to 1940.
8. DIS: weighted distances to five Boston employment centers.
9. RAD: index of accessibility to radial highways.
10. TAX: full-value property-tax rate per $10,000.
11. PTRATIO: pupil-teacher ratio by town.
12. B: 1000(Bk − 0.63)^2 where Bk is the proportion of blacks by town.
13. LSTAT: % lower status of the population.
14. MEDV: median value of owner-occupied homes in $1000s.
This is a well studied problem in machine learning. It is convenient to work with because all
of the input and output attributes are numerical and there are 506 instances to work with. A
sample of the first 5 rows of the 506 in the dataset is provided below:

![alt text](sample_data.png "Title")

Sample of the Boston House Price Dataset.

The dataset is available in the bundle of source code provided with this book. Alternatively,
you can download this dataset (https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data) and save it to your current working directory with the file name
housing.csv. Reasonable performance for models evaluated using Mean Squared Error (MSE)
is around 20 in squared thousands of dollars (or about $4,500 if you take the square root). This is a
nice target to aim for with our neural network model. You can learn more about the Boston
house price dataset on the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Housing).
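
The conversion from the MSE target to dollars can be checked with a couple of lines of arithmetic. This is only a sketch of the calculation described above; the variable names are chosen for illustration.

```python
import math

# Target MSE quoted in the text, in squared thousands of dollars.
target_mse = 20.0

# The square root returns the error to the units of MEDV (thousands of dollars).
target_rmse = math.sqrt(target_mse)
target_rmse_dollars = target_rmse * 1000

print("RMSE: %.2f thousand dollars (about $%.0f)" % (target_rmse, target_rmse_dollars))
```

The result is a little under $4,500, which is where the rounded figure in the text comes from.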

## 1.2 Develop a Baseline Neural Network Model
In this section we will create a baseline neural network model for the regression problem. Let’s
start off by importing all of the functions and objects we will need for this tutorial.

In [1]:
# Import Classes and Functions.
import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.cross_validation import cross_val_score
from sklearn.cross_validation import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

Using TensorFlow backend.


We can now load our dataset from a file in the local directory. The dataset is in fact not in
CSV format on the UCI Machine Learning Repository; the attributes are instead separated by
whitespace. We can load this easily using the Pandas library. We can then split the input (X)
and output (Y) attributes so that they are easier to model with Keras and scikit-learn.

In [2]:
# Load dataset and separate into input and output variables.
dataframe = pandas.read_csv("housing.csv", delim_whitespace=True, header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:13]
Y = dataset[:,13]
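
As a quick check of the whitespace parsing, the sketch below reads two sample rows from an in-memory string rather than from housing.csv. The rows, the `sample_*` names, and the use of `sep=r"\s+"` (which the pandas documentation describes as equivalent to `delim_whitespace=True`) are assumptions made for illustration.

```python
import io
import pandas

# Two illustrative rows in the whitespace-separated layout (13 inputs + MEDV).
sample_text = (
    "0.00632  18.00   2.310  0  0.5380  6.5750  65.20  4.0900   1  296.0  15.30 396.90   4.98  24.00\n"
    "0.02731   0.00   7.070  0  0.4690  6.4210  78.90  4.9671   2  242.0  17.80 396.90   9.14  21.60\n"
)

# sep=r"\s+" treats any run of whitespace as a single delimiter.
sample_frame = pandas.read_csv(io.StringIO(sample_text), sep=r"\s+", header=None)
sample = sample_frame.values

sample_X = sample[:, 0:13]   # first 13 columns: input attributes
sample_Y = sample[:, 13]     # last column: MEDV

print(sample_X.shape, sample_Y.shape)
```

The same slicing is what separates the 13 input columns from the MEDV column in the full dataset.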

We can create Keras models and evaluate them with scikit-learn by using handy wrapper
objects provided by the Keras library. This is desirable, because scikit-learn excels at evaluating
models and will allow us to use powerful data preparation and model evaluation schemes with
very few lines of code. The Keras wrapper class requires a function as an argument. This function,
which we must define, is responsible for creating the neural network model to be evaluated.

Below we define the function to create the baseline model to be evaluated. It is a simple
model that has a single fully connected hidden layer with the same number of neurons as input
attributes (13). The network uses good practices such as the rectifier activation function for
the hidden layer. No activation function is used for the output layer because it is a regression
problem and we are interested in predicting numerical values directly without a transform.
The efficient ADAM optimization algorithm is used and a mean squared error loss function
is optimized. This will be the same metric that we will use to evaluate the performance of the
model. It is a desirable metric because taking the square root of an error value gives us a
result that we can directly understand in the context of the problem, in units of thousands
of dollars.


In [4]:
# Define and compile a baseline neural network model.
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(13, input_dim=13, init= 'normal' , activation= 'relu' ))
    model.add(Dense(1, init= 'normal' ))
    # Compile model
    model.compile(loss= 'mean_squared_error' , optimizer= 'adam' )
    return model
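
To make the structure of the baseline model concrete, here is a rough NumPy sketch of what a 13 -> 13 -> 1 network computes: a rectifier on the hidden layer, no activation on the output, and mean squared error as the loss. The weights are random placeholders rather than trained values, and the helper names (`forward`, `mean_squared_error`) are hypothetical; this is not how Keras implements the model internally.

```python
import numpy

rng = numpy.random.RandomState(7)

# Hypothetical weights standing in for a trained network: 13 inputs -> 13 hidden -> 1 output.
W1 = rng.normal(scale=0.05, size=(13, 13))
b1 = numpy.zeros(13)
W2 = rng.normal(scale=0.05, size=(13, 1))
b2 = numpy.zeros(1)

def forward(X):
    """Forward pass: rectifier on the hidden layer, no activation on the output."""
    hidden = numpy.maximum(0.0, X @ W1 + b1)
    return (hidden @ W2 + b2).ravel()

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return numpy.mean((y_true - y_pred) ** 2)

# Random stand-in data: 4 rows of 13 attributes and 4 target values.
X_demo = rng.rand(4, 13)
y_demo = rng.rand(4) * 50.0
pred = forward(X_demo)
print(pred.shape, mean_squared_error(y_demo, pred))
```

Because the output layer is linear, the predictions are unbounded real numbers, which is what a regression target such as MEDV calls for.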

The Keras wrapper object for use in scikit-learn as a regression estimator is called KerasRegressor.
We create an instance and pass it both the name of the function to create the neural network
model as well as some parameters to pass along to the fit() function of the model later, such
as the number of epochs and batch size. Both of these are set to sensible defaults. We also
initialize the random number generator with a constant random seed, a process we will repeat
for each model evaluated in this tutorial. This is to ensure we compare models consistently and
that the results are reproducible.


In [5]:
# Fix random seed for reproducibility.
seed = 7
numpy.random.seed(seed)

# Prepare the model wrapper for scikit-learn (baseline model, raw inputs).
estimator = KerasRegressor(build_fn=baseline_model, nb_epoch=100, batch_size=5, verbose=0)
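
The effect of fixing the seed can be seen in isolation with NumPy alone. The sketch below only demonstrates that reseeding NumPy's generator reproduces the same draws; whether this fully fixes the results of a Keras run also depends on the backend, which may keep its own random state.

```python
import numpy

seed = 7

# Seed, then draw three numbers.
numpy.random.seed(seed)
first_draw = numpy.random.rand(3)

# Reseed with the same value and draw again.
numpy.random.seed(seed)
second_draw = numpy.random.rand(3)

# The two draws are identical because the generator restarted from the same state.
print(numpy.array_equal(first_draw, second_draw))
```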

The final step is to evaluate this baseline model. We will use 10-fold cross validation to
evaluate the model.
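
With 506 rows and 10 folds, the folds cannot all be the same size. The sketch below works out the split sizes in plain Python, assuming consecutive folds whose sizes differ by at most one; `fold_sizes` is a hypothetical helper, not part of scikit-learn.

```python
def fold_sizes(n_samples, n_folds):
    """Split n_samples into n_folds consecutive folds whose sizes differ by at most one."""
    base, extra = divmod(n_samples, n_folds)
    return [base + 1 if i < extra else base for i in range(n_folds)]

sizes = fold_sizes(506, 10)
for i, test_size in enumerate(sizes):
    train_size = 506 - test_size
    print("fold %d: train on %d rows, test on %d rows" % (i + 1, train_size, test_size))
```

Each model is therefore trained on roughly 455 rows and scored on the remaining 50 or 51, and every row is used for testing exactly once.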

In [6]:
# Evaluate Baseline Model.
kfold = KFold(n=len(X), n_folds=10, random_state=seed)
results = cross_val_score(estimator, X, Y, cv=kfold)
print("Baseline: %.2f (%.2f) MSE" % (results.mean(), results.std()))

Baseline: 30.85 (26.52) MSE


Running this code gives us an estimate of the model’s performance on the problem for unseen
data. The result reports the mean squared error, including the average and the standard deviation
(the spread of the scores) across all 10 folds of the cross validation evaluation.

Baseline: 38.04 (28.15) MSE

Sample Output From Evaluating the Baseline Model.
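
The two numbers in the report are simply the mean and standard deviation of the per-fold scores. A small sketch with made-up fold scores (hypothetical values, not results from this tutorial) shows the calculation:

```python
import numpy

# Hypothetical per-fold MSE values, for illustration only.
fold_mse = numpy.array([12.0, 18.0, 25.0, 40.0, 30.0])

mean_mse = fold_mse.mean()
std_mse = fold_mse.std()   # population standard deviation (ddof=0), NumPy's default

print("%.2f (%.2f) MSE" % (mean_mse, std_mse))
```

A standard deviation that is large relative to the mean, as in the baseline output, suggests that the score depends heavily on which rows land in the test fold.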

## 1.3 Lift Performance By Standardizing The Dataset
An important concern with the Boston house price dataset is that the input attributes all vary
in their scales because they measure different quantities. It is almost always good practice to
prepare your data before modeling it using a neural network model. Continuing on from the
above baseline model, we can re-evaluate the same model using a standardized version of the
input dataset.

We can use scikit-learn’s Pipeline framework (http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html) to perform the standardization during the
model evaluation process, within each fold of the cross validation. This ensures that there is
no data leakage from the test fold into the training data in each round of cross validation. The code below
creates a scikit-learn Pipeline that first standardizes the dataset then creates and evaluates
the baseline neural network model.


In [7]:
# Regression Example With Boston Dataset: Standardized
# Update To Use a Standardized Dataset.
import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.cross_validation import cross_val_score
from sklearn.cross_validation import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# load dataset
dataframe = pandas.read_csv("housing.csv", delim_whitespace=True, header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:13]
Y = dataset[:,13]
# define base model
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(13, input_dim=13, init= 'normal' , activation= 'relu' ))
    model.add(Dense(1, init= 'normal' ))
    # Compile model
    model.compile(loss= 'mean_squared_error' , optimizer= 'adam' )
    return model
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# evaluate model with standardized dataset
estimators = []
estimators.append(( 'standardize' , StandardScaler()))
estimators.append(( 'mlp' , KerasRegressor(build_fn=baseline_model, nb_epoch=50, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = KFold(n=len(X), n_folds=10, random_state=seed)
results = cross_val_score(pipeline, X, Y, cv=kfold)
print("Standardized: %.2f (%.2f) MSE" % (results.mean(), results.std()))

Standardized: 31.57 (31.57) MSE


Running the example provides an improved performance over the baseline model without
standardized data, dropping the error by about 10 (in squared thousands of dollars).

Standardized: 31.57 (31.57) MSE

Sample Output From Evaluating the Model on The Standardized Dataset.
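
The reason for standardizing inside each fold can be sketched with NumPy: the mean and standard deviation are computed from the training rows only and then reused on the held-out rows. The function name `standardize_fold` and the random data are assumptions for illustration; in the tutorial this work is done by StandardScaler inside the Pipeline.

```python
import numpy

def standardize_fold(X_train, X_test):
    """Fit mean/std on the training rows only, then apply them to both splits."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0.0] = 1.0   # guard against constant columns
    return (X_train - mean) / std, (X_test - mean) / std

rng = numpy.random.RandomState(7)
# Hypothetical data with columns on very different scales.
X_demo = rng.rand(20, 3) * numpy.array([1.0, 100.0, 10000.0])
X_train, X_test = X_demo[:15], X_demo[15:]

train_scaled, test_scaled = standardize_fold(X_train, X_test)
print(train_scaled.mean(axis=0).round(6), train_scaled.std(axis=0).round(6))
```

The held-out rows never contribute to the statistics, so nothing about the test fold leaks into training.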

A further extension of this section would be to similarly apply a rescaling to the output
variable such as normalizing it to the range of 0 to 1 and use a Sigmoid or similar activation
function on the output layer to narrow output predictions to the same range.
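
One possible shape for that extension is sketched below: the target is rescaled to the range 0 to 1 before training, and predictions are mapped back to thousands of dollars afterwards. The helper names and the sample values are hypothetical, and in practice the minimum and maximum would be taken from the training fold only.

```python
import numpy

def minmax_fit(y):
    """Return the minimum and maximum used for scaling."""
    return y.min(), y.max()

def minmax_transform(y, y_min, y_max):
    """Rescale values into the range 0 to 1."""
    return (y - y_min) / (y_max - y_min)

def minmax_inverse(y_scaled, y_min, y_max):
    """Map scaled values back to the original units."""
    return y_scaled * (y_max - y_min) + y_min

# Illustrative target values in thousands of dollars.
y_demo = numpy.array([24.0, 21.6, 34.7, 33.4, 36.2])

y_min, y_max = minmax_fit(y_demo)
y_scaled = minmax_transform(y_demo, y_min, y_max)
y_restored = minmax_inverse(y_scaled, y_min, y_max)

print(y_scaled.round(3), y_restored)
```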

## 1.4 Tune The Neural Network Topology
There are many concerns that can be optimized for a neural network model. Perhaps the point
of biggest leverage is the structure of the network itself, including the number of layers and
the number of neurons in each layer. In this section we will evaluate two additional network
topologies in an effort to further improve the performance of the model. We will look at both a
deeper and a wider network topology.

### 1.4.1 Evaluate a Deeper Network Topology
One way to improve the performance of a neural network is to add more layers. This might
allow the model to extract and recombine higher order features embedded in the data. In this
section we will evaluate the effect of adding one more hidden layer to the model. This is as easy
as defining a new function that will create this deeper model, copied from our baseline model
above. We can then insert a new line after the first hidden layer, in this case with about half
the number of neurons. Our network topology now looks like:

``` 13 inputs -> [13 -> 6] -> 1 output ```

Summary of Deeper Network Topology.

We can evaluate this network topology in the same way as above, whilst also using the
standardization of the dataset that was shown above to improve performance.

In [8]:
# Regression Example With Boston Dataset: Standardized and Larger
# Evaluate the Larger Neural Network Model.
import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.cross_validation import cross_val_score
from sklearn.cross_validation import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# load dataset
dataframe = pandas.read_csv("housing.csv", delim_whitespace=True, header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:13]
Y = dataset[:,13]
# define the model
def larger_model():
    # create model
    model = Sequential()
    model.add(Dense(13, input_dim=13, init= 'normal', activation= 'relu' ))
    model.add(Dense(6, init= 'normal' , activation= 'relu' ))
    model.add(Dense(1, init= 'normal' ))
    # Compile model
    model.compile(loss= 'mean_squared_error' , optimizer= 'adam' )
    return model
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# evaluate model with standardized dataset
estimators = []
estimators.append(( 'standardize', StandardScaler()))
estimators.append(( 'mlp' , KerasRegressor(build_fn=larger_model, nb_epoch=50, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = KFold(n=len(X), n_folds=10, random_state=seed)
results = cross_val_score(pipeline, X, Y, cv=kfold)
print("Larger: %.2f (%.2f) MSE" % (results.mean(), results.std()))

Larger: 22.84 (28.18) MSE


Running this model does show a further improvement in performance from 28 down to 24 (in squared thousands of dollars).

Larger: 22.84 (28.18) MSE

Sample Output From Evaluating the Deeper Model.

### 1.4.2 Evaluate a Wider Network Topology
Another approach to increasing the representational capacity of the model is to create a wider
network. In this section we evaluate the effect of keeping a shallow network architecture and
nearly doubling the number of neurons in the one hidden layer. Again, all we need to do is define
a new function that creates our neural network model. Here, we have increased the number of
neurons in the hidden layer compared to the baseline model from 13 to 20. The topology for
our wider network can be summarized as follows:

``` 13 inputs -> [20] -> 1 output ```

Summary of Wider Network Topology.
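
One way to compare the capacity of the three topologies in this tutorial is to count their trainable parameters (weights plus biases). The sketch below does this in plain Python; `count_parameters` is a hypothetical helper and assumes every layer is fully connected with a bias term.

```python
def count_parameters(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Layer sizes from input to output for each topology in the tutorial.
topologies = {
    "baseline": [13, 13, 1],
    "deeper": [13, 13, 6, 1],
    "wider": [13, 20, 1],
}

for name, sizes in topologies.items():
    print("%s: %d trainable parameters" % (name, count_parameters(sizes)))
```

Under these assumptions the wider network has the most parameters of the three, even though it has only one hidden layer.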

We can evaluate the wider network topology using the same scheme as above.

In [9]:
# Regression Example With Boston Dataset: Standardized and Wider
# Evaluate the Wider Neural Network Model.
import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.cross_validation import cross_val_score
from sklearn.cross_validation import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# load dataset
dataframe = pandas.read_csv("housing.csv", delim_whitespace=True, header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:13]
Y = dataset[:,13]
# define wider model
def wider_model():
    # create model
    model = Sequential()
    model.add(Dense(20, input_dim=13, init= 'normal' , activation= 'relu' ))
    model.add(Dense(1, init= 'normal' ))
    # Compile model
    model.compile(loss= 'mean_squared_error' , optimizer= 'adam' )
    return model
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# evaluate model with standardized dataset
estimators = []
estimators.append(( 'standardize' , StandardScaler()))
estimators.append(( 'mlp' , KerasRegressor(build_fn=wider_model, nb_epoch=100, batch_size=5,
verbose=0)))
pipeline = Pipeline(estimators)
kfold = KFold(n=len(X), n_folds=10, random_state=seed)
results = cross_val_score(pipeline, X, Y, cv=kfold)
print("Wider: %.2f (%.2f) MSE" % (results.mean(), results.std()))

Wider: 23.41 (26.52) MSE


Building the model does see a further drop in error to about 21 (in squared thousands of dollars).
This is not a bad result for this problem.

Wider: 23.41 (26.52) MSE

Sample Output From Evaluating the Wider Model.

It would have been hard to guess that a wider network would outperform a deeper network
on this problem. The results demonstrate the importance of empirical testing when it comes to
developing neural network models.

## 1.5 Summary
In this lesson you discovered the Keras deep learning library for modeling regression problems.
Through this tutorial you learned how to develop and evaluate neural network models, including:
1. How to load data and develop a baseline model.
2. How to lift performance using data preparation techniques like standardization.
3. How to design and evaluate networks with different topologies on a problem.

### 1.5.1 Next
You are now equipped with the skills to develop neural network
models on standard machine learning datasets.