# Project: Regression Of Boston House Prices
Discover how to develop and evaluate neural network models using Keras for a regression problem.

The Boston house price dataset describes properties of houses in Boston suburbs and is concerned with **modeling the price of houses** in those suburbs in thousands of dollars. As such, this is a regression predictive modeling problem. There are **13 input** variables that describe the properties of a given Boston suburb, plus one output variable (MEDV). The full list of attributes in this dataset is as follows:
- CRIM: per capita crime rate by town.
- ZN: proportion of residential land zoned for lots over 25,000 sq.ft.
- INDUS: proportion of non-retail business acres per town.
- CHAS: Charles River dummy variable ('1' if tract bounds river; '0' otherwise).
- NOX: nitric oxides concentration (parts per 10 million).
- RM: average number of rooms per dwelling.
- AGE: proportion of owner-occupied units built prior to 1940.
- DIS: weighted distances to five Boston employment centers.
- RAD: index of accessibility to radial highways.
- TAX: full-value property-tax rate per USD 10,000.
- PTRATIO: pupil-teacher ratio by town. 
- B: 1000(Bk−0.63)^2 where Bk is the proportion of blacks by town. 
- LSTAT: % lower status of the population.
- MEDV: Median value of owner-occupied homes (in USD 1000s).

This is a well-studied problem in machine learning. It is convenient to work with because all of the input and output attributes are numerical and there are **506 instances** to work with. Reasonable performance for models evaluated using **Mean Squared Error (MSE)** is around 20 in squared thousands of dollars (or roughly USD 4,500 if you take the square root). This is a nice target to aim for with our neural network model.
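
The conversion from MSE to dollars quoted above is simple arithmetic. A minimal sketch follows; the helper name `mse_to_usd` is illustrative and not part of the notebook:

```python
import math

def mse_to_usd(mse_thousands_sq):
    """Convert an MSE in squared thousands of USD to an RMSE in USD."""
    rmse_thousands = math.sqrt(mse_thousands_sq)
    return rmse_thousands * 1000

print(round(mse_to_usd(20)))  # 4472
```

An MSE of 20 therefore corresponds to a typical error of a little under USD 4,500 per house.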

In [1]:
import numpy as np
import pandas as pd

from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor  # scikit-learn wrapper for Keras regressors

from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

np.random.seed(47)

Using TensorFlow backend.


In [2]:
# housing.csv is whitespace-delimited and has no header row
df = pd.read_csv('housing.csv', delim_whitespace=True, header=None)
data = df.values  # NumPy array: 506 rows, 14 columns
df.head(2)

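|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 |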


In [3]:
X = data[:, 0:13]  # the 13 input attributes (CRIM ... LSTAT)
y = data[:, -1]    # the target, MEDV

### Develop a Baseline Neural Network Model
We can create Keras models and evaluate them with scikit-learn by using handy **wrapper** objects provided by the Keras library. This is desirable, because scikit-learn excels at evaluating models and will allow us to use powerful data preparation and model evaluation schemes with very few lines of code. 
- The Keras wrapper class requires a function as an argument. This function, which we must define, is responsible for creating the neural network model to be evaluated. The Keras wrapper object for use in scikit-learn as a regression estimator is called **`KerasRegressor()`**.

Below we define the function to create the baseline model to be evaluated. It is a simple model that has a single fully connected hidden layer with the same number of neurons as **input attributes (13)**. The network uses good practices such as the rectifier activation function for the hidden layer. **No activation function is used for the output layer** because it is a regression problem and we are interested in predicting numerical values directly, without a transform.
- The efficient **ADAM optimization algorithm** is used.
- The **mean squared error loss function** is optimized.
- This will be the same metric that we will use to evaluate the performance of the model. MSE is a desirable metric because taking its square root gives an error value that we can interpret directly in the units of the problem, thousands of USD.
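
To make the architecture concrete, here is a NumPy-only sketch of the forward pass and the MSE loss for a 13 → 13 → 1 network. It is an illustration under stated assumptions, not what Keras does internally: the weights are random stand-ins for trained parameters, the data is made up, and the names `forward`, `mse`, `X_demo` and `y_demo` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(47)

# Random weights standing in for trained parameters (assumption for illustration)
W1 = rng.normal(scale=0.05, size=(13, 13))  # input -> hidden
b1 = np.zeros(13)
W2 = rng.normal(scale=0.05, size=(13, 1))   # hidden -> output
b2 = np.zeros(1)

def forward(X):
    """13 inputs -> 13 ReLU units -> 1 linear output."""
    hidden = np.maximum(0.0, X @ W1 + b1)   # rectifier activation
    return (hidden @ W2 + b2).ravel()       # no output activation: raw prediction

def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return float(np.mean((y_true - y_pred) ** 2))

X_demo = rng.normal(size=(5, 13))  # 5 made-up rows, not real housing data
y_demo = rng.normal(size=5)
preds = forward(X_demo)
print(preds.shape, mse(y_demo, preds))
```

Because the output layer is linear, predictions are unbounded real numbers, which is what a regression target in thousands of USD calls for.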

In [4]:
def baseline_model():
    model = Sequential()
    # hidden layer: 13 ReLU units, one per input attribute
    model.add(Dense(13, input_dim=13, activation='relu', kernel_initializer='normal'))
    # output layer: a single linear unit for the predicted price
    model.add(Dense(1, kernel_initializer='normal'))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

In [5]:
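estimator = KerasRegressor(build_fn=baseline_model, epochs=100, batch_size=5, verbose=0)
# random_state only has an effect when shuffle=True, and recent scikit-learn
# versions raise an error if it is set without shuffling, so it is omitted
kfold = KFold(n_splits=10)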

results = cross_val_score(estimator, X, y, cv=kfold)
print('Baseline: %.2f (+/- %.2f) MSE' %(results.mean(), results.std()))

Baseline: -40.91 (+/- 30.22) MSE


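This gives us an estimate of the model's performance on unseen data. The result reports the mean and standard deviation of the score across all 10 folds of the cross-validation evaluation. The values appear negative because the score is reported as negated MSE (scikit-learn treats larger scores as better), so the baseline MSE is about 40.91.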
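
Since `KFold` is used here without shuffling, each test fold is a contiguous block of rows. The sketch below reproduces that split in plain Python to show how 506 instances divide into 10 folds; `contiguous_folds` is a hypothetical helper, not a scikit-learn function.

```python
def contiguous_folds(n_samples, n_splits):
    """Return (start, stop) index ranges for unshuffled k-fold test sets."""
    base, extra = divmod(n_samples, n_splits)
    folds, start = [], 0
    for i in range(n_splits):
        # the first `extra` folds receive one additional row
        size = base + 1 if i < extra else base
        folds.append((start, start + size))
        start += size
    return folds

folds = contiguous_folds(506, 10)
print([stop - start for start, stop in folds])
# [51, 51, 51, 51, 51, 51, 50, 50, 50, 50]
```

If the rows of the file are ordered in some systematic way, contiguous folds can differ noticeably from one another, which is one possible reason for a large spread in per-fold scores.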

In [6]:
results  # per-fold scores (negated MSE, so closer to zero is better)

array([ -13.41400992,  -21.91132667,   -5.73191913,  -40.80643153,
        -51.79660368,  -32.1905456 ,  -27.92281556,  -85.77527382,
       -105.15807104,  -24.37787559])
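
The per-fold scores can be turned back into ordinary error figures by flipping the sign. The sketch below assumes the scores are negated MSE values and copies them from the output above; the variable names are illustrative.

```python
import numpy as np

# Per-fold scores copied from the output above (assumed to be negated MSE)
scores = np.array([-13.41400992, -21.91132667, -5.73191913, -40.80643153,
                   -51.79660368, -32.1905456, -27.92281556, -85.77527382,
                   -105.15807104, -24.37787559])

mse_per_fold = -scores                 # flip the sign to recover MSE
rmse_per_fold = np.sqrt(mse_per_fold)  # error in thousands of USD

print('MSE: %.2f (+/- %.2f)' % (mse_per_fold.mean(), mse_per_fold.std()))
print('RMSE per fold:', rmse_per_fold.round(2))
```

The mean and standard deviation match the summary line printed earlier, apart from the sign.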

### Lift Performance By Standardizing The Dataset
An important concern with the Boston house price dataset is that the input attributes all vary in their scales because they measure different quantities. It is almost always good practice to prepare your data before modeling it using a neural network model. Continuing on from the above baseline model, we can re-evaluate the same model using a **standardized version of the input dataset**. We can use scikit-learn's **Pipeline** framework to perform the **standardization during the model evaluation process, within each fold of the cross-validation**. Because the scaler is fit only on the training folds, no information from the held-out test fold leaks into the training data.
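
The idea of standardizing within each fold can be sketched by hand with NumPy: compute the mean and standard deviation on the training rows only, then apply those same statistics to the held-out rows. The data and the fold indices below are made up for illustration and are not the housing data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up data with columns on very different scales
X_demo = rng.normal(loc=[0.5, 300.0, 15.0], scale=[0.1, 100.0, 2.0], size=(100, 3))

train_idx = np.arange(0, 90)    # hypothetical training fold
test_idx = np.arange(90, 100)   # hypothetical held-out fold

# Fit the scaling statistics on the training rows only
mean = X_demo[train_idx].mean(axis=0)
std = X_demo[train_idx].std(axis=0)

X_train = (X_demo[train_idx] - mean) / std
X_test = (X_demo[test_idx] - mean) / std   # reuse the training statistics

print(X_train.mean(axis=0).round(6), X_train.std(axis=0).round(6))
```

The training columns end up with zero mean and unit variance, while the test columns are only approximately so, because they never contributed to the statistics.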

In [7]:
# Regression example with the Boston dataset: evaluate the model on standardized inputs
# Note: trained for 50 epochs here, versus 100 for the baseline above
estimators = [] 
estimators.append(('standardize', StandardScaler())) 
estimators.append(('mlp', KerasRegressor(build_fn=baseline_model, epochs=50, batch_size=5, verbose=0))) 

pipeline = Pipeline(estimators) 
kfold = KFold(n_splits=10)  # no shuffling, so random_state is omitted

results = cross_val_score(pipeline, X, y, cv=kfold)
print('Standardized: %.2f (+/- %.2f) MSE' %(results.mean(), results.std()))

Standardized: -28.83 (+/- 28.04) MSE


A further extension of this section would be to rescale the output variable as well, for example by normalizing it to the range 0 to 1 and using a sigmoid or similar activation function on the output layer to constrain predictions to the same range.
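
A minimal sketch of that extension, assuming min-max scaling of the target: the helpers `fit_minmax`, `scale`, `unscale` and `sigmoid` are hypothetical names, and the target values are illustrative rather than taken from the dataset. As with the inputs, the minimum and maximum would be taken from the training fold only.

```python
import numpy as np

def fit_minmax(y):
    """Return the (min, max) of the training targets."""
    return float(np.min(y)), float(np.max(y))

def scale(y, y_min, y_max):
    """Map targets into the range [0, 1]."""
    return (y - y_min) / (y_max - y_min)

def unscale(y_scaled, y_min, y_max):
    """Map values in [0, 1] back to the original units."""
    return y_scaled * (y_max - y_min) + y_min

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

y_demo = np.array([24.0, 21.6, 50.0, 5.0])  # illustrative MEDV-like values
y_min, y_max = fit_minmax(y_demo)
y_scaled = scale(y_demo, y_min, y_max)

# A sigmoid output of 0.5 maps back to the midpoint of the target range
print(unscale(sigmoid(0.0), y_min, y_max))  # 27.5
```

Predictions made on the scaled target must be passed through `unscale` before an error in thousands of USD can be reported.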

### Tune The Topology
There are many concerns that can be optimized for a neural network model. Perhaps the point of biggest leverage is the structure of the network itself, including the number of layers and the number of neurons in each layer. In this section we will evaluate two additional network topologies in an effort to further improve the performance of the model. 
- Go deeper
  - One way to improve the performance of a neural network is to add more layers. This might allow the model to extract and recombine higher order features embedded in the data. In this section we will evaluate the effect of **adding one more hidden layer** to the model.
- Go wider
  - Another approach to increasing the representational capacity of the model is to create a wider network. In this section we evaluate the effect of keeping a shallow network architecture and **increasing the number of neurons in the one hidden layer** by roughly half, from 13 in the baseline model to 20.

It is hard to guess in advance whether a wider or a deeper network will perform better on this problem. In the results below the deeper network scores slightly better (MSE of 22.63 versus 24.43), but the difference is small relative to the spread across folds, which underlines the importance of empirical testing when developing neural network models.
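
One way to compare the capacity of these topologies is to count their trainable parameters. The sketch below does this for fully connected layers with biases, using the layer sizes given in the code comments that follow; `dense_param_count` is a hypothetical helper, not a Keras function.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the input width followed by each layer's width.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

print(dense_param_count([13, 13, 1]))     # baseline: 196
print(dense_param_count([13, 13, 6, 1]))  # deeper:   273
print(dense_param_count([13, 20, 1]))     # wider:    301
```

Both variants add parameters relative to the baseline, and the wider network actually has more of them than the deeper one, so parameter count alone does not predict which will score better.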

In [12]:
# go deeper (13 inputs -> 13 hidden -> 6 hidden -> 1 output)
def deeper_model():
    model = Sequential()
    model.add(Dense(13, input_dim=13, activation='relu', kernel_initializer='normal'))
    # additional hidden layer with 6 ReLU units
    model.add(Dense(6, activation='relu', kernel_initializer='normal'))
    model.add(Dense(1, kernel_initializer='normal'))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

estimators = []
estimators.append(('standardize', StandardScaler())) 
estimators.append(('mlp', KerasRegressor(build_fn=deeper_model, epochs=50, batch_size=5, verbose=0))) 
pipeline = Pipeline(estimators) 
kfold = KFold(n_splits=10)  # no shuffling, so random_state is omitted
results = cross_val_score(pipeline, X, y, cv=kfold) 
print("Deeper: %.2f (+/- %.2f) MSE" % (results.mean(), results.std()))

Deeper: -22.63 (+/- 26.12) MSE


This shows a further improvement in MSE, from 28.83 for the standardized baseline to 22.63.

In [13]:
# go wider (13 inputs -> 20 hidden -> 1 output)
def wider_model():
    model = Sequential()
    # single hidden layer widened from 13 to 20 ReLU units
    model.add(Dense(20, input_dim=13, activation='relu', kernel_initializer='normal'))
    model.add(Dense(1, kernel_initializer='normal'))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

estimators = []
estimators.append(('standardize', StandardScaler())) 
estimators.append(('mlp', KerasRegressor(build_fn=wider_model, epochs=50, batch_size=5, verbose=0))) 
pipeline = Pipeline(estimators) 
kfold = KFold(n_splits=10)  # no shuffling, so random_state is omitted
results = cross_val_score(pipeline, X, y, cv=kfold) 
print("Wider: %.2f (+/- %.2f) MSE" % (results.mean(), results.std()))

Wider: -24.43 (+/- 22.70) MSE


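The wider model also improves on the standardized baseline, with the error dropping from 28.83 to about 24 in squared thousands of dollars, although in this run it does not beat the deeper network (22.63).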