# Keras Neural Network Regression Model

We'll use the Boston Housing Prices dataset to design a neural network regression model.

The dataset has 13 input features:

1. CRIM: per capita crime rate by town.
2. ZN: proportion of residential land zoned for lots over 25,000 sq.ft.
3. INDUS: proportion of non-retail business acres per town.
4. CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).
5. NOX: nitric oxides concentration (parts per 10 million).
6. RM: average number of rooms per dwelling.
7. AGE: proportion of owner-occupied units built prior to 1940.
8. DIS: weighted distances to five Boston employment centers.
9. RAD: index of accessibility to radial highways.
10. TAX: full-value property-tax rate per \$10,000.
11. PTRATIO: pupil-teacher ratio by town.
12. B: 1000(Bk − 0.63)^2 where Bk is the proportion of blacks by town.
13. LSTAT: % lower status of the population.

Our model should use those 13 features to predict the median home value, expressed in units of \$1,000.

In [1]:
# Regression Example With Boston Dataset: Standardized and Larger
import numpy
from pandas import read_csv
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

Using TensorFlow backend.


## Load the dataset in the usual way

In [2]:
# load dataset
dataframe = read_csv("housing.csv", delim_whitespace=True, header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:13]
Y = dataset[:,13]

## Regression model

The model is simple. The input layer takes our 13 features. There are two hidden layers, with 13 and 6 neurons, followed by a 1-neuron output layer. We'll use ReLU as the activation function for the hidden layers. **The output layer does *not* have an activation function!** That's the change we need to make so that the network works as a regression model (rather than a classifier).

In [17]:
# define the model
def larger_model():
    # create model
    model = Sequential()
    model.add(Dense(13, input_dim=13, kernel_initializer='normal', activation='relu'))
    model.add(Dense(6, kernel_initializer='glorot_uniform', activation='relu'))
    model.add(Dense(1, kernel_initializer='glorot_uniform'))
    
    # Compile model
    model.compile(loss='mean_squared_error', optimizer='adam') # Use mean squared error as our loss function
    return model
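To make the architecture concrete, here is a minimal NumPy sketch of the same layer layout (13 inputs, hidden layers of 13 and 6 units, 1 linear output). It is not the Keras implementation: the helper names `count_parameters` and `forward`, the random weights, and the toy batch are all assumptions made for illustration. It counts the trainable parameters and shows that the output layer applies no activation.

```python
import numpy as np

# Layer widths as defined in larger_model(): 13 inputs -> 13 -> 6 -> 1 output
layer_sizes = [13, 13, 6, 1]

def count_parameters(sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

def forward(x, weights, biases):
    """ReLU on the hidden layers, no activation on the output layer."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(0.0, x @ W + b)      # ReLU
    return x @ weights[-1] + biases[-1]     # linear output: any real value

# Hypothetical random weights and inputs, only to exercise the shapes
rng = np.random.default_rng(0)
pairs = list(zip(layer_sizes[:-1], layer_sizes[1:]))
weights = [rng.normal(scale=0.05, size=(n_in, n_out)) for n_in, n_out in pairs]
biases = [np.zeros(n_out) for _, n_out in pairs]

total_params = count_parameters(layer_sizes)   # 13*13+13 + 13*6+6 + 6*1+1
toy_batch = rng.normal(size=(4, 13))           # 4 made-up samples, 13 features
toy_predictions = forward(toy_batch, weights, biases)

print(total_params)             # 273
print(toy_predictions.shape)    # (4, 1)
```

Because the last layer is linear, the prediction is not squashed into a fixed range, which is what a regression target such as a price needs.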

In [18]:
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)

## Let's use sklearn to help us

sklearn has a nice pipeline that allows us to cross-validate our neural network. This is always a good idea because it lets us estimate how well our model should perform in the real world (i.e. on data it has not seen). It is also a good way to check whether a model is overfitted. We can pass the Keras neural network model into sklearn just as if it were any other sklearn model.

In [19]:
# evaluate model with standardized dataset
estimators = []
estimators.append(('standardize', StandardScaler()))  # allow sklearn to normalize the input data (0 mean, 1 std)
estimators.append(('mlp', KerasRegressor(build_fn=larger_model, epochs=50, batch_size=5, verbose=2))) 
pipeline = Pipeline(estimators)
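The standardization step can be sketched by hand in NumPy to show what it computes. The important detail is that the mean and standard deviation come from the training rows only and are then reused on the held-out rows, which is what placing the scaler inside the pipeline achieves during cross-validation. The helper names and the toy data below are assumptions for illustration, not part of sklearn.

```python
import numpy as np

def fit_standardizer(X_train):
    """Per-feature mean and standard deviation, computed on training rows only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0.0] = 1.0          # guard against constant features
    return mean, std

def apply_standardizer(X, mean, std):
    """Rescale features using statistics learned from the training set."""
    return (X - mean) / std

# Made-up data standing in for the 13 housing features
rng = np.random.default_rng(0)
X_toy = rng.normal(loc=50.0, scale=10.0, size=(100, 13))
X_train_toy, X_test_toy = X_toy[:80], X_toy[80:]

mean, std = fit_standardizer(X_train_toy)
X_train_std = apply_standardizer(X_train_toy, mean, std)
X_test_std = apply_standardizer(X_test_toy, mean, std)   # reuses training statistics

print(X_train_std.mean(axis=0).round(6))   # approximately 0 for every feature
print(X_train_std.std(axis=0).round(6))    # approximately 1 for every feature
```

The test rows will generally not have exactly zero mean and unit variance after scaling, and that is expected: they are treated as unseen data.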

## K-fold

K-fold cross validation will automatically split our data into training and testing sets. In fact, it splits the data into K roughly equal sets [1,2,3,4,...K]. The pipeline will be run K times. On the k-th run, it will use the k-th set as the testing set and the remaining (K-1) sets as the training set. For example, in iteration 1 it will use set 1 as the test set and sets 2 through K as the training set; in iteration 2 it will use set 2 as the test set and sets 1 and 3 through K as the training set.

So after K iterations, we'll end up with K performance values (e.g. mean squared error) for our model. If those MSE values vary widely, then we probably don't have a reliable model. Put another way, if we have two models that perform about the same, but one model has less variance in a K-fold test, then it should be the preferred model.

Note that K-folds isn't used to fit the final model. It is just a way to predict how well a given model will generalize to the outside world.
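The splitting scheme described above can be sketched in plain Python. This is an illustrative stand-in for unshuffled K-fold splitting, not sklearn's code: `kfold_indices` is a hypothetical helper, and the row count of 506 is an assumption about the size of the housing file.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds take one additional sample each
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

n_rows = 506      # assumed number of rows in the dataset
splits = list(kfold_indices(n_rows, 10))
fold_sizes = [len(test_idx) for _, test_idx in splits]

print(fold_sizes)   # [51, 51, 51, 51, 51, 51, 50, 50, 50, 50]
```

Every row lands in the test set exactly once, so each of the K scores is measured on data that the corresponding fit never saw.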

In [20]:
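# K-fold cross validation. Without shuffle=True the folds are contiguous and
# random_state has no effect (newer scikit-learn versions raise an error if it is set).
kfold = KFold(n_splits=10)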

In [21]:
results = cross_val_score(pipeline, X, Y, cv=kfold)

Epoch 1/50
0s - loss: 585.8535
Epoch 2/50
0s - loss: 508.3529
Epoch 3/50
0s - loss: 375.4395
Epoch 4/50
0s - loss: 236.0568
Epoch 5/50
0s - loss: 145.9458
Epoch 6/50
0s - loss: 99.1405
Epoch 7/50
0s - loss: 71.7455
Epoch 8/50
0s - loss: 55.2866
Epoch 9/50
0s - loss: 46.1126
Epoch 10/50
0s - loss: 41.5468
Epoch 11/50
0s - loss: 38.8007
Epoch 12/50
0s - loss: 36.9442
Epoch 13/50
0s - loss: 35.4448
Epoch 14/50
0s - loss: 34.1423
Epoch 15/50
0s - loss: 32.9254
Epoch 16/50
0s - loss: 31.8325
Epoch 17/50
0s - loss: 30.8860
Epoch 18/50
0s - loss: 30.0616
Epoch 19/50
0s - loss: 29.1671
Epoch 20/50
0s - loss: 28.4440
Epoch 21/50
0s - loss: 27.7712
Epoch 22/50
0s - loss: 27.0992
Epoch 23/50
0s - loss: 26.5281
Epoch 24/50
0s - loss: 25.9241
Epoch 25/50
0s - loss: 25.5041
Epoch 26/50
0s - loss: 24.8953
Epoch 27/50
0s - loss: 24.5035
Epoch 28/50
0s - loss: 24.0494
Epoch 29/50
0s - loss: 23.7703
Epoch 30/50
0s - loss: 23.2160
Epoch 31/50
0s - loss: 22.8385
Epoch 32/50
0s - loss: 22.4986
Epoch 33/50
...
0s - loss: 18.6873
Epoch 18/50
0s - loss: 18.0631
Epoch 19/50
0s - loss: 17.3336
Epoch 20/50
0s - loss: 17.0196
Epoch 21/50
0s - loss: 16.4263
Epoch 22/50
0s - loss: 15.9396
Epoch 23/50
0s - loss: 15.6485
Epoch 24/50
0s - loss: 15.2090
Epoch 25/50
0s - loss: 14.9103
Epoch 26/50
0s - loss: 14.5728
Epoch 27/50
0s - loss: 14.3682
Epoch 28/50
0s - loss: 13.9981
Epoch 29/50
0s - loss: 13.7194
Epoch 30/50
0s - loss: 13.5524
Epoch 31/50
0s - loss: 13.3365
Epoch 32/50
0s - loss: 13.1306
Epoch 33/50
0s - loss: 13.0313
Epoch 34/50
0s - loss: 12.7489
Epoch 35/50
0s - loss: 12.6740
Epoch 36/50
0s - loss: 12.5355
Epoch 37/50
0s - loss: 12.2915
Epoch 38/50
0s - loss: 12.1384
Epoch 39/50
0s - loss: 12.1287
Epoch 40/50
0s - loss: 11.9198
Epoch 41/50
0s - loss: 11.7550
Epoch 42/50
0s - loss: 11.6173
Epoch 43/50
0s - loss: 11.5976
Epoch 44/50
0s - loss: 11.5584
Epoch 45/50
0s - loss: 11.3270
Epoch 46/50
0s - loss: 11.3461
Epoch 47/50
0s - loss: 11.3732
Epoch 48/50
0s - loss: 11.2492
Epoch 49/50
...

In [22]:
print("Mean squared error for our model is %.2f (+/- %.2f)" % (results.mean(), results.std()))

Mean squared error for our model is 23.84 (+/- 26.60)


## Is this good?

If your mean squared error is about 23.8, then your root mean squared error (i.e. the square root) is a little under 5. Since the prices are in units of \$1,000, this indicates that the model's typical error is just under \$5,000. Note, though, that the spread across folds (+/- 26.60) is larger than the mean itself, so the per-fold performance varies widely.
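The conversion from MSE to a dollar figure is simple arithmetic, shown here as a short sketch using the mean MSE reported above. The variable names are only illustrative.

```python
import math

mean_mse = 23.84                 # mean MSE across the 10 folds, from the output above
rmse = math.sqrt(mean_mse)       # back in the units of the target (thousands of dollars)
rmse_dollars = rmse * 1000       # the target is expressed in units of $1,000

print("RMSE: %.2f (about $%.0f)" % (rmse, rmse_dollars))
```

Note that this is the square root of the average MSE, which is not quite the same as averaging the RMSE of each fold.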

For comparison, decision trees on this dataset have an MSE anywhere from 20 to 50, and basic linear regression models come in at something like 20 to 30.