## Predicting House Prices using Keras (Regression Example)

Source: F. Chollet, "Deep Learning with Python", https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/3.7-predicting-house-prices.ipynb

The target is MEDV, the median price of homes (in thousands of dollars) in a given Boston suburb in the mid-1970s.

This is the Python notebook. A version for R users is available at https://www.kaggle.com/floser/predicting-house-prices-using-keras-with-r

In [None]:
# import libraries
import numpy as np
import pandas as pd 
from sklearn.model_selection import train_test_split
from keras import models
from keras import layers

In [None]:
# Read the data set. Source: https://www.kaggle.com/vipulgandhi/how-to-choose-right-metric-for-evaluating-ml-model
names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
# The file is whitespace-delimited and has no header row
housing_data = pd.read_csv('../input/boston-house-prices/housing.csv', sep=r'\s+', names=names)
# Display the first rows
housing_data.head()
# Feature matrix X and target y
X = housing_data.drop(columns=["MEDV"])
y = housing_data["MEDV"]

In [None]:
housing_data.describe()

In [None]:
# Train-test split: 50% training data, 50% test data
train_data, test_data, train_targets, test_targets = train_test_split(X, y, test_size=0.5, random_state=1)
train_targets.head(2)

In [None]:
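# Preparing the data: feature-wise normalization.
# Mean and standard deviation are computed on the training data only
# and then applied to both sets, so nothing from the test data leaks in.
mean = train_data.mean(axis=0)
std = train_data.std(axis=0)
train_data = (train_data - mean) / std
test_data = (test_data - mean) / std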
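
The normalization step above can be checked on a small stand-in array. The following is a minimal sketch that assumes synthetic data (the values and shapes are invented for illustration and are not the housing data). It shows that the training features end up with mean 0 and standard deviation 1, while the test features are scaled with the training statistics and therefore only come close to those values.

```python
import numpy as np

# Synthetic stand-in data (hypothetical values, not the housing data)
rng = np.random.default_rng(0)
train = rng.normal(loc=[10.0, 200.0], scale=[2.0, 50.0], size=(100, 2))
test = rng.normal(loc=[10.0, 200.0], scale=[2.0, 50.0], size=(40, 2))

# The statistics come from the training rows only
mean = train.mean(axis=0)
std = train.std(axis=0, ddof=1)  # ddof=1 matches the pandas default

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.mean(axis=0))         # numerically zero per feature
print(train_scaled.std(axis=0, ddof=1))  # one per feature
print(test_scaled.mean(axis=0))          # near zero, but not exactly
```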

In [None]:
# "Building our network (very small, two hidden layers, each 64 units)" 
# Modification: each 16 units
def build_model():
    # Because we will need to instantiate
    # the same model multiple times,
    # we use a function to construct it.
    model = models.Sequential()
    model.add(layers.Dense(16, activation='relu',
                           input_shape=(train_data.shape[1],)))
    model.add(layers.Dense(16, activation='relu'))
    model.add(layers.Dense(1))
    model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
    return model
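
To get a feeling for how small this network is, the number of trainable parameters can be worked out by hand. The sketch below assumes 13 input features (the 14 column names minus the target MEDV). The helper `dense_params` is a hypothetical name introduced only for this illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units

n_features = 13            # columns in `names` minus the target MEDV
layer_units = [16, 16, 1]  # the three Dense layers of build_model()

total = 0
n_inputs = n_features
for n_units in layer_units:
    total += dense_params(n_inputs, n_units)
    n_inputs = n_units  # the output of this layer feeds the next one

print(total)  # 513
```

With 64 units per hidden layer, as in the quoted original, the same calculation gives 5121 parameters.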

In [None]:
# Validating our approach using K-fold validation
k = 4 
num_val_samples = len(train_data) // k
num_epochs = 50 # 100
all_scores = []
for i in range(k):
    print('processing fold #', i)
    # Prepare the validation data: data from partition # i
    val_data = train_data[i * num_val_samples: (i + 1) * num_val_samples]
    val_targets = train_targets[i * num_val_samples: (i + 1) * num_val_samples]

    # Prepare the training data: data from all other partitions
    partial_train_data = np.concatenate(
        [train_data[:i * num_val_samples],
         train_data[(i + 1) * num_val_samples:]],
        axis=0)
    partial_train_targets = np.concatenate(
        [train_targets[:i * num_val_samples],
         train_targets[(i + 1) * num_val_samples:]],
        axis=0)

    # Build the Keras model (already compiled)
    model = build_model()
    # Train the model (in silent mode, verbose=0)
    model.fit(partial_train_data, partial_train_targets,
              epochs=num_epochs, batch_size=1, verbose=0)
    # Evaluate the model on the validation data
    val_mse, val_mae = model.evaluate(val_data, val_targets, verbose=0)
    all_scores.append(val_mae)
all_scores

In [None]:
np.mean(all_scores)
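
The slicing used in the loop above can be made explicit without any model. The sketch below assumes 253 training rows, which is what a 50% split yields if the file holds the usual 506 rows. The generator `kfold_slices` is a hypothetical helper that mirrors the index arithmetic of the loop.

```python
def kfold_slices(n_samples, k):
    """Yield (fold, val_start, val_stop) for k contiguous validation blocks."""
    fold_size = n_samples // k
    for i in range(k):
        yield i, i * fold_size, (i + 1) * fold_size

n_samples = 253  # assumed: half of 506 rows
for i, start, stop in kfold_slices(n_samples, k=4):
    n_train = n_samples - (stop - start)
    print(f"fold {i}: validate on rows [{start}, {stop}), train on {n_train} rows")
```

Because of the integer division, each validation block holds 63 rows and the last row (position 252) is never used for validation; it is part of the training data in every fold.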

Since the training and test sets have the same size, we can swap their roles and see what happens. To keep it simple, we just copy the code and exchange the samples.

Are the errors similar?

In [None]:
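# Swap the roles of the training and test sets.
# Note: the features stay scaled with the statistics of the original training half.
train_data, test_data = test_data, train_data
train_targets, test_targets = test_targets, train_targets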
test_targets.head(2)

In [None]:
# K-fold validation again, now with the roles of the two halves swapped
k = 4 
num_val_samples = len(train_data) // k
num_epochs = 50 # 100
all_scores = []
for i in range(k):
    print('processing fold #', i)
    # Prepare the validation data: data from partition # i
    val_data = train_data[i * num_val_samples: (i + 1) * num_val_samples]
    val_targets = train_targets[i * num_val_samples: (i + 1) * num_val_samples]

    # Prepare the training data: data from all other partitions
    partial_train_data = np.concatenate(
        [train_data[:i * num_val_samples],
         train_data[(i + 1) * num_val_samples:]],
        axis=0)
    partial_train_targets = np.concatenate(
        [train_targets[:i * num_val_samples],
         train_targets[(i + 1) * num_val_samples:]],
        axis=0)

    # Build the Keras model (already compiled)
    model = build_model()
    # Train the model (in silent mode, verbose=0)
    model.fit(partial_train_data, partial_train_targets,
              epochs=num_epochs, batch_size=1, verbose=0)
    # Evaluate the model on the validation data
    val_mse, val_mae = model.evaluate(val_data, val_targets, verbose=0)
    all_scores.append(val_mae)
all_scores

In [None]:
np.mean(all_scores)
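
One way to judge whether the errors of the two runs are similar is to compare the gap between their mean MAE values with the spread across folds. The sketch below uses placeholder numbers only; they are not results from this notebook. Since the second loop overwrites `all_scores`, the scores of the first run would have to be copied into a separate variable before re-running. The names `summarize`, `scores_first_run`, and `scores_second_run` are hypothetical.

```python
from statistics import mean, stdev

def summarize(scores):
    """Return the mean and sample standard deviation of per-fold MAE values."""
    return mean(scores), stdev(scores)

# Placeholder numbers for illustration only, NOT results from this notebook
scores_first_run = [2.5, 3.0, 2.0, 2.5]
scores_second_run = [3.0, 2.5, 3.5, 3.0]

m1, s1 = summarize(scores_first_run)
m2, s2 = summarize(scores_second_run)
gap = abs(m1 - m2)

print(f"run 1: mean MAE {m1:.2f}, fold spread {s1:.2f}")
print(f"run 2: mean MAE {m2:.2f}, fold spread {s2:.2f}")
print(f"gap between runs: {gap:.2f}")
```

Because MEDV is measured in thousands of dollars, an MAE of 2.5 would correspond to an average error of about $2,500. A gap that is small relative to the fold-to-fold spread suggests that the two halves behave similarly.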