## Regression Housing Pricing Project 3 for Fathers Who Want to Buy a House: Predicting Housing Prices


In [1]:
# All necessary imports here

import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.wrappers.scikit_learn import KerasRegressor
from keras.utils import np_utils
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

Using TensorFlow backend.


### Step 1. Project Description
Unlike the classification tasks attempted earlier, which predict a discrete label for each input data point, a regression problem predicts a continuous value. Here we predict the median price (in thousands of dollars) of homes in a given Boston suburb in the mid-1970s. The dataset has a total of 506 data points, of which 404 are training samples and 102 are test samples, each with 13 numerical properties/features.

In [0]:
# Load dataset
path_to_data = "https://raw.githubusercontent.com/mahrukh98/incredible-AI/master/datasets/housing.csv"
# The dataset is in fact not in CSV format; the attributes are separated by whitespace instead.
# sep=r"\s+" splits on any run of whitespace (delim_whitespace is deprecated in recent pandas).
dataframe = pd.read_csv(path_to_data, sep=r"\s+", header=None)
dataset = dataframe.values

# Split into input (X) and output (Y) variables
X = dataset[:,0:13]
Y = dataset[:,13]

### Step 2. Develop a Baseline Neural Network Model
We create the baseline model with a single, fully connected hidden layer of 13 hidden units (the same number as input features) and randomly initialized weights. The hidden layer is passed through the 'relu' activation function, which zeroes out negative values and keeps only the positive values of the computation. The output layer has a single unit with no activation function, since this is a scalar regression problem: we predict a single continuous value. Finally, the model is compiled with the 'mean_squared_error' loss function and the 'Adam' optimizer. The mean absolute error (mae) metric is only recorded later, in the final model of Step 9.

In [0]:
# define base model
def baseline_model():
  # create model
  model = Sequential([
      Dense(13, input_shape=(13,)),
      Activation('relu'),
      Dense(1)
  ])
  
  #compile model
  model.compile(loss='mean_squared_error', optimizer='Adam')
  return model
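
To make the baseline architecture concrete, here is a minimal NumPy sketch of the same forward pass: 13 inputs, a 13-unit ReLU hidden layer, and one linear output. The weights, the random seed, and the sample batch are made up for illustration and are not the values Keras would learn.

```python
import numpy as np

# Hypothetical, randomly initialised parameters with the baseline's shapes.
rng = np.random.default_rng(7)
W1 = rng.normal(scale=0.1, size=(13, 13))  # input (13) -> hidden (13)
b1 = np.zeros(13)
W2 = rng.normal(scale=0.1, size=(13, 1))   # hidden (13) -> output (1)
b2 = np.zeros(1)

def relu(z):
    # Zero out negative values, keep positive ones unchanged.
    return np.maximum(z, 0.0)

def forward(x):
    hidden = relu(x @ W1 + b1)  # hidden layer followed by ReLU
    return hidden @ W2 + b2     # linear output: one unbounded value per sample

batch = rng.normal(size=(4, 13))  # 4 made-up samples with 13 features each
predictions = forward(batch)
print(predictions.shape)  # (4, 1)
```

Because the output layer has no activation, the prediction can take any real value, which is what a price regression needs.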

In [0]:
# fix random seed for reproducibility
seed = 7
np.random.seed(seed)

Keras provides a wrapper object called KerasRegressor that lets a model be used in scikit-learn as a regression estimator. We pass it the model creation function, the number of epochs, and the batch size as arguments.

In [0]:
# evaluate the baseline model on the raw (unstandardized) dataset
# instantiate the estimator object from the KerasRegressor wrapper class
estimator = KerasRegressor(build_fn=baseline_model, epochs=100, batch_size=5, verbose=0)

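Evaluating the model using K-fold cross-validation, where K = 10. The scores are printed as negative numbers because the wrapper's scoring method returns the negated MSE, following scikit-learn's convention that a higher score is better.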

In [6]:
kfold = KFold(n_splits=10, random_state=seed)
results = cross_val_score(estimator, X, Y, cv=kfold)
print("Results: %.2f (%.2f) MSE" % (results.mean(), results.std()))

Results: -42.49 (23.26) MSE
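
As a rough illustration of what 10-fold cross-validation does with 506 samples, the sketch below partitions the row indices into 10 contiguous folds and holds each one out in turn. It uses `np.array_split` as a stand-in; to my understanding this yields the same fold sizes as an unshuffled `KFold`, but treat that equivalence as an assumption rather than a guarantee.

```python
import numpy as np

n_samples, n_splits = 506, 10
indices = np.arange(n_samples)

# np.array_split gives the first (n_samples % n_splits) chunks one extra item.
folds = np.array_split(indices, n_splits)
fold_sizes = [len(fold) for fold in folds]
print(fold_sizes)  # [51, 51, 51, 51, 51, 51, 50, 50, 50, 50]

train_sizes = []
for i, val_idx in enumerate(folds):
    # Every fold except the i-th one is used for training.
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    train_sizes.append(len(train_idx))
```

Each sample lands in exactly one validation fold, so every data point is used for validation once and for training nine times.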


### Step 3: Modeling The Standardized Dataset
This dataset consists of 13 input features with widely different ranges of values, so data preparation is crucial here, as it generally is before training a neural network.

For data preparation we use the standardization technique provided by the StandardScaler class. The standardization procedure must be fitted on the training data only, within each pass of a cross-validation run, and the fitted standardization is then used to prepare the "unseen" test fold. A Pipeline wrapper takes care of this.


In [7]:
# evaluate model with standardized dataset
np.random.seed(seed)
estimators = []
# scaler object instantiation of StandardScaler class
scaler = StandardScaler()
estimators.append(('standardize', scaler))
estimators.append(('mlp', KerasRegressor(build_fn=baseline_model, epochs=50, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = KFold(n_splits=10, random_state=seed)
results = cross_val_score(pipeline, X, Y, cv=kfold)
print("Standardized: %.2f (%.2f) MSE" % (results.mean(), results.std()))

Standardized: -26.94 (32.53) MSE
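
The key point of the pipeline is that the scaling statistics come from the training fold only. Here is a small NumPy sketch of that idea. The helper names `fit_scaler` and `apply_scaler` are hypothetical, and the data is randomly generated rather than taken from the housing file.

```python
import numpy as np

def fit_scaler(train):
    # Learn per-feature mean and standard deviation from the training fold only.
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0.0] = 1.0  # guard against constant columns
    return mean, std

def apply_scaler(data, mean, std):
    # Reuse the training statistics on any fold, including unseen data.
    return (data - mean) / std

rng = np.random.default_rng(0)
features = rng.normal(loc=50.0, scale=10.0, size=(100, 13))  # made-up data
train, val = features[:90], features[90:]

mean, std = fit_scaler(train)
train_scaled = apply_scaler(train, mean, std)
val_scaled = apply_scaler(val, mean, std)  # no statistics computed from val
```

The validation fold will not have exactly zero mean and unit variance after scaling, and that is expected: no information from it leaked into the statistics.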


### Step 4. Tune The Neural Network Topology
We'll tune the network's structure by making it either deeper or wider.



> ### Step 4.1. Evaluate a Deeper Network Topology


> A deeper network has more layers, which lets it extract higher-order and more prominent features from the input data. Here, we add 1 more hidden layer with about half as many hidden units (6) as the first layer (13).





In [0]:

# define the deeper model
def larger_model():
  # create model
  model = Sequential([
      Dense(13, input_shape=(13,)),
      Activation('relu'),
      Dense(6),
      Activation('relu'),
      Dense(1)
  ])

  # Compile model
  model.compile(loss='mean_squared_error', optimizer='Adam')
  return model


In [9]:
# evaluate model with standardized dataset
np.random.seed(seed)
estimators = []
# scaler object instantiation of StandardScaler class
scaler = StandardScaler()
estimators.append(('standardize', scaler))
estimators.append(('mlp', KerasRegressor(build_fn=larger_model, epochs=50, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = KFold(n_splits=10, random_state=seed)
results = cross_val_score(pipeline, X, Y, cv=kfold)
print("Larger model: %.2f (%.2f) MSE" % (results.mean(), results.std()))

Larger model: -22.51 (28.44) MSE



> ### Step 4.2. Evaluate a Wider Network Topology


> A wider network has more hidden units per layer, which may increase its representational capability. Here, we increase the single hidden layer to 20 hidden units.







In [0]:
# define wider model
def wider_model():
  # create model
  model = Sequential([
      Dense(20, input_shape=(13,)),
      Activation('relu'),
      Dense(1)
  ])
  # Compile model
  model.compile(loss='mean_squared_error', optimizer='Adam')
  return model

In [11]:
# evaluate model with standardized dataset
np.random.seed(seed)
estimators = []
# scaler object instantiation of StandardScaler class
scaler = StandardScaler()
estimators.append(('standardize', scaler))
estimators.append(('mlp', KerasRegressor(build_fn=wider_model, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = KFold(n_splits=10, random_state=seed)
results = cross_val_score(pipeline, X, Y, cv=kfold)
print("Wider model: %.2f (%.2f) MSE" % (results.mean(), results.std()))

Wider model: -21.26 (21.83) MSE


This looks pretty good, since according to the manual: ***Reasonable performance for models evaluated using Mean Squared Error (MSE) are around 20 in squared thousands of dollars (or $4,500 if you take the square root).***
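
The conversion from an MSE score to an error in dollars can be checked with a few lines of arithmetic. In this sketch the helper name `mse_to_dollars` is hypothetical, and the sign is dropped because the scores above are reported as negated MSE.

```python
import math

def mse_to_dollars(score):
    # Scores are negated MSE in squared thousands of dollars; undo the sign first.
    mse = abs(score)
    # Square root gives thousands of dollars; multiply to get dollars.
    return math.sqrt(mse) * 1000.0

manual_reference = mse_to_dollars(20.0)     # the manual's "around 20"
wider_model_error = mse_to_dollars(-21.26)  # wider model score from above
print(round(manual_reference), round(wider_model_error))  # 4472 4611
```

So the wider model's root mean squared error is roughly $4,600, close to the manual's reference figure.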

### Step 5. Really Scaling up: developing a model that overfits
To find out whether our model sits right at the border between underfitting and overfitting, we have to cross that border, i.e. overfit the model. This can be achieved by:


1.   Adding layers: 3 hidden layers + 1 output layer
2.   Increasing hidden units: 20 -> 13 -> 6 -> 1
3.   Training for more epochs: 100
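
One way to quantify how much capacity each variant adds is to count trainable parameters. The sketch below does this by hand for stacks of Dense layers (weights plus biases), using the layer sizes of the models in this notebook. The helper name `dense_params` and the dictionary labels are made up for illustration.

```python
def dense_params(layer_sizes):
    # Each Dense layer has n_in * n_out weights plus n_out biases.
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Layer sizes including the 13 input features and the single output unit.
architectures = {
    "baseline": [13, 13, 1],
    "deeper": [13, 13, 6, 1],
    "wider": [13, 20, 1],
    "overfitting": [13, 20, 13, 6, 1],
}

param_counts = {name: dense_params(sizes) for name, sizes in architectures.items()}
for name, count in param_counts.items():
    print(name, count)
# baseline 196
# deeper 273
# wider 301
# overfitting 644
```

The overfitting candidate has more than three times the parameters of the baseline, while the training set still has only a few hundred samples per fold.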








In [0]:
# define overfitting model
def overfitted_model():
  # create model
  model = Sequential([
      Dense(20, input_shape=(13,)),
      Activation('relu'),
      Dense(13),
      Activation('relu'),
      Dense(6),
      Activation('relu'),
      Dense(1)
  ])
  # Compile model
  model.compile(loss='mean_squared_error', optimizer='Adam')
  return model

In [13]:
# evaluate model with standardized dataset
np.random.seed(seed)
estimators = []
# scaler object instantiation of StandardScaler class
scaler = StandardScaler()
estimators.append(('standardize', scaler))
estimators.append(('mlp', KerasRegressor(build_fn=overfitted_model, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = KFold(n_splits=10, random_state=seed)
results = cross_val_score(pipeline, X, Y, cv=kfold)
print("Overfitted model: %.2f (%.2f) MSE" % (results.mean(), results.std()))

Overfitted model: -22.34 (24.57) MSE


### Step 6. Tuning the Model
Here, we decrease the number of epochs to 75. Let's see whether any further improvements/tuning are required after this.

In [0]:
# define improved model
def improved_model():
  # create model
  model = Sequential([
      Dense(20, input_shape=(13,)),
      Activation('relu'),
      Dense(13),
      Activation('relu'),
      Dense(6),
      Activation('relu'),
      Dense(1)
  ])
  # Compile model
  model.compile(loss='mean_squared_error', optimizer='Adam')
  return model

In [16]:
# evaluate model with standardized dataset
np.random.seed(seed)
estimators = []
# scaler object instantiation of StandardScaler class
scaler = StandardScaler()
estimators.append(('standardize', scaler))
estimators.append(('mlp', KerasRegressor(build_fn=improved_model, epochs=75, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = KFold(n_splits=10, random_state=seed)
results = cross_val_score(pipeline, X, Y, cv=kfold)
print("Improved model: %.2f (%.2f) MSE" % (results.mean(), results.std()))

Improved model: -21.14 (21.92) MSE


### Step 7. Rewriting the code using the Keras Functional API




In [0]:
import keras
from keras import layers

In [18]:
# creating functional API for tuned/improved model 
def create_improved_fn():
  inputs = keras.Input(shape = (13,))
  x = layers.Dense(20, activation='relu')(inputs)
  x = layers.Dense(13, activation='relu')(x)
  x = layers.Dense(6, activation='relu')(x)
  output = layers.Dense(1)(x)

  model = keras.Model(inputs, output)
  
  model.compile(loss='mean_squared_error',
              optimizer='Adam')
  
  return model

# evaluate model with standardized dataset
np.random.seed(seed)
estimators = []
# scaler object instantiation of StandardScaler class
scaler = StandardScaler()
estimators.append(('standardize', scaler))
estimators.append(('mlp', KerasRegressor(build_fn=create_improved_fn, epochs=75, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = KFold(n_splits=10, random_state=seed)
results = cross_val_score(pipeline, X, Y, cv=kfold)
print("Improved model: %.2f (%.2f) MSE" % (results.mean(), results.std()))

Improved model: -21.14 (21.92) MSE


### Step 8. Rewriting the code by doing Model Subclassing

In [0]:
import keras
from keras import layers

In [20]:
# Creating subclass for improved/tuned model
class Improved(keras.Model):
  def __init__(self):
    super(Improved, self).__init__()
    self.dense1 = layers.Dense(20, activation='relu')
    self.dense2 = layers.Dense(13, activation='relu')
    self.dense3 = layers.Dense(6, activation='relu')
    self.dense4 = layers.Dense(1)
    
  def call(self, inputs):
    x = self.dense1(inputs)
    x = self.dense2(x)
    x = self.dense3(x)
    return self.dense4(x)
  
# DISCLAIMER!
# This part is inspired by the functional API style :D
# Since build_fn needs a callable function or class instance, we define another function
# that supplies the input shape (not specified in the class) and performs the compilation step.

def create_Improved_subclass():
  inputs = keras.Input(shape = (13,))
  model = Improved()
  output = model.call(inputs)
  
  model = keras.Model(inputs, output)
  model.compile(loss='mean_squared_error',
              optimizer='Adam')
  return model

# evaluate model with standardized dataset
np.random.seed(seed)
estimators = []
# scaler object instantiation of StandardScaler class
scaler = StandardScaler()
estimators.append(('standardize', scaler))
estimators.append(('mlp', KerasRegressor(build_fn=create_Improved_subclass, epochs=75, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = KFold(n_splits=10, random_state=seed)
results = cross_val_score(pipeline, X, Y, cv=kfold)
print("Improved model: %.2f (%.2f) MSE" % (results.mean(), results.std()))


Improved model: -21.14 (21.92) MSE


### Step 9. Rewriting the code without using scikit-learn

In [0]:
# Creating model
def create_final_model():
  final_model = Sequential([
    Dense(20, input_shape=(13,)),
    Activation('relu'),
    Dense(13),
    Activation('relu'),
    Dense(6),
    Activation('relu'),
    Dense(1)
   ])
  
  # Compiling model
  final_model.compile(loss='mean_squared_error',
              optimizer='Adam',metrics=['mae'])
  return final_model

In [45]:
np.random.seed(seed)
k = 10
num_val_samples = len(dataset) // k
np.random.shuffle(dataset)
all_scores = []
# all_trial_s = []
num_epochs = 100

for i in range(k):
  print('processing fold #', i)

  # Preparing the validation data and properly partitioning training data
  val_X = X[num_val_samples * i:num_val_samples * (i + 1)]
  train_X = np.append(X[:num_val_samples * i], X[num_val_samples * (i + 1):], axis=0) 
    
  val_Y = Y[num_val_samples * i:num_val_samples * (i + 1)]
  train_Y = np.append(Y[:num_val_samples * i] , Y[num_val_samples * (i + 1):], axis=0)
  # Building the Keras model (already compiled)
  model = create_final_model() 
  model.fit(train_X,train_Y,epochs=num_epochs,batch_size=5,verbose=0,validation_data = (val_X,val_Y))
  # Evaluate the model on the validation data
  val_mse, val_mae = model.evaluate(val_X, val_Y, verbose=0)
  all_scores.append(val_mse)
  # all_trial_s.append(val_mae)

processing fold # 0
processing fold # 1
processing fold # 2
processing fold # 3
processing fold # 4
processing fold # 5
processing fold # 6
processing fold # 7
processing fold # 8
processing fold # 9
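
The slicing in the manual loop above uses integer division to size the folds, which has a side effect worth checking. This small sketch repeats only the index arithmetic (no model training) and reports which row positions never end up in a validation fold.

```python
n_samples, k = 506, 10
num_val_samples = n_samples // k  # 50, the remainder is dropped

validated = set()
for i in range(k):
    # Same slice boundaries as val_X / val_Y in the loop above.
    start, stop = num_val_samples * i, num_val_samples * (i + 1)
    validated.update(range(start, stop))

never_validated = sorted(set(range(n_samples)) - validated)
print(never_validated)  # [500, 501, 502, 503, 504, 505]
```

The last 6 rows are always part of the training data and never validated, unlike scikit-learn's `KFold`, which spreads the remainder across the first folds.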


In [46]:
all_scores

[11.177670936584473,
 41.12146514892578,
 45.45825225830078,
 12.997956085205079,
 37.85966926574707,
 247.81135620117186,
 17.173582458496092,
 14.066553840637207,
 27.465848541259767,
 30.840325775146486]

In [48]:
print("Final improved model: %.2f (%.2f) MSE" % (np.mean(all_scores), np.std(all_scores)))

Final improved model: 48.60 (67.44) MSE
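
The mean above is dominated by a single fold. As a quick check, the sketch below recomputes the summary from the `all_scores` values printed earlier and compares the mean with the median and with the mean over the nine remaining folds. The variable names other than `all_scores` are introduced here for illustration.

```python
import statistics

# Per-fold validation MSE values copied from the output above.
all_scores = [11.177670936584473, 41.12146514892578, 45.45825225830078,
              12.997956085205079, 37.85966926574707, 247.81135620117186,
              17.173582458496092, 14.066553840637207, 27.465848541259767,
              30.840325775146486]

mean_mse = statistics.mean(all_scores)
std_mse = statistics.pstdev(all_scores)  # population std, like np.std
median_mse = statistics.median(all_scores)
mean_without_worst = statistics.mean(sorted(all_scores)[:-1])

print("mean %.2f, std %.2f" % (mean_mse, std_mse))
print("median %.2f, mean without worst fold %.2f" % (median_mse, mean_without_worst))
```

The median and the mean without the worst fold both fall well below the overall mean, which suggests that one unlucky fold, not uniformly poor performance, drives the 48.60 figure.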


In [50]:
np.random.seed(seed)
k = 10
num_val_samples = len(dataset) // k
np.random.shuffle(dataset)
all_scores = []
# all_trial_s = []
num_epochs = 50

for i in range(k):
  print('processing fold #', i)

  # Preparing the validation data and properly partitioning training data
  val_X = X[num_val_samples * i:num_val_samples * (i + 1)]
  train_X = np.append(X[:num_val_samples * i], X[num_val_samples * (i + 1):], axis=0) 
    
  val_Y = Y[num_val_samples * i:num_val_samples * (i + 1)]
  train_Y = np.append(Y[:num_val_samples * i] , Y[num_val_samples * (i + 1):], axis=0)
  # Building the Keras model (already compiled)
  model = create_final_model() 
  model.fit(train_X,train_Y,epochs=num_epochs,batch_size=5,verbose=0,validation_data = (val_X,val_Y))
  # Evaluate the model on the validation data
  val_mse, val_mae = model.evaluate(val_X, val_Y, verbose=0)
  all_scores.append(val_mse)
  # all_trial_s.append(val_mae)

processing fold # 0
processing fold # 1
processing fold # 2
processing fold # 3
processing fold # 4
processing fold # 5
processing fold # 6
processing fold # 7
processing fold # 8
processing fold # 9


In [51]:
print("Final improved model: %.2f (%.2f) MSE" % (np.mean(all_scores), np.std(all_scores)))

Final improved model: 27.94 (10.62) MSE


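The final model's results diverge from the earlier ones for reasons visible in the code: this manual loop trains on raw, unstandardized features (there is no StandardScaler step), and `model.evaluate` returns the plain MSE rather than the negated score reported by `cross_val_score`. The K-fold cross-validation here can be improved, and will be for sure!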