## Project of Regression Problem to Predicting Housing Prices

####  Step 1 - The problem that we will look at in this project is the Boston house price dataset.
The dataset describes 13 numerical properties of houses in Boston suburbs and is concerned with modeling the price of houses in those suburbs in thousands of dollars. As such, this is a regression predictive modeling problem. Input attributes include things like crime rate, proportion of nonretail business acres, chemical concentrations and more

##### Step 2 - We will start off by importing all of the classes and functions we will need:

In [57]:
# Import Libararies
import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import tensorflow as tf


##### Extension of Step 2 - The dataset describes 13 numerical properties of houses in Boston suburbs and is concerned with modeling the price of houses in those suburbs in thousands of dollars. As such, this is a regression predictive modeling problem. Input attributes include things like crime rate, proportion of nonretail business acres, chemical concentrations and more

In [14]:
# load dataset
dataframe = pandas.read_csv("housing.csv", delim_whitespace=True, header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:13]
Y = dataset[:,13]

In [17]:
# Shape of Data

dataset.shape
X.shape

(506, 13)

##### Extension of Step 2 - defining the function that creates our baseline model

In [18]:
# baseline model
def create_baseline():
        # create model, write code below
        model = Sequential()
        model.add(Dense(13, activation="relu", input_shape=(13,)))
        model.add(Dense(1))
          
        # Compile model, write code below
        model.compile(optimizer='adam', loss='mse', metrics=['mae'])
        return model


##### Evaluate this model using stratified cross validation in the scikit-learn framework

In [41]:
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# evaluate model with standardized dataset
estimator = KerasRegressor(build_fn=create_baseline, epochs=100, batch_size=5, verbose=0)
kfold = KFold(n_splits=10, random_state=seed)
results = cross_val_score(estimator, X, Y, cv=kfold)
print("Results: %.2f (%.2f) MSE" % (results.mean(), results.std()))

Results: -42.62 (23.42) MSE


##### Step 3: Modeling The Standardized Dataset

In [20]:
# evaluate model with standardized dataset
numpy.random.seed(seed)
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=create_baseline, epochs=50, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = KFold(n_splits=10, random_state=seed)
results = cross_val_score(pipeline, X, Y, cv=kfold)
print("Standardized: %.2f (%.2f) MSE" % (results.mean(), results.std()))

Standardized: -26.91 (32.54) MSE


##### Extension of Step 3 Add sigmoid Activation function on output layer

In [52]:
# baseline model
def create_baseline():
        # create model, write code below
        model = Sequential()
        model.add(Dense(13, activation="relu", input_shape=(13,)))
        model.add(Dense(1, activation="sigmoid"))
          
        # Compile model, write code below
        model.compile(optimizer='adam', loss='mse', metrics=['mae'])
        return model


In [53]:
# evaluate model with standardized dataset
numpy.random.seed(seed)
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=create_baseline, epochs=50, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = KFold(n_splits=10, random_state=seed)
results = cross_val_score(pipeline, X, Y, cv=kfold)
print("Standardized: %.2f (%.2f) MSE" % (results.mean(), results.std()))

Standardized: -546.51 (276.40) MSE


##### Step 4.1 - Evaluate a Deeper Network

In [32]:
def create_smaller():
        # create model, write code below
        model = Sequential()
        model.add(Dense(13, activation="relu", input_shape=(13,)))
        model.add(Dense(6, activation="relu"))
        model.add(Dense(1))
          
        # Compile model, write code below
        model.compile(optimizer='adam', loss='mse', metrics=['mae'])
        return model

In [33]:
numpy.random.seed(seed)
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=create_smaller, epochs=50, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = KFold(n_splits=10, random_state=seed)
results = cross_val_score(pipeline, X, Y, cv=kfold)
print("Small Model : %.2f (%.2f) MSE" % (results.mean(), results.std()))

Small Model : -22.59 (28.49) MSE


##### Step 4.2 - Evaluate a Wider Network Topology

In [34]:
# larger model
def create_larger():
    # create model
    model = Sequential()
    model.add(Dense(20, activation="relu", input_shape=(13,)))
    model.add(Dense(1))
    
    # Compile model, write code below
    model.compile(optimizer='adam', loss='mse', metrics=['mae'])
    return model

In [35]:
numpy.random.seed(seed)
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=create_larger, epochs=50, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = KFold(n_splits=10, random_state=seed)
results = cross_val_score(pipeline, X, Y, cv=kfold)
print("Large Model : %.2f (%.2f) MSE" % (results.mean(), results.std()))

Large Model : -25.10 (27.19) MSE


##### Step 5. Really Scaling up: developing a model that overfits

In [42]:
# Main model By Keras api
def create_model():
        # create model, write code below
        model = Sequential()
        model.add(Dense(13, activation="relu", input_shape=(13,)))
        
        model.add(Dense(10,activation="relu"))
 
        model.add(Dense(1))
          
        # Compile model, write code below
        model.compile(optimizer='adam', loss='mse', metrics=['mae'])
        return model

In [43]:
numpy.random.seed(seed)
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=create_larger, epochs=50, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = KFold(n_splits=10, random_state=seed)
results = cross_val_score(pipeline, X, Y, cv=kfold)
print("Large Model : %.2f (%.2f) MSE" % (results.mean(), results.std()))

Large Model : -25.08 (27.25) MSE


##### Step 6 - Tunning the model

In [44]:
# Main model By Keras api
def create_model():
        # create model, write code below
        model = Sequential()
        model.add(Dense(13, activation="relu", input_shape=(13,)))
        
        model.add(Dense(8,activation="relu"))
 
        model.add(Dense(1))
          
        # Compile model, write code below
        model.compile(optimizer='adam', loss='mse', metrics=['mae'])
        return model

In [45]:
numpy.random.seed(seed)
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=create_larger, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = KFold(n_splits=10, random_state=seed)
results = cross_val_score(pipeline, X, Y, cv=kfold)
print("Main Model : %.2f (%.2f) MSE" % (results.mean(), results.std()))

Main Model : -20.98 (21.89) MSE


##### Step 7 - Rewriting the code using the Keras Functional API

In [64]:
# Main model By Keras api
def create_model_by_function_api():
    # create model, write code below
    inputs = Input(shape=(13,))
    x_input = Dense(13, activation='relu')(inputs)
    x_input = Dense(8, activation='relu')(x_input)
    predictions = Dense(1)(x_input)
    model = Model(inputs=inputs, outputs=predictions)
        
    # Compile model, write code below
    model.compile(optimizer='adam', loss='mse', metrics=['mae'])
    return model

In [65]:
numpy.random.seed(seed)
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=create_model_by_function_api, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = KFold(n_splits=10, random_state=seed)
results = cross_val_score(pipeline, X, Y, cv=kfold)
print( "Model by Functional Api : %.2f (%.2f) MSE" % (results.mean(), results.std()))

Model by Functional Api : -21.72 (28.90) MSE


##### Step 8. Rewriting the code by doing Model Subclassing

In [66]:

class SubClass_Model(tf.keras.Model):
    #inputs = Input(shape=(60,))
    def __init__(self):
        super(SubClass_Model, self).__init__()
        self.dense1 = tf.keras.layers.Dense(13, activation='relu')
        self.dense2 = tf.keras.layers.Dense(8, activation='relu')
        self.dense3 = tf.keras.layers.Dense(1)

    def call(self, inputs):
        x_input = self.dense1(inputs)
        x_input = self.dense2(x_input)
        outputs = self.dense3(x_input)
        return outputs

In [67]:
# Main model By Keras api
def create_model_by_SubClass():
    # create model, write code below
    model = SubClass_Model()
        
    # Compile model, write code below
    model.compile(optimizer='adam', loss='mse', metrics=['mae'])
    return model

In [68]:
numpy.random.seed(seed)
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=create_model_by_SubClass, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = KFold(n_splits=10, random_state=seed)
results = cross_val_score(pipeline, X, Y, cv=kfold)
print("Model by Sub Class : %.2f (%.2f) MSE" % (results.mean(), results.std()))

Model by Sub Class : -21.14 (24.14) MSE


##### Step 9. Rewriting the code without using scikit-learn

In [85]:
numpy.random.seed(seed)
data=dataset
k = 10
num_validation_samples = len(data) // k
np.random.shuffle(data)
validation_scores = []
all_scores=[]
for fold in range(k):
    validation_inputs = X[num_validation_samples * fold:num_validation_samples * (fold + 1)]
    training_inputs = np.append(X[:num_validation_samples * fold], X[num_validation_samples * (fold + 1):], axis=0) #, X[num_validation_samples * (fold + 1):]
    
    validation_targets = Y[num_validation_samples * fold:num_validation_samples * (fold + 1)]
    training_targets = np.append(Y[:num_validation_samples * fold] , Y[num_validation_samples * (fold + 1):], axis=0)

    model = create_model() 
    validation_scores = model.fit(training_inputs,training_targets,epochs=12,batch_size=5,verbose=0)
    val_mse, val_mae = model.evaluate(validation_inputs, validation_targets, verbose=0)
    all_scores.append(val_mse)

validation_score = np.mean(all_scores)
print("Model using K-Fold without skitlearn : %.2f " % (validation_score))

Model using K-Fold without skitlearn : 106.47 
