
In [None]:
import pandas

from keras.models import Sequential
from keras.layers import Dense
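# keras.wrappers.scikit_learn exists only in older Keras/TensorFlow releases and was removed later;
# on newer versions the SciKeras package provides a KerasRegressor replacement.
from keras.wrappers.scikit_learn import KerasRegressor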

from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

In [None]:
# load dataset
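# whitespace-separated file with no header row (sep=r"\s+" is equivalent to the deprecated delim_whitespace=True)
dataframe = pandas.read_csv("/content/boston house price.csv", sep=r"\s+", header=None)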
dataframe

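| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1 | 296.0 | 15.3 | 396.90 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242.0 | 17.8 | 396.90 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222.0 | 18.7 | 396.90 | 5.33 | 36.2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 501 | 0.06263 | 0.0 | 11.93 | 0 | 0.573 | 6.593 | 69.1 | 2.4786 | 1 | 273.0 | 21.0 | 391.99 | 9.67 | 22.4 |
| 502 | 0.04527 | 0.0 | 11.93 | 0 | 0.573 | 6.120 | 76.7 | 2.2875 | 1 | 273.0 | 21.0 | 396.90 | 9.08 | 20.6 |
| 503 | 0.06076 | 0.0 | 11.93 | 0 | 0.573 | 6.976 | 91.0 | 2.1675 | 1 | 273.0 | 21.0 | 396.90 | 5.64 | 23.9 |
| 504 | 0.10959 | 0.0 | 11.93 | 0 | 0.573 | 6.794 | 89.3 | 2.3889 | 1 | 273.0 | 21.0 | 393.45 | 6.48 | 22.0 |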


In [None]:
# split into input (X) and output (Y) variables

dataset = dataframe.values
X = dataset[:,0:13]
Y = dataset[:,13]
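The same slicing rules apply to the NumPy array `dataset`. This is a stdlib-only sketch on the first displayed row, showing which columns end up in `X` and which in `Y`. The plain list `row` is just that row copied from the table above.

```python
# First row of the dataset as displayed above: 13 input attributes
# (columns 0-12) followed by the target value (column 13).
row = [0.00632, 18.0, 2.31, 0, 0.538, 6.575, 65.2, 4.0900, 1, 296.0, 15.3, 396.90, 4.98, 24.0]

# Python slices exclude the stop index, so 0:13 selects columns 0..12 ...
x = row[0:13]
# ... and index 13 is the 14th (last) column, the target.
y = row[13]

print(len(x), y)  # 13 24.0
```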

## Keras wrappers:
* Keras provides handy wrapper objects that let us create Keras models and evaluate them with scikit-learn.
* This is useful because scikit-learn excels at evaluating models, and it lets us use powerful data preparation and model evaluation schemes with very few lines of code.
* The Keras wrappers take a function as an argument (`build_fn`). This function, which we must define, creates and compiles the neural network model to be evaluated.
* The wrapper that lets a Keras model act as a scikit-learn regression estimator is called `KerasRegressor`.
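The wrapper takes a build *function* rather than a ready-made model because it calls that function again each time it is fitted. As a result, each cross-validation fold trains a freshly initialized network. The sketch below shows that pattern with a hypothetical, heavily simplified `FunctionWrapper` and a toy `MeanModel`. Both are invented for illustration and are not the Keras API. The real `KerasRegressor` does more; for example, it passes `epochs` and `batch_size` through to the Keras `fit` call.

```python
class MeanModel:
    """Toy stand-in for a Keras model: predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class FunctionWrapper:
    """Hypothetical, simplified wrapper: builds a new model on every fit."""

    def __init__(self, build_fn):
        self.build_fn = build_fn

    def fit(self, X, y):
        # A fresh, untrained model per fit, so nothing learned earlier carries over.
        self.model_ = self.build_fn()
        self.model_.fit(X, y)
        return self

    def predict(self, X):
        return self.model_.predict(X)


builds = []  # records every model the build function creates


def build_model():
    model = MeanModel()
    builds.append(model)
    return model


wrapper = FunctionWrapper(build_model)
wrapper.fit([[1], [2]], [10.0, 20.0])
first = wrapper.model_
wrapper.fit([[3], [4]], [30.0, 50.0])  # second fit builds a second model
print(len(builds), first is wrapper.model_, wrapper.predict([[5]]))  # 2 False [40.0]
```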

## Defining & Compiling the model:
* We define the function to create the **baseline model** to be evaluated.
* It is a simple model that has a single fully connected hidden layer with the same number of neurons as input attributes (13).
* The network follows good practice by using the **rectified linear (ReLU) activation** function for the hidden layer.
* **No activation** function is used for the **output layer** because this is a regression problem and we want to predict numerical values directly, **without transformation**.
* The efficient **Adam** optimization algorithm is used to minimize a mean squared error **(MSE) loss** function.
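The network described above can be written out as a stdlib-only sketch of what it computes for one input row. The hidden layer is 13 ReLU units and the output is a single linear unit with no activation, trained against mean squared error. The seeded Gaussian initialization (standard deviation 0.05, zero biases) is an assumption for illustration. It is not necessarily what Keras's `'normal'` initializer uses.

```python
import random


def relu(v):
    return [max(0.0, a) for a in v]


def dense(x, W, b):
    # Fully connected layer: out[j] = sum_i x[i] * W[i][j] + b[j]
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]


def baseline_forward(x, W1, b1, W2, b2):
    h = relu(dense(x, W1, b1))  # hidden layer: 13 units, ReLU
    return dense(h, W2, b2)[0]  # output layer: 1 unit, no activation (linear)


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


rng = random.Random(0)
n_in, n_hidden = 13, 13
# Assumed initialization for illustration: small Gaussian weights, zero biases.
W1 = [[rng.gauss(0.0, 0.05) for _ in range(n_hidden)] for _ in range(n_in)]
b1 = [0.0] * n_hidden
W2 = [[rng.gauss(0.0, 0.05)] for _ in range(n_hidden)]
b2 = [0.0]

x = [0.00632, 18.0, 2.31, 0, 0.538, 6.575, 65.2, 4.0900, 1, 296.0, 15.3, 396.90, 4.98]
print(baseline_forward(x, W1, b1, W2, b2))  # an untrained, arbitrary prediction
print(mse([24.0, 21.6], [22.0, 21.6]))      # 2.0
```

Because the output unit is linear, the prediction can take any real value. A ReLU or sigmoid output would restrict its range.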


In [None]:
# define base model

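def baseline_model():
    # create model
    model = Sequential()
    # hidden layer: 13 ReLU units, one per input attribute
    model.add(Dense(13, input_dim=13, kernel_initializer='normal', activation='relu'))
    # output layer: a single linear unit (no activation) for regression
    model.add(Dense(1, kernel_initializer='normal'))
    # compile model: minimize mean squared error with the Adam optimizer
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model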

## Evaluating the model:
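* The final step is to evaluate this baseline model.
* We use 10-fold cross-validation. The data is split into 10 folds, and the model is trained on 9 folds and tested on the remaining one, rotating until every fold has served once as the test set.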
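`KFold(n_splits=10)` is used without shuffling, so the folds are contiguous blocks in file order. This stdlib sketch mirrors scikit-learn's unshuffled splitting rule: the first `n_samples % n_splits` folds get one extra sample. The sample count of 505 is an assumption taken from the displayed frame, whose index runs from 0 to 504.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train, test) index lists for unshuffled K-fold splitting."""
    # The first n_samples % n_splits folds get one extra sample.
    sizes = [n_samples // n_splits + (1 if k < n_samples % n_splits else 0)
             for k in range(n_splits)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(kfold_indices(505, 10))
print([len(test) for _, test in folds])
# [51, 51, 51, 51, 51, 50, 50, 50, 50, 50]
```

Since the folds are not shuffled, any ordering in the file carries over into the folds. Passing `shuffle=True` with a fixed `random_state` to `KFold` is a common alternative.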

In [None]:
estimator = KerasRegressor(build_fn=baseline_model, epochs=100, batch_size=5, verbose=0)



In [None]:
kfold = KFold(n_splits=10)
results = cross_val_score(estimator, X, Y, cv=kfold)
print("Results: %.2f (%.2f) MSE" % (results.mean(), results.std()))

Results: -29.21 (19.46) MSE


* The MSE is negative because the wrapper reports the negated loss as its score, following scikit-learn's convention that higher scores are better. We can ignore the sign of the result.
* The result reports the mean (29.21) and standard deviation (19.46) of the MSE across the 10 folds of the cross-validation. Since the target is in thousands of dollars, the MSE is in squared thousands of dollars.
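Here is a small stdlib sketch of the sign convention and summary statistics discussed above. The scoring helper negates the MSE so that higher is better. The summary uses the population standard deviation, which matches NumPy's default `results.std()` (`ddof=0`). The fold scores passed to `summarize` are made-up values for illustration. The last line converts the reported mean MSE into a root mean squared error, which is back in the target's units (thousands of dollars).

```python
import math
import statistics


def neg_mse(y_true, y_pred):
    # Negated MSE: values closer to zero are better ("greater is better").
    return -sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def summarize(fold_scores):
    # Population standard deviation, like NumPy's default std (ddof=0).
    return statistics.mean(fold_scores), statistics.pstdev(fold_scores)


print(neg_mse([24.0, 21.6], [22.0, 21.6]))  # -2.0
print(summarize([-2.0, -4.0]))              # (-3.0, 1.0), hypothetical fold scores

# Reported mean MSE of 29.21 -> RMSE in thousands of dollars.
print(round(math.sqrt(29.21), 2))           # 5.4
```

An RMSE of about 5.4 suggests typical errors of roughly $5,400. This is a rough reading only, because the mean of per-fold MSEs is not exactly the MSE over all predictions.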

## Modeling The Standardized Dataset:
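* An important concern with the Boston house price dataset is that the input attributes vary widely in scale because they measure different quantities. For example, column 4 holds values around 0.5, while column 9 holds values in the hundreds.
* Continuing from the baseline model above, we re-evaluate the same model on a standardized version of the inputs. Putting a `StandardScaler` and the `KerasRegressor` in a scikit-learn `Pipeline` ensures the scaler is fit only on the training folds during cross-validation.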
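Standardization rescales each column to zero mean and unit variance: z = (x - mean) / std. In this sketch, the mean and the population standard deviation (which is what `StandardScaler` uses) are learned from training values only and then applied unchanged to held-out values. That mirrors what the `Pipeline` does inside each fold. The sample values are column 4 from the displayed table. The helper names `fit_scaler` and `transform` are illustrative, not scikit-learn functions.

```python
import statistics


def fit_scaler(column):
    # Learn mean and population std (ddof=0) from training data only.
    return statistics.mean(column), statistics.pstdev(column)


def transform(column, mean, std):
    return [(v - mean) / std for v in column]


train_col = [0.538, 0.469, 0.469, 0.458, 0.458]  # column 4, first five displayed rows
test_col = [0.573, 0.573]                        # column 4, last displayed rows

mean, std = fit_scaler(train_col)
z_train = transform(train_col, mean, std)  # mean 0, std 1 by construction
z_test = transform(test_col, mean, std)    # may fall outside the training range
print(z_train, z_test)
```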

In [9]:
# evaluate model with standardized dataset using Pipeline

estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=baseline_model, epochs=50, batch_size=5, verbose=0)))

pipeline = Pipeline(estimators)

kfold = KFold(n_splits=10)
results = cross_val_score(pipeline, X, Y, cv=kfold)
print("Standardized: %.2f (%.2f) MSE" % (results.mean(), results.std()))



Standardized: -28.67 (26.18) MSE


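***The mean MSE dropped slightly, from 29.21 to 28.67.***
* This run trained for 50 epochs rather than 100, and the improvement is small compared with the fold-to-fold standard deviations (19.46 and 26.18), so it should not be over-interpreted.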

## Tune The Neural Network Topology:

* One way to improve the performance of a neural network is to add more layers.
* This might allow the model to extract and recombine higher-order features embedded in the data. Here we add a second hidden layer with 6 neurons, about half as many as the first.

In [10]:
# define the model

def larger_model():
    # create model
    model = Sequential()
    # first hidden layer: 13 ReLU units
    model.add(Dense(13, input_dim=13, kernel_initializer='normal', activation='relu'))
    # second hidden layer: 6 ReLU units
    model.add(Dense(6, kernel_initializer='normal', activation='relu'))
    # output layer: a single linear unit for regression
    model.add(Dense(1, kernel_initializer='normal'))
    # compile model
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

In [11]:
# evaluating the tuned model

estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=larger_model, epochs=50, batch_size=5, verbose=0)))

pipeline = Pipeline(estimators)

kfold = KFold(n_splits=10)
results = cross_val_score(pipeline, X, Y, cv=kfold)
print("Larger: %.2f (%.2f) MSE" % (results.mean(), results.std()))



Larger: -23.37 (24.85) MSE


***This deeper model improves further, reducing the mean MSE from 28.67 to 23.37 (squared thousands of dollars).***

## Evaluate a Wider Network Topology:
* Another approach to increasing the representational capability of the model is to create a wider network.
* Here we keep a shallow architecture with a single hidden layer but increase its number of neurons from 13 (as in the baseline) to 20.
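Counting trainable parameters is one concrete way to compare the three topologies. A `Dense` layer with a bias (the default) has `n_in * n_out` weights plus `n_out` biases. The stdlib sketch below applies that rule to the layer sizes used in this notebook. In Keras, `model.summary()` reports the same counts.

```python
def dense_params(n_in, n_out):
    # weights (n_in * n_out) plus one bias per output unit
    return n_in * n_out + n_out


def count_params(layer_sizes):
    # Sum over consecutive layer pairs, e.g. [13, 13, 1] -> (13->13) + (13->1)
    return sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))


topologies = {
    "baseline": [13, 13, 1],
    "larger": [13, 13, 6, 1],
    "wider": [13, 20, 1],
}
for name, sizes in topologies.items():
    print(name, count_params(sizes))
# baseline 196
# larger 273
# wider 301
```

By this measure, the wider model has the most parameters of the three, even though it has only one hidden layer.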

In [12]:
# define wider model

def wider_model():
    # create model
    model = Sequential()
    # single hidden layer widened to 20 ReLU units
    model.add(Dense(20, input_dim=13, kernel_initializer='normal', activation='relu'))
    # output layer: a single linear unit for regression
    model.add(Dense(1, kernel_initializer='normal'))
    # compile model
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

In [13]:
# evaluate the wider network topology:

estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=wider_model, epochs=100, batch_size=5, verbose=0)))

pipeline = Pipeline(estimators)

kfold = KFold(n_splits=10)
results = cross_val_score(pipeline, X, Y, cv=kfold)
print("Wider: %.2f (%.2f) MSE" % (results.mean(), results.std()))



Wider: -20.93 (24.95) MSE


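***The wider model reduces the mean MSE further, to 20.93 (from 23.37 for the deeper model).***
* It would have been hard to guess that a wider network would outperform a deeper one on this problem. However, the wider model was trained for 100 epochs versus 50 for the deeper model, and the fold-to-fold standard deviations (about 25) are much larger than the gap between the means, so this ranking is tentative.
* The results demonstrate the importance of empirical testing when developing neural network models.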
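A rough, informal way to judge whether the gap between two cross-validation results is meaningful is to compare it with the standard error of each mean (the standard deviation divided by the square root of the number of folds). This stdlib sketch applies that heuristic to the "Larger" and "Wider" outputs reported above. It is only a heuristic: fold scores share training data and are not independent, so a paired or repeated cross-validation comparison would be more reliable.

```python
import math

n_folds = 10
larger_mean, larger_std = 23.37, 24.85  # "Larger" result above
wider_mean, wider_std = 20.93, 24.95    # "Wider" result above

# Rough standard error of each mean across folds (ignores fold dependence).
larger_se = larger_std / math.sqrt(n_folds)
wider_se = wider_std / math.sqrt(n_folds)

difference = larger_mean - wider_mean
print(f"difference={difference:.2f}, SE larger={larger_se:.2f}, SE wider={wider_se:.2f}")
```

The difference (about 2.4) is well under one standard error (about 7.9), which is consistent with treating the wider-versus-deeper ranking as provisional.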