# <center> Regression Of Boston House

We will delve into the Boston House Price dataset, a dataset that characterizes properties in Boston suburbs. The primary focus is on modeling house prices in these suburbs in thousands of dollars, rendering it a regression predictive modeling challenge. The dataset encompasses 13 input variables outlining the features of a given Boston suburb:

1. **CRIM:** Per capita crime rate by town.
2. **ZN:** Proportion of residential land zoned for lots over 25,000 sq.ft.
3. **INDUS:** Proportion of non-retail business acres per town.
4. **CHAS:** Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).
5. **NOX:** Nitric oxides concentration (parts per 10 million).
6. **RM:** Average number of rooms per dwelling.
7. **AGE:** Proportion of owner-occupied units built before 1940.
8. **DIS:** Weighted distances to five Boston employment centers.
9. **RAD:** Index of accessibility to radial highways.
10. **TAX:** Full-value property-tax rate per 10,000.
11. **PTRATIO:** Pupil-teacher ratio by town.
12. **B:** \(1000(Bk - 0.63)^2\), where \(Bk\) is the proportion of blacks by town.
13. **LSTAT:** % lower status of the population.
14. **MEDV:** Median value of owner-occupied homes in 1000s.

This dataset is well-studied in machine learning due to its convenience; all input and output attributes are numerical, and it comprises 506 instances for analysis.

# Import Classes and Functions

In [1]:
import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from scikeras.wrappers import KerasRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

## Initialize Random Number Generator

In [2]:
seed = 7
np.random.seed(seed)

## Load The Dataset

In [3]:
df = pd.read_csv('BostonHousing.csv')
df

Unnamed: 0,crim,zn,indus,chas,nox,rm,age,dis,rad,tax,ptratio,b,lstat,medv
0,0.00632,18.0,2.31,0,0.538,6.575,65.2,4.0900,1,296,15.3,396.90,4.98,24.0
1,0.02731,0.0,7.07,0,0.469,6.421,78.9,4.9671,2,242,17.8,396.90,9.14,21.6
2,0.02729,0.0,7.07,0,0.469,7.185,61.1,4.9671,2,242,17.8,392.83,4.03,34.7
3,0.03237,0.0,2.18,0,0.458,6.998,45.8,6.0622,3,222,18.7,394.63,2.94,33.4
4,0.06905,0.0,2.18,0,0.458,7.147,54.2,6.0622,3,222,18.7,396.90,5.33,36.2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
501,0.06263,0.0,11.93,0,0.573,6.593,69.1,2.4786,1,273,21.0,391.99,9.67,22.4
502,0.04527,0.0,11.93,0,0.573,6.120,76.7,2.2875,1,273,21.0,396.90,9.08,20.6
503,0.06076,0.0,11.93,0,0.573,6.976,91.0,2.1675,1,273,21.0,396.90,5.64,23.9
504,0.10959,0.0,11.93,0,0.573,6.794,89.3,2.3889,1,273,21.0,393.45,6.48,22.0


In [4]:
df.isnull().sum()

crim       0
zn         0
indus      0
chas       0
nox        0
rm         5
age        0
dis        0
rad        0
tax        0
ptratio    0
b          0
lstat      0
medv       0
dtype: int64

In [5]:
BM = df['rm'].isnull()
df[BM]

Unnamed: 0,crim,zn,indus,chas,nox,rm,age,dis,rad,tax,ptratio,b,lstat,medv
10,0.22489,12.5,7.87,0,0.524,,94.3,6.3467,5,311,15.2,392.52,20.45,15.0
35,0.06417,0.0,5.96,0,0.499,,68.2,3.3603,5,279,19.2,396.9,9.68,18.9
63,0.1265,25.0,5.13,0,0.453,,43.4,7.9809,8,284,19.7,395.58,9.5,25.0
96,0.11504,0.0,2.89,0,0.445,,69.6,3.4952,2,276,18.0,391.83,11.34,21.4
135,0.55778,0.0,21.89,0,0.624,,98.2,2.1107,4,437,21.2,394.67,16.96,18.1


In [6]:
df = df.interpolate()

In [7]:
# split into input (X) and output (y) variables
dataset = df.values
X = dataset[:,:-1].astype(float)
y = dataset[:,-1]

## Define The Neural Network Model

In [8]:
# define base mode
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(13, input_dim=X.shape[1], kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    
    # compile model
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

In [9]:
# evaluate model with standardized dataset
np.random.seed(seed)
estimator = KerasRegressor(build_fn=baseline_model, epochs=100, batch_size=5, verbose=0)
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
result_1 = cross_val_score(estimator, X, y, cv=kfold)
print('Baseline: %.2f%% (%.2f%%) MSE' %(result_1.mean()*100, result_1.std()*100))

  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)


Baseline: 73.32% (7.76%) MSE


Running this code gives me an estimate of the model’s performance on the problem for unseen
data. The result reports the mean squared error including the average and standard deviation
(average variance) across all 10 folds of the cross validation evaluation.

## Lift Performance By Standardizing The Dataset

In [10]:
np.random.seed(seed)
estimator = []
estimator.append(('standardize', StandardScaler()))
estimator.append(('mlp', KerasRegressor(build_fn=baseline_model, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimator)
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
result_2 = cross_val_score(pipeline, X,y, cv=kfold)
print('Standardize: %.2f%% (%.2f%%) MSE' %(result_2.mean()*100, result_2.std()*100))

  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)


Standardize: 82.20% (7.32%) MSE


## Evaluate a Deeper Network Topology

In [11]:
def larger_model():
    # create model
    model = Sequential()
    model.add(Dense(13, input_dim=X.shape[1], kernel_initializer='normal', activation='relu'))
    model.add(Dense(6, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    
    #Compile the model
    model.compile(loss="mean_squared_error", optimizer='adam')
    return model

In [12]:
# evaluate model with standardized dataset
np.random.seed(seed)
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=larger_model, epochs=50, batch_size=5, verbose=0)))
pipeline = Pipeline(estimator)
kfold = KFold(n_splits=10, shuffle= True, random_state=seed)
result_3 = cross_val_score(pipeline, X,y, cv=kfold)
print('Standardize: %.2f%% (%.2f%%) MSE'%(result_3.mean()*100, result_3.std()*100))

  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)


Standardize: 80.86% (7.97%) MSE


Running this model does show a further improvement in performance from 28 down to 24 thousand squared dollars.

## Evaluate a Wider Network Topology

In [13]:
def wider_model():
    # create model
    model = Sequential()
    model.add(Dense(20, input_dim=X.shape[1], kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    
    # compile the model
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

In [14]:
# evaluate model with standardized dataset
np.random.seed(seed)
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=wider_model, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
result_4 = cross_val_score(pipeline, X,y, cv=kfold)
print('Wider %.2f%% (%.2f%%) MSE' %(result_4.mean()*100, result_4.std()*100))

  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)
  X, y = self._initialize(X, y)


Wider 83.87% (6.77%) MSE
