# Neural Networks: Regression on House Pricing Dataset
We consider a reduced version of a dataset containing house sale prices for King County, which includes Seattle. It includes homes sold between May 2014 and May 2015.

https://www.kaggle.com/harlfoxem/housesalesprediction

For each house we know 18 features (e.g., number of bedrooms, number of bathrooms, etc.) plus its price, which is what we would like to predict.

## Insert your ID number ("numero di matricola") below

In [46]:
# put your ``numero di matricola'' (student ID number) here
numero_di_matricola = 1


In [47]:
# import all packages needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
%matplotlib inline


Load the data, remove data samples/points with missing values (NaN) and take a look at them.

In [48]:
# load the data
df = pd.read_csv('kc_house_data.csv', sep=',')

# remove the data samples with missing values (NaN)
df = df.dropna()

df.describe()


,id,price,bedrooms,bathrooms,sqft_living,sqft_lot,floors,waterfront,view,condition,grade,sqft_above,sqft_basement,yr_built,yr_renovated,zipcode,lat,long,sqft_living15,sqft_lot15
count,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0,21613.0
mean,4580302000.0,540088.1,3.370842,2.114757,2079.899736,15106.97,1.494309,0.007542,0.234303,3.40943,7.656873,1788.390691,291.509045,1971.005136,84.402258,98077.939805,47.560053,-122.213896,1986.552492,12768.455652
std,2876566000.0,367127.2,0.930062,0.770163,918.440897,41420.51,0.539989,0.086517,0.766318,0.650743,1.175459,828.090978,442.575043,29.373411,401.67924,53.505026,0.138564,0.140828,685.391304,27304.179631
min,1000102.0,75000.0,0.0,0.0,290.0,520.0,1.0,0.0,0.0,1.0,1.0,290.0,0.0,1900.0,0.0,98001.0,47.1559,-122.519,399.0,651.0
25%,2123049000.0,321950.0,3.0,1.75,1427.0,5040.0,1.0,0.0,0.0,3.0,7.0,1190.0,0.0,1951.0,0.0,98033.0,47.471,-122.328,1490.0,5100.0
50%,3904930000.0,450000.0,3.0,2.25,1910.0,7618.0,1.5,0.0,0.0,3.0,7.0,1560.0,0.0,1975.0,0.0,98065.0,47.5718,-122.23,1840.0,7620.0
75%,7308900000.0,645000.0,4.0,2.5,2550.0,10688.0,2.0,0.0,0.0,4.0,8.0,2210.0,560.0,1997.0,0.0,98118.0,47.678,-122.125,2360.0,10083.0
max,9900000000.0,7700000.0,33.0,8.0,13540.0,1651359.0,3.5,1.0,4.0,5.0,13.0,9410.0,4820.0,2015.0,2015.0,98199.0,47.7776,-121.315,6210.0,871200.0


Extract input and output data. We want to predict the price by using features other than id as input.

In [49]:
Data = df.values
# m = number of input samples
m = Data.shape[0]
print("Amount of data:", m)
Y = Data[:m, 2]
X = Data[:m, 3:]


Amount of data: 21613


## Data Pre-Processing

We split the data into 3 parts: a training set, used for training and choosing the parameters; a validation set, used for choosing among different models; and a test set. The training set consists of $2/3$ of all samples, the validation set of $1/6$ of all samples, and the test set of the remaining $1/6$.

In [50]:
# Split data into train (2/3 of samples), validation (1/6 of samples), and test data (the rest)
from sklearn.model_selection import train_test_split
m_train = int(2./3.*m)
m_val = int((m-m_train)/2.)
m_test = m - m_train - m_val
print("Amount of data for training and deciding parameters:", m_train)
print("Amount of data for validation (choosing among different models):", m_val)
print("Amount of data for test:", m_test)

Xtrain_and_val, Xtest, Ytrain_and_val, Ytest = train_test_split(
    X, Y, test_size=m_test/m, random_state=numero_di_matricola)
Xtrain, Xval, Ytrain, Yval = train_test_split(
    Xtrain_and_val, Ytrain_and_val, test_size=m_val/(m_train+m_val), random_state=numero_di_matricola)


Amount of data for training and deciding parameters: 14408
Amount of data for validation (choosing among different models): 3602
Amount of data for test: 3603


Let's standardize the data.

In [51]:
# Data pre-processing
from sklearn import preprocessing
scaler = preprocessing.StandardScaler().fit(Xtrain)
Xtrain_scaled = scaler.transform(Xtrain)
Xtrain_and_val_scaled = scaler.transform(Xtrain_and_val)
Xval_scaled = scaler.transform(Xval)
Xtest_scaled = scaler.transform(Xtest)
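
Note that the scaler is fitted on the training inputs only and then reused for the other splits, so the validation and test data do not influence the preprocessing. As a minimal sketch of the underlying computation, the snippet below does the same thing with plain NumPy on a tiny made-up array (`toy_train` and `toy_val` are illustrative values, not taken from the dataset):

```python
import numpy as np

# Toy data (hypothetical values, for illustration only)
toy_train = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
toy_val = np.array([[2.0, 500.0]])

# Statistics are computed on the training split only
mu = toy_train.mean(axis=0)
sigma = toy_train.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# The same shift and scale are then applied to every split
toy_train_scaled = (toy_train - mu) / sigma
toy_val_scaled = (toy_val - mu) / sigma

print("train mean after scaling:", toy_train_scaled.mean(axis=0))
print("train std after scaling:", toy_train_scaled.std(axis=0))
print("validation sample after scaling:", toy_val_scaled)
```

The scaled training columns have zero mean and unit standard deviation by construction; the validation columns in general do not, which is expected.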


## Neural Networks
Let's start by learning a simple neural network with 1 hidden node.
Note: we are going to use the input parameter solver='lbfgs' and random_state=numero_di_matricola to fix the random seed (so results are reproducible).

In [52]:
# let's load the MLPRegressor
from sklearn.neural_network import MLPRegressor

# let's define the model
# with the default settings the model performs extremely badly (negative R^2),
# so we use the LBFGS solver
NN = MLPRegressor(hidden_layer_sizes=(1,), random_state=numero_di_matricola, solver="lbfgs")

# let's learn the model on training data
NN.fit(Xtrain_scaled, Ytrain)

# let's print the error (1 - R^2) on training data
print("training error", 1-NN.score(Xtrain_scaled, Ytrain))

# let's print the error (1 - R^2) on validation data
print("validation error", 1-NN.score(Xval_scaled, Yval))

# let's print the weight matrices (biases excluded): input -> hidden layer first, then hidden -> output layer
print("coeffs (weights from input layer to the hidden neuron):\n", NN.coefs_)

# let's print the bias terms: bias of the hidden neuron first, then bias of the output neuron
print("intercepts (weights of biases of input and hidden layers):\n", NN.intercepts_)


training error 0.29559863217335836
validation error 0.31059586274997
coeffs (weights from input layer to the hidden neuron):
 [array([[ -729.87486122],
       [  831.2303795 ],
       [ 1799.75148514],
       [  124.60013752],
       [  119.38146152],
       [ 1146.72653943],
       [  929.24974225],
       [  381.66546344],
       [ 2763.39475395],
       [ 1668.15762936],
       [  601.24288021],
       [-1820.67409758],
       [  126.75146032],
       [ -703.0951815 ],
       [ 1943.23571365],
       [ -759.15710926],
       [  300.88988146],
       [ -202.38632292]]), array([[43.04451409]])]
intercepts (weights of biases of input and hidden layers):
 [array([12602.10199491]), array([-5370.82907044])]


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)
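
The error printed above, $1 - R^2$, is the residual sum of squares divided by the total sum of squares around the mean of the targets, so a model that always predicts the mean has error 1 and a perfect model has error 0. A minimal sketch with made-up numbers (the helper `one_minus_r2` is a name introduced here for illustration, not a scikit-learn function):

```python
import numpy as np

def one_minus_r2(y_true, y_pred):
    """Return 1 - R^2 = SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
err_perfect = one_minus_r2(y, y)                         # perfect predictions
err_mean = one_minus_r2(y, np.full_like(y, y.mean()))    # always predict the mean
print("perfect predictor:", err_perfect)
print("mean predictor:", err_mean)
```

Errors above 1 (i.e., negative $R^2$) are possible too: they mean the model is worse than always predicting the mean.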


## Neural Networks vs Linear Models

Let's learn a linear model on the same data and compare the results with the simple NN above.

In [53]:
from sklearn import linear_model

LR = linear_model.LinearRegression()

LR.fit(Xtrain_scaled, Ytrain)

# let's print the error (1 - R^2) on training data
print("training error", 1-LR.score(Xtrain_scaled, Ytrain))

# let's print the error (1 - R^2) on validation data
print("validation error", 1-LR.score(Xval_scaled, Yval))

# let's print the coefficients of the model for the input nodes (but not the bias)
print("coef:\n", LR.coef_)

# let's print the coefficient for the bias (i.e., the bias)
print("intercept:\n", LR.intercept_)

training error 0.2971718751921173
validation error 0.3113977005845858
coef:
 [-32024.37205507  30857.38702503  81520.08274701   4358.71015115
   4210.76613832  52271.26763491  39275.36300358  17128.54179271
 114582.56984907  76295.51474423  25805.04611065 -75175.13990886
   6580.79073948 -29421.4439202   84347.82703947 -28791.00545764
   9512.22430408  -8875.3212928 ]
intercept:
 539147.4730705542


Is there a way to make a NN learn a linear model?

Let's first check which loss MLPRegressor uses...

In [54]:
# let's write the code to learn a linear model with NN: how?

# let's define the model
# (notice, by default MLPRegressor uses squared loss, exactly like the linear model regressor)
NN = MLPRegressor(hidden_layer_sizes=(1,), random_state=numero_di_matricola, solver="lbfgs", activation="identity")

# let's learn the model on training data
NN.fit(Xtrain_scaled, Ytrain)

# let's print the error (1 - R^2) on training data
print("training error", 1-NN.score(Xtrain_scaled, Ytrain))

# let's print the error (1 - R^2) on validation data
print("validation error", 1-NN.score(Xval_scaled, Yval))

# let's print the weight matrices (biases excluded): input -> hidden layer first, then hidden -> output layer
print("coeffs (weights from input layer to the hidden neuron):\n", NN.coefs_)

# let's print the bias terms: bias of the hidden neuron first, then bias of the output neuron
print("intercepts (weights of biases of input and hidden layers):\n", NN.intercepts_)

# notice: intercepts_[0] * coefs_[1][0, 0] + intercepts_[1] is (approximately) the intercept of the linear regression model

training error 0.2971718758498938
validation error 0.3113975786278589
coeffs (weights from input layer to the hidden neuron):
 [array([[  53.05892257],
       [ -51.11646483],
       [-135.33248854],
       [  -7.22545107],
       [  -6.98193883],
       [ -86.59959823],
       [ -65.07184335],
       [ -28.37751298],
       [-189.83995551],
       [-126.15950406],
       [ -42.62168198],
       [ 124.55003826],
       [ -10.9016803 ],
       [  48.75330118],
       [-139.74756357],
       [  47.70252967],
       [ -15.76187867],
       [  14.7062653 ]]), array([[-603.573508]])]
intercepts (weights of biases of input and hidden layers):
 [array([-892.60243245]), array([402.44197475])]
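
With the identity activation the network is a composition of two affine maps, which is again an affine map, so its effective weights and intercept can be read off by multiplying the layers together. The sketch below checks this on synthetic data (the arrays `X_toy`, `y_toy` and the names `w_eff`, `b_eff` are assumptions introduced for illustration; they are not the house data):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic regression data (illustrative only, not the house data)
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 3))
y_toy = X_toy @ np.array([1.5, -2.0, 0.5]) + 4.0 + 0.1 * rng.normal(size=200)

nn_lin = MLPRegressor(hidden_layer_sizes=(1,), activation="identity",
                      solver="lbfgs", max_iter=1000, random_state=0)
nn_lin.fit(X_toy, y_toy)

# Compose the two affine layers into a single one: y = X @ w_eff + b_eff
w_eff = nn_lin.coefs_[0] @ nn_lin.coefs_[1]                                # shape (3, 1)
b_eff = nn_lin.intercepts_[0] @ nn_lin.coefs_[1] + nn_lin.intercepts_[1]   # shape (1,)

manual_pred = (X_toy @ w_eff + b_eff).ravel()
print("effective weights:", w_eff.ravel())
print("effective intercept:", b_eff)
print("matches predict():", np.allclose(manual_pred, nn_lin.predict(X_toy)))
```

The same composition applied to the `coefs_` and `intercepts_` printed above should give values close to `LR.coef_` and `LR.intercept_`, up to the effect of the regularization term and of the optimizer's stopping point.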


Note that there is an $\ell_2$ regularization term in MLPRegressor. What about making it smaller?

In [55]:
# COMPLETE
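
One possible way to approach this cell, given as a hedged sketch rather than the intended solution: the strength of the $\ell_2$ term is controlled by the `alpha` parameter of MLPRegressor (default `1e-4`), so it can be lowered and the fit repeated. The example uses synthetic data as a stand-in for the scaled house features, and the specific `alpha` values are arbitrary choices:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic data standing in for the scaled house features (assumption)
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(300, 4))
y_toy = X_toy @ np.array([2.0, -1.0, 0.5, 3.0]) + 1.0 + 0.1 * rng.normal(size=300)

errors = {}
for alpha in (1e-4, 1e-6, 1e-8):  # 1e-4 is the scikit-learn default
    nn = MLPRegressor(hidden_layer_sizes=(1,), activation="identity",
                      solver="lbfgs", alpha=alpha, max_iter=1000, random_state=0)
    nn.fit(X_toy, y_toy)
    errors[alpha] = 1 - nn.score(X_toy, y_toy)
    print(f"alpha={alpha:g}  training error (1 - R^2) = {errors[alpha]:.6f}")
```

In the notebook itself the same loop would be run on `Xtrain_scaled` and `Ytrain`, also printing the validation error for each value of `alpha`.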


## More Complex NNs

Let's try more complex NNs, for example by increasing the number of nodes in the only hidden layer, or by increasing the number of hidden layers.

Let's build a NN with 2 nodes in the only hidden layer

In [56]:
# let's build a NN with 2 nodes in the only hidden layer

# let's define the model
NN = MLPRegressor(hidden_layer_sizes=(2,), random_state=numero_di_matricola, solver="lbfgs")

# let's learn the model on training data
NN.fit(Xtrain_scaled, Ytrain)

# let's print the error (1 - R^2) on training data
print("training error", 1-NN.score(Xtrain_scaled, Ytrain))

# let's print the error (1 - R^2) on validation data
print("validation error", 1-NN.score(Xval_scaled, Yval))

# let's print the weight matrices (biases excluded): input -> hidden layer first, then hidden -> output layer
print("coeffs (weights from input layer to the hidden neuron):\n", NN.coefs_)

# let's print the bias terms: biases of the hidden neurons first, then bias of the output neuron
print("intercepts (weights of biases of input and hidden layers):\n", NN.intercepts_)



training error 0.2153208979874328
validation error 0.21690093519558962
coeffs (weights from input layer to the hidden neuron):
 [array([[ -32.10729555,  -16.38981375],
       [ 116.45979582,   27.15117946],
       [ 171.25183209,   55.99693335],
       [-288.5583707 ,   17.21660766],
       [ -98.12489751,   28.17114291],
       [ 106.85686761,   14.07110269],
       [ -29.1029425 ,   52.61406787],
       [  61.74518617,   26.7848754 ],
       [ 277.04724345,  136.12529807],
       [ 181.99576479,   49.44585152],
       [  13.99059296,   24.08211091],
       [ -53.63301901,  -90.12891609],
       [  28.99586503,    9.77813247],
       [-198.64003826,  -28.10630235],
       [ 296.80982024,  120.20746084],
       [-454.94082664,  -18.0556324 ],
       [  12.28997402,   62.54149262],
       [-320.04383237,   -5.6423691 ]]), array([[748.23410896],
       [656.7949309 ]])]
intercepts (weights of biases of input and hidden layers):
 [array([-1084.99886264,   778.08863079]), array([782.893831

Let's build a NN with 5 nodes in the only hidden layer

In [57]:
# let's build a NN with 5 nodes in the only hidden layer

# let's define the model
NN = MLPRegressor(hidden_layer_sizes=(5,), random_state=numero_di_matricola, solver="lbfgs")

# let's learn the model on training data
NN.fit(Xtrain_scaled, Ytrain)

# let's print the error (1 - R^2) on training data
print("training error", 1-NN.score(Xtrain_scaled, Ytrain))

# let's print the error (1 - R^2) on validation data
print("validation error", 1-NN.score(Xval_scaled, Yval))

# # let's print the coefficients of the model for the input nodes (but not the bias)
# print("coeffs (weights from input layer to the hidden neuron):\n", NN.coefs_)

# # let's print the coefficient for the bias (i.e., the bias)
# print("intercepts (weights of biases of input and hidden layers):\n", NN.intercepts_)

training error 0.16409610419710907
validation error 0.17530164058251185


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)


Note that stopping the optimizer after a smaller number of iterations gave a larger error on the training set but a smaller error on the validation data: "early stopping is a form of regularization".
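
The effect of the iteration budget can be explored by refitting the same network with different values of `max_iter` and comparing training and validation errors. The following is a minimal sketch on synthetic data (the data-generating function, the network size and the list of `max_iter` values are all assumptions made for illustration); whether the validation error actually increases with more iterations depends on the data and on the random seed:

```python
import warnings
import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPRegressor

# Synthetic nonlinear regression problem (illustrative only)
rng = np.random.default_rng(0)
X_all = rng.normal(size=(400, 5))
y_all = np.sin(X_all[:, 0]) + 0.5 * X_all[:, 1] ** 2 + 0.3 * rng.normal(size=400)
X_tr, y_tr = X_all[:300], y_all[:300]
X_va, y_va = X_all[300:], y_all[300:]

results = []
for max_iter in (10, 50, 200, 1000):
    nn = MLPRegressor(hidden_layer_sizes=(50,), solver="lbfgs",
                      max_iter=max_iter, random_state=0)
    with warnings.catch_warnings():
        # hitting the iteration limit is intended here, so silence the warning
        warnings.simplefilter("ignore", category=ConvergenceWarning)
        nn.fit(X_tr, y_tr)
    results.append((max_iter, 1 - nn.score(X_tr, y_tr), 1 - nn.score(X_va, y_va)))

for max_iter, err_tr, err_va in results:
    print(f"max_iter={max_iter:5d}  train error={err_tr:.3f}  validation error={err_va:.3f}")
```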

Let's build a NN with 10 nodes in the only hidden layer

In [58]:
# let's build a NN with 10 nodes in the only hidden layer

# let's define the model
NN = MLPRegressor(hidden_layer_sizes=(10,), random_state=numero_di_matricola, solver="lbfgs")

# let's learn the model on training data
NN.fit(Xtrain_scaled, Ytrain)

# let's print the error (1 - R^2) on training data
print("training error", 1-NN.score(Xtrain_scaled, Ytrain))

# let's print the error (1 - R^2) on validation data
print("validation error", 1-NN.score(Xval_scaled, Yval))

# # let's print the coefficients of the model for the input nodes (but not the bias)
# print("coeffs (weights from input layer to the hidden neuron):\n", NN.coefs_)

# # let's print the coefficient for the bias (i.e., the bias)
# print("intercepts (weights of biases of input and hidden layers):\n", NN.intercepts_)


training error 0.127399261512813
validation error 0.14331396707337007


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)


Let's build a NN with 100 nodes in the only hidden layer. Note that this architecture is the default of MLPRegressor! Here we also raise max_iter to 1000.

In [59]:
# let's build a NN with 100 nodes in the only hidden layer

# let's define the model
NN = MLPRegressor(max_iter=1000, hidden_layer_sizes=(100,), random_state=numero_di_matricola, solver="lbfgs")

# let's learn the model on training data
NN.fit(Xtrain_scaled, Ytrain)

# let's print the error (1 - R^2) on training data
print("training error", 1-NN.score(Xtrain_scaled, Ytrain))

# let's print the error (1 - R^2) on validation data
print("validation error", 1-NN.score(Xval_scaled, Yval))

# # let's print the coefficients of the model for the input nodes (but not the bias)
# print("coeffs (weights from input layer to the hidden neuron):\n", NN.coefs_)

# # let's print the coefficient for the bias (i.e., the bias)
# print("intercepts (weights of biases of input and hidden layers):\n", NN.intercepts_)

training error 0.05750129141969351
validation error 0.11642854289390092


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)


Let's try 2 hidden layers, 1 node each

In [60]:
# let's build a NN with 2 hidden layers, 1 node each

# let's define the model
NN = MLPRegressor(hidden_layer_sizes=(1,1), random_state=numero_di_matricola, solver="lbfgs")

# let's learn the model on training data
NN.fit(Xtrain_scaled, Ytrain)

# let's print the error (1 - R^2) on training data
print("training error", 1-NN.score(Xtrain_scaled, Ytrain))

# let's print the error (1 - R^2) on validation data
print("validation error", 1-NN.score(Xval_scaled, Yval))

# # let's print the coefficients of the model for the input nodes (but not the bias)
# print("coeffs (weights from input layer to the hidden neuron):\n", NN.coefs_)

# # let's print the coefficient for the bias (i.e., the bias)
# print("intercepts (weights of biases of input and hidden layers):\n", NN.intercepts_)

training error 0.2709866028192074
validation error 0.28542758123222955


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)


Let's try 2 hidden layers, 2 nodes each

In [61]:
# let's build a NN with 2 hidden layers, 2 nodes each

# let's define the model
NN = MLPRegressor(hidden_layer_sizes=(2,2), random_state=numero_di_matricola, solver="lbfgs")

# let's learn the model on training data
NN.fit(Xtrain_scaled, Ytrain)

# let's print the error (1 - R^2) on training data
print("training error", 1-NN.score(Xtrain_scaled, Ytrain))

# let's print the error (1 - R^2) on validation data
print("validation error", 1-NN.score(Xval_scaled, Yval))

# # let's print the coefficients of the model for the input nodes (but not the bias)
# print("coeffs (weights from input layer to the hidden neuron):\n", NN.coefs_)

# # let's print the coefficient for the bias (i.e., the bias)
# print("intercepts (weights of biases of input and hidden layers):\n", NN.intercepts_)

training error 0.24579106567172326
validation error 0.2516283049275865


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)


Let's try 2 hidden layers, 10 nodes each

In [62]:
# let's build a NN with 2 hidden layers, 10 nodes each

# let's define the model
NN = MLPRegressor(hidden_layer_sizes=(10,10), random_state=numero_di_matricola, solver="lbfgs")

# let's learn the model on training data
NN.fit(Xtrain_scaled, Ytrain)

# let's print the error (1 - R^2) on training data
print("training error", 1-NN.score(Xtrain_scaled, Ytrain))

# let's print the error (1 - R^2) on validation data
print("validation error", 1-NN.score(Xval_scaled, Yval))

# # let's print the coefficients of the model for the input nodes (but not the bias)
# print("coeffs (weights from input layer to the hidden neuron):\n", NN.coefs_)

# # let's print the coefficient for the bias (i.e., the bias)
# print("intercepts (weights of biases of input and hidden layers):\n", NN.intercepts_)


training error 0.1490405350069579
validation error 0.16497647245517488


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)


Let's try 2 hidden layers, 100 nodes each

In [63]:
# let's build a NN with 2 hidden layers, 100 nodes each

# let's define the model
NN = MLPRegressor(hidden_layer_sizes=(100,100), random_state=numero_di_matricola, solver="lbfgs")

# let's learn the model on training data
NN.fit(Xtrain_scaled, Ytrain)

# let's print the error (1 - R^2) on training data
print("training error", 1-NN.score(Xtrain_scaled, Ytrain))

# let's print the error (1 - R^2) on validation data
print("validation error", 1-NN.score(Xval_scaled, Yval))

# # let's print the coefficients of the model for the input nodes (but not the bias)
# print("coeffs (weights from input layer to the hidden neuron):\n", NN.coefs_)

# # let's print the coefficient for the bias (i.e., the bias)
# print("intercepts (weights of biases of input and hidden layers):\n", NN.intercepts_)


training error 0.06478762206269884
validation error 0.11640398000193142


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)


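So it seems that a second hidden layer brings no clear benefit on this dataset: with 2 or 10 nodes per layer the single-hidden-layer network has the smaller validation error, and with 100 nodes per layer the two architectures are practically tied (validation error about 0.116). We therefore keep 1 hidden layer (and the default number of iterations). Let's try 5-fold cross-validation with the number of nodes in the hidden layer between 1 and 20.
Note that we use train and validation data together, since we are doing cross-validation.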

In [64]:
from sklearn.model_selection import GridSearchCV

# COMPLETE
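
As a hedged sketch of how such a search could be set up (not the intended solution), GridSearchCV can be given a grid over `hidden_layer_sizes` and `cv=5`; by default it scores each candidate with the estimator's own `score`, which for MLPRegressor is $R^2$. The example runs on synthetic data with a reduced grid to stay fast; both the data and the grid values are assumptions, and in the notebook the grid would cover 1 to 20 nodes and the fit would use `Xtrain_and_val_scaled` and `Ytrain_and_val`:

```python
import warnings
import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

# Synthetic data (illustrative only)
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 4))
y_toy = np.sin(X_toy[:, 0]) + X_toy[:, 1] * X_toy[:, 2] + 0.1 * rng.normal(size=200)

# Reduced grid to keep the sketch fast; the full range would be range(1, 21)
param_grid = {"hidden_layer_sizes": [(n,) for n in (1, 2, 5, 10)]}

grid = GridSearchCV(
    MLPRegressor(solver="lbfgs", random_state=0),
    param_grid,
    cv=5,  # 5-fold cross-validation, scored with R^2 by default
)
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    grid.fit(X_toy, y_toy)

print("best parameters:", grid.best_params_)
print("cross-validated error (1 - R^2) of the best model:", 1 - grid.best_score_)
```

After fitting, `grid.best_estimator_` is the best configuration refitted on all the data passed to `fit`, so it can be scored directly on held-out test data.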


Now let's check what the best parameter is, and compare the best NN with the linear model (learned on train and validation) on test data.

In [65]:
# let's print the best model according to grid search
# COMPLETE

# let's print the error 1-R^2 for the best model
# COMPLETE


Let's compare the error of the best NN on train and validation data and on test data.

In [66]:
# COMPLETE


Now let's learn the linear model on train and validation data, and get its error (1-R^2) on train and validation data and on test data.

In [67]:
# COMPLETE
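
A possible shape for this step, sketched on synthetic data (the arrays and variable names below are assumptions for illustration, and this is not the intended solution): fit the linear model on the combined train and validation data, then report $1 - R^2$ on that same data and on the held-out test data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (illustrative only)
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(300, 4))
y_toy = X_toy @ np.array([1.0, -2.0, 0.0, 0.5]) + 3.0 + 0.5 * rng.normal(size=300)

# Hold out 1/6 of the samples as test data, mirroring the split used earlier
X_trval, X_te, y_trval, y_te = train_test_split(
    X_toy, y_toy, test_size=1 / 6, random_state=0)

lr = LinearRegression().fit(X_trval, y_trval)
err_trval = 1 - lr.score(X_trval, y_trval)
err_test = 1 - lr.score(X_te, y_te)
print("error (1 - R^2) on train and validation:", err_trval)
print("error (1 - R^2) on test:", err_test)
```

In the notebook the same two calls to `score` would use `Xtrain_and_val_scaled`, `Ytrain_and_val`, `Xtest_scaled` and `Ytest`.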


Note: MLPRegressor has several other parameters!
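
To see which parameters exist and what their defaults are, one option is to inspect an untrained estimator with `get_params()`. A minimal sketch (the selection of names printed below is an arbitrary choice):

```python
from sklearn.neural_network import MLPRegressor

# Default hyperparameters of an untrained MLPRegressor
params = MLPRegressor().get_params()
for name in ("hidden_layer_sizes", "activation", "solver", "alpha", "max_iter"):
    print(f"{name} = {params[name]!r}")
print("total number of parameters:", len(params))
```

Any of these can be passed to the constructor or included in a GridSearchCV parameter grid.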