## Comparing a simple MLP classifier against a random forest classifier

Create a multi-layer perceptron neural network model to predict on a labeled dataset of your choosing. Compare this model to either a boosted tree or a random forest model and describe the relative tradeoffs between complexity and accuracy. Be sure to vary the hyperparameters of your MLP.

In [44]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sqlalchemy import create_engine
import time


from sklearn.datasets import fetch_openml
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
import umap
from sklearn.neural_network import MLPClassifier
from sklearn.neural_network import BernoulliRBM
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

import warnings
warnings.filterwarnings('ignore')

In [11]:
postgres_user = 'dsbc_student'
postgres_pw = '7*.8G9QH21'
postgres_host = '142.93.121.174'
postgres_port = '5432'
postgres_db = 'houseprices'

engine = create_engine('postgresql://{}:{}@{}:{}/{}'.format(
    postgres_user, postgres_pw, postgres_host, postgres_port, postgres_db))
house_prices = pd.read_sql_query('select * from houseprices',con=engine)

engine.dispose()

In [12]:
house_prices.head()

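| | id | mssubclass | mszoning | lotfrontage | lotarea | street | alley | lotshape | landcontour | utilities | ... | poolarea | poolqc | fence | miscfeature | miscval | mosold | yrsold | saletype | salecondition | saleprice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 60 | RL | 65.0 | 8450 | Pave | | Reg | Lvl | AllPub | ... | 0 | | | | 0 | 2 | 2008 | WD | Normal | 208500 |
| 1 | 2 | 20 | RL | 80.0 | 9600 | Pave | | Reg | Lvl | AllPub | ... | 0 | | | | 0 | 5 | 2007 | WD | Normal | 181500 |
| 2 | 3 | 60 | RL | 68.0 | 11250 | Pave | | IR1 | Lvl | AllPub | ... | 0 | | | | 0 | 9 | 2008 | WD | Normal | 223500 |
| 3 | 4 | 70 | RL | 60.0 | 9550 | Pave | | IR1 | Lvl | AllPub | ... | 0 | | | | 0 | 2 | 2006 | WD | Abnorml | 140000 |
| 4 | 5 | 60 | RL | 84.0 | 14260 | Pave | | IR1 | Lvl | AllPub | ... | 0 | | | | 0 | 12 | 2008 | WD | Normal | 250000 |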


In [14]:
house_prices.shape

(1460, 81)

In [15]:
# Numerical features of interest
X = house_prices[['overallqual', 'overallcond', 'grlivarea', 'bedroomabvgr']]

In [16]:
X.head()

| | overallqual | overallcond | grlivarea | bedroomabvgr |
|---|---|---|---|---|
| 0 | 7 | 5 | 1710 | 3 |
| 1 | 6 | 8 | 1262 | 3 |
| 2 | 7 | 5 | 1786 | 3 |
| 3 | 7 | 5 | 1717 | 3 |
| 4 | 8 | 5 | 2198 | 4 |


In [29]:
# Binary target: whether the house is in the NoRidge neighborhood
y = pd.get_dummies(house_prices.neighborhood).loc[:,'NoRidge']
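Because `y` flags a single neighborhood, the positive class is likely to be rare, and plain accuracy can look high even for a model that never predicts it. A minimal sketch of checking this with a majority-class baseline follows. It runs on synthetic data: `X_demo`, `y_demo`, and the count of 41 positives are illustrative assumptions, not values read from the database. With the real data you would pass `X` and `y` instead.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the notebook's data (assumption): 1460 rows,
# 4 numeric features, and a binary target with roughly 3% positives.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1460, 4))
y_demo = np.zeros(1460, dtype=int)
y_demo[:41] = 1          # illustrative number of positive samples
rng.shuffle(y_demo)

# Share of positive samples in the target
positive_rate = y_demo.mean()

# A classifier that ignores the features and always predicts the majority class
baseline = DummyClassifier(strategy="most_frequent")
baseline_scores = cross_val_score(baseline, X_demo, y_demo, cv=10)

print("positive rate: %.4f" % positive_rate)
print("baseline accuracy: %.4f" % baseline_scores.mean())
```

If the baseline lands close to the scores of the real models, those models may not be learning much beyond the class frequencies.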

## Random forest: a benchmark to compare against the MLP classifier below

In [30]:
rfc = RandomForestClassifier(n_estimators=100, max_depth=4)

rfc.fit(X, y)
cross_val_score(rfc, X, y, cv=10)

array([0.96598639, 0.97260274, 0.97260274, 0.97260274, 0.97260274,
       0.97260274, 0.97260274, 0.97260274, 0.97260274, 0.97241379])

In [31]:
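# Time the random forest model: one full fit plus 10-fold cross-validation
start_time = time.time()

# RandomForestClassifier is imported directly, so no `ensemble.` prefix is needed
rfc = RandomForestClassifier(n_estimators=100, max_depth=4)

rfc.fit(X, y)
cross_val_score(rfc, X, y, cv=10)
print("--- %s seconds ---" % (round(time.time() - start_time, 2)))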

--- 1.29 seconds ---
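The timing cell above measures one fit on the full data plus ten cross-validation fits together. As a hedged alternative, `cross_validate` from scikit-learn reports per-fold fit and score times alongside the test scores, so timing and accuracy come from the same run. The sketch below uses synthetic data from `make_classification` as a stand-in (an assumption; the names `X_demo`, `y_demo`, and `rfc_demo` are illustrative).

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Synthetic stand-in data (assumption): 4 features, imbalanced binary target
X_demo, y_demo = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=1,
    weights=[0.9, 0.1], random_state=0,
)

rfc_demo = RandomForestClassifier(n_estimators=100, max_depth=4, random_state=0)

# Wall-clock time around the whole cross-validation run
start = time.perf_counter()
cv_results = cross_validate(rfc_demo, X_demo, y_demo, cv=10)
elapsed = time.perf_counter() - start

print("wall time: %.2f s" % elapsed)
print("total fit time: %.2f s" % cv_results["fit_time"].sum())
print("mean accuracy: %.4f" % cv_results["test_score"].mean())
```

Fixing `random_state` keeps the scores reproducible between runs, which makes timing comparisons across models easier to interpret.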


## Multi-layer Perceptron classifier

#### 1000 neurons wide, 1 layer

In [32]:
# https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier

start_time = time.time()


# Establish and fit the model with a single hidden layer of 1000 neurons.
mlp = MLPClassifier(hidden_layer_sizes=(1000,))
mlp.fit(X, y)
cross_val_score(mlp, X, y, cv=5)

print("--- %s seconds ---" % (round(time.time() - start_time, 2)))

--- 7.51 seconds ---


In [33]:
cross_val_score(mlp, X, y, cv=5)

array([0.96928328, 0.97260274, 0.97260274, 0.97260274, 0.97250859])

#### 1000 neurons wide, 1 layer, with solver *lbfgs*

In [38]:
# https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier

start_time = time.time()


# Establish and fit the model with a single hidden layer of 1000 neurons, using the lbfgs solver.
mlp = MLPClassifier(hidden_layer_sizes=(1000,), solver='lbfgs')
mlp.fit(X, y)
cross_val_score(mlp, X, y, cv=5)

print("--- %s seconds ---" % (round(time.time() - start_time, 2)))

--- 1.45 seconds ---


In [39]:
cross_val_score(mlp, X, y, cv=5)

array([0.96928328, 0.97260274, 0.97260274, 0.97260274, 0.97250859])

With `lbfgs`, the run took 1.45 seconds instead of 7.51, and the cross-validation scores are identical to those from the default `adam` solver.

#### 1000 neurons wide, 2 layers

In [40]:
# https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier

start_time = time.time()


# Establish and fit the model with two hidden layers of 1000 neurons each.
mlp = MLPClassifier(hidden_layer_sizes=(1000, 1000,))
mlp.fit(X, y)
cross_val_score(mlp, X, y, cv=5)

print("--- %s seconds ---" % (round(time.time() - start_time, 2)))

--- 82.66 seconds ---


In [41]:
cross_val_score(mlp, X, y, cv=5)

array([0.96928328, 0.97260274, 0.97260274, 0.97260274, 0.97250859])

The second hidden layer raised the run time to 82.66 seconds, but the cross-validation scores did not change.
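Rather than editing one cell per configuration, the hyperparameters can also be varied in a single search. The following is a minimal sketch using `GridSearchCV`; the grid values, the small layer sizes, and the synthetic data are illustrative assumptions chosen to keep the run short, not settings taken from this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data (assumption): 4 features, imbalanced binary target
X_demo, y_demo = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=1,
    weights=[0.9, 0.1], random_state=0,
)

# Hypothetical grid: network shape and L2 regularization strength
param_grid = {
    "hidden_layer_sizes": [(10,), (50,), (10, 10)],
    "alpha": [1e-4, 1e-2],
}

search = GridSearchCV(
    MLPClassifier(solver="lbfgs", max_iter=500, random_state=0),
    param_grid,
    cv=3,
)
search.fit(X_demo, y_demo)

print("best parameters:", search.best_params_)
print("best mean CV accuracy: %.4f" % search.best_score_)
# Mean fit time per configuration, useful for the complexity vs. accuracy tradeoff
print("mean fit times:", search.cv_results_["mean_fit_time"])
```

The `cv_results_` dictionary holds one entry per configuration, so accuracy and fit time can be compared side by side.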

#### 1000 neurons wide, 3 layers with solver *lbfgs*

In [42]:
# https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier

start_time = time.time()


# Establish and fit the model with three hidden layers of 1000 neurons each, using the lbfgs solver.
mlp = MLPClassifier(hidden_layer_sizes=(1000, 1000, 1000,), solver='lbfgs')
mlp.fit(X, y)
cross_val_score(mlp, X, y, cv=5)

print("--- %s seconds ---" % (round(time.time() - start_time, 2)))

--- 16.72 seconds ---


In [43]:
cross_val_score(mlp, X, y, cv=5)

array([0.03071672, 0.97260274, 0.97260274, 0.02739726, 0.97250859])

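Adding a third layer did not help. Three folds score the same as before, while the other two drop to about 0.03. Those two values are the exact complements of the earlier scores for the same folds (0.0307 = 1 - 0.9693 and 0.0274 = 1 - 0.9726), which is what you would see if the model assigned every sample in a fold to a single class and picked the wrong one in those two folds.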
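Two things may be worth trying here. The scikit-learn documentation notes that multi-layer perceptrons are sensitive to feature scaling, and accuracy alone hides how a model treats a rare class. The sketch below, run on synthetic stand-in data (an assumption, with one feature rescaled to mimic a square-footage column), standardizes the inputs in a pipeline and reports balanced accuracy next to plain accuracy. The layer size and iteration limit are illustrative, not tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 4 features, imbalanced binary target
X_demo, y_demo = make_classification(
    n_samples=600, n_features=4, n_informative=3, n_redundant=1,
    weights=[0.9, 0.1], class_sep=1.5, random_state=0,
)
# Put one feature on a much larger scale than the others
X_demo[:, 2] = X_demo[:, 2] * 500 + 1500

# The scaler is fit inside each training fold, so no test data leaks into it
scaled_mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=0),
)

# Shuffled, stratified folds keep the class ratio similar in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

accuracy = cross_val_score(scaled_mlp, X_demo, y_demo, cv=cv, scoring="accuracy")
balanced = cross_val_score(scaled_mlp, X_demo, y_demo, cv=cv,
                           scoring="balanced_accuracy")

print("accuracy per fold:", accuracy.round(4))
print("balanced accuracy per fold:", balanced.round(4))
```

Balanced accuracy averages recall over both classes, so a model that always predicts the majority class scores 0.5 instead of something close to 0.97.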