Create a multi-layer perceptron neural network model to predict on a labeled dataset of your choosing. Compare this model to either a boosted tree or a random forest model and describe the relative tradeoffs between complexity and accuracy. Be sure to vary the hyperparameters of your MLP!

In [29]:
conda install -c conda-forge lime

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: C:\Users\M246047\AppData\Local\Continuum\miniconda3

  added / updated specs:
    - lime


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2018.4.16  |                0         176 KB  conda-forge
    certifi-2019.11.28         |           py37_0         157 KB
    lime-0.1.1.37              |             py_0         234 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         567 KB

The following NEW packages will be INSTALLED:

  lime               conda-forge/noarch::lime-0.1.1.37-py_0

The following packages will be UPDATED:

  certifi                                  2019.9.11-py37_0 --> 2019.11.28-py37_0

The following packages w

In [45]:

import os
os.chdir('C:\\Users\\M246047\\Documents\\Python')
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
from sklearn.datasets import fetch_openml
%matplotlib inline

import lime
import lime.lime_tabular
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn import ensemble
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV

In [46]:
print('Breast Cancer Columns')
br = pd.read_csv('breast_cancer.csv')
br = pd.DataFrame(br)
print(br.columns)

Breast Cancer Columns
Index(['id', 'diagnosis', 'radius_mean', 'texture_mean', 'perimeter_mean',
       'area_mean', 'smoothness_mean', 'compactness_mean', 'concavity_mean',
       'concave points_mean', 'symmetry_mean', 'fractal_dimension_mean',
       'radius_se', 'texture_se', 'perimeter_se', 'area_se', 'smoothness_se',
       'compactness_se', 'concavity_se', 'concave points_se', 'symmetry_se',
       'fractal_dimension_se', 'radius_worst', 'texture_worst',
       'perimeter_worst', 'area_worst', 'smoothness_worst',
       'compactness_worst', 'concavity_worst', 'concave points_worst',
       'symmetry_worst', 'fractal_dimension_worst'],
      dtype='object')


In [47]:
br.info()
br.diagnosis.value_counts()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 569 entries, 0 to 568
Data columns (total 32 columns):
id                         569 non-null int64
diagnosis                  569 non-null object
radius_mean                569 non-null float64
texture_mean               569 non-null float64
perimeter_mean             569 non-null float64
area_mean                  569 non-null float64
smoothness_mean            569 non-null float64
compactness_mean           569 non-null float64
concavity_mean             569 non-null float64
concave points_mean        569 non-null float64
symmetry_mean              569 non-null float64
fractal_dimension_mean     569 non-null float64
radius_se                  569 non-null float64
texture_se                 569 non-null float64
perimeter_se               569 non-null float64
area_se                    569 non-null float64
smoothness_se              569 non-null float64
compactness_se             569 non-null float64
concavity_se               569 non

B    357
M    212
Name: diagnosis, dtype: int64

In [48]:
y = br.diagnosis
y_dum = br['diagnosis'].apply(lambda x: 1 if x == 'M' else 0)
X = br.drop(columns=['diagnosis', 'id'])
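
A quick sanity check on the label encoding can catch a common slip: in a conditional expression, a bare non-empty string such as `'M'` is always truthy, so `1 if 'M' else 0` returns 1 for every row regardless of `x`. The sketch below uses a small made-up Series (`toy`, not the real data) to contrast that form with an explicit comparison and with `Series.map`.

```python
import pandas as pd

# Hypothetical toy labels standing in for br['diagnosis'].
toy = pd.Series(['M', 'B', 'M', 'B', 'B'])

# A bare string literal is truthy, so this ignores x and always yields 1.
always_one = toy.apply(lambda x: 1 if 'M' else 0)

# Comparing x to 'M' gives the intended coding: 1 = malignant, 0 = benign.
compared = toy.apply(lambda x: 1 if x == 'M' else 0)

# Series.map with an explicit dictionary does the same and documents the coding.
mapped = toy.map({'M': 1, 'B': 0})

print(always_one.tolist())
print(compared.tolist())
print(mapped.tolist())
```

Checking `value_counts()` on the encoded column against the 357/212 split of the original labels is a cheap way to confirm the encoding did what was intended.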

# The Models

## Multi-Layer Perceptrons

In [58]:
mlp = MLPClassifier(hidden_layer_sizes=(1000,))
mlp.fit(X, y)
print('MLP Score: ', mlp.score(X, y))
      
cross_val = cross_val_score(mlp, X, y, cv=10)
print('Cross Validation Score: ', cross_val)
print('Mean Cross Validation Score: ', cross_val.mean())

MLP Score:  0.8541300527240774
Cross Validation Score:  [0.85964912 0.9122807  0.92982456 0.94736842 0.92982456 0.92982456
 0.98245614 0.68421053 0.92982456 0.91071429]
Mean Cross Validation Score:  0.9015977443609022


In [59]:
mlp = MLPClassifier(hidden_layer_sizes=(1000,))
mlp.fit(X, y)
print('MLP Score: ', mlp.score(X, y))
      
cross_val = cross_val_score(mlp, X, y, cv=10)
print('Cross Validation Score: ', cross_val)
print('Mean Cross Validation Score: ', cross_val.mean())

MLP Score:  0.9367311072056239
Cross Validation Score:  [0.94736842 0.85964912 0.92982456 0.94736842 0.92982456 0.92982456
 0.9122807  0.92982456 0.89473684 0.98214286]
Mean Cross Validation Score:  0.9262844611528823
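
The two cells above use identical settings yet report different scores. `MLPClassifier` draws its initial weights (and shuffles mini-batches) from a random number generator, so leaving `random_state=None` makes every fit different. The following is a minimal sketch of pinning the seed; it uses scikit-learn's bundled `load_breast_cancer` data as a stand-in for `breast_cancer.csv`, and the helper name `cv_scores`, the layer size, and `max_iter=500` are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Stand-in data: the bundled Wisconsin breast cancer set (569 rows, 30 features).
X_demo, y_demo = load_breast_cancer(return_X_y=True)

def cv_scores(seed):
    """Return 5-fold accuracy for an MLP whose randomness is fixed by `seed`."""
    mlp_demo = MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=seed)
    return cross_val_score(mlp_demo, X_demo, y_demo, cv=5)

first = cv_scores(0)
second = cv_scores(0)   # same seed: expected to reproduce `first`
other = cv_scores(1)    # different seed: may or may not differ

print(np.array_equal(first, second))
print(first.mean(), other.mean())
```

Comparing scores across several seeds also gives a rough sense of how much of the gap between two architectures is just run-to-run noise.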


In [65]:
explainer = lime.lime_tabular.LimeTabularExplainer(X.values)

def prob(data):
    return np.array(list(zip(1-model.predict(X),model.predict(X))))

i = 1
exp = explainer.explain_instance(X.values, prob, num_features=5)

ValueError: could not broadcast input array from shape (569,30) into shape (569)
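
The traceback above is consistent with `explain_instance` receiving the whole matrix `X.values` instead of a single row: LIME expects a 1-D row plus a function that maps an `(n_samples, n_features)` array to an `(n_samples, n_classes)` array of probabilities. The `prob` helper also refers to an undefined `model` and ignores its `data` argument. Below is a hedged sketch of one way the call could be arranged, using `predict_proba` directly. It runs on scikit-learn's bundled `load_breast_cancer` data rather than the CSV, scales the inputs in a pipeline (an assumption, not what the cell above does), and guards the `lime` import so the rest still runs where the package is missing.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_demo, y_demo = data.data, data.target

# Any fitted classifier with predict_proba will do; this one is illustrative.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(max_iter=1000, random_state=0),
).fit(X_demo, y_demo)

# predict_proba already has the shape LIME needs: (n_samples, n_classes).
probs = model.predict_proba(X_demo[:5])
print(probs.shape)

try:
    from lime.lime_tabular import LimeTabularExplainer
except ImportError:
    LimeTabularExplainer = None  # lime is optional for this sketch

if LimeTabularExplainer is not None:
    explainer = LimeTabularExplainer(
        X_demo,
        feature_names=list(data.feature_names),
        class_names=list(data.target_names),
        mode='classification',
    )
    i = 1
    # Pass ONE row (1-D), not the full matrix.
    exp = explainer.explain_instance(X_demo[i], model.predict_proba, num_features=5)
    print(exp.as_list())
```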

In [6]:
mlp = MLPClassifier(hidden_layer_sizes=(100, 100))
mlp.fit(X, y)
print('MLP Score: ', mlp.score(X, y))

cross_val = cross_val_score(mlp, X, y, cv=10)
print('Cross Validation Score: ', cross_val)
print('Mean Cross Validation Score: ', cross_val.mean())

MLP Score:  0.9103690685413005
Cross Validation Score:  [0.92982456 0.85964912 0.9122807  0.94736842 0.92982456 0.87719298
 0.9122807  0.92982456 0.92982456 0.91071429]
Mean Cross Validation Score:  0.9138784461152882


In [8]:
mlp = MLPClassifier(hidden_layer_sizes=(100, 100, 100))
mlp.fit(X, y)
print('MLP Score: ', mlp.score(X, y))
      
cross_val = cross_val_score(mlp, X, y, cv=10)
print('Cross Validation Score: ', cross_val)
print('Mean Cross Validation Score: ', cross_val.mean())

MLP Score:  0.9244288224956063
Cross Validation Score:  [0.92982456 0.94736842 0.92982456 0.94736842 0.9122807  0.9122807
 0.9122807  0.89473684 0.84210526 0.96428571]
Mean Cross Validation Score:  0.919235588972431


In [9]:
mlp = MLPClassifier(hidden_layer_sizes=(50, 50))
mlp.fit(X, y)
print('MLP Score: ', mlp.score(X, y))
      
cross_val = cross_val_score(mlp, X, y, cv=10)
print('Cross Validation Score: ', cross_val)
print('Mean Cross Validation Score: ', cross_val.mean())

MLP Score:  0.9349736379613357
Cross Validation Score:  [0.92982456 0.85964912 0.92982456 0.94736842 0.94736842 0.92982456
 0.94736842 0.92982456 0.9122807  0.875     ]
Mean Cross Validation Score:  0.9208333333333334


In [10]:
mlp = MLPClassifier(hidden_layer_sizes=(10, 10))
mlp.fit(X, y)
print('MLP Score: ', mlp.score(X, y))
      
cross_val = cross_val_score(mlp, X, y, cv=10)
print('Cross Validation Score: ', cross_val)
print('Mean Cross Validation Score: ', cross_val.mean())



MLP Score:  0.6274165202108963
Cross Validation Score:  [0.92982456 0.9122807  0.9122807  0.94736842 0.87719298 0.89473684
 0.98245614 0.9122807  0.89473684 0.91071429]
Mean Cross Validation Score:  0.9173872180451129
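
The training score of 0.627 above equals 357/569, the share of benign cases, which suggests that this particular fit predicted the majority class for every row. One likely contributor is that the features sit on very different scales (`area_worst` runs into the thousands while `smoothness_mean` stays below 1), and `StandardScaler` is imported at the top of the notebook but never applied. The sketch below shows how scaling could be folded into cross-validation with a pipeline so that the scaler is fit on the training folds only. It uses the bundled `load_breast_cancer` data as a stand-in, and `max_iter=1000` and `random_state=0` are assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = load_breast_cancer(return_X_y=True)

# The pipeline refits the scaler inside every fold, avoiding leakage
# from the held-out rows into the scaling statistics.
scaled_mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(10, 10), max_iter=1000, random_state=0),
)

scaled_scores = cross_val_score(scaled_mlp, X_demo, y_demo, cv=10)
print(scaled_scores)
print(scaled_scores.mean())
```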




In [12]:
mlp_gsc = GridSearchCV(
        estimator=MLPClassifier(),
        param_grid={
            'hidden_layer_sizes': [(20, 20), (10,), (50, 10)],  # (10,) is a one-layer tuple; (10) is just the int 10
            'activation': ['identity', 'logistic', 'tanh', 'relu'],
            'learning_rate': ['constant', 'invscaling', 'adaptive'],  # only takes effect with solver='sgd'
            'max_iter': [100, 200, 300]
        },
        cv=10, scoring='accuracy', verbose=1, n_jobs=-1)

mlp_gsc.fit(X, y)
best_params = mlp_gsc.best_params_
# mlp_gsc.best_estimator_ holds the same configuration, already refit on all of X.
best_mlp = MLPClassifier(hidden_layer_sizes=best_params['hidden_layer_sizes'], activation=best_params["activation"],
                                         learning_rate=best_params["learning_rate"], max_iter=best_params["max_iter"],
                                         verbose=False)

print('Parameters for the best Multiple Layer Perceptron Classifier: ', best_mlp)

Fitting 10 folds for each of 108 candidates, totalling 1080 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    5.5s
[Parallel(n_jobs=-1)]: Done 184 tasks      | elapsed:   12.0s
[Parallel(n_jobs=-1)]: Done 434 tasks      | elapsed:   31.0s
[Parallel(n_jobs=-1)]: Done 784 tasks      | elapsed:  1.2min
[Parallel(n_jobs=-1)]: Done 1080 out of 1080 | elapsed:  1.6min finished


Parameters for the best Multiple Layer Perceptron Classifier:  MLPClassifier(activation='logistic', alpha=0.0001, batch_size='auto',
              beta_1=0.9, beta_2=0.999, early_stopping=False, epsilon=1e-08,
              hidden_layer_sizes=(20, 20), learning_rate='adaptive',
              learning_rate_init=0.001, max_fun=15000, max_iter=300,
              momentum=0.9, n_iter_no_change=10, nesterovs_momentum=True,
              power_t=0.5, random_state=None, shuffle=True, solver='adam',
              tol=0.0001, validation_fraction=0.1, verbose=False,
              warm_start=False)
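
With the default `refit=True`, `GridSearchCV` already refits the winning configuration on the full data and exposes it as `best_estimator_`, so the model does not have to be rebuilt by hand from `best_params_`. The sketch below shows that pattern with the MLP wrapped in a scaling pipeline. It is an illustration under stated assumptions: the data is the bundled `load_breast_cancer` set, the grid is deliberately tiny, and searching over `alpha` is my choice rather than something the cell above does.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = load_breast_cancer(return_X_y=True)

# make_pipeline names each step after its lowercased class name,
# hence the 'mlpclassifier__' prefix on the grid keys.
pipe = make_pipeline(StandardScaler(), MLPClassifier(max_iter=500, random_state=0))
param_grid = {
    'mlpclassifier__hidden_layer_sizes': [(10,), (20, 20)],
    'mlpclassifier__alpha': [1e-4, 1e-2],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring='accuracy')
search.fit(X_demo, y_demo)

# Already refit on all of X_demo with the best parameters.
tuned_mlp = search.best_estimator_
print(search.best_params_)
print(search.best_score_)
```

Fixing `random_state` on the estimator also makes the search itself repeatable, which matters when the selected parameters are later copied into another cell.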




In [13]:
best_mlp.fit(X, y)
print('MLP Score: ', best_mlp.score(X, y))
      
cross_val = cross_val_score(best_mlp, X, y, cv=10)
print('Cross Validation Score: ', cross_val)
print('Mean Cross Validation Score: ', cross_val.mean())



MLP Score:  0.9560632688927944
Cross Validation Score:  [0.92982456 0.92982456 0.9122807  0.92982456 0.98245614 0.92982456
 0.96491228 0.9122807  0.89473684 0.98214286]
Mean Cross Validation Score:  0.9368107769423559


In [None]:
# These four features are the ones with gradient boosting feature importance above 0.1 (see the Gradient Boosting section).
X_top_features = X[['concave points_mean', 'radius_worst', 'perimeter_worst', 'concave points_worst']]

best_mlp.fit(X_top_features, y)
print('MLP Score: ', best_mlp.score(X_top_features, y))
      
cross_val = cross_val_score(best_mlp, X_top_features, y, cv=10)
print('Cross Validation Score: ', cross_val)
print('Mean Cross Validation Score: ', cross_val.mean())

## Random Forest Classifier

In [14]:
# Updating the target to a numerical variable.
br['diagnosis'] = br['diagnosis'].apply(lambda x: 1 if x == 'M' else 0)
br.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 569 entries, 0 to 568
Data columns (total 32 columns):
id                         569 non-null int64
diagnosis                  569 non-null int64
radius_mean                569 non-null float64
texture_mean               569 non-null float64
perimeter_mean             569 non-null float64
area_mean                  569 non-null float64
smoothness_mean            569 non-null float64
compactness_mean           569 non-null float64
concavity_mean             569 non-null float64
concave points_mean        569 non-null float64
symmetry_mean              569 non-null float64
fractal_dimension_mean     569 non-null float64
radius_se                  569 non-null float64
texture_se                 569 non-null float64
perimeter_se               569 non-null float64
area_se                    569 non-null float64
smoothness_se              569 non-null float64
compactness_se             569 non-null float64
concavity_se               569 non-

In [15]:
# Create a random forest classifier
rfc = ensemble.RandomForestClassifier(n_estimators=100, random_state=0, n_jobs=-1)
rfc.fit(X, y)
# Estimate out-of-sample accuracy with 10-fold cross-validation
cross_val = cross_val_score(rfc, X, y, cv=10)
print('Cross Validation Score: ', cross_val)
print('Mean Cross Validation Score: ', cross_val.mean())
print('\n Feature Importances: ', rfc.feature_importances_)

Cross Validation Score:  [0.98245614 0.89473684 0.94736842 0.94736842 1.         0.98245614
 0.92982456 0.98245614 0.94736842 1.        ]
Mean Cross Validation Score:  0.9614035087719298

 Feature Importances:  [0.02429854 0.01419575 0.0539929  0.04540104 0.00795465 0.00345576
 0.07186229 0.08852713 0.00558817 0.0021412  0.01900904 0.00490146
 0.01459376 0.0394196  0.00374584 0.00366822 0.00424091 0.00508478
 0.00371126 0.0059118  0.12198315 0.0191726  0.17398717 0.07498058
 0.01269208 0.01023589 0.03039391 0.11858904 0.00796357 0.0082979 ]


In [16]:
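# Create a random forest classifier on a seven-feature subset. The intent was to keep features with importance > 0.05.
# Going by the importances printed above and the order of X.columns, that set would be perimeter_mean, concavity_mean,
# concave points_mean, radius_worst, perimeter_worst, area_worst and concave points_worst.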
rfc = ensemble.RandomForestClassifier(n_estimators=100, random_state=0, n_jobs=-1)
X_top_features = X[['radius_mean', 'smoothness_mean', 'compactness_mean', 'symmetry_se', 'radius_worst', 'texture_worst', 'smoothness_worst']]
rfc.fit(X_top_features, y)
# Estimate out-of-sample accuracy with 10-fold cross-validation
cross_val = cross_val_score(rfc, X_top_features, y, cv=10)
print('Cross Validation Scores: ', cross_val)
print('Mean Cross Validation Score: ', cross_val.mean())

Cross Validation Scores:  [0.98245614 0.9122807  0.92982456 0.94736842 1.         0.98245614
 0.98245614 0.98245614 0.94736842 0.94642857]
Mean Cross Validation Score:  0.9613095238095237
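
Reading feature names off a bare importance array by position is error-prone, especially since `X` has dropped `id` and `diagnosis` and is therefore shifted by two positions relative to `br.columns`. A minimal sketch of pairing importances with column names and filtering by a threshold is shown below. It uses the bundled `load_breast_cancer` frame as a stand-in (its column names read `mean radius`, `worst perimeter`, and so on, unlike the CSV), and the variable names are hypothetical.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer(as_frame=True)
X_demo, y_demo = data.data, data.target

rfc_demo = RandomForestClassifier(n_estimators=100, random_state=0)
rfc_demo.fit(X_demo, y_demo)

# Index the importances by column name so no manual position lookup is needed.
importances = pd.Series(rfc_demo.feature_importances_, index=X_demo.columns)
importances = importances.sort_values(ascending=False)

# Keep only the features above the chosen threshold.
top = importances[importances > 0.05]
X_top = X_demo[top.index]

print(top)
```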


In [17]:
rfc_gsc = GridSearchCV(
        estimator=ensemble.RandomForestClassifier(),
        param_grid={
            'max_depth': [50, 100],
            'max_features': [2, 3, 4],
            'min_samples_leaf': [2, 3, 4],
            'min_samples_split': [8, 10, 12],
            'n_estimators': [50, 100, 500]
        },
        cv=10, scoring='accuracy', verbose=1, n_jobs=-1)

rfc_gsc.fit(X_top_features, y)
best_params = rfc_gsc.best_params_
# rfc_gsc.best_estimator_ holds the same configuration, already refit on X_top_features.
best_rfc = ensemble.RandomForestClassifier(max_depth=best_params['max_depth'], max_features=best_params["max_features"],
                                         min_samples_leaf=best_params["min_samples_leaf"], min_samples_split=best_params["min_samples_split"],
                                         n_estimators=best_params['n_estimators'],verbose=False)

print('Parameters for the best Random Forest Classifier: ', best_rfc)

Fitting 10 folds for each of 162 candidates, totalling 1620 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    5.6s
[Parallel(n_jobs=-1)]: Done 184 tasks      | elapsed:   14.2s
[Parallel(n_jobs=-1)]: Done 434 tasks      | elapsed:   29.2s
[Parallel(n_jobs=-1)]: Done 784 tasks      | elapsed:   58.5s
[Parallel(n_jobs=-1)]: Done 1234 tasks      | elapsed:  1.5min


Parameters for the best Random Forest Classifier:  RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=100, max_features=2,
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=2, min_samples_split=12,
                       min_weight_fraction_leaf=0.0, n_estimators=50,
                       n_jobs=None, oob_score=False, random_state=None,
                       verbose=False, warm_start=False)


[Parallel(n_jobs=-1)]: Done 1620 out of 1620 | elapsed:  2.0min finished


In [18]:
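# Note: these hand-entered settings (max_depth=50, min_samples_leaf=4, min_samples_split=8) differ from the
# grid-search result printed above; best_rfc holds the searched configuration.
rfc = ensemble.RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',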
                       max_depth=50, max_features=2, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=4, min_samples_split=8,
                       min_weight_fraction_leaf=0.0, n_estimators=50,
                       n_jobs=None, oob_score=False, random_state=None,
                       verbose=False, warm_start=False)
rfc.fit(X_top_features, y)
# Estimate out-of-sample accuracy with 10-fold cross-validation
cross_val = cross_val_score(rfc, X_top_features, y, cv=10)
print('Cross Validation Scores: ', cross_val)
print('Mean Cross Validation Score: ', cross_val.mean())

Cross Validation Scores:  [0.98245614 0.89473684 0.92982456 0.96491228 1.         0.96491228
 0.96491228 0.96491228 0.92982456 0.96428571]
Mean Cross Validation Score:  0.9560776942355889


## Gradient Boosting

In [21]:
gbc = ensemble.GradientBoostingClassifier(n_estimators=100, random_state=0)
gbc.fit(X, y)
# Estimate out-of-sample accuracy with 10-fold cross-validation
cross_val = cross_val_score(gbc, X, y, cv=10)
print('Cross Validation Score: ', cross_val)
print('Mean Cross Validation Score: ', cross_val.mean())
print('\n Feature Importances: ', gbc.feature_importances_)

Cross Validation Score:  [0.98245614 0.89473684 0.92982456 0.94736842 0.98245614 0.96491228
 0.94736842 0.98245614 0.96491228 1.        ]
Mean Cross Validation Score:  0.9596491228070176

 Feature Importances:  [2.03687402e-04 2.56235944e-02 4.40868303e-04 1.10107092e-03
 2.33500093e-05 1.90165830e-03 1.64345099e-04 1.27662646e-01
 3.39281262e-04 3.49498483e-04 4.79348111e-03 3.47169417e-03
 1.12427183e-03 8.35125203e-03 6.81848816e-04 8.25966536e-03
 1.05058392e-02 4.34800595e-03 1.20227781e-03 1.07019007e-03
 4.40493319e-01 3.67372245e-02 1.51223505e-01 2.98214765e-02
 7.39031161e-03 9.11313135e-04 1.16460252e-02 1.16417839e-01
 2.68425524e-03 1.05620442e-03]


In [27]:
# Using features of >0.1 importance.
gbc = ensemble.GradientBoostingClassifier(n_estimators=100, random_state=0)
X_top_features = X[['concave points_mean', 'radius_worst', 'perimeter_worst', 'concave points_worst']]
gbc.fit(X_top_features, y)
# Estimate out-of-sample accuracy with 10-fold cross-validation
cross_val = cross_val_score(gbc, X_top_features, y, cv=10)
print('Cross Validation Score: ', cross_val)
print('Mean Cross Validation Score: ', cross_val.mean())
print('\n Feature Importances: ', gbc.feature_importances_)

Cross Validation Score:  [0.96491228 0.87719298 0.89473684 0.92982456 0.98245614 0.96491228
 0.92982456 0.94736842 0.87719298 0.94642857]
Mean Cross Validation Score:  0.931484962406015

 Feature Importances:  [0.0698127  0.3974149  0.29454461 0.2382278 ]


In [54]:
gbc_gsc = GridSearchCV(
        estimator=ensemble.GradientBoostingClassifier(),
        param_grid={
            'loss': ['deviance', 'exponential'],  # 'deviance' is named 'log_loss' in scikit-learn >= 1.1
            'learning_rate': [0.01, 0.1, 0.5],
            'min_samples_leaf': [2, 3, 4],
            'min_samples_split': [2, 4, 8],
            'n_estimators': [50, 100, 500]
        },
        cv=10, scoring='accuracy', verbose=1, n_jobs=-1)

gbc_gsc.fit(X, y)
best_params = gbc_gsc.best_params_
# gbc_gsc.best_estimator_ holds the same configuration, already refit on all of X.
best_gbc = ensemble.GradientBoostingClassifier(loss=best_params["loss"], learning_rate=best_params["learning_rate"],
                                         min_samples_leaf=best_params["min_samples_leaf"], min_samples_split=best_params["min_samples_split"],
                                         n_estimators=best_params['n_estimators'],verbose=False)

print('Parameters for the best Gradient Boosting Classifier: ', best_gbc)

Fitting 10 folds for each of 162 candidates, totalling 1620 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    8.7s
[Parallel(n_jobs=-1)]: Done 184 tasks      | elapsed:   38.4s
[Parallel(n_jobs=-1)]: Done 434 tasks      | elapsed:  1.5min
[Parallel(n_jobs=-1)]: Done 784 tasks      | elapsed:  2.6min
[Parallel(n_jobs=-1)]: Done 1234 tasks      | elapsed:  3.5min
[Parallel(n_jobs=-1)]: Done 1620 out of 1620 | elapsed:  4.0min finished


Parameters for the best Gradient Boosting Classifier:  GradientBoostingClassifier(ccp_alpha=0.0, criterion='friedman_mse', init=None,
                           learning_rate=0.1, loss='exponential', max_depth=3,
                           max_features=None, max_leaf_nodes=None,
                           min_impurity_decrease=0.0, min_impurity_split=None,
                           min_samples_leaf=4, min_samples_split=8,
                           min_weight_fraction_leaf=0.0, n_estimators=500,
                           n_iter_no_change=None, presort='deprecated',
                           random_state=None, subsample=1.0, tol=0.0001,
                           validation_fraction=0.1, verbose=False,
                           warm_start=False)


In [55]:
best_gbc.fit(X, y)
# Estimate out-of-sample accuracy with 10-fold cross-validation
cross_val = cross_val_score(best_gbc, X, y, cv=10)
print('Cross Validation Scores: ', cross_val)
print('Mean Cross Validation Score: ', cross_val.mean())

Cross Validation Scores:  [0.98245614 0.94736842 0.94736842 0.94736842 0.98245614 0.98245614
 0.98245614 0.98245614 0.96491228 0.98214286]
Mean Cross Validation Score:  0.9701441102756891


# Comparisons

MLP: These models were noticeably slower than the random forest and gradient boosting models. Before tuning, mean cross-validation scores sat in the low 0.9 range (0.90 to 0.93). With the parameters selected by GridSearchCV, the MLP reached a training score of 0.956 and a mean cross-validation score of 0.937. Both are high, but the other models fared better. I attempted to use the LIME library to get information on the features, but was unsuccessful. A final downside is that MLP results vary from run to run: fitting the same configuration twice gave different training and cross-validation scores, because the weights are initialized randomly and no random_state was set.

Random Forests and Gradient Boosting: These models were fast, and their feature_importances_ attribute revealed which variables had the greatest impact on the models, giving me more insight into how predictions are made. They also had higher mean cross-validation scores, with random forests reaching 0.961 and gradient boosting 0.970.

As of this lesson, I prefer models like random forests and gradient boosting. In addition to being more accurate on this data set, they offer more insight into what is going on inside the model.
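
The comparison above draws on cells that were run with different feature sets and unseeded models. A side-by-side check on identical folds can make the accuracy and speed tradeoff easier to read. The following is only a sketch under assumptions: it uses the bundled `load_breast_cancer` data, default-sized models, a seeded `StratifiedKFold`, and a scaled MLP (scaling is my addition, not part of the runs reported here), so its numbers need not match the ones quoted above.

```python
import time
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = load_breast_cancer(return_X_y=True)

# One shared, seeded splitter so every model sees exactly the same folds.
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

models = {
    'mlp': make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000, random_state=0)),
    'random_forest': RandomForestClassifier(n_estimators=100, random_state=0),
    'gradient_boosting': GradientBoostingClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    start = time.perf_counter()
    scores = cross_val_score(model, X_demo, y_demo, cv=folds)
    # Store mean accuracy and wall-clock seconds for the whole cross-validation.
    results[name] = (scores.mean(), time.perf_counter() - start)

for name, (mean_acc, seconds) in results.items():
    print(f'{name}: mean accuracy {mean_acc:.3f}, {seconds:.1f}s')
```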