# Challenge

Create a multi-layer perceptron neural network model to predict on a labeled dataset of your choosing. 

Compare this model to either a boosted tree or a random forest model and describe the relative tradeoffs between complexity and accuracy. Be sure to vary the hyperparameters of your MLP!

## Create a multi-layer perceptron neural network model to predict on a labeled dataset of your choosing.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
df = pd.read_csv(r'C:\Users\katec\Thinkful\data_collections\heart.csv')

In [3]:
pd.set_option('display.max_columns', 50)

In [4]:
df.head()

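| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |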


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 303 entries, 0 to 302
Data columns (total 14 columns):
age         303 non-null int64
sex         303 non-null int64
cp          303 non-null int64
trestbps    303 non-null int64
chol        303 non-null int64
fbs         303 non-null int64
restecg     303 non-null int64
thalach     303 non-null int64
exang       303 non-null int64
oldpeak     303 non-null float64
slope       303 non-null int64
ca          303 non-null int64
thal        303 non-null int64
target      303 non-null int64
dtypes: float64(1), int64(13)
memory usage: 33.2 KB


In [6]:
df.isnull().sum()

age         0
sex         0
cp          0
trestbps    0
chol        0
fbs         0
restecg     0
thalach     0
exang       0
oldpeak     0
slope       0
ca          0
thal        0
target      0
dtype: int64

In [7]:
# Define the features and the outcome.
X = df.iloc[:, :13]
Y = df.iloc[:, 13]

In [8]:
# Import the model.
from sklearn.neural_network import MLPClassifier

# Establish and fit the model, with a single hidden layer of 1000 perceptrons.
mlp = MLPClassifier(hidden_layer_sizes=(1000,))
mlp.fit(X, Y)

MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(1000,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
       random_state=None, shuffle=True, solver='adam', tol=0.0001,
       validation_fraction=0.1, verbose=False, warm_start=False)

In [9]:
mlp.score(X, Y)

0.8118811881188119

In [10]:
Y.value_counts()/len(Y)

1    0.544554
0    0.455446
Name: target, dtype: float64
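With this class split, a model that always predicts `1` would already score about 54%. The sketch below makes that baseline explicit with scikit-learn's `DummyClassifier`. It is a minimal sketch under one assumption: the data is a synthetic stand-in from `make_classification`, not `heart.csv`, so swap in the notebook's `X` and `Y` to get the real baseline.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for heart.csv (assumption): 303 rows, 13 features,
# with roughly 46% of samples in class 0.
X, y = make_classification(n_samples=303, n_features=13, weights=[0.46],
                           random_state=0)

# Ignores the features and always predicts the most frequent training class.
baseline = DummyClassifier(strategy='most_frequent')
baseline_scores = cross_val_score(baseline, X, y, cv=10)
print('Majority-class baseline accuracy: {:0.3f}'.format(baseline_scores.mean()))
```

Any MLP or forest accuracy reported later is only meaningful relative to this number.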

In [11]:
#train/test split; train model
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)

mlp.fit(X_train, Y_train)


# Get and store predicted values. For a classifier, .score() returns mean accuracy, not R^2.
Y_pred_train_mlp = mlp.predict(X_train)
mlp_R_sq_train = mlp.score(X_train, Y_train)

print('Accuracy_train: {}'.format(mlp_R_sq_train))

Accuracy_train: 0.8512396694214877


In [12]:
#cross_validation
from sklearn.model_selection import cross_val_score
# Compute the 10-fold scores once and print the stored result.
mlp_cross_val_score = cross_val_score(mlp, X_train, Y_train, cv=10)
print('Cross Validation Score:', mlp_cross_val_score)

Cross Validation Score: [0.64       0.8        0.79166667 0.66666667 0.875      0.83333333
 0.75       0.75       0.79166667 0.875     ]


In [None]:
# Test: score the model that was already fit on the training data.
# Refitting on X_test/Y_test before scoring would leak the test labels into the model.
Y_pred_test_mlp = mlp.predict(X_test)
mlp_R_sq_test = mlp.score(X_test, Y_test)

print('Accuracy_test: {}'.format(mlp_R_sq_test))

### eval
The model is overfitting. Training accuracy (0.85) is higher than the mean cross-validation accuracy (about 0.78), and the cross-validation scores vary widely from fold to fold (0.64 to 0.875). A probable factor is the limited amount of data: there are only 303 rows and 13 features.
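With roughly 240 training rows, a single 10-fold run gives a noisy estimate. Repeating stratified k-fold with different shuffles and reporting the mean and spread is one way to get a steadier number. The sketch below assumes synthetic stand-in data from `make_classification` and an illustrative `(50,)` network. It also puts a `StandardScaler` in front of the MLP, which is an addition not present in the notebook's model.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for heart.csv (assumption): 303 rows, 13 features.
X, y = make_classification(n_samples=303, n_features=13, random_state=0)

# 5 stratified folds, repeated 3 times with different shuffles = 15 scores.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
model = make_pipeline(StandardScaler(),
                      MLPClassifier(hidden_layer_sizes=(50,), max_iter=500,
                                    random_state=0))
scores = cross_val_score(model, X, y, cv=cv)
print('CV accuracy: {:0.3f} +/- {:0.3f} over {} folds'.format(
    scores.mean(), scores.std(), len(scores)))
```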

Tuning the hyperparameters may improve performance somewhat. I will focus on three in particular: hidden layer size, alpha (the L2 regularization strength), and the activation function.
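Alongside those three hyperparameters, feature scaling matters. MLPClassifier is sensitive to input scale, and columns such as chol (hundreds) and oldpeak (single digits) differ by orders of magnitude. The sketch below places StandardScaler and the MLP in one Pipeline and then grid-searches hidden layer size, alpha and activation. Because scaling sits inside the pipeline, each CV fold is scaled using only its own training portion. Assumptions: synthetic stand-in data from `make_classification`, and illustrative grid values rather than the notebook's.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for heart.csv (assumption).
X, y = make_classification(n_samples=303, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# The scaler is refit inside every CV fold, so validation folds never
# influence the scaling statistics.
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('mlp', MLPClassifier(max_iter=500, random_state=0)),
])

# The 'mlp__' prefix routes each parameter to the pipeline step named 'mlp'.
param_grid = {
    'mlp__hidden_layer_sizes': [(20,), (20, 20)],
    'mlp__alpha': [1e-4, 1e-2, 1.0],
    'mlp__activation': ['tanh', 'relu'],
}
search = GridSearchCV(pipe, param_grid,
                      cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
search.fit(X_train, y_train)

print('Best parameters:', search.best_params_)
print('Mean CV accuracy of best candidate: {:0.3f}'.format(search.best_score_))
print('Held-out test accuracy: {:0.3f}'.format(search.score(X_test, y_test)))
```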



### optimize

In [13]:
from pprint import pprint
# Look at parameters used by our current MLP
print('Parameters currently in use:\n')
pprint(mlp.get_params())

Parameters currently in use:

{'activation': 'relu',
 'alpha': 0.0001,
 'batch_size': 'auto',
 'beta_1': 0.9,
 'beta_2': 0.999,
 'early_stopping': False,
 'epsilon': 1e-08,
 'hidden_layer_sizes': (1000,),
 'learning_rate': 'constant',
 'learning_rate_init': 0.001,
 'max_iter': 200,
 'momentum': 0.9,
 'n_iter_no_change': 10,
 'nesterovs_momentum': True,
 'power_t': 0.5,
 'random_state': None,
 'shuffle': True,
 'solver': 'adam',
 'tol': 0.0001,
 'validation_fraction': 0.1,
 'verbose': False,
 'warm_start': False}


#### grid search

In [14]:
parameter_space = {
    'hidden_layer_sizes': [(50,50,50), (50,100,50), (1000,)],
    'activation': ['identity', 'logistic', 'tanh', 'relu'],
    'alpha': [0.0001, 0.05],
    }

In [15]:
from sklearn.model_selection import GridSearchCV

clf = GridSearchCV(mlp, parameter_space, n_jobs=-1, cv=3)
clf.fit(X_train, Y_train)



GridSearchCV(cv=3, error_score='raise-deprecating',
       estimator=MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(1000,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
       random_state=None, shuffle=True, solver='adam', tol=0.0001,
       validation_fraction=0.1, verbose=False, warm_start=False),
       fit_params=None, iid='warn', n_jobs=-1,
       param_grid={'hidden_layer_sizes': [(50, 50, 50), (50, 100, 50), (1000,)], 'activation': ['identity', 'logistic', 'tanh', 'relu'], 'alpha': [0.0001, 0.05]},
       pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
       scoring=None, verbose=0)

In [16]:
# Best parameter set
print('Best parameters found:\n', clf.best_params_)

Best parameters found:
 {'activation': 'relu', 'alpha': 0.05, 'hidden_layer_sizes': (1000,)}


In [39]:
from sklearn.metrics import accuracy_score

def evaluate(model, X_test, Y_test):
    # MAPE divides by Y_test, which is 0 for the negative class (giving -inf),
    # so report classification accuracy and error rate for this 0/1 target.
    predictions = model.predict(X_test)
    accuracy = 100 * accuracy_score(Y_test, predictions)
    print('Model Performance')
    print('Error rate: {:0.4f}.'.format(1 - accuracy / 100))
    print('Accuracy = {:0.2f}%.'.format(accuracy))
    
    return accuracy

base_model = MLPClassifier(hidden_layer_sizes=(1000,))
base_model.fit(X_train, Y_train)
base_accuracy = evaluate(base_model, X_test, Y_test)

best_model = MLPClassifier(activation='relu', alpha=0.05, hidden_layer_sizes=(1000,))
best_model.fit(X_train, Y_train)
grid_accuracy = evaluate(best_model, X_test, Y_test)

print('Improvement of {:0.2f}%.'.format( 100 * (grid_accuracy - base_accuracy) / base_accuracy))

Model Performance
Error rate: 0.1639.
Accuracy = 83.61%.
Model Performance
Error rate: 0.1639.
Accuracy = 83.61%.
Improvement of 0.00%.
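With only a few hundred training rows, several grid-search candidates can score within one standard deviation of each other. In that case a single "best" setting may not be meaningfully better than the rest. A minimal sketch of inspecting every candidate through `cv_results_`, assuming synthetic stand-in data from `make_classification` and a small illustrative grid:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for heart.csv (assumption).
X, y = make_classification(n_samples=303, n_features=13, random_state=0)

search = GridSearchCV(
    MLPClassifier(max_iter=500, random_state=0),
    {'alpha': [1e-4, 0.05], 'activation': ['tanh', 'relu']},
    cv=3)
search.fit(X, y)

# One row per candidate: mean and spread of the validation-fold accuracy.
results = pd.DataFrame(search.cv_results_)
table = results[['params', 'mean_test_score', 'std_test_score', 'rank_test_score']]
print(table.sort_values('rank_test_score').to_string(index=False))
```

If the top candidates' mean scores differ by less than their std_test_score, the choice between them is mostly noise.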




#### randomized search  NEED HELP TO MAKE WORK

In [22]:
from sklearn.model_selection import RandomizedSearchCV

In [35]:
activation = ['identity', 'logistic', 'tanh', 'relu']

alpha = [0.0001, 0.05]

hidden_layer_sizes = [(50,), (100,), (1000,)]

random_grid = {'activation': activation,
               'alpha': alpha,
               'hidden_layer_sizes': hidden_layer_sizes 
               }

pprint(random_grid)

{'activation': ['identity', 'logistic', 'tanh', 'relu'],
 'alpha': [0.0001, 0.05],
 'hidden_layer_sizes': [(50,), (100,), (1000,)]}


In [33]:
# Use the random grid to search for the best hyperparameters.
# First create the base model to tune (a separate name, so the fitted `mlp` is not overwritten).
mlp_base = MLPClassifier()
# Random search using 3-fold cross-validation. The grid above has only
# 4 x 2 x 3 = 24 combinations, so sample fewer than that (here 20); n_jobs=-1 uses all cores.
mlp_random = RandomizedSearchCV(estimator=mlp_base, param_distributions=random_grid,
                                n_iter=20, cv=3, n_jobs=-1)
# Fit the random search model.
mlp_random.fit(X_train, Y_train)

RandomizedSearchCV(cv=3, error_score='raise-deprecating',
          estimator=MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
       random_state=None, shuffle=True, solver='adam', tol=0.0001,
       validation_fraction=0.1, verbose=False, warm_start=False),
          fit_params=None, iid='warn', n_iter=20, n_jobs=-1,
          param_distributions={'activation': ['identity', 'logistic', 'tanh', 'relu'], 'alpha': [0.0001, 0.05], 'hidden_layer_sizes': [(50,), (100,), (1000,)]},
          pre_dispatch='2*n_jobs', random_state=None, refit=True,
          return_train_score='warn', scoring=None, verbose=0)

In [36]:
mlp_random.best_params_

In [34]:
# Fit an MLP with the best parameters found by the grid search, under its own name
# so that the fitted RandomizedSearchCV object `mlp_random` is not overwritten.
mlp_best_grid = MLPClassifier(activation='relu', alpha=0.05, hidden_layer_sizes=(1000,))

mlp_best_grid.fit(X_train, Y_train)

MLPClassifier(activation='relu', alpha=0.05, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(1000,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
       random_state=None, shuffle=True, solver='adam', tol=0.0001,
       validation_fraction=0.1, verbose=False, warm_start=False)

In [28]:
from sklearn.metrics import accuracy_score

def evaluate(model, X_test, Y_test):
    # MAPE divides by Y_test, which is 0 for the negative class (giving -inf),
    # so report classification accuracy and error rate for this 0/1 target.
    predictions = model.predict(X_test)
    accuracy = 100 * accuracy_score(Y_test, predictions)
    print('Model Performance')
    print('Error rate: {:0.4f}.'.format(1 - accuracy / 100))
    print('Accuracy = {:0.2f}%.'.format(accuracy))
    
    return accuracy

base_model = MLPClassifier(hidden_layer_sizes=(1000,))
base_model.fit(X_train, Y_train)
base_accuracy = evaluate(base_model, X_test, Y_test)

# best_estimator_ only exists after mlp_random.fit(...) has been run.
best_random = mlp_random.best_estimator_
random_accuracy = evaluate(best_random, X_test, Y_test)

print('Improvement of {:0.2f}%.'.format( 100 * (random_accuracy - base_accuracy) / base_accuracy))
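Randomized search pays off most when some parameters are drawn from continuous distributions instead of short lists. With distributions, n_iter is not capped by a grid size. The sketch below draws alpha and learning_rate_init from `scipy.stats.loguniform` (available in SciPy 1.4 and later). Assumptions: synthetic stand-in data from `make_classification`, and ranges chosen for illustration.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for heart.csv (assumption).
X, y = make_classification(n_samples=303, n_features=13, random_state=0)

# Lists are sampled uniformly; loguniform spreads draws evenly across
# orders of magnitude, which suits regularization and learning rates.
param_distributions = {
    'hidden_layer_sizes': [(20,), (50,), (100,), (50, 50)],
    'activation': ['tanh', 'relu'],
    'alpha': loguniform(1e-5, 1e0),
    'learning_rate_init': loguniform(1e-4, 1e-1),
}
mlp_random = RandomizedSearchCV(
    MLPClassifier(max_iter=500, random_state=0),
    param_distributions=param_distributions,
    n_iter=10, cv=3, random_state=0)
mlp_random.fit(X, y)

# best_params_ and best_estimator_ exist only on this fitted search object;
# reassigning the name to a plain MLPClassifier would discard them.
print('Best parameters:', mlp_random.best_params_)
best_mlp = mlp_random.best_estimator_
```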

## Compare this model to either a boosted tree or a random forest model and describe the relative tradeoffs between complexity and accuracy.

Be sure to vary the hyperparameters of your MLP!
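One way to put rough numbers on "complexity" is to count what each model learns and time each fit. For an MLP, that means weights plus biases. For a forest, it means tree nodes. These are different units, so treat them as indicators rather than a like-for-like measure. The helper names `mlp_param_count` and `forest_node_count` are hypothetical, and the data is a synthetic stand-in from `make_classification`.

```python
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for heart.csv (assumption): 13 features, binary target.
X, y = make_classification(n_samples=303, n_features=13, random_state=0)

def mlp_param_count(model):
    # Weights plus biases across all layers of a fitted MLPClassifier.
    return sum(w.size for w in model.coefs_) + sum(b.size for b in model.intercepts_)

def forest_node_count(model):
    # Total split and leaf nodes across all trees of a fitted forest.
    return sum(tree.tree_.node_count for tree in model.estimators_)

models = {
    'MLP (1000,)': MLPClassifier(hidden_layer_sizes=(1000,), max_iter=500, random_state=0),
    'MLP (50,)': MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=0),
    'RF 100 trees': RandomForestClassifier(n_estimators=100, random_state=0),
}
sizes = {}
for name, model in models.items():
    start = time.perf_counter()
    model.fit(X, y)
    elapsed = time.perf_counter() - start
    sizes[name] = (mlp_param_count(model) if isinstance(model, MLPClassifier)
                   else forest_node_count(model))
    print('{:<14} size={:>7}  fit time={:0.2f}s'.format(name, sizes[name], elapsed))
```

For a binary target the MLP output layer has one unit. The (1000,) network therefore learns 13 x 1000 + 1000 x 1 weights and 1000 + 1 biases, which is 15001 parameters, to fit 303 rows.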

In [40]:
from sklearn import ensemble
from sklearn import metrics
from sklearn.metrics import mean_squared_error, r2_score

In [41]:
X = df.iloc[:, :13]
Y = df.iloc[:, 13]

rfc = ensemble.RandomForestClassifier(random_state=42)
rfc.fit(X, Y)

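# Get and store predicted values. This score is in-sample: the forest is scored on the data it was fit on.
y_pred = rfc.predict(X)
rfr_model_score = rfc.score(X, Y)

print('Accuracy (in-sample): {}'.format(rfr_model_score))


Accuracy (in-sample): 0.9900990099009901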




In [None]:
# New random train/test split (different from the MLP's split); train model.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)

rfc.fit(X_train, Y_train)

# Get and store predicted values.
Y_pred_train = rfc.predict(X_train)
rfc_R_sq_train = rfc.score(X_train, Y_train)

print('Accuracy_train: {}'.format(rfc_R_sq_train))

In [43]:
#cross_validation

# Compute the 10-fold scores once and print the stored result.
rfc_cross_val_score = cross_val_score(rfc, X_train, Y_train, cv=10)
print('Cross Validation Score:', rfc_cross_val_score)

Cross Validation Score: [0.84       0.88       0.75       0.75       0.75       0.875
 0.79166667 0.625      0.66666667 0.875     ]


#### randomized search

In [45]:
# Look at parameters used by our current forest
print('Parameters currently in use:\n')
pprint(rfc.get_params())

Parameters currently in use:

{'bootstrap': True,
 'class_weight': None,
 'criterion': 'gini',
 'max_depth': None,
 'max_features': 'auto',
 'max_leaf_nodes': None,
 'min_impurity_decrease': 0.0,
 'min_impurity_split': None,
 'min_samples_leaf': 1,
 'min_samples_split': 2,
 'min_weight_fraction_leaf': 0.0,
 'n_estimators': 10,
 'n_jobs': None,
 'oob_score': False,
 'random_state': 42,
 'verbose': 0,
 'warm_start': False}


In [46]:
# Number of trees in random forest
n_estimators = [int(x) for x in np.linspace(start = 200, stop = 2000, num = 10)]
# Number of features to consider at every split ('auto' and 'sqrt' are equivalent for RandomForestClassifier)
max_features = ['auto', 'sqrt']
# Maximum number of levels in tree
max_depth = [int(x) for x in np.linspace(10, 110, num = 11)]
max_depth.append(None)
# Minimum number of samples required to split a node
min_samples_split = [2, 5, 10]
# Minimum number of samples required at each leaf node
min_samples_leaf = [1, 2, 4]
# Method of selecting samples for training each tree
bootstrap = [True, False]

random_grid = {'n_estimators': n_estimators,
               'max_features': max_features,
               'max_depth': max_depth, 
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf,
               'bootstrap': bootstrap}

pprint(random_grid)

{'bootstrap': [True, False],
 'max_depth': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, None],
 'max_features': ['auto', 'sqrt'],
 'min_samples_leaf': [1, 2, 4],
 'min_samples_split': [2, 5, 10],
 'n_estimators': [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000]}


In [48]:
# Use the random grid to search for best hyperparameters
# First create the base model to tune
rfc = ensemble.RandomForestClassifier(random_state=42)
# Random search of parameters, using 3 fold cross validation, 
# search across 100 different combinations, and use all available cores
rfc_random = RandomizedSearchCV(estimator = rfc, param_distributions = random_grid, n_iter = 100, cv = 3, verbose=2, random_state=42, n_jobs = -1)
# Fit the random search model
rfc_random.fit(X_train, Y_train)

Fitting 3 folds for each of 100 candidates, totalling 300 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    7.5s
[Parallel(n_jobs=-1)]: Done 146 tasks      | elapsed:   27.6s
[Parallel(n_jobs=-1)]: Done 300 out of 300 | elapsed:   53.8s finished


RandomizedSearchCV(cv=3, error_score='raise-deprecating',
          estimator=RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators='warn', n_jobs=None,
            oob_score=False, random_state=42, verbose=0, warm_start=False),
          fit_params=None, iid='warn', n_iter=100, n_jobs=-1,
          param_distributions={'n_estimators': [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000], 'max_features': ['auto', 'sqrt'], 'max_depth': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, None], 'min_samples_split': [2, 5, 10], 'min_samples_leaf': [1, 2, 4], 'bootstrap': [True, False]},
          pre_dispatch='2*n_jobs', random_state=42, refit=True,
          return_train_score='warn', scoring=None, verbose=2)

In [49]:
rfc_random.best_params_

{'n_estimators': 400,
 'min_samples_split': 10,
 'min_samples_leaf': 4,
 'max_features': 'sqrt',
 'max_depth': 80,
 'bootstrap': False}

In [50]:
from sklearn.metrics import accuracy_score

def evaluate(model, X_test, Y_test):
    # MAPE divides by Y_test, which is 0 for the negative class (giving -inf),
    # so report classification accuracy and error rate for this 0/1 target.
    predictions = model.predict(X_test)
    accuracy = 100 * accuracy_score(Y_test, predictions)
    print('Model Performance')
    print('Error rate: {:0.4f}.'.format(1 - accuracy / 100))
    print('Accuracy = {:0.2f}%.'.format(accuracy))
    
    return accuracy

base_model = ensemble.RandomForestClassifier(random_state = 42)
base_model.fit(X_train, Y_train)
base_accuracy = evaluate(base_model, X_test, Y_test)

best_random = rfc_random.best_estimator_
random_accuracy = evaluate(best_random, X_test, Y_test)

print('Improvement of {:0.2f}%.'.format( 100 * (random_accuracy - base_accuracy) / base_accuracy))

Model Performance
Error rate: 0.1639.
Accuracy = 83.61%.
Model Performance
Error rate: 0.1475.
Accuracy = 85.25%.
Improvement of 1.96%.




In [51]:
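# Test: score the tuned forest (already fit on the training data) on the held-out test set.
# Refitting on X_test/Y_test first would leak the test labels and inflate the score.
Y_pred_test_rfc = best_random.predict(X_test)
rfc_R_sq_test = best_random.score(X_test, Y_test)

print('Accuracy_test: {:0.4f}'.format(rfc_R_sq_test))

Accuracy_test: 0.8525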




### eval
The main cause of overfitting is likely the small size of the dataset. If possible, the best remedy is more data. In general, the more data, the less likely a model is to overfit, because random patterns that appear predictive get drowned out as the dataset grows.
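Whether more data would help can be estimated before collecting any, using a learning curve. The idea is to fit on growing fractions of the training folds and watch the gap between training and validation accuracy. A minimal sketch with scikit-learn's `learning_curve`, assuming synthetic stand-in data from `make_classification` and a 100-tree forest:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

# Synthetic stand-in for heart.csv (assumption).
X, y = make_classification(n_samples=303, n_features=13, random_state=0)

# Fit on 20%, 40%, ... 100% of each training fold; score on the held-out fold.
train_sizes, train_scores, val_scores = learning_curve(
    RandomForestClassifier(n_estimators=100, random_state=0), X, y,
    train_sizes=np.linspace(0.2, 1.0, 5), cv=5)

for n, tr, va in zip(train_sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    # A gap that is still shrinking at the largest n suggests more data would help.
    print('n={:>3}  train={:0.3f}  validation={:0.3f}  gap={:0.3f}'.format(n, tr, va, tr - va))
```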

That said, the following parameters are worth tuning:

**n_estimators:** in general, the more trees, the less likely the forest is to overfit, so try increasing this. The lower this number, the closer the model is to a single decision tree (with a restricted feature set).

**max_features:** try reducing this number (for example, to 30-50% of the number of features). It sets how many randomly chosen features are considered at each split, not per tree. The smaller it is, the less likely the forest is to overfit, but too small a value will start to cause underfitting.

**max_depth:** experiment with this. Limiting depth reduces the complexity of each tree, lowering the risk of overfitting. Try starting small, say 5-10, and increase it until you get the best result.

**min_samples_leaf:** try setting this to values greater than one. It has a similar effect to max_depth: a split is only made if each resulting leaf keeps at least that many samples.
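Varying one of these parameters at a time and comparing training accuracy with cross-validated accuracy can be done with scikit-learn's `validation_curve`. A minimal sketch for max_depth, assuming synthetic stand-in data from `make_classification` and an illustrative range of depths:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import validation_curve

# Synthetic stand-in for heart.csv (assumption).
X, y = make_classification(n_samples=303, n_features=13, random_state=0)

# Refit the forest for each depth; all other parameters stay fixed.
depths = [2, 4, 6, 8, 12, 16]
train_scores, val_scores = validation_curve(
    RandomForestClassifier(n_estimators=100, random_state=0), X, y,
    param_name='max_depth', param_range=depths, cv=5)

for depth, tr, va in zip(depths, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    # Train accuracy near 1.0 with flat or falling validation accuracy signals overfitting.
    print('max_depth={:>2}  train={:0.3f}  validation={:0.3f}'.format(depth, tr, va))
```

The same call works for min_samples_leaf or n_estimators by changing param_name and param_range.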

When doing this work, be systematic. Use three datasets: a training set, a separate development set to tune the parameters, and a test set to evaluate the final model with the chosen parameters. Change only one parameter at a time and evaluate the result, or use scikit-learn's GridSearchCV to search across these parameters all at once.
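The three-dataset workflow can be sketched with two stratified calls to `train_test_split`, giving a 60/20/20 split. Assumptions in this minimal sketch: synthetic stand-in data from `make_classification`, min_samples_leaf as the single parameter being tuned, and refitting the chosen setting on train plus development data before the one final test score.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for heart.csv (assumption).
X, y = make_classification(n_samples=303, n_features=13, random_state=0)

# 1) Hold out 20% as the final test set; it is not used during tuning.
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
# 2) Split the remainder 75/25 into training and development sets (60/20 overall).
X_train, X_dev, y_train, y_dev = train_test_split(
    X_temp, y_temp, test_size=0.25, stratify=y_temp, random_state=0)

# Tune one parameter on the development set.
dev_scores = {}
for leaf in [1, 2, 4, 8]:
    model = RandomForestClassifier(n_estimators=100, min_samples_leaf=leaf, random_state=0)
    model.fit(X_train, y_train)
    dev_scores[leaf] = model.score(X_dev, y_dev)
best_leaf = max(dev_scores, key=dev_scores.get)

# Refit the chosen setting on train + dev, then score once on the untouched test set.
final = RandomForestClassifier(n_estimators=100, min_samples_leaf=best_leaf, random_state=0)
final.fit(X_temp, y_temp)
print('dev scores:', dev_scores)
print('best min_samples_leaf={}, test accuracy={:0.3f}'.format(
    best_leaf, final.score(X_test, y_test)))
```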

In [None]:
# Collect the metrics computed above into one table. For classifiers, .score() is
# accuracy, so R^2/MAE/MSE/RMSE/MAPE (regression metrics) do not apply here.
# Note: the MLP and the forest were evaluated on different random train/test splits.
data = {'Accuracy_train': [mlp_R_sq_train, rfc_R_sq_train],
        'CV_mean': [mlp_cross_val_score.mean(), rfc_cross_val_score.mean()],
        'CV_std': [mlp_cross_val_score.std(), rfc_cross_val_score.std()],
        'Accuracy_test': [mlp_R_sq_test, rfc_R_sq_test]}

# Creates pandas DataFrame.
model_stats_compare = pd.DataFrame(data, index=['MLP', 'Random forest'])

# Display the table.
model_stats_compare

