In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from IPython.display import display

In [2]:
# Demonstrate classification with a neural network - specifically a
# multilayer perceptron network - on the Wisconsin Breast Cancer dataset

In [3]:
# import the multilayer perceptron classifier class
from sklearn.neural_network import MLPClassifier

In [4]:
# import the dataset and create training and test sets
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, stratify=cancer.target, random_state=42)

In [5]:
# use our familiar pattern to train and then evaluate a machine learning model

# create a multilayer perceptron classifier
mlp = MLPClassifier(random_state=42)
# fit the classifier to the training data
mlp.fit(X_train, y_train)
# print its accuracy on the training set
print("Accuracy on training set: {:.3f}".format(mlp.score(X_train, y_train)))
# print its accuracy on the test set
print("Accuracy on test set: {:.3f}".format(mlp.score(X_test, y_test)))

Accuracy on training set: 0.939
Accuracy on test set: 0.916


#### QUESTION: How does our multilayer perceptron model compare to our kNN model? Our linear model?
Q1: With a training accuracy of about 94%, the multilayer perceptron model learns reasonably well. Its training accuracy is similar to that of the kNN model but lower than that of the linear model. For the kNN model, the training score is 0.93427 and the testing score is 0.92308 for k = 28. For the linear model, the training score is 0.977 and the testing score is 0.972 with c = 197.57977. The gap between training and test accuracy is larger for the multilayer perceptron (0.939 - 0.916 = 0.023) than for kNN (about 0.011) or the linear model (0.005), which suggests a greater risk of overfitting in the multilayer perceptron model.
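
The comparison above comes down to simple arithmetic on the reported scores. A minimal sketch, using only the accuracies quoted in this answer (the dictionary labels are illustrative):

```python
# Training and test accuracies quoted in the answer above.
scores = {
    "MLP (unscaled)": (0.939, 0.916),
    "kNN (k = 28)": (0.93427, 0.92308),
    "linear (c = 197.57977)": (0.977, 0.972),
}

# The train-test gap is a rough indicator of overfitting risk.
gaps = {name: round(train - test, 3) for name, (train, test) in scores.items()}

for name, gap in gaps.items():
    print("{:<24s} train-test gap: {:.3f}".format(name, gap))
```

A larger gap is only a hint of overfitting, not proof; with a single train/test split, differences of this size can also come from the particular split.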

In [6]:
# OBSERVATION: Neural networks expect all input features to vary over similar domains and
# ideally to have a mean of 0 and a variance of 1. Rescaling your input features to ensure
# this can improve the performance of a neural network. It is important to scale the
# input features of the training set and the test set using the same factors. Output
# values do not need to be scaled.

In [6]:
# first, observe that the input features for the dataset cover very different
# ranges of values by printing out the maximum value for each feature
# Note: recall that cancer.data is a two-dimensional array of examples versus feature values;
# passing "axis=0" to max applies "max" across the first dimension (the examples),
# which finds the max value of each feature across all examples
print("Cancer.data.shape:{}".format(cancer.data.shape))
print("Cancer data per-feature maxima:\n{}".format(cancer.data.max(axis=0)))

Cancer.data.shape:(569, 30)
Cancer data per-feature maxima:
[2.811e+01 3.928e+01 1.885e+02 2.501e+03 1.634e-01 3.454e-01 4.268e-01
 2.012e-01 3.040e-01 9.744e-02 2.873e+00 4.885e+00 2.198e+01 5.422e+02
 3.113e-02 1.354e-01 3.960e-01 5.279e-02 7.895e-02 2.984e-02 3.604e+01
 4.954e+01 2.512e+02 4.254e+03 2.226e-01 1.058e+00 1.252e+00 2.910e-01
 6.638e-01 2.075e-01]


In [7]:
# create scaled versions of X_train and X_test

# compute the mean value for each feature of the training set
mean_on_train = X_train.mean(axis=0)
# compute the standard deviation for each feature of the training set
std_on_train = X_train.std(axis=0)

# for each individual feature value, if you subtract the mean for that
# feature and then divide by the standard deviation, the resulting
# set of feature values will have a mean of 0 and a standard deviation
# of 1
X_train_scaled = (X_train - mean_on_train)/std_on_train
# apply the same transformation to the test set
X_test_scaled = (X_test - mean_on_train)/std_on_train
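
The manual standardization above should match what scikit-learn's `StandardScaler` does when it is fit on the training set only. A minimal sketch that checks this, assuming the same split as earlier in the notebook (the names `X_train_manual`, `X_train_sk`, and so on are illustrative):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)

# manual standardization, using training-set statistics for both sets
mean_on_train = X_train.mean(axis=0)
std_on_train = X_train.std(axis=0)
X_train_manual = (X_train - mean_on_train) / std_on_train
X_test_manual = (X_test - mean_on_train) / std_on_train

# StandardScaler learns the per-feature mean and standard deviation in fit()
# and applies the same factors in every later transform() call
scaler = StandardScaler().fit(X_train)
X_train_sk = scaler.transform(X_train)
X_test_sk = scaler.transform(X_test)

print("Training sets match:", np.allclose(X_train_manual, X_train_sk))
print("Test sets match:", np.allclose(X_test_manual, X_test_sk))
```

Fitting the scaler on the training set alone keeps information about the test set out of the preprocessing step.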

In [8]:
# observe the new maximum values for the training and test sets
print("Training set per-feature maxima:\n{}".format(X_train_scaled.max(axis=0)))
print("Test set per-feature maxima:\n{}".format(X_test_scaled.max(axis=0)))

Training set per-feature maxima:
[ 4.00445109  4.49849937  4.01162075  5.35222811  3.48706735  4.59929699
  4.17444132  3.72084068  4.4834192   4.79559816  9.40604527  6.44742857
  9.82527409 11.78432972  7.79635041  5.94089805 11.08404608  6.49347423
  6.56453224  9.16596134  3.52452623  3.7810191   3.66652714  4.58294179
  3.80303832  4.94538374  4.44669544  2.63268402  5.7062447   6.59065532]
Test set per-feature maxima:
[ 3.80757821  2.25881091  3.94538624  5.35801627  5.05497445  3.95295565
  4.17938925  3.97827701  3.08911193  4.46077093  8.16664526  3.34776831
  8.11718975 12.18648512  5.26796104  3.92088238  2.90601364  3.66469283
  2.78563286  2.63645351  4.13042769  2.56096381  4.32203237  6.05479726
  3.9886931   3.77434119  3.01678784  2.30219171  2.81935551  3.15169611]


In [9]:
# train a new multilayer perceptron network using scaled versions of the
# input features for the training and test sets
mlp = MLPClassifier(random_state=42)
mlp.fit(X_train_scaled, y_train)

# print out the accuracy on the training and test sets, remembering to
# use the scaled versions here as well
print("Accuracy on training set: {:.3f}".format(mlp.score(X_train_scaled, y_train)))
print("Accuracy on test set: {:.3f}".format(mlp.score(X_test_scaled, y_test)))

# Note: you will get a warning that learning does not converge; a classifier
# is still produced that can be analyzed and used

Accuracy on training set: 0.995
Accuracy on test set: 0.965




#### QUESTION: Does rescaling the data improve the behavior of the multilayer perceptron model? By how much? How does it now compare to kNN and linear models?
Q2: After rescaling the data, training accuracy improved by 0.056 (from 0.939 to 0.995) and test accuracy improved by 0.049 (from 0.916 to 0.965). The multilayer perceptron now fits the training data better than both the kNN and linear models. Its test accuracy is higher than that of the kNN model but slightly lower than that of the linear model. Rescaling did not reduce the gap between training and test accuracy; the gap grew slightly, from 0.023 to 0.030.
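
One way to guarantee that the training and test sets are scaled with the same factors is to bundle the scaler and the classifier into a single estimator. The following is a sketch rather than the notebook's own method; `make_pipeline` and `StandardScaler` are standard scikit-learn tools, and `max_iter=1000` is an assumed value chosen only to give the optimizer room to converge.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)

# the pipeline fits the scaler on the training data and reuses those
# factors whenever it scores or predicts on new data
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(max_iter=1000, random_state=42),  # max_iter is an assumption
)
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print("Accuracy on training set: {:.3f}".format(train_acc))
print("Accuracy on test set: {:.3f}".format(test_acc))
```

Because the pipeline receives the unscaled arrays, there is no separate scaled copy of the data to keep in sync with the model.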

In [11]:
# eliminate the warning by providing the number of iterations that the MLP classifier
# is allowed to run while trying to learn good weights for its network

In [10]:
mlp = MLPClassifier(max_iter=200, random_state=42)
mlp.fit(X_train_scaled, y_train)

# print out the accuracy on the training and test sets, remembering to
# use the scaled versions here as well
print("Accuracy on training set: {:.3f}".format(mlp.score(X_train_scaled, y_train)))
print("Accuracy on test set: {:.3f}".format(mlp.score(X_test_scaled, y_test)))

Accuracy on training set: 0.995
Accuracy on test set: 0.965




In [13]:
# TODO: 200 iterations will not be enough to eliminate the warning; experiment to find a
# maximum number of iterations that eliminates the warning

In [14]:
for i in range(200, 250):
    mlp = MLPClassifier(max_iter=i, random_state=42)
    mlp.fit(X_train_scaled, y_train)
    print("Accuracy on training set: {:.3f}".format(mlp.score(X_train_scaled, y_train)), " Accuracy on test set: {:.3f}".format(mlp.score(X_test_scaled, y_test)))



Accuracy on training set: 0.995  Accuracy on test set: 0.965




Accuracy on training set: 0.995  Accuracy on test set: 0.965




Accuracy on training set: 0.995  Accuracy on test set: 0.965




Accuracy on training set: 0.995  Accuracy on test set: 0.965




Accuracy on training set: 0.995  Accuracy on test set: 0.965




Accuracy on training set: 0.995  Accuracy on test set: 0.965




Accuracy on training set: 0.995  Accuracy on test set: 0.965




Accuracy on training set: 0.995  Accuracy on test set: 0.965




Accuracy on training set: 0.995  Accuracy on test set: 0.965




Accuracy on training set: 0.995  Accuracy on test set: 0.965




Accuracy on training set: 0.995  Accuracy on test set: 0.965




Accuracy on training set: 0.995  Accuracy on test set: 0.965




Accuracy on training set: 0.995  Accuracy on test set: 0.965




Accuracy on training set: 0.995  Accuracy on test set: 0.965




Accuracy on training set: 0.995  Accuracy on test set: 0.965




Accuracy on training set: 0.995  Accuracy on test set: 0.965




Accuracy on training set: 0.995  Accuracy on test set: 0.965




Accuracy on training set: 0.995  Accuracy on test set: 0.965




Accuracy on training set: 0.995  Accuracy on test set: 0.965




Accuracy on training set: 0.995  Accuracy on test set: 0.965




Accuracy on training set: 0.995  Accuracy on test set: 0.965




Accuracy on training set: 0.995  Accuracy on test set: 0.965




Accuracy on training set: 0.995  Accuracy on test set: 0.965




Accuracy on training set: 0.998  Accuracy on test set: 0.965




Accuracy on training set: 0.998  Accuracy on test set: 0.965




Accuracy on training set: 0.998  Accuracy on test set: 0.965




Accuracy on training set: 0.998  Accuracy on test set: 0.965




Accuracy on training set: 0.998  Accuracy on test set: 0.965




Accuracy on training set: 0.998  Accuracy on test set: 0.965




Accuracy on training set: 0.998  Accuracy on test set: 0.965




Accuracy on training set: 0.998  Accuracy on test set: 0.965




Accuracy on training set: 0.998  Accuracy on test set: 0.965




Accuracy on training set: 0.998  Accuracy on test set: 0.965




Accuracy on training set: 0.998  Accuracy on test set: 0.965




Accuracy on training set: 0.998  Accuracy on test set: 0.972




Accuracy on training set: 0.998  Accuracy on test set: 0.972




Accuracy on training set: 0.993  Accuracy on test set: 0.965




Accuracy on training set: 0.998  Accuracy on test set: 0.972




Accuracy on training set: 0.998  Accuracy on test set: 0.972




Accuracy on training set: 0.998  Accuracy on test set: 0.972
Accuracy on training set: 0.979  Accuracy on test set: 0.951




Accuracy on training set: 0.998  Accuracy on test set: 0.972
Accuracy on training set: 0.986  Accuracy on test set: 0.972




Accuracy on training set: 0.998  Accuracy on test set: 0.965




Accuracy on training set: 0.998  Accuracy on test set: 0.965




Accuracy on training set: 0.993  Accuracy on test set: 0.972




Accuracy on training set: 0.998  Accuracy on test set: 0.965
Accuracy on training set: 0.998  Accuracy on test set: 0.965
Accuracy on training set: 0.981  Accuracy on test set: 0.972




Accuracy on training set: 0.998  Accuracy on test set: 0.965


#### QUESTION: Does allowing the MLP Classifier to run through learning iterations until it converges improve its performance? What evidence is there we might be overfitting?
Q3: The MLP classifier converges once max_iter reaches 247. With max_iter = 200, the training accuracy is 99.5% and the testing accuracy is 96.5%. With max_iter = 247, the training accuracy is 99.8% and the testing accuracy is still 96.5%. Running until convergence therefore improved only the training accuracy, by 0.3 percentage points, and widened the gap between training and test accuracy by 0.003 (from 0.030 to 0.033). A training accuracy close to 100% with no matching gain on the test set is evidence that the model may be overfitting.
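
Instead of scanning the output for warnings, convergence can also be checked in code. The sketch below assumes the same split and standardization as above; the helper name `fit_and_check` and the `max_iter` values 10 and 2000 are illustrative choices. It relies on `ConvergenceWarning` from `sklearn.exceptions` and on the fitted `n_iter_` attribute, which reports how many iterations the solver actually ran.

```python
import warnings

from sklearn.datasets import load_breast_cancer
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)
mean_on_train = X_train.mean(axis=0)
std_on_train = X_train.std(axis=0)
X_train_scaled = (X_train - mean_on_train) / std_on_train


def fit_and_check(max_iter):
    """Fit an MLP and report (converged, iterations actually run)."""
    mlp = MLPClassifier(max_iter=max_iter, random_state=42)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        mlp.fit(X_train_scaled, y_train)
    warned = any(issubclass(w.category, ConvergenceWarning) for w in caught)
    return (not warned), mlp.n_iter_


# a deliberately small budget that stops before convergence
converged_small, n_iter_small = fit_and_check(10)
print("max_iter=10: converged={}, n_iter_={}".format(converged_small, n_iter_small))

# a generous budget; n_iter_ then shows where training actually stopped
converged_large, n_iter_large = fit_and_check(2000)
print("max_iter=2000: converged={}, n_iter_={}".format(converged_large, n_iter_large))
```

The exact iteration at which training stops can differ between scikit-learn versions, so the value found here may not match the 247 reported above.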

In [15]:
# OBSERVATION: Recall that we use regularization to address problems of overfitting.
# A regularization parameter alpha can be used to decrease the model's complexity,
# thus having it less exactly fit the training data and hopefully better generalize
# to the test data. Larger alpha values produce more regularization or less complexity.
# The default alpha value for an MLPClassifier is 0.0001

In [16]:
mlp = MLPClassifier(max_iter=200, alpha=0.0001, random_state=42)
mlp.fit(X_train_scaled, y_train)

# print out the accuracy on the training and test sets, remembering to
# use the scaled versions here as well
print("Accuracy on training set: {:.3f}".format(mlp.score(X_train_scaled, y_train)))
print("Accuracy on test set: {:.3f}".format(mlp.score(X_test_scaled, y_test)))

Accuracy on training set: 0.995
Accuracy on test set: 0.965




In [17]:
# TODO: Set the max_iter value for this MLP classifier to the value discovered above that
# permits the classifier to iterate until it converges. Then, experiment with increasing the
# alpha value by factors of 10 until you seem to have eliminated overfitting and produced
# a model that learns and generalizes well.
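
Stepping alpha by factors of 10, as the TODO suggests, can be sketched with `np.logspace`. This is an illustrative alternative to an evenly spaced grid, not the experiment recorded in this notebook; `max_iter=247` is the value found earlier, and convergence warnings are silenced only to keep the printout short.

```python
import warnings

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

warnings.filterwarnings("ignore", category=ConvergenceWarning)

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)
mean_on_train = X_train.mean(axis=0)
std_on_train = X_train.std(axis=0)
X_train_scaled = (X_train - mean_on_train) / std_on_train
X_test_scaled = (X_test - mean_on_train) / std_on_train

# six alpha values, each 10 times the previous one: 0.0001, 0.001, ..., 10
alphas = np.logspace(-4, 1, num=6)

results = []
for alpha in alphas:
    mlp = MLPClassifier(max_iter=247, alpha=alpha, random_state=42)
    mlp.fit(X_train_scaled, y_train)
    train_acc = mlp.score(X_train_scaled, y_train)
    test_acc = mlp.score(X_test_scaled, y_test)
    results.append((alpha, train_acc, test_acc))
    print("alpha={:<8g} train: {:.3f}  test: {:.3f}  gap: {:.3f}".format(
        alpha, train_acc, test_acc, train_acc - test_acc))
```

A logarithmic grid covers the same range in six fits; a finer search can then be run around the most promising decade.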

In [63]:
for i in np.linspace(0.0001, 10, 1000):
    mlp = MLPClassifier(max_iter=247, alpha=i, random_state=42)
    mlp.fit(X_train_scaled, y_train)
    print(i, " Accuracy on training set: {:.3f}".format(mlp.score(X_train_scaled, y_train)), " Accuracy on test set: {:.3f}".format(mlp.score(X_test_scaled, y_test)))

0.0001  Accuracy on training set: 0.998  Accuracy on test set: 0.965
0.01010990990990991  Accuracy on training set: 0.998  Accuracy on test set: 0.965
0.02011981981981982  Accuracy on training set: 0.998  Accuracy on test set: 0.965
0.030129729729729732  Accuracy on training set: 0.995  Accuracy on test set: 0.965




0.040139639639639645  Accuracy on training set: 0.995  Accuracy on test set: 0.965




0.05014954954954955  Accuracy on training set: 0.995  Accuracy on test set: 0.972




0.06015945945945947  Accuracy on training set: 0.995  Accuracy on test set: 0.972




0.07016936936936938  Accuracy on training set: 0.995  Accuracy on test set: 0.972




0.08017927927927929  Accuracy on training set: 0.995  Accuracy on test set: 0.972




0.0901891891891892  Accuracy on training set: 0.995  Accuracy on test set: 0.972




0.1001990990990991  Accuracy on training set: 0.995  Accuracy on test set: 0.972




0.11020900900900901  Accuracy on training set: 0.993  Accuracy on test set: 0.972




0.12021891891891894  Accuracy on training set: 0.993  Accuracy on test set: 0.972




0.13022882882882883  Accuracy on training set: 0.993  Accuracy on test set: 0.972




0.14023873873873874  Accuracy on training set: 0.993  Accuracy on test set: 0.972




0.15024864864864865  Accuracy on training set: 0.993  Accuracy on test set: 0.972




0.16025855855855856  Accuracy on training set: 0.993  Accuracy on test set: 0.972




0.17026846846846846  Accuracy on training set: 0.993  Accuracy on test set: 0.972




0.18027837837837837  Accuracy on training set: 0.993  Accuracy on test set: 0.972




0.19028828828828828  Accuracy on training set: 0.993  Accuracy on test set: 0.972




0.2002981981981982  Accuracy on training set: 0.993  Accuracy on test set: 0.972




0.2103081081081081  Accuracy on training set: 0.993  Accuracy on test set: 0.972




0.220318018018018  Accuracy on training set: 0.993  Accuracy on test set: 0.972




0.23032792792792792  Accuracy on training set: 0.993  Accuracy on test set: 0.972




0.24033783783783785  Accuracy on training set: 0.993  Accuracy on test set: 0.972




0.25034774774774776  Accuracy on training set: 0.993  Accuracy on test set: 0.972




0.26035765765765767  Accuracy on training set: 0.991  Accuracy on test set: 0.972




0.2703675675675676  Accuracy on training set: 0.991  Accuracy on test set: 0.972




0.2803774774774775  Accuracy on training set: 0.991  Accuracy on test set: 0.972




0.2903873873873874  Accuracy on training set: 0.991  Accuracy on test set: 0.972




0.3003972972972973  Accuracy on training set: 0.991  Accuracy on test set: 0.979




0.3104072072072072  Accuracy on training set: 0.991  Accuracy on test set: 0.979




0.3204171171171171  Accuracy on training set: 0.991  Accuracy on test set: 0.979




0.33042702702702703  Accuracy on training set: 0.991  Accuracy on test set: 0.979




0.34043693693693694  Accuracy on training set: 0.991  Accuracy on test set: 0.979




0.35044684684684685  Accuracy on training set: 0.991  Accuracy on test set: 0.979




0.36045675675675676  Accuracy on training set: 0.991  Accuracy on test set: 0.979




0.37046666666666667  Accuracy on training set: 0.991  Accuracy on test set: 0.979




0.3804765765765766  Accuracy on training set: 0.991  Accuracy on test set: 0.979




0.3904864864864865  Accuracy on training set: 0.991  Accuracy on test set: 0.986




0.4004963963963964  Accuracy on training set: 0.991  Accuracy on test set: 0.986




0.4105063063063063  Accuracy on training set: 0.991  Accuracy on test set: 0.986




0.4205162162162162  Accuracy on training set: 0.991  Accuracy on test set: 0.986




0.4305261261261261  Accuracy on training set: 0.991  Accuracy on test set: 0.986




0.440536036036036  Accuracy on training set: 0.991  Accuracy on test set: 0.986




0.45054594594594594  Accuracy on training set: 0.991  Accuracy on test set: 0.986




0.46055585585585584  Accuracy on training set: 0.991  Accuracy on test set: 0.986




0.47056576576576575  Accuracy on training set: 0.991  Accuracy on test set: 0.986




0.4805756756756757  Accuracy on training set: 0.991  Accuracy on test set: 0.986




0.4905855855855856  Accuracy on training set: 0.991  Accuracy on test set: 0.986




0.5005954954954955  Accuracy on training set: 0.991  Accuracy on test set: 0.986




0.5106054054054054  Accuracy on training set: 0.991  Accuracy on test set: 0.986




0.5206153153153154  Accuracy on training set: 0.991  Accuracy on test set: 0.986




0.5306252252252253  Accuracy on training set: 0.991  Accuracy on test set: 0.986




0.5406351351351352  Accuracy on training set: 0.991  Accuracy on test set: 0.986




0.5506450450450451  Accuracy on training set: 0.991  Accuracy on test set: 0.986




0.560654954954955  Accuracy on training set: 0.991  Accuracy on test set: 0.986




0.5706648648648649  Accuracy on training set: 0.991  Accuracy on test set: 0.986




0.5806747747747748  Accuracy on training set: 0.988  Accuracy on test set: 0.986




0.5906846846846847  Accuracy on training set: 0.988  Accuracy on test set: 0.986




0.6006945945945946  Accuracy on training set: 0.988  Accuracy on test set: 0.986




0.6107045045045045  Accuracy on training set: 0.988  Accuracy on test set: 0.986




0.6207144144144144  Accuracy on training set: 0.988  Accuracy on test set: 0.986




0.6307243243243243  Accuracy on training set: 0.988  Accuracy on test set: 0.986




0.6407342342342343  Accuracy on training set: 0.988  Accuracy on test set: 0.986




0.6507441441441442  Accuracy on training set: 0.988  Accuracy on test set: 0.986
0.6607540540540541  Accuracy on training set: 0.988  Accuracy on test set: 0.986
0.670763963963964  Accuracy on training set: 0.988  Accuracy on test set: 0.986
0.6807738738738739  Accuracy on training set: 0.988  Accuracy on test set: 0.986
0.6907837837837838  Accuracy on training set: 0.988  Accuracy on test set: 0.986
0.7007936936936937  Accuracy on training set: 0.988  Accuracy on test set: 0.986
0.7108036036036036  Accuracy on training set: 0.988  Accuracy on test set: 0.986
0.7208135135135135  Accuracy on training set: 0.988  Accuracy on test set: 0.986
0.7308234234234234  Accuracy on training set: 0.988  Accuracy on test set: 0.986
0.7408333333333333  Accuracy on training set: 0.988  Accuracy on test set: 0.986
0.7508432432432433  Accuracy on training set: 0.988  Accuracy on test set: 0.986
0.7608531531531532  Accuracy on training set: 0.988  Accuracy on test set: 0.986
0.7708630630630631  Accuracy 

1.671754954954955  Accuracy on training set: 0.984  Accuracy on test set: 0.986
1.6817648648648649  Accuracy on training set: 0.984  Accuracy on test set: 0.986
1.691774774774775  Accuracy on training set: 0.984  Accuracy on test set: 0.986
1.7017846846846847  Accuracy on training set: 0.984  Accuracy on test set: 0.986
1.7117945945945947  Accuracy on training set: 0.984  Accuracy on test set: 0.986
1.7218045045045045  Accuracy on training set: 0.984  Accuracy on test set: 0.986
1.7318144144144145  Accuracy on training set: 0.984  Accuracy on test set: 0.986
1.7418243243243243  Accuracy on training set: 0.984  Accuracy on test set: 0.986
1.7518342342342343  Accuracy on training set: 0.984  Accuracy on test set: 0.986
1.7618441441441441  Accuracy on training set: 0.984  Accuracy on test set: 0.986
1.7718540540540542  Accuracy on training set: 0.984  Accuracy on test set: 0.986
1.781863963963964  Accuracy on training set: 0.984  Accuracy on test set: 0.986
1.791873873873874  Accuracy on 

2.692765765765766  Accuracy on training set: 0.984  Accuracy on test set: 0.986
2.7027756756756762  Accuracy on training set: 0.984  Accuracy on test set: 0.986
2.712785585585586  Accuracy on training set: 0.984  Accuracy on test set: 0.986
2.722795495495496  Accuracy on training set: 0.984  Accuracy on test set: 0.986
2.7328054054054056  Accuracy on training set: 0.984  Accuracy on test set: 0.986
2.742815315315316  Accuracy on training set: 0.984  Accuracy on test set: 0.986
2.7528252252252257  Accuracy on training set: 0.984  Accuracy on test set: 0.986
2.7628351351351355  Accuracy on training set: 0.984  Accuracy on test set: 0.986
2.7728450450450453  Accuracy on training set: 0.984  Accuracy on test set: 0.986
2.7828549549549555  Accuracy on training set: 0.984  Accuracy on test set: 0.986
2.7928648648648653  Accuracy on training set: 0.984  Accuracy on test set: 0.986
2.802874774774775  Accuracy on training set: 0.984  Accuracy on test set: 0.986
2.812884684684685  Accuracy on tr

3.713776576576577  Accuracy on training set: 0.984  Accuracy on test set: 0.972
3.7237864864864867  Accuracy on training set: 0.984  Accuracy on test set: 0.972
3.733796396396397  Accuracy on training set: 0.984  Accuracy on test set: 0.972
3.7438063063063067  Accuracy on training set: 0.984  Accuracy on test set: 0.972
3.7538162162162165  Accuracy on training set: 0.984  Accuracy on test set: 0.972
3.7638261261261263  Accuracy on training set: 0.984  Accuracy on test set: 0.972
3.7738360360360366  Accuracy on training set: 0.984  Accuracy on test set: 0.972
3.7838459459459464  Accuracy on training set: 0.984  Accuracy on test set: 0.972
3.793855855855856  Accuracy on training set: 0.984  Accuracy on test set: 0.972
3.803865765765766  Accuracy on training set: 0.984  Accuracy on test set: 0.972
3.813875675675676  Accuracy on training set: 0.984  Accuracy on test set: 0.972
3.823885585585586  Accuracy on training set: 0.984  Accuracy on test set: 0.972
3.833895495495496  Accuracy on tra

4.744797297297297  Accuracy on training set: 0.979  Accuracy on test set: 0.958
4.754807207207207  Accuracy on training set: 0.979  Accuracy on test set: 0.958
4.764817117117117  Accuracy on training set: 0.979  Accuracy on test set: 0.958
4.774827027027027  Accuracy on training set: 0.979  Accuracy on test set: 0.958
4.784836936936937  Accuracy on training set: 0.979  Accuracy on test set: 0.958
4.794846846846847  Accuracy on training set: 0.979  Accuracy on test set: 0.958
4.804856756756757  Accuracy on training set: 0.979  Accuracy on test set: 0.958
4.814866666666667  Accuracy on training set: 0.979  Accuracy on test set: 0.958
4.824876576576576  Accuracy on training set: 0.979  Accuracy on test set: 0.958
4.834886486486487  Accuracy on training set: 0.979  Accuracy on test set: 0.958
4.844896396396396  Accuracy on training set: 0.979  Accuracy on test set: 0.958
4.854906306306306  Accuracy on training set: 0.979  Accuracy on test set: 0.958
4.8649162162162165  Accuracy on training

5.775818018018018  Accuracy on training set: 0.979  Accuracy on test set: 0.951
5.785827927927928  Accuracy on training set: 0.979  Accuracy on test set: 0.951
5.795837837837838  Accuracy on training set: 0.979  Accuracy on test set: 0.951
5.805847747747748  Accuracy on training set: 0.979  Accuracy on test set: 0.951
5.8158576576576575  Accuracy on training set: 0.979  Accuracy on test set: 0.951
5.825867567567568  Accuracy on training set: 0.979  Accuracy on test set: 0.951
5.835877477477478  Accuracy on training set: 0.979  Accuracy on test set: 0.951
5.845887387387387  Accuracy on training set: 0.979  Accuracy on test set: 0.951
5.855897297297298  Accuracy on training set: 0.979  Accuracy on test set: 0.951
5.865907207207207  Accuracy on training set: 0.979  Accuracy on test set: 0.951
5.875917117117117  Accuracy on training set: 0.979  Accuracy on test set: 0.951
5.885927027027027  Accuracy on training set: 0.979  Accuracy on test set: 0.951
5.895936936936937  Accuracy on training

6.806838738738739  Accuracy on training set: 0.981  Accuracy on test set: 0.944
6.816848648648649  Accuracy on training set: 0.981  Accuracy on test set: 0.944
6.826858558558559  Accuracy on training set: 0.981  Accuracy on test set: 0.944
6.836868468468468  Accuracy on training set: 0.981  Accuracy on test set: 0.944
6.846878378378379  Accuracy on training set: 0.981  Accuracy on test set: 0.944
6.856888288288288  Accuracy on training set: 0.981  Accuracy on test set: 0.944
6.866898198198198  Accuracy on training set: 0.981  Accuracy on test set: 0.944
6.8769081081081085  Accuracy on training set: 0.981  Accuracy on test set: 0.944
6.886918018018018  Accuracy on training set: 0.981  Accuracy on test set: 0.944
6.896927927927928  Accuracy on training set: 0.981  Accuracy on test set: 0.944
6.906937837837838  Accuracy on training set: 0.981  Accuracy on test set: 0.944
6.916947747747748  Accuracy on training set: 0.981  Accuracy on test set: 0.944
6.926957657657658  Accuracy on training

7.83785945945946  Accuracy on training set: 0.981  Accuracy on test set: 0.944
7.84786936936937  Accuracy on training set: 0.981  Accuracy on test set: 0.944
7.857879279279279  Accuracy on training set: 0.981  Accuracy on test set: 0.944
7.86788918918919  Accuracy on training set: 0.981  Accuracy on test set: 0.944
7.877899099099099  Accuracy on training set: 0.981  Accuracy on test set: 0.944
7.887909009009009  Accuracy on training set: 0.981  Accuracy on test set: 0.944
7.897918918918919  Accuracy on training set: 0.981  Accuracy on test set: 0.944
7.907928828828829  Accuracy on training set: 0.981  Accuracy on test set: 0.944
7.917938738738739  Accuracy on training set: 0.981  Accuracy on test set: 0.944
7.927948648648649  Accuracy on training set: 0.981  Accuracy on test set: 0.944
7.937958558558559  Accuracy on training set: 0.981  Accuracy on test set: 0.944
7.947968468468469  Accuracy on training set: 0.981  Accuracy on test set: 0.944
7.957978378378378  Accuracy on training set

8.86888018018018  Accuracy on training set: 0.979  Accuracy on test set: 0.944
8.87889009009009  Accuracy on training set: 0.979  Accuracy on test set: 0.944
8.8889  Accuracy on training set: 0.979  Accuracy on test set: 0.944
8.89890990990991  Accuracy on training set: 0.979  Accuracy on test set: 0.944
8.90891981981982  Accuracy on training set: 0.979  Accuracy on test set: 0.944
8.91892972972973  Accuracy on training set: 0.979  Accuracy on test set: 0.944
8.92893963963964  Accuracy on training set: 0.979  Accuracy on test set: 0.944
8.93894954954955  Accuracy on training set: 0.979  Accuracy on test set: 0.944
8.94895945945946  Accuracy on training set: 0.979  Accuracy on test set: 0.944
8.95896936936937  Accuracy on training set: 0.979  Accuracy on test set: 0.944
8.96897927927928  Accuracy on training set: 0.979  Accuracy on test set: 0.944
8.978989189189189  Accuracy on training set: 0.979  Accuracy on test set: 0.944
8.9889990990991  Accuracy on training set: 0.979  Accuracy on

9.899900900900901  Accuracy on training set: 0.974  Accuracy on test set: 0.944
9.90991081081081  Accuracy on training set: 0.974  Accuracy on test set: 0.944
9.919920720720722  Accuracy on training set: 0.974  Accuracy on test set: 0.944
9.929930630630631  Accuracy on training set: 0.974  Accuracy on test set: 0.944
9.93994054054054  Accuracy on training set: 0.974  Accuracy on test set: 0.944
9.949950450450451  Accuracy on training set: 0.974  Accuracy on test set: 0.944
9.95996036036036  Accuracy on training set: 0.974  Accuracy on test set: 0.944
9.96997027027027  Accuracy on training set: 0.974  Accuracy on test set: 0.944
9.979980180180181  Accuracy on training set: 0.974  Accuracy on test set: 0.944
9.98999009009009  Accuracy on training set: 0.974  Accuracy on test set: 0.944
10.0  Accuracy on training set: 0.974  Accuracy on test set: 0.944


#### QUESTION: What alpha value did you find that produces a MLP Classifier that generalizes well? What do you observe about the accuracy on the training and test set that provides evidence this is a good value for alpha for this problem?
Q4: α = 0.65 seems to produce an MLP classifier that learns and generalizes well (training accuracy = 98.8%, test accuracy = 98.6%). I selected α by looking for the value at which the training and test accuracy scores are as close as possible without lowering the test accuracy. As α increased from 0.0001 to 0.65, the training accuracy tended to decrease and the test accuracy tended to increase, because a larger α reduces the complexity of the model. At α = 0.65 (and at some larger α values that give the same result), the gap is only 0.002, so the risk of overfitting is lower. Increasing α much further hurts: by α ≈ 3.7 the test accuracy has fallen back to 0.972, and by α ≈ 6.8 it is 0.944, which suggests that the model is then too heavily regularized.
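
Choosing α by looking at test accuracy means the test set has influenced the model, so the reported test score may be optimistic. A common alternative is to pick α by cross-validation on the training set only. The sketch below is an assumption-laden illustration, not the procedure used in this notebook: the alpha grid, `cv=5`, and `max_iter=1000` are all arbitrary choices.

```python
import warnings

from sklearn.datasets import load_breast_cancer
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

warnings.filterwarnings("ignore", category=ConvergenceWarning)

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)

# scaling sits inside the pipeline so each fold is scaled with its own statistics
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("mlp", MLPClassifier(max_iter=1000, random_state=42)),
])

# candidate alpha values (an assumed grid, in factors of 10)
param_grid = {"mlp__alpha": [0.0001, 0.001, 0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

print("Best alpha:", search.best_params_["mlp__alpha"])
print("Mean cross-validation accuracy: {:.3f}".format(search.best_score_))
# the test set is touched once, after alpha has been chosen
print("Accuracy on test set: {:.3f}".format(search.score(X_test, y_test)))
```

The α selected this way need not equal 0.65; the point of the sketch is only that the test set stays out of the selection.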

In [None]:
# OBSERVATION: We did not indicate how many hidden layers to have or how many hidden units
# to include in a hidden layer. By default, a MLPClassifier has a single hidden layer
# with 100 units in it. We can use a "hidden_layer_sizes" argument to adjust this.
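
The effect of `hidden_layer_sizes` on the network's shape can be inspected through the fitted `coefs_` attribute, which holds one weight matrix per layer. A minimal sketch, assuming the same split as above; `max_iter=20` is used only to keep the example fast, and the dictionary `shapes` is illustrative.

```python
import warnings

from sklearn.datasets import load_breast_cancer
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

warnings.filterwarnings("ignore", category=ConvergenceWarning)

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)

shapes = {}
for sizes in ([100], [10], [20, 5]):
    # a short training run is enough to create the weight matrices
    mlp = MLPClassifier(hidden_layer_sizes=sizes, max_iter=20, random_state=42)
    mlp.fit(X_train, y_train)
    # each matrix has shape (units in previous layer, units in this layer)
    shapes[tuple(sizes)] = [w.shape for w in mlp.coefs_]
    print(sizes, "->", shapes[tuple(sizes)])
```

The first dimension of the first matrix is 30, the number of input features, and the last matrix ends in a single output unit because this is a binary classification problem.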

In [23]:
# Return to using the default number of iterations and regularization parameter
# create a multilayer perceptron classifier with one hidden layer of just 10 units
mlp = MLPClassifier(hidden_layer_sizes=[10], random_state=42)

# fit the classifier to the training data
mlp.fit(X_train, y_train)

# print its accuracy on the training and test set
print("Accuracy on training set: {:.3f}".format(mlp.score(X_train, y_train)))
print("Accuracy on test set: {:.3f}".format(mlp.score(X_test, y_test)))

Accuracy on training set: 0.897
Accuracy on test set: 0.895




In [22]:
# create a multilayer perceptron classifier with two hidden layers,
# the first with 20 units and the second with 5 units
mlp = MLPClassifier(hidden_layer_sizes=[20, 5], random_state=42)

# fit the classifier to the training data
mlp.fit(X_train, y_train)

# print its accuracy on the training and test set
print("Accuracy on training set: {:.3f}".format(mlp.score(X_train, y_train)))
print("Accuracy on test set: {:.3f}".format(mlp.score(X_test, y_test)))

Accuracy on training set: 0.373
Accuracy on test set: 0.371




In [None]:
# OBSERVATION: Generally, we want as simple a model as possible while still being able to
# learn our target function. Often, we start with a neural network with one very large hidden
# layer to see if learning is possible. From there, we experiment with reducing the size
# of the hidden layer and applying regularization to make computation efficient and get
# good generalization. Additional hidden layers are only added if more complex
# models are needed.
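
The shrinking procedure described above can be sketched as a sweep over a few hidden-layer sizes. This is an illustration under stated assumptions: the list of sizes and `max_iter=1000` are arbitrary, and every model is both fit and scored on the standardized features so that the accuracies are comparable.

```python
import warnings

from sklearn.datasets import load_breast_cancer
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

warnings.filterwarnings("ignore", category=ConvergenceWarning)

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)
mean_on_train = X_train.mean(axis=0)
std_on_train = X_train.std(axis=0)
X_train_scaled = (X_train - mean_on_train) / std_on_train
X_test_scaled = (X_test - mean_on_train) / std_on_train

results = []
for n_units in [100, 50, 20, 10, 5, 2, 1]:
    mlp = MLPClassifier(hidden_layer_sizes=[n_units], max_iter=1000, random_state=42)
    # train and evaluate on the same (scaled) representation of the features
    mlp.fit(X_train_scaled, y_train)
    train_acc = mlp.score(X_train_scaled, y_train)
    test_acc = mlp.score(X_test_scaled, y_test)
    results.append((n_units, train_acc, test_acc))
    print("{:>3d} units  train: {:.3f}  test: {:.3f}".format(n_units, train_acc, test_acc))
```

Results for a single `random_state` can be noisy, so repeating the sweep with several seeds gives a steadier picture of how small the hidden layer can be.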

In [21]:
# TODO: Experiment to determine how few units we can have in a single-hidden-layer MLP
# Classifier network while still reaching the same level of performance on the
# Wisconsin Breast Cancer dataset.

# --------> As few as 32?

In [18]:
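# NOTE: each model here is fit on the unscaled X_train but scored on the scaled
# sets, so the accuracies printed below do not measure real performance; fit and
# score on the same representation of the features to get a meaningful comparison
for i in range(1, 101):
    mlp = MLPClassifier(hidden_layer_sizes=[i], random_state=42)
    mlp.fit(X_train, y_train)
    print(i, " Accuracy on training set: {:.3f}".format(mlp.score(X_train_scaled, y_train)), " Accuracy on test set: {:.3f}".format(mlp.score(X_test_scaled, y_test)))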



1  Accuracy on training set: 0.373  Accuracy on test set: 0.371




2  Accuracy on training set: 0.373  Accuracy on test set: 0.371




3  Accuracy on training set: 0.627  Accuracy on test set: 0.629




4  Accuracy on training set: 0.404  Accuracy on test set: 0.413




5  Accuracy on training set: 0.338  Accuracy on test set: 0.350
6  Accuracy on training set: 0.627  Accuracy on test set: 0.629




7  Accuracy on training set: 0.657  Accuracy on test set: 0.629




8  Accuracy on training set: 0.648  Accuracy on test set: 0.629
9  Accuracy on training set: 0.732  Accuracy on test set: 0.713




10  Accuracy on training set: 0.387  Accuracy on test set: 0.441




11  Accuracy on training set: 0.838  Accuracy on test set: 0.769




12  Accuracy on training set: 0.812  Accuracy on test set: 0.818
13  Accuracy on training set: 0.376  Accuracy on test set: 0.413
14  Accuracy on training set: 0.634  Accuracy on test set: 0.643




15  Accuracy on training set: 0.796  Accuracy on test set: 0.762




16  Accuracy on training set: 0.627  Accuracy on test set: 0.629
17  Accuracy on training set: 0.890  Accuracy on test set: 0.860




18  Accuracy on training set: 0.660  Accuracy on test set: 0.615




19  Accuracy on training set: 0.216  Accuracy on test set: 0.266
20  Accuracy on training set: 0.650  Accuracy on test set: 0.601




21  Accuracy on training set: 0.568  Accuracy on test set: 0.462
22  Accuracy on training set: 0.265  Accuracy on test set: 0.280




23  Accuracy on training set: 0.911  Accuracy on test set: 0.888
24  Accuracy on training set: 0.566  Accuracy on test set: 0.566
25  Accuracy on training set: 0.805  Accuracy on test set: 0.790




26  Accuracy on training set: 0.535  Accuracy on test set: 0.503
27  Accuracy on training set: 0.531  Accuracy on test set: 0.510
28  Accuracy on training set: 0.526  Accuracy on test set: 0.538




29  Accuracy on training set: 0.843  Accuracy on test set: 0.818
30  Accuracy on training set: 0.793  Accuracy on test set: 0.762
31  Accuracy on training set: 0.094  Accuracy on test set: 0.070




32  Accuracy on training set: 0.831  Accuracy on test set: 0.832




33  Accuracy on training set: 0.824  Accuracy on test set: 0.762
34  Accuracy on training set: 0.608  Accuracy on test set: 0.601




35  Accuracy on training set: 0.838  Accuracy on test set: 0.860




36  Accuracy on training set: 0.847  Accuracy on test set: 0.811
37  Accuracy on training set: 0.735  Accuracy on test set: 0.664




38  Accuracy on training set: 0.829  Accuracy on test set: 0.797




39  Accuracy on training set: 0.873  Accuracy on test set: 0.832
40  Accuracy on training set: 0.420  Accuracy on test set: 0.427
41  Accuracy on training set: 0.404  Accuracy on test set: 0.385
42  Accuracy on training set: 0.793  Accuracy on test set: 0.790




43  Accuracy on training set: 0.871  Accuracy on test set: 0.846
44  Accuracy on training set: 0.798  Accuracy on test set: 0.776
45  Accuracy on training set: 0.563  Accuracy on test set: 0.594




46  Accuracy on training set: 0.765  Accuracy on test set: 0.748
47  Accuracy on training set: 0.624  Accuracy on test set: 0.615




48  Accuracy on training set: 0.704  Accuracy on test set: 0.741




49  Accuracy on training set: 0.850  Accuracy on test set: 0.818
50  Accuracy on training set: 0.791  Accuracy on test set: 0.762
51  Accuracy on training set: 0.761  Accuracy on test set: 0.797
52  Accuracy on training set: 0.840  Accuracy on test set: 0.790
53  Accuracy on training set: 0.772  Accuracy on test set: 0.790
54  Accuracy on training set: 0.836  Accuracy on test set: 0.797
55  Accuracy on training set: 0.378  Accuracy on test set: 0.392
56  Accuracy on training set: 0.826  Accuracy on test set: 0.825
57  Accuracy on training set: 0.793  Accuracy on test set: 0.783
58  Accuracy on training set: 0.556  Accuracy on test set: 0.517
59  Accuracy on training set: 0.822  Accuracy on test set: 0.776
60  Accuracy on training set: 0.838  Accuracy on test set: 0.797
61  Accuracy on training set: 0.702  Accuracy on test set: 0.650
62  Accuracy on training set: 0.354  Accuracy on test set: 0.364
63  Accuracy on training set: 0.122  Accuracy on test set: 0.182
64  Accuracy on training 



85  Accuracy on training set: 0.859  Accuracy on test set: 0.839
86  Accuracy on training set: 0.822  Accuracy on test set: 0.825
87  Accuracy on training set: 0.803  Accuracy on test set: 0.797
88  Accuracy on training set: 0.850  Accuracy on test set: 0.811
89  Accuracy on training set: 0.434  Accuracy on test set: 0.462
90  Accuracy on training set: 0.765  Accuracy on test set: 0.741
91  Accuracy on training set: 0.746  Accuracy on test set: 0.720
92  Accuracy on training set: 0.862  Accuracy on test set: 0.881
93  Accuracy on training set: 0.831  Accuracy on test set: 0.804
94  Accuracy on training set: 0.878  Accuracy on test set: 0.832
95  Accuracy on training set: 0.678  Accuracy on test set: 0.636
96  Accuracy on training set: 0.838  Accuracy on test set: 0.839
97  Accuracy on training set: 0.742  Accuracy on test set: 0.755
98  Accuracy on training set: 0.838  Accuracy on test set: 0.790
99  Accuracy on training set: 0.871  Accuracy on test set: 0.846
100  Accuracy on training

In [20]:
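# NOTE: as in the loop above, the model is fit on the unscaled X_train but
# scored on the scaled data; `i` is left over from that loop (i == 100).
mlp = MLPClassifier(random_state=42)
mlp.fit(X_train, y_train)
print(i, " Accuracy on training set: {:.3f}".format(mlp.score(X_train_scaled, y_train)), " Accuracy on test set: {:.3f}".format(mlp.score(X_test_scaled, y_test)))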

100  Accuracy on training set: 0.829  Accuracy on test set: 0.839


Q1: With a training accuracy of about 94%, the multilayer perceptron model seems to learn well. Its training accuracy is similar to that of the kNN model but slightly lower than that of the linear model. For the kNN model, the training score is 0.93427 and the testing score is 0.92308 for k = 28. For the linear model, the training score is 0.977 and the testing score is 0.972 with C = 197.57977. The gap between training and testing accuracy is larger for the multilayer perceptron (0.939 vs. 0.916) than for the kNN and linear models, which suggests a greater risk of overfitting in the multilayer perceptron model.

Q2: After rescaling the data, the training accuracy improved by 0.056 and the testing accuracy improved by 0.049. The multilayer perceptron now has a higher training accuracy than both the kNN and linear models. Its testing accuracy is higher than that of the kNN model but slightly lower than that of the linear model. Rescaling did not reduce the gap between training and testing accuracy.

TODO1: Using a for loop, I varied max_iter from 200 to 300. With max_iter set to 247 or above, the warning no longer appeared.
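
As an alternative to looping over max_iter values, a single fit with a generous cap can report how many iterations the solver actually ran through the fitted `n_iter_` attribute. This is a minimal sketch, not the loop used above: the cap of 1000 and the use of `StandardScaler` to produce the scaled data are assumptions, and the printed count may differ from 247 depending on the scaling method and the scikit-learn version.

```python
import warnings

from sklearn.datasets import load_breast_cancer
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)

# Assumption: standardize using statistics from the training set only.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)

MAX_ITER = 1000  # assumed cap, chosen to be well above the default of 200

with warnings.catch_warnings():
    # Silence the convergence warning in case the cap is still too low.
    warnings.simplefilter("ignore", ConvergenceWarning)
    mlp = MLPClassifier(max_iter=MAX_ITER, random_state=42)
    mlp.fit(X_train_scaled, y_train)

# n_iter_ is the number of iterations the solver ran before stopping.
stopped_early = mlp.n_iter_ < MAX_ITER
print("iterations run:", mlp.n_iter_, "| stopped before the cap:", stopped_early)
```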

Q3: The MLP classifier converges once max_iter >= 247. With max_iter = 200, the training accuracy is 99.5% and the testing accuracy is 96.5%. With max_iter = 247, the training accuracy is 99.8% and the testing accuracy is still 96.5%, so only the training accuracy improved, by 0.3 percentage points. As a result, the gap between training and testing accuracy widened by 0.003, which slightly increases the risk of overfitting.
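
The gap arithmetic in Q3 can be checked with a few lines. The accuracies are the ones reported in Q3; the variable names are only for illustration.

```python
# (training accuracy, testing accuracy) reported in Q3 for each max_iter
runs = {200: (0.995, 0.965), 247: (0.998, 0.965)}

# Gap between training and testing accuracy, rounded to three decimals
# to avoid floating-point noise.
gaps = {max_iter: round(train - test, 3)
        for max_iter, (train, test) in runs.items()}

gap_increase = round(gaps[247] - gaps[200], 3)

print("gap at max_iter=200:", gaps[200])
print("gap at max_iter=247:", gaps[247])
print("increase in gap:", gap_increase)
```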

TODO2: With max_iter = 247, I trained MLP classifiers with alpha increased from 0.0001 to 10 by factors of 10. An alpha of 0.65 was the smallest value that seemed to produce a classifier that learns and generalizes well.

Q4: α = 0.65 seems to produce an MLP classifier that learns and generalizes well (training accuracy = 98.8%, test accuracy = 98.6%). I selected α by looking at the gap between training and test accuracy: the two scores should be as close as possible without lowering the test accuracy. As α increased, the training accuracy tended to decrease and the test accuracy tended to increase. Increasing α to 0.65 (or to a larger value that gives the same result) reduces the complexity of the model, so the gap is smaller and the risk of overfitting is lower.
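
The alpha sweep and the selection rule from Q4 might be sketched as follows. This is one reading of the rule, not necessarily the exact procedure used above: among the alphas with the highest test accuracy, it picks the one with the smallest gap between training and test accuracy. The use of `StandardScaler` for rescaling is an assumption, and the grid contains only the powers of 10, so it will not land on 0.65 exactly.

```python
import warnings

from sklearn.datasets import load_breast_cancer
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)

# Assumption: standardize using statistics from the training set only.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

results = []  # (alpha, training accuracy, test accuracy)
with warnings.catch_warnings():
    warnings.simplefilter("ignore", ConvergenceWarning)
    for alpha in [0.0001, 0.001, 0.01, 0.1, 1, 10]:
        mlp = MLPClassifier(max_iter=247, alpha=alpha, random_state=42)
        mlp.fit(X_train_scaled, y_train)
        train_acc = mlp.score(X_train_scaled, y_train)
        test_acc = mlp.score(X_test_scaled, y_test)
        results.append((alpha, train_acc, test_acc))
        print("alpha={:<7} train={:.3f} test={:.3f}".format(alpha, train_acc, test_acc))

# Selection rule (one interpretation): keep the best test accuracy,
# then minimize the train/test gap among those candidates.
best_test = max(test for _, _, test in results)
candidates = [r for r in results if r[2] == best_test]
chosen = min(candidates, key=lambda r: abs(r[1] - r[2]))
print("chosen alpha:", chosen[0])
```

Because the test set is used to choose alpha here, the resulting test accuracy is an optimistic estimate; a separate validation split or cross-validation would avoid that.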

I tried varying the size of the hidden layer from 1 to 100, and the classifier no longer reached the same level of performance. Overall, performance tended to increase as the hidden layer got larger, and it fluctuated more with smaller layer sizes. However, the performance did not exceed about 0.9, and I am not sure whether this is what I should expect from changing the size of the hidden layer. What I do not understand is this: if the default hidden layer has 100 units, why do I not get the same performance as before when I set the size to 100? (----> does anyone have the answer to this?)
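
One possible explanation, judging from the code in the 1-to-100 loop and in cell In [20]: both call `mlp.fit(X_train, y_train)` on the unscaled data but score on `X_train_scaled` and `X_test_scaled`, whereas the earlier 100-unit run fit and scored on the scaled data. The sketch below compares the three combinations for the default 100-unit network. The use of `StandardScaler` for rescaling is an assumption, so the exact numbers may differ from those above.

```python
import warnings

from sklearn.datasets import load_breast_cancer
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)

# Assumption: standardize using statistics from the training set only.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

with warnings.catch_warnings():
    warnings.simplefilter("ignore", ConvergenceWarning)
    mlp_raw = MLPClassifier(random_state=42).fit(X_train, y_train)
    mlp_scaled = MLPClassifier(random_state=42).fit(X_train_scaled, y_train)

# Test accuracy for each combination of training data and scoring data.
scores = {
    "fit raw, score raw": mlp_raw.score(X_test, y_test),
    "fit raw, score scaled (mismatch)": mlp_raw.score(X_test_scaled, y_test),
    "fit scaled, score scaled": mlp_scaled.score(X_test_scaled, y_test),
}
for name, acc in scores.items():
    print("{:<34} {:.3f}".format(name, acc))
```

A network trained on raw feature values has weights tuned to that range, so feeding it standardized inputs at scoring time gives it data it never saw during training.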

### Parameter Testing

In [17]:
mlp = MLPClassifier(random_state=42)
mlp.fit(X_train_scaled, y_train)

print("Accuracy on training set: {:.3f}".format(mlp.score(X_train_scaled, y_train)))
print("Accuracy on test set: {:.3f}".format(mlp.score(X_test_scaled, y_test)))

Accuracy on training set: 0.995
Accuracy on test set: 0.965




In [18]:
mlp = MLPClassifier(activation='logistic', random_state=42)
mlp.fit(X_train_scaled, y_train)

print("Accuracy on training set: {:.3f}".format(mlp.score(X_train_scaled, y_train)))
print("Accuracy on test set: {:.3f}".format(mlp.score(X_test_scaled, y_test)))

Accuracy on training set: 0.988
Accuracy on test set: 0.965




In [22]:
for i in range(1, 30):
    mlp = MLPClassifier(hidden_layer_sizes=(i,), random_state=42)
    mlp.fit(X_train_scaled, y_train)
    
    print(i)
    print("Accuracy on training set: {:.3f}".format(mlp.score(X_train_scaled, y_train)))
    print("Accuracy on test set: {:.3f}".format(mlp.score(X_test_scaled, y_test)))



1
Accuracy on training set: 0.373
Accuracy on test set: 0.371




2
Accuracy on training set: 0.979
Accuracy on test set: 0.944




3
Accuracy on training set: 0.986
Accuracy on test set: 0.986




4
Accuracy on training set: 0.967
Accuracy on test set: 0.951




5
Accuracy on training set: 0.981
Accuracy on test set: 0.965




6
Accuracy on training set: 0.981
Accuracy on test set: 0.965




7
Accuracy on training set: 0.977
Accuracy on test set: 0.958




8
Accuracy on training set: 0.988
Accuracy on test set: 0.979




9
Accuracy on training set: 0.986
Accuracy on test set: 0.958




10
Accuracy on training set: 0.984
Accuracy on test set: 0.965




11
Accuracy on training set: 0.986
Accuracy on test set: 0.965




12
Accuracy on training set: 0.986
Accuracy on test set: 0.944




13
Accuracy on training set: 0.991
Accuracy on test set: 0.972




14
Accuracy on training set: 0.991
Accuracy on test set: 0.972




15
Accuracy on training set: 0.993
Accuracy on test set: 0.979




16
Accuracy on training set: 0.991
Accuracy on test set: 0.958




17
Accuracy on training set: 0.991
Accuracy on test set: 0.958




18
Accuracy on training set: 0.986
Accuracy on test set: 0.979




19
Accuracy on training set: 0.991
Accuracy on test set: 0.965




20
Accuracy on training set: 0.986
Accuracy on test set: 0.965




21
Accuracy on training set: 0.986
Accuracy on test set: 0.965




22
Accuracy on training set: 0.988
Accuracy on test set: 0.972




23
Accuracy on training set: 0.991
Accuracy on test set: 0.979




24
Accuracy on training set: 0.991
Accuracy on test set: 0.972




25
Accuracy on training set: 0.991
Accuracy on test set: 0.958




26
Accuracy on training set: 0.991
Accuracy on test set: 0.972




27
Accuracy on training set: 0.988
Accuracy on test set: 0.972




28
Accuracy on training set: 0.988
Accuracy on test set: 0.972
29
Accuracy on training set: 0.986
Accuracy on test set: 0.979
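
The sweep output above can be summarized with a short script. The accuracies are copied from the printed results for hidden layer sizes 1 to 29; the baseline of 0.965 is the test accuracy of the default 100-unit network on the scaled data, and `smallest_size_matching` is a hypothetical helper for this illustration.

```python
# hidden layer size -> (training accuracy, test accuracy), copied from the output above
recorded = {
    1: (0.373, 0.371), 2: (0.979, 0.944), 3: (0.986, 0.986), 4: (0.967, 0.951),
    5: (0.981, 0.965), 6: (0.981, 0.965), 7: (0.977, 0.958), 8: (0.988, 0.979),
    9: (0.986, 0.958), 10: (0.984, 0.965), 11: (0.986, 0.965), 12: (0.986, 0.944),
    13: (0.991, 0.972), 14: (0.991, 0.972), 15: (0.993, 0.979), 16: (0.991, 0.958),
    17: (0.991, 0.958), 18: (0.986, 0.979), 19: (0.991, 0.965), 20: (0.986, 0.965),
    21: (0.986, 0.965), 22: (0.988, 0.972), 23: (0.991, 0.979), 24: (0.991, 0.972),
    25: (0.991, 0.958), 26: (0.991, 0.972), 27: (0.988, 0.972), 28: (0.988, 0.972),
    29: (0.986, 0.979),
}

BASELINE_TEST = 0.965  # test accuracy of the default 100-unit network (scaled data)


def smallest_size_matching(results, baseline):
    """Return the smallest hidden layer size whose test accuracy reaches the baseline."""
    for size in sorted(results):
        if results[size][1] >= baseline:
            return size
    return None


best_train_size = max(recorded, key=lambda s: recorded[s][0])
best_test_size = max(recorded, key=lambda s: recorded[s][1])

print("smallest size matching the baseline:", smallest_size_matching(recorded, BASELINE_TEST))
print("size with highest training accuracy:", best_train_size)
print("size with highest test accuracy:", best_test_size)
```

These figures come from a single random seed and a single train/test split, so differences of one or two hundredths between neighbouring sizes should not be over-interpreted.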




In [58]:
mlp = MLPClassifier(learning_rate_init=0.01, random_state=42)
mlp.fit(X_train_scaled, y_train)

print("Accuracy on training set: {:.3f}".format(mlp.score(X_train_scaled, y_train)))
print("Accuracy on test set: {:.3f}".format(mlp.score(X_test_scaled, y_test)))

Accuracy on training set: 1.000
Accuracy on test set: 0.965


In [60]:
mlp = MLPClassifier(hidden_layer_sizes=(15,), activation='logistic', learning_rate_init=0.1, random_state=42)
mlp.fit(X_train_scaled, y_train)

print("Accuracy on training set: {:.3f}".format(mlp.score(X_train_scaled, y_train)))
print("Accuracy on test set: {:.3f}".format(mlp.score(X_test_scaled, y_test)))

Accuracy on training set: 0.998
Accuracy on test set: 0.979


In [39]:
mlp = MLPClassifier(solver='lbfgs', random_state=42)
mlp.fit(X_train_scaled, y_train)

print("Accuracy on training set: {:.3f}".format(mlp.score(X_train_scaled, y_train)))
print("Accuracy on test set: {:.3f}".format(mlp.score(X_test_scaled, y_test)))

Accuracy on training set: 1.000
Accuracy on test set: 0.958


In [62]:
mlp = MLPClassifier(solver='lbfgs', hidden_layer_sizes=(20,), random_state=42)
mlp.fit(X_train_scaled, y_train)

print("Accuracy on training set: {:.3f}".format(mlp.score(X_train_scaled, y_train)))
print("Accuracy on test set: {:.3f}".format(mlp.score(X_test_scaled, y_test)))

Accuracy on training set: 1.000
Accuracy on test set: 0.972


With random_state=42 and all other settings at their defaults, changing the activation function to logistic (activation='logistic') lowered the training accuracy from 0.995 to 0.988 and left the testing accuracy unchanged at 0.965, so the gap between the two narrowed slightly. With the same settings, a hidden layer size of 15 (hidden_layer_sizes=(15,)) gave the highest training accuracy (0.993) among sizes 1 to 29, with a testing accuracy of 0.979; the highest testing accuracy (0.986) came from a hidden layer size of 3. Increasing the initial learning rate to 0.01 (learning_rate_init=0.01) raised the training accuracy to 100% (I wonder if this is possible?), but the testing accuracy stayed at 0.965. Changing the solver for weight optimization to lbfgs (solver='lbfgs') also gave a training accuracy of 100%, with a testing accuracy of 0.958. With (solver='lbfgs', hidden_layer_sizes=(20,)), the training accuracy was again 100% and the testing accuracy increased to 97.2%.