<a href="https://colab.research.google.com/github/encoras/Artificial-Intelligence-Group/blob/master/MLP_test.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

The code is adapted from:


https://python-course.eu/machine-learning/neural-networks-with-scikit.php




Let's set up a proper classification experiment with separate training, validation, and test sets.


In [61]:
from  sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn import metrics
from matplotlib import pyplot as plt
from sklearn.neural_network import MLPClassifier

In [80]:
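# Load the Iris dataset (150 samples, 4 features, 3 classes).
# Note: the outputs recorded below (228 test samples, 2 classes) do not match Iris;
# they appear to come from an earlier run of the notebook on a different dataset.
X, y = datasets.load_iris(return_X_y=True)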

In [63]:
# Hold out 40% of the data as the test set, stratified by class
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.4, random_state=1, shuffle=True, stratify=y)
# Split 10% of the remaining data off as a validation set
X_train, X_valid, y_train, y_valid = train_test_split(X_temp, y_temp, test_size=0.1, random_state=10, shuffle=True, stratify=y_temp)

We will now create an **MLPClassifier**.

A few notes on the parameters used:

*hidden_layer_sizes*: tuple, length = *n_layers* - 2, default=(100,).
The i-th element represents the number of neurons in the i-th hidden layer.
For example, (6,) means one hidden layer with 6 neurons.
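
To make the relation between *hidden_layer_sizes* and *n_layers* concrete, here is a minimal sketch that fits a small network on Iris and inspects the fitted attributes `n_layers_` and `coefs_`. The choices `max_iter=1000` and `random_state=0` are assumptions made only for this illustration.

```python
from sklearn import datasets
from sklearn.neural_network import MLPClassifier

X, y = datasets.load_iris(return_X_y=True)  # 150 samples, 4 features, 3 classes

# One hidden layer with 6 neurons; max_iter is raised to give the solver room to converge
clf = MLPClassifier(hidden_layer_sizes=(6,), solver='lbfgs',
                    max_iter=1000, random_state=0).fit(X, y)

# One weight matrix sits between each pair of consecutive layers
weight_shapes = [w.shape for w in clf.coefs_]

print(clf.n_layers_)   # counts the input, hidden and output layers
print(weight_shapes)   # (inputs, hidden neurons), then (hidden neurons, outputs)
```

With 4 input features and 3 classes, the weight matrices have shapes (4, 6) and (6, 3), and the tuple length 1 equals `n_layers_ - 2`.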

*solver*:
The solver parameter selects the weight optimization algorithm. Three solvers are available:

'`lbfgs`' - an optimizer in the family of quasi-Newton methods.

'`sgd`' - stochastic gradient descent.

'`adam`' - a stochastic gradient-based optimizer proposed by Diederik Kingma and Jimmy Ba.

Without going into the details of the solvers, you should know the following: 'adam' works pretty well, in terms of both training time and validation score, on relatively large datasets, i.e. thousands of training samples or more. For small datasets, however, 'lbfgs' can converge faster and perform better.

*alpha*: the strength of the L2 regularization term. Larger values counter overfitting, smaller values counter underfitting. Several values are tried in the grid search further down.
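
As a rough illustration of what alpha does, the sketch below trains the same network with three alpha values and reports the sum of squared connection weights, which is the quantity the L2 penalty acts on. The alpha values, `max_iter=2000`, and fitting the scaler on all of Iris are assumptions made to keep the example short; they are not settings from this notebook.

```python
import numpy as np
from sklearn import datasets
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_iris(return_X_y=True)
X_std = StandardScaler().fit_transform(X)  # fitted on all data only to keep the sketch short

weight_norms = {}
for alpha in (1e-5, 1.0, 100.0):  # illustrative values, not tuned
    clf = MLPClassifier(solver='lbfgs', alpha=alpha, hidden_layer_sizes=(6,),
                        max_iter=2000, random_state=123).fit(X_std, y)
    # Sum of squared weights over all layers (intercepts are not included)
    weight_norms[alpha] = sum(float(np.sum(w ** 2)) for w in clf.coefs_)

for alpha, norm in weight_norms.items():
    print(f"alpha={alpha:g}: sum of squared weights = {norm:.2f}")
```

A strongly regularized network should end up with much smaller weights than a nearly unregularized one, which is the mechanism by which alpha limits overfitting.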

In [64]:
# Compare the solvers for a network with 6 neurons in the hidden layer
clf_lb = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(6,), random_state=123).fit(X_train, y_train)
clf_sgd = MLPClassifier(solver='sgd', max_iter=3000, alpha=1e-5, hidden_layer_sizes=(6,), random_state=123).fit(X_train, y_train)
clf_adam = MLPClassifier(solver='adam', max_iter=3000, alpha=1e-5, hidden_layer_sizes=(6,), random_state=123).fit(X_train, y_train)


In [65]:
#lbfgs test
y_pred = clf_lb.predict(X_test)
scores_clf_lb=metrics.accuracy_score(y_test,y_pred)

#sgd test
y_pred = clf_sgd.predict(X_test)
scores_clf_sgd=metrics.accuracy_score(y_test,y_pred)

#adam test
y_pred = clf_adam.predict(X_test)
scores_clf_adam=metrics.accuracy_score(y_test,y_pred)


print('lbfgs = ', scores_clf_lb, ";  sgd = ", scores_clf_sgd, "; adam = ", scores_clf_adam )


lbfgs =  0.6271929824561403 ;  sgd =  0.6271929824561403 ; adam =  0.9429824561403509


What is the influence of feature scaling?
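
Before comparing results, it may help to see what the two scalers used below do to the data. This is a minimal sketch on Iris with a single train/test split (the validation split is left out for brevity): `StandardScaler` gives each training feature zero mean and unit variance, while `MinMaxScaler` maps each training feature to the range [0, 1].

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=1, stratify=y)

# Both scalers learn their statistics from the training split only
std_scaler = StandardScaler().fit(X_train)
mm_scaler = MinMaxScaler().fit(X_train)

X_train_std = std_scaler.transform(X_train)
X_train_mm = mm_scaler.transform(X_train)
X_test_mm = mm_scaler.transform(X_test)

print(X_train_std.mean(axis=0).round(6), X_train_std.std(axis=0).round(6))
print(X_train_mm.min(axis=0), X_train_mm.max(axis=0))
# Test values are scaled with the training statistics, so they may fall outside [0, 1]
print(X_test_mm.min(axis=0), X_test_mm.max(axis=0))
```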

In [66]:
# Standardization: zero mean and unit variance per feature, fitted on the training set only
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)
X_train_std=scaler.transform(X_train)
X_test_std=scaler.transform(X_test)
X_valid_std=scaler.transform(X_valid)

# Min-max scaling: map each feature to [0, 1] based on the training set only
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(X_train)
X_train_mm=scaler.transform(X_train)
X_test_mm=scaler.transform(X_test)
X_valid_mm=scaler.transform(X_valid)

In [67]:
# Min-max scaler
# Compare the solvers for a network with 6 neurons in the hidden layer
clf_lb = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(6,), random_state=123).fit(X_train_mm, y_train)
clf_sgd = MLPClassifier(solver='sgd', max_iter=3000, alpha=1e-5, hidden_layer_sizes=(6,), random_state=123).fit(X_train_mm, y_train)
clf_adam = MLPClassifier(solver='adam', max_iter=3000, alpha=1e-5, hidden_layer_sizes=(6,), random_state=123).fit(X_train_mm, y_train)
#lbfgs test
y_pred = clf_lb.predict(X_test_mm)
scores_clf_lb=metrics.accuracy_score(y_test,y_pred)

#sgd test
y_pred = clf_sgd.predict(X_test_mm)
scores_clf_sgd=metrics.accuracy_score(y_test,y_pred)

#adam test
y_pred = clf_adam.predict(X_test_mm)
scores_clf_adam=metrics.accuracy_score(y_test,y_pred)


print('lbfgs_mm = ', scores_clf_lb, ";  sgd_mm = ", scores_clf_sgd, "; adam_mm = ", scores_clf_adam )

lbfgs_mm =  0.9517543859649122 ;  sgd_mm =  0.9385964912280702 ; adam_mm =  0.9473684210526315


In [68]:
# Standardization scaler
# Compare the solvers for a network with 6 neurons in the hidden layer
clf_lb = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(6,), random_state=123).fit(X_train_std, y_train)
clf_sgd = MLPClassifier(solver='sgd', max_iter=3000, alpha=1e-5, hidden_layer_sizes=(6,), random_state=123).fit(X_train_std, y_train)
clf_adam = MLPClassifier(solver='adam', max_iter=3000, alpha=1e-5, hidden_layer_sizes=(6,), random_state=123).fit(X_train_std, y_train)
#lbfgs test
y_pred = clf_lb.predict(X_test_std)
scores_clf_lb=metrics.accuracy_score(y_test,y_pred)

#sgd test
y_pred = clf_sgd.predict(X_test_std)
scores_clf_sgd=metrics.accuracy_score(y_test,y_pred)

#adam test
y_pred = clf_adam.predict(X_test_std)
scores_clf_adam=metrics.accuracy_score(y_test,y_pred)


print('lbfgs_std = ', scores_clf_lb, ";  sgd_std = ", scores_clf_sgd, "; adam_std = ", scores_clf_adam )

lbfgs_std =  0.9692982456140351 ;  sgd_std =  0.9517543859649122 ; adam_std =  0.9692982456140351


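Scaling improves the classification result here: on the unscaled inputs 'lbfgs' and 'sgd' reach only about 0.63 accuracy, whereas with either scaler all three solvers score between about 0.94 and 0.97.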
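
One way to keep scaling and classification together is a scikit-learn `Pipeline`, which fits the scaler on the training data and reapplies it automatically at prediction time. The sketch below is an alternative arrangement, not the approach used in this notebook; `max_iter=1000` is an assumption added to give 'lbfgs' room to converge.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=1, stratify=y)

# The scaler is fitted inside fit() using X_train only
pipe = make_pipeline(
    StandardScaler(),
    MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(6,),
                  max_iter=1000, random_state=123),
)
pipe.fit(X_train, y_train)

# score() scales X_test with the training statistics before predicting
pipe_accuracy = pipe.score(X_test, y_test)
print('pipeline accuracy =', pipe_accuracy)
```

A pipeline can also be passed to `GridSearchCV`, in which case the scaler is refitted on each training fold.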


How about the neuron activation function?
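
For reference, the four activation options accepted by `MLPClassifier` can be written in a few lines of NumPy. This is a sketch of the mathematical definitions, not scikit-learn's internal code; the dictionary name `activations` is made up for the example.

```python
import numpy as np

# Hidden-layer activation functions, keyed by the names MLPClassifier accepts
activations = {
    'identity': lambda z: z,                         # f(z) = z, no non-linearity
    'logistic': lambda z: 1.0 / (1.0 + np.exp(-z)),  # sigmoid, output in (0, 1)
    'tanh': np.tanh,                                 # output in (-1, 1)
    'relu': lambda z: np.maximum(0.0, z),            # max(0, z)
}

z = np.array([-2.0, 0.0, 2.0])
for name, f in activations.items():
    print(name, f(z))
```

With 'identity', the hidden layer applies no non-linearity, so the network as a whole computes a linear function of its inputs.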

In [69]:
# activation: {'identity', 'logistic', 'tanh', 'relu'}, default='relu'
# Compare activation functions for a network with 3 neurons in the hidden layer
# https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier
clf_relu = MLPClassifier(solver='lbfgs', activation='relu', alpha=1e-5, hidden_layer_sizes=(3,), random_state=123).fit(X_train_std, y_train)
clf_logistic = MLPClassifier(solver='lbfgs', activation='logistic', alpha=1e-5, hidden_layer_sizes=(3,), random_state=123).fit(X_train_std, y_train)
clf_identity = MLPClassifier(solver='lbfgs', activation='identity', alpha=1e-5, hidden_layer_sizes=(3,), random_state=123).fit(X_train_std, y_train)
# relu test
y_pred = clf_relu.predict(X_test_std)
scores_clf_relu=metrics.accuracy_score(y_test,y_pred)

# logistic test
y_pred = clf_logistic.predict(X_test_std)
scores_clf_logistic=metrics.accuracy_score(y_test,y_pred)

# identity test
y_pred = clf_identity.predict(X_test_std)
scores_clf_identity=metrics.accuracy_score(y_test,y_pred)


print('relu = ', scores_clf_relu, ";  logistic = ", scores_clf_logistic, "; identity = ", scores_clf_identity )

relu =  0.9473684210526315 ;  logistic =  0.956140350877193 ; identity =  0.9517543859649122


In [76]:
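from sklearn.model_selection import GridSearchCV

#mlp_gs = MLPClassifier(max_iter=1000)
# Hyper-parameter values to try; every combination is evaluated
# Note: learning_rate is only used by solver='sgd', so it has no effect with 'lbfgs'
parameter_space = {
    'hidden_layer_sizes': [(3,2) ,(3,) ,(10,)],
    'activation': ['identity', 'relu'],
    'alpha': [0.0001, 0.0005],
    'learning_rate': ['constant','adaptive'],
}

# clf_logistic only serves as the base estimator (solver='lbfgs', random_state=123);
# the grid overrides its activation, alpha and hidden_layer_sizes
grid = GridSearchCV(clf_logistic, parameter_space, refit = True, cv=5, verbose=3)
grid.fit(X_train_std, y_train) # X_train_std holds the training samples, y_train the corresponding labels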

Fitting 5 folds for each of 24 candidates, totalling 120 fits
[CV 1/5] END activation=identity, alpha=0.0001, hidden_layer_sizes=(3, 2), learning_rate=constant;, score=0.968 total time=   0.0s
[CV 2/5] END activation=identity, alpha=0.0001, hidden_layer_sizes=(3, 2), learning_rate=constant;, score=0.967 total time=   0.0s
[CV 3/5] END activation=identity, alpha=0.0001, hidden_layer_sizes=(3, 2), learning_rate=constant;, score=0.984 total time=   0.0s
[CV 4/5] END activation=identity, alpha=0.0001, hidden_layer_sizes=(3, 2), learning_rate=constant;, score=0.984 total time=   0.0s
[CV 5/5] END activation=identity, alpha=0.0001, hidden_layer_sizes=(3, 2), learning_rate=constant;, score=0.967 total time=   0.0s
[CV 1/5] END activation=identity, alpha=0.0001, hidden_layer_sizes=(3, 2), learning_rate=adaptive;, score=0.968 total time=   0.0s
[CV 2/5] END activation=identity, alpha=0.0001, hidden_layer_sizes=(3, 2), learning_rate=adaptive;, score=0.967 total time=   0.0s
[CV 3/5] END activati

GridSearchCV(cv=5,
             estimator=MLPClassifier(activation='logistic', alpha=1e-05,
                                     hidden_layer_sizes=(3,), random_state=123,
                                     solver='lbfgs'),
             param_grid={'activation': ['identity', 'relu'],
                         'alpha': [0.0001, 0.0005],
                         'hidden_layer_sizes': [(3, 2), (3,), (10,)],
                         'learning_rate': ['constant', 'adaptive']},
             verbose=3)
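
The "24 candidates" and "120 fits" in the log above follow directly from the size of the parameter grid. A small sketch using scikit-learn's `ParameterGrid`, with the same dictionary as in the cell above, reproduces the count:

```python
from sklearn.model_selection import ParameterGrid

parameter_space = {
    'hidden_layer_sizes': [(3, 2), (3,), (10,)],
    'activation': ['identity', 'relu'],
    'alpha': [0.0001, 0.0005],
    'learning_rate': ['constant', 'adaptive'],
}

# Every combination of the listed values is one candidate: 3 * 2 * 2 * 2
candidates = list(ParameterGrid(parameter_space))
n_candidates = len(candidates)

# Each candidate is fitted once per cross-validation fold (cv=5)
n_fits = n_candidates * 5

print(n_candidates, n_fits)
```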

In [78]:

# print best parameter after tuning
print(grid.best_params_)
  
# print how our model looks after hyper-parameter tuning
print(grid.best_estimator_)

means = grid.cv_results_['mean_test_score']
stds = grid.cv_results_['std_test_score']
for mean, std, params in zip(means, stds, grid.cv_results_['params']):
    print("%0.3f (+/-%0.03f) for %r" % (mean, std * 2, params))

{'activation': 'relu', 'alpha': 0.0001, 'hidden_layer_sizes': (3, 2), 'learning_rate': 'constant'}
MLPClassifier(hidden_layer_sizes=(3, 2), random_state=123, solver='lbfgs')
0.974 (+/-0.016) for {'activation': 'identity', 'alpha': 0.0001, 'hidden_layer_sizes': (3, 2), 'learning_rate': 'constant'}
0.974 (+/-0.016) for {'activation': 'identity', 'alpha': 0.0001, 'hidden_layer_sizes': (3, 2), 'learning_rate': 'adaptive'}
0.980 (+/-0.013) for {'activation': 'identity', 'alpha': 0.0001, 'hidden_layer_sizes': (3,), 'learning_rate': 'constant'}
0.980 (+/-0.013) for {'activation': 'identity', 'alpha': 0.0001, 'hidden_layer_sizes': (3,), 'learning_rate': 'adaptive'}
0.977 (+/-0.026) for {'activation': 'identity', 'alpha': 0.0001, 'hidden_layer_sizes': (10,), 'learning_rate': 'constant'}
0.977 (+/-0.026) for {'activation': 'identity', 'alpha': 0.0001, 'hidden_layer_sizes': (10,), 'learning_rate': 'adaptive'}
0.974 (+/-0.016) for {'activation': 'identity', 'alpha': 0.0005, 'hidden_layer_sizes': (

In [79]:
from sklearn.metrics import classification_report

grid_predictions = grid.predict(X_test_std)

scores=metrics.accuracy_score(y_test,grid_predictions)  
print('ACC = ', scores)
# print classification report
print(classification_report(y_test, grid_predictions))

ACC =  0.9517543859649122
              precision    recall  f1-score   support

           0       0.99      0.88      0.93        85
           1       0.93      0.99      0.96       143

    accuracy                           0.95       228
   macro avg       0.96      0.94      0.95       228
weighted avg       0.95      0.95      0.95       228
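
The precision, recall and f1-score columns of the report are derived from the confusion matrix. The sketch below recomputes them by hand for the positive class, using made-up labels (not the notebook's data), and compares the result with scikit-learn's own metric functions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Hypothetical labels, chosen only to illustrate the formulas
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1, 0])

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)   # share of predicted positives that are correct
recall = tp / (tp + fn)      # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(cm)
print(precision, recall, f1)
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred), f1_score(y_true, y_pred))
```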

