<h1><center>Random Optimization: Neural Network</center></h1>

<h1><center>By Felicia Fryer</center></h1>

In [1]:
import pandas as pd
import pylab as pl
import numpy as np
import scipy.optimize as opt
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

%matplotlib inline 
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report, confusion_matrix
import itertools
from sklearn.neural_network import MLPClassifier
from time import perf_counter
from sklearn.metrics import f1_score
from sklearn.metrics import jaccard_similarity_score  # removed in scikit-learn 0.23; newer versions provide jaccard_score
from sklearn.metrics import accuracy_score
import mlrose
from sklearn.model_selection import cross_val_score

<h2 id="load_dataset">Load the Cancer data</h2>
The example is based on a dataset that is publicly available from the UCI Machine Learning Repository ([Asuncion and Newman, 2007](http://mlearn.ics.uci.edu/MLRepository.html)). The dataset consists of several hundred human cell sample records, each of which contains the values of a set of cell characteristics. The fields in each record are:

|Field name|Description|
|--- |--- |
|ID|Sample identifier|
|Clump|Clump thickness|
|UnifSize|Uniformity of cell size|
|UnifShape|Uniformity of cell shape|
|MargAdh|Marginal adhesion|
|SingEpiSize|Single epithelial cell size|
|BareNuc|Bare nuclei|
|BlandChrom|Bland chromatin|
|NormNucl|Normal nucleoli|
|Mit|Mitoses|
|Class|Benign or malignant|

<br>
<br>

For the purposes of this example, we're using a dataset that has a relatively small number of predictors in each record. The notebook reads the data from a local file, `cancer.csv`.  

In [2]:
cancer = pd.read_csv("cancer.csv", delimiter=",")
cancer[0:5]

Unnamed: 0,ID,Clump,UnifSize,UnifShape,MargAdh,SingEpiSize,BareNuc,BlandChrom,NormNucl,Mit,Class
0,1000025,5,1,1,1,2,1,3,1,1,2
1,1002945,5,4,4,5,7,10,3,2,1,2
2,1015425,3,1,1,1,2,2,3,1,1,2
3,1016277,6,8,8,1,3,4,3,7,1,2
4,1017023,4,1,1,3,2,1,3,1,1,2


## Data pre-processing and selection

Let's first look at the column data types:

In [3]:
cancer.dtypes

ID              int64
Clump           int64
UnifSize        int64
UnifShape       int64
MargAdh         int64
SingEpiSize     int64
BareNuc        object
BlandChrom      int64
NormNucl        int64
Mit             int64
Class           int64
dtype: object

It looks like the __BareNuc__ column includes some values that are not numerical. We can drop those rows:

In [4]:
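# Keep only rows whose BareNuc value parses as a number; copy to avoid SettingWithCopyWarning
cancer = cancer[pd.to_numeric(cancer['BareNuc'], errors='coerce').notnull()].copy()
cancer['BareNuc'] = cancer['BareNuc'].astype('int')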
cancer.dtypes

ID             int64
Clump          int64
UnifSize       int64
UnifShape      int64
MargAdh        int64
SingEpiSize    int64
BareNuc        int64
BlandChrom     int64
NormNucl       int64
Mit            int64
Class          int64
dtype: object
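
The filtering step above can be pictured with a plain-Python analogue. This is only a sketch of the idea behind `errors='coerce'` followed by `notnull()`, not how pandas implements it; the `"?"` placeholder and the helper name `to_int_or_none` are assumptions made for illustration.

```python
def to_int_or_none(value):
    """Return the value as an int, or None when it cannot be parsed."""
    try:
        return int(value)
    except ValueError:
        return None

# Hypothetical rows; "?" stands in for any non-numeric entry.
rows = [
    {"ID": 1, "BareNuc": "1"},
    {"ID": 2, "BareNuc": "?"},
    {"ID": 3, "BareNuc": "10"},
]

clean = []
for row in rows:
    parsed = to_int_or_none(row["BareNuc"])
    if parsed is not None:              # drop rows that failed to parse
        clean.append({**row, "BareNuc": parsed})

print(clean)
```

Unparseable entries become a missing marker first and are filtered out second, which is why the remaining column can then be cast to an integer type.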

In [5]:
cancer.shape

(683, 11)

In [6]:
feature_df = cancer[['Clump', 'UnifSize', 'UnifShape', 'MargAdh', 'SingEpiSize', 'BareNuc', 'BlandChrom', 'NormNucl', 'Mit']]
X = np.asarray(feature_df)
X[0:5]

array([[ 5,  1,  1,  1,  2,  1,  3,  1,  1],
       [ 5,  4,  4,  5,  7, 10,  3,  2,  1],
       [ 3,  1,  1,  1,  2,  2,  3,  1,  1],
       [ 6,  8,  8,  1,  3,  4,  3,  7,  1],
       [ 4,  1,  1,  3,  2,  1,  3,  1,  1]])

We want the model to predict the value of Class (that is, benign (=2) or malignant (=4)). As this field can have one of only two possible values, we need to change its measurement level to reflect this.

In [7]:
cancer['Class'] = cancer['Class'].astype('int')
y = np.asarray(cancer['Class'])
y[0:5]

array([2, 2, 2, 2, 2])

## Train/Test dataset

Next, we split the dataset into training and test sets:

In [8]:
np.random.seed(10)
X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.30, random_state=4)
print ('Train set:', X_train.shape,  y_train.shape)
print ('Test set:', X_test.shape,  y_test.shape)

Train set: (478, 9) (478,)
Test set: (205, 9) (205,)
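
The set sizes printed above follow from simple arithmetic: with 683 rows and `test_size=0.30`, the test share is rounded up to 205 and the remaining 478 rows go to training. The sketch below reproduces only that bookkeeping with the standard library; `shuffle_split` is a hypothetical helper, and its shuffled order will not match the rows that `train_test_split` actually selects.

```python
import math
import random

def shuffle_split(n_samples, test_size, seed):
    """Return (train_idx, test_idx) for a seeded shuffle split."""
    n_test = math.ceil(test_size * n_samples)   # test share is rounded up
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # seeded, so the split is repeatable
    # The first n_test shuffled indices form the test set, the rest the training set.
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = shuffle_split(683, 0.30, seed=4)
print(len(train_idx), len(test_idx))
```

Fixing the seed is what makes the split repeatable from run to run.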


In [9]:
# Normalize feature data (the scaler is fitted on the training set only)
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# One-hot encode target values; toarray() returns a plain ndarray rather than np.matrix
one_hot = OneHotEncoder()
y_train_hot = one_hot.fit_transform(y_train.reshape(-1, 1)).toarray()
y_test_hot = one_hot.transform(y_test.reshape(-1, 1)).toarray()

If you previously used a LabelEncoder to convert the categories to integers before applying the OneHotEncoder, that extra step is no longer needed: the OneHotEncoder can be used directly.
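
To make the two preprocessing steps concrete, here is a minimal standard-library sketch of min-max scaling and one-hot encoding. The helper names and the toy rows are assumptions for illustration. The point it shows is that the minimum and range are learned from the training rows and then reused unchanged on the test rows.

```python
def fit_min_max(rows):
    """Learn the per-column minimum and range from the training rows."""
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    # A constant column has range 0; use 1 instead to avoid dividing by zero.
    ranges = [(max(col) - min(col)) or 1 for col in columns]
    return mins, ranges

def apply_min_max(rows, mins, ranges):
    """Scale rows with previously learned statistics."""
    return [[(v - lo) / rng for v, lo, rng in zip(row, mins, ranges)]
            for row in rows]

def one_hot(labels, classes):
    """Turn each label into a 0/1 indicator vector over the class list."""
    return [[1 if label == c else 0 for c in classes] for label in labels]

train = [[5, 1], [3, 1], [6, 8], [1, 10]]   # toy training rows
test = [[4, 4]]                             # toy test row

mins, ranges = fit_min_max(train)
train_scaled = apply_min_max(train, mins, ranges)
test_scaled = apply_min_max(test, mins, ranges)
y_hot = one_hot([2, 4, 2], classes=[2, 4])  # 2 = benign, 4 = malignant

print(train_scaled)
print(test_scaled)
print(y_hot)
```

Because the test rows are scaled with training statistics, a test value outside the training range would fall outside [0, 1].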


In [None]:
X_train_scaled[1]

<h2 id="modeling">Benchmark using MLPClassifier</h2>

ReLU Activation Function

In [None]:
# 10-fold cross-validation with Neural networks - 2 Layers, with Relu activation
#clf = MLPClassifier(hidden_layer_sizes=(9,9),activation = 'relu', max_iter=1500, solver='lbfgs')
#scores = cross_val_score(clf, X, y, cv=10, scoring='accuracy')
#print(scores)
#print(np.std(scores))
# use average accuracy as an estimate of out-of-sample accuracy
#print(scores.mean())

In [10]:
#Train Model using MLPClassifier function

clf = MLPClassifier(hidden_layer_sizes=(9,9), activation = 'relu', max_iter=1000, solver = 'lbfgs' )
time_start = perf_counter()
clf.fit(X_train_scaled, y_train_hot)
fit_time = perf_counter() - time_start
print(f'Train: fit_time = {fit_time}')

#train_accuracy = accuracy_score(y_train, )
#print (f'train accuracy score = {train_accuracy2}')
#train_accuracy1 = accuracy_score(y_train, yhat_train)
#print (f'train accuracy score = {train_accuracy2}')


Train: fit_time = 0.1795177309999758


In [11]:
time_start = perf_counter()
yhat = clf.predict(X_test_scaled)
fit_time = perf_counter() - time_start
print (f'Test: fit_time = {fit_time}')

# Evaluation
jaccard = jaccard_similarity_score(y_test_hot, yhat)
print("jaccard index: ",jaccard)
print (classification_report(y_test_hot, yhat))


Test: fit_time = 0.0008121580000306494
jaccard index:  0.9512195121951219
              precision    recall  f1-score   support

           0       0.98      0.94      0.96       132
           1       0.90      0.97      0.93        73

   micro avg       0.95      0.95      0.95       205
   macro avg       0.94      0.96      0.95       205
weighted avg       0.95      0.95      0.95       205
 samples avg       0.95      0.95      0.95       205



Sigmoid Activation Function

In [12]:
# Train model using MLPClassifier with the sigmoid (logistic) activation function
clf_sig = MLPClassifier(hidden_layer_sizes=(9,9), activation = 'logistic', max_iter=1000, solver = 'lbfgs' )
time_start = perf_counter()
clf_sig.fit(X_train_scaled, y_train_hot)
fit_time = perf_counter() - time_start
print(f'Train: fit_time = {fit_time}')

#train_accuracy = accuracy_score(y_train, )
#print (f'train accuracy score = {train_accuracy2}')
#train_accuracy1 = accuracy_score(y_train, yhat_train)
#print (f'train accuracy score = {train_accuracy2}')


Train: fit_time = 0.3530422930000441


In [13]:
time_start = perf_counter()
yhat_sig = clf_sig.predict(X_test_scaled)
fit_time = perf_counter() - time_start
print (f'Test: fit_time = {fit_time}')

# Evaluation
jaccard_sig = jaccard_similarity_score(y_test_hot, yhat_sig)
print("jaccard index: ",jaccard_sig)
print (classification_report(y_test_hot, yhat_sig))


Test: fit_time = 0.0009876909999775307
jaccard index:  0.9512195121951219
              precision    recall  f1-score   support

           0       0.98      0.95      0.96       132
           1       0.91      0.96      0.93        73

   micro avg       0.95      0.95      0.95       205
   macro avg       0.94      0.95      0.95       205
weighted avg       0.95      0.95      0.95       205
 samples avg       0.95      0.95      0.95       205
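
For reference, the two activation functions compared in this notebook can be written in a few lines. This is a plain-Python sketch of the standard definitions (scikit-learn calls the sigmoid `'logistic'`); the function names are chosen here for illustration.

```python
import math

def relu(z):
    """Rectified linear unit: pass positive inputs through, clamp the rest to 0."""
    return max(0.0, z)

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)            # z < 0, so exp(z) stays below 1
    return ez / (1.0 + ez)

for z in (-5.0, 0.0, 5.0):
    print(z, relu(z), sigmoid(z))
```

ReLU is unbounded above, while the sigmoid squashes every input into the interval (0, 1).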



<h2 id="modeling">Random Hill Climbing</h2>

Activation Function ReLU

In [14]:
np.random.seed(7)
clf_hill = mlrose.NeuralNetwork(hidden_nodes = [2], activation = 'relu', 
                                algorithm = 'random_hill_climb', 
                                max_iters=1000, bias = True, is_classifier = True, 
                                learning_rate = 0.5, early_stopping = True, clip_max = 5, 
                                max_attempts = 100)
time_start = perf_counter()
clf_hill.fit(X_train_scaled, y_train_hot)
fit_time = perf_counter() - time_start
print(f'fit_time = {fit_time}')

fit_time = 0.7264363169999797
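
As a rough picture of what a random hill climbing search does, the sketch below minimizes a toy two-weight loss. It is not mlrose's implementation: the neighbor rule (perturb one randomly chosen weight), the toy loss, and the function names are assumptions, and the `max_iters` and `max_attempts` arguments only loosely mirror the parameters of the same name used above.

```python
import random

def random_hill_climb(loss, start, step=0.5, max_iters=1000,
                      max_attempts=100, seed=7):
    """Accept a random neighbor only when it lowers the loss."""
    rng = random.Random(seed)
    current = list(start)
    current_loss = loss(current)
    attempts = 0                                   # consecutive failed moves
    for _ in range(max_iters):
        candidate = list(current)
        i = rng.randrange(len(candidate))
        candidate[i] += rng.uniform(-step, step)   # perturb one weight
        candidate_loss = loss(candidate)
        if candidate_loss < current_loss:
            current, current_loss = candidate, candidate_loss
            attempts = 0
        else:
            attempts += 1
            if attempts >= max_attempts:           # give up when stuck
                break
    return current, current_loss

def toy_loss(w):
    """Quadratic bowl with its minimum at w = (1, -2)."""
    return (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2

best_w, best_loss = random_hill_climb(toy_loss, start=[0.0, 0.0])
print(best_w, best_loss)
```

Because only improving moves are accepted, the search can stall in a poor region of the weight space, so results depend on the random seed and starting point.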


In [None]:
#Predict Labels for train set and assess accuracy
#time_start = perf_counter()
#yhat_hill = clf_hill.predict(X_train_scaled)
#fit_time = perf_counter() - time_start
#print (f'fit_time = {fit_time}')
#train_accuracy = accuracy_score(y_train, yhat_hill)


In [15]:
#Predict Labels for test set and assess accuracy
time_start = perf_counter()
yhat_hill_test = clf_hill.predict(X_test_scaled)
fit_time = perf_counter() - time_start
test_accuracy1 = accuracy_score(y_test_hot, yhat_hill_test)
f1 = f1_score(y_test_hot, yhat_hill_test, average='weighted') 
jaccard1 = jaccard_similarity_score(y_test_hot, yhat_hill_test)
print (f'fit_time = {fit_time}')
print (f'accuracy score = {test_accuracy1}')
print("f1 score: ", f1)
print("jaccard index: ",jaccard1)
print (classification_report(y_test_hot, yhat_hill_test))

fit_time = 0.0007461910000756689
accuracy score = 0.9463414634146341
f1 score:  0.9467119102396119
jaccard index:  0.9463414634146341
              precision    recall  f1-score   support

           0       0.98      0.94      0.96       132
           1       0.90      0.96      0.93        73

   micro avg       0.95      0.95      0.95       205
   macro avg       0.94      0.95      0.94       205
weighted avg       0.95      0.95      0.95       205
 samples avg       0.95      0.95      0.95       205
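
In the output above, the accuracy score and the Jaccard index are identical. A likely explanation, assuming the older `jaccard_similarity_score` averages a per-sample Jaccard coefficient for multilabel input, is that a one-hot prediction row is either exactly right (coefficient 1) or entirely wrong (coefficient 0), so the average equals the fraction of exactly matching rows. The sketch below checks this on toy data; both helper functions are written here for illustration and are not scikit-learn code.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of rows whose label vectors match exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def samplewise_jaccard(y_true, y_pred):
    """Mean over rows of |intersection| / |union| of the active labels."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        inter = sum(1 for a, b in zip(t, p) if a == 1 and b == 1)
        union = sum(1 for a, b in zip(t, p) if a == 1 or b == 1)
        # An empty union is scored as 1.0 here (an assumption of this sketch).
        total += inter / union if union else 1.0
    return total / len(y_true)

y_true = [[1, 0], [1, 0], [0, 1], [0, 1]]
y_pred = [[1, 0], [0, 1], [0, 1], [0, 1]]   # second row is wrong

print(subset_accuracy(y_true, y_pred))
print(samplewise_jaccard(y_true, y_pred))
```

The two numbers would differ if a prediction row could contain zero or several active labels, which one-hot predictions never do.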



Activation Function Sigmoid

In [16]:
np.random.seed(7)
clf_hill_sig = mlrose.NeuralNetwork(hidden_nodes = [2], activation = 'sigmoid', 
                                algorithm = 'random_hill_climb', 
                                max_iters=1000, bias = True, is_classifier = True, 
                                learning_rate = 0.5, early_stopping = True, clip_max = 5, 
                                max_attempts = 100)
time_start = perf_counter()
clf_hill_sig.fit(X_train_scaled, y_train_hot)
fit_time = perf_counter() - time_start
print(f'fit_time = {fit_time}')

fit_time = 1.272307039999987


In [17]:
#Predict Labels for test set and assess accuracy
time_start = perf_counter()
yhat_hill_test_sig = clf_hill_sig.predict(X_test_scaled)
fit_time = perf_counter() - time_start
test_accuracy2 = accuracy_score(y_test_hot, yhat_hill_test_sig)
f2 = f1_score(y_test_hot, yhat_hill_test_sig, average='weighted') 
jaccard2 = jaccard_similarity_score(y_test_hot, yhat_hill_test_sig)
print (f'fit_time = {fit_time}')
print (f'accuracy score = {test_accuracy2}')
print("f1 score: ", f2)
print("jaccard index: ",jaccard2)
print (classification_report(y_test_hot, yhat_hill_test_sig))

fit_time = 0.0008163919999333302
accuracy score = 0.6439024390243903
f1 score:  0.504422088731273
jaccard index:  0.6439024390243903
              precision    recall  f1-score   support

           0       0.64      1.00      0.78       132
           1       0.00      0.00      0.00        73

   micro avg       0.64      0.64      0.64       205
   macro avg       0.32      0.50      0.39       205
weighted avg       0.41      0.64      0.50       205
 samples avg       0.64      0.64      0.64       205
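
The report above is consistent with a model that assigns every test sample to class 0: recall is 1.00 for class 0 and 0.00 for class 1. The arithmetic below, a small check using only the support counts from the report, reproduces the printed accuracy and weighted F1 score under that assumption.

```python
support = {0: 132, 1: 73}          # test-set support per class, from the report
n = sum(support.values())          # 205 test samples

# If every sample is predicted as class 0, only the 132 class-0 samples are correct.
accuracy_all_zero = support[0] / n

precision_0 = support[0] / n       # 132 correct out of 205 predicted as class 0
recall_0 = 1.0                     # every true class-0 sample is recovered
f1_0 = 2 * precision_0 * recall_0 / (precision_0 + recall_0)
f1_1 = 0.0                         # class 1 is never predicted

# Weighted F1 averages the per-class F1 scores by support.
weighted_f1 = (f1_0 * support[0] + f1_1 * support[1]) / n

print(accuracy_all_zero)
print(weighted_f1)
```

An accuracy of about 0.64 here is therefore just the share of the majority class in the test set, not evidence that the network learned anything useful.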





<h2 id="modeling">Simulated Annealing</h2>

Activation Function ReLU

In [18]:
np.random.seed(7)
clf_sim = mlrose.NeuralNetwork(hidden_nodes = [2], activation = 'relu', 
                                algorithm = 'simulated_annealing', 
                                max_iters=1000, bias = True, is_classifier = True, 
                                learning_rate = 0.5, early_stopping = True, clip_max = 5, 
                                max_attempts = 100)
time_start = perf_counter()
clf_sim.fit(X_train_scaled, y_train_hot)
fit_time = perf_counter() - time_start
print(f'fit_time = {fit_time}')

fit_time = 1.679088254999897
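
Simulated annealing differs from hill climbing in that it sometimes accepts a worse candidate, with a probability that shrinks as a temperature parameter decays. The sketch below illustrates this on a toy loss. It is not mlrose's implementation: the geometric decay schedule, its constants, the neighbor rule, and all function names are assumptions.

```python
import math
import random

def simulated_annealing(loss, start, step=0.5, init_temp=1.0, decay=0.99,
                        min_temp=0.001, max_iters=1000, seed=7):
    """Minimize loss, accepting uphill moves with probability exp(-delta / T)."""
    rng = random.Random(seed)
    current = list(start)
    current_loss = loss(current)
    best, best_loss = list(current), current_loss
    temp = init_temp
    for _ in range(max_iters):
        candidate = list(current)
        i = rng.randrange(len(candidate))
        candidate[i] += rng.uniform(-step, step)    # perturb one weight
        candidate_loss = loss(candidate)
        delta = candidate_loss - current_loss
        # Always accept improvements; accept worse moves with a temperature-dependent chance.
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            current, current_loss = candidate, candidate_loss
            if current_loss < best_loss:
                best, best_loss = list(current), current_loss
        temp = max(temp * decay, min_temp)          # geometric cooling with a floor
    return best, best_loss

def toy_loss(w):
    """Quadratic bowl with its minimum at w = (1, -2)."""
    return (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2

best_w, best_loss = simulated_annealing(toy_loss, start=[0.0, 0.0])
print(best_w, best_loss)
```

Early on, the high temperature lets the search escape poor regions; as the temperature falls, it behaves more and more like plain hill climbing.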


In [19]:
#Predict Labels for test set and assess accuracy
time_start = perf_counter()
yhat_sim_test = clf_sim.predict(X_test_scaled)
fit_time = perf_counter() - time_start
test_accuracy3 = accuracy_score(y_test_hot, yhat_sim_test)
f3 = f1_score(y_test_hot, yhat_sim_test, average='weighted') 
jaccard3 = jaccard_similarity_score(y_test_hot, yhat_sim_test)
print (f'fit_time = {fit_time}')
print (f'test accuracy score = {test_accuracy3}')
print("f1 score: ", f3)
print("jaccard index: ",jaccard3)
print (classification_report(y_test_hot, yhat_sim_test))

fit_time = 0.0007612040000140041
test accuracy score = 0.9365853658536586
f1 score:  0.9370231666468138
jaccard index:  0.9365853658536586
              precision    recall  f1-score   support

           0       0.97      0.93      0.95       132
           1       0.88      0.95      0.91        73

   micro avg       0.94      0.94      0.94       205
   macro avg       0.93      0.94      0.93       205
weighted avg       0.94      0.94      0.94       205
 samples avg       0.94      0.94      0.94       205



Activation Function Sigmoid

In [20]:
#np.random.seed(7)
clf_sim_sig = mlrose.NeuralNetwork(hidden_nodes = [2], activation = 'sigmoid', 
                                algorithm = 'simulated_annealing', 
                                max_iters=1000, bias = True, is_classifier = True, 
                                learning_rate = 0.5, early_stopping = True, clip_max = 5, 
                                max_attempts = 100)
time_start = perf_counter()
clf_sim_sig.fit(X_train_scaled, y_train_hot)
fit_time = perf_counter() - time_start
print(f'fit_time = {fit_time}')

fit_time = 1.7124829910000017


In [21]:
#Predict Labels for test set and assess accuracy
time_start = perf_counter()
yhat_sim_test_sig = clf_sim_sig.predict(X_test_scaled)
fit_time = perf_counter() - time_start
test_accuracy4 = accuracy_score(y_test_hot, yhat_sim_test_sig)
f4 = f1_score(y_test_hot, yhat_sim_test_sig, average='weighted') 
jaccard4 = jaccard_similarity_score(y_test_hot, yhat_sim_test_sig)
print (f'fit_time = {fit_time}')
print (f'test accuracy score = {test_accuracy4}')
print("f1 score: ", f4)
print("jaccard index: ",jaccard4)
print (classification_report(y_test_hot, yhat_sim_test_sig))

fit_time = 0.0005778360000476823
test accuracy score = 0.9365853658536586
f1 score:  0.9370231666468138
jaccard index:  0.9365853658536586
              precision    recall  f1-score   support

           0       0.97      0.93      0.95       132
           1       0.88      0.95      0.91        73

   micro avg       0.94      0.94      0.94       205
   macro avg       0.93      0.94      0.93       205
weighted avg       0.94      0.94      0.94       205
 samples avg       0.94      0.94      0.94       205



<h2 id="modeling">Genetic Algorithm</h2>

In [22]:
#np.random.seed(7)
clf_gen = mlrose.NeuralNetwork(hidden_nodes = [2], activation = 'relu', 
                                algorithm = 'genetic_alg', 
                                max_iters=1000, bias = True, is_classifier = True, 
                                learning_rate = 0.5, early_stopping = True, clip_max = 5, 
                                max_attempts = 100)
time_start = perf_counter()
clf_gen.fit(X_train_scaled, y_train_hot)
fit_time = perf_counter() - time_start
print(f'fit_time = {fit_time}')

fit_time = 44.642368153999996
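
A genetic algorithm keeps a whole population of candidate weight vectors and evaluates the loss for each of them in every generation, which is one plausible reason the fit above takes far longer than the single-candidate searches. The sketch below is a minimal illustration on a toy loss, not mlrose's implementation: truncation selection, one-point crossover, Gaussian mutation, and every parameter value are assumptions chosen for brevity.

```python
import random

def genetic_algorithm(loss, n_weights, pop_size=20, generations=50,
                      mutation_prob=0.1, seed=7):
    """Evolve a population of weight vectors toward lower loss."""
    rng = random.Random(seed)
    population = [[rng.uniform(-5, 5) for _ in range(n_weights)]
                  for _ in range(pop_size)]
    best = min(population, key=loss)
    for _ in range(generations):
        ranked = sorted(population, key=loss)
        parents = ranked[: pop_size // 2]            # keep the better half as parents
        children = []
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_weights) if n_weights > 1 else 0
            child = mother[:cut] + father[cut:]      # one-point crossover
            # Mutate each weight independently with probability mutation_prob.
            child = [w + rng.gauss(0, 0.5) if rng.random() < mutation_prob else w
                     for w in child]
            children.append(child)
        population = children
        generation_best = min(population, key=loss)
        if loss(generation_best) < loss(best):       # remember the best ever seen
            best = generation_best
    return best, loss(best)

def toy_loss(w):
    """Quadratic bowl with its minimum at w = (1, -2)."""
    return (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2

best_w, best_loss = genetic_algorithm(toy_loss, n_weights=2)
print(best_w, best_loss)
```

With these settings, each generation costs `pop_size` loss evaluations for selection alone, so the run time grows with both the population size and the number of generations.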


In [23]:
#Predict Labels for test set and assess accuracy
time_start = perf_counter()
yhat_gen_test = clf_gen.predict(X_test_scaled)
fit_time = perf_counter() - time_start
test_accuracy5 = accuracy_score(y_test_hot, yhat_gen_test)
f5 = f1_score(y_test_hot, yhat_gen_test, average='weighted') 
jaccard5 = jaccard_similarity_score(y_test_hot, yhat_gen_test)
print (f'fit_time = {fit_time}')
print (f'accuracy score = {test_accuracy5}')
print("f1 score: ", f5)
print("jaccard index: ",jaccard5)

print (classification_report(y_test_hot, yhat_gen_test))

fit_time = 0.000678228999959174
accuracy score = 0.9512195121951219
f1 score:  0.9514946841776111
jaccard index:  0.9512195121951219
              precision    recall  f1-score   support

           0       0.98      0.95      0.96       132
           1       0.91      0.96      0.93        73

   micro avg       0.95      0.95      0.95       205
   macro avg       0.94      0.95      0.95       205
weighted avg       0.95      0.95      0.95       205
 samples avg       0.95      0.95      0.95       205



Sigmoid Activation Function

In [24]:
#np.random.seed(7)
clf_gen_sig = mlrose.NeuralNetwork(hidden_nodes = [2], activation = 'sigmoid', 
                                algorithm = 'genetic_alg', 
                                max_iters=1000, bias = True, is_classifier = True, 
                                learning_rate = 0.5, early_stopping = True, clip_max = 5, 
                                max_attempts = 100)
time_start = perf_counter()
clf_gen_sig.fit(X_train_scaled, y_train_hot)
fit_time = perf_counter() - time_start
print(f'fit_time = {fit_time}')

fit_time = 21.83741524800007


In [25]:
#Predict Labels for test set and assess accuracy
time_start = perf_counter()
yhat_gen_test_sig = clf_gen_sig.predict(X_test_scaled)
fit_time = perf_counter() - time_start
test_accuracy6 = accuracy_score(y_test_hot, yhat_gen_test_sig)
f6 = f1_score(y_test_hot, yhat_gen_test_sig, average='weighted') 
jaccard6 = jaccard_similarity_score(y_test_hot, yhat_gen_test_sig)
print (f'fit_time = {fit_time}')
print (f'accuracy score = {test_accuracy6}')
print("f1 score: ", f6)
print("jaccard index: ",jaccard6)

print (classification_report(y_test_hot, yhat_gen_test_sig))

fit_time = 0.0008670600000186823
accuracy score = 0.9463414634146341
f1 score:  0.9468381879973526
jaccard index:  0.9463414634146341
              precision    recall  f1-score   support

           0       0.98      0.93      0.96       132
           1       0.89      0.97      0.93        73

   micro avg       0.95      0.95      0.95       205
   macro avg       0.94      0.95      0.94       205
weighted avg       0.95      0.95      0.95       205
 samples avg       0.95      0.95      0.95       205

