# Binary Classification Of Sonar

# Baseline Neural Network Model Performance


In [None]:
Now we can load the dataset using Pandas and split the columns into 60 input variables (X)
and 1 output variable (Y ). We use Pandas to load the data because it easily handles strings
(the output variable), whereas attempting to load the data directly using NumPy would be
more difficult.

# load dataset
dataframe = pandas.read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]


    
The output variable is string values. We must convert them into integer values 0 and 1. We
can do this using the LabelEncoder class from scikit-learn. This class will model the encoding
required using the entire dataset via the fit() function, then apply the encoding to create a
new output variable using the transform() function.

# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)

We are now ready to create our neural network model using Keras. We are going to use
scikit-learn to evaluate the model using stratified k-fold cross validation. This is a resampling
technique that will provide an estimate of the performance of the model. To use Keras models
with scikit-learn, we must use the KerasClassifier wrapper. This class takes a function that
creates and returns our neural network model. It also takes arguments that it will pass along to
the call to fit() such as the number of epochs and the batch size. Let’s start o↵ by defining the
function that creates our baseline model. Our model will have a single fully connected hidden
layer with the same number of neurons as input variables. This is a good default starting point
when creating neural networks on a new problem.
The weights are initialized using a small Gaussian random number. The Rectifier activation
function is used. The output layer contains a single neuron in order to make predictions. It
uses the sigmoid activation function in order to produce a probability output in the range of
0 to 1 that can easily and automatically be converted to crisp class values. Finally, we are
using the logarithmic loss function (binary crossentropy) during training, the preferred loss
function for binary classification problems. The model also uses the efficient Adam optimization
algorithm for gradient descent and accuracy metrics will be collected when the model is trained.


Now it is time to evaluate this model using stratified cross validation in the scikit-learn
framework. We pass the number of training epochs to the KerasClassifier, again using
reasonable default values. Verbose output is also turned o↵ given that the model will be created
10 times for the 10-fold cross validation being performed.



In [18]:
import warnings
warnings.filterwarnings('ignore', '.*do not.*', )
warnings.warn('DelftStack')
warnings.warn('Do not show this message')

# Binary Classification with Sonar Dataset: Baseline
import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load dataset
dataframe = pandas.read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# baseline model
def create_baseline():
    # create model
    model = Sequential()
    model.add(Dense(60, input_dim=60, kernel_initializer='random_normal', activation= "relu" ))
    model.add(Dense(1,kernel_initializer='random_normal', activation= "sigmoid"))
    # Compile model
    model.compile(loss= "binary_crossentropy" , optimizer= "adam" , metrics=["accuracy"])
    return model
# evaluate model with standardized dataset
estimator = KerasClassifier(build_fn=create_baseline, nb_epoch=100, batch_size=5, verbose=0)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(estimator, X, encoded_Y, cv=kfold)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

  estimator = KerasClassifier(build_fn=create_baseline, nb_epoch=100, batch_size=5, verbose=0)


Baseline: 53.88% (2.55%)


# Improve Performance With Data Preparation

In [30]:
# Binary Classification with Sonar Dataset: Standardized
import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load dataset
dataframe = pandas.read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# baseline model
def create_baseline():
    # create model
    model = Sequential()
    model.add(Dense(60, input_dim=60, kernel_initializer='random_normal', activation= "relu"))
    model.add(Dense(1, kernel_initializer='random_normal', activation= "sigmoid" ))
    # Compile model
    model.compile(loss= "binary_crossentropy" , optimizer= "adam", metrics=["accuracy" ])
    return model
# evaluate baseline model with standardized dataset
numpy.random.seed(seed)
estimators = []
estimators.append(("standardize" , StandardScaler()))
estimators.append(("mlp" , KerasClassifier(build_fn=create_baseline, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Standardized: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

  estimators.append(("mlp" , KerasClassifier(build_fn=create_baseline, epochs=100,


Standardized: 87.02% (6.95%)


In [None]:
https://github.com/adriangb/scikeras/issues/112

In [36]:
# Binary Classification with Sonar Dataset: Standardized Larger and increased epochs
import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load dataset
dataframe = pandas.read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# larger model
def create_larger():
    # create model
    model = Sequential()
    model.add(Dense(60, input_dim=60, kernel_initializer='random_normal', activation= "relu"))
    model.add(Dense(30, kernel_initializer='random_normal', activation= "relu"))
    model.add(Dense(1, kernel_initializer='random_normal', activation= "sigmoid" ))
    # Compile model
    model.compile(loss= "binary_crossentropy" , optimizer= "adam", metrics=["accuracy" ])
    return model
numpy.random.seed(seed)
estimators = []
estimators.append(("standardize" , StandardScaler()))
estimators.append(("mlp" , KerasClassifier(build_fn=create_baseline, epochs=400,
batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Larger: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

  estimators.append(("mlp" , KerasClassifier(build_fn=create_baseline, epochs=400,


Larger: 85.07% (7.75%)


In [16]:
import keras
print(keras.__version__)

2.7.0


In [17]:
import tensorflow as tf
print(tf.__version__)

2.7.0
