### Binary Classification Sonar Project for the Navy - Mines vs. Rocks - Dropout Regularization

##### Step 1 - Description of the Dataset

This dataset describes sonar chirp returns bouncing off different surfaces. The 60 input variables are the strength of the returns at different angles. It is a binary classification problem that requires a model to differentiate rocks from metal cylinders (mines).

##### Step 2 - Dropout Regularization For Neural Networks
Dropout is a technique where randomly selected neurons are ignored during training. They are “dropped out” at random. This means that their contribution to the activation of downstream neurons is temporarily removed on the forward pass, and no weight updates are applied to them on the backward pass. At prediction time all neurons are used.
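
The mechanism can be sketched in a few lines of NumPy. This is a minimal sketch of inverted dropout, the variant the Keras Dropout layer implements: during training each unit is zeroed with probability rate and the surviving units are scaled by 1 / (1 - rate), so the expected activation is unchanged and no rescaling is needed at prediction time. The helper name dropout_forward is hypothetical and not part of Keras.

```python
import numpy as np

def dropout_forward(activations, rate, training, rng):
    """Hypothetical helper: inverted dropout on a batch of activations."""
    if not training or rate == 0.0:
        # At prediction time every unit is kept and nothing is rescaled
        return activations
    keep_prob = 1.0 - rate
    # Each unit survives independently with probability keep_prob
    mask = rng.random(activations.shape) < keep_prob
    # Zero the dropped units and scale the survivors so the expected value is unchanged
    return activations * mask / keep_prob

rng = np.random.default_rng(7)
x = np.ones((1000, 60))  # e.g. a batch of 1000 samples with 60 inputs each
y_train = dropout_forward(x, rate=0.2, training=True, rng=rng)
y_test = dropout_forward(x, rate=0.2, training=False, rng=rng)

print(f"fraction dropped: {np.mean(y_train == 0):.3f}")   # close to 0.2
print(f"mean activation (train): {y_train.mean():.3f}")   # close to 1.0
print(f"identical at prediction: {np.array_equal(y_test, x)}")  # True
```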

##### Step 3 - We will start off by importing all of the classes and functions we will need:

In [6]:
# Import libraries
import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
# keras.wrappers.scikit_learn only exists in older Keras/TensorFlow releases;
# newer releases need the separate SciKeras package for a KerasClassifier wrapper
from keras.wrappers.scikit_learn import KerasClassifier
from keras.constraints import max_norm
from keras.optimizers import SGD
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

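##### Step 3 (continued) - Next, we seed NumPy's random number generator so that repeated runs of this code give more consistent results, which helps when debugging. Note that this alone does not make Keras training fully reproducible, because the Keras backend (for example TensorFlow) draws from its own random number generator: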

In [7]:
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
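
Because numpy.random.seed only seeds NumPy, a fuller setup usually seeds Python's random module and the backend as well. A minimal sketch, assuming TensorFlow is the Keras backend (tf.random.set_seed is TensorFlow's seeding call); the helper set_all_seeds is hypothetical, and even with every seed set some GPU operations can remain nondeterministic.

```python
import random
import numpy

def set_all_seeds(seed):
    """Hypothetical helper: seed Python, NumPy and (if installed) TensorFlow."""
    random.seed(seed)
    numpy.random.seed(seed)
    try:
        import tensorflow as tf
    except ImportError:
        # No TensorFlow available: only Python and NumPy are seeded
        return
    tf.random.set_seed(seed)

seed = 7
set_all_seeds(seed)
first = numpy.random.rand(3)
set_all_seeds(seed)
second = numpy.random.rand(3)
print(numpy.array_equal(first, second))  # True: same seed, same NumPy draws
```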


##### Step 3 (continued) - Now we can load the dataset using pandas and split the columns into 60 input variables (X) and 1 output variable (Y). We use pandas to load the data because it easily handles strings (the output variable), whereas loading the data directly with NumPy would be more difficult.

In [8]:
# load Sonar dataset using Pandas
dataframe = pandas.read_csv("sonar.csv", header=None)
dataset = dataframe.values

# columns 0-59 are the 60 sonar features; column 60 holds the 'M'/'R' label
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]

In [9]:
# Shape of Data

dataset.shape

(208, 61)

##### Step 3 (continued) - The output variable holds strings ('M' for mine, 'R' for rock), so it must be encoded as integers before training. We can do this using the LabelEncoder class from scikit-learn. This class learns the encoding from the entire dataset via the fit() function and then applies it to create a new output variable via the transform() function; fit_transform() does both in one call.

In [10]:
# Encode the string labels as integers; LabelEncoder sorts the classes, so 'M' -> 0 and 'R' -> 1
encoder = LabelEncoder()
encoded_Y = encoder.fit_transform(Y)
encoded_Y

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0], dtype=int64)
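
LabelEncoder assigns integer codes in the sorted order of the class labels, which is why the rocks ('R') at the start of the file are encoded as 1 and the mines ('M') as 0. A minimal NumPy sketch of the same mapping, using a label array rebuilt from the counts visible in the output above (97 'R' rows followed by 111 'M' rows). It also computes the accuracy of always predicting the most frequent class, a useful floor when judging the cross-validation scores that follow.

```python
import numpy

# Labels rebuilt from the encoded output above: 97 rocks ('R') then 111 mines ('M')
labels = numpy.array(['R'] * 97 + ['M'] * 111)

# numpy.unique sorts the labels; return_inverse gives each row's index into `classes`
classes, codes, counts = numpy.unique(labels, return_inverse=True, return_counts=True)
print(classes)  # ['M' 'R']  -> 'M' is 0, 'R' is 1
print(counts)   # [111  97]

# Accuracy of a classifier that always predicts the most frequent class
majority_baseline = counts.max() / counts.sum()
print(f"majority-class accuracy: {majority_baseline:.2%}")  # 53.37%
```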

##### Step 3 (continued) - Defining the function that creates our baseline model

In [11]:
# baseline model
def create_baseline():
    # create model: 60 inputs -> 60 ReLU units -> 30 ReLU units -> 1 sigmoid output
    model = Sequential()
    model.add(Dense(60, input_dim=60, kernel_initializer='normal', activation='relu'))
    model.add(Dense(30, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))

    # Compile model: SGD with momentum, binary cross-entropy for the 0/1 target
    sgd = SGD(learning_rate=0.01, momentum=0.8, nesterov=False)
    model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])

    return model
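
To get a sense of the model's size relative to the 208 available samples, the trainable parameters of a stack of fully connected layers can be counted by hand: a Dense layer with n_in inputs and n_out units has n_in * n_out weights plus n_out biases, and Dropout layers add none. A small sketch; the helper dense_param_count is hypothetical (Keras reports the same totals via model.summary()).

```python
def dense_param_count(layer_sizes):
    """Hypothetical helper: trainable parameters of a fully connected stack.

    layer_sizes lists the input width followed by each Dense layer's units.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weights + biases
    return total

baseline = dense_param_count([60, 60, 30, 1])    # create_baseline()
larger = dense_param_count([60, 60, 60, 30, 1])  # dropout_model_larger() in Step 8.2
print(baseline, larger)  # 5521 9181
```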


In [12]:
numpy.random.seed(seed)
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(build_fn=create_baseline, epochs=300, batch_size=16, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Baseline: 82.14% (7.30%)
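
With 208 samples and n_splits=10, each test fold holds only about 20 or 21 samples, so a single extra misclassification moves that fold's accuracy by roughly 5 percentage points. That helps explain the standard deviations of roughly 5 to 8 percentage points reported for most configurations in this notebook, and suggests treating differences of a point or two between configurations with caution. The arithmetic, as a quick sketch of a plain even split (StratifiedKFold's folds are approximately, not necessarily exactly, these sizes):

```python
n_samples, n_splits = 208, 10

# Split 208 samples as evenly as possible into 10 test folds
base, extra = divmod(n_samples, n_splits)
fold_sizes = [base + 1] * extra + [base] * (n_splits - extra)
print(fold_sizes)  # [21, 21, 21, 21, 21, 21, 21, 21, 20, 20]

# Change in a fold's accuracy caused by one misclassified sample
for size in sorted(set(fold_sizes)):
    print(f"fold of {size}: one error = {100 / size:.2f} percentage points")
```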


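##### Step 4 - Using Dropout on the visible layer

Besides a 20% Dropout layer applied directly to the 60 inputs, this model changes a few other settings relative to the baseline: the hidden layers get a max-norm weight constraint of 3 (kernel_constraint=max_norm(3)), and SGD uses a learning rate of 0.1 and momentum of 0.9 instead of 0.01 and 0.8.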

In [33]:
# create model
def create_model():
    # Dropout of 20% on the 60 visible (input) units
    model = Sequential()
    model.add(Dropout(0.2, input_shape=(60,)))
    model.add(Dense(60, kernel_constraint=max_norm(3), kernel_initializer='normal', activation='relu'))
    model.add(Dense(30, kernel_constraint=max_norm(3), kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))

    # Compile model: higher learning rate and momentum than the baseline
    sgd = SGD(learning_rate=0.1, momentum=0.9, nesterov=False)
    model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])

    return model
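
The kernel_constraint=max_norm(3) argument rescales, after each weight update, every hidden unit's incoming weight vector whose L2 norm exceeds 3 back down to norm 3. A NumPy sketch of that projection, assuming a Dense kernel of shape (inputs, units) so that each column belongs to one unit (the Keras MaxNorm constraint uses axis=0 by default for this reason); apply_max_norm is a hypothetical name.

```python
import numpy as np

def apply_max_norm(kernel, max_value=3.0, eps=1e-7):
    """Hypothetical helper mirroring a max-norm constraint on a Dense kernel.

    kernel has shape (inputs, units); each column is one unit's incoming weights.
    """
    norms = np.sqrt(np.sum(kernel ** 2, axis=0, keepdims=True))
    # Columns already within the limit keep (almost exactly) their norm
    desired = np.clip(norms, 0.0, max_value)
    return kernel * (desired / (eps + norms))

rng = np.random.default_rng(0)
kernel = rng.normal(scale=1.0, size=(60, 30))  # column norms around sqrt(60), about 7.7
constrained = apply_max_norm(kernel, max_value=3.0)
print(np.linalg.norm(kernel, axis=0).max())       # well above 3
print(np.linalg.norm(constrained, axis=0).max())  # just under 3
```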


In [34]:
numpy.random.seed(seed)
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(build_fn=create_model, epochs=300, batch_size=16, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Visible: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Visible: 84.66% (5.82%)


##### Step 5 - Trying to Improve Performance with More Epochs

In [29]:
numpy.random.seed(seed)
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(build_fn=create_model, epochs=500, batch_size=16, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Improved Model : %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Improved Model : 85.61% (6.34%)


##### Step 6 - Using Dropout on hidden layers

In [35]:
# Dropout model
def dropout_model():
    # Dropout of 20% after each hidden layer
    model = Sequential()
    model.add(Dense(60, input_dim=60, kernel_constraint=max_norm(3), kernel_initializer='normal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(30, kernel_constraint=max_norm(3), kernel_initializer='normal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))

    # Compile model
    sgd = SGD(learning_rate=0.1, momentum=0.9, nesterov=False)
    model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])

    return model

In [36]:
numpy.random.seed(seed)
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(build_fn=dropout_model, epochs=300, batch_size=16, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Hidden: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Hidden: 83.16% (5.35%)


##### Step 7 - Trying to Improve Performance with More Epochs

In [38]:
numpy.random.seed(seed)
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(build_fn=dropout_model, epochs=350, batch_size=16, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Improved Model : %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Improved Model : 85.52% (7.23%)


##### Step 8.1 - Try Different Dropout values

In [39]:
# Dropout model with a different dropout rate
def dropout_model_With_Diff_Value():
    # Dropout of 30% after each hidden layer
    model = Sequential()
    model.add(Dense(60, input_dim=60, kernel_constraint=max_norm(3), kernel_initializer='normal', activation='relu'))
    model.add(Dropout(0.3))
    model.add(Dense(30, kernel_constraint=max_norm(3), kernel_initializer='normal', activation='relu'))
    model.add(Dropout(0.3))
    model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))

    # Compile model
    sgd = SGD(learning_rate=0.1, momentum=0.9, nesterov=False)
    model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])

    return model

In [41]:
numpy.random.seed(seed)
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(build_fn=dropout_model_With_Diff_Value, epochs=350, batch_size=16, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Dropout Model With 30 percent : %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Dropout Model With 30 percent : 84.11% (5.38%)


##### Step 8.2 - Try Using a Larger Network

In [42]:
# Larger model: an extra 60-unit hidden layer, with dropout after every hidden layer
def dropout_model_larger():
    model = Sequential()
    model.add(Dense(60, input_dim=60, kernel_constraint=max_norm(3), kernel_initializer='normal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(60, kernel_constraint=max_norm(3), kernel_initializer='normal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(30, kernel_constraint=max_norm(3), kernel_initializer='normal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))

    # Compile model
    sgd = SGD(learning_rate=0.1, momentum=0.9, nesterov=False)
    model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])

    return model

In [43]:
numpy.random.seed(seed)
estimators = []
estimators.append(('standardize', StandardScaler()))
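# The recorded output below came from a run that passed dropout_model_With_Diff_Value here;
# dropout_model_larger is the model this step is meant to evaluate
estimators.append(('mlp', KerasClassifier(build_fn=dropout_model_larger, epochs=350, batch_size=16, verbose=0)))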
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Dropout Model Larger network : %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Dropout Model Larger network : 86.45% (8.15%)


##### Step 8.3 - Try using Dropout on both visible and hidden units

In [11]:
# Dropout model with visible and hidden dropout
def dropout_model_visible_and_hidden():
    # Dropout of 20% on the inputs and after each hidden layer
    model = Sequential()
    model.add(Dropout(0.2, input_shape=(60,)))
    model.add(Dense(60, kernel_constraint=max_norm(3), kernel_initializer='normal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(30, kernel_constraint=max_norm(3), kernel_initializer='normal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))

    # Compile model
    sgd = SGD(learning_rate=0.1, momentum=0.9, nesterov=False)
    model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])

    return model

In [12]:
numpy.random.seed(seed)
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(build_fn=dropout_model_visible_and_hidden, epochs=500, batch_size=16, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Dropout Model visible and hidden : %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Dropout Model visible and hidden : 85.52% (5.49%)


##### Step 8.4 - Try using a larger momentum (0.99 instead of 0.9; the learning rate stays at 0.1 and no decay is applied)

In [13]:
# Larger momentum (0.99); learning rate unchanged at 0.1
def larger_rt_and_larger_momentum():
    # Same architecture as dropout_model()
    model = Sequential()
    model.add(Dense(60, input_dim=60, kernel_constraint=max_norm(3), kernel_initializer='normal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(30, kernel_constraint=max_norm(3), kernel_initializer='normal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))

    # Compile model
    sgd = SGD(learning_rate=0.1, momentum=0.99, nesterov=False)
    model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])

    return model
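
Compared with Step 6, this model only changes the momentum from 0.9 to 0.99. With momentum m and a steady gradient, classical momentum SGD settles into steps of size lr / (1 - m), so at lr=0.1 the effective step grows from about 1 to about 10; a step that large is one plausible reason this configuration lands near 50% accuracy. A sketch of the geometric-series arithmetic, assuming the classical (non-Nesterov) update v = m * v - lr * g followed by w = w + v; effective_step is a hypothetical helper.

```python
def effective_step(lr, momentum, n_steps=5000):
    """Size of the momentum SGD update after n_steps with a constant gradient of 1."""
    velocity = 0.0
    for _ in range(n_steps):
        # Classical momentum: v <- momentum * v - lr * grad, then w <- w + v
        velocity = momentum * velocity - lr * 1.0
    return abs(velocity)

for momentum in (0.9, 0.99):
    simulated = effective_step(0.1, momentum)
    closed_form = 0.1 / (1 - momentum)
    print(f"momentum={momentum}: simulated {simulated:.3f}, lr/(1-momentum) = {closed_form:.3f}")
```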

In [14]:
numpy.random.seed(seed)
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(build_fn=larger_rt_and_larger_momentum, epochs=350, batch_size=16, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Larger momentum : %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Larger momentum : 50.43% (3.58%)


##### Step 8.5 - Try changing the max-norm constraint on the network weights (5 instead of 3, momentum still 0.99)

In [15]:
# Larger momentum with a max-norm constraint of 5
def larger_rt_and_larger_momentum_cons_size():
    # Same architecture as dropout_model(), but max_norm(5) on the hidden layers
    model = Sequential()
    model.add(Dense(60, input_dim=60, kernel_constraint=max_norm(5), kernel_initializer='normal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(30, kernel_constraint=max_norm(5), kernel_initializer='normal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))

    # Compile model
    sgd = SGD(learning_rate=0.1, momentum=0.99, nesterov=False)
    model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])

    return model

In [16]:
numpy.random.seed(seed)
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(build_fn=larger_rt_and_larger_momentum_cons_size, epochs=350, batch_size=16, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Larger momentum, max_norm(5) : %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Larger momentum, max_norm(5) : 51.00% (3.46%)
