# Classification of Mines and Rocks using Sonar Data #

## Preface ##

This is an experiment with Deep Learning, to develop my understanding of how to best use it to solve problems.

The dataset can be found [here](http://archive.ics.uci.edu/ml/datasets/Connectionist+Bench+(Sonar,+Mines+vs.+Rocks)).

In [3]:
import numpy as np
import pandas as pd

df = pd.read_csv("./data/sonar.csv", header=None)
df.head()

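| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0.02 | 0.0371 | 0.0428 | 0.0207 | 0.0954 | 0.0986 | 0.1539 | 0.1601 | 0.3109 | 0.2111 | ... | 0.0027 | 0.0065 | 0.0159 | 0.0072 | 0.0167 | 0.018 | 0.0084 | 0.009 | 0.0032 | R |
| 1 | 0.0453 | 0.0523 | 0.0843 | 0.0689 | 0.1183 | 0.2583 | 0.2156 | 0.3481 | 0.3337 | 0.2872 | ... | 0.0084 | 0.0089 | 0.0048 | 0.0094 | 0.0191 | 0.014 | 0.0049 | 0.0052 | 0.0044 | R |
| 2 | 0.0262 | 0.0582 | 0.1099 | 0.1083 | 0.0974 | 0.228 | 0.2431 | 0.3771 | 0.5598 | 0.6194 | ... | 0.0232 | 0.0166 | 0.0095 | 0.018 | 0.0244 | 0.0316 | 0.0164 | 0.0095 | 0.0078 | R |
| 3 | 0.01 | 0.0171 | 0.0623 | 0.0205 | 0.0205 | 0.0368 | 0.1098 | 0.1276 | 0.0598 | 0.1264 | ... | 0.0121 | 0.0036 | 0.015 | 0.0085 | 0.0073 | 0.005 | 0.0044 | 0.004 | 0.0117 | R |
| 4 | 0.0762 | 0.0666 | 0.0481 | 0.0394 | 0.059 | 0.0649 | 0.1209 | 0.2467 | 0.3564 | 0.4459 | ... | 0.0031 | 0.0054 | 0.0105 | 0.011 | 0.0015 | 0.0072 | 0.0048 | 0.0107 | 0.0094 | R |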


### Understanding the Data ###

Each row in the dataset represents one detected element (either a rock or mine).

The first 60 columns (0-59) are the energies "within a particular frequency band, integrated over a certain period of time."

#### Frequency what? ####

From what I understand, the frequency domain describes how a signal's energy is distributed across frequencies, rather than how the signal varies over time. A frequency band is an interval within this domain. Each of the first 60 columns captures the energy of the returned signal within one such band.
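To make "energy within a frequency band" concrete, here is a minimal sketch. It is not a reconstruction of how the dataset was produced (that preprocessing is not described here). It builds a synthetic signal from two sinusoids, applies a naive discrete Fourier transform, and sums the squared magnitudes inside equal-width bands. The signal, the band count, and the helper names `dft` and `band_energies` are all assumptions made for illustration.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a real-valued sequence."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def band_energies(signal, n_bands):
    """Sum the power spectrum over n_bands equal-width frequency bands."""
    half = len(signal) // 2                        # keep the non-mirrored half
    power = [abs(c) ** 2 for c in dft(signal)[:half]]
    width = half // n_bands
    return [sum(power[b * width:(b + 1) * width]) for b in range(n_bands)]


N = 64
# Synthetic signal: a strong component at bin 4 and a weaker one at bin 20.
signal = [
    math.sin(2 * math.pi * 4 * t / N) + 0.5 * math.sin(2 * math.pi * 20 * t / N)
    for t in range(N)
]
energies = band_energies(signal, n_bands=4)
print([round(e, 1) for e in energies])
```

Nearly all of the energy lands in the first and third bands, which are the ones containing the two sinusoids. A row of the sonar data can be read the same way: 60 numbers, each summarising how much of the echo fell into one band.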


The last column is the classification: M for mine and R for rock.

In [4]:
df.describe()

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| count | 208.0 | 208.0 | 208.0 | 208.0 | 208.0 | 208.0 | 208.0 | 208.0 | 208.0 | 208.0 | ... | 208.0 | 208.0 | 208.0 | 208.0 | 208.0 | 208.0 | 208.0 | 208.0 | 208.0 | 208.0 |
| mean | 0.029164 | 0.038437 | 0.043832 | 0.053892 | 0.075202 | 0.10457 | 0.121747 | 0.134799 | 0.178003 | 0.208259 | ... | 0.016069 | 0.01342 | 0.010709 | 0.010941 | 0.00929 | 0.008222 | 0.00782 | 0.007949 | 0.007941 | 0.006507 |
| std | 0.022991 | 0.03296 | 0.038428 | 0.046528 | 0.055552 | 0.059105 | 0.061788 | 0.085152 | 0.118387 | 0.134416 | ... | 0.012008 | 0.009634 | 0.00706 | 0.007301 | 0.007088 | 0.005736 | 0.005785 | 0.00647 | 0.006181 | 0.005031 |
| min | 0.0015 | 0.0006 | 0.0015 | 0.0058 | 0.0067 | 0.0102 | 0.0033 | 0.0055 | 0.0075 | 0.0113 | ... | 0.0 | 0.0008 | 0.0005 | 0.001 | 0.0006 | 0.0004 | 0.0003 | 0.0003 | 0.0001 | 0.0006 |
| 25% | 0.01335 | 0.01645 | 0.01895 | 0.024375 | 0.03805 | 0.067025 | 0.0809 | 0.080425 | 0.097025 | 0.111275 | ... | 0.008425 | 0.007275 | 0.005075 | 0.005375 | 0.00415 | 0.0044 | 0.0037 | 0.0036 | 0.003675 | 0.0031 |
| 50% | 0.0228 | 0.0308 | 0.0343 | 0.04405 | 0.0625 | 0.09215 | 0.10695 | 0.1121 | 0.15225 | 0.1824 | ... | 0.0139 | 0.0114 | 0.00955 | 0.0093 | 0.0075 | 0.00685 | 0.00595 | 0.0058 | 0.0064 | 0.0053 |
| 75% | 0.03555 | 0.04795 | 0.05795 | 0.0645 | 0.100275 | 0.134125 | 0.154 | 0.1696 | 0.233425 | 0.2687 | ... | 0.020825 | 0.016725 | 0.0149 | 0.0145 | 0.0121 | 0.010575 | 0.010425 | 0.01035 | 0.010325 | 0.008525 |
| max | 0.1371 | 0.2339 | 0.3059 | 0.4264 | 0.401 | 0.3823 | 0.3729 | 0.459 | 0.6828 | 0.7106 | ... | 0.1004 | 0.0709 | 0.039 | 0.0352 | 0.0447 | 0.0394 | 0.0355 | 0.044 | 0.0364 | 0.0439 |


We have only 208 samples to work with, which is small for training a neural network.

Let's copy the target values and see how many of each there are.

In [5]:
y = df.values[:, 60]

df[60].value_counts()

M    111
R     97
Name: 60, dtype: int64

111 mines and 97 rocks.
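These counts also give a floor for judging any accuracy figure: a classifier that always predicts the majority class. The short sketch below works that out from the counts reported above; the variable names are my own.

```python
from collections import Counter

# Class counts as reported by value_counts() above.
labels = ["M"] * 111 + ["R"] * 97

counts = Counter(labels)
majority_class, majority_count = counts.most_common(1)[0]

# Accuracy of a "model" that ignores the features and always answers
# with the most frequent class.
baseline_accuracy = majority_count / len(labels)
print(f"Always predicting {majority_class!r}: {baseline_accuracy:.2%}")
```

The classes are close to balanced, so the baseline sits just above 53%. Any model worth keeping has to beat that comfortably.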

Now, let's get the features.

In [6]:
X = df.values[:, 0:60].astype(float)

X.shape

(208, 60)

As our target values are nominal strings (`M` and `R`), we must first encode them as integers (0 and 1).

In [7]:
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
encoder.fit(y)
y = encoder.transform(y)

np.unique(y)

array([0, 1])
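It is worth knowing which class ended up as 0 and which as 1. `LabelEncoder` sorts the distinct labels and uses each label's position as its code (the sorted labels are available as `encoder.classes_`), so here `M` becomes 0 and `R` becomes 1. The plain-Python sketch below mimics that behaviour; `fit_labels` and `transform_labels` are hypothetical helper names, not part of scikit-learn.

```python
def fit_labels(labels):
    """Return the sorted distinct labels; a label's position is its integer code."""
    return sorted(set(labels))


def transform_labels(labels, classes):
    """Map each label to its position in the sorted class list."""
    index = {label: code for code, label in enumerate(classes)}
    return [index[label] for label in labels]


sample = ["R", "R", "M", "R", "M"]
classes = fit_labels(sample)
encoded = transform_labels(sample, classes)
print(classes, encoded)
```

With this encoding, the sigmoid output of the networks that follow is the predicted probability that a sample is a rock.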

Let's begin composing our neural network.

Using Keras, our first network will feed the 60 sonar energy values into a fully connected hidden layer of 60 ReLU units.

The output neuron uses a sigmoid activation because this is a binary classification problem: the sigmoid squashes the output into the range (0, 1), so it can be read as the probability of class 1. Binary cross-entropy is the standard loss for this setup, so we use it here.

In [8]:
from keras.models import Sequential
from keras.layers import Dense

def net1():
    model = Sequential([
        Dense(60, input_dim=60, init="normal", activation="relu"),
        Dense(1, init="normal", activation="sigmoid")
    ])
    
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    
    return model
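The sigmoid output and the binary cross-entropy loss named in `net1` can be written out by hand. The sketch below is only an illustration in plain Python, not Keras' implementation; the clipping constant `eps` and the example logits are assumptions.

```python
import math


def sigmoid(z):
    """Squash a real-valued score into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], with p clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


logits = [2.0, -1.0, 0.0]            # raw outputs of the last layer (made up)
probs = [sigmoid(z) for z in logits]
targets = [1, 0, 1]
loss = binary_cross_entropy(targets, probs)

# The loss punishes confident mistakes far more than it rewards confident hits.
confident_wrong = binary_cross_entropy([1], [0.01])
confident_right = binary_cross_entropy([1], [0.99])
print(loss, confident_wrong, confident_right)
```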

In [9]:
from keras.wrappers.scikit_learn import KerasClassifier

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import StratifiedKFold, cross_val_score

clf = KerasClassifier(build_fn=net1, nb_epoch=100, batch_size=5, verbose=0)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)

res1 = cross_val_score(clf, X, y, cv=kfold)

In [10]:
print(res1)
print("Results: %.2f%% (%.2f%%)" % (res1.mean()*100, res1.std()*100))

[ 0.86363637  0.80952382  0.80952381  0.85714287  0.85714287  0.85714287
  0.76190477  0.90000001  0.80000001  0.65000001]
Results: 81.66% (6.72%)
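The fold scores take only a handful of distinct values because each fold is tiny. The sketch below shows the idea behind stratification: samples are dealt to folds class by class so that every fold keeps roughly the overall class ratio. It is a simplified round-robin version with no shuffling, not scikit-learn's `StratifiedKFold` algorithm, and `stratified_folds` is a hypothetical helper.

```python
def stratified_folds(labels, n_splits):
    """Assign sample indices to folds round-robin within each class."""
    folds = [[] for _ in range(n_splits)]
    for cls in sorted(set(labels)):
        members = [i for i, label in enumerate(labels) if label == cls]
        for position, index in enumerate(members):
            folds[position % n_splits].append(index)
    return folds


labels = ["M"] * 111 + ["R"] * 97      # class counts from value_counts()
folds = stratified_folds(labels, n_splits=10)

sizes = [len(fold) for fold in folds]
mines_per_fold = [sum(labels[i] == "M" for i in fold) for fold in folds]
print(sizes)
print(mines_per_fold)

# One misclassified sample moves a fold's accuracy by 1 / fold size.
print([round(1 / size, 4) for size in sorted(set(sizes))])
```

With 208 samples and 10 splits, each held-out fold has 20 to 22 samples, so a single misclassification shifts a fold's accuracy by roughly 5 points. The printed scores fit that picture: 0.8636 is 19/22 and 0.65 is 13/20.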


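Let's standardize the features so that each column has zero mean and unit variance. Wrapping `StandardScaler` and the network in a `Pipeline` means the scaler is fit on the training portion of each fold only, so no information leaks from the held-out fold.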

In [11]:
from sklearn.pipeline import Pipeline

clfs = []
clfs.append(("standardize", StandardScaler()))
clfs.append(("mlp", KerasClassifier(build_fn=net1, nb_epoch=100, batch_size=5, verbose=0)))
pipe = Pipeline(clfs)

res2 = cross_val_score(pipe, X, y, cv=kfold)

In [12]:
print(res2)
print("Standardized: %.2f%% (%.2f%%)" % (res2.mean()*100, res2.std()*100))

[ 0.81818183  0.85714287  0.80952382  0.90476191  0.95238096  0.90476191
  0.76190478  0.85000001  0.85000001  0.85000001]
Standardized: 85.59% (5.16%)
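Standardization itself is a one-line formula, `(x - mean) / std`. What matters inside cross-validation is where the mean and standard deviation come from. The sketch below fits them on training values only and reuses them for held-out values, which is what the `Pipeline` arranges per fold. It uses the population standard deviation, as `StandardScaler` does. The helper names and the held-out values are made up for illustration.

```python
import statistics


def fit_scaler(column):
    """Learn the mean and population standard deviation from training data."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return mean, (std if std > 0 else 1.0)   # guard against constant columns


def transform(column, mean, std):
    """Apply previously learned statistics; never refit on held-out data."""
    return [(x - mean) / std for x in column]


train = [0.02, 0.0453, 0.0262, 0.01, 0.0762]   # column 0 of df.head()
held_out = [0.03, 0.05]                        # made-up held-out values

mean, std = fit_scaler(train)
train_scaled = transform(train, mean, std)
held_out_scaled = transform(held_out, mean, std)
print(train_scaled)
print(held_out_scaled)
```

Scaling all 208 rows once before calling `cross_val_score` would let each held-out fold influence the statistics used during training, which tends to make the scores slightly optimistic.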


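Let's experiment with the network topology. Comparing a smaller and a larger network against the baseline should show which direction to tune in. We'll try a smaller network first (30 hidden units instead of 60), then a larger one (a second hidden layer of 30 units).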

In [13]:
def sm_net():
    model = Sequential([
        Dense(30, input_dim=60, init="normal", activation="relu"),
        Dense(1, init="normal", activation="sigmoid")
    ])
    
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model

clfs = []
clfs.append(("standardize", StandardScaler()))
clfs.append(("mlp", KerasClassifier(build_fn=sm_net, nb_epoch=100, batch_size=5, verbose=0)))
pipe = Pipeline(clfs)

res3 = cross_val_score(pipe, X, y, cv=kfold)

In [14]:
print(res3)
print("Smaller: %.2f%% (%.2f%%)" % (res3.mean()*100, res3.std()*100))

[ 0.86363637  0.90476191  0.80952382  0.85714287  0.85714287  0.95238096
  0.80952382  0.90000001  0.85000001  0.80000001]
Smaller: 86.04% (4.58%)


In [15]:
def lg_net():
    model = Sequential([
        Dense(60, input_dim=60, init="normal", activation="relu"),
        Dense(30, init="normal", activation="relu"),
        Dense(1, init="normal", activation="sigmoid")
    ])
    
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model

clfs = []
clfs.append(("standardize", StandardScaler()))
clfs.append(("mlp", KerasClassifier(build_fn=lg_net, nb_epoch=100, batch_size=5, verbose=0)))
pipe = Pipeline(clfs)

res4 = cross_val_score(pipe, X, y, cv=kfold)

In [16]:
print(res4)
print("Larger: %.2f%% (%.2f%%)" % (res4.mean()*100, res4.std()*100))

[ 0.90909091  0.85714287  0.76190478  0.85714287  0.85714287  0.85714287
  0.76190478  0.85000001  0.85000001  0.85000001]
Larger: 84.11% (4.29%)
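One way to compare the three topologies tried so far is to count their trainable parameters. A dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The helper `dense_param_count` below is my own, written for this comparison.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Input width followed by the width of each Dense layer.
topologies = {
    "net1": [60, 60, 1],
    "sm_net": [60, 30, 1],
    "lg_net": [60, 60, 30, 1],
}

for name, sizes in topologies.items():
    print(name, dense_param_count(sizes))
```

Even the smallest network has far more parameters than the 208 samples available, which may be part of why shrinking the model did not hurt here.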


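Let's try stochastic gradient descent (SGD) with classical momentum, but without Nesterov momentum, as the optimizer.

We'll apply it to the deeper model we just constructed, adding 20% dropout on the inputs and a max-norm constraint of 3 on the hidden-layer weights.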

In [17]:
from keras.layers import Dropout
from keras.constraints import maxnorm
from keras.optimizers import SGD

def net2():
    model = Sequential([
        Dropout(0.2, input_shape=(60,)),
        Dense(60, init="normal", activation="relu", W_constraint=maxnorm(3)),
        Dense(30, init="normal", activation="relu", W_constraint=maxnorm(3)),
        Dense(1, init="normal", activation="sigmoid")
    ])
    
    sgd = SGD(lr=0.01, momentum=0.9, decay=0.0, nesterov=False)
    model.compile(loss="binary_crossentropy", optimizer=sgd, metrics=["accuracy"])
    return model

clfs = []
clfs.append(("standardize", StandardScaler()))
clfs.append(("mlp", KerasClassifier(build_fn=net2, nb_epoch=100, batch_size=5, verbose=0)))
pipe = Pipeline(clfs)

res5 = cross_val_score(pipe, X, y, cv=kfold)

In [18]:
print(res5)
print("SGD Accuracy: %.2f%% (%.2f%%)" % (res5.mean()*100, res5.std()*100))

[ 0.95454546  0.80952382  0.85714287  0.85714287  0.80952382  0.80952381
  0.76190478  0.8         0.80000001  0.8       ]
SGD Accuracy: 82.59% (5.04%)
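Besides the optimizer, `net2` introduces two regularizers: `Dropout(0.2, ...)` on the inputs and `maxnorm(3)` on the hidden-layer weights. The sketch below shows what each one does to a plain list of numbers. It assumes inverted dropout (survivors are scaled up during training) and a max-norm applied to the weights feeding a single unit; it is an illustration, not Keras' code, and the function names are hypothetical.

```python
import math
import random


def inverted_dropout(values, rate, rng):
    """Zero each value with probability `rate`; scale survivors by 1/(1-rate)."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def max_norm(weights, limit):
    """Rescale a unit's incoming weights if their L2 norm exceeds `limit`."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm <= limit:
        return list(weights)
    return [w * limit / norm for w in weights]


rng = random.Random(0)                 # fixed seed so the run is repeatable
inputs = [1.0] * 10
dropped = inverted_dropout(inputs, rate=0.2, rng=rng)
print(dropped)

incoming = [3.0, 4.0]                  # L2 norm is 5, above the limit of 3
clipped = max_norm(incoming, limit=3.0)
print(clipped)
```

Dropout is active only during training. The scaling keeps the expected input to the next layer the same whether or not dropout is applied.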


Now let's try the same network with Nesterov momentum enabled.

In [19]:
def net3():
    model = Sequential([
        Dropout(0.2, input_shape=(60,)),
        Dense(60, init="normal", activation="relu", W_constraint=maxnorm(3)),
        Dense(30, init="normal", activation="relu", W_constraint=maxnorm(3)),
        Dense(1, init="normal", activation="sigmoid")
    ])
    
    sgd = SGD(lr=0.01, momentum=0.9, decay=0.0, nesterov=True)
    model.compile(loss="binary_crossentropy", optimizer=sgd, metrics=["accuracy"])
    return model

clfs = []
clfs.append(("standardize", StandardScaler()))
clfs.append(("mlp", KerasClassifier(build_fn=net3, nb_epoch=100, batch_size=5, verbose=0)))
pipe = Pipeline(clfs)

res6 = cross_val_score(pipe, X, y, cv=kfold)

In [20]:
print(res6)
print("Nesterov Momentum Accuracy: %.2f%% (%.2f%%)" % (res6.mean()*100, res6.std()*100))

[ 0.90909091  0.85714287  0.85714287  0.85714287  0.85714287  0.90476191
  0.85714287  0.8         0.90000001  0.80000001]
Nesterov Momentum Accuracy: 86.00% (3.64%)
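The only difference between `net2` and `net3` is the `nesterov` flag. The sketch below contrasts the two update rules on a one-dimensional quadratic, using the same `lr` and `momentum` values as above. It follows the velocity formulation that, as far as I know, Keras' `SGD` uses; treat that as an assumption. The function `sgd_momentum` and the toy objective are mine.

```python
def sgd_momentum(grad, w, lr=0.01, momentum=0.9, nesterov=False, steps=200):
    """Minimise a 1-D function with (optionally Nesterov) momentum SGD."""
    velocity = 0.0
    for _ in range(steps):
        g = grad(w)
        velocity = momentum * velocity - lr * g
        if nesterov:
            # Look ahead: apply the freshly updated velocity once more.
            w = w + momentum * velocity - lr * g
        else:
            # Classical momentum: move by the velocity itself.
            w = w + velocity
    return w


def grad(w):
    """Gradient of the toy objective f(w) = w**2."""
    return 2.0 * w


w_classical = sgd_momentum(grad, w=5.0, nesterov=False)
w_nesterov = sgd_momentum(grad, w=5.0, nesterov=True)
print(w_classical, w_nesterov)
```

Both variants reach the minimum at 0 on this toy problem. On the sonar network, a gap of a few points between them is hard to separate from run-to-run noise with only 10 small folds.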


For the next model, we'll move dropout from the input to after each hidden layer.

In [21]:
def net4():
    model = Sequential([
        Dense(60, input_dim=60, init="normal", activation="relu", W_constraint=maxnorm(3)),
        Dropout(0.2),
        Dense(30, init="normal", activation="relu", W_constraint=maxnorm(3)),
        Dropout(0.2),
        Dense(1, init="normal", activation="sigmoid")
    ])
    
    sgd = SGD(lr=0.01, momentum=0.9, decay=0.0, nesterov=True)
    model.compile(loss="binary_crossentropy", optimizer=sgd, metrics=["accuracy"])
    return model

clfs = []
clfs.append(("standardize", StandardScaler()))
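# Use net4, the multi-dropout model defined above.
# The output recorded below came from a run that passed net3 here.
clfs.append(("mlp", KerasClassifier(build_fn=net4, nb_epoch=100, batch_size=5, verbose=0)))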
pipe = Pipeline(clfs)

res7 = cross_val_score(pipe, X, y, cv=kfold)

In [22]:
print(res7)
print("Multi-Dropout Accuracy: %.2f%% (%.2f%%)" % (res7.mean()*100, res7.std()*100))

[ 0.90909091  0.85714287  0.80952381  0.90476191  0.80952382  0.80952381
  0.71428572  0.8         0.90000001  0.85000001]
Multi-Dropout Accuracy: 83.64% (5.75%)


## Conclusions ##

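The strongest results came from `net3` (86.00% mean accuracy, 3.64% standard deviation across folds) and `sm_net` (86.04%, 4.58%). `sm_net` has a marginally higher mean and `net3` has the lowest standard deviation, so the two are effectively tied on accuracy, with `net3` the more consistent across folds.

Nesterov momentum did better than the same architecture trained with classical momentum (`net2`, 82.59%) and better than the larger network trained with Adam (`lg_net`, 84.11%), but it did not beat the smaller Adam network. The multi-dropout results shown were produced with `build_fn=net3`, so they are effectively a second run of `net3` rather than a test of `net4`. That run scored 83.64%, which suggests that differences of a couple of points between these models are within run-to-run noise on a dataset this small.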
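To put the seven runs side by side, the sketch below recomputes each mean and standard deviation from the fold scores printed earlier and adds a rough standard error of the mean. The standard error treats the 10 fold scores as independent, which is only an approximation because the folds share training data. The dictionary keys are just the result variable names.

```python
import math
import statistics

# Fold scores copied from the outputs printed earlier.
results = {
    "res1": [0.86363637, 0.80952382, 0.80952381, 0.85714287, 0.85714287,
             0.85714287, 0.76190477, 0.90000001, 0.80000001, 0.65000001],
    "res2": [0.81818183, 0.85714287, 0.80952382, 0.90476191, 0.95238096,
             0.90476191, 0.76190478, 0.85000001, 0.85000001, 0.85000001],
    "res3": [0.86363637, 0.90476191, 0.80952382, 0.85714287, 0.85714287,
             0.95238096, 0.80952382, 0.90000001, 0.85000001, 0.80000001],
    "res4": [0.90909091, 0.85714287, 0.76190478, 0.85714287, 0.85714287,
             0.85714287, 0.76190478, 0.85000001, 0.85000001, 0.85000001],
    "res5": [0.95454546, 0.80952382, 0.85714287, 0.85714287, 0.80952382,
             0.80952381, 0.76190478, 0.8, 0.80000001, 0.8],
    "res6": [0.90909091, 0.85714287, 0.85714287, 0.85714287, 0.85714287,
             0.90476191, 0.85714287, 0.8, 0.90000001, 0.80000001],
    "res7": [0.90909091, 0.85714287, 0.80952381, 0.90476191, 0.80952382,
             0.80952381, 0.71428572, 0.8, 0.90000001, 0.85000001],
}

summary = {}
for name, scores in results.items():
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)          # population std, like np.std
    sem = std / math.sqrt(len(scores))       # rough standard error of the mean
    summary[name] = (mean, std, sem)
    print(f"{name}: mean={mean:.2%}  std={std:.2%}  sem={sem:.2%}")
```

The standard errors come out at roughly one to two points, which is about the size of the gaps between most of the models, so the ranking above should be read as tentative.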