# Project: Binary Classification Of Sonar Returns
Discover how to effectively use the Keras library in your machine learning project by working through a binary classification project step-by-step.
https://archive.ics.uci.edu/ml/machine-learning-databases/undocumented/connectionist-bench/sonar/sonar.all-data

The Sonar dataset describes **sonar chirp** returns bouncing off different surfaces. The **60 input** variables are the strength of the returns at different angles. It is a binary classification problem that requires a model to differentiate rocks from metal cylinders. You can learn more about this dataset on the UCI Machine Learning repository.

It is a well-understood dataset. All of the variables are continuous and generally in the range of 0 to 1. The output variable is a string, **M for mine** (a metal cylinder) and **R for rock**, which needs to be converted to integers. The `LabelEncoder` used below numbers the classes in sorted order, so **M becomes 0 and R becomes 1**. The dataset contains **208 observations**.

In [2]:
import numpy as np
import pandas as pd

from keras.models import Sequential
from keras.layers import Dense
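# Note: keras.wrappers.scikit_learn is absent from recent Keras releases;
# the separate SciKeras package offers a similar KerasClassifier.
from keras.wrappers.scikit_learn import KerasClassifier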

from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

np.random.seed(47)

Using TensorFlow backend.


In [3]:
df = pd.read_csv('sonar.csv', header=None)
df.head(2)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,51,52,53,54,55,56,57,58,59,60
0,0.02,0.0371,0.0428,0.0207,0.0954,0.0986,0.1539,0.1601,0.3109,0.2111,...,0.0027,0.0065,0.0159,0.0072,0.0167,0.018,0.0084,0.009,0.0032,R
1,0.0453,0.0523,0.0843,0.0689,0.1183,0.2583,0.2156,0.3481,0.3337,0.2872,...,0.0084,0.0089,0.0048,0.0094,0.0191,0.014,0.0049,0.0052,0.0044,R


In [4]:
data = df.values

X = data[:, 0:60].astype(float)
y = data[:, -1]

encoder = LabelEncoder().fit(y)
encoded_y = encoder.transform(y)
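
To check which integer each class label receives, here is a minimal sketch using only NumPy. It assumes, as scikit-learn's `LabelEncoder` does, that classes are numbered in sorted order. The `labels` array is a made-up sample, not rows from the dataset.

```python
import numpy as np

# Hypothetical sample of class labels (not taken from the real file).
labels = np.array(['R', 'M', 'M', 'R', 'M'])

# np.unique returns the sorted unique classes and, with return_inverse=True,
# the position of each label within that sorted array.
classes, encoded = np.unique(labels, return_inverse=True)

mapping = {str(c): i for i, c in enumerate(classes)}
print(mapping)           # {'M': 0, 'R': 1}
print(encoded.tolist())  # [1, 0, 0, 1, 0]
```

With the fitted `encoder` from the cell above, `encoder.classes_` should show the same ordering.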

### Define and compile a baseline model.
We are going to use scikit-learn to evaluate the model using **stratified k-fold** cross-validation. This is a resampling technique that will provide an estimate of the performance of the model. To use Keras models with scikit-learn, we must use the **KerasClassifier()** wrapper. This class takes a function that creates and returns our neural network model. It also takes arguments that it will pass along to the call to **fit()** such as the number of **epochs** and the **batch_size**. 
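
To make the idea of stratification concrete, the following sketch assigns samples to folds so that each fold keeps the overall class ratio. The helper `stratified_folds` and the toy labels are assumptions made for illustration; this is not scikit-learn's implementation of `StratifiedKFold`.

```python
import numpy as np

def stratified_folds(y, n_splits, seed=0):
    """Assign each sample to a fold so that every fold keeps the class ratio.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(y), dtype=int)
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        rng.shuffle(idx)
        # Deal the shuffled indices of this class out to the folds in turn.
        fold_of[idx] = np.arange(len(idx)) % n_splits
    return fold_of

# Toy labels: 60 samples of class 0 and 40 samples of class 1.
y = np.array([0] * 60 + [1] * 40)
fold_of = stratified_folds(y, n_splits=10)

for k in range(10):
    test_mask = fold_of == k
    # Every test fold holds 6 samples of class 0 and 4 of class 1.
    print(k, np.bincount(y[test_mask]).tolist())
```

In each round, one fold serves as the unseen test set and the other nine are used for training, so every sample is used for testing exactly once.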

Let’s start off by defining the function that creates our baseline model. Our model will have a single fully connected hidden layer with the same number of neurons as **input** variables. This is a good default starting point when creating neural networks on a new problem. 
- The weights are initialized using **small Gaussian random numbers**.
- The **Rectifier** activation function is used. 
- The output layer contains a single neuron in order to make predictions. It uses the **sigmoid** activation function in order to produce a probability output in the range of 0 to 1 that can easily and automatically be converted to crisp class values. 
- Finally, we are using the logarithmic loss function (binary crossentropy) during training, the preferred loss function for binary classification problems. The model also uses the efficient Adam optimization algorithm for gradient descent and accuracy metrics will be collected when the model is trained.
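
The last two points can be sketched in plain NumPy: a sigmoid squashes the raw output into a probability, a threshold turns it into a crisp class, and binary cross-entropy scores the probabilities against the true labels. The raw outputs `z`, the 0.5 threshold, and the clipping constant `eps` are assumptions chosen for illustration, not values taken from Keras.

```python
import numpy as np

def sigmoid(z):
    # Map any real number to the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, p, eps=1e-7):
    # Clip probabilities so that log(0) can never occur.
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

z = np.array([-2.0, 0.5, 3.0])   # hypothetical raw outputs of the last layer
y_true = np.array([0, 1, 1])     # true class labels

p = sigmoid(z)                   # predicted probabilities of class 1
y_pred = (p > 0.5).astype(int)   # crisp class values using a 0.5 threshold
loss = binary_crossentropy(y_true, p)

print(y_pred.tolist())           # [0, 1, 1]
print(round(loss, 3))
```

The loss shrinks as the predicted probabilities move toward the true labels, which is what training tries to achieve.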

In [5]:
# (60 inputs -> 60 hidden -> 1 output)

def create_baseline():
    model = Sequential()
    model.add(Dense(60, input_dim=60, activation='relu', kernel_initializer='normal'))
    model.add(Dense(1, activation='sigmoid', kernel_initializer='normal'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

### Evaluate the baseline model
We pass the number of training epochs and the batch size to the KerasClassifier, again using reasonable default values. Verbose output is turned off, given that the model will be created 10 times for the 10-fold cross-validation being performed. The output shows the mean and standard deviation of the estimated accuracy of the model on **unseen data**.

In [6]:
estimator = KerasClassifier(build_fn=create_baseline, epochs=100, batch_size=5, verbose=0)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=47)

results = cross_val_score(estimator, X, encoded_y, cv=kfold)
print('accuracy(baseline) -> mean:%.2f (+/-%.2f)' %(results.mean(), results.std()))

accuracy(baseline) -> mean:0.82 (+/-0.10)


### Improve Performance With Data Preparation
It is good practice to prepare your data before modeling. Neural network models work best with **consistent input values, both in scale and distribution**. An effective data preparation scheme for tabular data when building neural network models is **standardization**. This is where the data is rescaled such that **the mean value for each attribute is 0 and the standard deviation is 1**. This preserves **Gaussian-like distributions** whilst **normalizing the central tendencies** for each attribute.

We can use scikit-learn to perform the **standardization** of our Sonar dataset using the **`StandardScaler()`** class.
- Rather than performing the standardization on the entire dataset, it is good practice to **train the standardization procedure on the training data** within the pass of a cross-validation run and to use the trained standardization instance to prepare the unseen test fold. 

This makes standardization a step of model preparation inside the cross-validation process. It also prevents the algorithm from having knowledge of the unseen data during evaluation, knowledge that might otherwise leak in through the data preparation scheme (for example, a crisper estimate of each attribute's distribution). We can achieve this in scikit-learn using the **`Pipeline()`** class. A pipeline is a **wrapper that chains one or more preparation steps and a final model, and fits them together within each pass of the cross-validation procedure**. Here, we can define a pipeline with the StandardScaler followed by our neural network model.
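
What happens to the scaler inside one cross-validation pass can be sketched with NumPy alone: the mean and standard deviation are estimated from the training rows only and then reused on the test rows. The random matrix and the fixed train/test split are stand-ins chosen for illustration, not the Sonar data.

```python
import numpy as np

rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 1.0, size=(20, 3))  # synthetic stand-in for the inputs

train_idx = np.arange(0, 15)   # rows used for fitting in this pass
test_idx = np.arange(15, 20)   # held-out rows for this pass

# Fit: statistics come from the training rows only.
mean = X_demo[train_idx].mean(axis=0)
std = X_demo[train_idx].std(axis=0)

# Transform: the same statistics are applied to both parts.
X_train_std = (X_demo[train_idx] - mean) / std
X_test_std = (X_demo[test_idx] - mean) / std

print(X_train_std.mean(axis=0).round(6))  # approximately 0 for each column
print(X_train_std.std(axis=0).round(6))   # approximately 1 for each column
```

The standardized test rows will generally not have a mean of exactly 0, which is expected: they are scaled with statistics they did not contribute to.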

In [7]:
# evaluate baseline model with standardized dataset

estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('MLP', KerasClassifier(build_fn=create_baseline, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=47)

results = cross_val_score(pipeline, X, encoded_y, cv=kfold)
print('accuracy(baseline) -> mean:%.2f (+/-%.2f)' %(results.mean(), results.std()))

accuracy(baseline) -> mean:0.86 (+/-0.08)


### Improve Performance With Tuning Topology 
There are many things to tune on a neural network, such as the **weight (kernel) initialization**, **activation functions**, **optimization procedure** and so on. One aspect that may have an outsized effect is the structure of the network itself (network topology).

The baseline topology so far is 60 inputs -> 60 hidden -> 1 output.

In this section we take a look at two experiments on the **structure of the network**: making it smaller and making it larger. These are good experiments to perform when tuning a neural network on your problem.
>a) Evaluate a Narrower Network (60 inputs -> 30 hidden -> 1 output)
>- I suspect that there is a lot of redundancy in the input variables for this problem. The data describes the same signal from different angles. Perhaps some of those angles are more relevant than others. We can force a type of feature extraction by the network **by restricting the representational space** in the first hidden layer.
>- In this experiment, **we take our baseline model with 60 neurons in the hidden layer and reduce it by half to 30**. This will put pressure on the network during training **to pick out the most important structure in the input data** to model. We will also standardize the data (as in the previous experiment with data preparation) and try to take advantage of the **small lift** in performance.

>b) Evaluate a Deeper Network (60 inputs -> 60 hidden -> 30 hidden -> 1 output)
>- A neural network topology with **more layers** offers more opportunity for the network to **extract key features and recombine them in useful nonlinear ways**. We can easily evaluate whether adding more layers improves performance by making another small tweak to the function used to create our model. Here, we add a second hidden layer with 30 neurons after the first hidden layer.
>- The idea here is that the network is given the opportunity to model all input variables before being bottlenecked and forced to halve the representational capacity, much like we did in the experiment above with the smaller network. **Instead of squeezing the representation of the inputs themselves, we have an additional hidden layer to aid in the process**.
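
To compare the size of the three topologies, this short sketch counts the trainable parameters (weights plus biases) of each fully connected network. The helper `dense_params` is a hypothetical name introduced here. The counts follow directly from the layer widths listed above.

```python
def dense_params(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the input width followed by the width of each layer.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

topologies = {
    'baseline': [60, 60, 1],
    'narrower': [60, 30, 1],
    'deeper': [60, 60, 30, 1],
}

counts = {name: dense_params(sizes) for name, sizes in topologies.items()}
for name, n in counts.items():
    print(name, n)
# baseline 3721
# narrower 1861
# deeper 5521
```

The narrower network has about half the parameters of the baseline, while the deeper one has roughly one and a half times as many.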

In [8]:
# Binary Classification with Sonar Dataset: Standardized Narrower (60 inputs -> 30 hidden -> 1 output)

def create_baseline_small():
    model = Sequential()
    model.add(Dense(30, input_dim=60, activation='relu', kernel_initializer='normal'))
    model.add(Dense(1, activation='sigmoid', kernel_initializer='normal'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('MLP', KerasClassifier(build_fn=create_baseline_small, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=47)

results = cross_val_score(pipeline, X, encoded_y, cv=kfold)
print('accuracy(baseline) -> mean:%.2f (+/-%.2f)' %(results.mean(), results.std()))

accuracy(baseline) -> mean:0.87 (+/-0.10)


We have a very slight boost in the mean estimated accuracy (0.87 versus 0.86 for the standardized baseline), although the standard deviation (average spread) of the accuracy scores rose from 0.08 to 0.10, so the difference is well within the noise. This is still an encouraging result because we are doing about as well with a hidden layer half the size, which leaves roughly half as many weights to train.

In [10]:
# Binary Classification with Sonar Dataset: Standardized Deeper (60 inputs -> [60 -> 30] hidden -> 1 output) 

def create_baseline_large():
    model = Sequential()
    model.add(Dense(60, input_dim=60, activation='relu', kernel_initializer='normal'))
    model.add(Dense(30, activation='relu', kernel_initializer='normal'))
    model.add(Dense(1, activation='sigmoid', kernel_initializer='normal'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('MLP', KerasClassifier(build_fn=create_baseline_large, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=47)

results = cross_val_score(pipeline, X, encoded_y, cv=kfold)
print('accuracy(baseline) -> mean:%.2f (+/-%.2f)' %(results.mean(), results.std()))

accuracy(baseline) -> mean:0.85 (+/-0.09)


We do not get a lift in the model performance. This may be statistical noise or **a sign that further training is needed**. With further tuning of aspects like the **optimization algorithm** and the number of training **epochs**, it is expected that further improvements are possible. What is the best score that you can achieve on this dataset?