In this class, we will use a tutorial prepared by Jason Brownlee.

The dataset used in this tutorial is the Sonar dataset, which describes sonar chirp returns bouncing off different surfaces. The 60 input variables are the strength of the returns at different angles. It is a binary classification problem that requires a model to differentiate rocks from metal cylinders (mines). The output variable is a string, M for mine or R for rock, which must be converted to the integers 0 and 1. LabelEncoder, used below, assigns integers in sorted label order, so M becomes 0 and R becomes 1. For more information about the dataset, see https://archive.ics.uci.edu/ml/machine-learning-databases/undocumented/connectionist-bench/sonar/sonar.all-data

### 1. Develop a Baseline Model

In [1]:
from pandas import read_csv
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
# load dataset
dataframe = read_csv("2-SonarData.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]


Using TensorFlow backend.


In [5]:
# encode class values as integers
encoder=LabelEncoder()
encoder.fit(Y)
encoded_Y=encoder.transform(Y)
# develop a baseline model

# evaluate model with stratified kfold and cross_val_score
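
Stratified k-fold splitting keeps the class ratio roughly the same in every fold. The NumPy sketch below illustrates the idea only; it is not scikit-learn's StratifiedKFold implementation, and the helper name `stratified_folds` is hypothetical. The class counts (111 mines, 97 rocks) are taken from the UCI description of the dataset, and the label array is built by hand rather than read from the CSV file.

```python
import numpy as np

def stratified_folds(labels, n_splits=10, seed=0):
    """Assign every sample to a fold so that each fold keeps the class ratio."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(labels), dtype=int)
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        # deal the shuffled samples of this class round-robin into the folds
        fold_of[idx] = np.arange(len(idx)) % n_splits
    return fold_of

# hand-built stand-in for the label column: 111 mines (M) and 97 rocks (R)
labels = np.array(["M"] * 111 + ["R"] * 97)
fold_of = stratified_folds(labels)

for k in range(3):
    in_fold = labels[fold_of == k]
    print(k, (in_fold == "M").sum(), (in_fold == "R").sum())
```

Each fold then serves once as the held-out test set while the remaining folds are used for training.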


### 2. Improve Performance With Data Preparation

In [3]:
# Binary Classification with Sonar Dataset: Standardized

from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline


# evaluate baseline model with standardized dataset.
# Create a pipeline


Standardized: 85.02 (6.00)
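
One reason for placing the scaler inside a Pipeline is that the mean and standard deviation are then estimated on the training folds only and reused on the held-out fold, so no information leaks from the test data. The NumPy sketch below illustrates that idea on synthetic data. The helper names `fit_standardizer` and `standardize` are hypothetical, the random matrix is only a stand-in for the Sonar inputs, and the split sizes are arbitrary.

```python
import numpy as np

def fit_standardizer(X_train):
    """Estimate per-column mean and standard deviation from training data only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)  # population standard deviation (ddof=0)
    std[std == 0.0] = 1.0      # avoid dividing by zero for constant columns
    return mean, std

def standardize(X, mean, std):
    """Apply previously fitted statistics to any data split."""
    return (X - mean) / std

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(208, 60))  # synthetic stand-in, not the real data
X_train, X_test = X[:187], X[187:]         # one illustrative train/test split

mean, std = fit_standardizer(X_train)
X_train_std = standardize(X_train, mean, std)
X_test_std = standardize(X_test, mean, std)  # reuses the training statistics
```

The training columns end up with zero mean and unit variance, while the test columns are only approximately standardized because their statistics were never looked at.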


### 3. Create a Smaller Network
The input variables are likely to be partly redundant, so we can force a kind of feature extraction by restricting the representational space of the first hidden layer. In this experiment we take the baseline model, which has 60 neurons in its hidden layer, and halve that to 30.
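
To make the reduction concrete, the short sketch below counts the trainable parameters of both topologies. It assumes fully connected layers with biases and a single output unit; the helper `dense_params` is hypothetical and only does the arithmetic.

```python
def dense_params(layer_sizes):
    """Count weights and biases in a stack of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

baseline = dense_params([60, 60, 1])  # 60 inputs -> 60 hidden -> 1 output
smaller = dense_params([60, 30, 1])   # 60 inputs -> 30 hidden -> 1 output
print(baseline, smaller)  # 3721 1861
```

Halving the hidden layer roughly halves the number of parameters the network has available to fit the data.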

In [5]:
# smaller model
                                                  

Smaller: 87.99 (7.96)


### 4. Evaluate a Larger Network
A neural network topology with more layers offers more opportunity for the network to extract key features and recombine them in useful nonlinear ways. We can easily evaluate whether adding more layers improves performance by making another small tweak to the function that creates our model. Here, we add one line that introduces a second hidden layer with 30 neurons after the first hidden layer.
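
The effect of the extra layer on the topology can be traced with a small NumPy forward pass. This is a sketch under assumptions, not the Keras model: the weights are random and untrained, biases are omitted, and the ReLU hidden activations and sigmoid output are assumed rather than taken from the text. The helper `forward_shapes` is hypothetical.

```python
import numpy as np

def forward_shapes(X, sizes, rng):
    """Run a random, untrained dense network and record each activation shape."""
    a = X
    shapes = [a.shape]
    last = len(sizes) - 2
    for i, (n_in, n_out) in enumerate(zip(sizes, sizes[1:])):
        W = rng.normal(0.0, 0.1, size=(n_in, n_out))
        z = a @ W  # biases omitted for brevity
        if i == last:
            a = 1.0 / (1.0 + np.exp(-z))  # sigmoid output (assumed)
        else:
            a = np.maximum(z, 0.0)        # ReLU hidden layer (assumed)
        shapes.append(a.shape)
    return a, shapes

rng = np.random.default_rng(0)
X = rng.uniform(size=(5, 60))  # five synthetic samples with 60 inputs each

_, baseline_shapes = forward_shapes(X, [60, 60, 1], rng)
probs, larger_shapes = forward_shapes(X, [60, 60, 30, 1], rng)
print(baseline_shapes)  # [(5, 60), (5, 60), (5, 1)]
print(larger_shapes)    # [(5, 60), (5, 60), (5, 30), (5, 1)]
```

The larger network passes the 60 hidden activations through a further 30-unit nonlinear layer before the output unit.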

In [6]:
# larger model


Larger:  86.97 (7.25)


### 5. Dropout Regularization
Dropout works by randomly selecting nodes to be dropped out with a given probability (e.g. 20%) in each weight update cycle. This is how Dropout is implemented in Keras. Dropout is used only while training a model, not when evaluating its skill.
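
The mechanism can be sketched in a few lines of NumPy. This is an illustration, not the Keras source: it uses the "inverted dropout" scheme, in which the surviving activations are scaled up by 1 / (1 - rate) during training so that nothing needs to change at evaluation time. The function name and signature are hypothetical.

```python
import numpy as np

def dropout(activations, rate, rng, training):
    """Inverted dropout: zero a random subset of units and rescale the rest."""
    if not training or rate == 0.0:
        return activations  # evaluation: dropout is a no-op
    keep = rng.random(activations.shape) >= rate  # True with probability 1 - rate
    return activations * keep / (1.0 - rate)

rng = np.random.default_rng(0)
a = np.ones((1000, 60))  # dummy activations

train_out = dropout(a, 0.2, rng, training=True)
eval_out = dropout(a, 0.2, rng, training=False)

print((train_out == 0.0).mean())  # fraction dropped, close to 0.2
print(train_out.mean())           # close to 1.0 thanks to the rescaling
```

Because of the rescaling, the expected value of each activation is the same in training and evaluation.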

In [8]:
from keras.optimizers import SGD

# create baseline, use SGD optimizer with lr=0.01
    



Baseline:  85.61 (3.58)


#### 5.1 Using Dropout on the Visible Layer
Dropout can be applied to the input neurons, known as the visible layer. In the example below we add a new Dropout layer between the input (visible) layer and the first hidden layer. The dropout rate is set to 20%, meaning that on average one in five inputs is randomly excluded from each update cycle.
Additionally, as recommended in the original paper on dropout, a constraint is imposed on the weights of each hidden layer so that the maximum norm of the weights does not exceed 3. This is done by setting the kernel_constraint argument of the Dense class when constructing the layers.
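
What a max-norm constraint does to a weight matrix can be sketched in NumPy as follows. This is an approximation for illustration, assuming the weights are stored with shape (inputs, units) and that the norm is taken over each unit's incoming weights; the function name `max_norm` here is a local helper, not the Keras class.

```python
import numpy as np

def max_norm(W, max_value=3.0, eps=1e-7):
    """Rescale each unit's incoming weight vector so its L2 norm is at most max_value."""
    norms = np.sqrt(np.sum(W ** 2, axis=0, keepdims=True))  # one norm per unit (column)
    clipped = np.clip(norms, 0.0, max_value)
    return W * clipped / (eps + norms)

rng = np.random.default_rng(0)
W = rng.normal(0.0, 1.0, size=(60, 30))  # random weights with large column norms
W[:, 0] *= 0.01                          # make the first unit's weights small

W_c = max_norm(W)
norms_after = np.linalg.norm(W_c, axis=0)
print(norms_after.max())  # about 3.0
```

Columns whose norm already lies below the limit are left essentially unchanged, while larger ones are shrunk onto the limit. In training, such a constraint would be applied after each weight update.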

In [10]:
from keras.layers import Dropout
from keras.constraints import maxnorm
# dropout in the input layer with weight constraint


Visible: 87.59 (5.21)


#### 5.2 Using Dropout on Hidden Layers
Dropout can also be applied to hidden neurons in the body of the network. In the example below, dropout is applied between the two hidden layers and between the last hidden layer and the output layer. Again, a dropout rate of 20% is used, along with a weight constraint on those layers.
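
The placement described above can be sketched as a NumPy forward pass. This is an illustration under assumptions rather than the Keras model: the weights are random and untrained, the ReLU and sigmoid activations are assumed, the weight constraint is left out, and the helper `forward` is hypothetical.

```python
import numpy as np

def forward(X, layers, rate, rng, training):
    """Forward pass with dropout applied to the output of every hidden layer."""
    a = X
    for W, b in layers[:-1]:
        a = np.maximum(a @ W + b, 0.0)          # hidden layer (ReLU assumed)
        if training:
            keep = rng.random(a.shape) >= rate  # drop about `rate` of the units
            a = a * keep / (1.0 - rate)         # rescale the survivors
    W, b = layers[-1]
    return 1.0 / (1.0 + np.exp(-(a @ W + b)))   # sigmoid output, no dropout after it

rng = np.random.default_rng(0)
sizes = [60, 60, 30, 1]  # inputs, two hidden layers, one output unit
layers = [(rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(sizes, sizes[1:])]
X = rng.uniform(size=(4, 60))  # four synthetic samples

p_train_1 = forward(X, layers, 0.2, rng, training=True)
p_train_2 = forward(X, layers, 0.2, rng, training=True)
p_eval_1 = forward(X, layers, 0.2, rng, training=False)
p_eval_2 = forward(X, layers, 0.2, rng, training=False)
```

Two training passes over the same inputs give different outputs because each draws fresh dropout masks, whereas evaluation passes are deterministic.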