
# Tuning Layers and Neurons in the Model

There are many things to tune on a neural network, such as the weight initialization, activation functions, optimization procedure and so on. 

One aspect that may have an outsized effect is the structure of the network itself, called the network topology. In this section we look at two experiments on the structure of the network: making it smaller and making it larger. These are good experiments to perform when tuning a neural network on your problem.


In [1]:
import numpy as np 
import pandas as pd 
from keras.models import Sequential 
from keras.layers import Dense 
from keras.wrappers.scikit_learn import KerasClassifier 
from sklearn.model_selection import cross_val_score 
from sklearn.preprocessing import LabelEncoder 
from sklearn.model_selection import StratifiedKFold 
from sklearn.preprocessing import StandardScaler 
from sklearn.pipeline import Pipeline

Using TensorFlow backend.


In [0]:
# load dataset 
dataframe = pd.read_csv("/content/sonar.csv", header=None) 
dataset = dataframe.values 
# split into input (X) and output (Y) variables 
X = dataset[:,0:60].astype(float) 
Y = dataset[:,60]


In [3]:
dataset[0]

array([0.02, 0.0371, 0.0428, 0.0207, 0.0954, 0.0986, 0.1539, 0.1601,
       0.3109, 0.2111, 0.1609, 0.1582, 0.2238, 0.0645, 0.066, 0.2273,
       0.31, 0.2999, 0.5078, 0.4797, 0.5783, 0.5071, 0.4328, 0.555,
       0.6711, 0.6415, 0.7104, 0.808, 0.6791, 0.3857, 0.1307, 0.2604,
       0.5121, 0.7547, 0.8537, 0.8507, 0.6692, 0.6097, 0.4943, 0.2744,
       0.051, 0.2834, 0.2825, 0.4256, 0.2641, 0.1386, 0.1051, 0.1343,
       0.0383, 0.0324, 0.0232, 0.0027, 0.0065, 0.0159, 0.0072, 0.0167,
       0.018, 0.0084, 0.009, 0.0032, 'R'], dtype=object)

In [0]:
# fix random seed for reproducibility 
seed = 7 
np.random.seed(seed)

In [5]:
X[0], Y[0]

(array([0.02  , 0.0371, 0.0428, 0.0207, 0.0954, 0.0986, 0.1539, 0.1601,
        0.3109, 0.2111, 0.1609, 0.1582, 0.2238, 0.0645, 0.066 , 0.2273,
        0.31  , 0.2999, 0.5078, 0.4797, 0.5783, 0.5071, 0.4328, 0.555 ,
        0.6711, 0.6415, 0.7104, 0.808 , 0.6791, 0.3857, 0.1307, 0.2604,
        0.5121, 0.7547, 0.8537, 0.8507, 0.6692, 0.6097, 0.4943, 0.2744,
        0.051 , 0.2834, 0.2825, 0.4256, 0.2641, 0.1386, 0.1051, 0.1343,
        0.0383, 0.0324, 0.0232, 0.0027, 0.0065, 0.0159, 0.0072, 0.0167,
        0.018 , 0.0084, 0.009 , 0.0032]), 'R')

The output variable contains string values, which we must convert into the integer values 0 and 1. We can do this with the LabelEncoder class from scikit-learn. The fit() function learns the encoding from the entire dataset, and the transform() function then applies it to create a new output variable.


In [0]:
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)

In [7]:
encoded_Y

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
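As a small illustration of what fit() and transform() do, the sketch below encodes a handful of made-up labels. The `toy_labels` array is an assumption for illustration and is not read from sonar.csv. LabelEncoder assigns integers in sorted order of the class values, which is why 'R' maps to 1 in the output above.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels for illustration only (not the real sonar.csv column)
toy_labels = np.array(["R", "R", "M", "R", "M"], dtype=object)

toy_encoder = LabelEncoder()
toy_encoder.fit(toy_labels)                     # learns the sorted set of classes
toy_encoded = toy_encoder.transform(toy_labels)  # maps each label to its index

# classes_ holds the original values; the position in classes_ is the integer code
for code, label in enumerate(toy_encoder.classes_):
    print(label, "->", code)

# inverse_transform recovers the original strings from the integer codes
toy_decoded = toy_encoder.inverse_transform(toy_encoded)
print(toy_encoded, toy_decoded)
```

Keeping the fitted encoder around is useful because inverse_transform() lets you turn predicted integers back into the original class names.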

# Evaluate a Smaller Network

I suspect that there is a lot of redundancy in the input variables for this problem. The data describes the same signal from different angles. Perhaps some of those angles are more relevant than others. We can force a type of feature extraction by the network by restricting the representational space in the first hidden layer.

In this experiment we take our baseline model with 60 neurons in the hidden layer and reduce it **by half to 30.** This puts pressure on the network during training to pick out the most important structure in the input data. We also standardize the data, as in the previous data preparation experiment, to take advantage of the small lift in performance it gave.

In [0]:
def create_smaller():
    # create model: one hidden layer of 30 units, half the 60 inputs
    model = Sequential()
    model.add(Dense(30, input_dim=60, kernel_initializer='normal', activation='relu'))
    # single sigmoid output unit for the two classes
    model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))
    # compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# Evaluate a Larger Network

A neural network topology with more layers offers more opportunity for the network to extract key features and recombine them in useful nonlinear ways.

We can easily evaluate whether adding more layers improves performance by making another small tweak to the function used to create our model. Here, we add one new line that introduces another hidden layer with 30 neurons after the first hidden layer.

The idea here is that the network is given the opportunity to model all input variables before being bottlenecked and forced to halve the representational capacity, much like we did in the experiment above with the smaller network. Instead of squeezing the representation of the inputs themselves, we have an additional hidden layer to aid in the process.


In [0]:
def create_bigger():
    # create model: first hidden layer matches the 60 inputs
    model = Sequential()
    model.add(Dense(60, input_dim=60, kernel_initializer='normal', activation='relu'))
    # second hidden layer halves the representation to 30 units
    model.add(Dense(30, kernel_initializer='normal', activation='relu'))
    # single sigmoid output unit for the two classes
    model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))
    # compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
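To make the size difference between these topologies concrete, here is a rough sketch that counts the trainable parameters of a stack of fully connected layers. The helper `count_dense_params` is a hypothetical name introduced for illustration and is not part of Keras. It assumes every layer is a plain Dense layer with one bias per unit, as in the functions above.

```python
def count_dense_params(layer_sizes):
    """Count weights and biases in a stack of fully connected layers.

    layer_sizes lists the width of every layer, inputs first, e.g. [60, 30, 1].
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus one bias per unit
    return total


# Layer widths for the baseline and the two experiments in this section
topologies = {
    "baseline (60-60-1)": [60, 60, 1],
    "smaller (60-30-1)": [60, 30, 1],
    "larger (60-60-30-1)": [60, 60, 30, 1],
}

param_counts = {name: count_dense_params(sizes) for name, sizes in topologies.items()}
for name, n_params in param_counts.items():
    print(f"{name}: {n_params} trainable parameters")
```

In Keras itself, calling `model.summary()` on a built model reports the same kind of per-layer parameter counts.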

# Evaluating the model
Now it is time to evaluate both models using stratified cross-validation in the scikit-learn framework. We pass the number of training epochs and the batch size to the KerasClassifier, again using reasonable default values. Verbose output is turned off, given that each model will be created 10 times for the 10-fold cross-validation being performed.
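To show what stratification does, the sketch below splits a stand-in label vector with the same class counts as `encoded_Y` above (97 ones and 111 zeros) and reports the class balance of each test fold. The names `toy_y` and `toy_X` are assumptions for illustration, and the features are placeholders because only the labels affect how the folds are drawn.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Stand-in labels with the same class counts as encoded_Y (97 ones, 111 zeros)
toy_y = np.array([1] * 97 + [0] * 111)
toy_X = np.zeros((len(toy_y), 1))  # placeholder features, unused by the splitter

splitter = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)

fold_sizes = []
fold_fractions = []
for fold, (train_idx, test_idx) in enumerate(splitter.split(toy_X, toy_y)):
    fraction_ones = toy_y[test_idx].mean()  # share of class 1 in this test fold
    fold_sizes.append(len(test_idx))
    fold_fractions.append(fraction_ones)
    print(f"fold {fold}: {len(test_idx)} test rows, {fraction_ones:.2f} class 1")

print(f"overall share of class 1: {toy_y.mean():.2f}")
```

Each test fold holds roughly a tenth of the 208 rows, so per-fold accuracy can only move in steps of about five percentage points.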

In [10]:
# evaluate model with standardized dataset 
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(build_fn=create_smaller, epochs=100, batch_size=5, verbose=0)))

pipeline = Pipeline(estimators)

kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed ) 
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold) 
print("Smaller: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Smaller: 84.12% (7.18%)


In [11]:
results

array([0.85714287, 0.95238096, 0.76190478, 0.80952382, 0.95238096,
       0.85714287, 0.80952382, 0.76190478, 0.89999998, 0.75      ])

In [12]:
# evaluate model with standardized dataset 
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(build_fn=create_bigger, epochs=100, batch_size=5, verbose=0)))

pipeline = Pipeline(estimators)

kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed ) 
results2 = cross_val_score(pipeline, X, encoded_Y, cv=kfold) 
print("Larger: %.2f%% (%.2f%%)" % (results2.mean()*100, results2.std()*100))

Larger: 84.57% (7.24%)


In [13]:
results2

array([0.90476191, 0.85714287, 0.76190478, 0.85714287, 0.95238096,
       0.80952382, 0.80952382, 0.90476191, 0.89999998, 0.69999999])
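One way to read the two score arrays side by side is to recompute the summary statistics and compare them fold by fold. This is a minimal sketch that copies the per-fold accuracies from the two outputs above. The names `smaller_scores` and `larger_scores` are introduced here for illustration. Because both runs used the same StratifiedKFold seed, fold i is the same train/test split in each run.

```python
import numpy as np

# Per-fold accuracies copied from the outputs of `results` and `results2`
smaller_scores = np.array([0.85714287, 0.95238096, 0.76190478, 0.80952382, 0.95238096,
                           0.85714287, 0.80952382, 0.76190478, 0.89999998, 0.75])
larger_scores = np.array([0.90476191, 0.85714287, 0.76190478, 0.85714287, 0.95238096,
                          0.80952382, 0.80952382, 0.90476191, 0.89999998, 0.69999999])

# Mean and (population) standard deviation, as in the print statements above
for name, scores in [("Smaller", smaller_scores), ("Larger", larger_scores)]:
    print("%s: %.2f%% (%.2f%%)" % (name, scores.mean() * 100, scores.std() * 100))

# Paired comparison: difference on each shared fold
diff = larger_scores - smaller_scores
wins = int(np.sum(diff > 1e-6))     # folds where the larger network scored higher
losses = int(np.sum(diff < -1e-6))  # folds where the smaller network scored higher
ties = len(diff) - wins - losses
print("larger better on %d folds, worse on %d, tied on %d" % (wins, losses, ties))
print("mean difference: %.2f points" % (diff.mean() * 100))
```

On these arrays the difference in mean accuracy is small relative to the fold-to-fold standard deviation, so this single run does not clearly favor either topology.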