# Neural Networks with Keras

We must first decide what kind of model to apply to our data. Because our target is categorical, we use a classifier.

In this example, we will use a classifier to build the following network:

![nnet.png](../Images/nnet.png)

## Defining and compiling our Model Architecture (the layers)

We first need to create a sequential model:

In [1]:
from keras.models import Sequential

model = Sequential()

Using TensorFlow backend.


Next, we add our first layer. This layer requires you to specify both the number of inputs and the number of nodes that you want in the hidden layer.

In [2]:
from keras.layers import Dense
number_inputs = 3
number_hidden_nodes = 4
model.add(Dense(units=number_hidden_nodes,
                activation='relu', input_dim=number_inputs))

![first_layer](../Images/nnet_first_layer.png)
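To make the hidden layer concrete, here is a minimal NumPy sketch of what a `Dense` layer with a `relu` activation computes: `relu(x @ W + b)`. The weights `W`, the bias `b`, and the helper name `dense_relu` are made up for illustration; Keras initializes its own weights and learns them during training.

```python
import numpy as np

rng = np.random.default_rng(0)

number_inputs = 3
number_hidden_nodes = 4

# Hypothetical weights and biases; Keras would initialize and then learn these.
W = rng.normal(size=(number_inputs, number_hidden_nodes))
b = np.zeros(number_hidden_nodes)

def dense_relu(x, W, b):
    """Compute relu(x @ W + b) for a batch of inputs."""
    return np.maximum(0.0, x @ W + b)

x = rng.normal(size=(5, number_inputs))  # a batch of 5 samples, 3 features each
hidden = dense_relu(x, W, b)
print(hidden.shape)  # (5, 4)
```

Each of the 3 input features is connected to each of the 4 hidden nodes, which is why `W` has shape `(3, 4)`.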

Our final layer is the output layer. Here, we need to specify the activation function (typically `softmax` for classification) and the number of classes (labels) that we are trying to predict (2 in this example).

In [3]:
number_classes = 2
model.add(Dense(units=number_classes, activation='softmax'))

![output_layer](../Images/nnet_output_layer.png)
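As a rough illustration of why `softmax` suits classification, the sketch below implements it in NumPy. It turns each row of raw scores into non-negative values that sum to 1, so they can be read as class probabilities. The `logits` values are made up for the example.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = z - z.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Made-up raw scores for two samples and two classes
logits = np.array([[2.0, 1.0],
                   [0.5, 0.5]])
probs = softmax(logits)
print(probs)
print(probs.sum(axis=1))  # each row sums to 1
```

Equal scores, as in the second row, give equal probabilities for both classes.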

In [4]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 4)                 16        
_________________________________________________________________
dense_2 (Dense)              (None, 2)                 10        
=================================================================
Total params: 26
Trainable params: 26
Non-trainable params: 0
_________________________________________________________________
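The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-to-unit connection plus one bias per unit. The helper name `dense_param_count` is only for illustration.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden_params = dense_param_count(3, 4)   # 3*4 + 4 = 16
output_params = dense_param_count(4, 2)   # 4*2 + 2 = 10
total_params = hidden_params + output_params
print(hidden_params, output_params, total_params)  # 16 10 26
```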


Now that we have our model architecture defined, we must compile the model using a loss function and optimizer. We can also specify additional training metrics such as accuracy.

In [5]:
# Use categorical crossentropy for categorical data and accuracy for scoring.
# Hint: the output layer in this example uses softmax, which produces one probability per class
# If your output layer activation was `linear` then you may want to use `mse` for loss
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
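For intuition about the loss, here is a simplified NumPy sketch of categorical crossentropy: the mean over samples of `-sum(y_true * log(y_pred))`. It is not Keras's implementation, and the `eps` clipping value and the example arrays are assumptions chosen for the demonstration.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# One-hot labels for two samples
y_true = np.array([[0.0, 1.0],
                   [1.0, 0.0]])

# Made-up predictions: one confident and correct, one undecided
confident = np.array([[0.1, 0.9],
                      [0.8, 0.2]])
unsure = np.array([[0.5, 0.5],
                   [0.5, 0.5]])

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

The undecided predictions score `ln(2)`, roughly 0.693, and the loss shrinks as more probability is assigned to the correct class.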

## Making a scikit-learn-compatible Keras classifier

To take advantage of scikit-learn's pipelines, grid search, and other tools, Keras provides a wrapper for both classifiers and regressors.

In [6]:
# Define helper function to do the same steps as above
def build_classifier():
    classifier = Sequential()
    number_inputs = 3
    number_hidden_nodes = 4
    classifier.add(Dense(units=number_hidden_nodes, activation='relu', input_dim=number_inputs))
    number_classes = 2
    classifier.add(Dense(units=number_classes, activation='softmax'))
    classifier.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return classifier

# Create a Keras model that's compatible with scikit-learn
from keras.callbacks import EarlyStopping
from keras.wrappers.scikit_learn import KerasClassifier
keras_classifier = KerasClassifier(
    build_classifier,
    epochs=300,
    shuffle=True,
    verbose=2,
    callbacks=[EarlyStopping(monitor='acc', patience=50, verbose=2)],
)
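The `patience` argument of `EarlyStopping` can be pictured with the simplified sketch below. It is not Keras's code: the function name `early_stopping_epoch` and the accuracy values are hypothetical, and it assumes a metric where higher is better.

```python
def early_stopping_epoch(history, patience):
    """Return the 1-based epoch at which training would stop, or None.

    Training stops once the monitored value has failed to beat its
    best-so-far for `patience` consecutive epochs.
    """
    best = float("-inf")
    wait = 0
    for epoch, value in enumerate(history, start=1):
        if value > best:
            best = value
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up accuracy values that plateau at 0.61 from epoch 4 onward
accuracies = [0.55, 0.60, 0.60, 0.61, 0.61, 0.61, 0.61]
print(early_stopping_epoch(accuracies, patience=3))   # 7
print(early_stopping_epoch(accuracies, patience=50))  # None
```

With `patience=50`, as in the cell above, a plateau has to last 50 epochs before training ends early.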

## Getting data

In [7]:
# Generate some fake data with 3 features

from sklearn.datasets import make_classification

X, y = make_classification(n_features=3, n_redundant=0, n_informative=3,
                           random_state=42, n_classes=2, n_clusters_per_class=1)

y = y.reshape(-1, 1)

print(X.shape)
print(y.shape)

(100, 3)
(100, 1)


## Splitting data into training and testing sets

Use `train_test_split` to create training and testing data.

In [8]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
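Conceptually, the split shuffles the row indices and holds out a fraction of them for testing. The sketch below shows that idea in NumPy. It is not scikit-learn's implementation and will not select the same rows as `train_test_split(..., random_state=1)`. The function name `shuffle_split` and the demo arrays are hypothetical, and the 25% test fraction mirrors scikit-learn's default when no size is given.

```python
import numpy as np

def shuffle_split(X, y, test_fraction=0.25, seed=1):
    """Shuffle row indices, then hold out `test_fraction` of them for testing."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    n_test = int(np.ceil(test_fraction * len(X)))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Stand-in data with the same shapes as X and y above
X_demo = np.arange(300, dtype=float).reshape(100, 3)
y_demo = np.arange(100).reshape(-1, 1) % 2

X_tr, X_te, y_tr, y_te = shuffle_split(X_demo, y_demo)
print(X_tr.shape, X_te.shape)  # (75, 3) (25, 3)
```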

## Preprocessing data

It is really important to scale our data before using multilayer perceptron models. Without scaling, it is often difficult for the training cycle to converge.

In [9]:
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
model = make_pipeline(StandardScaler(), keras_classifier)
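The scaling step can be sketched in NumPy as follows: the mean and standard deviation are learned from the training data only, then applied to both sets. Placing the scaler inside the pipeline handles this automatically. The arrays here are randomly generated stand-ins, not the tutorial's data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in training and testing features on an arbitrary scale
train = rng.normal(loc=10.0, scale=5.0, size=(75, 3))
test = rng.normal(loc=10.0, scale=5.0, size=(25, 3))

# Statistics come from the training data only
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0)

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuse the training statistics

print(train_scaled.mean(axis=0).round(6))
print(train_scaled.std(axis=0).round(6))
```

After scaling, each training feature has mean 0 and standard deviation 1. The test features will be close to that but not exact, because they reuse the training statistics.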

We also one-hot encode the labels, because `categorical_crossentropy` expects one target column per class.

In [10]:
from keras.utils import to_categorical

# One-hot encoding
y_train_categorical = to_categorical(y_train)
y_test_categorical = to_categorical(y_test)
y_test_categorical[:10]

array([[0., 1.],
       [1., 0.],
       [1., 0.],
       [1., 0.],
       [0., 1.],
       [0., 1.],
       [1., 0.],
       [0., 1.],
       [1., 0.],
       [0., 1.]], dtype=float32)
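One-hot encoding can also be sketched directly in NumPy by indexing the rows of an identity matrix. The `labels` array below is made up for the example.

```python
import numpy as np

# Made-up integer labels shaped like y_train, i.e. (n_samples, 1)
labels = np.array([[1], [0], [0], [1]])
n_classes = 2

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(n_classes, dtype="float32")[labels.ravel()]
print(one_hot)

# argmax recovers the original integer labels
recovered = one_hot.argmax(axis=1)
print(recovered)  # [1 0 0 1]
```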

## Training the Model
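Finally, we train our model using our training data.

Training consists of updating our weights using our optimizer and loss function. In this example, we train for up to 300 iterations (loops) over the training data, called epochs; the `EarlyStopping` callback ends training sooner if the training accuracy has not improved for 50 consecutive epochs.

These settings, along with shuffling the training data and the more detailed per-epoch output (`verbose=2`), were passed to `KerasClassifier` above, so `fit` only needs the training data.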
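To illustrate what an epoch with shuffling looks like, the sketch below walks through the data in mini-batches, reshuffling on every pass. It assumes 75 training rows (a 75/25 split of the 100 samples) and a batch size of 32, which, to my knowledge, is what Keras uses when no batch size is given. The helper name `iterate_minibatches` is hypothetical, and no weights are actually updated here.

```python
import numpy as np

def iterate_minibatches(n_samples, batch_size, rng):
    """Yield index arrays covering all samples once, in shuffled order."""
    indices = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield indices[start:start + batch_size]

rng = np.random.default_rng(0)
n_samples, batch_size = 75, 32

for epoch in range(1, 3):
    # In real training, each batch would trigger one weight update
    sizes = [len(batch) for batch in iterate_minibatches(n_samples, batch_size, rng)]
    print(f"Epoch {epoch}: batch sizes {sizes}")
```

Under these assumptions, each epoch consists of three weight updates, with batches of 32, 32, and 11 samples.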

In [11]:
# Fit (train) the model
model.fit(X_train, y_train_categorical)

Epoch 1/300
 - 5s - loss: 0.7941 - acc: 0.5467
Epoch 2/300
 - 0s - loss: 0.7870 - acc: 0.5600
Epoch 3/300
 - 0s - loss: 0.7807 - acc: 0.5600
Epoch 4/300
 - 0s - loss: 0.7744 - acc: 0.5733
Epoch 5/300
 - 0s - loss: 0.7668 - acc: 0.5733
Epoch 6/300
 - 0s - loss: 0.7599 - acc: 0.6000
Epoch 7/300
 - 0s - loss: 0.7529 - acc: 0.6000
Epoch 8/300
 - 0s - loss: 0.7456 - acc: 0.6000
Epoch 9/300
 - 0s - loss: 0.7376 - acc: 0.6000
Epoch 10/300
 - 0s - loss: 0.7308 - acc: 0.6000
Epoch 11/300
 - 0s - loss: 0.7227 - acc: 0.6000
Epoch 12/300
 - 0s - loss: 0.7154 - acc: 0.6000
Epoch 13/300
 - 0s - loss: 0.7088 - acc: 0.6000
Epoch 14/300
 - 0s - loss: 0.7006 - acc: 0.6000
Epoch 15/300
 - 0s - loss: 0.6930 - acc: 0.6000
Epoch 16/300
 - 0s - loss: 0.6863 - acc: 0.6133
Epoch 17/300
 - 0s - loss: 0.6797 - acc: 0.6133
Epoch 18/300
 - 0s - loss: 0.6722 - acc: 0.6133
Epoch 19/300
 - 0s - loss: 0.6655 - acc: 0.6000
Epoch 20/300
 - 0s - loss: 0.6588 - acc: 0.6000
Epoch 21/300
 - 0s - loss: 0.6526 - acc: 0.6000
...

Epoch 171/300
 - 0s - loss: 0.2434 - acc: 0.9467
Epoch 172/300
 - 0s - loss: 0.2421 - acc: 0.9467
Epoch 173/300
 - 0s - loss: 0.2409 - acc: 0.9467
Epoch 174/300
 - 0s - loss: 0.2396 - acc: 0.9467
Epoch 175/300
 - 0s - loss: 0.2383 - acc: 0.9467
Epoch 176/300
 - 0s - loss: 0.2370 - acc: 0.9467
Epoch 177/300
 - 0s - loss: 0.2358 - acc: 0.9467
Epoch 178/300
 - 0s - loss: 0.2345 - acc: 0.9467
Epoch 179/300
 - 0s - loss: 0.2333 - acc: 0.9467
Epoch 180/300
 - 0s - loss: 0.2321 - acc: 0.9467
Epoch 181/300
 - 0s - loss: 0.2309 - acc: 0.9467
Epoch 182/300
 - 0s - loss: 0.2297 - acc: 0.9467
Epoch 183/300
 - 0s - loss: 0.2286 - acc: 0.9467
Epoch 184/300
 - 0s - loss: 0.2273 - acc: 0.9467
Epoch 185/300
 - 0s - loss: 0.2262 - acc: 0.9467
Epoch 186/300
 - 0s - loss: 0.2250 - acc: 0.9467
Epoch 187/300
 - 0s - loss: 0.2238 - acc: 0.9467
Epoch 188/300
 - 0s - loss: 0.2227 - acc: 0.9467
Epoch 189/300
 - 0s - loss: 0.2216 - acc: 0.9467
Epoch 190/300
 - 0s - loss: 0.2204 - acc: 0.9467
Epoch 191/300
...

Pipeline(memory=None,
     steps=[('standardscaler', StandardScaler(copy=True, with_mean=True, with_std=True)), ('kerasclassifier', <keras.wrappers.scikit_learn.KerasClassifier object at 0x000001B9E5B5BF98>)])

## Quantifying the Model
We use our testing data to evaluate our model. This is how we estimate how well it generalizes (i.e., its ability to predict new and previously unseen data points).

In [12]:
# Evaluate the model using the testing data
model_accuracy = model.score(X_test, y_test_categorical)

In [13]:
print(f"Accuracy: {model_accuracy}")

Accuracy: 0.9200000166893005
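Accuracy is the fraction of test samples whose predicted class matches the true class. The sketch below computes it from one-hot labels and predicted class indices; the arrays and the function name `accuracy_from_one_hot` are made up for illustration.

```python
import numpy as np

def accuracy_from_one_hot(y_true_one_hot, predicted_classes):
    """Fraction of samples where the predicted class equals the true class."""
    true_classes = np.argmax(y_true_one_hot, axis=1)
    return float(np.mean(true_classes == predicted_classes))

# Made-up one-hot labels and predictions for four samples
y_true_demo = np.array([[0, 1],
                        [1, 0],
                        [1, 0],
                        [0, 1]], dtype="float32")
predicted_demo = np.array([1, 0, 1, 1])  # third prediction is wrong

accuracy_demo = accuracy_from_one_hot(y_true_demo, predicted_demo)
print(accuracy_demo)  # 0.75
```

An accuracy of 0.92 on a 25-sample test set would correspond to 23 correct predictions.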


## Making Predictions with new data

We can use our trained model to make predictions using `model.predict`. Because `model` is the pipeline, the new data is scaled before it reaches the network.

In [14]:
import numpy as np
new_data = np.array([[0.2, 0.3, 0.4]])

In [15]:
print(f"Predicted class: {model.predict(new_data)}")

Predicted class: [1]
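The predicted class is the index of the largest softmax probability. The sketch below shows that last step with a made-up probability row. The values are not the model's actual output for `new_data`.

```python
import numpy as np

# Made-up softmax output for one sample: P(class 0), P(class 1)
probabilities = np.array([[0.12, 0.88]])

# The predicted class is the column with the highest probability
predicted = np.argmax(probabilities, axis=1)
print(predicted)  # [1]
```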
