# Chapter Four. Defining neural networks with Keras

In the final chapter, you'll use high-level APIs in TensorFlow to train a sign language letter classifier. You will use both the sequential and functional Keras APIs to train, validate, and evaluate models. You will also learn how to use the Estimators API to streamline the model definition and training process and to avoid errors.

> **Topics:**
- 1. Defining neural networks with Keras
    - 1.1 The sequential model in Keras
    - 1.2 Compiling a sequential model
    - 1.3 Defining a multiple input model
- 2. Training and validation with Keras
    - 2.1 Training with Keras
    - 2.2 Metrics and validation with Keras
    - 2.3 Overfitting detection
    - 2.4 Evaluating models
- 3. Training models with the Estimators API
    - 3.1 Preparing to train with Estimators
    - 3.2 Defining Estimators
- 4. Congratulations!

In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf

from tensorflow import keras, Variable, ones, matmul

filepath = '../_datasets/'

## 1. Defining Neural Networks with Keras

### Classifying sign language letters

![][01-sign_language_letters]

### The sequential API

![][02-sequential_API]

- Input layer
- Hidden layers
- Output layer
- Ordered in sequence

### Building a sequential model
```Python
# Import tensorflow
import tensorflow as tf

# Define a sequential model
model = tf.keras.Sequential()

# Define first hidden layer (784 flattened pixel inputs, 16 nodes)
model.add(tf.keras.layers.Dense(16, activation='relu', input_shape=(28*28,)))

# Define second hidden layer
model.add(tf.keras.layers.Dense(8, activation='relu'))

# Define output layer (one node per class)
model.add(tf.keras.layers.Dense(4, activation='softmax'))

# Compile the model
model.compile('adam', loss='categorical_crossentropy')
```
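The output layer above uses a `softmax` activation and the model is compiled with the `categorical_crossentropy` loss. As a rough illustration of what those two pieces compute for a single example, here is a minimal pure-Python sketch. The scores and the label are made up for illustration, and Keras's own implementation also handles batches and numerical edge cases that this sketch omits.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the largest score keeps exp() numerically stable.
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot_label, probs):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs))

probs = softmax([2.0, 1.0, 0.5, 0.1])  # hypothetical output-layer scores
label = [1, 0, 0, 0]                   # one-hot label for class 0
loss = categorical_crossentropy(label, probs)
print(probs, loss)
```

The loss shrinks toward zero as the probability assigned to the true class approaches 1.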

### The functional API

![][03-functional_API]

### Using the functional API
```Python
# Import tensorflow
import tensorflow as tf

# Define model 1 input layer shape
model1_inputs = tf.keras.Input(shape=(28*28,))

# Define model 2 input layer shape
model2_inputs = tf.keras.Input(shape=(10,))

# Define layer 1 for model 1
model1_layer1 = tf.keras.layers.Dense(12, activation='relu')(model1_inputs)

# Define layer 2 for model 1
model1_layer2 = tf.keras.layers.Dense(4, activation='softmax')(model1_layer1)

# Define layer 1 for model 2
model2_layer1 = tf.keras.layers.Dense(8, activation='relu')(model2_inputs)

# Define layer 2 for model 2
model2_layer2 = tf.keras.layers.Dense(4, activation='softmax')(model2_layer1)

# Merge model 1 and model 2
merged = tf.keras.layers.add([model1_layer2, model2_layer2])

# Define a functional model
model = tf.keras.Model(inputs=[model1_inputs, model2_inputs], outputs=merged)

# Compile the model
model.compile('adam', loss='categorical_crossentropy')
```

[01-sign_language_letters]:_Docs/01-sign_language_letters.png
[02-sequential_API]:_Docs/02-sequential_API.png
[03-functional_API]:_Docs/03-functional_API.png

### 1.1 The sequential model in Keras
In chapter 3, we used components of the `keras` API in `tensorflow` to define a neural network, but we stopped short of using its full capabilities to streamline model definition and training. In this exercise, you will use the `keras` sequential model API to define a neural network that can be used to classify images of sign language letters. You will also use the `.summary()` method to print the model's architecture, including the shape and number of parameters associated with each layer.

Note that the images were reshaped from (28, 28) to (784,), so that they could be used as inputs to a dense layer. Additionally, note that `keras` has been imported from `tensorflow` for you.
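The reshaping step mentioned above can be sketched with NumPy as follows. The array here is a hypothetical stand-in for a small batch of images, not the actual dataset.

```python
import numpy as np

# Hypothetical stand-in for a batch of 3 grayscale 28x28 images.
images = np.zeros((3, 28, 28), dtype=np.float32)

# Flatten each image into a 784-element vector; -1 lets NumPy infer 28 * 28.
flat_images = images.reshape(len(images), -1)

print(flat_images.shape)  # (3, 784)
```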

In [2]:
# Define a Keras sequential model
model = keras.Sequential()

# Define the first dense layer
model.add(keras.layers.Dense(16, activation='relu', input_shape=(784,)))

# Define the second dense layer
model.add(keras.layers.Dense(8, activation='relu'))

# Define the output layer
model.add(keras.layers.Dense(4, activation='softmax'))

# Print the model architecture (summary() prints the table itself)
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
dense (Dense)                (None, 16)                12560
_________________________________________________________________
dense_1 (Dense)              (None, 8)                 136
_________________________________________________________________
dense_2 (Dense)              (None, 4)                 36
Total params: 12,732
Trainable params: 12,732
Non-trainable params: 0
_________________________________________________________________


Notice that we've defined a model, but we haven't compiled it. ***The compilation step in `keras` allows us to set the optimizer, loss function, and other useful training parameters in a single line of code***. Furthermore, the `.summary()` method allows us to view the model's architecture.
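The `Param #` column in the summary can be checked by hand: a dense layer has one weight per input-output pair plus one bias per node. A minimal sketch of that arithmetic, where `dense_params` is a helper name introduced here for illustration:

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Input size followed by the node count of each dense layer.
layer_sizes = [784, 16, 8, 4]
params = [dense_params(i, o) for i, o in zip(layer_sizes, layer_sizes[1:])]

print(params)       # [12560, 136, 36]
print(sum(params))  # 12732
```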

### 1.2 Compiling a sequential model
In this exercise, you will work towards classifying letters from the Sign Language MNIST dataset; however, you will adopt a different network architecture from the one you used in the previous exercise. There will be fewer layers: a single 16-node hidden layer with a `sigmoid` activation, followed by the output layer. Additionally, you will compile the model to use the `adam` optimizer and the `categorical_crossentropy` loss. You will also use a method in `keras` to summarize your model's architecture.

In [3]:
# Define a Keras sequential model
model = keras.Sequential()

# Define the first dense layer
model.add(keras.layers.Dense(16, activation='sigmoid', input_shape=(784,)))

# Define the output layer
model.add(keras.layers.Dense(4, activation='softmax'))

# Compile the model
model.compile('adam', loss='categorical_crossentropy')

# Print a model summary (summary() prints the table itself)
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
dense_3 (Dense)              (None, 16)                12560
_________________________________________________________________
dense_4 (Dense)              (None, 4)                 68
Total params: 12,628
Trainable params: 12,628
Non-trainable params: 0
_________________________________________________________________


### 1.3 Defining a multiple input model
In some cases, the **sequential API** will not be sufficiently flexible to accommodate your desired model architecture and you will need to use the **functional API** instead. ***If, for instance, you want to train two models with different architectures jointly, you will need to use the functional API to do this***. In this exercise, we will see how to do this. We will also use the `.summary()` method to examine the joint model's architecture.

Note that `keras` has been imported from `tensorflow` for you. Additionally, the input layers of the first and second models have been defined as `m1_inputs` and `m2_inputs`, respectively. Note that the two models have the same architecture, but one of them uses a `sigmoid` activation in the first layer and the other uses a `relu`.

In [4]:
m1_inputs = tf.keras.layers.Input(shape=(28*28,))
m2_inputs = tf.keras.layers.Input(shape=(28*28,))

print(m1_inputs)
print(m2_inputs)

Tensor("input_1:0", shape=(None, 784), dtype=float32)
Tensor("input_2:0", shape=(None, 784), dtype=float32)


In [5]:
# For model 1, pass the input layer to layer 1 and layer 1 to layer 2
m1_layer1 = keras.layers.Dense(12, activation='sigmoid')(m1_inputs)
m1_layer2 = keras.layers.Dense(4, activation='softmax')(m1_layer1)

# For model 2, pass the input layer to layer 1 and layer 1 to layer 2
m2_layer1 = keras.layers.Dense(12, activation='relu')(m2_inputs)
m2_layer2 = keras.layers.Dense(4, activation='softmax')(m2_layer1)

# Merge model outputs and define a functional model
merged = keras.layers.add([m1_layer2, m2_layer2])
model = keras.Model(inputs=[m1_inputs, m2_inputs], outputs=merged)

# Print a model summary (summary() prints the table itself)
model.summary()

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 784)]        0                                            
__________________________________________________________________________________________________
input_2 (InputLayer)            [(None, 784)]        0                                            
__________________________________________________________________________________________________
dense_5 (Dense)                 (None, 12)           9420        input_1[0][0]                    
__________________________________________________________________________________________________
dense_7 (Dense)                 (None, 12)           9420        input_2[0][0]                    
______________________________________________________________________________________________

Notice that the `.summary()` method yields a new column: `Connected to`. This column tells you how layers connect to each other within the network. We can see that `dense_7`, for instance, is connected to the `input_2` layer. The `add` layer, which merges the two models, is connected to the output layers of both models. The numeric suffixes in the layer names depend on how many layers were created earlier in the session.
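The merge step uses `keras.layers.add`, which sums its inputs elementwise. A minimal pure-Python sketch, using made-up probability vectors for a single image, shows what that does to two softmax outputs:

```python
# Hypothetical softmax outputs from the two branches for one image.
m1_output = [0.70, 0.10, 0.10, 0.10]
m2_output = [0.40, 0.30, 0.20, 0.10]

# Elementwise sum, as an add-style merge computes it.
merged = [a + b for a, b in zip(m1_output, m2_output)]

# The largest entry still identifies the predicted class,
# but the merged values sum to about 2 rather than 1.
predicted_class = merged.index(max(merged))
print(merged, predicted_class)
```

If the merged output needs to stay on a probability scale, averaging the two outputs instead (for example with `keras.layers.average`) is one possible alternative.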

## 2. Training and validation with Keras

### Overview of training and evaluation
1. Load and clean data
2. Define model
3. Train and validate model
4. Evaluate model

### How to train a model

```Python
# Import tensorflow
import tensorflow as tf

# Define a sequential model
model = tf.keras.Sequential()

# Define the hidden layer
model.add(tf.keras.layers.Dense(16, activation='relu', input_shape=(784,)))

# Define the output layer
model.add(tf.keras.layers.Dense(4, activation='softmax'))

# Compile model
model.compile('adam', loss='categorical_crossentropy')

# Train model
model.fit(image_features, image_labels)
```

### The fit() operation
- Required arguments
    - `features`
    - `labels`
- Many optional arguments
    - `batch_size`
    - `epochs`
    - `validation_split`

### Batch size and epochs

- The number of examples in each batch is the **batch size**.
- The number of times you train on the full set of batches is the **number of epochs**.
- In the image, the batch size is 5 and the number of epochs is 2.

![][04-Batches_epochs]
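The relationship between batch size, epochs, and the number of training steps can be sketched as follows. The dataset size of 20 is an assumption made for illustration; the batch size and number of epochs match the image.

```python
import math

def training_steps(n_examples, batch_size, epochs):
    """Total number of batches processed over all epochs."""
    # The last batch of an epoch may be smaller, hence the ceiling.
    steps_per_epoch = math.ceil(n_examples / batch_size)
    return steps_per_epoch * epochs

# Hypothetical dataset of 20 examples, batch size 5, 2 epochs.
print(training_steps(n_examples=20, batch_size=5, epochs=2))  # 8
```

When `batch_size` and `epochs` are not passed to `fit()` for array inputs, Keras falls back to a batch size of 32 and a single epoch.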

### Performing validation

- The `validation_split` parameter divides the data into two parts.
    - The first part is the training set
    - The second part is the validation set
- Setting `validation_split=0.20` means 20% of the data will be used for validation

![][05-validation]

```Python
# Train model with validation split
model.fit(features, labels, epochs=10, validation_split=0.20)
```
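Keras takes the validation set from the last samples of the data passed to `fit()`, before any shuffling. A minimal pure-Python sketch of that behavior, where `split_for_validation` is a helper name introduced here for illustration:

```python
import math

def split_for_validation(samples, validation_split):
    """Hold out the last fraction of the samples for validation."""
    split_at = math.floor(len(samples) * (1.0 - validation_split))
    return samples[:split_at], samples[split_at:]

samples = list(range(10))  # hypothetical dataset of 10 examples
train, val = split_for_validation(samples, validation_split=0.20)

print(train)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(val)    # [8, 9]
```

Because the held-out samples come from the end of the data, it is usually worth shuffling the data beforehand if it is ordered by class.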

- The next image shows the training loss and the validation loss separately.
- If the training loss becomes substantially lower than the validation loss, it is a clear indication that the model is **overfitting**. To avoid overfitting, we could:
    - terminate the training process before that point,
    - add regularization, or
    - add dropout.
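The first remedy, stopping before the validation loss starts to rise, can be sketched as a patience rule. This is a simplified illustration with made-up loss values, not the Keras implementation; in practice the `tf.keras.callbacks.EarlyStopping` callback (with its `monitor` and `patience` arguments) provides this kind of behavior.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1  # rule never triggered: train to the end

# Hypothetical validation losses: improving, then rising as overfitting sets in.
val_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70]
print(early_stopping_epoch(val_losses))  # 4
```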

![][06-validation]

### Changing the metric
```Python
# Recompile the model with the accuracy metric
model.compile('adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train model with validation split
model.fit(features, labels, epochs=10, validation_split=0.20)
```

![][07-metric]
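For one-hot labels, the accuracy metric amounts to the fraction of examples whose highest-probability class matches the labeled class. A minimal pure-Python sketch with made-up labels and predictions:

```python
def categorical_accuracy(one_hot_labels, predictions):
    """Fraction of examples whose predicted class matches the label."""
    correct = 0
    for label, probs in zip(one_hot_labels, predictions):
        # Compare the position of the 1 in the label with the argmax of the prediction.
        if label.index(max(label)) == probs.index(max(probs)):
            correct += 1
    return correct / len(one_hot_labels)

# Hypothetical one-hot labels and softmax outputs for three images.
labels = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
predictions = [
    [0.7, 0.1, 0.1, 0.1],  # correct: class 0
    [0.2, 0.5, 0.2, 0.1],  # correct: class 1
    [0.4, 0.3, 0.2, 0.1],  # wrong: predicts class 0, label is class 3
]
accuracy = categorical_accuracy(labels, predictions)
print(accuracy)  # two of the three predictions are correct
```

Unlike the loss, accuracy ignores how confident each prediction is, which is why the two curves can move differently during training.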

### The evaluate() operation

- It's always a good idea to split off a test set before you begin to train and validate. **This way, you can check the performance on the test set at the end of the training process**.
- Since you may tune model parameters in response to validation set performance, **using a separate test set will provide you with further assurance that you haven't overfitted**.

![][08-evaluation]

```Python
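# Evaluate the model on the held-out test set
# (test_features and test_labels are assumed to be the test arrays)
model.evaluate(test_features, test_labels)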
```
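Splitting off the test set before training can be sketched as follows. The helper name, the dataset size, and the 10% test fraction are all assumptions made for illustration.

```python
import random

def train_test_split_indices(n_examples, test_fraction, seed=0):
    """Shuffle example indices and reserve a fraction of them for testing."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = int(n_examples * test_fraction)
    return indices[n_test:], indices[:n_test]

# Hypothetical dataset of 100 examples with 10% held out for testing.
train_idx, test_idx = train_test_split_indices(n_examples=100, test_fraction=0.1)
print(len(train_idx), len(test_idx))
```

The training indices would then be used for `fit()` (including any `validation_split`), and the test indices only for the final `evaluate()` call.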

[04-Batches_epochs]:_Docs/04-Batches_epochs.png
[05-validation]:_Docs/05-validation.png
[06-validation]:_Docs/06-validation.png
[07-metric]:_Docs/07-metric.png
[08-evaluation]:_Docs/08-evaluation.png