## Building Complex Models Using the Functional API

Although Sequential models are extremely common, it is sometimes useful to build neural networks with more complex topologies, or with multiple inputs or outputs. For this purpose, Keras offers the Functional API.

### Wide & Deep Neural Network

One example of a nonsequential neural network is a Wide & Deep neural network. It connects all or part of the inputs directly to the output layer. This architecture makes it possible for the neural network to learn both deep patterns (using the deep path) and simple rules (through the short path). In contrast, a regular MLP forces all the data to flow through the full stack of layers, so simple patterns in the data may end up being distorted by this sequence of transformations.
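
To make the two paths concrete, here is a minimal NumPy sketch of one forward pass through a Wide & Deep network. The random weights, the `dense` helper, and the layer sizes (8 inputs, hidden layers of 30 and 10 units) are illustrative assumptions chosen to mirror the Keras model built in this section; this is not how Keras implements its layers internally.

```python
import numpy as np

rng = np.random.default_rng(42)

def dense(x, n_units, activation=None):
    """Apply a randomly initialized fully connected layer (illustration only)."""
    w = rng.normal(size=(x.shape[1], n_units))
    b = np.zeros(n_units)
    z = x @ w + b
    return np.maximum(z, 0.0) if activation == "relu" else z

X = rng.normal(size=(4, 8))          # 4 instances, 8 features

# Deep path: the inputs go through the full stack of hidden layers.
hidden1 = dense(X, 30, activation="relu")
hidden2 = dense(hidden1, 10, activation="relu")

# Wide path: the raw inputs skip the hidden layers and are concatenated
# with the deep path's output just before the output layer.
concat = np.concatenate([X, hidden2], axis=1)
output = dense(concat, 1)

print(concat.shape)   # (4, 18): 8 raw features + 10 deep features
print(output.shape)   # (4, 1)
```

The output layer sees the untouched input features next to the learned deep features, which is what lets it pick up simple rules directly.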

Let's build such a neural network to tackle the California Housing problem:

In [1]:
import tensorflow as tf
from tensorflow import keras

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

In [2]:
# Load the data
housing = fetch_california_housing()
# Split into training, validation, and test sets
X_train_full, X_test, y_train_full, y_test = train_test_split(housing.data, housing.target)
X_train, X_valid, y_train, y_valid = train_test_split(X_train_full, y_train_full)
# Scale all the features (the scaler is fit on the training set only)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_valid = scaler.transform(X_valid)
X_test = scaler.transform(X_test)
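
The sizes of the three sets follow from calling train_test_split() twice with its default settings. The sketch below reproduces that arithmetic in plain Python, assuming the dataset has 20,640 rows and that the default test fraction of 0.25 applies (no test_size is passed above); `split_sizes` is a hypothetical helper, not part of scikit-learn.

```python
import math

n_samples = 20640        # rows in the California Housing dataset
test_fraction = 0.25     # train_test_split's default when no size is given

def split_sizes(n, fraction):
    """Return (n_train, n_test), rounding the test share up."""
    n_test = math.ceil(n * fraction)
    return n - n_test, n_test

n_train_full, n_test = split_sizes(n_samples, test_fraction)
n_train, n_valid = split_sizes(n_train_full, test_fraction)

print(n_train, n_valid, n_test)   # 11610 3870 5160
```

Because no random_state is set, the sizes are fixed but the rows that land in each set change from run to run.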

In [5]:
input_ = keras.layers.Input(shape=X_train.shape[1:])
hidden1 = keras.layers.Dense(30, activation='relu')(input_)
hidden2 = keras.layers.Dense(10, activation='relu')(hidden1)
concat = keras.layers.Concatenate()([input_, hidden2])
output = keras.layers.Dense(1)(concat)
model = keras.Model(inputs = [input_], outputs=[output])

First, we need to create an Input object. This is a specification of the kind of input the model will get, including its shape and dtype. A model may actually have multiple inputs, as we will see shortly.

Next, we create a Dense layer with 30 neurons, using the ReLU activation function. Notice that as soon as it is created, we call it like a function, passing it the input. This is why it is called the Functional API. Note that we are just telling Keras how it should connect the layers together; no actual data is being processed yet.

We then create a second hidden layer, and again we use it as a function. Note that we pass it the output of the first hidden layer.

Next, we create a Concatenate layer, and once again we immediately use it like a function, to concatenate the input and the output of the second hidden layer. You may prefer the keras.layers.concatenate() function, which creates a Concatenate layer and immediately calls it with the given inputs.

Then we create the output layer, with a single neuron and no activation function, and we call it like a function, passing it the result of the concatenation.

Lastly, we create a Keras Model, specifying which inputs and outputs to use.
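
As a sanity check on how the layers are wired, the number of trainable parameters can be worked out by hand. The sketch below assumes 8 input features (the California Housing dataset) and the layer sizes used above; `dense_params` is a hypothetical helper. The Input and Concatenate layers have no parameters of their own.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

n_features = 8                                  # California Housing features
hidden1 = dense_params(n_features, 30)          # 8*30 + 30
hidden2 = dense_params(30, 10)                  # 30*10 + 10
# The output layer receives the concatenation: 8 raw inputs + 10 hidden units.
output = dense_params(n_features + 10, 1)       # 18*1 + 1

print(hidden1, hidden2, output, hidden1 + hidden2 + output)   # 270 310 19 599
```

These are the counts you would expect model.summary() to report for each Dense layer.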

Once you have built the Keras model, you must compile the model, train it, evaluate it, and use it to make predictions.

But what if you want to send a subset of the features through the wide path and a different subset (possibly overlapping) through the deep path? In this case, one solution is to use multiple inputs. For example, suppose we want to send five features through the wide path (features 0-4) and six features through the deep path (features 2-7):

In [13]:
input_A = keras.layers.Input(shape=[5], name='wide_input')
input_B = keras.layers.Input(shape=[6], name='deep_input')
hidden1 = keras.layers.Dense(30, activation='relu')(input_B)
hidden2 = keras.layers.Dense(30, activation='relu')(hidden1)
concat = keras.layers.concatenate([input_A, hidden2])
output = keras.layers.Dense(1, name='output')(concat)
model = keras.Model(inputs=[input_A, input_B],outputs = [output])

The code is self-explanatory. You should name at least the most important layers, especially when the model gets a bit complex like this. Note that we specified inputs=[input_A, input_B] when creating the model. Now we can compile the model as usual, but when we call the fit() method, instead of passing a single input matrix X_train, we must pass a pair of matrices (X_train_A, X_train_B): one per input. The same is true for X_valid, and also for X_test and X_new when you call evaluate() or predict().

Alternatively, you can pass a dictionary mapping the input names to the input values, like {'wide_input': X_train_A, 'deep_input': X_train_B}. The keys must match the names given to the Input layers. This is especially useful when there are many inputs, to avoid getting the order wrong.
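
The two ways of packaging the inputs can be sketched without a model. The zero-filled arrays below are stand-ins for the real feature matrices, and the variable names `inputs_as_tuple` and `inputs_as_dict` are purely illustrative.

```python
import numpy as np

# Stand-ins for the wide (5 features) and deep (6 features) matrices.
wide = np.zeros((3, 5))
deep = np.zeros((3, 6))

# Positional form: the order must match the model's `inputs` list.
inputs_as_tuple = (wide, deep)

# Named form: each key must match the `name` of an Input layer exactly,
# so the order of the entries no longer matters.
inputs_as_dict = {"deep_input": deep, "wide_input": wide}

for name, array in inputs_as_dict.items():
    print(name, array.shape)
```

Either object could then be passed as the first argument to fit(), evaluate(), or predict().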

In [14]:
model.compile(loss='mse', optimizer=keras.optimizers.SGD(learning_rate=1e-3))

X_train_A, X_train_B = X_train[:, :5], X_train[:, 2:]
X_valid_A, X_valid_B = X_valid[:, :5], X_valid[:, 2:]
X_test_A, X_test_B = X_test[:, :5], X_test[:, 2:]
X_new_A, X_new_B = X_test_A[:3], X_test_B[:3]
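
To see which feature columns each slice selects, and where the two paths overlap, here is a small plain-Python check. It assumes 8 feature columns, as in the California Housing data; the `*_ids` names are illustrative.

```python
feature_ids = list(range(8))      # column indices 0..7

wide_ids = feature_ids[:5]        # same slice as X_train[:, :5]
deep_ids = feature_ids[2:]        # same slice as X_train[:, 2:]
shared_ids = sorted(set(wide_ids) & set(deep_ids))

print(wide_ids)     # [0, 1, 2, 3, 4]
print(deep_ids)     # [2, 3, 4, 5, 6, 7]
print(shared_ids)   # [2, 3, 4]
```

Features 2 to 4 therefore reach the output through both the wide and the deep path.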

In [15]:
history = model.fit((X_train_A, X_train_B), y_train, epochs=20, validation_data=((X_valid_A, X_valid_B), y_valid))

Train on 11610 samples, validate on 3870 samples
Epochs 1/20 through 20/20 (per-epoch metrics were not preserved in this output)


In [16]:
mse_test = model.evaluate((X_test_A, X_test_B), y_test)
y_pred = model.predict((X_new_A, X_new_B))



In [24]:
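# Caution: X_new holds the first three test instances, so y_test[:3] would be
# the matching labels; y_train[:3] only shows a sample of target values.
print(y_train[:3], "\n\n")
print(y_pred)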

[1.375 3.742 2.017] 


[[2.2951705]
 [1.8837408]
 [2.9152997]]


There are many cases in which you may want to have multiple outputs:

- The task may demand it. For instance, you may want to locate and classify the main object in a picture. This is both a regression task (finding the coordinates of the object's center, as well as its width and height) and a classification task.

- Similarly, you may have multiple independent tasks based on the same data. Sure, you could train one neural network per task, but in many cases you will get better results on all tasks by training a single neural network with one output per task. This is because the neural network can learn features in the data that are useful across tasks. For example, you could perform multitask classification on pictures of faces, using one output to classify the person's facial expression (smiling, surprised, etc.) and another output to identify whether they are wearing glasses or not.

- Another use case is as a regularization technique (i.e., a training constraint whose objective is to reduce overfitting and thus improve the model's ability to generalize). For example, you may want to add some auxiliary outputs in a neural network architecture (see Figure 10-16) to ensure that the underlying part of the network learns something useful on its own, without relying on the rest of the network.

Adding extra outputs is quite easy: just connect them to the appropriate layers and add them to your model's list of outputs.

In [27]:
input_A = keras.layers.Input(shape=[5], name='wide_input')
input_B = keras.layers.Input(shape=[6], name='deep_input')
hidden1 = keras.layers.Dense(30, activation='relu')(input_B)
hidden2 = keras.layers.Dense(30, activation='relu')(hidden1)
concat = keras.layers.concatenate([input_A, hidden2])
output = keras.layers.Dense(1, name='main_output')(concat)
aux_output = keras.layers.Dense(1, name='aux_output')(hidden2)
model = keras.Model(inputs = [input_A, input_B], outputs = [output, aux_output])

Each output will need its own loss function. Therefore, when we compile the model, we should pass a list of losses (if we pass a single loss, Keras will assume that the same loss should be used for all outputs). By default, Keras will compute all these losses and simply add them up to get the final loss used for training. We care much more about the main output than about the auxiliary output (as it is just used for regularization), so we want to give the main output's loss a much greater weight. Fortunately, it is possible to set all the loss weights when compiling the model.

Alternatively, you can pass a dictionary that maps each output name to the corresponding loss. Just like for the inputs, this is useful when there are multiple outputs, to avoid getting the order wrong. The loss weights and metrics (discussed shortly) can also be set using dictionaries.

In [28]:
model.compile(loss=['mse', 'mse'], loss_weights=[0.9, 0.1], optimizer='sgd')
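
The dictionary form could look like the following sketch. The keys are the output layer names assigned when the layers were created ('main_output' and 'aux_output'). Collecting the arguments in a `compile_kwargs` dictionary is only a device to keep this snippet runnable on its own; the commented line shows how it would be applied to the model.

```python
# Keys are the `name` values of the two output layers.
losses = {"main_output": "mse", "aux_output": "mse"}
loss_weights = {"main_output": 0.9, "aux_output": 0.1}

compile_kwargs = dict(loss=losses, loss_weights=loss_weights, optimizer="sgd")

# With the two-output model defined above, this would be used as:
# model.compile(**compile_kwargs)
print(compile_kwargs)
```

Because every value is tied to a name, reordering the model's outputs does not silently swap the losses or their weights.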

Now when we train the model, we need to provide labels for each output. In this example, the main output and the auxiliary output should try to predict the same thing, so they should use the same labels. So instead of passing y_train, we should pass (y_train, y_train), and the same goes for y_valid and y_test:

In [29]:
history = model.fit([X_train_A, X_train_B], [y_train, y_train], epochs=20,
                    validation_data=([X_valid_A, X_valid_B], [y_valid, y_valid]))

Train on 11610 samples, validate on 3870 samples
Epochs 1/20 through 20/20 (per-epoch metrics were not preserved in this output)


When we evaluate the model, Keras will return the total loss, as well as all the individual losses:

In [31]:
total_loss, main_loss, aux_loss = model.evaluate([X_test_A, X_test_B], [y_test, y_test])
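
The relationship between the three returned values can be sketched in plain Python. With loss_weights=[0.9, 0.1], the total loss should be the weighted sum of the two individual losses, ignoring any regularization terms and floating-point rounding. The loss values below are hypothetical and chosen only for illustration; they are not results from this model, and `weighted_total` is not a Keras function.

```python
def weighted_total(losses, weights):
    """Weighted sum of per-output losses."""
    return sum(w * l for l, w in zip(losses, weights))

# Hypothetical per-output losses, for illustration only.
example_main_loss, example_aux_loss = 0.40, 0.60
weights = [0.9, 0.1]      # same order as loss_weights=[0.9, 0.1]

total = weighted_total([example_main_loss, example_aux_loss], weights)
print(round(total, 4))    # 0.42
```

A large weight on the main output means the total tracks the main loss closely, while the auxiliary loss acts only as a mild extra constraint.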



Similarly, the predict() method will return predictions for each output:

In [32]:
y_pred_main, y_pred_aux = model.predict([X_new_A, X_new_B])

As you can see, you can build almost any sort of architecture with Keras quite easily.