# Chapter 7: Advanced deep-learning best practices

This chapter explores a number of powerful tools that will bring you closer to being able to develop state-of-the-art models on difficult problems. Using the Keras functional API, you can build graph-like models, share a layer across different inputs, and use Keras models just like Python functions. We'll also discuss several other best practices including batch normalization, residual connections, hyperparameter optimization, and model ensembling.

## 7.1 Going beyond the Sequential model: the Keras functional API

Until now, all neural networks introduced in this book have been implemented using the `Sequential` model. The `Sequential` model makes the assumption that the network has exactly one input and exactly one output, and that it consists of a linear stack of layers. However, this set of assumptions is too inflexible in a number of cases.

Some tasks, for instance, require *multimodal* inputs: they merge data coming from different input sources, processing each type of data using different kinds of neural layers. A naive approach would be to train separate models and then do a weighted average of their predictions. But this may be suboptimal, because the information extracted by the models may be redundant. A better way is to *jointly* learn a more accurate model of the data by using a model that can see all available input modalities simultaneously, that is, a model with one input branch per modality (three in the figure below).
<img src="image/fig72.PNG"  width="500">

Similarly, some tasks need to predict multiple target attributes of the input data. Of course, you could train two separate models, but you could build a better model by learning to predict the targets jointly.
<img src="image/fig73.PNG" width="300">

Additionally, many recently developed neural architectures require nonlinear network topology: networks structured as directed acyclic graphs. The Inception family of networks, for instance, relies on *Inception modules*, where the input is processed by several parallel convolutional branches whose outputs are then merged back into a single tensor.
<img src='image/fig74.PNG' width='450'/>

There's also the recent trend of adding *residual connections* to a model. A residual connection consists of reinjecting previous representations into the downstream flow of data by adding a past output tensor to a later output tensor, which helps prevent information loss along the data-processing flow.
<img src='image/fig75.PNG' width='400'>

### 7.1.1 Introduction to the functional API

In the functional API, you directly manipulate tensors, and you use layers as *functions* that take tensors and return tensors.

In [2]:
from keras import Input, layers

In [0]:
input_tensor = Input(shape=(32,))             # a tensor
dense = layers.Dense(32, activation='relu')   # a layer is a function
output_tensor = dense(input_tensor)           # returns a tensor
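
The idea of a layer as a callable object can be illustrated without Keras. The following is a minimal sketch in plain Python: `ToyDense` is a hypothetical class invented for this illustration (it is not part of Keras), and plain lists stand in for tensors.

```python
class ToyDense:
    """Hypothetical stand-in for a dense layer: holds weights, acts as a function."""

    def __init__(self, weights, bias):
        self.weights = weights  # one row of input weights per output unit
        self.bias = bias        # one bias per output unit

    def __call__(self, inputs):
        # Weighted sum of the inputs plus the bias, followed by a ReLU
        outputs = []
        for row, b in zip(self.weights, self.bias):
            total = sum(w * x for w, x in zip(row, inputs)) + b
            outputs.append(max(0.0, total))
        return outputs


dense = ToyDense(weights=[[1.0, -1.0], [0.5, 0.5]], bias=[0.0, 1.0])
input_vector = [2.0, 3.0]            # plays the role of the input tensor
output_vector = dense(input_vector)  # calling the layer returns a new "tensor"
print(output_vector)
```

Because the layer object owns its weights, calling it on different inputs reuses the same weights each time. This is also what makes weight sharing possible in the functional API.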

The following minimal example shows, side by side, a simple `Sequential` model and its equivalent in the functional API:

In [3]:
from keras.models import Sequential, Model
from keras import layers
from keras import Input

# Sequential model (which you already know about!)
seq_model = Sequential()
seq_model.add(layers.Dense(32, activation='relu', input_shape=(64,)))
seq_model.add(layers.Dense(32, activation='relu'))
seq_model.add(layers.Dense(10, activation='softmax'))

# Its functional equivalent
input_tensor = Input(shape=(64,))
x = layers.Dense(32, activation='relu')(input_tensor)
x = layers.Dense(32, activation='relu')(x)
output_tensor = layers.Dense(10, activation='softmax')(x)

# the `Model` class turns an input tensor and output tensor into a model
model = Model(input_tensor, output_tensor)

model.summary()

Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_2 (InputLayer)         (None, 64)                0         
_________________________________________________________________
dense_5 (Dense)              (None, 32)                2080      
_________________________________________________________________
dense_6 (Dense)              (None, 32)                1056      
_________________________________________________________________
dense_7 (Dense)              (None, 10)                330       
Total params: 3,466
Trainable params: 3,466
Non-trainable params: 0
_________________________________________________________________
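
The parameter counts in the summary above can be checked by hand. A dense layer has one weight per (input, unit) pair plus one bias per unit. The short sketch below recomputes the three counts in plain Python; `dense_param_count` is a helper name chosen for this illustration.

```python
def dense_param_count(input_dim, units):
    # One weight per (input, unit) pair, plus one bias per unit
    return input_dim * units + units


# (input size, number of units) for the three Dense layers of the model
layer_dims = [(64, 32), (32, 32), (32, 10)]
counts = [dense_param_count(i, u) for i, u in layer_dims]
total = sum(counts)
print(counts, total)
```

The three values match the `Param #` column, and their sum matches `Total params`.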


When it comes to compiling, training, or evaluating such an instance of `Model`, the API is the same as that of `Sequential`.

In [4]:
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy')

import numpy as np
x_train = np.random.random((1000, 64))
y_train = np.random.random((1000, 10))

model.fit(x_train, y_train,
          epochs=10, batch_size=128,
          validation_split=0.2,
          verbose=2)
score = model.evaluate(x_train, y_train)

Train on 800 samples, validate on 200 samples
Epoch 1/10
 - 2s - loss: 11.8087 - val_loss: 12.2828
Epoch 2/10
 - 0s - loss: 12.1982 - val_loss: 13.2863
Epoch 3/10
 - 0s - loss: 13.4401 - val_loss: 15.0153
Epoch 4/10
 - 0s - loss: 15.3179 - val_loss: 17.1669
Epoch 5/10
 - 0s - loss: 17.4612 - val_loss: 19.9794
Epoch 6/10
 - 0s - loss: 20.4400 - val_loss: 23.1770
Epoch 7/10
 - 0s - loss: 23.8272 - val_loss: 27.2266
Epoch 8/10
 - 0s - loss: 28.0020 - val_loss: 31.9996
Epoch 9/10
 - 0s - loss: 32.9151 - val_loss: 37.0635
Epoch 10/10
 - 0s - loss: 38.0761 - val_loss: 43.1848


### 7.1.2 Multi-input models

The functional API can be used to build models that have multiple inputs. Typically, such models at some point merge their different input branches using a layer that can combine several tensors: by adding them, concatenating them, and so on.

Let's look at a very simple example of a multi-input model: a **question-answering model**. A typical question-answering model has two inputs: a natural-language question and a text snippet (such as a news article) providing information to be used for answering the question.

<img src='image/fig76.PNG' width='500'>

The following example shows how you can build such a model with the functional API. You set up two independent branches that encode the text input and the question input as representation vectors, concatenate these vectors, and finally add a softmax classifier on top of the concatenated representation.

#### Functional API implementation of a two-input question-answering model

In [0]:
from keras.models import Model
from keras import layers
from keras import Input

In [0]:
text_vocabulary_size = 10000
question_vocabulary_size = 10000
answer_vocabulary_size = 500

# text input is a variable length sequence of integers
text_input = Input(shape=(None,), dtype='int32', name='text')
embedded_text = layers.Embedding(text_vocabulary_size, 64)(text_input)
encoded_text = layers.LSTM(32)(embedded_text)

# same process for the question
question_input = Input(shape=(None,), dtype='int32', name='question')
embedded_question = layers.Embedding(question_vocabulary_size, 32)(question_input)
encoded_question = layers.LSTM(16)(embedded_question)

# concatenates the encoded question and encoded text
concatenated = layers.concatenate([encoded_text, encoded_question], axis=-1)

# adds a softmax classifier on top
answer = layers.Dense(answer_vocabulary_size, activation='softmax')(concatenated)

model = Model([text_input, question_input], answer)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['acc'])

In [28]:
model.summary()

Model: "model_6"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
text (InputLayer)               (None, None)         0                                            
__________________________________________________________________________________________________
question (InputLayer)           (None, None)         0                                            
__________________________________________________________________________________________________
embedding_4 (Embedding)         (None, None, 64)     640000      text[0][0]                       
__________________________________________________________________________________________________
embedding_5 (Embedding)         (None, None, 32)     320000      question[0][0]                   
____________________________________________________________________________________________
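
The sizes involved in this model follow from simple arithmetic. The sketch below, in plain Python, recomputes the two embedding parameter counts shown in the summary and works out the width of the concatenated vector and the size of the softmax head. The last two values do not appear in the truncated summary; they are derived here from the layer definitions only.

```python
text_vocabulary_size = 10000
question_vocabulary_size = 10000
answer_vocabulary_size = 500

# An embedding layer stores one vector per vocabulary entry
text_embedding_params = text_vocabulary_size * 64
question_embedding_params = question_vocabulary_size * 32

# Concatenating along the last axis adds the widths of the two encodings:
# LSTM(32) for the text and LSTM(16) for the question
concatenated_width = 32 + 16

# Softmax head: one weight per (input, class) pair plus one bias per class
answer_head_params = (concatenated_width * answer_vocabulary_size
                      + answer_vocabulary_size)

print(text_embedding_params, question_embedding_params)
print(concatenated_width, answer_head_params)
```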

#### Feeding data to a multi-input model

There are two possible APIs:
  + feed the model a list of Numpy arrays as inputs
  + feed it a dictionary that maps input names to Numpy arrays
    - (available only if you give names to your inputs)

In [0]:
import numpy as np
import keras
num_samples = 1000
max_length = 100

text = np.random.randint(1, text_vocabulary_size, 
                         size=(num_samples, max_length))
question = np.random.randint(1, question_vocabulary_size, 
                             size=(num_samples, max_length))
answers = np.random.randint(answer_vocabulary_size, size=(num_samples))
answers = keras.utils.to_categorical(answers, answer_vocabulary_size)

In [30]:
# fitting using a list of inputs
model.fit([text, question], answers, epochs=10, batch_size=128)

  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.callbacks.History at 0x7f368f02c860>

In [31]:
# fitting using a dictionary of inputs (only if inputs are named)
model.fit({'text': text, 'question': question}, answers, 
          epochs=10, batch_size=128)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.callbacks.History at 0x7f368ed88860>

### 7.1.3 Multi-output models

In the same way, you can use the functional API to build models with multiple outputs (or multiple *heads*).

A simple example is a network that attempts to simultaneously predict different properties of the data: for instance, a network that takes a person's posts as input (the `posts` input in the code below) and tries to predict attributes of that person, such as age, gender, and income level.

#### Functional API implementation of a three-output model

In [0]:
from keras import layers
from keras import Input
from keras.models import Model

<img src='image/fig77.PNG' width='400'>

In [33]:
vocabulary_size = 50000
num_income_groups = 10

posts_input = Input(shape=(None,), dtype='int32', name='posts')
embedded_posts = layers.Embedding(vocabulary_size, 256)(posts_input)

x = layers.Conv1D(128, 5, activation='relu', padding='same')(embedded_posts)
x = layers.MaxPooling1D(5)(x)
x = layers.Conv1D(256, 5, activation='relu', padding='same')(x)
x = layers.Conv1D(256, 5, activation='relu', padding='same')(x)
x = layers.MaxPooling1D(5)(x)
x = layers.Conv1D(256, 5, activation='relu', padding='same')(x)
x = layers.Conv1D(256, 5, activation='relu', padding='same')(x)
x = layers.GlobalMaxPooling1D()(x)
x = layers.Dense(128, activation='relu')(x)

age_prediction = layers.Dense(1, name='age')(x)
income_prediction = layers.Dense(num_income_groups,
                                 activation='softmax',
                                 name='income')(x)
gender_prediction = layers.Dense(1, activation='sigmoid', name='gender')(x)

model = Model(posts_input, [age_prediction, income_prediction, gender_prediction])
model.summary()

Model: "model_7"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
posts (InputLayer)              (None, None)         0                                            
__________________________________________________________________________________________________
embedding_6 (Embedding)         (None, None, 256)    12800000    posts[0][0]                      
__________________________________________________________________________________________________
conv1d_6 (Conv1D)               (None, None, 128)    163968      embedding_6[0][0]                
__________________________________________________________________________________________________
max_pooling1d_3 (MaxPooling1D)  (None, None, 128)    0           conv1d_6[0][0]                   
____________________________________________________________________________________________
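
The summary shows `None` on the sequence axis because the input length is not fixed. For a concrete length, the effect of the two pooling layers can be worked out by hand. The sketch below assumes a length of 100 (the value of `max_length` used for the random data in this section) and the Keras defaults for `MaxPooling1D`, namely `'valid'` padding and a stride equal to the pool size. `valid_pool_length` is a helper name chosen for this illustration.

```python
def valid_pool_length(length, pool_size, stride=None):
    # With 'valid' padding, only complete windows are kept
    stride = pool_size if stride is None else stride
    return (length - pool_size) // stride + 1


length = 100  # assumed sequence length
# The Conv1D layers use padding='same' with stride 1, so they keep the length
after_first_pool = valid_pool_length(length, 5)
after_second_pool = valid_pool_length(after_first_pool, 5)
print(after_first_pool, after_second_pool)
```

`GlobalMaxPooling1D` then collapses whatever length remains into a single vector of 256 features, which is why the model accepts variable-length input.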

#### Compilation options of a multi-output model: multiple losses

In Keras, you can pass either a list or a dictionary of losses to `compile` to specify a different loss function for each output; the resulting loss values are summed into a global loss, which is minimized during training.

In [0]:
model.compile(optimizer='rmsprop',
              loss=['mse', 'categorical_crossentropy', 'binary_crossentropy'])

In [0]:
# equivalent to the above
model.compile(optimizer='rmsprop',
              loss={'age': 'mse',
                    'income': 'categorical_crossentropy',
                    'gender': 'binary_crossentropy'})

#### Compilation options of a multi-output model: loss weighting

Note that very imbalanced loss contributions will cause the model representations to be optimized preferentially for the task with the largest individual loss, at the expense of the other tasks. To remedy this, you can **assign different levels of importance to the loss values** in their contribution to the final loss.

In [0]:
model.compile(optimizer='rmsprop',
              loss=['mse', 'categorical_crossentropy', 'binary_crossentropy'],
              loss_weights=[0.25, 1., 10.])

In [0]:
# equivalent to the above
model.compile(optimizer='rmsprop',
              loss={'age': 'mse',
                    'income': 'categorical_crossentropy',
                    'gender': 'binary_crossentropy'},
              loss_weights={'age': 0.25,
                            'income': 1.,
                            'gender': 10.})
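
To see what the weights do, it helps to compute the global loss by hand. The sketch below uses the weights from the cells above together with hypothetical per-output loss values, chosen only to illustrate an imbalance; they are not measurements from this model.

```python
import math

# Hypothetical per-output loss values, for illustration only
losses = {'age': 4.0, 'income': 1.2, 'gender': 0.1}
loss_weights = {'age': 0.25, 'income': 1.0, 'gender': 10.0}

# The global loss is the (weighted) sum of the individual losses
unweighted_total = sum(losses.values())
weighted_total = sum(loss_weights[name] * losses[name] for name in losses)

# Fraction of the global loss contributed by each output
unweighted_share = {n: losses[n] / unweighted_total for n in losses}
weighted_share = {n: loss_weights[n] * losses[n] / weighted_total
                  for n in losses}

print(unweighted_share)
print(weighted_share)
```

Without weights, the age loss accounts for most of the total in this example; with the weights, the three outputs contribute comparable amounts.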

#### Feeding data to a multi-output model

In [0]:
num_samples = 1000 
max_length = 100 

posts = np.random.randint(1, vocabulary_size, size=(num_samples, max_length))
age_targets = np.random.randint(0, 100, size=(num_samples,1))
income_targets = np.random.randint(0, num_income_groups, size=(num_samples,1))
income_targets = keras.utils.to_categorical(income_targets,num_income_groups)
gender_targets = np.random.randint(0, 2, size=(num_samples,1))

In [39]:
model.fit(posts, [age_targets, income_targets, gender_targets],
          epochs=10, batch_size=64)

  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.callbacks.History at 0x7f368f02c630>

In [40]:
# equivalent to the above
model.fit(posts, {'age': age_targets,
                  'income': income_targets,
                  'gender': gender_targets},
          epochs=10, batch_size=64)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.callbacks.History at 0x7f368e7d80b8>

### 7.1.4 Directed acyclic graphs of layers

With the functional API, not only can you build models with multiple inputs and multiple outputs, but you can also implement networks with a complex internal topology. Neural networks in Keras are allowed to be arbitrary *directed acyclic graphs* of layers.

To better understand how the functional API can be used to build graphs of layers, let's take a look at how you can implement two notable components in Keras: Inception modules and residual connections.

#### INCEPTION MODULES

The most basic form of an Inception module has three to four branches starting with a 1 x 1 convolution, followed by a 3 x 3 convolution, and ending with the concatenation of the resulting features. This setup helps the network separately learn spatial features and channel-wise features, which is more efficient than learning them jointly.

<img src='image/fig78.PNG' width='550'>

In [0]:
from keras import layers
from keras import Input
from keras.models import Model, Sequential

x = Input(shape=(28, 28, 1), dtype='float32', name='images')

branch_a = layers.Conv2D(128, 1, padding='same', activation='relu', strides=2)(x)

# In this branch, the striding occurs in the spatial convolution layer
branch_b = layers.Conv2D(128, 1, padding='same', activation='relu')(x)
branch_b = layers.Conv2D(128, 3, padding='same', activation='relu', strides=2)(branch_b)

# In this branch, the striding occurs in the average pooling layer
branch_c = layers.AveragePooling2D(3, padding='same', strides=2)(x)
branch_c = layers.Conv2D(128, 3, padding='same', activation='relu')(branch_c)

branch_d = layers.Conv2D(128, 1, padding='same', activation='relu')(x)
branch_d = layers.Conv2D(128, 3, padding='same', activation='relu')(branch_d)
branch_d = layers.Conv2D(128, 3, padding='same', activation='relu', strides=2)(branch_d)

# Concatenates the branch outputs to obtain the module output
output = layers.concatenate([branch_a, branch_b, branch_c, branch_d], axis=-1)
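
The branches can only be concatenated if they all produce the same spatial size. With `padding='same'`, the output size of a convolution or pooling layer is the input size divided by the stride, rounded up. The sketch below applies that rule to each branch for a 28 × 28 input; `same_padding_size` is a helper name chosen for this illustration.

```python
import math


def same_padding_size(size, stride):
    # With padding='same', the output size depends only on the stride
    return math.ceil(size / stride)


input_size = 28
branch_sizes = {
    # 1x1 conv with stride 2
    'branch_a': same_padding_size(input_size, 2),
    # 1x1 conv (stride 1), then 3x3 conv with stride 2
    'branch_b': same_padding_size(same_padding_size(input_size, 1), 2),
    # average pooling with stride 2, then 3x3 conv (stride 1)
    'branch_c': same_padding_size(same_padding_size(input_size, 2), 1),
    # 1x1 conv, 3x3 conv (both stride 1), then 3x3 conv with stride 2
    'branch_d': same_padding_size(
        same_padding_size(same_padding_size(input_size, 1), 1), 2),
}

# Concatenation along the channel axis adds the channel counts
concatenated_channels = 4 * 128
print(branch_sizes, concatenated_channels)
```

Each branch applies a stride of 2 exactly once, so all four end up at 14 × 14 and the module output has 512 channels.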

In [0]:
# Adding a classifier on top of the convnet
output = layers.Flatten()(output)
output = layers.Dense(512, activation='relu')(output)
predictions = layers.Dense(10, activation='softmax')(output)

model = keras.models.Model(inputs=x, outputs=predictions)

##### Train the Inception model using the Dataset API and the MNIST data

+ code source from [`ClaudeCoulombe` GitHub](https://github.com/ClaudeCoulombe/deep-learning-with-python-notebooks)

In [0]:
import numpy as np
import os
import tempfile

import keras
from keras import backend as K
from keras import layers
from keras.datasets import mnist

import tensorflow as tf

# Clean up the TF session.
K.clear_session()

if K.backend() != 'tensorflow':
    raise RuntimeError('This example can only run with the TensorFlow backend,'
                       ' because it requires the Dataset API, which is not'
                       ' supported on other platforms.')

batch_size = 128
buffer_size = 10000
steps_per_epoch = int(np.ceil(60000 / float(batch_size)))  # = 469
epochs = 5
num_classes = 10

def cnn_layers(x):
    
    # x is assumed to be a 4D image tensor of shape
    # (batch, height, width, channels), like MNIST images
    print("x.shape:",x.shape)

    # Every branch has the same stride value (2), which is necessary to 
    # keep all branch outputs the same size so you can concatenate them
    branch_a = layers.Conv2D(128, 1, padding='same', activation='relu', strides=2)(x)

    # In this branch, the striding occurs in the spatial convolution layer.
    branch_b = layers.Conv2D(128, 1, padding='same', activation='relu')(x)
    branch_b = layers.Conv2D(128, 3, padding='same', activation='relu', strides=2)(branch_b)

    # In this branch, the striding occurs in the average pooling layer.
    branch_c = layers.AveragePooling2D(3,  padding='same', strides=2)(x)
    branch_c = layers.Conv2D(128, 3, padding='same', activation='relu')(branch_c)

    branch_d = layers.Conv2D(128, 1, padding='same', activation='relu')(x) 
    branch_d = layers.Conv2D(128, 3, padding='same', activation='relu')(branch_d)
    branch_d = layers.Conv2D(128, 3, padding='same', activation='relu', strides=2)(branch_d)

    # Concatenates the branch outputs to obtain the module output
    output = layers.concatenate([branch_a, branch_b, branch_c, branch_d], axis=-1)

    # Adding a classifier on top of the convnet
    output = layers.Flatten()(output)
    output = layers.Dense(512, activation='relu')(output)
    predictions = layers.Dense(num_classes, activation='softmax')(output)
    
    return predictions

In [60]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.astype(np.float32) / 255
x_train = np.expand_dims(x_train, -1)
y_train = tf.one_hot(y_train, num_classes)

# Create the dataset and its associated one-shot iterator.
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
dataset = dataset.repeat()
dataset = dataset.shuffle(buffer_size)
dataset = dataset.batch(batch_size)
iterator = tf.compat.v1.data.make_one_shot_iterator(dataset)

# Model creation using tensors from the get_next() graph node.
inputs, targets = iterator.get_next()

print("inputs.shape:",inputs.shape)
print("targets.shape:",targets.shape)

inputs.shape: (128, 28, 28, 1)
targets.shape: (128, 10)
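
The value of `steps_per_epoch` used in this example can be checked with plain Python. Because the dataset is repeated before it is batched, every batch is full, so rounding up means one "epoch" draws slightly more than one pass over the 60,000 training images.

```python
import math

num_train_samples = 60000
batch_size = 128

# Round up so that one epoch covers at least every training sample once
steps_per_epoch = math.ceil(num_train_samples / batch_size)
samples_drawn = steps_per_epoch * batch_size
print(steps_per_epoch, samples_drawn)
```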


In [61]:
model_input = layers.Input(tensor=inputs)
model_output = cnn_layers(model_input)

model = keras.models.Model(inputs=model_input, outputs=model_output)

model.compile(optimizer=keras.optimizers.RMSprop(lr=2e-3, decay=1e-5),
              loss='categorical_crossentropy',
              metrics=['accuracy'],
              target_tensors=[targets])

x.shape: (128, 28, 28, 1)


In [62]:
model.summary()

Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            (128, 28, 28, 1)     0                                            
__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (128, 28, 28, 128)   256         input_1[0][0]                    
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (128, 28, 28, 128)   256         input_1[0][0]                    
__________________________________________________________________________________________________
average_pooling2d_1 (AveragePoo (128, 14, 14, 1)     0           input_1[0][0]                    
____________________________________________________________________________________________

In [63]:
model.fit(epochs=epochs,
          steps_per_epoch=steps_per_epoch)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.callbacks.History at 0x7f368e187c50>

#### RESIDUAL CONNECTIONS

*Residual connections* are a common graph-like network component found in many post-2015 network architectures, including Xception. They tackle two common problems that plague any large-scale deep-learning model: **vanishing gradients** and **representational bottlenecks**. In general, adding residual connections to any model that has more than 10 layers is likely to be beneficial.
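
A toy calculation gives some intuition for the vanishing-gradient part. The sketch below is not a real network: it assumes scalar "layers" that each multiply their input by a small constant `d`, and compares the gradient through a plain stack with the gradient through a stack of residual blocks. Both function names are chosen for this illustration.

```python
def plain_stack_gradient(local_derivative, depth):
    # Chain rule through `depth` layers, each computing y = d * x
    return local_derivative ** depth


def residual_stack_gradient(local_derivative, depth):
    # Each residual block computes y = x + d * x, so its derivative is 1 + d
    return (1.0 + local_derivative) ** depth


d, depth = 0.1, 10
plain = plain_stack_gradient(d, depth)
residual = residual_stack_gradient(d, depth)
print(plain, residual)
```

In the plain stack the gradient shrinks by a factor of `d` at every layer. With the shortcut, each block always contributes a direct path with derivative 1, so the signal does not die out in this toy setting. Real layers are nonlinear and multidimensional, so this is only an intuition.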

##### Implementing a residual connection in Keras when the feature-map sizes are the same, using identity residual connections

In [64]:
from keras import layers 
from keras.layers import Input

# x is a 4D image tensor of shape (batch, 28, 28, 1),
# like the images of the MNIST dataset
x = Input(shape=(28, 28, 1), dtype='float32', name='images')

# Applies a transformation to x
y = layers.Conv2D(128, 3, activation='relu', padding='same')(x)
y = layers.Conv2D(128, 3, activation='relu', padding='same')(y)
y = layers.Conv2D(128, 3, activation='relu', padding='same')(y)

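# Adds the original x back to the output features.
# Note: x has 1 channel while y has 128, so this addition relies on broadcasting
# x across the channel axis; a true identity shortcut needs x and y to have the same shape
output = layers.add([y, x])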

# Adding a classifier on top of the convnet
output = layers.Flatten()(output)
output = layers.Dense(512, activation='relu')(output)
predictions = layers.Dense(10, activation='softmax')(output)

model = keras.models.Model(inputs=x, outputs=predictions)
model.summary()

Model: "model_2"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
images (InputLayer)             (None, 28, 28, 1)    0                                            
__________________________________________________________________________________________________
conv2d_8 (Conv2D)               (None, 28, 28, 128)  1280        images[0][0]                     
__________________________________________________________________________________________________
conv2d_9 (Conv2D)               (None, 28, 28, 128)  147584      conv2d_8[0][0]                   
__________________________________________________________________________________________________
conv2d_10 (Conv2D)              (None, 28, 28, 128)  147584      conv2d_9[0][0]                   
____________________________________________________________________________________________

##### Implementing a residual connection when the feature-map sizes differ, using a linear residual connection

In [65]:
from keras import layers 
from keras.layers import Input

# x is a 4D image tensor of shape (batch, 28, 28, 1),
# like the images of the MNIST dataset
x = Input(shape=(28, 28, 1), dtype='float32', name='images')
print("x.shape:",x.shape)

# Applies a transformation to x
y = layers.Conv2D(128, 3, activation='relu', padding='same')(x)
y = layers.Conv2D(128, 3, activation='relu', padding='same')(y)
y = layers.MaxPooling2D(2, strides=2)(y)

# Uses a 1 × 1 convolution to linearly downsample the original x tensor to the same shape as y
residual = layers.Conv2D(128, 1, strides=2, padding='same')(x)

# Adds the residual tensor back to the output features
output = layers.add([y, residual])

# Adding a classifier on top of the convnet
output = layers.Flatten()(output)
output = layers.Dense(512, activation='relu')(output)
predictions = layers.Dense(10, activation='softmax')(output)

model = keras.models.Model(inputs=x, outputs=predictions)
model.summary()

x.shape: (None, 28, 28, 1)
Model: "model_3"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
images (InputLayer)             (None, 28, 28, 1)    0                                            
__________________________________________________________________________________________________
conv2d_11 (Conv2D)              (None, 28, 28, 128)  1280        images[0][0]                     
__________________________________________________________________________________________________
conv2d_12 (Conv2D)              (None, 28, 28, 128)  147584      conv2d_11[0][0]                  
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)  (None, 14, 14, 128)  0           conv2d_12[0][0]                  
_________________________________________________________________
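
The reason for the strided 1 × 1 convolution is that the two tensors passed to `layers.add` should have the same shape. The sketch below works out both shapes in plain Python, assuming the Keras default of `'valid'` padding for `MaxPooling2D`. The helper names are chosen for this illustration.

```python
import math


def same_padding_size(size, stride):
    # padding='same': output size is the input size divided by the stride, rounded up
    return math.ceil(size / stride)


def valid_pool_size(size, pool_size, stride):
    # padding='valid': only complete windows are kept
    return (size - pool_size) // stride + 1


# Main path: two 3x3 convs (padding='same', stride 1), then 2x2 max pooling
main_size = same_padding_size(28, 1)
main_size = same_padding_size(main_size, 1)
main_size = valid_pool_size(main_size, 2, 2)
main_shape = (main_size, main_size, 128)

# Shortcut: 1x1 conv with 128 filters, stride 2, padding='same'
shortcut_size = same_padding_size(28, 2)
shortcut_shape = (shortcut_size, shortcut_size, 128)

print(main_shape, shortcut_shape)
```

The stride of 2 matches the spatial downsampling of the pooling layer, and the 128 filters match the channel count of the main path.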

### 7.1.5 Layer weight sharing

One more important feature of the functional API is the ability to reuse a layer instance several times. This allows you to build models that have shared branches: several branches that all share the same knowledge and perform the same operations. That is, they share the same representations and learn these representations simultaneously for different sets of inputs.

In [68]:
from keras import layers
from keras import Input
from keras.models import Model

# Instantiates a single LSTM layer, once
lstm = layers.LSTM(32)

# Building the left branch of the model: 
# inputs are variable-length sequences of vectors of size 128
left_input = Input(shape=(None, 128))
left_output = lstm(left_input)

# Building the right branch of the model:
# when you call an existing layer instance, you reuse its weights
right_input = Input(shape=(None, 128))
right_output = lstm(right_input)

# Builds the classifier on top
merged = layers.concatenate([left_output, right_output], axis=-1)
predictions = layers.Dense(1, activation='sigmoid')(merged)

# Instantiating and training the model:
# when you train such a model, the weights of the LSTM layer are updated based on both inputs
model = Model([left_input, right_input], predictions)
model.summary()

Model: "model_5"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_6 (InputLayer)            (None, None, 128)    0                                            
__________________________________________________________________________________________________
input_7 (InputLayer)            (None, None, 128)    0                                            
__________________________________________________________________________________________________
lstm_3 (LSTM)                   (None, 32)           20608       input_6[0][0]                    
                                                                 input_7[0][0]                    
__________________________________________________________________________________________________
concatenate_4 (Concatenate)     (None, 64)           0           lstm_3[0][0]               
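
The summary lists a single LSTM layer connected to both inputs, with its parameters counted once. The count can be reproduced by hand: an LSTM has four gate-like transformations, each with input weights, recurrent weights, and a bias. `lstm_param_count` is a helper name chosen for this illustration.

```python
def lstm_param_count(input_dim, units):
    # 4 transformations (input, forget and output gates, plus the cell candidate),
    # each with input weights, recurrent weights and a bias
    return 4 * ((input_dim + units) * units + units)


shared = lstm_param_count(128, 32)  # one LSTM instance used by both branches
two_separate = 2 * shared           # what two independent LSTMs would cost
print(shared, two_separate)
```

Sharing the layer halves the number of recurrent parameters compared with two independent branches, and both inputs contribute gradients to the same weights.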

In [69]:
import numpy as np

num_samples = 100
num_symbols = 2

left_data = np.random.randint(0, num_symbols, size=(num_samples,1,128))
right_data = np.random.randint(0, num_symbols, size=(num_samples,1,128))

matching_list = [np.random.randint(0, num_symbols) for _ in range(num_samples)]
targets = np.array(matching_list)

# We must compile a model before training/testing.
model.compile(optimizer='rmsprop',loss='binary_crossentropy',metrics=['acc'])

# Training the model: when you train such a model,
# the weights of the LSTM layer are updated based on both inputs.
model.fit([left_data, right_data], targets)

Epoch 1/1


<keras.callbacks.callbacks.History at 0x7f3571cfe2b0>

### 7.1.6 Models as layers

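In the functional API, a model can be called on an input tensor just like a layer, and doing so reuses the model's weights. One simple practical example of what you can build by reusing a model instance is a vision model that uses a dual camera as its input: two parallel cameras, a few centimeters apart. You don't need two independent models to extract visual features from the left and right cameras; such low-level processing can be shared across the two inputs, that is, done via layers that use the same weights and thus share the same representations.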

In [70]:
from keras import applications
from keras import layers
from keras import Input
from keras.models import Model

nbr_classes = 10

# The base image-processing model is the Xception network (convolutional base only).
xception_base = applications.Xception(weights=None,include_top=False)

# The inputs are 250 × 250 RGB images.
left_input = Input(shape=(250, 250, 3))
right_input = Input(shape=(250, 250, 3))

# Calls the same vision model twice, so both inputs share its weights
left_features = xception_base(left_input)
right_features = xception_base(right_input)

merged_features = layers.concatenate([left_features, right_features], axis=-1)

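# Note: merged_features is still a 4D feature map, so this Dense layer acts on its
# last axis only; pooling or flattening first (for example with
# layers.GlobalAveragePooling2D()) would give a single prediction per sample
predictions = layers.Dense(nbr_classes, activation='softmax')(merged_features)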

# Instantiating the model
model = Model([left_input, right_input], predictions)
model.summary()

Model: "model_6"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_9 (InputLayer)            (None, 250, 250, 3)  0                                            
__________________________________________________________________________________________________
input_10 (InputLayer)           (None, 250, 250, 3)  0                                            
__________________________________________________________________________________________________
xception (Model)                multiple             20861480    input_9[0][0]                    
                                                                 input_10[0][0]                   
__________________________________________________________________________________________________
concatenate_5 (Concatenate)     (None, 8, 8, 4096)   0           xception[1][0]             