# Setup

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Introduction
- The Keras functional API is a way to create models that is more flexible than the `tf.keras.Sequential` API. 
- The functional API can handle models with **non-linear topology**, models with **shared layers**, and models with **multiple inputs or outputs**.
- The main idea is that a deep learning model is usually a directed acyclic graph (DAG) of layers, so the functional API is a way to build **graphs of layers**.

In [2]:
inputs = keras.Input(shape=(784,))
inputs.dtype

tf.float32

In [3]:
dense = layers.Dense(64, activation='relu')
X = dense(inputs)
X = layers.Dense(64, activation='relu')(X)
outputs = layers.Dense(10)(X)
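- Each `layer(x)` call above applies a layer to a symbolic tensor, and the chain of calls records the graph. As a rough NumPy-only analogy (Keras does not execute it this way; `make_dense` and the random weights are hypothetical placeholders), the same 784 → 64 → 64 → 10 stack is just a composition of functions:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_dense(in_dim, units, activation=None):
    """Return a function computing activation(x @ w + b) with fixed random weights."""
    w = rng.normal(scale=0.05, size=(in_dim, units))
    b = np.zeros(units)

    def dense(x):
        y = x @ w + b
        return np.maximum(y, 0.0) if activation == "relu" else y

    return dense

# Mirror the functional graph: inputs -> Dense(64, relu) -> Dense(64, relu) -> Dense(10)
dense_1 = make_dense(784, 64, activation="relu")
dense_2 = make_dense(64, 64, activation="relu")
dense_3 = make_dense(64, 10)

batch = rng.random((32, 784)).astype("float32")
logits = dense_3(dense_2(dense_1(batch)))
print(logits.shape)  # (32, 10)
```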

- At this point, you can create a `Model` by specifying its inputs and outputs in the graph of layers.

In [4]:
model = keras.Model(inputs=inputs, outputs=outputs, name='mnist_model')

In [5]:
model.summary()

Model: "mnist_model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 784)]             0         
_________________________________________________________________
dense (Dense)                (None, 64)                50240     
_________________________________________________________________
dense_1 (Dense)              (None, 64)                4160      
_________________________________________________________________
dense_2 (Dense)              (None, 10)                650       
Total params: 55,050
Trainable params: 55,050
Non-trainable params: 0
_________________________________________________________________
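- The `Param #` column can be checked by hand: each `Dense` layer stores a kernel of shape `(input_dim, units)` plus one bias per unit, so it has `input_dim * units + units` parameters. A small sketch of that arithmetic:

```python
def dense_params(in_dim, units):
    # Kernel of shape (in_dim, units) plus one bias per unit
    return in_dim * units + units

layer_params = [dense_params(784, 64), dense_params(64, 64), dense_params(64, 10)]
print(layer_params)       # [50240, 4160, 650]
print(sum(layer_params))  # 55050
```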


# Training, evaluation, and inference
- Training, evaluation, and inference work in exactly the same way for models built using the functional API as for `Sequential` models.
- Here, load the MNIST image data, reshape it into vectors, fit the model on the data (while monitoring performance on a validation split), then evaluate the model on the test data.

In [6]:
(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()

X_train = X_train.reshape(60000, 784).astype("float32") / 255
X_test = X_test.reshape(10000, 784).astype("float32") / 255
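- The preprocessing flattens each 28 × 28 image into a 784-dimensional vector and rescales pixel values from [0, 255] to [0, 1]. Later, `validation_split=0.2` in `fit` holds out the last 20% of the samples (selected before any shuffling), which is where the 48000/12000 split in the training log comes from. A minimal sketch on synthetic data (random pixels stand in for MNIST; `split_sizes` is a hypothetical helper that only reproduces the split arithmetic):

```python
import numpy as np

# Synthetic stand-in for MNIST: 100 uint8 images of 28x28 pixels
images = np.random.randint(0, 256, size=(100, 28, 28), dtype=np.uint8)

# Flatten to vectors and rescale to [0, 1], as in the cell above
flat = images.reshape(len(images), 28 * 28).astype("float32") / 255
print(flat.shape)  # (100, 784)

def split_sizes(num_samples, validation_split):
    # The last `validation_split` fraction of the samples is held out for validation
    num_val = int(num_samples * validation_split)
    return num_samples - num_val, num_val

print(split_sizes(60000, 0.2))  # (48000, 12000)
```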

In [7]:
model.compile(loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True), 
              optimizer=keras.optimizers.RMSprop(),
              metrics=['accuracy'])

In [8]:
history = model.fit(X_train, y_train, batch_size=64, epochs=2, validation_split=0.2)

Train on 48000 samples, validate on 12000 samples
Epoch 1/2
Epoch 2/2


In [9]:
model.evaluate(X_test, y_test, verbose=0)

[0.13393157647661866, 0.9589]

# Save and serialize
- Saving the model and serialization work the same way for models built using the functional API as they do for `Sequential` models. 
- The standard way to save a functional model is to call `model.save()` to save the entire model as a single file.
    - You can later recreate the same model from this file, even if the code that built the model is no longer available.
    - This saved file includes the:
        - model architecture
        - model weight values (that were learned during training)
        - model training config, if any (as passed to `compile`)
        - optimizer and its state, if any (to restart training where you left off)

In [10]:
# model.save("path_to_save_the_model")
# model = keras.models.load_model("path_to_save_the_model")

# Use the same graph of layers to define multiple models
- In the functional API, models are created by specifying their inputs and outputs in a graph of layers. 
    - That means that a single graph of layers can be used to generate multiple models.
- In the example below, you use the same stack of layers to instantiate two models: an encoder model that turns image inputs into 16-dimensional vectors, and an end-to-end autoencoder model for training.
    - Here, the decoding architecture is strictly symmetrical to the encoding architecture, so the output shape is the same as the input shape `(28, 28, 1)`.
    - The reverse of a `Conv2D` layer is a `Conv2DTranspose` layer, and the reverse of a `MaxPooling2D` layer is an `UpSampling2D` layer.
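- The symmetry can be checked with shape arithmetic, assuming the defaults used here (stride 1 and `"valid"` padding for the convolutions): a `Conv2D` with kernel size `k` shrinks each spatial side by `k - 1`, a `Conv2DTranspose` grows it by `k - 1`, `MaxPooling2D(p)` divides it by `p` (rounding down), and `UpSampling2D(s)` multiplies it by `s`. The 16-dimensional encoding is reshaped to `(4, 4, 1)` because 4 × 4 × 1 = 16. The helper names below are illustrative only:

```python
def conv_valid(n, k):            # Conv2D, padding="valid", stride 1
    return n - k + 1

def conv_transpose_valid(n, k):  # Conv2DTranspose, padding="valid", stride 1
    return n + k - 1

def max_pool(n, p):              # MaxPooling2D(p): pool size p, stride p
    return n // p

def up_sample(n, s):             # UpSampling2D(s): repeat rows/columns s times
    return n * s

encoder_sizes = [28]
for op, arg in [(conv_valid, 3), (conv_valid, 3), (max_pool, 3),
                (conv_valid, 3), (conv_valid, 3)]:
    encoder_sizes.append(op(encoder_sizes[-1], arg))
print(encoder_sizes)  # [28, 26, 24, 8, 6, 4]

decoder_sizes = [4]
for op, arg in [(conv_transpose_valid, 3), (conv_transpose_valid, 3), (up_sample, 3),
                (conv_transpose_valid, 3), (conv_transpose_valid, 3)]:
    decoder_sizes.append(op(decoder_sizes[-1], arg))
print(decoder_sizes)  # [4, 6, 8, 24, 26, 28]
```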

In [11]:
encoder_input = keras.Input(shape=(28, 28, 1), name='img')
X = layers.Conv2D(16, 3, activation='relu')(encoder_input)
X = layers.Conv2D(32, 3, activation='relu')(X)
X = layers.MaxPooling2D(3)(X)
X = layers.Conv2D(32, 3, activation='relu')(X)
X = layers.Conv2D(16, 3, activation='relu')(X)
encoder_output = layers.GlobalMaxPooling2D()(X)

encoder = keras.Model(encoder_input, encoder_output, name='encoder')
encoder.summary()

Model: "encoder"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
img (InputLayer)             [(None, 28, 28, 1)]       0         
_________________________________________________________________
conv2d (Conv2D)              (None, 26, 26, 16)        160       
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 24, 24, 32)        4640      
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 8, 8, 32)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 6, 6, 32)          9248      
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 4, 4, 16)          4624      
_________________________________________________________________
global_max_pooling2d (Global (None, 16)                0   

In [12]:
X = layers.Reshape((4, 4, 1))(encoder_output)
X = layers.Conv2DTranspose(16, 3, activation="relu")(X)
X = layers.Conv2DTranspose(32, 3, activation="relu")(X)
X = layers.UpSampling2D(3)(X)
X = layers.Conv2DTranspose(16, 3, activation="relu")(X)
decoder_output = layers.Conv2DTranspose(1, 3, activation="relu")(X)

autoencoder = keras.Model(encoder_input, decoder_output, name="autoencoder")
autoencoder.summary()

Model: "autoencoder"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
img (InputLayer)             [(None, 28, 28, 1)]       0         
_________________________________________________________________
conv2d (Conv2D)              (None, 26, 26, 16)        160       
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 24, 24, 32)        4640      
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 8, 8, 32)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 6, 6, 32)          9248      
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 4, 4, 16)          4624      
_________________________________________________________________
global_max_pooling2d (Global (None, 16)                

# All models are callable, just like layers
- You can treat any model as if it were a layer by invoking it on an `Input` or on the output of another layer. 
- By calling a model you aren't just reusing the architecture of the model, you're also reusing its weights.
- To see this in action, here's a different take on the autoencoder example that creates an encoder model and a decoder model, and chains them in two calls to obtain the autoencoder model.

In [13]:
encoder_input = keras.Input(shape=(28, 28, 1), name="original_img")
X = layers.Conv2D(16, 3, activation="relu")(encoder_input)
X = layers.Conv2D(32, 3, activation="relu")(X)
X = layers.MaxPooling2D(3)(X)
X = layers.Conv2D(32, 3, activation="relu")(X)
X = layers.Conv2D(16, 3, activation="relu")(X)
encoder_output = layers.GlobalMaxPooling2D()(X)

encoder = keras.Model(encoder_input, encoder_output, name="encoder")

decoder_input = keras.Input(shape=(16,), name="encoded_img")
X = layers.Reshape((4, 4, 1))(decoder_input)
X = layers.Conv2DTranspose(16, 3, activation="relu")(X)
X = layers.Conv2DTranspose(32, 3, activation="relu")(X)
X = layers.UpSampling2D(3)(X)
X = layers.Conv2DTranspose(16, 3, activation="relu")(X)
decoder_output = layers.Conv2DTranspose(1, 3, activation="relu")(X)

decoder = keras.Model(decoder_input, decoder_output, name="decoder")

autoencoder_input = keras.Input(shape=(28, 28, 1), name="img")
encoded_img = encoder(autoencoder_input)
decoded_img = decoder(encoded_img)
autoencoder = keras.Model(autoencoder_input, decoded_img, name="autoencoder")
autoencoder.summary()

Model: "autoencoder"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
img (InputLayer)             [(None, 28, 28, 1)]       0         
_________________________________________________________________
encoder (Model)              (None, 16)                18672     
_________________________________________________________________
decoder (Model)              (None, 28, 28, 1)         9569      
Total params: 28,241
Trainable params: 28,241
Non-trainable params: 0
_________________________________________________________________
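- The nested summary reports each sub-model as a single row whose parameter count is the sum of its layers. For both `Conv2D` and `Conv2DTranspose`, a layer with kernel size `k`, `in_ch` input channels, and `filters` output channels has `k * k * in_ch * filters + filters` parameters, which reproduces the 18672 / 9569 / 28241 figures above:

```python
def conv_params(k, in_ch, filters):
    # One k x k kernel per (input channel, filter) pair, plus one bias per filter
    return k * k * in_ch * filters + filters

encoder_params = [conv_params(3, 1, 16), conv_params(3, 16, 32),
                  conv_params(3, 32, 32), conv_params(3, 32, 16)]
# The decoder starts from the (4, 4, 1) reshape, so its first layer sees 1 channel
decoder_params = [conv_params(3, 1, 16), conv_params(3, 16, 32),
                  conv_params(3, 32, 16), conv_params(3, 16, 1)]

print(sum(encoder_params))                        # 18672
print(sum(decoder_params))                        # 9569
print(sum(encoder_params) + sum(decoder_params))  # 28241
```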


- As you can see, models can be nested: a model can contain sub-models (since a model is just like a layer).
- A common use case for model nesting is **ensembling**. 
- For example, here's how to ensemble a set of models into a single model that averages their predictions.

In [14]:
def get_model():
    inputs = keras.Input(shape=(128,))
    outputs = layers.Dense(1)(inputs)
    return keras.Model(inputs, outputs)

In [15]:
model1 = get_model()
model2 = get_model()
model3 = get_model()

inputs = keras.Input(shape=(128,))
y1 = model1(inputs)
y2 = model2(inputs)
y3 = model3(inputs)
outputs = layers.average([y1, y2, y3])
ensemble_model = keras.Model(inputs=inputs, outputs=outputs)
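- `layers.average` returns the element-wise mean of its input tensors, so the ensemble's prediction is the mean of the three sub-model predictions. A NumPy sketch of the same computation (`get_linear_model` is a hypothetical stand-in for `get_model()`, i.e. a single `Dense(1)` with random weights):

```python
import numpy as np

rng = np.random.default_rng(0)

def get_linear_model(in_dim=128):
    # Stand-in for get_model(): one Dense(1) layer with random weights
    w = rng.normal(size=(in_dim, 1))
    b = rng.normal(size=(1,))
    return lambda x: x @ w + b

models = [get_linear_model() for _ in range(3)]
x = rng.random((5, 128))
preds = [m(x) for m in models]

# Element-wise mean of the three prediction arrays, like layers.average
ensemble_pred = np.mean(np.stack(preds), axis=0)
print(ensemble_pred.shape)  # (5, 1)
```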

# Manipulate complex graph topologies
## Models with multiple inputs and outputs
- The functional API makes it easy to manipulate multiple inputs and outputs, which cannot be handled with the `Sequential` API.
- For example, if you're building a system for ranking customer issue tickets by priority and routing them to the correct department, then the model will have three inputs:
    - The title of the ticket (text input)
    - The text body of the ticket (text input)
    - Any tags added by the user (categorical input)
- The model will have two outputs:
    - The priority score between 0 and 1 (scalar sigmoid output)
    - The department that should handle the ticket (softmax output over the set of departments)

In [16]:
num_tags = 12 
num_words = 10000 
num_departments = 4

title_input = keras.Input(shape=(None,), name='title')
body_input = keras.Input(shape=(None,), name='body')
tags_input = keras.Input(shape=(num_tags,), name='tags')

# Embed each word in the title into a 64-dimensional vector
title_features = layers.Embedding(num_words, 64)(title_input)
# Embed each word in the text into a 64-dimensional vector
body_features = layers.Embedding(num_words, 64)(body_input)

# Reduce sequence of embedded words in the title into a single 128-dimensional vector
title_features = layers.LSTM(128)(title_features)
# Reduce sequence of embedded words in the body into a single 32-dimensional vector
body_features = layers.LSTM(32)(body_features)

# Merge all available features into a single large vector via concatenation
X = layers.concatenate([title_features, body_features, tags_input])

# Stick a logistic regression for priority prediction on top of the features
priority_pred = layers.Dense(1, name='priority')(X)

# Stick a department classifier on top of the features
department_pred = layers.Dense(num_departments, name='department')(X)

# Instantiate an end-to-end model predicting both priority and department
model = keras.Model(
    inputs=[title_input, body_input, tags_input],
    outputs=[priority_pred, department_pred]
)
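- After the two LSTMs, `concatenate` joins the 128-dimensional title vector, the 32-dimensional body vector, and the 12-dimensional tag vector along the last axis, so both output heads see a 172-dimensional feature vector. That makes the `priority` head 172 + 1 parameters and the `department` head 172 × 4 + 4. A NumPy sketch of the shapes, with zeros standing in for real features:

```python
import numpy as np

batch, num_tags, num_departments = 8, 12, 4
title_features = np.zeros((batch, 128))  # shape of the LSTM(128) output
body_features = np.zeros((batch, 32))    # shape of the LSTM(32) output
tags = np.zeros((batch, num_tags))       # multi-hot tag vector

merged = np.concatenate([title_features, body_features, tags], axis=-1)
print(merged.shape)  # (8, 172)

# Dense(units) on 172 features: 172 * units weights + units biases
priority_params = merged.shape[-1] * 1 + 1
department_params = merged.shape[-1] * num_departments + num_departments
print(priority_params, department_params)  # 173 692
```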

In [17]:
# keras.utils.plot_model(model, 'multi_input_and_output.png', show_shapes=True)

- When compiling the model, you can assign different losses to each output.
- You can even assign different weights to each loss to modulate their contribution to the total training loss.

In [18]:
model.compile(optimizer=keras.optimizers.RMSprop(1e-3),
              loss=[keras.losses.BinaryCrossentropy(from_logits=True),
                    keras.losses.CategoricalCrossentropy(from_logits=True)],
              loss_weights=[1.0, 0.2])
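- With `loss_weights=[1.0, 0.2]`, the total loss being minimized is the weighted sum `1.0 * priority_loss + 0.2 * department_loss`. A NumPy sketch of that combination, using the standard numerically stable logit forms of binary and categorical cross-entropy averaged over the batch (the two-sample batch is made up for illustration and is not Keras's internal code):

```python
import numpy as np

def binary_crossentropy_from_logits(y_true, logits):
    # Stable form of -[y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x))]
    per_elem = np.maximum(logits, 0) - logits * y_true + np.log1p(np.exp(-np.abs(logits)))
    return per_elem.mean()

def categorical_crossentropy_from_logits(y_true, logits):
    # -sum(y * log_softmax(logits)) per sample, averaged over the batch
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -(y_true * log_softmax).sum(axis=-1).mean()

priority_true = np.array([[0.9], [0.1]])
priority_logits = np.array([[2.0], [-1.0]])
dept_true = np.array([[0, 1, 0, 0], [1, 0, 0, 0]], dtype=float)
dept_logits = np.array([[0.1, 2.0, -1.0, 0.0], [1.5, 0.2, 0.3, -0.5]])

loss_weights = [1.0, 0.2]
priority_loss = binary_crossentropy_from_logits(priority_true, priority_logits)
department_loss = categorical_crossentropy_from_logits(dept_true, dept_logits)
total_loss = loss_weights[0] * priority_loss + loss_weights[1] * department_loss
print(priority_loss, department_loss, total_loss)
```

Lowering the second weight reduces how strongly the department head's errors drive the shared layers during training.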

- Since the output layers have names, you could also specify the losses with a dict keyed by the output layer names:

In [19]:
model.compile(optimizer=keras.optimizers.RMSprop(1e-3),
              loss={"priority": keras.losses.BinaryCrossentropy(from_logits=True),
                    "department": keras.losses.CategoricalCrossentropy(from_logits=True)},
              loss_weights={"priority": 1.0, "department": 0.2})

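- Train the model by passing NumPy arrays of inputs and targets, either as lists (in the order of the model's inputs and outputs) or, as here, as dicts keyed by the input and output layer names.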

In [20]:
# Dummy input data
title_data = np.random.randint(num_words, size=(1280, 10))
body_data = np.random.randint(num_words, size=(1280, 100))
tags_data = np.random.randint(2, size=(1280, num_tags)).astype("float32")

# Dummy target data
priority_targets = np.random.random(size=(1280, 1))
dept_targets = np.random.randint(2, size=(1280, num_departments))

In [21]:
model.fit(
    {'title': title_data, 'body': body_data, 'tags': tags_data},
    {'priority': priority_targets, 'department': dept_targets},
    epochs=2,
    batch_size=32
)

Train on 1280 samples
Epoch 1/2
Epoch 2/2


<tensorflow.python.keras.callbacks.History at 0x7ffde13b1390>

## A toy ResNet model
- In addition to models with multiple inputs and outputs, the functional API makes it easy to manipulate **non-linear connectivity topologies** - these are models with layers that are not connected sequentially.
- A common use case for this is **residual connections**. 
- Let's build a toy ResNet model for CIFAR10 to demonstrate this.
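- A residual connection adds a block's input to its output, so both tensors must have the same shape. That is why the convolutions inside each residual block use `padding='same'`, which keeps the 9 × 9 spatial size, while the other convolutions shrink it. A sketch of the spatial-size trace for the toy ResNet and of the shape rule (`conv_out` and `residual_block` are hypothetical helpers, not Keras functions):

```python
import numpy as np

def conv_out(n, k=3, padding="valid"):
    # Spatial size after a stride-1 convolution
    return n if padding == "same" else n - k + 1

# Spatial trace of toy_resnet (height == width throughout)
size = conv_out(32)        # Conv2D(32, 3)       -> 30
size = conv_out(size)      # Conv2D(64, 3)       -> 28
size = size // 3           # MaxPool2D(3)        -> 9
block_input = size
size = conv_out(conv_out(size, padding="same"), padding="same")  # two "same" convs -> 9
assert size == block_input  # shapes match, so layers.add is valid
size = conv_out(size)      # final Conv2D(64, 3) -> 7

def residual_block(x, f):
    """Return f(x) + x; the branch f must preserve the shape of x."""
    y = f(x)
    if y.shape != x.shape:
        raise ValueError(f"cannot add shapes {y.shape} and {x.shape}")
    return y + x

x = np.ones((1, 9, 9, 64))
out = residual_block(x, lambda t: 0.5 * t)  # shape-preserving toy branch
print(size, out.shape)  # 7 (1, 9, 9, 64)
```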

In [22]:
inputs = keras.Input(shape=(32, 32, 3), name='img')
X = layers.Conv2D(32, 3, activation='relu')(inputs)
X = layers.Conv2D(64, 3, activation='relu')(X)
block_1_output = layers.MaxPool2D(3)(X)

X = layers.Conv2D(64, 3, activation='relu', padding='same')(block_1_output)
X = layers.Conv2D(64, 3, activation='relu', padding='same')(X)
block_2_output = layers.add([X, block_1_output])

X = layers.Conv2D(64, 3, activation='relu', padding='same')(block_2_output)
X = layers.Conv2D(64, 3, activation='relu', padding='same')(X)
block_3_output = layers.add([X, block_2_output])

X = layers.Conv2D(64, 3, activation='relu')(block_3_output)
X = layers.GlobalAveragePooling2D()(X)
X = layers.Dense(256, activation='relu')(X)
X = layers.Dropout(0.5)(X)
outputs = layers.Dense(10)(X)

model = keras.Model(inputs, outputs, name='toy_resnet')
model.summary()

Model: "toy_resnet"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
img (InputLayer)                [(None, 32, 32, 3)]  0                                            
__________________________________________________________________________________________________
conv2d_8 (Conv2D)               (None, 30, 30, 32)   896         img[0][0]                        
__________________________________________________________________________________________________
conv2d_9 (Conv2D)               (None, 28, 28, 64)   18496       conv2d_8[0][0]                   
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D)  (None, 9, 9, 64)     0           conv2d_9[0][0]                   
_________________________________________________________________________________________

In [23]:
# keras.utils.plot_model(model, 'mini_resnet.png', show_shapes=True)

- Now, let's train the model.

In [24]:
(X_train, y_train), (X_test, y_test) = keras.datasets.cifar10.load_data()

X_train = X_train.astype("float32") / 255.0
X_test = X_test.astype("float32") / 255.0
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

In [25]:
model.compile(optimizer=keras.optimizers.RMSprop(1e-3),
              loss=keras.losses.CategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

In [26]:
model.fit(X_train[:1000], y_train[:1000], batch_size=64, epochs=3, validation_split=0.2)

Train on 800 samples, validate on 200 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x7ffdae094710>

# Shared layers
- Shared layers are layer instances that are reused multiple times in the same model: they learn features that correspond to multiple paths in the graph of layers.
- Shared layers are often used to encode inputs from similar spaces (say, two different pieces of text that feature similar vocabulary). 
    - They enable sharing of information across these different inputs, and they make it possible to train such a model on less data. 
    - If a given word is seen in one of the inputs, that will benefit the processing of all inputs that pass through the shared layer.
- To share a layer in the functional API, call the same layer instance multiple times. 
- For instance, here's an `Embedding` layer shared across two different text inputs.

In [27]:
# Embedding for 1000 unique words mapped to 128-dimensional vectors
shared_embedding = layers.Embedding(1000, 128)

# Variable-length sequence of integers
text_input_a = keras.Input(shape=(None,), dtype='int32')

# Variable-length sequence of integers
text_input_b = keras.Input(shape=(None,), dtype='int32')

# Reuse the same layer to encode both inputs
encoded_input_a = shared_embedding(text_input_a)
encoded_input_b = shared_embedding(text_input_b)
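- Sharing works because both calls read from one weight table: an embedding lookup returns row `i` of the table for token id `i`, so the same word gets the same vector in either input, and the parameter count (1000 × 128) does not grow with the number of inputs. A NumPy sketch of that lookup (the table here is random, not trained):

```python
import numpy as np

rng = np.random.default_rng(0)

# One table of 1000 word vectors, each 128-dimensional, shared by both inputs
embedding_table = rng.normal(size=(1000, 128))

def shared_embedding(token_ids):
    # Embedding lookup: row i of the table is the vector for token id i
    return embedding_table[token_ids]

text_a = np.array([[3, 17, 42]])         # batch of 1, length 3
text_b = np.array([[42, 7, 3, 3, 999]])  # batch of 1, length 5 (variable length is fine)

encoded_a = shared_embedding(text_a)
encoded_b = shared_embedding(text_b)
print(encoded_a.shape, encoded_b.shape)                  # (1, 3, 128) (1, 5, 128)
print(np.array_equal(encoded_a[0, 2], encoded_b[0, 0]))  # True: token 42 -> same vector
print(embedding_table.size)                              # 128000
```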

# Extract and reuse nodes in the graph of layers
- Because the graph of layers you are manipulating is a static data structure, it can be accessed and inspected.
    - This is how you are able to plot functional models as images.
    - This also means that you can access the activations of intermediate layers ("nodes" in the graph) and reuse them elsewhere, which is very useful for something like feature extraction.
- Let's look at an example of a VGG19 model with weights pretrained on ImageNet.

In [28]:
vgg19 = tf.keras.applications.VGG19()

- Let's obtain the intermediate activations of the model by querying the graph data structure.

In [29]:
feature_list = [layer.output for layer in vgg19.layers]
feature_list

[<tf.Tensor 'input_8:0' shape=(None, 224, 224, 3) dtype=float32>,
 <tf.Tensor 'block1_conv1/Identity:0' shape=(None, 224, 224, 64) dtype=float32>,
 <tf.Tensor 'block1_conv2/Identity:0' shape=(None, 224, 224, 64) dtype=float32>,
 <tf.Tensor 'block1_pool/Identity:0' shape=(None, 112, 112, 64) dtype=float32>,
 <tf.Tensor 'block2_conv1/Identity:0' shape=(None, 112, 112, 128) dtype=float32>,
 <tf.Tensor 'block2_conv2/Identity:0' shape=(None, 112, 112, 128) dtype=float32>,
 <tf.Tensor 'block2_pool/Identity:0' shape=(None, 56, 56, 128) dtype=float32>,
 <tf.Tensor 'block3_conv1/Identity:0' shape=(None, 56, 56, 256) dtype=float32>,
 <tf.Tensor 'block3_conv2/Identity:0' shape=(None, 56, 56, 256) dtype=float32>,
 <tf.Tensor 'block3_conv3/Identity:0' shape=(None, 56, 56, 256) dtype=float32>,
 <tf.Tensor 'block3_conv4/Identity:0' shape=(None, 56, 56, 256) dtype=float32>,
 <tf.Tensor 'block3_pool/Identity:0' shape=(None, 28, 28, 256) dtype=float32>,
 <tf.Tensor 'block4_conv1/Identity:0' shape=(None,

- Now, let's use these features to create a new feature-extraction model that returns the values of the intermediate layer activations.

In [30]:
feature_extraction_model = keras.Model(inputs=vgg19.input, outputs=feature_list)
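- Because `feature_list` holds the output of every layer, `feature_extraction_model` returns a list with one activation tensor per layer instead of a single prediction. As an analogy only (the three toy `stages` are hypothetical stand-ins for VGG19's layers), the same pattern in NumPy runs the stages in order and keeps every intermediate result:

```python
import numpy as np

# Toy stages standing in for a layer stack
stages = [
    lambda x: x,                         # "input layer": passes the input through
    lambda x: np.maximum(x - 0.5, 0.0),  # a ReLU-like stage
    lambda x: x[:, ::2, ::2, :],         # a 2x spatial downsampling stage
]

def extract_features(x):
    """Return the output of every stage, in order (like outputs=feature_list)."""
    outputs = []
    for stage in stages:
        x = stage(x)
        outputs.append(x)
    return outputs

img = np.random.random((1, 224, 224, 3)).astype("float32")
features = extract_features(img)
print([f.shape for f in features])  # [(1, 224, 224, 3), (1, 224, 224, 3), (1, 112, 112, 3)]
```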

In [31]:
img = np.random.random((1, 244, 244, 3)).astype('float32')
extracted_features = feature_extraction_model(img)

In [32]:
extracted_features

[<tf.Tensor: id=18440, shape=(1, 244, 244, 3), dtype=float32, numpy=
 array([[[[0.01910218, 0.23959348, 0.03273841],
          [0.59789544, 0.51862437, 0.4374817 ],
          [0.5622721 , 0.785809  , 0.71979773],
          ...,
          [0.89344704, 0.96663594, 0.03261928],
          [0.12798543, 0.54168135, 0.07553687],
          [0.00182972, 0.4259034 , 0.5178315 ]],
 
         [[0.47503135, 0.46647358, 0.6928288 ],
          [0.66985554, 0.9231054 , 0.58157104],
          [0.54263586, 0.32283202, 0.36899155],
          ...,
          [0.31769887, 0.18798204, 0.8339225 ],
          [0.3950599 , 0.57703817, 0.71434957],
          [0.89083767, 0.6465756 , 0.38688296]],
 
         [[0.70425844, 0.66526914, 0.6328171 ],
          [0.03174131, 0.24581972, 0.86410886],
          [0.4923992 , 0.5273285 , 0.21161635],
          ...,
          [0.32229134, 0.5500966 , 0.42561767],
          [0.3449951 , 0.62926257, 0.2221869 ],
          [0.50959224, 0.8059536 , 0.84928644]],
 
         ...,

# Extend the API using custom layers
- `tf.keras` includes a wide range of built-in layers.
    - Convolutional layers: `Conv1D`, `Conv2D`, `Conv3D`, `Conv2DTranspose`
    - Pooling layers: `MaxPooling1D`, `MaxPooling2D`, `MaxPooling3D`, `AveragePooling1D`
    - RNN layers: `GRU`, `LSTM`, `ConvLSTM2D`
    - `BatchNormalization`, `Dropout`, `Embedding`, etc.
- But you can also extend the API by creating your own layers.
- All layers subclass the `Layer` class and implement the following.
    - **`build` method**: creates the weights of the layer (you can also create weights in `__init__()`)
    - **`call` method**: specifies the computation done by the layer
- The following is a basic reimplementation of `tf.keras.layers.Dense`; it also defines `get_config` so that the layer can be serialized.

In [33]:
class CustomDense(keras.layers.Layer):
    def __init__(self, units=32, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        
    def build(self, input_shape):
        self.w = self.add_weight(
            shape = (input_shape[-1], self.units),
            initializer='random_normal',
            trainable=True
        )
        
        self.b = self.add_weight(
            shape = (self.units,),
            initializer='random_normal',
            trainable=True
        )
        
    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b
    
    def get_config(self):
        base_config = super().get_config()
        return {**base_config, "units": self.units}
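- The point of `build` is deferred weight creation: the kernel's first dimension depends on the input size, which is only known when the layer is first called. A plain-NumPy analogue of `CustomDense` showing this lazy build (`NumpyDense` is hypothetical; in Keras you implement `call` and the framework's `__call__` invokes `build` once for you):

```python
import numpy as np

class NumpyDense:
    """Plain-NumPy analogue of CustomDense: weights are created lazily on first call."""

    def __init__(self, units=32, seed=0):
        self.units = units
        self.built = False
        self._rng = np.random.default_rng(seed)

    def build(self, input_shape):
        # Like add_weight(..., initializer="random_normal"): small normal values
        self.w = self._rng.normal(scale=0.05, size=(input_shape[-1], self.units))
        self.b = self._rng.normal(scale=0.05, size=(self.units,))
        self.built = True

    def __call__(self, inputs):
        if not self.built:  # build() runs once, on the first call
            self.build(inputs.shape)
        return inputs @ self.w + self.b  # same computation as CustomDense.call

layer = NumpyDense(10)
out = layer(np.ones((2, 4)))
print(layer.w.shape, layer.b.shape, out.shape)  # (4, 10) (10,) (2, 10)
```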

In [34]:
inputs = keras.Input(shape=(4,))
outputs = CustomDense(10)(inputs)

In [35]:
model = keras.Model(inputs, outputs)

In [36]:
config = model.get_config()
config

{'name': 'model_6',
 'layers': [{'class_name': 'InputLayer',
   'config': {'batch_input_shape': (None, 4),
    'dtype': 'float32',
    'sparse': False,
    'name': 'input_9'},
   'name': 'input_9',
   'inbound_nodes': []},
  {'class_name': 'CustomDense',
   'config': {'name': 'custom_dense',
    'trainable': True,
    'dtype': 'float32',
    'units': 10},
   'name': 'custom_dense',
   'inbound_nodes': [[['input_9', 0, 0, {}]]]}],
 'input_layers': [['input_9', 0, 0]],
 'output_layers': [['custom_dense', 0, 0]]}

In [37]:
new_model = keras.models.Model.from_config(config, custom_objects={'CustomDense': CustomDense})
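- The round trip works because `get_config` returns the constructor arguments: the base `Layer.from_config` essentially calls `cls(**config)`, and `custom_objects` tells Keras which Python class a `class_name` such as `'CustomDense'` refers to. A plain-Python sketch of that idea (`ToyDense` and the `registry` dict are hypothetical stand-ins, not Keras internals):

```python
class ToyDense:
    def __init__(self, units=32, name="toy_dense"):
        self.units = units
        self.name = name

    def get_config(self):
        # Everything needed to call the constructor again
        return {"name": self.name, "units": self.units}

    @classmethod
    def from_config(cls, config):
        return cls(**config)

# Serialize: record the class name alongside the config
layer = ToyDense(10)
serialized = {"class_name": type(layer).__name__, "config": layer.get_config()}

# Deserialize: map the class name back to a class, like custom_objects does
registry = {"ToyDense": ToyDense}
restored = registry[serialized["class_name"]].from_config(serialized["config"])
print(restored.units, restored.name)  # 10 toy_dense
```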

# When to use the functional API
- In general, the functional API is higher-level, easier and safer, and has a number of features that subclassed models do not support.
- However, model subclassing provides greater flexibility when building models that are not easily expressible as directed acyclic graphs of layers. 
    - For example, you could not implement a Tree-RNN with the functional API and would have to subclass Model directly.

# Mix-and-match API styles
- Choosing between the functional API or Model subclassing isn't a binary decision that restricts you to one category of models.
- All models in the `tf.keras` API can interact with each other, whether they're Sequential models, functional models, or subclassed models that are written from scratch.
- You can always use a functional model or Sequential model as part of a subclassed model or layer.

In [38]:
units = 32
timesteps = 10
input_dim = 5

# Define a Functional model
inputs = keras.Input(shape=(None, units))
X = layers.GlobalAveragePooling1D()(inputs)
outputs = layers.Dense(1)(X)
model = keras.Model(inputs, outputs)

# Define a subclassed layer that includes the Functional model defined above
class CustomRNN(layers.Layer):
    def __init__(self):
        super().__init__()
        self.units = units
        self.projection_1 = layers.Dense(units=units, activation='tanh')
        self.projection_2 = layers.Dense(units=units, activation='tanh')
        self.classifier = model
        
    def call(self, inputs):
        outputs = []
        state = tf.zeros(shape=(inputs.shape[0], self.units))
        
        for t in range(inputs.shape[1]):
            X = inputs[:, t, :]
            h = self.projection_1(X)
            y = h + self.projection_2(state)
            state = y
            outputs.append(y)
            
        features = tf.stack(outputs, axis=1)
        print(features.shape)
        return self.classifier(features)

In [39]:
rnn_model = CustomRNN()

In [40]:
rnn_model(tf.zeros((1, timesteps, input_dim)))

(1, 10, 32)


<tf.Tensor: id=18846, shape=(1, 1), dtype=float32, numpy=array([[0.]], dtype=float32)>
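- The printed `(1, 10, 32)` is the stacked per-step state, and the `[[0.]]` result is expected for an all-zero input: Keras `Dense` layers default to zero biases, so `projection_1` and `projection_2` both produce `tanh(0) = 0` at every step, and the classifier (global average pooling then `Dense(1)`) maps the zero features to its zero bias. A NumPy re-creation of the loop, assuming random kernels and zero biases to mirror those defaults, reproduces both facts:

```python
import numpy as np

rng = np.random.default_rng(0)
units, timesteps, input_dim = 32, 10, 5

# Random kernels, zero biases (Keras Dense uses bias_initializer="zeros" by default)
w1, b1 = rng.normal(size=(input_dim, units)), np.zeros(units)
w2, b2 = rng.normal(size=(units, units)), np.zeros(units)
w_out, b_out = rng.normal(size=(units, 1)), np.zeros(1)

def custom_rnn(inputs):
    state = np.zeros((inputs.shape[0], units))
    outputs = []
    for t in range(inputs.shape[1]):
        h = np.tanh(inputs[:, t, :] @ w1 + b1)  # projection_1
        state = h + np.tanh(state @ w2 + b2)    # y = h + projection_2(state)
        outputs.append(state)
    features = np.stack(outputs, axis=1)        # (batch, timesteps, units)
    pooled = features.mean(axis=1)              # GlobalAveragePooling1D
    return features, pooled @ w_out + b_out     # Dense(1)

features, pred = custom_rnn(np.zeros((1, timesteps, input_dim)))
print(features.shape, pred)  # (1, 10, 32) [[0.]]
```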

- You can use any subclassed layer or model in the functional API as long as it implements a `call` method that follows one of the following patterns.
    - **`call(self, inputs, **kwargs)`**
        - Where `inputs` is a tensor or nested structure of tensors (e.g. a list of tensors), and `**kwargs` are non-tensor arguments (non-inputs).
    - **`call(self, inputs, training=None, **kwargs)`**
        - Where `training` is a boolean indicating whether the layer should behave in training mode or inference mode.
    - **`call(self, inputs, mask=None, **kwargs)`**
        - Where `mask` is a boolean mask tensor (useful for RNNs, for instance).
    - **`call(self, inputs, training=None, mask=None, **kwargs)`** 
        - You can have both masking and training-specific behavior at the same time.
- Additionally, if you implement the `get_config` method on your custom layer or model, the functional models you create will still be serializable and cloneable.
- Here's a quick example of a custom RNN, written from scratch, being used in a functional model.

In [41]:
units = 32
timesteps = 10
input_dim = 5
batch_size = 16


class CustomRNN(layers.Layer):
    def __init__(self):
        super(CustomRNN, self).__init__()
        self.units = units
        self.projection_1 = layers.Dense(units=units, activation="tanh")
        self.projection_2 = layers.Dense(units=units, activation="tanh")
        self.classifier = layers.Dense(1)

    def call(self, inputs):
        outputs = []
        state = tf.zeros(shape=(inputs.shape[0], self.units))
        for t in range(inputs.shape[1]):
            x = inputs[:, t, :]
            h = self.projection_1(x)
            y = h + self.projection_2(state)
            state = y
            outputs.append(y)
        features = tf.stack(outputs, axis=1)
        return self.classifier(features)

- Note that you specify a static batch size for the inputs with the `batch_shape` argument, because the inner computation of `CustomRNN` requires a static batch size (when it creates the `state` zero tensor).
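- The problem is the line `state = tf.zeros(shape=(inputs.shape[0], self.units))`: with `keras.Input(shape=...)` the batch dimension is `None` when the layer is called symbolically, and a zero tensor cannot be allocated with an unknown size. The same failure can be reproduced in NumPy as an analogy (TensorFlow raises its own error type; `initial_state` is a hypothetical helper):

```python
import numpy as np

units = 32
batch_size, timesteps, input_dim = 16, 10, 5

def initial_state(batch_dim):
    # Mirrors tf.zeros(shape=(inputs.shape[0], units)) in CustomRNN.call
    return np.zeros((batch_dim, units))

print(initial_state(batch_size).shape)  # (16, 32): a static batch size works

try:
    initial_state(None)  # unknown batch size, as with keras.Input(shape=...)
except TypeError as err:
    print("cannot allocate state:", err)

# The Conv1D(32, 3) in front of CustomRNN also shortens the sequence: 10 - 3 + 1 steps
print(timesteps - 3 + 1)  # 8
```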

In [42]:
inputs = keras.Input(batch_shape=(batch_size, timesteps, input_dim))
X = layers.Conv1D(32,3)(inputs)
outputs = CustomRNN()(X)

In [43]:
model = keras.Model(inputs, outputs)
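- For the `call(self, inputs, training=None, **kwargs)` pattern listed earlier, the `training` flag lets a layer behave differently during training and inference; in Keras, `fit()` and `predict()` set it for you. A NumPy sketch of such a layer, using inverted dropout as the example (`ToyDropout` is hypothetical; the real `tf.keras.layers.Dropout` also rescales kept units by `1 / (1 - rate)` during training):

```python
import numpy as np

class ToyDropout:
    """NumPy sketch of a layer whose call takes a training flag (inverted dropout)."""

    def __init__(self, rate, seed=0):
        self.rate = rate
        self._rng = np.random.default_rng(seed)

    def __call__(self, inputs, training=None):
        if not training:  # inference, or flag not given: pass inputs through unchanged
            return inputs
        keep = self._rng.random(inputs.shape) >= self.rate
        return inputs * keep / (1.0 - self.rate)  # rescale kept units

layer = ToyDropout(0.5)
x = np.ones((2, 6))
print(layer(x, training=False))  # unchanged
print(layer(x, training=True))   # each entry is 0.0 or 2.0
```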