# Going beyond the Sequential model: the Keras functional API

Until now, all neural networks introduced in this book have been implemented using the Sequential model. The Sequential model makes the assumption that the
network has exactly one input and exactly one output, and that it consists of a linear stack of layers.

<img src='https://github.com/rahiakela/img-repo/blob/master/deep-learning-with-python/sequential.png?raw=1' width='800'/>

This assumption holds so often that we’ve been able to cover many topics and practical applications in these pages so far using only the Sequential model class. But it is too inflexible in a number of cases. Some networks require several independent inputs, others require multiple outputs, and some networks have internal
branching between layers that makes them look like graphs of layers rather than linear stacks of layers.

Some tasks, for instance, require multimodal inputs: they merge data coming from
different input sources, processing each type of data using different kinds of neural
layers. Imagine a deep-learning model trying to predict the most likely market price of
a second-hand piece of clothing, using the following inputs: user-provided metadata
(such as the item’s brand, age, and so on), a user-provided text description, and a picture
of the item. If you had only the metadata available, you could one-hot encode it
and use a densely connected network to predict the price. If you had only the text
description available, you could use an RNN or a 1D convnet. If you had only the picture,
you could use a 2D convnet. But how can you use all three at the same time? A
naive approach would be to train three separate models and then do a weighted average
of their predictions. But this may be suboptimal, because the information
extracted by the models may be redundant. A better way is to jointly learn a more accurate
model of the data by using a model that can see all available input modalities
simultaneously: a model with three input branches.


<img src='https://github.com/rahiakela/img-repo/blob/master/deep-learning-with-python/multi-input-model.png?raw=1' width='800'/>
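As a rough preview, a three-branch model like the one described above might be sketched with the functional API as follows. All sizes (metadata vector length, vocabulary size, description length, image shape) and the layer widths are assumptions chosen only for illustration; they do not come from the text.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical sizes, chosen only for illustration.
NUM_METADATA_FEATURES = 20   # length of the one-hot encoded metadata vector
VOCAB_SIZE = 1000            # number of distinct tokens in the descriptions
MAX_TEXT_LEN = 50            # tokens per padded description
IMAGE_SHAPE = (64, 64, 3)    # height, width, channels

# Branch 1: metadata goes through a densely connected layer.
metadata_input = keras.Input(shape=(NUM_METADATA_FEATURES,), name="metadata")
x_meta = layers.Dense(16, activation="relu")(metadata_input)

# Branch 2: the text description goes through an embedding and an RNN.
text_input = keras.Input(shape=(MAX_TEXT_LEN,), dtype="int32", name="description")
x_text = layers.Embedding(VOCAB_SIZE, 16)(text_input)
x_text = layers.LSTM(16)(x_text)

# Branch 3: the picture goes through a small 2D convnet.
image_input = keras.Input(shape=IMAGE_SHAPE, name="picture")
x_img = layers.Conv2D(8, 3, activation="relu")(image_input)
x_img = layers.GlobalMaxPooling2D()(x_img)

# Merge the three branches and predict a single price value.
merged = layers.concatenate([x_meta, x_text, x_img])
price = layers.Dense(1, name="price")(merged)

model = keras.Model([metadata_input, text_input, image_input], price)
```

Because the three branches meet in one `concatenate` layer, the final `Dense` layer sees features from every modality at once, rather than averaging the outputs of three independently trained models.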

Similarly, some tasks need to predict multiple target attributes of input data. Given the
text of a novel or short story, you might want to automatically classify it by genre (such
as romance or thriller) but also predict the approximate date it was written. Of course,
you could train two separate models: one for the genre and one for the date. But
because these attributes aren’t statistically independent, you could build a better
model by learning to jointly predict both genre and date at the same time. Such a
joint model would then have two outputs, or heads.

Due to correlations between genre and date, knowing the date of a novel would help the model
learn rich, accurate representations of the space of novel genres, and vice versa.

<img src='https://github.com/rahiakela/img-repo/blob/master/deep-learning-with-python/multi-output-model.png?raw=1' width='800'/>
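A two-headed model of this kind could look roughly like the sketch below. The vocabulary size, sequence length, number of genres, and the choice of a 1D convnet as the shared trunk are all assumptions made for illustration.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical sizes, chosen only for illustration.
VOCAB_SIZE = 5000     # number of distinct tokens
MAX_TEXT_LEN = 100    # tokens per padded text excerpt
NUM_GENRES = 4        # e.g. romance, thriller, ...

# Shared trunk: both heads read the same learned text representation.
text_input = keras.Input(shape=(MAX_TEXT_LEN,), dtype="int32", name="text")
x = layers.Embedding(VOCAB_SIZE, 32)(text_input)
x = layers.Conv1D(32, 5, activation="relu")(x)
x = layers.GlobalMaxPooling1D()(x)

# Head 1: genre classification. Head 2: date regression.
genre_output = layers.Dense(NUM_GENRES, activation="softmax", name="genre")(x)
date_output = layers.Dense(1, name="date")(x)

model = keras.Model(text_input, [genre_output, date_output])

# One loss per head, in the same order as the outputs.
model.compile(
    optimizer="rmsprop",
    loss=["sparse_categorical_crossentropy", "mse"],
)
```

Both losses are backpropagated through the shared trunk, which is what lets information about the date shape the representation used for genre, and vice versa.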

Additionally, many recently developed neural architectures require a nonlinear network topology: networks structured as directed acyclic graphs. The Inception family of networks (developed by Szegedy et al. at Google), for instance, relies on Inception modules, where the input is processed by several parallel convolutional branches whose outputs are then merged back into a single tensor.

<img src='https://github.com/rahiakela/img-repo/blob/master/deep-learning-with-python/inception-module.png?raw=1' width='800'/>
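The branch-and-merge pattern can be sketched as below. This is a simplified module in the spirit of Inception, not the exact configuration from the original architecture; the input shape, branch layout, and filter counts are assumptions chosen for illustration.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical feature map: 32x32 spatial size, 16 channels.
inputs = keras.Input(shape=(32, 32, 16))

# Branch A: a single 1x1 convolution.
branch_a = layers.Conv2D(8, 1, activation="relu", padding="same")(inputs)

# Branch B: a 1x1 convolution followed by a 3x3 convolution.
branch_b = layers.Conv2D(8, 1, activation="relu", padding="same")(inputs)
branch_b = layers.Conv2D(8, 3, activation="relu", padding="same")(branch_b)

# Branch C: pooling followed by a 1x1 convolution.
branch_c = layers.MaxPooling2D(3, strides=1, padding="same")(inputs)
branch_c = layers.Conv2D(8, 1, activation="relu", padding="same")(branch_c)

# Merge the parallel branches back into a single tensor along the channel axis.
outputs = layers.concatenate([branch_a, branch_b, branch_c], axis=-1)

model = keras.Model(inputs, outputs)
```

Using `padding="same"` with a stride of 1 keeps every branch at the same spatial size, so the outputs can be concatenated channel-wise (here 8 + 8 + 8 = 24 channels).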

There’s also the recent trend of adding residual connections to a model, which started with the ResNet family of networks. A residual connection consists
of reinjecting previous representations into the downstream flow of data by adding a past output tensor to a later output tensor, which helps prevent
information loss along the data-processing flow. There are many other examples of such graph-like networks.


<img src='https://github.com/rahiakela/img-repo/blob/master/deep-learning-with-python/residual-connection.png?raw=1' width='800'/>
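A minimal sketch of a residual connection, assuming the earlier and later tensors have the same shape so that they can be added element-wise. The input shape and the two-convolution block are illustrative choices, not a reproduction of any specific ResNet.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical feature map: 32x32 spatial size, 16 channels.
inputs = keras.Input(shape=(32, 32, 16))

# A small block of layers that transforms the input.
y = layers.Conv2D(16, 3, activation="relu", padding="same")(inputs)
y = layers.Conv2D(16, 3, activation="relu", padding="same")(y)

# Residual connection: add the earlier tensor back to the later one.
# Both tensors must have identical shapes for the addition to work.
outputs = layers.add([y, inputs])

model = keras.Model(inputs, outputs)
```

The block keeps 16 channels and uses `padding="same"`, so the output of the convolutions matches the shape of `inputs`. When the shapes differ, the earlier tensor would first need to be reshaped, for example with a 1x1 convolution.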

These three important use cases—multi-input models, multi-output models, and
graph-like models—aren’t possible when using only the Sequential model class in
Keras. But there’s another far more general and flexible way to use Keras: the functional API.



## Setup

In [None]:
from __future__ import absolute_import, division, print_function, unicode_literals

try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass
import tensorflow as tf
from tensorflow import keras

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense, Conv1D, MaxPooling1D, GlobalMaxPooling1D, LSTM, Embedding, GRU, Bidirectional
from tensorflow.keras.optimizers import RMSprop

from tensorflow.keras.datasets import imdb

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

from tensorflow.keras import backend as K

import numpy as np
import pandas as pd

import string
import os

import matplotlib.pyplot as plt
%matplotlib inline

## Introduction to the functional API