# TensorFlow Guide: Keras

These notes summarize the TensorFlow guide on Keras, from https://www.tensorflow.org/guide/keras/

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
```

## I. The sequential model

A Sequential model is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor.

It behaves very much like a list of layers. In particular, it can be created by passing a list of layers to the `Sequential` constructor:

```python
model = keras.Sequential(
    [
        keras.Input(shape=(4,)),
        layers.Dense(2, activation="relu", name="layer1"),
        layers.Dense(3, activation="relu", name="layer2"),
        layers.Dense(4, name="layer3"),
    ]
)
model.summary()
```

```
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
layer1 (Dense)               (None, 2)                 10        
_________________________________________________________________
layer2 (Dense)               (None, 3)                 9         
_________________________________________________________________
layer3 (Dense)               (None, 4)                 16        
Total params: 35
Trainable params: 35
Non-trainable params: 0
_________________________________________________________________
```


A few more comments on the Sequential model and its components:
 - The object returned by `keras.Input()` contains information about the shape and dtype of the input data
   - The batch size is omitted from `shape`
   - `keras.Input()` may be replaced by passing `input_shape=(4,)` to the first layer (`layer1` above)
 - Mirroring list behavior, `model.add()` appends a layer and `model.pop()` removes the last one
   - Alternating `.add()` and `.summary()` is a convenient way to inspect the architecture as it grows, in particular each layer's output shape
 - `layers.Dense()` returns a layer instance, which is callable like a function
   - `y = layer(x)`
   - `layer.output` holds the layer's output tensor once the layer is connected in a model
   - `layer.weights` holds the layer's weights (kernel and bias for `Dense`)
 - `model.layers` behaves like a list of the model's layers
   - `outputs = [layer.output for layer in model.layers]` gathers every layer's output, e.g. to build a feature-extraction model
 - `model.get_layer(name='layer1')` retrieves a layer by name
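The points above can be exercised in a minimal sketch. The layer sizes reuse the example model. The names `base` and `extractor` are illustrative. As a precaution, the feature-extraction model is built from a freshly constructed Sequential model rather than the one modified with `add()`/`pop()`, because `layer.output` is only unambiguous for a layer connected exactly once.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Build a Sequential model incrementally; add() appends a layer
model = keras.Sequential()
model.add(keras.Input(shape=(4,)))
model.add(layers.Dense(2, activation="relu", name="layer1"))
model.add(layers.Dense(3, activation="relu", name="layer2"))
model.add(layers.Dense(4, name="layer3"))
model.summary()  # inspect output shapes after adding layers

# pop() removes the last layer, mirroring list.pop()
model.pop()
n_layers_after_pop = len(model.layers)  # layer1 and layer2 remain

# get_layer() looks a layer up by name; .weights holds kernel and bias
kernel, bias = model.get_layer(name="layer1").weights
weight_shapes = (tuple(kernel.shape), tuple(bias.shape))

# Feature extraction: a model returning every layer's output.
# Built from a list so each layer is connected exactly once.
base = keras.Sequential(
    [
        keras.Input(shape=(4,)),
        layers.Dense(2, activation="relu"),
        layers.Dense(3, activation="relu"),
    ]
)
extractor = keras.Model(
    inputs=base.inputs,
    outputs=[layer.output for layer in base.layers],
)
features = extractor(np.ones((1, 4), dtype="float32"))
feature_shapes = [tuple(f.shape) for f in features]
```

The kernel of `layer1` has shape (4, 2) and its bias (2,), which matches the 10 parameters reported by `model.summary()` above.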
 

### I.1 Transfer learning with a Sequential model

## II. Keras functional API

The functional API creates models that are more flexible than Sequential models. It can handle models with non-linear topology, shared layers, and even multiple inputs or outputs.

The <b>main idea</b> is that a deep learning model is usually a directed acyclic graph (DAG) of layers. By using the functional API to feed one layer's output into another layer's input, one builds up this DAG of layers.


```python
inputs = keras.Input(shape=(784,))    # Input, note batch size is omitted
dense = layers.Dense(64, activation="relu") # 1st layer
x = dense(inputs)
x = layers.Dense(64, activation="relu")(x) # 2nd layer
outputs = layers.Dense(10)(x)  # 3rd layer
model = keras.Model(inputs=inputs, outputs=outputs, name="mnist_model")
model.summary()
```

```
Model: "mnist_model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_2 (InputLayer)         [(None, 784)]             0         
_________________________________________________________________
dense_3 (Dense)              (None, 64)                50240     
_________________________________________________________________
dense_4 (Dense)              (None, 64)                4160      
_________________________________________________________________
dense_5 (Dense)              (None, 10)                650       
Total params: 55,050
Trainable params: 55,050
Non-trainable params: 0
_________________________________________________________________
```


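The model can also be displayed as a graph:

```python
keras.utils.plot_model(model, "my_first_model.png")
```

This requires the `pydot` and `graphviz` packages. Without them, the call fails with:

```
Failed to import pydot. You must install pydot and graphviz for `pydotprint` to work.
```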


### II.1. Use the same graph of layers to define multiple models

In the functional API, models are created by specifying their inputs and outputs in a graph of layers. That means that a single graph of layers can be used to generate multiple models.
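A minimal sketch of two models defined on one graph of layers. The sizes and the names `hidden`, `head`, `full_model`, and `feature_model` are hypothetical. Because both models contain the very same layer objects, they also share weights.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# One graph of layers: input -> hidden -> output
inputs = keras.Input(shape=(8,))
hidden = layers.Dense(4, activation="relu", name="hidden")(inputs)
outputs = layers.Dense(1, name="head")(hidden)

# Two models cut from the same graph
full_model = keras.Model(inputs, outputs, name="full_model")
feature_model = keras.Model(inputs, hidden, name="feature_model")

# Both models hold the same "hidden" layer instance, hence the same weights
shared = full_model.get_layer("hidden") is feature_model.get_layer("hidden")

x = np.random.rand(3, 8).astype("float32")
full_shape = tuple(full_model(x).shape)        # one prediction per sample
feature_shape = tuple(feature_model(x).shape)  # the intermediate features
```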

### II.2. All models are callable, just like layers -- connecting and nesting models

You can treat any model as if it were a layer by invoking it on an Input or on the output of another layer. By calling a model you aren't just reusing the architecture of the model, you're also reusing its weights.

In the example below, we define an encoder model and a decoder model, then chain them to form an autoencoder model.

```python
encoder_input = keras.Input(shape=(28, 28, 1), name="original_img")
x = layers.Conv2D(16, 3, activation="relu")(encoder_input)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.MaxPooling2D(3)(x)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.Conv2D(16, 3, activation="relu")(x)
encoder_output = layers.GlobalMaxPooling2D()(x)

encoder = keras.Model(encoder_input, encoder_output, name="encoder")
encoder.summary()

decoder_input = keras.Input(shape=(16,), name="encoded_img")
x = layers.Reshape((4, 4, 1))(decoder_input)
x = layers.Conv2DTranspose(16, 3, activation="relu")(x)
x = layers.Conv2DTranspose(32, 3, activation="relu")(x)
x = layers.UpSampling2D(3)(x)
x = layers.Conv2DTranspose(16, 3, activation="relu")(x)
decoder_output = layers.Conv2DTranspose(1, 3, activation="relu")(x)

decoder = keras.Model(decoder_input, decoder_output, name="decoder")
decoder.summary()

autoencoder_input = keras.Input(shape=(28, 28, 1), name="img")
encoded_img = encoder(autoencoder_input)
decoded_img = decoder(encoded_img)
autoencoder = keras.Model(autoencoder_input, decoded_img, name="autoencoder")
autoencoder.summary()
```

```
Model: "encoder"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
original_img (InputLayer)    [(None, 28, 28, 1)]       0         
_________________________________________________________________
conv2d (Conv2D)              (None, 26, 26, 16)        160       
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 24, 24, 32)        4640      
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 8, 8, 32)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 6, 6, 32)          9248      
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 4, 4, 16)          4624      
_________________________________________________________________
global_max_pooling2d (Global (None, 16)                0   
```

Models can also be nested: a model can contain sub-models (since a model behaves just like a layer). A common use case for model nesting is ensembling. For example, here's how to ensemble a set of models into a single model that averages their predictions:

```python
def get_model():
    inputs = keras.Input(shape=(128,))
    outputs = layers.Dense(1)(inputs)
    return keras.Model(inputs, outputs)

model1 = get_model()
model2 = get_model()
model3 = get_model()

inputs = keras.Input(shape=(128,))
y1 = model1(inputs)
y2 = model2(inputs)
y3 = model3(inputs)
outputs = layers.average([y1, y2, y3])
ensemble_model = keras.Model(inputs=inputs, outputs=outputs)
```
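We can check the claim that calling a model reuses its weights. A minimal sketch rebuilds the ensemble and compares its output with the mean of the three sub-models' individual predictions. The random test input `x` is an assumption.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def get_model():
    inputs = keras.Input(shape=(128,))
    outputs = layers.Dense(1)(inputs)
    return keras.Model(inputs, outputs)

model1, model2, model3 = get_model(), get_model(), get_model()

inputs = keras.Input(shape=(128,))
outputs = layers.average([model1(inputs), model2(inputs), model3(inputs)])
ensemble_model = keras.Model(inputs=inputs, outputs=outputs)

x = np.random.rand(5, 128).astype("float32")
# The ensemble calls the same sub-models (same weights), so its output
# should equal the average of their separate predictions
expected = (model1(x) + model2(x) + model3(x)) / 3.0
max_diff = float(np.max(np.abs(ensemble_model(x) - expected)))

# No new weights are created by nesting: 3 sub-models x (128 + 1) params
n_params = ensemble_model.count_params()
```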

### II.3. Manipulate complex graph topologies
[Yet to read]

With the functional API, one can:
 - Construct models with multiple inputs and outputs
 - Build models with non-linear connectivity topology (e.g. a toy ResNet model)
 - Share layers
 - Extract and reuse nodes in the graph of layers
 - Extend the API using custom layers
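A minimal sketch combining several of these features in one model:

 - two inputs processed by one shared `Dense` layer,
 - a residual (skip) connection in the style of a toy ResNet,
 - two outputs.

The input sizes, layer widths, and names are hypothetical.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Two inputs (hypothetical feature sizes)
input_a = keras.Input(shape=(16,), name="input_a")
input_b = keras.Input(shape=(16,), name="input_b")

# Shared layer: one Dense instance (one set of weights) applied to both inputs
shared_dense = layers.Dense(16, activation="relu", name="shared_dense")
encoded_a = shared_dense(input_a)
encoded_b = shared_dense(input_b)

# Merge the two branches into a (None, 32) tensor
merged = layers.concatenate([encoded_a, encoded_b])

# Residual connection: add the block's input back to its output
block = layers.Dense(32, activation="relu")(merged)
residual = layers.add([block, merged])

# Two outputs: 3 class scores and one scalar
class_out = layers.Dense(3, name="class_out")(residual)
score_out = layers.Dense(1, name="score_out")(residual)

model = keras.Model(inputs=[input_a, input_b], outputs=[class_out, score_out])

xa = np.random.rand(2, 16).astype("float32")
xb = np.random.rand(2, 16).astype("float32")
class_pred, score_pred = model([xa, xb])
```

Because `shared_dense` is reused, its 16 × 16 + 16 = 272 parameters are counted once. The total is therefore 272 + 1056 + 99 + 33 = 1460.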


### II.4 When to use the functional API
[Yet to read]

When building a new model, the choice is between two styles:
 - Functional API
 - Model subclassing

### II.5 Directly subclass 
[Yet to read]

All models in the tf.keras API can interact with each other. For example, one can use a functional model or Sequential model as part of a subclassed model or layer.
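A minimal sketch of mixing styles, assuming a hypothetical `Classifier` subclass. It wraps a functional model as its feature extractor and a Sequential model as its classification head.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# A functional model used as a building block
block_in = keras.Input(shape=(10,))
block_out = layers.Dense(8, activation="relu")(block_in)
feature_block = keras.Model(block_in, block_out, name="feature_block")


class Classifier(keras.Model):
    """Hypothetical subclassed model reusing a functional and a Sequential model."""

    def __init__(self, feature_block, num_classes=3):
        super().__init__()
        self.feature_block = feature_block  # functional model used as a layer
        self.head = keras.Sequential([layers.Dense(num_classes)])  # Sequential sub-model

    def call(self, inputs):
        return self.head(self.feature_block(inputs))


clf = Classifier(feature_block)
logits = clf(np.random.rand(4, 10).astype("float32"))
```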


## III. Training and evaluation with the built-in methods

This section covers training, evaluation, and prediction (inference) of models when using the built-in APIs for training and validation (such as `model.fit()`, `model.evaluate()`, and `model.predict()`).

If you are interested in leveraging fit() while specifying your own training step function, see the guide "customizing what happens in fit()".

If you are interested in writing your own training & evaluation loops from scratch, see the guide "writing a training loop from scratch".


### III.1 API overview: a first end-to-end example

When passing data to the built-in training loops of a model, you should use either NumPy arrays (if your data is small and fits in memory) or `tf.data` Dataset objects. A typical training workflow consists of the following steps:
 - Split the data into training, validation, and test sets
 - Specify the model, using `keras.Sequential`, the functional API, Model subclassing, etc.
   - We end up with a model object; call it `model`
 - Specify the training configuration (optimizer, loss, and optionally metrics to monitor)
   - Using `model.compile()`
 - Call `history = model.fit()` to train the model
   - It slices the data into batches of size `batch_size`, and
   - It repeatedly iterates over the entire dataset for a given number of `epochs`
 - The returned `history` object holds a record of the loss and metric values during training
   - In `history.history`, a dict mapping each loss/metric name to its per-epoch values
 - We evaluate the model on the test data via `model.evaluate()`
   - It returns the loss followed by the metrics given to `compile()` (e.g. accuracy)
 - We obtain predictions by calling `model.predict()`
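The workflow above can be sketched end to end. Random synthetic data stands in for a real dataset (the TensorFlow guide uses MNIST), so the accuracy values are meaningless here. The point is the sequence of calls. The data sizes and layer widths are assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Synthetic stand-in data: 1000 samples, 20 features, 10 classes
rng = np.random.default_rng(0)
x = rng.random((1000, 20), dtype=np.float32)
y = rng.integers(0, 10, size=(1000,))

# 1) Split into train / validation / test sets
x_train, y_train = x[:800], y[:800]
x_val, y_val = x[800:900], y[800:900]
x_test, y_test = x[900:], y[900:]

# 2) Specify the model (functional API)
inputs = keras.Input(shape=(20,))
h = layers.Dense(32, activation="relu")(inputs)
outputs = layers.Dense(10)(h)  # logits
model = keras.Model(inputs, outputs)

# 3) Training configuration: optimizer, loss, metrics to monitor
model.compile(
    optimizer=keras.optimizers.RMSprop(),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[keras.metrics.SparseCategoricalAccuracy()],
)

# 4) Train: batches of 64 samples, 2 passes (epochs) over the training data
history = model.fit(
    x_train, y_train,
    batch_size=64,
    epochs=2,
    validation_data=(x_val, y_val),
    verbose=0,
)

# 5) Evaluate on the test set: [loss, accuracy]
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)

# 6) Predict: one row of 10 logits per sample
predictions = model.predict(x_test[:3], verbose=0)
```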

#### III.1.A Custom losses
[Yet to read]


#### III.1.B Custom metrics
[Yet to read]


#### III.1.C Handling losses and metrics that don't fit the standard signature
[Yet to read]


### III.2 Training and evaluation from tf.data Datasets
[Yet to read]


#### III.2.B Other input formats supported

Besides NumPy arrays, eager tensors, and TensorFlow Datasets, it's possible to train a Keras model using Pandas dataframes, or from Python generators that yield batches of data & labels.

In particular, the keras.utils.Sequence class offers a simple interface to build Python data generators that are multiprocessing-aware and can be shuffled.

In general, we recommend that you use:
 - NumPy input data if your data is small and fits in memory
 - Dataset objects if you have large datasets and you need to do distributed training
 - Sequence objects if you have large datasets and you need to do a lot of custom Python-side processing that cannot be done in TensorFlow (e.g. if you rely on external libraries for data loading or preprocessing).
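A minimal `keras.utils.Sequence` sketch, assuming a hypothetical `ArraySequence` class that serves batches from in-memory arrays. A subclass implements two methods:

 - `__len__` returns the number of batches per epoch,
 - `__getitem__` returns one batch.

In a real pipeline, `__getitem__` is where the custom Python-side loading or preprocessing would go.

```python
import math

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


class ArraySequence(keras.utils.Sequence):
    """Hypothetical Sequence serving (x, y) batches from NumPy arrays."""

    def __init__(self, x, y, batch_size):
        super().__init__()
        self.x, self.y = x, y
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch (the last batch may be smaller)
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        lo = idx * self.batch_size
        hi = lo + self.batch_size
        return self.x[lo:hi], self.y[lo:hi]


x = np.random.rand(100, 4).astype("float32")
y = np.random.rand(100, 1).astype("float32")
seq = ArraySequence(x, y, batch_size=32)

model = keras.Sequential([keras.Input(shape=(4,)), layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")
# No batch_size argument: the Sequence defines the batching
history = model.fit(seq, epochs=1, verbose=0)
```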


### III.3. Using sample weighting and class weighting

Class weights are set by passing a dictionary to the `class_weight` argument of `Model.fit()`. This dictionary maps class indices to the weight that should be used for samples belonging to each class.

For fine-grained control, or if you are not building a classifier, you can use sample weights:
 - When training from NumPy data: pass the `sample_weight` argument to `Model.fit()`.
 - When training from `tf.data` or any other sort of iterator: yield `(input_batch, label_batch, sample_weight_batch)` tuples.
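A sketch of the three options on synthetic data (an assumption). The doubled weight for class 2 is arbitrary. The `sample_weight` array reproduces that class weighting by hand, and the `tf.data` version yields `(input, label, sample_weight)` tuples.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Synthetic 3-class data (illustrative only)
x = np.random.rand(300, 8).astype("float32")
y = np.random.randint(0, 3, size=(300,))

model = keras.Sequential([keras.Input(shape=(8,)), layers.Dense(3)])
model.compile(
    optimizer="adam",
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# 1) Class weights: class index -> weight (class 2 counts twice as much)
class_weight = {0: 1.0, 1: 1.0, 2: 2.0}
model.fit(x, y, class_weight=class_weight, batch_size=64, epochs=1, verbose=0)

# 2) Sample weights with NumPy data: one weight per sample
sample_weight = np.ones(len(y), dtype="float32")
sample_weight[y == 2] = 2.0  # same weighting as above, set per sample
model.fit(x, y, sample_weight=sample_weight, batch_size=64, epochs=1, verbose=0)

# 3) Sample weights with tf.data: yield (input, label, sample_weight) tuples
dataset = tf.data.Dataset.from_tensor_slices((x, y, sample_weight)).batch(64)
history = model.fit(dataset, epochs=1, verbose=0)
```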


### III.4. Passing data to multi-input, multi-output models
[Yet to read]


### III.5. Using callbacks

Callbacks in Keras are objects that are called at different points during training (at the start of an epoch, at the end of a batch, at the end of an epoch, etc.) and which can be used to implement behaviors such as:
 - Doing validation at different points during training (beyond the built-in per-epoch validation)
 - Checkpointing the model at regular intervals or when it exceeds a certain accuracy threshold
 - Changing the learning rate of the model when training seems to be plateauing
 - Doing fine-tuning of the top layers when training seems to be plateauing
 - Sending email or instant message notifications when training ends or when a certain performance threshold is exceeded
 - Etc.
 
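Many built-in callbacks are available, for example `ModelCheckpoint`, `EarlyStopping`, `TensorBoard`, `CSVLogger`, and `ReduceLROnPlateau`.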
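A minimal sketch combining two built-in callbacks with a custom one:

 - `EarlyStopping` stops training on a plateau,
 - `ReduceLROnPlateau` lowers the learning rate on a plateau,
 - a hypothetical `EpochLogger` subclasses `keras.callbacks.Callback` and overrides `on_epoch_end`.

The data and thresholds are illustrative assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


class EpochLogger(keras.callbacks.Callback):
    """Hypothetical custom callback: records the training loss after each epoch."""

    def __init__(self):
        super().__init__()
        self.losses = []

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        self.losses.append(logs.get("loss"))


# Synthetic regression data (illustrative only)
x = np.random.rand(200, 4).astype("float32")
y = np.random.rand(200, 1).astype("float32")

model = keras.Sequential([keras.Input(shape=(4,)), layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")

logger = EpochLogger()
callbacks = [
    # Stop when val_loss has not improved by at least 1e-2 for 2 epochs
    keras.callbacks.EarlyStopping(monitor="val_loss", min_delta=1e-2, patience=2),
    # Halve the learning rate when val_loss stalls for 1 epoch
    keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=1),
    logger,
]
history = model.fit(
    x, y, epochs=10, batch_size=32, validation_split=0.2,
    callbacks=callbacks, verbose=0,
)
```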


### III.6. Checkpointing models

When you're training a model on relatively large datasets, it's crucial to save checkpoints of your model at frequent intervals.

The easiest way to achieve this is with the ModelCheckpoint callback:
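A minimal sketch, assuming weights-only checkpoints written once per epoch to a temporary directory. Keras fills in `{epoch}` in the file path. The `.weights.h5` suffix is used because recent Keras versions require it when `save_weights_only=True`.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

x = np.random.rand(100, 4).astype("float32")
y = np.random.rand(100, 1).astype("float32")

model = keras.Sequential([keras.Input(shape=(4,)), layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")

ckpt_dir = tempfile.mkdtemp()
checkpoint_cb = keras.callbacks.ModelCheckpoint(
    # {epoch} is replaced by the (1-based) epoch number
    filepath=os.path.join(ckpt_dir, "model_{epoch}.weights.h5"),
    save_weights_only=True,  # save weights only, not the full model
    save_freq="epoch",       # write a checkpoint at the end of every epoch
)
model.fit(x, y, epochs=2, batch_size=32, callbacks=[checkpoint_cb], verbose=0)

saved = sorted(os.listdir(ckpt_dir))
# Restore the most recent checkpoint into the model
model.load_weights(os.path.join(ckpt_dir, saved[-1]))
```

To keep only the best model instead of one file per epoch, `ModelCheckpoint` also accepts `monitor="val_loss"` together with `save_best_only=True`.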

### III.7. Using learning rate schedules



### III.8. Visualizing loss and metrics during training

The best way to keep an eye on your model during training is to use TensorBoard, a browser-based application that you can run locally that provides you with:
 - Live plots of the loss and metrics for training and evaluation
 - (optionally) Visualizations of the histograms of your layer activations
 - (optionally) 3D visualizations of the embedding spaces learned by your Embedding layers
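A minimal sketch of attaching the `keras.callbacks.TensorBoard` callback. The temporary log directory and the synthetic data are assumptions. `histogram_freq` and `embeddings_freq` set how often (in epochs) the optional histogram and embedding visualizations are logged, and 0 disables them.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

x = np.random.rand(100, 4).astype("float32")
y = np.random.rand(100, 1).astype("float32")

model = keras.Sequential(
    [keras.Input(shape=(4,)), layers.Dense(8, activation="relu"), layers.Dense(1)]
)
model.compile(optimizer="adam", loss="mse")

log_dir = tempfile.mkdtemp()
tensorboard_cb = keras.callbacks.TensorBoard(
    log_dir=log_dir,      # where event files are written
    histogram_freq=1,     # log histogram visualizations every epoch (0 disables)
    embeddings_freq=0,    # embedding visualizations disabled
    update_freq="epoch",  # write loss/metrics once per epoch
)
model.fit(x, y, epochs=2, validation_split=0.2, callbacks=[tensorboard_cb], verbose=0)
```

The plots can then be viewed by starting TensorBoard on the same directory with `tensorboard --logdir=<log_dir>` and opening the printed URL in a browser.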