# CS492 Special Topics in Computer Science: AI Industry and Smart Energy
## Deep Learning Practice 
#### Prof. Ho-Jin Choi
#### School of Computing, KAIST

---

# Week 4 - Customization 
## Schedule of this week
7. **Customization of your model**  <br>
    7-1. Functional API  <br>
    7-2. Customization of loss and metric <br>
    7-3. Several options for optimizer and training <br>
    7-4. Subclassing and GradientTape <br>
    7-5. Custom layers and training <br>

---

## 7 - Customization of your models
### 7-1. Functional API in Tensorflow 
#### (Customization of sequence of forward passing)

In [None]:
from __future__ import absolute_import, division, print_function, unicode_literals

try:
    # %tensorflow_version only exists in Colab.
    %tensorflow_version 2.x
except Exception:
    pass
import tensorflow as tf

tf.keras.backend.clear_session()  # For easy reset of notebook state.

#### Building a model using functional API 

You're already familiar with the use of `keras.Sequential()` to create models. The _Functional API_ is **a way to create models that is more flexible than _Sequential_: it can handle models with non-linear topology, models with shared layers, and models with multiple inputs or outputs.**

It's based on the idea that a deep learning model is usually a directed acyclic graph (DAG) of layers. The Functional API is a set of tools for building graphs of layers.

Consider the following model:
``` python
(input: 784-dimensional vectors)
       ↧
[Dense (64 units, relu activation)]
       ↧
[Dense (10 units, softmax activation)]
       ↧
(output: probability distribution over 10 classes)
```

It's a simple graph of 3 layers.

To build this model with the functional API, you would start by creating an _input node_:

In [None]:
from tensorflow import keras
# Build an input node for 784-dimensional vectors
inputs = keras.Input(shape=(784,))

If the input were a (32, 32, 3) image instead, we would build the input node as follows: 
``` python 
img_inputs = keras.Input(shape=(32, 32, 3))
```


For the `inputs` node above, we specify only the shape of our data: 784-dimensional vectors. Note that the batch size is always omitted; **we only specify the shape of each sample.**

In [None]:
inputs.shape

In [None]:
inputs.dtype

You create a new node in the graph of layers by calling a layer on this inputs object:

In [None]:
from tensorflow.keras import layers

# define a new dense layer (64 units, relu activation)
dense = layers.Dense(64, activation='relu')

# add the dense layer to the graph of layers
# by feeding the output of the input node to the dense layer
# ==> Layer call 
x = dense(inputs)

The _"layer call"_ action is like drawing an arrow from "inputs" to this layer we created. We're "passing" the inputs to the dense layer, and out we get x.

Let's add a few more layers to our graph of layers:

In [None]:
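# build more layers: a second hidden layer and the 10-class softmax output
x = layers.Dense(64, activation='relu')(x)
outputs = layers.Dense(10, activation='softmax')(x)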

At this point, we can create a Model by specifying its inputs and outputs in the graph of layers:

In [None]:
from tensorflow.keras import Model

# create a model from the input node and the output node
model = Model(inputs=inputs, outputs=outputs, name='mnist_model')

To recap, here is our full model definition process:

``` python
inputs = keras.Input(shape=(784,), name='img')
x = layers.Dense(64, activation='relu')(inputs)
x = layers.Dense(64, activation='relu')(x)
outputs = layers.Dense(10, activation='softmax')(x)

model = keras.Model(inputs=inputs, outputs=outputs, name='mnist_model')
```

Let's check out what the model summary looks like:

In [None]:
model.summary()
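As a sanity check on the parameter counts that `model.summary()` should report for the recap model (784 inputs, two `Dense(64)` layers, one `Dense(10)` layer), the arithmetic can be sketched in plain Python. The helper name `dense_params` is only illustrative; it assumes each dense layer has one weight per input-output pair plus one bias per output unit.

```python
def dense_params(n_in, n_out):
    """Trainable parameters of a dense layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out

# input -> Dense(64) -> Dense(64) -> Dense(10)
layer_sizes = [784, 64, 64, 10]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [50240, 4160, 650]
print(total)      # 55050
```

The `Input` node itself contributes no parameters, so the total comes only from the three dense layers.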

We can also plot the model as a graph:

In [None]:
tf.keras.utils.plot_model(model, 'my_first_model.png', show_shapes=True)

This figure and the code we wrote are virtually identical. In the code version, the connection arrows are simply replaced by the call operation.

A "graph of layers" is a very intuitive mental image for a deep learning model, and the functional API is a way to create models that closely mirrors this mental image.

#### Training, evaluation, and inference

Training, evaluation, and inference work in exactly the same way for models built using the Functional API as for Sequential models.

As a quick demonstration, we load the MNIST image data, reshape it into vectors, fit the model on the data (while monitoring performance on a validation split), and finally evaluate the model on the test data:

In [None]:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255

model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
history = model.fit(x_train, y_train,
                    batch_size=64,
                    epochs=5,
                    validation_split=0.2)
test_scores = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', test_scores[0])
print('Test accuracy:', test_scores[1])

#### Using the same graph of layers to define multiple models
In the functional API, models are created by specifying their inputs and outputs in a graph of layers. **That means that a single graph of layers can be used to generate multiple models.**

**Simple autoencoder example:** <br>
In the example below, we use the same stack of layers to instantiate two models: an `encoder` model that turns image inputs into 16-dimensional vectors, and an end-to-end `autoencoder` model for training.
<img src=https://lilianweng.github.io/lil-log/assets/images/autoencoder-architecture.png>

[`tf.keras.layers.Conv2DTranspose`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2DTranspose): Transposed convolution layer (sometimes called Deconvolution). <br>
[`tf.keras.layers.UpSampling2D`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/UpSampling2D): Upsampling layer for 2D inputs. (`size`: The upsampling factors for rows and columns.)

In [None]:
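encoder_input = keras.Input(shape=(28, 28, 1), name='img')
# (28x28x1) -> Conv2D layer with filters(Channel) 16, kernel_size 3, relu activation
x = layers.Conv2D(16, 3, activation='relu')(encoder_input)
# (26x26x16) -> Conv2D layer with filters(Channel) 32, kernel_size 3, relu activation
x = layers.Conv2D(32, 3, activation='relu')(x)
# (24x24x32) -> MaxPooling2D layer with pooling size 3
x = layers.MaxPooling2D(3)(x)
# (8x8x32) -> Conv2D layer with filters(Channel) 32, kernel_size 3, relu activation
x = layers.Conv2D(32, 3, activation='relu')(x)
# (6x6x32) -> Conv2D layer with filters(Channel) 16, kernel_size 3, relu activation
x = layers.Conv2D(16, 3, activation='relu')(x)
# (4x4x16) -> GlobalMaxPooling2D
encoder_output = layers.GlobalMaxPooling2D()(x)
# (16,)

encoder = keras.Model(encoder_input, encoder_output, name='encoder')
encoder.summary()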

In [None]:
tf.keras.utils.plot_model(encoder, 'encoder.png', show_shapes=True)

In [None]:
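# use the output of the encoder model as the input of the decoding layers
x = layers.Reshape((4, 4, 1))(encoder_output)
x = layers.Conv2DTranspose(16, 3, activation='relu')(x)
x = layers.Conv2DTranspose(32, 3, activation='relu')(x)
x = layers.UpSampling2D(3)(x)
x = layers.Conv2DTranspose(16, 3, activation='relu')(x)
decoder_output = layers.Conv2DTranspose(1, 3, activation='relu')(x)

# define the model using encoder_input and decoder_output
autoencoder = keras.Model(encoder_input, decoder_output, name='autoencoder')
autoencoder.summary()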

In [None]:
tf.keras.utils.plot_model(autoencoder, 'autoencoder.png', show_shapes=True)

#### All models are callable, just like layers
You can treat any model as if it were a layer, by calling it on an Input or on the output of another layer. Note that by calling a model you aren't just reusing the architecture of the model, **you're also reusing its weights.**

Let's see this in action. Here's a different take on the autoencoder example that creates an encoder model and a decoder model, then chains them in two calls to obtain the autoencoder model:

In [None]:
encoder_input = keras.Input(shape=(28, 28, 1), name='original_img')
x = layers.Conv2D(16, 3, activation='relu')(encoder_input)
x = layers.Conv2D(32, 3, activation='relu')(x)
x = layers.MaxPooling2D(3)(x)
x = layers.Conv2D(32, 3, activation='relu')(x)
x = layers.Conv2D(16, 3, activation='relu')(x)
encoder_output = layers.GlobalMaxPooling2D()(x)

encoder = keras.Model(encoder_input, encoder_output, name='encoder')
encoder.summary()

In [None]:
decoder_input = keras.Input(shape=(16,), name='encoded_img')
x = layers.Reshape((4, 4, 1))(decoder_input)
x = layers.Conv2DTranspose(16, 3, activation='relu')(x)
x = layers.Conv2DTranspose(32, 3, activation='relu')(x)
x = layers.UpSampling2D(3)(x)
x = layers.Conv2DTranspose(16, 3, activation='relu')(x)
decoder_output = layers.Conv2DTranspose(1, 3, activation='relu')(x)

decoder = keras.Model(decoder_input, decoder_output, name='decoder')
decoder.summary()
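The spatial sizes that `encoder.summary()` and `decoder.summary()` report can be traced by hand. The sketch below assumes the Keras defaults used above (stride 1 and `padding='valid'` for the convolutions, pooling stride equal to the pool size); the helper names are illustrative and not part of any library.

```python
def conv_valid(size, kernel):
    # Conv2D, stride 1, padding='valid': the kernel must fit inside the input
    return size - kernel + 1

def max_pool(size, pool):
    # MaxPooling2D with stride equal to the pool size
    return size // pool

def conv_transpose_valid(size, kernel):
    # Conv2DTranspose, stride 1, padding='valid': the inverse of conv_valid
    return size + kernel - 1

def upsample(size, factor):
    # UpSampling2D repeats each row and column `factor` times
    return size * factor

# Encoder: Conv2D(3), Conv2D(3), MaxPooling2D(3), Conv2D(3), Conv2D(3)
encoder_sizes = [28]
for op in (conv_valid, conv_valid, max_pool, conv_valid, conv_valid):
    encoder_sizes.append(op(encoder_sizes[-1], 3))

# Decoder: starts from the 4x4 Reshape output and mirrors the encoder
decoder_sizes = [4]
for op in (conv_transpose_valid, conv_transpose_valid, upsample,
           conv_transpose_valid, conv_transpose_valid):
    decoder_sizes.append(op(decoder_sizes[-1], 3))

print(encoder_sizes)  # [28, 26, 24, 8, 6, 4]
print(decoder_sizes)  # [4, 6, 8, 24, 26, 28]
```

The decoder undoes the encoder's size changes step by step, which is why the reconstructed image comes back at 28x28.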

In [None]:
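autoencoder_input = keras.Input(shape=(28, 28, 1), name='img')
encoded_img = encoder(autoencoder_input)   # call the encoder model like a layer
decoded_img = decoder(encoded_img)         # call the decoder model like a layer
autoencoder = keras.Model(autoencoder_input, decoded_img, name='autoencoder')
autoencoder.summary()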

As you can see, **models can be nested: a model can contain submodels (since a model is just like a layer).**
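To check the claim that calling a model reuses its weights rather than copying them, here is a small sketch with made-up layer sizes. The model names `inner` and `outer` are hypothetical and chosen only for this illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers

# A small standalone model (sizes are arbitrary).
inner_input = keras.Input(shape=(8,))
inner_output = layers.Dense(4, activation='relu')(inner_input)
inner = keras.Model(inner_input, inner_output, name='inner')

# Call the model on a new Input, exactly as if it were a layer.
outer_input = keras.Input(shape=(8,))
outer_output = layers.Dense(1)(inner(outer_input))
outer = keras.Model(outer_input, outer_output, name='outer')

# The nested model shows up as a single layer of the outer model.
nested = outer.get_layer('inner')
same_object = nested is inner

# Every variable of the inner model is the very same object in the outer model.
shared = all(any(w is v for v in outer.weights) for w in inner.weights)

print(same_object, shared)
```

Because the variables are shared, training `outer` also updates `inner`, which is the behaviour the encoder/decoder example relies on.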

**A common use case for model nesting is _ensembling_.** As an example, here's how to ensemble a set of models into a single model that averages their predictions:

In [None]:
def get_model():
    inputs = keras.Input(shape=(128,))
    outputs = layers.Dense(1, activation='sigmoid')(inputs)
    return keras.Model(inputs, outputs)

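model1 = get_model()
model2 = get_model()
model3 = get_model()

# feed the same input to all three models and average their predictions
inputs = keras.Input(shape=(128,))
y1 = model1(inputs)
y2 = model2(inputs)
y3 = model3(inputs)
outputs = layers.average([y1, y2, y3])
ensemble_model = keras.Model(inputs=inputs, outputs=outputs)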

**Advantages of ensembling:**
1. Reducing overfitting by combining several models.
1. Improving overall performance when each individual model performs poorly.

For a more detailed explanation of ensemble methods, see https://en.wikipedia.org/wiki/Ensemble_learning
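A tiny worked example may help show why averaging can help. The numbers below are made up for illustration (three hypothetical models predicting a target of 1.0) and do not come from any experiment.

```python
target = 1.0
predictions = [0.7, 0.9, 1.3]  # made-up outputs of three hypothetical models

# Squared error of each model on its own, and their mean
individual_sq_errors = [(p - target) ** 2 for p in predictions]
mean_individual_error = sum(individual_sq_errors) / len(predictions)

# Squared error of the averaged (ensembled) prediction
ensemble_prediction = sum(predictions) / len(predictions)
ensemble_sq_error = (ensemble_prediction - target) ** 2

print(mean_individual_error)
print(ensemble_sq_error)
```

Here the individual errors partly cancel, so the averaged prediction is much closer to the target than a typical single model. For squared error, the error of the average never exceeds the average of the individual errors, although how much is gained depends on how differently the models err.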

#### Manipulating complex graph topologies
The **functional API makes it easy to manipulate multiple inputs and outputs**. This cannot be handled with the Sequential API.

Here's a simple example.

Let's say you're building a system for ranking customer issue tickets by priority and routing them to the right department.

Your model will have 3 inputs:
- Title of the ticket (text input)
- Text body of the ticket (text input)
- Any tags added by the user (categorical input)

It will have two outputs:
- Priority score between 0 and 1 (scalar sigmoid output)
- The department that should handle the ticket (softmax output over the set of departments)

Let's build this model in a few lines with the Functional API.

In [None]:
num_tags = 12  # Number of unique issue tags
num_words = 10000  # Size of vocabulary obtained when preprocessing text data
num_departments = 4  # Number of departments for predictions

title_input = keras.Input(shape=(None,), name='title')  # Variable-length sequence of ints
body_input = keras.Input(shape=(None,), name='body')  # Variable-length sequence of ints
tags_input = keras.Input(shape=(num_tags,), name='tags')  # Binary vectors of size `num_tags`

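# Embed each word in the title into a 64-dimensional vector
title_features = layers.Embedding(num_words, 64)(title_input)
# Embed each word in the text into a 64-dimensional vector
body_features = layers.Embedding(num_words, 64)(body_input)
# input shape of embedding layer: (batch_size, input_length)
# output shape of embedding layer: (batch_size, input_length, embedding_dim)

# Reduce sequence of embedded words in the title into 
# a single 128-dimensional vector (an LSTM, as in the TensorFlow guide)
title_features = layers.LSTM(128)(title_features)
# Reduce sequence of embedded words in the body into 
# a single 32-dimensional vector
body_features = layers.LSTM(32)(body_features)

# Merge all available features into a single large vector via concatenation
x = layers.concatenate([title_features, body_features, tags_input])

# Stick a logistic regression for priority prediction on top of the features
priority_pred = layers.Dense(1, activation='sigmoid', name='priority')(x)
# Stick a department classifier on top of the features
department_pred = layers.Dense(num_departments, activation='softmax', name='department')(x)

# Instantiate an end-to-end model predicting both priority and department
model = keras.Model(inputs=[title_input, body_input, tags_input],
                    outputs=[priority_pred, department_pred])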

In [None]:
keras.utils.plot_model(model, 'multi_input_and_output_model.png', show_shapes=True)

When compiling this model, we can **assign different losses to each output.** You can even **assign different weights to each loss** to modulate their contribution to the total training loss (e.g., give a more important loss a larger weight so that it influences training more).

In [None]:
# specify the loss using the names of output layers

model.compile(optimizer=keras.optimizers.RMSprop(1e-3),
              loss={'priority': 'binary_crossentropy',
                    'department': 'categorical_crossentropy'},
              loss_weights=[1., 0.2])

# Alternative: specify the losses as a list, in the order of the model outputs
# model.compile(optimizer=keras.optimizers.RMSprop(1e-3),
#               loss=['binary_crossentropy', 'categorical_crossentropy'],
#               loss_weights=[1., 0.2])
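To make the effect of `loss_weights=[1., 0.2]` concrete, the total loss for a single sample can be worked out by hand. This is a simplified sketch with made-up predictions; Keras additionally clips probabilities and averages over the batch, which is ignored here.

```python
import math

def binary_crossentropy(y_true, p):
    # Loss for one sigmoid output p against a 0/1 target
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

def categorical_crossentropy(one_hot, probs):
    # Loss for one softmax output against a one-hot target
    return -sum(t * math.log(q) for t, q in zip(one_hot, probs))

# Made-up predictions for one ticket
priority_loss = binary_crossentropy(1.0, 0.8)
department_loss = categorical_crossentropy([0, 0, 1, 0], [0.1, 0.2, 0.5, 0.2])

# Same weights as in the compile call: 1.0 for priority, 0.2 for department
loss_weights = {'priority': 1.0, 'department': 0.2}
total_loss = (loss_weights['priority'] * priority_loss
              + loss_weights['department'] * department_loss)

print(round(priority_loss, 4))    # 0.2231
print(round(department_loss, 4))  # 0.6931
print(round(total_loss, 4))       # 0.3618
```

Although the department loss is the larger of the two, its weight of 0.2 means it contributes less to the total than the priority loss does.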

We can train the model by passing NumPy arrays of inputs and targets, either as lists or, as below, as dictionaries keyed by the input and output names:

In [None]:
import numpy as np

# Dummy input data
title_data = np.random.randint(num_words, size=(1280, 10))
body_data = np.random.randint(num_words, size=(1280, 100))
tags_data = np.random.randint(2, size=(1280, num_tags)).astype('float32')

print(title_data.shape)
print(body_data.shape)
print(tags_data.shape)
print("-------------------\n")

print(title_data[0])
print(body_data[0])
print(tags_data[0])

In [None]:
# Dummy target data
priority_targets = np.random.random(size=(1280, 1))
dept_targets = np.random.randint(2, size=(1280, num_departments))

print(priority_targets[0])
print(dept_targets[0])

In [None]:
model.fit({'title': title_data, 'body': body_data, 'tags': tags_data}, # set training data
          {'priority': priority_targets, 'department': dept_targets},  # set target data (labels)
          epochs=2,
          batch_size=32)

When calling fit with a `Dataset` object, it should yield either a tuple of lists like `([title_data, body_data, tags_data], [priority_targets, dept_targets])` or a tuple of dictionaries like `({'title': title_data, 'body': body_data, 'tags': tags_data}, {'priority': priority_targets, 'department': dept_targets})`.
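As a sketch of the dictionary form, the following builds a `tf.data.Dataset` that yields a tuple of dictionaries. The sample count of 64 and the dummy data are assumptions made for this illustration; the keys match the input and output names used in this section.

```python
import numpy as np
import tensorflow as tf

num_samples, num_words, num_tags, num_departments = 64, 10000, 12, 4

# Dummy features and targets, keyed by the model's input and output names
features = {
    'title': np.random.randint(num_words, size=(num_samples, 10)),
    'body': np.random.randint(num_words, size=(num_samples, 100)),
    'tags': np.random.randint(2, size=(num_samples, num_tags)).astype('float32'),
}
targets = {
    'priority': np.random.random(size=(num_samples, 1)),
    'department': np.random.randint(2, size=(num_samples, num_departments)),
}

# Each element is a (features_dict, targets_dict) pair; batch before fitting
dataset = tf.data.Dataset.from_tensor_slices((features, targets)).batch(32)

for batch_features, batch_targets in dataset.take(1):
    print({name: tuple(t.shape) for name, t in batch_features.items()})
    print({name: tuple(t.shape) for name, t in batch_targets.items()})

# With the compiled two-output model from this section one could then call:
# model.fit(dataset, epochs=2)
```

Since the dataset is already batched, `batch_size` would not be passed to `fit` in this case.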

#### A toy resnet model
In addition to models with multiple inputs and outputs, **the Functional API makes it easy to manipulate non-linear connectivity topologies**, that is to say, models where layers are not connected sequentially. This also cannot be handled with the Sequential API (as the name indicates).

A common use case for this is **residual connections**.

<img src=https://miro.medium.com/proxy/1*rbhjv7ZdAgXM2MlBUL5Mmw.png>


Let's build a toy ResNet model for CIFAR10 to demonstrate this.

In [None]:
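inputs = keras.Input(shape=(32, 32, 3), name='img')

## Normal connection
# Conv2D with filters 32, kernel_size 3, relu activation
x = layers.Conv2D(32, 3, activation='relu')(inputs)
# Conv2D with filters 64, kernel_size 3, relu activation
x = layers.Conv2D(64, 3, activation='relu')(x)
# MaxPooling2D with pooling size 3 (as in the TensorFlow Functional API guide)
block_1_output = layers.MaxPooling2D(3)(x)

# Conv2D with filters 64, kernel_size 3, relu activation, padding 'same' (with zero padding)
x = layers.Conv2D(64, 3, activation='relu', padding='same')(block_1_output)
# Conv2D with filters 64, kernel_size 3, relu activation, padding 'same' (with zero padding)
x = layers.Conv2D(64, 3, activation='relu', padding='same')(x)
# Use the block_1_output and x as an input (residual connection)
block_2_output = layers.add([x, block_1_output])

# Conv2D with filters 64, kernel_size 3, relu activation, padding 'same' (with zero padding)
x = layers.Conv2D(64, 3, activation='relu', padding='same')(block_2_output)
# Conv2D with filters 64, kernel_size 3, relu activation, padding 'same' (with zero padding)
x = layers.Conv2D(64, 3, activation='relu', padding='same')(x)
# Use the block_2_output and x as an input (residual connection)
block_3_output = layers.add([x, block_2_output])

# Conv2D with filters 64, kernel_size 3, relu activation
x = layers.Conv2D(64, 3, activation='relu')(block_3_output)
# GlobalAveragePooling2D
x = layers.GlobalAveragePooling2D()(x)
# Dense layer with 256 units, relu activation
x = layers.Dense(256, activation='relu')(x)
# Dropout layer with 0.5 dropout rate
x = layers.Dropout(0.5)(x)
# Dense layer with 10 units, softmax function
outputs = layers.Dense(10, activation='softmax')(x)


model = keras.Model(inputs, outputs, name='toy_resnet')
model.summary()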

In [None]:
keras.utils.plot_model(model, 'mini_resnet.png', show_shapes=True)
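The residual blocks use `padding='same'` because an element-wise addition needs both inputs to have the same shape. The plain-Python sketch below traces only the spatial size; it assumes stride-1 convolutions and that the first block ends with `MaxPooling2D(3)` as in the TensorFlow Functional API guide. The helper `conv` is illustrative and not a library function.

```python
def conv(size, kernel, padding='valid'):
    # Stride-1 convolution: 'same' keeps the size, 'valid' shrinks it
    return size if padding == 'same' else size - kernel + 1

size = conv(32, 3)           # first Conv2D: 32 -> 30
size = conv(size, 3)         # second Conv2D: 30 -> 28
block_1 = size // 3          # MaxPooling2D(3): 28 -> 9 (assumed, see above)

# Residual branch with padding='same': the size is preserved
branch_same = conv(conv(block_1, 3, 'same'), 3, 'same')
can_add_same = branch_same == block_1

# The same branch with the default padding='valid' would shrink to 5
branch_valid = conv(conv(block_1, 3), 3)
can_add_valid = branch_valid == block_1

print(block_1, branch_same, can_add_same)    # 9 9 True
print(block_1, branch_valid, can_add_valid)  # 9 5 False
```

The channel counts must match as well, which is why every convolution inside the residual blocks uses 64 filters.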

In [None]:
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

model.compile(optimizer=keras.optimizers.RMSprop(1e-3),
              loss='categorical_crossentropy',
              metrics=['acc'])

model.fit(x_train, y_train,
          batch_size=64,
          epochs=1,
          validation_split=0.2)