# CS492 Special Topics in Computer Science <AI Industry and Smart Energy>
## Deep Learning Practice 
#### Prof. Ho-Jin Choi
#### School of Computing, KAIST

---

### 7-2. Customization of loss and metric
To train a model with `fit`, you need to specify a loss function, an optimizer, and optionally, some metrics to monitor.
You pass these to the model as arguments to the `compile()` method:
```python
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
              loss=keras.losses.SparseCategoricalCrossentropy(),
              metrics=[keras.metrics.SparseCategoricalAccuracy()])
```
If your model has multiple outputs, you can specify different losses and metrics for each output, and you can modulate the contribution of each output to the total loss of the model.

Note that in many cases, the loss and metrics are specified via string identifiers, as a shortcut:
```python
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
              loss='sparse_categorical_crossentropy',
              metrics=['sparse_categorical_accuracy'])
```

For later reuse, let's put our model definition and compile step in functions; we will call them several times across different examples in this guide.

In [None]:
from __future__ import absolute_import, division, print_function, unicode_literals

try:
    %tensorflow_version 2.x
except Exception:
    pass
import tensorflow as tf

import numpy as np

In [None]:
from tensorflow import keras
from tensorflow.keras import layers

def get_uncompiled_model():
    inputs = keras.Input(shape=(784,), name='digits')
    x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
    x = layers.Dense(64, activation='relu', name='dense_2')(x)
    outputs = layers.Dense(10, activation='softmax', name='predictions')(x)
    model = keras.Model(inputs=inputs, outputs=outputs)
    return model

def get_compiled_model():
    model = get_uncompiled_model()
    model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
                  loss='sparse_categorical_crossentropy',
                  metrics=['sparse_categorical_accuracy'])
    return model

#### Many built-in optimizers, losses, and metrics are available
In general, you won't have to create your own losses, metrics, or optimizers from scratch, because what you need is likely already part of the Keras API:

- `Optimizers`:
    - `SGD()` (with or without momentum)
    - `RMSprop()`
    - `Adam()`
    - etc.
- `Losses`:
    - `MeanSquaredError()`
    - `KLDivergence()`
    - `CosineSimilarity()`
    - etc.
- `Metrics`:
    - `AUC()`
    - `Precision()`
    - `Recall()`
    - etc.


#### Custom losses
There are two ways to provide **custom losses with Keras**. The first is to write a function that accepts the inputs `y_true` and `y_pred`. The following example shows a loss function that computes the average distance between the real data and the predictions:

In [None]:
# Load a toy dataset for the sake of this example
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Preprocess the data (these are Numpy arrays)
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255

y_train = y_train.astype('float32')
y_test = y_test.astype('float32')

# Reserve 10,000 samples for validation
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]

In [None]:
# Define a custom loss function: the mean absolute difference
# between the targets and the predictions
def basic_loss_function(y_true, y_pred):
    return tf.math.reduce_mean(tf.abs(y_true - y_pred))

model = get_uncompiled_model()
# Use the custom loss function we defined
model.compile(optimizer=keras.optimizers.Adam(),
              loss=basic_loss_function,
              metrics=['accuracy'])

model.fit(x_train, y_train, batch_size=64, epochs=3)

The second way is to **subclass the `tf.keras.losses.Loss` class** and implement the following two methods:
- `__init__(self)`: Accept parameters to pass during the call of your loss function
- `call(self, y_true, y_pred)`: Use the targets (`y_true`) and the model predictions (`y_pred`) to compute the model's loss

The following example shows how to implement a `WeightedBinaryCrossEntropy` loss function that calculates a binary cross-entropy loss, where the loss of the positive class or of the whole function can be scaled by a scalar.

In [None]:
class WeightedBinaryCrossEntropy(keras.losses.Loss):
    """
    Args:
      pos_weight: Scalar to affect the positive labels of the loss function.
      weight: Scalar to affect the entirety of the loss function.
      from_logits: Whether to compute loss from logits or the probability.
      reduction: Type of tf.keras.losses.Reduction to apply to loss.
      name: Name of the loss function.
    """
    def __init__(self, pos_weight, weight, from_logits=False,
                 reduction=keras.losses.Reduction.AUTO,
                 name='weighted_binary_crossentropy'):
        super(WeightedBinaryCrossEntropy, self).__init__(reduction=reduction,
                                                         name=name)
        self.pos_weight = pos_weight
        self.weight = weight
        self.from_logits = from_logits

    def call(self, y_true, y_pred):
        if not self.from_logits:
            # Manually calculate the weighted cross entropy.
            # Formula is qz * -log(sigmoid(x)) + (1 - z) * -log(1 - sigmoid(x))
            # where z are labels, x is logits, and q is the weight.
            # Since the values passed are from sigmoid (assuming in this case)
            # sigmoid(x) will be replaced by y_pred

            # qz * -log(sigmoid(x)) 1e-6 is added as an epsilon to stop passing a zero into the log
            x_1 = y_true * self.pos_weight * -tf.math.log(y_pred + 1e-6)

            # (1 - z) * -log(1 - sigmoid(x)). Epsilon is added to prevent passing a zero into the log
            x_2 = (1 - y_true) * -tf.math.log(1 - y_pred + 1e-6)

            return tf.add(x_1, x_2) * self.weight 

        # Use built in function
        return tf.nn.weighted_cross_entropy_with_logits(y_true, y_pred, self.pos_weight) * self.weight


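# Use the loss function you defined.
# pos_weight=0.5 and weight=2 are example values; tune them for your data.
model.compile(
    optimizer=keras.optimizers.Adam(),
    loss=WeightedBinaryCrossEntropy(pos_weight=0.5, weight=2)
)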

model.fit(x_train, y_train, batch_size=64, epochs=3)
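
To make the formula in the comments of `call` concrete, here is a minimal plain-Python sketch of the same per-element computation on a single probability. The helper name `weighted_bce` is hypothetical (it is not part of Keras), and the epsilon of `1e-6` simply mirrors the class above.

```python
import math

def weighted_bce(z, p, pos_weight=1.0, weight=1.0, eps=1e-6):
    """Per-element weighted binary cross-entropy on a probability p.

    z is the label (0 or 1), p is the predicted probability,
    pos_weight scales the positive-label term, weight scales the total.
    """
    positive_term = pos_weight * z * -math.log(p + eps)
    negative_term = (1 - z) * -math.log(1 - p + eps)
    return weight * (positive_term + negative_term)

# With pos_weight=2 the loss for a positive label doubles;
# the loss for a negative label is unchanged.
base_pos = weighted_bce(1, 0.8)
heavy_pos = weighted_bce(1, 0.8, pos_weight=2.0)
base_neg = weighted_bce(0, 0.8)
heavy_neg = weighted_bce(0, 0.8, pos_weight=2.0)
```

The sketch shows why `pos_weight` only affects samples whose label is 1, while `weight` rescales every sample.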

#### Custom metrics
If you need a metric that isn't part of the API, **you can easily create custom metrics by subclassing the Metric class**. You will need to implement 4 methods:
- `__init__(self)`: creates the state variables for your metric.
- `update_state(self, y_true, y_pred, sample_weight=None)`: uses the targets `y_true` and the model predictions `y_pred` to update the state variables.
- `result(self)`: uses the state variables to compute the final results.
- `reset_states(self)`: reinitializes the state of the metric.

State update and results computation are kept separate (in `update_state()` and `result()`, respectively) because in some cases, results computation might be very expensive, and would only be done periodically.

Here's a simple example showing how to implement a `CatgoricalTruePositives` metric that counts how many samples were correctly classified as belonging to a given class:

In [None]:
class CatgoricalTruePositives(keras.metrics.Metric):
    def __init__(self, name='categorical_true_positives', **kwargs):
        super(CatgoricalTruePositives, self).__init__(name=name, **kwargs)
        self.true_positives = self.add_weight(name='tp', initializer='zeros')
        
    def update_state(self, y_true, y_pred, sample_weight=None):
        y_pred = tf.reshape(tf.argmax(y_pred, axis=1), shape=(-1, 1))
        values = tf.cast(y_true, 'int32') == tf.cast(y_pred, 'int32')
        values = tf.cast(values, 'float32')
        if sample_weight is not None:
            sample_weight = tf.cast(sample_weight, 'float32')
            values = tf.multiply(values, sample_weight)
        self.true_positives.assign_add(tf.reduce_sum(values))
        
    def result(self):
        return self.true_positives
    
    def reset_states(self):
        # The state of the metric will be reset at the start of each epoch.
        self.true_positives.assign(0.)


        
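# Pass an instance of the custom metric in the `metrics` list
model.compile(
    optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
    loss=keras.losses.SparseCategoricalCrossentropy(),
    metrics=[CatgoricalTruePositives()]
)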

model.fit(x_train, y_train,
          batch_size=64,
          epochs=3)
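
The split between a cheap per-batch `update_state()` and an on-demand `result()` can be illustrated without TensorFlow. The following is only a sketch under that assumption: `RunningTruePositives` is a hypothetical plain-Python stand-in, not a Keras class, and it takes Python lists instead of tensors.

```python
class RunningTruePositives:
    """Plain-Python stand-in for a stateful metric (not a Keras class)."""

    def __init__(self):
        self.true_positives = 0.0  # state variable

    def update_state(self, y_true, y_pred_probs, sample_weight=None):
        # Per-batch step: accumulate the (weighted) number of correct predictions.
        for i, (label, probs) in enumerate(zip(y_true, y_pred_probs)):
            predicted = max(range(len(probs)), key=probs.__getitem__)  # argmax
            if predicted == label:
                self.true_positives += 1.0 if sample_weight is None else sample_weight[i]

    def result(self):
        # Read the accumulated state; called only when a value is needed.
        return self.true_positives

    def reset_states(self):
        self.true_positives = 0.0

metric = RunningTruePositives()
metric.update_state([0, 1], [[0.9, 0.1], [0.8, 0.2]])  # first batch: 1 correct
metric.update_state([1], [[0.3, 0.7]])                 # second batch: 1 correct
total = metric.result()
metric.reset_states()
```

The state survives across batches until `reset_states()` is called, which is what lets a metric report a value for a whole epoch rather than for one batch.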

#### Applying losses and metrics to multi-input, multi-output models
Consider the following model, which has an image input of shape `(32, 32, 3)` (that's `(height, width, channels)`) and a timeseries input of shape `(None, 10)` (that's `(timesteps, features)`). Our model will have two outputs computed from the combination of these inputs: a "score" (of shape `(1,)`) and a probability distribution over five classes (of shape `(5,)`).

In [None]:
from tensorflow import keras
from tensorflow.keras import layers

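# Two named inputs, with the shapes described above
image_input = keras.Input(shape=(32, 32, 3), name='img_input')
timeseries_input = keras.Input(shape=(None, 10), name='ts_input')

x1 = layers.Conv2D(3, 3)(image_input)
x1 = layers.GlobalMaxPooling2D()(x1)

x2 = layers.Conv1D(3, 3)(timeseries_input)
x2 = layers.GlobalMaxPooling1D()(x2)

x = layers.concatenate([x1, x2])

# Two named outputs: a scalar score and a distribution over five classes
score_output = layers.Dense(1, name='score_output')(x)
class_output = layers.Dense(5, activation='softmax', name='class_output')(x)

model = keras.Model(inputs=[image_input, timeseries_input],
                    outputs=[score_output, class_output])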

Let's plot this model, so you can clearly see what we're doing here (note that the shapes shown in the plot are batch shapes, rather than per-sample shapes).

In [None]:
keras.utils.plot_model(model, 'multi_input_and_output_model.png', show_shapes=True)

At compilation time, **we can specify different losses for different outputs** by passing the loss functions as a list:

``` python
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss=[keras.losses.MeanSquaredError(),
          keras.losses.CategoricalCrossentropy()])
```

If we only passed a single loss function to the model, the same loss function would be applied to every output, which is not appropriate here.

Since we gave names to our output layers, we could also specify per-output losses and metrics via a dict:

In [None]:
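model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    # Keys are the names of the output layers
    loss={'score_output': keras.losses.MeanSquaredError(),
          'class_output': keras.losses.CategoricalCrossentropy()},
    metrics={'score_output': [keras.metrics.MeanAbsolutePercentageError(),
                              keras.metrics.MeanAbsoluteError()],
             'class_output': [keras.metrics.CategoricalAccuracy()]}
)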

We recommend the **use of explicit names and dicts** if you have more than 2 outputs.

It's possible to give **different weights to different output-specific losses** (for instance, one might wish to privilege the "score" loss in our example by giving it 2x the importance of the class loss), using the `loss_weights` argument:

In [None]:
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss={'score_output': keras.losses.MeanSquaredError(),
          'class_output': keras.losses.CategoricalCrossentropy()},
    metrics={'score_output': [keras.metrics.MeanAbsolutePercentageError(),
                              keras.metrics.MeanAbsoluteError()],
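             'class_output': [keras.metrics.CategoricalAccuracy()]},
    # The "score" loss counts twice as much as the class loss
    loss_weights={'score_output': 2., 'class_output': 1.}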
)
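
The effect of per-output loss weights can be pictured as a weighted sum of the individual losses. The sketch below is plain Python with made-up loss values; `total_loss` is a hypothetical helper, and it ignores any extra terms (such as regularization losses) that a real model may add.

```python
def total_loss(per_output_losses, loss_weights):
    """Weighted sum of per-output losses; a missing weight defaults to 1."""
    return sum(loss_weights.get(name, 1.0) * value
               for name, value in per_output_losses.items())

# Hypothetical per-output loss values for one batch.
losses = {'score_output': 0.25, 'class_output': 1.5}

weighted = total_loss(losses, {'score_output': 2.0, 'class_output': 1.0})
unweighted = total_loss(losses, {})
```

Doubling the weight of `score_output` doubles its share of the gradient signal relative to `class_output`, without changing either loss function.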

In [None]:
# Generate dummy Numpy data
img_data = np.random.random_sample(size=(100, 32, 32, 3))
ts_data = np.random.random_sample(size=(100, 20, 10))
score_targets = np.random.random_sample(size=(100, 1))
class_targets = np.random.random_sample(size=(100, 5))


# Fit on lists
model.fit([img_data, ts_data], [score_targets, class_targets],
          batch_size=32,
          epochs=3)

"""
# Alternatively, fit on dicts
model.fit({'img_input': img_data, 'ts_input': ts_data},
          {'score_output': score_targets, 'class_output': class_targets},
          batch_size=32,
          epochs=3)
"""

Here's the `Dataset` use case: just as with the Numpy arrays above, the `Dataset` should return a **tuple of dicts**.

In [None]:
train_dataset = tf.data.Dataset.from_tensor_slices(
    ({'img_input': img_data, 'ts_input': ts_data},
     {'score_output': score_targets, 'class_output': class_targets}))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)

model.fit(train_dataset, epochs=3)

### 7-3. Several options for optimizer and training
#### Using sample weighting and class weighting
Besides input data and target data, it is possible to pass **sample weights** or **class weights** to a model when using `fit`:

- When training from Numpy data: via the `sample_weight` and `class_weight` arguments.
- When training from Datasets: by having the Dataset return a tuple `(input_batch, target_batch, sample_weight_batch)`.

A **_"sample weights"_** array is an array of numbers that specify **how much weight each sample in a batch should have in computing the total loss.** **It is commonly used in imbalanced classification problems (the idea being to give more weight to rarely-seen classes)**. When the **weights used are ones and zeros**, the array can be used as a mask for the loss function (entirely discarding the contribution of certain samples to the total loss).

A **_"class weights"_** dict is a more specific instance of the same concept: it maps class indices to the sample weight that should be used for samples belonging to this class. For instance, if class "0" is half as represented as class "1" in your data, you could use `class_weight={0: 1., 1: 0.5}`.
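
As a rough illustration of how the two concepts relate, the plain-Python sketch below expands a class-weight dict into per-sample weights and applies weights to a batch of losses. Both helper names are hypothetical, the loss values are made up, and dividing by the batch size is an assumption of this sketch; the exact normalization in Keras depends on the loss reduction in use.

```python
def class_to_sample_weights(labels, class_weight):
    """Look up a per-sample weight from each sample's class index."""
    return [class_weight.get(int(label), 1.0) for label in labels]

def weighted_mean_loss(per_sample_losses, sample_weights):
    """Scale each sample's loss by its weight, then average over the batch."""
    scaled = [loss * w for loss, w in zip(per_sample_losses, sample_weights)]
    return sum(scaled) / len(scaled)

labels = [0, 1, 1, 0]
per_sample_losses = [0.5, 1.0, 1.0, 0.5]  # hypothetical values

# A class-weight dict is shorthand for a per-sample weight array.
weights = class_to_sample_weights(labels, {0: 1.0, 1: 0.5})
reweighted = weighted_mean_loss(per_sample_losses, weights)

# Weights of ones and zeros act as a mask: class-1 samples contribute nothing.
masked = weighted_mean_loss(per_sample_losses, [1.0, 0.0, 0.0, 1.0])
```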

Here's a Numpy example where we use class weights or sample weights to give more importance to the correct classification of class #5 (which is the digit "5" in the MNIST dataset).

In [None]:
class_weight = {0: 1., 1: 1., 2: 1., 3: 1., 4: 1.,
                # Set weight "2" for class "5",
                # making this class 2x more important
                5: 2.,
                6: 1., 7: 1., 8: 1., 9: 1.}


model = get_compiled_model()

print('Fit with class weight')
model.fit(
    x_train, y_train,
    class_weight=class_weight,
    batch_size=64,
    epochs=3
)

# Here's the same example using `sample_weight` instead:
sample_weight = np.ones(shape=(len(y_train),))
sample_weight[y_train == 5] = 2.
print('\nFit with sample weight')

model = get_compiled_model()
model.fit(
    x_train, y_train,
    sample_weight=sample_weight,
    batch_size=64,
    epochs=3
)

Here's a matching `Dataset` example:

In [None]:
sample_weight = np.ones(shape=(len(y_train),))
sample_weight[y_train == 5] = 2.

# Create a Dataset that includes sample weights
# (3rd element in the return tuple).
train_dataset = tf.data.Dataset.from_tensor_slices(
    (x_train, y_train, sample_weight))

# Shuffle and batch the dataset.
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)

model = get_compiled_model()
model.fit(train_dataset, epochs=3)

#### Using callbacks 

Callbacks in Keras are objects that are called at different points during training (at the start of an epoch, at the end of a batch, at the end of an epoch, etc.) and which can be used to implement behaviors such as:

- Doing validation at different points during training (beyond the built-in per-epoch validation)
- Checkpointing the model at regular intervals or when it exceeds a certain accuracy threshold
- Changing the learning rate of the model when training seems to be plateauing
- Doing fine-tuning of the top layers when training seems to be plateauing
- Sending email or instant message notifications when training ends or when a certain performance threshold is exceeded
- Etc.

**Many built-in callbacks are available:**
- `ModelCheckpoint`: periodically saves the model.
- `EarlyStopping`: stops training when the validation metrics are no longer improving.
- `TensorBoard`: periodically writes model logs that can be visualized in TensorBoard (more details in the section "Visualizing loss and metrics during training").
- `CSVLogger`: streams loss and metrics data to a CSV file.
- etc.

Callbacks can be passed as a list to your call to `fit`:

In [None]:
model = get_compiled_model()

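callbacks = [
    keras.callbacks.EarlyStopping(
        # Stop training when `val_loss` is no longer improving
        monitor='val_loss',
        # Example values: an improvement must be at least 1e-2 ...
        min_delta=1e-2,
        # ... and training stops after 2 epochs without one
        patience=2,
        verbose=1
    )
]

model.fit(
    x_train, y_train,
    epochs=20,
    batch_size=64,
    callbacks=callbacks,
    # Hold out part of the training data so that `val_loss` is available
    validation_split=0.2
)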
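
The stopping rule behind a callback such as `EarlyStopping` can be sketched in a few lines of plain Python. This is a simplified model rather than the Keras implementation: `early_stopping_epoch` is a hypothetical helper, the validation losses are made up, and only the `min_delta` and `patience` ideas are modeled.

```python
def early_stopping_epoch(val_losses, min_delta=0.0, patience=0):
    """Return the index of the epoch at which training would stop,
    or None if it would run through every epoch in val_losses."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss   # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1     # no meaningful improvement this epoch
            if wait >= patience:
                return epoch
    return None

history = [0.90, 0.70, 0.695, 0.70, 0.60]  # hypothetical validation losses
stop_at = early_stopping_epoch(history, min_delta=1e-2, patience=2)
```

Here the drop from 0.70 to 0.695 is smaller than `min_delta`, so it does not count as an improvement, and training stops after two such epochs even though a better value would have followed.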

When you're training a model on relatively large datasets, it's crucial to save checkpoints of your model at frequent intervals.

The easiest way to achieve this is with the `ModelCheckpoint` callback:

In [None]:
model = get_compiled_model()

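callbacks = [
    keras.callbacks.ModelCheckpoint(
        # Example path; '{epoch}' is filled in with the epoch number
        filepath='mymodel_{epoch}.h5',
        # Only save a checkpoint when `val_loss` has improved
        save_best_only=True,
        monitor='val_loss',
        verbose=1
    )
]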

model.fit(x_train, y_train,
          epochs=3,
          batch_size=64,
          callbacks=callbacks,
          validation_split=0.2)

**Writing your own callback** <br>
You can create a custom callback by extending the base class `keras.callbacks.Callback`. A callback has access to its associated model through the class property `self.model`.

Here's a simple example saving a list of per-batch loss values during training:

In [None]:
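class LossHistory(keras.callbacks.Callback):
    def on_train_begin(self, logs=None):
        # Start with an empty history at the beginning of training
        self.losses = []

    def on_batch_end(self, batch, logs=None):
        # `logs` holds the metrics of the batch that just finished
        logs = logs or {}
        self.losses.append(logs.get('loss'))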
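
To see when these hooks fire, it can help to look at a toy training loop that calls them. The sketch below is plain Python under simplifying assumptions: `LossRecorder` and `run_training` are hypothetical names, no model is trained, and the per-batch losses are supplied by hand.

```python
class LossRecorder:
    """Plain-Python stand-in for a callback (not a Keras class)."""

    def on_train_begin(self, logs=None):
        self.losses = []

    def on_batch_end(self, batch, logs=None):
        self.losses.append((logs or {}).get('loss'))

def run_training(batch_losses, callbacks):
    # Called once, before the first batch.
    for cb in callbacks:
        cb.on_train_begin()
    for batch, loss in enumerate(batch_losses):
        # A real loop would run one training step here and compute `loss`.
        for cb in callbacks:
            cb.on_batch_end(batch, logs={'loss': loss})

recorder = LossRecorder()
run_training([0.9, 0.7, 0.6], [recorder])
```

After the loop, `recorder.losses` holds one entry per batch, in order.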

You can also write your own callback for saving and restoring models.

#### Using learning rate schedules
A common pattern when training deep learning models is to gradually reduce the learning rate as training progresses. **This is generally known as "_learning rate decay_".**

The learning rate decay schedule could be **static** (fixed in advance, as a function of the current epoch or the current batch index), or **dynamic** (responding to the current behavior of the model, in particular the validation loss).

**Passing a schedule to an optimizer** <br>
You can easily use a _static learning rate decay_ schedule by passing a schedule object as the `learning_rate` argument of your optimizer. One such schedule is [`tf.keras.optimizers.schedules.ExponentialDecay`](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/schedules/ExponentialDecay), a `LearningRateSchedule` that uses an exponential decay schedule. Its arguments are:
- `initial_learning_rate`: A scalar float32 or float64 Tensor or a Python number. The initial learning rate.
- `decay_steps`: A scalar int32 or int64 Tensor or a Python number. Must be positive. The learning rate is multiplied by `decay_rate` once every `decay_steps` steps.
- `decay_rate`: A scalar float32 or float64 Tensor or a Python number. The decay rate.
- `staircase`: Boolean. If True, decay the learning rate at discrete intervals.
- `name`: String. Optional name of the operation. Defaults to 'ExponentialDecay'.
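
The decay computation itself is short: the learning rate at a given step is `initial_learning_rate * decay_rate ** (step / decay_steps)`, and with `staircase=True` the division becomes an integer division. The plain-Python function below is a sketch of that formula for illustration; `exponential_decay` is a hypothetical helper, not the Keras schedule object.

```python
def exponential_decay(step, initial_learning_rate, decay_steps, decay_rate,
                      staircase=False):
    """Learning rate at a given step: initial * decay_rate ** (step / decay_steps)."""
    exponent = step / decay_steps
    if staircase:
        exponent = step // decay_steps  # integer division: decay in discrete jumps
    return initial_learning_rate * decay_rate ** exponent

# Halfway through the first decay period:
smooth = exponential_decay(50, 0.1, decay_steps=100, decay_rate=0.96)
stepped = exponential_decay(50, 0.1, decay_steps=100, decay_rate=0.96,
                            staircase=True)
# After exactly one decay period the rate has been multiplied by decay_rate once.
after_one_period = exponential_decay(100, 0.1, decay_steps=100, decay_rate=0.96)
```

Without `staircase` the rate shrinks a little at every step; with it, the rate stays at its initial value until a full `decay_steps` period has passed.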

In [None]:
initial_learning_rate = 0.1
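lr_schedule = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate,
    # Example values: multiply the learning rate by 0.96 every 100,000 steps
    decay_steps=100000,
    decay_rate=0.96,
    staircase=True
)

# Pass the schedule we defined as the learning_rate argument of the optimizer
optimizer = keras.optimizers.RMSprop(
    learning_rate=lr_schedule
)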

Several built-in schedules are available: `ExponentialDecay`, `PiecewiseConstantDecay`, `PolynomialDecay`, and `InverseTimeDecay`.

**Using callbacks to implement a dynamic learning rate schedule** <br>
A _dynamic learning rate schedule_ (for instance, decreasing the learning rate when the validation loss is no longer improving) cannot be achieved with these schedule objects since the optimizer does not have access to validation metrics.

However, callbacks do have access to all metrics, including validation metrics! You can thus achieve this pattern **by using a callback that modifies the current learning rate on the optimizer.** In fact, this is even built-in as the `ReduceLROnPlateau` callback.
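
The idea of lowering the learning rate on a plateau can be sketched in plain Python. This is a deliberately simplified model and not the `ReduceLROnPlateau` implementation: `reduce_on_plateau` is a hypothetical helper, the validation losses are made up, and options of the real callback such as `min_delta`, `cooldown`, and `min_lr` are left out.

```python
def reduce_on_plateau(val_losses, initial_lr, factor=0.1, patience=1):
    """Return the learning rate used at each epoch (hypothetical helper)."""
    lr, best, wait = initial_lr, float('inf'), 0
    schedule = []
    for loss in val_losses:
        schedule.append(lr)       # rate in effect during this epoch
        if loss < best:
            best, wait = loss, 0  # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:
                lr *= factor      # plateau detected: shrink the learning rate
                wait = 0
    return schedule

lrs = reduce_on_plateau([1.0, 0.8, 0.8, 0.8, 0.7],
                        initial_lr=1.0, factor=0.5, patience=2)
```

The rate is halved only after the validation loss has failed to improve for two consecutive epochs, which is information a static schedule never sees.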

#### Visualizing loss and metrics during training
The best way to keep an eye on your model during training is to use [`TensorBoard`](https://www.tensorflow.org/tensorboard), a browser-based application that you can run locally that provides you with:

- Live plots of the loss and metrics for training and evaluation
- (optionally) Visualizations of the histograms of your layer activations
- (optionally) 3D visualizations of the embedding spaces learned by your Embedding layers

If you have installed TensorFlow with pip, you should be able to launch TensorBoard from the command line:

`tensorboard --logdir=/full_path_to_your_logs`

**Using the TensorBoard callback** <br>
The easiest way to use TensorBoard with a Keras model and the fit method is the `TensorBoard` callback.

In the simplest case, just specify where you want the callback to write logs, and you're good to go:
```python
tensorboard_cbk = tf.keras.callbacks.TensorBoard(log_dir='/full_path_to_your_logs')
model.fit(dataset, epochs=10, callbacks=[tensorboard_cbk])
```

The `TensorBoard` callback has many useful options, including whether to log embeddings, histograms, and how often to write logs:
```python
keras.callbacks.TensorBoard(
  log_dir='/full_path_to_your_logs',
  histogram_freq=0,  # How often to log histogram visualizations
  embeddings_freq=0,  # How often to log embedding visualizations
  update_freq='epoch')  # How often to write logs (default: once per epoch)
```


**For more detailed usage about TensorBoard in your notebook, refer to this site:** https://www.tensorflow.org/tensorboard/r2/tensorboard_in_notebooks 

**If you use Google Colab, use the "TensorboardColab" package** <br>
Installation: run `!pip install tensorboardcolab` in a notebook cell.