# CS492 Special Topics in Computer Science: AI Industry and Smart Energy
## Deep Learning Practice 
#### Prof. Ho-Jin Choi
#### School of Computing, KAIST

---

### 7-4. Subclassing and GradientTape

In [None]:
from __future__ import absolute_import, division, print_function, unicode_literals

try:
    # %tensorflow_version only exists in Colab.
    %tensorflow_version 2.x
except Exception:
    pass
import tensorflow as tf

tf.keras.backend.clear_session()  # For easy reset of notebook state.

#### Subclassing 

Build the model below using `Sequential`:
``` python
(input: 784-dimensional vectors)
       ↧
[Dense (64 units, relu activation)]
       ↧
[Dense (10 units, softmax activation)]
       ↧
(output: probability distribution over 10 classes)
```

In [None]:
from tensorflow import keras
from tensorflow.keras import layers

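model = keras.Sequential([
    # input_shape tells Keras to expect 784-dimensional vectors,
    # which builds the model so that summary() can report shapes.
    layers.Dense(64, activation='relu', input_shape=(784,)),
    layers.Dense(10, activation='softmax'),
])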

model.summary()

Build the same model using the functional API:

In [None]:
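inputs = keras.Input(shape=(784,))
x = layers.Dense(64, activation='relu')(inputs)
outputs = layers.Dense(10, activation='softmax')(x)
model = keras.Model(inputs=inputs, outputs=outputs)
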
model.summary()

Build the same model using subclassing:
- `__init__`: define the model structure (the layers)
- `call`: compute the forward pass

In [None]:
from tensorflow import keras
from tensorflow.keras import layers

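class MyClassifier(tf.keras.Model):
    def __init__(self):
        super(MyClassifier, self).__init__()
        # Create the layers once so that the model tracks their weights.
        self.hidden_layer = layers.Dense(64, activation='relu')
        self.output_layer = layers.Dense(10, activation='softmax')

    def call(self, x):
        # Forward pass: 784-d input -> 64 relu units -> 10 softmax outputs.
        x = self.hidden_layer(x)
        return self.output_layer(x)


my_model = MyClassifier()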

In [None]:
# Load a toy dataset for the sake of this example
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Preprocess the data (these are Numpy arrays)
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255

y_train = y_train.astype('float32')
y_test = y_test.astype('float32')

my_model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

my_model.fit(x_train, y_train, epochs=3)

In [None]:
my_model.evaluate(x_test, y_test, verbose=0)

#### GradientTape
TensorFlow provides the [`tf.GradientTape`](https://www.tensorflow.org/api_docs/python/tf/GradientTape) API for _automatic differentiation_ - computing the gradient of a computation with respect to its input variables. 

TensorFlow "records" all operations executed inside the context of a `tf.GradientTape` onto a _"tape"_. TensorFlow then uses that tape and the gradients associated with each recorded operation to compute the gradients of a "recorded" computation using reverse-mode differentiation.

The two methods used in the example are:
- [`tf.GradientTape.watch(tensor)`](https://www.tensorflow.org/api_docs/python/tf/GradientTape#watch): Ensures that `tensor` is being traced by this tape.
- [`tf.GradientTape.gradient(target, sources)`](https://www.tensorflow.org/api_docs/python/tf/GradientTape#gradient): Computes the gradient using operations recorded in the context of this tape.
    - `target`: Tensor (or list of tensors) to be differentiated.
    - `sources`: A list or nested structure of Tensors or Variables. `target` will be differentiated against the elements in `sources`.

In [None]:
x = tf.ones((2, 2))
# x = [[1, 1]
#      [1, 1]]

with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.reduce_sum(x) # 4
    z = tf.multiply(y, y) # y^2 

# Use the tape to compute the derivative of z with respect to the
# intermediate value y.
# z = y^2
dz_dy = tape.gradient(z, y)  # 8.0 (2y at y=4.0)
print(dz_dy)

assert dz_dy.numpy() == 8.0
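
A related detail, shown here as a minimal sketch rather than as part of the original exercise: a `tf.Variable` created with the default `trainable=True` is watched by the tape automatically, while a plain tensor is only traced after `tape.watch`. The names `w` and `c` are illustrative.

```python
import tensorflow as tf

w = tf.Variable(3.0)   # trainable variable: watched automatically
c = tf.constant(3.0)   # plain tensor: not watched unless tape.watch(c) is called

with tf.GradientTape() as tape:
    y = w * w + c * c

# Gradients are returned in the same structure as the sources.
dy_dw, dy_dc = tape.gradient(y, [w, c])

print(float(dy_dw))  # 6.0 (2w at w=3.0)
print(dy_dc)         # None, because c was never watched
```

This is why the training loop later in this notebook does not call `tape.watch`: the weights of a Keras model are trainable variables.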

By default, the resources held by a `GradientTape` are released as soon as its `gradient()` method is called. To compute multiple gradients over the same computation, create a persistent gradient tape with `persistent=True`. This allows multiple calls to `gradient()`, because the resources are released only when the tape object is garbage collected. For example:

In [None]:
x = tf.constant(3.0)
with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)
    y = x * x # x^2
    z = y * y # x^4
    
dz_dx = tape.gradient(z, x)  # 108.0 (4*x^3 at x = 3)
print("dz_dx: {}".format(dz_dx))

dy_dx = tape.gradient(y, x)  # 6.0 (2*x at x = 3)
print("dy_dx: {}".format(dy_dx))

del tape  # Drop the reference to the tape
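
For contrast, here is a minimal sketch of what happens without `persistent=True`: the first `gradient()` call succeeds, and a second call on the same tape raises a `RuntimeError`. The flag name `second_call_failed` is illustrative.

```python
import tensorflow as tf

x = tf.constant(3.0)
with tf.GradientTape() as tape:  # non-persistent by default
    tape.watch(x)
    y = x * x

dy_dx = tape.gradient(y, x)  # first call works: 2x = 6.0
print(float(dy_dx))

try:
    tape.gradient(y, x)      # the tape's resources were already released
    second_call_failed = False
except RuntimeError as err:
    second_call_failed = True
    print("Second call failed:", err)
```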

#### Training the model with GradientTape
Calling a model inside a `GradientTape` scope **enables you to retrieve the gradients of a loss value with respect to the model's trainable weights**. Using an optimizer instance, you can **use these gradients to update those variables (which you can retrieve using `model.trainable_weights`)**.

Let's reuse our subclassed MNIST model and train it with mini-batch gradient descent in a custom training loop.
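
Before the full loop, the record/differentiate/update cycle can be sketched on a single scalar variable. This is a toy illustration under assumed values (the quadratic loss, the variable `w`, and the learning rate of 0.1 are not part of the MNIST exercise):

```python
import tensorflow as tf

w = tf.Variable(0.0)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

# 1. Record the forward computation on the tape.
with tf.GradientTape() as tape:
    loss = (w - 2.0) ** 2

# 2. Differentiate the loss with respect to the variable:
#    d loss / d w = 2 * (w - 2) = -4 at w = 0.
grads = tape.gradient(loss, [w])

# 3. Let the optimizer apply one SGD step: w <- 0 - 0.1 * (-4) = 0.4.
optimizer.apply_gradients(zip(grads, [w]))

print(float(grads[0]))
print(float(w.numpy()))
```

The training loop in this section follows the same three steps, with the model's `trainable_weights` in place of `[w]` and one mini-batch per step.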

In [None]:
from tensorflow import keras
from tensorflow.keras import layers

class MyClassifier(tf.keras.Model):
    def __init__(self):
        super(MyClassifier, self).__init__()
        self.input_layer = layers.Flatten()
        self.hidden_layer = layers.Dense(64, activation='relu', name='dense_1')
        self.output_layer = layers.Dense(10, activation='softmax', name='predictions')
        
    def call(self, x):
        x = self.input_layer(x)
        x = self.hidden_layer(x)
        outputs = self.output_layer(x)
        return outputs
    
my_model = MyClassifier()

In [None]:
# Load a toy dataset for the sake of this example
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Preprocess the data (these are Numpy arrays)
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255

y_train = y_train.astype('float32')
y_test = y_test.astype('float32')

# Reserve 10,000 samples for validation
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]

In [None]:
# Prepare the training dataset.
batch_size = 64
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)

# Prepare the validation dataset.
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))
val_dataset = val_dataset.batch(batch_size)

In [None]:
# Instantiate an optimizer.
optimizer = keras.optimizers.SGD(learning_rate=1e-3)
# Instantiate a loss function.
loss_fn = keras.losses.SparseCategoricalCrossentropy()

In [None]:
# Prepare the metrics.
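train_acc_metric = keras.metrics.SparseCategoricalAccuracy()
train_loss = keras.metrics.Mean()

val_acc_metric = keras.metrics.SparseCategoricalAccuracy()
val_loss = keras.metrics.Mean()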
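
The training loop calls these metric objects directly to accumulate values and reads them back with `result()`. A minimal sketch of that behavior, assuming `keras.metrics.Mean` for the losses and `keras.metrics.SparseCategoricalAccuracy` for the accuracies (the toy values are made up for illustration):

```python
import tensorflow as tf

# Mean keeps a running average of every value it is called with.
mean_metric = tf.keras.metrics.Mean()
mean_metric(2.0)
mean_metric(4.0)
print(float(mean_metric.result()))  # 3.0

# SparseCategoricalAccuracy compares integer labels with the argmax
# of each row of predicted class probabilities.
acc_metric = tf.keras.metrics.SparseCategoricalAccuracy()
y_true = tf.constant([0, 1])
y_pred = tf.constant([[0.9, 0.1], [0.8, 0.2]])  # both rows predict class 0
acc_metric(y_true, y_pred)
print(float(acc_metric.result()))  # 0.5
```

Because the metrics accumulate across calls, they have to be reset at the end of each epoch.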

In [None]:
# Iterate over epochs.
for epoch in range(3):
    print('\n\nStart of epoch %d' % (epoch,))

    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):

        # Open a GradientTape to record the operations run
        # during the forward pass, which enables autodifferentiation.
        with tf.GradientTape() as tape:
            # Run the forward pass of the model.
            # The operations that the model applies
            # to its inputs are going to be recorded
            # on the GradientTape.
            # The model ends in softmax, so `logits` holds class probabilities.
            logits = my_model(x_batch_train, training=True)

            # Compute the loss value for this minibatch.
            loss_value = loss_fn(y_batch_train, logits)

        # Use the gradient tape to automatically retrieve
        # the gradients of the trainable variables with respect to the loss.
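        grads = tape.gradient(loss_value, my_model.trainable_weights)

        # Run one step of gradient descent by updating
        # the value of the variables to minimize the loss.
        optimizer.apply_gradients(zip(grads, my_model.trainable_weights))

        # Update the training metrics.
        train_acc_metric(y_batch_train, logits)
        train_loss(loss_value)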

        # Log every 200 batches.
        if step % 200 == 0:
            print('Training loss (for one batch) at step %s: %s' % (step, float(loss_value)))
            print('Seen so far: %s samples' % ((step + 1) * batch_size))
            
    # Display metrics at the end of each epoch.
    train_acc = train_acc_metric.result()
    print("-------------------------------------------")
    print('Training loss: %.3f | acc over epoch: %s' % (train_loss.result(), float(train_acc),))
        
    # Run a validation loop at the end of each epoch.
    for x_batch_val, y_batch_val in val_dataset:
        val_logits = my_model(x_batch_val, training=False)
        v_loss = loss_fn(y_batch_val, val_logits)

        val_loss(v_loss)
        val_acc_metric(y_batch_val, val_logits)
        

    print("-------------------------------------------")
    print('Validation avg loss: %.3f | acc: %s' % (val_loss.result(), float(val_acc_metric.result()),))
    
    # Reset the metrics for the next epoch
    train_acc_metric.reset_states()
    train_loss.reset_states()

    val_acc_metric.reset_states()
    val_loss.reset_states()  

Evaluate the model on the test set:

In [None]:
batch_size = 64
test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))
test_dataset = test_dataset.batch(batch_size)

In [None]:
my_model.compile(optimizer=optimizer,
              loss=loss_fn,
              metrics=[keras.metrics.SparseCategoricalAccuracy()])

In [None]:
test_loss, test_acc = my_model.evaluate(test_dataset)
print('Loss: {}, Acc: {}'.format(test_loss, test_acc))