### How to subclass Keras models

We often want to build custom, more complex models. To do that, we can subclass the `keras.Model` class, which exposes the `fit()`, `evaluate()`, and `predict()` methods. A `Model` also lets us list its inner layers through the `layers` property, and it can be saved and loaded. Let's get into it.

A rule of thumb: if you need to call `fit()` on what you are building, subclass the `Model` class. If not, a `Layer` is probably the right choice.

In [2]:
import tensorflow as tf 

from tensorflow import keras
from tensorflow.keras import layers



In [3]:
class Linear(keras.layers.Layer): 

    def __init__(self, units = 32, input_dim = 32): 
        super().__init__()

        self.w = self.add_weight(
            shape = (input_dim, units), initializer='random_normal', trainable=True
        )

        self.b = self.add_weight(shape = (units, ), initializer='zeros', trainable=True)

    def call(self, inputs): 
        return tf.matmul(inputs, self.w) + self.b

In [4]:

x = tf.ones((2, 2))
linear_layer = Linear(units = 2, input_dim=2)
linear_layer(x)

<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[-0.08255447, -0.01942443],
       [-0.08255447, -0.01942443]], dtype=float32)>

In [5]:
class Sampling(layers.Layer):
    """Uses (z_mean, z_log_var) to sample z, the vector encoding a digit."""

    def call(self, inputs):
        z_mean, z_log_var = inputs
        batch = tf.shape(z_mean)[0]
        dim = tf.shape(z_mean)[1]
        epsilon = tf.random.normal(shape=(batch, dim))  # standard normal noise, one value per latent unit
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon
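
The `Sampling` layer draws `z` by shifting and scaling standard normal noise, which is commonly called the reparameterization trick. Here is a minimal sketch of the same arithmetic in plain Python, assuming flat lists instead of tensors. `sample_z` and its `epsilon` argument are hypothetical names used only for illustration. They are not part of Keras.

```python
import math
import random

def sample_z(z_mean, z_log_var, epsilon=None):
    """Hypothetical stand-in for Sampling.call, working on plain Python lists."""
    if epsilon is None:
        # Standard normal noise, one draw per latent dimension
        epsilon = [random.gauss(0.0, 1.0) for _ in z_mean]
    # exp(0.5 * log_var) converts a log-variance into a standard deviation
    return [m + math.exp(0.5 * lv) * e
            for m, lv, e in zip(z_mean, z_log_var, epsilon)]

# With zero noise the sample collapses to the mean
print(sample_z([1.0, -2.0], [0.0, 0.0], epsilon=[0.0, 0.0]))   # [1.0, -2.0]
# A log-variance of 0 is a standard deviation of 1, so the noise is added unscaled
print(sample_z([1.0, -2.0], [0.0, 0.0], epsilon=[0.5, -0.5]))  # [1.5, -2.5]
```

Passing the noise in explicitly makes the result deterministic, which is handy for checking the formula. The layer itself always draws fresh noise.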

In [6]:
class Encoder(layers.Layer): 
    """Maps MNIST digits to a triplet (z_mean, z_log_var, z).

    Args:
        latent_dim: Size of the latent vector z.
        intermediate_dim: Number of units in the hidden Dense layer.
    """
    def __init__(self, latent_dim = 32, intermediate_dim = 64, name = 'encoder', **kwargs): 
        super().__init__(name = name, **kwargs)

        self.dense_proj = layers.Dense(intermediate_dim, activation = 'relu')
        self.dense_mean = layers.Dense(latent_dim)
        self.dense_log_var = layers.Dense(latent_dim)
        self.sampling = Sampling()

    def call(self, inputs): 
        x = self.dense_proj(inputs)
        z_mean = self.dense_mean(x)
        z_log_var = self.dense_log_var(x)
        z = self.sampling((z_mean, z_log_var))
        return z_mean, z_log_var, z
    
class Decoder(layers.Layer): 
    """Converts z, the encoded digit vector, back into a flattened digit image."""

    def __init__(self, original_dim, intermediate_dim = 64, name = 'decoder', **kwargs): 
        super().__init__(name = name, **kwargs)
        self.dense_proj = layers.Dense(intermediate_dim, activation = 'relu')
        self.dense_output = layers.Dense(original_dim, activation = 'sigmoid')  # sigmoid keeps each output pixel in [0, 1]

    def call(self, inputs): 
        x = self.dense_proj(inputs)
        return self.dense_output(x)
    
class VariationalAutoEncoder(keras.Model): 

    """
    Combines the encoder and decoder into one end-to-end model
    """

    def __init__(self, original_dim, intermediate_dim = 64, latent_dim = 32, name = 'autoencoder', **kwargs): 
        super().__init__(name = name, **kwargs)

        self.original_dim = original_dim
        self.encoder = Encoder(latent_dim=latent_dim, intermediate_dim=intermediate_dim)
        self.decoder = Decoder(original_dim=original_dim, intermediate_dim=intermediate_dim)

    def call(self, inputs): 
        z_mean, z_log_var, z = self.encoder(inputs)
        reconstructed = self.decoder(z)

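        # KL divergence between the encoder's distribution and a standard normal,
        # registered via add_loss() so it shows up in self.losses
        kl_loss = -0.5 * tf.reduce_mean(z_log_var - tf.square(z_mean) - tf.exp(z_log_var) + 1)
        self.add_loss(kl_loss)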
        return reconstructed
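
The `kl_loss` expression in `call()` penalizes the encoder when its output distribution drifts away from a standard normal. As a rough worked example, the sketch below evaluates the same expression on plain Python lists. `kl_term` is a hypothetical helper, and the minus sign of the original expression is folded into the terms, which is algebraically the same thing.

```python
import math

def kl_term(z_mean, z_log_var):
    """Hypothetical helper mirroring the kl_loss expression on flat Python lists."""
    # Same as -0.5 * mean(log_var - mean^2 - exp(log_var) + 1), with the sign folded in
    terms = [m * m + math.exp(lv) - lv - 1.0
             for m, lv in zip(z_mean, z_log_var)]
    return 0.5 * sum(terms) / len(terms)

# Mean 0 and log-variance 0 already describe N(0, 1), so the penalty vanishes
print(kl_term([0.0, 0.0], [0.0, 0.0]))  # 0.0
# Moving the mean away from 0 makes the penalty positive
print(kl_term([1.0, 1.0], [0.0, 0.0]))  # 0.5
```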
    
    

#### Writing a training loop on the MNIST dataset

In [7]:
original_dim = 784
vae = VariationalAutoEncoder(original_dim=original_dim, intermediate_dim=64, latent_dim=32)

optimizer = keras.optimizers.Adam(learning_rate=0.001)
mse_loss_fn = keras.losses.MeanSquaredError()
loss_metric = keras.metrics.Mean() 

In [8]:
(x_train, _), _ = keras.datasets.mnist.load_data()
x_train = x_train.reshape(60000, 784).astype('float32') / 255

# Build a tf.data pipeline: shuffle the MNIST images and group them into batches of 64
train_dataset = tf.data.Dataset.from_tensor_slices(x_train)
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)

This dataset holds 60,000 digit images. Each 28 by 28 image is flattened into a vector of 784 values and scaled to the range [0, 1].
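
Since 60,000 is not a multiple of 64, the last batch of every epoch is smaller than the rest. The quick calculation below, using only the numbers from the cell above, shows how many steps one epoch takes and how many examples land in the final batch. This is why tensors inspected right after the loop have 32 rows rather than 64.

```python
import math

num_examples = 60000  # MNIST training images
batch_size = 64       # value passed to .batch(64)

full_batches, remainder = divmod(num_examples, batch_size)
steps_per_epoch = math.ceil(num_examples / batch_size)

print(full_batches, remainder, steps_per_epoch)  # 937 32 938
```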

In [9]:
EPOCHS = 2
for epoch in range(EPOCHS): 

    for step, x_batch_train in enumerate(train_dataset): 
        with tf.GradientTape() as tape: 
            reconstructed = vae(x_batch_train)

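            # Reconstruction loss plus the KL term the model registered via add_loss()
            loss = mse_loss_fn(x_batch_train, reconstructed)
            loss += sum(vae.losses)

        # Backpropagate and update every trainable weight
        grads = tape.gradient(loss, vae.trainable_weights)
        optimizer.apply_gradients(zip(grads, vae.trainable_weights))
        loss_metric(loss)  # running mean over all steps seen so far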

        if step % 100 == 0: 
            print("step %d: mean loss = %.4f" % (step, loss_metric.result()))






step 0: mean loss = 0.3301
step 100: mean loss = 0.1255
step 200: mean loss = 0.0991
step 300: mean loss = 0.0892
step 400: mean loss = 0.0842
step 500: mean loss = 0.0809
step 600: mean loss = 0.0787
step 700: mean loss = 0.0771
step 800: mean loss = 0.0760
step 900: mean loss = 0.0749
step 0: mean loss = 0.0747
step 100: mean loss = 0.0740
step 200: mean loss = 0.0735
step 300: mean loss = 0.0730
step 400: mean loss = 0.0727
step 500: mean loss = 0.0723
step 600: mean loss = 0.0720
step 700: mean loss = 0.0717
step 800: mean loss = 0.0715
step 900: mean loss = 0.0712
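
Note that the printed value is a running mean, not the loss of the current step. `keras.metrics.Mean` keeps accumulating until its `reset_state()` method is called, and the loop never calls it, so the second epoch starts at 0.0747 and continues from where the first one ended. The sketch below imitates that behaviour with a hypothetical pure-Python `RunningMean` class. It is a stand-in for illustration only, not the Keras implementation.

```python
class RunningMean:
    """Hypothetical stand-in for the accumulate-then-average behaviour of a mean metric."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value):
        self.total += value
        self.count += 1

    def result(self):
        return self.total / self.count if self.count else 0.0

    def reset(self):
        self.total, self.count = 0.0, 0

metric = RunningMean()
for loss in [0.5, 0.25]:   # steps of "epoch 1"
    metric.update(loss)
metric.update(0.0)         # first step of "epoch 2", with no reset in between
print(metric.result())     # 0.25, because the earlier steps still count

metric.reset()             # what a per-epoch reset would do
metric.update(0.0)
print(metric.result())     # 0.0
```

If per-epoch averages are wanted, resetting the metric at the start of each epoch is one way to get them.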


In [10]:
reconstructed

<tf.Tensor: shape=(32, 784), dtype=float32, numpy=
array([[0.00273651, 0.00332379, 0.00437694, ..., 0.00553383, 0.00263036,
        0.00246718],
       [0.01085568, 0.01114361, 0.01539413, ..., 0.01720259, 0.01066354,
        0.00600129],
       [0.01805389, 0.01679887, 0.00758007, ..., 0.00942108, 0.00831249,
        0.00986585],
       ...,
       [0.02457168, 0.02050058, 0.03342368, ..., 0.02863369, 0.02739112,
        0.02686335],
       [0.00966561, 0.00497065, 0.00836083, ..., 0.00921608, 0.00758252,
        0.00725008],
       [0.00517703, 0.00732105, 0.00587739, ..., 0.00726794, 0.00401937,
        0.00605658]], dtype=float32)>

In [11]:
x_batch_train

<tf.Tensor: shape=(32, 784), dtype=float32, numpy=
array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)>

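Because `VariationalAutoEncoder` subclasses `keras.Model`, we could also have trained it with the built-in `fit()` method instead of writing the loop by hand. `fit()` automatically adds the KL term registered through `add_loss()` to the compiled loss.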

In [12]:
vae_2 = VariationalAutoEncoder(
    original_dim=original_dim, 
    intermediate_dim=10, 
    latent_dim=10, 
    name='autoencoder_2'
)

In [15]:
optimizer = keras.optimizers.Adam(learning_rate=0.001)
vae_2.compile(optimizer = optimizer, loss = keras.losses.MeanSquaredError())
vae_2.fit(x_train, x_train, epochs = 2, batch_size = 64)

Epoch 1/2
Epoch 2/2


<keras.callbacks.History at 0x7f21784e7a90>

### Conclusion 

There are a few things we can conclude from this exercise:
* Subclass `Layer` when you want a reusable building block for one part of your network
    * Define the parameters and sublayers in the `__init__` method, and make sure you call `super().__init__(**kwargs)` at the top
    * Override the `call()` method, which takes some input and returns some output. The output can be whatever you need, such as a single tensor or a tuple of tensors
* Subclass `Model` when you want to build an entire end-to-end pipeline out of smaller components
    * The `Sequential` model is itself a subclass of the general `Model` class
    * Here too we override `call()`, this time to chain our individual layers together
    * The `fit()` and `save()` methods are available on the model, which is great.
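
The class relationships behind these points can be checked directly. This is a small sketch that only inspects the classes, assuming a TensorFlow installation that provides `tensorflow.keras`. It does not train anything.

```python
from tensorflow import keras

# Sequential is a Model, and a Model is itself a Layer
print(issubclass(keras.Sequential, keras.Model))
print(issubclass(keras.Model, keras.layers.Layer))

# Training and saving entry points exist on Model but not on a plain Layer
for method in ("fit", "evaluate", "predict", "save"):
    print(method, hasattr(keras.Model, method), hasattr(keras.layers.Layer, method))
```

This matches the rule of thumb from the start: anything that needs `fit()` has to be a `Model`, while a `Layer` only has to implement `call()`.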