# Custom Models and Layers

This notebook shows how to write custom models and layers.

In [40]:
import numpy as np
import tensorflow as tf
from pprint import pprint
from functools import reduce

## 1 Model class

Recall the earlier example of training a linear regression model in [notebook 1](./1_Tensors_Variables_Operations_and_AutoDiff.ipynb). In cell 24 we **defined and initialized the variables of the model**, and we also **specified the model's forward pass** (i.e., given inputs, how the model computes the outputs using its variables).

These two steps can be better organized in a model class by subclassing `tf.keras.Model`.

In [2]:
true_weights = tf.constant([1,2,3,4,5], dtype=tf.float32)[:, tf.newaxis]
x = tf.constant(tf.random.uniform((5, 5)), dtype=tf.float32)
y = tf.constant(x @ true_weights, dtype=tf.float32)

LR = .1
MAX_EPOCHS = 5000

With a custom model class, we initialize the variables of the model in the `__init__` constructor and implement the forward pass in the `call` method.

In [3]:
class LinearModel(tf.keras.Model):
    def __init__(self, input_dim, output_dim, **kwargs):
        super(LinearModel, self).__init__(**kwargs)
        self.w = tf.Variable(tf.random.uniform((input_dim, output_dim)), dtype=tf.float32)
    
    @tf.function
    def call(self, x):
        return tf.matmul(x, self.w)

We can access the variables of the model through the `.variables` property.

In [4]:
model = LinearModel(input_dim=5, output_dim=1)    
model.variables

[<tf.Variable 'Variable:0' shape=(5, 1) dtype=float32, numpy=
 array([[0.13378656],
        [0.1457566 ],
        [0.3716122 ],
        [0.13667178],
        [0.28042758]], dtype=float32)>]

We can also run the forward pass on data by calling the model object, just as if the model were a function.

In [5]:
o = model(x)
print(o)

tf.Tensor(
[[0.44342452]
 [0.29675412]
 [0.55513775]
 [0.69912523]
 [0.34083742]], shape=(5, 1), dtype=float32)


To train this model object, we need to modify the previous training loop. Note that both `model.variables` and `gradients` are lists, so we apply the updates in a loop.

This pairing of each variable with its gradient will become clearer in the next example.

In [6]:
def train(x, y, model, learning_rate=LR, max_epochs=MAX_EPOCHS, verbose=0):
    for it in range(max_epochs):
        with tf.GradientTape() as tape:
            y_hat = model(x)
            loss = tf.reduce_mean(tf.square(y - y_hat))
        if verbose and not (it % verbose): 
            print('mse loss at iteration {} is {:5.4f}'.format(it, loss))
        gradients = tape.gradient(loss, model.variables)
        # Pair each variable with its gradient and take one gradient-descent step.
        for variable, grad in zip(model.variables, gradients):
            variable.assign_sub(learning_rate * grad)
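
To make the update in the loop above more concrete, here is a minimal NumPy sketch of the same gradient-descent step with the MSE gradient written out by hand instead of obtained from `tf.GradientTape`. The names `x_np`, `y_np`, and `mse_and_grad`, as well as the fixed seed, are assumptions made for this illustration and are not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: an assumption, for reproducibility only
true_w = np.array([1, 2, 3, 4, 5], dtype=np.float64)[:, np.newaxis]
x_np = rng.uniform(size=(5, 5))
y_np = x_np @ true_w


def mse_and_grad(w, x, y):
    """Return the MSE loss and its gradient with respect to w."""
    residual = y - x @ w                            # shape (n, 1)
    loss = np.mean(residual ** 2)
    grad = -2.0 / residual.size * x.T @ residual    # same shape as w
    return loss, grad


w = rng.uniform(size=(5, 1))
initial_loss, _ = mse_and_grad(w, x_np, y_np)
for _ in range(5000):
    loss, grad = mse_and_grad(w, x_np, y_np)
    w -= 0.1 * grad                                 # same rule as the assign step above
final_loss, _ = mse_and_grad(w, x_np, y_np)
print(initial_loss, final_loss)
```

With a single weight matrix there is only one gradient to apply; the `zip` in `train` generalizes this to any number of variables.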

Let's see if we recovered the ground truth.

In [7]:
train(x, y, model)
model.variables

[<tf.Variable 'Variable:0' shape=(5, 1) dtype=float32, numpy=
 array([[1.0523298],
        [2.1180644],
        [2.920983 ],
        [3.9075792],
        [4.990562 ]], dtype=float32)>]

To add a bit of complexity, we now add a bias term to the model.

In [8]:
class LinearModel(tf.keras.Model):
    def __init__(self, input_dim, output_dim, **kwargs):
        super(LinearModel, self).__init__(**kwargs)
        self.w = tf.Variable(tf.random.uniform((input_dim, output_dim)), dtype=tf.float32)
        self.b = tf.Variable(0, dtype=tf.float32)
        
    @tf.function
    def call(self, x):
        return tf.matmul(x, self.w) + self.b

In [9]:
model = LinearModel(input_dim=5, output_dim=1)    
train(x, y, model, learning_rate=.1, max_epochs=2000, verbose=0)
pprint(model.variables)

[<tf.Variable 'Variable:0' shape=(5, 1) dtype=float32, numpy=
array([[1.099868 ],
       [2.1637936],
       [2.9128673],
       [3.8972185],
       [5.0019126]], dtype=float32)>,
 <tf.Variable 'Variable:0' shape=() dtype=float32, numpy=-0.044665385>]


The training code works unchanged, even though the model now has two variables (and, as a result, two gradients).

## 2 Layer class  

The `Model` class is usually used for the overall computation, which combines many repeated small computational modules. It is possible to code up the small modules with `tf.keras.Model` and then combine them, but it's better to use the `Layer` class.

Let's say that we want to upgrade the linear regression model to be a composition of two linear transformations. Here is how we utilize the `Layer` class.

In [10]:
class Linear(tf.keras.layers.Layer):
    def __init__(self, units, use_bias=True, **kwargs):
        super(Linear, self).__init__(**kwargs)
        self.units = units
        self.use_bias = use_bias
        
    def build(self, input_shape):
        self.w = self.add_weight(shape=(input_shape[-1], self.units))
        if self.use_bias:
            self.b = self.add_weight(shape=(self.units,), initializer="zeros")
        super().build(input_shape)
        
    @tf.function
    def call(self, x):
        output = tf.matmul(x, self.w)
        if self.use_bias:
            output += self.b
        return output

Note that with the `Layer` class, we create the variables in the `build` method rather than in the constructor. This method is called the first time data passes through the layer, so the sizes of the variables can be determined dynamically from the input shape. This makes layers easier to combine, because each layer only needs to know its own output size.

Now, let's stack up some layers to make a bigger model.
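
The deferred-build idea can be sketched without TensorFlow. The toy class below, `LazyLinear`, is a hypothetical stand-in written for this illustration; it is not a Keras class and only mimics the pattern of creating weights on the first call.

```python
import numpy as np


class LazyLinear:
    """Toy layer with a deferred build step (hypothetical, not part of Keras)."""

    def __init__(self, units):
        self.units = units   # only the output size is known up front
        self.built = False
        self.w = None

    def build(self, input_shape):
        # The input width becomes known here, once real data has arrived.
        rng = np.random.default_rng(0)
        self.w = rng.uniform(-1.0, 1.0, size=(input_shape[-1], self.units))
        self.built = True

    def __call__(self, x):
        if not self.built:
            self.build(x.shape)
        return x @ self.w


layers = [LazyLinear(3), LazyLinear(1)]
print(layers[0].w)  # None: no weights exist before the first call

out = np.ones((5, 5))
for layer in layers:
    out = layer(out)
print([layer.w.shape for layer in layers])  # [(5, 3), (3, 1)]
```

Neither layer was told its input size; each one read it from the data it received, which is why chaining them needed no shape bookkeeping.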

In [11]:
class LinearModel(tf.keras.Model):
    def __init__(self, num_units, use_bias=True, **kwargs):
        super(LinearModel, self).__init__(**kwargs)
        self.model = [Linear(units, use_bias) for units in num_units]
    
    def call(self, x):
        for layer in self.model:
            x = layer(x)
        return x

Let's see the variables of the model before and after we pass some actual data through it.

In [12]:
model = LinearModel(num_units=(3,1), use_bias=True)
print(model.variables)
print('---line break---')
o = model(x)
pprint(model.variables)
print(o)

[]
---line break---
[<tf.Variable 'linear_model_2/linear/Variable:0' shape=(5, 3) dtype=float32, numpy=
array([[ 0.2700736 , -0.11668235,  0.44956177],
       [ 0.64381224,  0.6111501 ,  0.40530437],
       [ 0.50504965,  0.6354911 , -0.8027457 ],
       [ 0.43543607,  0.46089858, -0.50693005],
       [ 0.4396996 ,  0.81468815, -0.20702368]], dtype=float32)>,
 <tf.Variable 'linear_model_2/linear/Variable:0' shape=(3,) dtype=float32, numpy=array([0., 0., 0.], dtype=float32)>,
 <tf.Variable 'linear_model_2/linear_1/Variable:0' shape=(3, 1) dtype=float32, numpy=
array([[-0.01050568],
       [-0.8907435 ],
       [ 0.3596264 ]], dtype=float32)>,
 <tf.Variable 'linear_model_2/linear_1/Variable:0' shape=(1,) dtype=float32, numpy=array([0.], dtype=float32)>]
tf.Tensor(
[[-1.1694874]
 [-0.4110697]
 [-1.1702682]
 [-1.7282639]
 [-0.4201784]], shape=(5, 1), dtype=float32)


Let's train a model with many linear layers stacked together.

In [26]:
model = LinearModel(num_units=(4,3,2,1), use_bias=False)
train(x, y, model, learning_rate=0.01, max_epochs=2000, verbose=200)

mse loss at iteration 0 is 46.6503
mse loss at iteration 200 is 0.0002
mse loss at iteration 400 is 0.0001
mse loss at iteration 600 is 0.0001
mse loss at iteration 800 is 0.0000
mse loss at iteration 1000 is 0.0000
mse loss at iteration 1200 is 0.0000
mse loss at iteration 1400 is 0.0000
mse loss at iteration 1600 is 0.0000
mse loss at iteration 1800 is 0.0000


In [31]:
pprint(model.variables)

[<tf.Variable 'linear_model_8/linear_14/Variable:0' shape=(5, 4) dtype=float32, numpy=
array([[-0.30366537, -0.16996984, -0.637681  ,  0.37072608],
       [-0.87186205,  0.07727206,  0.5329781 , -0.46546325],
       [-0.6101041 , -0.50876856, -0.6042015 ,  0.04365033],
       [ 0.15343004, -1.0391084 , -0.10087277, -0.88128865],
       [-0.8673619 , -0.14706558, -0.4924126 , -0.833977  ]],
      dtype=float32)>,
 <tf.Variable 'linear_model_8/linear_15/Variable:0' shape=(4, 3) dtype=float32, numpy=
array([[-0.3303018 ,  0.9024163 ,  0.4269654 ],
       [-0.18246803,  0.56032246,  0.7382356 ],
       [ 0.00510553,  0.81914425,  0.09659762],
       [ 0.6227235 ,  0.97610795,  0.51054686]], dtype=float32)>,
 <tf.Variable 'linear_model_8/linear_16/Variable:0' shape=(3, 2) dtype=float32, numpy=
array([[ 0.66775626, -0.23800902],
       [ 0.4104188 ,  1.0111378 ],
       [ 1.2080138 , -0.12563725]], dtype=float32)>,
 <tf.Variable 'linear_model_8/linear_17/Variable:0' shape=(2, 1) dtype=float3

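At first glance this may seem more interesting than our old linear regression model. But a composition of linear transformations is still a linear transformation, so with all these layers we are essentially doing the same linear regression. Since this model has no bias terms, multiplying the weight matrices together gives the weights of the equivalent single-layer model.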

In [51]:
reduced_model = reduce(tf.matmul, model.variables)
print(reduced_model)
# Count the outputs that agree within tolerance; tf.abs makes the check symmetric.
print(tf.reduce_sum(tf.cast(tf.abs(model(x) - x @ reduced_model) < 1e-5, tf.float32)))

tf.Tensor(
[[0.9889582]
 [1.9750128]
 [3.0168242]
 [4.0194426]
 [5.0019574]], shape=(5, 1), dtype=float32)
tf.Tensor(5.0, shape=(), dtype=float32)
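
The check above multiplies the weight matrices directly, which works because the model was built with `use_bias=False`. With biases the stack still collapses, this time into a single affine map. Below is a minimal NumPy sketch of that folding; the random weights, the layer sizes, and the names `forward`, `w_total`, and `b_total` are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [5, 4, 3, 2, 1]
weights = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=(n,)) for n in sizes[1:]]


def forward(x):
    """Apply every layer in turn: x -> x @ w + b."""
    for w, b in zip(weights, biases):
        x = x @ w + b
    return x


# Fold the layers one at a time using
# (x @ W + b) @ W' + b' = x @ (W @ W') + (b @ W' + b').
w_total = np.eye(sizes[0])
b_total = np.zeros(sizes[0])
for w, b in zip(weights, biases):
    w_total = w_total @ w
    b_total = b_total @ w + b

x_demo = rng.uniform(size=(5, 5))
print(np.allclose(forward(x_demo), x_demo @ w_total + b_total))  # True
```

Making the stacked model more expressive than linear regression would require a nonlinearity between the layers.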
