# Custom Models and Layers

This notebook shows how to write custom models and layers by subclassing `tf.keras.Model` and `tf.keras.layers.Layer`.

In [1]:
import numpy as np
import tensorflow as tf
from pprint import pprint
from functools import reduce

## 1 Model class

Recall the earlier example of training a linear regression model in [notebook 1](./1_Tensors_Variables_Operations_and_AutoDiff.ipynb). In cell 24 we **defined and initialized the variables of the model**, and we also **specified the model's forward pass** (i.e., given the inputs, how the model computes the outputs from its variables).

Both pieces can be organized more cleanly in a model class that subclasses `tf.keras.Model`. First, we generate a small noise-free dataset: 32 samples with 5 features, and targets `y = x @ true_weights` with `true_weights = [1, 2, 3, 4, 5]`.

In [2]:
true_weights = tf.constant([1, 2, 3, 4, 5], dtype=tf.float32)[:, tf.newaxis]  # shape (5, 1)
x = tf.random.uniform((32, 5), dtype=tf.float32)  # 32 samples, 5 features in [0, 1)
y = x @ true_weights  # noise-free targets, shape (32, 1)
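Because `y` is generated without noise, the true weights are exactly recoverable. As a TensorFlow-independent sanity check, the sketch below swaps in closed-form ordinary least squares (`np.linalg.lstsq`), not the gradient-descent approach this notebook uses, on a NumPy stand-in for the same setup. The seed and the names `w_true`, `x_np` and `y_np` are illustrative choices.

```python
import numpy as np

# NumPy stand-in for the data above: 32 samples, 5 features, noise-free targets.
rng = np.random.default_rng(0)
w_true = np.array([1, 2, 3, 4, 5], dtype=np.float64)[:, np.newaxis]  # shape (5, 1)
x_np = rng.uniform(size=(32, 5))
y_np = x_np @ w_true

# Closed-form least squares: minimizes ||x_np @ w - y_np||^2 in one step.
w_hat, residuals, rank, _ = np.linalg.lstsq(x_np, y_np, rcond=None)
print(w_hat.ravel())
```

With 32 samples and only 5 unknowns the system is overdetermined, so any training method that drives the MSE to zero should land on the same weights.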

With a custom model class, we create the model's variables in the `__init__` constructor and implement the forward pass in the `call` method.

In [3]:
class LinearModel(tf.keras.Model):
    def __init__(self, input_dim, output_dim, **kwargs):
        super(LinearModel, self).__init__(**kwargs)
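        # Weight matrix of shape (input_dim, output_dim), randomly initialized.
        self.w = tf.Variable(tf.random.uniform((input_dim, output_dim)), dtype=tf.float32)

    # Forward pass: outputs = x @ w. @tf.function traces it into a TensorFlow graph.
    @tf.function
    def call(self, x):
        return tf.matmul(x, self.w)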

We can access the variables of the model through the `.variables` property.

In [4]:
model = LinearModel(input_dim=5, output_dim=1)    
model.variables

[<tf.Variable 'Variable:0' shape=(5, 1) dtype=float32, numpy=
 array([[0.7385192 ],
        [0.09546971],
        [0.5498091 ],
        [0.6976471 ],
        [0.6082227 ]], dtype=float32)>]

We can then run the forward pass by calling the model object on the data, just as if it were a function.

In [5]:
o = model(x)
print(o[:5])

tf.Tensor(
[[1.7299302]
 [1.3127615]
 [1.4112923]
 [1.5546852]
 [1.3384575]], shape=(5, 1), dtype=float32)


To train this model object, we adapt the previous training loop. Both `model.variables` and `gradients` are lists, so we update each variable with its own gradient in a loop.

This pairing of variables and gradients will become clearer in the next example, where the model has more than one variable.
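Concretely, each pass of the loop applies the gradient-descent rule $w \leftarrow w - \eta\,\partial L/\partial w$ to every variable separately, where $\eta$ is the learning rate. For the MSE loss $L = \frac{1}{N}\sum_i (y_i - x_i w - b)^2$ the gradients are $\partial L/\partial w = -\frac{2}{N} X^\top (y - Xw - b)$ and $\partial L/\partial b = -\frac{2}{N}\sum_i (y_i - x_i w - b)$. The NumPy sketch below is an illustration with hand-derived gradients in place of `tf.GradientTape` (the seed, learning rate and step count are arbitrary). It keeps the parameters in a list, like `model.variables`, and zips them with their gradients:

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.arange(1, 6, dtype=np.float64)[:, np.newaxis]  # [1, 2, 3, 4, 5] as a column
x_np = rng.uniform(size=(32, 5))
y_np = x_np @ w_true

# Parameters kept in a list, mirroring model.variables: [w, b].
params = [rng.uniform(size=(5, 1)), np.zeros(1)]
learning_rate = 0.1

def mse_and_grads(params, x, y):
    """Return the MSE loss and one gradient per parameter, in the same order."""
    w, b = params
    residual = y - (x @ w + b)                 # shape (N, 1)
    n = x.shape[0]
    loss = np.mean(residual ** 2)
    grad_w = -2.0 / n * x.T @ residual         # shape (5, 1), same as w
    grad_b = -2.0 / n * residual.sum(axis=0)   # shape (1,), same as b
    return loss, [grad_w, grad_b]

initial_loss, _ = mse_and_grads(params, x_np, y_np)
for step in range(5000):
    loss, grads = mse_and_grads(params, x_np, y_np)
    # One update per (variable, gradient) pair, as in the TensorFlow loop.
    for p, g in zip(params, grads):
        p -= learning_rate * g  # in place, like variable.assign_add(-lr * grad)

print(initial_loss, loss)
```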

In [6]:
def train(x, y, model, learning_rate=.01, max_epochs=10000, min_tol=1e-3, verbose=1000):
    """Full-batch gradient descent on the MSE loss.

    Stops early once the best loss seen drops below `min_tol`.
    `verbose` is the print interval in iterations (0 disables printing).
    """
    best_loss = None
    for it in range(max_epochs):
        # Record the forward pass so the tape can differentiate the loss.
        with tf.GradientTape() as tape:
            y_hat = model(x)
            loss = tf.reduce_mean(tf.square(y - y_hat))

        if best_loss is None or best_loss > loss.numpy():
            best_loss = loss.numpy()
        if best_loss < min_tol:
            if verbose: print('terminated as loss less than minimal tolerance')
            break

        if verbose and not (it % verbose):
            print('mse loss at iteration {} is {:5.4f}'.format(it, loss))

        # One gradient per variable, in the same order as model.variables.
        gradients = tape.gradient(loss, model.variables)
        for variable, grad in zip(model.variables, gradients):
            variable.assign_add(-learning_rate * grad)
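The manual `assign_add` update is plain gradient descent. Keras optimizers perform the same pairing of variables and gradients through `apply_gradients`, which takes `(gradient, variable)` pairs. Here is a minimal sketch, assuming a built-in `tf.keras.layers.Dense(1)` layer in place of the custom model (swapped in for brevity) and `tf.keras.optimizers.SGD` with an arbitrarily chosen learning rate:

```python
import tensorflow as tf

tf.random.set_seed(0)
x_demo = tf.random.uniform((32, 5))
y_demo = x_demo @ tf.constant([[1.0], [2.0], [3.0], [4.0], [5.0]])

layer = tf.keras.layers.Dense(1)  # kernel (5, 1) and bias (1,), created on first call
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

def train_step(x, y):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(y - layer(x)))
    grads = tape.gradient(loss, layer.trainable_variables)
    # apply_gradients expects (gradient, variable) pairs: note the order.
    optimizer.apply_gradients(zip(grads, layer.trainable_variables))
    return loss

losses = [float(train_step(x_demo, y_demo)) for _ in range(1000)]
print(losses[0], losses[-1])
```

Swapping `SGD` for another optimizer such as `tf.keras.optimizers.Adam` changes the update rule without touching the loop.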

Let's see if we recovered the ground truth.

In [7]:
train(x, y, model)
model.variables

mse loss at iteration 0 is 39.6477
mse loss at iteration 1000 is 0.0280
mse loss at iteration 2000 is 0.0015
terminated as loss less than minimal tolerance


[<tf.Variable 'Variable:0' shape=(5, 1) dtype=float32, numpy=
 array([[1.0662938],
        [2.0163817],
        [2.955478 ],
        [4.037556 ],
        [4.9171414]], dtype=float32)>]

To add a bit of complexity, we now add a bias term to the model.

In [8]:
class LinearModel(tf.keras.Model):
    def __init__(self, input_dim, output_dim, **kwargs):
        super(LinearModel, self).__init__(**kwargs)
        self.w = tf.Variable(tf.random.uniform((input_dim, output_dim)), dtype=tf.float32)
        self.b = tf.Variable(0, dtype=tf.float32)
        
    @tf.function
    def call(self, x):
        return tf.matmul(x, self.w) + self.b
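Because `self.b` is a scalar (shape `()`), `tf.matmul(x, self.w) + self.b` relies on broadcasting: the same bias is added to every entry of the `(batch, output_dim)` result. That is fine for `output_dim=1`, but with several outputs a bias of shape `(output_dim,)` gives each output its own offset. NumPy follows the same broadcasting rules as TensorFlow, so a quick NumPy check illustrates the difference (the arrays are illustrative):

```python
import numpy as np

x_np = np.ones((4, 5))                # batch of 4 samples, 5 features
w = np.ones((5, 3))                   # 3 outputs
scalar_b = 0.5                        # one shared bias, like tf.Variable(0.)
vector_b = np.array([0.0, 1.0, 2.0])  # one bias per output unit

shared = x_np @ w + scalar_b          # every entry gets +0.5
per_unit = x_np @ w + vector_b        # column j gets +vector_b[j]
print(shared[0], per_unit[0])
```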

In [9]:
model = LinearModel(input_dim=5, output_dim=1)    
train(x, y, model)
pprint(model.variables)

mse loss at iteration 0 is 45.3978
mse loss at iteration 1000 is 0.1439
mse loss at iteration 2000 is 0.0343
mse loss at iteration 3000 is 0.0103
mse loss at iteration 4000 is 0.0033
mse loss at iteration 5000 is 0.0010
terminated as loss less than minimal tolerance
[<tf.Variable 'Variable:0' shape=(5, 1) dtype=float32, numpy=
array([[0.95385635],
       [1.92548   ],
       [2.9684424 ],
       [3.9102137 ],
       [4.9867525 ]], dtype=float32)>,
 <tf.Variable 'Variable:0' shape=() dtype=float32, numpy=0.13544115>]


The training code works unchanged, even though the model now has two variables (and therefore two gradients).

## 2 Layer class  

The `Model` class is usually used for the overall computation, which combines many smaller, often repeated, computational modules. You can write those small modules as `tf.keras.Model` subclasses too and combine them, but it is better to use the `Layer` class: `Model` adds training and saving APIs such as `fit`, `evaluate` and `save` that the inner building blocks do not need.

Let's say we want to upgrade the linear regression model to a composition of two linear transformations. Here is how we use the `Layer` class.

In [10]:
class Linear(tf.keras.layers.Layer):
    def __init__(self, units, use_bias=True, **kwargs):
        super(Linear, self).__init__(**kwargs)
        self.units = units
        self.use_bias = use_bias
        
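    def build(self, input_shape):
        # Called on the first forward pass: the input size is only known now.
        self.w = self.add_weight(shape=(input_shape[-1], self.units))
        if self.use_bias:
            # (self.units,) is a 1-tuple; (self.units) would just be an int.
            self.b = self.add_weight(shape=(self.units,), initializer="zeros")
        super().build(input_shape)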
        
    @tf.function
    def call(self, x):
        output = tf.matmul(x, self.w)
        if self.use_bias:
            output += self.b
        return output

Note that with the `Layer` class we create the variables in the `build` method rather than in the constructor. Keras calls `build` automatically the first time data passes through the layer, so the variable sizes can be determined from the input shape at that point. This makes layers easy to combine: each layer only needs to be told its number of output units, and it infers its input size from whatever the previous layer produces.

Now, let's stack up some layers to make a bigger model.
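To see the deferred creation in action, here is a hypothetical minimal layer, `LazyLinear` (not part of this notebook or of Keras), that follows the same pattern as `Linear`. Before the first call it has no weights; after data passes through, each layer's kernel shape has been inferred from its input:

```python
import tensorflow as tf

class LazyLinear(tf.keras.layers.Layer):
    """Hypothetical minimal layer: weights are created in build(), not __init__."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        # input_shape[-1] is only known once data arrives.
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer="glorot_uniform")
        self.b = self.add_weight(shape=(self.units,), initializer="zeros")

    def call(self, x):
        return tf.matmul(x, self.w) + self.b

first, second = LazyLinear(4), LazyLinear(2)
built_before = first.built           # not built yet
weights_before = len(first.weights)  # no variables yet

h = first(tf.zeros((8, 7)))  # first layer infers 7 input features
out = second(h)              # second layer infers 4 from the first layer's output
print(built_before, weights_before, first.w.shape, second.w.shape, out.shape)
```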

In [11]:
class LinearModel(tf.keras.Model):
    def __init__(self, num_units, use_bias=True, **kwargs):
        super(LinearModel, self).__init__(**kwargs)
        self.model = [Linear(units, use_bias) for units in num_units]
    
    def call(self, x):
        for layer in self.model:
            x = layer(x)
        return x

Let's see the variables of the model before and after we pass some actual data through it.

In [12]:
model = LinearModel(num_units=(3,1), use_bias=True)
print(model.variables)
print('---line break---')
o = model(x)
pprint(model.variables)
print(o[:5])

[]
---line break---
[<tf.Variable 'linear_model_2/linear/Variable:0' shape=(5, 3) dtype=float32, numpy=
array([[ 0.16933554, -0.5840391 ,  0.00741726],
       [-0.3860661 , -0.43541566,  0.23082441],
       [-0.28682524, -0.7829349 , -0.10946316],
       [-0.8584915 ,  0.43169802,  0.07589662],
       [ 0.26881176,  0.35085648,  0.16928047]], dtype=float32)>,
 <tf.Variable 'linear_model_2/linear/Variable:0' shape=(3,) dtype=float32, numpy=array([0., 0., 0.], dtype=float32)>,
 <tf.Variable 'linear_model_2/linear_1/Variable:0' shape=(3, 1) dtype=float32, numpy=
array([[ 0.193771 ],
       [ 1.0162126],
       [-1.2153293]], dtype=float32)>,
 <tf.Variable 'linear_model_2/linear_1/Variable:0' shape=(1,) dtype=float32, numpy=array([0.], dtype=float32)>]
tf.Tensor(
[[-1.1336187 ]
 [-0.91470057]
 [-1.1494504 ]
 [-0.7444345 ]
 [-0.8408779 ]], shape=(5, 1), dtype=float32)


Let's train a model with many linear layers stacked together.

In [13]:
model = LinearModel(num_units=(4,3,2,1), use_bias=False)
train(x, y, model, learning_rate=.005)
pprint(model.variables)

mse loss at iteration 0 is 65.6646
terminated as loss less than minimal tolerance
[<tf.Variable 'linear_model_3/linear_2/Variable:0' shape=(5, 4) dtype=float32, numpy=
array([[ 0.048112  , -0.27880564, -0.06568413, -0.03815908],
       [ 0.27987158, -0.08447357,  0.67776585, -0.8038371 ],
       [ 0.54184616, -0.5339778 , -0.4366298 , -0.6009581 ],
       [-0.3520833 , -0.6955261 , -0.7057435 , -0.65394557],
       [-0.15610582, -1.2245682 ,  0.72058666, -0.09791303]],
      dtype=float32)>,
 <tf.Variable 'linear_model_3/linear_3/Variable:0' shape=(4, 3) dtype=float32, numpy=
array([[-0.24497788, -0.30466115,  0.24863297],
       [-0.11622719, -1.2489358 , -0.50340974],
       [ 0.24869917, -0.35284668,  0.66910475],
       [ 0.694468  , -0.34810445, -0.77700156]], dtype=float32)>,
 <tf.Variable 'linear_model_3/linear_4/Variable:0' shape=(3, 2) dtype=float32, numpy=
array([[-0.99073446,  0.46119884],
       [-0.95910925, -1.1749245 ],
       [-0.5191465 , -0.78819185]], dtype=float32)>

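At first glance this model may look more interesting than our plain linear regression model. But a composition of linear transformations is still a linear transformation, so with all these layers (and no biases or nonlinearities in between) we are still fitting the same linear regression. We can check this by multiplying the four weight matrices into a single `(5, 1)` matrix and comparing its predictions with the model's.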
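For the `(4,3,2,1)` model the collapse is just associativity of matrix multiplication. With biases the composition is still affine (a linear map plus a constant), so `use_bias=True` would not change the conclusion:

$$
\hat{y} = \big(((x W_1) W_2) W_3\big) W_4 = x\,(W_1 W_2 W_3 W_4) = x\,W,
\qquad W_1 \in \mathbb{R}^{5\times 4},\ W_2 \in \mathbb{R}^{4\times 3},\ W_3 \in \mathbb{R}^{3\times 2},\ W_4 \in \mathbb{R}^{2\times 1},\ W \in \mathbb{R}^{5\times 1}
$$

$$
(x W_1 + b_1)\,W_2 + b_2 = x\,(W_1 W_2) + (b_1 W_2 + b_2)
$$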

In [14]:
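# With use_bias=False, model.variables holds only the weight matrices, in layer order,
# so their product is the single (5, 1) matrix the whole model is equivalent to.
reduced_model = reduce(tf.matmul, model.variables)
print(reduced_model)
# Count the samples (out of 32) where both predictions agree up to float error.
print(tf.reduce_sum(tf.cast(tf.abs(model(x) - x @ reduced_model) < 1e-5, tf.float32)))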

tf.Tensor(
[[1.1024506]
 [1.9742372]
 [2.975999 ]
 [3.9605384]
 [5.001549 ]], shape=(5, 1), dtype=float32)
tf.Tensor(32.0, shape=(), dtype=float32)


TensorFlow also ships many ready-made layers (in `tf.keras.layers`) and pre-built models (in `tf.keras.applications`). We will get to them in a later notebook.