# <p style="text-align:center;">Tensorflow - VI</p>
---
*<p style="text-align:right;">Reference: Tensorflow Official Docs</p>*


In [1]:
import tensorflow as tf
import numpy as np
import timeit
from datetime import datetime

# Introduction to Modules, Layers & Models
## Overview
To do machine learning in TensorFlow, you are likely to need to define, save, and restore a model.

A model is, abstractly:

* A function that computes something on tensors (a forward pass)
* Some variables that can be updated in response to training

In this guide, you will go below the surface of Keras to see how TensorFlow models are defined. This looks at how TensorFlow collects variables and models, as well as how they are saved and restored.

In [2]:
%load_ext tensorboard

## 1. Defining models and layers in TensorFlow

Most models are made of layers. Layers are functions with a known mathematical structure that can be reused and have trainable variables. In TensorFlow, most high-level implementations of layers and models, such as Keras or Sonnet, are built on the same foundational class: `tf.Module`.

Here's an example of a very simple `tf.Module` that operates on a scalar tensor:

In [3]:
class SimpleModule(tf.Module):
    def __init__(self,name=None):
        super().__init__(name=name)
        self.a_variable = tf.Variable(5.0, name = 'train_me')
        self.non_trainable_variable = tf.Variable(5.0, trainable = False, name = 'do_not_train_me')
    def __call__(self,x):
        return self.a_variable*x + self.non_trainable_variable #wx + b
simple_module = SimpleModule(name = 'simple')

simple_module(tf.constant(5.0))

<tf.Tensor: shape=(), dtype=float32, numpy=30.0>

Modules and, by extension, layers are deep-learning terminology for "objects": they have internal state, and methods that use that state.

There is nothing special about `__call__` except to act like a Python callable; you can invoke your models with whatever functions you wish.

You can set the trainability of variables on and off for any reason, including freezing layers and variables during fine-tuning.

By subclassing `tf.Module`, any `tf.Variable` or `tf.Module` instances assigned to this object's properties are automatically collected. This allows you to save and load variables.

In [4]:
print("trainable variables:", simple_module.trainable_variables)
print("all variables:", simple_module.variables)

trainable variables: (<tf.Variable 'train_me:0' shape=() dtype=float32, numpy=5.0>,)
all variables: (<tf.Variable 'train_me:0' shape=() dtype=float32, numpy=5.0>, <tf.Variable 'do_not_train_me:0' shape=() dtype=float32, numpy=5.0>)
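
To make the effect of `trainable=False` concrete, here is a minimal sketch (not part of the original guide) that takes one gradient-descent step on a module shaped like `SimpleModule`. The class name `ScalarAffine`, the target value 10.0, and the learning rate 0.01 are arbitrary choices for illustration. Only the variable listed in `trainable_variables` receives a gradient and is updated.

```python
import tensorflow as tf


class ScalarAffine(tf.Module):  # hypothetical name, mirrors SimpleModule
    def __init__(self, name=None):
        super().__init__(name=name)
        self.w = tf.Variable(5.0, name="w")
        self.b = tf.Variable(5.0, trainable=False, name="b")

    def __call__(self, x):
        return self.w * x + self.b


module = ScalarAffine()

# GradientTape watches trainable variables automatically.
with tf.GradientTape() as tape:
    loss = (module(tf.constant(2.0)) - 10.0) ** 2

# Ask only for gradients of the trainable variables (here: just `w`).
grads = tape.gradient(loss, module.trainable_variables)
for grad, var in zip(grads, module.trainable_variables):
    var.assign_sub(0.01 * grad)  # one plain gradient-descent step

print("w after step:", module.w.numpy())  # moved away from 5.0
print("b after step:", module.b.numpy())  # still 5.0, it is frozen
```

The same pattern is what makes freezing during fine-tuning work: anything excluded from `trainable_variables` is simply never handed to the update step.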


This is an example of a two-layer model built from linear (dense) layers, all made out of modules.

First a dense (linear) layer:

In [5]:
# A single dense layer: computes relu(x @ w + b)
# and returns the result
class Dense(tf.Module):
    def __init__(self, in_features, out_features, name = None):
        super().__init__(name=name)
        self.w = tf.Variable(tf.random.normal([in_features,out_features]), name = 'w')
        self.b = tf.Variable(tf.zeros([out_features]), name = 'b')
    def __call__(self,x):
        y = tf.matmul(x, self.w) + self.b
        return tf.nn.relu(y)

**Understand the matrix operations**

`w` and `b` are initialised inside the `Dense` class with shapes (`in_features`, `out_features`) and (`out_features`,) respectively. `w` is then multiplied by the input array that the user supplies when calling an instance of the `SequentialModel` class (defined further below).

For the 1st dense layer in this case:
- we supply a constant [[2.0, 2.0, 2.0]] of shape (1,3). Call it x
- `w` is initialised as a (3,3) matrix with random numbers from a normal distribution
- b is a (3,) bias vector filled with zeros
- we multiply x and w ((1,3) X (3,3)) to get a (1,3) resulting array

In [6]:
W = tf.Variable(tf.random.normal([3,3]), name = 'W')
B = tf.Variable(tf.zeros([3]), name = 'B')
X = tf.constant([[2.0,2.0,2.0]])

imd = tf.matmul(X,W)

In [7]:
print(imd)

tf.Tensor([[-1.8021361  2.0239859  1.3632247]], shape=(1, 3), dtype=float32)


- This resulting array is added to b ((1,3) + (3,)); broadcasting gives a (1,3) array, whose ReLU is returned. This is the output of `self.dense_1`.

In [8]:
Y = imd + B
print(Y)

tf.Tensor([[-1.8021361  2.0239859  1.3632247]], shape=(1, 3), dtype=float32)


In [9]:
fin_res = tf.nn.relu(Y)
print(fin_res)

tf.Tensor([[0.        2.0239859 1.3632247]], shape=(1, 3), dtype=float32)
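
The shape bookkeeping above can be double-checked with plain NumPy, which follows the same broadcasting rules. This is an illustrative sketch only: the all-ones weight matrix and the variable names are assumptions, chosen so the result is easy to read. It also shows why the bias created by `tf.zeros([3])` has shape (3,) rather than (3,1): a (3,1) bias would broadcast the (1,3) product up to (3,3).

```python
import numpy as np

x = np.full((1, 3), 2.0)        # one sample with three features
w = np.ones((3, 3))             # stand-in weights (the notebook uses random ones)
b_vector = np.zeros(3)          # shape (3,), like tf.zeros([3])
b_column = np.zeros((3, 1))     # shape (3, 1), shown only for contrast

y = x @ w + b_vector            # (1, 3) + (3,)   -> (1, 3)
y_wrong = x @ w + b_column      # (1, 3) + (3, 1) -> (3, 3)

print(y.shape, y_wrong.shape)
```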


For the next layer, `self.dense_2`, the output of `self.dense_1` is the input (see the `__call__` method of the model). The output of `self.dense_1` (say `fin_res`) is a (1,3) array, and the layer's `in_features` and `out_features` are 3 and 2. Thus, for the 2nd layer:

- we supply `fin_res` as the input
- `w` is initialised as a (3,2) matrix with random numbers from a normal distribution
- b is a (2,) bias vector filled with zeros
- we multiply `fin_res` and w ((1,3) X (3,2)) to get a (1,2) resulting array

In [10]:
W_2 = tf.Variable(tf.random.normal([3,2]), name = 'W_2')
B_2 = tf.Variable(tf.zeros([2]), name = 'B_2')
X_2 = fin_res

imd_2 = tf.matmul(X_2,W_2)
Y_2 = imd_2 + B_2
fin_res_2 = tf.nn.relu(Y_2)
print(fin_res_2)

tf.Tensor([[0. 0.]], shape=(1, 2), dtype=float32)


The complete model creates two layer instances and applies them in sequence:

In [11]:
class SequentialModel(tf.Module):
    def __init__(self,name=None):
        super().__init__(name=name)
        
        self.dense_1 = Dense(in_features = 3, out_features = 3)
        self.dense_2 = Dense(in_features = 3, out_features = 2)
        
    def __call__(self,x):
        x = self.dense_1(x)
        return self.dense_2(x)

my_model = SequentialModel(name = "the_model")
print("Model results:", my_model(tf.constant([[2.0, 2.0, 2.0]])))

Model results: tf.Tensor([[0. 0.]], shape=(1, 2), dtype=float32)


`tf.Module` instances will automatically collect, recursively, any `tf.Variable` or `tf.Module` instances assigned to it. This allows you to manage collections of `tf.Modules` with a single model instance, and save and load whole models.

In [12]:
print("Submodules:", my_model.submodules)

Submodules: (<__main__.Dense object at 0x00000217EF4422B0>, <__main__.Dense object at 0x00000217FA05FD90>)


In [13]:
for var in my_model.variables:
  print(var, "\n")

<tf.Variable 'b:0' shape=(3,) dtype=float32, numpy=array([0., 0., 0.], dtype=float32)> 

<tf.Variable 'w:0' shape=(3, 3) dtype=float32, numpy=
array([[ 0.01059214, -0.16848826,  0.74092203],
       [-2.0518894 ,  0.06184238, -0.5676788 ],
       [-1.7585919 ,  1.2362754 ,  0.3948643 ]], dtype=float32)> 

<tf.Variable 'b:0' shape=(2,) dtype=float32, numpy=array([0., 0.], dtype=float32)> 

<tf.Variable 'w:0' shape=(3, 2) dtype=float32, numpy=
array([[-1.0259029 ,  2.1589134 ],
       [-0.20522311,  0.16202818],
       [ 0.23325957, -0.9435805 ]], dtype=float32)> 
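
Because collection is recursive, the top-level module is enough to inspect everything underneath it. The sketch below, with hypothetical class names `TinyDense` and `TinyModel`, rebuilds the same 3-to-3-to-2 architecture and counts its parameters through `model.variables` alone.

```python
import tensorflow as tf


class TinyDense(tf.Module):  # hypothetical stand-in for the Dense module above
    def __init__(self, in_features, out_features, name=None):
        super().__init__(name=name)
        self.w = tf.Variable(tf.random.normal([in_features, out_features]), name="w")
        self.b = tf.Variable(tf.zeros([out_features]), name="b")


class TinyModel(tf.Module):  # hypothetical stand-in for SequentialModel
    def __init__(self, name=None):
        super().__init__(name=name)
        self.dense_1 = TinyDense(3, 3)
        self.dense_2 = TinyDense(3, 2)


model = TinyModel(name="tiny")

# Variables of both nested layers are reachable from the outer module.
total_params = sum(int(tf.size(v)) for v in model.variables)
print("submodules:", len(model.submodules))
print("variables:", len(model.variables))
print("total parameters:", total_params)  # 3*3 + 3 + 3*2 + 2
```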



### 1.1. Waiting to create variables

It's convenient in many cases to wait to create variables until you are sure of the input shape. You may have noticed here that you have to define both input and output sizes to the layer. This is so the `w` variable has a known shape and can be allocated.

By deferring variable creation to the first time the module is called with a specific input shape, you do not need to specify the input size up front.

In [16]:
class flexibleDenseModule(tf.Module):
    def __init__(self, out_features, name = None):
        super().__init__(name=name)
        self.is_built = False
        self.out_features = out_features
    
    def __call__(self,x):
        if not self.is_built:
            self.w = tf.Variable(tf.random.normal([x.shape[-1], self.out_features]), name = 'w')
            self.b = tf.Variable(tf.zeros([self.out_features]), name = 'b')
            self.is_built = True
        
        y = tf.matmul(x, self.w) + self.b
        return tf.nn.relu(y)

In [19]:
class MySequentialModule(tf.Module):
    def __init__(self, name = None):
        super().__init__(name = name)
        
        self.dense_1 = flexibleDenseModule(out_features = 3)
        self.dense_2 = flexibleDenseModule(out_features = 2)
    
    def __call__(self, x):
        x = self.dense_1(x)
        return self.dense_2(x)
    
my_model = MySequentialModule(name = "The_model")
print("Model results", my_model(tf.constant([[2.0,2.0,2.0]])))

Model results tf.Tensor([[0.       5.991182]], shape=(1, 2), dtype=float32)


This flexibility is why TensorFlow layers often only need to specify the shape of their outputs, such as in `tf.keras.layers.Dense`, rather than both the input and output size.
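
A quick way to see the deferred creation at work is to look at the module before and after its first call. The sketch below uses a hypothetical `LazyDense` class that follows the same pattern as the flexible module above; the input width of 7 is an arbitrary choice.

```python
import tensorflow as tf


class LazyDense(tf.Module):  # hypothetical name, same pattern as above
    def __init__(self, out_features, name=None):
        super().__init__(name=name)
        self.out_features = out_features
        self.is_built = False

    def __call__(self, x):
        if not self.is_built:
            # The input width is only known here, at first call.
            self.w = tf.Variable(
                tf.random.normal([x.shape[-1], self.out_features]), name="w")
            self.b = tf.Variable(tf.zeros([self.out_features]), name="b")
            self.is_built = True
        return tf.nn.relu(tf.matmul(x, self.w) + self.b)


layer = LazyDense(out_features=4)
vars_before = len(layer.variables)   # nothing allocated yet

out = layer(tf.ones([2, 7]))         # first call fixes the input width at 7
vars_after = len(layer.variables)

print(vars_before, vars_after, layer.w.shape, out.shape)
```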

## 2. Saving Weights

You can save a `tf.Module` as both a checkpoint and a SavedModel.

Checkpoints are just the weights (that is, the values of the set of variables inside the module and its submodules):

In [21]:
chk_path = "my_checkpoint"
chkpnt = tf.train.Checkpoint(model = my_model)
chkpnt.write(chk_path)

'my_checkpoint'

Checkpoints consist of two kinds of files: the data itself and an index file for metadata. The index file keeps track of what is actually saved and the numbering of checkpoints, while the checkpoint data contains the variable values and their attribute lookup paths.

You can look inside a checkpoint to be sure the whole collection of variables is saved, sorted by the Python object that contains them.

In [23]:
tf.train.list_variables(chk_path)

[('_CHECKPOINTABLE_OBJECT_GRAPH', []),
 ('model/dense_1/b/.ATTRIBUTES/VARIABLE_VALUE', [3]),
 ('model/dense_1/w/.ATTRIBUTES/VARIABLE_VALUE', [3, 3]),
 ('model/dense_2/b/.ATTRIBUTES/VARIABLE_VALUE', [2]),
 ('model/dense_2/w/.ATTRIBUTES/VARIABLE_VALUE', [3, 2])]

During distributed (multi-machine) training, checkpoints can be sharded, which is why their data files are numbered (e.g., '00000-of-00001'). In this case, though, there is only one shard.
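
The index file, the data file, and the shard numbering described above can be seen by listing the directory a checkpoint was written to. This is a small sketch under assumed names: the prefix `demo_checkpoint`, the temporary directory, and the single variable `v` are all made up for the example.

```python
import os
import tempfile

import tensorflow as tf

# A bare tf.Module with one tracked variable is enough for a checkpoint.
module = tf.Module()
module.v = tf.Variable([1.0, 2.0, 3.0], name="v")

ckpt_dir = tempfile.mkdtemp()
prefix = os.path.join(ckpt_dir, "demo_checkpoint")  # hypothetical prefix
tf.train.Checkpoint(model=module).write(prefix)

# Expect one index file plus one data shard next to each other.
files = sorted(os.listdir(ckpt_dir))
print(files)

# The saved keys follow the attribute path from the Checkpoint object.
saved_names = [name for name, shape in tf.train.list_variables(prefix)]
print(saved_names)
```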

When you load models back in, you overwrite the values in your Python object.

In [25]:
new_model = MySequentialModule()
new_checkpoint = tf.train.Checkpoint(model = new_model)
new_checkpoint.restore("my_checkpoint")

<tensorflow.python.checkpoint.checkpoint.CheckpointLoadStatus at 0x217f9fbd820>
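
One way to convince yourself that a restore really overwrote the weights is to compare two independently initialised modules before and after restoring. The sketch below is an illustration, not the guide's own code: the `Affine` class and the temporary path are assumptions, and the variables are created in `__init__` so that the restored values are applied immediately.

```python
import os
import tempfile

import tensorflow as tf


class Affine(tf.Module):  # hypothetical module with eagerly created variables
    def __init__(self, name=None):
        super().__init__(name=name)
        self.w = tf.Variable(tf.random.normal([3, 2]), name="w")
        self.b = tf.Variable(tf.zeros([2]), name="b")

    def __call__(self, x):
        return tf.matmul(x, self.w) + self.b


source = Affine()
target = Affine()  # same structure, different random weights
x = tf.constant([[2.0, 2.0, 2.0]])

same_before = bool(tf.reduce_all(tf.equal(source(x), target(x))))

path = os.path.join(tempfile.mkdtemp(), "affine_checkpoint")
tf.train.Checkpoint(model=source).write(path)
tf.train.Checkpoint(model=target).restore(path)

same_after = bool(tf.reduce_all(tf.equal(source(x), target(x))))
print("identical before restore:", same_before)
print("identical after restore:", same_after)
```

For modules that create their variables lazily, such as `MySequentialModule` above, the restored values are applied once the variables come into existence, that is, after the first call.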

## 3. Keras Models and Layers

Note that up until this point, there is no mention of Keras. You can build your own high-level API on top of `tf.Module`, and people have.

In this section, you will examine how Keras uses `tf.Module`.

### 3.1. Keras Layers

`tf.keras.layers.Layer` is the base class of all Keras layers, and it inherits from `tf.Module`.

You can convert a module into a Keras layer just by swapping out the parent and then changing `__call__` to `call`:

In [29]:
class MyDense(tf.keras.layers.Layer):
    # Adding **kwargs to support base Keras layer arguments
    def __init__(self, in_features, out_features, **kwargs):
        super().__init__(**kwargs)

        # This will soon move to the build step; see below
        self.w = tf.Variable(
            tf.random.normal([in_features, out_features]), name='w')
        self.b = tf.Variable(tf.zeros([out_features]), name='b')

    def call(self, x):
        y = tf.matmul(x, self.w) + self.b
        return tf.nn.relu(y)

simple_layer = MyDense(name="simple", in_features=3, out_features=3)

Keras layers have their own `__call__` that does some bookkeeping described in the next section and then calls `call()`. You should notice no change in functionality.

In [28]:
simple_layer([[2.0,2.0, 2.0]])

<tf.Tensor: shape=(1, 3), dtype=float32, numpy=array([[1.2344629 , 0.45570105, 5.897377  ]], dtype=float32)>

### 3.2. The `build` step

As noted, it's convenient in many cases to wait to create variables until you are sure of the input shape.

Keras layers come with an extra lifecycle step that allows you more flexibility in how you define your layers. This step is defined in the `build` method.

`build` is called exactly once, and it is called with the shape of the input. It's usually used to create variables (weights).

You can rewrite the `MyDense` layer above to be flexible to the size of its inputs:

In [30]:
class FlexibleDense(tf.keras.layers.Layer):
    # Note the added `**kwargs`, as Keras supports many arguments
    def __init__(self, out_features, **kwargs):
        super().__init__(**kwargs)
        self.out_features = out_features

    def build(self, input_shape):  # Create the state of the layer (weights)
        self.w = tf.Variable(
            tf.random.normal([input_shape[-1], self.out_features]), name='w')
        self.b = tf.Variable(tf.zeros([self.out_features]), name='b')

    def call(self, inputs):  # Defines the computation from inputs to outputs
        return tf.matmul(inputs, self.w) + self.b

# Create the instance of the layer
flexible_dense = FlexibleDense(out_features=3)

At this point, the layer has not been built, so there are no variables:

In [31]:
flexible_dense.variables

[]

Calling the function allocates appropriately-sized variables:

In [32]:
# Call it, with predictably random results
print("Model results:", flexible_dense(tf.constant([[2.0, 2.0, 2.0], [3.0, 3.0, 3.0]])))

Model results: tf.Tensor(
[[-0.36299315 -0.9272399   4.8384423 ]
 [-0.5444897  -1.3908594   7.2576637 ]], shape=(2, 3), dtype=float32)


In [33]:
flexible_dense.variables

[<tf.Variable 'flexible_dense/w:0' shape=(3, 3) dtype=float32, numpy=
 array([[ 0.27201393, -2.1972196 ,  0.6269824 ],
        [-0.47587693,  0.8998526 ,  1.2342409 ],
        [ 0.02236642,  0.83374715,  0.557998  ]], dtype=float32)>,
 <tf.Variable 'flexible_dense/b:0' shape=(3,) dtype=float32, numpy=array([0., 0., 0.], dtype=float32)>]

Since `build` is only called once, inputs will be rejected if the input shape is not compatible with the layer's variables:

In [34]:
try:
    print("Model results:", flexible_dense(tf.constant([[2.0, 2.0, 2.0, 2.0]])))
except tf.errors.InvalidArgumentError as e:
    print("Failed:", e)

Failed: Exception encountered when calling layer "flexible_dense" "                 f"(type FlexibleDense).

{{function_node __wrapped__MatMul_device_/job:localhost/replica:0/task:0/device:CPU:0}} Matrix size-incompatible: In[0]: [1,4], In[1]: [3,3] [Op:MatMul]

Call arguments received by layer "flexible_dense" "                 f"(type FlexibleDense):
  • inputs=tf.Tensor(shape=(1, 4), dtype=float32)
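
The lifecycle described in this section (no weights at construction, `build` on the first call, no rebuild afterwards) can be observed directly. The sketch below is an assumption-laden illustration: the class name `CountingDense` and the `build_calls` counter are made up, and the weights are created with the Keras helper `add_weight` instead of `tf.Variable`, which is a swapped-in but commonly used way to register layer weights.

```python
import tensorflow as tf


class CountingDense(tf.keras.layers.Layer):  # hypothetical layer
    def __init__(self, out_features, **kwargs):
        super().__init__(**kwargs)
        self.out_features = out_features
        self.build_calls = 0  # illustrative counter, not a Keras feature

    def build(self, input_shape):
        self.build_calls += 1
        self.w = self.add_weight(
            shape=(input_shape[-1], self.out_features),
            initializer="random_normal", name="w")
        self.b = self.add_weight(
            shape=(self.out_features,), initializer="zeros", name="b")

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b


layer = CountingDense(out_features=4)
weights_before = len(layer.weights)   # nothing created yet

first = layer(tf.ones([2, 3]))        # triggers build with last dimension 3
second = layer(tf.ones([5, 3]))       # compatible shape, no rebuild

print(weights_before, len(layer.weights), layer.build_calls)
```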


Keras layers have many more features, including:

- Optional losses
- Support for metrics
- Built-in support for an optional training argument to differentiate between training and inference use
- `get_config` and `from_config` methods that allow you to accurately store configurations to allow model cloning in Python
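
As an illustration of the last item, here is a rough sketch of a layer that exposes its constructor arguments through `get_config`, so that an unbuilt copy can be recreated with `from_config`. The class name `ConfigurableDense` is hypothetical, and the weights are created with `add_weight` rather than `tf.Variable` as an assumption of this example. Note that only the configuration is copied, not the weight values.

```python
import tensorflow as tf


class ConfigurableDense(tf.keras.layers.Layer):  # hypothetical layer
    def __init__(self, out_features, **kwargs):
        super().__init__(**kwargs)
        self.out_features = out_features

    def build(self, input_shape):
        self.w = self.add_weight(
            shape=(input_shape[-1], self.out_features),
            initializer="random_normal", name="w")

    def call(self, inputs):
        return tf.matmul(inputs, self.w)

    def get_config(self):
        # Start from the base config (name, dtype, ...) and add our argument.
        config = super().get_config()
        config.update({"out_features": self.out_features})
        return config


layer = ConfigurableDense(out_features=3, name="configurable")
config = layer.get_config()

# The default from_config calls the constructor with the stored arguments.
clone = ConfigurableDense.from_config(config)
print(clone.name, clone.out_features)
```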

### 3.3. Keras Models

You can define your model as nested Keras layers.

However, Keras also provides a full-featured model class called `tf.keras.Model`. It inherits from `tf.keras.layers.Layer`, so a Keras model can be used, nested, and saved in the same way as Keras layers. Keras models come with extra functionality that makes them easy to train, evaluate, load, save, and even train on multiple machines.

You can define the `MySequentialModule` from above with nearly identical code, again converting `__call__` to `call()` and changing the parent:

In [35]:
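class MySequentialModel(tf.keras.Model):
    def __init__(self, name=None, **kwargs):
        # Note: `name` is accepted here but not forwarded to the base class,
        # so Keras falls back to the auto-generated name `my_sequential_model`
        # (visible in the variable names printed below).
        super().__init__(**kwargs)

        self.dense_1 = FlexibleDense(out_features=3)
        self.dense_2 = FlexibleDense(out_features=2)

    def call(self, x):
        x = self.dense_1(x)
        return self.dense_2(x)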

# You have made a Keras model!
my_sequential_model = MySequentialModel(name="the_model")

# Call it on a tensor, with random results
print("Model results:", my_sequential_model(tf.constant([[2.0, 2.0, 2.0]])))

Model results: tf.Tensor([[-2.784345  -0.9372672]], shape=(1, 2), dtype=float32)


All the same features are available, including tracking variables and submodules.

In [37]:
my_sequential_model.variables

[<tf.Variable 'my_sequential_model/flexible_dense_1/w:0' shape=(3, 3) dtype=float32, numpy=
 array([[ 0.5948258 ,  0.08509963,  0.8485259 ],
        [-0.40347844,  0.18530941, -1.0075909 ],
        [-0.4186937 ,  0.49814397, -0.12418467]], dtype=float32)>,
 <tf.Variable 'my_sequential_model/flexible_dense_1/b:0' shape=(3,) dtype=float32, numpy=array([0., 0., 0.], dtype=float32)>,
 <tf.Variable 'my_sequential_model/flexible_dense_2/w:0' shape=(3, 2) dtype=float32, numpy=
 array([[ 0.18443146, -1.071148  ],
        [-1.8971117 , -0.8341634 ],
        [-0.38054168,  0.25086123]], dtype=float32)>,
 <tf.Variable 'my_sequential_model/flexible_dense_2/b:0' shape=(2,) dtype=float32, numpy=array([0., 0.], dtype=float32)>]

In [41]:
my_sequential_model.submodules

(<__main__.FlexibleDense at 0x217f9fbd430>,
 <__main__.FlexibleDense at 0x217fa3b74f0>)
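
As mentioned at the start of this section, Keras models come with extra functionality that makes them easy to train. The following sketch hints at what that looks like in practice. It is deliberately not the model from this notebook: it uses the built-in `tf.keras.layers.Dense`, synthetic data, and arbitrary hyperparameters (20 epochs, batch size 16), all of which are assumptions for illustration.

```python
import numpy as np
import tensorflow as tf

# Synthetic regression data: y = x . [1, 2, 3] + 0.5
rng = np.random.default_rng(0)
x = rng.uniform(size=(64, 3)).astype("float32")
y = x @ np.array([[1.0], [2.0], [3.0]], dtype="float32") + 0.5

model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(1),
])

# compile() attaches an optimizer and a loss; fit() runs the training loop.
model.compile(optimizer="sgd", loss="mse")
history = model.fit(x, y, epochs=20, batch_size=16, verbose=0)

first_loss = history.history["loss"][0]
last_loss = history.history["loss"][-1]
print("loss went from", first_loss, "to", last_loss)
```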

If you are constructing models that are simple assemblages of existing layers and inputs, you can save time and space by using the functional API, which comes with additional features around model reconstruction and architecture.

Here is the same model with the functional API:

In [42]:
inputs = tf.keras.Input(shape=[3,])

x = FlexibleDense(3)(inputs)
x = FlexibleDense(2)(x)

my_functional_model = tf.keras.Model(inputs=inputs, outputs=x)

my_functional_model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 3)]               0         
                                                                 
 flexible_dense_3 (FlexibleD  (None, 3)                12        
 ense)                                                           
                                                                 
 flexible_dense_4 (FlexibleD  (None, 2)                8         
 ense)                                                           
                                                                 
Total params: 20
Trainable params: 20
Non-trainable params: 0
_________________________________________________________________


In [43]:
my_functional_model(tf.constant([[2.0, 2.0, 2.0]]))

<tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[13.858565, -9.586419]], dtype=float32)>

The major difference here is that the input shape is specified up front as part of the functional construction process. The `shape` argument of `tf.keras.Input` does not have to be completely specified in this case; you can leave some dimensions as `None`.
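
To illustrate a partially specified input shape, the sketch below leaves one dimension as `None` and then calls the model with two different lengths along that axis. It uses the built-in `tf.keras.layers.Dense` rather than `FlexibleDense`, and the variable name `seq_inputs` and the concrete sizes are assumptions for the example.

```python
import tensorflow as tf

# Batch size is always left open; here the middle axis is left open too.
seq_inputs = tf.keras.Input(shape=(None, 3))
outputs = tf.keras.layers.Dense(2)(seq_inputs)  # acts on the last axis
model = tf.keras.Model(inputs=seq_inputs, outputs=outputs)

short = model(tf.ones([1, 5, 3]))   # 5 steps
long = model(tf.ones([4, 9, 3]))    # 9 steps, same model

print(short.shape, long.shape)
```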

## 4. Saving Keras Models

Keras models can be checkpointed, and that will look the same as `tf.Module`.

Keras models can also be saved with `tf.saved_model.save()`, as they are modules. However, Keras models have convenience methods and other functionality.

In [44]:
my_sequential_model.save("exname_of_file")

INFO:tensorflow:Assets written to: exname_of_file\assets


Just as easily, they can be loaded back in:

In [45]:
reconstructed_model = tf.keras.models.load_model("exname_of_file")
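
This section also mentions `tf.saved_model.save()`, which works on anything that is a module. As a minimal sketch of that lower-level path, the example below saves and reloads a plain `tf.Module` instead of a Keras model. The `Scaler` class, the export directory, and the use of `tf.function` with an `input_signature` (needed so that the computation is recorded in the SavedModel) are assumptions of this example and are not covered in the text above.

```python
import os
import tempfile

import tensorflow as tf


class Scaler(tf.Module):  # hypothetical module
    def __init__(self, name=None):
        super().__init__(name=name)
        self.scale = tf.Variable(3.0, name="scale")

    # The traced function is what gets stored alongside the variables.
    @tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.float32)])
    def __call__(self, x):
        return self.scale * x


export_dir = os.path.join(tempfile.mkdtemp(), "scaler_saved_model")
tf.saved_model.save(Scaler(), export_dir)

# The restored object no longer needs the Scaler class definition.
restored = tf.saved_model.load(export_dir)
result = restored(tf.constant([1.0, 2.0])).numpy().tolist()
print(result)
```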

