# Modules

Resource: [TensorFlow Guide](https://www.tensorflow.org/guide/intro_to_modules)

Most models are made of layers. Layers are functions with a known mathematical structure that can be reused and have trainable variables. In TensorFlow, most high-level implementations of layers and models, such as Keras or Sonnet, are built on the same foundational class: tf.Module.

In [1]:
import tensorflow as tf
from datetime import datetime

%load_ext tensorboard

## Building Modules

### Build a simple module

Modules and, by extension, layers are deep-learning terminology for "objects": they have internal state, and methods that use that state.

In [2]:
class SimpleModule(tf.Module):
  def __init__(self, name = None):
    super().__init__(name = name)
    self.a_variable = tf.Variable(5.0, name = "train_me")
    self.non_trainable_variable = tf.Variable(5.0,
                                              trainable = False,
                                              name = "do_not_train_me")

  def __call__(self, x):
    return self.a_variable * x + self.non_trainable_variable

In [3]:
simple_module = SimpleModule(name = "simple")
simple_module(tf.constant(5.0))

<tf.Tensor: shape=(), dtype=float32, numpy=30.0>

### Extracting all variables from the module

By subclassing tf.Module, any tf.Variable or tf.Module instances assigned to this object's properties are automatically collected. This allows you to save and load variables, and also to create collections of tf.Modules.

In [4]:
simple_module.variables

(<tf.Variable 'train_me:0' shape=() dtype=float32, numpy=5.0>,
 <tf.Variable 'do_not_train_me:0' shape=() dtype=float32, numpy=5.0>)

In [5]:
simple_module.trainable_variables

(<tf.Variable 'train_me:0' shape=() dtype=float32, numpy=5.0>,)

In [6]:
simple_module.non_trainable_variables

(<tf.Variable 'do_not_train_me:0' shape=() dtype=float32, numpy=5.0>,)
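
One practical consequence of the trainable/non-trainable split is that tf.GradientTape automatically watches only trainable variables. The following is a minimal sketch of that behaviour; it redefines SimpleModule so that it runs on its own, and the input value 5.0 is an arbitrary choice.

```python
import tensorflow as tf

class SimpleModule(tf.Module):
  def __init__(self, name=None):
    super().__init__(name=name)
    self.a_variable = tf.Variable(5.0, name="train_me")
    self.non_trainable_variable = tf.Variable(
        5.0, trainable=False, name="do_not_train_me")

  def __call__(self, x):
    return self.a_variable * x + self.non_trainable_variable

module = SimpleModule(name="simple")

# The tape records operations on trainable variables automatically.
with tf.GradientTape() as tape:
  y = module(tf.constant(5.0))

grads = tape.gradient(
    y, [module.a_variable, module.non_trainable_variable])

print(grads[0])  # d(a*x + b)/da = x
print(grads[1])  # None: the non-trainable variable was not watched
```

An optimizer that is handed simple_module.trainable_variables would therefore update only train_me.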

### Implementing a model with dense layers

In [7]:
# define dense layer
class Dense(tf.Module):
  def __init__(self, in_features, out_features, name = None):
    super().__init__(name = name)
    self.w = tf.Variable(
        tf.random.normal([in_features, out_features]),
                         name = "w")
    self.b = tf.Variable(tf.random.normal([out_features]), name = "b")

  def __call__(self, x):
    y = tf.matmul(x, self.w) + self.b
    return tf.nn.relu(y)

In [8]:
# define model architecture
class SequentialModule(tf.Module):
  def __init__(self, in_features, out_features, name = None):
    super().__init__(name = name)

    self.dense_1 = Dense(in_features = in_features, out_features = 10)
    self.dense_2 = Dense(in_features = 10, out_features = out_features)

  def __call__(self, x):
    x = self.dense_1(x)
    return self.dense_2(x)

In [9]:
test_model = SequentialModule(3, 1, "My_Model")
print("Model results:", test_model(tf.constant([[2.0, 2.0, 2.0]])).numpy())

Model results: [[10.9452305]]


In [10]:
my_model = SequentialModule(3, 5, "My_Model")
print("Model results:", my_model(tf.constant([[-3.0, 2.0, 12.0]])).numpy())
print("Model results:", my_model(tf.constant([[3.0, 12.0, 2.0]])).numpy())
print("Model results:", my_model(tf.constant([[-3.0, 2.0, 2.0]])).numpy())

Model results: [[35.862324  0.        0.       14.323252 24.277296]]
Model results: [[15.760409   0.         0.         2.4547026  7.96459  ]]
Model results: [[ 5.720687  0.        0.       10.972248 13.360544]]


### Accessing submodules and variables within a module

tf.Module instances will automatically collect, recursively, any tf.Variable or tf.Module instances assigned to them. This allows you to manage collections of tf.Modules with a single model instance, and save and load whole models.

In [11]:
for submodule in my_model.submodules:
  print("Submodule:", submodule)

Submodule: <__main__.Dense object at 0x7fc9ee3a04c0>
Submodule: <__main__.Dense object at 0x7fc9ee36ece0>


In [12]:
for var in my_model.variables:
  print(var, "\n")

<tf.Variable 'b:0' shape=(10,) dtype=float32, numpy=
array([ 0.4391686 ,  0.49363697,  0.08814096,  1.1443    , -0.7472049 ,
       -0.67132956,  0.13753696, -0.5086929 , -0.3485612 ,  0.5009082 ],
      dtype=float32)> 

<tf.Variable 'w:0' shape=(3, 10) dtype=float32, numpy=
array([[-0.47020775, -1.3438826 , -0.27380255, -0.23510106,  0.8166703 ,
         0.43490455, -0.23481819, -0.9769432 ,  0.98039985,  1.0066903 ],
       [-0.05855056, -0.51440233,  0.17947087,  0.6514524 ,  0.12257692,
        -0.27936143,  0.02550284, -0.94399124, -0.21225931, -1.015371  ],
       [-0.30464157,  0.34900215,  0.4863775 ,  0.9624898 ,  1.0194541 ,
        -0.22768553, -0.79427147,  0.48206282,  0.11618549, -0.27889112]],
      dtype=float32)> 

<tf.Variable 'b:0' shape=(5,) dtype=float32, numpy=
array([-0.3733548, -1.454579 ,  1.7649133,  0.4787972,  0.68699  ],
      dtype=float32)> 

<tf.Variable 'w:0' shape=(10, 5) dtype=float32, numpy=
array([[-0.09249494,  1.7658429 , -0.5362403 , -0.79304516
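
As a quick check on the recursive collection, the parameters of both Dense submodules can be counted from the parent alone. This is a sketch that repeats the class definitions from above so it is self-contained; the names shapes and total_params are introduced here for illustration.

```python
import tensorflow as tf

class Dense(tf.Module):
  def __init__(self, in_features, out_features, name=None):
    super().__init__(name=name)
    self.w = tf.Variable(
        tf.random.normal([in_features, out_features]), name="w")
    self.b = tf.Variable(tf.random.normal([out_features]), name="b")

  def __call__(self, x):
    return tf.nn.relu(tf.matmul(x, self.w) + self.b)

class SequentialModule(tf.Module):
  def __init__(self, in_features, out_features, name=None):
    super().__init__(name=name)
    self.dense_1 = Dense(in_features=in_features, out_features=10)
    self.dense_2 = Dense(in_features=10, out_features=out_features)

  def __call__(self, x):
    return self.dense_2(self.dense_1(x))

model = SequentialModule(3, 5, name="My_Model")

# Variables of both Dense submodules are reachable from the parent.
shapes = sorted(tuple(v.shape) for v in model.variables)
total_params = sum(int(tf.size(v)) for v in model.trainable_variables)

print(shapes)
print(total_params)  # 3*10 + 10 + 10*5 + 5 = 95
```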

### Deferring variable creation

By deferring variable creation to the first time the module is called with a specific input shape, you do not need to specify the input size up front.

This flexibility is why TensorFlow layers often only need to specify the shape of their outputs, such as in tf.keras.layers.Dense, rather than both the input and output size.

In [13]:
class FlexibleDense(tf.Module):
  def __init__(self, out_features, name = None):
    super().__init__(name = name)
    self.is_built = False
    self.out_features = out_features

  def __call__(self, x):
    if not self.is_built:
      self.w = tf.Variable(
          tf.random.normal([x.shape[-1], self.out_features]),
          name = "w"
      )
      self.b = tf.Variable(tf.zeros([self.out_features]), name = "b")
      self.is_built = True

    y = tf.matmul(x, self.w) + self.b
    return tf.nn.relu(y)

In [14]:
class FlexibleSequential(tf.Module):
  def __init__(self, out_features, name = None):
    super().__init__(name = name)
    self.dense_1 = FlexibleDense(out_features = 3)
    self.dense_2 = FlexibleDense(out_features = out_features)

  def __call__(self, x):
    x = self.dense_1(x)
    return self.dense_2(x)

In [15]:
new_model = FlexibleSequential(3, name = "the_model")
new_model(tf.constant([[1.39, 4.22, .23, .42, .76, .34, 5.43]])).numpy()

array([[2.0002193, 0.       , 6.246097 ]], dtype=float32)
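
To make the deferral visible, the sketch below inspects a single FlexibleDense before and after its first call. The class is repeated so the snippet runs on its own; the 2×7 input of ones is an arbitrary assumption.

```python
import tensorflow as tf

class FlexibleDense(tf.Module):
  def __init__(self, out_features, name=None):
    super().__init__(name=name)
    self.is_built = False
    self.out_features = out_features

  def __call__(self, x):
    if not self.is_built:
      # The input width is read from the first batch the layer sees.
      self.w = tf.Variable(
          tf.random.normal([x.shape[-1], self.out_features]), name="w")
      self.b = tf.Variable(tf.zeros([self.out_features]), name="b")
      self.is_built = True
    return tf.nn.relu(tf.matmul(x, self.w) + self.b)

layer = FlexibleDense(out_features=3)
num_vars_before = len(layer.variables)

layer(tf.ones([2, 7]))  # first call: 7 input features
w_shape = tuple(layer.w.shape)

print(num_vars_before)  # 0: no variables exist before the first call
print(w_shape)          # (7, 3)
```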

## Saving Weights

### Saving as a checkpoint

Checkpoints are just the weights (that is, the values of the set of variables inside the module and its submodules).

Checkpoints consist of two kinds of files: the data itself and an index file for metadata. The index file keeps track of what is actually saved and the numbering of checkpoints, while the checkpoint data contains the variable values and their attribute lookup paths.



In [16]:
my_model = FlexibleSequential(3, name = "the_model")
my_model(tf.constant([[2.39, 4.22, .23, .42, .76, .34, 5.43]])).numpy()

array([[ 0.      , 12.247864,  4.819643]], dtype=float32)

In [17]:
chkp_path = "./my_checkpoint/checkpoint"
checkpoint = tf.train.Checkpoint(model=my_model)
checkpoint.write(chkp_path)

'./my_checkpoint/checkpoint'
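
The two kinds of files mentioned above can be seen by listing the checkpoint directory. The following sketch writes a checkpoint for a hypothetical one-variable module (Scale is not part of this notebook) into a temporary directory.

```python
import os
import tempfile

import tensorflow as tf

class Scale(tf.Module):  # hypothetical one-variable module
  def __init__(self, name=None):
    super().__init__(name=name)
    self.factor = tf.Variable(2.0, name="factor")

ckpt_dir = tempfile.mkdtemp()
prefix = os.path.join(ckpt_dir, "checkpoint")

# write() stores the variable values under the given path prefix.
tf.train.Checkpoint(model=Scale()).write(prefix)

files = sorted(os.listdir(ckpt_dir))
print(files)  # one index file plus one or more data shards
```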

You can look inside a checkpoint to be sure the whole collection of variables is saved, sorted by the Python object that contains them.

In [18]:
tf.train.list_variables(chkp_path)

[('_CHECKPOINTABLE_OBJECT_GRAPH', []),
 ('model/dense_1/b/.ATTRIBUTES/VARIABLE_VALUE', [3]),
 ('model/dense_1/w/.ATTRIBUTES/VARIABLE_VALUE', [7, 3]),
 ('model/dense_2/b/.ATTRIBUTES/VARIABLE_VALUE', [3]),
 ('model/dense_2/w/.ATTRIBUTES/VARIABLE_VALUE', [3, 3])]

In [19]:
new_model = FlexibleSequential(3)
new_checkpoint = tf.train.Checkpoint(model = new_model)
new_checkpoint.restore(chkp_path)

new_model(tf.constant([[2.39, 4.22, .23, .42, .76, .34, 5.43]])).numpy()

array([[ 0.      , 12.247864,  4.819643]], dtype=float32)

### Saving functions

TensorFlow can run models without the original Python objects, as demonstrated by TensorFlow Serving and TensorFlow Lite, even when you download a trained model from TensorFlow Hub.

TensorFlow needs to know how to do the computations described in Python, but without the original code. To do this, you can make a graph, which is described in the Introduction to graphs and functions guide.

This graph contains operations, or ops, that implement the function.

You can define a graph in the model above by adding the @tf.function decorator to indicate that this code should run as a graph.

In [20]:
class FlexibleSequential(tf.Module):
  def __init__(self, out_features, name = None):
    super().__init__(name = name)
    self.dense_1 = FlexibleDense(out_features = 3)
    self.dense_2 = FlexibleDense(out_features = out_features)

  @tf.function
  def __call__(self, x):
    x = self.dense_1(x)
    return self.dense_2(x)

In [21]:
my_model = FlexibleSequential(3)
print(my_model(tf.constant([[1., 2., 1., 3., 1., 2., 1.]])))
print(my_model([[2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]]))
print(my_model([
 [[2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
  [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]]]
))

tf.Tensor([[ 0.      31.62707  0.     ]], shape=(1, 3), dtype=float32)
tf.Tensor([[ 0.       31.389559  0.      ]], shape=(1, 3), dtype=float32)
tf.Tensor(
[[[ 0.       31.389559  0.      ]
  [ 0.       31.389559  0.      ]]], shape=(1, 2, 3), dtype=float32)
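
Calling the decorated model with inputs of different shapes, as above, causes a separate graph to be traced for each new input signature. The sketch below illustrates this with a standalone function; double and trace_log are hypothetical names, and the list is filled by a Python side effect that only runs while tracing.

```python
import tensorflow as tf

trace_log = []  # appended to only while the Python body is being traced

@tf.function
def double(x):
  trace_log.append(x.shape)  # Python side effect: runs at trace time only
  return x * 2.0

double(tf.constant([1.0, 2.0]))
double(tf.constant([3.0, 4.0]))       # same shape and dtype: graph is reused
traces_same_shape = len(trace_log)

double(tf.constant([1.0, 2.0, 3.0]))  # new shape: a new graph is traced
traces_new_shape = len(trace_log)

print(traces_same_shape, traces_new_shape)  # 1 2
```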


### Saving a SavedModel

The recommended way of sharing completely trained models is to use SavedModel. SavedModel contains both a collection of functions and a collection of weights.

In [22]:
tf.saved_model.save(my_model, "my_model")

In [23]:
new_model = tf.saved_model.load("my_model")

In [24]:
print(new_model(tf.constant([[1., 2., 1., 3., 1., 2., 1.]])))
print(new_model([[2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]]))
print(new_model([
 [[2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
  [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]]]
))

tf.Tensor([[ 0.      31.62707  0.     ]], shape=(1, 3), dtype=float32)
tf.Tensor([[ 0.       31.389559  0.      ]], shape=(1, 3), dtype=float32)
tf.Tensor(
[[[ 0.       31.389559  0.      ]
  [ 0.       31.389559  0.      ]]], shape=(1, 2, 3), dtype=float32)


In [25]:
type(new_model)

tensorflow.python.saved_model.load.Loader._recreate_base_user_object.<locals>._UserObject
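
The loaded object is not an instance of the original class, yet it still carries the saved function and weights. The following sketch shows the round trip for a hypothetical Scale module (not part of this notebook), exported to a temporary directory; the input signature is an assumption chosen so that the function can be exported without calling it first.

```python
import os
import tempfile

import tensorflow as tf

class Scale(tf.Module):  # hypothetical module for illustration
  def __init__(self, name=None):
    super().__init__(name=name)
    self.factor = tf.Variable(3.0, name="factor")

  # A fixed input signature lets the function be exported without a prior call.
  @tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.float32)])
  def __call__(self, x):
    return self.factor * x

export_dir = os.path.join(tempfile.mkdtemp(), "scale_model")
tf.saved_model.save(Scale(), export_dir)

contents = sorted(os.listdir(export_dir))
restored = tf.saved_model.load(export_dir)
result = restored(tf.constant([1.0, 2.0])).numpy()

print(contents)  # includes saved_model.pb and a variables directory
print(result)    # [3. 6.]
```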

## Keras Models and Layers

You can build your own high-level API on top of tf.Module, and people have. Keras layers and models have many extra features. These features allow for far more complex models through subclassing, such as a custom GAN or a Variational AutoEncoder (VAE) model. Keras models also come with extra functionality that makes them easy to train, evaluate, load, save, and even train on multiple machines.

### Inheriting from Keras layers

You can convert a module into a Keras layer just by swapping out the parent and then changing __call__ to call.

In [26]:
class MyDense(tf.keras.layers.Layer):
  # Adding **kwargs to support base Keras layer arguments
  def __init__(self, in_features, out_features, **kwargs):
    super().__init__(**kwargs)

    # This will soon move to the build step; see below
    self.w = tf.Variable(
      tf.random.normal([in_features, out_features]), name='w')
    self.b = tf.Variable(tf.zeros([out_features]), name='b')
  def call(self, x):
    y = tf.matmul(x, self.w) + self.b
    return tf.nn.relu(y)

simple_layer = MyDense(name="simple", in_features=3, out_features=3)

In [27]:
simple_layer([[2.0, 2.0, 2.0]])

<tf.Tensor: shape=(1, 3), dtype=float32, numpy=array([[0.      , 0.      , 3.446586]], dtype=float32)>

### Utilising the build method

It's convenient in many cases to wait to create variables until you are sure of the input shape. Keras layers come with an extra lifecycle step that allows you more flexibility in how you define your layers. This is defined in the build function. build is called exactly once, and it is called with the shape of the input. It's usually used to create variables (weights).

In [28]:
class FlexibleDense(tf.keras.layers.Layer):
  # Note the added `**kwargs`, as Keras supports many arguments
  def __init__(self, out_features, **kwargs):
    super().__init__(**kwargs)
    self.out_features = out_features

  def build(self, input_shape):  # Create the state of the layer (weights)
    self.w = tf.Variable(
      tf.random.normal([input_shape[-1], self.out_features]), name='w')
    self.b = tf.Variable(tf.zeros([self.out_features]), name='b')

  def call(self, inputs):  # Defines the computation from inputs to outputs
    return tf.matmul(inputs, self.w) + self.b

In [29]:
# Create the instance of the layer
flexible_dense = FlexibleDense(out_features=3)

In [30]:
flexible_dense.variables

[]

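Since build is only called once, inputs will be rejected if the input shape is not compatible with the layer's variables. The call below is the first one made on flexible_dense, so build creates w with four input features and the call succeeds; a later call with a different number of features would raise tf.errors.InvalidArgumentError: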

In [31]:
try:
  print("Model results:", flexible_dense(tf.constant([[2.0, 2.0, 2.0, 2.0]])))
except tf.errors.InvalidArgumentError as e:
  print("Failed:", e)

Model results: tf.Tensor([[4.4299755 6.7267056 4.680213 ]], shape=(1, 3), dtype=float32)
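
The sketch below shows both halves of the build behaviour: build runs a single time, and a later input with a different feature size is rejected. CountingDense and its build_calls counter are hypothetical additions for illustration, and the exception type assumes the TensorFlow 2.x eager behaviour used in the cell above.

```python
import tensorflow as tf

class CountingDense(tf.keras.layers.Layer):  # hypothetical name
  def __init__(self, out_features, **kwargs):
    super().__init__(**kwargs)
    self.out_features = out_features
    self.build_calls = 0

  def build(self, input_shape):
    self.build_calls += 1
    self.w = tf.Variable(
        tf.random.normal([input_shape[-1], self.out_features]), name="w")
    self.b = tf.Variable(tf.zeros([self.out_features]), name="b")

  def call(self, inputs):
    return tf.matmul(inputs, self.w) + self.b

layer = CountingDense(out_features=3)
layer(tf.ones([1, 4]))  # first call: build runs with 4 input features
layer(tf.ones([2, 4]))  # same feature size: build is not run again

rejected = False
try:
  layer(tf.ones([1, 5]))  # 5 features do not match w of shape (4, 3)
except tf.errors.InvalidArgumentError:
  rejected = True

print(layer.build_calls, rejected)
```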


### Using Keras models

Keras also provides a full-featured model class called tf.keras.Model. It inherits from tf.keras.layers.Layer, so a Keras model can be used and nested in the same way as Keras layers. Keras models come with extra functionality that makes them easy to train, evaluate, load, save, and even train on multiple machines.

You can define the SequentialModule from above with nearly identical code, again converting __call__ to call() and changing the parent:

In [32]:
class FlexibleSequential(tf.keras.Model):
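  def __init__(self, name=None, **kwargs):
    # Note: `name` is accepted here but not forwarded to the parent
    # constructor, so the model keeps its default name.
    super().__init__(**kwargs)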
    self.dense_1 = FlexibleDense(out_features=3)
    self.dense_2 = FlexibleDense(out_features=2)

  def call(self, x):
    x = self.dense_1(x)
    return self.dense_2(x)

# You have made a Keras model!
my_sequential_model = FlexibleSequential(name="the_model")

# Call it on a tensor, with random results
print("Model results:", my_sequential_model(tf.constant([[2.0, 2.0, 2.0]])))

Model results: tf.Tensor([[ 6.0421724 -0.2327581]], shape=(1, 2), dtype=float32)


In [33]:
my_sequential_model.submodules

(<__main__.FlexibleDense at 0x7fc9ee3a1360>,
 <__main__.FlexibleDense at 0x7fc9eb269030>)

In [34]:
my_sequential_model.variables

[<tf.Variable 'flexible_sequential/flexible_dense_1/w:0' shape=(3, 3) dtype=float32, numpy=
 array([[-1.6878377 ,  0.46307197,  0.39740363],
        [ 1.5902584 ,  1.1351705 , -0.52632105],
        [ 1.5269663 , -1.0764346 , -0.50975937]], dtype=float32)>,
 <tf.Variable 'flexible_sequential/flexible_dense_1/b:0' shape=(3,) dtype=float32, numpy=array([0., 0., 0.], dtype=float32)>,
 <tf.Variable 'flexible_sequential/flexible_dense_2/w:0' shape=(3, 2) dtype=float32, numpy=
 array([[ 1.6678193 , -0.5416872 ],
        [ 0.54136133,  0.37101912],
        [-0.5552739 , -0.72697324]], dtype=float32)>,
 <tf.Variable 'flexible_sequential/flexible_dense_2/b:0' shape=(2,) dtype=float32, numpy=array([0., 0.], dtype=float32)>]

### Using Functional API to build a model

If you are constructing models that are simple assemblages of existing layers and inputs, you can save time and space by using the functional API, which comes with additional features around model reconstruction and architecture.

The major difference here is that the input shape is specified up front as part of the functional construction process. The shape passed to tf.keras.Input does not have to be completely specified; you can leave some dimensions as None.

In [35]:
inputs = tf.keras.Input(shape=[3,])

x = FlexibleDense(3)(inputs)
x = FlexibleDense(2)(x)

my_functional_model = tf.keras.Model(inputs=inputs, outputs=x)

my_functional_model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 3)]               0         
                                                                 
 flexible_dense_3 (FlexibleD  (None, 3)                12        
 ense)                                                           
                                                                 
 flexible_dense_4 (FlexibleD  (None, 2)                8         
 ense)                                                           
                                                                 
Total params: 20
Trainable params: 20
Non-trainable params: 0
_________________________________________________________________
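
To illustrate leaving a dimension as None, the sketch below fixes only the feature dimension and feeds inputs of two different lengths through the same functional model. It uses the built-in tf.keras.layers.Dense rather than FlexibleDense to stay self-contained, and the input sizes are arbitrary.

```python
import tensorflow as tf

# Only the last dimension (3 features) is fixed; the batch size and the
# middle dimension are left as None.
inputs = tf.keras.Input(shape=(None, 3))
outputs = tf.keras.layers.Dense(2)(inputs)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

short_out = model(tf.ones([2, 4, 3]))  # batch of 2, length 4
long_out = model(tf.ones([1, 7, 3]))   # batch of 1, length 7

print(short_out.shape, long_out.shape)
```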
