# Modules, Layers, and Models

In this session we will look at how the following differ in their functionality:
- tf.Module
- tf.keras.layers.Layer
- tf.keras.Model


## tf.Module

The basic building block for constructing a neural network model in TensorFlow is the [**Module**](https://www.tensorflow.org/api_docs/python/tf/Module). This is a low-level TensorFlow class that by itself does not provide all the functionality of [Keras](https://keras.io), the high-level abstraction library integrated into TensorFlow, which is now also available as an API for other backends such as PyTorch and JAX. These higher-level tools nevertheless build on the Module and only extend its functionality.

What does a module do?

A neural network is both a **collection of computational operations on some inputs** and a **set of trainable or non-trainable parameters**. A module organizes this structure in a way that makes it **easy to access all variables**. To use it, we subclass the base tf.Module class, creating a new class that inherits its properties.

It works like this:

```python
class MyNetwork(tf.Module):
    def __init__(self, architecture_arguments, name=None):
        super().__init__(name=name)
```
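(In Python 3 the argument-free `super().__init__(name=name)` is enough. Only the older form `super(MyNetwork, self).__init__(name=name)` requires its first argument to match the name of the class.)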

Now we can start to add variables to the module in the constructor (`__init__`). To do so, we use **tf.Variable** as shown below and specify whether the variable should be flagged as trainable:

```python
class MyNetwork(tf.Module):
    def __init__(self, architecture_arguments=None, name=None):
        super().__init__(name=name)
        
        self.my_variable = tf.Variable([1.,1.5,1.2,3.], trainable=True)
    
```


Here the variable is a rank-1 tensor of shape (4,). Now that we have a non-empty model class, we can see what makes this module structure useful: we can collect and access all stored variables at once through the "model.variables" property.

This makes updating variables (usually weights and biases) convenient, even when they are contained in nested data structures and submodules. We can also obtain only those variables flagged as trainable with "model.trainable_variables".
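To make this concrete, here is a minimal sketch of a single gradient-descent step that relies on `trainable_variables`. The class `LinearModel`, the toy data and the learning rate are made up for illustration; the update uses `tf.GradientTape` and `assign_sub`, and the non-trainable counter shows that only trainable variables receive gradients.

```python
import tensorflow as tf

class LinearModel(tf.Module):
    """Hypothetical one-layer model used only for this illustration."""
    def __init__(self, name=None):
        super().__init__(name=name)
        # trainable weights, updated by gradient descent
        self.w = tf.Variable([[0.0], [0.0]], trainable=True, name="w")
        # non-trainable: listed in .variables but not in .trainable_variables
        self.step_count = tf.Variable(0, trainable=False, name="step_count")

    def __call__(self, x):
        return x @ self.w

model = LinearModel()
x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y = tf.constant([[1.0], [2.0]])
learning_rate = 0.01

def mse(model):
    # mean squared error of the model on the toy data
    return tf.reduce_mean(tf.square(model(x) - y))

loss_before = mse(model)
with tf.GradientTape() as tape:
    loss = mse(model)
# one gradient per trainable variable, in the same order
grads = tape.gradient(loss, model.trainable_variables)
for var, grad in zip(model.trainable_variables, grads):
    var.assign_sub(learning_rate * grad)
model.step_count.assign_add(1)
loss_after = mse(model)
print(float(loss_before), float(loss_after))
```

Because the loop iterates over `model.trainable_variables`, the same update code works unchanged however deeply the variables are nested inside submodules.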

```python
import tensorflow as tf
# set a global seed for the random number generator to make results reproducible
tf.random.set_seed(10)

class MyNetwork(tf.Module):
    def __init__(self, architecture_arguments=None, name=None):
        super().__init__(name=name)

        self.my_variable = tf.Variable([1.,1.5,1.2,3.], trainable=True)

        # any nested data structure containing tf.Variables works!
        self.variable_dict = {"A": ([tf.Variable([3.,2.], trainable=False)]),
                              "B": [[[[[tf.Variable([1.21])]]]]],
                              "C": {"B1": tf.Variable([9.,2.])}}

model = MyNetwork()

# all variables, trainable or not; model.trainable_variables returns
# the same tuple without the non-trainable variable stored under "A"
model.variables
```

```
(<tf.Variable 'Variable:0' shape=(4,) dtype=float32, numpy=array([1. , 1.5, 1.2, 3. ], dtype=float32)>,
 <tf.Variable 'Variable:0' shape=(2,) dtype=float32, numpy=array([3., 2.], dtype=float32)>,
 <tf.Variable 'Variable:0' shape=(1,) dtype=float32, numpy=array([1.21], dtype=float32)>,
 <tf.Variable 'Variable:0' shape=(2,) dtype=float32, numpy=array([9., 2.], dtype=float32)>)
```

By itself this model does not yet do anything with the variables. The computation is defined in a separate call method, where we specify how the inputs and the variables are combined in a sequence of operations to produce an output. For a tf.Module we name this method `__call__`, so that the model instance itself can be called like a function. A simple call method could look like this:

```python
class MyNetwork(tf.Module):
    def __init__(self, architecture_arguments=None, name=None):
        super().__init__(name=name)

        # shape (1, 4): a matrix, so that it can take part in a matrix product
        self.my_variable = tf.Variable([[1.,1.5,1.2,3.]], trainable=True)

    @tf.function
    def __call__(self, inputs, training=False):

        x = inputs @ tf.transpose(self.my_variable)

        return tf.nn.relu(x)
```

Here we've given the call method a "training" argument that defaults to False, because later on we will want some computations to differ between training and inference time.

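We have also used [**tf.function**](https://www.tensorflow.org/api_docs/python/tf/function) as a decorator. On the first call, this traces the Python code into a TensorFlow graph, which TensorFlow can optimize and then reuse on later calls with inputs of the same shape and dtype. This is usually faster than executing the operations eagerly one by one.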
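A small sketch of what tracing means in practice, using a hypothetical function `scaled_relu`: a Python side effect (here, appending to a list) runs only while tf.function traces the function, so counting the appends shows when a graph is built and when an existing one is reused.

```python
import tensorflow as tf

traces = []  # filled only while tf.function traces the Python body

@tf.function
def scaled_relu(x):
    # plain Python code like this append runs at trace time only,
    # not when an already traced graph is reused
    traces.append(x.shape)
    return tf.nn.relu(2.0 * x)

scaled_relu(tf.constant([1.0, -1.0]))   # first call: traces a graph
scaled_relu(tf.constant([3.0, -4.0]))   # same shape and dtype: reuses the graph
scaled_relu(tf.constant([[1.0]]))       # new input shape: traces again
print(len(traces))
```

Inputs with a new shape or dtype trigger a new trace, which is why tf.function pays off most when it is called many times with inputs of the same signature.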

Let's test the model on randomly generated input:

```python
class MyNetwork(tf.Module):
    def __init__(self, architecture_arguments=None, name=None):
        super().__init__(name=name)

        self.my_variable = tf.Variable([[1., 1.5, 1.2, 3.]], trainable=True)

    @tf.function
    def __call__(self, inputs, training=False):

        x = inputs @ tf.transpose(self.my_variable)

        return tf.nn.relu(x)

model = MyNetwork()

# a "batch size" of 1 example with 4 features
input_data = tf.random.uniform((1,4))

out = model(input_data)
```

Any module can encapsulate other modules, and we can still access all their variables, trainable or non-trainable, in the same way. To see this, let's first generalize the module so that its input and output dimensions are constructor arguments. If the output dimension equals the input dimension, we can even apply the module repeatedly to its own output.

```python
class MyModule(tf.Module):
    def __init__(self, input_dim, output_dim, name=None):
        super().__init__(name=name)

        self.my_variable = tf.Variable(tf.random.normal(shape=(input_dim, output_dim)), trainable=True, name=name)

    @tf.function
    def __call__(self, inputs, training=False):

        x = inputs @ self.my_variable

        return tf.nn.relu(x)

# keep input dimensionality for output
model = MyModule(input_dim=4, output_dim=4)

input_data = tf.random.uniform((1,4))

# apply the model multiple times
model(model(model(model(model(input_data)))))
```

We can now stack three of these modules to build a multilayer perceptron (MLP), here without biases:

```python
class MyNetwork(tf.Module):
    def __init__(self, input_dim, output_dim, name=None):
        super().__init__(name=name)

        self.submodule_1 = MyModule(input_dim = input_dim, output_dim = 10, name="layer1")
        self.submodule_2 = MyModule(input_dim = 10, output_dim = 20, name="layer2")
        self.submodule_3 = MyModule(input_dim = 20, output_dim = output_dim, name="layer3")

    @tf.function
    def __call__(self, inputs, training=False):

        x = self.submodule_1(inputs, training)
        x = self.submodule_2(x, training)
        output = self.submodule_3(x, training)

        return output
```

Here we have passed the training argument down to the submodules in the call method. It does not do anything yet, but we will need it later. We also wrote the class so that it can build MLPs for different input and output sizes. Finally, since it can become very difficult to keep track of which variable belongs to which submodule, the convention is to give submodules **names**.
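As a sketch of why the training argument matters, here is a hypothetical `DropoutModule` that applies `tf.nn.dropout` only when `training=True` and passes its inputs through unchanged at inference time. Dropout is our own choice of example here.

```python
import tensorflow as tf

class DropoutModule(tf.Module):
    """Hypothetical module whose behaviour depends on the training flag."""
    def __init__(self, rate, name=None):
        super().__init__(name=name)
        self.rate = rate

    def __call__(self, inputs, training=False):
        if training:
            # randomly zero a fraction `rate` of the entries and rescale
            # the remaining ones by 1 / (1 - rate)
            return tf.nn.dropout(inputs, rate=self.rate)
        # at inference time the inputs pass through unchanged
        return inputs

drop = DropoutModule(rate=0.5)
x = tf.ones((2, 8))
print(drop(x, training=False))  # identical to x
print(drop(x, training=True))   # every entry is either 0.0 or 2.0
```

A parent module only has to forward its own `training` flag, as in the MLP above, for every such submodule to switch behaviour consistently.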

```python
kwargs = {"input_dim" : 4,
          "output_dim": 1}

model = MyNetwork(**kwargs)

input_data = tf.random.uniform((1,4))

print(model(input_data))

print(model.trainable_variables)
```
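To see which variables belong to which named submodule, the `submodules` property of tf.Module walks the whole tree of nested modules. The sketch below assumes two hypothetical classes, `Linear` and `TwoLayerNet`, that mirror the MLP above in a smaller form.

```python
import tensorflow as tf

class Linear(tf.Module):
    """Hypothetical bias-free linear layer."""
    def __init__(self, input_dim, output_dim, name=None):
        super().__init__(name=name)
        self.w = tf.Variable(tf.random.normal((input_dim, output_dim)), name="w")

    def __call__(self, x):
        return x @ self.w

class TwoLayerNet(tf.Module):
    def __init__(self, name=None):
        super().__init__(name=name)
        self.layer1 = Linear(4, 10, name="layer1")
        self.layer2 = Linear(10, 1, name="layer2")

    def __call__(self, x):
        return self.layer2(tf.nn.relu(self.layer1(x)))

net = TwoLayerNet()
# .submodules lists every nested tf.Module, each with its own variables
for module in net.submodules:
    print(module.name, [v.shape for v in module.trainable_variables])
```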

One more feature of the tf.Module class is its name scope: a context manager that automatically prefixes the names of all variables created inside it with the module's name, which helps with modules that hold several variable tensors. It can be used as follows:

```python
class MyModule(tf.Module):
    def __init__(self, input_dim, output_dim, name=None):
        super().__init__(name=name)

        # self.name_scope uses the module's name, which defaults to the
        # snake_case class name ("my_module") if no name is given
        with self.name_scope:
            self.weights = tf.Variable(tf.random.normal(shape=(input_dim, output_dim)),
                                       trainable=True, name="weights")
            self.bias = tf.Variable(tf.zeros(shape=(1, output_dim)), trainable=True,
                                    name="bias")

    @tf.function
    def __call__(self, inputs, training=False):

        x = inputs @ self.weights + self.bias

        return tf.nn.relu(x)

dense = MyModule(input_dim=4, output_dim=1, name="dense_layer")

# the variable names are prefixed with "dense_layer/"
dense.trainable_variables
```

By subclassing the tf.Module class and building your own submodules, you can build any neural network architecture that you like. There is no _need_ to use predefined Keras layers or the Model class since these only provide extra functionality on top of what tf.Module provides.

## The Layer Class

In tf.keras (Keras 2), the tf.keras.layers.Layer base class inherits from tf.Module and extends its functionality in several ways.

Perhaps the most important extension is a separate [**build method**](https://www.tensorflow.org/guide/intro_to_modules#keras_models_and_layers), which keeps a layer and its variables agnostic to the input shape until the layer is first called on a specific input. This means that the constructor no longer creates the variables, only the general structure of the layer (e.g. its hyperparameters and which submodules are part of it).
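To see what the build method adds, the same lazy-creation pattern can be written by hand with a plain tf.Module. This is a sketch with a hypothetical `LazyDense` class and `_build` helper, not how Keras implements build internally: the variables are only created on the first call, once the input shape is known.

```python
import tensorflow as tf

class LazyDense(tf.Module):
    """Hypothetical tf.Module that mimics a Keras build step by hand."""
    def __init__(self, n_units, name=None):
        super().__init__(name=name)
        self.n_units = n_units
        self.is_built = False  # no variables exist yet

    def _build(self, input_shape):
        # variable shapes can only be chosen once the input has been seen
        self.w = tf.Variable(tf.random.normal((input_shape[-1], self.n_units)), name="w")
        self.b = tf.Variable(tf.zeros((self.n_units,)), name="b")
        self.is_built = True

    def __call__(self, inputs):
        if not self.is_built:
            self._build(inputs.shape)
        return tf.nn.relu(inputs @ self.w + self.b)

layer = LazyDense(n_units=10)
print(len(layer.variables))  # no variables before the first call
layer(tf.random.uniform((32, 16)))
print([v.shape for v in layer.variables])
```

Keras layers take care of this bookkeeping (the first-call check and calling build with the input shape) for you.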

```python
class Dense(tf.keras.layers.Layer):
    def __init__(self, n_units, activation_function, **kwargs):
        super().__init__(**kwargs)
        # no variables created
        self.n_units = n_units
        self.activation_function = activation_function

    def build(self, input_shape):
        # add_weight is the idiomatic way to create layer variables and
        # registers them with the layer in both Keras 2 and Keras 3
        self.w = self.add_weight(name="weights", shape=(input_shape[-1], self.n_units),
                                 initializer="random_normal", trainable=True)
        self.b = self.add_weight(name="bias", shape=(self.n_units,),
                                 initializer="zeros", trainable=True)

    def call(self, inputs):
        x = inputs @ self.w + self.b
        return self.activation_function(x)

# instantiate the layer
dense_layer = Dense(n_units=10, activation_function=tf.nn.relu)

# it has no variables
print(f"dense_layer variables:{dense_layer.variables}")

# call it on an input to create weights suitable for this input
dense_layer(tf.random.uniform(shape=(32,16)))

# now the weights have shape (16, 10) and the bias shape (10,)
print([v.shape for v in dense_layer.variables])
```


Besides the build method, Keras layers come with a number of extra features that we will explore later in the course:

- adding and collecting regularization losses
- metrics and bookkeeping
- handling masking and training arguments
- access submodules in a structured way
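The first item, regularization losses, can be sketched with the built-in `tf.keras.layers.Dense` layer and an L2 penalty on its kernel (the factor 0.01 is an arbitrary choice). After the layer is built, the penalty term appears in `layer.losses`, from where a training loop can add it to the main loss.

```python
import tensorflow as tf

# a built-in Dense layer with an L2 penalty on its kernel
layer = tf.keras.layers.Dense(3, kernel_regularizer=tf.keras.regularizers.L2(l2=0.01))
layer(tf.ones((2, 4)))  # calling the layer builds its kernel

# the layer collects the penalty term itself
print(layer.losses)

# the same quantity computed by hand: 0.01 * sum of squared kernel entries
expected = 0.01 * tf.reduce_sum(tf.square(layer.kernel))
```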

## The Model Class

The Keras Model class inherits from the Layer class and adds even more functionality for convenient use, which makes it a great fit for standard applications.

It allows you to:

- compile the model with a loss, an optimizer, and metrics
- train the model with the fit method
- use predefined methods for inference (predict) and evaluation (evaluate)
- save the model's weights
- access all layers in the model
- keep the optimizer and the training logic within the model
- keep training and validation metrics within the model
- use models as submodules of further modules

Additionally, the [Functional API](https://www.tensorflow.org/guide/keras/functional?hl=en) lets us create models without subclassing.

In short: You can have everything you need for training in one object (except for the data itself) and make many routine things faster.
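A minimal sketch of that workflow, assuming a made-up linear regression problem (y = 3x + 1 plus noise) and arbitrary hyperparameters: compile attaches the loss, optimizer and metric to the model, after which fit, evaluate and predict work without a hand-written training loop.

```python
import numpy as np
import tensorflow as tf

# toy regression data: y = 3 * x + 1 plus a little noise (made up for illustration)
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(256, 1)).astype("float32")
y = (3.0 * x + 1.0 + 0.05 * rng.normal(size=(256, 1))).astype("float32")

model = tf.keras.Sequential([tf.keras.Input(shape=(1,)), tf.keras.layers.Dense(1)])

# loss, optimizer and metrics now live inside the model object
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
              loss="mse", metrics=["mae"])

history = model.fit(x, y, epochs=20, batch_size=32, verbose=0)
loss, mae = model.evaluate(x, y, verbose=0)
predictions = model.predict(x[:3], verbose=0)
print(loss, mae)
```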

## Why use the low-level features of TensorFlow if you have the convenient tools that Keras comes with?

It is very convenient to just use the high-level Model class and reduce deep learning to a few lines of code and a call to model.fit(). Nevertheless, it is very important to know what is going on at a lower level, so that you can implement non-standard applications.

Keras tightly integrates its high-level features with low-level control, which makes many of these non-standard applications possible. For instance, you can write a custom tf.Module and use it as a submodule in a tf.keras.layers.Layer or tf.keras.Model. You could even take a model that you have written at a very low level and encapsulate it in the Model class to obtain convenient training and evaluation methods.

Since you will not need most of the functionality provided by the layer and model classes, you will learn the most by restricting yourself to the tf.Module class. You can still include Keras layers such as the Dense layer or the Conv2D layer within your tf.Module model.

Focusing on the tf.Module class at first (or at least restricting your use of the Keras Model class to the functionality that tf.Module also provides) also helps you understand code in other frameworks such as [PyTorch](https://pytorch.org/), which do not have the Keras layer and model abstractions. For instance, in PyTorch you typically specify both the input and the output dimensions of a layer explicitly, whereas Keras layers infer the input dimension automatically in their build method. This convenience can make it harder to fix mismatched shapes in a model, because you come to rely on Keras to determine the dimensions of the variables.
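The shape inference described above can be seen with the built-in `tf.keras.layers.Dense` layer, which takes only the number of output units (in PyTorch, the comparable `nn.Linear(4, 3)` takes both sizes up front). The sketch below shows that the input dimension is fixed on the first call, so a later input with a different number of features raises an error. The exact exception type depends on the Keras version, so the sketch catches any exception.

```python
import tensorflow as tf

# only the output size is given; the input size is inferred on the first call
layer = tf.keras.layers.Dense(3)
layer(tf.ones((2, 4)))
print(layer.kernel.shape)  # (4, 3)

# after building, the inferred input size is fixed:
# feeding 5 features instead of 4 now fails
try:
    layer(tf.ones((2, 5)))
    mismatch_raised = False
except Exception as err:  # e.g. ValueError, depending on the Keras version
    mismatch_raised = True
    print(type(err).__name__, err)
```

Reading such an error is easier if you know that the kernel shape was fixed silently by the first input the layer ever saw.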

For this reason, having some experience without the nice tools that Keras provides will equip you with the right perspective for when you will be working with much more advanced models in the future and need to debug them.


**TL;DR: Keras is great and we love it but you need to work with the low level features to learn how to debug complex models (important for your projects).**