# References

1. Intro to modules, layers and models:<br>https://www.tensorflow.org/guide/intro_to_modules
2. TensorFlow layers:<br>https://www.tensorflow.org/api_docs/python/tf/keras/layers/Layer
3. tf.random.normal function:<br>https://www.tensorflow.org/api_docs/python/tf/random/normal
4. Rectified linear unit:<br>https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/
5. Sequential model:<br>https://keras.io/guides/sequential_model/#:~:text=A%20Sequential%20model

# Necessary libraries

In [2]:
import tensorflow as tf # For accessing TensorFlow library
import numpy as np # For accessing NumPy library

# Overview

In TensorFlow, most high-level implementations of layers and models, such as Keras or Sonnet, are built on the same foundational class defined in TensorFlow, i.e. **Module**. Modules and, by extension, layers and models are deep-learning terminology for objects: they have internal state, and methods that use that state.

# Definitions

## Layer

A layer is a function with a known mathematical structure that
- can be reused
- has trainable variables

In other terms, a layer is a callable object that takes one or more tensors as input and outputs one or more tensors. It involves
- a computation process (defined in the call( ) method)
- a state (a set of weight variables assigned certain values)

## Models

Machine learning in TensorFlow requires the definition, creation, storage and restoration of models. Practically speaking, a model is usually the implementation of some machine learning algorithm. Generally speaking, a model can be defined in one of the two following ways:
- A function that computes something on tensors (i.e. a forward pass)
- A collection of variables that can be updated in response to training

# Creation

Both layers and models are subclasses of the **Module** class. The following shows the definition of a simple **Module** subclass.

In [3]:
class XYZ(tf.Module):
    # Subclass constructor
    def __init__(self, name):
        # Applying the parent class' constructor i.e. Module's constructor
        super().__init__(name=name)
        
        # Trainable variables
        self.trainable1 = tf.Variable([1, 2], name="train me 1")
        self.trainable2 = tf.Variable([1, 2], name="train me 2")
        
        # Non-trainable variable
        self.nontrainable1 = tf.Variable(3, trainable=False, name="don't train me 1")
        self.nontrainable2 = tf.Variable(3, trainable=False, name="don't train me 2")
    #------------------------
    # Defining object behaviour as a callable object.
    # Note that this is not strictly necessary.
    # It simply enables us to define the function or mathematical structure
    # of a layer or model in a manner that is very convenient to call and use,
    # since the object becomes directly callable.
    def __call__(self, x):
        return self.trainable1*x + self.nontrainable2

**Note on '\_\_call\_\_'**<br>
**\_\_call\_\_** is a special method recognised by Python, like **\_\_init\_\_**. The **\_\_call\_\_** method defined in the subclass defines the behaviour of its instances as callable objects. In other words, having defined the **\_\_call\_\_** method in the subclass, you can call any instance of the subclass as a function, and the behaviour of this function is defined by the **\_\_call\_\_** method. Hence, if **m** is an instance of the subclass, **m( )** is equivalent to **m.\_\_call\_\_( )**. Note that the instance has all the functionalities of a normal object as well.
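
To make the equivalence concrete outside TensorFlow, here is a minimal plain-Python sketch. The class `Scaler` and its attribute `factor` are hypothetical names used only for illustration.

```python
class Scaler:
    """Hypothetical callable object: multiplies its input by a stored factor."""

    def __init__(self, factor):
        # State held by the instance
        self.factor = factor

    def __call__(self, x):
        # Behaviour when the instance is called like a function
        return self.factor * x


s = Scaler(factor=3)

# Both forms run the same method
via_call_syntax = s(5)
via_dunder = s.__call__(5)
print(via_call_syntax, via_dunder)

# The instance is still a normal object with attributes
print(s.factor)
```

The two calls return the same value because the call syntax `s(5)` is simply dispatched to `__call__`.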

In [4]:
# Instantiating the subclass
simpleModule = XYZ(name = 'owo')

# Calling the object
simpleModule(3)

<tf.Tensor: shape=(2,), dtype=int32, numpy=array([6, 9], dtype=int32)>

Hence, we see the computation defined in the **\_\_call\_\_** method being performed with the argument **x** set to 3. The return value is a tensor, since the result of an operation on variables is a tensor.

**Note on naming module subclasses**<br>
When providing a name for a **Module** subclass, you must ensure the name is a valid Python identifier, i.e. the name should begin with a letter or an underscore, contain only alphanumeric characters or underscores, and must not contain special characters or whitespace.
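
Python's built-in `str.isidentifier()` gives a quick way to check a candidate name before passing it to a **Module** subclass. This is only a sketch, assuming the rule is exactly the one for Python identifiers as stated above; the candidate names are made up for illustration.

```python
# Hypothetical candidate names for a module
candidates = ["owo", "my_layer_1", "_hidden", "1layer", "my layer", "layer-1"]

# Map each candidate name to whether it is a valid Python identifier
validity = {name: name.isidentifier() for name in candidates}

for name, ok in validity.items():
    print(f"{name!r}: {'valid' if ok else 'invalid'}")
```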

## Layers within models

### Creating a rudimentary neural network layer

In [5]:
class MyLayer(tf.Module):
    def __init__(self, nrow, ncol, name):
        super().__init__(name=name)
        
        # Normally distributed weights...
        self.w = tf.Variable(
            tf.random.normal([nrow, ncol], 2, 4),
            # => tf.random.normal(shape=[nrow, ncol], mean=2, stddev=4)
            name='weights')
        
        # Biases (initialized as zero)...
        self.b = tf.Variable(tf.zeros([nrow, ncol]), name='biases')
    #------------------------
    # Defining layer behaviour as a callable object
    def __call__(self, x):
        y = tf.matmul(x, self.w) + self.b
        # matmul => matrix multiplication
        return tf.nn.relu(y)
        # The above applies the ReLU function to the weighted and biased aggregate y.

**Note on tf.random.normal**<br>
A TensorFlow function that outputs a tensor containing random values from a normal distribution. By default, the normal distribution has _mean = 0, standard deviation = 1_. You can specify the shape of the output, the mean and standard deviation of the distribution, the data type of the values (**dtype**), the seed (for pseudo-random number generation) and the name of the output tensor.
<br><br>
**Note on ReLU**<br>
ReLU, i.e. rectified linear unit, is a piecewise linear function that outputs the input directly if it is positive, and outputs zero otherwise. Note that in a neural network, the activation function is responsible for transforming the summed weighted input of the node into either
- the activation status of the node
- output of the node for the given input
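
As a quick illustration of the definition, ReLU can be written as an element-wise maximum with zero. The sketch below uses NumPy rather than `tf.nn.relu`; the helper name `relu` and the sample values are assumptions for illustration.

```python
import numpy as np


def relu(x):
    # Element-wise max(x, 0): positive values pass through, the rest become 0
    return np.maximum(x, 0.0)


sample = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
activated = relu(sample)
print(activated)
```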

### Creating a sequential model

In [6]:
class MyModel(tf.Module):
    def __init__(self, name):
        super().__init__(name=name)
        self.layer1 = MyLayer(nrow=3, ncol=3, name=name)
        self.layer2 = MyLayer(nrow=3, ncol=2, name=name)
    #------------------------
    # Defining model behaviour as a callable object
    def __call__(self, x):
        x = self.layer1(x)
        return self.layer2(x)

**Note on sequential model**<br>
A sequential model is a model containing a simple stack of layers, where each layer has exactly one input tensor and one output tensor, and where the output of one layer becomes the input of the next layer in the stack in a sequential manner.
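
The idea of a stack, where each output feeds the next input, can be sketched in plain Python as a loop over callables. The function `run_sequential` and the two toy "layers" are hypothetical and merely stand in for real layer objects.

```python
def run_sequential(layers, x):
    # Feed the output of each layer into the next one, in order
    for layer in layers:
        x = layer(x)
    return x


# Two toy "layers": any callable taking one input and returning one output
def double(v):
    return 2 * v


def add_one(v):
    return v + 1


result = run_sequential([double, add_one], 5)  # computes add_one(double(5))
print(result)
```

Swapping the order of the list changes the result, which is why the order of layers in a sequential model matters.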

In [7]:
# Instantiating the subclass i.e. creating the model
m = MyModel(name="model")

**Note on input & output of the model**<br>
Note that in the above case, we cannot pass an argument of arbitrary shape when calling the model. This is because the model calls the layer instances, which in turn apply matrix multiplication between the argument and the layer's weights, which are arranged in a specified shape. We must consider how matrix addition with the biases affects the result of the matrix multiplication. We must also consider that the output of one layer must be compatible as the input for the next layer. Furthermore, the data types of the argument and the weights must match. Since the weights are of float data type, the argument passed should also be of float data type.
<br><br>
Each layer instantiated within the model has a weight matrix with 3 rows, which means any input we provide to the model must have exactly 3 columns, so that matrix multiplication is possible between the argument and the matrix of weights. The subsequent addition of the biases relies on broadcasting, which works only if each dimension of the matrix-multiplication result either equals the corresponding dimension of the bias matrix or is 1. In other words, the result must be one of the following:
1. A single value (a 1x1 matrix)
2. A matrix with the same number of columns as the bias matrix, and either 1 row or the same number of rows
3. A matrix with the same number of rows as the bias matrix, and either 1 column or the same number of columns

_(Broadcasting is discussed in the document on tensors)_<br>
The biases of the first layer are in a 3x3 matrix. Since the input must have exactly 3 columns, its product with the 3x3 weights also has 3 columns, so the input can have either 1 row or 3 rows for the addition to broadcast. In either case, the output of layer 1 will be a 3x3 matrix, since any 1x3 matrix is broadcast into a 3x3 matrix by the addition with the 3x3 bias matrix (for a demonstration, check the demo code section at the end of the document). Any 3x3 matrix input to layer 2 is converted to a 3x2 matrix through matrix multiplication with the weights of layer 2, which are in a 3x2 matrix. This can be added directly to the biases of layer 2, which are also in a 3x2 matrix.
<br><br>
_Hence, for the model, we can only input tensors of shape 1x3 or 3x3. The final output will be a tensor of shape 3x2._
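
The shape reasoning above can be checked without running the model. The sketch below uses NumPy, whose broadcasting rules TensorFlow follows, with zero-filled arrays standing in for the weights and biases (shapes copied from `MyModel`). The helper `output_shape` is hypothetical, and the ReLU step is omitted because it does not change shapes.

```python
import numpy as np


def output_shape(n_rows):
    """Trace the shape of an (n_rows x 3) input through both layers."""
    x = np.zeros((n_rows, 3))
    w1, b1 = np.zeros((3, 3)), np.zeros((3, 3))  # layer 1: weights and biases
    w2, b2 = np.zeros((3, 2)), np.zeros((3, 2))  # layer 2: weights and biases
    h = x @ w1 + b1  # (n_rows x 3) + (3 x 3): broadcasts only for 1 or 3 rows
    y = h @ w2 + b2  # (3 x 3) @ (3 x 2) + (3 x 2)
    return y.shape


shapes = {}
for n in (1, 2, 3):
    try:
        shapes[n] = output_shape(n)
    except ValueError as err:  # NumPy reports a broadcasting failure
        shapes[n] = None
        print(f"{n} rows: {err}")

print(shapes)
```

Inputs with 1 or 3 rows pass through both layers, while an input with 2 rows fails at the first bias addition.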

In [8]:
# Calling the model...
print("Result 1:", m(tf.constant([[2.3, 4.1, 5.4]]))) # Tensor with 1 row and 3 columns
print("Result 2:", m(tf.constant([[1, 2, 3],
                                  [1, 1, 1],
                                  [4, 0, 5]], dtype=tf.float32)))

Result 1: tf.Tensor(
[[ 62.47554 395.267  ]
 [ 62.47554 395.267  ]
 [ 62.47554 395.267  ]], shape=(3, 2), dtype=float32)
Result 2: tf.Tensor(
[[ 39.805847 225.71663 ]
 [  6.347664  68.57765 ]
 [113.83667  461.36655 ]], shape=(3, 2), dtype=float32)


In [9]:
# Calling a model with the wrong shape of arguments...
try:
    # A 2x3 input cannot be broadcast against the 3x3 biases of layer 1
    print("Result 3:", m(tf.constant(np.array([[1, 2, 3], [4, 0, 5]]), dtype=tf.float32)))
except Exception as e:
    print(e)

Incompatible shapes: [2,3] vs. [3,3] [Op:AddV2]


## Automatically generated collections

In any subclass of **Module**, any **Variable** or **Module** instances assigned as attributes of an instance are automatically collected for that instance. In other words, **Module** instances automatically and recursively collect any **Variable** or **Module** instances assigned to them. These collections include _trainable\_variables_, _variables_ and _submodules_.
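
To give an intuition for recursive collection, here is a toy sketch in plain Python. It is not how `tf.Module` is implemented; the classes `Var` and `Node` and the function `collect_vars` are hypothetical stand-ins for **Variable**, **Module** and the _variables_ collection.

```python
class Var:
    """Hypothetical stand-in for a variable."""

    def __init__(self, name):
        self.name = name


class Node:
    """Hypothetical stand-in for a module: holds Vars and other Nodes as attributes."""


def collect_vars(node):
    # Walk the instance attributes and descend into nested nodes
    found = []
    for value in vars(node).values():
        if isinstance(value, Var):
            found.append(value)
        elif isinstance(value, Node):
            found.extend(collect_vars(value))
    return found


layer1, layer2, model = Node(), Node(), Node()
layer1.w, layer1.b = Var("w1"), Var("b1")
layer2.w, layer2.b = Var("w2"), Var("b2")
model.layer1, model.layer2 = layer1, layer2  # the model holds no Var directly

names = [v.name for v in collect_vars(model)]
print(names)
```

Even though `model` has no variables assigned to it directly, the walk reaches the variables of both nested nodes.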

In [10]:
print("From 'simpleModule' i.e. instance of 'XYZ':\n------------")
print("\nTrainable variables:")
for x in simpleModule.trainable_variables: print(x)
print("\nAll variables:")
for x in simpleModule.variables: print(x)
# (simpleModule has variables but no submodules)
print("========================")
print("From 'm' i.e. instance of 'MyModel':\n------------")
print("\nAll submodules:")
for x in m.submodules: print(x)
# (m has submodules but no variables assigned to it directly;
#  m.variables would still collect the variables of its layers)

From 'simpleModule' i.e. instance of 'XYZ':
------------

Trainable variables:
<tf.Variable 'train me 1:0' shape=(2,) dtype=int32, numpy=array([1, 2], dtype=int32)>
<tf.Variable 'train me 2:0' shape=(2,) dtype=int32, numpy=array([1, 2], dtype=int32)>

All variables:
<tf.Variable 'don't train me 1:0' shape=() dtype=int32, numpy=3>
<tf.Variable 'don't train me 2:0' shape=() dtype=int32, numpy=3>
<tf.Variable 'train me 1:0' shape=(2,) dtype=int32, numpy=array([1, 2], dtype=int32)>
<tf.Variable 'train me 2:0' shape=(2,) dtype=int32, numpy=array([1, 2], dtype=int32)>
From 'm' i.e. instance of 'MyModel':
------------

All submodules:
<__main__.MyLayer object at 0x1450849a0>
<__main__.MyLayer object at 0x109e56070>


This allows you to save and load variables, and also create collections of **Module** instances. This opens doors to features like:
- Managing collections of **Modules** with a single model instance
- Saving whole models and model states
- Adding layers to models

# Demo code

## Broadcasting by matrix addition

In [11]:
x = tf.constant(np.array([[1, 2.0, 3]]), dtype=tf.float32)
w = tf.Variable(tf.random.normal([3, 3], 2, 4))
b = tf.zeros([3, 3])
#------------------------
print("\nBefore adding b:")
print(tf.matmul(x, w))
print("\nAfter adding b: ")
print(tf.matmul(x, w) + b)


Before adding b:
tf.Tensor([[17.946648    0.02306306 13.761355  ]], shape=(1, 3), dtype=float32)

After adding b: 
tf.Tensor(
[[17.946648    0.02306306 13.761355  ]
 [17.946648    0.02306306 13.761355  ]
 [17.946648    0.02306306 13.761355  ]], shape=(3, 3), dtype=float32)
