# Multiplying 2 numbers with a custom layer

In this notebook I create a simple 3-node network that multiplies 2 numbers. It implements an actual multiplication operation in a custom layer (as opposed to mimicking multiplication with a deep network). To make the layer more flexible, I give it trainable weights that are used as exponents for each of the input values. Thus, the single output from the layer is given by:

$$
y = b + \prod_{i=0}^{N-1} x_{i}^{w_{i}}
$$

where:<br>
$b$ is the bias (which I fix at 0 for this example)<br>
$x_{i}$ are the inputs<br>
$w_{i}$ are the weights<br>
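
As a quick numeric check of the formula, here is a minimal NumPy sketch. The helper name `product_output` is hypothetical and not part of the notebook. With every exponent equal to 1 and $b=0$, the expression reduces to plain multiplication of the inputs.

```python
import numpy as np

def product_output(x, w, b=0.0):
    """Evaluate y = b + prod_i x_i ** w_i for one set of inputs."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    return b + np.prod(x ** w)

# Exponents of 1 give the plain product 3 * 4.
print(product_output([3.0, 4.0], [1.0, 1.0]))
# Other exponents change the operation: sqrt(4) * 3**2.
print(product_output([4.0, 3.0], [0.5, 2.0]))
```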

The custom layer accepts a "units" argument for the number of outputs (1 here; the number of inputs is taken from the input shape in build()), an optional "trainable" argument for whether the weights should be adjusted during the training of the full model, and another optional "initial_exponent" argument so the caller can set the initial value of the exponents (i.e. weights). This is all done in the ProductLayer class defined in the following cell.

In addition to defining the custom layer, I define a simple model that uses it and then test it with a small set of inputs right at the end of the cell.

In [1]:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.optimizers import Adadelta
import tensorflow.keras.backend as K
import tensorflow as tf

NINPUTS = 2

#-----------------------------------------------------
# ProductLayer
#-----------------------------------------------------
# This defines a layer that takes the product of the inputs,
# each raised to the power of its weight. The trainable
# parameter can be set to False to make it non-trainable.
# n.b. If you make this trainable, the inputs cannot be
# negative numbers!
# See details on this in the following cell.
class ProductLayer(tf.keras.layers.Layer):
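    def __init__(self, units=1, trainable=True, initial_exponent=2.01, **kwargs):
        # Forward trainable and the standard Layer kwargs (name, dtype, ...)
        # so the layer can be rebuilt from get_config() when a model is loaded.
        super(ProductLayer, self).__init__(trainable=trainable, **kwargs)
        self.units            = units
        self.initial_exponent = initial_exponent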
        
    def build(self, input_shape):
        print('input_shape='+str(input_shape))
        myinitializer   = tf.keras.initializers.Constant(self.initial_exponent)
        self.w = self.add_weight(
            shape       = (self.units, input_shape[-1]),
            initializer = myinitializer,
            trainable   = self.trainable,
        )
        self.b = self.add_weight(
            shape       = (self.units,),
            initializer = "zeros",
            trainable   = False
        )

    def call(self, inputs):
        # inputs has shape (None, 2)
        # self.w has shape (1, 2)
        # tmp has shape (None, 2)
        # output has shape (None, 1)
        tmp = K.pow(inputs, self.w)
        myout = K.prod(tmp, keepdims=True, axis=1) + self.b
        print('inputs.shape: ' + str(inputs.shape))
        print('self.w.shape: ' + str(self.w.shape))
        print('   tmp.shape: ' + str(tmp.shape))
        print(' myout.shape: ' + str(myout.shape))
        return myout

    def get_config(self):
        config = super(ProductLayer, self).get_config()
        config.update({"units": self.units, "trainable": self.trainable, "initial_exponent": self.initial_exponent})
        return config

#-----------------------------------------------------
# DefineModel
#-----------------------------------------------------
# This is used to define and compile the model.
def DefineModel():

    # Build the network model with 2 inputs and one output.
    inputs = Input(shape=(NINPUTS,), name='inputs')
    output = ProductLayer(1)(inputs)
    model  = Model(inputs=inputs, outputs=output)
    
    opt = Adadelta(clipnorm=1.0)
    model.compile(loss='mse', optimizer=opt, metrics=['mae', 'mse', 'accuracy'])

    return model


model = DefineModel()
model.summary()

x = tf.ones((3,2), dtype=tf.dtypes.float32)*(3.0001)
y = model(x)
print(y)

input_shape=(None, 2)
inputs.shape: (None, 2)
self.w.shape: (1, 2)
   tmp.shape: (None, 2)
 myout.shape: (None, 1)
Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
inputs (InputLayer)          [(None, 2)]               0         
_________________________________________________________________
product_layer (ProductLayer) (None, 1)                 3         
Total params: 3
Trainable params: 2
Non-trainable params: 1
_________________________________________________________________
inputs.shape: (3, 2)
self.w.shape: (1, 2)
   tmp.shape: (3, 2)
 myout.shape: (3, 1)
tf.Tensor(
[[82.810524]
 [82.810524]
 [82.810524]], shape=(3, 1), dtype=float32)


## Breakdown of ProductLayer

The ProductLayer class implements a custom layer in Keras. It has 4 methods described below. 

1. **\_\_init\_\_()**: This is the standard Python constructor which gets called when the object is created. If there are any options given when the object is instantiated, they will be passed in here. The constructor just needs to save them as part of the object so they can be used in the other callback methods where the actual work is done.

2. **build()**: This is called automatically the first time the call() method is called. Keras handles this: it is what actually calls "call()", and it recognizes that it needs to call "build()" first. This method is responsible for declaring any "weights" in the layer. I quote "weights" since you may notice that the same add_weight() method is called to add the bias values. It is really a way of declaring to Keras the values the layer owns, whether they are trainable or not. The build() method is actually optional since you could define a layer that does not have any weights. Note that in this example I initialize the bias weights to 0 and then set them as non-trainable. This is really overkill, as I could get the same effect by not adding any bias weights at all. I left it in, though, to emphasize that any number of "weights" could be added here.

3. **call()**: This is called to create an output Tensor (with a capital-T) based on some given inputs. This is where I actually implement the math the layer will do. When the model is constructed, Keras calls it once with symbolic inputs (the batch dimension is *None*) to record the set of operations that should be performed on the inputs in order to produce the outputs; no actual numbers are computed during that call. It is called again when the model is applied directly to concrete tensors, which is why the shapes are printed twice in the output above. Since Keras+Tensorflow know the operations, they also know their derivatives, which can be chained together to backpropagate during training. Most of my time here is spent on checking the shapes of the inputs to each operation to make sure the automatic looping over nodes is done correctly. See more details below.

4. **get_config()**: This is needed when saving the model so it can get the parameters needed to configure the layer when it is loaded later.
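
To make the ordering of these methods concrete, here is a minimal sketch with a hypothetical toy layer (`ScaleLayer` and its `build_calls` counter are illustrative names, not part of this notebook). It suggests that no weights exist until the first call, and that build() runs only once even when the layer is called again.

```python
import tensorflow as tf

class ScaleLayer(tf.keras.layers.Layer):
    """Hypothetical toy layer: multiplies its input by one trainable scalar."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.build_calls = 0   # bookkeeping for this demonstration only

    def build(self, input_shape):
        self.build_calls += 1
        self.scale = self.add_weight(
            name="scale", shape=(1,), initializer="ones", trainable=True
        )

    def call(self, inputs):
        return inputs * self.scale

layer = ScaleLayer()
weights_before = len(layer.weights)   # build() has not run yet
out1 = layer(tf.ones((3, 2)))         # first call triggers build(), then call()
out2 = layer(tf.ones((5, 2)))         # later calls reuse the existing weight
print(weights_before, len(layer.weights), layer.build_calls)
```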

### Explanation of lines in call()

The call() method has only two lines of real content that define the operation. The only important thing in the *inputs* variable is its shape. In the comments the shapes of the various variables are given. Some of these contain *None*, which acts as a placeholder. When training is run and an actual set of values is given, the *None* value will be replaced by the batch size. Thus, the *(None, 2)* shape of *inputs* indicates some arbitrary number of sets of inputs with each set containing 2 numbers. The *2* is because we defined the layer to have only 2 inputs in the model (see NINPUTS=2 at top). The first meaningful line is:
  
    tmp = K.pow(inputs, self.w)

This line takes the inputs and raises them to the powers given by the weights. Since this is a backend function, it will automatically do this for all sets of inputs in the batch. The variable *tmp* therefore has a shape *(None, 2)* where again, *None* is the placeholder for the batch size.
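
The same shape behaviour can be reproduced with NumPy, whose broadcasting rules work the same way for this case. This is only an illustrative sketch with made-up values, with a concrete batch of 3 standing in for *None*.

```python
import numpy as np

inputs = np.array([[2.0, 3.0],
                   [4.0, 5.0],
                   [6.0, 7.0]])   # shape (3, 2): a batch of 3 input pairs
w = np.array([[2.0, 1.0]])        # shape (1, 2): one exponent per input

# The (1, 2) exponents are broadcast across the batch axis.
tmp = np.power(inputs, w)
print(tmp.shape)
print(tmp)
```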

    myout = K.prod(tmp, keepdims=True, axis=1) + self.b

This line takes the product of each value in a set and then adds the bias. The two arguments *keepdims=True* and *axis=1* say not to multiply **ALL** of the numbers together, but only those within the same batch element. Specifically, only multiply values along axis 1 and not along axis 0. The output of the K.prod() call will have a shape of *(None, 1)*. The shape of self.b does not have a *None* dimension, but Keras is smart enough to know that you want to add this one number to all of the values in the *(None,1)* Tensor returned by K.prod().
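
A small NumPy sketch (made-up values again) shows what the two arguments do to the shape, and how a bias of shape *(1,)* is broadcast onto every row.

```python
import numpy as np

tmp = np.array([[4.0, 3.0],
                [16.0, 5.0],
                [36.0, 7.0]])    # shape (3, 2)
b = np.zeros((1,))               # bias with shape (1,), as in the layer

per_row = np.prod(tmp, axis=1, keepdims=True)   # shape (3, 1)
flat = np.prod(tmp, axis=1)                     # shape (3,): the axis is dropped
everything = np.prod(tmp)                       # a single scalar from all six values
myout = per_row + b                             # bias broadcast to every row

print(per_row.shape, flat.shape, myout.shape)
print(myout)
```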

The value of *myout* is returned and indeed, you can see that the output shape of the model summary is *(None, 1)*.

## Fitting the model

Below is some pretty standard code for generating a set of inputs (aka *features*) and labels. Note that for this example, the input values are all non-negative. This is because I allow the weights to be trained as floating point numbers, and a negative number raised to a fractional power has no real-valued result.
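
The restriction on the sign of the inputs is easy to see numerically. The following NumPy sketch is only an illustration; floating point power functions return NaN when a negative base meets a fractional exponent.

```python
import numpy as np

# Suppress the RuntimeWarning NumPy emits for the invalid operation.
with np.errstate(invalid="ignore"):
    bad = np.power(-2.0, 0.5)   # negative base, fractional exponent

ok_int = np.power(-2.0, 2.0)    # an integer-valued exponent is fine
ok_pos = np.power(2.0, 0.5)     # a positive base is fine

print(bad, ok_int, ok_pos)
```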

In [2]:
# 
# Generate dataframes for features and labels
#
import pandas as pd
import numpy as np

X = []
Z = []
for x in np.arange(0.0, 10.1, 0.1):
    for y in np.arange(0.0, 10.1, 0.1):
        z = x*y
        X.append([x,y])  # features
        Z.append([z])    # labels

df = pd.DataFrame(X, columns=['x', 'y'])
labelsdf = pd.DataFrame(Z, columns=['z'])

In [4]:
EPOCHS = 100  # (in addition to anything already done)
BS     = 1

# Fit the model
history = model.fit(
    x = df,
    y = labelsdf,
    batch_size = BS,
    epochs=EPOCHS,
    #validation_split=0.2,
    shuffle=True,
    verbose=1,
    use_multiprocessing=False
)

model.save('multiply_model_customLayer01.h5')

Epoch 1/100
Epoch 2/100
...
Epoch 77/100
Epoch 78

RuntimeError: Unable to create link (name already exists)
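
The *RuntimeError* above appears to come from model.save() writing the HDF5 file rather than from the training itself. A likely cause (an assumption, not something verified in this notebook) is that the two add_weight() calls in ProductLayer pass no *name*, so both weights can end up with the same default name and collide when they are written to the file. The sketch below uses a hypothetical *NamedProductLayer* that gives each weight an explicit name, and uses tf.pow and tf.reduce_prod in place of the backend functions.

```python
import tensorflow as tf

class NamedProductLayer(tf.keras.layers.Layer):
    """Hypothetical variant of ProductLayer with explicitly named weights."""

    def __init__(self, initial_exponent=2.01, **kwargs):
        super().__init__(**kwargs)
        self.initial_exponent = initial_exponent

    def build(self, input_shape):
        self.w = self.add_weight(
            name="w",
            shape=(1, input_shape[-1]),
            initializer=tf.keras.initializers.Constant(self.initial_exponent),
            trainable=True,
        )
        self.b = self.add_weight(
            name="b", shape=(1,), initializer="zeros", trainable=False
        )

    def call(self, inputs):
        return tf.reduce_prod(tf.pow(inputs, self.w), axis=1, keepdims=True) + self.b

layer = NamedProductLayer()
y = layer(tf.ones((3, 2)) * 3.0)
weight_names = [v.name for v in layer.weights]
print(weight_names)   # the two names should now differ
```

With distinct names, each weight has its own entry when the layer is saved.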

## Check the result

For this simple example, the exponents are initialized to 2.01, but the labels are generated assuming all are 1.0. This lets us verify that the training actually worked and found the correct values. Printing the weights below shows they both came out pretty close to 1.0. I should note that if I allow the bias to train as well, it will cause the training to take longer, but it will eventually get down close to zero.
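
To see why training should pull the exponents from 2.01 toward 1.0, here is a minimal NumPy sketch of the same idea. It is not the Keras training above: it swaps in plain full-batch gradient descent instead of Adadelta, uses a small hypothetical grid of strictly positive inputs, and assumes a learning rate of 1e-3. The gradient uses $\partial y / \partial w_{i} = y \ln x_{i}$.

```python
import numpy as np

# Hypothetical small grid of strictly positive inputs and their products.
grid = np.array([1.0, 1.5, 2.0])
X = np.array([[a, b] for a in grid for b in grid])   # shape (9, 2)
z = X[:, 0] * X[:, 1]                                # labels, shape (9,)
logX = np.log(X)

w = np.array([2.01, 2.01])   # same starting exponents as the layer
lr = 1e-3                    # assumed learning rate for this sketch

def predict(w):
    return np.prod(X ** w, axis=1)

def mse(w):
    return np.mean((predict(w) - z) ** 2)

initial_loss = mse(w)
for _ in range(5000):
    y = predict(w)
    # d(mse)/dw_i = mean(2 * (y - z) * y * ln(x_i))
    grad = np.mean((2.0 * (y - z) * y)[:, None] * logX, axis=0)
    w = w - lr * grad
final_loss = mse(w)

print(w, initial_loss, final_loss)
```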

In [None]:
for layer in model.layers: print(layer.get_weights())

# An even simpler example using a Lambda layer

The above is doing something pretty simple, but the definition of the ProductLayer class does seem kind of large for something that is essentially $x_{0}\times x_{1}$. It has the benefit though of being something that could be expanded to a pretty complex formula. Suppose though that we wanted to instead create a model that did the same thing, but did not have any trainable weights. In this case we could use a Keras *Lambda* layer. For this, we just need to define a procedure that is essentially the *call()* method of *ProductLayer*.

In the following cell, I define such a routine. Here, the exponents are hardcoded as 0.5, 2.0 just by way of example. This means the output of the layer will be the product of the square root of the first input and the square of the second.

In [None]:
from tensorflow.keras.layers import Lambda
#-----------------------------------------------------
# MyProductLambda
#-----------------------------------------------------
def MyProductLambda(inputs):
    tmp = K.pow(inputs, (0.5, 2.0))
    return K.prod(tmp, keepdims=True, axis=1)

#-----------------------------------------------------
# DefineModelLambda
#-----------------------------------------------------
def DefineModelLambda():

    # Build the network model with 2 inputs and one output.
    inputs = Input(shape=(NINPUTS,), name='inputs')
    output = Lambda(MyProductLambda, output_shape=(1,))(inputs)
    model  = Model(inputs=inputs, outputs=output)
    
    opt = Adadelta(clipnorm=1.0)
    model.compile(loss='mse', optimizer=opt, metrics=['mae', 'mse', 'accuracy'])

    return model


model_lambda = DefineModelLambda()
model_lambda.summary()

In [None]:
# Test the model. With every input set to 4, the expected output is sqrt(4) * 4^2 = 32.
x_lambda = tf.ones((3,2), dtype=tf.dtypes.float32)*(4.0)
y_lambda = model_lambda(x_lambda)
print(y_lambda)
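
The expected value of 32 can be cross-checked outside Keras. This NumPy sketch repeats the same two operations on the same inputs; the variable names are illustrative only.

```python
import numpy as np

x_check = np.ones((3, 2), dtype=np.float32) * 4.0
exponents = np.array([0.5, 2.0], dtype=np.float32)

# Same two steps as MyProductLambda: raise to the exponents, then multiply per row.
y_check = np.prod(np.power(x_check, exponents), axis=1, keepdims=True)
print(y_check)   # each row is sqrt(4) * 4**2
```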