# Custom layer with `Keras`

In this notebook, we implement a custom layer in `keras` that performs layer normalization.

`keras` has a `keras.layers.LayerNormalization` class that we will use to check our result.

In [17]:
import keras
import tensorflow as tf
import numpy as np

## Custom normalization layer

In [18]:
class MyNormLayer(keras.layers.Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
    
    def build(self, input_shape):
        # One scale (alpha) and one offset (beta) per feature, i.e. per entry of the last axis.
        # The weight shape must not include the batch axis, whose size can be unknown or vary between calls.
        shape = (input_shape[-1],)
        self.alpha = self.add_weight(shape=shape, dtype=tf.float32, initializer=keras.initializers.Constant(value=1.0))
        self.beta  = self.add_weight(shape=shape, dtype=tf.float32, initializer=keras.initializers.Constant(value=0.0))

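    def call(self, inputs):
        # Statistics are taken over the last (feature) axis, separately for each position
        mu = tf.reduce_mean(inputs, axis=-1, keepdims=True)
        std = tf.sqrt(tf.reduce_mean(tf.square(inputs - mu), axis=-1, keepdims=True))
        epsilon = 1e-3  # guards against division by zero; added to std here, not to the variance
        return self.alpha * (inputs - mu) / (std + epsilon) + self.beta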
    
    def compute_output_shape(self, input_shape): # input and output of this layer have the same shape
        return input_shape

## Test with a simple input

In [19]:
inputs = tf.constant([[[5, 5, 5, 3], [6, 5, 3, 2], [1, 3, 3, 2]],[[1, 7, 2, 1], [8, 8, 2, 2], [7, 3, 4, 2]]], dtype=tf.float32)
my_layer = MyNormLayer()
my_layer(inputs)

<tf.Tensor: shape=(2, 3, 4), dtype=float32, numpy=
array([[[ 0.5766844 ,  0.5766844 ,  0.5766844 , -1.7300532 ],
        [ 1.2641115 ,  0.63205576, -0.63205576, -1.2641115 ],
        [-1.5057408 ,  0.9034444 ,  0.9034444 , -0.30114815]],

       [[-0.7032438 ,  1.7078779 , -0.3013902 , -0.7032438 ],
        [ 0.9996668 ,  0.9996668 , -0.9996668 , -0.9996668 ],
        [ 1.6027107 , -0.5342369 ,  0.        , -1.0684738 ]]],
      dtype=float32)>

In [20]:
keras_layer = keras.layers.LayerNormalization(epsilon=1e-3)
keras_layer(inputs)

<tf.Tensor: shape=(2, 3, 4), dtype=float32, numpy=
array([[[ 0.5769658 ,  0.5769658 ,  0.5769658 , -1.7308974 ],
        [ 1.2646582 ,  0.632329  , -0.6323291 , -1.2646582 ],
        [-1.5064616 ,  0.9038768 ,  0.9038768 , -0.30129242]],

       [[-0.70346963,  1.7084262 , -0.30148697, -0.70346963],
        [ 0.99994445,  0.99994445, -0.9999444 , -0.9999444 ],
        [ 1.6033385 , -0.53444624,  0.        , -1.0688924 ]]],
      dtype=float32)>

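As the example above shows, my custom normalization layer produces an output that is close to the Keras implementation: the two agree to within about 1e-3. The small gap arises because my layer adds `epsilon` to the standard deviation, whereas `keras.layers.LayerNormalization` adds it to the variance before taking the square root.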
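The small difference between the two outputs can be reproduced by hand. The plain-Python sketch below normalizes the first row, `[5, 5, 5, 3]`, twice: once with `epsilon` added to the standard deviation, as in `MyNormLayer.call`, and once with `epsilon` added to the variance before the square root, which is my understanding of what `keras.layers.LayerNormalization` does. The variable names are illustrative only.

```python
import math

row = [5.0, 5.0, 5.0, 3.0]
epsilon = 1e-3

mu = sum(row) / len(row)
var = sum((x - mu) ** 2 for x in row) / len(row)  # population variance

# epsilon added to the standard deviation (as in MyNormLayer.call)
eps_on_std = [(x - mu) / (math.sqrt(var) + epsilon) for x in row]

# epsilon added to the variance (assumed to mirror the Keras layer)
eps_on_var = [(x - mu) / math.sqrt(var + epsilon) for x in row]

# largest element-wise difference between the two variants
max_gap = max(abs(a - b) for a, b in zip(eps_on_std, eps_on_var))

print(eps_on_std)
print(eps_on_var)
print(max_gap)
```

The two lists should match the first rows of the two outputs above (roughly 0.57668 and 0.57697 for the first entry), and the gap stays below 1e-3.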

## Some code for testing

In [21]:
import inspect
print(inspect.getsource(keras.layers.LayerNormalization))

@keras_export("keras.layers.LayerNormalization")
class LayerNormalization(Layer):
    """Layer normalization layer (Ba et al., 2016).

    Normalize the activations of the previous layer for each given example in a
    batch independently, rather than across a batch like Batch Normalization.
    i.e. applies a transformation that maintains the mean activation within each
    example close to 0 and the activation standard deviation close to 1.

    If `scale` or `center` are enabled, the layer will scale the normalized
    outputs by broadcasting them with a trainable variable `gamma`, and center
    the outputs by broadcasting with a trainable variable `beta`. `gamma` will
    default to a ones tensor and `beta` will default to a zeros tensor, so that
    centering and scaling are no-ops before training has begun.

    So, with scaling and centering enabled the normalization equations
    are as follows:

    Let the intermediate activations for a mini-batch to be the `inputs`.

    Fo

In [22]:
inputs = tf.constant([[[5, 5, 5, 3], [6, 5, 3, 2], [1, 3, 3, 2]],[[1, 7, 2, 1], [8, 8, 2, 2], [7, 3, 4, 2]]], dtype=tf.float32)

In [23]:
mu = tf.reduce_mean(inputs, axis=-1, keepdims=True)

In [24]:
inputs - mu

<tf.Tensor: shape=(2, 3, 4), dtype=float32, numpy=
array([[[ 0.5 ,  0.5 ,  0.5 , -1.5 ],
        [ 2.  ,  1.  , -1.  , -2.  ],
        [-1.25,  0.75,  0.75, -0.25]],

       [[-1.75,  4.25, -0.75, -1.75],
        [ 3.  ,  3.  , -3.  , -3.  ],
        [ 3.  , -1.  ,  0.  , -2.  ]]], dtype=float32)>

In [25]:
var = tf.reduce_mean(tf.square(inputs - mu), axis=-1, keepdims=True)

In [26]:
var

<tf.Tensor: shape=(2, 3, 1), dtype=float32, numpy=
array([[[0.75  ],
        [2.5   ],
        [0.6875]],

       [[6.1875],
        [9.    ],
        [3.5   ]]], dtype=float32)>

In [27]:
np.var([6, 5, 3, 2])

np.float64(2.5)
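
The agreement with `np.var` holds because `np.var` defaults to the population variance (`ddof=0`), which divides by N, just like `tf.reduce_mean` of the squared deviations. A sample variance, which divides by N - 1, would give a different number. The sketch below illustrates the distinction on the same row using the standard `statistics` module; the variable names are illustrative.

```python
import statistics

row = [6.0, 5.0, 3.0, 2.0]

mu = sum(row) / len(row)
# mean of squared deviations, the quantity computed with tf.reduce_mean above
mean_sq_dev = sum((x - mu) ** 2 for x in row) / len(row)

population_var = statistics.pvariance(row)  # divides by N
sample_var = statistics.variance(row)       # divides by N - 1

print(mean_sq_dev, population_var, sample_var)
```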

In [28]:
output = keras_layer(inputs)

In [29]:
np.array(output)

array([[[ 0.5769658 ,  0.5769658 ,  0.5769658 , -1.7308974 ],
        [ 1.2646582 ,  0.632329  , -0.6323291 , -1.2646582 ],
        [-1.5064616 ,  0.9038768 ,  0.9038768 , -0.30129242]],

       [[-0.70346963,  1.7084262 , -0.30148697, -0.70346963],
        [ 0.99994445,  0.99994445, -0.9999444 , -0.9999444 ],
        [ 1.6033385 , -0.53444624,  0.        , -1.0688924 ]]],
      dtype=float32)

In [30]:
input_shape = inputs.shape
(input_shape[:-1],1)

(TensorShape([2, 3]), 1)

In [31]:
input_shape[:-1] + 1


TensorShape([2, 3, 1])

In [32]:
tup = (1, 4, 5)
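
The `+ 1` shortcut above works because `inputs.shape` is a `TensorShape`. A plain tuple such as `tup` behaves differently, which matters if `build` receives the input shape as a tuple (an assumption that depends on the Keras version in use). A minimal sketch of the tuple case:

```python
tup = (1, 4, 5)

# collapse the last axis to a singleton: append a one-element tuple
collapsed = tup[:-1] + (1,)

# keep only the last (feature) axis
features_only = tup[-1:]

# adding a bare integer to a tuple is not allowed
raised = False
try:
    tup[:-1] + 1
except TypeError:
    raised = True

print(collapsed, features_only, raised)
```

Here `collapsed` is `(1, 4, 1)` and `features_only` is `(5,)`, while the bare `+ 1` raises a `TypeError`.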