12. Implement a custom layer that performs Layer Normalization (we will use this type of layer in Chapter 15):

The build() method should define two trainable weights α and β, both of shape input_shape[-1:] and data type tf.float32. α should be initialized with 1s, and β with 0s.
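To see why the weights take the shape input_shape[-1:], here is a small sketch that uses NumPy as a stand-in for TensorFlow, purely for illustration (the variable names are hypothetical). Slicing keeps only the feature dimension, so α and β hold one value per feature and broadcast across any batch size.

```python
import numpy as np

input_shape = (100, 10)          # (batch_size, n_features), as in the test cell
param_shape = input_shape[-1:]   # (10,): one value per feature, independent of batch size

# Hypothetical NumPy stand-ins for the two weights that build() creates.
alpha = np.ones(param_shape, dtype=np.float32)   # scale, initialized with 1s
beta = np.zeros(param_shape, dtype=np.float32)   # offset, initialized with 0s

X = np.random.default_rng(0).normal(size=input_shape).astype(np.float32)
# Broadcasting stretches the (10,) vectors across all 100 rows.
out = alpha * X + beta
print(param_shape, out.shape)
```

With these initial values the layer starts out as the identity on the normalized features; training then adjusts the per-feature scale and offset.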

The call() method should compute the mean $\mu$ and standard deviation $\sigma$ of each instance's features. For this, you can use tf.nn.moments(inputs, axes=-1, keepdims=True), which returns the mean $\mu$ and the variance $\sigma^2$ for every instance at once (take the square root of the variance to get the standard deviation). Then the function should compute and return $\alpha \otimes (\mathbf{X} - \mu) / (\sigma + \varepsilon) + \beta$, where $\otimes$ represents itemwise multiplication (*) and $\varepsilon$ is a smoothing term (a small constant to avoid division by zero, e.g., 0.001).
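The computation can be sketched as a plain NumPy reference, assuming NumPy's `mean` and `var` (which, like tf.nn.moments, compute the population variance) as stand-ins for the TensorFlow ops; `layer_norm_reference` is a hypothetical helper name, not part of any library.

```python
import numpy as np

def layer_norm_reference(X, alpha, beta, eps=0.001):
    """NumPy sketch of alpha * (X - mu) / (sigma + eps) + beta, computed per instance."""
    # Mean and population variance over the last axis, kept as (batch, 1) columns
    # so they broadcast back against X, like tf.nn.moments(..., keepdims=True).
    mu = X.mean(axis=-1, keepdims=True)
    variance = X.var(axis=-1, keepdims=True)
    sigma = np.sqrt(variance)
    return alpha * (X - mu) / (sigma + eps) + beta

X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])
alpha = np.ones(X.shape[-1:])
beta = np.zeros(X.shape[-1:])
Y = layer_norm_reference(X, alpha, beta)
print(Y.round(3))
```

Because the second row is the first row scaled by 2, both rows normalize to nearly the same values; they differ only slightly because ε does not scale along with σ.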

Ensure that your custom layer produces the same (or very nearly the same) output as the keras.layers.LayerNormalization layer.
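One reason the outputs match only "very nearly": as far as we know, keras.layers.LayerNormalization adds its epsilon (default 0.001) to the variance, computing (X − μ)/√(σ² + ε), whereas the exercise adds ε to the standard deviation. The NumPy sketch below is an illustration of the two formulas, not the Keras source; it compares them on data shaped like the test cell's.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(loc=10.0, scale=5.0, size=(100, 10))
eps = 0.001

mu = X.mean(axis=-1, keepdims=True)
variance = X.var(axis=-1, keepdims=True)

# Exercise formula: epsilon added to the standard deviation.
custom = (X - mu) / (np.sqrt(variance) + eps)
# Keras-style formula (assumed): epsilon added to the variance.
keras_style = (X - mu) / np.sqrt(variance + eps)

max_diff = np.abs(custom - keras_style).max()
print(f"max |difference| = {max_diff:.2e}")
```

The gap stays on the order of 10⁻³ or below, which is why the comparison needs a tolerance of about that size rather than exact equality.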


In [9]:
import tensorflow as tf
from tensorflow import keras

In [14]:
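class LayerNormalization(keras.layers.Layer):
  def __init__(self, eps=0.001, **kwargs):
    super().__init__(**kwargs)
    # Keras calls build(input_shape) with no extra arguments, so the smoothing term is set here.
    self.eps = eps

  def build(self, input_shape):
    # One scale (alpha) and one offset (beta) per feature.
    self.alpha = self.add_weight(name="alpha", shape=input_shape[-1:], dtype=tf.float32,
                                 initializer="ones", trainable=True)
    self.beta = self.add_weight(name="beta", shape=input_shape[-1:], dtype=tf.float32,
                                initializer="zeros", trainable=True)
    super().build(input_shape)

  def call(self, inputs):
    # Per-instance mean and variance over the feature axis.
    mean, variance = tf.nn.moments(inputs, axes=-1, keepdims=True)
    stddev = tf.sqrt(variance)
    normalized = (inputs - mean) / (stddev + self.eps)
    return self.alpha * normalized + self.beta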


In [20]:
X = tf.random.normal((100, 10), mean=10.0, stddev=5.0)

# Calling the layers (rather than .call()) builds them automatically from X's shape.
custom_normalization = LayerNormalization()
X_custom_normalized = custom_normalization(X)

keras_normalization = keras.layers.LayerNormalization()
X_keras_normalized = keras_normalization(X)

# Keras adds its epsilon to the variance rather than to the standard deviation,
# so the two outputs agree only approximately (hence the absolute tolerance).
tf.debugging.assert_near(X_custom_normalized, X_keras_normalized, atol=0.001, rtol=0.)