# Notes on Chapter 11 of *Hands-On Machine Learning with Scikit-Learn, Keras, & TensorFlow*, 3rd edition, by Aurélien Géron

Reduce the number of log messages displayed by TensorFlow:

In [1]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

In [2]:
import tensorflow as tf

It's important to choose a reasonable weight initialization, so that signals neither vanish nor explode in the forward pass and gradients neither vanish nor explode during backpropagation.

In general, Glorot initialization ($\sigma^2 = 1/n_{\textrm{avg}}$, where $n_{\textrm{avg}} = (n_{\textrm{in}} + n_{\textrm{out}})/2$) should be used for identity/sigmoid/softmax/tanh activation functions, He initialization ($\sigma^2 = 2/n_{\textrm{in}}$) for ReLU and its variants, and LeCun initialization ($\sigma^2 = 1/n_{\textrm{in}}$) for SELU.
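As a rough illustration of the three variance formulas above, the following NumPy sketch computes the target standard deviation for each scheme and draws a normally distributed weight matrix. The helper name `init_std` and the layer sizes are assumptions made for this example; this is not how Keras implements its initializers internally.

```python
import numpy as np

def init_std(fan_in, fan_out, scheme):
    """Target standard deviation of the weights for a given scheme (hypothetical helper)."""
    if scheme == "glorot":
        n, scale = (fan_in + fan_out) / 2, 1.0   # sigma^2 = 1 / n_avg
    elif scheme == "he":
        n, scale = fan_in, 2.0                   # sigma^2 = 2 / n_in
    elif scheme == "lecun":
        n, scale = fan_in, 1.0                   # sigma^2 = 1 / n_in
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return np.sqrt(scale / n)

# Example layer sizes (assumed for illustration)
fan_in, fan_out = 300, 100
rng = np.random.default_rng(0)

# Draw a He-initialized weight matrix from a plain normal distribution
W = rng.normal(0.0, init_std(fan_in, fan_out, "he"), size=(fan_in, fan_out))
print(W.std(), init_std(fan_in, fan_out, "he"))
```

The empirical standard deviation of `W` should land close to the target value, since the matrix holds 30,000 samples.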

In Keras, the weight initializer can be specified when creating the layer, e.g.

In [4]:
layer1 = tf.keras.layers.Dense(64, activation="relu", kernel_initializer="he_normal")

It's also easy to set a custom variance scaling method, e.g.

In [5]:
my_initializer = tf.keras.initializers.VarianceScaling(scale=2., mode="fan_avg", distribution="uniform")
layer2 = tf.keras.layers.Dense(64, activation="leaky_relu", kernel_initializer=my_initializer)
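To make the arguments of `VarianceScaling` more concrete, here is a minimal NumPy sketch of what the uniform variant computes for a 2-D kernel, assuming the kernel shape is `(fan_in, fan_out)` as in a `Dense` layer. The function name `variance_scaling_uniform` and the input size of 200 are hypothetical; the sketch only mirrors the documented rule that samples are drawn from $[-\textrm{limit}, \textrm{limit}]$ with $\textrm{limit} = \sqrt{3 \cdot \textrm{scale} / n}$, which gives a variance of $\textrm{scale}/n$.

```python
import numpy as np

def variance_scaling_uniform(shape, scale=2.0, mode="fan_avg", rng=None):
    """NumPy sketch of uniform variance scaling for a 2-D kernel (hypothetical helper)."""
    fan_in, fan_out = shape
    n = {"fan_in": fan_in,
         "fan_out": fan_out,
         "fan_avg": (fan_in + fan_out) / 2}[mode]
    # A uniform distribution on [-limit, limit] has variance limit**2 / 3 = scale / n
    limit = np.sqrt(3.0 * scale / n)
    rng = np.random.default_rng() if rng is None else rng
    return rng.uniform(-limit, limit, size=shape)

# 200 inputs is an assumed size; 64 units matches the Dense layer above
W = variance_scaling_uniform((200, 64), rng=np.random.default_rng(42))
print(W.var(), 2.0 / ((200 + 64) / 2))
```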

Multiple activation functions are available; the author recommends `relu` or `leaky_relu` for shallow networks, and `swish` ($z \sigma(\beta z)$) for deep networks.
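The swish formula above is short enough to write out directly. The following is a minimal NumPy sketch (the function names are chosen for this example); with $\beta = 1$ it is the function also known as SiLU, and for large $\beta$ it approaches ReLU.

```python
import numpy as np

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def swish(z, beta=1.0):
    """Swish activation: z * sigmoid(beta * z)."""
    return z * sigmoid(beta * z)

z = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(swish(z))             # smooth and slightly negative for z < 0
print(swish(z, beta=50.0))  # nearly identical to ReLU
```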

Batch normalization is another way to address vanishing and exploding gradients. Each input feature is standardized, then rescaled and shifted:

$$
z_{ij} = \gamma_j \frac{ x_{ij} - \bar x_j }{\sqrt{\sigma_j^2 + \epsilon}} + \beta_j
$$

where $i$ indexes the instances and $j$ the features, $\bar x_j$ is the mean of feature $j$ (estimated over the batch during training, or as a running average at inference), $\sigma_j^2$ is the (estimated) variance of feature $j$, $\epsilon$ is a smoothing constant (typically $10^{-5}$), and $\gamma_j$ and $\beta_j$ are (learnable) parameters.
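The formula can be sketched in a few lines of NumPy. This covers only the training-time case, where the mean and variance come from the current batch; the running averages used at inference are omitted, and the function name and test data are assumptions made for this example.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization of x with shape (batch, features)."""
    mean = x.mean(axis=0)                     # \bar x_j, one value per feature
    var = x.var(axis=0)                       # \sigma_j^2, one value per feature
    x_hat = (x - mean) / np.sqrt(var + eps)   # standardize each feature
    return gamma * x_hat + beta               # rescale and shift

# Synthetic batch: 256 instances, 4 features, far from zero mean / unit variance
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(256, 4))

z = batch_norm_train(x, gamma=np.ones(4), beta=np.zeros(4))
print(z.mean(axis=0), z.std(axis=0))
```

With $\gamma_j = 1$ and $\beta_j = 0$, each output feature has approximately zero mean and unit standard deviation over the batch.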

Typically training will be slower per epoch due to the extra computation, but convergence will be faster, leading to shorter overall training times. Adding batch normalization is as easy as adding another layer (typically before or after each hidden layer's activation function):

In [6]:
layer3 = tf.keras.layers.BatchNormalization()
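As a sketch of the two placements, the models below put the `BatchNormalization` layer after and before the hidden layer's activation, respectively. The 28×28 input shape, the layer sizes, and the variable names are assumptions chosen for illustration. When the normalization comes before the activation, the preceding `Dense` layer can drop its bias term, since batch normalization already includes an offset parameter.

```python
import tensorflow as tf

# Placement 1: batch normalization AFTER the hidden layer's activation
model_after = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(300, activation="relu", kernel_initializer="he_normal"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Placement 2: batch normalization BEFORE the activation.
# The Dense layer has no activation and no bias; the activation is a separate layer.
model_before = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(300, kernel_initializer="he_normal", use_bias=False),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Activation("relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model_after.summary()
```

Each `BatchNormalization` layer holds four vectors per feature: the trainable $\gamma$ and $\beta$, plus the non-trainable moving mean and moving variance.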