# Additional techniques for building neural networks with TensorFlow

This notebook explores additional techniques to enhance neural networks' performance using TensorFlow.

In [1]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization, LayerNormalization, GaussianNoise
from tensorflow.keras.regularizers import l1, l2
from tensorflow.keras.initializers import GlorotUniform, HeUniform
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, LearningRateScheduler, ReduceLROnPlateau, TensorBoard
import numpy as np

## Regularization techniques
Regularization is a set of techniques for preventing overfitting, which occurs when a model fits the training data, including its noise, so closely that it performs poorly on new, unseen data. Regularization adds a penalty on model complexity to the loss function of the network. This penalty discourages the network from assigning too much importance to any single weight, which encourages simpler models that generalize better.

During training, the loss function of the model is modified to include the regularization penalty. The goal is to minimize both the error on the training data and the regularization penalty. This helps the model to not only fit the training data but also to maintain a simpler form that is less likely to overfit.

#### L1 and L2 regularization
- **L1 regularization** adds a penalty proportional to the sum of the absolute values of the weights. It can drive some weights to exactly zero, effectively performing feature selection, so it can make the model sparse and reduce its complexity. Mathematically: for a weight $w$, the penalty is proportional to $|w|$.
- **L2 regularization** adds a penalty proportional to the sum of the squares of the weights. It tends to shrink all weights toward zero without driving them exactly to zero, which helps stabilize the model. Mathematically: for a weight $w$, the penalty is proportional to $w^2$.
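
To make the two penalties concrete, here is a minimal pure-Python sketch of how each one is added to a data loss. The helper names `l1_penalty` and `l2_penalty`, the weight values, and the stand-in `data_loss` are illustrative assumptions, not part of the Keras API; the factor `0.01` simply mirrors the value used in this notebook.

```python
# Hypothetical helpers for illustration only; Keras computes the equivalent
# quantity internally when a kernel_regularizer is attached to a layer.
def l1_penalty(weights, factor=0.01):
    """Regularization factor times the sum of absolute weight values."""
    return factor * sum(abs(w) for w in weights)


def l2_penalty(weights, factor=0.01):
    """Regularization factor times the sum of squared weight values."""
    return factor * sum(w * w for w in weights)


weights = [0.5, -2.0, 0.0, 1.5]  # made-up weights
data_loss = 0.25                 # stand-in for the mean squared error on a batch

# The quantity minimized during training is the data loss plus the penalty.
total_loss_l1 = data_loss + l1_penalty(weights)
total_loss_l2 = data_loss + l2_penalty(weights)
print(total_loss_l1, total_loss_l2)
```

Note how the large weight `-2.0` dominates the L2 penalty (it contributes `4.0` before scaling) but counts only linearly in the L1 penalty.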

In [2]:
# Generate dummy data
X_train = np.random.rand(100, 10)
y_train = np.random.rand(100, 1)

# Build a simple FFNN with L1 and L2 regularization

# Model with L1 regularization
model_l1 = Sequential()
model_l1.add(Dense(64, activation='relu', kernel_regularizer=l1(0.01), input_shape=(10,)))
model_l1.add(Dense(32, activation='relu', kernel_regularizer=l1(0.01)))
model_l1.add(Dense(1, activation='linear'))

model_l1.compile(optimizer=Adam(), loss='mean_squared_error')
print("Training model with L1 regularization:")
model_l1.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)

# Model with L2 regularization
model_l2 = Sequential()
model_l2.add(Dense(64, activation='relu', kernel_regularizer=l2(0.01), input_shape=(10,)))
model_l2.add(Dense(32, activation='relu', kernel_regularizer=l2(0.01)))
model_l2.add(Dense(1, activation='linear'))

model_l2.compile(optimizer=Adam(), loss='mean_squared_error')
print("\nTraining model with L2 regularization:")
model_l2.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)

Training model with L1 regularization:
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10

Training model with L2 regularization:
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x1355784a160>

**Explanation:**
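- L1 regularization (`kernel_regularizer=l1(0.01)`): Adds an L1 penalty with a regularization factor of `0.01` to the layer's kernel weights. The biases are not penalized.
- L2 regularization (`kernel_regularizer=l2(0.01)`): Adds an L2 penalty with a regularization factor of `0.01` to the layer's kernel weights. The biases are not penalized.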

#### Dropout
Dropout is a regularization technique used in neural networks to prevent overfitting by randomly setting a fraction of input units to zero during training. By doing this, dropout forces the network to be more robust and prevents it from becoming overly dependent on any particular feature. This technique helps in making the model generalize better to unseen data.

During each training iteration, dropout randomly "drops out" (i.e., sets to zero) a specified percentage of neurons in a layer. This means that at each update, only a subset of neurons is used to compute the forward pass and backpropagation. By dropping out neurons, the network is forced to learn redundant representations, making it more resilient and less likely to overfit. The stochastic behavior of dropout ensures that different neurons are dropped each time, which effectively creates an ensemble of networks during training.
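
The masking step described above can be sketched in pure Python as follows. This is a simplified illustration, not the Keras implementation: the function name `dropout` and its `rng` parameter are assumptions made for the example. It uses the "inverted dropout" convention, in which surviving units are scaled by `1 / (1 - rate)` during training so that nothing needs to change at inference.

```python
import random


def dropout(activations, rate, training=True, rng=random):
    """Zero each unit with probability `rate` and rescale the survivors."""
    if not training or rate == 0.0:
        # At inference time the activations pass through unchanged.
        return list(activations)
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * keep_scale for a in activations]


rng = random.Random(0)        # seeded so the example is repeatable
activations = [1.0] * 10      # made-up activations from a hidden layer

dropped = dropout(activations, rate=0.5, rng=rng)
unchanged = dropout(activations, rate=0.5, training=False)
print(dropped)
print(unchanged)
```

Each call with `training=True` draws a fresh mask, which is what gives dropout its ensemble-like behavior.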

In [3]:
# Build a simple FFNN with dropout
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(10,)))
model.add(Dropout(0.5))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1, activation='linear'))

model.compile(optimizer='adam', loss='mean_squared_error')

model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x13559c768b0>

**Explanation:**
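- `Dropout(0.5)`: Sets 50% of the input units to 0 at each training step and scales the remaining units by `1 / (1 - 0.5)` so that the expected sum of the activations is unchanged. `Dropout(0.2)` does the same with 20% of the units. Dropout is active only during training; at inference the layer passes its inputs through unchanged.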

#### Layer normalization
Layer normalization normalizes the inputs to a layer, which helps stabilize and accelerate training. Unlike batch normalization, which normalizes each feature across the batch, layer normalization normalizes across the features of each training example independently: it computes the mean and variance of each sample's feature vector and uses these statistics to normalize that vector. It then applies a learned scale and shift to the normalized values to restore the representational capacity of the network.

In simple words, suppose a layer in a neural network has many neurons, each producing one activation. As data passes through this layer, the activations can vary widely in scale from one neuron to another, which can make training unstable and slow. Layer normalization addresses this by standardizing the activations of the layer: for each individual sample, it adjusts them to have a mean of zero and a variance of one.
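
As a rough sketch of the computation, the pure-Python function below normalizes one sample at a time. The name `layer_norm` is hypothetical, and `gamma` and `beta` are shown as single scalars for brevity, whereas the Keras layer learns one scale and one shift per feature. The `epsilon` value is a small constant added for numerical stability.

```python
import math


def layer_norm(sample, gamma=1.0, beta=0.0, epsilon=1e-3):
    """Normalize one sample across its features, then scale and shift."""
    mean = sum(sample) / len(sample)
    variance = sum((x - mean) ** 2 for x in sample) / len(sample)
    return [gamma * (x - mean) / math.sqrt(variance + epsilon) + beta
            for x in sample]


# Two made-up samples whose features live on very different scales.
batch = [
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 20.0, 30.0, 40.0],
]

# Each sample is normalized using only its own statistics.
normalized = [layer_norm(sample) for sample in batch]
for row in normalized:
    print([round(v, 3) for v in row])
```

Because the statistics come from each sample alone, the result does not depend on the batch size or on the other samples in the batch.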

In [4]:
# Build a simple FFNN with layer normalization
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(10,)))
model.add(LayerNormalization())
model.add(Dense(32, activation='relu'))
model.add(LayerNormalization())
model.add(Dense(1, activation='linear'))

model.compile(optimizer='adam', loss='mean_squared_error')

model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x1355ae0f880>

**Explanation:**
- `LayerNormalization()`: Normalizes the outputs of the preceding `Dense` layer across the feature axis, separately for each sample.

#### Gaussian noise
Gaussian noise regularization adds random noise to a layer's inputs during training, which can help the model become more robust by preventing it from learning spurious patterns specific to the training data. The noise is sampled from a Gaussian distribution with zero mean and a specified standard deviation and is added to the inputs of the layer. This regularizes the network by making it less sensitive to the exact values of the inputs.
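
The idea can be sketched in a few lines of pure Python. The function name `gaussian_noise` and its `rng` parameter are assumptions for this illustration; the point is only that noise is added while training and skipped otherwise.

```python
import random


def gaussian_noise(inputs, stddev, training=True, rng=random):
    """Add zero-mean Gaussian noise to each input, but only while training."""
    if not training:
        return list(inputs)
    return [x + rng.gauss(0.0, stddev) for x in inputs]


rng = random.Random(0)              # seeded so the example is repeatable
inputs = [0.2, 0.5, 0.8]            # made-up layer inputs

noisy = gaussian_noise(inputs, stddev=0.1, rng=rng)
clean = gaussian_noise(inputs, stddev=0.1, training=False)
print(noisy)
print(clean)
```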

In [5]:
# Build a simple FFNN with gaussian noise
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(10,)))
model.add(GaussianNoise(stddev=0.1))
model.add(Dense(32, activation='relu'))
model.add(GaussianNoise(stddev=0.2))
model.add(Dense(1, activation='linear'))

model.compile(optimizer='adam', loss='mean_squared_error')

model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x1355c0413d0>

**Explanation:**
- `GaussianNoise(stddev=0.1)`: Adds zero-mean Gaussian noise with a standard deviation of 0.1 to the inputs of the layer; `GaussianNoise(stddev=0.2)` does the same with a standard deviation of 0.2. The noise is applied only during training and forces the network to become more resilient to minor variations in its inputs.

## Weight initialization strategies
Weight initialization sets the initial values of the weights of the network's layers before training begins. Proper initialization can significantly affect the speed of convergence and the success of the training process. Proper weight initialization helps in avoiding problems like vanishing or exploding gradients, which can make training inefficient or even impossible, especially in deep networks.

***Glorot (Xavier) initialization***
Glorot initialization is suitable for layers with a linear or tanh activation function. It is designed to keep the scale of the activations and gradients roughly the same in all layers. Weights are initialized from a distribution with zero mean and a variance of 2/(input units + output units).

***He initialization***
He initialization is suitable for layers with ReLU or its variants as activation functions. It helps mitigate the vanishing gradient problem in networks using ReLU activations. Weights are initialized from a distribution with zero mean and a variance of 2/(input units).
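
The variances quoted above can be checked with a short calculation. For the uniform variants, weights are drawn from the interval `[-limit, limit]`, and a uniform distribution on that interval has variance `limit**2 / 3`. The helper names below are hypothetical, and the layer sizes (10 inputs, 64 units) are taken from the first hidden layer used in this notebook.

```python
import math


def glorot_uniform_limit(fan_in, fan_out):
    """Half-width of the uniform interval used by Glorot (Xavier) uniform."""
    return math.sqrt(6.0 / (fan_in + fan_out))


def he_uniform_limit(fan_in):
    """Half-width of the uniform interval used by He uniform."""
    return math.sqrt(6.0 / fan_in)


def uniform_variance(limit):
    """Variance of a uniform distribution on [-limit, limit]."""
    return limit ** 2 / 3.0


fan_in, fan_out = 10, 64  # inputs and units of the first hidden layer

glorot_var = uniform_variance(glorot_uniform_limit(fan_in, fan_out))
he_var = uniform_variance(he_uniform_limit(fan_in))

print(glorot_var, 2.0 / (fan_in + fan_out))  # both equal 2 / (fan_in + fan_out)
print(he_var, 2.0 / fan_in)                  # both equal 2 / fan_in
```

He initialization ignores the number of output units and therefore produces larger initial weights here, which compensates for ReLU zeroing out negative pre-activations.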

In [6]:
# Build a simple FFNN with Glorot Initialization
model_glorot = Sequential()
model_glorot.add(Dense(64, activation='relu', kernel_initializer=GlorotUniform(), input_shape=(10,)))
model_glorot.add(Dense(32, activation='relu', kernel_initializer=GlorotUniform()))
model_glorot.add(Dense(1, activation='linear'))

model_glorot.compile(optimizer=Adam(), loss='mean_squared_error')
print("Training model with Glorot initialization:")
model_glorot.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)


# Build a simple FFNN with He Initialization
model_he = Sequential()
model_he.add(Dense(64, activation='relu', kernel_initializer=HeUniform(), input_shape=(10,)))
model_he.add(Dense(32, activation='relu', kernel_initializer=HeUniform()))
model_he.add(Dense(1, activation='linear'))

model_he.compile(optimizer=Adam(), loss='mean_squared_error')
print("\nTraining model with He initialization:")
model_he.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)

Training model with Glorot initialization:
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10

Training model with He initialization:
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x1355ae0d520>

***Explanation***
- Glorot initialization (`kernel_initializer=GlorotUniform()`): Initializes the weights with a Glorot (Xavier) uniform distribution. This is the default `kernel_initializer` for Keras `Dense` layers.
- He initialization (`kernel_initializer=HeUniform()`): Initializes the weights with a He uniform distribution.

## Batch normalization
Batch normalization is a technique for improving the training of deep neural networks. By normalizing the inputs to each layer, it reduces sensitivity to the scale of the parameters and to their initialization, which stabilizes learning and can substantially reduce the number of epochs needed to train deep networks. It was originally motivated as a way to reduce internal covariate shift, the change in the distribution of layer inputs as earlier layers update. Batch normalization also has a slight regularizing effect, which can sometimes reduce or eliminate the need for dropout, and it often permits a higher learning rate, which can speed up training.

Batch normalization normalizes the output of a previous activation layer by subtracting the batch mean and dividing by the batch standard deviation. It then applies a learned linear transformation. This transformation allows the network to undo any normalization if it wants to, thus preserving the representational capacity of the network.
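
The training-time computation can be sketched in pure Python as follows. The function name `batch_norm` is hypothetical, and `gamma` and `beta` are shown as single scalars for brevity, whereas the Keras layer learns one scale and one shift per feature. In contrast to layer normalization, the statistics here are computed per feature (column) across the samples in the batch.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, epsilon=1e-3):
    """Normalize each feature across the batch, then scale and shift."""
    n = len(batch)
    num_features = len(batch[0])
    # One mean and one variance per feature, computed over the batch.
    means = [sum(row[j] for row in batch) / n for j in range(num_features)]
    variances = [sum((row[j] - means[j]) ** 2 for row in batch) / n
                 for j in range(num_features)]
    return [
        [gamma * (row[j] - means[j]) / math.sqrt(variances[j] + epsilon) + beta
         for j in range(num_features)]
        for row in batch
    ]


# Made-up batch of three samples with two features on very different scales.
batch = [
    [1.0, 100.0],
    [2.0, 200.0],
    [3.0, 300.0],
]

normalized = batch_norm(batch)
for row in normalized:
    print([round(v, 3) for v in row])
```

After normalization both columns are on a comparable scale, regardless of how different the original feature scales were.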

In [7]:
# Build a simple FFNN with Batch Normalization
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(10,)))
model.add(BatchNormalization())
model.add(Dense(32, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(1, activation='linear'))

model.compile(optimizer='adam', loss='mean_squared_error')

model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x1355e3ed4f0>

***Explanation***
- `BatchNormalization()`: Added after each hidden `Dense` layer to normalize that layer's outputs. It normalizes the activations using the statistics of each batch and rescales them with two learnable parameters: a scale factor and a shift value. At inference it uses moving averages of the mean and variance collected during training instead of batch statistics.

## Callbacks
Callbacks in TensorFlow allow you to intervene during the training process of a neural network. They provide a way to monitor the training and validation metrics, alter the learning rate, save the model at specific intervals, and even stop the training process based on certain conditions. These automated processes help improve model performance, make training more efficient, and prevent overfitting. Callbacks are functions or objects that are called at specific points during the training process. They allow us to perform actions at the beginning or end of each epoch, batch, or even when the training process starts or ends.

Let's explore some of the built-in callbacks provided by TensorFlow:

#### Model checkpoint
The `ModelCheckpoint` callback saves the model or its weights at intervals during training. This is useful for long training sessions, where you don't want to lose progress if training is interrupted. By default it saves at the end of every epoch; with `save_best_only=True`, as in the example below, it saves only when the monitored metric (e.g., validation loss) has improved.

In [8]:
# Build a simple FFNN
model = Sequential()
model.add(Dense(32, activation='relu', input_shape=(10,)))
model.add(Dense(16, activation='relu'))
model.add(Dense(1, activation='linear'))

model.compile(optimizer='adam', loss='mean_squared_error')

# Define the ModelCheckpoint callback
checkpoint_callback = ModelCheckpoint(
    filepath='best_model.h5',  # Path to save the model file
    save_best_only=True,       # Save only the best model based on monitored metric
    monitor='val_loss',        # Metric to monitor
    mode='min'                 # Treat a lower val_loss as an improvement
)

# Train the model with the callback
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2, callbacks=[checkpoint_callback])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x1355e605ee0>

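When using the `ModelCheckpoint` callback, the file format is chosen by the `filepath` argument:
- HDF5 format - `filepath='best_model.h5'`
- SavedModel format - `filepath='saved_model/'` (a directory; supported in TensorFlow 2.x releases that bundle Keras 2)

Newer releases that bundle Keras 3 expect a full-model `filepath` to end in `.keras` or `.h5`, so check the documentation for your installed version.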

#### Early stopping
`EarlyStopping` stops the training when the monitored metric has stopped improving. This helps prevent overfitting by stopping training once performance plateaus. It monitors a metric and stops training if no improvement is seen for a specified number of epochs (patience).
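
The patience rule can be illustrated with a small pure-Python sketch. The function `early_stopping_epoch` and the validation losses are made up for this example; the real callback has further options (such as restoring the best weights) that are not modeled here.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the zero-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses: the last improvement is at epoch index 2.
val_losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.68, 0.60]
stopped_at = early_stopping_epoch(val_losses, patience=3)
print(stopped_at)
```

With `patience=3`, training stops after three consecutive epochs without improvement, so the later loss of `0.60` is never reached. A larger patience trades longer training for a lower risk of stopping too early.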

In [9]:
# Build a simple FFNN
model = Sequential()
model.add(Dense(32, activation='relu', input_shape=(10,)))
model.add(Dense(16, activation='relu'))
model.add(Dense(1, activation='linear'))

model.compile(optimizer='adam', loss='mean_squared_error')

# Define the EarlyStopping callback
early_stopping_callback = EarlyStopping(
    monitor='val_loss',  # Metric to monitor
    patience=3,          # Number of epochs with no improvement after which training will be stopped
    mode='min'           # Treat a decrease in the monitored metric as an improvement (use 'max' for the opposite)
)

# Train the model with the callback
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2, callbacks=[early_stopping_callback])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x13560796490>

#### Learning rate scheduler
`LearningRateScheduler` dynamically adjusts the learning rate according to a specified schedule or function. It applies a user-defined function to update the learning rate at each epoch.

In [10]:
# Build a simple FFNN
model = Sequential()
model.add(Dense(32, activation='relu', input_shape=(10,)))
model.add(Dense(16, activation='relu'))
model.add(Dense(1, activation='linear'))

model.compile(optimizer='adam', loss='mean_squared_error')

# Define a function to update the learning rate
def scheduler(epoch, lr):
    if epoch % 2 == 0 and epoch:
        return lr * 0.5
    return lr

# Define the LearningRateScheduler callback
lr_scheduler_callback = LearningRateScheduler(scheduler)

# Train the model with the callback
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2, callbacks=[lr_scheduler_callback])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x135608b1c70>

**Explanation**:
- `scheduler`: A function that takes the epoch index and the current learning rate and returns the updated learning rate.
- Learning rate is halved every 2 epochs in this example.
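
To see the schedule this produces, the loop below calls the same `scheduler` function once per epoch outside of Keras, feeding each result back in as the current learning rate. The starting value of `0.001` is assumed here because it is the default learning rate of the Keras `Adam` optimizer.

```python
def scheduler(epoch, lr):
    """Halve the learning rate at every even epoch index except epoch 0."""
    if epoch % 2 == 0 and epoch:
        return lr * 0.5
    return lr


lr = 0.001  # assumed starting rate (Adam's default in Keras)
history = []
for epoch in range(10):
    # The callback passes the current rate in and applies the returned rate.
    lr = scheduler(epoch, lr)
    history.append(lr)

for epoch, value in enumerate(history):
    print(epoch, value)
```

The reductions are cumulative: the rate is halved at epoch indices 2, 4, 6, and 8, so it ends at one sixteenth of its starting value.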

#### Reduce learning rate on plateau
`ReduceLROnPlateau` reduces the learning rate when a metric has stopped improving. This can help in fine-tuning the model when progress slows. It monitors a metric and reduces the learning rate by a factor if no improvement is seen for a set number of epochs. `ReduceLROnPlateau` adjusts the learning rate to improve training, while `EarlyStopping` stops training to avoid overfitting.
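
The plateau rule can be sketched in pure Python as follows. The function `reduce_lr_on_plateau` and the validation losses are made up for this illustration, and the sketch omits options of the real callback such as a cooldown period and a minimum improvement threshold.

```python
def reduce_lr_on_plateau(val_losses, initial_lr=0.001, factor=0.1,
                         patience=2, min_lr=0.0):
    """Return the learning rate in effect after each epoch."""
    lr = initial_lr
    best = float("inf")
    wait = 0  # epochs since the last improvement
    rates = []
    for loss in val_losses:
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)
                wait = 0  # start counting again after a reduction
        rates.append(lr)
    return rates


# Made-up validation losses with two plateaus.
val_losses = [0.90, 0.80, 0.81, 0.82, 0.79, 0.80, 0.80]
rates = reduce_lr_on_plateau(val_losses)
print(rates)
```

Unlike the early stopping rule, training continues after the patience runs out; only the step size shrinks, here by a factor of ten at each plateau.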

In [11]:
# Build a simple FFNN
model = Sequential()
model.add(Dense(32, activation='relu', input_shape=(10,)))
model.add(Dense(16, activation='relu'))
model.add(Dense(1, activation='linear'))

model.compile(optimizer='adam', loss='mean_squared_error')

# Define the ReduceLROnPlateau callback
reduce_lr_callback = ReduceLROnPlateau(
    monitor='val_loss',  # Metric to monitor
    factor=0.1,          # Factor by which the learning rate will be reduced
    patience=2,          # Number of epochs with no improvement before reducing the learning rate
    mode='min'           # Reduce the rate when the monitored metric stops decreasing
)

# Train the model with the callback
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2, callbacks=[reduce_lr_callback])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x13561a5a340>

#### TensorBoard
TensorBoard is a tool for visualizing metrics during training. The `TensorBoard` callback logs data during training, which can then be viewed in the TensorBoard UI to help debug and optimize our models.

TensorBoard is a web-based dashboard that allows us to visualize and analyze different aspects of our model's training process. It provides tools to:
- Visualize metrics like loss and accuracy over epochs.
- Track and visualize training and validation metrics.
- Display model graphs and architecture.
- Inspect and compare histograms and distributions of weights and biases.

In [12]:
# Build a simple FFNN
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(10,)))
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='linear'))

model.compile(optimizer='adam', loss='mean_squared_error')

# Define the TensorBoard callback
tensorboard_callback = TensorBoard(
    log_dir='logs',          # Directory where the logs will be saved
    histogram_freq=1,        # Compute histograms every epoch
    write_graph=True,        # Visualize the model graph
    write_images=True,       # Write model weights as images
    update_freq='epoch'      # Update metrics at the end of each epoch
)

# Train the model with the callback
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2, callbacks=[tensorboard_callback])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x13561bc9f70>

***Parameters of `TensorBoard` callback***
1. `log_dir` - The directory where TensorBoard will save the logs. It specifies a directory path where log files will be written. TensorBoard will read from this directory to display the metrics.
2. `histogram_freq` - Frequency (in epochs) at which histograms of weights and biases will be computed. Set this to `1` to compute histograms at every epoch or `0` to disable histogram computation. This is useful for analyzing how weights change during training.
3. `write_graph` - Whether to visualize the model graph in TensorBoard. Set to `True` to write the model graph to the log directory, allowing us to visualize the model architecture.
4. `write_images` - Whether to write model weights to the log directory as images. Set to `True` to visualize the weights as images in TensorBoard. This can help in understanding how the learned weights evolve during training.
5. `update_freq` - How often losses and metrics are written. Set to `'batch'` to write after each batch, `'epoch'` to write after each epoch, or an integer `N` to write every `N` batches. This controls the granularity of logging.

##### Accessing TensorBoard UI
To access TensorBoard and visualize the logged data, follow these steps:

1. **Start TensorBoard:**
   - Open a terminal or command prompt.
   - Run the following command:
     ```bash
     tensorboard --logdir=logs
     ```
   - This command starts TensorBoard and points it to the `logs` directory where our training logs are saved.
2. **Open TensorBoard in a browser:**
   - When TensorBoard starts, it prints the URL where it is running (usually `http://localhost:6006`). Open a web browser and go to that URL to access the dashboard.
3. **Explore TensorBoard UI:**
   - The TensorBoard interface provides various tabs such as Scalars, Graphs, Distributions, and Histograms.