**Gradient Clipping**

Gradient clipping mitigates exploding gradients during neural network training by capping the gradients computed in backpropagation so that they never exceed a threshold. This matters most in deep networks and RNNs, where large gradients can destabilize training or cause numerical overflow. There are two common methods: clipping by value, which restricts each gradient component to a range \([-v, v]\), and clipping by norm, which rescales the whole gradient when its norm exceeds a threshold. Clipping by value can change the direction of the gradient vector, whereas clipping by norm preserves it. In Keras, either method is set through the `clipvalue` or `clipnorm` argument of an optimizer.
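To make the difference between the two methods concrete, here is a minimal sketch in plain Python. It is not how Keras implements clipping internally. The helper names `clip_by_value` and `clip_by_norm` and the example gradient are assumptions chosen for illustration.

```python
import math

def clip_by_value(grad, v):
    """Clamp every component of grad to the range [-v, v]."""
    return [max(-v, min(v, g)) for g in grad]

def clip_by_norm(grad, threshold):
    """Rescale grad so its L2 norm is at most threshold, keeping its direction."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= threshold:
        return list(grad)
    scale = threshold / norm
    return [g * scale for g in grad]

# Hypothetical gradient with one very large component
grad = [0.9, 100.0]

by_value = clip_by_value(grad, 1.0)
by_norm = clip_by_norm(grad, 1.0)

print(by_value)  # [0.9, 1.0]: the two components are now nearly equal
print(by_norm)   # both components shrink by the same factor
```

Clipping by value turns a gradient that pointed almost entirely along the second axis into one that points roughly along the diagonal. Clipping by norm keeps the original direction and only shortens the vector.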


In [3]:
import tensorflow as tf

# Download (if needed) and load the Fashion MNIST dataset from Keras
fashion_mnist = tf.keras.datasets.fashion_mnist.load_data()

# Unpack the dataset into training and test sets
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist

# Split the full training set into a smaller training set and a validation set
X_train, y_train = X_train_full[:-5000], y_train_full[:-5000]
X_valid, y_valid = X_train_full[-5000:], y_train_full[-5000:]

# Normalize the pixel values to be between 0 and 1 by dividing by 255
X_train, X_valid, X_test = X_train / 255.0, X_valid / 255.0, X_test / 255.0

# Compute the mean and standard deviation of the training set
pixel_means = X_train.mean(axis=0, keepdims=True)
pixel_stds = X_train.std(axis=0, keepdims=True)

# Standardize the training, validation, and test sets by subtracting the mean and dividing by the standard deviation
X_train_scaled = (X_train - pixel_means) / pixel_stds
X_valid_scaled = (X_valid - pixel_means) / pixel_stds
X_test_scaled = (X_test - pixel_means) / pixel_stds

# Define the class names corresponding to the labels in the Fashion MNIST dataset
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz


In [2]:
tf.keras.backend.clear_session()
tf.random.set_seed(42)

In [1]:
# Import necessary TensorFlow modules
import tensorflow as tf

# Define a sequential model
model = tf.keras.Sequential([
    # Flatten layer: Converts each 28x28 image into a 1D array of 784 elements
    tf.keras.layers.Flatten(input_shape=[28, 28]),

    # First hidden layer with He initialization; use_bias=False because the
    # following BatchNormalization layer adds its own offset term
    tf.keras.layers.Dense(300, kernel_initializer="he_normal", use_bias=False),

    # Batch Normalization layer: helps stabilize and accelerate training by
    # normalizing the outputs of the previous layer
    tf.keras.layers.BatchNormalization(),

    # Activation layer: ReLU introduces non-linearity to the model
    tf.keras.layers.Activation("relu"),

    # Again, using He initialization and no bias term
    tf.keras.layers.Dense(100, kernel_initializer="he_normal", use_bias=False),

    # Another Batch Normalization layer: Normalizes the outputs of the previous layer
    tf.keras.layers.BatchNormalization(),

    # Another Activation layer: Applies the ReLU activation function
    tf.keras.layers.Activation("relu"),

    # Uses softmax activation function to output probabilities for each class
    tf.keras.layers.Dense(10, activation="softmax")
])

# Summary of the model: Provides a summary of the model architecture, showing layer types, output shapes, and number of parameters
model.summary()


Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten (Flatten)           (None, 784)               0         
                                                                 
 dense (Dense)               (None, 300)               235200    
                                                                 
 batch_normalization (Batch  (None, 300)               1200      
 Normalization)                                                  
                                                                 
 activation (Activation)     (None, 300)               0         
                                                                 
 dense_1 (Dense)             (None, 100)               30000     
                                                                 
 batch_normalization_1 (Bat  (None, 100)               400       
 chNormalization)                                       
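The `Param #` column in the summary above can be checked by hand. The sketch below assumes the layer sizes from the model definition and the usual Keras accounting: a `Dense` layer without bias stores only its weight matrix, and a `BatchNormalization` layer stores four values per feature (scale, offset, moving mean, moving variance). The variable names are illustrative only.

```python
# Layer sizes taken from the model definition
n_inputs, n_hidden1, n_hidden2 = 28 * 28, 300, 100

# Dense layers with use_bias=False hold only a weight matrix
dense_params = n_inputs * n_hidden1      # 784 * 300
dense_1_params = n_hidden1 * n_hidden2   # 300 * 100

# Each BatchNormalization layer keeps 4 vectors per feature:
# gamma and beta (trainable), moving mean and moving variance (non-trainable)
bn_params = 4 * n_hidden1
bn_1_params = 4 * n_hidden2

print(dense_params, bn_params, dense_1_params, bn_1_params)
# 235200 1200 30000 400
```

With a bias term, the first `Dense` layer would have 300 more parameters. They would be redundant here because the `BatchNormalization` layer that follows already learns an offset per feature.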

In [4]:
from tensorflow.keras.optimizers import SGD

# SGD optimizer that clips every gradient component to the range [-1.0, 1.0]
optimizer = SGD(clipvalue=1.0)

# Compile the model
model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer)


In [7]:
# Train for 10 epochs
model.fit(X_train_scaled, y_train, epochs=10,
          validation_data=(X_valid_scaled, y_valid))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x7e186c645720>