## Gradient Checking

To determine whether the gradient calculation is performed correctly, I've implemented a simple gradient checker using information from this [online article](https://towardsdatascience.com/how-to-debug-a-neural-network-with-gradient-checking-41deec0357a9).<br>

We start by approximating the gradient of the model using the two-sided (central-difference) form of the limit definition of a derivative:
$$\frac{d}{d \theta}J(\theta) = \lim\limits_{\epsilon \rightarrow 0} \frac{J(\theta + \epsilon) - J(\theta - \epsilon)}{2\epsilon}$$
where $J$ represents our cost function, and $\theta$ represents our model's parameters. In practice we drop the limit and evaluate the quotient with a small, fixed $\epsilon$ (1e-4 by default); making $\epsilon$ much smaller lets floating-point round-off dominate the result.<br>
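As a minimal sketch of the difference quotient above, the snippet below perturbs one parameter at a time on a toy cost whose exact gradient is known. The `central_difference` helper and the toy cost are hypothetical illustrations, not the code in `utils.py`.

```python
import numpy as np

def central_difference(J, theta, epsilon=1e-4):
    """Approximate dJ/dtheta element-wise with the two-sided difference."""
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        # Perturb only the i-th parameter by +/- epsilon
        step = np.zeros_like(theta)
        step.flat[i] = epsilon
        grad.flat[i] = (J(theta + step) - J(theta - step)) / (2 * epsilon)
    return grad

# Toy cost J(theta) = sum(theta^2), whose exact gradient is 2 * theta
theta = np.array([1.0, -2.0, 0.5])
approx = central_difference(lambda t: np.sum(t ** 2), theta)
exact = 2 * theta
print(approx, exact)
```

Each parameter costs two evaluations of $J$, which is why this approach is only practical for debugging small models.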
Then, we calculate the normalized distance between the gradient computed by the model ($d\theta$) and the approximated gradient ($d\theta_{\text{approx}}$), as follows:
$$\text{distance} = \frac{\| d\theta_{\text{approx}} - d\theta \|_2}{\| d\theta_{\text{approx}} \|_2 + \| d\theta \|_2}$$
If the distance is sufficiently small (less than $\epsilon^2$), the gradient is *probably* correct.<br>
The source code for the `gradient_check()` function can be found in `/src/sandbox/utils.py`.<br>
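To make the whole procedure concrete, here is a rough, self-contained sketch that runs the check on a logistic regression cost with a known closed-form gradient. All function names below are hypothetical stand-ins; the actual `gradient_check()` in `utils.py` operates on a `Model` and may differ in its details.

```python
import numpy as np

def normalized_distance(grad_approx, grad):
    """Relative distance between two gradient vectors (0 means identical)."""
    numerator = np.linalg.norm(grad_approx - grad)
    denominator = np.linalg.norm(grad_approx) + np.linalg.norm(grad)
    return numerator / denominator

def cost(theta, X, y):
    """Binary cross-entropy of a logistic regression model."""
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def analytic_gradient(theta, X, y):
    """Closed-form gradient of the cost above."""
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    return X.T @ (p - y) / len(y)

def numerical_gradient(theta, X, y, epsilon=1e-4):
    """Two-sided difference approximation, one parameter at a time."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = epsilon
        grad[i] = (cost(theta + step, X, y) - cost(theta - step, X, y)) / (2 * epsilon)
    return grad

# Dummy data with a fixed seed so the run is repeatable
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
y = rng.integers(0, 2, 100).astype(float)
theta = rng.standard_normal(3)

epsilon = 1e-4
grad = analytic_gradient(theta, X, y)
distance = normalized_distance(numerical_gradient(theta, X, y, epsilon), grad)
print(distance, distance < epsilon ** 2)

# A deliberately wrong gradient (scaled by 2) gives a distance of about 1/3
print(normalized_distance(numerical_gradient(theta, X, y, epsilon), 2 * grad))
```

Dividing by the sum of the norms makes the distance scale-free, so the same threshold can be reused whether the gradients are large or small.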

Last update: 12/24/23

In [1]:
from importlib import reload
import numpy as np
from sandbox import activations, costs, initializers, layers, model, optimizers, utils

Checking activation functions

In [2]:

reload(activations)
reload(costs)
reload(layers)

# Create dummy data
X = np.random.randn(100, 3)
Y = np.random.randint(0, 2, (100, 2))

# Create model
activation = model.Model()
# activation.add(layers.Dense(units=10, activation=activations.ReLU()))
# activation.add(layers.Dense(units=10, activation=activations.Linear()))
# activation.add(layers.Dense(units=10, activation=activations.Arctan()))
# activation.add(layers.Dense(units=10, activation=activations.BentIdentity()))
# activation.add(layers.Dense(units=10, activation=activations.Linear()))
# activation.add(layers.Dense(units=10, activation=activations.LeakyReLU(alpha=0.05)))
# activation.add(layers.Dense(units=10, activation=activations.Tanh()))
# activation.add(layers.Dense(units=10, activation=activations.ELU()))
# activation.add(layers.Dense(units=10, activation=activations.SELU()))
# activation.add(layers.Dense(units=10, activation=activations.SLU()))
# activation.add(layers.Dense(units=10, activation=activations.Softplus()))
# activation.add(layers.Dense(units=10, activation=activations.Softsign()))
# activation.add(layers.Dense(units=10, activation=activations.Gaussian()))
# activation.add(layers.Dense(units=10, activation=activations.PiecewiseLinear()))
# activation.add(layers.Dense(units=10, activation=activations.Softmax()))
# activation.add(layers.Dense(units=1, activation=activations.Sigmoid()))

activation.add(layers.Dense(units=16, activation=activations.ReLU()))
activation.add(layers.Dense(units=8, activation=activations.ReLU()))
activation.add(layers.Dense(units=2, activation=activations.Softmax()))

activation.configure(
    input_size=3,
    cost_type=costs.CategoricalCrossentropy(),
    optimizer=optimizers.SGD(),
)

# Check gradient
diff = utils.gradient_check(activation, X, Y)
print(diff)

ValueError: shapes (3,100) and (3,16) not aligned: 100 (dim 1) != 3 (dim 0)

Checking cost functions

In [38]:
reload(costs)

# Create dummy data
X = np.random.randn(100, 1)
Y = np.random.randint(0, 2, (100, 1))

# Create model
cost = model.Model()
cost.add(layers.Dense(units=4, activation=activations.ReLU()))
cost.add(layers.Dense(units=2, activation=activations.ReLU()))
cost.add(layers.Dense(units=1, activation=activations.Sigmoid()))

# Check gradients for each cost type

# Binary Cross-Entropy
cost.configure(
    input_size=1,
    cost_type=costs.BinaryCrossentropy(),
    optimizer=optimizers.SGD()
)
diff = utils.gradient_check(cost, X, Y)
print(diff)

# Categorical Cross-Entropy
cost.configure(
    input_size=1,
    cost_type=costs.CategoricalCrossentropy(),
    optimizer=optimizers.SGD()
)
diff = utils.gradient_check(cost, X, Y)
print(diff)

# Mean Squared Error
cost.configure(
    input_size=1,
    cost_type=costs.MSE(),
    optimizer=optimizers.SGD()
)
diff = utils.gradient_check(cost, X, Y)
print(diff)

# Mean Absolute Error
cost.configure(
    input_size=1,
    cost_type=costs.MAE(),
    optimizer=optimizers.SGD()
)
diff = utils.gradient_check(cost, X, Y)
print(diff)

2.22407724488422e-10
5.893096966740448e-11
1.4485176430592358e-10
5.2466482580955975e-11


Checking layer types (excluding dropout, which does not work with gradient checking because its random mask changes between forward passes)

In [39]:
reload(layers)

# Create dummy data
X = np.random.randn(100, 1)
Y = np.random.randint(0, 2, (100, 1))

# Create model
layer = model.Model()
layer.add(layers.Dense(units=4, activation=activations.ReLU()))
layer.add(layers.Dense(units=2, activation=activations.ReLU()))
layer.add(layers.Dense(units=1, activation=activations.Sigmoid()))

layer.configure(
    input_size=1,
    cost_type=costs.BinaryCrossentropy(),
    optimizer=optimizers.SGD()
)

# Check gradient
diff = utils.gradient_check(layer, X, Y)
print(diff)

6.804909933078426e-10
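Dropout was left out of the layer check above. The toy sketch below illustrates the likely reason, assuming a dropout layer that draws a fresh random mask on every forward pass: the two cost evaluations then see different networks, and the difference quotient is dominated by mask noise. The cost functions here are hypothetical and unrelated to the `layers` module.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)

def cost_with_dropout(w, keep_prob=0.5):
    """Toy cost that draws a fresh inverted-dropout mask on every call."""
    mask = rng.random(x.shape) < keep_prob
    return np.mean((w * x * mask / keep_prob) ** 2)

def cost_fixed_mask(w, mask, keep_prob=0.5):
    """Same cost, but the caller supplies the mask so it stays fixed."""
    return np.mean((w * x * mask / keep_prob) ** 2)

w, epsilon = 1.0, 1e-4

# Each call uses a different mask, so the quotient mostly measures mask noise
noisy = (cost_with_dropout(w + epsilon) - cost_with_dropout(w - epsilon)) / (2 * epsilon)

# Sharing one mask across both evaluations recovers the derivative for that mask
mask = rng.random(x.shape) < 0.5
stable = (cost_fixed_mask(w + epsilon, mask) - cost_fixed_mask(w - epsilon, mask)) / (2 * epsilon)
exact = 2 * w * np.mean((x * mask / 0.5) ** 2)

print(noisy, stable, exact)
```

If dropout ever needs to be checked, one common workaround is to disable it or freeze the mask (for example by reseeding) for the duration of the check.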
