## Gradient Checking

To determine whether the gradient calculation is performed correctly, I've implemented a simple gradient checker using information from this [online article](https://towardsdatascience.com/how-to-debug-a-neural-network-with-gradient-checking-41deec0357a9).<br>

We start by approximating the gradient of the cost with respect to the model's parameters using the two-sided (centered) limit definition of a derivative:
$$\frac{d}{d \theta}J(\theta) = \lim\limits_{\epsilon \rightarrow 0} \frac{J(\theta + \epsilon) - J(\theta - \epsilon)}{2\epsilon}$$
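where $J$ is our cost function and $\theta$ represents our model's parameters; the formula is applied to each parameter in turn while the others are held fixed. To approximate the gradient, we replace the limit with a small but finite $\epsilon$ (1e-4 by default). This value should not be arbitrarily small, though: too small a step lets floating-point round-off in the subtraction $J(\theta + \epsilon) - J(\theta - \epsilon)$ dominate the estimate.<br>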
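To see why $\epsilon$ is small but not arbitrarily small, the following NumPy experiment (an illustration, not part of `utils.py`; `centered_difference` is a hypothetical helper) estimates the derivative of $\sin$ at $x = 1$, where the exact answer is $\cos(1)$, using three step sizes.

```python
import numpy as np

def centered_difference(f, x, epsilon):
    """Two-sided finite-difference estimate of f'(x)."""
    return (f(x + epsilon) - f(x - epsilon)) / (2 * epsilon)

x = 1.0
exact = np.cos(x)  # d/dx sin(x) = cos(x)

errors = {}
for epsilon in (1e-1, 1e-4, 1e-12):
    approx = centered_difference(np.sin, x, epsilon)
    errors[epsilon] = abs(approx - exact)
    print(f"epsilon={epsilon:.0e}  error={errors[epsilon]:.1e}")
```

The error is smallest for the middle step. With the large step, truncation error (which shrinks like $\epsilon^2$) dominates; with the tiny one, round-off in the subtraction takes over and the error grows again.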
Then, we calculate the normalized distance between the analytical gradient $d\theta$ (computed by backpropagation) and its numerical approximation $d\theta_{approx}$:
$$\text{distance} = \frac{\| d\theta_{approx} - d\theta \|_2}{\| d\theta_{approx} \|_2 + \| d\theta \|_2}$$
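If this distance is sufficiently small, the gradient is *probably* correct. A reasonable bar is a distance well below $\epsilon$: the centered difference has a truncation error on the order of $\epsilon^2$, so a correct implementation usually clears it comfortably.<br>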
The source code for the `gradient_check()` function can be found in `/src/sandbox/utils.py`.
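To make the procedure concrete, here is a self-contained NumPy sketch on a model small enough to differentiate by hand: a single sigmoid unit with a binary cross-entropy cost. It is an illustration under those assumptions, not the `gradient_check()` from `utils.py`, and all function names are hypothetical. It computes both gradients over the parameters flattened into one vector `[w..., b]` and applies the normalized distance formula.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_cost(w, b, X, Y):
    """Mean binary cross-entropy of a single sigmoid unit (illustrative model)."""
    A = sigmoid(X @ w + b)
    return -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))

def analytic_gradient(w, b, X, Y):
    """Backprop gradient of bce_cost, flattened as [dw..., db]; dJ/dz = (A - Y) / m."""
    m = X.shape[0]
    dZ = (sigmoid(X @ w + b) - Y) / m
    return np.concatenate([(X.T @ dZ).ravel(), [dZ.sum()]])

def numerical_gradient(w, b, X, Y, epsilon=1e-4):
    """Centered differences over every parameter, flattened as [w..., b]."""
    theta = np.concatenate([w.ravel(), [b]])
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        plus, minus = theta.copy(), theta.copy()
        plus[i] += epsilon
        minus[i] -= epsilon
        J_plus = bce_cost(plus[:-1].reshape(w.shape), plus[-1], X, Y)
        J_minus = bce_cost(minus[:-1].reshape(w.shape), minus[-1], X, Y)
        grad[i] = (J_plus - J_minus) / (2 * epsilon)
    return grad

def normalized_distance(d_approx, d_exact):
    """||d_approx - d_exact|| / (||d_approx|| + ||d_exact||)."""
    return np.linalg.norm(d_approx - d_exact) / (
        np.linalg.norm(d_approx) + np.linalg.norm(d_exact))

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
Y = rng.integers(0, 2, (100, 1))
w = rng.standard_normal((3, 1))
b = 0.1

d_approx = numerical_gradient(w, b, X, Y)
distance = normalized_distance(d_approx, analytic_gradient(w, b, X, Y))
print(distance)

# Simulate a common backprop bug: forgetting the 1/m averaging factor
buggy = analytic_gradient(w, b, X, Y) * X.shape[0]
buggy_distance = normalized_distance(d_approx, buggy)
print(buggy_distance)
```

The first distance should be tiny. The buggy gradient is $m$ times too large, which pushes the distance to about $(m-1)/(m+1) \approx 0.98$ for $m = 100$.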

```python
from importlib import reload
import numpy as np
from sandbox import model, layers, activations, costs, utils
```

### Checking activation functions

```python
reload(activations)

# Create dummy data
X = np.random.randn(100, 1)
Y = np.random.randint(0, 2, (100, 1))

# Create model
activation = model.Model()
activation.add(layers.Dense(units=10, activation=activations.ReLU()))
activation.add(layers.Dense(units=10, activation=activations.Linear()))
activation.add(layers.Dense(units=10, activation=activations.Arctan()))
activation.add(layers.Dense(units=10, activation=activations.BentIdentity()))
activation.add(layers.Dense(units=10, activation=activations.Linear()))
activation.add(layers.Dense(units=10, activation=activations.LeakyReLU(alpha=0.05)))
activation.add(layers.Dense(units=10, activation=activations.Tanh()))
activation.add(layers.Dense(units=10, activation=activations.ELU()))
activation.add(layers.Dense(units=10, activation=activations.SELU()))
activation.add(layers.Dense(units=10, activation=activations.SLU()))
activation.add(layers.Dense(units=10, activation=activations.Softplus()))
activation.add(layers.Dense(units=10, activation=activations.Softsign()))
activation.add(layers.Dense(units=10, activation=activations.Gaussian()))
activation.add(layers.Dense(units=10, activation=activations.PiecewiseLinear()))
activation.add(layers.Dense(units=1, activation=activations.Sigmoid()))
activation.configure(cost_type=costs.BinaryCrossentropy())
activation.initialize_parameters(input_size=X.shape[1])

# Check gradient
diff = utils.gradient_check(activation, X, Y)
print(diff)
```

```
1.2382259724025657e-08
```
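Since the check above reports a single distance for the whole stack of layers, a failure would not point to a particular activation. One way to narrow it down, sketched here with hypothetical helpers, is to compare each activation's derivative with a centered difference elementwise. The sketch uses textbook definitions of three of the activations in the model; the sandbox classes may be defined or parameterized differently.

```python
import numpy as np

# Textbook definitions (assumed; the sandbox classes may differ in detail):
# each entry maps a name to (function, analytic derivative).
ACTIVATIONS = {
    "tanh": (np.tanh, lambda x: 1.0 - np.tanh(x) ** 2),
    "softplus": (lambda x: np.log1p(np.exp(x)), lambda x: 1.0 / (1.0 + np.exp(-x))),
    "softsign": (lambda x: x / (1.0 + np.abs(x)), lambda x: 1.0 / (1.0 + np.abs(x)) ** 2),
}

def max_derivative_error(f, df, x, epsilon=1e-4):
    """Largest elementwise gap between df(x) and a centered-difference estimate."""
    approx = (f(x + epsilon) - f(x - epsilon)) / (2 * epsilon)
    return np.max(np.abs(approx - df(x)))

# 12 evenly spaced points on [-3, 3]; none lands on 0, where softsign's
# second derivative is discontinuous
x = np.linspace(-3.0, 3.0, 12)
results = {name: max_derivative_error(f, df, x) for name, (f, df) in ACTIVATIONS.items()}
for name, err in results.items():
    print(f"{name:>8}: {err:.1e}")
```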


### Checking cost functions

```python
reload(costs)

# Create dummy data
X = np.random.randn(100, 1)
Y = np.random.randint(0, 2, (100, 1))

# Create model
cost = model.Model()
cost.add(layers.Dense(units=4, activation=activations.ReLU()))
cost.add(layers.Dense(units=2, activation=activations.ReLU()))
cost.add(layers.Dense(units=1, activation=activations.Sigmoid()))
cost.configure(cost_type=None)  # placeholder; each cost type is set below before its check
cost.initialize_parameters(input_size=X.shape[1])

# Check gradients for each cost type

# Binary Cross-Entropy
cost.configure(cost_type=costs.BinaryCrossentropy())
diff = utils.gradient_check(cost, X, Y)
print(diff)

# Mean Squared Error
cost.configure(cost_type=costs.MSE())
diff = utils.gradient_check(cost, X, Y)
print(diff)

# Mean Absolute Error
cost.configure(cost_type=costs.MAE())
diff = utils.gradient_check(cost, X, Y)
print(diff)
```

```
2.637607713721354e-10
6.981784894868495e-10
2.19527231535713e-09
```
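The same idea can be applied to the cost functions alone, checking each loss's derivative with respect to the prediction $A$. This is a sketch with textbook per-example forms, assumed rather than taken from `costs.py`; the sandbox's classes may scale them (e.g. by $1/2$) or average over examples differently.

```python
import numpy as np

# Per-example losses and their derivatives w.r.t. the prediction A
# (textbook forms; the sandbox's classes may scale or average differently).
COSTS = {
    "binary cross-entropy": (
        lambda A, Y: -(Y * np.log(A) + (1 - Y) * np.log(1 - A)),
        lambda A, Y: -(Y / A) + (1 - Y) / (1 - A),
    ),
    "mean squared error": (lambda A, Y: (A - Y) ** 2, lambda A, Y: 2 * (A - Y)),
    "mean absolute error": (lambda A, Y: np.abs(A - Y), lambda A, Y: np.sign(A - Y)),
}

rng = np.random.default_rng(1)
# Sigmoid-like predictions bounded away from 0 and 1; since Y is 0 or 1,
# A never equals Y, so MAE's kink at A == Y is never crossed
A = rng.uniform(0.2, 0.8, (100, 1))
Y = rng.integers(0, 2, (100, 1)).astype(float)
epsilon = 1e-4

results = {}
for name, (loss, dloss) in COSTS.items():
    approx = (loss(A + epsilon, Y) - loss(A - epsilon, Y)) / (2 * epsilon)
    results[name] = np.max(np.abs(approx - dloss(A, Y)))
    print(f"{name}: {results[name]:.1e}")
```

MAE is only non-differentiable where the prediction equals the label. With sigmoid outputs strictly inside $(0, 1)$ and labels of 0 or 1, that point is never reached, which is consistent with the MAE check above passing.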


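### Checking layer types

Dropout is excluded here: it samples a new random mask on every forward pass, so the cost is not a deterministic function of $\theta$ and the finite-difference estimate cannot be compared with the backpropagated gradient.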

```python
reload(layers)

# Create dummy data
X = np.random.randn(100, 1)
Y = np.random.randint(0, 2, (100, 1))

# Create model
layer = model.Model()
layer.add(layers.Dense(units=4, activation=activations.ReLU()))
layer.add(layers.Dense(units=2, activation=activations.ReLU()))
layer.add(layers.Dense(units=1, activation=activations.Sigmoid()))
layer.configure(cost_type=costs.BinaryCrossentropy())
layer.initialize_parameters(input_size=X.shape[1])

# Check gradient
diff = utils.gradient_check(layer, X, Y)
print(diff)
```

```
1.0431558123944667e-09
```
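The reason dropout is left out can be illustrated with a small standalone sketch. It assumes standard inverted dropout on the inputs of a linear model with a squared-error cost; this is not the sandbox's dropout layer, and all names are hypothetical. If every cost evaluation draws a fresh mask, the two sides of the centered difference see different networks; reusing one fixed mask restores agreement with backpropagation.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, keep_prob = 50, 3, 0.8
X = rng.standard_normal((m, n))
y = rng.standard_normal((m, 1))
w = rng.standard_normal((n, 1))

def dropout_mask():
    """Inverted-dropout mask: keep each input with probability keep_prob, rescale by 1/keep_prob."""
    return (rng.random((m, n)) < keep_prob) / keep_prob

def dropout_cost(w, mask):
    """Squared-error cost of a linear model whose inputs pass through dropout."""
    residual = (X * mask) @ w - y
    return 0.5 * np.mean(residual ** 2)

def numerical_gradient(get_mask, epsilon=1e-4):
    """Centered differences per weight; get_mask() supplies the mask for each evaluation."""
    grad = np.zeros_like(w)
    for i in range(n):
        step = np.zeros_like(w)
        step[i] = epsilon
        grad[i] = (dropout_cost(w + step, get_mask()) - dropout_cost(w - step, get_mask())) / (2 * epsilon)
    return grad

def normalized_distance(a, b):
    return np.linalg.norm(a - b) / (np.linalg.norm(a) + np.linalg.norm(b))

fixed = dropout_mask()
Xd = X * fixed
analytic = Xd.T @ (Xd @ w - y) / m  # backprop gradient for the fixed mask

fresh_distance = normalized_distance(numerical_gradient(dropout_mask), analytic)
fixed_distance = normalized_distance(numerical_gradient(lambda: fixed), analytic)
print(f"new mask per evaluation: {fresh_distance:.1e}")
print(f"one fixed mask:          {fixed_distance:.1e}")
```

With fresh masks, the difference between two differently masked costs, divided by $2\epsilon$, swamps the true gradient and the distance is typically close to 1; with a fixed mask it drops to round-off level, since this cost is quadratic in `w`. A common workaround is to disable dropout, or fix its random mask, while checking gradients.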
