### ConditionalGradient

Constraining the parameters of a neural network has been shown to benefit training because of the regularization the constraint induces. Parameters are often constrained via a soft penalty (which does not guarantee that the constraint is satisfied) or via a projection operation (which is computationally expensive). The conditional gradient (CG) optimizer, on the other hand, enforces the constraints strictly without the need for an expensive projection step. It works by minimizing a linear approximation of the objective within the constraint set.
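
To make the idea concrete, here is a minimal plain-Python sketch of a single CG step on a flat parameter vector. It assumes a Frobenius-norm ball of radius `lambda_` as the constraint set and treats `learning_rate` as the weight kept on the current parameters; the names `cg_step`, `lambda_`, and `epsilon` are illustrative, and the sketch is not the optimizer's actual implementation.

```python
import math


def l2_norm(values):
    """Frobenius (Euclidean) norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def cg_step(weights, grads, learning_rate, lambda_, epsilon=1e-7):
    """One conditional-gradient step under the constraint ||w|| <= lambda_.

    Assumption: the constraint set is a Frobenius-norm ball of radius lambda_.
    """
    grad_norm = l2_norm(grads)
    # Minimizing the linear approximation <grads, s> over the ball has a
    # closed-form solution: the boundary point opposite to the gradient.
    s = [-lambda_ * g / (grad_norm + epsilon) for g in grads]
    # The new weights are a convex combination of the old weights and s,
    # so they stay inside the ball whenever the old weights were inside it.
    return [learning_rate * w + (1.0 - learning_rate) * s_i
            for w, s_i in zip(weights, s)]


weights = [0.3, -0.4]   # norm 0.5, inside the ball of radius 1.0
grads = [3.0, 4.0]      # norm 5.0
new_weights = cg_step(weights, grads, learning_rate=0.9, lambda_=1.0)
```

Because the update only mixes two points of the constraint set, no projection is needed to keep the weights feasible.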

In [1]:
!pip install -q -U tensorflow-addons

[?25l[K     |▎                               | 10kB 19.0MB/s eta 0:00:01[K     |▋                               | 20kB 1.6MB/s eta 0:00:01[K     |█                               | 30kB 2.0MB/s eta 0:00:01[K     |█▏                              | 40kB 1.6MB/s eta 0:00:01[K     |█▌                              | 51kB 1.9MB/s eta 0:00:01[K     |█▉                              | 61kB 2.1MB/s eta 0:00:01[K     |██                              | 71kB 2.4MB/s eta 0:00:01[K     |██▍                             | 81kB 2.6MB/s eta 0:00:01[K     |██▊                             | 92kB 2.8MB/s eta 0:00:01[K     |███                             | 102kB 2.7MB/s eta 0:00:01[K     |███▎                            | 112kB 2.7MB/s eta 0:00:01[K     |███▋                            | 122kB 2.7MB/s eta 0:00:01[K     |███▉                            | 133kB 2.7MB/s eta 0:00:01[K     |████▏                           | 143kB 2.7MB/s eta 0:00:01[K     |████▌                     

In [2]:
import tensorflow as tf
import tensorflow_addons as tfa

from matplotlib import pyplot as plt

In [3]:
batch_size = 64
epochs = 10

In [4]:
model_1 = tf.keras.Sequential([
    tf.keras.layers.Dense(64, input_shape=(784,), activation='relu', name='dense_1'),
    tf.keras.layers.Dense(64, activation='relu', name='dense_2'),
    tf.keras.layers.Dense(10, activation='softmax', name='predictions'),
])

In [5]:
# Prep MNIST Data

dataset = {}
num_validation = 10000
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

x_train = x_train.reshape(-1, 784).astype('float32') / 255
x_test = x_test.reshape(-1, 784).astype('float32') / 255

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


### Define a Frobenius Norm Callback Function

In [6]:
def frobenius_norm(m):
    """Calculate the Frobenius norm of all layers' weights taken together.

    Args:
        m: A list of weight tensors, such as `model.trainable_weights`.

    Returns:
        A scalar tensor: the square root of the sum of all squared entries.
    """
    total_reduce_sum = 0
    for weights in m:
        total_reduce_sum = total_reduce_sum + tf.math.reduce_sum(weights**2)
    norm = total_reduce_sum**0.5
    return norm

In [7]:
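# Record the Frobenius norm of the model's weights at the end of every epoch.
CG_frobenius_norm_of_weight = []
CG_get_weight_norm = tf.keras.callbacks.LambdaCallback(
    # on_epoch_end receives the epoch index, not a batch index.
    on_epoch_end=lambda epoch, logs: CG_frobenius_norm_of_weight.append(
        frobenius_norm(model_1.trainable_weights).numpy()))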