### LazyAdam Optimizer

LazyAdam is a variant of the Adam optimizer that handles sparse updates more efficiently. The original Adam algorithm maintains two moving-average accumulators for each trainable variable and updates them at every step. LazyAdam handles gradient updates for sparse variables lazily: it updates the moving-average accumulators only for the sparse variable indices that appear in the current batch, rather than for all indices. Compared with the original Adam optimizer, this can substantially improve training throughput for some applications. However, its semantics differ slightly from the original Adam algorithm, so it may produce different empirical results.
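To make the difference in semantics concrete, here is a minimal pure-Python sketch of the two update rules on a three-row variable. It is an illustration only, not the TensorFlow Addons implementation: the helper names `adam_step` and `run`, the dict-based sparse gradient, and the hyperparameter defaults are all assumptions chosen for the example.

```python
import math

def adam_step(params, m, v, sparse_grad, step, lazy,
              lr=0.001, beta_1=0.9, beta_2=0.999, eps=1e-7):
    """One Adam step on a list of scalar rows (hypothetical helper).

    sparse_grad maps row index -> gradient; rows not listed have no gradient.
    lazy=True touches only the listed rows; lazy=False treats missing rows
    as zero gradients and updates every row, as dense Adam would.
    """
    # Bias-corrected learning rate for this step.
    lr_t = lr * math.sqrt(1 - beta_2 ** step) / (1 - beta_1 ** step)
    rows = sparse_grad.keys() if lazy else range(len(params))
    for i in rows:
        g = sparse_grad.get(i, 0.0)
        m[i] = beta_1 * m[i] + (1 - beta_1) * g        # first-moment accumulator
        v[i] = beta_2 * v[i] + (1 - beta_2) * g * g    # second-moment accumulator
        params[i] -= lr_t * m[i] / (math.sqrt(v[i]) + eps)

def run(lazy):
    params, m, v = [1.0] * 3, [0.0] * 3, [0.0] * 3
    adam_step(params, m, v, {0: 0.5}, step=1, lazy=lazy)  # batch 1 hits row 0
    adam_step(params, m, v, {1: 0.5}, step=2, lazy=lazy)  # batch 2 hits row 1
    return params, m, v

lazy_params, lazy_m, lazy_v = run(lazy=True)
dense_params, dense_m, dense_v = run(lazy=False)

print("row 0 first moment  lazy:", lazy_m[0], " dense:", dense_m[0])
print("row 0 parameter     lazy:", lazy_params[0], " dense:", dense_params[0])
```

In the second step, row 0 is absent from the batch. The lazy variant leaves its accumulators and value untouched, while the dense variant decays the accumulators and still moves the parameter using the leftover momentum. This is where the throughput gain comes from, and also why the results can differ.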

In [1]:
!pip install -q -U tensorflow-addons

In [2]:
import tensorflow as tf
import tensorflow_addons as tfa

In [3]:
batch_size = 64
epochs = 10

### Model

In [4]:
# Two hidden layers followed by a 10-way softmax over the digit classes.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, input_shape=(784,), activation='relu', name='dense_1'),
    tf.keras.layers.Dense(64, activation='relu', name='dense_2'),
    tf.keras.layers.Dense(10, activation='softmax', name='predictions'),
])

### Prep Data

In [5]:
dataset = {}
num_validation = 10000
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Flatten each 28x28 image into a 784-element vector and scale pixels to [0, 1].
x_train = x_train.reshape(-1, 784).astype('float32') / 255
x_test = x_test.reshape(-1, 784).astype('float32') / 255

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
