
# Optimizers

## Usage with `compile()` & `fit()`

An optimizer is one of the two arguments required for compiling a Keras model:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential()
model.add(layers.Dense(64, kernel_initializer='uniform', input_shape=(10,)))
model.add(layers.Activation('softmax'))

opt = keras.optimizers.Adam(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=opt)
```

You can either instantiate an optimizer before passing it to `model.compile()`, as in the above example, or you can pass it by its string identifier. In the latter case, the default parameters for the optimizer will be used.

```python
# pass optimizer by name: default parameters will be used
model.compile(loss='categorical_crossentropy', optimizer='adam')
```
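
To cover the `fit()` part of the heading, here is a minimal sketch of training the compiled model; the random placeholder data and the batch size / epoch count are illustrative choices, not part of the original example:

```python
import numpy as np
from tensorflow import keras

# Placeholder data matching the model above: 10 input features, 64 output classes.
x_train = np.random.random((1000, 10))
y_train = keras.utils.to_categorical(
    np.random.randint(64, size=(1000,)), num_classes=64)

# `model` is the compiled Sequential model from the first example;
# fit() uses the optimizer passed to compile() to update the weights.
model.fit(x_train, y_train, batch_size=32, epochs=5)
```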

## Usage in a custom training loop

When writing a custom training loop, you would retrieve gradients via a `tf.GradientTape` instance, then call `optimizer.apply_gradients()` to update your weights:

```python
import tensorflow as tf

# `model`, `loss_fn` and `dataset` are assumed to be defined elsewhere.

# Instantiate an optimizer.
optimizer = tf.keras.optimizers.Adam()

# Iterate over the batches of a dataset.
for x, y in dataset:
    # Open a GradientTape.
    with tf.GradientTape() as tape:
        # Forward pass.
        logits = model(x)
        # Loss value for this batch.
        loss_value = loss_fn(y, logits)

    # Get gradients of loss wrt the weights.
    gradients = tape.gradient(loss_value, model.trainable_weights)

    # Update the weights of the model.
    optimizer.apply_gradients(zip(gradients, model.trainable_weights))
```

Note that when you use `apply_gradients()`, the optimizer does not apply gradient clipping to the gradients: if you want gradient clipping, you would have to do it by hand before calling the method.
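
As a rough sketch of doing that by hand, the snippet below clips the gradients by global norm before applying them; the threshold of 1.0 is an arbitrary example value, and `tf.clip_by_global_norm` is just one possible clipping utility:

```python
import tensorflow as tf

# Inside the training loop above, after computing the gradients:
gradients = tape.gradient(loss_value, model.trainable_weights)

# Clip the gradients by their global norm (the threshold is an example value).
clipped_gradients, _ = tf.clip_by_global_norm(gradients, clip_norm=1.0)

# Apply the clipped gradients instead of the raw ones.
optimizer.apply_gradients(zip(clipped_gradients, model.trainable_weights))
```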


## Learning rate decay / scheduling

You can use a learning rate schedule to modulate how the learning rate of your optimizer changes over time:

```python
lr_schedule = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-2,
    decay_steps=10000,
    decay_rate=0.9)
optimizer = keras.optimizers.SGD(learning_rate=lr_schedule)
```

Check out the learning rate schedule API documentation for a list of available schedules.
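
If none of the built-in schedules fits, you can also subclass `keras.optimizers.schedules.LearningRateSchedule`; the sketch below uses a simple inverse-time decay formula chosen purely for illustration:

```python
import tensorflow as tf
from tensorflow import keras


class InverseTimeDecayExample(keras.optimizers.schedules.LearningRateSchedule):
    """Illustrative schedule: lr = initial_lr / (1 + step / decay_steps)."""

    def __init__(self, initial_learning_rate, decay_steps):
        self.initial_learning_rate = initial_learning_rate
        self.decay_steps = decay_steps

    def __call__(self, step):
        # The optimizer calls the schedule with its current iteration count.
        step = tf.cast(step, tf.float32)
        return self.initial_learning_rate / (1.0 + step / self.decay_steps)

    def get_config(self):
        # Needed so the schedule can be serialized along with the optimizer.
        return {"initial_learning_rate": self.initial_learning_rate,
                "decay_steps": self.decay_steps}


optimizer = keras.optimizers.SGD(learning_rate=InverseTimeDecayExample(1e-2, 10000))
```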


## Available optimizers

{{toc}}


## Core Optimizer API

These methods and attributes are common to all Keras optimizers.
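
For orientation, here is a brief sketch of that shared interface (assuming a TF 2.x Keras optimizer; the exact attribute set varies slightly between versions):

```python
from tensorflow import keras

optimizer = keras.optimizers.Adam(learning_rate=0.01)

# Hyperparameters such as the learning rate are exposed as attributes.
print(optimizer.learning_rate)

# get_config() returns a serializable dict describing the optimizer ...
config = optimizer.get_config()

# ... and from_config() rebuilds an equivalent optimizer from that dict.
restored = keras.optimizers.Adam.from_config(config)
```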

{{autogenerated}}