Our TF-layers Nadam optimizer is basically the same as Adam except that we use `use_nesterov=True` for `training_ops.apply_adam`. It is based on the TF 1.15 `tensorflow/contrib/opt/python/training/nadam_optimizer.py`. So it also has the same options as normal Adam.
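For reference, a condensed sketch of the dense-update path of that contrib implementation (slightly abbreviated; the sparse and resource-variable variants are left out, and this assumes a TF 1.15 environment):

```python
from tensorflow.python.ops import math_ops
from tensorflow.python.training import adam, training_ops


class NadamOptimizer(adam.AdamOptimizer):
  """Adam with Nesterov momentum: same state and options as Adam,
  only the fused apply_adam kernel is called with use_nesterov=True."""

  def _apply_dense(self, grad, var):
    m = self.get_slot(var, "m")
    v = self.get_slot(var, "v")
    beta1_power, beta2_power = self._get_beta_accumulators()
    return training_ops.apply_adam(
        var, m, v,
        math_ops.cast(beta1_power, var.dtype.base_dtype),
        math_ops.cast(beta2_power, var.dtype.base_dtype),
        math_ops.cast(self._lr_t, var.dtype.base_dtype),
        math_ops.cast(self._beta1_t, var.dtype.base_dtype),
        math_ops.cast(self._beta2_t, var.dtype.base_dtype),
        math_ops.cast(self._epsilon_t, var.dtype.base_dtype),
        grad,
        use_locking=self._use_locking,
        use_nesterov=True,  # the only difference vs. plain Adam
    ).op
```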
I noticed that `tf.keras.optimizers.experimental.Nadam` has some different options.

Ok, I did not look further into this. The clipping and weight decay are probably added there to decouple them from the core update. `use_ema` is disabled by default, so the `ema_...` options are not used. So maybe it is mostly the same, except for a different epsilon default.
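To illustrate the point about defaults, here is how I would pin the Keras optimizer to behave like the old one (assuming a TF version where the experimental namespace exists; the 1e-7 vs. 1e-8 epsilon defaults are my reading of the respective docs):

```python
import tensorflow as tf

# Keras experimental Nadam: clipping, weight decay and EMA are all optional
# and off by default; epsilon defaults to 1e-7 (vs. 1e-8 in the TF 1.x kernel).
opt = tf.keras.optimizers.experimental.Nadam(
    learning_rate=1e-3,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-8,        # match the TF 1.15 default explicitly
    weight_decay=None,   # decoupled weight decay, disabled by default
    clipnorm=None,       # gradient clipping, disabled by default
    use_ema=False,       # weight EMA, disabled by default -> ema_* options unused
)
```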
See also:
#766 (comment)
keras-team/keras#15710
Now I noticed that in PyTorch, `torch.optim.NAdam` again has different options.

I specifically wonder about `momentum_decay`. What is this?
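From a quick look at the PyTorch docs (so this is just my reading, not verified against the code), `momentum_decay` seems to be the psi from Dozat's Nadam paper: it only enters the per-step momentum schedule mu_t = beta1 * (1 - 0.5 * 0.96**(t * psi)), so despite the name it effectively warms the momentum up from about beta1/2 towards beta1. A small sketch:

```python
import torch

model = torch.nn.Linear(8, 2)

# Documented defaults (at the time of writing): lr=2e-3, betas=(0.9, 0.999),
# eps=1e-8, weight_decay=0, momentum_decay=4e-3.
opt = torch.optim.NAdam(model.parameters(), lr=2e-3, momentum_decay=4e-3)

# My understanding of momentum_decay (psi): it only affects the momentum
# schedule mu_t = beta1 * (1 - 0.5 * 0.96 ** (t * psi)).
beta1, psi = 0.9, 4e-3
for t in (1, 10, 100, 1000):
    mu_t = beta1 * (1.0 - 0.5 * 0.96 ** (t * psi))
    print(t, mu_t)  # grows from ~0.45 towards 0.9
```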