numerical instability for Adam and Adadelta optimizer #1767
With the Adam and Adadelta optimizers, when the model is close to convergence, the accuracy often suddenly drops to 0 and the perplexity goes to NaN, as shown below:
Epoch 3, 251750/348124; acc: 70.47; ppl: 3.77; 3911 tok/s; lr: 0.0010000; 717152.5 s elapsed
I have been running OpenNMT-py on a large dataset of 16M parallel sentences (the United Nations Parallel Corpus v1.0). The problem appears with Adam and Adadelta, both of whose update rules involve a division; so far I have not seen it with SGD. I suggest the developers check for division by zero (or near-zero denominators) in the Adam and Adadelta optimizers, and possibly others. The division in question is sketched below.
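For reference, here is a minimal sketch of the standard Adam update, not OpenNMT-py's exact implementation; the function name `adam_step` and its defaults are illustrative. The last line contains the division where a too-small `eps` can let the step blow up:

```python
import torch

def adam_step(param, grad, m, v, t,
              lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Illustrative sketch of one Adam parameter update (textbook form).
    m.mul_(beta1).add_(grad, alpha=1 - beta1)            # first-moment estimate
    v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                         # bias correction
    v_hat = v / (1 - beta2 ** t)
    # The problematic division: if v_hat underflows to ~0 and eps is tiny,
    # the effective step becomes huge and the loss can go to inf/NaN.
    param.data.add_(-lr * m_hat / (v_hat.sqrt() + eps))
```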
Try changing epsilon (eps) to 1e-3:
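Something along these lines, assuming you construct the optimizer yourself with `torch.optim` (`model` is a placeholder for your module; OpenNMT-py may expose this through a command-line option instead):

```python
import torch

# A larger eps keeps the denominator sqrt(v_hat) + eps away from zero,
# trading a slightly smaller effective step for numerical stability.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, eps=1e-3)

# Adadelta has the same knob:
# optimizer = torch.optim.Adadelta(model.parameters(), eps=1e-3)
```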