Numerical instability in Adam and Adadelta optimizers #1767
Try changing epsilon (eps) to 1e-3.
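In PyTorch the suggested change maps to the `eps` keyword of `torch.optim.Adam`. A minimal sketch of the workaround (the model here is just a placeholder):

```python
import torch

model = torch.nn.Linear(10, 10)  # placeholder model standing in for the real network

# Adam divides by sqrt(v_hat) + eps; a larger eps keeps that denominator
# away from zero, trading a little adaptivity for numerical stability.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, eps=1e-3)
```

The default `eps` is 1e-8, which leaves very little headroom once the second-moment estimate underflows toward zero late in training.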
@xuancong84 Hi, have you solved this problem? I encountered a similar problem and am wondering how you solved it. Thank you very much.
houseroad added a commit to houseroad/pytorch that referenced this issue on Jan 29, 2019:

Update onnx submodule to …08e7e3

Summary: Previous import was dc75285d4a1cff9618400164dfdb26c5a1bab70a. Included changes:
- onnx/onnx@15c33c9: Add ppc64le build (pytorch#1768) \<Chin Huang\>
- onnx/onnx@198f840: Update Broadcasting.md (pytorch#1769) \<Verma-Rajat\>
- onnx/onnx@60ac95f: Merge back from release 1.4.1 (pytorch#1767) \<Raymond Yang\>
- onnx/onnx@a683372: Bump up version number for v1.4.0 (pytorch#1761) (pytorch#1763) \<Raymond Yang\>
- onnx/onnx@dbf3581: Add TfIdfVectorizer operator to ONNX (pytorch#1721) \<Dmitri Smirnov\>

Differential Revision: D13858840
fbshipit-source-id: 90b2e21c80de4936507a27fc93d0879128ab4fb7
facebook-github-bot added a commit that referenced this issue on Jan 29, 2019: the same ONNX submodule update to …08e7e3, landed as #16493 (Reviewed By: zrphercule; Differential Revision: D13858840; fbshipit-source-id: 1d00f63f265cc6deed965b92ed00c44f547ff03e).
For the Adam and Adadelta optimizers, when the model is close to convergence, the accuracy often suddenly drops to 0 and the perplexity goes to NaN, as shown below:
```
Epoch 3, 251750/348124; acc: 70.47; ppl: 3.77; 3911 tok/s; lr: 0.0010000; 717152.5 s elapsed
Epoch 3, 251800/348124; acc: 71.91; ppl: 3.53; 3796 tok/s; lr: 0.0010000; 717190.5 s elapsed
Epoch 3, 251850/348124; acc: 71.03; ppl: 3.58; 3752 tok/s; lr: 0.0010000; 717227.2 s elapsed
Epoch 3, 251900/348124; acc: 69.85; ppl: 3.86; 3830 tok/s; lr: 0.0010000; 717266.6 s elapsed
Epoch 3, 251950/348124; acc: 70.55; ppl: 3.73; 3930 tok/s; lr: 0.0010000; 717302.3 s elapsed
Epoch 3, 252000/348124; acc: 69.78; ppl: 4.03; 3912 tok/s; lr: 0.0010000; 717340.9 s elapsed
Epoch 3, 252050/348124; acc: 69.01; ppl: 4.18; 2699 tok/s; lr: 0.0010000; 717392.5 s elapsed
Epoch 3, 252100/348124; acc: 70.09; ppl: 3.90; 3935 tok/s; lr: 0.0010000; 717429.4 s elapsed
Epoch 3, 252150/348124; acc: 69.48; ppl: 4.18; 3758 tok/s; lr: 0.0010000; 717463.5 s elapsed
Epoch 3, 252200/348124; acc: 26.95; ppl: nan; 3753 tok/s; lr: 0.0010000; 717506.3 s elapsed
Epoch 3, 252250/348124; acc: 0.00; ppl: nan; 3925 tok/s; lr: 0.0010000; 717546.5 s elapsed
Epoch 3, 252300/348124; acc: 0.00; ppl: nan; 3822 tok/s; lr: 0.0010000; 717584.6 s elapsed
Epoch 3, 252350/348124; acc: 0.00; ppl: nan; 3813 tok/s; lr: 0.0010000; 717622.8 s elapsed
Epoch 3, 252400/348124; acc: 0.00; ppl: nan; 3677 tok/s; lr: 0.0010000; 717661.0 s elapsed
Epoch 3, 252450/348124; acc: 0.00; ppl: nan; 3999 tok/s; lr: 0.0010000; 717699.2 s elapsed
Epoch 3, 252500/348124; acc: 0.00; ppl: nan; 3939 tok/s; lr: 0.0010000; 717738.1 s elapsed
Epoch 3, 252550/348124; acc: 0.00; ppl: nan; 3872 tok/s; lr: 0.0010000; 717771.3 s elapsed
```
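Note that once the perplexity goes NaN, the accuracy never recovers: the NaN propagates into the optimizer's moment buffers and every subsequent update is poisoned. A minimal guard (the helper name `guard_loss` is hypothetical, not part of OpenNMT-py) that aborts as soon as the loss turns non-finite:

```python
import math

def guard_loss(loss_value):
    """Raise as soon as the loss is NaN or inf, so a single bad batch
    does not contaminate the optimizer state for the rest of training."""
    if not math.isfinite(loss_value):
        raise FloatingPointError(f"non-finite loss: {loss_value}")
    return loss_value
```

Calling this on each batch's scalar loss (before `backward()`) turns a silent divergence like the log above into an immediate, debuggable failure.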
The code I ran is OpenNMT-py on a large dataset of 16M parallel sentences (United Nations Parallel Corpus v1.0). This phenomenon is observed with Adam and Adadelta, both of which involve division in their update rules; so far it has not been seen with SGD. I suggest the developers check for division by zero in the Adam and Adadelta optimizers, and probably others.
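To make the suspected division concrete, here is a textbook single-scalar Adam step (a sketch for illustration, not PyTorch's actual implementation): the division by `sqrt(v_hat) + eps` is the only place a near-zero denominator can appear, and when `v_hat` underflows toward zero the effective step size approaches `lr / eps`.

```python
def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter (textbook form).

    m, v are the running first/second moment estimates; t is the
    1-based step count used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    # The potentially unstable division: only eps keeps the
    # denominator away from zero when v_hat is tiny.
    param = param - lr * m_hat / (v_hat ** 0.5 + eps)
    return param, m, v
```

With the default `eps=1e-8`, gradients that are consistently tiny near convergence drive `v_hat` toward zero, which is consistent with the instability appearing late in training rather than early.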