numerical instability for Adam and Adadelta optimizer #1767

Closed
xuancong84 opened this issue Jun 10, 2017 · 3 comments
Comments

@xuancong84

With the Adam and Adadelta optimizers, when the model is close to convergence, the accuracy often suddenly drops to 0 and the perplexity becomes NaN, as shown below:

Epoch 3, 251750/348124; acc: 70.47; ppl: 3.77; 3911 tok/s; lr: 0.0010000; 717152.5 s elapsed
Epoch 3, 251800/348124; acc: 71.91; ppl: 3.53; 3796 tok/s; lr: 0.0010000; 717190.5 s elapsed
Epoch 3, 251850/348124; acc: 71.03; ppl: 3.58; 3752 tok/s; lr: 0.0010000; 717227.2 s elapsed
Epoch 3, 251900/348124; acc: 69.85; ppl: 3.86; 3830 tok/s; lr: 0.0010000; 717266.6 s elapsed
Epoch 3, 251950/348124; acc: 70.55; ppl: 3.73; 3930 tok/s; lr: 0.0010000; 717302.3 s elapsed
Epoch 3, 252000/348124; acc: 69.78; ppl: 4.03; 3912 tok/s; lr: 0.0010000; 717340.9 s elapsed
Epoch 3, 252050/348124; acc: 69.01; ppl: 4.18; 2699 tok/s; lr: 0.0010000; 717392.5 s elapsed
Epoch 3, 252100/348124; acc: 70.09; ppl: 3.90; 3935 tok/s; lr: 0.0010000; 717429.4 s elapsed
Epoch 3, 252150/348124; acc: 69.48; ppl: 4.18; 3758 tok/s; lr: 0.0010000; 717463.5 s elapsed
Epoch 3, 252200/348124; acc: 26.95; ppl: nan; 3753 tok/s; lr: 0.0010000; 717506.3 s elapsed
Epoch 3, 252250/348124; acc: 0.00; ppl: nan; 3925 tok/s; lr: 0.0010000; 717546.5 s elapsed
Epoch 3, 252300/348124; acc: 0.00; ppl: nan; 3822 tok/s; lr: 0.0010000; 717584.6 s elapsed
Epoch 3, 252350/348124; acc: 0.00; ppl: nan; 3813 tok/s; lr: 0.0010000; 717622.8 s elapsed
Epoch 3, 252400/348124; acc: 0.00; ppl: nan; 3677 tok/s; lr: 0.0010000; 717661.0 s elapsed
Epoch 3, 252450/348124; acc: 0.00; ppl: nan; 3999 tok/s; lr: 0.0010000; 717699.2 s elapsed
Epoch 3, 252500/348124; acc: 0.00; ppl: nan; 3939 tok/s; lr: 0.0010000; 717738.1 s elapsed
Epoch 3, 252550/348124; acc: 0.00; ppl: nan; 3872 tok/s; lr: 0.0010000; 717771.3 s elapsed

I ran OpenNMT-py on a large dataset of 16M parallel sentences (United Nations Parallel Corpus v1.0). The problem shows up with Adam and Adadelta, both of which involve a division in the update; so far I have not seen it with SGD. I suggest the developers check for division by zero in the Adam and Adadelta optimizers, and probably in others as well.
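
One way to narrow down where the NaN first appears is to check the gradients before every update and skip the step when any of them is non-finite. The sketch below is not taken from OpenNMT-py or PyTorch; it is a plain-Python illustration, and the names `grads_are_finite`, `clip_by_global_norm`, and `guarded_update` are hypothetical. It uses a plain gradient step rather than Adam to keep the guard itself easy to read.

```python
import math

def grads_are_finite(grads):
    """Return True only if no gradient is NaN or +/-inf."""
    return all(math.isfinite(g) for g in grads)

def clip_by_global_norm(grads, max_norm):
    """Rescale the gradients so their combined L2 norm is at most max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    if total > max_norm:
        scale = max_norm / total
        return [g * scale for g in grads]
    return list(grads)

def guarded_update(params, grads, lr=1e-3, max_norm=5.0):
    """Apply a plain gradient step, or skip it if any gradient is non-finite.

    Returns the (possibly unchanged) parameters and a flag telling the
    caller whether the step was applied.
    """
    if not grads_are_finite(grads):
        return list(params), False
    clipped = clip_by_global_norm(grads, max_norm)
    return [p - lr * g for p, g in zip(params, clipped)], True

# A NaN gradient leaves the parameters untouched and reports the skip.
params, applied = guarded_update([1.0, 2.0], [0.1, float("nan")])
print(params, applied)
```

Logging the step at which `applied` first becomes `False` would show whether the gradients are already bad before the optimizer's division takes place. In a PyTorch training loop, `torch.nn.utils.clip_grad_norm_` does the clipping part.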

@ethancaballero

ethancaballero commented Jun 11, 2017

Try changing epsilon (`eps`) to 1e-3:
https://github.com/pytorch/pytorch/blob/master/torch/optim/adam.py#L24
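
To see why a larger `eps` can help, here is a minimal sketch of the textbook Adam update for a single scalar parameter. It is not PyTorch's implementation; the function `adam_step_size` and the state values are hypothetical, chosen to mimic a point near convergence where the running second-moment estimate is almost zero.

```python
import math

def adam_step_size(grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return the magnitude of one textbook Adam update for a scalar parameter."""
    m = beta1 * m + (1 - beta1) * grad         # first-moment running average
    v = beta2 * v + (1 - beta2) * grad * grad  # second-moment running average
    m_hat = m / (1 - beta1 ** t)               # bias corrections
    v_hat = v / (1 - beta2 ** t)
    return abs(lr * m_hat / (math.sqrt(v_hat) + eps))

# Hypothetical state: the running averages are almost zero when a
# gradient of 1e-4 arrives at step 1000.
state = dict(grad=1e-4, m=0.0, v=1e-20, t=1000)

default_step = adam_step_size(eps=1e-8, **state)
damped_step = adam_step_size(eps=1e-3, **state)
print(default_step, damped_step)
```

With the small `eps` the denominator is dominated by the tiny second-moment estimate, so the step is much larger than with `eps=1e-3`. In PyTorch the corresponding setting is the `eps` keyword argument of `torch.optim.Adam`, which defaults to 1e-8.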

@soumith
Member

soumith commented Jul 15, 2017

We do have an `eps` term to avoid division by zero, as @ethancaballero pointed out.
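
For reference, this is where `eps` enters the textbook Adam update (Kingma and Ba). The notation is the paper's, not PyTorch's, and the actual implementation may arrange the bias-correction terms differently:

```latex
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^t}, \qquad
\hat{v}_t = \frac{v_t}{1 - \beta_2^t} \\
\theta_t &= \theta_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
```

Because $\sqrt{\hat{v}_t} \ge 0$ and $\epsilon > 0$, the denominator is never smaller than $\epsilon$, so the division itself cannot be by zero. A NaN can still appear if the gradient $g_t$ is already NaN or infinite before it reaches the optimizer.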

@weedwind

@xuancong84 Hi, have you solved this problem? I have run into a similar one and am wondering how you solved it.

Thank you very much.

houseroad added a commit to houseroad/pytorch that referenced this issue Jan 29, 2019
…08e7e3

Summary:
Previous import was dc75285d4a1cff9618400164dfdb26c5a1bab70a

Included changes:
- **[15c33c9](onnx/onnx@15c33c9)**: Add ppc64le build (pytorch#1768) <Chin Huang>
- **[198f840](onnx/onnx@198f840)**: Update Broadcasting.md (pytorch#1769) <Verma-Rajat>
- **[60ac95f](onnx/onnx@60ac95f)**: Merge back from release 1.4.1 (pytorch#1767) <Raymond Yang>
- **[a683372](onnx/onnx@a683372)**: Bump up version number for v1.4.0 (pytorch#1761) (pytorch#1763) <Raymond Yang>
- **[dbf3581](onnx/onnx@dbf3581)**: Add TfIdfVectorizer operator to ONNX (pytorch#1721) <Dmitri Smirnov>

Differential Revision: D13858840

fbshipit-source-id: 90b2e21c80de4936507a27fc93d0879128ab4fb7
IvanYashchuk pushed a commit to IvanYashchuk/pytorch that referenced this issue Jun 27, 2022
pytorchmergebot pushed a commit that referenced this issue Jul 13, 2022
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Code changes include:

- TransformPropagator refactor: switched to Dijkstra instead of exhaustive enumeration on all possible paths to reduce compilation time on transform propagation;
- Indexing refactor: remove reference tensor creation in all tensor indexing logic (#1690)
- (more) generic grouped grid reduction kernel;
- Minor parser/fuser patches:
  1. zero-dim tensor reduction support
  2. no-op binary removal within fused graph
  3. expand supported in fusion

Squashed commits to work around (WAR) the GitHub API.
Commits that are actually in this PR from the devel branch:

```
a054b3e Refactor TransormPropagator to allow specifying a position and propagating to part of the DAG (#1775)
d67e1cd Indexing refactor stage 1: remove reference tensor creation in all tensor indexing logic (#1690)
1b65299 Issue 1770 (#1774)
35b0427 Avoid compilation errors like below: (#1773)
452c773 Ignore reductions of zero-dim tensors per PyTorch conventions (#1771)
31d6c56 TransformPropagator refactor (#1769)
570c5a8 Merge pull request #1767 from csarofeen/upstream_merge_0621
9d6c3d8 merging upstream 61305cd
0ed815f New TransformPropagator algorithm (#1763)
6c19520 no-op binary removal (#1764)
ec7fa41 Proper propagation of IterType (#1762)
b263562 Fix dimensionality check (#1759)
2d6343f More generic grouped grid reduction kernel (#1740)
64e2b56 [nvfuser] prevent spamming warning message (#77777) (#1758)
0c43162 [nvFuser] Improving bitwise ops support (#77158) (#1757)
b93a147 Parser expand (#1754)
```

RUN_TORCHBENCH: nvfuser
Pull Request resolved: #80355
Approved by: https://github.com/davidberard98