[PT2][Optimus][Reliability] Fix a bug in gradients computation for runtime numeric check #118105
Summary:

We observed the following error when launching the e2e AFOC model test:

```
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
```

f524190245

Test Plan: `training_platform:c640e3f93574472da8894d9a0365f6a0` f524376722 P1086047304

Reviewed By: jackiexu1992

Differential Revision: D53011463
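For context, here is a minimal sketch of the failure mode and of the `retain_graph=True` workaround that the error message itself suggests. This is illustrative PyTorch code only, not the actual Optimus numeric-check pass; the variable names and the toy graph are made up for the example:

```python
import torch

# A numeric check that computes gradients a second time over a graph that
# has already been backwarded hits the RuntimeError above, because autograd
# frees the saved intermediate tensors after the first gradient pass.
x = torch.randn(4, requires_grad=True)
y = (x * x).sum()

# retain_graph=True keeps the saved intermediates alive so the graph can
# be differentiated again.
grad_first = torch.autograd.grad(y, x, retain_graph=True)[0]

# Without retain_graph=True on the first call, this second call would raise:
# "RuntimeError: Trying to backward through the graph a second time ..."
grad_second = torch.autograd.grad(y, x)[0]

assert torch.allclose(grad_first, grad_second)
```

When the check only needs tensor values rather than a differentiable path, another option is to run it on detached copies so no second backward through the training graph is required at all.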
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler