Add instructional error message for cudnn RNN double backward workaround #33884
Mitigates #5261.
It's not possible for us to support cudnn RNN double backward due to limitations in the cudnn API. This PR raises an error if users try to compute the double backward of a cudnn RNN; the error message suggests using the non-cudnn RNN instead.
Test Plan:
- added some tests to check the error message
Only minor nit, thanks!
std::string("the derivative for '") + name + "' is not implemented");
template <typename T>
T not_implemented_base(const char* name, const char* reason) {
  std::string msg = std::string("the derivative for '") + name + "' is not implemented.";
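The fragment above only shows the signature and the first line of the helper. As a rough, standalone sketch of how such a helper could attach an optional reason to the message and then throw, assuming a plain `std::runtime_error` and a hypothetical name `not_implemented_sketch` (neither is taken from the PR):

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the helper in the diff, not the PR's code.
// It never returns: the return type T only lets the call appear where a
// value of type T is expected.
template <typename T>
T not_implemented_sketch(const char* name, const char* reason = "") {
  std::string msg =
      std::string("the derivative for '") + name + "' is not implemented.";
  // Append the instructional hint only when one was supplied.
  if (reason != nullptr && reason[0] != '\0') {
    msg = msg + " " + reason;
  }
  throw std::runtime_error(msg);
}
```

Calling `not_implemented_sketch<int>("some_op", "Try the non-cudnn RNN.")` would throw with the base message followed by the hint; the operator name and hint text here are placeholders.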
nit: I usually do this with c10::str("the derivative for '", name, "' is not implemented.");
as it looks slightly better.
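To illustrate the style the nit refers to without depending on PyTorch headers, here is a small stand-in that streams all of its arguments into one string. `concat_sketch` is a hypothetical helper written for this example; it only approximates the idea behind `c10::str` and is not its implementation.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper in the spirit of c10::str: stream every argument
// into a single std::ostringstream and return the accumulated text.
template <typename... Args>
std::string concat_sketch(const Args&... args) {
  std::ostringstream out;
  // C++11-compatible pack expansion; the leading 0 keeps the array
  // non-empty when no arguments are passed.
  using expand = int[];
  (void)expand{0, ((void)(out << args), 0)...};
  return out.str();
}
```

With this, `concat_sketch("the derivative for '", name, "' is not implemented.")` builds the same message as the chained `std::string(...) + name + ...` form, and it also accepts non-string arguments such as integers.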
💊 CircleCI build failures summary and remediations
As of commit 8907802 (more details on the Dr. CI page):
🕵️ 2 new failures recognized by patterns
The following build failures do not appear to be due to upstream breakages:
pytorch_linux_xenial_py3_clang5_mobile_custom_build_static (1/2)
Step: "Build"
test_cross_device_reentrant_autograd_cuda seems to consistently fail on the CUDA job for me, but I can't reproduce it locally or while ssh-ing into a CircleCI machine. This will need more investigation.
Edit: I figured out how to repro the error locally. It required running two tests in a specific order: https://gist.github.com/zou3519/17aa029509c83470aa299666ceab1d6f. It looks like what is happening is that raising the double backward error message permanently messes with the "checkpoint state" of the engine. Will dig into this a bit more later.
Ah, that makes sense, thanks for pointing it out! Yes, the tests I added check for the existence of the error message... we can remove the tests for now as a workaround (leaving this PR untested, lol)
Yes this bug is pretty nasty for testing... I think you can work around this by running the double backward that fails by hand with a
Thanks! That worked out great
Thanks for the updated test!
Codecov Report
@@ Coverage Diff @@
## gh/zou3519/238/base #33884 +/- ##
=======================================================
- Coverage 80.66% 80.66% -0.01%
=======================================================
Files 1913 1913
Lines 208058 208064 +6
=======================================================
- Hits 167833 167825 -8
- Misses 40225 40239 +14
Stack from ghstack:
Mitigates #5261.
It's not possible for us to support cudnn RNN double backwards due to
limitations in the cudnn API. This PR makes it so that we raise an error
message if users try to get the double backward on a cudnn RNN; in the
error message we suggest using the non-cudnn RNN.
Test Plan:
- added some tests to check the error message
Differential Revision: D20143544