-
Notifications
You must be signed in to change notification settings - Fork 21.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix flaky profiler and test_callback_simple RPC tests #41287
Conversation
Profiler tests that test profiling with builtin functions and `test_callback_simple` test has been broken for a while. This diff fixes that by preferring c10 ops to non-c10 ops in our operation matching logic. The result of this is that these ops go through the c10 dispatch and thus have profiling enabled. For `test_callback_simple` this results in the effect that we choose `aten::add.Tensor` over `aten::add.Int` which fixes the type issue. Differential Revision: [D22489197](https://our.internmc.facebook.com/intern/diff/D22489197/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22489197/)! [ghstack-poisoned]
Profiler tests that test profiling with builtin functions and `test_callback_simple` test has been broken for a while. This diff fixes that by preferring c10 ops to non-c10 ops in our operation matching logic. The result of this is that these ops go through the c10 dispatch and thus have profiling enabled. For `test_callback_simple` this results in the effect that we choose `aten::add.Tensor` over `aten::add.Int` which fixes the type issue. Differential Revision: [D22489197](https://our.internmc.facebook.com/intern/diff/D22489197/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22489197/)! [ghstack-poisoned]
Pull Request resolved: #41287 Profiler tests that test profiling with builtin functions and `test_callback_simple` test has been broken for a while. This diff fixes that by preferring c10 ops to non-c10 ops in our operation matching logic. The result of this is that these ops go through the c10 dispatch and thus have profiling enabled. For `test_callback_simple` this results in the effect that we choose `aten::add.Tensor` over `aten::add.Int` which fixes the type issue. Differential Revision: [D22489197](https://our.internmc.facebook.com/intern/diff/D22489197/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22489197/)! ghstack-source-id: 107572222
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with the code but the change is simple and looks safe as it only changes the order so to me it looks good! But maybe someone who knows the JIT dispatch better can provide more insight...
// torch.add) so a worst-case linear search should not incur significant | ||
// extra overhead. | ||
if (op->isC10Op()) { | ||
return op; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing! And thanks for digging into the bottom of this error!
This op-matching logic does need a thorough revamp.
Profiler tests that test profiling with builtin functions and `test_callback_simple` test has been broken for a while. This diff fixes that by preferring c10 ops to non-c10 ops in our operation matching logic. The result of this is that these ops go through the c10 dispatch and thus have profiling enabled. For `test_callback_simple` this results in the effect that we choose `aten::add.Tensor` over `aten::add.Int` which fixes the type issue. Differential Revision: [D22489197](https://our.internmc.facebook.com/intern/diff/D22489197/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22489197/)! [ghstack-poisoned]
Pull Request resolved: #41287 Profiler tests that test profiling with builtin functions and `test_callback_simple` test has been broken for a while. This diff fixes that by preferring c10 ops to non-c10 ops in our operation matching logic. The result of this is that these ops go through the c10 dispatch and thus have profiling enabled. For `test_callback_simple` this results in the effect that we choose `aten::add.Tensor` over `aten::add.Int` which fixes the type issue. Differential Revision: [D22489197](https://our.internmc.facebook.com/intern/diff/D22489197/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22489197/)! ghstack-source-id: 107680843
Profiler tests that test profiling with builtin functions and `test_callback_simple` test has been broken for a while. This diff fixes that by preferring c10 ops to non-c10 ops in our operation matching logic. The result of this is that these ops go through the c10 dispatch and thus have profiling enabled. For `test_callback_simple` this results in the effect that we choose `aten::add.Tensor` over `aten::add.Int` which fixes the type issue. Differential Revision: [D22489197](https://our.internmc.facebook.com/intern/diff/D22489197/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22489197/)! [ghstack-poisoned]
💊 CI failures summary and remediationsAs of commit ea9ed34 (more details on the Dr. CI page): 💚 💚 Looks good so far! There are no failures yet. 💚 💚 This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group. This comment has been revised 6 times. |
Profiler tests that test profiling with builtin functions and `test_callback_simple` test has been broken for a while. This diff fixes that by preferring c10 ops to non-c10 ops in our operation matching logic. The result of this is that these ops go through the c10 dispatch and thus have profiling enabled. For `test_callback_simple` this results in the effect that we choose `aten::add.Tensor` over `aten::add.Int` which fixes the type issue. Differential Revision: [D22489197](https://our.internmc.facebook.com/intern/diff/D22489197/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22489197/)! [ghstack-poisoned]
Profiler tests that test profiling with builtin functions and `test_callback_simple` test has been broken for a while. This diff fixes that by preferring c10 ops to non-c10 ops in our operation matching logic. The result of this is that these ops go through the c10 dispatch and thus have profiling enabled. For `test_callback_simple` this results in the effect that we choose `aten::add.Tensor` over `aten::add.Int` which fixes the type issue. Differential Revision: [D22489197](https://our.internmc.facebook.com/intern/diff/D22489197/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22489197/)! [ghstack-poisoned]
Pull Request resolved: #41287 Profiler tests that test profiling with builtin functions and `test_callback_simple` test has been broken for a while. This diff fixes that by preferring c10 ops to non-c10 ops in our operation matching logic. The result of this is that these ops go through the c10 dispatch and thus have profiling enabled. For `test_callback_simple` this results in the effect that we choose `aten::add.Tensor` over `aten::add.Int` which fixes the type issue. Differential Revision: [D22489197](https://our.internmc.facebook.com/intern/diff/D22489197/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22489197/)! ghstack-source-id: 107697506
This pull request has been merged in fd03290. |
This pull request has been merged in fd03290. |
As per the title, we should generally not have exception swalling and this commit makes it so that if there is a true error in JIT operator resolution, it is propagated back to the RPC callee and we don't silently swallow any other exceptions that may happen. Swallowing the exceptions previously resulted in hard to debug issues such as unexpected ops showing up in profiler, and flaky tests which were fixed by #41287 Added a unittest that validates the error that comes from `jit/pybind_utils.h`. Differential Revision: [D25392905](https://our.internmc.facebook.com/intern/diff/D25392905/) [ghstack-poisoned]
…wing" As per the title, we should generally not have exception swalling and this commit makes it so that if there is a true error in JIT operator resolution, it is propagated back to the RPC callee and we don't silently swallow any other exceptions that may happen. Swallowing the exceptions previously resulted in hard to debug issues such as unexpected ops showing up in profiler, and flaky tests which were fixed by #41287 Added a unittest that validates the error that comes from `jit/pybind_utils.h`. Differential Revision: [D25392905](https://our.internmc.facebook.com/intern/diff/D25392905/) [ghstack-poisoned]
Pull Request resolved: #49009 As per the title, we should generally not have exception swalling and this commit makes it so that if there is a true error in JIT operator resolution, it is propagated back to the RPC callee and we don't silently swallow any other exceptions that may happen. Swallowing the exceptions previously resulted in hard to debug issues such as unexpected ops showing up in profiler, and flaky tests which were fixed by #41287 Added a unittest that validates the error that comes from `jit/pybind_utils.h`. ghstack-source-id: 118076202 Differential Revision: [D25392905](https://our.internmc.facebook.com/intern/diff/D25392905/)
…wing" As per the title, we should generally not have exception swalling and this commit makes it so that if there is a true error in JIT operator resolution, it is propagated back to the RPC callee and we don't silently swallow any other exceptions that may happen. Swallowing the exceptions previously resulted in hard to debug issues such as unexpected ops showing up in profiler, and flaky tests which were fixed by #41287 Added a unittest that validates the error that comes from `jit/pybind_utils.h`. Differential Revision: [D25392905](https://our.internmc.facebook.com/intern/diff/D25392905/) [ghstack-poisoned]
…wing" As per the title, we should generally not have exception swalling and this commit makes it so that if there is a true error in JIT operator resolution, it is propagated back to the RPC callee and we don't silently swallow any other exceptions that may happen. Swallowing the exceptions previously resulted in hard to debug issues such as unexpected ops showing up in profiler, and flaky tests which were fixed by #41287 Added a unittest that validates the error that comes from `jit/pybind_utils.h`. Differential Revision: [D25392905](https://our.internmc.facebook.com/intern/diff/D25392905/) [ghstack-poisoned]
Pull Request resolved: #49009 As per the title, we should generally not have exception swalling and this commit makes it so that if there is a true error in JIT operator resolution, it is propagated back to the RPC callee and we don't silently swallow any other exceptions that may happen. Swallowing the exceptions previously resulted in hard to debug issues such as unexpected ops showing up in profiler, and flaky tests which were fixed by #41287 Added a unittest that validates the error that comes from `jit/pybind_utils.h`. ghstack-source-id: 118122150 Differential Revision: [D25392905](https://our.internmc.facebook.com/intern/diff/D25392905/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D25392905/)!
…wing" As per the title, we should generally not have exception swalling and this commit makes it so that if there is a true error in JIT operator resolution, it is propagated back to the RPC callee and we don't silently swallow any other exceptions that may happen. Swallowing the exceptions previously resulted in hard to debug issues such as unexpected ops showing up in profiler, and flaky tests which were fixed by #41287 Added a unittest that validates the error that comes from `jit/pybind_utils.h`. Differential Revision: [D25392905](https://our.internmc.facebook.com/intern/diff/D25392905/) [ghstack-poisoned]
Pull Request resolved: #49009 As per the title, we should generally not have exception swalling and this commit makes it so that if there is a true error in JIT operator resolution, it is propagated back to the RPC callee and we don't silently swallow any other exceptions that may happen. Swallowing the exceptions previously resulted in hard to debug issues such as unexpected ops showing up in profiler, and flaky tests which were fixed by #41287 Added a unittest that validates the error that comes from `jit/pybind_utils.h`. ghstack-source-id: 118457676 Differential Revision: [D25392905](https://our.internmc.facebook.com/intern/diff/D25392905/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D25392905/)!
…wing" As per the title, we should generally not have exception swalling and this commit makes it so that if there is a true error in JIT operator resolution, it is propagated back to the RPC callee and we don't silently swallow any other exceptions that may happen. Swallowing the exceptions previously resulted in hard to debug issues such as unexpected ops showing up in profiler, and flaky tests which were fixed by #41287 Added a unittest that validates the error that comes from `jit/pybind_utils.h`. Differential Revision: [D25392905](https://our.internmc.facebook.com/intern/diff/D25392905/) [ghstack-poisoned]
…wing" As per the title, we should generally not have exception swalling and this commit makes it so that if there is a true error in JIT operator resolution, it is propagated back to the RPC callee and we don't silently swallow any other exceptions that may happen. Swallowing the exceptions previously resulted in hard to debug issues such as unexpected ops showing up in profiler, and flaky tests which were fixed by #41287 Added a unittest that validates the error that comes from `jit/pybind_utils.h`. Differential Revision: [D25392905](https://our.internmc.facebook.com/intern/diff/D25392905/) [ghstack-poisoned]
…wing" As per the title, we should generally not have exception swalling and this commit makes it so that if there is a true error in JIT operator resolution, it is propagated back to the RPC callee and we don't silently swallow any other exceptions that may happen. Swallowing the exceptions previously resulted in hard to debug issues such as unexpected ops showing up in profiler, and flaky tests which were fixed by #41287 Added a unittest that validates the error that comes from `jit/pybind_utils.h`. Differential Revision: [D25392905](https://our.internmc.facebook.com/intern/diff/D25392905/) [ghstack-poisoned]
Pull Request resolved: #49009 As per the title, we should generally not have exception swalling and this commit makes it so that if there is a true error in JIT operator resolution, it is propagated back to the RPC callee and we don't silently swallow any other exceptions that may happen. Swallowing the exceptions previously resulted in hard to debug issues such as unexpected ops showing up in profiler, and flaky tests which were fixed by #41287 Added a unittest that validates the error that comes from `jit/pybind_utils.h`. ghstack-source-id: 118746790 Differential Revision: [D25392905](https://our.internmc.facebook.com/intern/diff/D25392905/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D25392905/)!
…wing" As per the title, we should generally not have exception swalling and this commit makes it so that if there is a true error in JIT operator resolution, it is propagated back to the RPC callee and we don't silently swallow any other exceptions that may happen. Swallowing the exceptions previously resulted in hard to debug issues such as unexpected ops showing up in profiler, and flaky tests which were fixed by #41287 Added a unittest that validates the error that comes from `jit/pybind_utils.h`. Differential Revision: [D25392905](https://our.internmc.facebook.com/intern/diff/D25392905/) [ghstack-poisoned]
Pull Request resolved: #49009 As per the title, we should generally not have exception swalling and this commit makes it so that if there is a true error in JIT operator resolution, it is propagated back to the RPC callee and we don't silently swallow any other exceptions that may happen. Swallowing the exceptions previously resulted in hard to debug issues such as unexpected ops showing up in profiler, and flaky tests which were fixed by #41287 Added a unittest that validates the error that comes from `jit/pybind_utils.h`. ghstack-source-id: 118747511 Differential Revision: [D25392905](https://our.internmc.facebook.com/intern/diff/D25392905/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D25392905/)!
…wing" As per the title, we should generally not have exception swalling and this commit makes it so that if there is a true error in JIT operator resolution, it is propagated back to the RPC callee and we don't silently swallow any other exceptions that may happen. Swallowing the exceptions previously resulted in hard to debug issues such as unexpected ops showing up in profiler, and flaky tests which were fixed by #41287 Added a unittest that validates the error that comes from `jit/pybind_utils.h`. Differential Revision: [D25392905](https://our.internmc.facebook.com/intern/diff/D25392905/) [ghstack-poisoned]
Pull Request resolved: #49009 As per the title, we should generally not have exception swalling and this commit makes it so that if there is a true error in JIT operator resolution, it is propagated back to the RPC callee and we don't silently swallow any other exceptions that may happen. Swallowing the exceptions previously resulted in hard to debug issues such as unexpected ops showing up in profiler, and flaky tests which were fixed by #41287 Added a unittest that validates the error that comes from `jit/pybind_utils.h`. ghstack-source-id: 118794661 Differential Revision: [D25392905](https://our.internmc.facebook.com/intern/diff/D25392905/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D25392905/)!
Summary: Pull Request resolved: #49009 As per the title, we should generally not have exception swalling and this commit makes it so that if there is a true error in JIT operator resolution, it is propagated back to the RPC callee and we don't silently swallow any other exceptions that may happen. Swallowing the exceptions previously resulted in hard to debug issues such as unexpected ops showing up in profiler, and flaky tests which were fixed by #41287 Added a unittest that validates the error that comes from `jit/pybind_utils.h`. ghstack-source-id: 118794661 Test Plan: CI Reviewed By: mrshenli Differential Revision: D25392905 fbshipit-source-id: 6f93251635740bcf902824548b2bc6f9249be5f0
…ch#49009) Summary: Pull Request resolved: pytorch#49009 As per the title, we should generally not have exception swalling and this commit makes it so that if there is a true error in JIT operator resolution, it is propagated back to the RPC callee and we don't silently swallow any other exceptions that may happen. Swallowing the exceptions previously resulted in hard to debug issues such as unexpected ops showing up in profiler, and flaky tests which were fixed by pytorch#41287 Added a unittest that validates the error that comes from `jit/pybind_utils.h`. ghstack-source-id: 118794661 Test Plan: CI Reviewed By: mrshenli Differential Revision: D25392905 fbshipit-source-id: 6f93251635740bcf902824548b2bc6f9249be5f0
Stack from ghstack:
Profiler tests that test profiling with builtin functions and
test_callback_simple
test has been broken for a while. This diff fixes that by preferring c10 ops to non-c10 ops in our operation matching logic.The result of this is that these ops go through the c10 dispatch and thus have profiling enabled. For
test_callback_simple
this results in the effect that we chooseaten::add.Tensor
overaten::add.Int
which fixes the type issue.Differential Revision: D22489197
NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on Phabricator!