Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix flaky profiler and test_callback_simple RPC tests #41287

Closed
wants to merge 6 commits into from

Conversation

rohan-varma
Copy link
Member

@rohan-varma rohan-varma commented Jul 10, 2020

Stack from ghstack:

Profiler tests that test profiling with builtin functions and test_callback_simple test has been broken for a while. This diff fixes that by preferring c10 ops to non-c10 ops in our operation matching logic.

The result of this is that these ops go through the c10 dispatch and thus have profiling enabled. For test_callback_simple this results in the effect that we choose aten::add.Tensor over aten::add.Int which fixes the type issue.

Differential Revision: D22489197

NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on Phabricator!

Profiler tests that test profiling with builtin functions and `test_callback_simple` test has been broken for a while. This diff fixes that by preferring c10 ops to non-c10 ops in our operation matching logic.

The result of this is that these ops go through the c10 dispatch and thus have profiling enabled. For `test_callback_simple` this results in the effect that we choose `aten::add.Tensor` over `aten::add.Int` which fixes the type issue.

Differential Revision: [D22489197](https://our.internmc.facebook.com/intern/diff/D22489197/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22489197/)!

[ghstack-poisoned]
Profiler tests that test profiling with builtin functions and `test_callback_simple` test has been broken for a while. This diff fixes that by preferring c10 ops to non-c10 ops in our operation matching logic.

The result of this is that these ops go through the c10 dispatch and thus have profiling enabled. For `test_callback_simple` this results in the effect that we choose `aten::add.Tensor` over `aten::add.Int` which fixes the type issue.

Differential Revision: [D22489197](https://our.internmc.facebook.com/intern/diff/D22489197/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22489197/)!

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request Jul 11, 2020
Pull Request resolved: #41287

Profiler tests that test profiling with builtin functions and `test_callback_simple` test has been broken for a while. This diff fixes that by preferring c10 ops to non-c10 ops in our operation matching logic.

The result of this is that these ops go through the c10 dispatch and thus have profiling enabled. For `test_callback_simple` this results in the effect that we choose `aten::add.Tensor` over `aten::add.Int` which fixes the type issue.

Differential Revision: [D22489197](https://our.internmc.facebook.com/intern/diff/D22489197/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22489197/)!
ghstack-source-id: 107572222
@rohan-varma rohan-varma requested a review from lw July 13, 2020 20:18
Copy link
Contributor

@lw lw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with the code but the change is simple and looks safe as it only changes the order so to me it looks good! But maybe someone who knows the JIT dispatch better can provide more insight...

// torch.add) so a worst-case linear search should not incur significant
// extra overhead.
if (op->isC10Op()) {
return op;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing! And thanks for digging into the bottom of this error!

This op-matching logic does need a thorough revamp.

Profiler tests that test profiling with builtin functions and `test_callback_simple` test has been broken for a while. This diff fixes that by preferring c10 ops to non-c10 ops in our operation matching logic.

The result of this is that these ops go through the c10 dispatch and thus have profiling enabled. For `test_callback_simple` this results in the effect that we choose `aten::add.Tensor` over `aten::add.Int` which fixes the type issue.

Differential Revision: [D22489197](https://our.internmc.facebook.com/intern/diff/D22489197/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22489197/)!

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request Jul 13, 2020
Pull Request resolved: #41287

Profiler tests that test profiling with builtin functions and `test_callback_simple` test has been broken for a while. This diff fixes that by preferring c10 ops to non-c10 ops in our operation matching logic.

The result of this is that these ops go through the c10 dispatch and thus have profiling enabled. For `test_callback_simple` this results in the effect that we choose `aten::add.Tensor` over `aten::add.Int` which fixes the type issue.

Differential Revision: [D22489197](https://our.internmc.facebook.com/intern/diff/D22489197/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22489197/)!
ghstack-source-id: 107680843
Profiler tests that test profiling with builtin functions and `test_callback_simple` test has been broken for a while. This diff fixes that by preferring c10 ops to non-c10 ops in our operation matching logic.

The result of this is that these ops go through the c10 dispatch and thus have profiling enabled. For `test_callback_simple` this results in the effect that we choose `aten::add.Tensor` over `aten::add.Int` which fixes the type issue.

Differential Revision: [D22489197](https://our.internmc.facebook.com/intern/diff/D22489197/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22489197/)!

[ghstack-poisoned]
@dr-ci
Copy link

dr-ci bot commented Jul 14, 2020

💊 CI failures summary and remediations

As of commit ea9ed34 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group.

See how this bot performed.

This comment has been revised 6 times.

Profiler tests that test profiling with builtin functions and `test_callback_simple` test has been broken for a while. This diff fixes that by preferring c10 ops to non-c10 ops in our operation matching logic.

The result of this is that these ops go through the c10 dispatch and thus have profiling enabled. For `test_callback_simple` this results in the effect that we choose `aten::add.Tensor` over `aten::add.Int` which fixes the type issue.

Differential Revision: [D22489197](https://our.internmc.facebook.com/intern/diff/D22489197/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22489197/)!

[ghstack-poisoned]
Profiler tests that test profiling with builtin functions and `test_callback_simple` test has been broken for a while. This diff fixes that by preferring c10 ops to non-c10 ops in our operation matching logic.

The result of this is that these ops go through the c10 dispatch and thus have profiling enabled. For `test_callback_simple` this results in the effect that we choose `aten::add.Tensor` over `aten::add.Int` which fixes the type issue.

Differential Revision: [D22489197](https://our.internmc.facebook.com/intern/diff/D22489197/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22489197/)!

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request Jul 14, 2020
Pull Request resolved: #41287

Profiler tests that test profiling with builtin functions and `test_callback_simple` test has been broken for a while. This diff fixes that by preferring c10 ops to non-c10 ops in our operation matching logic.

The result of this is that these ops go through the c10 dispatch and thus have profiling enabled. For `test_callback_simple` this results in the effect that we choose `aten::add.Tensor` over `aten::add.Int` which fixes the type issue.

Differential Revision: [D22489197](https://our.internmc.facebook.com/intern/diff/D22489197/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22489197/)!
ghstack-source-id: 107697506
@facebook-github-bot
Copy link
Contributor

This pull request has been merged in fd03290.

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in fd03290.

@facebook-github-bot facebook-github-bot deleted the gh/rohan-varma/148/head branch July 18, 2020 14:16
rohan-varma added a commit that referenced this pull request Dec 8, 2020
As per the title, we should generally not have exception swalling and
this commit makes it so that if there is a true error in JIT operator
resolution, it is propagated back to the RPC callee and we don't silently
swallow any other exceptions that may happen. Swallowing the exceptions
previously resulted in hard to debug issues such as unexpected ops showing up
in profiler, and flaky tests which were fixed by
#41287

Added a unittest that validates the error that comes from `jit/pybind_utils.h`.

Differential Revision: [D25392905](https://our.internmc.facebook.com/intern/diff/D25392905/)

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request Dec 8, 2020
…wing"

As per the title, we should generally not have exception swalling and
this commit makes it so that if there is a true error in JIT operator
resolution, it is propagated back to the RPC callee and we don't silently
swallow any other exceptions that may happen. Swallowing the exceptions
previously resulted in hard to debug issues such as unexpected ops showing up
in profiler, and flaky tests which were fixed by
#41287

Added a unittest that validates the error that comes from `jit/pybind_utils.h`.

Differential Revision: [D25392905](https://our.internmc.facebook.com/intern/diff/D25392905/)

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request Dec 8, 2020
Pull Request resolved: #49009

As per the title, we should generally not have exception swalling and
this commit makes it so that if there is a true error in JIT operator
resolution, it is propagated back to the RPC callee and we don't silently
swallow any other exceptions that may happen. Swallowing the exceptions
previously resulted in hard to debug issues such as unexpected ops showing up
in profiler, and flaky tests which were fixed by
#41287

Added a unittest that validates the error that comes from `jit/pybind_utils.h`.
ghstack-source-id: 118076202

Differential Revision: [D25392905](https://our.internmc.facebook.com/intern/diff/D25392905/)
rohan-varma added a commit that referenced this pull request Dec 8, 2020
…wing"

As per the title, we should generally not have exception swalling and
this commit makes it so that if there is a true error in JIT operator
resolution, it is propagated back to the RPC callee and we don't silently
swallow any other exceptions that may happen. Swallowing the exceptions
previously resulted in hard to debug issues such as unexpected ops showing up
in profiler, and flaky tests which were fixed by
#41287

Added a unittest that validates the error that comes from `jit/pybind_utils.h`.

Differential Revision: [D25392905](https://our.internmc.facebook.com/intern/diff/D25392905/)

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request Dec 8, 2020
…wing"

As per the title, we should generally not have exception swalling and
this commit makes it so that if there is a true error in JIT operator
resolution, it is propagated back to the RPC callee and we don't silently
swallow any other exceptions that may happen. Swallowing the exceptions
previously resulted in hard to debug issues such as unexpected ops showing up
in profiler, and flaky tests which were fixed by
#41287

Added a unittest that validates the error that comes from `jit/pybind_utils.h`.

Differential Revision: [D25392905](https://our.internmc.facebook.com/intern/diff/D25392905/)

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request Dec 8, 2020
Pull Request resolved: #49009

As per the title, we should generally not have exception swalling and
this commit makes it so that if there is a true error in JIT operator
resolution, it is propagated back to the RPC callee and we don't silently
swallow any other exceptions that may happen. Swallowing the exceptions
previously resulted in hard to debug issues such as unexpected ops showing up
in profiler, and flaky tests which were fixed by
#41287

Added a unittest that validates the error that comes from `jit/pybind_utils.h`.
ghstack-source-id: 118122150

Differential Revision: [D25392905](https://our.internmc.facebook.com/intern/diff/D25392905/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D25392905/)!
rohan-varma added a commit that referenced this pull request Dec 12, 2020
…wing"

As per the title, we should generally not have exception swalling and
this commit makes it so that if there is a true error in JIT operator
resolution, it is propagated back to the RPC callee and we don't silently
swallow any other exceptions that may happen. Swallowing the exceptions
previously resulted in hard to debug issues such as unexpected ops showing up
in profiler, and flaky tests which were fixed by
#41287

Added a unittest that validates the error that comes from `jit/pybind_utils.h`.

Differential Revision: [D25392905](https://our.internmc.facebook.com/intern/diff/D25392905/)

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request Dec 12, 2020
Pull Request resolved: #49009

As per the title, we should generally not have exception swalling and
this commit makes it so that if there is a true error in JIT operator
resolution, it is propagated back to the RPC callee and we don't silently
swallow any other exceptions that may happen. Swallowing the exceptions
previously resulted in hard to debug issues such as unexpected ops showing up
in profiler, and flaky tests which were fixed by
#41287

Added a unittest that validates the error that comes from `jit/pybind_utils.h`.
ghstack-source-id: 118457676

Differential Revision: [D25392905](https://our.internmc.facebook.com/intern/diff/D25392905/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D25392905/)!
rohan-varma added a commit that referenced this pull request Dec 16, 2020
…wing"

As per the title, we should generally not have exception swalling and
this commit makes it so that if there is a true error in JIT operator
resolution, it is propagated back to the RPC callee and we don't silently
swallow any other exceptions that may happen. Swallowing the exceptions
previously resulted in hard to debug issues such as unexpected ops showing up
in profiler, and flaky tests which were fixed by
#41287

Added a unittest that validates the error that comes from `jit/pybind_utils.h`.

Differential Revision: [D25392905](https://our.internmc.facebook.com/intern/diff/D25392905/)

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request Dec 16, 2020
…wing"

As per the title, we should generally not have exception swalling and
this commit makes it so that if there is a true error in JIT operator
resolution, it is propagated back to the RPC callee and we don't silently
swallow any other exceptions that may happen. Swallowing the exceptions
previously resulted in hard to debug issues such as unexpected ops showing up
in profiler, and flaky tests which were fixed by
#41287

Added a unittest that validates the error that comes from `jit/pybind_utils.h`.

Differential Revision: [D25392905](https://our.internmc.facebook.com/intern/diff/D25392905/)

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request Dec 16, 2020
…wing"

As per the title, we should generally not have exception swalling and
this commit makes it so that if there is a true error in JIT operator
resolution, it is propagated back to the RPC callee and we don't silently
swallow any other exceptions that may happen. Swallowing the exceptions
previously resulted in hard to debug issues such as unexpected ops showing up
in profiler, and flaky tests which were fixed by
#41287

Added a unittest that validates the error that comes from `jit/pybind_utils.h`.

Differential Revision: [D25392905](https://our.internmc.facebook.com/intern/diff/D25392905/)

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request Dec 16, 2020
Pull Request resolved: #49009

As per the title, we should generally not have exception swalling and
this commit makes it so that if there is a true error in JIT operator
resolution, it is propagated back to the RPC callee and we don't silently
swallow any other exceptions that may happen. Swallowing the exceptions
previously resulted in hard to debug issues such as unexpected ops showing up
in profiler, and flaky tests which were fixed by
#41287

Added a unittest that validates the error that comes from `jit/pybind_utils.h`.
ghstack-source-id: 118746790

Differential Revision: [D25392905](https://our.internmc.facebook.com/intern/diff/D25392905/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D25392905/)!
rohan-varma added a commit that referenced this pull request Dec 16, 2020
…wing"

As per the title, we should generally not have exception swalling and
this commit makes it so that if there is a true error in JIT operator
resolution, it is propagated back to the RPC callee and we don't silently
swallow any other exceptions that may happen. Swallowing the exceptions
previously resulted in hard to debug issues such as unexpected ops showing up
in profiler, and flaky tests which were fixed by
#41287

Added a unittest that validates the error that comes from `jit/pybind_utils.h`.

Differential Revision: [D25392905](https://our.internmc.facebook.com/intern/diff/D25392905/)

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request Dec 16, 2020
Pull Request resolved: #49009

As per the title, we should generally not have exception swalling and
this commit makes it so that if there is a true error in JIT operator
resolution, it is propagated back to the RPC callee and we don't silently
swallow any other exceptions that may happen. Swallowing the exceptions
previously resulted in hard to debug issues such as unexpected ops showing up
in profiler, and flaky tests which were fixed by
#41287

Added a unittest that validates the error that comes from `jit/pybind_utils.h`.
ghstack-source-id: 118747511

Differential Revision: [D25392905](https://our.internmc.facebook.com/intern/diff/D25392905/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D25392905/)!
rohan-varma added a commit that referenced this pull request Dec 17, 2020
…wing"

As per the title, we should generally not have exception swalling and
this commit makes it so that if there is a true error in JIT operator
resolution, it is propagated back to the RPC callee and we don't silently
swallow any other exceptions that may happen. Swallowing the exceptions
previously resulted in hard to debug issues such as unexpected ops showing up
in profiler, and flaky tests which were fixed by
#41287

Added a unittest that validates the error that comes from `jit/pybind_utils.h`.

Differential Revision: [D25392905](https://our.internmc.facebook.com/intern/diff/D25392905/)

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request Dec 17, 2020
Pull Request resolved: #49009

As per the title, we should generally not have exception swalling and
this commit makes it so that if there is a true error in JIT operator
resolution, it is propagated back to the RPC callee and we don't silently
swallow any other exceptions that may happen. Swallowing the exceptions
previously resulted in hard to debug issues such as unexpected ops showing up
in profiler, and flaky tests which were fixed by
#41287

Added a unittest that validates the error that comes from `jit/pybind_utils.h`.
ghstack-source-id: 118794661

Differential Revision: [D25392905](https://our.internmc.facebook.com/intern/diff/D25392905/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D25392905/)!
facebook-github-bot pushed a commit that referenced this pull request Dec 17, 2020
Summary:
Pull Request resolved: #49009

As per the title, we should generally not have exception swalling and
this commit makes it so that if there is a true error in JIT operator
resolution, it is propagated back to the RPC callee and we don't silently
swallow any other exceptions that may happen. Swallowing the exceptions
previously resulted in hard to debug issues such as unexpected ops showing up
in profiler, and flaky tests which were fixed by
#41287

Added a unittest that validates the error that comes from `jit/pybind_utils.h`.
ghstack-source-id: 118794661

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D25392905

fbshipit-source-id: 6f93251635740bcf902824548b2bc6f9249be5f0
hwangdeyu pushed a commit to hwangdeyu/pytorch that referenced this pull request Jan 6, 2021
…ch#49009)

Summary:
Pull Request resolved: pytorch#49009

As per the title, we should generally not have exception swalling and
this commit makes it so that if there is a true error in JIT operator
resolution, it is propagated back to the RPC callee and we don't silently
swallow any other exceptions that may happen. Swallowing the exceptions
previously resulted in hard to debug issues such as unexpected ops showing up
in profiler, and flaky tests which were fixed by
pytorch#41287

Added a unittest that validates the error that comes from `jit/pybind_utils.h`.
ghstack-source-id: 118794661

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D25392905

fbshipit-source-id: 6f93251635740bcf902824548b2bc6f9249be5f0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants