[nnc] Test cases for uneven split + reorder #53091

Closed

@bertmaher bertmaher commented Mar 2, 2021

Stack from ghstack:

Split with tail followed by reorder causes a segfault in NNC
Split with mask followed by reorder generates invalid code that writes out of
bounds

Differential Revision: D26746254
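
For readers unfamiliar with the loop shape these tests exercise, here is a minimal plain-Python sketch of a column reduction whose column loop is split with a mask and then reordered. It is not NNC code and does not use the NNC API; the function names and the choice of which loop is split and moved are assumptions made for illustration. The point is that the `n < num_cols` guard has to stay attached to the loop body after the reorder, otherwise the padded iterations index past the end of the output.

```python
def col_reduce_reference(a):
    """Sum each column of a 2-D list (list of equal-length rows)."""
    num_cols = len(a[0])
    out = [0] * num_cols
    for row in a:
        for n in range(num_cols):
            out[n] += row[n]
    return out


def col_reduce_masked_reordered(a, factor):
    """Column sum with the column loop split by `factor` using a mask,
    and the outer half of the split hoisted above the row loop."""
    num_rows, num_cols = len(a), len(a[0])
    num_outer = (num_cols + factor - 1) // factor  # ceiling division
    out = [0] * num_cols
    for n_outer in range(num_outer):      # reordered: now outside the row loop
        for m in range(num_rows):
            for n_inner in range(factor):
                n = n_outer * factor + n_inner
                if n < num_cols:          # the mask; it must survive the reorder
                    out[n] += a[m][n]
    return out
```

With 10 columns and a factor of 4, the split covers 12 iterations, so the last two are the ones the guard has to suppress.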

facebook-github-bot commented Mar 2, 2021

💊 CI failures summary and remediations

As of commit 55f5d95 (more details on the Dr. CI page):



🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_test (1/1)

Step: "Run tests"

Mar 02 20:52:47 [E request_callback_no_python.cpp:656] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Mar 02 20:52:47 At:
Mar 02 20:52:47   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(122): serialize
Mar 02 20:52:47   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(175): serialize
Mar 02 20:52:47 
Mar 02 20:52:47 ok (2.045s)
Mar 02 20:52:49   test_return_future_remote (__main__.ProcessGroupRpcTestWithSpawn) ... RPC was initialized with the PROCESS_GROUP backend which is deprecated and slated to be removed and superseded by the TENSORPIPE backend. It is recommended to migrate to the TENSORPIPE backend.

🚧 1 fixed upstream failure:

These were probably caused by upstream breakages that were already fixed.

Please rebase on the viable/strict branch.

If your commit is older than viable/strict, run these commands:

git fetch https://github.com/pytorch/pytorch viable/strict
git rebase FETCH_HEAD

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

bertmaher added a commit that referenced this pull request Mar 2, 2021
Split with tail followed by reorder causes a segfault in NNC
Split with mask followed by reorder generates invalid code that writes out of
bounds

Differential Revision: [D26746254](https://our.internmc.facebook.com/intern/diff/D26746254/)

ghstack-source-id: 122827308
Pull Request resolved: #53091
@bertmaher bertmaher left a comment

Obviously since these cause segfaults I won't land until those are cleared up ;-)

navahgar commented Mar 2, 2021

Obviously since these cause segfaults I won't land until those are cleared up ;-)

How about disabling and landing them? So that it is easier to test and enable on fix.

@asuhan asuhan left a comment

Up to you and the difficulty of the fix whether to land those disabled or wait, but obviously we'll need to re-review if we wait.

Out of curiosity, how did you find these, convolutions with shapes which need epilogue for vectorization?

@bertmaher

Up to you and the difficulty of the fix whether to land those disabled or wait, but obviously we'll need to re-review if we wait.

Out of curiosity, how did you find these, convolutions with shapes which need epilogue for vectorization?

Yeah, I’ll go ahead and disable and land so we can parallelize the fixes.

I found them when trying to vectorize a 56x56 convolution by a factor of 16; I was hoping either to get an 8-wide epilogue or a mask, and neither worked :-)
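
To make the arithmetic concrete: 56 = 3 * 16 + 8, so a factor-16 split leaves 8 leftover iterations. The sketch below, in plain Python rather than NNC, shows the two ways of covering them, assuming the loop being split has extent 56. The helper names are hypothetical and only record which indices each scheme visits.

```python
def split_with_tail(extent, factor):
    """Main loop over full blocks of `factor`, then a separate tail loop."""
    main_trips = extent // factor
    tail = extent % factor
    visited = []
    for outer in range(main_trips):
        for inner in range(factor):
            visited.append(outer * factor + inner)
    base = main_trips * factor
    for t in range(tail):                 # the epilogue loop
        visited.append(base + t)
    return visited, tail


def split_with_mask(extent, factor):
    """Single loop nest rounded up to a multiple of `factor`, with a guard."""
    trips = -(-extent // factor)          # ceiling division
    visited = []
    for outer in range(trips):
        for inner in range(factor):
            i = outer * factor + inner
            if i < extent:                # mask off the padded iterations
                visited.append(i)
    return visited, trips * factor
```

For `extent=56, factor=16`, the tail version runs three full blocks plus an 8-iteration epilogue, while the masked version runs 64 iterations and suppresses the last 8. Both should visit exactly indices 0 through 55.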

bertmaher added a commit that referenced this pull request Mar 2, 2021
Pull Request resolved: #53091

Split with tail followed by reorder causes a segfault in NNC
Split with mask followed by reorder generates invalid code that writes out of
bounds
ghstack-source-id: 122870733

Differential Revision: [D26746254](https://our.internmc.facebook.com/intern/diff/D26746254/)
@facebook-github-bot

This pull request has been merged in 565d823.

@facebook-github-bot facebook-github-bot deleted the gh/bertmaher/82/head branch March 6, 2021 15:17
aocsa pushed a commit to Quansight/pytorch that referenced this pull request Mar 15, 2021
Summary:
Pull Request resolved: pytorch#53091

Split with tail followed by reorder causes a segfault in NNC
Split with mask followed by reorder generates invalid code that writes out of
bounds
ghstack-source-id: 122870733

Test Plan: LoopNest.ColReduceSplit*

Reviewed By: navahgar

Differential Revision: D26746254

fbshipit-source-id: f8a0de18531b34d2bf06ccaa35d9c98b81b5c600
xsacha pushed a commit to xsacha/pytorch that referenced this pull request Mar 31, 2021