Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dispatch to mv rather than mm in the case that tensor1.ndim == 1 and tensor2.ndim == 2 #75195

Closed
wants to merge 10 commits into from

Conversation

lezcano
Copy link
Collaborator

@lezcano lezcano commented Apr 4, 2022

…tensor2.ndim == 2

This should hopefully be faster, it makes the calling code simpler, and
it solves a bug when using matmul with the out= parameter (before it
would throw an incorrect error).

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented Apr 4, 2022

🔗 Helpful links

💊 CI failures summary and remediations

As of commit debf0bf (more details on the Dr. CI page):

Expand to see more
  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build pull / pytorch-xla-linux-bionic-py3.7-clang8 / test (xla, 1, 1, linux.2xlarge) (1/1)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-05-04T20:36:24.5572937Z �[0;31m[ FAILED ] �[mAtenXlaTensorTest.TestMatmul_1x2
2022-05-04T20:36:24.5513452Z �[0;32m[ RUN      ] �[mXlaUtilCacheTest.BasicTest
2022-05-04T20:36:24.5514102Z �[0;32m[       OK ] �[mXlaUtilCacheTest.BasicTest (0 ms)
2022-05-04T20:36:24.5517062Z �[0;32m[----------] �[m1 test from XlaUtilCacheTest (0 ms total)
2022-05-04T20:36:24.5517392Z 
2022-05-04T20:36:24.5570100Z �[0;32m[----------] �[mGlobal test environment tear-down
2022-05-04T20:36:24.5570786Z �[0;32m[==========] �[m623 tests from 8 test suites ran. (512555 ms total)
2022-05-04T20:36:24.5571192Z �[0;32m[  PASSED  ] �[m621 tests.
2022-05-04T20:36:24.5571554Z �[0;32m[  SKIPPED ] �[m1 test, listed below:
2022-05-04T20:36:24.5572040Z �[0;32m[  SKIPPED ] �[mAtenXlaTensorTest.TestGroupNormBackward
2022-05-04T20:36:24.5572501Z �[0;31m[  FAILED  ] �[m1 test, listed below:
2022-05-04T20:36:24.5572937Z �[0;31m[  FAILED  ] �[mAtenXlaTensorTest.TestMatmul_1x2
2022-05-04T20:36:24.5573171Z 
2022-05-04T20:36:24.5573268Z  1 FAILED TEST
2022-05-04T20:36:24.7276308Z + cleanup
2022-05-04T20:36:24.7276526Z + retcode=1
2022-05-04T20:36:24.7276685Z + set +x
2022-05-04T20:36:24.7569431Z ##[error]Process completed with exit code 1.
2022-05-04T20:36:24.7765995Z ##[group]Run pytorch/pytorch/.github/actions/get-workflow-job-id@master
2022-05-04T20:36:24.7766245Z with:
2022-05-04T20:36:24.7766611Z   github-token: ***
2022-05-04T20:36:24.7766792Z env:

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

…m == 1 and tensor2.ndim == 2"

This should hopefully be faster, it makes the calling code simpler, and
it solves a bug when using matmul with the out= parameter (before it
would throw an incorrect error).

[ghstack-poisoned]
@lezcano lezcano mentioned this pull request Apr 4, 2022
…m == 1 and tensor2.ndim == 2"

This should hopefully be faster, it makes the calling code simpler, and
it solves a bug when using matmul with the out= parameter (before it
would throw an incorrect error).

[ghstack-poisoned]
@lezcano
Copy link
Collaborator Author

lezcano commented Apr 4, 2022

@ezyang This PR shows (or will show, once the CI of the previous PRs is green) that this is the line that's causing trouble.

We get this error: https://github.com/pytorch/pytorch/runs/5820894454?check_suite_focus=true in the distributed build. I have tried to track down what could be causing it, and following the execution of the trace that it throws, I don't see how that state could be reached. It's also particularly odd that this just happens in some random test in a distributed setting...

…m == 1 and tensor2.ndim == 2"

This should hopefully be faster, it makes the calling code simpler, and
it solves a bug when using matmul with the out= parameter (before it
would throw an incorrect error).

[ghstack-poisoned]
@ezyang
Copy link
Contributor

ezyang commented Apr 5, 2022

Can you repro it locally?

I looked over the patch and it looks... fine-ish? The contiguous changes look a little skeevy

@lezcano
Copy link
Collaborator Author

lezcano commented Apr 5, 2022

I could not reproduce it locally. This PR should just be one line. I pushed some further changes to start debugging it via CI, but this is certainly not the best approach.

Once we had access to sshing into the jobs, but J believe that's not working atm...

…m == 1 and tensor2.ndim == 2"

This should hopefully be faster, it makes the calling code simpler, and
it solves a bug when using matmul with the out= parameter (before it
would throw an incorrect error).

[ghstack-poisoned]
…m == 1 and tensor2.ndim == 2"

This should hopefully be faster, it makes the calling code simpler, and
it solves a bug when using matmul with the out= parameter (before it
would throw an incorrect error).

[ghstack-poisoned]
…m == 1 and tensor2.ndim == 2"

This should hopefully be faster, it makes the calling code simpler, and
it solves a bug when using matmul with the out= parameter (before it
would throw an incorrect error).

[ghstack-poisoned]
@lezcano lezcano mentioned this pull request May 4, 2022
@ngimel
Copy link
Collaborator

ngimel commented May 4, 2022

@pytorchbot merge this

facebook-github-bot pushed a commit that referenced this pull request May 6, 2022
…tensor2.ndim == 2 (#75195)

Summary:
This should hopefully be faster, it makes the calling code simpler, and
it solves a bug when using matmul with the out= parameter (before it
would throw an incorrect error).

Pull Request resolved: #75195

Approved by: https://github.com/ezyang

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/4baf7c0899a2fa9c3630613f37d5fc65971db21c

Reviewed By: malfet

Differential Revision: D36171145

fbshipit-source-id: f6054f25940e6a8ad9de0e7576def098b15e9c3f
@facebook-github-bot facebook-github-bot deleted the gh/Lezcano/57/head branch May 8, 2022 14:16
zou3519 added a commit that referenced this pull request Jul 29, 2022
…= 1 and tensor2.ndim == 2 (#75195)"

This reverts commit d024d26.

ghstack-source-id: d5daa0c70bcc8ca42e69025680b1c3d2343ed061
Pull Request resolved: #82484
zou3519 added a commit that referenced this pull request Jul 29, 2022
…= 1 and tensor2.ndim == 2 (#75195)"

This reverts commit d024d26.

[ghstack-poisoned]
zou3519 added a commit that referenced this pull request Jul 29, 2022
This is a short-term fix for a serious regression in functorch
(pytorch/functorch#989).

Why is this a partial revert?
- the out= tests for nn.functional.linear fail on a complete revert
- the profiler tests fail on the revert (so I updated the expecttests
for the profiler tests)

Test Plan:
- test offline that the functorch regression was fixed

[ghstack-poisoned]
zou3519 added a commit that referenced this pull request Jul 29, 2022
This is a short-term fix for a serious regression in functorch
(pytorch/functorch#989).

Why is this a partial revert?
- the out= tests for nn.functional.linear fail on a complete revert
- the profiler tests fail on the revert (so I updated the expecttests
for the profiler tests)

Test Plan:
- test offline that the functorch regression was fixed

ghstack-source-id: 1332851f306c1354c705b0d1dfaa8a573233d024
Pull Request resolved: #82504
@zou3519 zou3519 mentioned this pull request Aug 1, 2022
zou3519 added a commit that referenced this pull request Aug 1, 2022
This is a short-term fix for a serious regression in functorch
(pytorch/functorch#989).

Additional things this PR does:
- the out= tests for nn.functional.linear fail after the revert. I added
some xfails. These xfails were present in the original PR (#75195).
- the profiler tests fail on the revert, so I updated the expecttests
for the profiler tests

Test Plan:
- test offline that the functorch regression was fixed

[ghstack-poisoned]
zou3519 added a commit that referenced this pull request Aug 1, 2022
This is a short-term fix for a serious regression in functorch
(pytorch/functorch#989).

Additional things this PR does:
- the out= tests for nn.functional.linear fail after the revert. I added
some xfails. These xfails were present in the original PR (#75195).
- the profiler tests fail on the revert, so I updated the expecttests
for the profiler tests

Test Plan:
- test offline that the functorch regression was fixed

[ghstack-poisoned]
zou3519 added a commit that referenced this pull request Aug 1, 2022
This is a short-term fix for a serious regression in functorch
(pytorch/functorch#989).

Additional things this PR does:
- the out= tests for nn.functional.linear fail after the revert. I added
some xfails. These xfails were present in the original PR (#75195).
- the profiler tests fail on the revert, so I updated the expecttests
for the profiler tests

Test Plan:
- test offline that the functorch regression was fixed

[ghstack-poisoned]
zou3519 added a commit that referenced this pull request Aug 1, 2022
This is a short-term fix for a serious regression in functorch
(pytorch/functorch#989).

Additional things this PR does:
- the out= tests for nn.functional.linear fail after the revert. I added
some xfails. These xfails were present in the original PR (#75195).
- the profiler tests fail on the revert, so I updated the expecttests
for the profiler tests

Test Plan:
- test offline that the functorch regression was fixed

[ghstack-poisoned]
zou3519 added a commit that referenced this pull request Aug 1, 2022
This is a short-term fix for a serious regression in functorch
(pytorch/functorch#989).

Additional things this PR does:
- the out= tests for nn.functional.linear fail after the revert. I added
some xfails. These xfails were present in the original PR (#75195).
- the profiler tests fail on the revert, so I updated the expecttests
for the profiler tests

Test Plan:
- test offline that the functorch regression was fixed

ghstack-source-id: ff73cb6ce253afc336f7090a51450f69955cd81c
Pull Request resolved: #82504
zou3519 added a commit that referenced this pull request Aug 1, 2022
This is a short-term fix for a serious regression in functorch
(pytorch/functorch#989).

Additional things this PR does:
- the out= tests for nn.functional.linear fail after the revert. I added
some xfails. These xfails were present in the original PR (#75195).
- the profiler tests fail on the revert, so I updated the expecttests
for the profiler tests

Test Plan:
- test offline that the functorch regression was fixed

[ghstack-poisoned]
zou3519 added a commit that referenced this pull request Aug 1, 2022
This is a short-term fix for a serious regression in functorch
(pytorch/functorch#989).

Additional things this PR does:
- the out= tests for nn.functional.linear fail after the revert. I added
some xfails. These xfails were present in the original PR (#75195).
- the profiler tests fail on the revert, so I updated the expecttests
for the profiler tests

Test Plan:
- test offline that the functorch regression was fixed

ghstack-source-id: f127f9be33ba35ceeabbc6a9a4d8c24654defad7
Pull Request resolved: #82504
zou3519 added a commit that referenced this pull request Aug 1, 2022
This is a short-term fix for a serious regression in functorch
(pytorch/functorch#989).

Additional things this PR does:
- the out= tests for nn.functional.linear fail after the revert. I added
some xfails. These xfails were present in the original PR (#75195).
- the profiler tests fail on the revert, so I updated the expecttests
for the profiler tests

Test Plan:
- test offline that the functorch regression was fixed

[ghstack-poisoned]
zou3519 added a commit that referenced this pull request Aug 2, 2022
This is a short-term fix for a serious regression in functorch
(pytorch/functorch#989).

Additional things this PR does:
- the out= tests for nn.functional.linear fail after the revert. I added
some xfails. These xfails were present in the original PR (#75195).
- the profiler tests fail on the revert, so I updated the expecttests
for the profiler tests

Test Plan:
- test offline that the functorch regression was fixed

[ghstack-poisoned]
zou3519 added a commit that referenced this pull request Aug 2, 2022
This is a short-term fix for a serious regression in functorch
(pytorch/functorch#989).

Additional things this PR does:
- the out= tests for nn.functional.linear fail after the revert. I added
some xfails. These xfails were present in the original PR (#75195).
- the profiler tests fail on the revert, so I updated the expecttests
for the profiler tests

Test Plan:
- test offline that the functorch regression was fixed

[ghstack-poisoned]
zou3519 added a commit that referenced this pull request Aug 2, 2022
This is a short-term fix for a serious regression in functorch
(pytorch/functorch#989).

Additional things this PR does:
- the out= tests for nn.functional.linear fail after the revert. I added
some xfails. These xfails were present in the original PR (#75195).
- the profiler tests fail on the revert, so I updated the expecttests
for the profiler tests

Test Plan:
- test offline that the functorch regression was fixed

ghstack-source-id: ea72d0f01b28fa298535233e818ca62180f1c3f5
Pull Request resolved: #82504
pytorchmergebot pushed a commit that referenced this pull request Aug 2, 2022
This is a short-term fix for a serious regression in functorch
(pytorch/functorch#989).

Additional things this PR does:
- the out= tests for nn.functional.linear fail after the revert. I added
some xfails. These xfails were present in the original PR (#75195).
- the profiler tests fail on the revert, so I updated the expecttests
for the profiler tests

Test Plan:
- test offline that the functorch regression was fixed
Pull Request resolved: #82504
Approved by: https://github.com/ngimel, https://github.com/ezyang, https://github.com/atalman
zou3519 added a commit that referenced this pull request Aug 2, 2022
This is a short-term fix for a serious regression in functorch
(pytorch/functorch#989).

Additional things this PR does:
- the out= tests for nn.functional.linear fail after the revert. I added
some xfails. These xfails were present in the original PR (#75195).
- the profiler tests fail on the revert, so I updated the expecttests
for the profiler tests

Test Plan:
- test offline that the functorch regression was fixed
Pull Request resolved: #82504
Approved by: https://github.com/ngimel, https://github.com/ezyang, https://github.com/atalman
@zou3519 zou3519 mentioned this pull request Aug 2, 2022
atalman pushed a commit that referenced this pull request Aug 2, 2022
This is a short-term fix for a serious regression in functorch
(pytorch/functorch#989).

Additional things this PR does:
- the out= tests for nn.functional.linear fail after the revert. I added
some xfails. These xfails were present in the original PR (#75195).
- the profiler tests fail on the revert, so I updated the expecttests
for the profiler tests

Test Plan:
- test offline that the functorch regression was fixed
Pull Request resolved: #82504
Approved by: https://github.com/ngimel, https://github.com/ezyang, https://github.com/atalman
facebook-github-bot pushed a commit that referenced this pull request Aug 4, 2022
Summary:
This is a short-term fix for a serious regression in functorch
(pytorch/functorch#989).

Additional things this PR does:
- the out= tests for nn.functional.linear fail after the revert. I added
some xfails. These xfails were present in the original PR (#75195).
- the profiler tests fail on the revert, so I updated the expecttests
for the profiler tests

Pull Request resolved: #82504
Approved by: https://github.com/ngimel, https://github.com/ezyang, https://github.com/atalman

Test Plan:
contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/8f86e361918e3c8a0ee2b569be6c82dfbf32d705

Test plan from GitHub:
- test offline that the functorch regression was fixed

Reviewed By: kit1980

Differential Revision: D38394573

Pulled By: zou3519

fbshipit-source-id: f9185d9cb447fb439d8e402712f2f2617f73b8cc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla signed module: linear algebra Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply matmul open source release notes: performance_as_product release notes category topic: performance topic category with-ssh
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants