
Conversation

walterddr
Contributor

No description provided.

@facebook-github-bot
Contributor

facebook-github-bot commented Apr 7, 2021

💊 CI failures summary and remediations

As of commit 6df490f (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

@walterddr walterddr force-pushed the hackathon_opinfo_matmul branch from 9312e5f to d03d0b6 on April 7, 2021 23:09
@codecov

codecov bot commented Apr 8, 2021

Codecov Report

Merging #55543 (6df490f) into master (f1a0b81) will decrease coverage by 0.19%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master   #55543      +/-   ##
==========================================
- Coverage   77.43%   77.24%   -0.20%     
==========================================
  Files        1892     1892              
  Lines      187597   187605       +8     
==========================================
- Hits       145268   144908     -360     
- Misses      42329    42697     +368     

@mruberry mruberry mentioned this pull request Apr 8, 2021
Comment on lines 3224 to 3280
Contributor Author

@mruberry which of these skips is worth filing an issue for?

I think test_unsupported_dtypes definitely deserves one, since neither the supported nor the unsupported dtype test passes for float16.
Not sure about the other two, though.

@walterddr walterddr force-pushed the hackathon_opinfo_matmul branch from d03d0b6 to c302658 on April 8, 2021 23:50
((S, S, M, M), (M,), "4d_1d", (True,)),
((M,), (S, S, M, S), "1d_4d", (True,))]
sample_inputs = []
for input_args in test_cases:
Collaborator

for lhs_shape, rhs_shape in test_cases:



def sample_inputs_matmul(op_info, device, dtype, requires_grad):
test_cases = [((L,), (L,), '', (True,)),
Collaborator

The strings and bools can be removed from this iterable.

Style nit: prefer tuples to lists when the iterable doesn't need to be mutable.

((M,), (S, S, M, S), "1d_4d", (True,))]
sample_inputs = []
for input_args in test_cases:
args = (make_tensor(input_args[0], device, dtype, low=None, high=None, requires_grad=requires_grad),
Collaborator

lhs = make_tensor(lhs_shape, device, dtype, requires_grad=requires_grad)
rhs = make_tensor(rhs_shape, device, dtype, requires_grad=requires_grad)
sample_inputs.append(SampleInput(lhs, args=(rhs,)))
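For reference, a minimal sketch of the sampler with these review suggestions applied (shape-pair-only test cases, a tuple instead of a list, and direct unpacking into lhs_shape/rhs_shape). The shapes listed are only the ones visible in the excerpts above, and make_tensor, SampleInput, and the L, M, S size constants are assumed to be the helpers already defined in common_methods_invocations.py:

# Sketch only: the real test_cases tuple in the PR contains more shape pairs.
def sample_inputs_matmul(op_info, device, dtype, requires_grad):
    test_cases = (((L,), (L,)),
                  ((S, S, M, M), (M,)),
                  ((M,), (S, S, M, S)))
    sample_inputs = []
    for lhs_shape, rhs_shape in test_cases:
        lhs = make_tensor(lhs_shape, device, dtype, requires_grad=requires_grad)
        rhs = make_tensor(rhs_shape, device, dtype, requires_grad=requires_grad)
        sample_inputs.append(SampleInput(lhs, args=(rhs,)))
    return sample_inputs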

sample_inputs_func=sample_inputs_matmul,
skips=(
# matmul does not correctly warn when resizing out= inputs
SkipInfo('TestCommon', 'test_out'),
Collaborator

This is correct and doesn't require an issue (the comments alone are enough to track out= behavior, and many ops do not implement their out= behavior properly)

skips=(
# matmul does not correctly warn when resizing out= inputs
SkipInfo('TestCommon', 'test_out'),
# matmul on complex128 doesn't work
Collaborator

Can you elaborate on what's going on here? This may require an issue.

# matmul on complex128 doesn't work
SkipInfo('TestGradients', 'test_fn_grad',
device_type='cpu', dtypes=(torch.complex128,)),
# somehow it doesn't throw on CPU, and crashes when enabled
Collaborator

Yikes! Crashing sounds like an issue. Maybe even a high priority issue!

Collaborator

@mruberry mruberry left a comment

This is looking good, @walterddr. I made a few suggestions for code simplification and for two issues to file. With those tweaks and links to both issues in comments this should be OK to land!

@mruberry mruberry self-requested a review April 12, 2021 05:25
Collaborator

@mruberry mruberry left a comment

Cool! Thanks @walterddr!

@facebook-github-bot
Contributor

@walterddr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@walterddr merged this pull request in 08561ca.

@facebook-github-bot
Contributor

This pull request has been reverted by dab1cdf.

@mruberry
Collaborator

Looks like this hit another build with no PR CI signal :(

Should be a straightforward fix. If you submit it using a ci-all/ branch, that will let us validate that the issue is resolved.

@anjali411
Contributor

Hey @walterddr, this PR fails on CUDA 11. These tests are currently not run by CI by default. I think you need to add:

dtypesIfCUDA=floating_types_and(torch.float16, torch.complex64, torch.complex128,
                                *[torch.bfloat16] if (CUDA11OrLater and SM53orLater) else []),

to your OpInfo. Also, could you please submit the PR to be relanded from a ci-all branch?
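For context, a rough sketch of where that line could slot into the matmul OpInfo entry. Apart from the dtypesIfCUDA line above and the fields visible in the diff excerpts earlier in this thread, the surrounding details are assumptions, not the exact entry:

# Illustrative placement only; the surrounding fields are assumptions.
OpInfo('matmul',
       dtypes=floating_types_and(torch.float16, torch.complex64, torch.complex128),
       dtypesIfCUDA=floating_types_and(torch.float16, torch.complex64, torch.complex128,
                                       *[torch.bfloat16] if (CUDA11OrLater and SM53orLater) else []),
       sample_inputs_func=sample_inputs_matmul,
       skips=(
           # matmul does not correctly warn when resizing out= inputs
           SkipInfo('TestCommon', 'test_out'),
       )),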

@imaginary-person
Contributor

imaginary-person commented Apr 12, 2021

Hello @anjali411 & @walterddr,

It seems like dot is not implemented for BFloat16 at all, so the sample input for matmul that uses dot would cause the test to fail even with the check mentioned in the above comment.

return AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND(at::ScalarType::Half, self.scalar_type(), "dot", [&] {

The GPUs used are SM75, so the check you mentioned will enable the failing test anyway, i.e. *[torch.bfloat16] if (CUDA11OrLater and SM53orLater) else [] would be torch.bfloat16.

What about not using that sample input if the dtype is BFloat16?
Thanks!

Ref for GPU capability:

:: default on circleci is Tesla T4 which has capability of 7.5, ref: https://developer.nvidia.com/cuda-gpus
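One hedged sketch of that idea, i.e. dropping the vector-vector sample (which dispatches to dot) when sampling bfloat16 inputs. This is purely illustrative and not necessarily the fix that ultimately landed; names follow the sampler sketch shown earlier in this thread:

# Illustrative only: inside sample_inputs_matmul's loop, skip the 1-D x 1-D case
# (it lowers to dot, which the dispatch macro quoted above excludes for BFloat16).
for lhs_shape, rhs_shape in test_cases:
    if dtype == torch.bfloat16 and len(lhs_shape) == 1 and len(rhs_shape) == 1:
        continue
    lhs = make_tensor(lhs_shape, device, dtype, requires_grad=requires_grad)
    rhs = make_tensor(rhs_shape, device, dtype, requires_grad=requires_grad)
    sample_inputs.append(SampleInput(lhs, args=(rhs,)))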


@mruberry
Collaborator

> @mruberry, @anjali411 & @heitorschueroff: for such PRs, how about running the tests that don't run by default (before importing them, during development) by temporarily modifying pytorch/.circleci/config.yml, and then reverting that file to its original state after those CI checks complete?

Submitting from a branch that starts with "ci-all/" should run all the checks without having to make those changes, though.

@mruberry
Collaborator

> Hello @anjali411 & @walterddr,
>
> It seems like dot is not implemented for BFloat16 at all, so the sample input for matmul that uses dot would cause the test to fail even with the check mentioned in the above comment.
>
> return AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND(at::ScalarType::Half, self.scalar_type(), "dot", [&] {
>
> The GPUs used are SM75, so the check you mentioned will enable the failing test anyway, i.e. *[torch.bfloat16] if (CUDA11OrLater and SM53orLater) else [] would be torch.bfloat16.
>
> What about not using that sample input if the dtype is BFloat16?
> Thanks!

Thanks for this analysis, @imaginary-person. Seems like removing bfloat16 from the list of supported matmul dtypes (on CUDA) is a good idea. What do you think, @walterddr?
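In OpInfo terms that would amount to something like the following; the non-bfloat16 dtypes listed here are an assumption about the rest of the CUDA dtype list, not a quote of the actual change:

# Sketch: leave bfloat16 out of the CUDA dtype list so no bfloat16 samples are generated.
dtypesIfCUDA=floating_types_and(torch.float16, torch.complex64, torch.complex128),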

@imaginary-person
Contributor

imaginary-person commented Apr 12, 2021

> Submitting from a branch that starts with "ci-all/" should run all the checks without having to make those changes, though.

Thank you, @mruberry! That seems a lot more convenient.

EDIT: I can't use it, though, as only folks at FB have the permission to do so 😞


@mruberry
Collaborator

> Submitting from a branch that starts with "ci-all/" should run all the checks without having to make those changes, though.
>
> Thank you, @mruberry! That seems a lot more convenient.
>
> EDIT: I can't use it, though, as only folks at FB have the permission to do so 😞

I'm sorry; I thought it was available to everyone. @malfet, do we provide a mechanism for the OSS community to use ci-all?

@imaginary-person
Contributor

No worries, @mruberry! I was able to run the tests on CUDA 11.1 by (temporarily) modifying the CircleCI config file.

@imaginary-person
Contributor

imaginary-person commented Apr 13, 2021

@walterddr, I was wrong about the CUDA compute capability being 75 in the failing test.
It's 52 in the logs. I don't know if that's intentional (tests backward compatibility?), or if the GPU isn't actually an Nvidia Tesla T4.

@walterddr
Contributor Author

> @walterddr, I was wrong about the CUDA compute capability being 75 in the failing test.
> It's 52 in the logs. I don't know if that's intentional (tests backward compatibility?), or if the GPU isn't actually an Nvidia Tesla T4.

Thanks for the report. Let me create a follow-up PR from the base repo and tag you.

@imaginary-person
Contributor

> Thanks for the report. Let me create a follow-up PR from the base repo and tag you.

No worries! I only mentioned it because I had misinformed you earlier. It seems only the Windows tests use SM_75 GPUs.
It doesn't change the approach required for this PR: you should remove bfloat16, as @mruberry mentioned.

facebook-github-bot pushed a commit that referenced this pull request Apr 14, 2021
Summary:
This is a reland of #55543 after fixing bfloat16 issues.

Pull Request resolved: #55947

Reviewed By: mruberry

Differential Revision: D27765035

Pulled By: walterddr

fbshipit-source-id: b27a769de7686777012194ebbc1f38fc5d4acb67
krshrimali pushed a commit to krshrimali/pytorch that referenced this pull request May 19, 2021
Summary: Pull Request resolved: pytorch#55543

Reviewed By: samestep

Differential Revision: D27708944

Pulled By: walterddr

fbshipit-source-id: c200ded15082eaeed7ba3077a0c8629fed0db505
krshrimali pushed a commit to krshrimali/pytorch that referenced this pull request May 19, 2021
Summary:
This is a reland of pytorch#55543 after fixing bfloat16 issues.

Pull Request resolved: pytorch#55947

Reviewed By: mruberry

Differential Revision: D27765035

Pulled By: walterddr

fbshipit-source-id: b27a769de7686777012194ebbc1f38fc5d4acb67