
Conversation

@mcarilli (Collaborator) commented Feb 22, 2021

Should close #51992.

Suggested by one of our compiler people (Hari Sandanagobalane; I don't know his GitHub handle). Also big thanks to @ngimel @zasdfgbnm for distilling a minimal repro of the original failures.

Good news is, the bug is likely a foreach kernel bug and not an 11.2 compiler bug. Therefore, in theory it could affect any CUDA version. We think 11.2 exposed it by optimizing more aggressively than previous toolkits.
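
As an aside on this point, here is a minimal, hypothetical sketch (not the actual foreach kernels) of how this kind of out-of-bounds access can appear to work on one toolkit and break on a more aggressively optimizing one:

```cuda
// Hypothetical sketch, not PyTorch code: buf has four elements, but some
// call sites end up passing idx == 4. The read is undefined behavior under
// every toolkit; an older compiler may happen to return a harmless value,
// while a more aggressive optimizer is free to assume the access never
// happens and to reorder or drop surrounding code, so the bug only becomes
// visible once the compiler optimizes harder.
__device__ float read_elem(const float (&buf)[4], int idx) {
  return buf[idx];  // out of bounds when idx >= 4
}
```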

IIRC some foreach tests were disabled because of this bug, but I'm not sure which or how many. @ngimel @izdeby what tests should I reenable?

@facebook-github-bot (Contributor) commented Feb 22, 2021

💊 CI failures summary and remediations

As of commit 60c811b (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

1 failure not recognized by patterns:

Job: GitHub Actions flake8-py3
Step: Add annotations

This comment was automatically generated by Dr. CI.

@codecov (bot) commented Feb 22, 2021

Codecov Report

Merging #52591 (e53c5d3) into master (d491fc6) will increase coverage by 0.55%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #52591      +/-   ##
==========================================
+ Coverage   80.20%   80.76%   +0.55%     
==========================================
  Files        1969     1969              
  Lines      216041   216063      +22     
==========================================
+ Hits       173284   174502    +1218     
+ Misses      42757    41561    -1196     

@zasdfgbnm (Collaborator)

Please reenable these tests: https://github.com/pytorch/pytorch/pull/51598/files#diff-f2b37d0b5812153acf9ff1e337d375c5cad0076c4ddc30535ec2d55c8ac9b770

@bdhirsh added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Feb 22, 2021
@izdeby mentioned this pull request Feb 22, 2021
@izdeby (Contributor) commented Feb 22, 2021

@mcarilli, thank you for this! Looks like the fix is working.

@ngimel (Collaborator) commented Feb 22, 2021

Please change tests such as `def test_int_scalar(self, device, dtype):` to make sure that this is caught by foreach tests (it definitely was not caught previously; foreach tests were passing on 11.2).

```diff
 for(int i_start = 0; i_start < n && i_start < chunk_size; i_start += blockDim.x * kILP) {
-  load_args<depth>(r_args, args, i_start, chunk_size, n);
+  // Regardless if "depth" is 1 (for inplace) or 2 (for out of place), r_args has depth 1
+  load_args<1>(r_args, args, i_start, chunk_size, n);
```
Review comment (Collaborator):
The `depth` template argument is no longer needed here.
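
To make the change above concrete, here is a simplified, hypothetical sketch of the failure mode; the names are loosely modeled on ForeachFunctors.cuh, but this is not the actual implementation:

```cuda
// Simplified, hypothetical sketch of the failure mode; not the real code.
constexpr int kILP = 4;

// Stages `depth` operands into the register buffer r_args, one row each.
template <int depth, typename T>
__device__ void load_args(T r_args[][kILP], T** args, int i_start,
                          int chunk_size, int n) {
#pragma unroll
  for (int d = 0; d < depth; d++) {       // writes rows 0 .. depth-1
#pragma unroll
    for (int ii = 0; ii < kILP; ii++) {
      int i = i_start + threadIdx.x + ii * blockDim.x;
      r_args[d][ii] = (i < n && i < chunk_size) ? args[d][i] : T(0);
    }
  }
}

// The functor only needs to stage the input operand, so its buffer has a
// single row:
//   T r_args[1][kILP];
// Instantiating load_args<depth> with depth == 2 (an out-of-place op, where
// args also holds the output pointer) writes r_args[1][...], one row past
// the end of the buffer: undefined behavior that CUDA 11.2's optimizer made
// visible. load_args<1> is always correct here, because only args[0] has to
// be read regardless of depth.
```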

@mcarilli (Collaborator, Author)

Closing to resubmit to ci-all (#52634).

@mcarilli closed this Feb 22, 2021
facebook-github-bot pushed a commit that referenced this pull request Mar 1, 2021
…n 11.2 (ci-all edition) (#52634)

Summary:
Should close #51992.

ci-all resubmit of #52591. The plot also thickened considerably since then. Every foreach functor, it turns out, has bad `r_args` accesses for certain code paths and instantiations.

Also, I noticed the [`n % kILP == 0`](https://github.com/pytorch/pytorch/blob/2680ff7759d8a441eada383ba7aa0fa42c7d35ed/aten/src/ATen/native/cuda/ForeachFunctors.cuh#L87) condition for vectorization in all functors is way too restrictive: it'll refuse to vectorize anything on any tensor whose overall numel is not a multiple of ILP. That's out of scope though.

Pull Request resolved: #52634

Reviewed By: H-Huang

Differential Revision: D26725991

Pulled By: izdeby

fbshipit-source-id: 4bade0ac186bf85527baddc1c44b2c2b8e3c9777
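
Regarding the vectorization observation in the commit summary above, here is a hedged sketch (hypothetical helper names, not the actual multi_tensor_apply / foreach kernel code) contrasting the numel-based gate with a per-chunk gate that would only send the final partial chunk down the scalar path:

```cuda
// Hedged sketch with hypothetical helper names; not the actual kernel code.
constexpr int kILP = 4;

// The gate described above: the whole tensor's numel (n) must be a multiple
// of kILP, so a single trailing element disables the vectorized path for
// every chunk of that tensor.
__device__ bool can_vectorize_whole_tensor(int n, int chunk_size, bool aligned) {
  return aligned && n % kILP == 0 && chunk_size % kILP == 0;
}

// A less restrictive alternative (sketch only): vectorize any full chunk and
// fall back to the scalar tail loop just for the final, partial chunk.
__device__ bool can_vectorize_this_chunk(int n, int chunk_idx, int chunk_size,
                                         bool aligned) {
  int remaining = n - chunk_idx * chunk_size;  // elements left for this chunk
  return aligned && chunk_size % kILP == 0 && remaining >= chunk_size;
}
```

Whether the extra tail handling pays off is exactly the follow-up that the summary defers as out of scope.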
aocsa pushed a commit to Quansight/pytorch that referenced this pull request Mar 15, 2021
…n 11.2 (ci-all edition) (pytorch#52634)

xsacha pushed a commit to xsacha/pytorch that referenced this pull request Mar 31, 2021
…n 11.2 (ci-all edition) (pytorch#52634)


Labels

cla signed, open source, triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

test_optim.py failures for CUDA 11.2 on CI

7 participants