Probable fix for out of place BinaryOpScalar bad values and/or IMAs on 11.2 #52591
💊 CI failures summary and remediations
As of commit 60c811b (more details on the Dr. CI page): 1 failure not recognized by patterns.
This comment was automatically generated by Dr. CI.
Codecov Report
@@            Coverage Diff             @@
##           master   #52591      +/-   ##
==========================================
+ Coverage   80.20%   80.76%   +0.55%
==========================================
  Files        1969     1969
  Lines      216041   216063      +22
==========================================
+ Hits       173284   174502    +1218
+ Misses      42757    41561    -1196
@mcarilli, thank you for this! Looks like the fix is working.
Please change tests (Line 353 in 4386a38).
  for(int i_start = 0; i_start < n && i_start < chunk_size; i_start += blockDim.x * kILP) {
-     load_args<depth>(r_args, args, i_start, chunk_size, n);
+     // Regardless if "depth" is 1 (for inplace) or 2 (for out of place), r_args has depth 1
+     load_args<1>(r_args, args, i_start, chunk_size, n);
The `depth` template argument is no longer needed.
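To make the change in the diff above concrete, here is a minimal host-side C++ sketch, not the actual kernel code. It assumes, based on the diff comment, that `r_args` has a leading dimension of 1 and that `load_args<depth>` writes `depth` rows into it. The names `load_args_model` and `staged_sum`, the value `kILP = 4`, and the `static_assert` guard are all hypothetical and exist only for illustration.

```cpp
#include <cassert>

constexpr int kILP = 4;  // assumed value, for illustration only

// Hypothetical host-side stand-in for the device helper load_args.
// It stages up to kILP elements from each of the first `depth` tensors
// into a small register-like array, zero-filling past the end.
template <int depth, int r_depth, typename T>
void load_args_model(T (&r_args)[r_depth][kILP], const T* const* args,
                     int i_start, int chunk_size, int n) {
  // Writing `depth` rows into an array with fewer rows is the kind of
  // out-of-bounds access discussed above; this guard makes it a compile error.
  static_assert(depth <= r_depth, "load would write past the end of r_args");
  assert(i_start >= 0);
  for (int ii = 0; ii < kILP; ++ii) {
    const int i = i_start + ii;  // simplified: one thread, unit stride
    for (int d = 0; d < depth; ++d) {
      r_args[d][ii] = (i < n && i < chunk_size) ? args[d][i] : T(0);
    }
  }
}

// Loads one staging row from the input tensor and returns the sum of the
// staged values, so the result is easy to check.
inline float staged_sum(const float* input, float* output, int n) {
  const float* const args[2] = {input, output};  // out-of-place: depth == 2
  float r_args[1][kILP];                         // but r_args has depth 1
  load_args_model<1>(r_args, args, 0, /*chunk_size=*/n, n);  // OK
  // load_args_model<2>(r_args, args, 0, n, n);  // rejected by static_assert
  float sum = 0.0f;
  for (int ii = 0; ii < kILP; ++ii) sum += r_args[0][ii];
  return sum;
}
```

In this model, only the input row is ever staged, which matches the idea that the out-of-place variant still needs a single row of `r_args` even though it carries two tensor pointers.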
Closing to resubmit to ci-all (#52634).
…n 11.2 (ci-all edition) (#52634)

Summary:
Should close #51992.

ci-all resubmit of #52591. The plot also thickened considerably since then. Every foreach functor, it turns out, has bad `r_args` accesses for certain code paths and instantiations.

Also, I noticed the [`n % kILP == 0`](https://github.com/pytorch/pytorch/blob/2680ff7759d8a441eada383ba7aa0fa42c7d35ed/aten/src/ATen/native/cuda/ForeachFunctors.cuh#L87) condition for vectorization in all functors is way too restrictive: it'll refuse to vectorize anything on any tensor whose overall numel is not a multiple of ILP. That's out of scope though.

Pull Request resolved: #52634
Reviewed By: H-Huang
Differential Revision: D26725991
Pulled By: izdeby
fbshipit-source-id: 4bade0ac186bf85527baddc1c44b2c2b8e3c9777
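As a rough illustration of why the `n % kILP == 0` gate mentioned in the commit message is restrictive, here is a small C++ sketch. The value `kILP = 4` and the names `vectorizes_as_gated` and `vectorizable_prefix` are hypothetical, not PyTorch code, and the prefix/tail split is only one possible relaxation, not something this PR implements.

```cpp
constexpr int kILP = 4;  // assumed value, for illustration only

// Mirrors only the quoted size condition: the vector path is taken
// solely when the element count is an exact multiple of kILP.
// (The real check may include other conditions, e.g. alignment.)
constexpr bool vectorizes_as_gated(int n) { return n % kILP == 0; }

// Hypothetical relaxation: number of leading elements that could still go
// through kILP-wide accesses, leaving at most kILP - 1 for a scalar tail.
constexpr int vectorizable_prefix(int n) { return n - n % kILP; }

static_assert(vectorizes_as_gated(1024), "multiple of kILP: vector path");
static_assert(!vectorizes_as_gated(1025), "one extra element: scalar path");
static_assert(vectorizable_prefix(1025) == 1024, "only 1 element is left over");
```

Under these assumptions, a single extra element sends all 1025 elements down the scalar path, even though 1024 of them could in principle be handled with vector-width accesses.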
Should close #51992.
Suggested by one of our compiler people (Hari Sandanagobalane, don't know github handle). Also big thanks to @ngimel @zasdfgbnm for distilling a minimal repro of the original failures.
Good news is, the bug is likely a foreach kernel bug and not an 11.2 compiler bug. Therefore, in theory it could affect any CUDA version. We think 11.2 exposed it by optimizing more aggressively than previous toolkits.
IIRC some foreach tests were disabled because of this bug, but I'm not sure which or how many. @ngimel @izdeby what tests should I reenable?