Fix replication_pad for cuda launch configuration #50565
auto devOutput = output_.packed_accessor64<scalar_t, 3>();

int outputPlaneSize = devOutput.size(2);
int size1 = devOutput.size(1);
Use int64_t for sizes.
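To illustrate the concern behind this comment, here is a minimal host-side sketch. The helper names `fits_in_int` and `plane_elems` are hypothetical and do not appear in the PR; the example also assumes a platform where `int` is 32 bits wide. It shows that a product of two modest extents can already exceed what `int` can hold, which is why keeping sizes in `int64_t` is the safer default.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not from the PR): true when a 64-bit extent can be
// stored in a plain int without losing information.
inline bool fits_in_int(int64_t v) {
  return v >= std::numeric_limits<int>::min() &&
         v <= std::numeric_limits<int>::max();
}

// Hypothetical helper (not from the PR): number of elements in one plane,
// computed entirely in 64-bit arithmetic so the product cannot wrap.
inline int64_t plane_elems(int64_t height, int64_t width) {
  return height * width;
}
```

For example, a 65536 x 65536 plane has 4294967296 elements, which does not fit in a 32-bit `int`, while each individual extent does.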
int size1 = devOutput.size(1);
int size0 = devOutput.size(0);

int y_left = size1;
for (int64_t block_y = 0; block_y < size1; block_y += 65535) {
  // Explicit template argument: both operands of std::min must have the same type.
  auto block_y_size = std::min<int64_t>(size1 - block_y, 65535);
  for (int64_t block_z = 0; block_z < size0; block_z += 65535) {
    auto block_z_size = std::min<int64_t>(size0 - block_z, 65535);
    dim3 gridSize(THCCeilDiv(outputPlaneSize, 256), block_y_size, block_z_size);
    dim3 blockSize(...);
    launch_kernel(...., block_y, block_z);
  }
}
is a bit simpler.
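The chunking idea in this suggestion can be checked on the host without launching anything. The following is a sketch only: `LaunchChunk`, `plan_launches`, `ceil_div`, `kMaxGridYZ`, and `kThreads` are hypothetical names introduced for illustration, and the code only records the grid extents and offsets that each launch would receive. It assumes the 65535 limit on the y and z grid dimensions and the 256 threads per block used in the suggestion.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical record of one kernel launch: the grid extents plus the
// offsets the kernel would add to blockIdx.y and blockIdx.z.
struct LaunchChunk {
  int64_t grid_x, grid_y, grid_z;
  int64_t offset_y, offset_z;
};

constexpr int64_t kMaxGridYZ = 65535;  // limit on gridDim.y and gridDim.z
constexpr int64_t kThreads = 256;      // threads per block, as in the suggestion

inline int64_t ceil_div(int64_t a, int64_t b) { return (a + b - 1) / b; }

// Split the (size0, size1) extents into pieces that each fit the grid limit.
std::vector<LaunchChunk> plan_launches(int64_t plane_size, int64_t size1,
                                       int64_t size0) {
  std::vector<LaunchChunk> chunks;
  for (int64_t block_y = 0; block_y < size1; block_y += kMaxGridYZ) {
    const int64_t block_y_size = std::min(size1 - block_y, kMaxGridYZ);
    for (int64_t block_z = 0; block_z < size0; block_z += kMaxGridYZ) {
      const int64_t block_z_size = std::min(size0 - block_z, kMaxGridYZ);
      chunks.push_back({ceil_div(plane_size, kThreads), block_y_size,
                        block_z_size, block_y, block_z});
    }
  }
  return chunks;
}
```

With `size1 = 70000` and `size0 = 3`, this plan contains two launches: one covering 65535 rows at offset 0 and one covering the remaining 4465 rows at offset 65535. Inside the kernel, the offsets would be added to `blockIdx.y` and `blockIdx.z` to recover the global indices.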
Codecov Report
@@ Coverage Diff @@
## master #50565 +/- ##
==========================================
+ Coverage 80.65% 80.67% +0.01%
==========================================
Files 1913 1910 -3
Lines 208151 207864 -287
==========================================
- Hits 167887 167696 -191
+ Misses 40264 40168 -96
cc @ptrblck
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Fix #49601