[fixing cuda launch config failure on UpSampleNearest] #28927
Conversation
Adding a limitation on the launch config for grid sizes as well; test added in test_cuda.
@soumith is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Please make sure that if you are limiting the grid dimensions, you are handling it correctly in the kernel. I've flagged 2 places where you are not doing that; there are likely others. Also, you've added a single test that doesn't cover everything you changed, and it doesn't test correctness, only that you don't error out. That's the wrong kind of test to catch silent wrong answers.
int grid_x = std::min<int>(
    maxGridSize[0], cuda::ATenCeilDiv(output_width, block_x));
int grid_y = std::min<int>(
    maxGridSize[1], cuda::ATenCeilDiv(output_height, block_y));
Granted, your grid_y and especially grid_x are very unlikely to exceed maxGridSize, but if they ever do, you don't have a loop in the kernel to handle that, so it's better to error out.
Also, if your grid_x is ever more than maxGridSize[0] (which is 2**31), you'll overflow in your offset computation, which uses the int type.
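For readers following the review, here is a minimal sketch of the grid-stride loop pattern Natalia is describing; the kernel name and element-wise body are placeholders, not the actual upsample_nearest kernel. With this pattern, clamping gridDim.x to the device limit is safe because each thread keeps striding until all n elements are covered, and the 64-bit index also avoids the int overflow mentioned above.

```cuda
#include <cstdint>

// Hypothetical kernel illustrating a grid-stride loop: correctness does not
// depend on gridDim.x being large enough to give every element its own
// thread, so the launcher may clamp the grid to maxGridSize without
// dropping work.
__global__ void upsample_like_kernel(const float* in, float* out, int64_t n) {
  const int64_t stride = static_cast<int64_t>(gridDim.x) * blockDim.x;
  for (int64_t i = static_cast<int64_t>(blockIdx.x) * blockDim.x + threadIdx.x;
       i < n;
       i += stride) {
    out[i] = in[i];  // placeholder for the real nearest-neighbor gather
  }
}
```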
-  dim3 gdim{cuda::ATenCeilDiv(n, bdim.x)};
+  dim3 gdim{std::min<unsigned int>(
+      at::cuda::getCurrentDeviceProperties()->maxGridSize[0],
+      cuda::ATenCeilDiv(n, bdim.x))};
Same here: you are not handling it correctly in the kernel, so don't truncate the grid dim here just to silently produce wrong results.
@ngimel good catch! Let me take a better look tomorrow to make sure it errors out in cases where the kernel would otherwise produce wrong results.
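For illustration, a sketch of the error-out approach being discussed, reusing `n` and `bdim` from the hunk above and assuming the TORCH_CHECK macro is usable at this call site; this is not the code that eventually landed.

```cuda
// Sketch only: compute the grid size without clamping, then refuse to launch
// if it exceeds the device limit, instead of truncating and silently
// producing wrong results (this kernel has no grid-stride loop).
const unsigned int grid_x = cuda::ATenCeilDiv(n, bdim.x);
const unsigned int max_grid_x = static_cast<unsigned int>(
    at::cuda::getCurrentDeviceProperties()->maxGridSize[0]);
TORCH_CHECK(
    grid_x <= max_grid_x,
    "upsample_nearest: requested grid size ", grid_x,
    " exceeds the device limit of ", max_grid_x);
dim3 gdim{grid_x};
```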
Summary: This is to fix pytorch/pytorch#22526. Adding a limitation on the launch config for grid sizes as well; the previous code asked to launch more blocks than the hardware supports. Test added in test_cuda.
Pull Request resolved: pytorch/pytorch#28927
Differential Revision: D18241759
Pulled By: soumith
fbshipit-source-id: 8f2535bb0bc4ea7998024b137576a38067668999
Okay, I'm reverting the PR, sorry. I'll wait for a new PR with the changes that Natalia asked for.
Any update?
This is to fix #22526.
Adding a limitation on the launch config for grid sizes as well; the previous code asked to launch more blocks than the hardware supports.
Test added in test_cuda.
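For context on what limiting the launch config means in practice, below is a minimal, self-contained sketch of querying the device's maximum grid dimensions and clamping a 2-D grid against them; the helper name and signature are illustrative, not the PR's actual code.

```cuda
#include <algorithm>
#include <cuda_runtime.h>
#include <ATen/cuda/CUDAContext.h>

// Illustrative helper: build a 2-D grid covering output_width x output_height
// and clamp each dimension to the device's maxGridSize, mirroring the pattern
// in the diff above (ceil-division is written out inline here).
static dim3 make_clamped_grid(int output_width, int output_height, dim3 block) {
  const cudaDeviceProp* prop = at::cuda::getCurrentDeviceProperties();
  const int bx = static_cast<int>(block.x);
  const int by = static_cast<int>(block.y);
  const int grid_x =
      std::min<int>(prop->maxGridSize[0], (output_width + bx - 1) / bx);
  const int grid_y =
      std::min<int>(prop->maxGridSize[1], (output_height + by - 1) / by);
  return dim3(grid_x, grid_y, 1);
}
```

As the review notes, clamping like this is only safe when the kernel itself iterates with a grid-stride loop over any work beyond the clamped grid; otherwise the launcher should error out instead.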