[fixing cuda launch config failure on UpSampleNearest] #28927
Conversation
Adding a limitation on the launch config for grid sizes as well; test added in test_cuda.
@soumith is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Please make sure that if you are limiting the grid dimensions, you are handling it correctly in the kernel. I've flagged 2 places where you are not doing that; there are likely others. Also, you've added a single test that doesn't cover everything you changed, and it doesn't test correctness, only that you don't error out. That's the wrong kind of test to catch silent wrong answers.
int grid_x = std::min<int>(
    maxGridSize[0], cuda::ATenCeilDiv(output_width, block_x));
int grid_y = std::min<int>(
    maxGridSize[1], cuda::ATenCeilDiv(output_height, block_y));
Granted, your grid_y and especially grid_x are very unlikely to exceed maxGridSize, but if they ever do, you don't have a loop in the kernel to handle that, so it's better to error out.
Also, if your grid_x is ever more than maxGridSize[0] (which is 2**31), you'll overflow in your offset computation, which uses the int type.
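For readers following the review, here is a minimal sketch of the grid-stride loop pattern Natalia is describing; the kernel name and element-wise body are placeholders, not the actual upsample_nearest kernel. With this pattern, clamping gridDim.x to the device limit is safe because each thread keeps striding until all n elements are covered, and the 64-bit index also avoids the int overflow mentioned above.

```cuda
#include <cstdint>

// Hypothetical kernel illustrating a grid-stride loop: correctness does not
// depend on gridDim.x being large enough to give every element its own
// thread, so the launcher may clamp the grid to maxGridSize without
// dropping work.
__global__ void upsample_like_kernel(const float* in, float* out, int64_t n) {
  const int64_t stride = static_cast<int64_t>(gridDim.x) * blockDim.x;
  for (int64_t i = static_cast<int64_t>(blockIdx.x) * blockDim.x + threadIdx.x;
       i < n;
       i += stride) {
    out[i] = in[i];  // placeholder for the real nearest-neighbor gather
  }
}
```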
-  dim3 gdim{cuda::ATenCeilDiv(n, bdim.x)};
+  dim3 gdim{std::min<unsigned int>(
+      at::cuda::getCurrentDeviceProperties()->maxGridSize[0],
+      cuda::ATenCeilDiv(n, bdim.x))};
Same here: you are not handling it correctly in the kernel, so don't truncate the grid dim here just to silently produce wrong results.
@ngimel good catch! Let me take a better look tomorrow to make sure it errors out in cases where the kernel would otherwise produce wrong results.
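For illustration, a sketch of the error-out approach being discussed, reusing `n` and `bdim` from the hunk above and assuming the TORCH_CHECK macro is usable at this call site; this is not the code that eventually landed.

```cuda
// Sketch only: compute the grid size without clamping, then refuse to launch
// if it exceeds the device limit, instead of truncating and silently
// producing wrong results (this kernel has no grid-stride loop).
const unsigned int grid_x = cuda::ATenCeilDiv(n, bdim.x);
const unsigned int max_grid_x = static_cast<unsigned int>(
    at::cuda::getCurrentDeviceProperties()->maxGridSize[0]);
TORCH_CHECK(
    grid_x <= max_grid_x,
    "upsample_nearest: requested grid size ", grid_x,
    " exceeds the device limit of ", max_grid_x);
dim3 gdim{grid_x};
```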
Summary: This is to fix pytorch/pytorch#22526. Adding a limitation on the launch config for grid sizes as well; the previous code asked to launch more blocks than the hardware supports. Test added in test_cuda.
Pull Request resolved: pytorch/pytorch#28927
Differential Revision: D18241759
Pulled By: soumith
fbshipit-source-id: 8f2535bb0bc4ea7998024b137576a38067668999
Okay, I'm reverting the PR, sorry. I'll wait for a new PR with the changes that Natalia asked for.
Any update?
This is to fix #22526.
Adding a limitation on the launch config for grid sizes as well; the previous code asked to launch more blocks than the hardware supports.
Test added in test_cuda.
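For context on what limiting the launch config means in practice, below is a minimal, self-contained sketch of querying the device's maximum grid dimensions and clamping a 2-D grid against them; the helper name and signature are illustrative, not the PR's actual code.

```cuda
#include <algorithm>
#include <cuda_runtime.h>
#include <ATen/cuda/CUDAContext.h>

// Illustrative helper: build a 2-D grid covering output_width x output_height
// and clamp each dimension to the device's maxGridSize, mirroring the pattern
// in the diff above (ceil-division is written out inline here).
static dim3 make_clamped_grid(int output_width, int output_height, dim3 block) {
  const cudaDeviceProp* prop = at::cuda::getCurrentDeviceProperties();
  const int bx = static_cast<int>(block.x);
  const int by = static_cast<int>(block.y);
  const int grid_x =
      std::min<int>(prop->maxGridSize[0], (output_width + bx - 1) / bx);
  const int grid_y =
      std::min<int>(prop->maxGridSize[1], (output_height + by - 1) / by);
  return dim3(grid_x, grid_y, 1);
}
```

As the review notes, clamping like this is only safe when the kernel itself iterates with a grid-stride loop over any work beyond the clamped grid; otherwise the launcher should error out instead.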