[Fixing CUDA launch config failure on UpSampleNearest] #28927


Closed

Conversation

jjsjann123
Collaborator

This is to fix #22526

Adds a limit on the launch config's grid sizes as well; the previous code requested more blocks than the hardware supports.
Test added in test_cuda.

@jjsjann123 jjsjann123 requested a review from ngimel October 31, 2019 03:14
Contributor

@facebook-github-bot facebook-github-bot left a comment


@soumith is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Collaborator

@ngimel ngimel left a comment


Please make sure that if you are limiting the grid dimensions, you are handling it correctly in the kernel. I've flagged two places where you are not doing that; there are likely others. Also, the single test you've added doesn't cover everything you changed, and it only checks that the call no longer errors out, not that the result is correct. That kind of test cannot catch silently wrong answers.

int grid_x = std::min<int>(
    maxGridSize[0], cuda::ATenCeilDiv(output_width, block_x));
int grid_y = std::min<int>(
    maxGridSize[1], cuda::ATenCeilDiv(output_height, block_y));
Collaborator


Granted, your grid_y and especially grid_x are very unlikely to exceed maxGridSize, but if they ever do, the kernel has no loop to handle the leftover work, so it's better to error out.
Also, if your grid_x ever exceeds maxGridSize[0] (which is 2**31 - 1), you'll overflow the offset computation, which uses the int type.

dim3 gdim{cuda::ATenCeilDiv(n, bdim.x)};
dim3 gdim{std::min<unsigned int>(
    at::cuda::getCurrentDeviceProperties()->maxGridSize[0],
    cuda::ATenCeilDiv(n, bdim.x))};
Collaborator


Same here: the kernel does not handle a truncated grid dimension, so don't clamp the grid dim here just to silently produce wrong results.

@jjsjann123
Collaborator Author

@ngimel good catch!

Let me take a better look tomorrow to make sure it errors out in cases where the kernel would otherwise produce wrong results.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Oct 31, 2019
Summary:
This is to fix pytorch/pytorch#22526

Adds a limit on the launch config's grid sizes as well; the previous code requested more blocks than the hardware supports.
Test added in test_cuda;
Pull Request resolved: pytorch/pytorch#28927

Differential Revision: D18241759

Pulled By: soumith

fbshipit-source-id: 8f2535bb0bc4ea7998024b137576a38067668999
@facebook-github-bot
Contributor

@soumith merged this pull request in 1b1e3d5.

@soumith
Member

soumith commented Oct 31, 2019

Okay, I'm reverting the PR, sorry. I'll wait for a new PR with the changes that Natalia asked for.

@zhanwenchen

Any update?

@ngimel
Collaborator

ngimel commented Nov 1, 2019

#29016

Successfully merging this pull request may close these issues.

RuntimeError: CUDA error: invalid configuration argument
6 participants