
Conversation

syed-ahmed
Collaborator

Fixes #12324

Co-authored-by: Christian Sarofeen <csarofeen@nvidia.com>
@syed-ahmed changed the title Adds launch bounds for CTC loss kernel → [CUDA 10 fixes] Adds launch bounds for CTC loss kernel on Oct 5, 2018
@ssnl
Collaborator

ssnl commented Oct 5, 2018

@pytorchbot retest this please

@syed-ahmed
Collaborator Author

Looks like the ROCm builds are failing with this error: expected unqualified-id __launch_bounds__(std::is_same<scalar_t, float>::value ? 1024 : 896, 1). Does anybody happen to know why that's the case?

@syed-ahmed
Collaborator Author

syed-ahmed commented Oct 10, 2018

Looks like HIP couldn't handle the std::is_same through the __launch_bounds__ macro expansion. Adding parentheses around it solved the issue. The Windows failure seems unrelated, so this looks good to me.
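
For context, here is a minimal sketch of the pattern in question (the kernel name and body are illustrative placeholders, not the actual CTC loss kernel from this PR). Under HIP, __launch_bounds__ goes through a macro expansion, so the comma inside std::is_same<scalar_t, float> is treated as a macro-argument separator and the attribute no longer parses; wrapping the whole conditional expression in an extra pair of parentheses keeps it as a single argument, which both nvcc and the HIP toolchain accept.

```cuda
#include <type_traits>

// Illustrative kernel, not the code from the PR.
// __launch_bounds__(maxThreadsPerBlock, minBlocksPerMultiprocessor) tells the
// compiler the largest block size the kernel will be launched with, so it can
// budget registers per thread accordingly.
//
// Problematic form (breaks under HIP's macro expansion, because the comma in
// the template argument list of std::is_same splits the first macro argument):
//
//   __launch_bounds__(std::is_same<scalar_t, float>::value ? 1024 : 896, 1)
//
// Parenthesized form (a single, well-formed first argument):
template <typename scalar_t>
__global__ void
__launch_bounds__((std::is_same<scalar_t, float>::value ? 1024 : 896), 1)
example_kernel(scalar_t* out, const scalar_t* in, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    out[i] = in[i];  // trivial body; the point is the launch-bounds attribute above
  }
}
```

The 1024/896 split presumably reflects that the double instantiation needs more registers per thread than the float one, so capping it at a smaller block size is what avoids the "too many resources requested for launch" error from the linked issue.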

Contributor

@facebook-github-bot left a comment


ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Oct 11, 2018
Summary:
Fixes pytorch/pytorch#12324
Pull Request resolved: pytorch/pytorch#12379

Differential Revision: D10318361

Pulled By: ezyang

fbshipit-source-id: aec4ae8205e780b18560d639543ed9d0ef0527ce
@syed-ahmed deleted the ctc-fix-launch-bounds branch on October 11, 2018 18:25

Development

Successfully merging this pull request may close these issues.

test_autograd.TestAutograd.test_ctc_loss fails : cuda runtime error : too many resources requested
