fix asserts in cuda code #39047
Conversation
💊 CI failures summary and remediations

As of commit 2692d8f (more details on the Dr. CI page):

- ci.pytorch.org: 1 failed

This comment was automatically generated by Dr. CI and has been revised 12 times.
```diff
@@ -73,15 +73,15 @@ TensorInfo<T, IndexType>::TensorInfo(T* p,
 template <typename T, typename IndexType>
 void
 TensorInfo<T, IndexType>::reduceDim(int dim) {
-  assert(dim < dims && dim >= 0);
+  TORCH_INTERNAL_ASSERT(dim < dims && dim >= 0);
```
oh, these aren't run in CUDA?! Intriguing.
I make no claims about the completeness of this PR, but this is certainly an improvement.
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
```diff
@@ -104,6 +104,7 @@ struct TopKTypeConfig<long> {
   typedef unsigned long long int RadixType;
 
   static inline __device__ RadixType convert(long v) {
+    // static_assert fails on windows, so leave it as CUDA_KERNEL_ASSERT
```
...how does the CUDA_KERNEL_ASSERT not fail on windows?
It will fail if someone tries to run the caffe2 radix sort for `long` inputs on a Windows PyTorch build. Hopefully no one will do that, but I can't make it a `static_assert`, because with `static_assert` the Windows build itself fails.
Should we just make this into `int64_t`?
I'm not familiar with caffe2 code and don't know if it's possible. It is also probably untested?
I don't think it will break anything, because of the `assert(sizeof(long) == 8)` here.
Adding milestone 1.5.1 since this seems worth getting into the release, because some of these are regressions.
Summary:

- Gets rid of some in-kernel asserts where they can be replaced with `static_assert`s.
- Replaces bare in-kernel `assert` with `CUDA_KERNEL_ASSERT` in one case where necessary.
- Replaces host code `assert`s with `TORCH_INTERNAL_ASSERT`.

Another group of asserts is in fractional max pooling kernels, which should be fixed regardless (#39044); the problems there are not just asserts. I've audited the remaining cases of in-kernel asserts, and they are more like `TORCH_INTERNAL_ASSERT`, so they should not happen with invalid user data. I think it's ok to leave them as is.

Pull Request resolved: #39047
Differential Revision: D21750392
Pulled By: ngimel
fbshipit-source-id: e9417523a2c672284de3515933cb7ed166e56719