CUDA BFloat16 signal windows #45155
Conversation
💊 CI failures summary and remediations

As of commit cdf7c49 (more details on the Dr. CI page):

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:
pytorch_linux_backward_compatibility_check_test (1/1) — Step: "Run tests"
it is tested in
test added for all dtypes, and arange is required for hann. arange will be tested in #44848
});
const scalar_t alpha = static_cast<scalar_t>((window_length - 1) / 2.0);
gpu_kernel(iter, [=]GPU_LAMBDA(scalar_t a) -> scalar_t {
  return calc_i0(static_cast<scalar_t>(beta) * ::sqrt(1 - ::pow((a - alpha) / alpha, static_cast<scalar_t>(2.0)))) / calc_i0(static_cast<scalar_t>(beta));
The intermediate computations of the i0 args here should still be in accscalar_t? I can merge as is, though.
Is i0 bandwidth-bound or compute-bound? If it is not bandwidth-bound, does it still make sense to compute in accscalar_t?
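For intuition, a back-of-envelope roofline estimate. Every number below is an assumption, not a measurement: one bfloat16 write per output element, a guessed FLOP count for calc_i0's polynomial evaluation, and an illustrative GPU spec:

```python
# Illustrative roofline check; all numbers are assumptions, not measurements.
bytes_per_elem = 2      # window generation writes one bfloat16 per element, reads nothing
flops_per_elem = 50     # guessed cost of one calc_i0 polynomial evaluation
intensity = flops_per_elem / bytes_per_elem   # FLOP per byte of traffic

# Hypothetical GPU: 10 TFLOP/s fp32 peak, 500 GB/s memory bandwidth.
machine_balance = 10e12 / 500e9               # FLOP/byte at which compute saturates

compute_bound = intensity > machine_balance
print(intensity, machine_balance, compute_bound)
```

Under these guessed numbers the kernel would be compute-bound, which is the crux of the question: if it were bandwidth-bound, the extra fp32 arithmetic of accscalar_t would be effectively free, whereas in a compute-bound kernel it has a real cost.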
I don't know, but it's doing internal computations in fp32 anyway, and here
static_cast<scalar_t>(beta) * ::sqrt(1 - ::pow((a - alpha) / alpha
is still doing repeated conversions and truncations
pytorch/aten/src/ATen/native/Math.h, line 542 at 719d29d:

inline c10::BFloat16 calc_i0(c10::BFloat16 a) { return calc_i0(static_cast<float>(a)); }
OK, calc_i0 is already computing on float32 anyway.
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
ping @ngimel
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Looks like this op is never tested for the support of different dtypes?
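A dtype-coverage check for a window op could be sketched roughly like this. This is a hedged illustration, not the actual PyTorch test: it uses NumPy's `np.i0` as a stand-in reference, and `kaiser_ref` and its tolerance are hypothetical. The idea is simply to compute the same window in a reduced precision and compare against a float64 reference:

```python
import numpy as np

# Hypothetical reference: Kaiser window computed directly from its definition,
# w[n] = I0(beta * sqrt(1 - ((n - alpha)/alpha)^2)) / I0(beta), alpha = (M-1)/2,
# with intermediates held in the requested dtype.
def kaiser_ref(M, beta, dtype):
    n = np.arange(M, dtype=dtype)
    alpha = dtype((M - 1) / 2.0)
    arg = beta * np.sqrt(1 - ((n - alpha) / alpha) ** 2)
    return (np.i0(arg) / np.i0(beta)).astype(dtype)

ref = kaiser_ref(10, 12.0, np.float64)    # high-precision reference
lowp = kaiser_ref(10, 12.0, np.float32)   # reduced-precision computation

# The low-precision result should track the reference within a loose tolerance.
assert np.allclose(ref, lowp.astype(np.float64), atol=1e-5)
```

A real test would loop this comparison over every dtype the op claims to support (including torch.bfloat16 on CUDA, which is what this PR adds), so that a missing dispatch shows up as a hard failure rather than silently.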