result of 2 ** s differs between eager mode and inductor + triton + cuda when in float32 denormal range #125557
Labels
oncall: pt2
triaged
This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
upstream triton
Upstream Triton Issue
🐛 Describe the bug
When I calculate 2.0 ** s on CUDA for very small (large negative) s, so that the result falls in the float32 denormal range, PT eager mode returns the correct denormalized floating-point number, while torch.compile + inductor + triton returns 0.0. Is this expected?
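To make the affected range concrete, here is a minimal pure-Python sketch. It is not the repro from this issue and does not touch PyTorch or a GPU. It only shows which exponents s put 2.0 ** s into the float32 denormal range, and what the result looks like if denormals are flushed to zero. The helper names (`to_f32`, `is_f32_subnormal`, `flush_to_zero`) are hypothetical, and flush-to-zero is an assumed model of the compiled behavior, not a confirmed description of what triton does.

```python
import struct

# Smallest positive normal float32; positive values below it are subnormal.
F32_MIN_NORMAL = 2.0 ** -126


def to_f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def is_f32_subnormal(x: float) -> bool:
    """True if x, rounded to float32, is a nonzero subnormal (denormal)."""
    y = abs(to_f32(x))
    return 0.0 < y < F32_MIN_NORMAL


def flush_to_zero(x: float) -> float:
    """Assumed model: subnormal float32 results are replaced by 0.0."""
    y = to_f32(x)
    return 0.0 if is_f32_subnormal(y) else y


# Compare the IEEE float32 value with the assumed flushed value.
for s in (-126.0, -127.0, -140.0, -149.0):
    ieee = to_f32(2.0 ** s)
    flushed = flush_to_zero(2.0 ** s)
    print(f"s={s}: ieee={ieee!r} flushed={flushed!r} "
          f"subnormal={is_f32_subnormal(ieee)}")
```

Under this model, exponents from -127 down to -149 are where an IEEE-conforming result and a flushed result would disagree; s = -126 is still a normal float32 and is unaffected.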
Repro:
Looking at the generated triton code (https://gist.github.com/vkuzo/63a35ea68a58721f40806a32af04435d), it looks like the relevant line of triton code is
https://forums.developer.nvidia.com/t/more-accurate-version-of-exp2f-with-no-change-in-performance/243209 offers some hints that this CUDA function does not support results in the subnormal range. What's the right behavior on the PT side? Can we do better than today's silently diverging results?
Versions
https://gist.github.com/vkuzo/b5f49e49e7bb30e4cc59d95a0fd4249f
cc @ezyang @msaroufim @bdhirsh @anijain2305 @chauhang