Fix NaN in all 3 gated deltanet Helion references and submission #131

Merged
S1ro1 merged 2 commits into gpu-mode:main from yf225:fix-gated-deltanet-reference-nan
Mar 14, 2026

@yf225 yf225 commented Mar 14, 2026

Summary

  • Fix exp(g_diff) overflow causing inf * 0 = NaN in all 3 gated deltanet Helion kernels
  • ref_kernel in chunk_fwd_o: exp(g_i - g_j) was computed before applying the causal mask — zero out g_diff in the upper triangle before exp()
  • _chunk_scaled_dot_kkt_fwd_eager in all 3 kernels (chunk_fwd_o, chunk_fwd_h, recompute_w_u): exp(g_diff) * strict_lower overflows in the upper triangle — zero out g_diff outside the strict lower triangle before exp()
  • submission.py for chunk_fwd_o: exp(-g) overflows when g is very negative — restructure to compute exp(g_i - g_j) inline instead
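The exp(-g) overflow in the last bullet can be reproduced without a GPU. The sketch below is only an illustration: it assumes the earlier submission factored the decay as exp(g_i) * exp(-g_j), which this PR does not show, and it uses plain Python floats rather than tensors. The helper `exp_ieee` and the sample values of `g` are hypothetical; the helper emulates the inf-on-overflow behaviour of tensor libraries, since `math.exp` raises instead.

```python
import math

def exp_ieee(x):
    # Hypothetical helper: math.exp raises OverflowError where tensor
    # libraries return inf, so emulate the IEEE result here.
    try:
        return math.exp(x)
    except OverflowError:
        return math.inf

# Made-up cumulative gate values; each step adds a negative increment.
g = [-400.0, -800.0]

def decay_factored(i, j):
    # Assumed original form: exp(g_i) underflows to 0 while exp(-g_j)
    # overflows to inf, and 0 * inf is NaN.
    return exp_ieee(g[i]) * exp_ieee(-g[j])

def decay_inline(i, j):
    # Restructured form: the difference is <= 0 for j <= i, so exp() stays in [0, 1].
    return exp_ieee(g[i] - g[j])

# Lower-triangle pairs (j <= i) in the order (0,0), (1,0), (1,1).
pairs = [(i, j) for i in range(len(g)) for j in range(i + 1)]
factored = [decay_factored(i, j) for i, j in pairs]
inline = [decay_inline(i, j) for i, j in pairs]

print("factored:", factored)
print("inline:  ", inline)
```

Under these assumptions the factored form yields NaN on the diagonal entry (1, 1), where the true value is exp(0) = 1, while the inline form stays finite everywhere.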

Root cause: g is a cumulative sum of negative increments, so its values become very negative. In the upper triangle (j > i) the differences g_i - g_j can be large and positive, so exp() overflows to inf. Multiplying that inf by the 0 from the mask produces NaN.
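A minimal sketch of the root cause and of the mask-before-exp fix, written in plain Python so it runs without torch or Helion. It is not the code from this PR: `exp_ieee`, `decay_naive`, `decay_masked`, and the values in `g` are hypothetical names and data chosen for illustration, and the causal mask here keeps j <= i (the strict-lower variant would use j < i).

```python
import math

def exp_ieee(x):
    # Hypothetical helper: return inf on overflow, as tensor libraries do,
    # instead of raising OverflowError like math.exp.
    try:
        return math.exp(x)
    except OverflowError:
        return math.inf

# Made-up cumulative sums of negative increments.
g = [-400.0, -800.0, -1200.0, -1600.0]
n = len(g)

def decay_naive():
    # exp() first, mask second: upper-triangle entries become inf * 0 = NaN.
    return [[exp_ieee(g[i] - g[j]) * (1.0 if j <= i else 0.0)
             for j in range(n)] for i in range(n)]

def decay_masked():
    # Zero the difference outside the causal region before exp(), so the
    # masked entries evaluate exp(0) * 0 = 0 and never see a large argument.
    out = []
    for i in range(n):
        row = []
        for j in range(n):
            keep = j <= i
            diff = g[i] - g[j] if keep else 0.0
            row.append(exp_ieee(diff) * (1.0 if keep else 0.0))
        out.append(row)
    return out

naive = decay_naive()
fixed = decay_masked()

print("naive has NaN:", any(math.isnan(x) for row in naive for x in row))
print("fixed has NaN:", any(math.isnan(x) for row in fixed for x in row))
```

With tensors the same idea is usually expressed by selecting 0 for the masked positions before the exponential, rather than multiplying by the mask afterwards.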

Test plan

  • Verified all 3 test shapes pass locally with no NaN in reference or submission for all 3 kernels
  • Confirmed torch.allclose(out, ref, rtol=1e-2, atol=1e-2) for all test shapes

🤖 Generated with Claude Code

yf225 and others added 2 commits March 14, 2026 10:45
The reference kernel computed exp(g_i - g_j) before applying the causal
mask. When g values are very negative (cumulative sums of negative
increments), the upper-triangle differences g_i - g_j overflow exp() to
inf, and inf * 0 (causal mask) produces NaN.

Fix: zero out g_diff in the upper triangle before calling exp(), so we
never compute exp(large_positive). Apply the same fix in the submission
kernel which had a similar issue with exp(-g) overflowing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…t kernels

Zero out g_diff outside the strict lower triangle before calling exp(),
preventing inf * 0 = NaN when upper-triangle g differences overflow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@yf225 yf225 changed the title Fix NaN in gated_deltanet_chunk_fwd_o reference and submission Fix NaN in all 3 gated deltanet Helion references and submission Mar 14, 2026
@S1ro1 S1ro1 merged commit ec71b91 into gpu-mode:main Mar 14, 2026

2 participants