llama/rope: gate fp64 hf_precompute_freqs_cis on cos/sin scaling #19308

Merged
JacobSzwejbka merged 1 commit into pytorch:main from rascani:main
May 6, 2026
Conversation

@rascani
Contributor

@rascani rascani commented May 5, 2026

Summary

a79521b ("Add LongRoPE support and fp64 RoPE precompute for Phi-3 / Phi-4 family") unconditionally moved hf_precompute_freqs_cis to fp64 cos/sin precompute with a final cast to fp32. That works for the Phi-4 device validation that motivated the commit, but it broke test_static_attention.py::test_within_transformer on the Linux unittest runners (pull, pull-editable, trunk-release have been 100% red since the commit landed).

The test compares mha_transformer (built with use_hf_rope=False, taking the pure-fp32 precompute_freqs_cis path) against static_transformer (built with use_hf_rope=True, taking hf_precompute_freqs_cis) at rtol=1e-3, with shared weights. Before a79521b, both paths produced bit-identical fp32 cos/sin tables (verified empirically: 0/192 entries differed). After the commit, HF cos/sin diverge from non-HF by ~1 ULP in 38/192 entries; that drift compounds across 4 transformer layers and tips past rtol=1e-3 on the CI runners (Python 3.10, source-built torch). Local Python 3.12 stayed just barely within tolerance, which is why review missed it.
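
For illustration only (the dims, theta, and entry counts below are made up, not the values from the test): computing a cos table in fp64 and casting back to fp32 can round to the neighboring fp32 value relative to computing it directly in fp32, which is the ~1 ULP drift described above.

```python
import torch

# Hypothetical shapes; the real test uses the static_attention test config.
dim, max_seq_len, theta = 64, 16, 10000.0

inv_freq = 1.0 / (theta ** (torch.arange(0, dim, 2).float() / dim))
angles = torch.outer(torch.arange(max_seq_len).float(), inv_freq)

cos_fp32 = torch.cos(angles)                        # pure fp32 path
cos_via_fp64 = torch.cos(angles.double()).float()   # fp64 compute, final fp32 cast

mismatches = (cos_fp32 != cos_via_fp64).sum().item()
print(f"{mismatches}/{cos_fp32.numel()} entries differ after the fp64 round trip")
```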

Gate the fp64 precompute on the property the original commit was actually protecting: a non-trivial cos/sin scale being applied. That is either LongRoPE active (Phi-3 / Phi-4 set short_factor and long_factor via config) or an explicit attention_factor != 1.0 passed through. Both cases preserve fp64; vanilla HF RoPE (Llama family, the test config) goes back to fp32 throughout and re-establishes bit-identical agreement with the non-HF path.
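
A minimal sketch of the gating idea, not the actual ExecuTorch implementation (the signature and parameter names here are assumptions based on the description, and LongRoPE's per-frequency rescaling is elided):

```python
import torch

def hf_precompute_freqs_cis_sketch(
    dim: int,
    end: int,
    theta: float = 10000.0,
    attention_factor: float = 1.0,
    rope_scaling: dict | None = None,  # e.g. a Phi-3 / Phi-4 LongRoPE config
):
    # LongRoPE counts as active when the config carries short_factor / long_factor.
    uses_longrope = bool(rope_scaling) and (
        "short_factor" in rope_scaling or "long_factor" in rope_scaling
    )
    needs_scaling = uses_longrope or attention_factor != 1.0

    # fp64 only when a non-trivial cos/sin scale is applied; vanilla HF RoPE stays
    # fp32 throughout, keeping it bit-identical with the non-HF precompute path.
    dtype = torch.float64 if needs_scaling else torch.float32

    inv_freq = 1.0 / (theta ** (torch.arange(0, dim, 2, dtype=dtype) / dim))
    angles = torch.outer(torch.arange(end, dtype=dtype), inv_freq)
    scale = attention_factor  # LongRoPE's per-frequency factors omitted in this sketch
    cos = (torch.cos(angles) * scale).to(torch.float32)
    sin = (torch.sin(angles) * scale).to(torch.float32)
    return cos, sin
```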

Authored with Claude Code.

Test plan

CI

@pytorch-bot

pytorch-bot Bot commented May 5, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19308

Note: Links to docs will display an error until the docs builds have been completed.

⏳ 1 Pending, 2 Unrelated Failures

As of commit 4bae62e with merge base acffcb0:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed label May 5, 2026
@github-actions

github-actions Bot commented May 5, 2026

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@digantdesai digantdesai requested a review from SS-JIA May 5, 2026 21:58
@rascani rascani marked this pull request as ready for review May 5, 2026 22:11
@rascani rascani requested a review from lucylq as a code owner May 5, 2026 22:11
@JacobSzwejbka JacobSzwejbka merged commit 5dcf0ed into pytorch:main May 6, 2026
185 of 194 checks passed