Improve YOCO static attention: reusable helper, correct tensor op, runtime guard (#18545)
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18545
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures, 2 Unrelated Failures
As of commit c81fb28 with merge base 6fccd5a:
NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@viveknayakatmeta has exported this pull request. If you are a Meta employee, you can view the originating Diff in D97637849.
This PR needs a `release notes:` label.
Improve YOCO static attention: reusable helper, correct tensor op, runtime guard (pytorch#18545)

Summary:
Pull Request resolved: pytorch#18545

- Replace the inline first_kv_shared index computation in _from_config with a reusable _is_kv_shared_layer() helper that matches llama_transformer.py's pattern and adds a missing first_shared <= 0 edge-case guard.
- Fix torch.cat → torch.stack in _process_normal_kv for SHA kv_to_share construction: per-head K/V tensors are rank-3, so torch.cat(dim=1) incorrectly concatenates along the sequence dimension, whereas torch.stack(dim=1) correctly inserts a new heads dimension.
- Change the forward() K/V skip guard from structural (if self.is_kv_shared_layer) to runtime (if shared_kv is not None), with an added assertion that self.is_kv_shared_layer holds.

Differential Revision: D97637849
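The first bullet can be illustrated with a small sketch. This is not the actual static_attention.py code: only the helper name _is_kv_shared_layer, the first_shared value, and the <= 0 guard come from the summary; the parameter names (n_layers, n_kv_sharing_layers) and the convention that the last n_kv_sharing_layers layers reuse earlier KV (the YOCO pattern) are illustrative assumptions.

```python
# Hypothetical sketch of the reusable helper; the real signature may differ.
def _is_kv_shared_layer(layer_id: int, n_layers: int, n_kv_sharing_layers: int) -> bool:
    """Return True if `layer_id` should reuse KV produced by an earlier layer."""
    first_shared = n_layers - n_kv_sharing_layers
    if first_shared <= 0:
        # Edge-case guard from the summary: if sharing would start at or
        # before layer 0, no layer is treated as a KV-sharing consumer.
        return False
    return layer_id >= first_shared


# Example: with 8 layers and 4 sharing layers, layers 4..7 reuse shared KV.
print([_is_kv_shared_layer(i, n_layers=8, n_kv_sharing_layers=4) for i in range(8)])
# [False, False, False, False, True, True, True, True]
```

Keeping this logic in one helper rather than inlined in _from_config also makes it easier to stay in sync with the equivalent check in llama_transformer.py.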
a041cde to ae10e9e (Compare)
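The second bullet in the summary above is easy to check numerically. The shapes below are illustrative assumptions (the summary only says the per-head K/V tensors are rank-3); the point is how the two ops differ on a list of such tensors:

```python
import torch

# Assume per-head K/V tensors of shape (batch, seq_len, head_dim); the exact
# layout in _process_normal_kv may differ, but the rank-3 argument is the same.
batch, seq_len, head_dim, n_heads = 2, 8, 64, 4
per_head_kv = [torch.randn(batch, seq_len, head_dim) for _ in range(n_heads)]

wrong = torch.cat(per_head_kv, dim=1)    # folds heads into the seq dimension
print(wrong.shape)                       # torch.Size([2, 32, 64])

right = torch.stack(per_head_kv, dim=1)  # inserts a new heads dimension
print(right.shape)                       # torch.Size([2, 4, 8, 64])
```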
ae10e9e to 02e129f (Compare)
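The third bullet, the forward() guard change, can be sketched with a minimal stand-in module. This is not the real attention class: the layer below is hypothetical, and only the guard pattern itself, branching on shared_kv being passed at runtime and asserting the structural is_kv_shared_layer flag, reflects the summary.

```python
import torch
from torch import nn


class TinySharedKVLayer(nn.Module):
    """Minimal stand-in illustrating the runtime K/V skip guard."""

    def __init__(self, dim: int, is_kv_shared_layer: bool):
        super().__init__()
        self.is_kv_shared_layer = is_kv_shared_layer
        self.wk = nn.Linear(dim, dim)
        self.wv = nn.Linear(dim, dim)

    def forward(self, x, shared_kv=None):
        # Runtime guard: skip K/V computation only when shared K/V was
        # actually provided, and assert the structural flag agrees.
        if shared_kv is not None:
            assert self.is_kv_shared_layer
            k, v = shared_kv
        else:
            k, v = self.wk(x), self.wv(x)
        return k, v


x = torch.randn(1, 4, 16)
producer = TinySharedKVLayer(16, is_kv_shared_layer=False)
k, v = producer(x)                           # computes its own K/V
consumer = TinySharedKVLayer(16, is_kv_shared_layer=True)
k2, v2 = consumer(x, shared_kv=(k, v))       # reuses the shared K/V
```

Compared with guarding on self.is_kv_shared_layer alone, this fails loudly if the call site and the layer configuration ever disagree.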
02e129f to c81fb28 (Compare)
Improve YOCO static attention: reusable helper, correct tensor op, runtime guard (pytorch#18545)

Summary:
Reviewed By: billmguo
Differential Revision: D97637849
Pull Request resolved: pytorch#18545