Add LoRA support to StaticAttention for split_mha=False #18345
lucylq wants to merge 5 commits into gh/lucylq/142/head
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18345

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Unrelated Failures as of commit de2e79b with merge base 02bad9d.

NEW FAILURE - The following job has failed.
BROKEN TRUNK - The following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
Pull request overview
This PR adds LoRA-aware projection construction to the StaticAttention implementation when using the non-split MHA path (split_mha=False), so that q/k/v/o projections can become LoRALinear based on ModelArgs.target_modules, while keeping existing behavior unchanged when target_modules is None.
Changes:
- For split_mha=False, conditionally instantiate LoRALinear for the q/k/v projections when their corresponding target names are present in config.target_modules.
- For split_mha=False, conditionally instantiate LoRALinear for the output projection (wo) when output_proj or o_proj is targeted.
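To make the configuration surface concrete, here is a minimal sketch of the config fields this PR keys off. The dataclass below is a stand-in for illustration only; the real ModelArgs in executorch has many more fields, and exact defaults may differ.

```python
from dataclasses import dataclass
from typing import List, Optional

# Stand-in for the ModelArgs fields referenced by this PR (assumed shape,
# not the real executorch ModelArgs).
@dataclass
class ModelArgs:
    dim: int = 64
    n_heads: int = 4
    n_kv_heads: int = 2
    target_modules: Optional[List[str]] = None  # e.g. ["q_proj", "v_proj"]
    r: Optional[int] = None          # LoRA rank
    lora_alpha: Optional[float] = None

# With target_modules set, the named q/k/v/o projections become LoRALinear.
lora_cfg = ModelArgs(target_modules=["q_proj", "v_proj", "o_proj"], r=8, lora_alpha=16.0)

# With target_modules=None (the default), behavior is unchanged: plain nn.Linear.
plain_cfg = ModelArgs()
```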
    has_lora = config.target_modules is not None
    _PROJ_TARGET = {
        "wqs": ("q_proj", self.dim, self.head_dim * self.n_heads),
        "wks": ("k_proj", self.dim, self.head_dim * self.n_kv_heads),
        "wvs": ("v_proj", self.dim, self.head_dim * self.n_kv_heads),
    }
    for attr, (target, in_dim, out_dim) in _PROJ_TARGET.items():
        if has_lora and target in config.target_modules:
            proj = LoRALinear(
                in_dim=in_dim,
                out_dim=out_dim,
                rank=config.r,
                alpha=config.lora_alpha,
                use_bias=self.attention_qkv_bias,
            )
When config.target_modules is set but config.r and/or config.lora_alpha are left as None (both are Optional in ModelArgs), this path will attempt to construct LoRALinear(rank=None, alpha=None) and fail with a low-signal TypeError. Consider adding an explicit validation (ValueError with a clear message) before creating any LoRALinear modules, similar to LoRAFeedForward.
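A pre-construction check along the lines the reviewer suggests could look like the sketch below. This is an illustrative helper, not code from the PR; the field names (target_modules, r, lora_alpha) follow the PR's ModelArgs, and SimpleNamespace stands in for a real config object.

```python
from types import SimpleNamespace

# Hypothetical validation helper mirroring the reviewer's suggestion:
# fail loudly with a ValueError before any LoRALinear is constructed.
def validate_lora_config(config) -> None:
    if config.target_modules is None:
        return  # LoRA not requested; nothing to check
    missing = [
        name
        for name, value in (("r", config.r), ("lora_alpha", config.lora_alpha))
        if value is None
    ]
    if missing:
        raise ValueError(
            f"target_modules={config.target_modules!r} requires "
            f"{' and '.join(missing)} to be set on ModelArgs"
        )

# Stand-in configs for illustration (real code would pass ModelArgs).
bad_cfg = SimpleNamespace(target_modules=["q_proj"], r=None, lora_alpha=None)
good_cfg = SimpleNamespace(target_modules=["q_proj"], r=8, lora_alpha=16.0)
```

Calling `validate_lora_config(bad_cfg)` raises a ValueError naming the missing fields, instead of the low-signal TypeError from `LoRALinear(rank=None, alpha=None)`.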
    has_lora = config.target_modules is not None
    _PROJ_TARGET = {
        "wqs": ("q_proj", self.dim, self.head_dim * self.n_heads),
        "wks": ("k_proj", self.dim, self.head_dim * self.n_kv_heads),
        "wvs": ("v_proj", self.dim, self.head_dim * self.n_kv_heads),
    }
    for attr, (target, in_dim, out_dim) in _PROJ_TARGET.items():
        if has_lora and target in config.target_modules:
            proj = LoRALinear(
                in_dim=in_dim,
                out_dim=out_dim,
                rank=config.r,
                alpha=config.lora_alpha,
                use_bias=self.attention_qkv_bias,
            )
        else:
            proj = nn.Linear(in_dim, out_dim, bias=self.attention_qkv_bias)
        setattr(self, attr, nn.ModuleList([proj]))
New behavior is introduced here (direct StaticAttention(..., split_mha=False) now conditionally builds LoRALinear based on config.target_modules), but existing tests in test_static_attention.py only exercise LoRA via from_attention_mha. Please add a unit test that directly constructs StaticAttention with split_mha=False and target_modules set, and asserts the expected projection types and a forward equivalence check.
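The type-assertion part of such a test could take the shape below. This is a self-contained sketch: Linear, LoRALinear, and build_projections are stand-ins for the real torch/executorch modules, mirroring only the PR's selection logic; a real test would construct StaticAttention directly and add a forward equivalence check.

```python
# Stand-in classes so the selection logic can be checked without torch.
class Linear:
    def __init__(self, in_dim, out_dim, bias=False):
        self.in_dim, self.out_dim = in_dim, out_dim

class LoRALinear(Linear):
    def __init__(self, in_dim, out_dim, rank, alpha, use_bias=False):
        super().__init__(in_dim, out_dim)
        self.rank, self.alpha = rank, alpha

def build_projections(target_modules, dim=32, head_dim=8,
                      n_heads=4, n_kv_heads=2, r=4, alpha=8.0):
    """Mirror the PR's q/k/v selection: LoRALinear when targeted, Linear otherwise."""
    spec = {
        "wqs": ("q_proj", dim, head_dim * n_heads),
        "wks": ("k_proj", dim, head_dim * n_kv_heads),
        "wvs": ("v_proj", dim, head_dim * n_kv_heads),
    }
    has_lora = target_modules is not None
    return {
        attr: (
            LoRALinear(in_dim, out_dim, rank=r, alpha=alpha)
            if has_lora and target in target_modules
            else Linear(in_dim, out_dim)
        )
        for attr, (target, in_dim, out_dim) in spec.items()
    }
```

A test would then assert, e.g., that targeting only q_proj and v_proj yields LoRALinear for wqs/wvs and plain Linear for wks, and that target_modules=None yields Linear everywhere.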
When ModelArgs.target_modules is set, create LoRALinear instead of
nn.Linear for targeted q/k/v/o projections. Only applies to the
split_mha=False path. Existing behavior is unchanged when
target_modules is None.
Authored with Claude.