fix the test_hotswapping_compiled_model_xxx case failure in xpu #13535

Closed

sywangyi wants to merge 1 commit into huggingface:main from sywangyi:host_swap_compiled

Conversation

@sywangyi (Contributor) commented Apr 21, 2026

The following cases fail:
tests/models/transformers/test_models_transformer_bria.py::BriaTransformerLoRAHotSwapTests::test_hotswapping_compiled_model_linear_2
tests/models/transformers/test_models_transformer_bria.py::BriaTransformerLoRAHotSwapTests::test_hotswapping_compiled_model_both_linear_and_other_2
tests/models/transformers/test_models_transformer_chroma.py::ChromaTransformerLoRAHotSwapTests::test_hotswapping_compiled_model_linear_2
tests/models/unets/test_models_unet_2d_condition.py::UNet2DConditionModelLoRAHotSwapTests::test_hotswapping_compiled_model_conv2d_2
tests/models/unets/test_models_unet_2d_condition.py::UNet2DConditionModelLoRAHotSwapTests::test_hotswapping_compiled_model_both_linear_and_conv2d_2
tests/models/unets/test_models_unet_2d_condition.py::UNet2DConditionModelLoRAHotSwapTests::test_hotswapping_compiled_model_both_linear_and_other_2
tests/models/unets/test_models_unet_2d_condition.py::UNet2DConditionModelLoRAHotSwapTests::test_hotswapping_compiled_model_linear_2
tests/models/transformers/test_models_transformer_chroma.py::ChromaTransformerLoRAHotSwapTests::test_hotswapping_compiled_model_both_linear_and_other_2

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
github-actions bot added the lora and size/S (PR with diff < 50 LOC) labels on Apr 21, 2026
@sywangyi (Contributor, Author) commented:

@sayakpaul please help review

@sayakpaul (Member) commented:

@claude could you review this?

github-actions bot commented Apr 21, 2026

Claude finished @sayakpaul's task in 4m 15s


Review of PR #13535

  • Read repository review guidelines
  • Analyze the PR diff
  • Read surrounding code for context
  • Post review feedback

Summary

This PR fixes test_hotswapping_compiled_model_* test failures on XPU by temporarily setting self._compiled_call_impl = True before calling peft's hotswap_adapter_from_state_dict, forcing it to use the in-place data copy path instead of swap_tensors.
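For context, here is a rough illustrative sketch (not peft's actual code; it assumes PyTorch >= 2.2 for torch.utils.swap_tensors) of the two update paths the marker attribute selects between:

import torch

def _looks_compiled(model):
    # Heuristic sketch: torch.compile(model) leaves `_orig_mod` on the wrapper,
    # while model.compile() sets `_compiled_call_impl` in place.
    return hasattr(model, "_orig_mod") or bool(getattr(model, "_compiled_call_impl", False))

def apply_new_weight(model, old_param, new_param):
    if _looks_compiled(model):
        # In-place copy keeps the same tensor object (and its padded shape)
        # that the inductor-generated code was specialized against.
        old_param.data.copy_(new_param.data)
    else:
        # Swaps the tensor object itself; fine in eager mode, but a swapped-in
        # tensor with a different shape trips the compiled graph's
        # assert_size_stride checks.
        torch.utils.swap_tensors(old_param, new_param)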

The root cause is well understood and the approach is reasonable. When torch.compile wraps a model, method calls like model.load_lora_adapter(...) are forwarded to the inner model via OptimizedModule.__getattr__. So inside load_lora_adapter, self is the unwrapped original model — it has neither _orig_mod nor _compiled_call_impl, so peft treats it as uncompiled and uses swap_tensors. On XPU (no CUDA Graphs), the inductor code re-fetches parameters on each forward call and its assert_size_stride checks fail because the swapped-in tensor has the un-padded rank shape.
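A minimal sketch demonstrating that forwarding behavior (assumes PyTorch 2.x; Tiny is a toy module for illustration, not from the PR):

import torch
import torch.nn as nn

class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(4, 4)

    def forward(self, x):
        return self.lin(x)

    def who_am_i(self):
        # When this method is reached through the compile wrapper's
        # __getattr__, `self` is the unwrapped original module, so the
        # wrapper's `_orig_mod` marker is not visible from here.
        return hasattr(self, "_orig_mod")

model = torch.compile(Tiny())
print(hasattr(model, "_orig_mod"))  # True: the OptimizedModule wrapper has it
print(model.who_am_i())             # False: the inner model does not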

Issues

1. Attribute leak after cleanup (src/diffusers/loaders/peft.py:335)

_orig_compiled_call_impl = getattr(self, "_compiled_call_impl", None)
# ...
finally:
    self._compiled_call_impl = _orig_compiled_call_impl  # sets to None if attr didn't exist

If _compiled_call_impl didn't exist on self before, _orig_compiled_call_impl is None. The finally block then does self._compiled_call_impl = None, which creates the attribute rather than removing it. While None is falsy (so getattr(model, "_compiled_call_impl", False) still evaluates to falsy), hasattr(model, "_compiled_call_impl") would now return True where it previously returned False.

Suggested fix, restoring the attribute properly:

_had_compiled_call_impl = hasattr(self, "_compiled_call_impl")
_orig_compiled_call_impl = getattr(self, "_compiled_call_impl", None)
if _supports_compiled_lora_hotswap and not hasattr(self, "_orig_mod"):
    self._compiled_call_impl = True
try:
    hotswap_adapter_from_state_dict(...)
finally:
    if _had_compiled_call_impl:
        self._compiled_call_impl = _orig_compiled_call_impl
    elif hasattr(self, "_compiled_call_impl"):
        delattr(self, "_compiled_call_impl")


2. Scope may be too broad for non-compiled models

The condition _supports_compiled_lora_hotswap and not hasattr(self, "_orig_mod") activates even for non-compiled models where only different ranks triggered enable_lora_hotswap() (i.e., do_compile=False but rank0 != rank1 in tests). In that case, forcing the in-place copy path is likely harmless (the weights are padded either way), but it's a behavioral change for non-XPU, non-compiled scenarios that weren't previously broken.

Consider narrowing the condition to only apply when the model is actually used with torch.compile:

# Only force the compiled path when enable_lora_hotswap was called AND
# the model is actually inside a compile wrapper (self is the unwrapped inner model).
# We can check this by looking for _compiled_call_impl on the outer wrapper,
# or by tracking an additional flag.

This is a nit: if the existing tests pass across all backends with the broader condition, it's acceptable as-is, but it's worth noting for maintainability. A hypothetical way to track such a flag is sketched below.
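Illustrative only: compile_with_hotswap_marker and _lora_hotswap_compiled are made-up names, not existing diffusers or peft APIs.

import torch
import torch.nn as nn

def compile_with_hotswap_marker(model: nn.Module, **compile_kwargs):
    compiled = torch.compile(model, **compile_kwargs)
    # Mark the *inner* module: that is the object `self` refers to when
    # load_lora_adapter is reached through the wrapper's __getattr__.
    compiled._orig_mod._lora_hotswap_compiled = True
    return compiled

# The condition inside load_lora_adapter could then be narrowed to:
#     if _supports_compiled_lora_hotswap and getattr(self, "_lora_hotswap_compiled", False):
#         self._compiled_call_impl = True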

3. Consider upstreaming to peft

This workaround compensates for peft's inability to detect that the model is inside a torch.compile wrapper when accessed as the unwrapped inner model. A more durable fix would be in peft's hotswap_adapter_from_state_dict itself — it could check for the presence of padded weights (or a flag) rather than relying on _compiled_call_impl / _orig_mod. Has an issue been filed upstream?

What looks good

  • The _supports_compiled_lora_hotswap flag on enable_lora_hotswap() (line 851) is a clean way to remember the model was prepared for compiled hotswap after _prepare_lora_hotswap_kwargs is consumed.
  • Using a try/finally to ensure cleanup is correct.
  • The block comment (lines 314–322) clearly explains the problem and rationale — helpful for future maintainers.

Verdict

The fix is correct in intent and should resolve the XPU test failures. Issue #1 (attribute leak) should be addressed before merge. Issue #2 is optional. Issue #3 is a longer-term suggestion.


@BenjaminBossan (Member) left a comment:


I agree with Claude's 3rd point: I'd rather see this fixed in PEFT. Is there a technical reason why that's not possible? Otherwise, a similar patch would need to be rolled out to transformers and to every other user of PEFT hotswapping with a similar use case on XPU.

adapter_name=adapter_name,
config=lora_config,
)
# When enable_lora_hotswap was called, weights were padded to target_rank so that
Member commented on the diff context above:

If we keep things like this, how about turning this into a context manager to keep the code inside load_lora_adapter more readable? It would also allow re-using the context manager if it's ever needed elsewhere.
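A minimal sketch of such a context manager, mirroring the restore semantics of the suggested fix above (force_compiled_hotswap_path is an illustrative name, not an existing helper):

from contextlib import contextmanager

@contextmanager
def force_compiled_hotswap_path(module, enabled: bool):
    # Temporarily mark `module` so that peft's hotswap_adapter_from_state_dict
    # takes the in-place data-copy path instead of swap_tensors.
    had_attr = hasattr(module, "_compiled_call_impl")
    orig = getattr(module, "_compiled_call_impl", None)
    if enabled:
        module._compiled_call_impl = True
    try:
        yield module
    finally:
        if had_attr:
            module._compiled_call_impl = orig
        elif hasattr(module, "_compiled_call_impl"):
            delattr(module, "_compiled_call_impl")

# Usage inside load_lora_adapter would then collapse to:
#     with force_compiled_hotswap_path(self, _supports_compiled_lora_hotswap and not hasattr(self, "_orig_mod")):
#         hotswap_adapter_from_state_dict(...)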

@sywangyi (Contributor, Author) commented:

@BenjaminBossan thanks for the review; I've put the fix in peft: huggingface/peft#3183

@sywangyi (Contributor, Author) commented:

This workaround PR can be closed; the fix now lives in peft.

@sywangyi sywangyi closed this Apr 22, 2026