
Fix pageable H2D copies in Gated DeltaNet PyTorch fallback#45665

Merged
Cyrilvallez merged 2 commits intohuggingface:mainfrom
ruixiang63:fix/gdn-h2d-copy
Apr 28, 2026


@ruixiang63 (Contributor) commented Apr 27, 2026

What does this PR do?

This PR removes unnecessary pageable Host-to-Device copies in the pure-PyTorch fallback for Gated DeltaNet (torch_chunk_gated_delta_rule and torch_recurrent_gated_delta_rule), used by Qwen3-Next, Qwen3.5, Qwen3.5 MoE and OLMo-Hybrid when flash-linear-attention is not installed.

The current implementation initializes last_recurrent_state and core_attn_out via torch.zeros(...).to(value).

Because torch.zeros(...) is called without a device argument, the tensor is first allocated in pageable host memory and zero-filled on the CPU. The subsequent .to(value) then triggers a pageable H2D copy plus an implicit synchronization (.to() defaults to non_blocking=False).
For example, in an nsys trace it looks like this:
[Nsight Systems trace screenshot]
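The extra work can also be spotted without Nsight by using torch.profiler. The sketch below is illustrative only: the shapes are hypothetical, and value is deliberately created as float64 so that .to(value) must convert the dtype and therefore records a copy op even on a CPU-only machine. On a CUDA device, the old pattern should additionally show a host-to-device memcpy, while the direct allocation should record no copy at all.

```python
import torch
from torch.profiler import ProfilerActivity, profile


def copy_events(fn) -> list[str]:
    """Profile fn() and return the names of recorded copy / memcpy events."""
    activities = [ProfilerActivity.CPU]
    if torch.cuda.is_available():
        activities.append(ProfilerActivity.CUDA)
    with profile(activities=activities) as prof:
        fn()
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # make sure queued GPU work lands inside the trace
    return [
        e.key
        for e in prof.key_averages()
        if "copy" in e.key.lower() or "memcpy" in e.key.lower()
    ]


device = "cuda" if torch.cuda.is_available() else "cpu"
# Hypothetical shapes: value is (batch, heads, seq_len, v_head_dim),
# the recurrent state is (batch, heads, k_head_dim, v_head_dim).
value = torch.randn(2, 4, 8, 16, device=device, dtype=torch.float64)
state_shape = (2, 4, 32, 16)

# Old pattern: allocate + zero-fill on the host, then copy/cast to value's device and dtype.
before = copy_events(lambda: torch.zeros(*state_shape).to(value))
# New pattern: allocate directly with value's device and dtype.
after = copy_events(
    lambda: torch.zeros(*state_shape, device=value.device, dtype=value.dtype)
)
print("before:", before)
print("after:", after)
```

The float64 choice only serves to make the conversion visible on CPU: with a float32 CPU value, .to(value) is a no-op that returns the tensor itself, so the two patterns would look identical there. The H2D cost described in this PR shows up when value lives on a GPU.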

This PR allocates these tensors directly on the target device and dtype, eliminating the H2D copies and the implicit syncs entirely.
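A minimal before/after sketch of the allocation change. The helper names init_state_before / init_state_after, the k_head_dim argument, and the (batch, num_heads, seq_len, v_head_dim) layout of value are hypothetical and chosen for illustration; this is not the actual Transformers code.

```python
import torch


def init_state_before(value: torch.Tensor, k_head_dim: int) -> torch.Tensor:
    # Hypothetical helper mirroring the old pattern: allocate on the default
    # device (CPU, pageable memory), zero-fill on the host, then copy/cast to
    # value's device and dtype via .to(value).
    batch_size, num_heads, _, v_head_dim = value.shape
    return torch.zeros(batch_size, num_heads, k_head_dim, v_head_dim).to(value)


def init_state_after(value: torch.Tensor, k_head_dim: int) -> torch.Tensor:
    # Hypothetical helper mirroring the new pattern: allocate directly on
    # value's device with value's dtype, so there is no host buffer, no H2D
    # copy and no implicit synchronization.
    batch_size, num_heads, _, v_head_dim = value.shape
    return torch.zeros(
        batch_size, num_heads, k_head_dim, v_head_dim,
        device=value.device, dtype=value.dtype,
    )


value = torch.randn(2, 4, 8, 16).to(torch.float64)
old_state = init_state_before(value, k_head_dim=32)
new_state = init_state_after(value, k_head_dim=32)
print(new_state.shape, new_state.dtype, new_state.device)
```

Both helpers return identical tensors; only where the zeros are materialized differs. value.new_zeros(size) is an equivalent shorthand, since it inherits device and dtype from value by default.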

  • Fixes a real perf regression observable in Nsight Systems traces of end-to-end training.
  • Affects 4 model families (Qwen3-Next / Qwen3.5 / Qwen3.5-MoE / OLMo-Hybrid), all of which inherit this fallback.

Nsight Systems traces:

  • Without this PR:
    [Nsight Systems trace screenshot]
  • With this PR:
    [Nsight Systems trace screenshot]

Code Agent Policy

The Transformers repo is currently being overwhelmed by a large number of PRs and issue comments written by
code agents. We are currently bottlenecked by our ability to review and respond to them. As a result,
we ask that new users do not submit pure code agent PRs at this time.
You may use code agents in drafting or to help you diagnose issues. We'd also ask autonomous "OpenClaw"-like agents
not to open any PRs or issues for the moment.

PRs that appear to be fully agent-written will probably be closed without review, and we may block users who do this
repeatedly or maliciously.

This is a rapidly-evolving situation that's causing significant shockwaves in the open-source community. As a result,
this policy is likely to be updated regularly in the near future. For more information, please read CONTRIBUTING.md.

  • I confirm that this is not a pure code agent PR.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case. Not an existing issue.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings. No need for documentation.
  • Did you write any new necessary tests? No need to write new tests.

AI-assisted disclosure

Didn't use AI

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

cc @Cyrilvallez @zucchini-nlp @ArthurZucker

@Cyrilvallez (Member) left a comment

LGTM, it's indeed always better to initialize directly rather than calling .to()

@Cyrilvallez Cyrilvallez enabled auto-merge April 28, 2026 05:25
@github-actions (Contributor)

[For maintainers] Suggested jobs to run (before merge)

run-slow: olmo_hybrid, qwen3_5, qwen3_5_moe, qwen3_next

@ruixiang63 (Contributor, Author)

@Cyrilvallez Thanks for the approval! It looks like there are still two workflows awaiting maintainer approval, and doc_build_status_check is still pending. Could you please approve/run them when you get a chance? Thanks!

@Cyrilvallez Cyrilvallez added this pull request to the merge queue Apr 28, 2026
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Merged via the queue into huggingface:main with commit ca72aa0 Apr 28, 2026
21 checks passed

3 participants