
Conversation

bozheng-hit (Contributor) commented:

This pull request aligns the PyTorch implementation of Gated DeltaNet in Qwen3-Next with the corresponding implementation in the fla library, so that the two implementations produce consistent results.

@Cyrilvallez (Member) left a comment:

Let's use a simpler scalar!

Comment on lines 449 to 450
head_dim = query.size(-1)
inv_scale = torch.rsqrt(torch.tensor(head_dim, device=query.device, dtype=query.dtype))
I think it would be easier to read if we keep it as a scalar as we usually do in the Attention modules 🤗

inv_scale = query.size(-1)**-0.5
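
For reference, the two forms compute the same value, 1/sqrt(head_dim): the original builds a tensor and applies `torch.rsqrt`, while the suggested form keeps it as a plain Python scalar. A minimal sketch of the equivalence, using only the standard library (torch omitted for illustration; `head_dim = 128` is just an example value):

```python
import math

def inv_scale_scalar(head_dim: int) -> float:
    # Reviewer's suggested form: a plain Python scalar, head_dim ** -0.5.
    return head_dim ** -0.5

def inv_scale_rsqrt(head_dim: int) -> float:
    # Mirrors torch.rsqrt(torch.tensor(head_dim)): the reciprocal square root.
    return 1.0 / math.sqrt(head_dim)

# Both forms agree for a typical head dimension such as 128.
assert math.isclose(inv_scale_scalar(128), inv_scale_rsqrt(128))
```

The scalar form also avoids allocating a tensor just to hold a constant, which is why it matches the convention used in the other Attention modules.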

[For maintainers] Suggested jobs to run (before merge)

run-slow: qwen3_next

@Cyrilvallez merged commit 6d36912 into huggingface:main on Sep 11, 2025 (19 of 21 checks passed).
vijayabhaskar-ev pushed a commit to vijayabhaskar-ev/transformers that referenced this pull request Oct 2, 2025
…ibrary. (huggingface#40807)

* align torch implementation of gdn with fla.

* fix fla import.

* fix

* remove unused attr

* fixes

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
yuchenxie4645 pushed a commit to yuchenxie4645/transformers that referenced this pull request Oct 4, 2025
…ibrary. (huggingface#40807)