
Conversation

bozheng-hit (Contributor) commented:

This pull request aligns the PyTorch implementation of Gated DeltaNet in Qwen3-Next with the corresponding implementation in the fla library, so that the two implementations produce consistent results.

@Cyrilvallez (Member) left a comment:

Let's use a simpler scalar!

Comment on lines 449 to 450
head_dim = query.size(-1)
inv_scale = torch.rsqrt(torch.tensor(head_dim, device=query.device, dtype=query.dtype))
I think it would be easier to read if we keep it as a scalar as we usually do in the Attention modules 🤗

inv_scale = query.size(-1)**-0.5
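
For reference, the two forms compute the same value, 1/sqrt(head_dim): the original builds a tensor and applies `torch.rsqrt`, while the suggested form keeps it as a plain Python scalar. A minimal sketch of the equivalence, using only the standard library (torch omitted for illustration; `head_dim = 128` is just an example value):

```python
import math

def inv_scale_scalar(head_dim: int) -> float:
    # Reviewer's suggested form: a plain Python scalar, head_dim ** -0.5.
    return head_dim ** -0.5

def inv_scale_rsqrt(head_dim: int) -> float:
    # Mirrors torch.rsqrt(torch.tensor(head_dim)): the reciprocal square root.
    return 1.0 / math.sqrt(head_dim)

# Both forms agree for a typical head dimension such as 128.
assert math.isclose(inv_scale_scalar(128), inv_scale_rsqrt(128))
```

The scalar form also avoids allocating a tensor just to hold a constant, which is why it matches the convention used in the other Attention modules.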

[For maintainers] Suggested jobs to run (before merge)

run-slow: qwen3_next

@Cyrilvallez merged commit 6d36912 into huggingface:main on Sep 11, 2025 (19 of 21 checks passed).
vijayabhaskar-ev pushed a commit to vijayabhaskar-ev/transformers that referenced this pull request Oct 2, 2025
…ibrary. (huggingface#40807)

* align torch implementation of gdn with fla.

* fix fla import.

* fix

* remove unused attr

* fixes

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
yuchenxie4645 pushed a commit to yuchenxie4645/transformers that referenced this pull request Oct 4, 2025
…ibrary. (huggingface#40807)