bozheng-hit
Contributor

Fix the misalignment between the l2norm used in the GDN (Gated DeltaNet) layers of Qwen3-Next and the l2norm implementation in the FLA (flash-linear-attention) library.
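
To illustrate how two l2norm implementations can disagree, here is a minimal sketch in plain Python. It assumes, without the PR text confirming it, that the mismatch comes from where `eps` enters the computation: one variant clamps the norm (`x / max(||x||, eps)`), the other adds `eps` under the square root (`x * rsqrt(sum(x^2) + eps)`). The function names and default `eps` values are hypothetical and are not taken from transformers or FLA.

```python
import math

def l2norm_clamp(x, eps=1e-12):
    """Hypothetical variant A: x / max(||x||, eps).
    eps only matters when the norm itself is below eps."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / max(norm, eps) for v in x]

def l2norm_rsqrt(x, eps=1e-6):
    """Hypothetical variant B: x * 1/sqrt(sum(x^2) + eps).
    eps is added to the squared norm, so it always shifts the result."""
    inv_norm = 1.0 / math.sqrt(sum(v * v for v in x) + eps)
    return [v * inv_norm for v in x]

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two vectors."""
    return max(abs(p - q) for p, q in zip(a, b))

# For a vector with a large norm the two variants agree closely.
large = [3.0, 4.0]
diff_large = max_abs_diff(l2norm_clamp(large), l2norm_rsqrt(large))

# For a vector whose squared norm is comparable to eps they diverge.
small = [3e-4, 4e-4]
diff_small = max_abs_diff(l2norm_clamp(small), l2norm_rsqrt(small))

print(f"large-norm input: max diff = {diff_large:.2e}")
print(f"small-norm input: max diff = {diff_small:.2e}")
```

Under these assumptions the variants are nearly identical for well-scaled inputs but drift apart when the squared norm is close to `eps`, which is the kind of discrepancy that strict alignment between a torch fallback and a fused kernel is meant to remove.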

Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: qwen3_next

Member

@Cyrilvallez Cyrilvallez left a comment

Thanks!

@Cyrilvallez Cyrilvallez merged commit 98a8078 into huggingface:main Sep 12, 2025
15 checks passed
ErfanBaghaei pushed a commit to ErfanBaghaei/transformers that referenced this pull request Sep 25, 2025
…implementation in the FLA library. (huggingface#40842)

* align torch implementation of gdn with fla.

* fix fla import.

* fix

* remove unused attr

* fixes

* strictly align l2norm in Qwen3-Next with FLA implementation.

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
vijayabhaskar-ev pushed a commit to vijayabhaskar-ev/transformers that referenced this pull request Oct 2, 2025
…implementation in the FLA library. (huggingface#40842)

yuchenxie4645 pushed a commit to yuchenxie4645/transformers that referenced this pull request Oct 4, 2025
…implementation in the FLA library. (huggingface#40842)
