
Update cohere2_moe tp_plan #46189

Merged
Cyrilvallez merged 11 commits into main from cohere-tp-plan on May 25, 2026

@Cyrilvallez (Member)

What does this PR do?

Because #46115 was merged for the 5.9 release without #45028, the cohere2_moe `tp_plan` was still using the old format. This PR updates it.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker (Collaborator) left a comment


lgtm

Comment thread src/transformers/models/cohere2_moe/configuration_cohere2_moe.py
Comment thread src/transformers/models/cohere2_moe/modeling_cohere2_moe.py
super().__init__(*args, **kwargs)
self.layer_types = ["full_attention", "sliding_attention"]
self.first_k_dense_replace = 1 # first layer will be MLP, 2nd will be MoE
self.logit_scale = 1.0 # needed for `test_training_overfit` - otherwise the loss does not go down fast enough

Collaborator

Can you put it in `CausalLMModelTester`? I'm pretty sure other models might have this issue too, no?

Member Author

Other cohere models do not need it for the loss to go down... So I'd rather keep this isolated to cohere2_moe!
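The tester excerpt above sets `first_k_dense_replace = 1` with the comment "first layer will be MLP, 2nd will be MoE". A minimal sketch of how such a setting is commonly read, assuming the DeepSeek-style convention that layers with index below `first_k_dense_replace` use a dense MLP and all later layers use an MoE block. The helper `mlp_kinds` and the labels `"dense_mlp"`/`"moe"` are hypothetical and are not part of the transformers API.

```python
# Hypothetical helper, for illustration only (not a transformers API).
# Assumption: layers with index < first_k_dense_replace use a dense MLP,
# every later layer uses a sparse MoE block.
def mlp_kinds(num_hidden_layers: int, first_k_dense_replace: int) -> list[str]:
    return [
        "dense_mlp" if layer_idx < first_k_dense_replace else "moe"
        for layer_idx in range(num_hidden_layers)
    ]


# A 2-layer test model with first_k_dense_replace = 1:
# layer 0 is a dense MLP, layer 1 is MoE.
print(mlp_kinds(2, 1))
```

Under this assumption, a 2-layer tester exercises both code paths (one dense layer, one MoE layer), which fits the two-entry `layer_types` list in the same excerpt.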

@github-actions (Contributor)

[For maintainers] Suggested jobs to run (before merge)

run-slow: cohere2_moe

@Cyrilvallez merged commit e65c3a2 into main on May 25, 2026
41 checks passed
@Cyrilvallez deleted the cohere-tp-plan branch on May 25, 2026 07:40
vasqu added a commit that referenced this pull request May 28, 2026
* Revert "init FSDP through from_pretrained (#46102)"

This reverts commit 0588858.

* Revert "Fix FSDP2 and distributed checkpointing imports for older PyTorch versions (#46141)"

This reverts commit 634500b.

* Revert "Update cohere2_moe tp_plan (#46189)"

This reverts commit e65c3a2.

* Revert "FSDP + TP & native save/load distributed (#45028)"

This reverts commit 9ba8e85.

* fix

* they should have been deleted I think

* these are actually needed changes

* oops
kaixuanliu pushed a commit to kaixuanliu/transformers that referenced this pull request May 28, 2026
* update

* fix

* dynamic

* fix

* fix

* switch in config

* in tests as well

* oupsi

* add doc

* oupsi

* fix
yuchenxie4645 pushed a commit to yuchenxie4645/transformers that referenced this pull request May 28, 2026
* update

* fix

* dynamic

* fix

* fix

* switch in config

* in tests as well

* oupsi

* add doc

* oupsi

* fix
yuchenxie4645 pushed a commit to yuchenxie4645/transformers that referenced this pull request May 28, 2026
* Revert "init FSDP through from_pretrained (huggingface#46102)"

This reverts commit 0588858.

* Revert "Fix FSDP2 and distributed checkpointing imports for older PyTorch versions (huggingface#46141)"

This reverts commit 634500b.

* Revert "Update cohere2_moe tp_plan (huggingface#46189)"

This reverts commit e65c3a2.

* Revert "FSDP + TP & native save/load distributed (huggingface#45028)"

This reverts commit 9ba8e85.

* fix

* they should have been deleted I think

* these are actually needed changes

* oops
kashif pushed a commit to kashif/transformers that referenced this pull request Jun 1, 2026
* update

* fix

* dynamic

* fix

* fix

* switch in config

* in tests as well

* oupsi

* add doc

* oupsi

* fix
kashif pushed a commit to kashif/transformers that referenced this pull request Jun 1, 2026
* Revert "init FSDP through from_pretrained (huggingface#46102)"

This reverts commit 0588858.

* Revert "Fix FSDP2 and distributed checkpointing imports for older PyTorch versions (huggingface#46141)"

This reverts commit 634500b.

* Revert "Update cohere2_moe tp_plan (huggingface#46189)"

This reverts commit e65c3a2.

* Revert "FSDP + TP & native save/load distributed (huggingface#45028)"

This reverts commit 9ba8e85.

* fix

* they should have been deleted I think

* these are actually needed changes

* oops

4 participants