Add new cohere2_moe model #46115

Merged
Cyrilvallez merged 29 commits into main from cohere2_moe on May 20, 2026

Cyrilvallez (Member) commented May 20, 2026

What does this PR do?

As per the title!

Terrencezzj and others added 25 commits May 13, 2026 21:01
Port the Cohere2Moe (c4) model from cohere-transformers to upstream.
Adds modular and generated modeling files, configuration, and unit tests.
Registers Cohere2MoeConfig in CONFIG_MAPPING_NAMES, Cohere2MoeModel in
MODEL_MAPPING_NAMES, and Cohere2MoeForCausalLM in MODEL_FOR_CAUSAL_LM_MAPPING.
Adds "cohere2moe" -> "cohere2_moe" to SPECIAL_MODEL_TYPE_TO_MODULE_NAME so
the auto-mapping resolves the module directory correctly.

Co-authored-by: Cursor <cursoragent@cursor.com>
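
The special-case entry described in the commit message above can be pictured with a minimal, self-contained sketch. This is a simplified stand-in rather than the library's actual code: the dictionary holds only the one entry named in the commit, and the fallback rule (replace hyphens with underscores) is an assumption about how ordinary model types are resolved.

```python
# Simplified stand-in for the auto-mapping lookup.
# Only the entry added in this PR is included here.
SPECIAL_MODEL_TYPE_TO_MODULE_NAME = {"cohere2moe": "cohere2_moe"}


def model_type_to_module_name(model_type: str) -> str:
    """Return the module directory name for a given model_type string."""
    if model_type in SPECIAL_MODEL_TYPE_TO_MODULE_NAME:
        return SPECIAL_MODEL_TYPE_TO_MODULE_NAME[model_type]
    # Assumed default rule: module names use underscores instead of hyphens.
    return model_type.replace("-", "_")


if __name__ == "__main__":
    print(model_type_to_module_name("cohere2moe"))
```

Without the special-case entry, the sketch would resolve "cohere2moe" to a module named "cohere2moe", which does not match the cohere2_moe directory.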
…e2Moe

- Remove CohereTokenizer.__init__ and replace with convert_to_native_format
  classmethod so the tokenizer loads fully from tokenizer.json (v5-style).
  Adds add_bos_token/add_eos_token defaults to match 4.56.2.6 behaviour.
- Update test_tokenization_cohere.py: disable test_tokenizer_from_extractor
  (no __init__ means extractor-based construction is unsupported), replace
  test_add_prefix_space_fast with tokenizer.json-based tests that verify
  pre_tokenizer/normalizer/decoder components are preserved on load.
- Add Cohere2MoeVisionIntegrationTest to test_modeling_cohere2_vision.py
  with forward and generate tests for Command A+ (cohere2moe backbone).

Co-authored-by: Cursor <cursoragent@cursor.com>
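
The idea behind the tokenizer.json-based tests in the commit message above (checking that the pre_tokenizer, normalizer, and decoder survive a load) can be sketched with the standard library alone. The helper name `component_types` and the toy file contents are hypothetical; in a real test the reloaded side would come from the loaded tokenizer rather than a JSON round trip.

```python
import json

# Top-level tokenizer.json sections that the tests are described as checking.
COMPONENTS = ("normalizer", "pre_tokenizer", "decoder")


def component_types(tokenizer_json: dict) -> dict:
    """Map each pipeline component to its declared "type" (None if absent)."""
    result = {}
    for name in COMPONENTS:
        component = tokenizer_json.get(name)
        result[name] = component.get("type") if isinstance(component, dict) else None
    return result


# Toy tokenizer.json content; these component types are illustrative only
# and are not claimed to be the settings of the real Cohere tokenizer.
toy_tokenizer_json = {
    "normalizer": {"type": "NFC"},
    "pre_tokenizer": {"type": "ByteLevel", "add_prefix_space": False},
    "decoder": {"type": "ByteLevel"},
}

# Simulate saving the file and loading it back.
reloaded = json.loads(json.dumps(toy_tokenizer_json))

if __name__ == "__main__":
    print(component_types(reloaded) == component_types(toy_tokenizer_json))
```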
…e_position

- Rename 'input_embeds' key to 'inputs_embeds' to match create_causal_mask()
  signature (typo introduced in generated modeling file).
- Remove 'cache_position' from mask_kwargs; upstream create_causal_mask()
  does not accept this parameter (cohere-transformers-specific argument).

Result: all 6 integration tests now pass (2 skipped: flash_attn not installed).

Co-authored-by: Cursor <cursoragent@cursor.com>
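
Both fixes in the commit message above come down to keyword arguments that must match the callee's signature. The sketch below uses a hypothetical stand-in, `create_causal_mask_stub`, whose parameter list is an assumption chosen to mirror the description (it accepts `inputs_embeds` and has no `cache_position`); it is not the real `create_causal_mask()`.

```python
def create_causal_mask_stub(*, config, inputs_embeds, attention_mask,
                            past_key_values, position_ids=None):
    """Hypothetical stand-in: takes `inputs_embeds`, has no `cache_position`."""
    return {"query_length": len(inputs_embeds)}


def call_accepts(kwargs: dict) -> bool:
    """Return True if the stub accepts these keyword arguments."""
    try:
        create_causal_mask_stub(**kwargs)
    except TypeError:
        # Raised for an unexpected keyword or a missing required one.
        return False
    return True


embeds = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # toy stand-in for embeddings

# mask_kwargs as described before the fix: misspelled key plus an extra argument.
before_fix = {
    "config": None,
    "input_embeds": embeds,
    "attention_mask": None,
    "cache_position": [0, 1, 2],
    "past_key_values": None,
}

# mask_kwargs after the fix: key renamed, `cache_position` removed.
after_fix = {
    "config": None,
    "inputs_embeds": embeds,
    "attention_mask": None,
    "past_key_values": None,
}

if __name__ == "__main__":
    print(call_accepts(before_fix), call_accepts(after_fix))
```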
Replace hardcoded expected outputs in test_model_flash_attn that were
written for command_a+_bf16 with the correct outputs from mhlv2_bf16_clean.

Co-authored-by: Cursor <cursoragent@cursor.com>
rope_scaling and sliding_window_pattern are consumed in __post_init__
and stored as derived attributes (rope_parameters and layer_types)
which the modeling code reads directly.

Co-authored-by: Cursor <cursoragent@cursor.com>
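
The pattern described in the commit message above, where init-only arguments are consumed in `__post_init__` and only derived attributes are stored, can be sketched with a plain dataclass. `ToyConfig` is hypothetical, and the rule that every `sliding_window_pattern`-th layer uses full attention is an assumption made for illustration, not something this PR confirms for Cohere2MoeConfig.

```python
from dataclasses import InitVar, dataclass, field
from typing import Optional


@dataclass
class ToyConfig:
    num_hidden_layers: int = 8
    # Init-only inputs: passed to __post_init__, never stored on the instance.
    rope_scaling: InitVar[Optional[dict]] = None
    sliding_window_pattern: InitVar[int] = 4
    # Derived attributes that the modeling code would read directly.
    rope_parameters: dict = field(init=False, default_factory=dict)
    layer_types: list = field(init=False, default_factory=list)

    def __post_init__(self, rope_scaling, sliding_window_pattern):
        # Copy the raw rope settings into the derived attribute.
        self.rope_parameters = dict(rope_scaling or {})
        # Assumed rule: every `sliding_window_pattern`-th layer is full attention.
        self.layer_types = [
            "sliding_attention" if (i + 1) % sliding_window_pattern else "full_attention"
            for i in range(self.num_hidden_layers)
        ]


if __name__ == "__main__":
    cfg = ToyConfig(
        num_hidden_layers=4,
        rope_scaling={"rope_type": "linear", "factor": 2.0},
        sliding_window_pattern=4,
    )
    print(cfg.layer_types)
    print(cfg.rope_parameters)
```

Because the two inputs are declared as `InitVar`, they never appear in the instance's own attributes, so code that serializes or compares the config sees only `rope_parameters` and `layer_types`.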
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, cohere2, cohere2_moe, cohere2_vision

@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=46115&sha=250645

Cyrilvallez merged commit 9188b5e into main on May 20, 2026
89 of 96 checks passed
Cyrilvallez deleted the cohere2_moe branch on May 20, 2026 at 11:18
5 participants