
Conversation

@vasqu
Contributor

@vasqu vasqu commented Dec 3, 2025

Flaky test:

  • pytest tests/models/ernie4_5_moe/test_modeling_ernie4_5_moe.py -k test_load_balancing_loss --flake-finder --flake-runs=1000
  • Encountered 0 failures over the 1000 runs

Hope it's more robust now.
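
The same flake check can also be driven programmatically via pytest.main (this assumes the pytest-flakefinder plugin, which provides the --flake-finder and --flake-runs flags, is installed):

    import pytest

    # Re-run the (previously) flaky test 1000 times and report any failures
    exit_code = pytest.main([
        "tests/models/ernie4_5_moe/test_modeling_ernie4_5_moe.py",
        "-k", "test_load_balancing_loss",
        "--flake-finder", "--flake-runs=1000",
    ])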

@vasqu vasqu changed the title [DONT MERGE YET] Fix Ernie Moe Test Fix Ernie Moe Test Dec 3, 2025
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@vasqu vasqu marked this pull request as ready for review December 3, 2025 16:19
@vasqu vasqu requested a review from ydshieh December 3, 2025 16:19
Collaborator

@ydshieh ydshieh left a comment


Thank you.

Was @is_flaky(max_attempts=2) (already here before this PR) not working well?
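
For reference, a minimal sketch of how such a decorator is typically applied to the test (the class name here is assumed, and the decorator is taken to come from transformers.testing_utils; the test body is omitted):

    from transformers.testing_utils import is_flaky


    class Ernie4_5_MoeModelTest:
        @is_flaky(max_attempts=2)
        def test_load_balancing_loss(self):
            # The decorator retries the test up to 2 times, which can mask
            # an underlying source of flakiness rather than remove it.
            ...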

@github-actions
Contributor

github-actions bot commented Dec 3, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: ernie4_5_moe

@vasqu
Contributor Author

vasqu commented Dec 3, 2025

I think the core changes are:

  • Properly propagating the padding tokens within the input ids:

    # Add padding tokens to input_ids
    padding_block = config.pad_token_id * torch.ones(
        input_ids.shape[0], pad_length, dtype=torch.int32
    ).to(torch_device)
    padded_input_ids = torch.cat((padding_block, input_ids), dim=1)  # this simulates padding to the left
    padded_attention_mask = padded_input_ids.ne(config.pad_token_id).to(torch_device)

  • Using the config's pad token id instead of some arbitrary number.

I think either or both contributed to the slightly different behavior compared to the other MoE models.
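
As a minimal, self-contained sketch of that padding pattern (the config class, device, shapes, and id values below are made up for illustration; only the torch calls mirror the change described above):

    import torch

    # Hypothetical stand-ins for the test's config and inputs
    class DummyConfig:
        pad_token_id = 0

    config = DummyConfig()
    torch_device = "cpu"
    pad_length = 3
    # batch of 2 sequences of length 5; ids start at 1 so none collide with pad_token_id
    input_ids = torch.randint(1, 100, (2, 5), dtype=torch.int32)

    # Add padding tokens to input_ids (simulating padding to the left)
    padding_block = config.pad_token_id * torch.ones(
        input_ids.shape[0], pad_length, dtype=torch.int32
    ).to(torch_device)
    padded_input_ids = torch.cat((padding_block, input_ids), dim=1)

    # Derive the attention mask from the config's pad token id, not an arbitrary number
    padded_attention_mask = padded_input_ids.ne(config.pad_token_id).to(torch_device)

    assert padded_input_ids.shape == (2, 8)
    assert padded_attention_mask[:, :pad_length].sum() == 0  # padded positions are masked out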

@vasqu vasqu enabled auto-merge (squash) December 3, 2025 17:05
@vasqu vasqu merged commit 9e82c77 into huggingface:main Dec 3, 2025
17 checks passed
@vasqu vasqu deleted the fix-erniemoe-test branch December 3, 2025 17:12
sarathc-cerebras pushed a commit to sarathc-cerebras/transformers that referenced this pull request Dec 7, 2025
* fix

* fix

* rm unnecessary config

* remove references