Fix load_balancing_loss_func incompatible with past_key_values #40908
tkj666 wants to merge 1 commit into huggingface:main
[For maintainers] Suggested jobs to run (before merge)
run-slow: ernie4_5_moe, gpt_oss, minimax, mixtral, qwen3_moe, qwen3_next
ArthurZucker left a comment
hey, sorry what's the motivation? Load balancing loss function is for training 😓
Multiple models that support … During inference, people may want to check the router logits, and currently, due to the incompatible shape of …
What does this PR do?
Changes the way `num_hidden_layers`, `batch_size` and `sequence_length` are calculated, and slices `attention_mask`, so that the shapes of `expert_attention_mask` and `expert_mask` match, thus making it compatible with inference with `past_key_values`.

Fixes #30731
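To illustrate the shape problem, here is a minimal sketch of the idea described above. It is not the diff from this PR. It uses NumPy arrays as a stand-in for torch tensors, and `build_expert_attention_mask` is a hypothetical helper name. It assumes each layer's router logits have shape `(batch_size * sequence_length, num_experts)` and that, when `past_key_values` is used, `attention_mask` also covers the cached tokens.

```python
import numpy as np


def build_expert_attention_mask(gate_logits, attention_mask, top_k, num_experts):
    """Hypothetical helper: expand attention_mask to match expert_mask's shape.

    gate_logits: tuple with one array per layer, each of shape
        (batch_size * sequence_length, num_experts) (assumed layout).
    attention_mask: array of shape (batch_size, past_length + sequence_length).
    """
    # Derive the sizes from the router logits rather than from attention_mask,
    # which is longer than the current step when a KV cache is in use.
    num_hidden_layers = len(gate_logits)
    batch_size = attention_mask.shape[0]
    sequence_length = gate_logits[0].shape[0] // batch_size

    # Keep only the mask columns for tokens processed in this forward pass.
    recent_mask = attention_mask[:, -sequence_length:]

    expanded = np.broadcast_to(
        recent_mask[None, :, :, None, None],
        (num_hidden_layers, batch_size, sequence_length, top_k, num_experts),
    )
    return expanded.reshape(-1, top_k, num_experts)


# Toy decoding step: 2 layers, batch of 2, 5 cached tokens plus 1 new token.
num_layers, batch, past_len, new_len, num_experts, top_k = 2, 2, 5, 1, 4, 2
gate_logits = tuple(
    np.zeros((batch * new_len, num_experts)) for _ in range(num_layers)
)
attention_mask = np.ones((batch, past_len + new_len), dtype=np.int64)

expert_attention_mask = build_expert_attention_mask(
    gate_logits, attention_mask, top_k, num_experts
)

# expert_mask has one row per routed token across all layers.
expert_mask_shape = (num_layers * batch * new_len, top_k, num_experts)
assert expert_attention_mask.shape == expert_mask_shape

# Taking both sizes from attention_mask.shape instead gives
# 4 // (2 * 6) == 0 layers, so the expanded mask cannot match expert_mask.
old_batch, old_seq = attention_mask.shape
old_num_layers = (num_layers * batch * new_len) // (old_batch * old_seq)
```

In this toy case `old_num_layers` is 0, which is why reading `sequence_length` from `attention_mask.shape` breaks once cached tokens are included in the mask.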
Who can review?
@ArthurZucker