Fix GraniteMoeHybrid _update_mamba_mask crash on attention-only models#45514

Open
tianhaocui wants to merge 1 commit into huggingface:main from tianhaocui:fix-granitemoehybrid-mamba-mask
Conversation

@tianhaocui

Fixes #45507

Summary

GraniteMoeHybridModel._update_mamba_mask calls past_key_values.has_previous_state() without checking whether the model actually has mamba layers. When every layer is attention-only (no mamba entries in config.layers_block_type), has_previous_state() cannot find a LinearAttentionCacheLayerMixin layer and raises ValueError.

Fix

Check config.layers_block_type for mamba layers before calling has_previous_state(). If no mamba layers exist, return the attention mask as-is, since the mamba mask optimization does not apply.

Applied to both modeling_granitemoehybrid.py and modular_granitemoehybrid.py.

When all layers are attention layers (no mamba layers),
_update_mamba_mask calls past_key_values.has_previous_state(), which
tries to find a LinearAttentionCacheLayerMixin layer. Since none
exists, it raises ValueError.

Skip the has_previous_state check entirely when the model has no
mamba layers, as the mamba mask optimization is irrelevant in that
case.

Fixes huggingface#45507
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: granitemoehybrid

Development

Successfully merging this pull request may close these issues.

GraniteMoEHybrid Model Calls Invalid Method

1 participant