
[Misc] add multi modal model fallback in generic model config parse #589

Merged

pallasathena92 merged 1 commit into main from feat/add-nemotron-h-model-config on Apr 25, 2026

Conversation

@pallasathena92 (Collaborator)

What this PR does

  • Enhance GenericModelConfig fallback to probe nested sub-configs (text_config, llm_config, language_config) commonly found in multimodal models
  • Add MoE-aware parameter estimation for unregistered Mixture-of-Experts models (sketched after this list)
  • Move estimateMoEParams() from deepseek_vl.go to interface.go as a shared utility
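
For orientation, here is a minimal sketch of the MoE-aware estimation, in the spirit of estimateMoEParams(); the signature, field names, and formula are illustrative assumptions, not the repository's actual code. The point it demonstrates: every expert's FFN weights count toward the total, even though only a few experts are active per token, which is why a dense estimate massively undercounts MoE models.

```go
package modelconfig

// estimateMoEParamsSketch is illustrative only: the real
// estimateMoEParams() in interface.go may use a different signature
// and a more detailed formula.
func estimateMoEParamsSketch(hiddenSize, numLayers, vocabSize,
	moeIntermediateSize, nRoutedExperts, nSharedExperts int64) int64 {
	embed := 2 * vocabSize * hiddenSize               // input embedding + LM head
	attn := 4 * hiddenSize * hiddenSize               // Q, K, V, O projections per layer
	expertFFN := 3 * hiddenSize * moeIntermediateSize // gate, up, down per expert
	experts := nRoutedExperts + nSharedExperts
	// Every expert's weights exist in the checkpoint, so the FFN term
	// scales with the total expert count, not the active count.
	return embed + numLayers*(attn+experts*expertFFN)
}
```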

Why we need it

When an unregistered multimodal model (e.g., a composite vision-language-audio model) falls back to GenericModelConfig, critical fields like hidden_size, max_position_embeddings, and transformers_version are missed because they are nested under keys like llm_config, text_config, or language_config (an illustrative config follows the list below). This causes:

  • GetContextLength() → 0
  • GetParameterCount() → 0 (or massively underestimated for MoE models)
  • GetTransformerVersion() → ""
  • HasVision() → false
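
As a concrete illustration (the shape is modeled on common vision-language releases; the model name and values are made up), a flat parse of a config like this finds none of the critical fields, because they all live one level down:

```json
{
  "model_type": "some-unregistered-vlm",
  "llm_config": {
    "hidden_size": 4096,
    "max_position_embeddings": 131072,
    "transformers_version": "4.46.0"
  },
  "vision_config": {
    "image_size": 448
  }
}
```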

What changes in this PR

pkg/hfutil/modelconfig/interface.go

  • Add MaxSequenceLength, MoE fields (NRoutedExperts, NSharedExperts, MoeIntermediateSize), and hasVisionConfig to GenericModelConfig
  • Add probeNestedConfig() helper that extracts fields from nested sub-configs and detects vision_config presence (see the sketch after this list)
  • Override GetContextLength() to fall back to MaxSequenceLength
  • Override HasVision() to return detected vision config presence
  • Override GetParameterCount() to use MoE-aware estimation when experts are detected
  • Move estimateMoEParams() here from deepseek_vl.go (general-purpose, not DeepSeek-specific)
  • Call probeNestedConfig() unconditionally in loadGenericModelConfig() (safe; it only fills zero-valued fields)
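
A rough sketch of the probing logic described above; the sketch type, its field names, and the map-based decoding are assumptions, and the real probeNestedConfig() on GenericModelConfig may differ in detail:

```go
// Sketch only: mirrors the behavior described in the bullets above.
type genericConfigSketch struct {
	HiddenSize            int
	MaxPositionEmbeddings int
	hasVisionConfig       bool
}

func (c *genericConfigSketch) probeNestedConfig(raw map[string]any) {
	for _, key := range []string{"text_config", "llm_config", "language_config"} {
		sub, ok := raw[key].(map[string]any)
		if !ok {
			continue
		}
		// Fill only zero-valued fields, so a flat top-level config is
		// never overwritten; this is what makes the unconditional call
		// in loadGenericModelConfig() safe.
		if c.HiddenSize == 0 {
			if v, ok := sub["hidden_size"].(float64); ok {
				c.HiddenSize = int(v)
			}
		}
		if c.MaxPositionEmbeddings == 0 {
			if v, ok := sub["max_position_embeddings"].(float64); ok {
				c.MaxPositionEmbeddings = int(v)
			}
		}
	}
	// A top-level vision_config key marks the model as vision-capable.
	if _, ok := raw["vision_config"]; ok {
		c.hasVisionConfig = true
	}
}
```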

pkg/hfutil/modelconfig/deepseek_vl.go

  • Remove estimateMoEParams() (moved to interface.go, same package; no behavior change)

pkg/hfutil/modelconfig/interface_test.go

  • TestGenericMultimodalFallback: nested llm_config + vision_config with MoE fields (first case sketched below)
  • TestGenericMultimodalFallback_TextConfig: text_config variant
  • TestGenericMultimodalFallback_LanguageConfig: language_config variant
  • TestGenericMoEFallback: flat MoE config with the num_local_experts field-name variant
  • TestGenericContextLengthMaxSequenceLength: max_sequence_length fallback
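
For a sense of the fixture shape, here is an illustrative version of the first test; writeTempConfig is a hypothetical helper, LoadModelConfig stands in for whatever entry point the package actually exposes, and the assertions are assumptions based on the bullets above:

```go
import (
	"os"
	"path/filepath"
	"testing"
)

// Illustrative test shape; the real interface_test.go may differ.
func TestGenericMultimodalFallbackSketch(t *testing.T) {
	path := writeTempConfig(t, `{
		"llm_config": {
			"hidden_size": 4096,
			"max_position_embeddings": 131072
		},
		"vision_config": {"image_size": 448}
	}`)
	cfg, err := LoadModelConfig(path) // assumed package entry point
	if err != nil {
		t.Fatal(err)
	}
	if got := cfg.GetContextLength(); got != 131072 {
		t.Errorf("GetContextLength() = %d, want 131072", got)
	}
	if !cfg.HasVision() {
		t.Error("HasVision() = false, want true")
	}
}

// writeTempConfig writes a config.json into a temp dir and returns its path.
func writeTempConfig(t *testing.T, body string) string {
	t.Helper()
	path := filepath.Join(t.TempDir(), "config.json")
	if err := os.WriteFile(path, []byte(body), 0o644); err != nil {
		t.Fatal(err)
	}
	return path
}
```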

Fixes #

How to test

Run make test; the new cases in pkg/hfutil/modelconfig/interface_test.go exercise the nested-config and MoE fallback paths.

Checklist

  • Tests added/updated (if applicable)
  • Docs updated (if applicable)
  • make test passes locally

@github-actions bot added labels: models (Model configuration changes), model-agent (Model agent changes), tests (Test changes) on Apr 25, 2026
@pallasathena92 force-pushed the feat/add-nemotron-h-model-config branch from 5717c5f to f0d62c3 on April 25, 2026 03:44
@pallasathena92 merged commit e8217ff into main on Apr 25, 2026; 9 checks passed
@pallasathena92 deleted the feat/add-nemotron-h-model-config branch on April 25, 2026 04:00