feat(providers): concurrent multi-model / multi-provider architecture for per-role and per-agent routing #648

@Aaronontheweb

Context

NetclawChatClientProvider (src/Netclaw.Daemon/Configuration/NetclawChatClientProvider.cs) currently exposes three role-bound chat clients: Main, Fallback, and Compaction. GetClient(ModelRole) returns whichever of the three matches the requested role. If Compaction or Fallback is unset, the provider silently substitutes Main (_compaction ?? _main). That substitution itself violates the no-silent-fallbacks rule and should be made loud, either as part of this issue or in a separate one.
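To make the "loud" alternative concrete, here is a minimal sketch, not the actual NetclawChatClientProvider. It assumes "loud" means throwing when a role has no client; logging a warning would be another valid reading. StrictRoleClientProvider and its generic TClient parameter (a stand-in for the real chat client type) are hypothetical names introduced only for illustration.

```csharp
#nullable enable
using System;

public enum ModelRole { Main, Fallback, Compaction }

// Hypothetical sketch: TClient stands in for the real chat client type.
public sealed class StrictRoleClientProvider<TClient> where TClient : class
{
    private readonly TClient _main;
    private readonly TClient? _fallback;
    private readonly TClient? _compaction;

    public StrictRoleClientProvider(TClient main, TClient? fallback = null, TClient? compaction = null)
    {
        _main = main ?? throw new ArgumentNullException(nameof(main));
        _fallback = fallback;
        _compaction = compaction;
    }

    // Unlike `_compaction ?? _main`, an unconfigured role fails instead of quietly returning Main.
    public TClient GetClient(ModelRole role) => role switch
    {
        ModelRole.Main => _main,
        ModelRole.Fallback => _fallback ?? throw NotConfigured(role),
        ModelRole.Compaction => _compaction ?? throw NotConfigured(role),
        _ => throw new ArgumentOutOfRangeException(nameof(role), role, "Unknown model role."),
    };

    private static InvalidOperationException NotConfigured(ModelRole role) =>
        new($"No chat client is configured for role '{role}'. Configure it explicitly.");
}
```

With this shape, an operator who wants Compaction to reuse Main has to say so in configuration, so the daemon described below (Main only) would fail at the first compaction call rather than quietly sending it to the main model.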

During live testing, the running daemon had Main = Qwen3.5-27B-UD and no Compaction or Fallback configured, so every LLM call (main session, sub-agents, memory curation, title generation) hit the same Qwen endpoint. The role abstraction exists in code, but in practice only one model is in flight.

As sub-agents become more useful and the feature surface grows (e.g., per-agent model selection is a natural extension once the single-file format in #647 lands), the single-provider assumption becomes limiting. Use cases:

  • Run a small fast local model (via llama.cpp or similar) for summarizer and code-analyst, but use a bigger cloud model (Claude or Qwen-Max) for research-assistant and main sessions
  • Run compaction on a dedicated cheap model (GPT-4o-mini, Haiku) while main sessions use something stronger
  • Let a sub-agent declare its preferred model in frontmatter (filed as a follow-on to this issue and to feat(subagents): single-file markdown format + contextual prompt + project-scoped discovery #647)
  • Failover from a cloud provider to a local model when the cloud endpoint is down

Current limitations

  • The NetclawChatClientProvider constructor takes one ProviderPluginFactory + one ModelSelection, so it assumes a single provider sits behind every role
  • ModelRole is an enum with only Main, Fallback, Compaction — no concept of "the small fast model" vs. "the big reasoning model"
  • IChatClientProvider.GetClient(ModelRole) returns the single client bound at construction; there's no per-call or per-agent dispatch
  • SubAgentSpawner.SpawnAsync at line 95 calls _chatClientProvider.GetClient(definition.ModelRole) — bound to the profile's declared role at spawn time, but the role set is the same three slots for everyone
  • The silent Compaction → Main fallback at NetclawChatClientProvider.cs:29 hides misconfiguration
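As a sketch of the per-agent dispatch that the spawn path currently lacks: the resolver below prefers a model named by the sub-agent definition and otherwise uses the definition's role. Everything here is hypothetical. Only ModelRole and the idea of a definition carrying a role come from this issue; PreferredModel, SubAgentClientResolver, and the generic TClient stand-in are assumptions made for illustration.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public enum ModelRole { Main, Fallback, Compaction }

// Hypothetical definition shape; PreferredModel is not an existing property.
public sealed record SubAgentDefinition(string Name, ModelRole ModelRole, string? PreferredModel = null);

public static class SubAgentClientResolver
{
    // TClient stands in for the real chat client type.
    public static TClient Resolve<TClient>(
        SubAgentDefinition definition,
        IReadOnlyDictionary<string, TClient> namedClients,
        Func<ModelRole, TClient> roleLookup)
    {
        // No explicit preference: keep today's behavior and resolve by role.
        if (definition.PreferredModel is null)
            return roleLookup(definition.ModelRole);

        // An explicit preference that cannot be satisfied is an error, not a silent fallback.
        return namedClients.TryGetValue(definition.PreferredModel, out var client)
            ? client
            : throw new InvalidOperationException(
                $"Sub-agent '{definition.Name}' requested model '{definition.PreferredModel}', which is not configured.");
    }
}
```

Passing the role lookup as a delegate keeps the sketch independent of how IChatClientProvider ends up being extended.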

Proposal (shape, not implementation)

This issue captures the problem and the architectural direction. Implementation details are deferred — the purpose is to:

  1. Name the problem so the single-file format issue (feat(subagents): single-file markdown format + contextual prompt + project-scoped discovery #647) can reference "model selection follow-on" with a link
  2. Establish a place for design discussion before a real implementation is attempted
  3. Collect known use cases from other issues so implementation-time scoping is accurate

Rough direction to evaluate:

  • Expand IChatClientProvider with a named-client registry in addition to role-based lookup (e.g., GetClient(string name) or GetClient(ModelTier tier))
  • Support multiple simultaneous ProviderPluginFactory instances wired from config
  • Preserve the three-role API for backward compatibility; add a second, finer-grained selector on top
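One possible shape for the direction above, offered as a discussion starter rather than a design: a registry keyed by name, with the three roles bound to names so the existing role-based lookup keeps working. NamedClientRegistry, Register, and BindRole are hypothetical names, and TClient again stands in for the real chat client type. How clients are constructed from multiple ProviderPluginFactory instances is deliberately left out.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public enum ModelRole { Main, Fallback, Compaction }

// Hypothetical sketch of a named-client registry layered under the role API.
public sealed class NamedClientRegistry<TClient> where TClient : class
{
    private readonly Dictionary<string, TClient> _byName = new(StringComparer.OrdinalIgnoreCase);
    private readonly Dictionary<ModelRole, string> _roleToName = new();

    public void Register(string name, TClient client)
    {
        if (!_byName.TryAdd(name, client))
            throw new InvalidOperationException($"A chat client named '{name}' is already registered.");
    }

    // Roles become aliases for named clients, so the three-role API is preserved.
    public void BindRole(ModelRole role, string name)
    {
        if (!_byName.ContainsKey(name))
            throw new InvalidOperationException($"Cannot bind role '{role}': no chat client named '{name}'.");
        _roleToName[role] = name;
    }

    // Finer-grained selector: look a client up by its configured name.
    public TClient GetClient(string name) =>
        _byName.TryGetValue(name, out var client)
            ? client
            : throw new KeyNotFoundException($"No chat client named '{name}' is registered.");

    // Backward-compatible selector: resolve through the role binding.
    public TClient GetClient(ModelRole role) =>
        _roleToName.TryGetValue(role, out var name)
            ? GetClient(name)
            : throw new InvalidOperationException($"Role '{role}' is not bound to any chat client.");
}
```

Under this sketch, two roles can share one client by binding both to the same name, which makes "Compaction uses Main" an explicit configuration choice instead of an implicit fallback.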

Explicit non-goals

  • Real-time provider failover beyond what Fallback already allows
  • Load balancing across providers
  • Dynamic model swapping mid-session

Labels: enhancement (New feature or request), subagents (spawn_agent, SubAgentActor, definition loader, discovery context layer, and related features)
