feat(providers): concurrent multi-model / multi-provider architecture for per-role and per-agent routing #648

## Context

`NetclawChatClientProvider` (`src/Netclaw.Daemon/Configuration/NetclawChatClientProvider.cs`) today exposes three role-bound chat clients: `Main`, `Fallback`, and `Compaction`. `GetClient(ModelRole)` returns whichever of the three was constructed for the requested role. If `Compaction` or `Fallback` is unset, the lookup silently falls back to `Main` (`_compaction ?? _main`). That violates the no-silent-fallbacks rule and should be made loud, either as part of this issue or in a separate one.
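One possible reading of "loud" is to throw when a role has no client instead of substituting `Main`. The sketch below illustrates that reading only. `StrictChatClientProvider`, `IChatClientStub`, and `StubClient` are hypothetical names introduced to keep the example self-contained; they are not the real `NetclawChatClientProvider` or its client type, and logging a warning instead of throwing would be an equally valid choice.

```csharp
#nullable enable
using System;

// Mirrors the three roles named in this issue.
public enum ModelRole { Main, Fallback, Compaction }

// Hypothetical stand-in for the real chat client type (assumption, not the project's API).
public interface IChatClientStub { string ModelId { get; } }
public sealed record StubClient(string ModelId) : IChatClientStub;

// Hypothetical provider that refuses to substitute Main for an unset role.
public sealed class StrictChatClientProvider
{
    private readonly IChatClientStub _main;
    private readonly IChatClientStub? _fallback;
    private readonly IChatClientStub? _compaction;

    public StrictChatClientProvider(
        IChatClientStub main,
        IChatClientStub? fallback = null,
        IChatClientStub? compaction = null)
    {
        _main = main ?? throw new ArgumentNullException(nameof(main));
        _fallback = fallback;
        _compaction = compaction;
    }

    // Unlike `_compaction ?? _main`, an unset role surfaces as an exception.
    public IChatClientStub GetClient(ModelRole role) => role switch
    {
        ModelRole.Main => _main,
        ModelRole.Fallback => _fallback ?? throw Unconfigured(role),
        ModelRole.Compaction => _compaction ?? throw Unconfigured(role),
        _ => throw new ArgumentOutOfRangeException(nameof(role), role, "Unknown model role."),
    };

    private static InvalidOperationException Unconfigured(ModelRole role) =>
        new($"No chat client is configured for role '{role}'; refusing to substitute Main silently.");
}
```

With only `Main` configured, as on the daemon described in this issue, `GetClient(ModelRole.Compaction)` would throw an `InvalidOperationException` in this sketch rather than quietly returning the main client.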

The daemon inspected during live testing runs with `Main = Qwen3.5-27B-UD` and no `Compaction` or `Fallback` configured, so every LLM call (main session, sub-agents, memory curation, title generation) hits the same Qwen endpoint. The role abstraction exists in code, but in practice only one model is ever in flight.

As sub-agents become more useful and the feature surface grows (e.g., per-agent model selection is a natural extension once the single-file format in #647 lands), the single-provider assumption becomes limiting. Use cases:

- Run a small fast local model (via llama.cpp or similar) for `summarizer` and `code-analyst`, but use a bigger cloud model (Claude or Qwen-Max) for `research-assistant` and main sessions
- Run compaction on a dedicated cheap model (GPT-4o-mini, Haiku) while main sessions use something stronger
- Let a sub-agent declare its preferred model in frontmatter (filed as a follow-on to this issue and to #647)
- Failover from a cloud provider to a local model when the cloud endpoint is down

## Current limitations

- `NetclawChatClientProvider` constructor takes one `ProviderPluginFactory` + one `ModelSelection` — it assumes a 1:1 mapping between provider and role
- `ModelRole` is an enum with only `Main`, `Fallback`, `Compaction` — no concept of "the small fast model" vs. "the big reasoning model"
- `IChatClientProvider.GetClient(ModelRole)` returns the single client bound at construction; there's no per-call or per-agent dispatch
- `SubAgentSpawner.SpawnAsync` at line 95 calls `_chatClientProvider.GetClient(definition.ModelRole)` — bound to the profile's declared role at spawn time, but the role set is the same three slots for everyone
- The silent Compaction → Main fallback at `NetclawChatClientProvider.cs:29` hides misconfiguration

## Proposal (shape, not implementation)

This issue captures the problem and the architectural direction. Implementation details are deferred — the purpose is to:

1. Name the problem so the single-file format issue (#647) can reference "model selection follow-on" with a link
2. Establish a place for design discussion before a real implementation is attempted
3. Collect known use cases from other issues so implementation-time scoping is accurate

Rough direction to evaluate:

- Expand `IChatClientProvider` with a named-client registry in addition to role-based lookup (e.g., `GetClient(string name)` or `GetClient(ModelTier tier)`)
- Support multiple simultaneous `ProviderPluginFactory` instances wired from config
- Preserve the three-role API for backward compatibility; add a second, finer-grained selector on top
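To make the rough direction above easier to discuss, here is a minimal sketch of a named-client registry that keeps role-based lookup working on top of it. Everything in it is an assumption for illustration: `ChatClientRegistry`, `IChatClientStub`, `StubClient`, and the string-keyed `GetClient(string name)` overload are hypothetical, and how clients would actually be built from multiple `ProviderPluginFactory` instances is deliberately left out.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Mirrors the three roles named in this issue.
public enum ModelRole { Main, Fallback, Compaction }

// Hypothetical stand-in for the real chat client type (assumption, not the project's API).
public interface IChatClientStub { string ModelId { get; } }
public sealed record StubClient(string ModelId) : IChatClientStub;

// Hypothetical registry: clients are registered by name, and roles are bindings to names.
public sealed class ChatClientRegistry
{
    private readonly Dictionary<string, IChatClientStub> _byName =
        new(StringComparer.OrdinalIgnoreCase);
    private readonly Dictionary<ModelRole, string> _roleBindings = new();

    public ChatClientRegistry(
        IEnumerable<KeyValuePair<string, IChatClientStub>> clients,
        IEnumerable<KeyValuePair<ModelRole, string>> roleBindings)
    {
        foreach (var entry in clients)
            _byName.Add(entry.Key, entry.Value);

        // Validate bindings up front so a misconfigured role fails at startup.
        foreach (var binding in roleBindings)
        {
            if (!_byName.ContainsKey(binding.Value))
                throw new InvalidOperationException(
                    $"Role '{binding.Key}' is bound to unknown client '{binding.Value}'.");
            _roleBindings.Add(binding.Key, binding.Value);
        }
    }

    // Finer-grained selector: look a client up by its configured name.
    public IChatClientStub GetClient(string name) =>
        _byName.TryGetValue(name, out var client)
            ? client
            : throw new KeyNotFoundException($"No chat client registered under '{name}'.");

    // Backward-compatible selector: the existing three-role lookup, resolved via the bindings.
    public IChatClientStub GetClient(ModelRole role) =>
        _roleBindings.TryGetValue(role, out var name)
            ? GetClient(name)
            : throw new InvalidOperationException($"No chat client is bound to role '{role}'.");
}
```

In this shape, a config could register clients under illustrative names such as `local-small` and `cloud-large`, bind `Compaction` to the former and `Main` to the latter, and later let a sub-agent request a client by name without changing the role-based call sites. Whether names, tiers, or something else is the right key is exactly the design question this issue is meant to host.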

## Explicit non-goals

- Real-time provider failover beyond what `Fallback` already allows
- Load balancing across providers
- Dynamic model swapping mid-session

## Related

- #647 — sub-agent single-file format + contextual prompt (benefits from this once landed)
- Per-agent model in frontmatter (filed as a follow-on after this issue)
- #619 — Extend OpenAiCompatibleChatClient for vLLM backend compatibility
- #621 — Anthropic cache_control ephemeral markers
- #609 — Session-sticky LLM routing (closed)
