Motivation
Realtime pipelines have started growing per-stage override fields to tweak the contained models without editing their configs:
As richiejp noted in #10176, this "creates two layers of config" and doesn't scale — every new knob needs a new pipeline field. A generic mechanism would be cleaner and useful well beyond realtime.
Proposal: model overlays
Introduce a model overlay: a model config that inherits from a base model and merges its own overrides on top.
name: qwen3-no-think
base: qwen3-4b # inherit everything from qwen3-4b
reasoning:
disable: true # override just this
A realtime pipeline (or any caller) then just points at the overlay:
name: gpt-realtime
pipeline:
llm: qwen3-no-think
This subsumes pipeline.reasoning_effort / pipeline.disable_thinking (and any future per-pipeline overrides) with one mechanism: one base model, N overlays inheriting base settings and merging overrides. Users get quick per-model "profiles" without duplicating configs.
Notes / open questions
- Merge semantics: scalar override wins; how to handle maps/slices (replace vs merge)?
- Resolve overlays at config-load time so the rest of the stack sees a fully-merged
ModelConfig.
- Cycle detection on
base.
- Once overlays exist, the targeted
pipeline.* override fields could be deprecated in favor of overlays.
Follow-up from PR #10176 / #10184.
Motivation
Realtime pipelines have started growing per-stage override fields to tweak the contained models without editing their configs:
pipeline.reasoning_effort(feat: forward reasoning_effort to the backend so jinja models honor it #10184, merged)pipeline.disable_thinking(feat(realtime): stream the LLM / TTS / transcription pipeline stages #10176)As richiejp noted in #10176, this "creates two layers of config" and doesn't scale — every new knob needs a new pipeline field. A generic mechanism would be cleaner and useful well beyond realtime.
Proposal: model overlays
Introduce a model overlay: a model config that inherits from a base model and merges its own overrides on top.
A realtime pipeline (or any caller) then just points at the overlay:
This subsumes
pipeline.reasoning_effort/pipeline.disable_thinking(and any future per-pipeline overrides) with one mechanism: one base model, N overlays inheriting base settings and merging overrides. Users get quick per-model "profiles" without duplicating configs.Notes / open questions
ModelConfig.base.pipeline.*override fields could be deprecated in favor of overlays.Follow-up from PR #10176 / #10184.