
Model overlays: base model + merged overrides (generalize per-pipeline overrides) #10185

@localai-bot

Motivation

Realtime pipelines have started growing per-stage override fields (pipeline.reasoning_effort, pipeline.disable_thinking) to tweak the models they contain without editing those models' configs.

As richiejp noted in #10176, this "creates two layers of config" and doesn't scale, because every new knob needs a new pipeline field. A generic mechanism would be cleaner and useful well beyond realtime.

Proposal: model overlays

Introduce a model overlay: a model config that inherits from a base model and merges its own overrides on top.

name: qwen3-no-think
base: qwen3-4b          # inherit everything from qwen3-4b
reasoning:
  disable: true         # override just this

A realtime pipeline (or any caller) then just points at the overlay:

name: gpt-realtime
pipeline:
  llm: qwen3-no-think

This subsumes pipeline.reasoning_effort / pipeline.disable_thinking (and any future per-pipeline overrides) with one mechanism: one base model, N overlays inheriting base settings and merging overrides. Users get quick per-model "profiles" without duplicating configs.
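
To make "merging overrides on top" concrete, here is a minimal sketch of one possible merge, not a settled design. It assumes configs are decoded into generic maps, that nested maps merge key by key, and that scalars and lists in the overlay replace the base value. The function name mergeOverlay and the effort key are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// mergeOverlay returns a new map holding base with overlay applied on top.
// Assumed semantics (not decided in this issue): nested maps merge key by key,
// while scalars and slices in the overlay replace the base value outright.
// Neither input map is modified; untouched nested values are shared, not copied.
func mergeOverlay(base, overlay map[string]any) map[string]any {
	out := make(map[string]any, len(base)+len(overlay))
	for k, v := range base {
		out[k] = v
	}
	for k, ov := range overlay {
		bm, baseIsMap := out[k].(map[string]any)
		om, overlayIsMap := ov.(map[string]any)
		if baseIsMap && overlayIsMap {
			// Both sides are maps: recurse so sibling keys in base survive.
			out[k] = mergeOverlay(bm, om)
			continue
		}
		// Scalar, slice, or type mismatch: the overlay value wins.
		out[k] = ov
	}
	return out
}

func main() {
	// "effort" is a hypothetical base setting used to show that it is inherited.
	base := map[string]any{
		"name":      "qwen3-4b",
		"reasoning": map[string]any{"disable": false, "effort": "medium"},
	}
	overlay := map[string]any{
		"name":      "qwen3-no-think",
		"reasoning": map[string]any{"disable": true},
	}
	merged := mergeOverlay(base, overlay)
	fmt.Println(merged["name"], merged["reasoning"])
}
```

With these assumptions the overlay only needs to state what differs: reasoning.disable is flipped while every other key under reasoning is inherited from the base.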

Notes / open questions

  • Merge semantics: scalar override wins; how to handle maps/slices (replace vs merge)?
  • Resolve overlays at config-load time so the rest of the stack sees a fully-merged ModelConfig.
  • Cycle detection on base chains (e.g. A inherits from B, which inherits from A).
  • Once overlays exist, the targeted pipeline.* override fields could be deprecated in favor of overlays.

Follow-up from PR #10176 / #10184.
