fix(distributed): persist per-model load info so reconciler survives frontend restart #9981
Merged
Adds a dedicated ModelLoadInfo table keyed by model name, decoupled from
the per-replica NodeModel rows. The reconciler can now recover model load
metadata after every NodeModel row has been removed (worker death,
eviction, MarkOffline reaping, frontend restart with stale heartbeats),
which is the read side of Bug-1 from the distributed mode bug hunt.
Registry exposes:
- UpsertModelLoadInfo: ON CONFLICT (model_name) update; last-write-wins,
matching the existing per-replica blob semantics under concurrent
multi-frontend dispatch.
- GetModelLoadInfo: read from the new table first; fall back to the
legacy NodeModel-blob scan for rows written before any frontend in
the cluster ran an UpsertModelLoadInfo (rolling-upgrade transition).
SetNodeModelLoadInfo (per-replica blob) is preserved for backward
compatibility and per-replica diagnostics; the dispatch-path hook in the
next commit calls both.
The new table joins the existing nodes AutoMigrate set under the same
schema-migration advisory lock.
Refs: Bug-1, docs/superpowers/specs/2026-05-24-distributed-mode-bug-hunt-findings.md
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7[1m]
scheduleAndLoad now writes the (backendType, ModelOptions blob) pair to the new ModelLoadInfo table in addition to the existing per-replica NodeModel.model_opts_blob field. The per-replica blob still serves the hot path; the per-model row survives the removal of every NodeModel row, which is what unblocks the reconciler on the read side.
Both writes are best-effort with warn-level logging on failure: a missed write just means the reconciler may need a fresh inference request to repopulate, which is the pre-fix behavior.
Concurrency: two frontends loading the same model at the same time both fire UpsertModelLoadInfo; ON CONFLICT (model_name) makes the row converge to whichever commits last. This matches the existing per-replica blob semantics.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7[1m]
Adds Ginkgo specs that prove the persistence layer behaves correctly and
that the reconciler actually recovers from the frontend-restart scenario
that was failing in production:
registry_test.go:
- per-model row survives RemoveAllNodeModelReplicas (the bug repro)
- ON CONFLICT (model_name) updates backend type + blob, last-write-wins
- legacy NodeModel-blob fallback still works (rolling-upgrade transition)
- GetModelLoadInfo returns ErrRecordNotFound when both sources are empty
- UpsertModelLoadInfo rejects empty model names
reconciler_test.go:
- Bug-1 end-to-end: with min_replicas=2, no NodeModel rows, but a
ModelLoadInfo row present, one reconcile tick fires two scheduler
calls. Pre-fix this returned "no load info" and the scheduler never
got called until a fresh inference request arrived.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7[1m]
Adds a bullet to the Replica Reconciler section explaining that per-model load metadata is persisted across frontend restarts via the new model_load_infos PostgreSQL table, so a rolling upgrade no longer needs a fresh inference request per model before the reconciler can replace dead replicas.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7[1m]
c9468fd to
558977e
Summary
After a LocalAI frontend container restart (rolling upgrade, env-var change, etc.), the Replica Reconciler logs `failed to scale up replica ... no load info` every 30s for each affected model and cannot maintain `min_replicas` until someone sends a fresh inference request to that model. Production impact: in a window where a frontend has restarted but no traffic has hit a given model, a dying worker will not be replaced.
Root cause
`router.scheduleAndLoad` already persisted the `(backendType, pb.ModelOptions)` pair, but it stamped it onto the `NodeModel` row of the loading replica. That row is deleted by every path that removes replica rows: `MarkOffline` reaping stale workers, `RemoveAllNodeModelReplicas` on stop, `RemoveNodeModel` on probe failure or stale health check, and `evictLRUAndFreeNode`. When the last replica row goes away, the only copy of the load info goes with it. `GetModelLoadInfo`'s scan over `NodeModel` then returns `gorm.ErrRecordNotFound`, and the reconciler has nothing to replicate.
Fix
New dedicated `ModelLoadInfo` table keyed by `model_name`, decoupled from `NodeModel`. Three small hooks:
- `NodeRegistry.UpsertModelLoadInfo(ctx, modelName, backendType, optsBlob)` writes the per-model row with `ON CONFLICT (model_name) DO UPDATE` (last-write-wins under concurrent multi-frontend dispatch, matching the existing per-replica blob semantics).
- `router.scheduleAndLoad` calls both the existing per-replica `SetNodeModelLoadInfo` (backward compat / hot path) and the new per-model `UpsertModelLoadInfo` after a successful load.
- `NodeRegistry.GetModelLoadInfo` now reads the per-model row first and falls back to the legacy NodeModel-blob scan, so a rolling upgrade does not regress already-loaded models that were stamped only the old way.

Migration: the new `model_load_infos` table joins the existing `AutoMigrate` call under the same advisory lock; no separate migration step.
Test plan
- `go build ./core/services/nodes/... ./pkg/model/... ./core/services/distributed/...` succeeds
- `go test -race -count=1 ./core/services/nodes/...` passes (233 specs, 7 new)
- Registry specs: per-model row survives `RemoveAllNodeModelReplicas`, ON CONFLICT last-write-wins, legacy NodeModel-blob fallback, empty-source `ErrRecordNotFound`, empty-model-name rejection
- Reconciler spec: `min_replicas=2`, NodeModel rows wiped, `ModelLoadInfo` row present, one reconcile tick calls the scheduler twice (before the fix: zero calls, "no load info" warning)
Caveats
- Concurrent loads of the same model from multiple frontends are last-write-wins on `model_load_infos.model_opts_blob`. That is identical to the existing per-NodeModel-row semantics; if stronger ordering is ever needed, the row carries `updated_at`.
- The stored blob is the serialized `pb.ModelOptions` proto from the dispatch path. If a proto field is renamed or removed in a backward-incompatible way, stale rows would replay obsolete options; standard proto compatibility rules apply.
- The table is created by `AutoMigrate` under the existing advisory lock; no explicit rollback is provided. Dropping the table on downgrade requires manual SQL (consistent with how the other distributed tables are managed today).

Follow-up to #9976 (distributed-mode observability + middleware refactor).
Assisted-by: Claude:claude-opus-4-7[1m]