
fix(gallery): keep auto-upgrade off non-dev backends when -development is installed #9736

Merged: mudler merged 1 commit into master from fix/dev-backend-auto-upgrade-suppression on May 9, 2026
Conversation

mudler (Owner) commented May 9, 2026

A `-development` backend variant (e.g. `cuda12-llama-cpp-development`) shares its `alias` with the stable counterpart and is meant to be a drop-in replacement via `ListSystemBackends` alias resolution. Two paths in the auto-upgrade flow let the stable variant slip back in on top of the user's explicit dev pick:

1. `ListSystemBackends` emits a synthetic alias row keyed by the alias name that re-uses the chosen concrete's metadata pointer. In distributed mode, the worker's `handleBackendList` serialised that row over NATS as `{Name: <alias>, URI: <dev URI>, Digest: <dev>}` — the frontend can't reconstruct the alias relationship, and the wire-rebuilt row then carried `Metadata.Name = <alias>` and resolved against an unrelated gallery entry on the next upgrade check.
2. `CheckUpgradesAgainst` happily iterated the synthetic row in single-node mode too. Today the duplicate gallery lookup is harmless because both rows share the same `Metadata.Name`, but any gallery change that gives a meta backend a version, or any concrete sharing its alias with a dev counterpart, would surface a phantom non-dev upgrade and auto-upgrade would install it — shadowing the dev one through alias-token preference.
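
The information loss in path 1 can be sketched as a round trip through a three-field row. `wireRow` and `roundTrip` are illustrative stand-ins for what the description says crossed NATS (`Name`, `URI`, `Digest`), not the real LocalAI wire types:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// wireRow mirrors the three fields the description says the worker
// serialised over NATS (Name, URI, Digest). The struct and JSON tags
// are illustrative, not the real LocalAI wire types: the point is
// that nothing in this shape marks a row as a synthetic alias.
type wireRow struct {
	Name   string `json:"name"`
	URI    string `json:"uri"`
	Digest string `json:"digest"`
}

// roundTrip simulates sending a row over the wire and rebuilding it
// on the other side.
func roundTrip(row wireRow) wireRow {
	payload, err := json.Marshal(row)
	if err != nil {
		panic(err)
	}
	var rebuilt wireRow
	if err := json.Unmarshal(payload, &rebuilt); err != nil {
		panic(err)
	}
	return rebuilt
}

func main() {
	// The synthetic row is keyed by the alias name, but its payload
	// points at the chosen dev concrete.
	aliasRow := wireRow{
		Name:   "llama-cpp", // the alias
		URI:    "oci://example/cuda12-llama-cpp-development",
		Digest: "sha256:dev",
	}
	rebuilt := roundTrip(aliasRow)
	// The receiver sees only Name == "llama-cpp": the alias/dev
	// relationship is gone, so the next upgrade check resolves this
	// row against the stable gallery entry.
	fmt.Println(rebuilt.Name) // llama-cpp
}
```

Once only those three fields survive serialisation, no receiver can tell an alias row from a concrete one, which is why the fix filters the row out at the worker rather than trying to reconstruct the relationship at the frontend.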

Two layered fixes:

- `core/services/worker/lifecycle.go` (`handleBackendList`): drop rows where the map key differs from `b.Metadata.Name`. Concrete and meta entries always have `key == Metadata.Name`; only synthetic aliases violate it. Workers now report only what's actually on disk; the per-node UI listing and `CheckUpgrades` both stop seeing phantoms.
- `core/gallery/upgrade.go` (`CheckUpgradesAgainst`): iterate by key, skip rows where `key != Metadata.Name` (belt-and-suspenders for any caller-supplied installed set), and apply the dev-aware rule — build a set of installed `Metadata.Name`s and drop any non-dev candidate `X` whose `X-<devSuffix>` counterpart is installed. Uses the configured dev suffix from `getFallbackTagValues(systemState)`.
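
The two checks above can be sketched in one self-contained program. `Backend`, `dropSyntheticAliases`, and `upgradeCandidates` are hypothetical names standing in for the real types and helpers in `lifecycle.go` and `upgrade.go`:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Metadata and Backend are simplified stand-ins for the real gallery
// types; only the field the description relies on is modelled.
type Metadata struct{ Name string }
type Backend struct{ Metadata Metadata }

// dropSyntheticAliases keeps only rows whose map key matches
// Metadata.Name. Concrete and meta entries always satisfy that; only
// synthetic alias rows violate it, so a mismatch means the row must
// not be reported or considered for upgrades.
func dropSyntheticAliases(installed map[string]Backend) map[string]Backend {
	out := make(map[string]Backend, len(installed))
	for key, b := range installed {
		if key != b.Metadata.Name {
			continue // synthetic alias row
		}
		out[key] = b
	}
	return out
}

// upgradeCandidates applies the dev-aware rule: a non-dev backend X
// is suppressed whenever X-<devSuffix> is also installed.
func upgradeCandidates(installed map[string]Backend, devSuffix string) []string {
	var out []string
	for key := range installed {
		if !strings.HasSuffix(key, "-"+devSuffix) {
			if _, devInstalled := installed[key+"-"+devSuffix]; devInstalled {
				continue // dev counterpart installed: skip the stable one
			}
		}
		out = append(out, key)
	}
	sort.Strings(out) // map iteration order is random; sort for stable output
	return out
}

func main() {
	dev := Backend{Metadata{Name: "cuda12-llama-cpp-development"}}
	installed := map[string]Backend{
		"cuda12-llama-cpp":             {Metadata{Name: "cuda12-llama-cpp"}},
		"cuda12-llama-cpp-development": dev,
		"llama-cpp":                    dev, // synthetic alias: key != Metadata.Name
	}
	concrete := dropSyntheticAliases(installed)
	fmt.Println(len(concrete)) // 2: the alias row is gone
	fmt.Println(upgradeCandidates(concrete, "development"))
	// [cuda12-llama-cpp-development]: the stable variant is suppressed
}
```

Keeping the suppression in `CheckUpgradesAgainst` as well as in the worker means the rule holds even for caller-supplied installed sets that were never filtered at the worker.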

Manual `POST /api/backends/upgrade/<name>` is unaffected: it goes straight through `bm.UpgradeBackend(name)` without consulting the suppression list, so users who genuinely want the stable variant upgraded can still trigger it explicitly.

Tests in `core/gallery/upgrade_test.go` cover three cases under "CheckUpgradesAgainst (distributed)": dev-only installed → only the dev surfaces; both variants installed → dev still wins; synthetic alias row is ignored. Generic backend names are used to avoid the capability filter dropping `cuda`-prefixed entries on a CPU-only host.

Assisted-by: Claude:claude-opus-4-7

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
mudler added the bug label May 9, 2026
mudler merged commit 3568b28 into master May 9, 2026 (52 checks passed)
mudler deleted the fix/dev-backend-auto-upgrade-suppression branch May 9, 2026 16:20