fix(distributed): scope Upgrade All to nodes that have the backend installed#9678
Merged
Conversation
…stalled In distributed mode the React UI's "Upgrade All" button fanned every detected outdated backend out to every healthy backend node, including nodes that never had that backend installed. On heterogeneous clusters this surfaced as platform errors (e.g. mac-mini-m4 asked to upgrade cpu-insightface-development, which has no darwin/arm64 variant) and left forever-retrying pending_backend_ops rows. DistributedBackendManager.UpgradeBackend now queries ListBackends() first, builds the target node-ID set from SystemBackend.Nodes, and only fans out to those nodes — every per-node primitive (adapter.InstallBackend, the pending-ops queue, BackendOpResult) is unchanged. enqueueAndDrainBackendOp gains an optional targetNodeIDs allowlist; Install/Delete keep their fan-to-everyone semantics by passing nil. If no node reports the backend installed, UpgradeBackend now returns a clear "not installed on any node" error instead of producing a stuck queue. Adds Ginkgo coverage for the smart fan-out: backend on a subset of nodes goes only to those nodes; backend on no node returns the new error and never sends a NATS install request. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 [Claude Code]
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
In distributed mode the React UI's "Upgrade All" button fanned every detected outdated backend out to every healthy backend node, including nodes that never had that backend installed. On heterogeneous clusters this surfaced as platform errors (e.g. mac-mini-m4 asked to upgrade cpu-insightface-development, which has no darwin/arm64 variant) and left forever-retrying pending_backend_ops rows.
DistributedBackendManager.UpgradeBackend now queries ListBackends() first, builds the target node-ID set from SystemBackend.Nodes, and only fans out to those nodes — every per-node primitive (adapter.InstallBackend, the pending-ops queue, BackendOpResult) is unchanged. enqueueAndDrainBackendOp gains an optional targetNodeIDs allowlist; Install/Delete keep their fan-to-everyone semantics by passing nil. If no node reports the backend installed, UpgradeBackend now returns a clear "not installed on any node" error instead of producing a stuck queue.
Adds Ginkgo coverage for the smart fan-out: backend on a subset of nodes goes only to those nodes; backend on no node returns the new error and never sends a NATS install request.
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Description
This PR fixes #
Notes for Reviewers
Signed commits