fix(nodes): make per-node backend install async via gallery job queue #9928
Merged
…talls Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
…t empty nodeID Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
… node via TargetNodeID Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
…rvice job queue The handler previously called unloader.InstallBackend synchronously and blocked the browser for up to 3 minutes waiting on the NATS reply. It now enqueues a TargetNodeID-scoped ManagementOp on BackendGalleryChannel and returns HTTP 202 + jobID immediately, matching /api/backends/install/:id. The opcache key is built via NodeScopedKey(nodeID, backend) so concurrent installs of the same backend across different nodes do not stomp each other. galleryService/opcache/appConfig are threaded through RegisterNodeAdminRoutes for this. Assisted-by: Claude:opus-4-7 [Edit] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
…t drain goroutine Assisted-by: Claude:opus-4-7 [Edit] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Node-scoped backend installs land in opcache under "node:<nodeID>:<backend>" keys. Without splitting that prefix back out, the operations panel renders the full key as the display name and has no structured way to label which worker an install is targeting. Detect the prefix, surface nodeID as its own response field, and reduce the display name back to the bare backend slug. Bare (non-scoped) ops are left untouched so legacy installs do not gain a misleading empty nodeID. Assisted-by: Claude:opus-4-7 [Edit] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4-7 [Edit] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
…cancellations as errors Assisted-by: Claude:opus-4-7 [Edit] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4-7 [Edit] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
…ch codebase precedent Assisted-by: Claude:opus-4-7 [Edit] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
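The node-scoped opcache key from the commits above (`node:<nodeID>:<backend>`) and the `/api/operations` split back into a bare name plus `nodeID` can be sketched in Go. Only the name `NodeScopedKey` and the key shape come from this PR. The function bodies, `splitNodeScopedKey`, and the `operation` struct are hypothetical, and the sketch assumes node IDs never contain `:`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// nodeKeyPrefix marks node-scoped opcache keys ("node:<nodeID>:<backend>").
const nodeKeyPrefix = "node:"

// NodeScopedKey builds a key of the shape "node:<nodeID>:<backend>".
// The name comes from the PR; this signature and body are an assumption.
func NodeScopedKey(nodeID, backend string) string {
	return nodeKeyPrefix + nodeID + ":" + backend
}

// splitNodeScopedKey (hypothetical) reverses NodeScopedKey. It assumes node IDs
// contain no ':'. Bare legacy keys return ok=false and pass through unchanged.
func splitNodeScopedKey(key string) (nodeID, backend string, ok bool) {
	rest, found := strings.CutPrefix(key, nodeKeyPrefix)
	if !found {
		return "", key, false
	}
	nodeID, backend, found = strings.Cut(rest, ":")
	if !found || nodeID == "" || backend == "" {
		return "", key, false
	}
	return nodeID, backend, true
}

// operation is a hypothetical subset of an /api/operations entry.
// omitempty keeps nodeID out of the JSON for bare (non-scoped) ops.
type operation struct {
	Name   string `json:"name"`
	NodeID string `json:"nodeID,omitempty"`
}

// toOperation reduces a node-scoped key to the bare slug plus a nodeID field.
func toOperation(key string) operation {
	if nodeID, backend, ok := splitNodeScopedKey(key); ok {
		return operation{Name: backend, NodeID: nodeID}
	}
	return operation{Name: key}
}

func main() {
	for _, key := range []string{NodeScopedKey("worker-1", "llama-cpp"), "whisper"} {
		out, err := json.Marshal(toOperation(key))
		if err != nil {
			panic(err)
		}
		fmt.Println(string(out))
	}
}
```

Running it prints `{"name":"llama-cpp","nodeID":"worker-1"}` followed by `{"name":"whisper"}`: the bare key keeps its name and gets no `nodeID` field at all, rather than an empty one.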
Summary
`POST /api/nodes/:id/backends/install` now returns HTTP 202 with a `jobID` immediately, instead of blocking the request for up to 3 minutes while the worker downloads and registers the backend. This unfreezes the React UI when installing on one or more nodes from the Backends picker.

The change is wired through the gallery service's existing async job queue, the same pattern `/api/backends/install/:id` already uses:

- `galleryop.ManagementOp` gains a `TargetNodeID` field, so a single `ManagementOp` enqueued on `BackendGalleryChannel` can be scoped to one worker
- `DistributedBackendManager.InstallBackend` builds a one-element `targetNodeIDs` allowlist when `TargetNodeID` is set, reusing the same path `UpgradeBackend` already takes
- The node install handler enqueues the op under a node-scoped opcache key (`node:<nodeID>:<backend>`), so concurrent installs on different nodes don't collide, and returns `{ jobID, statusUrl, message }`
- `/api/operations` now surfaces a `nodeID` field for node-scoped ops, so the Operations panel can show which node an install targets (and the bare backend slug shows in `name` instead of the prefixed key)
- `NodeInstallPicker` dispatches all installs in parallel, then polls `/api/backends/job/:uid` per job (1.5 s interval, 6 min hard cap) until each settles; the modal stays closeable mid-install

Test plan
- `jobID` (Network tab), row shows "Installing" immediately, Operations panel surfaces the job with `nodeID`, row eventually flips to "Installed"
- `POST /api/backends/install/:id` (global, no node target) still fans out across the cluster unchanged

Follow-ups (not blockers)
- `/api/operations/:jobID/dismiss` lets users clear them manually
- `(nodeID, backend)` will leak the first `jobID` in `galleryService.statuses` until process restart. The JS picker dedupes via the `selected` Set, so the realistic UI flow is safe; no server-side dedupe is added in this PR

Assisted-by: Claude:opus-4-7 [Edit] [Bash] [Agent]