
fix(model): track intentional stops, stop misreading clean shutdowns as crashes#10060

Merged
mudler merged 1 commit into master from fix/backend-process-cleanup
May 29, 2026

Conversation

@richiejp
Collaborator

Description

Track when we intentionally kill a backend and use that to decide whether its death was unexpected.
Otherwise we log errors whenever a backend process returns a non-zero status we didn't expect.
There is no standard for exit status codes, so the alternative would be to ensure that every backend
handles signals the same way and explicitly returns the exit codes we expect.

Notes for Reviewers

Two separate issues made graceful backend shutdown look ungraceful in the
logs, even though the processes were being terminated correctly
(go-processmanager defaults to process-group SIGTERM + 15s grace + SIGKILL):

  1. "failed to read PID" — startProcess registers a per-process graceful-
    termination handler that calls Stop(), but StopAllGRPC (registered
    earlier, via app.Shutdown) already stopped and released store-tracked
    backends first. The second Stop() then failed reading the removed
    pidfile. Guard the handler with IsAlive() so it skips already-stopped
    processes; it still covers backends StopAllGRPC doesn't track (worker-
    supervised ones).

  2. "Backend process exited unexpectedly" exitCode=-1 — the exit watcher
    treated only exit codes 0/143 as clean. But a child killed by our own
    SIGTERM/SIGKILL is reported by Go as exitCode -1 (signal termination),
    not the shell's 128+signal convention, so every intentional stop logged
    a false crash warning. The exit code can't distinguish an intended stop
    from a signal-induced crash.

Track intent directly instead: a stoppingProcs sync.Map (keyed by the
*process.Process pointer) is marked wherever LocalAI calls Stop() on
purpose, and the exit watcher uses it to pick the log level — Info
"stopped" when intentional, Warn "exited unexpectedly" otherwise (still
catching real crashes). The raw exit code is reported as a field but no
longer interpreted.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>

Signed commits

  • Yes, I signed my commits.

@mudler merged commit 4647770 into master May 29, 2026
57 checks passed
@mudler deleted the fix/backend-process-cleanup branch May 29, 2026 16:54
@mudler added the bug label May 29, 2026

Labels

bug Something isn't working


2 participants