fix: always shutdown maintenance API service
The problem was that `GracefulStop()` blocks until every in-flight API
call has completed. A streaming call may not complete on its own for a
long time, so while one is running the maintenance service can hang
instead of shutting down.

The problem shows up with the Upgrade API in maintenance mode when a
streaming API call is running concurrently, for example:

1. A Watch API call is running against the maintenance service.
2. The Upgrade API is called. It tries to run the MaintenanceUpgrade
   sequence, which tries to take over the Initialize sequence. The
   Initialize sequence is canceled, and so is the maintenance API service
   context, but the service doesn't terminate because it's stuck in
   `GracefulStop`. The sequence takeover times out: even though the
   Initialize sequence was canceled, it hasn't terminated yet.

Sample log:

```
[talos] upgrade request received: "ghcr.io/siderolabs/installer:v1.3.3"
[talos] upgrade failed: failed to acquire lock: timeout
[talos] task loadConfig (1/1): failed: failed to receive config via maintenance service: maintenance service failed: context canceled
[talos] phase config (6/7): failed
[talos] initialize sequence: failed
<stuck here>
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
smira committed Mar 23, 2023
1 parent a0a5db5 commit cf2ccc5
1 changed file with 7 additions and 7 deletions: `internal/app/maintenance/main.go`
```diff
@@ -120,7 +120,12 @@ func Run(ctx context.Context, logger *log.Logger) ([]byte, error) {
 		return nil, err
 	}
 
-	defer server.GracefulStop()
+	defer func() {
+		shutdownCtx, shutdownCancel := context.WithTimeout(ctx, 5*time.Second)
+		defer shutdownCancel()
+
+		factory.ServerGracefulStop(server, shutdownCtx)
+	}()
 
 	go func() {
 		//nolint:errcheck
@@ -156,12 +161,7 @@ func Run(ctx context.Context, logger *log.Logger) ([]byte, error) {
 
 	select {
 	case cfg := <-cfgCh:
-		shutdownCtx, shutdownCancel := context.WithTimeout(ctx, 5*time.Second)
-		defer shutdownCancel()
-
-		factory.ServerGracefulStop(server, shutdownCtx)
-
-		return cfg, err
+		return cfg, nil
 	case <-ctx.Done():
 		return nil, ctx.Err()
 	}
```
