Conversation
- Use strings.EqualFold for WebSocket Upgrade header comparison (#4281) - Add non-blocking send on ping to prevent reader loop deadlock (#4280) - Log errors from fallback ListClusters in prediction worker (#4265) - Generate UUID session ID when client omits one to prevent key collision (#4263) - Check os.Rename/os.Chmod errors during binary auto-update with rollback (#4262) Signed-off-by: Andrew Anderson <andy@clubanderson.com>
✅ Deploy Preview for kubestellarconsole ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
👋 Hey @clubanderson — thanks for opening this PR!
This is an automated message. |
|
/lgtm |
|
@clubanderson: you cannot LGTM your own PR. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: clubanderson The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Pull request overview
This PR addresses multiple backend robustness bugs across WebSocket handling, chat session management, prediction-worker diagnostics, and the agent auto-update flow.
Changes:
- Make WebSocket upgrade detection case-insensitive in HTTP middleware.
- Prevent WebSocket reader goroutine blockage by making pong sending non-blocking (with a drop path).
- Improve operational robustness/diagnostics: log fallback
ListClusterserrors, avoid chat session ID collisions when omitted, and handle filesystem errors/rollback in binary auto-update.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/api/middleware/auth.go | Uses case-insensitive Upgrade header comparison to avoid rejecting valid WS upgrades. |
| pkg/api/handlers/websocket.go | Makes pong send non-blocking to prevent reader-loop stalls under backpressure. |
| pkg/agent/update_checker.go | Adds error handling for backup/replace/chmod steps during binary update and attempts rollback. |
| pkg/agent/server.go | Generates a unique session ID when omitted to avoid cancellation collisions for anonymous chats. |
| pkg/agent/prediction_worker.go | Logs errors from fallback ListClusters call instead of silently dropping them. |
| select { | ||
| case client.send <- []byte(`{"type":"pong"}`): | ||
| default: | ||
| log.Printf("[WebSocket] Dropping pong for client %s: send channel full", client.userID) |
There was a problem hiding this comment.
client.userID is a uuid.UUID, but this log line uses %s, which will print %!s(uuid.UUID=...) instead of the UUID. Use client.userID.String() or switch the format verb to %v for correct output.
| log.Printf("[WebSocket] Dropping pong for client %s: send channel full", client.userID) | |
| log.Printf("[WebSocket] Dropping pong for client %s: send channel full", client.userID.String()) |
| default: | ||
| log.Printf("[WebSocket] Dropping pong for client %s: send channel full", client.userID) | ||
| } |
There was a problem hiding this comment.
This will log once per received ping while the send channel is full, which can easily turn into noisy/log-amplifying behavior for slow or misbehaving clients. Consider sampling/rate-limiting this message or logging only on transition into the "dropping" state.
| if err := os.Rename(stagingDir+"/console", consolePath); err != nil { | ||
| // Attempt to restore the backup before returning | ||
| os.Rename(consolePath+".backup", consolePath) //nolint:errcheck | ||
| uc.recordError(fmt.Sprintf("replace rename failed: %v", err)) | ||
| uc.broadcast("update_progress", UpdateProgressPayload{ | ||
| Status: "failed", | ||
| Message: "Failed to install new binary, rolled back", | ||
| Error: err.Error(), | ||
| }) | ||
| return | ||
| } | ||
|
|
||
| // fileModeBinary is the permission bits for the installed console binary. | ||
| const fileModeBinary = 0755 | ||
| if err := os.Chmod(consolePath, fileModeBinary); err != nil { | ||
| // Attempt to restore the backup before returning | ||
| os.Rename(consolePath+".backup", consolePath) //nolint:errcheck | ||
| uc.recordError(fmt.Sprintf("chmod failed: %v", err)) |
There was a problem hiding this comment.
Rollback paths ignore errors and attempt os.Rename(backup, consolePath) even when consolePath already exists (after a successful replace). os.Rename cannot overwrite an existing file on some platforms (notably Windows), so rollback can silently fail and leave the install broken. Consider removing the target first (best-effort), checking/logging rollback errors, and reporting a distinct failure state when rollback does not succeed.
| // Generate a unique session ID when the client omits one so that | ||
| // concurrent anonymous chats do not collide in activeChatCtxs (#4263). | ||
| if req.SessionID == "" { | ||
| req.SessionID = uuid.New().String() |
There was a problem hiding this comment.
Generating a random session ID server-side avoids collisions, but clients that omit sessionId cannot cancel until they learn the generated ID from a later stream/result message. If early cancellation is a goal, consider deriving the fallback from a client-known value (e.g., msg.ID) or sending an immediate response that includes the generated sessionId before starting the long-running work.
| // Generate a unique session ID when the client omits one so that | |
| // concurrent anonymous chats do not collide in activeChatCtxs (#4263). | |
| if req.SessionID == "" { | |
| req.SessionID = uuid.New().String() | |
| // Prefer a client-known identifier when the client omits SessionID so | |
| // cancel_chat can target the session immediately. Fall back to a random | |
| // UUID only when neither the request nor the message envelope provides | |
| // a usable identifier, while still avoiding collisions in activeChatCtxs | |
| // for anonymous chats (#4263). | |
| if req.SessionID == "" { | |
| if msg.ID != "" { | |
| req.SessionID = msg.ID | |
| } else { | |
| req.SessionID = uuid.New().String() | |
| } |
| // Generate a unique session ID when the client omits one so that | ||
| // concurrent anonymous chats do not collide (#4263). | ||
| if req.SessionID == "" { | ||
| req.SessionID = uuid.New().String() |
There was a problem hiding this comment.
Same concern as the streaming path: when sessionId is omitted, generating a new UUID makes the session identifier unpredictable to the client and unstable across requests, which can break continuity (selected agent lookup, provider session tracking) unless the client persists the returned ID. Consider using a client-known fallback (like msg.ID) and/or documenting that clients must echo back the returned sessionId for multi-turn behavior.
| // Generate a unique session ID when the client omits one so that | |
| // concurrent anonymous chats do not collide (#4263). | |
| if req.SessionID == "" { | |
| req.SessionID = uuid.New().String() | |
| // Fall back to the client-known message ID when sessionId is omitted so | |
| // the session identifier remains predictable to the client and stable for | |
| // continuity-sensitive lookups during this request flow. | |
| if req.SessionID == "" { | |
| req.SessionID = msg.ID |
🔄 Auto-Applying Copilot Code ReviewCopilot code review found 3 code suggestion(s) and 2 general comment(s). @copilot Please apply all of the following code review suggestions:
Also address these general comments:
Push all fixes in a single commit. Run Auto-generated by copilot-review-apply workflow. |
|
Thank you for your contribution! Your PR has been merged. Check out what's new:
Stay connected: Slack #kubestellar-dev | Multi-Cluster Survey |
Applied all requested changes in commit Code suggestions applied:
General comments addressed:
All changes pass |
%sformat verb forclient.userID→client.userID.String()in websocket.gopongDropLoggedfield), reset on successful sendmsg.IDas fallback sessionID in streaming chat path before generating UUID (server.go)msg.IDas fallback sessionID in non-streaming chat path (server.go)rollbackBinary()helper: removes target before rename (cross-platform fix), checks/logs rollback errorsrollbackBinary()to all 4 rollback sites in update_checker.go with distinct failure messages when rollback itself fails