fix: single owner for Slack bridge lifecycle (stop dual-launch race)#164
Merged
Conversation
Greptile SummaryEliminated the Slack bridge dual-launch race condition by removing bridge startup from Key changes:
Minor issue found:
Confidence Score: 4/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant Admin
participant start.sh
participant pi
participant control-agent
participant startup-cleanup.sh
participant bridge
Note over Admin,bridge: OLD: Dual Launch (race condition)
Admin->>start.sh: sudo -u baudbot_agent start.sh
start.sh->>bridge: Launch in background subshell
Note over bridge: Port 7890 (wrong/missing PI_SESSION_ID)
start.sh->>pi: Start pi agent
pi->>control-agent: Register session UUID
control-agent->>startup-cleanup.sh: Run cleanup script
startup-cleanup.sh->>bridge: Launch in tmux (correct PI_SESSION_ID)
Note over bridge: Port 7890 conflict (EADDRINUSE)
Note over Admin,bridge: NEW: Single Owner (this PR)
Admin->>start.sh: sudo -u baudbot_agent start.sh
start.sh->>start.sh: Kill stale PID file
start.sh->>start.sh: Kill tmux session slack-bridge
start.sh->>start.sh: Force-release port 7890
start.sh->>pi: Start pi agent
pi->>control-agent: Register session UUID
control-agent->>startup-cleanup.sh: Run cleanup script
startup-cleanup.sh->>bridge: Launch in tmux with PI_SESSION_ID
Note over bridge: Port 7890 (correct session UUID)
Last reviewed commit: 3771d6b |
Comment on lines
+17
to
+18
| # bridge-restart-policy.sh no longer needed — bridge is started by | ||
| # startup-cleanup.sh, not start.sh (see PR #XXX) |
There was a problem hiding this comment.
placeholder PR #XXX should be replaced with actual PR number #164
Suggested change
| # bridge-restart-policy.sh no longer needed — bridge is started by | |
| # startup-cleanup.sh, not start.sh (see PR #XXX) | |
| # bridge-restart-policy.sh no longer needed — bridge is started by | |
| # startup-cleanup.sh, not start.sh (see PR #164) |
Prompt To Fix With AI
This is a comment left during a code review.
Path: start.sh
Line: 17-18
Comment:
placeholder `PR #XXX` should be replaced with actual PR number `#164`
```suggestion
# bridge-restart-policy.sh no longer needed — bridge is started by
# startup-cleanup.sh, not start.sh (see PR #164)
```
How can I resolve this? If you propose a fix, please make it concise.1024a36 to
f1b7f9d
Compare
f1b7f9d to
dc4bfe9
Compare
Two problems fixed: 1. Dual bridge launch: start.sh launched a bridge as a background subshell before pi started, then startup-pi.sh launched another in tmux after. This caused port conflicts, orphaned supervisors, and dropped messages. start.sh now only cleans up stale processes — startup-pi.sh is the sole bridge owner. 2. Infinite restart loop: the bridge restart loop had no max retries or backoff. A fatal config error would spin forever at 5s intervals. Now tracks consecutive fast failures (<60s runtime), backs off (5s + 2s per failure, capped at 60s), gives up after 10, and kills port holders before retrying. Also renames startup-cleanup.sh → startup-pi.sh to clarify that this is the agent-side startup script (called automatically by the control-agent on every session start), not a manual cleanup tool.
dc4bfe9 to
f1c249d
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
The Slack bridge is launched in two places that fight each other:
start.sh— launches bridge as( bb_bridge_supervise ... ) &background subshell before pi startsstartup-cleanup.sh— launches bridge in aslack-bridgetmux session after control-agent is liveThis causes recurring issues:
PI_SESSION_IDis wrong/missing, messages get droppedChanges
1.
start.sh: Remove bridge launch (cleanup only)startup-cleanup.sh's job2.
startup-cleanup.sh: Add max retries + backoff to restart loop3.
bin/ci/smoke-agent-runtime.sh: Relax bridge status checkstart.shstart.shis a protected file (root-owned). Admin needs to review and deploy.