Skip to content

ci: fix Arch integration control-socket readiness check#204

Merged
benvinegar merged 2 commits intomainfrom
fix/arch-integration-socket-readiness
Apr 29, 2026
Merged

ci: fix Arch integration control-socket readiness check#204
benvinegar merged 2 commits intomainfrom
fix/arch-integration-socket-readiness

Conversation

@benvinegar
Copy link
Copy Markdown
Member

Summary

  • wait for the control-agent socket to become connectable before runtime smoke tests proceed
  • align Arch runtime smoke readiness behavior with the inference smoke test
  • avoid transient startup races where the alias and socket file exist before the RPC server accepts connections

Testing

  • npm test
  • npm run lint
  • npm run typecheck

This PR description was generated by Pi using GPT-5 Codex

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Apr 29, 2026

Greptile Summary

This PR fixes a startup race in CI by adding a socket_is_connectable Python probe that validates the control-agent Unix socket is actually accepting connections, not just that the socket file exists. The wait_for_control_socket polling loop now gates on both the filesystem check and the new connectivity probe, aligning the runtime smoke test with the inference smoke test's readiness behavior.

Confidence Score: 4/5

Safe to merge; the fix correctly eliminates the startup race and has no functional issues.

Only a P2 style finding (missing finally in the Python probe); no logic or security issues. Score is 4/5 per P2-only ceiling.

No files require special attention.

Important Files Changed

Filename Overview
bin/ci/smoke-agent-runtime.sh Adds socket_is_connectable helper using an inline Python connect probe and gates wait_for_control_socket on actual RPC reachability; logic is correct but the probe omits a finally close on the error path, inconsistent with probe_rpc_get_message.

Sequence Diagram

sequenceDiagram
    participant CI as CI Runner
    participant Shell as smoke-agent-runtime.sh
    participant FS as Filesystem
    participant Py as python3 (socket_is_connectable)
    participant Sock as control-agent socket

    CI->>Shell: run
    Shell->>Shell: baudbot start
    loop every 1s until deadline (60s)
        Shell->>FS: -L CONTROL_ALIAS?
        FS-->>Shell: symlink exists
        Shell->>FS: readlink → target path
        Shell->>FS: -S target? (socket file exists)
        FS-->>Shell: yes
        Shell->>Py: sudo -u baudbot_agent python3 -c ... target
        Py->>Sock: s.connect(target)
        alt socket accepts connection
            Sock-->>Py: connected
            Py-->>Shell: exit 0
            Shell-->>CI: socket path (ready)
        else socket not yet listening
            Sock-->>Py: refused / timeout
            Py-->>Shell: exit 1
            Shell->>Shell: sleep 1, retry
        end
    end
    Shell->>Shell: probe_rpc_get_message (RPC check)
    Shell->>Shell: baudbot stop
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: bin/ci/smoke-agent-runtime.sh
Line: 47-59

Comment:
**Missing `finally` block leaves socket unclosed on failure**

If `s.connect()` raises, the socket is never explicitly closed before `sys.exit(1)`. The OS reclaims the fd on process exit so it isn't a real leak, but it's inconsistent with the `probe_rpc_get_message` function in the same file which uses a `finally: client.close()` guard. Worth aligning to the established pattern.

```suggestion
socket_is_connectable() {
  local sock="$1"
  sudo -u "$AGENT_USER" python3 -c "
import socket, sys
s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
try:
    s.settimeout(2)
    s.connect(sys.argv[1])
except Exception:
    sys.exit(1)
finally:
    s.close()
" "$sock" 2>/dev/null
}
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "ci: wait for connectable control socket ..." | Re-trigger Greptile

Comment on lines +47 to +59
socket_is_connectable() {
local sock="$1"
sudo -u "$AGENT_USER" python3 -c "
import socket, sys
s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
try:
s.settimeout(2)
s.connect(sys.argv[1])
s.close()
except Exception:
sys.exit(1)
" "$sock" 2>/dev/null
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Missing finally block leaves socket unclosed on failure

If s.connect() raises, the socket is never explicitly closed before sys.exit(1). The OS reclaims the fd on process exit so it isn't a real leak, but it's inconsistent with the probe_rpc_get_message function in the same file which uses a finally: client.close() guard. Worth aligning to the established pattern.

Suggested change
socket_is_connectable() {
local sock="$1"
sudo -u "$AGENT_USER" python3 -c "
import socket, sys
s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
try:
s.settimeout(2)
s.connect(sys.argv[1])
s.close()
except Exception:
sys.exit(1)
" "$sock" 2>/dev/null
}
socket_is_connectable() {
local sock="$1"
sudo -u "$AGENT_USER" python3 -c "
import socket, sys
s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
try:
s.settimeout(2)
s.connect(sys.argv[1])
except Exception:
sys.exit(1)
finally:
s.close()
" "$sock" 2>/dev/null
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: bin/ci/smoke-agent-runtime.sh
Line: 47-59

Comment:
**Missing `finally` block leaves socket unclosed on failure**

If `s.connect()` raises, the socket is never explicitly closed before `sys.exit(1)`. The OS reclaims the fd on process exit so it isn't a real leak, but it's inconsistent with the `probe_rpc_get_message` function in the same file which uses a `finally: client.close()` guard. Worth aligning to the established pattern.

```suggestion
socket_is_connectable() {
  local sock="$1"
  sudo -u "$AGENT_USER" python3 -c "
import socket, sys
s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
try:
    s.settimeout(2)
    s.connect(sys.argv[1])
except Exception:
    sys.exit(1)
finally:
    s.close()
" "$sock" 2>/dev/null
}
```

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call — I aligned socket_is_connectable() with the existing RPC probe pattern and added a finally: s.close() guard.

Responded by dev-agent-modem-a8b7b331 using GPT-5 Codex.

@benvinegar benvinegar merged commit 9f6671e into main Apr 29, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant