When an agent disconnects, automatically handle its sessions:
func (h *AgentHub) UnregisterAgent(agentID string) {
    // Get all active sessions for this agent
    sessions := db.Query(`
        SELECT id FROM sessions
        WHERE agent_id = $1 AND state IN ('running', 'hibernated')
    `, agentID)

    for _, session := range sessions {
        // Mark as "terminated" with reason "agent_disconnected"
        db.Exec(`
            UPDATE sessions
            SET state = 'terminated',
                termination_reason = 'agent_disconnected',
                terminated_at = NOW()
            WHERE id = $1
        `, session.ID)
    }
}
Acceptance Criteria:
Agent disconnect triggers session cleanup
Sessions marked as "terminated" with reason
Audit log event created
Optional: Grace period for agent reconnection (30s)
Add a K8s controller or CronJob that cleans up orphaned resources:
# kubernetes-cleaner cronjob
schedule: "*/15 * * * *"  # Every 15 minutes
command: |
  # Delete deployments without corresponding session records
  # Or where session.state = 'terminated'
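The selection rule described in the cron job comments can be expressed as a small pure function. This is a sketch, not the project's implementation: the `Deployment` type and `OrphanedDeployments` function are hypothetical, and it assumes each deployment carries the session ID in its `session` label, as the `kubectl delete ... -l session=<id>` command in this issue suggests.

```go
package main

import "sort"

// Deployment is a simplified, hypothetical view of a Kubernetes deployment:
// its name and the value of its `session` label.
type Deployment struct {
	Name      string
	SessionID string
}

// OrphanedDeployments returns the names of deployments that should be
// deleted: those with no matching session record, and those whose session
// is in the 'terminated' state. sessionState maps session ID to state.
func OrphanedDeployments(deps []Deployment, sessionState map[string]string) []string {
	var orphans []string
	for _, d := range deps {
		state, found := sessionState[d.SessionID]
		if !found || state == "terminated" {
			orphans = append(orphans, d.Name)
		}
	}
	sort.Strings(orphans) // deterministic order for logging and tests
	return orphans
}
```

Keeping the decision separate from the Kubernetes and database calls makes it straightforward to unit test, and a dry-run mode could simply log the returned names instead of deleting them.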
Problem
Sessions can get stuck in 'terminating' state when the agent that created them is disconnected or replaced before processing the stop command.
Reproduction:
Observed Behavior:
admin-brave-f7b5e0f5 stuck in "terminating" for 48+ minutes
kubectl delete deployment,service -l session=<id>
Root Cause:
No reconciliation mechanism for orphaned sessions when agent availability changes.
Impact
Proposed Solutions
Solution 1: Session Reconciliation Loop (Recommended for v2.0-beta.1)
Add a background goroutine in the API that periodically checks for stuck sessions:
Acceptance Criteria:
sessions_stuck_terminating counter
Solution 2: Agent Disconnect Cleanup (Future: v2.1.0)
When an agent disconnects, automatically handle its sessions:
Acceptance Criteria:
Solution 3: Kubernetes Resource Garbage Collection (Future: v2.1.0+)
Add a K8s controller or CronJob that cleans up orphaned resources:
Acceptance Criteria:
Recommended Implementation Plan
v2.0-beta.1 (P1 - MUST FIX):
v2.1.0 (P2):
v2.2.0 (P3):
Files to Modify
api/internal/services/session_reconciler.go (new)
api/cmd/main.go - Start reconciler goroutine
api/internal/websocket/hub.go - Agent disconnect cleanup
chart/templates/kubernetes-cleaner-cronjob.yaml (future)
Testing
Related Issues