RFC: Session Management
Tracking issue: #75
Status: Draft
Author: @thepagent
Summary
Comprehensive session management for agent-broker covering lifecycle control, isolation, observability, security, and multi-agent support.
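Current State
- SessionPool uses HashMap<String, AcpConnection> keyed by Discord thread_id
- cleanup_idle (TTL-based) and shutdown exist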
1. Session Lifecycle
1a. /close command (#40)
- Discord handler intercepts /close message
- Calls pool.remove(thread_id) → drops AcpConnection → kill_on_drop kills child process
- Replies "Session closed." and archives thread
1b. Session timeout / auto-expiry
- Split the current session_ttl_hours (hard max) from a new idle_timeout_minutes (e.g. 30 min)
- On idle timeout → post "⏰ Session expired due to inactivity" in thread → remove session
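The split between the hard TTL and the idle timeout can be sketched as a pure check over the two timestamps. This is a minimal illustration using only the standard library; `Timeouts`, `Expiry`, and `check` are hypothetical names, and letting the hard TTL win when both limits are exceeded is an assumption, not something the RFC specifies.

```rust
use std::time::{Duration, Instant};

/// Why a session should be removed (hypothetical type, not part of the RFC).
#[derive(Debug, PartialEq, Eq)]
pub enum Expiry {
    Idle,
    HardTtl,
}

/// Limits derived from `idle_timeout_minutes` and `session_ttl_hours`.
pub struct Timeouts {
    pub idle: Duration,
    pub hard_ttl: Duration,
}

impl Timeouts {
    pub fn from_config(idle_timeout_minutes: u64, session_ttl_hours: u64) -> Self {
        Self {
            idle: Duration::from_secs(idle_timeout_minutes * 60),
            hard_ttl: Duration::from_secs(session_ttl_hours * 3600),
        }
    }

    /// Returns the reason the session has expired at `now`, if any.
    /// Assumption: the hard TTL is checked first, so it wins when both apply.
    pub fn check(&self, created_at: Instant, last_active: Instant, now: Instant) -> Option<Expiry> {
        if now.saturating_duration_since(created_at) >= self.hard_ttl {
            Some(Expiry::HardTtl)
        } else if now.saturating_duration_since(last_active) >= self.idle {
            Some(Expiry::Idle)
        } else {
            None
        }
    }
}
```

A cleanup task could call `check` for each session on a timer and post the inactivity notice only for `Expiry::Idle`.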
1c. Per-user session limits
- Track HashMap<UserId, Vec<ThreadId>> for active sessions per user
- Exceeding limit → reply "You have too many active sessions. Use /close to free one."
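One way the per-user tracking could look, as a hedged sketch: the `UserSessions` type, its method names, and the `String` aliases for `UserId` and `ThreadId` are assumptions made for illustration, and the limit itself would come from configuration the RFC has not named yet.

```rust
use std::collections::HashMap;

// Assumed aliases; the RFC does not define these types.
pub type UserId = String;
pub type ThreadId = String;

pub struct UserSessions {
    max_per_user: usize,
    active: HashMap<UserId, Vec<ThreadId>>,
}

impl UserSessions {
    pub fn new(max_per_user: usize) -> Self {
        Self { max_per_user, active: HashMap::new() }
    }

    /// Registers a session for `user`; returns false when the user is at the limit.
    pub fn try_open(&mut self, user: &str, thread: &str) -> bool {
        let threads = self.active.entry(user.to_string()).or_default();
        if threads.len() >= self.max_per_user {
            return false;
        }
        threads.push(thread.to_string());
        true
    }

    /// Frees the slot held by `thread`, dropping the user entry once it is empty.
    pub fn close(&mut self, user: &str, thread: &str) {
        let now_empty = match self.active.get_mut(user) {
            Some(threads) => {
                threads.retain(|t| t.as_str() != thread);
                threads.is_empty()
            }
            None => false,
        };
        if now_empty {
            self.active.remove(user);
        }
    }

    /// Number of active sessions currently tracked for `user`.
    pub fn count(&self, user: &str) -> usize {
        self.active.get(user).map_or(0, Vec::len)
    }
}
```

When `try_open` returns false, the handler would send the "too many active sessions" reply instead of spawning an agent.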
1d. Graceful shutdown
- On shutdown, post "🔄 Broker restarting..." in each active thread before clearing pool
- Phase 2: persist session metadata to disk/S3 for restart recovery
2. Session Isolation & Stability
2a. Per-thread working directories (#38)
- Change working_dir to {base_working_dir}/{thread_id}/
- Each session gets its own filesystem namespace
- Cleanup deletes working dir along with session
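A small sketch of how the per-thread directory could be derived, created, and cleaned up with the standard library. The function names are hypothetical, and the validation step is an added assumption: it rejects any `thread_id` that is not plain ASCII alphanumeric so that a crafted ID cannot escape `base_working_dir`.

```rust
use std::path::{Path, PathBuf};
use std::{fs, io};

/// Builds `{base_working_dir}/{thread_id}`, rejecting IDs that could escape the base.
pub fn session_dir(base_working_dir: &Path, thread_id: &str) -> io::Result<PathBuf> {
    let safe = !thread_id.is_empty() && thread_id.chars().all(|c| c.is_ascii_alphanumeric());
    if safe {
        Ok(base_working_dir.join(thread_id))
    } else {
        Err(io::Error::new(io::ErrorKind::InvalidInput, "invalid thread_id"))
    }
}

/// Creates the session's directory (and any missing parents).
pub fn create_session_dir(base_working_dir: &Path, thread_id: &str) -> io::Result<PathBuf> {
    let dir = session_dir(base_working_dir, thread_id)?;
    fs::create_dir_all(&dir)?;
    Ok(dir)
}

/// Deletes the session's directory and everything inside it.
pub fn remove_session_dir(base_working_dir: &Path, thread_id: &str) -> io::Result<()> {
    fs::remove_dir_all(session_dir(base_working_dir, thread_id)?)
}
```

`remove_session_dir` would run on the same path that removes the session from the pool, so the directory and the connection share one lifetime.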
2b. Cross-session deadlock fix (#58)
- Change from RwLock<HashMap<K, AcpConnection>> to RwLock<HashMap<K, Arc<Mutex<AcpConnection>>>>
- Outer RwLock only protects map insert/remove (released immediately)
- Per-connection Mutex protects streaming, so session A no longer blocks session B
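use std::{collections::HashMap, sync::Arc};
use anyhow::{anyhow, Result};
use futures::future::BoxFuture; // assumed helper crate, see comment below
use tokio::sync::{Mutex, RwLock};

pub struct SessionPool {
    connections: RwLock<HashMap<String, Arc<Mutex<AcpConnection>>>>,
}

impl SessionPool {
    // F returns a boxed future so it can borrow the connection across awaits.
    pub async fn with_connection<F, R>(&self, thread_id: &str, f: F) -> Result<R>
    where
        F: for<'a> FnOnce(&'a mut AcpConnection) -> BoxFuture<'a, Result<R>>,
    {
        let conn = {
            // The map lock is held only long enough to clone the Arc.
            let conns = self.connections.read().await;
            conns.get(thread_id).cloned().ok_or_else(|| anyhow!("no connection"))?
        };
        let mut guard = conn.lock().await; // per-session lock only
        f(&mut *guard).await
    }
}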
3. Session State
3a. Session metadata
struct SessionMetadata {
    thread_id: String,
    user_id: String,
    agent_name: String,
    created_at: Instant,
    last_active: Instant,
    message_count: u64,
    status: SessionStatus, // Active, Idle, Expired
}
- Phase 1: in-memory, used for observability and lifecycle decisions
- Phase 2: serialize to disk/S3 for restart recovery
3b. Context window management
- Track message_count per session, warn user when approaching limits
- Conversation summarization deferred to Phase 2 (requires extra LLM call)
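The warning threshold could be expressed as a simple classification over `message_count`. This is only a sketch: `ContextLevel`, `max_messages`, and `warn_percent` are hypothetical, and message count is a rough proxy for real context usage, which depends on token counts the broker may not see.

```rust
/// How close a session is to its message budget (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum ContextLevel {
    Ok,
    Warn,
    Full,
}

/// Classifies a session by comparing `message_count` with an assumed budget.
/// `warn_percent` is the share of the budget (0 to 100) at which to warn.
pub fn context_level(message_count: u64, max_messages: u64, warn_percent: u64) -> ContextLevel {
    if message_count >= max_messages {
        ContextLevel::Full
    } else if message_count.saturating_mul(100) >= max_messages.saturating_mul(warn_percent) {
        ContextLevel::Warn
    } else {
        ContextLevel::Ok
    }
}
```

With a budget of 100 messages and a threshold of 80 percent, the warning would first appear on the 80th message.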
4. Session Observability (#39)
4a. Management API
Lightweight HTTP server on a separate port (e.g. 9090) using axum:
GET /sessions — list all active sessions
GET /sessions/:thread_id — session detail
DELETE /sessions/:thread_id — force terminate
GET /health — broker health + pool stats
GET /metrics — prometheus-compatible metrics
4b. Metrics
- active_sessions (gauge)
- total_sessions_created (counter)
- session_duration_seconds (histogram)
- messages_per_session (histogram)
- pool_exhaustion_events (counter)
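To illustrate what GET /metrics might return, the sketch below renders the gauge and the two counters in the Prometheus text exposition format using only the standard library. `PoolMetrics` and `render` are hypothetical names; the two histograms are left out because they need bucket definitions the RFC does not give, and a real implementation would more likely use a metrics crate.

```rust
use std::fmt::Write;

/// Snapshot of the scalar metrics listed above (hypothetical struct).
#[derive(Default)]
pub struct PoolMetrics {
    pub active_sessions: u64,
    pub total_sessions_created: u64,
    pub pool_exhaustion_events: u64,
}

impl PoolMetrics {
    /// Renders each metric as a `# TYPE` line followed by a sample line.
    pub fn render(&self) -> String {
        let rows = [
            ("active_sessions", "gauge", self.active_sessions),
            ("total_sessions_created", "counter", self.total_sessions_created),
            ("pool_exhaustion_events", "counter", self.pool_exhaustion_events),
        ];
        let mut out = String::new();
        for (name, kind, value) in rows.iter() {
            // Writing into a String cannot fail, so the results are ignored.
            let _ = writeln!(out, "# TYPE {} {}", name, kind);
            let _ = writeln!(out, "{} {}", name, value);
        }
        out
    }
}
```

The /metrics handler would return this string with a plain-text content type.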
4c. Audit trail
- Structured logging via existing tracing, adding fields: thread_id, user_id, event
- Events: session_created, session_prompt, session_closed, session_expired
5. Session Security & Access Control
5a. Session ownership
- SessionMetadata records owner_user_id
- /close restricted to session owner or admin
- Other users can still interact in thread (Discord threads are public)
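The ownership rule for /close reduces to a small predicate. In this sketch `can_close` is a hypothetical helper and the set of admin user IDs is assumed to come from configuration; the RFC does not say how admins are identified.

```rust
use std::collections::HashSet;

/// Returns true when `requester_id` may close the session:
/// either the session owner or a member of the (assumed) admin set.
pub fn can_close(owner_user_id: &str, requester_id: &str, admins: &HashSet<String>) -> bool {
    requester_id == owner_user_id || admins.contains(requester_id)
}
```

Other users in the thread fail this check for /close only; ordinary messages are unaffected.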
5b. Rate limiting per session
- Per-session sliding window: configurable max_messages_per_minute (default: 10)
- Exceeding limit → reply "⏳ Rate limited, please wait."
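A per-session sliding window can be kept as a queue of recent message timestamps. The sketch below is one possible shape under stated assumptions: `SlidingWindow` and its methods are hypothetical names, and the caller passes `now` explicitly so the logic can be exercised without sleeping.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Sliding-window limiter holding the timestamps of recently accepted messages.
pub struct SlidingWindow {
    max_messages: usize,
    window: Duration,
    stamps: VecDeque<Instant>,
}

impl SlidingWindow {
    pub fn per_minute(max_messages_per_minute: usize) -> Self {
        Self {
            max_messages: max_messages_per_minute,
            window: Duration::from_secs(60),
            stamps: VecDeque::new(),
        }
    }

    /// Records a message at `now` and returns true if it is within the limit.
    pub fn allow(&mut self, now: Instant) -> bool {
        // Drop timestamps that have fallen out of the window.
        while let Some(&oldest) = self.stamps.front() {
            if now.saturating_duration_since(oldest) >= self.window {
                self.stamps.pop_front();
            } else {
                break;
            }
        }
        if self.stamps.len() < self.max_messages {
            self.stamps.push_back(now);
            true
        } else {
            false
        }
    }
}
```

Rejected messages are not recorded, so a user who keeps sending while limited does not extend their own lockout.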
6. Multi-agent
6a. Session routing
- Extend config from single [agent] to [agents] table with multiple agent configs
- Routing by Discord channel or /agent <name> command
- Pool becomes HashMap<String, (AgentConfig, AcpConnection)>
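The routing decision could be sketched as follows. This picks one answer to a question the RFC leaves open: here an explicit /agent <name> command overrides the channel mapping, which in turn overrides a default agent. `Router`, its fields, and `parse_agent_command` are hypothetical names.

```rust
use std::collections::{HashMap, HashSet};

/// Extracts `<name>` from a "/agent <name>" message, if the message is that command.
pub fn parse_agent_command(message: &str) -> Option<&str> {
    let rest = message.trim().strip_prefix("/agent")?;
    // Require whitespace after "/agent" so that "/agents" is not matched.
    if !rest.starts_with(char::is_whitespace) {
        return None;
    }
    let name = rest.trim();
    if name.is_empty() || name.contains(char::is_whitespace) {
        None
    } else {
        Some(name)
    }
}

pub struct Router {
    /// Agent names defined in the [agents] table.
    pub agents: HashSet<String>,
    /// Assumed mapping from Discord channel ID to agent name.
    pub by_channel: HashMap<String, String>,
    /// Assumed fallback when the channel has no mapping.
    pub default_agent: String,
}

impl Router {
    /// Resolves the agent for a message; None means the command named an unknown agent.
    pub fn resolve(&self, channel_id: &str, message: &str) -> Option<&str> {
        if let Some(name) = parse_agent_command(message) {
            return self.agents.get(name).map(|s| s.as_str());
        }
        let agent = self
            .by_channel
            .get(channel_id)
            .map_or(self.default_agent.as_str(), |s| s.as_str());
        Some(agent)
    }
}
```

Returning `None` for an unknown name lets the handler reply with an error instead of silently falling back to another agent.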
6b. Session handoff (Phase 2)
- /handoff <agent> → close current connection, respawn with new agent
- Optional: carry conversation summary to new agent system prompt
Implementation Phases
| Phase | Scope | Complexity |
|---|---|---|
| Phase 1 | #58 deadlock fix, #40 /close, idle timeout notification, session metadata | Low-Med |
| Phase 2 | #39 management API, metrics, per-user limits, rate limiting | Medium |
| Phase 3 | #38 per-thread working dirs, session ownership, audit trail | Medium |
| Phase 4 | Multi-agent routing, persistence/recovery, handoff | High |
Open Questions
- Should the management API require auth (API key / mTLS)?
- Should we support session resume after agent crash (requires agent-side support)?
- Multi-agent routing: per-channel config vs. user command vs. both?
- Rate limiting: per-session or per-user?
Comments and feedback welcome.