feat: lightweight management API for session observability and lifecycle control #57
Add optional HTTP management server (no framework deps) with endpoints:
- GET /healthz — uptime + discord connection status for K8s probes
- GET /sessions — list active sessions with idle time
- DELETE /sessions/{id} — graceful single-session teardown
- DELETE /sessions — terminate all sessions
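Because the server has no framework, something has to map the raw request line onto these four endpoints. The following is a minimal sketch of that dispatch, not the PR's code: the `Route` enum and the `parse_request_line`/`route` helpers are hypothetical names, and answering 405 for a known path with the wrong method is an assumption.

```rust
/// Hypothetical routing table for the four management endpoints.
#[derive(Debug, PartialEq)]
enum Route<'a> {
    Healthz,
    ListSessions,
    DeleteSession(&'a str), // carries the thread id taken from the path
    DeleteAllSessions,
    MethodNotAllowed,
    NotFound,
}

/// Split "GET /sessions HTTP/1.1" into ("GET", "/sessions").
fn parse_request_line(line: &str) -> Option<(&str, &str)> {
    let mut parts = line.split_whitespace();
    Some((parts.next()?, parts.next()?))
}

fn route<'a>(method: &str, target: &'a str) -> Route<'a> {
    // Match on the path only; any query string is ignored.
    let path = target.split('?').next().unwrap_or(target);
    match (method, path) {
        ("GET", "/healthz") => Route::Healthz,
        ("GET", "/sessions") => Route::ListSessions,
        ("DELETE", "/sessions") => Route::DeleteAllSessions,
        (_, "/healthz") | (_, "/sessions") => Route::MethodNotAllowed,
        (m, p) => match p.strip_prefix("/sessions/") {
            // Exactly one non-empty segment after /sessions/ is treated as a thread id.
            Some(id) if !id.is_empty() && !id.contains('/') => {
                if m == "DELETE" {
                    Route::DeleteSession(id)
                } else {
                    Route::MethodNotAllowed
                }
            }
            _ => Route::NotFound,
        },
    }
}
```

Keeping routing as a pure function of `(method, path)` makes it testable without opening a socket.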
Config opt-in via [management] section (enabled = false by default).
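A sketch of what the opt-in default might look like, assuming a struct shaped like the PR's `ManagementConfig` (fields `enabled` and `bind`) and a default bind of `0.0.0.0:8090`, the address the triage review mentions. The `listen_addr` helper is hypothetical, and the real struct is presumably deserialized from `config.toml` rather than built by hand.

```rust
use std::net::SocketAddr;

/// Hypothetical mirror of the PR's `ManagementConfig` (fields `enabled`, `bind`).
#[derive(Debug, Clone, PartialEq)]
struct ManagementConfig {
    enabled: bool,
    bind: String,
}

impl Default for ManagementConfig {
    fn default() -> Self {
        Self {
            // Opt-in: nothing listens unless `[management] enabled = true`.
            enabled: false,
            // Assumed default, matching the 0.0.0.0:8090 bind discussed in review.
            bind: "0.0.0.0:8090".to_string(),
        }
    }
}

impl ManagementConfig {
    /// Address to listen on, or `None` if the server is disabled or `bind` does not parse.
    fn listen_addr(&self) -> Option<SocketAddr> {
        if !self.enabled {
            return None;
        }
        self.bind.parse().ok()
    }
}
```

Returning `None` for the disabled case lets startup code skip spawning the listener entirely, which is what "no behavior change for existing deployments" requires.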
K8s deployment and Helm chart updated with livenessProbe/readinessProbe.
Closes #39
Thanks for pushing this forward. We've been running a very similar always-on management API in production (bare metal + Docker) for the past few days, so we wanted to share what we've hit. Happy to rebase onto this design once it lands.

1. The serenity Discord gateway reconnects its WebSocket periodically (roughly every 30 minutes, sooner under packet loss). During the reconnect, … Our Docker healthcheck runs into the same class of issue for a different reason (…). Suggested split:
The current PR's …

2. Genuine question: what's the production scenario? We've never needed it. Our …
Bulk delete would wipe every active user conversation indiscriminately. If someone hits … If there's a real use case (e.g. before a broker upgrade), a …

3. Our … This is currently fork-specific (our …

4. Default bind + auth.
Options we'd consider, ordered by disruption:
We don't feel strongly; the token option is what we'd add if this were being shipped to users we don't control.

None of these are blockers: the core approach (raw TCP, opt-in, zero deps) matches what we independently arrived at, and we'd love to drop our inline implementation in favor of this once it lands. Let me know which of these would be useful as follow-up PRs vs. folded in here.
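For the token option in point 4, here is a minimal sketch of an optional shared-secret check. It assumes a hypothetical `token` setting (not part of this PR) and an `Authorization: Bearer <token>` header; with no token configured, every request is allowed, which matches today's behavior.

```rust
/// Hypothetical optional shared-secret check. `expected` would come from a new
/// config field; `None` keeps the current open behavior.
fn authorized(headers: &[(&str, &str)], expected: Option<&str>) -> bool {
    let expected = match expected {
        None => return true,
        Some(token) => token,
    };
    headers.iter().any(|(name, value)| {
        name.eq_ignore_ascii_case("authorization")
            && value
                .strip_prefix("Bearer ")
                .map_or(false, |given| constant_time_eq(given.as_bytes(), expected.as_bytes()))
    })
}

/// Best-effort constant-time compare: no early exit on the first differing byte
/// (the length check still reveals whether the lengths match).
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}
```

A check like this would naturally sit in front of the `DELETE` handlers at minimum, since those are the destructive endpoints.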
Community Triage Review: 超渡法師 + 普渡法師
Verdict: Changes required before merge

The feature is needed: main has no management API, no K8s health probes, and no session observability. The core design (raw TCP, opt-in, zero deps) is solid and has been independently validated by production experience from other deployments. However, there are compile-breaking and production-breaking issues that need to be addressed first.

🔴 Suggested Changes

[1] pool.rs field name conflict: needs rebase (compile error)
The diff references …

[2] k8s/deployment.yaml adds probes unconditionally (restart loop when management is disabled)
The Helm chart correctly wraps probes in … Fix: wrap them in the same conditional, or remove the probes from the raw manifest and include them only in the Helm chart.

[3] /healthz as both liveness and readiness: restart storm on Discord reconnect
As @ruan330 noted from production experience, serenity's Discord gateway reconnects its WebSocket roughly every 30 minutes. During reconnect, … Suggested split:
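One way to realize the liveness/readiness split in [3], sketched under assumptions: hypothetical `/livez` and `/readyz` handlers (the PR only exposes `/healthz`), where only readiness consults the `discord_connected` flag. A gateway reconnect can then at most mark the pod unready instead of getting it restarted.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical probe state; `discord_connected` mirrors the flag the PR sets in `ready()`.
struct Probes {
    discord_connected: AtomicBool,
}

impl Probes {
    /// Liveness: if this handler runs at all, the process is alive. It deliberately
    /// ignores Discord, so a gateway reconnect never triggers a container restart.
    fn livez(&self) -> (u16, &'static str) {
        (200, "ok")
    }

    /// Readiness: reflects the Discord connection. Failing readiness only removes the
    /// pod from Service endpoints; Kubernetes does not restart it for that.
    fn readyz(&self) -> (u16, &'static str) {
        if self.discord_connected.load(Ordering::Acquire) {
            (200, "ready")
        } else {
            (503, "discord disconnected")
        }
    }
}
```

The design choice is that restart decisions depend only on whether the process itself is wedged, while transient upstream state is confined to readiness.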
🟡 NIT

[4] DELETE /sessions bulk delete: no blast-radius protection
A single mistyped …

[5] 0.0.0.0:8090 bind + no auth: undocumented security assumption
With …

[6] remove_session / remove_all_sessions bypass graceful teardown
Both methods do a raw …

[7] read_line has no size limit
The HTTP request parser uses …

✅ What's good
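For [7], a minimal sketch of a bounded line read using only `std::io`: wrapping the reader in `take` caps how many bytes a single `read_line` call can buffer. The 8 KiB limit and the `read_line_bounded` name are assumptions; the PR's async reader would presumably apply `tokio::io::AsyncReadExt::take` the same way.

```rust
use std::io::{self, BufRead, Read};

/// Assumed cap for a request line or header line (8 KiB).
const MAX_LINE: u64 = 8 * 1024;

/// Like `BufRead::read_line`, but refuses to buffer more than `max` bytes
/// when the peer never sends a newline.
fn read_line_bounded<R: BufRead>(reader: &mut R, buf: &mut String, max: u64) -> io::Result<usize> {
    // `take` limits how many bytes this single call may pull from `reader`.
    let n = reader.by_ref().take(max).read_line(buf)?;
    if n as u64 == max && !buf.ends_with('\n') {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "line exceeds limit"));
    }
    Ok(n)
}
```

The same helper can be reused in the header loop, ideally together with a cap on the number of header lines.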
Summary
@ruan330 feel free to continue. Let me know when you are ready for review.
Summary
Add an optional lightweight HTTP management server (zero new dependencies) for session observability and lifecycle control.
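A std-only sketch of the bookkeeping behind these endpoints, assuming sessions are keyed by Discord thread id and stamped with their last activity. `SessionPool` and `touch` are hypothetical stand-ins for the PR's pool in `src/acp/pool.rs`. A real `remove_session` would also shut the agent down gracefully (the concern in review item [6]); this sketch only drops the entry.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the session pool: thread id -> last activity.
struct SessionPool {
    last_active: HashMap<String, Instant>,
    max_sessions: usize,
}

impl SessionPool {
    fn new(max_sessions: usize) -> Self {
        Self { last_active: HashMap::new(), max_sessions }
    }

    /// Record activity on a thread; refuses new sessions once the pool is full.
    fn touch(&mut self, thread_id: &str) -> bool {
        if !self.last_active.contains_key(thread_id) && self.last_active.len() >= self.max_sessions {
            return false;
        }
        self.last_active.insert(thread_id.to_string(), Instant::now());
        true
    }

    /// GET /sessions: active sessions with idle time as of `now`, sorted by thread id.
    fn list_sessions(&self, now: Instant) -> Vec<(String, Duration)> {
        let mut out: Vec<_> = self
            .last_active
            .iter()
            .map(|(id, &t)| (id.clone(), now.saturating_duration_since(t)))
            .collect();
        out.sort_by(|a, b| a.0.cmp(&b.0));
        out
    }

    /// DELETE /sessions/{thread_id}: true if a session was removed.
    fn remove_session(&mut self, thread_id: &str) -> bool {
        self.last_active.remove(thread_id).is_some()
    }

    /// DELETE /sessions: returns how many sessions were dropped.
    fn remove_all_sessions(&mut self) -> usize {
        let n = self.last_active.len();
        self.last_active.clear();
        n
    }

    fn max_sessions(&self) -> usize {
        self.max_sessions
    }
}
```

Taking `now` as a parameter keeps idle-time computation deterministic in tests.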
Endpoints
- GET /healthz
- GET /sessions
- DELETE /sessions/{thread_id}
- DELETE /sessions
Config
Opt-in via the [management] section in config.toml. Defaults to enabled = false, so existing deployments see no behavior change.
Changes
- src/config.rs: ManagementConfig struct (enabled, bind)
- src/acp/pool.rs: list_sessions(), remove_session(), remove_all_sessions(), max_sessions()
- src/discord.rs: discord_connected AtomicBool set on ready()
- src/management.rs: raw TCP HTTP server (tokio::net::TcpListener, no framework)
- src/main.rs: wire up the management server and shared state
- config.toml.example: [management] section
- k8s/deployment.yaml: livenessProbe + readinessProbe on port 8090
- Helm chart: management.enabled / management.bind in values, configmap, deployment template
Design decisions
- Raw TcpListener keeps the dependency tree unchanged
- AtomicBool for discord status: lightweight, lock-free signaling from the serenity handler
Closes #39
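To illustrate both design decisions together, here is a blocking, std-only sketch. `std::net::TcpListener` stands in for the PR's `tokio::net::TcpListener`, and an `Arc<AtomicBool>` plays the role of `discord_connected`, written by one thread and read by the HTTP handler. The JSON field names and the 503-when-disconnected behavior are assumptions, not the PR's actual `/healthz` payload.

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::TcpStream;
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::Instant;

/// Serve one HTTP/1.1 request on `stream` without any framework.
fn handle(mut stream: TcpStream, started: Instant, discord_connected: &AtomicBool) -> std::io::Result<()> {
    let mut reader = BufReader::new(stream.try_clone()?);

    // Request line, e.g. "GET /healthz HTTP/1.1". (Unbounded here; see review item [7].)
    let mut request_line = String::new();
    reader.read_line(&mut request_line)?;
    let mut parts = request_line.split_whitespace();
    let (method, path) = (parts.next().unwrap_or(""), parts.next().unwrap_or(""));

    // Drain headers up to the blank line so the peer's request is fully consumed.
    loop {
        let mut header = String::new();
        let n = reader.read_line(&mut header)?;
        if n == 0 || header == "\r\n" || header == "\n" {
            break;
        }
    }

    let (status, body) = match (method, path) {
        ("GET", "/healthz") => {
            // Acquire pairs with the Release store made by the Discord handler thread.
            let connected = discord_connected.load(Ordering::Acquire);
            let status = if connected { "200 OK" } else { "503 Service Unavailable" };
            let body = format!(
                "{{\"uptime_secs\":{},\"discord_connected\":{}}}",
                started.elapsed().as_secs(),
                connected
            );
            (status, body)
        }
        _ => ("404 Not Found", String::from("{\"error\":\"not found\"}")),
    };

    write!(
        stream,
        "HTTP/1.1 {}\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        status,
        body.len(),
        body
    )?;
    stream.flush()
}
```

An `AtomicBool` is enough here because the handler only needs the latest value of a single flag; no other data is published alongside it, so a mutex would add nothing.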