v0.2.0: WebSocket Streaming
What's new
- Multi-stage WebSocket streaming via /api/ws
- Query cancellation mid-stream with partial answer preserved in audit
- Real-time status events: retrieving → generating → verifying_trust
- Time to first token (TTFT): 3-8s → <500ms (measured in E2E tests)
- Groq rate limit (429) → clear GROQ_RATE_LIMIT error with retry_after_ms
- Exponential backoff auto-reconnect (1s → 2s → 4s → 8s cap)
- uvicorn WS ping/pong keepalive (20s interval)
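
The reconnect schedule listed above (1s → 2s → 4s → 8s cap) can be sketched as a small delay function. This is an illustration only, written in Python for consistency with the backend; the function name `reconnect_delay_s` and its parameters are hypothetical and not taken from the release, and the real client may add jitter or a retry limit.

```python
def reconnect_delay_s(attempt: int, base_s: float = 1.0, cap_s: float = 8.0) -> float:
    """Delay before reconnect attempt `attempt` (0-based).

    The delay doubles on each attempt and is clamped to `cap_s`.
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    # Clamp the exponent so very large attempt counts cannot overflow a float.
    exponent = min(attempt, 32)
    return min(cap_s, base_s * (2 ** exponent))


# Delays for the first six attempts under the assumed defaults.
schedule = [reconnect_delay_s(n) for n in range(6)]
print(schedule)  # [1.0, 2.0, 4.0, 8.0, 8.0, 8.0]
```

Once the cap is reached, every further attempt waits 8s, so a long outage does not push the delay up indefinitely.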
Test coverage
- 14 backend tests (unit + E2E)
- 12 frontend tests (hook state machine + stale ID filtering)
- ruff lint clean
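
Stale ID filtering, as covered by the frontend tests above, can be illustrated with a minimal sketch: messages that belong to an earlier query are dropped once a new query has started. The actual hook lives in the frontend; this version is in Python for consistency with the other examples, and the `query_id`, `type`, and `text` field names are assumptions rather than the project's wire format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryState:
    current_id: Optional[str] = None  # ID of the query the UI is showing
    text: str = ""                    # Answer text accumulated so far


def start_query(state: QueryState, query_id: str) -> None:
    """Begin a new query and discard text from the previous one."""
    state.current_id = query_id
    state.text = ""


def handle_message(state: QueryState, msg: dict) -> bool:
    """Apply a message; return False if it belongs to a stale query."""
    if msg.get("query_id") != state.current_id:
        return False  # Late message from an older query: ignore it.
    if msg.get("type") == "token":
        state.text += msg["text"]
    return True


state = QueryState()
start_query(state, "q1")
handle_message(state, {"query_id": "q1", "type": "token", "text": "old"})
start_query(state, "q2")
# A token from q1 arrives after q2 has started and is dropped.
stale_applied = handle_message(state, {"query_id": "q1", "type": "token", "text": "late"})
handle_message(state, {"query_id": "q2", "type": "token", "text": "new"})
print(state.text, stale_applied)  # new False
```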
Why this matters
Users previously waited 3-8s staring at a spinner. Now they see the first token in under 500ms and can cancel expensive queries mid-stream.
Phase 1 tasks
- P1-1: Backend WS endpoint + connection manager
- P1-2: QueryTask async state machine with cancellation
- P1-3: Groq streaming integration + error handling
- P1-4: Frontend useWebSocketQuery hook + reconnect
- P1-5: Streaming UI components with multi-stage status
- P1-GATE: E2E verification + metrics + this release
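
The async state machine with cancellation (P1-2) and the multi-stage status events can be sketched roughly as follows. This is a simplified illustration, not the project's QueryTask: the event shapes, the `run_query` and `fake_tokens` names, and the `audit` dictionary are all hypothetical, and the token source is a stand-in for the Groq stream.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable

Emit = Callable[[dict], Awaitable[None]]


async def fake_tokens() -> AsyncIterator[str]:
    """Stand-in for the LLM token stream (hypothetical)."""
    for tok in ("Paris ", "is ", "the ", "capital."):
        await asyncio.sleep(0)  # Yield control, as a real network read would.
        yield tok


async def run_query(emit: Emit, cancel: asyncio.Event, audit: dict) -> None:
    partial: list[str] = []
    await emit({"type": "status", "stage": "retrieving"})
    await emit({"type": "status", "stage": "generating"})
    async for tok in fake_tokens():
        if cancel.is_set():
            # Preserve whatever was generated so far in the audit record.
            audit.update(status="cancelled", answer="".join(partial))
            await emit({"type": "cancelled"})
            return
        partial.append(tok)
        await emit({"type": "token", "text": tok})
    await emit({"type": "status", "stage": "verifying_trust"})
    audit.update(status="completed", answer="".join(partial))
    await emit({"type": "done"})


async def demo() -> tuple[list, dict]:
    events: list[dict] = []
    audit: dict = {}
    cancel = asyncio.Event()

    async def emit(event: dict) -> None:
        events.append(event)
        # Simulate the client cancelling after it has seen two tokens.
        if sum(e["type"] == "token" for e in events) == 2:
            cancel.set()

    await run_query(emit, cancel, audit)
    return events, audit


events, audit = asyncio.run(demo())
print(audit)  # {'status': 'cancelled', 'answer': 'Paris is '}
```

Checking the cancel flag between tokens keeps the partial answer consistent with what the client has already received. A real endpoint would also need to handle the socket closing while a query is in flight.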