perf: 10K concurrent sessions scalability + staging chat fixes #51

Merged: nedasvi merged 5 commits into main from feat/design-system on May 28, 2026
nedasvi commented on May 28, 2026

Summary

Performance — 10K concurrent sessions

  • DB pool: pool_size=100, max_overflow=50, pool_pre_ping, pool_recycle=3600 (was pool_size=5 + max_overflow=10, 15 connections total)
  • Singleton LLM HTTP client per (api_key, base_url) — eliminates per-call TLS handshakes
  • Singleton KB httpx client with Limits(max_connections=200)
  • Shared Redis pub/sub listener for tool results: one asyncio task per process instead of one per pending tool call (O(N) → O(1))
  • Redis sliding-window rate limiter for SDK client tokens (correct across workers)
  • App org_id cache in usage tracking: 1 DB hit per unique app_id per process lifetime instead of one per LLM call
  • Multi-worker: --workers ${BACKEND_WORKERS:-4} in docker-compose.prod.yml
  • DB migration 023: partial index on chat_sessions(app_id, device_id, last_activity_at) WHERE status='active'
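The shared-listener item above can be sketched as follows. This is a minimal illustration, not the PR's implementation: an `asyncio.Queue` stands in for the Redis pub/sub stream, and `ToolResultDispatcher`, `wait_for`, and `publish` are hypothetical names. The point it shows is that each pending tool call registers a future in a dict, while a single background task routes every incoming result.

```python
import asyncio
from typing import Optional


class ToolResultDispatcher:
    """Route results to waiters using one listener task per process."""

    def __init__(self) -> None:
        # Stand-in for the Redis pub/sub stream (assumption for this sketch).
        self._incoming: asyncio.Queue = asyncio.Queue()
        self._waiters: dict = {}
        self._task: Optional[asyncio.Task] = None

    def start(self) -> None:
        # Exactly one listener task, no matter how many calls are pending.
        if self._task is None:
            self._task = asyncio.create_task(self._listen())

    async def stop(self) -> None:
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
            self._task = None

    async def _listen(self) -> None:
        while True:
            call_id, result = await self._incoming.get()
            waiter = self._waiters.pop(call_id, None)
            if waiter is not None and not waiter.done():
                waiter.set_result(result)

    async def publish(self, call_id: str, result: str) -> None:
        await self._incoming.put((call_id, result))

    async def wait_for(self, call_id: str, timeout: float = 5.0) -> str:
        # Registering a waiter creates a future, not a new listener task.
        future = asyncio.get_running_loop().create_future()
        self._waiters[call_id] = future
        try:
            return await asyncio.wait_for(future, timeout)
        finally:
            self._waiters.pop(call_id, None)


async def demo() -> list:
    dispatcher = ToolResultDispatcher()
    dispatcher.start()
    # These tasks represent the request handlers that are already running.
    pending = [asyncio.create_task(dispatcher.wait_for(f"call-{i}")) for i in range(3)]
    await asyncio.sleep(0)  # let the waiters register before publishing
    for i in range(3):
        await dispatcher.publish(f"call-{i}", f"result-{i}")
    results = await asyncio.gather(*pending)
    await dispatcher.stop()
    return list(results)
```

With this shape, the number of listener tasks stays constant as pending tool calls grow; only the dict of futures scales with N.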

Bug fixes

  • Orchestrator concurrent session fix: asyncio.gather with shared AsyncSession caused IllegalStateChangeError — serialized to sequential calls
  • SDK origin wildcard: cors_origins=["*"] now correctly bypasses origin check
  • Exceptions in _run_turn are now logged (previously swallowed silently)
  • Session expiry task: logs failures + exponential backoff (was silent pass)
  • Startup warning when Redis not configured
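The wildcard fix in the list above can be illustrated with a small sketch. The function name and signature here are assumptions (the PR only names `_is_allowed_origin`, not its parameters); the idea is that a plain membership test never matches `"*"` against a real origin, so the wildcard has to be checked first.

```python
def is_allowed_origin(origin: str, allowed_origins: list) -> bool:
    """Return True when origin is permitted by the configured list."""
    # A literal membership test alone compares "https://app.example" to "*",
    # which never matches, so the wildcard is handled before the lookup.
    if "*" in allowed_origins:
        return True
    return origin in set(allowed_origins)
```

For example, `is_allowed_origin("https://app.example", ["*"])` is accepted, while the same origin against `["https://other.example"]` is rejected.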

Infra

  • Singleton KB/LLM HTTP clients properly init/closed in FastAPI lifespan

🤖 Generated with Claude Code

nedasvi and others added 5 commits May 27, 2026 16:16
- DB pool: pool_size=100, max_overflow=50, pool_pre_ping, pool_recycle=3600
  (was 5+10=15 connections total — exhausted at ~100 concurrent turns)
- Singleton LLM client per (api_key, base_url) — eliminates per-call TLS handshakes
- Singleton KB httpx.AsyncClient with Limits(max_connections=200)
  — eliminates per-request client creation for internal KB calls
- Shared Redis psubscribe listener for tool results — 1 asyncio task per process
  instead of 1 per pending tool call (O(N) → O(1))
- Redis sliding-window rate limiter for SDK client tokens — correct across workers
  with in-process deque fallback when Redis unavailable
- App org_id cache in usage_tracking — 1 DB hit per unique app_id per process
  lifetime instead of per LLM call
- Session expiry task: log failures + exponential backoff (was silent pass)
- Startup warning when Redis not configured (in-memory fallbacks active)
- Proper lifespan hooks for singleton HTTP client and shared listener init/close
- docker-compose.prod.yml: --workers ${BACKEND_WORKERS:-4} (was 1 process)
- Migration 023: partial index on chat_sessions(app_id, device_id, last_activity_at)
  WHERE status='active' for device-based session reuse lookup
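The in-process deque fallback mentioned for the rate limiter could look roughly like this. It is a sketch under assumptions: the class name, the `limit` and `window_seconds` parameters, and the `now` override (added so the behavior can be checked deterministically) are all hypothetical. State here lives in one process, which is why this variant alone is not correct across workers.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `limit` requests per token in any `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        # One deque of request timestamps per client token.
        self._hits = defaultdict(deque)

    def allow(self, token: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[token]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

A Redis-backed version would keep the same timestamps in a shared structure so that every worker sees the same window.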

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
_is_allowed_origin did literal set membership — CORS_ORIGINS=["*"]
never matched any real origin. Now "*" bypasses the check.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
asyncio.gather with two coroutines sharing the same SQLAlchemy async
session causes IllegalStateChangeError (concurrent _connection_for_bind).
Make the two db-using coroutines sequential.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
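The failure mode described in the commit above can be reproduced in miniature. This sketch does not use SQLAlchemy: `FakeSession` is a hypothetical stand-in that rejects overlapping use, and the query names are made up. It shows why two coroutines under `asyncio.gather` collide on one session, and that awaiting them one after the other avoids the overlap.

```python
import asyncio


class FakeSession:
    """Stand-in for a session that rejects overlapping use (hypothetical)."""

    def __init__(self) -> None:
        self._busy = False

    async def execute(self, query: str) -> str:
        if self._busy:
            raise RuntimeError("session is already in use")
        self._busy = True
        try:
            await asyncio.sleep(0)  # simulate I/O, yielding to other tasks
            return f"rows for {query}"
        finally:
            self._busy = False


async def load_concurrently(session: FakeSession) -> list:
    # Both coroutines touch the same session at once: the second one fails.
    return await asyncio.gather(
        session.execute("history"),
        session.execute("settings"),
        return_exceptions=True,
    )


async def load_sequentially(session: FakeSession) -> list:
    # One statement at a time on the shared session.
    history = await session.execute("history")
    settings = await session.execute("settings")
    return [history, settings]
```

If the two queries really need to run in parallel, the usual alternative is one session per coroutine rather than a shared one.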
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
nedasvi merged commit c90eb93 into main on May 28, 2026
6 checks passed
nedasvi deleted the feat/design-system branch on May 28, 2026 09:48