perf: 10K concurrent sessions scalability + staging chat fixes #51
Merged
- DB pool: pool_size=100, max_overflow=50, pool_pre_ping, pool_recycle=3600
(was 5+10=15 connections total — exhausted at ~100 concurrent turns)
- Singleton LLM client per (api_key, base_url) — eliminates per-call TLS handshakes
- Singleton KB httpx.AsyncClient with Limits(max_connections=200)
— eliminates per-request client creation for internal KB calls
- Shared Redis psubscribe listener for tool results — 1 asyncio task per process
instead of 1 per pending tool call (O(N) → O(1))
- Redis sliding-window rate limiter for SDK client tokens — correct across workers
with in-process deque fallback when Redis unavailable
- App org_id cache in usage_tracking — 1 DB hit per unique app_id per process
lifetime instead of per LLM call
- Session expiry task: log failures + exponential backoff (was silent pass)
- Startup warning when Redis not configured (in-memory fallbacks active)
- Proper lifespan hooks for singleton HTTP client and shared listener init/close
- docker-compose.prod.yml: --workers ${BACKEND_WORKERS:-4} (was 1 process)
- Migration 023: partial index on chat_sessions(app_id, device_id, last_activity_at)
WHERE status='active' for device-based session reuse lookup
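The in-process deque fallback mentioned in the rate-limiter item above could look roughly like the sketch below. It is an illustration only: the class name, parameters, and injectable clock are hypothetical and not taken from the PR, and the Redis-backed path is not shown. Because the state lives in one process, this fallback only limits each worker independently.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class InProcessSlidingWindowLimiter:
    """Hypothetical per-process fallback; correct within a single worker only."""

    def __init__(
        self,
        limit: int,
        window_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit          # max requests allowed per window
        self.window_s = window_s    # window length in seconds
        self.clock = clock          # injectable so the limiter can be tested
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, token: str) -> bool:
        now = self.clock()
        hits = self._hits[token]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Each token keeps its own deque of request timestamps, so a burst on one SDK client token does not affect another.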
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
_is_allowed_origin did a literal set-membership check, so CORS_ORIGINS=["*"] never matched any real origin. Now "*" bypasses the check.
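A minimal sketch of the wildcard handling described above, assuming the check receives the request origin and the configured origin list. The function name and signature here are illustrative and may differ from the real _is_allowed_origin.

```python
from typing import Iterable


def is_allowed_origin(origin: str, allowed: Iterable[str]) -> bool:
    """Hypothetical origin check; the real helper's signature is not shown in the PR."""
    allowed_set = set(allowed)
    # "*" means "any origin", so it must short-circuit before the literal lookup.
    if "*" in allowed_set:
        return True
    # Otherwise fall back to exact membership, which is all the old check did.
    return origin in allowed_set
```

Without the early return, a configuration of ["*"] would only ever match a request whose Origin header was literally "*".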
Running two coroutines that share the same SQLAlchemy async session under asyncio.gather raises IllegalStateChangeError (concurrent _connection_for_bind). The two DB-using coroutines now run sequentially.
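The failure mode and the fix can be illustrated without a database. The sketch below uses a hypothetical FakeSession that, like a real async session, rejects overlapping use. It raises a plain RuntimeError rather than SQLAlchemy's IllegalStateChangeError, and the query strings are placeholders.

```python
import asyncio
from typing import List


class FakeSession:
    """Stand-in for an async DB session that rejects overlapping use."""

    def __init__(self) -> None:
        self._busy = False

    async def execute(self, statement: str) -> str:
        if self._busy:
            raise RuntimeError("session used concurrently")
        self._busy = True
        try:
            await asyncio.sleep(0)  # yield control, as a network round trip would
            return f"result of {statement}"
        finally:
            self._busy = False


async def run_concurrently(session: FakeSession) -> List[str]:
    # Failing pattern: both coroutines touch the same session at once.
    return await asyncio.gather(
        session.execute("query_a"), session.execute("query_b")
    )


async def run_sequentially(session: FakeSession) -> List[str]:
    # Fixed pattern: the second call starts only after the first finishes.
    first = await session.execute("query_a")
    second = await session.execute("query_b")
    return [first, second]
```

If the two queries genuinely need to run in parallel, the usual alternative is to give each coroutine its own session. The commit message only describes the sequential fix.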
Summary
Performance — 10K concurrent sessions
- pool_size=100, max_overflow=50, pool_pre_ping, pool_recycle=3600 (was 5+10=15)
- Limits(max_connections=200)
- --workers ${BACKEND_WORKERS:-4} in docker-compose.prod.yml
- chat_sessions(app_id, device_id, last_activity_at) WHERE status='active'
Bug fixes
- asyncio.gather with shared AsyncSession caused IllegalStateChangeError; serialized to sequential calls
- cors_origins=["*"] now correctly bypasses origin check
- _run_turn (was silently swallowed)
- pass)
Infra
🤖 Generated with Claude Code