Sonic is a WebSocket Responses gateway for vLLM (/v1/chat/completions) with Phase 2 agentic behavior.
Implemented in this repo:
- Multi-step agent loop (response.step.* events)
- Tool call protocol via strict JSON ({"tool_call":...})
- Client tool loop (tool_result.submit) and optional server tools
- Structured output mode (response_format.type=json_schema) with validation/retry
- Durable SQLite state for threads, responses, steps, messages, tool calls/results
- response.cancel support
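
The strict-JSON tool call convention can be illustrated with a small detector. This is a minimal sketch and not the code in schemas.py: the function name `extract_tool_call` is hypothetical, and "strict" is assumed here to mean that the whole model output must parse as a single JSON object with a top-level "tool_call" key. The inner payload is passed through untouched because its fields are not specified in this section.

```python
import json
from typing import Any, Optional


def extract_tool_call(model_output: str) -> Optional[Any]:
    """Return the value under "tool_call", or None for ordinary text.

    Hypothetical helper: assumes a tool call is the entire output,
    encoded as one JSON object with a top-level "tool_call" key.
    """
    try:
        parsed = json.loads(model_output.strip())
    except json.JSONDecodeError:
        # Not JSON at all, so treat it as a plain text answer.
        return None
    if isinstance(parsed, dict) and "tool_call" in parsed:
        # The payload shape is left to the protocol; no fields are assumed.
        return parsed["tool_call"]
    return None
```

With this sketch, `extract_tool_call("The answer is 84.")` returns None, so the caller would stream the text as a normal answer instead of starting a tool step.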
Repository:
- GitHub: https://github.com/mitkox/sonic
- License: MIT

- main.py: FastAPI + WebSocket session handling
- agent_loop.py: core multi-step agent execution loop
- state_store.py: in-memory response state backed by persistence
- persistence.py: SQLite schema and persistence operations
- schemas.py: protocol validation/parsing and system prompt contract
- tools/registry.py: tool registry + server tool execution policies
- tools/builtins/: built-in server tools scaffolding
- vllm_client.py: streamed backend SSE parsing
- scripts/test_ws_client.py: plain and agentic streaming test client
- scripts/demo_tool_client.py: tool-loop demo
- scripts/demo_structured_output.py: structured-output demo

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

source venv/bin/activate
python -m uvicorn main:app --host 0.0.0.0 --port 9000

Health check:
curl http://localhost:9000/healthz

- VLLM_URL (default: http://localhost:8000)
- MODEL_NAME (default: mitko)
- ALLOWED_MODELS (default: MODEL_NAME)
- PORT (default: 9000)
- STATE_DB_PATH (default: ./sonic_state.db)
- MAX_STEPS (default: 8)
- MAX_TOOL_CALLS (default: 16)
- TOOL_WAIT_TIMEOUT_SECONDS (default: 120)
- BACKEND_CONNECT_TIMEOUT / BACKEND_READ_TIMEOUT
- DEFAULT_TEMPERATURE (default: 0.2)
- DEFAULT_TOP_P (default: 0.9)
- MAX_INPUT_BYTES (default: 262144)
- RATE_LIMIT_PER_MINUTE (default: 120)
- REQUIRE_API_KEY / API_KEY
- TOOL_ALLOWLIST (comma-separated, required for server tool execution)
- ENABLE_SHELL_EXEC (default: false)
- ENABLE_HTTP_GET (default: false)
- FILESYSTEM_ROOT (default: .)

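
The environment variables and defaults listed above could be read roughly as follows. This is an illustrative sketch, not the loader used in main.py: `GatewaySettings`, `load_settings`, and the helper functions are hypothetical names, and parsing ALLOWED_MODELS as comma-separated is an assumption (only TOOL_ALLOWLIST is documented as comma-separated). BACKEND_CONNECT_TIMEOUT, BACKEND_READ_TIMEOUT, REQUIRE_API_KEY, and API_KEY are omitted because no defaults are stated for them.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


def _as_bool(value: str) -> bool:
    # Assumed truthy spellings; the gateway may accept a different set.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def _split_csv(value: str) -> List[str]:
    return [item.strip() for item in value.split(",") if item.strip()]


@dataclass(frozen=True)
class GatewaySettings:  # hypothetical container, for illustration only
    vllm_url: str
    model_name: str
    allowed_models: List[str]
    port: int
    state_db_path: str
    max_steps: int
    max_tool_calls: int
    tool_wait_timeout_seconds: float
    default_temperature: float
    default_top_p: float
    max_input_bytes: int
    rate_limit_per_minute: int
    tool_allowlist: List[str]
    enable_shell_exec: bool
    enable_http_get: bool
    filesystem_root: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> GatewaySettings:
    """Build settings from a mapping, falling back to the documented defaults."""
    env = os.environ if env is None else env
    model_name = env.get("MODEL_NAME", "mitko")
    return GatewaySettings(
        vllm_url=env.get("VLLM_URL", "http://localhost:8000"),
        model_name=model_name,
        # ALLOWED_MODELS falls back to MODEL_NAME when unset.
        allowed_models=_split_csv(env.get("ALLOWED_MODELS", model_name)),
        port=int(env.get("PORT", "9000")),
        state_db_path=env.get("STATE_DB_PATH", "./sonic_state.db"),
        max_steps=int(env.get("MAX_STEPS", "8")),
        max_tool_calls=int(env.get("MAX_TOOL_CALLS", "16")),
        tool_wait_timeout_seconds=float(env.get("TOOL_WAIT_TIMEOUT_SECONDS", "120")),
        default_temperature=float(env.get("DEFAULT_TEMPERATURE", "0.2")),
        default_top_p=float(env.get("DEFAULT_TOP_P", "0.9")),
        max_input_bytes=int(env.get("MAX_INPUT_BYTES", "262144")),
        rate_limit_per_minute=int(env.get("RATE_LIMIT_PER_MINUTE", "120")),
        # An empty allowlist means no server tool may run.
        tool_allowlist=_split_csv(env.get("TOOL_ALLOWLIST", "")),
        enable_shell_exec=_as_bool(env.get("ENABLE_SHELL_EXEC", "false")),
        enable_http_get=_as_bool(env.get("ENABLE_HTTP_GET", "false")),
        filesystem_root=env.get("FILESYSTEM_ROOT", "."),
    )
```

Passing an explicit mapping, for example `load_settings({"PORT": "9100"})`, keeps the sketch easy to exercise in tests without touching the real process environment.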
Performance notes:
- Gateway now reuses persistent HTTP connections to vLLM (better batching/throughput).
- For lowest latency, keep STATE_DB_PATH on fast local disk (or tmpfs if durability is not needed).
- Keep DEFAULT_TEMPERATURE<=0.2 and DEFAULT_TOP_P<=0.9 for stable tool behavior on Qwen3-Coder-Next.

Plain streaming and continuation:
python scripts/test_ws_client.py --model mitko

Agentic tool streaming (auto-responds to calc tool calls):
python scripts/test_ws_client.py --agentic --first "Calculate 12*7 using tool calc"

Dedicated demos:
python scripts/demo_tool_client.py
python scripts/demo_structured_output.py
python scripts/showcase_sonic.py --url ws://localhost:9000/v1/responses --model mitko --concurrent-clients 8
scripts/run_showcase.sh

showcase_sonic.py runs a full value demo:
- stateful continuation + exact 5-word summary
- agentic tool loop
- structured JSON generation
- streaming profile with TTFT/token-rate stats
- cancellation
- concurrent request throughput (useful when vLLM batching is enabled)
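
For readers unfamiliar with the streaming metrics, TTFT (time to first token) and token rate can be derived from arrival timestamps along these lines. This is a sketch under stated assumptions and not the logic in showcase_sonic.py: the function name `streaming_profile` is hypothetical, and each streamed delta is treated as roughly one token, which may not match how the script counts.

```python
from typing import Dict, Sequence


def streaming_profile(request_sent_at: float, chunk_times: Sequence[float]) -> Dict[str, float]:
    """Compute TTFT and an approximate rate from monotonic timestamps (seconds).

    Hypothetical helper: chunk_times holds the arrival time of each streamed
    delta, in order, and one delta is assumed to carry about one token.
    """
    if not chunk_times:
        raise ValueError("no chunks received")
    ttft = chunk_times[0] - request_sent_at
    stream_duration = chunk_times[-1] - chunk_times[0]
    # The first chunk marks the start of the window, so only later chunks count.
    if stream_duration > 0:
        rate = (len(chunk_times) - 1) / stream_duration
    else:
        rate = 0.0
    return {"ttft_seconds": ttft, "chunks_per_second": rate}
```

Timestamps from `time.monotonic()` are a reasonable input here, since wall-clock time can jump during a measurement.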
run_showcase.sh is the all-in-one entrypoint:
- runs the full automated test suite (pytest over all tests)
- then runs the live showcase scorecard
- then runs live trace demos for agentic tool calling and structured output

Optional env vars for run_showcase.sh:
- SONIC_CONCURRENCY=16 (live concurrency level)
- SONIC_RUN_TESTS=0 (skip pytest phase)
- SONIC_RUN_LIVE=0 (tests-only mode)
- SONIC_RUN_DEMO_TRACES=0 (skip trace demos)
- SONIC_PYTEST_ARGS="-q -k gateway" (custom pytest args)

Inbound:
- response.create
- tool_result.submit
- response.cancel

Outbound:
- response.created
- response.in_progress
- response.output_text.delta
- response.step.created
- response.step.completed
- response.tool_call.created
- response.tool_call.completed
- response.tool_result.waiting
- response.tool_result.received
- response.completed
- error

Each response-scoped event includes response_id and thread_id; step events include step_id.
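
A client can check the ID rule above before acting on an event. The sketch below is illustrative only: `missing_ids` is a hypothetical helper, the event type is assumed to live under a "type" key, and "response-scoped" is interpreted as any outbound event whose type starts with "response." (so the bare error event is not required to carry IDs here).

```python
from typing import Any, List, Mapping

OUTBOUND_TYPES = {
    "response.created",
    "response.in_progress",
    "response.output_text.delta",
    "response.step.created",
    "response.step.completed",
    "response.tool_call.created",
    "response.tool_call.completed",
    "response.tool_result.waiting",
    "response.tool_result.received",
    "response.completed",
    "error",
}


def missing_ids(event: Mapping[str, Any]) -> List[str]:
    """List the ID fields an outbound event should carry but does not.

    Assumes the event type is stored under the "type" key.
    """
    event_type = event.get("type", "")
    required: List[str] = []
    if event_type in OUTBOUND_TYPES and event_type.startswith("response."):
        required += ["response_id", "thread_id"]
    if event_type in OUTBOUND_TYPES and event_type.startswith("response.step."):
        required.append("step_id")
    return [key for key in required if key not in event]
```

An empty list means the event satisfies the rule; a non-empty list names the fields a test client might want to report.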
source venv/bin/activate
pytest -q
pytest -q tests/test_acceptance_wow.py
scripts/run_showcase.sh

CI:
- GitHub Actions runs pytest -q on pushes/PRs.

Service file: systemd/sonic-ws-gateway.service
Example env file (/etc/sonic-ws-gateway.env):
VLLM_URL=http://localhost:8000
MODEL_NAME=mitko
PORT=9000
STATE_DB_PATH=/var/lib/sonic/state.db
MAX_STEPS=8
MAX_TOOL_CALLS=16
TOOL_WAIT_TIMEOUT_SECONDS=120
REQUIRE_API_KEY=false

See CONTRIBUTING.md for development workflow and PR expectations.