Lightweight Prefill-Decode disaggregated proxy for LLM serving.
xPyD-proxy routes inference requests between prefill and decode nodes, enabling PD-disaggregated LLM serving with load balancing, health monitoring, and fault tolerance.
- PD disaggregation — separate prefill and decode nodes for optimal resource utilization
- Multiple scheduling policies — round-robin, consistent hash, cache-aware, power-of-two
- Resilience — circuit breaker, health monitoring, automatic failover
- Multi-model routing — serve multiple models through a single proxy
- OpenAI-compatible API — drop-in replacement for vLLM/OpenAI endpoints
- YAML configuration — declarative topology and settings
pip install xpyd-proxyOr as part of the full xPyD toolkit:
pip install xpyd# Start with YAML config
xpyd proxy --config proxy.yaml
# Or with CLI args
xpyd proxy --model my-model \
--prefill 127.0.0.1:8001 \
--decode 127.0.0.1:8002| Component | Description |
|---|---|
| xpyd-proxy | PD-disaggregated proxy |
| xpyd-sim | OpenAI-compatible inference simulator |
| xpyd-bench | Benchmarking & planning tool |
📖 Full Guide → | 💡 Examples → | 🏗️ Contributing →
Apache 2.0 — see LICENSE