Is there an example of deepseek-v4-pro PD disaggregation? #29150
Replies: 2 comments 2 replies
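Looking at this error, the first thing I would check is the bootstrap information injected by the router. Your prefill worker currently sets:
--disaggregation-bootstrap-port 9889
but the router is launched with:
--prefill http://<prefill-host>:30000 \
--decode http://<decode-host>:30001
In the current router arguments, the form is:
--prefill URL [BOOTSTRAP_PORT]
If the second argument is omitted, the code will set … I suggest first changing the router to pass the bootstrap port explicitly:
python3 -m sglang_router.launch_router \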
--pd-disaggregation \
--prefill http://<prefill-host>:30000 9889 \
--decode http://<decode-host>:30001 \
--host 0.0.0.0 \
--port 13784 \
--disable-circuit-breaker \
--health-check-interval-secs 999999
Then confirm the following one by one:
export SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT=600
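export SGLANG_DISAGGREGATION_WAITING_TIMEOUT=600
This is not a real fix; it only helps determine whether the problem is "slow/congested" or "bootstrap information/network unreachable". In the official documentation both of these default to 300 s; your log … In addition, Mooncake's …
The minimal verification path I would follow is:
If adding …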
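To separate the "bootstrap information/network unreachable" case from the "slow/congested" case, one option is to probe the relevant ports from the decode host before raising any timeouts. The sketch below is an illustration, not part of SGLang: `can_connect` is a hypothetical helper, and it assumes the bootstrap service accepts plain TCP connections on the port configured with `--disaggregation-bootstrap-port`. A successful connection only shows that the port is reachable, not that the bootstrap handshake itself works.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example use (the host name is a placeholder, replace it with the real one):
#   for port in (30000, 9889):
#       print(port, can_connect("<prefill-host>", port))
```

If the serving port answers but the bootstrap port does not, a firewall rule or a mismatched port is a more likely cause than congestion.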
There are two more pitfalls in this set of commands:
# prefill all nodes
--disaggregation-bootstrap-port 9889
# decode all nodes
--disaggregation-bootstrap-port 9890
In addition, in your bench command … during concurrent stress testing …
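The port settings discussed in this thread can be cross-checked mechanically before launching anything. The following is a minimal sketch, not SGLang code: `prefill_bootstrap_port` is a hypothetical helper that scans a router argument list for the `--prefill URL [BOOTSTRAP_PORT]` form quoted earlier and returns the port if one was given, so it can be compared with the prefill worker's `--disaggregation-bootstrap-port`. It only inspects the first `--prefill` entry and does not reproduce how the router itself parses arguments.

```python
from typing import List, Optional


def prefill_bootstrap_port(router_args: List[str]) -> Optional[int]:
    """Return the port that follows `--prefill URL`, or None if it is omitted."""
    for i, arg in enumerate(router_args):
        if arg == "--prefill":
            port_index = i + 2  # position right after the URL
            if port_index < len(router_args) and router_args[port_index].isdigit():
                return int(router_args[port_index])
            return None
    return None


# Port set on the prefill worker in this thread.
WORKER_BOOTSTRAP_PORT = 9889

# Router arguments as originally launched (host placeholders kept as-is).
original_args = [
    "--pd-disaggregation",
    "--prefill", "http://<prefill-host>:30000",
    "--decode", "http://<decode-host>:30001",
]

# Router arguments with the bootstrap port passed explicitly.
explicit_args = [
    "--pd-disaggregation",
    "--prefill", "http://<prefill-host>:30000", "9889",
    "--decode", "http://<decode-host>:30001",
]

print(prefill_bootstrap_port(original_args) == WORKER_BOOTSTRAP_PORT)  # False
print(prefill_bootstrap_port(explicit_args) == WORKER_BOOTSTRAP_PORT)  # True
```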
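Are there any deployment practices for deepseek-v4-pro PD disaggregation? I deployed by following the official guide at https://docs.sglang.io/cookbook/autoregressive/DeepSeek/DeepSeek-V4#hw=h100&variant=pro&quant=fp4&strategy=high-throughput&nodes=multi-2 and ran into problems.
1P1D deployment:
Inference works normally, but the concurrent evaluation reports an error: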