env
master node
hostname -I  # e.g. aa.aa.aa.aa
TP_SOCKET_IFNAME=th0 GLOO_SOCKET_IFNAME=th0 NCCL_SOCKET_IFNAME=th0 python -m sglang.launch_server --model-path path/to/qwq --tp 2 --dist-init-addr aa.aa.aa.aa:20000 --nnodes 2 --node-rank 0
master host ip is: bb.bb.bb.bb
worker node
hostname -I  # e.g. cc.cc.cc.cc
TP_SOCKET_IFNAME=th0 GLOO_SOCKET_IFNAME=th0 NCCL_SOCKET_IFNAME=th0 python -m sglang.launch_server --model-path path/to/qwq --tp 2 --dist-init-addr bb.bb.bb.bb:20000 --nnodes 2 --node-rank 1
result
netstat shows that a TCP connection is established on port 20000.
tcpdump captured the TCP packets and showed that the master was trying to establish a connection to the worker node:
aa.aa.aa.aa > cc.cc.cc.cc: Flags [S], seq 34745689, win 64240, options [mss 1460, sackOk, TS val...]
cc.cc.cc.cc is obviously unreachable from the master.
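For anyone who wants to reproduce the reachability check without tcpdump, here is a minimal sketch in Python. It only tests whether a plain TCP connection from the current machine to a given host and port can be opened. The helper name `can_connect` and the command-line arguments are hypothetical and not part of sglang; the post does not show which port the master dials on the worker, so host and port are left as inputs.

```python
import socket
import sys


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds.

    Hypothetical helper for illustration only; it is not part of sglang.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, unreachable routes and name resolution errors.
        return False


if __name__ == "__main__":
    # Usage (assumed): python check_tcp.py <host> <port>
    if len(sys.argv) != 3:
        sys.exit("usage: python check_tcp.py <host> <port>")
    target_host, target_port = sys.argv[1], int(sys.argv[2])
    state = "reachable" if can_connect(target_host, target_port) else "NOT reachable"
    print(f"{target_host}:{target_port} is {state}")
```

Running it on the master against the worker address, and on the worker against the address passed to `--dist-init-addr`, should show which direction fails. A SYN with no reply, as in the capture above, would show up here as a timeout and the helper would return False.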