Need help with multi-node (multi-machine), multi-GPU Ray + DeepSpeed ZeRO-3 / FSDP #10306
Unanswered
wizounovziki
asked this question in
Q&A
Replies: 0 comments
My setup:
2 machines, 16 GPUs (A800)
It seems that DeepSpeed ZeRO-3 is not supported with Ray + LLaMA-Factory, so I switched to FSDP and followed the steps at https://llamafactory.readthedocs.io/en/latest/advanced/distributed.html#id31, using examples/accelerate/fsdp_config_multiple_nodes.yaml as the config file, but the connection always fails. I had assumed that, since I set up Ray and launched the job through Ray, the connection parameters in that file would be ignored.
How should I set these connection parameters?
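To help narrow down the "connection failed" error, here is a minimal sketch of a reachability check that could be run from the second machine. It assumes the connection parameters in question are a main-node address and port (in Accelerate-style configs these are usually `main_process_ip` and `main_process_port`); the IP and port values below are placeholders, not values from my setup. It only tests plain TCP reachability, so it returns `True` only while something is actually listening on that port on the main node.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder values (assumptions): replace with the main-node address
    # and port from the config file being used.
    MAIN_IP = "192.0.2.10"
    MAIN_PORT = 29500
    print(f"{MAIN_IP}:{MAIN_PORT} reachable: {can_connect(MAIN_IP, MAIN_PORT)}")
```

If this prints `False` while the main node is waiting for workers, the problem is more likely the network path or firewall between the machines than the Ray or FSDP settings.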