What's the accurate way to run multi-node inference at the end of deepspeed training? #2688

Vincent-Li-9701 · 2024-04-18T22:57:36Z

Hi, I'm a bit confused about how to run multi-node inference at the end of training. I'm using DeepSpeed ZeRO-3. Right now, at the end of training, each process writes its outputs to local disk, and the local main process on each node aggregates that node's results and writes them out on that node.

But I wonder: is there any way to gather all the results across all the nodes?

Thank you
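
For reference, here is a minimal sketch of one way to combine per-process outputs across nodes. It assumes all nodes can write to a shared directory (for example a network filesystem), which the question does not confirm. The helper names `write_rank_results` and `merge_rank_results` and the `preds_rank{rank}.jsonl` file naming are hypothetical, chosen only for illustration.

```python
import json
from pathlib import Path


def write_rank_results(out_dir, rank, results):
    """Every process writes its own predictions to a per-rank JSONL file.

    Assumes `out_dir` is on storage visible to all nodes (hypothetical setup).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"preds_rank{rank}.jsonl"
    with path.open("w") as f:
        for row in results:
            f.write(json.dumps(row) + "\n")
    return path


def merge_rank_results(out_dir, world_size):
    """One process (e.g. global rank 0) merges all per-rank files in rank order."""
    merged = []
    for rank in range(world_size):
        path = Path(out_dir) / f"preds_rank{rank}.jsonl"
        with path.open() as f:
            merged.extend(json.loads(line) for line in f if line.strip())
    return merged
```

In an actual run, `rank` and `world_size` would typically come from `torch.distributed.get_rank()` and `torch.distributed.get_world_size()`, and a `torch.distributed.barrier()` before the merge makes sure every rank has finished writing. If no shared filesystem is available, `torch.distributed.all_gather_object` can collect picklable Python objects from every rank directly, at the cost of holding all results in memory on each process.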

github-actions · 2024-05-19T15:06:06Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

Vincent-Li-9701 closed this as completed May 20, 2024
