When will large model frameworks be supported? DeepSpeed, for example #1792

Open
PeterChg opened this issue Apr 21, 2023 · 7 comments
Comments

@PeterChg
Contributor

No description provided.

@johnugeorge
Member

Can you add more info and update the description? We would love to add support for frameworks like DeepSpeed, along with LLM examples. What are your thoughts?

@PeterChg
Contributor Author

PeterChg commented Apr 23, 2023

> Can you add more info and update the description? We would love to add support for frameworks like DeepSpeed, along with LLM examples. What are your thoughts?

Since DeepSpeed was open-sourced, more and more companies have been using it to train LLMs. However, the DeepSpeed framework differs from PyTorch in some respects.

@tenzen-y
Member

@PeterChg You might be interested in this: kubeflow/mpi-operator#549.

@Syulin7
Contributor

Syulin7 commented Apr 24, 2023

DeepSpeed supports several parallel launchers, such as pdsh (the default, which requires machines to be reachable via passwordless SSH), OpenMPI, and Slurm.

The mpi-operator in the training operator launches workers through kubectl exec, and it is uncertain whether DeepSpeed can work with that. For now, using mpi-operator v2 (which relies on passwordless SSH) would be more appropriate.
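
For illustration only, here is a minimal sketch of how the launcher choice described above shows up on the DeepSpeed command line. It assumes the `deepspeed` CLI with its `--hostfile` and `--launcher` options and the usual `<hostname> slots=<n>` hostfile format. The helper names, host names, slot counts, and `train.py` are hypothetical and not taken from this thread.

```python
import shlex

# Launchers named in the comment above; DeepSpeed accepts others as well.
LAUNCHERS = {"pdsh", "openmpi", "slurm"}


def render_hostfile(hosts):
    """Render a DeepSpeed-style hostfile: one '<hostname> slots=<n>' line per node."""
    return "".join(f"{name} slots={slots}\n" for name, slots in hosts.items())


def build_launch_command(hostfile_path, script, launcher="pdsh", script_args=()):
    """Build the argv list for a multi-node `deepspeed` launch (hypothetical helper)."""
    if launcher not in LAUNCHERS:
        raise ValueError(f"unsupported launcher in this sketch: {launcher}")
    return [
        "deepspeed",
        f"--hostfile={hostfile_path}",
        f"--launcher={launcher}",
        script,
        *script_args,
    ]


if __name__ == "__main__":
    # Hypothetical two-node cluster with 8 GPUs per node.
    hosts = {"worker-0": 8, "worker-1": 8}
    print(render_hostfile(hosts), end="")
    print(shlex.join(build_launch_command("hostfile", "train.py", launcher="openmpi")))
```

With pdsh, the node running this command must be able to reach every host in the hostfile over passwordless SSH, which is why an SSH-based setup is the closer fit here.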

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

@johnugeorge
Member

/lifecycle frozen
