What is the difference between this and deepspeed-moe? #213

Closed
Hap-Zhang opened this issue Aug 18, 2023 · 2 comments

Comments

@Hap-Zhang

Hi,
I found that Microsoft has another project, DeepSpeed-MoE (https://www.deepspeed.ai/tutorials/mixture-of-experts/), that also supports MoE. Is there any difference in the focus of these two projects?

@ghostplant
Contributor

This project is not bound to DeepSpeed, so it is also compatible with other models and frameworks that do not depend on DeepSpeed (e.g. Swin, Fairseq, etc.).

Meanwhile, DeepSpeed's Top-1 gating can also be accelerated if you have the Tutel project installed in your environment: https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/moe/sharded_moe.py#L46

However, that path only benefits from part of Tutel's kernel optimizations; the new features introduced since Tutel >= 0.2.x would not be leveraged.
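For context, here is a minimal sketch of using Tutel's MoE layer standalone, without DeepSpeed, based on the `tutel.moe.moe_layer` interface described in Tutel's README; the exact argument names and defaults may differ across Tutel versions, so treat the parameters below as illustrative assumptions:

```python
# Minimal standalone Tutel MoE sketch (assumed API per Tutel's README; verify against your installed version)
import torch
import torch.nn.functional as F
from tutel import moe as tutel_moe

# Build a top-1 gated MoE layer with FFN experts
moe = tutel_moe.moe_layer(
    gate_type={'type': 'top', 'k': 1},              # top-1 gating
    model_dim=1024,
    experts={
        'type': 'ffn',
        'count_per_node': 2,                        # local experts per device
        'hidden_size_per_expert': 4096,
        'activation_fn': lambda x: F.relu(x),
    },
).cuda()

x = torch.randn(8, 512, 1024, device='cuda')        # [batch, seq_len, model_dim]
y = moe(x)                                           # output has the same shape as x
aux_loss = moe.l_aux                                 # load-balancing auxiliary loss to add to the training loss
```

Since the layer is a plain `torch.nn.Module`, it can be dropped into any PyTorch model (Swin, Fairseq, etc.) without pulling in DeepSpeed.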

@Hap-Zhang
Author

Ok, I see. Thank you very much for your reply.
