What is the difference between this and deepspeed-moe? #213

Closed
Hap-Zhang opened this issue Aug 18, 2023 · 2 comments

Comments

@Hap-Zhang

Hi,
I found that Microsoft has another project, DeepSpeed-MoE (https://www.deepspeed.ai/tutorials/mixture-of-experts/), that also supports MoE. Is there any difference in the focus of these two projects?

@ghostplant
Contributor

This project is not bound to DeepSpeed, so it is also compatible with other models and frameworks that do not depend on DeepSpeed (e.g. Swin, Fairseq, etc.).

Meanwhile, DeepSpeed's Top-1 gating can also be accelerated if you have the Tutel project installed in your environment: https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/moe/sharded_moe.py#L46

However, that path only benefits from part of Tutel's kernel optimizations; the new features introduced since Tutel >= 0.2.x would not be leveraged.
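For context, here is a minimal sketch of using Tutel's MoE layer standalone, without DeepSpeed, based on the `tutel.moe.moe_layer` interface described in Tutel's README; the exact argument names and defaults may differ across Tutel versions, so treat the parameters below as illustrative assumptions:

```python
# Minimal standalone Tutel MoE sketch (assumed API per Tutel's README; verify against your installed version)
import torch
import torch.nn.functional as F
from tutel import moe as tutel_moe

# Build a top-1 gated MoE layer with FFN experts
moe = tutel_moe.moe_layer(
    gate_type={'type': 'top', 'k': 1},              # top-1 gating
    model_dim=1024,
    experts={
        'type': 'ffn',
        'count_per_node': 2,                        # local experts per device
        'hidden_size_per_expert': 4096,
        'activation_fn': lambda x: F.relu(x),
    },
).cuda()

x = torch.randn(8, 512, 1024, device='cuda')        # [batch, seq_len, model_dim]
y = moe(x)                                           # output has the same shape as x
aux_loss = moe.l_aux                                 # load-balancing auxiliary loss to add to the training loss
```

Since the layer is a plain `torch.nn.Module`, it can be dropped into any PyTorch model (Swin, Fairseq, etc.) without pulling in DeepSpeed.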

@Hap-Zhang
Author

Ok, I see. Thank you very much for your reply.
