[REQUEST] Expert Choice Routing for MoE #2517
Comments
The authors claim a 2x faster convergence rate with EC routing: https://ai.googleblog.com/2022/11/mixture-of-experts-with-expert-choice.html I hope this incentivizes implementing it in DeepSpeed.
In case it helps, there is a TL;DR in Lilian Weng's blog post.
No @ykim362, but I would like to experiment with it and share the results.
@clumsy you can take a look at this experimental branch: https://github.com/ykim362/DeepSpeed/tree/youki/expc
Hey, Google has an implementation of expert choice routing here: https://github.com/google/flaxformer/blob/main/flaxformer/architectures/moe/routing.py#L647-L717 They note that it should not be used in decoder blocks; maybe that was the reason for the poor results in your experiments?
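For anyone skimming the thread, here is a minimal PyTorch sketch of the expert-choice idea: instead of each token picking its top-k experts (as in GShard/Switch gating), each expert picks its top-k tokens, so no expert is starved of inputs. This is only an illustration of the paper's routing, not DeepSpeed's gate or the flaxformer code; the class name, the capacity formula, and the return shapes are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExpertChoiceRouter(nn.Module):
    """Illustrative expert-choice router: each expert selects its top-k tokens."""

    def __init__(self, d_model: int, n_experts: int, capacity_factor: float = 2.0):
        super().__init__()
        self.w_gate = nn.Linear(d_model, n_experts, bias=False)
        self.n_experts = n_experts
        self.capacity_factor = capacity_factor

    def forward(self, tokens: torch.Tensor):
        # tokens: [n_tokens, d_model], batch and sequence dims already flattened.
        n_tokens = tokens.shape[0]

        # Per-expert capacity k: roughly capacity_factor slots per token,
        # divided evenly across experts (assumed formula for this sketch).
        k = max(1, int(n_tokens * self.capacity_factor / self.n_experts))

        # Token-to-expert affinities, normalized over the expert dimension:
        # shape [n_tokens, n_experts].
        scores = F.softmax(self.w_gate(tokens), dim=-1)

        # Flip to [n_experts, n_tokens] and let every expert select its
        # top-k tokens, so every expert always gets a full set of inputs.
        gates, token_idx = torch.topk(scores.t(), k, dim=-1)  # both [n_experts, k]

        # Gather the chosen tokens for each expert: [n_experts, k, d_model].
        dispatched = tokens[token_idx]

        # `gates` weights each expert's output when scattering results back to
        # token positions; tokens chosen by no expert fall through untouched.
        return dispatched, gates, token_idx
```

The decoder caveat in the flaxformer note also falls out of this sketch: the top-k is taken over the whole token dimension, so an expert's selection at one position can depend on scores from later positions, which breaks causality for autoregressive decoding.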
Is your feature request related to a problem? Please describe.
A paper was published describing a potentially better token-to-expert routing scheme for MoE that leaves fewer experts under-trained.
Describe the solution you'd like
In addition to GShard's top-2 and Switch Transformer's top-1 per-token expert routing, add an expert-choice routing option.
Describe alternatives you've considered
N/A
Additional context
N/A