[REQUEST] Expert Choice Routing for MoE #2517

Open
clumsy opened this issue Nov 17, 2022 · 7 comments
Labels: enhancement (New feature or request), training

Comments

Contributor

clumsy commented Nov 17, 2022

Is your feature request related to a problem? Please describe.
A paper was published on a potentially better token-to-expert routing scheme for MoE that leaves fewer experts under-trained.

Describe the solution you'd like
In addition to GShard's top-2 and Switch Transformer's top-1 per-token expert routing, add an expert choice routing option.
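To make the difference concrete, here is a minimal pure-Python sketch contrasting token-choice top-1 routing with expert choice routing. It is only an illustration under stated assumptions, not DeepSpeed's API or the paper's reference code: the function names (`token_choice_top1`, `expert_choice`), the toy logits, and the fixed `capacity` argument are all hypothetical. In the paper's formulation, as I understand it, the per-expert capacity is k = n * c / e for n tokens, capacity factor c, and e experts.

```python
import math

def softmax(row):
    """Numerically stable softmax over one token's expert logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def token_choice_top1(logits):
    """Token choice (Switch-style top-1): each token picks its best expert."""
    probs = [softmax(row) for row in logits]
    return [max(range(len(p)), key=p.__getitem__) for p in probs]

def expert_choice(logits, capacity):
    """Expert choice: each expert picks its `capacity` highest-scoring tokens.

    Returns one sorted list of token indices per expert.
    """
    probs = [softmax(row) for row in logits]  # softmax over experts, per token
    num_experts = len(logits[0])
    assignment = []
    for e in range(num_experts):
        ranked = sorted(range(len(probs)), key=lambda t: probs[t][e], reverse=True)
        assignment.append(sorted(ranked[:capacity]))
    return assignment

# Toy router logits (hypothetical): 4 tokens x 2 experts.
# Every token prefers expert 0, so top-1 routing starves expert 1.
logits = [[2.0, 0.0], [1.5, 0.0], [1.0, 0.0], [0.5, 0.0]]

top1 = token_choice_top1(logits)             # one expert index per token
ec = expert_choice(logits, capacity=2)       # one token list per expert
print(top1)
print(ec)
```

With these toy logits, top-1 routing sends all four tokens to expert 0, while expert choice gives each expert exactly two tokens. Note that under expert choice a token may be picked by several experts or by none, which is the trade-off for guaranteed load balance.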

Describe alternatives you've considered
N/A

Additional context
N/A

@clumsy clumsy added the enhancement New feature or request label Nov 17, 2022
Contributor Author

clumsy commented Feb 13, 2023

The authors claim 2x faster convergence with EC routing: https://ai.googleblog.com/2022/11/mixture-of-experts-with-expert-choice.html

I hope this incentivizes implementing it in DeepSpeed.

Contributor

awan-10 commented Feb 14, 2023

Thank you @clumsy for sharing this paper.

@ykim362, have you seen this paper? Is anyone in your team or any interns interested in implementing this feature?

Contributor Author

clumsy commented Feb 17, 2023

In case this helps, the TL;DR is in Lilian Weng's blog post.

Member

ykim362 commented Feb 17, 2023

Hi @awan-10.
I have an implementation of this paper, but we didn't see the gains mentioned in the paper.
In fact, the accuracy was noticeably worse than with the original top-1 and top-2 gating.

@clumsy have you actually done any experiments with this expert choice gating?

Contributor Author

clumsy commented Feb 22, 2023

No @ykim362, but I would like to experiment with it and share the results.
Is it possible to share the snippet with the implementation you used?

Member

ykim362 commented Jul 10, 2023

@clumsy, you can take a look at this experimental branch: https://github.com/ykim362/DeepSpeed/tree/youki/expc


Misterion777 commented Mar 22, 2024

Hey, Google has an implementation of expert choice routing here: https://github.com/google/flaxformer/blob/main/flaxformer/architectures/moe/routing.py#L647-L717

They have a note that it should not be used in decoder blocks. Maybe that was the reason for the poor results in your experiments?
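One plausible reading of that note, sketched below with made-up scores: because each expert picks its top-k tokens across the whole sequence, whether an earlier token is selected can depend on tokens that come after it, which conflicts with autoregressive decoding. The helper `top_k_tokens` and the score values are hypothetical and are not taken from the flaxformer code.

```python
def top_k_tokens(scores, k):
    """Return the set of token positions one expert would pick (top-k by score)."""
    ranked = sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)
    return set(ranked[:k])

# Router scores of a single expert for each token position (hypothetical values).
prefix_scores = [0.5, 0.2]            # what the expert sees after two tokens
full_scores = prefix_scores + [0.9]   # the same sequence once a third token arrives

picked_with_prefix = top_k_tokens(prefix_scores, k=1)
picked_with_full = top_k_tokens(full_scores, k=1)

# Token 0 is selected when only the prefix exists, but loses its slot once a
# later, higher-scoring token appears: its routing depends on the future.
token0_changed = (0 in picked_with_prefix) != (0 in picked_with_full)
print(picked_with_prefix, picked_with_full, token0_changed)
```

If the experiments above applied expert choice gating inside decoder blocks, this train-time dependence on future tokens could be one contributor to the accuracy gap, though the thread does not confirm the setup.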
