
[feat] Mixture of Experts #181

Merged: 3 commits merged into main from the moe branch on Jan 26, 2022

Conversation

@blefaudeux (Contributor) commented on Jan 17, 2022:

What does this PR do?

Implements Mixture of Experts as a simple feedforward option. Uses the great MoE implementation from FairScale, implemented by @msbaines back in the day.

This is really for fun and completeness' sake. Example use case: Sparse ViT.

TODO:

  • Dedicated feedforward option
  • Unit tests, unit tests, unit tests
  • microGPT demo (runs completely fine!)

Before submitting

  • Did you have fun?
    • Make sure you had fun coding 🙃
  • Did you read the contributor guideline?
  • Was this discussed/approved via a GitHub issue? (not needed for typos or doc improvements)
    • N/A
  • Did you make sure to update the docs?
    • N/A
  • Did you write any new necessary tests?
    • N/A
  • Did you update the changelog? (if needed)
    • N/A

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

@facebook-github-bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Jan 17, 2022
@@ -71,10 +71,12 @@ def __init__(
             },
         },
         "feedforward_config": {
-            "name": "FusedMLP",  # Use MLP if Triton is not available
+            "name": "MixtureOfExperts",  # Use MLP if Triton is not available
@blefaudeux (author) commented on this change:

pulling in MoE becomes as simple as that (though distributed training adds another layer of complication)
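
For readers skimming the diff above, here is a minimal sketch of what the feedforward swap might look like inside a microGPT-style block config. Only the "name" value comes from the diff; every other key below is an assumed, illustrative placeholder rather than the exact xFormers parameter names.

# Hedged sketch, not the actual microGPT example config.
block_config = {
    # ... attention and positional-embedding sections elided ...
    "feedforward_config": {
        # Previously: "name": "FusedMLP"  (use "MLP" if Triton is not available)
        "name": "MixtureOfExperts",
        # The keys below are illustrative assumptions; check the
        # MixtureOfExperts config dataclass for the real parameter names.
        "dropout": 0.1,
        "activation": "gelu",
        "hidden_layer_multiplier": 4,
        # "number_of_experts": 4,        # hypothetical: 4 experts, each 1/4 the dense size
        # "number_of_local_experts": 1,  # hypothetical: per-rank experts for distributed runs
    },
}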

@blefaudeux (author) commented:
Sneak peek: MoE vs. MLP, with 4 experts each 1/4 the size of the dense MLP. This doesn't prove anything on the dense vs. MoE front, but at least the loss looks very reasonable.

[Screenshot from 2022-01-17 13-48-39: loss curves, MoE vs. MLP]

@codecov-commenter commented on Jan 18, 2022:

Codecov Report

Merging #181 (f448abe) into main (b24f222) will decrease coverage by 0.22%.
The diff coverage is 94.28%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #181      +/-   ##
==========================================
- Coverage   90.81%   90.58%   -0.23%     
==========================================
  Files          57       58       +1     
  Lines        2852     2922      +70     
==========================================
+ Hits         2590     2647      +57     
- Misses        262      275      +13     
Flag     Coverage Δ
Python   90.58% <94.28%> (-0.23%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files                                           Coverage Δ
xformers/components/feedforward/fused_mlp.py             91.30% <ø> (ø)
...rmers/components/feedforward/mixture_of_experts.py    94.28% <94.28%> (ø)
xformers/sparse/_csr_ops.py                              85.33% <0.00%> (-12.00%) ⬇️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b24f222...f448abe. Read the comment docs.

@blefaudeux changed the title from "[DRAFT][feat] Mixture of Experts" to "[feat] Mixture of Experts" on Jan 18, 2022
@@ -29,8 +29,6 @@ class FusedMlpConfig(FeedforwardConfig):
 class FusedMLP(Feedforward):
     """
     A MLP using fused linear layers.
-
-    .. warning: This is not currently competitive with PyTorch in terms of training speed
@blefaudeux (author) commented on this change:

not true anymore :D

@blefaudeux requested review from jieru-hu, dianaml0, and fmassa, and removed the request for jieru-hu on January 18, 2022 03:26
@blefaudeux (author) commented, quoting an updated Codecov report:
Codecov Report

Merging #181 (da276f4) into main (04bb6c1) will decrease coverage by 0.22%.
The diff coverage is 94.28%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #181      +/-   ##
==========================================
- Coverage   90.58%   90.36%   -0.23%     
==========================================
  Files          56       57       +1     
  Lines        2837     2907      +70     
==========================================
+ Hits         2570     2627      +57     
- Misses        267      280      +13     

Flag     Coverage Δ
Python   90.36% <94.28%> (-0.23%) ⬇️

Flags with carried forward coverage won't be shown.
Impacted Files                                           Coverage Δ
xformers/components/feedforward/fused_mlp.py             91.30% <ø> (ø)
...rmers/components/feedforward/mixture_of_experts.py    94.28% <94.28%> (ø)
xformers/sparse/_csr_ops.py                              85.33% <0.00%> (-12.00%) ⬇️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 04bb6c1...da276f4. Read the comment docs.

Looks like this is wrong: the coverage loss is on _csr_ops, which is not changed.

Two resolved (outdated) review threads on xformers/components/feedforward/mixture_of_experts.py.

self.moe = MOELayer(gate=self.gate, experts=local_experts, group=group)

self.requires_cuda = True
A reviewer (Contributor) commented:

I'm missing context here, is this used somewhere?

@blefaudeux (author) replied:

It's an "old" flag: it makes it easier for CI to run or skip a test depending on the hardware requirements, without maintaining an escape list in different places (I think it came from the Triton parts). We can change that; it was "a" way to solve this.
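
For context on the snippet under review, here is a minimal sketch of how a feedforward wrapping FairScale's MOELayer might be assembled, including the requires_cuda flag discussed above. It assumes the fairscale.nn.moe API (MOELayer, Top2Gate) referenced in the PR; the expert definition, sizes, and class name are illustrative placeholders, not the actual xFormers implementation, and a distributed run still needs an initialized process group for the all-to-all.

import torch
import torch.nn as nn

# Assumed import path, matching the symbols used in the PR snippet.
from fairscale.nn.moe import MOELayer, Top2Gate


class MoEFeedforwardSketch(nn.Module):
    # Illustrative only; not the actual xformers MixtureOfExperts class.
    def __init__(self, dim_model: int, hidden_dim: int, number_of_experts: int = 4, group=None):
        super().__init__()

        # One small MLP per local expert; making each expert roughly 1/n the
        # dense size keeps the parameter budget comparable to a single dense MLP.
        local_experts = nn.ModuleList(
            [
                nn.Sequential(
                    nn.Linear(dim_model, hidden_dim),
                    nn.GELU(),
                    nn.Linear(hidden_dim, dim_model),
                )
                for _ in range(number_of_experts)
            ]
        )

        # Top-2 gating routes each token to at most two experts
        # (assumed signature: Top2Gate(model_dim, num_experts)).
        self.gate = Top2Gate(dim_model, number_of_experts)
        self.moe = MOELayer(gate=self.gate, experts=local_experts, group=group)

        # Flag picked up by the CI/test harness: this layer needs a GPU
        # (and a process group when run distributed).
        self.requires_cuda = True

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: token embeddings with dim_model as the last dimension; MOELayer
        # dispatches tokens to experts and combines the outputs back.
        return self.moe(x)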

@dianaml0 (Contributor) left a review:

LGTM!

@blefaudeux (author) commented:
rebased, conflict resolved

@blefaudeux blefaudeux merged commit fefd3b8 into main Jan 26, 2022
@blefaudeux blefaudeux deleted the moe branch January 26, 2022 19:50
Labels
CLA Signed (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed)

Projects
None yet

Development
Successfully merging this pull request may close these issues: none yet.

5 participants