
[feat] Mixture of Experts #181

Merged: 3 commits merged into main from the moe branch on Jan 26, 2022

Conversation

@blefaudeux (Contributor) commented on Jan 17, 2022:

What does this PR do?

Implements Mixture of Experts as a simple feedforward option. Uses the great MoE implementation from FairScale, implemented by @msbaines back in the day.

This is really for fun and completeness' sake. Example use case: Sparse ViT.

TODO:

  • Dedicated feedforward option
  • Unit tests, unit tests, unit tests
  • microGPT demo (runs completely fine!)

Before submitting

  • Did you have fun?
    • Make sure you had fun coding 🙃
  • Did you read the contributor guideline?
  • Was this discussed/approved via a GitHub issue? (not needed for typos or doc improvements)
    • N/A
  • Did you make sure to update the docs?
    • N/A
  • Did you write any new necessary tests?
    • N/A
  • Did you update the changelog? (if needed)
    • N/A

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

@facebook-github-bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Jan 17, 2022
@@ -71,10 +71,12 @@ def __init__(
             },
         },
         "feedforward_config": {
-            "name": "FusedMLP",  # Use MLP if Triton is not available
+            "name": "MixtureOfExperts",  # Use MLP if Triton is not available
@blefaudeux (author) commented on this change:

pulling in MoE becomes as simple as that (though distributed training adds another layer of complication)
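
For readers skimming the diff above, here is a minimal sketch of what the feedforward swap might look like inside a microGPT-style block config. Only the "name" value comes from the diff; every other key below is an assumed, illustrative placeholder rather than the exact xFormers parameter names.

# Hedged sketch, not the actual microGPT example config.
block_config = {
    # ... attention and positional-embedding sections elided ...
    "feedforward_config": {
        # Previously: "name": "FusedMLP"  (use "MLP" if Triton is not available)
        "name": "MixtureOfExperts",
        # The keys below are illustrative assumptions; check the
        # MixtureOfExperts config dataclass for the real parameter names.
        "dropout": 0.1,
        "activation": "gelu",
        "hidden_layer_multiplier": 4,
        # "number_of_experts": 4,        # hypothetical: 4 experts, each 1/4 the dense size
        # "number_of_local_experts": 1,  # hypothetical: per-rank experts for distributed runs
    },
}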

@blefaudeux (author) commented:
Sneak peek: MoE vs. MLP, with 4 experts each 1/4 the size of the dense MLP. This doesn't prove anything on the dense vs. MoE front, but at least the loss looks very reasonable.

[Screenshot from 2022-01-17 13-48-39: loss curves, MoE vs. MLP]

@codecov-commenter commented on Jan 18, 2022:

Codecov Report

Merging #181 (f448abe) into main (b24f222) will decrease coverage by 0.22%.
The diff coverage is 94.28%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #181      +/-   ##
==========================================
- Coverage   90.81%   90.58%   -0.23%     
==========================================
  Files          57       58       +1     
  Lines        2852     2922      +70     
==========================================
+ Hits         2590     2647      +57     
- Misses        262      275      +13     
Flag     Coverage Δ
Python   90.58% <94.28%> (-0.23%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files                                           Coverage Δ
xformers/components/feedforward/fused_mlp.py             91.30% <ø> (ø)
...rmers/components/feedforward/mixture_of_experts.py    94.28% <94.28%> (ø)
xformers/sparse/_csr_ops.py                              85.33% <0.00%> (-12.00%) ⬇️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b24f222...f448abe. Read the comment docs.

@blefaudeux changed the title from "[DRAFT][feat] Mixture of Experts" to "[feat] Mixture of Experts" on Jan 18, 2022
@@ -29,8 +29,6 @@ class FusedMlpConfig(FeedforwardConfig):
 class FusedMLP(Feedforward):
     """
     A MLP using fused linear layers.
-
-    .. warning: This is not currently competitive with PyTorch in terms of training speed
@blefaudeux (author) commented on this change:

not true anymore :D

@blefaudeux requested review from jieru-hu, dianaml0, and fmassa, and removed the request for jieru-hu on January 18, 2022 03:26
@blefaudeux (author) commented, quoting an updated Codecov report:
Codecov Report

Merging #181 (da276f4) into main (04bb6c1) will decrease coverage by 0.22%.
The diff coverage is 94.28%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #181      +/-   ##
==========================================
- Coverage   90.58%   90.36%   -0.23%     
==========================================
  Files          56       57       +1     
  Lines        2837     2907      +70     
==========================================
+ Hits         2570     2627      +57     
- Misses        267      280      +13     

Flag     Coverage Δ
Python   90.36% <94.28%> (-0.23%) ⬇️

Flags with carried forward coverage won't be shown.
Impacted Files                                           Coverage Δ
xformers/components/feedforward/fused_mlp.py             91.30% <ø> (ø)
...rmers/components/feedforward/mixture_of_experts.py    94.28% <94.28%> (ø)
xformers/sparse/_csr_ops.py                              85.33% <0.00%> (-12.00%) ⬇️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 04bb6c1...da276f4. Read the comment docs.

Looks like this is wrong: the coverage loss is on _csr_ops, which is not changed.

Two resolved (outdated) review threads on xformers/components/feedforward/mixture_of_experts.py.

self.moe = MOELayer(gate=self.gate, experts=local_experts, group=group)

self.requires_cuda = True
A reviewer (Contributor) commented:

I'm missing context here, is this used somewhere?

@blefaudeux (author) replied:

It's an "old" flag: it makes it easier for CI to run or skip a test depending on the hardware requirements, without maintaining an escape list in different places (I think it came from the Triton parts). We can change that; it was "a" way to solve this.
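
For context on the snippet under review, here is a minimal sketch of how a feedforward wrapping FairScale's MOELayer might be assembled, including the requires_cuda flag discussed above. It assumes the fairscale.nn.moe API (MOELayer, Top2Gate) referenced in the PR; the expert definition, sizes, and class name are illustrative placeholders, not the actual xFormers implementation, and a distributed run still needs an initialized process group for the all-to-all.

import torch
import torch.nn as nn

# Assumed import path, matching the symbols used in the PR snippet.
from fairscale.nn.moe import MOELayer, Top2Gate


class MoEFeedforwardSketch(nn.Module):
    # Illustrative only; not the actual xformers MixtureOfExperts class.
    def __init__(self, dim_model: int, hidden_dim: int, number_of_experts: int = 4, group=None):
        super().__init__()

        # One small MLP per local expert; making each expert roughly 1/n the
        # dense size keeps the parameter budget comparable to a single dense MLP.
        local_experts = nn.ModuleList(
            [
                nn.Sequential(
                    nn.Linear(dim_model, hidden_dim),
                    nn.GELU(),
                    nn.Linear(hidden_dim, dim_model),
                )
                for _ in range(number_of_experts)
            ]
        )

        # Top-2 gating routes each token to at most two experts
        # (assumed signature: Top2Gate(model_dim, num_experts)).
        self.gate = Top2Gate(dim_model, number_of_experts)
        self.moe = MOELayer(gate=self.gate, experts=local_experts, group=group)

        # Flag picked up by the CI/test harness: this layer needs a GPU
        # (and a process group when run distributed).
        self.requires_cuda = True

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: token embeddings with dim_model as the last dimension; MOELayer
        # dispatches tokens to experts and combines the outputs back.
        return self.moe(x)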

@dianaml0 (Contributor) left a review:

LGTM!

@blefaudeux (author) commented:
rebased, conflict resolved

@blefaudeux blefaudeux merged commit fefd3b8 into main Jan 26, 2022
@blefaudeux blefaudeux deleted the moe branch January 26, 2022 19:50
Labels
CLA Signed (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed)

Projects
None yet

Development
Successfully merging this pull request may close these issues: none yet.

5 participants