Add JetMoE model #30005
Conversation
Feel free to ping me whenever for another review! 🤗
Thanks @ArthurZucker. I have updated the code according to your suggestions. I hope the extra comments will make the code clearer.
Thanks! Having a look 😉
🔥 Looks great! Thanks a lot for addressing all the comments and taking them into account! Left 2 nits but good to merge!
The failing test is unrelated; should I merge? 🔥
Yes! I have tested offline with the following command:
Congrats on this great work! We'll do a release on Thursday!
Thanks a lot for the review and comments! @ArthurZucker @gante @younesbelkada
* init jetmoe code
* update archive maps
* remove flax import
* fix import error
* update README
* ruff fix
* update readme
* fix
* update config
* fix issue
* merge files
* fix model bug
* fix test
* auto fix
* model size
* add comments
* fix form
* add flash attention support
* fix attention head number
* fix init
* fix support list
* sort auto mapping
* fix test
* fix docs
* update test
* fix test
* fix test
* change variable name
* fix config
* fix init
* update format
* clean code
* fix config
* fix config
* change default config
* update config
* fix issues
* update formate
* update config argument
* update format
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* change to mixtral aux loss
* change to cache_position
* debug
* fix bugs
* debug
* fix format
* fix format
* fix copy
* fix format
* fix format
* fix sort
* fix sort
* fix sort
* add copy comment
* add copy from
* remove debug code
* revert readme update
* add copy
* debug
* remove debug code
* fix flash attention
* add comments
* clean code
* clean format
* fix format
* fix format
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* change variable name
* add copied from
* fix variable name
* remove deprecated functinos
* sync to llama implementation
* fix format
* fix copy
* fix format
* update format
* remove repr
* add comment for moe weight
* fix copy
* Update src/transformers/models/jetmoe/configuration_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add comments and reformat config
* fix format
* fix format
* fix format
* update test
* update doc string in config
* Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update config doc
* update attention cache
* fix format
* fix copy
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
What does this PR do?
Add support for the JetMoE architecture by Yikang Shen and MyShell AI.
JetMoE is a new sparsely activated architecture inspired by ModuleFormer. Each JetMoE block consists of two MoE layers: a Mixture of Attention Heads and a Mixture of MLP Experts. For each input token, JetMoE activates only a subset of its experts to process it. This sparse activation scheme lets JetMoE achieve much better training throughput than similarly sized dense models.
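To make the sparse-activation idea concrete, here is a minimal, illustrative PyTorch sketch of top-k expert routing. It is not the routing code added by this PR: the class name, shapes, and the choice of 8 experts with top-2 routing are assumptions picked for clarity.

```python
# Illustrative only: a minimal top-k "sparse activation" MoE layer.
# NOT the implementation in this PR; names and sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    """Route each token to its top-k experts and mix their outputs."""

    def __init__(self, hidden_size: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            [nn.Linear(hidden_size, hidden_size) for _ in range(num_experts)]
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden) -> flatten to one row per token
        batch, seq_len, hidden = hidden_states.shape
        tokens = hidden_states.reshape(-1, hidden)

        # Score all experts, but keep only the top-k per token.
        router_logits = self.router(tokens)                       # (tokens, experts)
        topk_logits, topk_idx = router_logits.topk(self.top_k, dim=-1)
        topk_weights = F.softmax(topk_logits, dim=-1)             # mixing weights

        output = torch.zeros_like(tokens)
        for slot in range(self.top_k):
            for expert_id, expert in enumerate(self.experts):
                # Only tokens routed to this expert are processed by it.
                mask = topk_idx[:, slot] == expert_id
                if mask.any():
                    output[mask] += topk_weights[mask, slot, None] * expert(tokens[mask])
        return output.reshape(batch, seq_len, hidden)


# Example: per token, only 2 of the 8 experts run.
moe = TopKMoE(hidden_size=64)
out = moe(torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 10, 64])
```

Because only `top_k` of the `num_experts` experts run per token, the compute per token stays close to that of a much smaller dense layer; the description above says JetMoE applies this routing idea to both the attention heads and the MLP experts in each block.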
Who can review?
@ArthurZucker and @younesbelkada