[shardformer] update transformers #5583

wangbluo · 2024-04-11T02:00:21Z

🚨 Issue number

[FEATURE]: Upgrade the transformers version from 4.33.0 to 4.36.0 for Shardformer. #5505

📝 What does this PR do?

Merge all the transformers update commits into the main branch.
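A minimal sketch of a guard that a downstream script could run before importing Shardformer, assuming 4.36.0 (the target version in #5505) is treated as a minimum. The helper names `parse_version` and `meets_requirement` are hypothetical and not part of ColossalAI or transformers, and whether the project pins the version exactly or as a lower bound is not stated in this PR.

```python
import re
from typing import Tuple

# Assumption: 4.36.0 is treated as a lower bound; the PR only names it as the upgrade target.
REQUIRED_TRANSFORMERS = (4, 36, 0)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Return the leading (major, minor, patch) numbers of a version string."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad short versions such as "4.36" with zeros.
    while len(parts) < 3:
        parts.append(0)
    return (parts[0], parts[1], parts[2])


def meets_requirement(installed: str, required: Tuple[int, int, int] = REQUIRED_TRANSFORMERS) -> bool:
    """Check whether an installed version string satisfies the assumed minimum."""
    return parse_version(installed) >= required
```

In practice the installed string could come from `transformers.__version__`; the sketch takes it as a plain argument so it runs without transformers installed.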

* flash_attention forward upgrade
* llama_model_forward
* remove useless comment
* update the requirements.txt
* add the transformers version requirements
* remove the LATEST VERSION try
* [shardformer] update bloom model (hpcaitech#5518)
  * update bloom model
  * remove the version restriction
* [shardformer] update_falcon (hpcaitech#5520)
* [shardformer] update mistral model (hpcaitech#5511)
* [shardformer] update gpt2 (hpcaitech#5502)
* [shardformer] update gptj model (hpcaitech#5503)
* [shardformer] update opt (hpcaitech#5522)
* [shardformer] update t5 model (hpcaitech#5524)
* [shardformer] update whisper model (hpcaitech#5529)
* [shardformer] update vit model (hpcaitech#5530)
  * update vit model
  * remove the output_hidden_states
* [shardformer] fix llama modeling
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* [zero] support multiple (partial) backward passes (hpcaitech#5596)
  * [zero] support multiple (partial) backward passes
  * [misc] update requirements
* [zero] support multiple (partial) backward passes (hpcaitech#5596)
  * [zero] support multiple (partial) backward passes
  * [misc] update requirements
* fix conflicts
* [doc] fix ColossalMoE readme (hpcaitech#5599)
  * fix readme
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* merge with main
* merge with main
* llama_model_forward
* remove useless comment
* remove the LATEST VERSION try
* [shardformer] update bloom model (hpcaitech#5518)
  * update bloom model
  * remove the version restriction
* [shardformer] update mistral model (hpcaitech#5511)
* [shardformer] update opt (hpcaitech#5522)
* [shardformer] update whisper model (hpcaitech#5529)
* [shardformer] fix llama modeling
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* [hotfix] Fix examples no pad token & auto parallel codegen bug; (hpcaitech#5606)
  * fix no pad token bug
  * fixed some auto parallel codegen bug, but might not run on torch 2.1
  Co-authored-by: Edenzzzz <wtan45@wisc.edu>
* [shardformer] fix pipeline grad ckpt (hpcaitech#5620)
  * [shardformer] fix pipeline grad ckpt
* [shardformer] fix whisper (hpcaitech#5628)
* [test] fix llama model test
* fix the opt upgrade (hpcaitech#5634)
* [shardformer] fix attn replacement (hpcaitech#5636)
* [shardformer] update flashattention replacement (hpcaitech#5637)
  * update transformers update transformers fix fix
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* [test] fix llama test (hpcaitech#5638)
* [gemini] fix buffer cast (hpcaitech#5639)
* Fix shardformer upgrade (hpcaitech#5640)
  * fix llama model
  * fix the mistral
  * fix the shardformer model
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* [shardformer]support pipeline parallelism for mistral. (hpcaitech#5642)
  * [shardformer] fix attn replacement (hpcaitech#5636)
  * [shardformer] update flashattention replacement (hpcaitech#5637)
    * update transformers update transformers fix fix
    * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  * [Feature] Support LLaMA-3 CPT and ST (hpcaitech#5619)
    * support LLaMA-3
    * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
    * Run pre-commit
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  * [exampe] update llama example (hpcaitech#5626)
    * [plugin] support dp inside for hybriad parallel
    * [example] update llama benchmark
    * [example] update llama benchmark
    * [example] update llama readme
    * [example] update llama readme
  * [example] llama3 (hpcaitech#5631)
    * release llama3
    * [release] llama3
    * [release] llama3
    * [release] llama3
    * [release] llama3
  * [test] fix llama test (hpcaitech#5638)
  * [gemini] fix buffer cast (hpcaitech#5639)
  * support pp for mistral
  * fix
  * fix fix fix
  * fix
  Co-authored-by: Hongxin Liu <lhx0217@gmail.com>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  Co-authored-by: Tong Li <tong.li352711588@gmail.com>
  Co-authored-by: binmakeswell <binmakeswell@gmail.com>

Co-authored-by: Hongxin Liu <lhx0217@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Camille Zhong <44392324+Camille7777@users.noreply.github.com>
Co-authored-by: Edenzzzz <wenxuan.tan@wisc.edu>
Co-authored-by: Edenzzzz <wtan45@wisc.edu>
Co-authored-by: flybird11111 <1829166702@qq.com>
Co-authored-by: Tong Li <tong.li352711588@gmail.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>

wangbluo and others added 16 commits March 25, 2024 13:59

flash_attention forward upgrade

739af90

llama_model_forward

976396c

remove useless comment

63ef374

update the requirements.txt

b00f9ea

add the transformers version requirements

dc8b9d4

remove the LATEST VERSION try

9206dd1

Merge pull request #5499 from wangbluo/update_llama2

cdb166c

Update llama2

[shardformer] update bloom model (#5518)

f1ebe54

* update bloom model * remove the version restriction

[shardformer] update_falcon (#5520)

2cdca4d

[shardformer] update mistral model (#5511)

7686f4e

[shardformer] update gpt2 (#5502)

fd44440

[shardformer] update gptj model (#5503)

9a5edc3

[shardformer] update opt (#5522)

cbff8c0

[shardformer] update t5 model (#5524)

46479fb

[shardformer] update whisper model (#5529)

d7af2d8

[shardformer] update vit model (#5530)

02d9b88

* update vit model * remove the output_hidden_states

wangbluo requested a review from a team as a code owner April 11, 2024 02:00

Merge branch 'main' into feature/update-transformers

2006339

ver217 changed the title from "Feature/update transformers" to "[shardformer] update transformers" Apr 12, 2024

ver217 and others added 11 commits April 12, 2024 13:14

[shardformer] fix llama modeling

c3e8215

[pre-commit.ci] auto fixes from pre-commit.com hooks

8b72eab

for more information, see https://pre-commit.ci

Merge pull request #5592 from ver217/hotfix/shard-llama

c2fab31

[shardformer] fix llama modeling

[zero] support multiple (partial) backward passes (#5596)

46b90f7

* [zero] support multiple (partial) backward passes * [misc] update requirements

[zero] support multiple (partial) backward passes (#5596)

b15b964

* [zero] support multiple (partial) backward passes * [misc] update requirements

fix conflicts

4f5fee4

[doc] fix ColossalMoE readme (#5599)

b323f0a

* fix readme * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

merge with main

7cecde1

merge with main

98eff6d

llama_model_forward

267efc8

remove useless comment

0bdcc84

wangbluo and others added 19 commits April 18, 2024 10:26

remove the LATEST VERSION try

e520e0b

[shardformer] update bloom model (#5518)

2d9a21d

* update bloom model * remove the version restriction

[shardformer] update mistral model (#5511)

50b4c86

[shardformer] update opt (#5522)

1233fc2

[shardformer] update whisper model (#5529)

ab160a8

[shardformer] fix llama modeling

16a29ff

[pre-commit.ci] auto fixes from pre-commit.com hooks

06d7c30

for more information, see https://pre-commit.ci

[pre-commit.ci] auto fixes from pre-commit.com hooks

b427fee

for more information, see https://pre-commit.ci

Merge pull request #5607 from hpcaitech/merge-main

31b8ff4

Merge feature/update-transformers with main

[hotfix] Fix examples no pad token & auto parallel codegen bug; (#5606)

0b2584d

* fix no pad token bug * fixed some auto parallel codegen bug, but might not run on torch 2.1 --------- Co-authored-by: Edenzzzz <wtan45@wisc.edu>

[shardformer] fix pipeline grad ckpt (#5620)

cbea063

* [shardformer] fix pipeline grad ckpt

[shardformer] fix whisper (#5628)

46190f4

[test] fix llama model test

4a0b2de

Merge pull request #5635 from ver217/hotfix/llama-upgrade

1556840

[test] fix llama model test

fix the opt upgrade (#5634)

2e2d1c1

[shardformer] fix attn replacement (#5636)

e021cea

Merge branch 'main' into feature/update-transformers

d98ac05

[test] fix llama test (#5638)

52f4d3a

flybird11111 reviewed Apr 24, 2024

colossalai/shardformer/modeling/opt.py (outdated, resolved)

[gemini] fix buffer cast (#5639)

fcceb78

flybird11111 reviewed Apr 24, 2024

colossalai/shardformer/modeling/falcon.py (outdated, resolved)

colossalai/shardformer/modeling/llama.py (outdated, resolved)

colossalai/shardformer/modeling/mistral.py (outdated, resolved)

ver217 reviewed Apr 24, 2024

colossalai/shardformer/modeling/llama.py (outdated, resolved)

wangbluo and others added 2 commits April 24, 2024 17:12

ver217 approved these changes Apr 24, 2024

ver217 merged commit 0d0a582 into main Apr 24, 2024
4 checks passed

ver217 deleted the feature/update-transformers branch April 24, 2024 14:51
