Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pipeline] fix return_dict & fix pure_pipeline_test #4331

Merged
merged 1 commit into from
Jul 27, 2023

Conversation

Fridge003
Copy link
Contributor

@Fridge003 Fridge003 commented Jul 26, 2023

📌 Checklist before creating the PR

  • I have created an issue for this PR for traceability
  • The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

📝 What does this PR do?

Summarize your work here.
if you have any plots/diagrams/screenshots/tables, please attach them here.

  • fix the misconfiguration of setting return_dict to False in bert.py and bloom.py under shardformer/modeling
  • fix import warning in opt.py
  • fix test_pure_pipeline test in test_shardformer

💥 Checklist before requesting a review

  • I have linked my PR to an issue (instruction)
  • My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • I have performed a self-review of my code
  • I have added thorough tests.
  • I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • 🌝 Yes, I do.
  • 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

@Fridge003
Copy link
Contributor Author

截屏2023-07-26 15 00 38

@Issues-translate-bot
Copy link

Bot detected the issue body's language is not English, translate it automatically. 👯👭🏻🧑‍🤝‍🧑👫🧑🏿‍🤝‍🧑🏻👩🏾‍🤝‍👨🏿👬🏿


截屏2023-07-26 15 00 38

Copy link
Contributor

@CjhHa1 CjhHa1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@CjhHa1 CjhHa1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pass

@github-actions
Copy link
Contributor

The code coverage for the changed files is %.

Click me to view the complete report
Name                                                                    Stmts   Miss  Cover
-------------------------------------------------------------------------------------------
colossalai/amp/naive_amp/mixed_precision_optimizer.py                      98     33    66%
colossalai/booster/booster.py                                              66      9    86%
colossalai/booster/plugin/__init__.py                                      11      0   100%
colossalai/booster/plugin/hybrid_parallel_plugin.py                       137     26    81%
colossalai/booster/plugin/pp_plugin_base.py                                 9      1    89%
colossalai/cluster/__init__.py                                              5      0   100%
colossalai/cluster/process_group_mesh.py                                   72      2    97%
colossalai/interface/optimizer.py                                          43      6    86%
colossalai/lazy/lazy_init.py                                              306     45    85%
colossalai/pipeline/p2p.py                                                 99      7    93%
colossalai/pipeline/policy/__init__.py                                     12     12     0%
colossalai/pipeline/policy/base.py                                         54     54     0%
colossalai/pipeline/policy/bert.py                                        226    226     0%
colossalai/pipeline/policy/bloom.py                                       109    109     0%
colossalai/pipeline/schedule/__init__.py                                    3      0   100%
colossalai/pipeline/schedule/_utils.py                                     50     10    80%
colossalai/pipeline/schedule/base.py                                       10      1    90%
colossalai/pipeline/schedule/one_f_one_b.py                               110      6    95%
colossalai/pipeline/stage_manager.py                                       68      4    94%
colossalai/shardformer/layer/__init__.py                                    8      0   100%
colossalai/shardformer/layer/embedding.py                                 124     23    81%
colossalai/shardformer/layer/linear.py                                    171     50    71%
colossalai/shardformer/layer/normalization.py                              51     10    80%
colossalai/shardformer/layer/qkv_fused_linear.py                          210     56    73%
colossalai/shardformer/modeling/bert.py                                   349    128    63%
colossalai/shardformer/modeling/bloom.py                                  283    103    64%
colossalai/shardformer/modeling/gpt2.py                                   248     73    71%
colossalai/shardformer/modeling/llama.py                                  170     59    65%
colossalai/shardformer/modeling/t5.py                                     215     97    55%
colossalai/shardformer/modeling/vit.py                                    123     24    80%
colossalai/shardformer/policies/auto_policy.py                             27      2    93%
colossalai/shardformer/policies/base_policy.py                             87      9    90%
colossalai/shardformer/policies/bert.py                                   250      2    99%
colossalai/shardformer/policies/bloom.py                                  143      2    99%
colossalai/shardformer/policies/gpt2.py                                   178      3    98%
colossalai/shardformer/policies/llama.py                                  112      3    97%
colossalai/shardformer/policies/opt.py                                    354     52    85%
colossalai/shardformer/policies/t5.py                                     195     12    94%
colossalai/shardformer/policies/vit.py                                    103      1    99%
colossalai/shardformer/shard/shard_config.py                               24      2    92%
colossalai/shardformer/shard/sharder.py                                    84      3    96%
colossalai/shardformer/shard/shardformer.py                                15      0   100%
colossalai/shardformer/shard/utils.py                                      11      0   100%
colossalai/tensor/d_tensor/api.py                                         147     24    84%
tests/kit/model_zoo/transformers/__init__.py                                8      0   100%
tests/kit/model_zoo/transformers/bert.py                                   50      0   100%
tests/kit/model_zoo/transformers/gpt.py                                    36      0   100%
tests/kit/model_zoo/transformers/vit.py                                    24      0   100%
tests/test_booster/test_plugin/test_3d_plugin.py                           64      7    89%
tests/test_booster/test_plugin/test_gemini_plugin.py                       74     10    86%
tests/test_booster/test_plugin/test_low_level_zero_plugin.py               60      6    90%
tests/test_cluster/test_process_group_mesh.py                              86      1    99%
tests/test_fx/test_tracer/test_hf_model/hf_tracer_utils.py                 21      2    90%
tests/test_fx/test_tracer/test_hf_model/test_hf_bert.py                    17      1    94%
tests/test_fx/test_tracer/test_hf_model/test_hf_gpt.py                     17      1    94%
tests/test_lazy/test_distribute.py                                         74     47    36%
tests/test_lazy/test_models.py                                             13      1    92%
tests/test_pipeline/test_p2p_communication.py                              44      1    98%
tests/test_pipeline/test_policy/test_bert_for_pretraining_model.py         39      1    97%
tests/test_pipeline/test_policy/test_bert_lm_head_model.py                 39      1    97%
tests/test_pipeline/test_policy/test_bert_model.py                         36      1    97%
tests/test_pipeline/test_policy/test_bloom_model.py                        38      1    97%
tests/test_pipeline/test_policy/test_t5_pipeline_utils.py                  17      0   100%
tests/test_pipeline/test_schedule/test_pipeline_schedule_utils.py          40      0   100%
tests/test_pipeline/test_stage_manager.py                                  52      1    98%
tests/test_shardformer/test_layer/test_embedding.py                        37      1    97%
tests/test_shardformer/test_layer/test_layernorm.py                        35      1    97%
tests/test_shardformer/test_layer/test_linear_1d.py                       110      1    99%
tests/test_shardformer/test_layer/test_qkv_fused_linear_1d.py              89      1    99%
tests/test_shardformer/test_layer/test_vocab_parallel_embedding_1d.py      39      1    97%
tests/test_shardformer/test_model/_utils.py                                49      1    98%
tests/test_shardformer/test_model/test_pure_pipeline.py                   100     57    43%
tests/test_shardformer/test_model/test_shard_bert.py                       58      1    98%
tests/test_shardformer/test_model/test_shard_bert_pipeline.py              55      2    96%
tests/test_shardformer/test_model/test_shard_bloom.py                      58      1    98%
tests/test_shardformer/test_model/test_shard_bloom_pipeline.py             47      2    96%
tests/test_shardformer/test_model/test_shard_gpt2.py                       60      1    98%
tests/test_shardformer/test_model/test_shard_gpt2_pipeline.py              51      2    96%
tests/test_shardformer/test_model/test_shard_llama.py                      60      1    98%
tests/test_shardformer/test_model/test_shard_llama_pipeline.py             47      2    96%
tests/test_shardformer/test_model/test_shard_opt.py                        61      1    98%
tests/test_shardformer/test_model/test_shard_opt_pipeline.py               49      2    96%
tests/test_shardformer/test_model/test_shard_t5.py                         66      1    98%
tests/test_shardformer/test_model/test_shard_t5_pipeline.py                74      6    92%
tests/test_shardformer/test_model/test_shard_vit.py                        48      1    98%
tests/test_shardformer/test_model/test_shard_vit_pipeline.py               49      2    96%
tests/test_shardformer/test_shard_utils.py                                 21      0   100%
tests/test_shardformer/test_with_torch_ddp.py                              52      1    98%
-------------------------------------------------------------------------------------------
TOTAL                                                                    7334   1460    80%

@ver217 ver217 merged commit d547377 into hpcaitech:feature/pipeline Jul 27, 2023
6 checks passed
ver217 pushed a commit to ver217/ColossalAI that referenced this pull request Aug 15, 2023
ver217 pushed a commit to ver217/ColossalAI that referenced this pull request Aug 15, 2023
Xu-Kai added a commit that referenced this pull request Sep 12, 2023
* [gemini] fix tensor storage cleaning in state dict collection (#4396)

* [hotfix] fix unsafe async comm in zero (#4404)

* improve stablility of zero

* fix wrong index

* add record stream

* [doc] update Coati README (#4405)

* style: apply formatter

* fix: add outdated warnings

* docs: add dataset format and polish

* docs: polish README

* fix: fix json format

* fix: fix typos

* revert: revert 7b example

* [doc] fix a typo in examples/tutorial/auto_parallel/README.md (#4430)

Co-authored-by: Siyuan Tian <siyuant@vmware.com>

* [cluster] add process group mesh (#4039)

* [cluster] add process group mesh

* [test] add process group mesh test

* force sync

* [pipeline] add stage manager (#4093)

* [pipeline] add stage manager

* [test] add pipeline stage manager test

* [pipeline] add docstring for stage manager

* [pipeline] implement p2p communication (#4100)

* [pipeline] add p2p communication

* [test] add p2p communication test

* [test] add rerun decorator

* [test] rename to avoid conflict

* [pipeline] refactor 1f1b schedule (#4115)

* [api] update optimizer wrapper to fit pipeline

* [pipeline] add base schedule

* [pipeline] add 1f1b schedule

* [test] add pipeline schedule utils test

* [pipeline] fix import

* [pipeline]add pipeline policy and bert forward (#4130)

* add pipeline policy and bert forward to be done

* add bertmodel pipeline forward and make tests

* add Bert_Policy and test for policy

* update formatting

* update formatting

* update the code

* fix bugs

* fix name confilt

* [pipeline] add stage manager (#4093)

* [pipeline] add stage manager

* [test] add pipeline stage manager test

* [pipeline] add docstring for stage manager

* [pipeline]add pipeline policy and bert forward (#4130)

* add pipeline policy and bert forward to be done

* add bertmodel pipeline forward and make tests

* add Bert_Policy and test for policy

* update formatting

* update formatting

* update the code

* fix bugs

* fix name confilt

* [pipeline] build bloom model and policy , revise the base class of policy (#4161)

* add pipeline policy and bert forward to be done

* add bertmodel pipeline forward and make tests

* add Bert_Policy and test for policy

* update formatting

* update formatting

* update the code

* fix bugs

* fix name confilt

* add bloom model and policy ,revise the base class of policy

* revise

* revision

* add bert_for_pretraining

* [pipeline] update shardformer policy

* [pipeline] update shardformer docstring

* [test] update shardformer tests

* [test] add shard util tests

* [shardformer] rename policy file name

* [shardformer] fix type hint

* [pipeline] add bert_for_pretraining bert_lmhead forward and policy (#4172)

* add pipeline policy and bert forward to be done

* add bertmodel pipeline forward and make tests

* add Bert_Policy and test for policy

* update formatting

* update formatting

* update the code

* fix bugs

* fix name confilt

* add bloom model and policy ,revise the base class of policy

* revise

* revision

* add bert_for_pretraining

* add bert_for_pretraining forward and policy

* fix typos

* cancel warning

* change the imediate output to default dict

* change the default output of get_shared_params

* [pipeline] move bert related pipeline components to shardformer (#4187)

* move bert related pipeline components to shardformer

* fix bugs

* revision

* fix bert model tests

* fix bert_lm_head model tests

* fix tests

* fix tests

* done checks

* skip bloom

* [shardformer] support lazy init (#4202)

* [shardformer] support lazy init

* [shardformer] linear support lazy init

* [shardformer] embedding support lazy init

* [shardformer] norm support lazy init

* [shardformer] fused linear support lazy init

* [test] update shardformer test layer

* [test] shardformer with lazy init fit ddp

* [lazy] hotfix deepcopy of param

* [shardformer] fix bert policy and update test

* [shardformer] fix bloom policy and update test

* [shardformer] fix opt policy and update test

* [shardformer] fix t5 policy and update test

* [shardformer] fix gpt2 policy and update test

* [shardformer] fix llama policy and update test

* [pipeline] Bert pipeline for shardformer and its tests (#4197)

* add pipeline forward

* complete pipeline forward check

* fix bert forward without pipeline

* fix comments

* discard useless line

* add todo

* clean prints

* fix distribute layers

* [pipeline]  Llama pipeline (#4205)

* bloom policy

* llama pipeline forward and tests

* fix the output and attention_mask

* fix name

* bind argument to policy

* Revert "bloom policy"

This reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0.

This policy should be revert and copied to feature/bloom

* revert the bloom changes

* cancel unneeded inputs

* gpt

* [pipeline] Llama causal lm and llama for sequence classification pipeline (#4208)

* bloom policy

* llama pipeline forward and tests

* fix the output and attention_mask

* fix name

* bind argument to policy

* Revert "bloom policy"

This reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0.

This policy should be revert and copied to feature/bloom

* revert the bloom changes

* cancel unneeded inputs

* gpt

* finish llama

* causal lm and sequence classification

* revision

* [pipeline] add bloom model pipeline  (#4210)

* bloom policy

* llama pipeline forward and tests

* fix the output and attention_mask

* fix name

* bind argument to policy

* finish bloom model

* test shard gpt2

* clear cache

* [pipeline] Add Pipeline Forward for GPT2Model Shardformer (#4224)

* * fix typehint & docstring in sharder.py

* * update pipeline forward for GPT2Model

* * add test for pipeline forward of GPT2Model

* * add cache cleaning in gpt2 test

* * change assert to raise command

* [shardformer] fix base policy (#4229)

* [pipeline] add pipeline forward for variants of gpt2 (#4238)

* add forward for GPTLMHeadModel

* add test for gpt_lm

* arranging get_held_layers method

* arrange forward replacement

* add forward for GPT2ForTokenClassification

* add forward for GPT2ForSequenceClassification

* fix test_shard_gpt2.py

* add GPT2DoubleHeadsmodel & fix bugs

* add id checking in get_shared_params

* [pipeline]  All bert models (#4233)

* bloom policy

* llama pipeline forward and tests

* fix the output and attention_mask

* fix name

* bind argument to policy

* Revert "bloom policy"

This reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0.

This policy should be revert and copied to feature/bloom

* revert the bloom changes

* cancel unneeded inputs

* gpt

* finish llama

* causal lm and sequence classification

* revision

* add pure pipeline test

* finish some bert models

* finish all bert models

* finish bert tests

* fix bugs

* fix bugs

* fix test pipeline

* fix data gen for qa

* update the set pipeline forward

* shared params

* fix bugs

* [pipeline] finish bloom models pipeline and tests (#4223)

* bloom policy

* llama pipeline forward and tests

* fix the output and attention_mask

* fix name

* bind argument to policy

* finish bloom model

* test shard gpt2

* clear cache

* support all bloom models

* add bloom models policies

* finish bloom pipeline and tests

* add set pipeline

* finish bloom

* [bugs] hot fix some testing bugs for new models (#4268)

* hot fix

* hot fx tracer

* [pipeline] support shardformer for GPT2ForQuestionAnswering & complete pipeline support for GPT2 (#4245)

* change for transformers loggers

* add forward for GPT2ForQuestionAnswering

* fix assert

* fix torchrec test

* [shardformer] support inplace sharding (#4251)

* [shardformer] embedding support inplace sharding

* [shardformer] linear support inplace sharding

* [shardformer] layernorm support inplace sharding

* [shardformer] qkv support inplace sharding

* [test] update shardformer layer test

* [shardformer] fix shared param sharding

* [shardformer] fix bert policy

* [shardformer] fix bloom policy

* [shardformer] fix llama policy

* [shardformer] fix opt policy

* [shardformer] fix t5 policy

* [shardformer] fix fused qkv linear

* [shardformer] fix bugs

* force sync

* [test] fix bugs

* [test] fix transformer version

* [pipeline] refactor gpt2 pipeline forwards (#4287)

* move gpt2 pipeline forwards to modeling folder

* check pipeline status when adding replacing policy

* fix typehint

* fix arguments processing in gpt2_model_forward

* [pipeline] OPT model pipeline (#4258)

* opt forward and test

* pause

* finish opt model pipeline

* finish opt pipeline

* opt forward and test

* pause

* finish opt model pipeline

* finish opt pipeline

* fix opt

* set transformers version

* refactor the test pipeline

* [hotfix] fix opt pipeline  (#4293)

* opt forward and test

* pause

* finish opt model pipeline

* finish opt pipeline

* opt forward and test

* pause

* finish opt model pipeline

* finish opt pipeline

* fix opt

* set transformers version

* refactor the test pipeline

* fix bug

* [pipeline] reformat for unified design (#4283)

* bert_reformat

* reformat

* reformat

* fix a typo

* format

* format

* fix bug

* [pipeline] add pipeline support for T5Stack/T5EncoderModel (#4300)

* modify t5 policy & add test

* pipeline stage distribution for t5

* complete t5 base policy

* t5 stack: halfway

* modify gpt2 pipeline test

* complete pipeline forward for T5Stack/T5EncoderModel

* fix docstring

* move t5 util tests to test_pipeline

* [pipeline] test pure pipeline process using llama (#4218)

* bloom policy

* llama pipeline forward and tests

* fix the output and attention_mask

* fix name

* bind argument to policy

* Revert "bloom policy"

This reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0.

This policy should be revert and copied to feature/bloom

* revert the bloom changes

* cancel unneeded inputs

* gpt

* finish llama

* causal lm and sequence classification

* revision

* add pure pipeline test

* fixed version

* fixed version

* pure pipeline

* [pipeline] add pipeline support for all T5 models (#4310)

* complete policy for T5Model & T5ForConditionalGeneration

* modify function signature in forwards

* add forward for T5model

* add forward for T5ForConditionalGeneration

* fix a bug

* fix hidden_states transporting in decoder

* fix the passing of encoder_outputs

* [shardformer] support pipeline base vit model  (#4284)

* Feature/vit support (#4182)

* [shardformer] added tests

* [shardformer] vit test finish and support

* fix attention dropout

* support base vit pipeline

* support vit downstream model

* fix vit shard test

* modify hidden states return type

---------

Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>

* [plugin] add 3d parallel plugin (#4295)

* [amp] add mixed precision optimizer

* [plugin] add 3d parallel plugin

* [booster] support pipeline

* [plugin] 3d parallel plugin support clip grad norm

* [shardformer] fix sharder and add plugin test

* [plugin] rename 3d parallel plugin

* [ci] support testmon core pkg change detection (#4305)

* [hotfix] debug testmon

* [hotfix] fix llama

* [hotfix] fix p2p bugs

* [hotfix] fix requirements

* [hotfix] fix gemini and zero test (#4333)

* [hotfix] fix gemini and zero test

* [hotfix] fix lazy init test

* [hotfix] fix lazy init test

* [pipeline] fix return_dict/fix pure_pipeline_test (#4331)

* [pipeline] add unit test for 1f1b (#4303)

* add unit test for 1f1b

* polish code

* polish code and update ut version

* fix

* [pipeline] refactor test pipeline and remove useless utils in pipeline (#4324)

* refactor tests

* refactor bloom model

* finish policy tests

* refactor tests

* fix test pure pipeline

* remove test pipeline and cutdown launch process

* refactor tests

* refactor bloom model

* finish policy tests

* refactor tests

* fix test pure pipeline

* remove test pipeline and cutdown launch process

* [pipeline] support fp32 for HybridPlugin/merge shardformer test and pipeline test into one file (#4354)

* add naive optimizer for 3DPlugin/refactor gpt2 shardformer test

* merge tests of PP/DP/TP combinations into one test file

* fix bug when sync grad for dp in HybridPlugin

* update supported precisions for 3DPlugin/fix bug when shifting tp_degree

* improve the passing of lazy_init

* modify lazy_init/use sync_shared_params

* Feature/vit support (#4182)

* [shardformer] added tests

* [shardformer] vit test finish and support

* fix attention dropout

* [shardformer] support SAM (#4231)

* 1.support sam 2.add fused qkv for nn.Linear

* update utils support set element in list

* overtwrite SamVisionAttention foward to use DropoutForParallelInput

* remove unused code

* [shardformer] support whisper (#4212)

* support whisper

* fix bug in vocabembedding

* support downstream model of whisper

* update readme

* Feature/chatglm (#4240)

* [shardformer] added tests

* [shardformer] vit test finish and support

* [shardformer] chatglm ready

* import chatglm

* [shardformer] add test kit in model zoo for chatglm

* [sharformer] add first version of policy of chatglm

* [shardformer] polish chatglm code

* [shardformer] polish code

* [shardformer] support chatglm without layernorm

* [shardformer] chatglm shard without mlp sharding

* [shardformer] delete some file

* [shardformer] ChatGLM support layernorm sharding

* [shardformer] register without auto policy

* [shardformer] pre-commit check files

* [shardformer] fix chatglm configuration with pre-commit

* [shardformer] added tests

* [shardformer] vit test finish and support

* import chatglm

* [shardformer] add test kit in model zoo for chatglm

* [sharformer] add first version of policy of chatglm

* [shardformer] polish chatglm code

* [shardformer] polish code

* [shardformer] support chatglm without layernorm

* [shardformer] delete some file

* [shardformer] ChatGLM support layernorm sharding

* [shardformer] register without auto policy

* [shardformer] pre-commit check files

* [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit

* [shardformer] support Blip2 (#4243)

* support base blip2

* add support for downstream blip2 model

* update readme

* add forward injection

* skip not compatible models test

* fix test for gemini and low_level_zero_pugin

* update some module with new api version

* [test] skip some not compatible models

* [test] Hotfix/fix some model test and refactor check util api (#4369)

* fix llama test

* fix test bug of bert, blip2, bloom, gpt2

* fix llama test

* fix opt test

* fix sam test

* fix sam test

* fix t5 test

* fix vit test

* fix whisper test

* fix whisper test

* polish code

* adjust allclose parameter

* Add mistakenly deleted code

* addjust allclose

* change loss function for some base model

* [shardformer] add util functions for shardformer tests/fix sync_shared_param (#4366)

* add util functions for shardformer tests & rewrite gpt2 test

* fix shared_params & embedding/merging

* fix precision

* [pipeline] add chatglm  (#4363)

* add pipeline policy and bert forward to be done

* add bertmodel pipeline forward and make tests

* add Bert_Policy and test for policy

* update formatting

* update formatting

* update the code

* fix bugs

* fix name confilt

* add bloom model and policy ,revise the base class of policy

* revise

* revision

* add bert_for_pretraining

* add bert_for_pretraining forward and policy

* fix typos

* cancel warning

* change the imediate output to default dict

* change the default output of get_shared_params

* add chatglm

* add

* chatglm

* chatglm

* finish chatglm

* deletes

* fix rmsnorm

* chatglm

* fix chatglm shard

* init

* [Shardformer] Merge flash attention branch to pipeline branch (#4362)

* [shardformer] supported flash attention test dependency (#4158)

* [shardformer] fix flash attention utils test (#4180)

* [shardformer] opt support flash attention (#4163)

* [shardformer] opt support flash attention

* [shardformer] opt support flash attention

* [shardformer] opt support flash attention

* [shardformer] opt support flash attention

* [shardformer] move to modeling

* [shardformer] move to modeling

* [shardformer] add performance benchmark of shardformer (#4175)

* [shardformer] opt support flash attention

* [shardformer] opt support flash attention

* [shardformer] opt support flash attention

* [shardformer] benchmark fix

* [shardformer] benchmark fix

* [shardformer] llama support flash attention (#4185)

* [shardformer] opt support flash attention

* [shardformer] opt support flash attention

* [shardformer] opt support flash attention

* [shardformer] opt support flash attention

* [shardformer] move to modeling

* [shardformer] move to modeling

* [shardformer] llama support flash attention

* [shardformer] llama support flash attention

* [shardformer] Move the import statement for xformer outside the forward function.

* [shardformer] gpt2 support flash attention. (#4191)

* [shardformer] opt support flash attention

* [shardformer] opt support flash attention

* [shardformer] opt support flash attention

* [shardformer] opt support flash attention

* [shardformer] move to modeling

* [shardformer] move to modeling

* [shardformer] gpt2 support flash attention

* [shardformer] gpt2 support flash attention

* [shardformer] gpt2 support flash attention

* [shardformer] bloom support flash attention (#4188)

* [shardformer] opt support flash attention

* [shardformer] opt support flash attention

* [shardformer] opt support flash attention

* [shardformer] opt support flash attention

* [shardformer] move to modeling

* [shardformer] move to modeling

* [shardformer] bloom suport flash attention

* [shardformer] add assert to sequence length

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

* [shardformer] bert support flash attention. (#4206)

* [shardformer] opt support flash attention

* [shardformer] opt support flash attention

* [shardformer] opt support flash attention

* [shardformer] opt support flash attention

* [shardformer] move to modeling

* [shardformer] move to modeling

* [shardformer] bert support flash attention

* [shardformer] t5 support flash attention. (#4216)

* [shardformer] opt support flash attention

* [shardformer] opt support flash attention

* [shardformer] opt support flash attention

* [shardformer] opt support flash attention

* [shardformer] move to modeling

* [shardformer] move to modeling

* [shardformer] t5 support flash attention

* [shardformer] t5 support flash attention

* fix typo

* fix typo

* fix typo

* fix typo

* fix typo

* fix typo

* [shardformer] support 'paddedcausal'  type of attention mask in Coloattention. (#4215)

* added padded causal attn mask type for ColoAttention

* [shardformer]t5 flash attention fix (#4239)

* [shardformer] opt support flash attention

* [shardformer] opt support flash attention

* [shardformer] opt support flash attention

* [shardformer] opt support flash attention

* [shardformer] move to modeling

* [shardformer] move to modeling

* [shardformer] t5 flash attention fix

* [shardformer] update gpt2 to use coloattention. (#4234)

* [shardformer] opt support flash attention

* [shardformer] opt support flash attention

* [shardformer] opt support flash attention

* [shardformer] opt support flash attention

* [shardformer] move to modeling

* [shardformer] move to modeling

* [shardformer] update gpt2 to use coloattention

* [shardformer] update gpt2 to use coloattention

* [shardformer] update gpt2 to use coloattention

* [shardformer] update gpt2 to use coloattention

* [shardformer] update gpt2

* [shardformer] update opt and llama to use coloattention. (#4226)

* [shardformer] opt support flash attention

* [shardformer] opt support flash attention

* [shardformer] opt support flash attention

* [shardformer] opt support flash attention

* [shardformer] move to modeling

* [shardformer] move to modeling

* update opt to use coloattention

* [shardformer]update opt to use coloattention

* [shardformer]update opt to use coloattention

* [shardformer]update opt to use coloattention

* [shardformer]update opt to use coloattention

* [shardformer]update opt to use coloattention

* [shardformer]update opt to use coloattention

* [shardformer]update opt

* [shardformer] shardformer support jit fused operator. (#4236)

* [shardformer] opt support flash attention

* [shardformer] opt support flash attention

* [shardformer] opt support flash attention

* [shardformer] opt support flash attention

* [shardformer] move to modeling

* [shardformer] move to modeling

* [shardformer] bloom support jit fused operator

* [shardformer] bloom support jit fused operator

* [shardformer] bloom support jit fused operator

* [shardformer] t5 support jit fused operator

* [shardformer] t5 support jit fused operator

* [shardformer] t5 support jit fused operator

* [shardformer] add roadmap of flash attention

* [shardformer] add roadmap of flash attention

* [shardformer] add roadmap of flash attention

* [shardformer] add type hint to 'self' param of forward

* [shardformer] merge feature/shardformer-models branch to feature/flash-attention-shardformer branch. (#4290)

* Feature/vit support (#4182)

* [shardformer] added tests

* [shardformer] vit test finish and support

* fix attention dropout

* [shardformer] support SAM (#4231)

* 1.support sam 2.add fused qkv for nn.Linear

* update utils support set element in list

* overtwrite SamVisionAttention foward to use DropoutForParallelInput

* remove unused code

* [shardformer] support whisper (#4212)

* support whisper

* fix bug in vocabembedding

* support downstream model of whisper

* update readme

* Feature/chatglm (#4240)

* [shardformer] added tests

* [shardformer] vit test finish and support

* [shardformer] chatglm ready

* import chatglm

* [shardformer] add test kit in model zoo for chatglm

* [sharformer] add first version of policy of chatglm

* [shardformer] polish chatglm code

* [shardformer] polish code

* [shardformer] support chatglm without layernorm

* [shardformer] chatglm shard without mlp sharding

* [shardformer] delete some file

* [shardformer] ChatGLM support layernorm sharding

* [shardformer] register without auto policy

* [shardformer] pre-commit check files

* [shardformer] fix chatglm configuration with pre-commit

---------

Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>

* [shardformer] whisper support flash attention (#4301)

* Feature/vit support (#4182)

* [shardformer] added tests

* [shardformer] vit test finish and support

* fix attention dropout

* [shardformer] support SAM (#4231)

* 1.support sam 2.add fused qkv for nn.Linear

* update utils support set element in list

* overtwrite SamVisionAttention foward to use DropoutForParallelInput

* remove unused code

* [shardformer] support whisper (#4212)

* support whisper

* fix bug in vocabembedding

* support downstream model of whisper

* update readme

* Feature/chatglm (#4240)

* [shardformer] added tests

* [shardformer] vit test finish and support

* [shardformer] chatglm ready

* import chatglm

* [shardformer] add test kit in model zoo for chatglm

* [sharformer] add first version of policy of chatglm

* [shardformer] polish chatglm code

* [shardformer] polish code

* [shardformer] support chatglm without layernorm

* [shardformer] chatglm shard without mlp sharding

* [shardformer] delete some file

* [shardformer] ChatGLM support layernorm sharding

* [shardformer] register without auto policy

* [shardformer] pre-commit check files

* [shardformer] fix chatglm configuration with pre-commit

* [shardformer] whisper support flash attention

* [shardformer] whisper support flash attention

* [shardformer]whisper support jit operator

---------

Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>

* [shardformer] sam support flash attention (#4316)

* Feature/vit support (#4182)

* [shardformer] added tests

* [shardformer] vit test finish and support

* fix attention dropout

* [shardformer] support SAM (#4231)

* 1.support sam 2.add fused qkv for nn.Linear

* update utils support set element in list

* overtwrite SamVisionAttention foward to use DropoutForParallelInput

* remove unused code

* [shardformer] support whisper (#4212)

* support whisper

* fix bug in vocabembedding

* support downstream model of whisper

* update readme

* Feature/chatglm (#4240)

* [shardformer] added tests

* [shardformer] vit test finish and support

* [shardformer] chatglm ready

* import chatglm

* [shardformer] add test kit in model zoo for chatglm

* [sharformer] add first version of policy of chatglm

* [shardformer] polish chatglm code

* [shardformer] polish code

* [shardformer] support chatglm without layernorm

* [shardformer] chatglm shard without mlp sharding

* [shardformer] delete some file

* [shardformer] ChatGLM support layernorm sharding

* [shardformer] register without auto policy

* [shardformer] pre-commit check files

* [shardformer] fix chatglm configuration with pre-commit

* [shardformer] sam support flash attention

---------

Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>

* [shardformer] merge blip2/chatglm  (#4321)

* Feature/vit support (#4182)

* [shardformer] added tests

* [shardformer] vit test finish and support

* fix attention dropout

* [shardformer] support SAM (#4231)

* 1.support sam 2.add fused qkv for nn.Linear

* update utils support set element in list

* overtwrite SamVisionAttention foward to use DropoutForParallelInput

* remove unused code

* [shardformer] support whisper (#4212)

* support whisper

* fix bug in vocabembedding

* support downstream model of whisper

* update readme

* Feature/chatglm (#4240)

* [shardformer] added tests

* [shardformer] vit test finish and support

* [shardformer] chatglm ready

* import chatglm

* [shardformer] add test kit in model zoo for chatglm

* [sharformer] add first version of policy of chatglm

* [shardformer] polish chatglm code

* [shardformer] polish code

* [shardformer] support chatglm without layernorm

* [shardformer] chatglm shard without mlp sharding

* [shardformer] delete some file

* [shardformer] ChatGLM support layernorm sharding

* [shardformer] register without auto policy

* [shardformer] pre-commit check files

* [shardformer] fix chatglm configuration with pre-commit

* [shardformer] added tests

* [shardformer] vit test finish and support

* import chatglm

* [shardformer] add test kit in model zoo for chatglm

* [sharformer] add first version of policy of chatglm

* [shardformer] polish chatglm code

* [shardformer] polish code

* [shardformer] support chatglm without layernorm

* [shardformer] delete some file

* [shardformer] ChatGLM support layernorm sharding

* [shardformer] register without auto policy

* [shardformer] pre-commit check files

* [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit

* [shardformer] support Blip2 (#4243)

* support base blip2

* add support for downstream blip2 model

* update readme

* add forward injection

* skip not compatible models test

* fix test for gemini and low_level_zero_pugin

---------

Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: klhhhhh <1412841649@qq.com>

* [shardformer] blip2 support flash attention and jit operator (#4325)

* Feature/vit support (#4182)

* [shardformer] added tests

* [shardformer] vit test finish and support

* fix attention dropout

* [shardformer] support SAM (#4231)

* 1.support sam 2.add fused qkv for nn.Linear

* update utils support set element in list

* overtwrite SamVisionAttention foward to use DropoutForParallelInput

* remove unused code

* [shardformer] support whisper (#4212)

* support whisper

* fix bug in vocabembedding

* support downstream model of whisper

* update readme

* Feature/chatglm (#4240)

* [shardformer] added tests

* [shardformer] vit test finish and support

* [shardformer] chatglm ready

* import chatglm

* [shardformer] add test kit in model zoo for chatglm

* [sharformer] add first version of policy of chatglm

* [shardformer] polish chatglm code

* [shardformer] polish code

* [shardformer] support chatglm without layernorm

* [shardformer] chatglm shard without mlp sharding

* [shardformer] delete some file

* [shardformer] ChatGLM support layernorm sharding

* [shardformer] register without auto policy

* [shardformer] pre-commit check files

* [shardformer] fix chatglm configuration with pre-commit

* [shardformer] added tests

* [shardformer] vit test finish and support

* import chatglm

* [shardformer] add test kit in model zoo for chatglm

* [sharformer] add first version of policy of chatglm

* [shardformer] polish chatglm code

* [shardformer] polish code

* [shardformer] support chatglm without layernorm

* [shardformer] delete some file

* [shardformer] ChatGLM support layernorm sharding

* [shardformer] register without auto policy

* [shardformer] pre-commit check files

* [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit

* [shardformer] support Blip2 (#4243)

* support base blip2

* add support for downstream blip2 model

* update readme

* add forward injection

* skip not compatible models test

* fix test for gemini and low_level_zero_pugin

* [shardformer] blip2 support flash attention and jit operator

* [shardformer] blip2 support flash attention and jit operator

* [shardformer] blip2 support flash attention and jit operator

---------

Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: klhhhhh <1412841649@qq.com>

* [shardformer] chatglm support flash attention and jit operator (#4330)

* Feature/vit support (#4182)

* [shardformer] added tests

* [shardformer] vit test finish and support

* fix attention dropout

* [shardformer] support SAM (#4231)

* 1.support sam 2.add fused qkv for nn.Linear

* update utils support set element in list

* overtwrite SamVisionAttention foward to use DropoutForParallelInput

* remove unused code

* [shardformer] support whisper (#4212)

* support whisper

* fix bug in vocabembedding

* support downstream model of whisper

* update readme

* Feature/chatglm (#4240)

* [shardformer] added tests

* [shardformer] vit test finish and support

* [shardformer] chatglm ready

* import chatglm

* [shardformer] add test kit in model zoo for chatglm

* [sharformer] add first version of policy of chatglm

* [shardformer] polish chatglm code

* [shardformer] polish code

* [shardformer] support chatglm without layernorm

* [shardformer] chatglm shard without mlp sharding

* [shardformer] delete some file

* [shardformer] ChatGLM support layernorm sharding

* [shardformer] register without auto policy

* [shardformer] pre-commit check files

* [shardformer] fix chatglm configuration with pre-commit

* [shardformer] added tests

* [shardformer] vit test finish and support

* import chatglm

* [shardformer] add test kit in model zoo for chatglm

* [sharformer] add first version of policy of chatglm

* [shardformer] polish chatglm code

* [shardformer] polish code

* [shardformer] support chatglm without layernorm

* [shardformer] delete some file

* [shardformer] ChatGLM support layernorm sharding

* [shardformer] register without auto policy

* [shardformer] pre-commit check files

* [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit

* [shardformer] support Blip2 (#4243)

* support base blip2

* add support for downstream blip2 model

* update readme

* add forward injection

* skip not compatible models test

* fix test for gemini and low_level_zero_pugin

* [shardformer] chatglm support flash attention and jit operator

* [shardformer] chatglm support flash attention and jit operator

* [shardformer] chatglm support flash attention and jit operator

* [shardformer] chatglm support flash attention and jit operator

---------

Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: klhhhhh <1412841649@qq.com>

* [shardformer] vit support flash attention and jit operator (#4334)

* Feature/vit support (#4182)

* [shardformer] added tests

* [shardformer] vit test finish and support

* fix attention dropout

* [shardformer] support SAM (#4231)

* 1.support sam 2.add fused qkv for nn.Linear

* update utils support set element in list

* overtwrite SamVisionAttention foward to use DropoutForParallelInput

* remove unused code

* [shardformer] support whisper (#4212)

* support whisper

* fix bug in vocabembedding

* support downstream model of whisper

* update readme

* Feature/chatglm (#4240)

* [shardformer] added tests

* [shardformer] vit test finish and support

* [shardformer] chatglm ready

* import chatglm

* [shardformer] add test kit in model zoo for chatglm

* [sharformer] add first version of policy of chatglm

* [shardformer] polish chatglm code

* [shardformer] polish code

* [shardformer] support chatglm without layernorm

* [shardformer] chatglm shard without mlp sharding

* [shardformer] delete some file

* [shardformer] ChatGLM support layernorm sharding

* [shardformer] register without auto policy

* [shardformer] pre-commit check files

* [shardformer] fix chatglm configuration with pre-commit

* [shardformer] added tests

* [shardformer] vit test finish and support

* import chatglm

* [shardformer] add test kit in model zoo for chatglm

* [sharformer] add first version of policy of chatglm

* [shardformer] polish chatglm code

* [shardformer] polish code

* [shardformer] support chatglm without layernorm

* [shardformer] delete some file

* [shardformer] ChatGLM support layernorm sharding

* [shardformer] register without auto policy

* [shardformer] pre-commit check files

* [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit

* [shardformer] support Blip2 (#4243)

* support base blip2

* add support for downstream blip2 model

* update readme

* add forward injection

* skip not compatible models test

* fix test for gemini and low_level_zero_pugin

* [shardformer] vit support flash attention and jit operator

* [shardformer] vit support flash attention and jit operator

---------

Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: klhhhhh <1412841649@qq.com>

* [pipeline] merge flash attention branch

* [pipeline] merge flash attention branch

* [pipeline] merge flash attention branch

* [pipeline] fix conflict

* [pipeline] fix conflict

* Merge branch 'feature/pipeline' into feature/pipeline

* Merge branch 'feature/pipeline' into feature/pipeline

* Merge branch 'feature/pipeline' into feature/pipeline

* activate checks

* activate checks

* activate checks

* activate checks

* activate checks

* activate checks

* activate checks

* activate checks

* fix flash attention tests

* gemini ignore whisper

* fix vit

* fix xformers import handle

---------

Co-authored-by: Frank Lee <somerlee.9@gmail.com>
Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: klhhhhh <1412841649@qq.com>

* [pipeline] rewrite t5 tests & support multi-tensor transmitting in pipeline (#4388)

* fix remaining t5 bugs/rewrite t5 tests

* fix multi-tensor communication in pipeline

* rearrange test_config

* fix keyerror in sync_shared_params

* fix get_held_layers & Randomnizer, complete t5 tests

* erase printing

* fix get_held_layers through modifying _release_unheld_layers

* fix _get_recursive_held_layers bug

* [shardformer] update shardformer to use flash attention 2 (#4392)

* cherry-pick flash attention 2

cherry-pick flash attention 2

* [shardformer] update shardformer to use flash attention 2

[shardformer] update shardformer to use flash attention 2, fix

[shardformer] update shardformer to use flash attention 2, fix

[shardformer] update shardformer to use flash attention 2, fix

* [shardformer] test all optimizations (#4399)

[shardformer] test all optimizations

[shardformer] test all optimizations

[shardformer] test all optimizations

* [pipeline] rewrite bert tests and fix some bugs (#4409)

* add pipeline policy and bert forward to be done

* add bertmodel pipeline forward and make tests

* add Bert_Policy and test for policy

* update formatting

* update formatting

* update the code

* fix bugs

* fix name confilt

* add bloom model and policy ,revise the base class of policy

* revise

* revision

* add bert_for_pretraining

* add bert_for_pretraining forward and policy

* fix typos

* cancel warning

* change the imediate output to default dict

* change the default output of get_shared_params

* rewrite bert test

* rewrite bert test

* fix some bugs

* del pipeline tests

* del pipeline tests

* del useless print

* del useless print

* rewrite data repeats

* [shardformer]fix, test gpt2 for AMP+TP (#4403)

* [shardformer] gpt2 tests fix

[shardformer] test all optimizations (#4399)

[shardformer] test all optimizations

[shardformer] test all optimizations

[shardformer] test all optimizations

[shardformer] gpt2 tests fix

* [shardformer] gpt2 tests fix

* [shardformer] rewrite tests for opt/bloom/llama/vit/chatglm (#4395)

* rewrite opt tests

* rewrite llama tests

* rewrite bloom & vit tests

* rewrite chatglm tests

* fix LinearCol for classfiers

* add judge for other tp layers, fix lazy init in util

* [shardformer] update tests for all optimization (#4413)

[shardformer] update tests for all optimization

* [shardformer]update t5 tests for using all optimizations. (#4407)

* [shardformer] gpt2 tests fix

[shardformer] test all optimizations (#4399)

[shardformer] test all optimizations

[shardformer] test all optimizations

[shardformer] test all optimizations

[shardformer] gpt2 tests fix

* [shardformer]update t5 to use all optimizations

* [shardformer] update bloom/llama/vit/chatglm tests (#4420)

[shardformer] update bloom/llama/vit/chatglm tests

[shardformer] update opt tests

[shardformer] update opt tests

[shardformer] update bloom/llama/vit/chatglm tests

[shardformer] update bloom/llama/vit/chatglm tests

[shardformer] update bloom/llama/vit/chatglm tests

* [misc] resolve code factor issues (#4433)

* [misc] update requirements

* [shardformer] fix embedding

* [shardformer] fix import

* [format] applied code formatting on changed files in pull request 4441 (#4445)

Co-authored-by: github-actions <github-actions@github.com>

* [shardformer/sequence parallel] Cherry pick commit to new branch (#4450)

* [shardformer/sequence parallel] Support sequence parallel for gpt2 (#4384)

* [sequence parallel] add sequence parallel linear col/row support (#4336)

* add sequence parallel linear col/row support

* add annotation

* add annotation

* add support for gpt2 fused qkv linear layer

* support sequence parallel in GPT2

* add docstring and note

* add requirments

* remove unused flash-attb

* modify flash attn test

* modify flash attn setting

* modify flash attn code

* add assert before divide, rename forward function

* [shardformer/test] fix gpt2 test with seq-parallel

* [shardformer/sequence parallel] Overlap input gather and grad computation during col backward (#4401)

* overlap gather input / grad computing during col backward

* modify test for overlap

* simplify code

* fix code and modify cuda stream synchronize

* [shardformer/sequence parallel] polish code

* [shardformer] support DDP in HybridPlugin/add tp+dp tests (#4446)

* support DDP for HybridPlugin/add tp+dp tests

* add docstring for HybridParallelPlugin

* [devops] add large-scale distributed test marker (#4452)

* [test] remove cpu marker

* [test] remove gpu marker

* [test] update pytest markers

* [ci] update unit test ci

* [shardformer] support interleaved pipeline (#4448)

* support interleaved pipeline

* fix unit test

* remove virtual stage test in stage mgr

* add droped type hint and updated bwd

* [shardformer/sequence parallel] support gpt2 seq parallel with pp/dp/tp (#4460)

* support gpt2 seq parallel with pp/dp/tp

* fix a bug when waiting for stream done

* delete unused gpt2_seq file

* [shardformer] bloom support sequence parallel (#4465)

[shardformer] bloom support sequence parallel

* [shardformer] bert support sequence parallel. (#4455)

* [shardformer] bert support sequence parallel

[shardformer] bert support sequence parallel

[shardformer] bert support sequence parallel

[shardformer] bert support sequence parallel

[shardformer] bert support sequence parallel

[shardformer] bert support sequence parallel

[shardformer] bert support sequence parallel

[shardformer] bert support sequence parallel

[shardformer] bert support sequence parallel

* [shardformer] bert support sequence parallel

[shardformer] bert support sequence parallel

[shardformer] bert support sequence parallel

* [shardformer] bert support sequence parallel

* [shardformer] Pipeline/whisper (#4456)

* add some base tests and policies

* finish whisper base model

* add conditional generation

* finish basic tests

* whisper

* finish whisper

* finish whisper

* del useless  whisper test

* fix

* add argmin to replace

* finish revision

* [shardformer] support tp+zero for shardformer (#4472)

* support tp+zero/input type cast for hybridplugin

* add tp+zero tests

* fix bucket arguments

* [chat] update config and prompt (#4139)

* update config and prompt

* update config

---------

Co-authored-by: Qianran Ma <qianranm@luchentech.com>

* rename chatglm to chatglm2 (#4484)

* [shardformer/sequence parallel] not support opt of seq-parallel, add warning and fix a bug in gpt2 pp (#4488)

* [shardformer] chatglm support sequence parallel (#4482)

* [shardformer] chatglm support sequence parallel

[shardformer] chatglm support sequence parallel

[shardformer] chatglm support sequence parallel

[shardformer] chatglm support sequence parallel

[shardformer] chatglm support sequence parallel

[shardformer] chatglm support sequence parallel

* fix

fix

fix

fix

* [shardformer] tests for 3d parallel (#4493)

* [gemini] improve compatibility and add static placement policy (#4479)

* [gemini] remove distributed-related part from colotensor (#4379)

* [gemini] remove process group dependency

* [gemini] remove tp part from colo tensor

* [gemini] patch inplace op

* [gemini] fix param op hook and update tests

* [test] remove useless tests

* [test] remove useless tests

* [misc] fix requirements

* [test] fix model zoo

* [test] fix model zoo

* [test] fix model zoo

* [test] fix model zoo

* [test] fix model zoo

* [misc] update requirements

* [gemini] refactor gemini optimizer and gemini ddp (#4398)

* [gemini] update optimizer interface

* [gemini] renaming gemini optimizer

* [gemini] refactor gemini ddp class

* [example] update gemini related example

* [example] update gemini related example

* [plugin] fix gemini plugin args

* [test] update gemini ckpt tests

* [gemini] fix checkpoint io

* [example] fix opt example requirements

* [example] fix opt example

* [example] fix opt example

* [example] fix opt example

* [gemini] add static placement policy (#4443)

* [gemini] add static placement policy

* [gemini] fix param offload

* [test] update gemini tests

* [plugin] update gemini plugin

* [plugin] update gemini plugin docstr

* [misc] fix flash attn requirement

* [test] fix gemini checkpoint io test

* [example] update resnet example result (#4457)

* [example] update bert example result (#4458)

* [doc] update gemini doc (#4468)

* [example] update gemini related examples (#4473)

* [example] update gpt example

* [example] update dreambooth example

* [example] update vit

* [example] update opt

* [example] update palm

* [example] update vit and opt benchmark

* [hotfix] fix bert in model zoo (#4480)

* [hotfix] fix bert in model zoo

* [test] remove chatglm gemini test

* [test] remove sam gemini test

* [test] remove vit gemini test

* [hotfix] fix opt tutorial example (#4497)

* [hotfix] fix opt tutorial example

* [hotfix] fix opt tutorial example

* [shardformer] vit/llama/t5 ignore the sequence parallelism flag and some fix. (#4498)

* [shardformer] chatglm support sequence parallel

[shardformer] chatglm support sequence parallel

[shardformer] chatglm support sequence parallel

[shardformer] chatglm support sequence parallel

[shardformer] chatglm support sequence parallel

[shardformer] chatglm support sequence parallel

* fix

fix

fix

fix

* [shardformer] jit fused fix

* [shardformer] jit fused fix

* [shardformer] jit fused fix

* [shardformer] jit fused fix

* [shardformer] jit fused fix

* [shardformer] jit fused fix

* [shardformer] jit fused fix

* activate checks

* [format] applied code formatting on changed files in pull request 4479 (#4504)

Co-authored-by: github-actions <github-actions@github.com>

* [zero]support zero2 with gradient accumulation (#4511)

* support gradient accumulation with zero2

* fix type

* [shardformer] opt fix. (#4514)

* [shardformer] chatglm support sequence parallel

[shardformer] chatglm support sequence parallel

[shardformer] chatglm support sequence parallel

[shardformer] chatglm support sequence parallel

[shardformer] chatglm support sequence parallel

[shardformer] chatglm support sequence parallel

* fix

fix

fix

fix

* [shardformer] jit fused fix

* [shardformer] jit fused fix

* [shardformer] jit fused fix

* [shardformer] jit fused fix

* [shardformer] jit fused fix

* [shardformer] jit fused fix

* [shardformer] jit fused fix

* activate checks

* [Test] test ci

* test ci

* test ci

* test ci

* test ci

* test ci

* test ci

* fix

* [shardformer] support sharded checkpoint IO for models of HybridParallelPlugin  (#4506)

* add APIs

* implement save_sharded_model

* add test for hybrid checkpointio

* implement naive loading for sharded model

* implement efficient sharded model loading

* open a new file for hybrid checkpoint_io

* small fix

* fix circular importing

* fix docstring

* arrange arguments and apis

* small fix

* [shardformer] zero1+pp and the corresponding tests (#4517)

* pause

* finish pp+zero1

* Update test_shard_vit.py

* [shardformer/fix overlap bug] fix overlap bug, add overlap as an option in shardco… (#4516)

* fix overlap bug and support bert, add overlap as an option in shardconfig

* support overlap for chatglm and bloom

* [example] add llama2 example (#4527)

* [example] transfer llama-1 example

* [example] fit llama-2

* [example] refactor scripts folder

* [example] fit new gemini plugin

* [cli] fix multinode runner

* [example] fit gemini optim checkpoint

* [example] refactor scripts

* [example] update requirements

* [example] update requirements

* [example] rename llama to llama2

* [example] update readme and pretrain script

* [example] refactor scripts

* [shardformer] fix emerged bugs after updating transformers (#4526)

* [coati] add chatglm model (#4539)

* update configuration of chatglm and add support in coati

* add unit test & update chatglm default config & fix bos index issue

* remove chatglm due to oom

* add dataset pkg in requirement-text

* fix parameter issue in test_models

* add ref in tokenize & rm unnessary parts

* separate source & target tokenization in chatglm

* add unit test to chatglm

* fix test dataset issue

* update truncation of chatglm

* fix Colossalai version

* fix colossal ai version in test

* [shardformer] Add overlap support for gpt2 (#4535)

* add overlap support for gpt2

* remove unused code

* remove unused code

* [coati] update ci

* fix colossalai version in coati examples

* [shardformer] fix opt test hanging (#4521)

* [shardformer] fix opt test hanging

* fix

* test

* test

* test

* fix test

* fix test

* remove print

* add fix

* keep requirements same with main branch

* fix runtime prepare pass (#4502)

Co-authored-by: lufang.chen <lufang.chen@nio.com>

* [shardformer] support pp+tp+zero1 tests (#4531)

* [shardformer] fix opt test hanging

* fix

* test

* test

* test

* fix test

* fix test

* remove print

* add fix

* [shardformer] pp+tp+zero1

[shardformer] pp+tp+zero1

[shardformer] pp+tp+zero1

[shardformer] pp+tp+zero1

[shardformer] pp+tp+zero1

[shardformer] pp+tp+zero1

* [shardformer] pp+tp+zero1

* [shardformer] pp+tp+zero1

* [shardformer] pp+tp+zero1

* [shardformer] pp+tp+zero1

* [example] update streamlit 0.73.1 to 1.11.1 (#4386)

* [example] change accelerate version (#4431)

Co-authored-by: Siyuan Tian <siyuant@vmware.com>
Co-authored-by: Hongxin Liu <lhx0217@gmail.com>

* [devops] cancel previous runs in the PR (#4546)

* [shardformer] fix submodule replacement bug when enabling pp (#4544)

* [shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540)

* implement sharded optimizer saving

* add more param info

* finish implementation of sharded optimizer saving

* fix bugs in optimizer sharded saving

* add pp+zero test

* param group loading

* greedy loading of optimizer

* fix bug when loading

* implement optimizer sharded saving

* add optimizer test & arrange checkpointIO utils

* fix gemini sharding state_dict

* add verbose option

* add loading of master params

* fix typehint

* fix master/working mapping in fp16 amp

* [shardformer] support from_pretrained when loading model with HybridParallelPlugin (#4575)

* hybrid plugin support huggingface from_pretrained

* add huggingface compatibility tests

* add folder cleaning

* fix bugs

* [zero]fix zero ckptIO with offload (#4529)

* fix zero ckptio with offload

* fix load device

* saved tensors in ckpt should be on CPU

* fix unit test

* fix unit test

* add clear cache

* save memory for CI

* Update Dockerfile (#4499)

fix dockerfile build

* [Fix] Fix compile error (#4357)

* [pipeline] 1f1b schedule receive microbatch size (#4589)

* [checkpointio] optimize zero optim checkpoint io (#4591)

* [zero] update checkpoint io to save memory

* [checkpointio] add device map to save memory

* [DOC] hotfix/llama2news (#4595)

* [doc] add llama2 news

* [doc] add llama2 news

* [doc] add llama2 news

* [doc] add llama2 benchmark (#4604)

* [doc] add llama2 benchmark

* [doc] add llama2 benchmark

* [shardformer] Pytree fix (#4533)

* pytree test

* test bert

* test bert

* test bert

* revise

* add register

* add register

* [shardformer] update bert finetune example with HybridParallelPlugin (#4584)

* [shardformer] fix opt test hanging

* fix

* test

* test

* test

* fix test

* fix test

* remove print

* add fix

* [shardformer] add bert finetune example

* [shardformer] add bert finetune example

* [shardformer] add bert finetune example

* [shardformer] add bert finetune example

* [shardformer] add bert finetune example

* [shardformer] add bert finetune example

* [shardformer] fix epoch change

* [shardformer] broadcast add pp group

* [shardformer] fix opt test hanging

* fix

* test

* test

* [shardformer] zero1+pp and the corresponding tests (#4517)

* pause

* finish pp+zero1

* Update test_shard_vit.py

* [shardformer/fix overlap bug] fix overlap bug, add overlap as an option in shardco… (#4516)

* fix overlap bug and support bert, add overlap as an option in shardconfig

* support overlap for chatglm and bloom

* [shardformer] fix emerged bugs after updating transformers (#4526)

* test

* fix test

* fix test

* remove print

* add fix

* [shardformer] add bert finetune example

* [shardformer] add bert finetune example

* [shardformer] Add overlap support for gpt2 (#4535)

* add overlap support for gpt2

* remove unused code

* remove unused code

* [shardformer] support pp+tp+zero1 tests (#4531)

* [shardformer] fix opt test hanging

* fix

* test

* test

* test

* fix test

* fix test

* remove print

* add fix

* [shardformer] pp+tp+zero1

[shardformer] pp+tp+zero1

[shardformer] pp+tp+zero1

[shardformer] pp+tp+zero1

[shardformer] pp+tp+zero1

[shardformer] pp+tp+zero1

* [shardformer] pp+tp+zero1

* [shardformer] pp+tp+zero1

* [shardformer] pp+tp+zero1

* [shardformer] pp+tp+zero1

* [shardformer] fix submodule replacement bug when enabling pp (#4544)

* [shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540)

* implement sharded optimizer saving

* add more param info

* finish implementation of sharded optimizer saving

* fix bugs in optimizer sharded saving

* add pp+zero test

* param group loading

* greedy loading of optimizer

* fix bug when loading

* implement optimizer sharded saving

* add optimizer test & arrange checkpointIO utils

* fix gemini sharding state_dict

* add verbose option

* add loading of master params

* fix typehint

* fix master/working mapping in fp16 amp

* [shardformer] add bert finetune example

* [shardformer] add bert finetune example

* [shardformer] add bert finetune example

* [shardformer] add bert finetune example

* [shardformer] fix epoch change

* [shardformer] broadcast add pp group

* rebase feature/shardformer

* update pipeline

* [shardformer] fix

* [shardformer] fix

* [shardformer] bert finetune fix

* [shardformer] add all_reduce operation to loss

add all_reduce operation to loss

* [shardformer] make compatible with pytree.

make compatible with pytree.

* [shardformer] disable tp

disable tp

* [shardformer] add 3d plugin to ci test

* [shardformer] update num_microbatches to None

* [shardformer] update microbatchsize

* [shardformer] update assert

* update scheduler

* update scheduler

---------

Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>
Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: Baizhou Zhang <eddiezhang@pku.edu.cn>

* [checkpointio] support huggingface from_pretrained for all plugins (#4606)

* [shardformer] Add overlap optional for HybridParallelPlugin (#4615)

* add optional overlap for plugin

* remove fixed todo

* [shardformer] update shardformer readme (#4617)

[shardformer] update shardformer readme

[shardformer] update shardformer readme

* [test] ignore gpt2 shardformer test (#4619)

* [zero] hotfix master param sync (#4618)

* [zero] add method to update master params

* [zero] update zero plugin

* [plugin] update low level zero plugin

* [test] fix gemini checkpoint and gpt test (#4620)

* [legacy] move trainer to legacy (#4545)

* [legacy] move trainer to legacy

* [doc] update docs related to trainer

* [test] ignore legacy test

* [legacy] move engine to legacy (#4560)

* [legacy] move engine to legacy

* [example] fix seq parallel example

* [example] fix seq parallel example

* [test] test gemini pluging hang

* [test] test gemini pluging hang

* [test] test gemini pluging hang

* [test] test gemini pluging hang

* [test] test gemini pluging hang

* [example] update seq parallel requirements

* [legacy] move builder and registry to legacy (#4603)

* [release] update version (#4623)

* [shardformer] Support customized policy for llamav2 based model with HybridParallelPlugin (#4624)

* Enable policy assignment in HybridPlugin and enable llama policy for llamav2

* Remove Policy from Plugin

* revert changes of plugin

HybridParallelModule

* revert changes in plugin

* upgrade transformers

* revert transformers version

---------

Co-authored-by: flybird11111 <1829166702@qq.com>

* [pipeline] set optimizer to optional in execute_pipeline (#4630)

* set optimizer to optional in execute_pipeline

* arrange device and mixed precision in booster init

* fix execute_pipeline in booster.py

* [example] update vit example for hybrid parallel plugin (#4641)

* update vit example for hybrid plugin

* reset tp/pp size

* fix dataloader iteration bug

* update optimizer passing in evaluation/add grad_accum

* change criterion

* wrap tqdm

* change grad_accum to grad_checkpoint

* fix pbar

* [devops] fix concurrency group and compatibility test (#4665)

* [devops] fix concurrency group

* [devops] fix compatibility test

* [devops] fix tensornvme install

* [devops] fix tensornvme install

* [devops] fix colossalai install

* [shardformer] update llama2/opt finetune example and fix llama2 policy (#4645)

* [shardformer] update shardformer readme

[shardformer] update shardformer readme

[shardformer] update shardformer readme

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] change dataset

* [shardformer] change dataset

* [shardformer] fix CI

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

[example] update opt example

[example] resolve comments

fix

fix

* [devops] fix concurrency group (#4667)

* [legacy] move communication and nn to legacy and refactor logger (#4671)

* [legacy] move communication to legacy (#4640)

* [legacy] refactor logger and clean up legacy codes (#4654)

* [legacy] make logger independent to gpc

* [legacy] make optim independent to registry

* [legacy] move test engine to legacy

* [legacy] move nn to legacy (#4656)

* [legacy] move nn to legacy

* [checkpointio] fix save hf config

* [test] remove useledd rpc pp test

* [legacy] fix nn init

* [example] skip tutorial hybriad parallel example

* [devops] test doc check

* [devops] test doc check

* [shardformer]fix gpt2 double head (#4663)

* [shardformer]fix gpt2 test

[shardformer]fix gpt2 test

[shardformer]fix gpt2 test

* fix

* [shardformer] add todo

* [shardformer] add todo

* [Feature] The first PR to Add TP inference engine, kv-cache manager and related kernels for our inference system (#4577)

* [infer] Infer/llama demo (#4503)

* add

* add infer example

* finish

* finish

* stash

* fix

* [Kernels]  add inference token attention kernel (#4505)

* add token forward

* fix tests

* fix comments

* add try import triton

* add adapted license

* add tests check

* [Kernels] add necessary kernels (llama & bloom) for attention forward and kv-cache manager  (#4485)

* added _vllm_rms_norm

* change place

* added tests

* added tests

* modify

* adding kernels

* added tests:

* adding kernels

* modify

* added

* updating kernels

* adding tests

* added tests

* kernel change

* submit

* modify

* added

* edit comments

* change name

* change commnets and fix import

* add

* added

* combine codes (#4509)

* [feature] add KV cache manager for llama & bloom inference (#4495)

* add kv cache memory manager

* add stateinfo during inference

* format

* format

* rename file

* add kv cache test

* revise on BatchInferState

* file dir change

* [Bug FIx] import llama context ops fix (#4524)

* added _vllm_rms_norm

* change place

* added tests

* added tests

* modify

* adding kernels

* added tests:

* adding kernels

* modify

* added

* updating kernels

* adding tests

* added tests

* kernel change

* submit

* modify

* added

* edit comments

* change name

* change commnets and fix import

* add

* added

* fix

* add ops into init.py

* add

* [Infer] Add TPInferEngine and fix file path (#4532)

* add engine for TP inference

* move file path

* update path

* fix TPInferEngine

* remove unused file

* add engine test demo

* revise TPInferEngine

* fix TPInferEngine, add test

* fix

* Add Inference test for llama (#4508)

* add kv cache memory manager

* add stateinfo during inference

* add

* add infer example

* finish

* finish

* format

* format

* rename file

* add kv cache test

* revise on BatchInferState

* add inference test for llama

* fix conflict

* feature: add some new features for llama engine

* adapt colossalai triton interface

* Change the parent class of llama  policy

* add nvtx

* move llama inference code to tensor_parallel

* fix __init__.py

* rm tensor_parallel

* fix: fix bugs in auto_policy.py

* fix:rm some unused codes

* mv colossalai/tpinference to colossalai/inference/tensor_parallel

* change __init__.py

* save change

* fix engine

* Bug fix: Fix hang

* remove llama_infer_engine.py

---------

Co-authored-by: yuanheng-zhao <jonathan.zhaoyh@gmail.com>
Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>

* [infer] Add Bloom inference policy and replaced methods (#4512)

* add bloom inference methods and policy

* enable pass BatchInferState from model forward

* revise bloom infer layers/policies

* add engine for inference (draft)

* add test for bloom infer

* fix bloom infer policy and flow

* revise bloom test

* fix bloom file path

* remove unused codes

* fix bloom modeling

* fix dir typo

* fix trivial

* fix policy

* clean pr

* trivial fix

* Revert "[infer] Add Bloom inference policy and replaced methods (#4512)" (#4552)

This reverts commit 17cfa5714083a81a505c097f1c411cd28162d922.

* [Doc] Add colossal inference doc (#4549)

* create readme

* add readme.md

* fix typos

* [infer] Add Bloom inference policy and replaced methods (#4553)

* add bloom inference methods and policy

* enable pass BatchInferState from model forward

* revise bloom infer layers/policies

* add engine for inference (draft)

* add test for bloom infer

* fix bloom infer policy and flow

* revise bloom test

* fix bloom file path

* remove unused codes

* fix bloom modeling

* fix dir typo

* fix trivial

* fix policy

* clean pr

* triv…
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants