[legacy] clean up gpc #4742

Merged
ver217 merged 6 commits into hpcaitech:feature/remove-legacy from feature/remove-gpc on Sep 16, 2023

Conversation

ver217
Member

@ver217 ver217 commented Sep 15, 2023

📌 Checklist before creating the PR

  • I have created an issue for this PR for traceability
  • The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

📝 What does this PR do?

Summarize your work here.
if you have any plots/diagrams/screenshots/tables, please attach them here.

💥 Checklist before requesting a review

  • I have linked my PR to an issue (instruction)
  • My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • I have performed a self-review of my code
  • I have added thorough tests.
  • I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • 🌝 Yes, I do.
  • 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

@github-actions
Contributor

The code coverage for the changed files is 24%.

Complete report:
Name                                                                                                      Stmts   Miss  Cover
-----------------------------------------------------------------------------------------------------------------------------
colossalai/__init__.py                                                                                        7      3    57%
colossalai/amp/__init__.py                                                                                    0      0   100%
colossalai/amp/naive_amp/__init__.py                                                                          0      0   100%
colossalai/auto_parallel/offload/amp_optimizer.py                                                            99     99     0%
colossalai/checkpoint_io/utils.py                                                                           318     60    81%
colossalai/cli/cli.py                                                                                        14     14     0%
colossalai/context/__init__.py                                                                                2      0   100%
colossalai/context/moe_context.py                                                                            65     37    43%
colossalai/fx/passes/shard_1d_pass.py                                                                        91     79    13%
colossalai/initialize.py                                                                                     45     22    51%
colossalai/legacy/__init__.py                                                                                 2      0   100%
colossalai/legacy/amp/__init__.py                                                                            20     10    50%
colossalai/legacy/amp/amp_type.py                                                                             5      0   100%
colossalai/legacy/amp/apex_amp/__init__.py                                                                    9      0   100%
colossalai/legacy/amp/apex_amp/apex_amp.py                                                                   15      4    73%
colossalai/legacy/amp/naive_amp/__init__.py                                                                  29     20    31%
colossalai/legacy/amp/naive_amp/_fp16_optimizer.py                                                          173    127    27%
colossalai/legacy/amp/naive_amp/_utils.py                                                                    21     17    19%
colossalai/legacy/amp/naive_amp/naive_amp.py                                                                 86     58    33%
colossalai/legacy/amp/torch_amp/__init__.py                                                                  15      7    53%
colossalai/legacy/amp/torch_amp/_grad_scaler.py                                                             234    187    20%
colossalai/legacy/amp/torch_amp/torch_amp.py                                                                 35     14    60%
colossalai/legacy/communication/collective.py                                                                92     78    15%
colossalai/legacy/communication/p2p.py                                                                      131    107    18%
colossalai/legacy/communication/p2p_v2.py                                                                   113    113     0%
colossalai/legacy/communication/ring.py                                                                      20     15    25%
colossalai/legacy/communication/utils.py                                                                     64     51    20%
colossalai/legacy/constants.py                                                                               11      0   100%
colossalai/legacy/context/__init__.py                                                                         4      0   100%
colossalai/legacy/context/parallel_context.py                                                               227    153    33%
colossalai/legacy/context/parallel_mode.py                                                                   21      0   100%
colossalai/legacy/context/process_group_initializer/__init__.py                                              11      0   100%
colossalai/legacy/context/process_group_initializer/initializer_1d.py                                        29     20    31%
colossalai/legacy/context/process_group_initializer/initializer_2d.py                                        72     55    24%
colossalai/legacy/context/process_group_initializer/initializer_2p5d.py                                     136    112    18%
colossalai/legacy/context/process_group_initializer/initializer_3d.py                                       155    129    17%
colossalai/legacy/context/process_group_initializer/initializer_data.py                                      27     19    30%
colossalai/legacy/context/process_group_initializer/initializer_model.py                                     28     20    29%
colossalai/legacy/context/process_group_initializer/initializer_pipeline.py                                  26     18    31%
colossalai/legacy/context/process_group_initializer/initializer_sequence.py                                  42     29    31%
colossalai/legacy/context/process_group_initializer/initializer_tensor.py                                    27     19    30%
colossalai/legacy/context/process_group_initializer/process_group_initializer.py                             14      8    43%
colossalai/legacy/context/random/__init__.py                                                                  2      0   100%
colossalai/legacy/context/random/_helper.py                                                                  53     34    36%
colossalai/legacy/context/random/seed_manager.py                                                             40     22    45%
colossalai/legacy/core.py                                                                                     2      0   100%
colossalai/legacy/engine/_base_engine.py                                                                     90     56    38%
colossalai/legacy/engine/gradient_accumulation/_gradient_accumulation.py                                    107     70    35%
colossalai/legacy/engine/gradient_handler/_data_parallel_gradient_handler.py                                 10      2    80%
colossalai/legacy/engine/gradient_handler/_moe_gradient_handler.py                                           20      9    55%
colossalai/legacy/engine/gradient_handler/_pipeline_parallel_gradient_handler.py                             24     14    42%
colossalai/legacy/engine/gradient_handler/_sequence_parallel_gradient_handler.py                             10      2    80%
colossalai/legacy/engine/schedule/_pipeline_schedule.py                                                     430    397     8%
colossalai/legacy/engine/schedule/_pipeline_schedule_v2.py                                                   78     78     0%
colossalai/legacy/global_variables.py                                                                        25      1    96%
colossalai/legacy/initialize.py                                                                             190    155    18%
colossalai/legacy/nn/__init__.py                                                                              3      0   100%
colossalai/legacy/nn/_ops/__init__.py                                                                         1      1     0%
colossalai/legacy/nn/_ops/_utils.py                                                                         156    156     0%
colossalai/legacy/nn/layer/base_layer.py                                                                     36     22    39%
colossalai/legacy/nn/layer/colossalai_layer/dropout.py                                                       17      9    47%
colossalai/legacy/nn/layer/parallel_1d/_operation.py                                                         53     35    34%
colossalai/legacy/nn/layer/parallel_1d/_utils.py                                                             96     52    46%
colossalai/legacy/nn/layer/parallel_1d/layers.py                                                            474    378    20%
colossalai/legacy/nn/layer/parallel_2d/_operation.py                                                        396    309    22%
colossalai/legacy/nn/layer/parallel_2d/_utils.py                                                             12      7    42%
colossalai/legacy/nn/layer/parallel_2d/layers.py                                                            484    407    16%
colossalai/legacy/nn/layer/parallel_2p5d/_operation.py                                                      431    336    22%
colossalai/legacy/nn/layer/parallel_2p5d/_utils.py                                                           14      9    36%
colossalai/legacy/nn/layer/parallel_2p5d/layers.py                                                          477    401    16%
colossalai/legacy/nn/layer/parallel_3d/_operation.py                                                        237    164    31%
colossalai/legacy/nn/layer/parallel_3d/_utils.py                                                             63     39    38%
colossalai/legacy/nn/layer/parallel_3d/layers.py                                                            513    431    16%
colossalai/legacy/nn/layer/parallel_sequence/_operation.py                                                   82     60    27%
colossalai/legacy/nn/layer/parallel_sequence/layers.py                                                       92     70    24%
colossalai/legacy/nn/layer/utils/common.py                                                                   48     18    62%
colossalai/legacy/nn/layer/vanilla/layers.py                                                                147    103    30%
colossalai/legacy/nn/layer/wrapper/pipeline_wrapper.py                                                       38     28    26%
colossalai/legacy/nn/loss/__init__.py                                                                        23      9    61%
colossalai/legacy/nn/loss/loss_1d.py                                                                         59     41    31%
colossalai/legacy/nn/loss/loss_2d.py                                                                         74     48    35%
colossalai/legacy/nn/loss/loss_2p5d.py                                                                       74     48    35%
colossalai/legacy/nn/loss/loss_3d.py                                                                         81     55    32%
colossalai/legacy/nn/metric/accuracy_3d.py                                                                   18      9    50%
colossalai/legacy/nn/parallel/data_parallel.py                                                               99     99     0%
colossalai/legacy/nn/parallel/layers/cache_embedding/parallel_cached_embedding.py                            56     56     0%
colossalai/legacy/nn/parallel/layers/cache_embedding/parallel_cached_embedding_tablewise.py                  97     97     0%
colossalai/legacy/nn/parallel/layers/cache_embedding/parallel_cached_embedding_tablewise_split_cache.py      73     73     0%
colossalai/legacy/nn/parallel/layers/colo_module.py                                                          30     30     0%
colossalai/legacy/nn/parallel/layers/embedding.py                                                            15     15     0%
colossalai/legacy/nn/parallel/layers/linear.py                                                               15     15     0%
colossalai/legacy/nn/parallel/layers/module_utils.py                                                         83     83     0%
colossalai/legacy/pipeline/__init__.py                                                                        3      3     0%
colossalai/legacy/pipeline/layer_spec.py                                                                     39     39     0%
colossalai/legacy/pipeline/middleware/__init__.py                                                             2      2     0%
colossalai/legacy/pipeline/middleware/adaptor/__init__.py                                                     2      2     0%
colossalai/legacy/pipeline/middleware/adaptor/fx.py                                                         109    109     0%
colossalai/legacy/pipeline/middleware/topo.py                                                               144    144     0%
colossalai/legacy/pipeline/pipelinable.py                                                                   161    161     0%
colossalai/legacy/pipeline/pipeline_process_group.py                                                        114    114     0%
colossalai/legacy/pipeline/rpc/__init__.py                                                                    3      3     0%
colossalai/legacy/pipeline/rpc/_pipeline_base.py                                                            896    896     0%
colossalai/legacy/pipeline/rpc/_pipeline_schedule.py                                                        204    204     0%
colossalai/legacy/pipeline/rpc/utils.py                                                                     100    100     0%
colossalai/legacy/pipeline/utils.py                                                                         183    183     0%
colossalai/legacy/tensor/__init__.py                                                                          7      0   100%
colossalai/legacy/tensor/compute_spec.py                                                                     15      5    67%
colossalai/legacy/tensor/const.py                                                                             4      4     0%
colossalai/legacy/tensor/dist_spec_mgr.py                                                                   118     83    30%
colossalai/legacy/tensor/distspec.py                                                                         31     17    45%
colossalai/legacy/tensor/op_wrapper.py                                                                       13     13     0%
colossalai/legacy/tensor/process_group.py                                                                   129     96    26%
colossalai/legacy/tensor/tensor_spec.py                                                                      10      0   100%
colossalai/legacy/trainer/_trainer.py                                                                       171    171     0%
colossalai/legacy/trainer/hooks/_checkpoint_hook.py                                                          31     31     0%
colossalai/legacy/trainer/hooks/_log_hook.py                                                                146    146     0%
colossalai/legacy/trainer/hooks/_metric_hook.py                                                             228    228     0%
colossalai/legacy/utils/__init__.py                                                                           5      0   100%
colossalai/legacy/utils/activation_checkpoint.py                                                            151    151     0%
colossalai/legacy/utils/checkpoint/__init__.py                                                                2      0   100%
colossalai/legacy/utils/checkpoint/module_checkpoint.py                                                      80     72    10%
colossalai/legacy/utils/checkpoint/utils.py                                                                  40     32    20%
colossalai/legacy/utils/checkpointing.py                                                                    141    120    15%
colossalai/legacy/utils/common.py                                                                           267    222    17%
colossalai/legacy/utils/data_sampler/__init__.py                                                              3      0   100%
colossalai/legacy/utils/data_sampler/base_sampler.py                                                         11      4    64%
colossalai/legacy/utils/data_sampler/data_parallel_sampler.py                                                56     41    27%
colossalai/legacy/utils/memory.py                                                                            85     62    27%
colossalai/legacy/utils/profiler/__init__.py                                                                  2      2     0%
colossalai/legacy/utils/profiler/extention.py                                                                14     14     0%
colossalai/legacy/utils/profiler/legacy/__init__.py                                                           5      5     0%
colossalai/legacy/utils/profiler/legacy/comm_profiler.py                                                    204    204     0%
colossalai/legacy/utils/profiler/legacy/pcie_profiler.py                                                    102    102     0%
colossalai/legacy/utils/profiler/legacy/prof_utils.py                                                        77     77     0%
colossalai/legacy/utils/profiler/profiler.py                                                                 62     62     0%
colossalai/legacy/utils/profiler/stateful_tensor_mem_extention.py                                            92     92     0%
colossalai/legacy/zero/__init__.py                                                                           20     10    50%
colossalai/legacy/zero/gemini/__init__.py                                                                     5      0   100%
colossalai/legacy/zero/gemini/gemini_context.py                                                              29      6    79%
colossalai/legacy/zero/gemini/ophooks/__init__.py                                                             2      0   100%
colossalai/legacy/zero/gemini/ophooks/_shard_grad_ophook.py                                                  19     19     0%
colossalai/legacy/zero/gemini/ophooks/_shard_param_ophook.py                                                 33     33     0%
colossalai/legacy/zero/gemini/ophooks/runtime_mem_tracer_hook.py                                             94     94     0%
colossalai/legacy/zero/gemini/ophooks/utils.py                                                               90     63    30%
colossalai/legacy/zero/gemini/paramhooks/__init__.py                                                          2      0   100%
colossalai/legacy/zero/gemini/paramhooks/_param_hookmgr.py                                                   18     11    39%
colossalai/legacy/zero/gemini/stateful_tensor.py                                                            123     83    33%
colossalai/legacy/zero/gemini/stateful_tensor_mgr.py                                                         69     48    30%
colossalai/legacy/zero/gemini/tensor_placement_policy.py                                                     82     53    35%
colossalai/legacy/zero/gemini/tensor_utils.py                                                                54     43    20%
colossalai/legacy/zero/init_ctx/__init__.py                                                                   2      0   100%
colossalai/legacy/zero/init_ctx/init_context.py                                                             147    105    29%
colossalai/legacy/zero/shard_utils/__init__.py                                                                4      0   100%
colossalai/legacy/zero/shard_utils/base_shard_strategy.py                                                    13      3    77%
colossalai/legacy/zero/shard_utils/bucket_tensor_shard_strategy.py                                           32     23    28%
colossalai/legacy/zero/shard_utils/commons.py                                                                13     10    23%
colossalai/legacy/zero/shard_utils/tensor_shard_strategy.py                                                  38     25    34%
colossalai/legacy/zero/sharded_model/__init__.py                                                              2      0   100%
colossalai/legacy/zero/sharded_model/_utils.py                                                               57     43    25%
colossalai/legacy/zero/sharded_model/reduce_scatter.py                                                       94     68    28%
colossalai/legacy/zero/sharded_model/sharded_model_v2.py                                                    299    239    20%
colossalai/legacy/zero/sharded_model/utils.py                                                                12     12     0%
colossalai/legacy/zero/sharded_model/zero_hook.py                                                            73     50    32%
colossalai/legacy/zero/sharded_optim/__init__.py                                                              2      0   100%
colossalai/legacy/zero/sharded_optim/sharded_optim_v2.py                                                    215    170    21%
colossalai/legacy/zero/sharded_param/__init__.py                                                              3      0   100%
colossalai/legacy/zero/sharded_param/sharded_param.py                                                        68     45    34%
colossalai/legacy/zero/sharded_param/sharded_tensor.py                                                       26     12    54%
colossalai/logging/logger.py                                                                                 82     27    67%
colossalai/nn/layer/__init__.py                                                                               1      0   100%
colossalai/nn/layer/moe/experts.py                                                                          127    127     0%
colossalai/nn/layer/moe/layers.py                                                                           103    103     0%
colossalai/nn/loss/__init__.py                                                                                0      0   100%
colossalai/nn/optimizer/__init__.py                                                                           8      0   100%
colossalai/pipeline/__init__.py                                                                               4      0   100%
colossalai/pipeline/schedule/__init__.py                                                                      4      0   100%
colossalai/tensor/__init__.py                                                                                 6      0   100%
colossalai/utils/__init__.py                                                                                  6      0   100%
colossalai/utils/common.py                                                                                   44      6    86%
colossalai/utils/cuda.py                                                                                     26     13    50%
colossalai/utils/moe.py                                                                                      29     20    31%
colossalai/zero/gemini/colo_init_context.py                                                                 101     86    15%
colossalai/zero/gemini/memory_tracer/__init__.py                                                              6      0   100%
colossalai/zero/gemini/memory_tracer/chunk_memstats_collector.py                                             17      2    88%
colossalai/zero/gemini/memory_tracer/memory_monitor.py                                                       72     35    51%
colossalai/zero/gemini/memory_tracer/memstats_collector.py                                                   52     11    79%
colossalai/zero/gemini/memory_tracer/runtime_mem_tracer.py                                                   64     64     0%
colossalai/zero/gemini/placement_policy.py                                                                  119     24    80%
colossalai/zero/low_level/_utils.py                                                                         117     66    44%
tests/components_to_test/resnet.py                                                                           20      9    55%
tests/test_auto_parallel/test_tensor_shard/test_compatibility_with_gemini.py                                 77     77     0%
tests/test_cluster/test_process_group_mesh.py                                                                85     34    60%
tests/test_device/test_init_logical_pg.py                                                                    25      1    96%
tests/test_tensor/test_comm_spec_apply.py                                                                    96      1    99%
tests/test_tensor/test_dtensor/test_comm_spec.py                                                             77      1    99%
tests/test_tensor/test_mix_gather.py                                                                        150    127    15%
tests/test_zero/test_gemini/test_chunk_mgrv2.py                                                              47      1    98%
tests/test_zero/test_gemini/test_fwd_bwd.py                                                                  74      1    99%
tests/test_zero/test_gemini/test_grad_clip.py                                                                76      2    97%
tests/test_zero/test_gemini/test_inference.py                                                                92      1    99%
tests/test_zero/test_gemini/test_optim.py                                                                   119      1    99%
tests/test_zero/test_gemini/test_zeroddp_state_dict.py                                                       91      5    95%
tests/test_zero/test_gemini/test_zerooptim_state_dict.py                                                     59      2    97%
-----------------------------------------------------------------------------------------------------------------------------
TOTAL                                                                                                     16361  12388    24%
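
For readers checking the figures, the Cover column appears to be the share of statements that were executed, that is (Stmts - Miss) / Stmts, rounded to a whole percent. The sketch below reproduces a few rows and the TOTAL line under that assumption. The helper name `coverage_percent` is hypothetical, and the rounding rule is a plain `round`, which may differ from the reporting tool's own rounding in edge cases.

```python
def coverage_percent(stmts: int, miss: int) -> int:
    """Return statement coverage as a whole percent.

    Assumption: a file with no statements is reported as 100%,
    which matches the empty __init__.py rows in the report.
    """
    if stmts == 0:
        return 100
    return round(100 * (stmts - miss) / stmts)


# (Stmts, Miss, Cover) triples copied from the report above.
rows = {
    "colossalai/__init__.py": (7, 3, 57),
    "colossalai/amp/__init__.py": (0, 0, 100),
    "colossalai/checkpoint_io/utils.py": (318, 60, 81),
    "colossalai/legacy/context/parallel_context.py": (227, 153, 33),
    "TOTAL": (16361, 12388, 24),
}

for name, (stmts, miss, reported) in rows.items():
    computed = coverage_percent(stmts, miss)
    status = "ok" if computed == reported else "mismatch"
    print(f"{name}: computed {computed}%, reported {reported}% ({status})")
```

On the TOTAL line this gives (16361 - 12388) / 16361 = 3973 / 16361, roughly 24.3%, consistent with the 24% headline figure.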

@ver217 ver217 merged commit 432bc20 into hpcaitech:feature/remove-legacy Sep 16, 2023
19 checks passed
@ver217 ver217 deleted the feature/remove-gpc branch September 16, 2023 09:41
ver217 added a commit that referenced this pull request Sep 18, 2023
* [legacy] remove outdated codes of pipeline (#4692)

* [legacy] remove cli of benchmark and update optim (#4690)

* [legacy] remove cli of benchmark and update optim

* [doc] fix cli doc test

* [legacy] fix engine clip grad norm

* [legacy] remove outdated colo tensor (#4694)

* [legacy] remove outdated colo tensor

* [test] fix test import

* [legacy] move outdated zero to legacy (#4696)

* [legacy] clean up utils (#4700)

* [legacy] clean up utils

* [example] update examples

* [legacy] clean up amp

* [legacy] fix amp module

* [legacy] clean up gpc (#4742)

* [legacy] clean up context

* [legacy] clean core, constants and global vars

* [legacy] refactor initialize

* [example] fix examples ci

* [example] fix examples ci

* [legacy] fix tests

* [example] fix gpt example

* [example] fix examples ci

* [devops] fix ci installation

* [example] fix examples ci
Xu-Kai added a commit that referenced this pull request Sep 29, 2023
* [shardformer] fix GPT2DoubleHeadsModel (#4703)

* [hotfix] Fix import error: colossal.kernel without triton installed (#4722)

* [hotfix] remove triton kernels from kernel init

* revise bloom/llama kernel imports for infer

* [shardformer] to fix whisper test failed due to significant accuracy differences. (#4710)

* [shardformer] fix whisper test failed

* [shardformer] fix whisper test failed

* [shardformer] fix whisper test failed

* [shardformer] fix whisper test failed

* [doc] fix llama2 code link (#4726)

* [doc] fix llama2 code link

* [doc] fix llama2 code link

* [doc] fix llama2 code link

* [doc] Add user document for Shardformer (#4702)

* create shardformer doc files

* add docstring for seq-parallel

* update ShardConfig docstring

* add links to llama example

* add outdated massage

* finish introduction & supporting information

* finish 'how shardformer works'

* finish shardformer.md English doc

* fix doctest fail

* add Chinese document

* [format] applied code formatting on changed files in pull request 4726 (#4727)

Co-authored-by: github-actions <github-actions@github.com>

* [doc] add shardformer support matrix/update tensor parallel documents (#4728)

* add compatibility matrix for shardformer doc

* update tp doc

* Optimized some syntax errors in the documentation and code under applications/ (#4127)

Co-authored-by: flybird11111 <1829166702@qq.com>

* [shardformer] update pipeline parallel document (#4725)

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [legacy] remove deterministic data loader test

* [shardformer] update seq parallel document (#4730)

* update doc of seq parallel

* fix typo

* [example] add gpt2 HybridParallelPlugin example (#4653)

* add gpt2 HybridParallelPlugin example

* update readme and testci

* update test ci

* fix test_ci bug

* update requirements

* add requirements

* update requirements

* add requirement

* rename file

* [doc] polish shardformer doc (#4735)

* arrange position of chapters

* fix typos in seq parallel doc

* [shardformer] add custom policy in hybrid parallel plugin (#4718)

* add custom policy

* update assert

* [example] llama2 add fine-tune example (#4673)

* [shardformer] update shardformer readme

[shardformer] update shardformer readme

[shardformer] update shardformer readme

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] change dataset

* [shardformer] change dataset

* [shardformer] fix CI

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

[example] update opt example

[example] resolve comments

fix

fix

* [example] llama2 add finetune example

* [example] llama2 add finetune example

* [example] llama2 add finetune example

* [example] llama2 add finetune example

* fix

* update llama2 example

* update llama2 example

* fix

* update llama2 example

* update llama2 example

* update llama2 example

* update llama2 example

* update llama2 example

* update llama2 example

* Update requirements.txt

* update llama2 example

* update llama2 example

* update llama2 example

* [doc] explanation of loading large pretrained models (#4741)

* [kernel] update triton init #4740 (#4740)

* [legacy] clean up legacy code (#4743)

* [legacy] remove outdated codes of pipeline (#4692)

* [legacy] remove cli of benchmark and update optim (#4690)

* [legacy] remove cli of benchmark and update optim

* [doc] fix cli doc test

* [legacy] fix engine clip grad norm

* [legacy] remove outdated colo tensor (#4694)

* [legacy] remove outdated colo tensor

* [test] fix test import

* [legacy] move outdated zero to legacy (#4696)

* [legacy] clean up utils (#4700)

* [legacy] clean up utils

* [example] update examples

* [legacy] clean up amp

* [legacy] fix amp module

* [legacy] clean up gpc (#4742)

* [legacy] clean up context

* [legacy] clean core, constants and global vars

* [legacy] refactor initialize

* [example] fix examples ci

* [example] fix examples ci

* [legacy] fix tests

* [example] fix gpt example

* [example] fix examples ci

* [devops] fix ci installation

* [example] fix examples ci

* [format] applied code formatting on changed files in pull request 4743 (#4750)

Co-authored-by: github-actions <github-actions@github.com>

* [misc] update pre-commit and run all files (#4752)

* [misc] update pre-commit

* [misc] run pre-commit

* [misc] remove useless configuration files

* [misc] ignore cuda for clang-format

* [doc] explain suitable use case for each plugin

* [doc] put individual plugin explanation in front

* [doc] add model examples for each plugin

* [doc] put native colossalai plugins first in description section

* [chat]: update rm, add wandb and fix bugs (#4471)

* feat: modify forward fn of critic and reward model

* feat: modify calc_action_log_probs

* to: add wandb in sft and rm trainer

* feat: update train_sft

* feat: update train_rm

* style: modify type annotation and add warning

* feat: pass tokenizer to ppo trainer

* to: modify trainer base and maker base

* feat: add wandb in ppo trainer

* feat: pass tokenizer to generate

* test: update generate fn tests

* test: update train tests

* fix: remove action_mask

* feat: remove unused code

* fix: fix wrong ignore_index

* fix: fix mock tokenizer

* chore: update requirements

* revert: modify make_experience

* fix: fix inference

* fix: add padding side

* style: modify _on_learn_batch_end

* test: use mock tokenizer

* fix: use bf16 to avoid overflow

* fix: fix workflow

* [chat] fix gemini strategy

* [chat] fix

* sync: update colossalai strategy

* fix: fix args and model dtype

* fix: fix checkpoint test

* fix: fix requirements

* fix: fix missing import and wrong arg

* fix: temporarily skip gemini test in stage 3

* style: apply pre-commit

* fix: temporarily skip gemini test in stage 1&2

---------

Co-authored-by: Mingyan Jiang <1829166702@qq.com>

* [shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic (#4758)

* fix master param sync for hybrid plugin

* rewrite unwrap for ddp/fsdp

* rewrite unwrap for zero/gemini

* rewrite unwrap for hybrid plugin

* fix gemini unwrap

* fix bugs

* [bug] fix get_default_parser in examples (#4764)

* [doc] clean up outdated docs (#4765)

* [doc] clean up outdated docs

* [doc] fix linking

* [doc] fix linking

* [doc] add shardformer doc to sidebar (#4768)

* [chat]: add lora merge weights config (#4766)

* feat: modify lora merge weights fn

* feat: add lora merge weights config

* [lazy] support torch 2.0 (#4763)

* [lazy] support _like methods and clamp

* [lazy] pass transformers models

* [lazy] fix device move and requires grad

* [lazy] fix requires grad and refactor api

* [lazy] fix requires grad

* [bug] Fix the version check bug in colossalai run when generating the cmd. (#4713)

* Fix the version check bug in colossalai run when generating the cmd.

* polish code

* [feature] add gptq for inference (#4754)

* [gptq] add gptq kernel (#4416)

* add gptq

* refactor code

* fix tests

* replace auto-gptq

* rename inference/quant

* refactor test

* add auto-gptq as an option

* reset requirements

* change assert and check auto-gptq

* add import warnings

* change test flash attn version

* remove example

* change requirements of flash_attn

* modify tests

* [skip ci] change requirements-test

* [gptq] faster gptq cuda kernel (#4494)

* [skip ci] add cuda kernels

* add license

* [skip ci] fix max_input_len

* format files & change test size

* [skip ci]

* [gptq] add gptq tensor parallel (#4538)

* add gptq tensor parallel

* add gptq tp

* delete print

* add test gptq check

* add test auto gptq check

* [gptq] combine gptq and kv cache manager (#4706)

* combine gptq and kv cache manager

* add init bits

* delete useless code

* add model path

* delete useless print and update test

* delete useless import

* move option gptq to shard config

* change replace linear to shardformer

* update bloom policy

* delete useless code

* fix import bug and delete useless code

* change colossalai/gptq to colossalai/quant/gptq

* update import linear for tests

* delete useless code and mv gptq_kernel to kernel directory

* fix triton kernel

* add triton import

* [inference] chatglm2 infer demo (#4724)

* add chatglm2

* add

* gather needed kernels

* fix some bugs

* finish context forward

* finish context stage

* fix

* add

* pause

* add

* fix bugs

* finish chatglm

* fix bug

* change some logic

* fix bugs

* change some logics

* add

* add

* add

* fix

* fix tests

* fix

* [release] update version (#4775)

* [release] update version

* [doc] revert versions

* initial commit: add colossal llama 2 (#4784)

* [feature] ColossalEval: Evaluation Pipeline for LLMs (#4786)

* Add ColossalEval

* Delete evaluate in Chat

---------

Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
Co-authored-by: Tong Li <tong.li352711588@gmail.com>

* [doc] add llama2 domain-specific solution news (#4789)

* [doc] add llama2 domain-specific solution news

* [fix] fix weekly running example (#4787)

* [fix] fix weekly running example

* [fix] fix weekly running example

* [doc] polish shardformer doc (#4779)

* fix example format in docstring

* polish shardformer doc

* [checkpointio] support unsharded checkpointIO for hybrid parallel (#4774)

* support unsharded saving/loading for model

* support optimizer unsharded saving

* update doc

* support unsharded loading for optimizer

* small fix

* update readme

* [lazy] support from_pretrained (#4801)

* [lazy] patch from pretrained

* [lazy] fix from pretrained and add tests

* [devops] update ci

* update

* [hotfix] change llama2 Colossal-LLaMA-2 script filename (#4800)

change filename:
pretraining.py -> trainin.py
there is no file named pretraing.py. wrong writing

* [misc] add last_epoch in CosineAnnealingWarmupLR (#4778)

* [doc] add lazy init docs (#4808)

* [hotfix] fix norm type error in zero optimizer (#4795)

* [hotfix] Correct several erroneous code comments (#4794)

* [format] applied code formatting on changed files in pull request 4595 (#4602)

Co-authored-by: github-actions <github-actions@github.com>

* fix format (#4815)

* [chat] fix gemini strategy (#4698)

* [chat] fix gemini strategy

* [chat] fix gemini strategy

* [chat] fix gemini strategy

* [chat] fix gemini strategy

* # This is a combination of 2 commits.

[chat] fix gemini strategy

fox

* [chat] fix gemini strategy

update llama2 example

[chat] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* fix

* fix

* fix

* fix

* fix

* Update train_prompts.py

* Update Qwen-7B results (#4821)

Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>

* [doc] update slack link (#4823)

* add autotune (#4822)

* update Colossal (#4832)

---------

Co-authored-by: flybird11111 <1829166702@qq.com>
Co-authored-by: Yuanheng Zhao <54058983+yuanheng-zhao@users.noreply.github.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
Co-authored-by: Baizhou Zhang <eddiezhang@pku.edu.cn>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: digger yu <digger-yu@outlook.com>
Co-authored-by: Pengtai Xu <henryxu880@gmail.com>
Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: ppt0011 <143150326+ppt0011@users.noreply.github.com>
Co-authored-by: Xuanlei Zhao <43881818+oahzxl@users.noreply.github.com>
Co-authored-by: Hongxin Liu <lhx0217@gmail.com>
Co-authored-by: Wenhao Chen <cwher@outlook.com>
Co-authored-by: littsk <1214689160@qq.com>
Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>
Co-authored-by: Tong Li <tong.li352711588@gmail.com>
Co-authored-by: Yuanchen <70520919+chengeharrison@users.noreply.github.com>
Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
Co-authored-by: Desperado-Jia <502205863@qq.com>
Co-authored-by: Chandler-Bing <brp12138@163.com>
Co-authored-by: Yan haixu <40758050+hova88@users.noreply.github.com>
Xu-Kai added a commit that referenced this pull request Sep 29, 2023
* [shardformer] fix GPT2DoubleHeadsModel (#4703)

* [hotfix] Fix import error: colossal.kernel without triton installed (#4722)

* [hotfix] remove triton kernels from kernel init

* revise bloom/llama kernel imports for infer

* [shardformer] to fix whisper test failed due to significant accuracy differences. (#4710)

* [shardformer] fix whisper test failed

* [shardformer] fix whisper test failed

* [shardformer] fix whisper test failed

* [shardformer] fix whisper test failed

* [doc] fix llama2 code link (#4726)

* [doc] fix llama2 code link

* [doc] fix llama2 code link

* [doc] fix llama2 code link

* [doc] Add user document for Shardformer (#4702)

* create shardformer doc files

* add docstring for seq-parallel

* update ShardConfig docstring

* add links to llama example

* add outdated message

* finish introduction & supporting information

* finish 'how shardformer works'

* finish shardformer.md English doc

* fix doctest fail

* add Chinese document

* [format] applied code formatting on changed files in pull request 4726 (#4727)

Co-authored-by: github-actions <github-actions@github.com>

* [doc] add shardformer support matrix/update tensor parallel documents (#4728)

* add compatibility matrix for shardformer doc

* update tp doc

* Optimized some syntax errors in the documentation and code under applications/ (#4127)

Co-authored-by: flybird11111 <1829166702@qq.com>

* [shardformer] update pipeline parallel document (#4725)

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [legacy] remove deterministic data loader test

* [shardformer] update seq parallel document (#4730)

* update doc of seq parallel

* fix typo

* [example] add gpt2 HybridParallelPlugin example (#4653)

* add gpt2 HybridParallelPlugin example

* update readme and testci

* update test ci

* fix test_ci bug

* update requirements

* add requirements

* update requirements

* add requirement

* rename file

* [doc] polish shardformer doc (#4735)

* arrange position of chapters

* fix typos in seq parallel doc

* [shardformer] add custom policy in hybrid parallel plugin (#4718)

* add custom policy

* update assert

* [example] llama2 add fine-tune example (#4673)

* [shardformer] update shardformer readme

[shardformer] update shardformer readme

[shardformer] update shardformer readme

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] change dataset

* [shardformer] change dataset

* [shardformer] fix CI

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

[example] update opt example

[example] resolve comments

fix

fix

* [example] llama2 add finetune example

* [example] llama2 add finetune example

* [example] llama2 add finetune example

* [example] llama2 add finetune example

* fix

* update llama2 example

* update llama2 example

* fix

* update llama2 example

* update llama2 example

* update llama2 example

* update llama2 example

* update llama2 example

* update llama2 example

* Update requirements.txt

* update llama2 example

* update llama2 example

* update llama2 example

* [doc] explanation of loading large pretrained models (#4741)

* [kernel] update triton init #4740 (#4740)

* [legacy] clean up legacy code (#4743)

* [legacy] remove outdated codes of pipeline (#4692)

* [legacy] remove cli of benchmark and update optim (#4690)

* [legacy] remove cli of benchmark and update optim

* [doc] fix cli doc test

* [legacy] fix engine clip grad norm

* [legacy] remove outdated colo tensor (#4694)

* [legacy] remove outdated colo tensor

* [test] fix test import

* [legacy] move outdated zero to legacy (#4696)

* [legacy] clean up utils (#4700)

* [legacy] clean up utils

* [example] update examples

* [legacy] clean up amp

* [legacy] fix amp module

* [legacy] clean up gpc (#4742)

* [legacy] clean up context

* [legacy] clean core, constants and global vars

* [legacy] refactor initialize

* [example] fix examples ci

* [example] fix examples ci

* [legacy] fix tests

* [example] fix gpt example

* [example] fix examples ci

* [devops] fix ci installation

* [example] fix examples ci

* [format] applied code formatting on changed files in pull request 4743 (#4750)

Co-authored-by: github-actions <github-actions@github.com>

* [misc] update pre-commit and run all files (#4752)

* [misc] update pre-commit

* [misc] run pre-commit

* [misc] remove useless configuration files

* [misc] ignore cuda for clang-format

* [doc] explain suitable use case for each plugin

* [doc] put individual plugin explanation in front

* [doc] add model examples for each plugin

* [doc] put native colossalai plugins first in description section

* [chat]: update rm, add wandb and fix bugs (#4471)

* feat: modify forward fn of critic and reward model

* feat: modify calc_action_log_probs

* to: add wandb in sft and rm trainer

* feat: update train_sft

* feat: update train_rm

* style: modify type annotation and add warning

* feat: pass tokenizer to ppo trainer

* to: modify trainer base and maker base

* feat: add wandb in ppo trainer

* feat: pass tokenizer to generate

* test: update generate fn tests

* test: update train tests

* fix: remove action_mask

* feat: remove unused code

* fix: fix wrong ignore_index

* fix: fix mock tokenizer

* chore: update requirements

* revert: modify make_experience

* fix: fix inference

* fix: add padding side

* style: modify _on_learn_batch_end

* test: use mock tokenizer

* fix: use bf16 to avoid overflow

* fix: fix workflow

* [chat] fix gemini strategy

* [chat] fix

* sync: update colossalai strategy

* fix: fix args and model dtype

* fix: fix checkpoint test

* fix: fix requirements

* fix: fix missing import and wrong arg

* fix: temporarily skip gemini test in stage 3

* style: apply pre-commit

* fix: temporarily skip gemini test in stage 1&2

---------

Co-authored-by: Mingyan Jiang <1829166702@qq.com>

* [shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic (#4758)

* fix master param sync for hybrid plugin

* rewrite unwrap for ddp/fsdp

* rewrite unwrap for zero/gemini

* rewrite unwrap for hybrid plugin

* fix gemini unwrap

* fix bugs

* [bug] fix get_default_parser in examples (#4764)

* [doc] clean up outdated docs (#4765)

* [doc] clean up outdated docs

* [doc] fix linking

* [doc] fix linking

* [doc] add shardformer doc to sidebar (#4768)

* [chat]: add lora merge weights config (#4766)

* feat: modify lora merge weights fn

* feat: add lora merge weights config

* [lazy] support torch 2.0 (#4763)

* [lazy] support _like methods and clamp

* [lazy] pass transformers models

* [lazy] fix device move and requires grad

* [lazy] fix requires grad and refactor api

* [lazy] fix requires grad

* [bug] Fix the version check bug in colossalai run when generating the cmd. (#4713)

* Fix the version check bug in colossalai run when generating the cmd.

* polish code

* [feature] add gptq for inference (#4754)

* [gptq] add gptq kernel (#4416)

* add gptq

* refactor code

* fix tests

* replace auto-gptq

* rename inference/quant

* refactor test

* add auto-gptq as an option

* reset requirements

* change assert and check auto-gptq

* add import warnings

* change test flash attn version

* remove example

* change requirements of flash_attn

* modify tests

* [skip ci] change requirements-test

* [gptq] faster gptq cuda kernel (#4494)

* [skip ci] add cuda kernels

* add license

* [skip ci] fix max_input_len

* format files & change test size

* [skip ci]

* [gptq] add gptq tensor parallel (#4538)

* add gptq tensor parallel

* add gptq tp

* delete print

* add test gptq check

* add test auto gptq check

* [gptq] combine gptq and kv cache manager (#4706)

* combine gptq and kv cache manager

* add init bits

* delete useless code

* add model path

* delete useless print and update test

* delete useless import

* move option gptq to shard config

* change replace linear to shardformer

* update bloom policy

* delete useless code

* fix import bug and delete useless code

* change colossalai/gptq to colossalai/quant/gptq

* update import linear for tests

* delete useless code and mv gptq_kernel to kernel directory

* fix triton kernel

* add triton import

* [inference] chatglm2 infer demo (#4724)

* add chatglm2

* add

* gather needed kernels

* fix some bugs

* finish context forward

* finish context stage

* fix

* add

* pause

* add

* fix bugs

* finish chatglm

* fix bug

* change some logic

* fix bugs

* change some logics

* add

* add

* add

* fix

* fix tests

* fix

* [release] update version (#4775)

* [release] update version

* [doc] revert versions

* initial commit: add colossal llama 2 (#4784)

* [feature] ColossalEval: Evaluation Pipeline for LLMs (#4786)

* Add ColossalEval

* Delete evaluate in Chat

---------

Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
Co-authored-by: Tong Li <tong.li352711588@gmail.com>

* [doc] add llama2 domain-specific solution news (#4789)

* [doc] add llama2 domain-specific solution news

* [fix] fix weekly running example (#4787)

* [fix] fix weekly running example

* [fix] fix weekly running example

* [doc] polish shardformer doc (#4779)

* fix example format in docstring

* polish shardformer doc

* [checkpointio] support unsharded checkpointIO for hybrid parallel (#4774)

* support unsharded saving/loading for model

* support optimizer unsharded saving

* update doc

* support unsharded loading for optimizer

* small fix

* update readme

* [lazy] support from_pretrained (#4801)

* [lazy] patch from pretrained

* [lazy] fix from pretrained and add tests

* [devops] update ci

* update

* [hotfix] change llama2 Colossal-LLaMA-2 script filename (#4800)

change filename:
pretraining.py -> trainin.py
there is no file named pretraing.py. wrong writing

* [misc] add last_epoch in CosineAnnealingWarmupLR (#4778)

* [doc] add lazy init docs (#4808)

* [hotfix] fix norm type error in zero optimizer (#4795)

* [hotfix] Correct several erroneous code comments (#4794)

* [format] applied code formatting on changed files in pull request 4595 (#4602)

Co-authored-by: github-actions <github-actions@github.com>

* fix format (#4815)

* [chat] fix gemini strategy (#4698)

* [chat] fix gemini strategy

* [chat] fix gemini strategy

* [chat] fix gemini strategy

* [chat] fix gemini strategy

* # This is a combination of 2 commits.

[chat] fix gemini strategy

fox

* [chat] fix gemini strategy

update llama2 example

[chat] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* fix

* fix

* fix

* fix

* fix

* Update train_prompts.py

* Update Qwen-7B results (#4821)

Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>

* [doc] update slack link (#4823)

* add autotune (#4822)

* update Colossal (#4832)

* add int8 rotary embedding kernel

* remove useless code

---------

Co-authored-by: flybird11111 <1829166702@qq.com>
Co-authored-by: Yuanheng Zhao <54058983+yuanheng-zhao@users.noreply.github.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
Co-authored-by: Baizhou Zhang <eddiezhang@pku.edu.cn>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: digger yu <digger-yu@outlook.com>
Co-authored-by: Pengtai Xu <henryxu880@gmail.com>
Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: ppt0011 <143150326+ppt0011@users.noreply.github.com>
Co-authored-by: Xuanlei Zhao <43881818+oahzxl@users.noreply.github.com>
Co-authored-by: Hongxin Liu <lhx0217@gmail.com>
Co-authored-by: Wenhao Chen <cwher@outlook.com>
Co-authored-by: littsk <1214689160@qq.com>
Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>
Co-authored-by: Tong Li <tong.li352711588@gmail.com>
Co-authored-by: Yuanchen <70520919+chengeharrison@users.noreply.github.com>
Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
Co-authored-by: Desperado-Jia <502205863@qq.com>
Co-authored-by: Chandler-Bing <brp12138@163.com>
Co-authored-by: Yan haixu <40758050+hova88@users.noreply.github.com>
Xu-Kai added a commit to Xu-Kai/ColossalAI that referenced this pull request Oct 3, 2023
…ch#4843)

* [shardformer] fix GPT2DoubleHeadsModel (hpcaitech#4703)

* [hotfix] Fix import error: colossal.kernel without triton installed (hpcaitech#4722)

* [hotfix] remove triton kernels from kernel init

* revise bloom/llama kernel imports for infer

* [shardformer] to fix whisper test failed due to significant accuracy differences. (hpcaitech#4710)

* [shardformer] fix whisper test failed

* [shardformer] fix whisper test failed

* [shardformer] fix whisper test failed

* [shardformer] fix whisper test failed

* [doc] fix llama2 code link (hpcaitech#4726)

* [doc] fix llama2 code link

* [doc] fix llama2 code link

* [doc] fix llama2 code link

* [doc] Add user document for Shardformer (hpcaitech#4702)

* create shardformer doc files

* add docstring for seq-parallel

* update ShardConfig docstring

* add links to llama example

* add outdated message

* finish introduction & supporting information

* finish 'how shardformer works'

* finish shardformer.md English doc

* fix doctest fail

* add Chinese document

* [format] applied code formatting on changed files in pull request 4726 (hpcaitech#4727)

Co-authored-by: github-actions <github-actions@github.com>

* [doc] add shardformer support matrix/update tensor parallel documents (hpcaitech#4728)

* add compatibility matrix for shardformer doc

* update tp doc

* Optimized some syntax errors in the documentation and code under applications/ (hpcaitech#4127)

Co-authored-by: flybird11111 <1829166702@qq.com>

* [shardformer] update pipeline parallel document (hpcaitech#4725)

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [legacy] remove deterministic data loader test

* [shardformer] update seq parallel document (hpcaitech#4730)

* update doc of seq parallel

* fix typo

* [example] add gpt2 HybridParallelPlugin example (hpcaitech#4653)

* add gpt2 HybridParallelPlugin example

* update readme and testci

* update test ci

* fix test_ci bug

* update requirements

* add requirements

* update requirements

* add requirement

* rename file

* [doc] polish shardformer doc (hpcaitech#4735)

* arrange position of chapters

* fix typos in seq parallel doc

* [shardformer] add custom policy in hybrid parallel plugin (hpcaitech#4718)

* add custom policy

* update assert

* [example] llama2 add fine-tune example (hpcaitech#4673)

* [shardformer] update shardformer readme

[shardformer] update shardformer readme

[shardformer] update shardformer readme

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] change dataset

* [shardformer] change dataset

* [shardformer] fix CI

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

[example] update opt example

[example] resolve comments

fix

fix

* [example] llama2 add finetune example

* [example] llama2 add finetune example

* [example] llama2 add finetune example

* [example] llama2 add finetune example

* fix

* update llama2 example

* update llama2 example

* fix

* update llama2 example

* update llama2 example

* update llama2 example

* update llama2 example

* update llama2 example

* update llama2 example

* Update requirements.txt

* update llama2 example

* update llama2 example

* update llama2 example

* [doc] explanation of loading large pretrained models (hpcaitech#4741)

* [kernel] update triton init hpcaitech#4740 (hpcaitech#4740)

* [legacy] clean up legacy code (hpcaitech#4743)

* [legacy] remove outdated codes of pipeline (hpcaitech#4692)

* [legacy] remove cli of benchmark and update optim (hpcaitech#4690)

* [legacy] remove cli of benchmark and update optim

* [doc] fix cli doc test

* [legacy] fix engine clip grad norm

* [legacy] remove outdated colo tensor (hpcaitech#4694)

* [legacy] remove outdated colo tensor

* [test] fix test import

* [legacy] move outdated zero to legacy (hpcaitech#4696)

* [legacy] clean up utils (hpcaitech#4700)

* [legacy] clean up utils

* [example] update examples

* [legacy] clean up amp

* [legacy] fix amp module

* [legacy] clean up gpc (hpcaitech#4742)

* [legacy] clean up context

* [legacy] clean core, constants and global vars

* [legacy] refactor initialize

* [example] fix examples ci

* [example] fix examples ci

* [legacy] fix tests

* [example] fix gpt example

* [example] fix examples ci

* [devops] fix ci installation

* [example] fix examples ci

* [format] applied code formatting on changed files in pull request 4743 (hpcaitech#4750)

Co-authored-by: github-actions <github-actions@github.com>

* [misc] update pre-commit and run all files (hpcaitech#4752)

* [misc] update pre-commit

* [misc] run pre-commit

* [misc] remove useless configuration files

* [misc] ignore cuda for clang-format

* [doc] explain suitable use case for each plugin

* [doc] put individual plugin explanation in front

* [doc] add model examples for each plugin

* [doc] put native colossalai plugins first in description section

* [chat]: update rm, add wandb and fix bugs (hpcaitech#4471)

* feat: modify forward fn of critic and reward model

* feat: modify calc_action_log_probs

* to: add wandb in sft and rm trainer

* feat: update train_sft

* feat: update train_rm

* style: modify type annotation and add warning

* feat: pass tokenizer to ppo trainer

* to: modify trainer base and maker base

* feat: add wandb in ppo trainer

* feat: pass tokenizer to generate

* test: update generate fn tests

* test: update train tests

* fix: remove action_mask

* feat: remove unused code

* fix: fix wrong ignore_index

* fix: fix mock tokenizer

* chore: update requirements

* revert: modify make_experience

* fix: fix inference

* fix: add padding side

* style: modify _on_learn_batch_end

* test: use mock tokenizer

* fix: use bf16 to avoid overflow

* fix: fix workflow

* [chat] fix gemini strategy

* [chat] fix

* sync: update colossalai strategy

* fix: fix args and model dtype

* fix: fix checkpoint test

* fix: fix requirements

* fix: fix missing import and wrong arg

* fix: temporarily skip gemini test in stage 3

* style: apply pre-commit

* fix: temporarily skip gemini test in stage 1&2

---------

Co-authored-by: Mingyan Jiang <1829166702@qq.com>

* [shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic (hpcaitech#4758)

* fix master param sync for hybrid plugin

* rewrite unwrap for ddp/fsdp

* rewrite unwrap for zero/gemini

* rewrite unwrap for hybrid plugin

* fix gemini unwrap

* fix bugs

* [bug] fix get_default_parser in examples (hpcaitech#4764)

* [doc] clean up outdated docs (hpcaitech#4765)

* [doc] clean up outdated docs

* [doc] fix linking

* [doc] fix linking

* [doc] add shardformer doc to sidebar (hpcaitech#4768)

* [chat]: add lora merge weights config (hpcaitech#4766)

* feat: modify lora merge weights fn

* feat: add lora merge weights config

* [lazy] support torch 2.0 (hpcaitech#4763)

* [lazy] support _like methods and clamp

* [lazy] pass transformers models

* [lazy] fix device move and requires grad

* [lazy] fix requires grad and refactor api

* [lazy] fix requires grad

* [bug] Fix the version check bug in colossalai run when generating the cmd. (hpcaitech#4713)

* Fix the version check bug in colossalai run when generating the cmd.

* polish code

* [feature] add gptq for inference (hpcaitech#4754)

* [gptq] add gptq kernel (hpcaitech#4416)

* add gptq

* refactor code

* fix tests

* replace auto-gptq

* rename inference/quant

* refactor test

* add auto-gptq as an option

* reset requirements

* change assert and check auto-gptq

* add import warnings

* change test flash attn version

* remove example

* change requirements of flash_attn

* modify tests

* [skip ci] change requirements-test

* [gptq] faster gptq cuda kernel (hpcaitech#4494)

* [skip ci] add cuda kernels

* add license

* [skip ci] fix max_input_len

* format files & change test size

* [skip ci]

* [gptq] add gptq tensor parallel (hpcaitech#4538)

* add gptq tensor parallel

* add gptq tp

* delete print

* add test gptq check

* add test auto gptq check

* [gptq] combine gptq and kv cache manager (hpcaitech#4706)

* combine gptq and kv cache manager

* add init bits

* delete useless code

* add model path

* delete useless print and update test

* delete useless import

* move option gptq to shard config

* change replace linear to shardformer

* update bloom policy

* delete useless code

* fix import bug and delete useless code

* change colossalai/gptq to colossalai/quant/gptq

* update import linear for tests

* delete useless code and mv gptq_kernel to kernel directory

* fix triton kernel

* add triton import

* [inference] chatglm2 infer demo (hpcaitech#4724)

* add chatglm2

* add

* gather needed kernels

* fix some bugs

* finish context forward

* finish context stage

* fix

* add

* pause

* add

* fix bugs

* finish chatglm

* fix bug

* change some logic

* fix bugs

* change some logics

* add

* add

* add

* fix

* fix tests

* fix

* [release] update version (hpcaitech#4775)

* [release] update version

* [doc] revert versions

* initial commit: add colossal llama 2 (hpcaitech#4784)

* [feature] ColossalEval: Evaluation Pipeline for LLMs (hpcaitech#4786)

* Add ColossalEval

* Delete evaluate in Chat

---------

Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
Co-authored-by: Tong Li <tong.li352711588@gmail.com>

* [doc] add llama2 domain-specific solution news (hpcaitech#4789)

* [doc] add llama2 domain-specific solution news

* [fix] fix weekly runing example (hpcaitech#4787)

* [fix] fix weekly runing example

* [fix] fix weekly runing example

* [doc] polish shardformer doc (hpcaitech#4779)

* fix example format in docstring

* polish shardformer doc

* [checkpointio] support unsharded checkpointIO for hybrid parallel (hpcaitech#4774)

* support unsharded saving/loading for model

* support optimizer unsharded saving

* update doc

* support unsharded loading for optimizer

* small fix

* update readme

* [lazy] support from_pretrained (hpcaitech#4801)

* [lazy] patch from pretrained

* [lazy] fix from pretrained and add tests

* [devops] update ci

* update

* [hotfix] change llama2 Colossal-LLaMA-2 script filename (hpcaitech#4800)

change filename:
pretraining.py -> trainin.py
there is no file named pretraing.py. wrong writing

* [misc] add last_epoch in CosineAnnealingWarmupLR (hpcaitech#4778)

* [doc] add lazy init docs (hpcaitech#4808)

* [hotfix] fix norm type error in zero optimizer (hpcaitech#4795)

* [hotfix] Correct several erroneous code comments (hpcaitech#4794)

* [format] applied code formatting on changed files in pull request 4595 (hpcaitech#4602)

Co-authored-by: github-actions <github-actions@github.com>

* fix format (hpcaitech#4815)

* [chat] fix gemini strategy (hpcaitech#4698)

* [chat] fix gemini strategy

* [chat] fix gemini strategy

* [chat] fix gemini strategy

* [chat] fix gemini strategy

* g# This is a combination of 2 commits.

[chat] fix gemini strategy

fox

* [chat] fix gemini strategy

update llama2 example

[chat] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* fix

* fix

* fix

* fix

* fix

* Update train_prompts.py

* Update Qwen-7B results (hpcaitech#4821)

Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>

* [doc] update slack link (hpcaitech#4823)

* add autotune (hpcaitech#4822)

* update Colossal (hpcaitech#4832)

* add int8 rotary embedding kernel

* remove useless code

---------

Co-authored-by: flybird11111 <1829166702@qq.com>
Co-authored-by: Yuanheng Zhao <54058983+yuanheng-zhao@users.noreply.github.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
Co-authored-by: Baizhou Zhang <eddiezhang@pku.edu.cn>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: digger yu <digger-yu@outlook.com>
Co-authored-by: Pengtai Xu <henryxu880@gmail.com>
Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: ppt0011 <143150326+ppt0011@users.noreply.github.com>
Co-authored-by: Xuanlei Zhao <43881818+oahzxl@users.noreply.github.com>
Co-authored-by: Hongxin Liu <lhx0217@gmail.com>
Co-authored-by: Wenhao Chen <cwher@outlook.com>
Co-authored-by: littsk <1214689160@qq.com>
Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>
Co-authored-by: Tong Li <tong.li352711588@gmail.com>
Co-authored-by: Yuanchen <70520919+chengeharrison@users.noreply.github.com>
Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
Co-authored-by: Desperado-Jia <502205863@qq.com>
Co-authored-by: Chandler-Bing <brp12138@163.com>
Co-authored-by: Yan haixu <40758050+hova88@users.noreply.github.com>
Xu-Kai added a commit to Xu-Kai/ColossalAI that referenced this pull request Oct 13, 2023
…ch#4843)

* [shardformer] fix GPT2DoubleHeadsModel (hpcaitech#4703)

* [hotfix] Fix import error: colossal.kernel without triton installed (hpcaitech#4722)

* [hotfix] remove triton kernels from kernel init

* revise bloom/llama kernel imports for infer

* [shardformer] to fix whisper test failed due to significant accuracy differences. (hpcaitech#4710)

* [shardformer] fix whisper test failed

* [shardformer] fix whisper test failed

* [shardformer] fix whisper test failed

* [shardformer] fix whisper test failed

* [doc] fix llama2 code link (hpcaitech#4726)

* [doc] fix llama2 code link

* [doc] fix llama2 code link

* [doc] fix llama2 code link

* [doc] Add user document for Shardformer (hpcaitech#4702)

* create shardformer doc files

* add docstring for seq-parallel

* update ShardConfig docstring

* add links to llama example

* add outdated massage

* finish introduction & supporting information

* finish 'how shardformer works'

* finish shardformer.md English doc

* fix doctest fail

* add Chinese document

* [format] applied code formatting on changed files in pull request 4726 (hpcaitech#4727)

Co-authored-by: github-actions <github-actions@github.com>

* [doc] add shardformer support matrix/update tensor parallel documents (hpcaitech#4728)

* add compatibility matrix for shardformer doc

* update tp doc

* Optimized some syntax errors in the documentation and code under applications/ (hpcaitech#4127)

Co-authored-by: flybird11111 <1829166702@qq.com>

* [shardformer] update pipeline parallel document (hpcaitech#4725)

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [legacy] remove deterministic data loader test

* [shardformer] update seq parallel document (hpcaitech#4730)

* update doc of seq parallel

* fix typo

* [example] add gpt2 HybridParallelPlugin example (hpcaitech#4653)

* add gpt2 HybridParallelPlugin example

* update readme and testci

* update test ci

* fix test_ci bug

* update requirements

* add requirements

* update requirements

* add requirement

* rename file

* [doc] polish shardformer doc (hpcaitech#4735)

* arrange position of chapters

* fix typos in seq parallel doc

* [shardformer] add custom policy in hybrid parallel plugin (hpcaitech#4718)

* add custom policy

* update assert

* [example] llama2 add fine-tune example (hpcaitech#4673)

* [shardformer] update shardformer readme

[shardformer] update shardformer readme

[shardformer] update shardformer readme

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] change dataset

* [shardformer] change dataset

* [shardformer] fix CI

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

[example] update opt example

[example] resolve comments

fix

fix

* [example] llama2 add finetune example

* [example] llama2 add finetune example

* [example] llama2 add finetune example

* [example] llama2 add finetune example

* fix

* update llama2 example

* update llama2 example

* fix

* update llama2 example

* update llama2 example

* update llama2 example

* update llama2 example

* update llama2 example

* update llama2 example

* Update requirements.txt

* update llama2 example

* update llama2 example

* update llama2 example

* [doc] explaination of loading large pretrained models (hpcaitech#4741)

* [kernel] update triton init hpcaitech#4740 (hpcaitech#4740)

* [legacy] clean up legacy code (hpcaitech#4743)

* [legacy] remove outdated codes of pipeline (hpcaitech#4692)

* [legacy] remove cli of benchmark and update optim (hpcaitech#4690)

* [legacy] remove cli of benchmark and update optim

* [doc] fix cli doc test

* [legacy] fix engine clip grad norm

* [legacy] remove outdated colo tensor (hpcaitech#4694)

* [legacy] remove outdated colo tensor

* [test] fix test import

* [legacy] move outdated zero to legacy (hpcaitech#4696)

* [legacy] clean up utils (hpcaitech#4700)

* [legacy] clean up utils

* [example] update examples

* [legacy] clean up amp

* [legacy] fix amp module

* [legacy] clean up gpc (hpcaitech#4742)

* [legacy] clean up context

* [legacy] clean core, constants and global vars

* [legacy] refactor initialize

* [example] fix examples ci

* [example] fix examples ci

* [legacy] fix tests

* [example] fix gpt example

* [example] fix examples ci

* [devops] fix ci installation

* [example] fix examples ci

* [format] applied code formatting on changed files in pull request 4743 (hpcaitech#4750)

Co-authored-by: github-actions <github-actions@github.com>

* [misc] update pre-commit and run all files (hpcaitech#4752)

* [misc] update pre-commit

* [misc] run pre-commit

* [misc] remove useless configuration files

* [misc] ignore cuda for clang-format

* [doc] explain suitable use case for each plugin

* [doc] put individual plugin explanation in front

* [doc] add model examples for each plugin

* [doc] put native colossalai plugins first in description section

* [chat]: update rm, add wandb and fix bugs (hpcaitech#4471)

* feat: modify forward fn of critic and reward model

* feat: modify calc_action_log_probs

* to: add wandb in sft and rm trainer

* feat: update train_sft

* feat: update train_rm

* style: modify type annotation and add warning

* feat: pass tokenizer to ppo trainer

* to: modify trainer base and maker base

* feat: add wandb in ppo trainer

* feat: pass tokenizer to generate

* test: update generate fn tests

* test: update train tests

* fix: remove action_mask

* feat: remove unused code

* fix: fix wrong ignore_index

* fix: fix mock tokenizer

* chore: update requirements

* revert: modify make_experience

* fix: fix inference

* fix: add padding side

* style: modify _on_learn_batch_end

* test: use mock tokenizer

* fix: use bf16 to avoid overflow

* fix: fix workflow

* [chat] fix gemini strategy

* [chat] fix

* sync: update colossalai strategy

* fix: fix args and model dtype

* fix: fix checkpoint test

* fix: fix requirements

* fix: fix missing import and wrong arg

* fix: temporarily skip gemini test in stage 3

* style: apply pre-commit

* fix: temporarily skip gemini test in stage 1&2

---------

Co-authored-by: Mingyan Jiang <1829166702@qq.com>

* [shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic (hpcaitech#4758)

* fix master param sync for hybrid plugin

* rewrite unwrap for ddp/fsdp

* rewrite unwrap for zero/gemini

* rewrite unwrap for hybrid plugin

* fix geemini unwrap

* fix bugs

* [bug] fix get_default_parser in examples (hpcaitech#4764)

* [doc] clean up outdated docs (hpcaitech#4765)

* [doc] clean up outdated docs

* [doc] fix linking

* [doc] fix linking

* [doc] add shardformer doc to sidebar (hpcaitech#4768)

* [chat]: add lora merge weights config (hpcaitech#4766)

* feat: modify lora merge weights fn

* feat: add lora merge weights config

* [lazy] support torch 2.0 (hpcaitech#4763)

* [lazy] support _like methods and clamp

* [lazy] pass transformers models

* [lazy] fix device move and requires grad

* [lazy] fix requires grad and refactor api

* [lazy] fix requires grad

* [bug] Fix the version check bug in colossalai run when generating the cmd. (hpcaitech#4713)

* Fix the version check bug in colossalai run when generating the cmd.

* polish code

* [feature] add gptq for inference (hpcaitech#4754)

* [gptq] add gptq kernel (hpcaitech#4416)

* add gptq

* refactor code

* fix tests

* replace auto-gptq

* rname inferance/quant

* refactor test

* add auto-gptq as an option

* reset requirements

* change assert and check auto-gptq

* add import warnings

* change test flash attn version

* remove example

* change requirements of flash_attn

* modify tests

* [skip ci] change requirements-test

* [gptq] faster gptq cuda kernel (hpcaitech#4494)

* [skip ci] add cuda kernels

* add license

* [skip ci] fix max_input_len

* format files & change test size

* [skip ci]

* [gptq] add gptq tensor parallel (hpcaitech#4538)

* add gptq tensor parallel

* add gptq tp

* delete print

* add test gptq check

* add test auto gptq check

* [gptq] combine gptq and kv cache manager (hpcaitech#4706)

* combine gptq and kv cache manager

* add init bits

* delete useless code

* add model path

* delete usless print and update test

* delete usless import

* move option gptq to shard config

* change replace linear to shardformer

* update bloom policy

* delete useless code

* fix import bug and delete uselss code

* change colossalai/gptq to colossalai/quant/gptq

* update import linear for tests

* delete useless code and mv gptq_kernel to kernel directory

* fix triton kernel

* add triton import

* [inference] chatglm2 infer demo (hpcaitech#4724)

* add chatglm2

* add

* gather needed kernels

* fix some bugs

* finish context forward

* finish context stage

* fix

* add

* pause

* add

* fix bugs

* finish chatglm

* fix bug

* change some logic

* fix bugs

* change some logics

* add

* add

* add

* fix

* fix tests

* fix

* [release] update version (hpcaitech#4775)

* [release] update version

* [doc] revert versions

* initial commit: add colossal llama 2 (hpcaitech#4784)

* [feature] ColossalEval: Evaluation Pipeline for LLMs (hpcaitech#4786)

* Add ColossalEval

* Delete evaluate in Chat

---------

Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
Co-authored-by: Tong Li <tong.li352711588@gmail.com>

* [doc] add llama2 domain-specific solution news (hpcaitech#4789)

* [doc] add llama2 domain-specific solution news

* [fix] fix weekly runing example (hpcaitech#4787)

* [fix] fix weekly runing example

* [fix] fix weekly runing example

* [doc] polish shardformer doc (hpcaitech#4779)

* fix example format in docstring

* polish shardformer doc

* [checkpointio] support unsharded checkpointIO for hybrid parallel (hpcaitech#4774)

* support unsharded saving/loading for model

* support optimizer unsharded saving

* update doc

* support unsharded loading for optimizer

* small fix

* update readme

* [lazy] support from_pretrained (hpcaitech#4801)

* [lazy] patch from pretrained

* [lazy] fix from pretrained and add tests

* [devops] update ci

* update

* [hotfix] change llama2 Colossal-LLaMA-2 script filename (hpcaitech#4800)

change filename:
pretraining.py -> trainin.py
there is no file named pretraing.py. wrong writing

* [misc] add last_epoch in CosineAnnealingWarmupLR (hpcaitech#4778)

* [doc] add lazy init docs (hpcaitech#4808)

* [hotfix] fix norm type error in zero optimizer (hpcaitech#4795)

* [hotfix] Correct several erroneous code comments (hpcaitech#4794)

* [format] applied code formatting on changed files in pull request 4595 (hpcaitech#4602)

Co-authored-by: github-actions <github-actions@github.com>

* fix format (hpcaitech#4815)

* [chat] fix gemini strategy (hpcaitech#4698)

* [chat] fix gemini strategy

* [chat] fix gemini strategy

* [chat] fix gemini strategy

* [chat] fix gemini strategy

* g# This is a combination of 2 commits.

[chat] fix gemini strategy

fox

* [chat] fix gemini strategy

update llama2 example

[chat] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* fix

* fix

* fix

* fix

* fix

* Update train_prompts.py

* Update Qwen-7B results (hpcaitech#4821)

Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>

* [doc] update slack link (hpcaitech#4823)

* add autotune (hpcaitech#4822)

* update Colossal (hpcaitech#4832)

* add int8 rotary embedding kernel

* remove useless code

---------

Co-authored-by: flybird11111 <1829166702@qq.com>
Co-authored-by: Yuanheng Zhao <54058983+yuanheng-zhao@users.noreply.github.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
Co-authored-by: Baizhou Zhang <eddiezhang@pku.edu.cn>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: digger yu <digger-yu@outlook.com>
Co-authored-by: Pengtai Xu <henryxu880@gmail.com>
Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: ppt0011 <143150326+ppt0011@users.noreply.github.com>
Co-authored-by: Xuanlei Zhao <43881818+oahzxl@users.noreply.github.com>
Co-authored-by: Hongxin Liu <lhx0217@gmail.com>
Co-authored-by: Wenhao Chen <cwher@outlook.com>
Co-authored-by: littsk <1214689160@qq.com>
Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>
Co-authored-by: Tong Li <tong.li352711588@gmail.com>
Co-authored-by: Yuanchen <70520919+chengeharrison@users.noreply.github.com>
Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
Co-authored-by: Desperado-Jia <502205863@qq.com>
Co-authored-by: Chandler-Bing <brp12138@163.com>
Co-authored-by: Yan haixu <40758050+hova88@users.noreply.github.com>
Xu-Kai added a commit to Xu-Kai/ColossalAI that referenced this pull request Oct 13, 2023
…ch#4843)

* [shardformer] fix GPT2DoubleHeadsModel (hpcaitech#4703)

* [hotfix] Fix import error: colossal.kernel without triton installed (hpcaitech#4722)

* [hotfix] remove triton kernels from kernel init

* revise bloom/llama kernel imports for infer

* [shardformer] to fix whisper test failed due to significant accuracy differences. (hpcaitech#4710)

* [shardformer] fix whisper test failed

* [shardformer] fix whisper test failed

* [shardformer] fix whisper test failed

* [shardformer] fix whisper test failed

* [doc] fix llama2 code link (hpcaitech#4726)

* [doc] fix llama2 code link

* [doc] fix llama2 code link

* [doc] fix llama2 code link

* [doc] Add user document for Shardformer (hpcaitech#4702)

* create shardformer doc files

* add docstring for seq-parallel

* update ShardConfig docstring

* add links to llama example

* add outdated message

* finish introduction & supporting information

* finish 'how shardformer works'

* finish shardformer.md English doc

* fix doctest fail

* add Chinese document

* [format] applied code formatting on changed files in pull request 4726 (hpcaitech#4727)

Co-authored-by: github-actions <github-actions@github.com>

* [doc] add shardformer support matrix/update tensor parallel documents (hpcaitech#4728)

* add compatibility matrix for shardformer doc

* update tp doc

* Optimized some syntax errors in the documentation and code under applications/ (hpcaitech#4127)

Co-authored-by: flybird11111 <1829166702@qq.com>

* [shardformer] update pipeline parallel document (hpcaitech#4725)

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [legacy] remove deterministic data loader test

* [shardformer] update seq parallel document (hpcaitech#4730)

* update doc of seq parallel

* fix typo

* [example] add gpt2 HybridParallelPlugin example (hpcaitech#4653)

* add gpt2 HybridParallelPlugin example

* update readme and testci

* update test ci

* fix test_ci bug

* update requirements

* add requirements

* update requirements

* add requirement

* rename file

* [doc] polish shardformer doc (hpcaitech#4735)

* arrange position of chapters

* fix typos in seq parallel doc

* [shardformer] add custom policy in hybrid parallel plugin (hpcaitech#4718)

* add custom policy

* update assert

* [example] llama2 add fine-tune example (hpcaitech#4673)

* [shardformer] update shardformer readme

[shardformer] update shardformer readme

[shardformer] update shardformer readme

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] change dataset

* [shardformer] change dataset

* [shardformer] fix CI

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

[example] update opt example

[example] resolve comments

fix

fix

* [example] llama2 add finetune example

* [example] llama2 add finetune example

* [example] llama2 add finetune example

* [example] llama2 add finetune example

* fix

* update llama2 example

* update llama2 example

* fix

* update llama2 example

* update llama2 example

* update llama2 example

* update llama2 example

* update llama2 example

* update llama2 example

* Update requirements.txt

* update llama2 example

* update llama2 example

* update llama2 example

* [doc] explaination of loading large pretrained models (hpcaitech#4741)

* [kernel] update triton init hpcaitech#4740 (hpcaitech#4740)

* [legacy] clean up legacy code (hpcaitech#4743)

* [legacy] remove outdated codes of pipeline (hpcaitech#4692)

* [legacy] remove cli of benchmark and update optim (hpcaitech#4690)

* [legacy] remove cli of benchmark and update optim

* [doc] fix cli doc test

* [legacy] fix engine clip grad norm

* [legacy] remove outdated colo tensor (hpcaitech#4694)

* [legacy] remove outdated colo tensor

* [test] fix test import

* [legacy] move outdated zero to legacy (hpcaitech#4696)

* [legacy] clean up utils (hpcaitech#4700)

* [legacy] clean up utils

* [example] update examples

* [legacy] clean up amp

* [legacy] fix amp module

* [legacy] clean up gpc (hpcaitech#4742)

* [legacy] clean up context

* [legacy] clean core, constants and global vars

* [legacy] refactor initialize

* [example] fix examples ci

* [example] fix examples ci

* [legacy] fix tests

* [example] fix gpt example

* [example] fix examples ci

* [devops] fix ci installation

* [example] fix examples ci

* [format] applied code formatting on changed files in pull request 4743 (hpcaitech#4750)

Co-authored-by: github-actions <github-actions@github.com>

* [misc] update pre-commit and run all files (hpcaitech#4752)

* [misc] update pre-commit

* [misc] run pre-commit

* [misc] remove useless configuration files

* [misc] ignore cuda for clang-format

* [doc] explain suitable use case for each plugin

* [doc] put individual plugin explanation in front

* [doc] add model examples for each plugin

* [doc] put native colossalai plugins first in description section

* [chat]: update rm, add wandb and fix bugs (hpcaitech#4471)

* feat: modify forward fn of critic and reward model

* feat: modify calc_action_log_probs

* to: add wandb in sft and rm trainer

* feat: update train_sft

* feat: update train_rm

* style: modify type annotation and add warning

* feat: pass tokenizer to ppo trainer

* to: modify trainer base and maker base

* feat: add wandb in ppo trainer

* feat: pass tokenizer to generate

* test: update generate fn tests

* test: update train tests

* fix: remove action_mask

* feat: remove unused code

* fix: fix wrong ignore_index

* fix: fix mock tokenizer

* chore: update requirements

* revert: modify make_experience

* fix: fix inference

* fix: add padding side

* style: modify _on_learn_batch_end

* test: use mock tokenizer

* fix: use bf16 to avoid overflow

* fix: fix workflow

* [chat] fix gemini strategy

* [chat] fix

* sync: update colossalai strategy

* fix: fix args and model dtype

* fix: fix checkpoint test

* fix: fix requirements

* fix: fix missing import and wrong arg

* fix: temporarily skip gemini test in stage 3

* style: apply pre-commit

* fix: temporarily skip gemini test in stage 1&2

---------

Co-authored-by: Mingyan Jiang <1829166702@qq.com>

* [shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic (hpcaitech#4758)

* fix master param sync for hybrid plugin

* rewrite unwrap for ddp/fsdp

* rewrite unwrap for zero/gemini

* rewrite unwrap for hybrid plugin

* fix gemini unwrap

* fix bugs

* [bug] fix get_default_parser in examples (hpcaitech#4764)

* [doc] clean up outdated docs (hpcaitech#4765)

* [doc] clean up outdated docs

* [doc] fix linking

* [doc] fix linking

* [doc] add shardformer doc to sidebar (hpcaitech#4768)

* [chat]: add lora merge weights config (hpcaitech#4766)

* feat: modify lora merge weights fn

* feat: add lora merge weights config

* [lazy] support torch 2.0 (hpcaitech#4763)

* [lazy] support _like methods and clamp

* [lazy] pass transformers models

* [lazy] fix device move and requires grad

* [lazy] fix requires grad and refactor api

* [lazy] fix requires grad

* [bug] Fix the version check bug in colossalai run when generating the cmd. (hpcaitech#4713)

* Fix the version check bug in colossalai run when generating the cmd.

* polish code

* [feature] add gptq for inference (hpcaitech#4754)

* [gptq] add gptq kernel (hpcaitech#4416)

* add gptq

* refactor code

* fix tests

* replace auto-gptq

* rename inference/quant

* refactor test

* add auto-gptq as an option

* reset requirements

* change assert and check auto-gptq

* add import warnings

* change test flash attn version

* remove example

* change requirements of flash_attn

* modify tests

* [skip ci] change requirements-test

* [gptq] faster gptq cuda kernel (hpcaitech#4494)

* [skip ci] add cuda kernels

* add license

* [skip ci] fix max_input_len

* format files & change test size

* [skip ci]

* [gptq] add gptq tensor parallel (hpcaitech#4538)

* add gptq tensor parallel

* add gptq tp

* delete print

* add test gptq check

* add test auto gptq check

* [gptq] combine gptq and kv cache manager (hpcaitech#4706)

* combine gptq and kv cache manager

* add init bits

* delete useless code

* add model path

* delete useless print and update test

* delete useless import

* move option gptq to shard config

* change replace linear to shardformer

* update bloom policy

* delete useless code

* fix import bug and delete useless code

* change colossalai/gptq to colossalai/quant/gptq

* update import linear for tests

* delete useless code and mv gptq_kernel to kernel directory

* fix triton kernel

* add triton import

* [inference] chatglm2 infer demo (hpcaitech#4724)

* add chatglm2

* add

* gather needed kernels

* fix some bugs

* finish context forward

* finish context stage

* fix

* add

* pause

* add

* fix bugs

* finish chatglm

* fix bug

* change some logic

* fix bugs

* change some logics

* add

* add

* add

* fix

* fix tests

* fix

* [release] update version (hpcaitech#4775)

* [release] update version

* [doc] revert versions

* initial commit: add colossal llama 2 (hpcaitech#4784)

* [feature] ColossalEval: Evaluation Pipeline for LLMs (hpcaitech#4786)

* Add ColossalEval

* Delete evaluate in Chat

---------

Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
Co-authored-by: Tong Li <tong.li352711588@gmail.com>

* [doc] add llama2 domain-specific solution news (hpcaitech#4789)

* [doc] add llama2 domain-specific solution news

* [fix] fix weekly runing example (hpcaitech#4787)

* [fix] fix weekly runing example

* [fix] fix weekly runing example

* [doc] polish shardformer doc (hpcaitech#4779)

* fix example format in docstring

* polish shardformer doc

* [checkpointio] support unsharded checkpointIO for hybrid parallel (hpcaitech#4774)

* support unsharded saving/loading for model

* support optimizer unsharded saving

* update doc

* support unsharded loading for optimizer

* small fix

* update readme

* [lazy] support from_pretrained (hpcaitech#4801)

* [lazy] patch from pretrained

* [lazy] fix from pretrained and add tests

* [devops] update ci

* update

* [hotfix] change llama2 Colossal-LLaMA-2 script filename (hpcaitech#4800)

change filename:
pretraining.py -> trainin.py
there is no file named pretraing.py. wrong writing

* [misc] add last_epoch in CosineAnnealingWarmupLR (hpcaitech#4778)

* [doc] add lazy init docs (hpcaitech#4808)

* [hotfix] fix norm type error in zero optimizer (hpcaitech#4795)

* [hotfix] Correct several erroneous code comments (hpcaitech#4794)

* [format] applied code formatting on changed files in pull request 4595 (hpcaitech#4602)

Co-authored-by: github-actions <github-actions@github.com>

* fix format (hpcaitech#4815)

* [chat] fix gemini strategy (hpcaitech#4698)

* [chat] fix gemini strategy

* [chat] fix gemini strategy

* [chat] fix gemini strategy

* [chat] fix gemini strategy

* g# This is a combination of 2 commits.

[chat] fix gemini strategy

fox

* [chat] fix gemini strategy

update llama2 example

[chat] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* fix

* fix

* fix

* fix

* fix

* Update train_prompts.py

* Update Qwen-7B results (hpcaitech#4821)

Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>

* [doc] update slack link (hpcaitech#4823)

* add autotune (hpcaitech#4822)

* update Colossal (hpcaitech#4832)

* add int8 rotary embedding kernel

* remove useless code

---------

Co-authored-by: flybird11111 <1829166702@qq.com>
Co-authored-by: Yuanheng Zhao <54058983+yuanheng-zhao@users.noreply.github.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
Co-authored-by: Baizhou Zhang <eddiezhang@pku.edu.cn>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: digger yu <digger-yu@outlook.com>
Co-authored-by: Pengtai Xu <henryxu880@gmail.com>
Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: ppt0011 <143150326+ppt0011@users.noreply.github.com>
Co-authored-by: Xuanlei Zhao <43881818+oahzxl@users.noreply.github.com>
Co-authored-by: Hongxin Liu <lhx0217@gmail.com>
Co-authored-by: Wenhao Chen <cwher@outlook.com>
Co-authored-by: littsk <1214689160@qq.com>
Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>
Co-authored-by: Tong Li <tong.li352711588@gmail.com>
Co-authored-by: Yuanchen <70520919+chengeharrison@users.noreply.github.com>
Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
Co-authored-by: Desperado-Jia <502205863@qq.com>
Co-authored-by: Chandler-Bing <brp12138@163.com>
Co-authored-by: Yan haixu <40758050+hova88@users.noreply.github.com>

2 participants