
[misc] update pre-commit and run all files #4752

Merged 4 commits on Sep 19, 2023

Conversation


@ver217 (Member) commented Sep 18, 2023

📌 Checklist before creating the PR

  • I have created an issue for this PR for traceability
  • The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

📝 What does this PR do?

Summarize your work here.
If you have any plots/diagrams/screenshots/tables, please attach them here.

  1. Add autoflake to pre-commit, which removes unused imports and variables.
  2. Replace yapf with black, which is a better formatter for Python (a sketch of the updated hooks is shown below).
  3. Run pre-commit on all files.
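
For context, a minimal sketch of what the relevant hooks in `.pre-commit-config.yaml` could look like; the revisions and arguments here are illustrative and may differ from what this PR actually pins:

```yaml
repos:
  - repo: https://github.com/PyCQA/autoflake
    rev: v2.2.1   # illustrative revision, not necessarily the one pinned in this PR
    hooks:
      - id: autoflake
        # remove unused imports and unused variables in place
        args: [--remove-all-unused-imports, --remove-unused-variables, --in-place]
  - repo: https://github.com/psf/black
    rev: 23.9.1   # illustrative revision, not necessarily the one pinned in this PR
    hooks:
      - id: black
```

With the hooks in place, item 3 corresponds to running `pre-commit run --all-files` once, which applies the new formatters to every file in the repository and is presumably what produced the large diff here.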

💥 Checklist before requesting a review

  • I have linked my PR to an issue (instruction)
  • My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • I have performed a self-review of my code
  • I have added thorough tests.
  • I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • 🌝 Yes, I do.
  • 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

@TongLi3701 (Member) left a comment:

Thanks.

@github-actions (Contributor) commented:

The code coverage for the changed files is 37%.

Complete report:
Name                                                                                                      Stmts   Miss  Cover
-----------------------------------------------------------------------------------------------------------------------------
colossalai/__init__.py                                                                                        7      3    57%
colossalai/_analyzer/_subclasses/_meta_registration.py                                                      230    155    33%
colossalai/_analyzer/_subclasses/_monkey_patch.py                                                            16      3    81%
colossalai/_analyzer/_subclasses/flop_tensor.py                                                             252    210    17%
colossalai/_analyzer/_subclasses/meta_tensor.py                                                             106     42    60%
colossalai/_analyzer/fx/codegen.py                                                                          244    244     0%
colossalai/_analyzer/fx/graph_module.py                                                                     119    119     0%
colossalai/_analyzer/fx/node_util.py                                                                         97     97     0%
colossalai/_analyzer/fx/passes/graph_profile.py                                                             100    100     0%
colossalai/_analyzer/fx/passes/shape_prop.py                                                                104    104     0%
colossalai/_analyzer/fx/symbolic_profile.py                                                                  17     17     0%
colossalai/_analyzer/fx/tracer/bias_addition.py                                                              60     60     0%
colossalai/_analyzer/fx/tracer/custom_leaf_module.py                                                         22     22     0%
colossalai/_analyzer/fx/tracer/proxy.py                                                                      78     78     0%
colossalai/_analyzer/fx/tracer/symbolic_trace.py                                                             30     30     0%
colossalai/_analyzer/fx/tracer/tracer.py                                                                    222    222     0%
colossalai/amp/naive_amp/grad_scaler/__init__.py                                                              4      0   100%
colossalai/amp/naive_amp/grad_scaler/base_grad_scaler.py                                                     30      8    73%
colossalai/amp/naive_amp/grad_scaler/constant_grad_scaler.py                                                  7      2    71%
colossalai/amp/naive_amp/grad_scaler/dynamic_grad_scaler.py                                                  63     24    62%
colossalai/amp/naive_amp/mixed_precision_mixin/__init__.py                                                    4      0   100%
colossalai/amp/naive_amp/mixed_precision_mixin/base.py                                                       15      0   100%
colossalai/amp/naive_amp/mixed_precision_mixin/fp16.py                                                       48      0   100%
colossalai/amp/naive_amp/mixed_precision_optimizer.py                                                        98     21    79%
colossalai/auto_parallel/checkpoint/build_c_ext.py                                                            5      5     0%
colossalai/auto_parallel/checkpoint/ckpt_solver_base.py                                                      72     72     0%
colossalai/auto_parallel/checkpoint/ckpt_solver_chen.py                                                      47     47     0%
colossalai/auto_parallel/checkpoint/ckpt_solver_rotor.py                                                    252    252     0%
colossalai/auto_parallel/checkpoint/operation.py                                                            104    104     0%
colossalai/auto_parallel/meta_profiler/constants.py                                                           6      6     0%
colossalai/auto_parallel/meta_profiler/meta_registry/activation.py                                           28     28     0%
colossalai/auto_parallel/meta_profiler/meta_registry/binary_elementwise_ops.py                               26     26     0%
colossalai/auto_parallel/meta_profiler/meta_registry/conv.py                                                 47     47     0%
colossalai/auto_parallel/meta_profiler/meta_registry/embedding.py                                            23     23     0%
colossalai/auto_parallel/meta_profiler/meta_registry/linear.py                                              113    113     0%
colossalai/auto_parallel/meta_profiler/meta_registry/non_spmd.py                                             15     15     0%
colossalai/auto_parallel/meta_profiler/meta_registry/norm.py                                                 56     56     0%
colossalai/auto_parallel/meta_profiler/meta_registry/pooling.py                                              51     51     0%
colossalai/auto_parallel/meta_profiler/meta_registry/tensor.py                                               24     24     0%
colossalai/auto_parallel/meta_profiler/meta_registry/where.py                                                25     25     0%
colossalai/auto_parallel/meta_profiler/registry.py                                                           20     20     0%
colossalai/auto_parallel/meta_profiler/shard_metainfo.py                                                     63     63     0%
colossalai/auto_parallel/offload/amp_optimizer.py                                                            99     99     0%
colossalai/auto_parallel/offload/base_offload_module.py                                                      71     71     0%
colossalai/auto_parallel/offload/mem_optimize.py                                                             36     36     0%
colossalai/auto_parallel/offload/region.py                                                                   81     81     0%
colossalai/auto_parallel/offload/region_manager.py                                                          283    283     0%
colossalai/auto_parallel/offload/runtime.py                                                                 146    146     0%
colossalai/auto_parallel/offload/solver.py                                                                  259    259     0%
colossalai/auto_parallel/offload/training_simulator.py                                                      228    228     0%
colossalai/auto_parallel/offload/util.py                                                                     58     58     0%
colossalai/auto_parallel/passes/comm_metainfo_pass.py                                                        60     60     0%
colossalai/auto_parallel/passes/meta_info_prop.py                                                            94     94     0%
colossalai/auto_parallel/passes/runtime_apply_pass.py                                                       145    145     0%
colossalai/auto_parallel/passes/runtime_preparation_pass.py                                                 294    294     0%
colossalai/auto_parallel/tensor_shard/constants.py                                                           19      0   100%
colossalai/auto_parallel/tensor_shard/initialize.py                                                         120    120     0%
colossalai/auto_parallel/tensor_shard/node_handler/__init__.py                                               26     26     0%
colossalai/auto_parallel/tensor_shard/node_handler/addmm_handler.py                                          44     44     0%
colossalai/auto_parallel/tensor_shard/node_handler/batch_norm_handler.py                                     28     28     0%
colossalai/auto_parallel/tensor_shard/node_handler/binary_elementwise_handler.py                             62     62     0%
colossalai/auto_parallel/tensor_shard/node_handler/bmm_handler.py                                            55     55     0%
colossalai/auto_parallel/tensor_shard/node_handler/conv_handler.py                                           65     65     0%
colossalai/auto_parallel/tensor_shard/node_handler/default_reshape_handler.py                                38     38     0%
colossalai/auto_parallel/tensor_shard/node_handler/embedding_handler.py                                      89     89     0%
colossalai/auto_parallel/tensor_shard/node_handler/getattr_handler.py                                        15     15     0%
colossalai/auto_parallel/tensor_shard/node_handler/getitem_handler.py                                        23     23     0%
colossalai/auto_parallel/tensor_shard/node_handler/layer_norm_handler.py                                     23     23     0%
colossalai/auto_parallel/tensor_shard/node_handler/linear_handler.py                                        117    117     0%
colossalai/auto_parallel/tensor_shard/node_handler/matmul_handler.py                                        266    266     0%
colossalai/auto_parallel/tensor_shard/node_handler/node_handler.py                                          162    162     0%
colossalai/auto_parallel/tensor_shard/node_handler/normal_pooling_handler.py                                 25     25     0%
colossalai/auto_parallel/tensor_shard/node_handler/output_handler.py                                         33     33     0%
colossalai/auto_parallel/tensor_shard/node_handler/permute_handler.py                                        44     44     0%
colossalai/auto_parallel/tensor_shard/node_handler/placeholder_handler.py                                    20     20     0%
colossalai/auto_parallel/tensor_shard/node_handler/registry.py                                               19     19     0%
colossalai/auto_parallel/tensor_shard/node_handler/softmax_handler.py                                        30     30     0%
colossalai/auto_parallel/tensor_shard/node_handler/split_handler.py                                          36     36     0%
colossalai/auto_parallel/tensor_shard/node_handler/strategy/__init__.py                                      19     19     0%
colossalai/auto_parallel/tensor_shard/node_handler/strategy/batch_norm_generator.py                         115    115     0%
colossalai/auto_parallel/tensor_shard/node_handler/strategy/binary_elementwise_generator.py                  58     58     0%
colossalai/auto_parallel/tensor_shard/node_handler/strategy/conv_strategy_generator.py                      205    205     0%
colossalai/auto_parallel/tensor_shard/node_handler/strategy/embedding_generator.py                          106    106     0%
colossalai/auto_parallel/tensor_shard/node_handler/strategy/getattr_generator.py                             45     45     0%
colossalai/auto_parallel/tensor_shard/node_handler/strategy/getitem_generator.py                             92     92     0%
colossalai/auto_parallel/tensor_shard/node_handler/strategy/layer_norm_generator.py                          99     99     0%
colossalai/auto_parallel/tensor_shard/node_handler/strategy/matmul_strategy_generator.py                    387    387     0%
colossalai/auto_parallel/tensor_shard/node_handler/strategy/normal_pooling_generator.py                      58     58     0%
colossalai/auto_parallel/tensor_shard/node_handler/strategy/output_generator.py                              62     62     0%
colossalai/auto_parallel/tensor_shard/node_handler/strategy/placeholder_generator.py                         46     46     0%
colossalai/auto_parallel/tensor_shard/node_handler/strategy/reshape_generator.py                            179    179     0%
colossalai/auto_parallel/tensor_shard/node_handler/strategy/softmax_generator.py                             50     50     0%
colossalai/auto_parallel/tensor_shard/node_handler/strategy/strategy_generator.py                           134    134     0%
colossalai/auto_parallel/tensor_shard/node_handler/strategy/sum_generator.py                                 54     54     0%
colossalai/auto_parallel/tensor_shard/node_handler/strategy/tensor_constructor_generator.py                  28     28     0%
colossalai/auto_parallel/tensor_shard/node_handler/strategy/unary_elementwise_generator.py                   38     38     0%
colossalai/auto_parallel/tensor_shard/node_handler/strategy/where_generator.py                               46     46     0%
colossalai/auto_parallel/tensor_shard/node_handler/sum_handler.py                                            46     46     0%
colossalai/auto_parallel/tensor_shard/node_handler/tensor_constructor_handler.py                             20     20     0%
colossalai/auto_parallel/tensor_shard/node_handler/transpose_handler.py                                      36     36     0%
colossalai/auto_parallel/tensor_shard/node_handler/unary_elementwise_handler.py                              27     27     0%
colossalai/auto_parallel/tensor_shard/node_handler/view_handler.py                                           28     28     0%
colossalai/auto_parallel/tensor_shard/node_handler/where_handler.py                                          42     42     0%
colossalai/auto_parallel/tensor_shard/options.py                                                             20     20     0%
colossalai/auto_parallel/tensor_shard/sharding_strategy.py                                                  136     64    53%
colossalai/auto_parallel/tensor_shard/solver/__init__.py                                                      5      5     0%
colossalai/auto_parallel/tensor_shard/solver/cost_graph.py                                                  131    131     0%
colossalai/auto_parallel/tensor_shard/solver/graph_analysis.py                                               91     91     0%
colossalai/auto_parallel/tensor_shard/solver/solver.py                                                      274    274     0%
colossalai/auto_parallel/tensor_shard/solver/strategies_constructor.py                                      109    109     0%
colossalai/auto_parallel/tensor_shard/utils/__init__.py                                                       6      6     0%
colossalai/auto_parallel/tensor_shard/utils/broadcast.py                                                     77     77     0%
colossalai/auto_parallel/tensor_shard/utils/factory.py                                                      123    123     0%
colossalai/auto_parallel/tensor_shard/utils/misc.py                                                          45     45     0%
colossalai/auto_parallel/tensor_shard/utils/reshape.py                                                       86     86     0%
colossalai/auto_parallel/tensor_shard/utils/sharding.py                                                      61     61     0%
colossalai/booster/accelerator.py                                                                            17      5    71%
colossalai/booster/booster.py                                                                                69     12    83%
colossalai/booster/mixed_precision/__init__.py                                                               12      3    75%
colossalai/booster/mixed_precision/fp16_apex.py                                                               6      1    83%
colossalai/booster/mixed_precision/fp16_naive.py                                                              4      1    75%
colossalai/booster/mixed_precision/fp16_torch.py                                                             46      2    96%
colossalai/booster/plugin/__init__.py                                                                        11      0   100%
colossalai/booster/plugin/dp_plugin_base.py                                                                  22      4    82%
colossalai/booster/plugin/gemini_plugin.py                                                                  123     12    90%
colossalai/booster/plugin/hybrid_parallel_plugin.py                                                         214     15    93%
colossalai/booster/plugin/low_level_zero_plugin.py                                                          150     11    93%
colossalai/booster/plugin/plugin_base.py                                                                     36      6    83%
colossalai/booster/plugin/pp_plugin_base.py                                                                   9      1    89%
colossalai/booster/plugin/torch_ddp_plugin.py                                                                66      2    97%
colossalai/booster/plugin/torch_fsdp_plugin.py                                                               97     13    87%
colossalai/checkpoint_io/__init__.py                                                                          5      0   100%
colossalai/checkpoint_io/checkpoint_io_base.py                                                               58      1    98%
colossalai/checkpoint_io/general_checkpoint_io.py                                                            90      8    91%
colossalai/checkpoint_io/hybrid_parallel_checkpoint_io.py                                                   337     32    91%
colossalai/checkpoint_io/index_file.py                                                                       70     17    76%
colossalai/checkpoint_io/utils.py                                                                           316     45    86%
colossalai/cli/__init__.py                                                                                    2      2     0%
colossalai/cli/check/__init__.py                                                                             10     10     0%
colossalai/cli/check/check_installation.py                                                                  106    106     0%
colossalai/cli/cli.py                                                                                        14     14     0%
colossalai/cli/launcher/__init__.py                                                                          24     24     0%
colossalai/cli/launcher/hostinfo.py                                                                          43     43     0%
colossalai/cli/launcher/multinode_runner.py                                                                  52     52     0%
colossalai/cli/launcher/run.py                                                                              139    139     0%
colossalai/cluster/__init__.py                                                                                5      0   100%
colossalai/cluster/device_mesh_manager.py                                                                    52     18    65%
colossalai/cluster/dist_coordinator.py                                                                       63     30    52%
colossalai/cluster/process_group_manager.py                                                                  25     16    36%
colossalai/cluster/process_group_mesh.py                                                                     73      1    99%
colossalai/context/__init__.py                                                                                2      0   100%
colossalai/context/config.py                                                                                 58      9    84%
colossalai/context/moe_context.py                                                                            65     37    43%
colossalai/context/singleton_meta.py                                                                          8      0   100%
colossalai/device/__init__.py                                                                                 3      0   100%
colossalai/device/alpha_beta_profiler.py                                                                    209    191     9%
colossalai/device/calc_pipeline_strategy.py                                                                  80     75     6%
colossalai/device/device_mesh.py                                                                            178     16    91%
colossalai/fx/_compatibility.py                                                                              24     10    58%
colossalai/fx/_meta_regist_12.py                                                                            257    257     0%
colossalai/fx/codegen/activation_checkpoint_codegen.py                                                      581    555     4%
colossalai/fx/graph_module.py                                                                                93     75    19%
colossalai/fx/passes/adding_split_node_pass.py                                                              256    238     7%
colossalai/fx/passes/concrete_info_prop.py                                                                   64     36    44%
colossalai/fx/passes/meta_info_prop.py                                                                      101     64    37%
colossalai/fx/passes/passes_for_gpt2_test.py                                                                245    245     0%
colossalai/fx/passes/shard_1d_pass.py                                                                        91     79    13%
colossalai/fx/passes/split_module.py                                                                        146    134     8%
colossalai/fx/passes/utils.py                                                                                76     76     0%
colossalai/fx/profiler/__init__.py                                                                            9      1    89%
colossalai/fx/profiler/constants.py                                                                           9      0   100%
colossalai/fx/profiler/dataflow.py                                                                           56     28    50%
colossalai/fx/profiler/experimental/constants.py                                                              6      6     0%
colossalai/fx/profiler/experimental/profiler.py                                                              64     64     0%
colossalai/fx/profiler/experimental/profiler_function/activation_function.py                                 18     18     0%
colossalai/fx/profiler/experimental/profiler_function/arithmetic.py                                          63     63     0%
colossalai/fx/profiler/experimental/profiler_function/embedding.py                                            8      8     0%
colossalai/fx/profiler/experimental/profiler_function/linear.py                                              11     11     0%
colossalai/fx/profiler/experimental/profiler_function/normalization.py                                       29     29     0%
colossalai/fx/profiler/experimental/profiler_function/pooling.py                                             19     19     0%
colossalai/fx/profiler/experimental/profiler_function/python_ops.py                                          13     13     0%
colossalai/fx/profiler/experimental/profiler_function/torch_ops.py                                           47     47     0%
colossalai/fx/profiler/experimental/profiler_module/activation_function.py                                   18     18     0%
colossalai/fx/profiler/experimental/profiler_module/attention.py                                             40     40     0%
colossalai/fx/profiler/experimental/profiler_module/convolution.py                                           90     90     0%
colossalai/fx/profiler/experimental/profiler_module/dropout.py                                                8      8     0%
colossalai/fx/profiler/experimental/profiler_module/linear.py                                                12     12     0%
colossalai/fx/profiler/experimental/profiler_module/normalization.py                                         26     26     0%
colossalai/fx/profiler/experimental/profiler_module/pooling.py                                               19     19     0%
colossalai/fx/profiler/experimental/profiler_module/rnn.py                                                   57     57     0%
colossalai/fx/profiler/experimental/profiler_module/torch_op.py                                               8      8     0%
colossalai/fx/profiler/experimental/registry.py                                                              17     17     0%
colossalai/fx/profiler/experimental/shard_utils.py                                                           12     12     0%
colossalai/fx/profiler/memory_utils.py                                                                       36     26    28%
colossalai/fx/profiler/opcount.py                                                                           114     82    28%
colossalai/fx/profiler/profiler.py                                                                          220    190    14%
colossalai/fx/profiler/shard_utils.py                                                                        34     19    44%
colossalai/fx/profiler/tensor.py                                                                             72     53    26%
colossalai/fx/proxy.py                                                                                       81     54    33%
colossalai/fx/tracer/_meta_trace.py                                                                          70     64     9%
colossalai/fx/tracer/_tracer_utils.py                                                                        34     26    24%
colossalai/fx/tracer/bias_addition_patch/patched_bias_addition_function/addbmm.py                            42     32    24%
colossalai/fx/tracer/bias_addition_patch/patched_bias_addition_function/addmm.py                             34     25    26%
colossalai/fx/tracer/bias_addition_patch/patched_bias_addition_function/bias_addition_function.py            39     23    41%
colossalai/fx/tracer/bias_addition_patch/patched_bias_addition_function/linear.py                            16      9    44%
colossalai/fx/tracer/bias_addition_patch/patched_bias_addition_module/bias_addition_module.py                43     28    35%
colossalai/fx/tracer/bias_addition_patch/patched_bias_addition_module/conv.py                                41     30    27%
colossalai/fx/tracer/bias_addition_patch/patched_bias_addition_module/linear.py                              11      4    64%
colossalai/fx/tracer/experimental.py                                                                        420    420     0%
colossalai/fx/tracer/meta_patch/patched_function/activation_function.py                                       5      1    80%
colossalai/fx/tracer/meta_patch/patched_function/arithmetic.py                                               78     59    24%
colossalai/fx/tracer/meta_patch/patched_function/convolution.py                                             110     85    23%
colossalai/fx/tracer/meta_patch/patched_function/embedding.py                                                 5      1    80%
colossalai/fx/tracer/meta_patch/patched_function/normalization.py                                             8      2    75%
colossalai/fx/tracer/meta_patch/patched_function/python_ops.py                                               44     38    14%
colossalai/fx/tracer/meta_patch/patched_function/torch_ops.py                                               125     87    30%
colossalai/fx/tracer/meta_patch/patched_module/__init__.py                                                    7      0   100%
colossalai/fx/tracer/meta_patch/patched_module/activation_function.py                                        10      1    90%
colossalai/fx/tracer/meta_patch/patched_module/convolution.py                                                51     36    29%
colossalai/fx/tracer/meta_patch/patched_module/embedding.py                                                   6      2    67%
colossalai/fx/tracer/meta_patch/patched_module/linear.py                                                      7      3    57%
colossalai/fx/tracer/meta_patch/patched_module/normalization.py                                              23      9    61%
colossalai/fx/tracer/meta_patch/patched_module/pooling.py                                                   129    105    19%
colossalai/fx/tracer/meta_patch/patched_module/rnn.py                                                         9      4    56%
colossalai/fx/tracer/registry.py                                                                             20      4    80%
colossalai/fx/tracer/tracer.py                                                                              312    261    16%
colossalai/inference/tensor_parallel/__init__.py                                                              3      0   100%
colossalai/inference/tensor_parallel/batch_infer_state.py                                                    34      9    74%
colossalai/inference/tensor_parallel/engine.py                                                              134    105    22%
colossalai/inference/tensor_parallel/kvcache_manager.py                                                      56     42    25%
colossalai/inference/tensor_parallel/modeling/__init__.py                                                     3      3     0%
colossalai/inference/tensor_parallel/modeling/bloom.py                                                      212    212     0%
colossalai/inference/tensor_parallel/modeling/llama.py                                                      162    162     0%
colossalai/inference/tensor_parallel/policies/__init__.py                                                     3      3     0%
colossalai/inference/tensor_parallel/policies/bloom.py                                                       37     37     0%
colossalai/inference/tensor_parallel/policies/llama.py                                                       40     40     0%
colossalai/initialize.py                                                                                     45     22    51%
colossalai/interface/__init__.py                                                                              3      0   100%
colossalai/interface/model.py                                                                                13      1    92%
colossalai/interface/optimizer.py                                                                            45      5    89%
colossalai/kernel/cuda_native/__init__.py                                                                     5      0   100%
colossalai/kernel/cuda_native/layer_norm.py                                                                  49     27    45%
colossalai/kernel/cuda_native/mha/__init__.py                                                                 2      0   100%
colossalai/kernel/cuda_native/mha/flash_attn_2.py                                                            30      7    77%
colossalai/kernel/cuda_native/mha/mem_eff_attn.py                                                            36      6    83%
colossalai/kernel/cuda_native/mha/mha.py                                                                     60      7    88%
colossalai/kernel/cuda_native/mha/utils.py                                                                   52     11    79%
colossalai/kernel/cuda_native/multihead_attention.py                                                        153    123    20%
colossalai/kernel/cuda_native/scaled_softmax.py                                                              96     65    32%
colossalai/kernel/jit/__init__.py                                                                             4      0   100%
colossalai/kernel/jit/bias_dropout_add.py                                                                    11      5    55%
colossalai/kernel/jit/bias_gelu.py                                                                           22      9    59%
colossalai/kernel/jit/option.py                                                                              48     40    17%
colossalai/kernel/triton/__init__.py                                                                         14      3    79%
colossalai/kernel/triton/context_attention.py                                                                92     79    14%
colossalai/kernel/triton/copy_kv_cache_dest.py                                                               32     22    31%
colossalai/kernel/triton/fused_layernorm.py                                                                  50     40    20%
colossalai/kernel/triton/qkv_matmul_kernel.py                                                                43     36    16%
colossalai/kernel/triton/rms_norm.py                                                                         42     33    21%
colossalai/kernel/triton/rotary_embedding_kernel.py                                                          37     30    19%
colossalai/kernel/triton/self_attention_nofusion.py                                                          56     47    16%
colossalai/kernel/triton/softmax.py                                                                          55     46    16%
colossalai/kernel/triton/token_attention_kernel.py                                                          150    128    15%
colossalai/lazy/__init__.py                                                                                   2      0   100%
colossalai/lazy/lazy_init.py                                                                                314     44    86%
colossalai/legacy/__init__.py                                                                                 2      0   100%
colossalai/legacy/amp/__init__.py                                                                            20     10    50%
colossalai/legacy/amp/amp_type.py                                                                             5      0   100%
colossalai/legacy/amp/apex_amp/__init__.py                                                                    9      0   100%
colossalai/legacy/amp/apex_amp/apex_amp.py                                                                   15      4    73%
colossalai/legacy/amp/naive_amp/__init__.py                                                                  29     20    31%
colossalai/legacy/amp/naive_amp/_fp16_optimizer.py                                                          173    127    27%
colossalai/legacy/amp/naive_amp/_utils.py                                                                    21     17    19%
colossalai/legacy/amp/naive_amp/naive_amp.py                                                                 86     58    33%
colossalai/legacy/amp/torch_amp/__init__.py                                                                  15      7    53%
colossalai/legacy/amp/torch_amp/_grad_scaler.py                                                             234    187    20%
colossalai/legacy/amp/torch_amp/torch_amp.py                                                                 35     14    60%
colossalai/legacy/builder/__init__.py                                                                         2      0   100%
colossalai/legacy/builder/builder.py                                                                         21     16    24%
colossalai/legacy/communication/__init__.py                                                                   5      0   100%
colossalai/legacy/communication/collective.py                                                                92     78    15%
colossalai/legacy/communication/p2p.py                                                                      131    107    18%
colossalai/legacy/communication/p2p_v2.py                                                                   113    113     0%
colossalai/legacy/communication/ring.py                                                                      20     15    25%
colossalai/legacy/communication/utils.py                                                                     64     51    20%
colossalai/legacy/constants.py                                                                               11      0   100%
colossalai/legacy/context/parallel_context.py                                                               226    153    32%
colossalai/legacy/context/parallel_mode.py                                                                   21      0   100%
colossalai/legacy/context/process_group_initializer/__init__.py                                              11      0   100%
colossalai/legacy/context/process_group_initializer/initializer_1d.py                                        29     20    31%
colossalai/legacy/context/process_group_initializer/initializer_2d.py                                        72     55    24%
colossalai/legacy/context/process_group_initializer/initializer_2p5d.py                                     136    112    18%
colossalai/legacy/context/process_group_initializer/initializer_3d.py                                       155    129    17%
colossalai/legacy/context/process_group_initializer/initializer_data.py                                      27     19    30%
colossalai/legacy/context/process_group_initializer/initializer_model.py                                     28     20    29%
colossalai/legacy/context/process_group_initializer/initializer_pipeline.py                                  26     18    31%
colossalai/legacy/context/process_group_initializer/initializer_sequence.py                                  42     29    31%
colossalai/legacy/context/process_group_initializer/initializer_tensor.py                                    27     19    30%
colossalai/legacy/context/process_group_initializer/process_group_initializer.py                             14      8    43%
colossalai/legacy/context/random/__init__.py                                                                  2      0   100%
colossalai/legacy/context/random/_helper.py                                                                  53     34    36%
colossalai/legacy/context/random/seed_manager.py                                                             40     22    45%
colossalai/legacy/core.py                                                                                     2      0   100%
colossalai/legacy/engine/__init__.py                                                                          3      0   100%
colossalai/legacy/engine/_base_engine.py                                                                     90     56    38%
colossalai/legacy/engine/gradient_accumulation/__init__.py                                                   15      7    53%
colossalai/legacy/engine/gradient_accumulation/_gradient_accumulation.py                                    107     70    35%
colossalai/legacy/engine/gradient_handler/__init__.py                                                         7      0   100%
colossalai/legacy/engine/gradient_handler/_base_gradient_handler.py                                           7      2    71%
colossalai/legacy/engine/gradient_handler/_data_parallel_gradient_handler.py                                 10      2    80%
colossalai/legacy/engine/gradient_handler/_moe_gradient_handler.py                                           20      9    55%
colossalai/legacy/engine/gradient_handler/_pipeline_parallel_gradient_handler.py                             24     14    42%
colossalai/legacy/engine/gradient_handler/_sequence_parallel_gradient_handler.py                             10      2    80%
colossalai/legacy/engine/gradient_handler/_zero_gradient_handler.py                                           6      1    83%
colossalai/legacy/engine/schedule/__init__.py                                                                 4      0   100%
colossalai/legacy/engine/schedule/_base_schedule.py                                                          74     55    26%
colossalai/legacy/engine/schedule/_non_pipeline_schedule.py                                                  30     22    27%
colossalai/legacy/engine/schedule/_pipeline_schedule.py                                                     430    397     8%
colossalai/legacy/engine/schedule/_pipeline_schedule_v2.py                                                   78     78     0%
colossalai/legacy/global_variables.py                                                                        25      1    96%
colossalai/legacy/initialize.py                                                                             190    155    18%
colossalai/legacy/nn/_ops/_utils.py                                                                         156    156     0%
colossalai/legacy/nn/layer/base_layer.py                                                                     36     22    39%
colossalai/legacy/nn/layer/colossalai_layer/__init__.py                                                       6      0   100%
colossalai/legacy/nn/layer/colossalai_layer/_utils.py                                                        30     18    40%
colossalai/legacy/nn/layer/colossalai_layer/dropout.py                                                       17      9    47%
colossalai/legacy/nn/layer/colossalai_layer/embedding.py                                                     30     11    63%
colossalai/legacy/nn/layer/colossalai_layer/linear.py                                                        31     12    61%
colossalai/legacy/nn/layer/parallel_1d/__init__.py                                                            2      0   100%
colossalai/legacy/nn/layer/parallel_1d/_operation.py                                                         53     35    34%
colossalai/legacy/nn/layer/parallel_1d/_utils.py                                                             96     52    46%
colossalai/legacy/nn/layer/parallel_1d/layers.py                                                            474    378    20%
colossalai/legacy/nn/layer/parallel_2d/__init__.py                                                            3      0   100%
colossalai/legacy/nn/layer/parallel_2d/_operation.py                                                        395    309    22%
colossalai/legacy/nn/layer/parallel_2d/_utils.py                                                             12      7    42%
colossalai/legacy/nn/layer/parallel_2d/layers.py                                                            484    407    16%
colossalai/legacy/nn/layer/parallel_2p5d/__init__.py                                                          3      0   100%
colossalai/legacy/nn/layer/parallel_2p5d/_operation.py                                                      431    336    22%
colossalai/legacy/nn/layer/parallel_2p5d/_utils.py                                                           14      9    36%
colossalai/legacy/nn/layer/parallel_2p5d/layers.py                                                          477    401    16%
colossalai/legacy/nn/layer/parallel_3d/__init__.py                                                            3      0   100%
colossalai/legacy/nn/layer/parallel_3d/_operation.py                                                        237    164    31%
colossalai/legacy/nn/layer/parallel_3d/_utils.py                                                             63     39    38%
colossalai/legacy/nn/layer/parallel_3d/layers.py                                                            513    431    16%
colossalai/legacy/nn/layer/parallel_sequence/__init__.py                                                      3      0   100%
colossalai/legacy/nn/layer/parallel_sequence/_operation.py                                                   82     60    27%
colossalai/legacy/nn/layer/parallel_sequence/layers.py                                                       91     70    23%
colossalai/legacy/nn/layer/utils/__init__.py                                                                  2      0   100%
colossalai/legacy/nn/layer/utils/common.py                                                                   48     18    62%
colossalai/legacy/nn/layer/vanilla/__init__.py                                                                2      0   100%
colossalai/legacy/nn/layer/vanilla/layers.py                                                                147    103    30%
colossalai/legacy/nn/layer/wrapper/__init__.py                                                                2      0   100%
colossalai/legacy/nn/layer/wrapper/pipeline_wrapper.py                                                       38     28    26%
colossalai/legacy/nn/loss/__init__.py                                                                        23      9    61%
colossalai/legacy/nn/loss/loss_1d.py                                                                         59     41    31%
colossalai/legacy/nn/loss/loss_2d.py                                                                         74     48    35%
colossalai/legacy/nn/loss/loss_2p5d.py                                                                       74     48    35%
colossalai/legacy/nn/loss/loss_3d.py                                                                         81     55    32%
colossalai/legacy/nn/metric/__init__.py                                                                      16      6    62%
colossalai/legacy/nn/metric/accuracy_2d.py                                                                   13      6    54%
colossalai/legacy/nn/metric/accuracy_2p5d.py                                                                 13      6    54%
colossalai/legacy/nn/metric/accuracy_3d.py                                                                   18      9    50%
colossalai/legacy/nn/parallel/__init__.py                                                                     2      2     0%
colossalai/legacy/nn/parallel/data_parallel.py                                                               99     99     0%
colossalai/legacy/nn/parallel/layers/__init__.py                                                              6      6     0%
colossalai/legacy/nn/parallel/layers/cache_embedding/__init__.py                                              8      8     0%
colossalai/legacy/nn/parallel/layers/cache_embedding/base_embedding.py                                       20     20     0%
colossalai/legacy/nn/parallel/layers/cache_embedding/cache_mgr.py                                           294    294     0%
colossalai/legacy/nn/parallel/layers/cache_embedding/cached_embedding.py                                     64     64     0%
colossalai/legacy/nn/parallel/layers/cache_embedding/copyer.py                                               25     25     0%
colossalai/legacy/nn/parallel/layers/cache_embedding/embedding_config.py                                     10     10     0%
colossalai/legacy/nn/parallel/layers/cache_embedding/parallel_cached_embedding.py                            56     56     0%
colossalai/legacy/nn/parallel/layers/cache_embedding/parallel_cached_embedding_tablewise.py                  96     96     0%
colossalai/legacy/nn/parallel/layers/cache_embedding/parallel_cached_embedding_tablewise_split_cache.py      73     73     0%
colossalai/legacy/nn/parallel/layers/colo_module.py                                                          30     30     0%
colossalai/legacy/nn/parallel/layers/embedding.py                                                            15     15     0%
colossalai/legacy/nn/parallel/layers/linear.py                                                               15     15     0%
colossalai/legacy/nn/parallel/layers/module_utils.py                                                         83     83     0%
colossalai/legacy/nn/parallel/reducer.py                                                                     77     77     0%
colossalai/legacy/pipeline/__init__.py                                                                        3      3     0%
colossalai/legacy/pipeline/layer_spec.py                                                                     39     39     0%
colossalai/legacy/pipeline/middleware/__init__.py                                                             2      2     0%
colossalai/legacy/pipeline/middleware/adaptor/__init__.py                                                     2      2     0%
colossalai/legacy/pipeline/middleware/adaptor/fx.py                                                         109    109     0%
colossalai/legacy/pipeline/middleware/topo.py                                                               144    144     0%
colossalai/legacy/pipeline/pipelinable.py                                                                   161    161     0%
colossalai/legacy/pipeline/pipeline_process_group.py                                                        113    113     0%
colossalai/legacy/pipeline/rpc/__init__.py                                                                    3      3     0%
colossalai/legacy/pipeline/rpc/_pipeline_base.py                                                            895    895     0%
colossalai/legacy/pipeline/rpc/_pipeline_schedule.py                                                        203    203     0%
colossalai/legacy/pipeline/rpc/utils.py                                                                     100    100     0%
colossalai/legacy/pipeline/utils.py                                                                         183    183     0%
colossalai/legacy/registry/registry.py                                                                       31     15    52%
colossalai/legacy/tensor/__init__.py                                                                          7      0   100%
colossalai/legacy/tensor/compute_spec.py                                                                     15      5    67%
colossalai/legacy/tensor/const.py                                                                             4      4     0%
colossalai/legacy/tensor/dist_spec_mgr.py                                                                   118     83    30%
colossalai/legacy/tensor/distspec.py                                                                         31     17    45%
colossalai/legacy/tensor/process_group.py                                                                   129     96    26%
colossalai/legacy/tensor/tensor_spec.py                                                                      10      0   100%
colossalai/legacy/trainer/__init__.py                                                                         2      2     0%
colossalai/legacy/trainer/_trainer.py                                                                       171    171     0%
colossalai/legacy/trainer/hooks/__init__.py                                                                   6      6     0%
colossalai/legacy/trainer/hooks/_base_hook.py                                                                21     21     0%
colossalai/legacy/trainer/hooks/_checkpoint_hook.py                                                          31     31     0%
colossalai/legacy/trainer/hooks/_commons_.py                                                                  7      7     0%
colossalai/legacy/trainer/hooks/_log_hook.py                                                                146    146     0%
colossalai/legacy/trainer/hooks/_lr_scheduler_hook.py                                                        21     21     0%
colossalai/legacy/trainer/hooks/_metric_hook.py                                                             223    223     0%
colossalai/legacy/utils/__init__.py                                                                           5      0   100%
colossalai/legacy/utils/activation_checkpoint.py                                                            151    151     0%
colossalai/legacy/utils/checkpoint/__init__.py                                                                2      0   100%
colossalai/legacy/utils/checkpoint/module_checkpoint.py                                                      80     72    10%
colossalai/legacy/utils/checkpoint/utils.py                                                                  40     32    20%
colossalai/legacy/utils/checkpointing.py                                                                    141    120    15%
colossalai/legacy/utils/common.py                                                                           267    222    17%
colossalai/legacy/utils/data_sampler/__init__.py                                                              3      0   100%
colossalai/legacy/utils/data_sampler/base_sampler.py                                                         11      4    64%
colossalai/legacy/utils/data_sampler/data_parallel_sampler.py                                                56     41    27%
colossalai/legacy/utils/memory.py                                                                            85     62    27%
colossalai/legacy/utils/profiler/extention.py                                                                14     14     0%
colossalai/legacy/utils/profiler/legacy/__init__.py                                                           5      5     0%
colossalai/legacy/utils/profiler/legacy/comm_profiler.py                                                    204    204     0%
colossalai/legacy/utils/profiler/legacy/pcie_profiler.py                                                    102    102     0%
colossalai/legacy/utils/profiler/legacy/prof_utils.py                                                        77     77     0%
colossalai/legacy/utils/profiler/profiler.py                                                                 62     62     0%
colossalai/legacy/utils/profiler/stateful_tensor_mem_extention.py                                            92     92     0%
colossalai/legacy/zero/__init__.py                                                                           20     10    50%
colossalai/legacy/zero/gemini/__init__.py                                                                     5      0   100%
colossalai/legacy/zero/gemini/gemini_context.py                                                              29      6    79%
colossalai/legacy/zero/gemini/ophooks/_shard_grad_ophook.py                                                  19     19     0%
colossalai/legacy/zero/gemini/ophooks/_shard_param_ophook.py                                                 33     33     0%
colossalai/legacy/zero/gemini/ophooks/runtime_mem_tracer_hook.py                                             94     60    36%
colossalai/legacy/zero/gemini/ophooks/utils.py                                                               90     63    30%
colossalai/legacy/zero/gemini/paramhooks/_param_hookmgr.py                                                   18     11    39%
colossalai/legacy/zero/gemini/stateful_tensor.py                                                            123     83    33%
colossalai/legacy/zero/gemini/stateful_tensor_mgr.py                                                         67     48    28%
colossalai/legacy/zero/gemini/tensor_placement_policy.py                                                     82     53    35%
colossalai/legacy/zero/gemini/tensor_utils.py                                                                54     43    20%
colossalai/legacy/zero/init_ctx/__init__.py                                                                   2      0   100%
colossalai/legacy/zero/init_ctx/init_context.py                                                             147    105    29%
colossalai/legacy/zero/shard_utils/__init__.py                                                                4      0   100%
colossalai/legacy/zero/shard_utils/base_shard_strategy.py                                                    13      3    77%
colossalai/legacy/zero/shard_utils/bucket_tensor_shard_strategy.py                                           32     23    28%
colossalai/legacy/zero/shard_utils/tensor_shard_strategy.py                                                  38     25    34%
colossalai/legacy/zero/sharded_model/__init__.py                                                              2      0   100%
colossalai/legacy/zero/sharded_model/_utils.py                                                               57     43    25%
colossalai/legacy/zero/sharded_model/reduce_scatter.py                                                       94     68    28%
colossalai/legacy/zero/sharded_model/sharded_model_v2.py                                                    298    239    20%
colossalai/legacy/zero/sharded_model/utils.py                                                                12     12     0%
colossalai/legacy/zero/sharded_model/zero_hook.py                                                            73     50    32%
colossalai/legacy/zero/sharded_optim/__init__.py                                                              2      0   100%
colossalai/legacy/zero/sharded_optim/sharded_optim_v2.py                                                    214    170    21%
colossalai/legacy/zero/sharded_param/__init__.py                                                              3      0   100%
colossalai/legacy/zero/sharded_param/sharded_param.py                                                        68     45    34%
colossalai/legacy/zero/sharded_param/sharded_tensor.py                                                       26     12    54%
colossalai/logging/__init__.py                                                                               13      1    92%
colossalai/logging/logger.py                                                                                 82     27    67%
colossalai/nn/init.py                                                                                        87     64    26%
colossalai/nn/layer/moe/__init__.py                                                                           6      6     0%
colossalai/nn/layer/moe/_operation.py                                                                       118    118     0%
colossalai/nn/layer/moe/checkpoint.py                                                                        30     30     0%
colossalai/nn/layer/moe/experts.py                                                                          127    127     0%
colossalai/nn/layer/moe/layers.py                                                                           103    103     0%
colossalai/nn/layer/moe/routers.py                                                                          125    125     0%
colossalai/nn/layer/moe/utils.py                                                                             31     31     0%
colossalai/nn/layer/utils.py                                                                                  4      0   100%
colossalai/nn/lr_scheduler/__init__.py                                                                        7      0   100%
colossalai/nn/lr_scheduler/cosine.py                                                                         25     15    40%
colossalai/nn/lr_scheduler/delayed.py                                                                       109     89    18%
colossalai/nn/lr_scheduler/linear.py                                                                         10      6    40%
colossalai/nn/lr_scheduler/multistep.py                                                                      13      6    54%
colossalai/nn/lr_scheduler/onecycle.py                                                                        5      2    60%
colossalai/nn/lr_scheduler/poly.py                                                                           18     10    44%
colossalai/nn/optimizer/__init__.py                                                                           8      0   100%
colossalai/nn/optimizer/cpu_adam.py                                                                          66      4    94%
colossalai/nn/optimizer/fused_adam.py                                                                        53     10    81%
colossalai/nn/optimizer/fused_lamb.py                                                                        80     74     8%
colossalai/nn/optimizer/fused_sgd.py                                                                         58     50    14%
colossalai/nn/optimizer/hybrid_adam.py                                                                       60      4    93%
colossalai/nn/optimizer/lamb.py                                                                              52     47    10%
colossalai/nn/optimizer/lars.py                                                                              53     46    13%
colossalai/nn/optimizer/nvme_optimizer.py                                                                   102      8    92%
colossalai/pipeline/__init__.py                                                                               4      0   100%
colossalai/pipeline/p2p.py                                                                                   96      7    93%
colossalai/pipeline/schedule/__init__.py                                                                      4      0   100%
colossalai/pipeline/schedule/_utils.py                                                                       75      7    91%
colossalai/pipeline/schedule/base.py                                                                         10      1    90%
colossalai/pipeline/schedule/interleaved_pp.py                                                              172     11    94%
colossalai/pipeline/schedule/one_f_one_b.py                                                                 141      7    95%
colossalai/pipeline/stage_manager.py                                                                         49      0   100%
colossalai/shardformer/_utils.py                                                                             54     15    72%
colossalai/shardformer/layer/__init__.py                                                                      8      0   100%
colossalai/shardformer/layer/_operation.py                                                                  297    142    52%
colossalai/shardformer/layer/dropout.py                                                                      35      0   100%
colossalai/shardformer/layer/embedding.py                                                                   130     23    82%
colossalai/shardformer/layer/linear.py                                                                      190     53    72%
colossalai/shardformer/layer/loss.py                                                                         49      8    84%
colossalai/shardformer/layer/normalization.py                                                                50     10    80%
colossalai/shardformer/layer/parallel_module.py                                                              71     19    73%
colossalai/shardformer/layer/qkv_fused_linear.py                                                            300     75    75%
colossalai/shardformer/layer/utils.py                                                                        83     17    80%
colossalai/shardformer/modeling/bert.py                                                                     479    147    69%
colossalai/shardformer/modeling/blip2.py                                                                     52      1    98%
colossalai/shardformer/modeling/bloom.py                                                                    445    141    68%
colossalai/shardformer/modeling/chatglm2.py                                                                 181     41    77%
colossalai/shardformer/modeling/chatglm2_6b/configuration_chatglm.py                                         30      0   100%
colossalai/shardformer/modeling/chatglm2_6b/modeling_chatglm.py                                             570    239    58%
colossalai/shardformer/modeling/gpt2.py                                                                     398    121    70%
colossalai/shardformer/modeling/jit.py                                                                       19      3    84%
colossalai/shardformer/modeling/llama.py                                                                    215     68    68%
colossalai/shardformer/modeling/opt.py                                                                      284     90    68%
colossalai/shardformer/modeling/sam.py                                                                       94      6    94%
colossalai/shardformer/modeling/t5.py                                                                       297     74    75%
colossalai/shardformer/modeling/vit.py                                                                      148     28    81%
colossalai/shardformer/modeling/whisper.py                                                                  319    114    64%
colossalai/shardformer/policies/auto_policy.py                                                               33      4    88%
colossalai/shardformer/policies/base_policy.py                                                               81      3    96%
colossalai/shardformer/policies/bert.py                                                                     260      0   100%
colossalai/shardformer/policies/blip2.py                                                                     47      2    96%
colossalai/shardformer/policies/bloom.py                                                                    154      2    99%
colossalai/shardformer/policies/chatglm2.py                                                                 103      6    94%
colossalai/shardformer/policies/gpt2.py                                                                     184      1    99%
colossalai/shardformer/policies/llama.py                                                                    121      3    98%
colossalai/shardformer/policies/opt.py                                                                      144     13    91%
colossalai/shardformer/policies/sam.py                                                                       30      0   100%
colossalai/shardformer/policies/t5.py                                                                       180      5    97%
colossalai/shardformer/policies/vit.py                                                                      112      1    99%
colossalai/shardformer/policies/whisper.py                                                                  202     11    95%
colossalai/shardformer/shard/__init__.py                                                                      4      0   100%
colossalai/shardformer/shard/shard_config.py                                                                 39      3    92%
colossalai/shardformer/shard/sharder.py                                                                      96      3    97%
colossalai/tensor/__init__.py                                                                                 6      0   100%
colossalai/tensor/colo_parameter.py                                                                          59     10    83%
colossalai/tensor/colo_tensor.py                                                                             52      9    83%
colossalai/tensor/comm_spec.py                                                                              253     97    62%
colossalai/tensor/d_tensor/__init__.py                                                                        4      0   100%
colossalai/tensor/d_tensor/api.py                                                                           149     24    84%
colossalai/tensor/d_tensor/comm_spec.py                                                                     151     35    77%
colossalai/tensor/d_tensor/layout.py                                                                         38      2    95%
colossalai/tensor/d_tensor/layout_converter.py                                                              194     18    91%
colossalai/tensor/d_tensor/sharding_spec.py                                                                 104     14    87%
colossalai/tensor/d_tensor/utils.py                                                                          38      7    82%
colossalai/tensor/param_op_hook.py                                                                           98     12    88%
colossalai/tensor/shape_consistency.py                                                                      293    120    59%
colossalai/tensor/sharding_spec.py                                                                          138     19    86%
colossalai/tensor/utils.py                                                                                   78     40    49%
colossalai/testing/__init__.py                                                                                4      0   100%
colossalai/testing/comparison.py                                                                             77     28    64%
colossalai/testing/pytest_wrapper.py                                                                         12     10    17%
colossalai/testing/random.py                                                                                 15      2    87%
colossalai/testing/utils.py                                                                                  86     13    85%
colossalai/utils/__init__.py                                                                                  6      0   100%
colossalai/utils/common.py                                                                                   44      6    86%
colossalai/utils/cuda.py                                                                                     26     13    50%
colossalai/utils/model/utils.py                                                                              61     49    20%
colossalai/utils/moe.py                                                                                      29     20    31%
colossalai/utils/multi_tensor_apply/multi_tensor_apply.py                                                    16      4    75%
colossalai/utils/rank_recorder/__init__.py                                                                    2      2     0%
colossalai/utils/rank_recorder/rank_recorder.py                                                             134    134     0%
colossalai/utils/tensor_detector/__init__.py                                                                  1      0   100%
colossalai/utils/tensor_detector/tensor_detector.py                                                         118    102    14%
colossalai/utils/timer.py                                                                                    71     46    35%
colossalai/zero/__init__.py                                                                                   4      0   100%
colossalai/zero/gemini/__init__.py                                                                            7      0   100%
colossalai/zero/gemini/chunk/__init__.py                                                                      5      0   100%
colossalai/zero/gemini/chunk/chunk.py                                                                       314     44    86%
colossalai/zero/gemini/chunk/manager.py                                                                     132     16    88%
colossalai/zero/gemini/chunk/search_utils.py                                                                 86      2    98%
colossalai/zero/gemini/chunk/utils.py                                                                        30      4    87%
colossalai/zero/gemini/colo_init_context.py                                                                 101     86    15%
colossalai/zero/gemini/gemini_ddp.py                                                                        399     83    79%
colossalai/zero/gemini/gemini_hook.py                                                                        50      2    96%
colossalai/zero/gemini/gemini_mgr.py                                                                         97      8    92%
colossalai/zero/gemini/gemini_optimizer.py                                                                  391     39    90%
colossalai/zero/gemini/memory_tracer/__init__.py                                                              6      0   100%
colossalai/zero/gemini/memory_tracer/chunk_memstats_collector.py                                             17      2    88%
colossalai/zero/gemini/memory_tracer/memory_monitor.py                                                       72     35    51%
colossalai/zero/gemini/memory_tracer/memory_stats.py                                                         74     35    53%
colossalai/zero/gemini/memory_tracer/memstats_collector.py                                                   52     11    79%
colossalai/zero/gemini/memory_tracer/param_runtime_order.py                                                  25      5    80%
colossalai/zero/gemini/memory_tracer/runtime_mem_tracer.py                                                   64     45    30%
colossalai/zero/gemini/memory_tracer/static_memstats_collector.py                                            72     72     0%
colossalai/zero/gemini/memory_tracer/utils.py                                                                34     34     0%
colossalai/zero/gemini/placement_policy.py                                                                  119     24    80%
colossalai/zero/gemini/utils.py                                                                              58     37    36%
colossalai/zero/low_level/__init__.py                                                                         2      0   100%
colossalai/zero/low_level/_utils.py                                                                         117     66    44%
colossalai/zero/low_level/bookkeeping/__init__.py                                                             5      0   100%
colossalai/zero/low_level/bookkeeping/base_store.py                                                          12      2    83%
colossalai/zero/low_level/bookkeeping/bucket_store.py                                                        59      0   100%
colossalai/zero/low_level/bookkeeping/gradient_store.py                                                      30      0   100%
colossalai/zero/low_level/bookkeeping/parameter_store.py                                                     16      0   100%
colossalai/zero/low_level/bookkeeping/tensor_bucket.py                                                       37     22    41%
colossalai/zero/low_level/low_level_optim.py                                                                340     24    93%
colossalai/zero/wrapper.py                                                                                   36     29    19%
op_builder/__init__.py                                                                                        9      0   100%
op_builder/builder.py                                                                                        89     50    44%
op_builder/cpu_adam.py                                                                                       20      8    60%
op_builder/fused_optim.py                                                                                    20      9    55%
op_builder/layernorm.py                                                                                      20     10    50%
op_builder/moe.py                                                                                            20     10    50%
op_builder/multi_head_attn.py                                                                                20     10    50%
op_builder/scaled_masked_softmax.py                                                                          18      8    56%
op_builder/scaled_upper_triangle_masked_softmax.py                                                           19      9    53%
op_builder/utils.py                                                                                          94     79    16%
setup.py                                                                                                     72     72     0%
tests/components_to_test/__init__.py                                                                          4      0   100%
tests/components_to_test/albert.py                                                                           24      0   100%
tests/components_to_test/beit.py                                                                             23      0   100%
tests/components_to_test/bert.py                                                                             37      1    97%
tests/components_to_test/gpt2.py                                                                             43      2    95%
tests/components_to_test/hanging_param_model.py                                                              31      0   100%
tests/components_to_test/inline_op_model.py                                                                  31     19    39%
tests/components_to_test/nested_model.py                                                                     37      0   100%
tests/components_to_test/registry.py                                                                         25     10    60%
tests/components_to_test/repeated_computed_layers.py                                                         29      0   100%
tests/components_to_test/resnet.py                                                                           20      9    55%
tests/components_to_test/simple_net.py                                                                       37     10    73%
tests/components_to_test/utils/dummy_data_generator.py                                                       17      3    82%
tests/kit/model_zoo/__init__.py                                                                               3      0   100%
tests/kit/model_zoo/diffusers/diffusers.py                                                                   35      0   100%
tests/kit/model_zoo/registry.py                                                                              18      0   100%
tests/kit/model_zoo/timm/timm.py                                                                             39      0   100%
tests/kit/model_zoo/torchaudio/torchaudio.py                                                                 50      0   100%
tests/kit/model_zoo/torchrec/torchrec.py                                                                     65      6    91%
tests/kit/model_zoo/torchvision/torchvision.py                                                               38      0   100%
tests/kit/model_zoo/transformers/albert.py                                                                   37      0   100%
tests/kit/model_zoo/transformers/bert.py                                                                     50      0   100%
tests/kit/model_zoo/transformers/blip2.py                                                                    21      0   100%
tests/kit/model_zoo/transformers/bloom.py                                                                    36      0   100%
tests/kit/model_zoo/transformers/chatglm2.py                                                                 19      0   100%
tests/kit/model_zoo/transformers/gpt.py                                                                      51      0   100%
tests/kit/model_zoo/transformers/llama.py                                                                    28      2    93%
tests/kit/model_zoo/transformers/opt.py                                                                      31      4    87%
tests/kit/model_zoo/transformers/sam.py                                                                      14      0   100%
tests/kit/model_zoo/transformers/t5.py                                                                       25      0   100%
tests/kit/model_zoo/transformers/vit.py                                                                      24      0   100%
tests/kit/model_zoo/transformers/whisper.py                                                                  23      0   100%
tests/test_analyzer/test_fx/test_bias_addition.py                                                            76     76     0%
tests/test_analyzer/test_fx/test_mod_dir.py                                                                  49     49     0%
tests/test_analyzer/test_fx/test_nested_ckpt.py                                                              42     42     0%
tests/test_analyzer/test_fx/test_shape_prop.py                                                               46     46     0%
tests/test_analyzer/test_fx/test_symbolic_profile.py                                                         38     38     0%
tests/test_analyzer/test_subclasses/test_aten.py                                                             32     32     0%
tests/test_analyzer/test_subclasses/test_flop_tensor.py                                                      29     29     0%
tests/test_analyzer/test_subclasses/test_meta_mode.py                                                        31     31     0%
tests/test_auto_parallel/test_pass/test_node_converting_pass.py                                              43     43     0%
tests/test_auto_parallel/test_pass/test_size_value_converting_pass.py                                        56     56     0%
tests/test_auto_parallel/test_tensor_shard/test_bias_addition_forward.py                                     62     62     0%
tests/test_auto_parallel/test_tensor_shard/test_broadcast.py                                                 37     37     0%
tests/test_auto_parallel/test_tensor_shard/test_checkpoint.py                                                49     49     0%
tests/test_auto_parallel/test_tensor_shard/test_compatibility_with_ddp.py                                    75     75     0%
tests/test_auto_parallel/test_tensor_shard/test_compatibility_with_gemini.py                                 77     77     0%
tests/test_auto_parallel/test_tensor_shard/test_find_repeat_block.py                                         77     77     0%
tests/test_auto_parallel/test_tensor_shard/test_gpt/gpt_modules.py                                          144    144     0%
tests/test_auto_parallel/test_tensor_shard/test_gpt/test_runtime_with_gpt_modules.py                        135    135     0%
tests/test_auto_parallel/test_tensor_shard/test_gpt/test_solver_with_gpt_module.py                           66     66     0%
tests/test_auto_parallel/test_tensor_shard/test_liveness_analysis.py                                         43     43     0%
tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_addbmm_handler.py                         158    158     0%
tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_addmm_handler.py                          110    110     0%
tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_batch_norm_handler.py                      64     64     0%
tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_bias_linear_function_node.py               99     99     0%
tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_bias_linear_module_node.py                 96     96     0%
tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_binary_elementwise_handler.py             168    168     0%
tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_bmm_handler.py                            137    137     0%
tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_conv_handler.py                           188    188     0%
tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_default_reshape_handler.py                 53     53     0%
tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_embedding_handler.py                      175    175     0%
tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_getattr_handler.py                         51     51     0%
tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_getitem_handler.py                        112    112     0%
tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_layer_norm_handler.py                      67     67     0%
tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_linear_handler.py                         200    200     0%
tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_matmul_handler.py                          94     94     0%
tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_norm_pooling_handler.py                    41     41     0%
tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_output_handler.py                          44     44     0%
tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_permute_and_transpose_handler.py          245    245     0%
tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_placeholder_handler.py                     50     50     0%
tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_shard_option.py                            72     72     0%
tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_softmax_handler.py                        124    124     0%
tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_split_handler.py                          185    185     0%
tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_sum_handler.py                            175    175     0%
tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_tensor_constructor.py                      44     44     0%
tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_unary_element_wise_handler.py              54     54     0%
tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_view_handler.py                           185    185     0%
tests/test_auto_parallel/test_tensor_shard/test_node_handler/test_where_handler.py                           59     59     0%
tests/test_auto_parallel/test_tensor_shard/test_node_handler/utils.py                                       124    124     0%
tests/test_auto_parallel/test_tensor_shard/test_solver_with_resnet_v2.py                                     57     57     0%
tests/test_booster/test_accelerator.py                                                                       11      0   100%
tests/test_booster/test_mixed_precision/test_fp16_torch.py                                                   30      1    97%
tests/test_booster/test_plugin/test_3d_plugin.py                                                             64      7    89%
tests/test_booster/test_plugin/test_dp_plugin_base.py                                                        49      9    82%
tests/test_booster/test_plugin/test_gemini_plugin.py                                                         72      9    88%
tests/test_booster/test_plugin/test_low_level_zero_plugin.py                                                 59      6    90%
tests/test_booster/test_plugin/test_torch_ddp_plugin.py                                                      78      0   100%
tests/test_booster/test_plugin/test_torch_fsdp_plugin.py                                                     43      0   100%
tests/test_checkpoint_io/test_gemini_checkpoint_io.py                                                        86      0   100%
tests/test_checkpoint_io/test_gemini_torch_compability.py                                                   116      0   100%
tests/test_checkpoint_io/test_general_checkpoint_io.py                                                      104      0   100%
tests/test_checkpoint_io/test_hybrid_parallel_plugin_checkpoint_io.py                                        87      0   100%
tests/test_checkpoint_io/test_low_level_zero_checkpoint_io.py                                                58      1    98%
tests/test_checkpoint_io/test_plugins_huggingface_compatibility.py                                           55      1    98%
tests/test_checkpoint_io/test_torch_ddp_checkpoint_io.py                                                     55      0   100%
tests/test_checkpoint_io/test_torch_fsdp_checkpoint_io.py                                                    81      6    93%
tests/test_checkpoint_io/utils.py                                                                            14      0   100%
tests/test_cluster/test_device_mesh_manager.py                                                               16      1    94%
tests/test_cluster/test_process_group_mesh.py                                                                85     34    60%
tests/test_config/sample_config.py                                                                            1      0   100%
tests/test_config/test_load_config.py                                                                         8      0   100%
tests/test_device/test_alpha_beta.py                                                                         20      8    60%
tests/test_device/test_device_mesh.py                                                                        58     36    38%
tests/test_device/test_extract_alpha_beta.py                                                                 22     10    55%
tests/test_device/test_init_logical_pg.py                                                                    25      1    96%
tests/test_device/test_search_logical_device_mesh.py                                                         22     10    55%
tests/test_infer/test_bloom_infer.py                                                                         38     15    61%
tests/test_infer/test_infer_engine.py                                                                        59     34    42%
tests/test_infer/test_kvcache_manager.py                                                                     44     24    45%
tests/test_infer/test_llama_infer.py                                                                         58     32    45%
tests/test_infer_ops/cuda/test_vllm_rmsnorm.py                                                               38     22    42%
tests/test_infer_ops/cuda/test_vllm_rotary_embedding.py                                                      75     54    28%
tests/test_infer_ops/triton/kernel_utils.py                                                                  20     16    20%
tests/test_infer_ops/triton/test_bloom_context_attention.py                                                  33     22    33%
tests/test_infer_ops/triton/test_copy_kv_dest.py                                                             22     12    45%
tests/test_infer_ops/triton/test_layernorm_triton.py                                                         30     17    43%
tests/test_infer_ops/triton/test_llama_context_attention.py                                                  32     21    34%
tests/test_infer_ops/triton/test_rotary_embedding.py                                                         36     25    31%
tests/test_infer_ops/triton/test_self_attention_nonfusion.py                                                 72     56    22%
tests/test_infer_ops/triton/test_softmax.py                                                                  22     11    50%
tests/test_infer_ops/triton/test_token_attn_1.py                                                             47     34    28%
tests/test_infer_ops/triton/test_token_attn_2.py                                                             37     26    30%
tests/test_infer_ops/triton/test_token_attn_fwd.py                                                           45     34    24%
tests/test_infer_ops/triton/test_token_softmax.py                                                            30     20    33%
tests/test_lazy/lazy_init_utils.py                                                                           72     26    64%
tests/test_lazy/test_models.py                                                                               14      1    93%
tests/test_optimizer/test_adam_kernel.py                                                                     87      1    99%
tests/test_optimizer/test_adam_optim.py                                                                      49      0   100%
tests/test_optimizer/test_nvme.py                                                                            35      1    97%
tests/test_pipeline/test_p2p_communication.py                                                                44      1    98%
tests/test_pipeline/test_pipeline_utils/test_t5_pipeline_utils.py                                            17      0   100%
tests/test_pipeline/test_pipeline_utils/test_whisper_pipeline_utils.py                                       20      2    90%
tests/test_pipeline/test_schedule/test_interleaved.py                                                        99      1    99%
tests/test_pipeline/test_schedule/test_oneF_oneB.py                                                          80      2    98%
tests/test_pipeline/test_schedule/test_pipeline_schedule_utils.py                                            40      0   100%
tests/test_pipeline/test_stage_manager.py                                                                    45      1    98%
tests/test_shardformer/test_layer/test_dist_crossentropy.py                                                  27      1    96%
tests/test_shardformer/test_layer/test_dropout.py                                                            42      1    98%
tests/test_shardformer/test_layer/test_embedding.py                                                          37      1    97%
tests/test_shardformer/test_layer/test_gpt2_qkv_fused_linear_1d.py                                           94      1    99%
tests/test_shardformer/test_layer/test_layernorm.py                                                          35      1    97%
tests/test_shardformer/test_layer/test_linear_1d.py                                                         116      1    99%
tests/test_shardformer/test_layer/test_qkv_fused_linear_1d.py                                                89      1    99%
tests/test_shardformer/test_layer/test_vocab_parallel_embedding_1d.py                                        39      1    97%
tests/test_shardformer/test_model/_utils.py                                                                 182     30    84%
tests/test_shardformer/test_model/test_shard_bert.py                                                         80     12    85%
tests/test_shardformer/test_model/test_shard_blip2.py                                                        40      1    98%
tests/test_shardformer/test_model/test_shard_bloom.py                                                        80     12    85%
tests/test_shardformer/test_model/test_shard_chatglm2.py                                                     79     11    86%
tests/test_shardformer/test_model/test_shard_gpt2.py                                                         81     11    86%
tests/test_shardformer/test_model/test_shard_llama.py                                                        82     12    85%
tests/test_shardformer/test_model/test_shard_opt.py                                                          81     11    86%
tests/test_shardformer/test_model/test_shard_sam.py                                                          39      1    97%
tests/test_shardformer/test_model/test_shard_t5.py                                                           79     11    86%
tests/test_shardformer/test_model/test_shard_vit.py                                                          79     11    86%
tests/test_shardformer/test_model/test_shard_whisper.py                                                      88     14    84%
tests/test_shardformer/test_shard_utils.py                                                                   21      0   100%
tests/test_shardformer/test_with_torch_ddp.py                                                                52      1    98%
tests/test_tensor/test_comm_spec_apply.py                                                                    96      1    99%
tests/test_tensor/test_dtensor/test_comm_spec.py                                                             77      1    99%
tests/test_tensor/test_dtensor/test_dtensor.py                                                               65      5    92%
tests/test_tensor/test_dtensor/test_dtensor_sharding_spec.py                                                 20      1    95%
tests/test_tensor/test_dtensor/test_layout_converter.py                                                      91      1    99%
tests/test_tensor/test_mix_gather.py                                                                        150    127    15%
tests/test_tensor/test_shape_consistency.py                                                                  50      2    96%
tests/test_tensor/test_shape_consistency_apply.py                                                            43      1    98%
tests/test_tensor/test_sharding_spec.py                                                                      13      1    92%
tests/test_utils/test_flash_attention.py                                                                    119      0   100%
tests/test_zero/test_gemini/test_chunk_mgrv2.py                                                              47      1    98%
tests/test_zero/test_gemini/test_chunkv2.py                                                                  87      1    99%
tests/test_zero/test_gemini/test_fwd_bwd.py                                                                  74      1    99%
tests/test_zero/test_gemini/test_gemini_use_rmt.py                                                           69     46    33%
tests/test_zero/test_gemini/test_grad_clip.py                                                                76      2    97%
tests/test_zero/test_gemini/test_inference.py                                                                92      1    99%
tests/test_zero/test_gemini/test_optim.py                                                                   119      1    99%
tests/test_zero/test_gemini/test_runtime_mem_tracer.py                                                       40     28    30%
tests/test_zero/test_gemini/test_search.py                                                                   38      2    95%
tests/test_zero/test_gemini/test_zeroddp_state_dict.py                                                       91      5    95%
tests/test_zero/test_gemini/test_zerooptim_state_dict.py                                                     59      2    97%
tests/test_zero/test_low_level/test_grad_acc.py                                                              86      1    99%
tests/test_zero/test_low_level/test_zero1_2.py                                                              101      1    99%
tests/test_zero/test_low_level/test_zero_ckpt.py                                                             68      5    93%
-----------------------------------------------------------------------------------------------------------------------------
TOTAL                                                                                                     59505  37225    37%
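For readers sanity-checking the numbers: the Cover column looks like plain statement coverage, i.e. (Stmts - Miss) / Stmts, though the bot comment does not state the formula. A minimal sketch of that arithmetic for the TOTAL row, under that assumption:

```python
# Assumed formula: statement coverage = (Stmts - Miss) / Stmts.
# Values taken from the TOTAL row of the report above.
stmts, miss = 59505, 37225
cover = (stmts - miss) / stmts
print(f"{cover:.0%}")  # prints "37%", matching the reported total
```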

@ver217 ver217 merged commit 079bf3c into hpcaitech:main Sep 19, 2023
8 of 9 checks passed
@ver217 ver217 deleted the misc/pre-commit branch September 19, 2023 06:20
Xu-Kai added a commit that referenced this pull request Sep 29, 2023
* [shardformer] fix GPT2DoubleHeadsModel (#4703)

* [hotfix] Fix import error: colossal.kernel without triton installed (#4722)

* [hotfix] remove triton kernels from kernel init

* revise bloom/llama kernel imports for infer

* [shardformer] to fix whisper test failed due to significant accuracy differences. (#4710)

* [shardformer] fix whisper test failed

* [shardformer] fix whisper test failed

* [shardformer] fix whisper test failed

* [shardformer] fix whisper test failed

* [doc] fix llama2 code link (#4726)

* [doc] fix llama2 code link

* [doc] fix llama2 code link

* [doc] fix llama2 code link

* [doc] Add user document for Shardformer (#4702)

* create shardformer doc files

* add docstring for seq-parallel

* update ShardConfig docstring

* add links to llama example

* add outdated message

* finish introduction & supporting information

* finish 'how shardformer works'

* finish shardformer.md English doc

* fix doctest fail

* add Chinese document

* [format] applied code formatting on changed files in pull request 4726 (#4727)

Co-authored-by: github-actions <github-actions@github.com>

* [doc] add shardformer support matrix/update tensor parallel documents (#4728)

* add compatibility matrix for shardformer doc

* update tp doc

* Optimized some syntax errors in the documentation and code under applications/ (#4127)

Co-authored-by: flybird11111 <1829166702@qq.com>

* [shardformer] update pipeline parallel document (#4725)

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [legacy] remove deterministic data loader test

* [shardformer] update seq parallel document (#4730)

* update doc of seq parallel

* fix typo

* [example] add gpt2 HybridParallelPlugin example (#4653)

* add gpt2 HybridParallelPlugin example

* update readme and testci

* update test ci

* fix test_ci bug

* update requirements

* add requirements

* update requirements

* add requirement

* rename file

* [doc] polish shardformer doc (#4735)

* arrange position of chapters

* fix typos in seq parallel doc

* [shardformer] add custom policy in hybrid parallel plugin (#4718)

* add custom policy

* update assert

* [example] llama2 add fine-tune example (#4673)

* [shardformer] update shardformer readme

[shardformer] update shardformer readme

[shardformer] update shardformer readme

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] change dataset

* [shardformer] change dataset

* [shardformer] fix CI

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

[example] update opt example

[example] resolve comments

fix

fix

* [example] llama2 add finetune example

* [example] llama2 add finetune example

* [example] llama2 add finetune example

* [example] llama2 add finetune example

* fix

* update llama2 example

* update llama2 example

* fix

* update llama2 example

* update llama2 example

* update llama2 example

* update llama2 example

* update llama2 example

* update llama2 example

* Update requirements.txt

* update llama2 example

* update llama2 example

* update llama2 example

* [doc] explanation of loading large pretrained models (#4741)

* [kernel] update triton init #4740 (#4740)

* [legacy] clean up legacy code (#4743)

* [legacy] remove outdated codes of pipeline (#4692)

* [legacy] remove cli of benchmark and update optim (#4690)

* [legacy] remove cli of benchmark and update optim

* [doc] fix cli doc test

* [legacy] fix engine clip grad norm

* [legacy] remove outdated colo tensor (#4694)

* [legacy] remove outdated colo tensor

* [test] fix test import

* [legacy] move outdated zero to legacy (#4696)

* [legacy] clean up utils (#4700)

* [legacy] clean up utils

* [example] update examples

* [legacy] clean up amp

* [legacy] fix amp module

* [legacy] clean up gpc (#4742)

* [legacy] clean up context

* [legacy] clean core, constants and global vars

* [legacy] refactor initialize

* [example] fix examples ci

* [example] fix examples ci

* [legacy] fix tests

* [example] fix gpt example

* [example] fix examples ci

* [devops] fix ci installation

* [example] fix examples ci

* [format] applied code formatting on changed files in pull request 4743 (#4750)

Co-authored-by: github-actions <github-actions@github.com>

* [misc] update pre-commit and run all files (#4752)

* [misc] update pre-commit

* [misc] run pre-commit

* [misc] remove useless configuration files

* [misc] ignore cuda for clang-format

* [doc] explain suitable use case for each plugin

* [doc] put individual plugin explanation in front

* [doc] add model examples for each plugin

* [doc] put native colossalai plugins first in description section

* [chat]: update rm, add wandb and fix bugs (#4471)

* feat: modify forward fn of critic and reward model

* feat: modify calc_action_log_probs

* to: add wandb in sft and rm trainer

* feat: update train_sft

* feat: update train_rm

* style: modify type annotation and add warning

* feat: pass tokenizer to ppo trainer

* to: modify trainer base and maker base

* feat: add wandb in ppo trainer

* feat: pass tokenizer to generate

* test: update generate fn tests

* test: update train tests

* fix: remove action_mask

* feat: remove unused code

* fix: fix wrong ignore_index

* fix: fix mock tokenizer

* chore: update requirements

* revert: modify make_experience

* fix: fix inference

* fix: add padding side

* style: modify _on_learn_batch_end

* test: use mock tokenizer

* fix: use bf16 to avoid overflow

* fix: fix workflow

* [chat] fix gemini strategy

* [chat] fix

* sync: update colossalai strategy

* fix: fix args and model dtype

* fix: fix checkpoint test

* fix: fix requirements

* fix: fix missing import and wrong arg

* fix: temporarily skip gemini test in stage 3

* style: apply pre-commit

* fix: temporarily skip gemini test in stage 1&2

---------

Co-authored-by: Mingyan Jiang <1829166702@qq.com>

* [shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic (#4758)

* fix master param sync for hybrid plugin

* rewrite unwrap for ddp/fsdp

* rewrite unwrap for zero/gemini

* rewrite unwrap for hybrid plugin

* fix geemini unwrap

* fix bugs

* [bug] fix get_default_parser in examples (#4764)

* [doc] clean up outdated docs (#4765)

* [doc] clean up outdated docs

* [doc] fix linking

* [doc] fix linking

* [doc] add shardformer doc to sidebar (#4768)

* [chat]: add lora merge weights config (#4766)

* feat: modify lora merge weights fn

* feat: add lora merge weights config

* [lazy] support torch 2.0 (#4763)

* [lazy] support _like methods and clamp

* [lazy] pass transformers models

* [lazy] fix device move and requires grad

* [lazy] fix requires grad and refactor api

* [lazy] fix requires grad

* [bug] Fix the version check bug in colossalai run when generating the cmd. (#4713)

* Fix the version check bug in colossalai run when generating the cmd.

* polish code

* [feature] add gptq for inference (#4754)

* [gptq] add gptq kernel (#4416)

* add gptq

* refactor code

* fix tests

* replace auto-gptq

* rename inference/quant

* refactor test

* add auto-gptq as an option

* reset requirements

* change assert and check auto-gptq

* add import warnings

* change test flash attn version

* remove example

* change requirements of flash_attn

* modify tests

* [skip ci] change requirements-test

* [gptq] faster gptq cuda kernel (#4494)

* [skip ci] add cuda kernels

* add license

* [skip ci] fix max_input_len

* format files & change test size

* [skip ci]

* [gptq] add gptq tensor parallel (#4538)

* add gptq tensor parallel

* add gptq tp

* delete print

* add test gptq check

* add test auto gptq check

* [gptq] combine gptq and kv cache manager (#4706)

* combine gptq and kv cache manager

* add init bits

* delete useless code

* add model path

* delete useless print and update test

* delete useless import

* move option gptq to shard config

* change replace linear to shardformer

* update bloom policy

* delete useless code

* fix import bug and delete useless code

* change colossalai/gptq to colossalai/quant/gptq

* update import linear for tests

* delete useless code and mv gptq_kernel to kernel directory

* fix triton kernel

* add triton import

* [inference] chatglm2 infer demo (#4724)

* add chatglm2

* add

* gather needed kernels

* fix some bugs

* finish context forward

* finish context stage

* fix

* add

* pause

* add

* fix bugs

* finish chatglm

* fix bug

* change some logic

* fix bugs

* change some logics

* add

* add

* add

* fix

* fix tests

* fix

* [release] update version (#4775)

* [release] update version

* [doc] revert versions

* initial commit: add colossal llama 2 (#4784)

* [feature] ColossalEval: Evaluation Pipeline for LLMs (#4786)

* Add ColossalEval

* Delete evaluate in Chat

---------

Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
Co-authored-by: Tong Li <tong.li352711588@gmail.com>

* [doc] add llama2 domain-specific solution news (#4789)

* [doc] add llama2 domain-specific solution news

* [fix] fix weekly running example (#4787)

* [fix] fix weekly running example

* [fix] fix weekly running example

* [doc] polish shardformer doc (#4779)

* fix example format in docstring

* polish shardformer doc

* [checkpointio] support unsharded checkpointIO for hybrid parallel (#4774)

* support unsharded saving/loading for model

* support optimizer unsharded saving

* update doc

* support unsharded loading for optimizer

* small fix

* update readme

* [lazy] support from_pretrained (#4801)

* [lazy] patch from pretrained

* [lazy] fix from pretrained and add tests

* [devops] update ci

* update

* [hotfix] change llama2 Colossal-LLaMA-2 script filename (#4800)

change filename:
pretraining.py -> trainin.py
there is no file named pretraing.py; the name was misspelled

* [misc] add last_epoch in CosineAnnealingWarmupLR (#4778)

* [doc] add lazy init docs (#4808)

* [hotfix] fix norm type error in zero optimizer (#4795)

* [hotfix] Correct several erroneous code comments (#4794)

* [format] applied code formatting on changed files in pull request 4595 (#4602)

Co-authored-by: github-actions <github-actions@github.com>

* fix format (#4815)

* [chat] fix gemini strategy (#4698)

* [chat] fix gemini strategy

* [chat] fix gemini strategy

* [chat] fix gemini strategy

* [chat] fix gemini strategy

* # This is a combination of 2 commits.

[chat] fix gemini strategy

fix

* [chat] fix gemini strategy

update llama2 example

[chat] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* fix

* fix

* fix

* fix

* fix

* Update train_prompts.py

* Update Qwen-7B results (#4821)

Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>

* [doc] update slack link (#4823)

* add autotune (#4822)

* update Colossal (#4832)

---------

Co-authored-by: flybird11111 <1829166702@qq.com>
Co-authored-by: Yuanheng Zhao <54058983+yuanheng-zhao@users.noreply.github.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
Co-authored-by: Baizhou Zhang <eddiezhang@pku.edu.cn>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: digger yu <digger-yu@outlook.com>
Co-authored-by: Pengtai Xu <henryxu880@gmail.com>
Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: ppt0011 <143150326+ppt0011@users.noreply.github.com>
Co-authored-by: Xuanlei Zhao <43881818+oahzxl@users.noreply.github.com>
Co-authored-by: Hongxin Liu <lhx0217@gmail.com>
Co-authored-by: Wenhao Chen <cwher@outlook.com>
Co-authored-by: littsk <1214689160@qq.com>
Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>
Co-authored-by: Tong Li <tong.li352711588@gmail.com>
Co-authored-by: Yuanchen <70520919+chengeharrison@users.noreply.github.com>
Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
Co-authored-by: Desperado-Jia <502205863@qq.com>
Co-authored-by: Chandler-Bing <brp12138@163.com>
Co-authored-by: Yan haixu <40758050+hova88@users.noreply.github.com>
Xu-Kai added a commit that referenced this pull request Sep 29, 2023
* [shardformer] fix GPT2DoubleHeadsModel (#4703)

* [hotfix] Fix import error: colossal.kernel without triton installed (#4722)

* [hotfix] remove triton kernels from kernel init

* revise bloom/llama kernel imports for infer

* [shardformer] to fix whisper test failed due to significant accuracy differences. (#4710)

* [shardformer] fix whisper test failed

* [shardformer] fix whisper test failed

* [shardformer] fix whisper test failed

* [shardformer] fix whisper test failed

* [doc] fix llama2 code link (#4726)

* [doc] fix llama2 code link

* [doc] fix llama2 code link

* [doc] fix llama2 code link

* [doc] Add user document for Shardformer (#4702)

* create shardformer doc files

* add docstring for seq-parallel

* update ShardConfig docstring

* add links to llama example

* add outdated message

* finish introduction & supporting information

* finish 'how shardformer works'

* finish shardformer.md English doc

* fix doctest fail

* add Chinese document

* [format] applied code formatting on changed files in pull request 4726 (#4727)

Co-authored-by: github-actions <github-actions@github.com>

* [doc] add shardformer support matrix/update tensor parallel documents (#4728)

* add compatibility matrix for shardformer doc

* update tp doc

* Optimized some syntax errors in the documentation and code under applications/ (#4127)

Co-authored-by: flybird11111 <1829166702@qq.com>

* [shardformer] update pipeline parallel document (#4725)

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [legacy] remove deterministic data loader test

* [shardformer] update seq parallel document (#4730)

* update doc of seq parallel

* fix typo

* [example] add gpt2 HybridParallelPlugin example (#4653)

* add gpt2 HybridParallelPlugin example

* update readme and testci

* update test ci

* fix test_ci bug

* update requirements

* add requirements

* update requirements

* add requirement

* rename file

* [doc] polish shardformer doc (#4735)

* arrange position of chapters

* fix typos in seq parallel doc

* [shardformer] add custom policy in hybrid parallel plugin (#4718)

* add custom policy

* update assert

* [example] llama2 add fine-tune example (#4673)

* [shardformer] update shardformer readme

[shardformer] update shardformer readme

[shardformer] update shardformer readme

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] change dataset

* [shardformer] change dataset

* [shardformer] fix CI

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

[example] update opt example

[example] resolve comments

fix

fix

* [example] llama2 add finetune example

* [example] llama2 add finetune example

* [example] llama2 add finetune example

* [example] llama2 add finetune example

* fix

* update llama2 example

* update llama2 example

* fix

* update llama2 example

* update llama2 example

* update llama2 example

* update llama2 example

* update llama2 example

* update llama2 example

* Update requirements.txt

* update llama2 example

* update llama2 example

* update llama2 example

* [doc] explanation of loading large pretrained models (#4741)

* [kernel] update triton init #4740 (#4740)

* [legacy] clean up legacy code (#4743)

* [legacy] remove outdated codes of pipeline (#4692)

* [legacy] remove cli of benchmark and update optim (#4690)

* [legacy] remove cli of benchmark and update optim

* [doc] fix cli doc test

* [legacy] fix engine clip grad norm

* [legacy] remove outdated colo tensor (#4694)

* [legacy] remove outdated colo tensor

* [test] fix test import

* [legacy] move outdated zero to legacy (#4696)

* [legacy] clean up utils (#4700)

* [legacy] clean up utils

* [example] update examples

* [legacy] clean up amp

* [legacy] fix amp module

* [legacy] clean up gpc (#4742)

* [legacy] clean up context

* [legacy] clean core, constants and global vars

* [legacy] refactor initialize

* [example] fix examples ci

* [example] fix examples ci

* [legacy] fix tests

* [example] fix gpt example

* [example] fix examples ci

* [devops] fix ci installation

* [example] fix examples ci

* [format] applied code formatting on changed files in pull request 4743 (#4750)

Co-authored-by: github-actions <github-actions@github.com>

* [misc] update pre-commit and run all files (#4752)

* [misc] update pre-commit

* [misc] run pre-commit

* [misc] remove useless configuration files

* [misc] ignore cuda for clang-format

* [doc] explain suitable use case for each plugin

* [doc] put individual plugin explanation in front

* [doc] add model examples for each plugin

* [doc] put native colossalai plugins first in description section

* [chat]: update rm, add wandb and fix bugs (#4471)

* feat: modify forward fn of critic and reward model

* feat: modify calc_action_log_probs

* to: add wandb in sft and rm trainer

* feat: update train_sft

* feat: update train_rm

* style: modify type annotation and add warning

* feat: pass tokenizer to ppo trainer

* to: modify trainer base and maker base

* feat: add wandb in ppo trainer

* feat: pass tokenizer to generate

* test: update generate fn tests

* test: update train tests

* fix: remove action_mask

* feat: remove unused code

* fix: fix wrong ignore_index

* fix: fix mock tokenizer

* chore: update requirements

* revert: modify make_experience

* fix: fix inference

* fix: add padding side

* style: modify _on_learn_batch_end

* test: use mock tokenizer

* fix: use bf16 to avoid overflow

* fix: fix workflow

* [chat] fix gemini strategy

* [chat] fix

* sync: update colossalai strategy

* fix: fix args and model dtype

* fix: fix checkpoint test

* fix: fix requirements

* fix: fix missing import and wrong arg

* fix: temporarily skip gemini test in stage 3

* style: apply pre-commit

* fix: temporarily skip gemini test in stage 1&2

---------

Co-authored-by: Mingyan Jiang <1829166702@qq.com>

* [shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic (#4758)

* fix master param sync for hybrid plugin

* rewrite unwrap for ddp/fsdp

* rewrite unwrap for zero/gemini

* rewrite unwrap for hybrid plugin

* fix gemini unwrap

* fix bugs

* [bug] fix get_default_parser in examples (#4764)

* [doc] clean up outdated docs (#4765)

* [doc] clean up outdated docs

* [doc] fix linking

* [doc] fix linking

* [doc] add shardformer doc to sidebar (#4768)

* [chat]: add lora merge weights config (#4766)

* feat: modify lora merge weights fn

* feat: add lora merge weights config

* [lazy] support torch 2.0 (#4763)

* [lazy] support _like methods and clamp

* [lazy] pass transformers models

* [lazy] fix device move and requires grad

* [lazy] fix requires grad and refactor api

* [lazy] fix requires grad

* [bug] Fix the version check bug in colossalai run when generating the cmd. (#4713)

* Fix the version check bug in colossalai run when generating the cmd.

* polish code

* [feature] add gptq for inference (#4754)

* [gptq] add gptq kernel (#4416)

* add gptq

* refactor code

* fix tests

* replace auto-gptq

* rename inference/quant

* refactor test

* add auto-gptq as an option

* reset requirements

* change assert and check auto-gptq

* add import warnings

* change test flash attn version

* remove example

* change requirements of flash_attn

* modify tests

* [skip ci] change requirements-test

* [gptq] faster gptq cuda kernel (#4494)

* [skip ci] add cuda kernels

* add license

* [skip ci] fix max_input_len

* format files & change test size

* [skip ci]

* [gptq] add gptq tensor parallel (#4538)

* add gptq tensor parallel

* add gptq tp

* delete print

* add test gptq check

* add test auto gptq check

* [gptq] combine gptq and kv cache manager (#4706)

* combine gptq and kv cache manager

* add init bits

* delete useless code

* add model path

* delete useless print and update test

* delete useless import

* move option gptq to shard config

* change replace linear to shardformer

* update bloom policy

* delete useless code

* fix import bug and delete useless code

* change colossalai/gptq to colossalai/quant/gptq

* update import linear for tests

* delete useless code and mv gptq_kernel to kernel directory

* fix triton kernel

* add triton import

* [inference] chatglm2 infer demo (#4724)

* add chatglm2

* add

* gather needed kernels

* fix some bugs

* finish context forward

* finish context stage

* fix

* add

* pause

* add

* fix bugs

* finish chatglm

* fix bug

* change some logic

* fix bugs

* change some logics

* add

* add

* add

* fix

* fix tests

* fix

* [release] update version (#4775)

* [release] update version

* [doc] revert versions

* initial commit: add colossal llama 2 (#4784)

* [feature] ColossalEval: Evaluation Pipeline for LLMs (#4786)

* Add ColossalEval

* Delete evaluate in Chat

---------

Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
Co-authored-by: Tong Li <tong.li352711588@gmail.com>

* [doc] add llama2 domain-specific solution news (#4789)

* [doc] add llama2 domain-specific solution news

* [fix] fix weekly running example (#4787)

* [fix] fix weekly running example

* [fix] fix weekly running example

* [doc] polish shardformer doc (#4779)

* fix example format in docstring

* polish shardformer doc

* [checkpointio] support unsharded checkpointIO for hybrid parallel (#4774)

* support unsharded saving/loading for model

* support optimizer unsharded saving

* update doc

* support unsharded loading for optimizer

* small fix

* update readme

* [lazy] support from_pretrained (#4801)

* [lazy] patch from pretrained

* [lazy] fix from pretrained and add tests

* [devops] update ci

* update

* [hotfix] change llama2 Colossal-LLaMA-2 script filename (#4800)

change filename:
pretraining.py -> trainin.py
there is no file named pretraing.py; the name was misspelled.

* [misc] add last_epoch in CosineAnnealingWarmupLR (#4778)

* [doc] add lazy init docs (#4808)

* [hotfix] fix norm type error in zero optimizer (#4795)

* [hotfix] Correct several erroneous code comments (#4794)

* [format] applied code formatting on changed files in pull request 4595 (#4602)

Co-authored-by: github-actions <github-actions@github.com>

* fix format (#4815)

* [chat] fix gemini strategy (#4698)

* [chat] fix gemini strategy

* [chat] fix gemini strategy

* [chat] fix gemini strategy

* [chat] fix gemini strategy

* g# This is a combination of 2 commits.

[chat] fix gemini strategy

fox

* [chat] fix gemini strategy

update llama2 example

[chat] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* fix

* fix

* fix

* fix

* fix

* Update train_prompts.py

* Update Qwen-7B results (#4821)

Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>

* [doc] update slack link (#4823)

* add autotune (#4822)

* update Colossal (#4832)

* add int8 rotary embedding kernel

* remove useless code

---------

Co-authored-by: flybird11111 <1829166702@qq.com>
Co-authored-by: Yuanheng Zhao <54058983+yuanheng-zhao@users.noreply.github.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
Co-authored-by: Baizhou Zhang <eddiezhang@pku.edu.cn>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: digger yu <digger-yu@outlook.com>
Co-authored-by: Pengtai Xu <henryxu880@gmail.com>
Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: ppt0011 <143150326+ppt0011@users.noreply.github.com>
Co-authored-by: Xuanlei Zhao <43881818+oahzxl@users.noreply.github.com>
Co-authored-by: Hongxin Liu <lhx0217@gmail.com>
Co-authored-by: Wenhao Chen <cwher@outlook.com>
Co-authored-by: littsk <1214689160@qq.com>
Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>
Co-authored-by: Tong Li <tong.li352711588@gmail.com>
Co-authored-by: Yuanchen <70520919+chengeharrison@users.noreply.github.com>
Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
Co-authored-by: Desperado-Jia <502205863@qq.com>
Co-authored-by: Chandler-Bing <brp12138@163.com>
Co-authored-by: Yan haixu <40758050+hova88@users.noreply.github.com>
Xu-Kai added a commit to Xu-Kai/ColossalAI that referenced this pull request Oct 3, 2023
…ch#4843)

Xu-Kai added a commit to Xu-Kai/ColossalAI that referenced this pull request Oct 13, 2023
…ch#4843)

Xu-Kai added a commit to Xu-Kai/ColossalAI that referenced this pull request Oct 13, 2023
…ch#4843)

Xu-Kai added a commit to Xu-Kai/ColossalAI that referenced this pull request Oct 13, 2023
…ch#4843)

* [shardformer] fix GPT2DoubleHeadsModel (hpcaitech#4703)

* [hotfix] Fix import error: colossal.kernel without triton installed (hpcaitech#4722)

* [hotfix] remove triton kernels from kernel init

* revise bloom/llama kernel imports for infer

* [shardformer] to fix whisper test failed due to significant accuracy differences. (hpcaitech#4710)

* [shardformer] fix whisper test failed

* [shardformer] fix whisper test failed

* [shardformer] fix whisper test failed

* [shardformer] fix whisper test failed

* [doc] fix llama2 code link (hpcaitech#4726)

* [doc] fix llama2 code link

* [doc] fix llama2 code link

* [doc] fix llama2 code link

* [doc] Add user document for Shardformer (hpcaitech#4702)

* create shardformer doc files

* add docstring for seq-parallel

* update ShardConfig docstring

* add links to llama example

* add outdated massage

* finish introduction & supporting information

* finish 'how shardformer works'

* finish shardformer.md English doc

* fix doctest fail

* add Chinese document

* [format] applied code formatting on changed files in pull request 4726 (hpcaitech#4727)

Co-authored-by: github-actions <github-actions@github.com>

* [doc] add shardformer support matrix/update tensor parallel documents (hpcaitech#4728)

* add compatibility matrix for shardformer doc

* update tp doc

* Optimized some syntax errors in the documentation and code under applications/ (hpcaitech#4127)

Co-authored-by: flybird11111 <1829166702@qq.com>

* [shardformer] update pipeline parallel document (hpcaitech#4725)

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [shardformer] update pipeline parallel document

* [legacy] remove deterministic data loader test

* [shardformer] update seq parallel document (hpcaitech#4730)

* update doc of seq parallel

* fix typo

* [example] add gpt2 HybridParallelPlugin example (hpcaitech#4653)

* add gpt2 HybridParallelPlugin example

* update readme and testci

* update test ci

* fix test_ci bug

* update requirements

* add requirements

* update requirements

* add requirement

* rename file

* [doc] polish shardformer doc (hpcaitech#4735)

* arrange position of chapters

* fix typos in seq parallel doc

* [shardformer] add custom policy in hybrid parallel plugin (hpcaitech#4718)

* add custom policy

* update assert

* [example] llama2 add fine-tune example (hpcaitech#4673)

* [shardformer] update shardformer readme

[shardformer] update shardformer readme

[shardformer] update shardformer readme

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] update llama2/opt finetune example and shardformer update to llama2

* [shardformer] change dataset

* [shardformer] change dataset

* [shardformer] fix CI

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

* [shardformer] fix

[example] update opt example

[example] resolve comments

fix

fix

* [example] llama2 add finetune example

* [example] llama2 add finetune example

* [example] llama2 add finetune example

* [example] llama2 add finetune example

* fix

* update llama2 example

* update llama2 example

* fix

* update llama2 example

* update llama2 example

* update llama2 example

* update llama2 example

* update llama2 example

* update llama2 example

* Update requirements.txt

* update llama2 example

* update llama2 example

* update llama2 example

* [doc] explanation of loading large pretrained models (hpcaitech#4741)

* [kernel] update triton init hpcaitech#4740 (hpcaitech#4740)

* [legacy] clean up legacy code (hpcaitech#4743)

* [legacy] remove outdated codes of pipeline (hpcaitech#4692)

* [legacy] remove cli of benchmark and update optim (hpcaitech#4690)

* [legacy] remove cli of benchmark and update optim

* [doc] fix cli doc test

* [legacy] fix engine clip grad norm

* [legacy] remove outdated colo tensor (hpcaitech#4694)

* [legacy] remove outdated colo tensor

* [test] fix test import

* [legacy] move outdated zero to legacy (hpcaitech#4696)

* [legacy] clean up utils (hpcaitech#4700)

* [legacy] clean up utils

* [example] update examples

* [legacy] clean up amp

* [legacy] fix amp module

* [legacy] clean up gpc (hpcaitech#4742)

* [legacy] clean up context

* [legacy] clean core, constants and global vars

* [legacy] refactor initialize

* [example] fix examples ci

* [example] fix examples ci

* [legacy] fix tests

* [example] fix gpt example

* [example] fix examples ci

* [devops] fix ci installation

* [example] fix examples ci

* [format] applied code formatting on changed files in pull request 4743 (hpcaitech#4750)

Co-authored-by: github-actions <github-actions@github.com>

* [misc] update pre-commit and run all files (hpcaitech#4752)

* [misc] update pre-commit

* [misc] run pre-commit

* [misc] remove useless configuration files

* [misc] ignore cuda for clang-format

* [doc] explain suitable use case for each plugin

* [doc] put individual plugin explanation in front

* [doc] add model examples for each plugin

* [doc] put native colossalai plugins first in description section

* [chat]: update rm, add wandb and fix bugs (hpcaitech#4471)

* feat: modify forward fn of critic and reward model

* feat: modify calc_action_log_probs

* to: add wandb in sft and rm trainer

* feat: update train_sft

* feat: update train_rm

* style: modify type annotation and add warning

* feat: pass tokenizer to ppo trainer

* to: modify trainer base and maker base

* feat: add wandb in ppo trainer

* feat: pass tokenizer to generate

* test: update generate fn tests

* test: update train tests

* fix: remove action_mask

* feat: remove unused code

* fix: fix wrong ignore_index

* fix: fix mock tokenizer

* chore: update requirements

* revert: modify make_experience

* fix: fix inference

* fix: add padding side

* style: modify _on_learn_batch_end

* test: use mock tokenizer

* fix: use bf16 to avoid overflow (see the numeric-range sketch at the end of this list)

* fix: fix workflow

* [chat] fix gemini strategy

* [chat] fix

* sync: update colossalai strategy

* fix: fix args and model dtype

* fix: fix checkpoint test

* fix: fix requirements

* fix: fix missing import and wrong arg

* fix: temporarily skip gemini test in stage 3

* style: apply pre-commit

* fix: temporarily skip gemini test in stage 1&2
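
The bf16 switch noted above is about numeric range rather than precision: fp16 saturates to inf past roughly 65504, while bf16 keeps the full fp32 exponent range. A minimal, framework-agnostic illustration in plain PyTorch (independent of the chat trainer code):

```python
import torch

# float16 overflows: its largest finite value is ~65504.
x_fp16 = torch.tensor(70000.0, dtype=torch.float16)
print(x_fp16)  # tensor(inf, dtype=torch.float16)

# bfloat16 keeps the float32 exponent range, so the same value stays finite
# (at reduced mantissa precision), which is why it avoids the overflow here.
x_bf16 = torch.tensor(70000.0, dtype=torch.bfloat16)
print(x_bf16)  # tensor(70144., dtype=torch.bfloat16)
```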

---------

Co-authored-by: Mingyan Jiang <1829166702@qq.com>

* [shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic (hpcaitech#4758)

* fix master param sync for hybrid plugin

* rewrite unwrap for ddp/fsdp

* rewrite unwrap for zero/gemini

* rewrite unwrap for hybrid plugin

* fix gemini unwrap

* fix bugs

* [bug] fix get_default_parser in examples (hpcaitech#4764)

* [doc] clean up outdated docs (hpcaitech#4765)

* [doc] clean up outdated docs

* [doc] fix linking

* [doc] fix linking

* [doc] add shardformer doc to sidebar (hpcaitech#4768)

* [chat]: add lora merge weights config (hpcaitech#4766)

* feat: modify lora merge weights fn

* feat: add lora merge weights config
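
For context, "merging" LoRA weights means folding the low-rank adapter updates back into the base weights so the model can be served without the adapter wrapper. A minimal sketch using the generic PEFT API (the Coati-specific config added in hpcaitech#4766 may expose this differently; the paths below are placeholders):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("base_model_path")   # placeholder base checkpoint
model = PeftModel.from_pretrained(base, "adapter_path")          # placeholder LoRA adapter dir
merged = model.merge_and_unload()  # fold the low-rank deltas into the base Linear weights
merged.save_pretrained("merged_model_path")
```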

* [lazy] support torch 2.0 (hpcaitech#4763)

* [lazy] support _like methods and clamp

* [lazy] pass transformers models

* [lazy] fix device move and requires grad

* [lazy] fix requires grad and refactor api

* [lazy] fix requires grad

* [bug] Fix the version check bug in colossalai run when generating the cmd. (hpcaitech#4713)

* Fix the version check bug in colossalai run when generating the cmd.

* polish code

* [feature] add gptq for inference (hpcaitech#4754)

* [gptq] add gptq kernel (hpcaitech#4416)

* add gptq

* refactor code

* fix tests

* replace auto-gptq

* rename inference/quant

* refactor test

* add auto-gptq as an option

* reset requirements

* change assert and check auto-gptq

* add import warnings

* change test flash attn version

* remove example

* change requirements of flash_attn

* modify tests

* [skip ci] change requirements-test

* [gptq] faster gptq cuda kernel (hpcaitech#4494)

* [skip ci] add cuda kernels

* add license

* [skip ci] fix max_input_len

* format files & change test size

* [skip ci]

* [gptq] add gptq tensor parallel (hpcaitech#4538)

* add gptq tensor parallel

* add gptq tp

* delete print

* add test gptq check

* add test auto gptq check

* [gptq] combine gptq and kv cache manager (hpcaitech#4706)

* combine gptq and kv cache manager

* add init bits

* delete useless code

* add model path

* delete useless print and update test

* delete useless import

* move option gptq to shard config

* change replace linear to shardformer

* update bloom policy

* delete useless code

* fix import bug and delete useless code

* change colossalai/gptq to colossalai/quant/gptq

* update import linear for tests

* delete useless code and mv gptq_kernel to kernel directory

* fix triton kernel

* add triton import

* [inference] chatglm2 infer demo (hpcaitech#4724)

* add chatglm2

* add

* gather needed kernels

* fix some bugs

* finish context forward

* finish context stage

* fix

* add

* pause

* add

* fix bugs

* finish chatglm

* fix bug

* change some logic

* fix bugs

* change some logics

* add

* add

* add

* fix

* fix tests

* fix

* [release] update version (hpcaitech#4775)

* [release] update version

* [doc] revert versions

* initial commit: add colossal llama 2 (hpcaitech#4784)

* [feature] ColossalEval: Evaluation Pipeline for LLMs (hpcaitech#4786)

* Add ColossalEval

* Delete evaluate in Chat

---------

Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
Co-authored-by: Tong Li <tong.li352711588@gmail.com>

* [doc] add llama2 domain-specific solution news (hpcaitech#4789)

* [doc] add llama2 domain-specific solution news

* [fix] fix weekly running example (hpcaitech#4787)

* [fix] fix weekly running example

* [fix] fix weekly running example

* [doc] polish shardformer doc (hpcaitech#4779)

* fix example format in docstring

* polish shardformer doc

* [checkpointio] support unsharded checkpointIO for hybrid parallel (hpcaitech#4774)

* support unsharded saving/loading for model

* support optimizer unsharded saving

* update doc

* support unsharded loading for optimizer

* small fix

* update readme
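
A rough sketch of how the unsharded path is typically exercised from user code; it assumes a `torchrun` launch and the Booster `save_model`/`save_optimizer` calls with a `shard` flag, so treat the exact arguments as illustrative rather than a verbatim recipe from this PR:

```python
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

colossalai.launch_from_torch(config={})  # assumes the script is started with torchrun

model = torch.nn.Linear(8, 8)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

booster = Booster(plugin=HybridParallelPlugin(tp_size=1, pp_size=1))
model, optimizer, *_ = booster.boost(model, optimizer)

# shard=False takes the unsharded path: one consolidated checkpoint file
# instead of one shard per rank.
booster.save_model(model, "model.pt", shard=False)
booster.save_optimizer(optimizer, "optimizer.pt", shard=False)
```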

* [lazy] support from_pretrained (hpcaitech#4801)

* [lazy] patch from pretrained

* [lazy] fix from pretrained and add tests
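
As a sketch of the pattern this enables (assuming the `LazyInitContext` manager described in the lazy-init docs; the checkpoint path is a placeholder), `from_pretrained` can now be called inside the lazy context so the full model is not materialized up front:

```python
from colossalai.lazy import LazyInitContext
from transformers import AutoModelForCausalLM

# Inside the context, parameters are created lazily, so loading a pretrained
# checkpoint does not allocate the whole model on a single device.
with LazyInitContext():
    model = AutoModelForCausalLM.from_pretrained("path/to/pretrained")  # placeholder path

# Materialization happens afterwards, e.g. when the model is boosted by a plugin
# or via LazyInitContext.materialize(model).
```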

* [devops] update ci

* update

* [hotfix] change llama2 Colossal-LLaMA-2 script filename (hpcaitech#4800)

change filename:
pretraining.py -> trainin.py
there is no file named pretraing.py; the name was written incorrectly

* [misc] add last_epoch in CosineAnnealingWarmupLR (hpcaitech#4778)
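
`last_epoch` follows the usual PyTorch scheduler convention: it lets a freshly constructed scheduler resume mid-curve instead of restarting warmup/decay from step 0. A minimal illustration with a stock PyTorch scheduler (the ColossalAI scheduler is assumed to adopt the same convention):

```python
import torch

model = torch.nn.Linear(4, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Fresh run: last_epoch defaults to -1, so the schedule starts at step 0.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)
for _ in range(42):
    optimizer.step()
    scheduler.step()

# Resuming from a checkpoint: rebuild the scheduler at the step where training
# stopped so the learning-rate curve continues instead of restarting.
resumed = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100, last_epoch=42)
print(resumed.get_last_lr())
```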

* [doc] add lazy init docs (hpcaitech#4808)

* [hotfix] fix norm type error in zero optimizer (hpcaitech#4795)

* [hotfix] Correct several erroneous code comments (hpcaitech#4794)

* [format] applied code formatting on changed files in pull request 4595 (hpcaitech#4602)

Co-authored-by: github-actions <github-actions@github.com>

* fix format (hpcaitech#4815)

* [chat] fix gemini strategy (hpcaitech#4698)

* [chat] fix gemini strategy

* [chat] fix gemini strategy

* [chat] fix gemini strategy

* [chat] fix gemini strategy

* # This is a combination of 2 commits.

[chat] fix gemini strategy

fix

* [chat] fix gemini strategy

update llama2 example

[chat] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* [fix] fix gemini strategy

* fix

* fix

* fix

* fix

* fix

* Update train_prompts.py

* Update Qwen-7B results (hpcaitech#4821)

Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>

* [doc] update slack link (hpcaitech#4823)

* add autotune (hpcaitech#4822)

* update Colossal (hpcaitech#4832)

* add int8 rotary embedding kernel

* remove useless code

---------

Co-authored-by: flybird11111 <1829166702@qq.com>
Co-authored-by: Yuanheng Zhao <54058983+yuanheng-zhao@users.noreply.github.com>
Co-authored-by: binmakeswell <binmakeswell@gmail.com>
Co-authored-by: Baizhou Zhang <eddiezhang@pku.edu.cn>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: digger yu <digger-yu@outlook.com>
Co-authored-by: Pengtai Xu <henryxu880@gmail.com>
Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: ppt0011 <143150326+ppt0011@users.noreply.github.com>
Co-authored-by: Xuanlei Zhao <43881818+oahzxl@users.noreply.github.com>
Co-authored-by: Hongxin Liu <lhx0217@gmail.com>
Co-authored-by: Wenhao Chen <cwher@outlook.com>
Co-authored-by: littsk <1214689160@qq.com>
Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>
Co-authored-by: Tong Li <tong.li352711588@gmail.com>
Co-authored-by: Yuanchen <70520919+chengeharrison@users.noreply.github.com>
Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
Co-authored-by: Desperado-Jia <502205863@qq.com>
Co-authored-by: Chandler-Bing <brp12138@163.com>
Co-authored-by: Yan haixu <40758050+hova88@users.noreply.github.com>