update docs #6073
Conversation
Summary of Changes
Hello @Jintao-Huang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request delivers a comprehensive update to the project's documentation, primarily focusing on enhancing the clarity and detail of command-line parameter descriptions. The changes aim to provide users with more precise information on parameter usage, default behaviors, and their impact on training and inference, particularly for advanced features like multimodal models, quantization, and various tuning methods. Additionally, it refines examples for custom datasets and improves overall readability, ensuring that both new and experienced users can quickly understand and effectively utilize the system's capabilities.
Code Review
This pull request updates several documentation files, primarily focusing on `命令行参数.md`, to enhance clarity and provide more detailed explanations for various parameters. The changes include adding more examples, clarifying parameter behaviors, and improving the overall structure. Additionally, helpful comments regarding memory requirements have been added to example shell scripts. Overall, these are great improvements to the documentation. I've found one minor typo in the Chinese documentation and suggested a fix.
docs/source/Instruction/命令行参数.md
Outdated
There seems to be a typo here: `若位准备` should probably be `若未准备`.
- enable_channel_loss: 打开channel loss,默认为`False`。你需要在数据集中准备"channel"字段,ms-swift会根据该字段分组统计loss(若位准备"channel"字段,则归为默认`None` channel)。数据集格式参考[channel loss](../Customization/自定义数据集.md#channel-loss)。channel loss兼容packing/padding_free/loss_scale等技术。
- enable_channel_loss: 打开channel loss,默认为`False`。你需要在数据集中准备"channel"字段,ms-swift会根据该字段分组统计loss(若未准备"channel"字段,则归为默认`None` channel)。数据集格式参考[channel loss](../Customization/自定义数据集.md#channel-loss)。channel loss兼容packing/padding_free/loss_scale等技术。
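For context, a minimal sketch of what dataset rows carrying the "channel" field might look like, based on the parameter description above (the message contents and channel names are illustrative, not quoted from the linked doc):

```bash
# Illustrative JSONL rows: each sample carries a "channel" tag that ms-swift
# uses to group loss statistics when channel loss is enabled; rows without
# the field fall into the default `None` channel.
cat > channel_demo.jsonl <<'EOF'
{"messages": [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hello"}], "channel": "chat"}
{"messages": [{"role": "user", "content": "1+1=?"}, {"role": "assistant", "content": "2"}], "channel": "math"}
EOF
```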
/gemini review
Code Review
This pull request provides a comprehensive update to the documentation, particularly for the command-line parameters in both Chinese and English. The changes add significant detail, examples, and clarifications, which will greatly improve usability for developers. I've identified a few minor typos and formatting issues in the English documentation that should be addressed to further enhance its quality.
- Note: The concept of `model_type` in MS-Swift differs from the `model_type` in `config.json`.
- Custom models typically require manually registering a `model_type` and `template`. See the [Custom Model Documentation](../Customization/Custom-model.md) for details.
- model_revision: Model version. Default is `None`.
- task_type: Default is `'causal_lm'`. Options include `'causal_lm'`, `'seq_cls'`, `'embedding'`, `'reranker'`, and`'generative_reranker'`. Examples for seq_cls can be found [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/seq_cls), and examples for embedding can be found [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/embedding).
There appears to be a missing space between `and` and `'generative_reranker'`. It should be `and 'generative_reranker'` for better readability.
- task_type: Default is `'causal_lm'`. Options include `'causal_lm'`, `'seq_cls'`, `'embedding'`, `'reranker'`, and`'generative_reranker'`. Examples for seq_cls can be found [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/seq_cls), and examples for embedding can be found [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/embedding).
- task_type: Default is `'causal_lm'`. Options include `'causal_lm'`, `'seq_cls'`, `'embedding'`, `'reranker'`, and `'generative_reranker'`. Examples for seq_cls can be found [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/seq_cls), and examples for embedding can be found [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/embedding).
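To make the options concrete, a hedged sketch of a sequence-classification run (the model and dataset names are placeholders, and `--num_labels` is assumed from the linked seq_cls examples rather than quoted from this doc):

```bash
# Sketch: fine-tune with a sequence-classification head instead of the
# default causal LM objective. See the linked seq_cls examples for tested
# configurations.
swift sft \
    --model Qwen/Qwen2.5-7B-Instruct \
    --task_type seq_cls \
    --num_labels 2 \
    --dataset <your-seq-cls-dataset>
```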
- Note: This parameter applies to all multimodal models. The Qwen2.5-VL specific parameter `MAX_PIXELS` (see bottom of doc) only affects Qwen2.5-VL.
- 🔥agent_template: Agent template that defines how the tool list `'tools'` is converted into the `'system'` message, how tool calls are extracted from model responses during inference/deployment, and the formatting of `{"role": "tool_call", "content": "xxx"}` and `{"role": "tool_response", "content": "xxx"}` in `messages`. Options include `'react_en'`, `'hermes'`, `'glm4'`, `'qwen_en'`, `'toolbench'`, etc. See [here](https://github.com/modelscope/ms-swift/blob/main/swift/plugin/agent_template/__init__.py) for more. Default is `None`, automatically selected based on model type. Refer to [Agent Documentation](./Agent-support.md).
- norm_bbox: Controls how bounding boxes ("bbox" in dataset, containing absolute coordinates; see [Custom Dataset Documentation](../Customization/Custom-dataset.md#grounding)) are normalized. Options: `'norm1000'` (scale coordinates to thousandths), `'none'` (no scaling). Default is `None`, automatically chosen based on model.
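A brief hedged sketch of how these flags might combine in a grounding run (the model and dataset names are placeholders; `MAX_PIXELS` is the Qwen2.5-VL-specific env var noted above, and its value here is purely illustrative):

```bash
# Sketch: grounding-style training where absolute bbox coordinates in the
# dataset are rescaled to thousandths via 'norm1000'; MAX_PIXELS caps image
# resolution and only affects Qwen2.5-VL.
MAX_PIXELS=1003520 \
swift sft \
    --model Qwen/Qwen2.5-VL-7B-Instruct \
    --dataset <your-grounding-dataset> \
    --norm_bbox norm1000
```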
- deepspeed_autotp_size: DeepSpeed tensor parallelism size, default is 1. When using DeepSpeed AutoTP, the argument `--deepspeed` must be set to 'zero0', 'zero1', or 'zero2'. (Note: This feature only supports full-parameter training.)
- 🔥output_dir: Default is `None`, automatically set to `'output/<model_name>'`.
- 🔥gradient_checkpointing: Whether to use gradient checkpointing. Default is `True`. This significantly reduces GPU memory usage but slows down training.
- 🔥vit_gradient_checkpointing: - 🔥vit_gradient_checkpointing: For multimodal model training, whether to enable gradient checkpointing for the ViT (Vision Transformer) component. Default is `None`, meaning it follows the value of `gradient_checkpointing`. For an example, please refer to [here](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/vit_gradient_checkpointing.sh).
The prefix `- 🔥vit_gradient_checkpointing:` is duplicated in this line. Please remove the extra prefix to correct the formatting.
- 🔥vit_gradient_checkpointing: - 🔥vit_gradient_checkpointing: For multimodal model training, whether to enable gradient checkpointing for the ViT (Vision Transformer) component. Default is `None`, meaning it follows the value of `gradient_checkpointing`. For an example, please refer to [here](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/vit_gradient_checkpointing.sh).
- 🔥vit_gradient_checkpointing: For multimodal model training, whether to enable gradient checkpointing for the ViT (Vision Transformer) component. Default is `None`, meaning it follows the value of `gradient_checkpointing`. For an example, please refer to [here](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/vit_gradient_checkpointing.sh).
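As a usage illustration, a minimal sketch that checkpoints the LLM but not the ViT tower (the model name is a placeholder; flag values follow the description above, and the linked shell script is the authoritative example):

```bash
# Sketch: enable gradient checkpointing for the language model while
# disabling it for the ViT component, e.g. when the vision encoder is small
# or frozen and the recompute overhead is not worth the memory savings.
swift sft \
    --model <multimodal-model> \
    --gradient_checkpointing true \
    --vit_gradient_checkpointing false \
    --dataset <your-dataset>
```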
#### BOFT
BOFT uses the three parameters `target_modules`, `target_regex`, and `modules_to_save`.
BOFT uses the three parameters `target_modules`, `target_regex`, and `modules_to_save`, , whose meanings are described in the documentation above. Additional parameters include:
There is an extra comma after `...save,`. Please remove it to fix the typo.
BOFT uses the three parameters `target_modules`, `target_regex`, and `modules_to_save`, , whose meanings are described in the documentation above. Additional parameters include:
BOFT uses the three parameters `target_modules`, `target_regex`, and `modules_to_save`, whose meanings are described in the documentation above. Additional parameters include:
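For orientation, a hedged sketch of how these shared parameters might be used in a BOFT run (assuming BOFT is selected via `--train_type boft` as with ms-swift's other tuners; the model name, dataset, and `all-linear` target are placeholders):

```bash
# Sketch: BOFT tuning, reusing the shared target_modules parameter described
# above to pick which linear layers receive BOFT transforms.
swift sft \
    --model <model> \
    --train_type boft \
    --target_modules all-linear \
    --dataset <your-dataset>
```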
- vllm_disable_cascade_attn: Whether to forcibly disable the V1 engine’s cascade-attention implementation to avoid potential numerical issues. Defaults to False; vLLM’s internal heuristics determine whether cascade attention is actually used.
- 🔥vllm_limit_mm_per_prompt: Controls the use of multiple media in vllm, default is `None`. For example, you can pass in `--vllm_limit_mm_per_prompt '{"image": 5, "video": 2}'`.
- vllm_max_lora_rank: Default is `16`. This is the parameter supported by vllm for lora.
- vllm_quantization: vllm is able to quantize model with this argument,supported values can be found [here](https://docs.vllm.ai/en/latest/serving/engine_args.html).
There is a full-width comma `,` in this line, which is a typo in an English document. Please replace it with a standard comma `,`.
- vllm_quantization: vllm is able to quantize model with this argument,supported values can be found [here](https://docs.vllm.ai/en/latest/serving/engine_args.html).
- vllm_quantization: vllm is able to quantize model with this argument, supported values can be found [here](https://docs.vllm.ai/en/latest/serving/engine_args.html).
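A hedged sketch tying these flags together (the model name is a placeholder, and `swift deploy` with `--infer_backend vllm` is assumed from ms-swift's standard inference workflow rather than quoted from this doc):

```bash
# Sketch: serve a multimodal model with the vLLM backend, capping how many
# images/videos a single prompt may carry, per the example value above.
swift deploy \
    --model <multimodal-model> \
    --infer_backend vllm \
    --vllm_limit_mm_per_prompt '{"image": 5, "video": 2}' \
    --vllm_max_lora_rank 16
```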
/gemini review
Code Review
This PR delivers a large-scale update and refinement of the documentation, especially the command-line parameter reference, making it clearer, more detailed, and easier to understand. It also fixes several typos and formatting issues and adds more examples and notes. Overall, these changes greatly improve the quality and usability of the documentation.
I found a few minor formatting issues and typos and have left suggested fixes in the individual review comments. Please review and apply them as appropriate.
docs/source/Instruction/命令行参数.md
Outdated
- 🔥use_liger_kernel: Whether to enable the [Liger](https://github.com/linkedin/Liger-Kernel) kernel to accelerate training and reduce GPU memory consumption. Defaults to False. Example shell script can be found [here](https://github.com/modelscope/ms-swift/blob/main/examples/train/liger).
- Note: liger_kernel does not support device_map. Please use DDP/DeepSpeed for multi-GPU training.
- average_tokens_across_devices: Whether to average the number of tokens across devices. If set to True, `num_tokens_in_batch` will be synchronized using all_reduce for accurate loss calculation. Default is False.
-- Note: Liger kernel does not support `device_map`. Use DDP or DeepSpeed for multi-GPU training.
The note for `use_liger_kernel` seems to have incorrect formatting. It starts with `--`, which is not standard for a sub-list item in markdown. It should probably be `-` for correct rendering.
-- Note: Liger kernel does not support `device_map`. Use DDP or DeepSpeed for multi-GPU training.
- Note: Liger kernel does not support `device_map`. Use DDP or DeepSpeed for multi-GPU training.
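To illustrate the multi-GPU note, a minimal sketch (assuming ms-swift's usual `NPROC_PER_NODE` convention for launching DDP; the model and dataset names are placeholders):

```bash
# Sketch: 4-GPU DDP run with the Liger kernel. device_map is avoided, per
# the note above, and token counts are averaged across devices for a more
# accurate loss.
NPROC_PER_NODE=4 \
swift sft \
    --model <model> \
    --dataset <your-dataset> \
    --use_liger_kernel true \
    --average_tokens_across_devices true
```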
/gemini review
Code Review
This pull request provides a comprehensive update to the documentation for command-line parameters in both Chinese and English. The changes significantly improve clarity, add more details, examples, and links, and fix several typos and inconsistencies. The overall quality of the documentation is much improved. I have one point of feedback regarding the removal of a note about the default behavior of `use_logits_to_keep` for multimodal models, which might be important for users.
docs/source/Instruction/命令行参数.md
Outdated
The note about the default value of `use_logits_to_keep` for multimodal models being `False` for stability has been removed. This seems like important information for users training multimodal models. Was this removal intentional? If so, it might be good to clarify the new behavior. If not, consider re-adding the note.
- cached_dataset: Use a cached dataset (generated with `swift export --to_cached_dataset true ...`) during training to avoid GPU time spent on tokenizing large datasets. Default: `[]`.
- cached_dataset: Use cached datasets during training (generated via the command `swift export --to_cached_dataset true ...`) to avoid GPU memory being occupied by tokenization when training with large datasets. Default is `[]`. Example: [here](https://github.com/modelscope/ms-swift/tree/main/examples/export/cached_dataset).
- Note: cached_dataset supports `--packing` but does not support `--lazy_tokenize` or `--streaming`.
- use_logits_to_keep: Pass `logits_to_keep` in the `forward` method based on labels to reduce the computation and storage of unnecessary logits, thereby reducing memory usage and accelerating training. The default is `None`, which enables automatic selection.
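A hedged end-to-end sketch of the cache-then-train flow described above (model, dataset, and paths are placeholders; `--output_dir` on the export step is an assumption, so consult the linked cached_dataset example for the tested form):

```bash
# Step 1 (sketch): pre-tokenize a large dataset once, offline.
swift export \
    --model <model> \
    --dataset <large-dataset> \
    --to_cached_dataset true \
    --output_dir ./cached_ds

# Step 2 (sketch): train from the cache; --packing is compatible, while
# --lazy_tokenize and --streaming are not, per the note above.
swift sft \
    --model <model> \
    --cached_dataset ./cached_ds \
    --packing true
```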