Conversation

Jintao-Huang
Collaborator

No description provided.


Summary of Changes

Hello @Jintao-Huang, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers a comprehensive update to the project's documentation, primarily focusing on enhancing the clarity and detail of command-line parameter descriptions. The changes aim to provide users with more precise information on parameter usage, default behaviors, and their impact on training and inference, particularly for advanced features like multimodal models, quantization, and various tuning methods. Additionally, it refines examples for custom datasets and improves overall readability, ensuring that both new and experienced users can quickly understand and effectively utilize the system's capabilities.

Highlights

  • Documentation Clarity and Detail: Numerous command-line parameters across basic, model, data, template, training, tuner, RLHF, VLLM, SGLang, merge, specific model parameters, and environment variables have received enhanced explanations, usage tips, and notes for clearer understanding.
  • Multi-label Dataset Examples: The custom dataset documentation now includes explicit JSONL examples for multi-label tasks, demonstrating how to represent cases with no labels or specific labels.
  • Parameter Behavior Refinements: Descriptions for key parameters such as adapters, model_kwargs, max_model_len, device_map, max_memory, loss_scale, beta, packing, and lazy_tokenize have been updated to provide more precise behavioral details and recommended usage.
  • Multimodal Training Specifics: Improved documentation for multimodal training parameters, including vit_gradient_checkpointing, freeze_vit, freeze_aligner, max_pixels, and specific Qwen-VL related parameters, now includes notes on OOM prevention and ensuring model alignment with official inference code.
  • Training Efficiency and Resume Guidance: Added remarks on the trade-offs of gradient_checkpointing and the benefits of packing and padding_free for memory and speed. Clearer instructions and notes for resume_from_checkpoint and resume_only_model have also been provided, especially regarding data skipping.
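To make the multi-label highlight above concrete, here is a minimal JSONL sketch of what such rows could look like. The field names (`messages`, `label`) and the empty-list convention for "no labels" are my assumptions for illustration, not taken from the PR:

```python
import json

# Hypothetical multi-label JSONL rows: a "label" list that may be empty
# (no labels) or contain specific label indices. Field names are assumed.
rows = [
    {"messages": [{"role": "user", "content": "some text"}], "label": []},       # no labels
    {"messages": [{"role": "user", "content": "other text"}], "label": [0, 2]},  # two labels
]

jsonl = "\n".join(json.dumps(r, ensure_ascii=False) for r in rows)
print(jsonl)

# Round-trip check: each line parses back to the original dict
parsed = [json.loads(line) for line in jsonl.splitlines()]
assert parsed == rows
```

Each line of a JSONL file is one independent JSON object, which is why a per-line round trip like the one above is a quick sanity check for custom datasets.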
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize your Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check its output and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request updates several documentation files, primarily focusing on 命令行参数.md to enhance clarity and provide more detailed explanations for various parameters. The changes include adding more examples, clarifying parameter behaviors, and improving the overall structure. Additionally, helpful comments regarding memory requirements have been added to example shell scripts. Overall, these are great improvements to the documentation. I've found one minor typo in the Chinese documentation and suggested a fix.



medium

There seems to be a typo here. 若位准备 should probably be 若未准备.

Suggested change
- enable_channel_loss: 打开channel loss,默认为`False`。你需要在数据集中准备"channel"字段,ms-swift会根据该字段分组统计loss(若位准备"channel"字段,则归为默认`None` channel)。数据集格式参考[channel loss](../Customization/自定义数据集.md#channel-loss)。channel loss兼容packing/padding_free/loss_scale等技术。
- enable_channel_loss: 打开channel loss,默认为`False`。你需要在数据集中准备"channel"字段,ms-swift会根据该字段分组统计loss(若未准备"channel"字段,则归为默认`None` channel)。数据集格式参考[channel loss](../Customization/自定义数据集.md#channel-loss)。channel loss兼容packing/padding_free/loss_scale等技术。
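The corrected sentence describes grouping loss statistics by an optional "channel" field, with samples lacking the field falling back to a default `None` channel. A simplified Python sketch of that grouping behavior (my own illustration, not ms-swift's actual implementation):

```python
from collections import defaultdict

def channel_loss_stats(samples):
    """Group per-sample losses by the optional 'channel' field.

    Samples without a 'channel' key fall into the default None channel,
    mirroring the documented fallback behavior. Simplified sketch only.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for s in samples:
        ch = s.get("channel")  # None when the field is absent
        sums[ch] += s["loss"]
        counts[ch] += 1
    return {ch: sums[ch] / counts[ch] for ch in sums}

stats = channel_loss_stats([
    {"channel": "math", "loss": 0.5},
    {"channel": "math", "loss": 1.5},
    {"loss": 2.0},  # no "channel" field -> grouped under None
])
print(stats)  # {'math': 1.0, None: 2.0}
```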

@Jintao-Huang
Collaborator Author

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request provides a comprehensive update to the documentation, particularly for the command-line parameters in both Chinese and English. The changes add significant detail, examples, and clarifications, which will greatly improve usability for developers. I've identified a few minor typos and formatting issues in the English documentation that should be addressed to further enhance its quality.

- Note: The concept of `model_type` in MS-Swift differs from the `model_type` in `config.json`.
- Custom models typically require manually registering a `model_type` and `template`. See the [Custom Model Documentation](../Customization/Custom-model.md) for details.
- model_revision: Model version. Default is `None`.
- task_type: Default is `'causal_lm'`. Options include `'causal_lm'`, `'seq_cls'`, `'embedding'`, `'reranker'`, and`'generative_reranker'`. Examples for seq_cls can be found [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/seq_cls), and examples for embedding can be found [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/embedding).


medium

There appears to be a missing space between `and` and `'generative_reranker'`. It should read `and 'generative_reranker'` for better readability.

Suggested change
- task_type: Default is `'causal_lm'`. Options include `'causal_lm'`, `'seq_cls'`, `'embedding'`, `'reranker'`, and`'generative_reranker'`. Examples for seq_cls can be found [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/seq_cls), and examples for embedding can be found [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/embedding).
- task_type: Default is `'causal_lm'`. Options include `'causal_lm'`, `'seq_cls'`, `'embedding'`, `'reranker'`, and `'generative_reranker'`. Examples for seq_cls can be found [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/seq_cls), and examples for embedding can be found [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/embedding).

- Note: This parameter applies to all multimodal models. The Qwen2.5-VL specific parameter `MAX_PIXELS` (see bottom of doc) only affects Qwen2.5-VL.
- 🔥agent_template: Agent template that defines how the tool list `'tools'` is converted into the `'system'` message, how tool calls are extracted from model responses during inference/deployment, and the formatting of `{"role": "tool_call", "content": "xxx"}` and `{"role": "tool_response", "content": "xxx"}` in `messages`. Options include `'react_en'`, `'hermes'`, `'glm4'`, `'qwen_en'`, `'toolbench'`, etc. See [here](https://github.com/modelscope/ms-swift/blob/main/swift/plugin/agent_template/__init__.py) for more. Default is `None`, automatically selected based on model type. Refer to [Agent Documentation](./Agent-support.md).
- norm_bbox: Controls how bounding boxes ("bbox" in dataset, containing absolute coordinates; see [Custom Dataset Documentation](../Customization/Custom-dataset.md#grounding)) are normalized. Options: `'norm1000'` (scale coordinates to thousandths), `'none'` (no scaling). Default is `None`, automatically chosen based on model.
Controls how to scale bounding boxes (bbox). Options are 'norm1000' and 'none'. 'norm1000' represents scaling bbox coordinates to one-thousandths, and 'none' means no scaling. Default is None, automatically selected based on the model.


medium

This line appears to be a duplicate of the line above it. Please remove it to avoid redundancy.
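For the `norm_bbox` entry quoted above, the `'norm1000'` option scales absolute bbox coordinates into thousandths of the image dimensions. A minimal sketch of that convention (my own illustration; ms-swift's actual rounding and edge handling may differ):

```python
def norm1000_bbox(bbox, width, height):
    """Scale an absolute [x1, y1, x2, y2] bbox to thousandths of image size.

    Illustrative sketch of the 'norm1000' convention described in the docs;
    the real implementation may differ in rounding details.
    """
    x1, y1, x2, y2 = bbox
    return [
        round(x1 / width * 1000),
        round(y1 / height * 1000),
        round(x2 / width * 1000),
        round(y2 / height * 1000),
    ]

print(norm1000_bbox([100, 50, 300, 150], width=1000, height=500))
# -> [100, 100, 300, 300]
```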

- deepspeed_autotp_size: DeepSpeed tensor parallelism size, default is 1. When using DeepSpeed AutoTP, the argument `--deepspeed` must be set to 'zero0', 'zero1', or 'zero2'. (Note: This feature only supports full-parameter training.)
- 🔥output_dir: Default is `None`, automatically set to `'output/<model_name>'`.
- 🔥gradient_checkpointing: Whether to use gradient checkpointing. Default is `True`. This significantly reduces GPU memory usage but slows down training.
- 🔥vit_gradient_checkpointing: - 🔥vit_gradient_checkpointing: For multimodal model training, whether to enable gradient checkpointing for the ViT (Vision Transformer) component. Default is `None`, meaning it follows the value of `gradient_checkpointing`. For an example, please refer to [here](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/vit_gradient_checkpointing.sh).


medium

The prefix - 🔥vit_gradient_checkpointing: is duplicated in this line. Please remove the extra prefix to correct the formatting.

Suggested change
- 🔥vit_gradient_checkpointing: - 🔥vit_gradient_checkpointing: For multimodal model training, whether to enable gradient checkpointing for the ViT (Vision Transformer) component. Default is `None`, meaning it follows the value of `gradient_checkpointing`. For an example, please refer to [here](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/vit_gradient_checkpointing.sh).
- 🔥vit_gradient_checkpointing: For multimodal model training, whether to enable gradient checkpointing for the ViT (Vision Transformer) component. Default is `None`, meaning it follows the value of `gradient_checkpointing`. For an example, please refer to [here](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/vit_gradient_checkpointing.sh).

#### BOFT

BOFT uses the three parameters `target_modules`, `target_regex`, and `modules_to_save`.
BOFT uses the three parameters `target_modules`, `target_regex`, and `modules_to_save`, , whose meanings are described in the documentation above. Additional parameters include:


medium

There is an extra comma after ...save,. Please remove it to fix the typo.

Suggested change
BOFT uses the three parameters `target_modules`, `target_regex`, and `modules_to_save`, , whose meanings are described in the documentation above. Additional parameters include:
BOFT uses the three parameters `target_modules`, `target_regex`, and `modules_to_save`, whose meanings are described in the documentation above. Additional parameters include:

- vllm_disable_cascade_attn: Whether to forcibly disable the V1 engine’s cascade-attention implementation to avoid potential numerical issues. Defaults to False; vLLM’s internal heuristics determine whether cascade attention is actually used.
- 🔥vllm_limit_mm_per_prompt: Controls the use of multiple media in vllm, default is `None`. For example, you can pass in `--vllm_limit_mm_per_prompt '{"image": 5, "video": 2}'`.
- vllm_max_lora_rank: Default is `16`. This is the parameter supported by vllm for lora.
- vllm_quantization: vllm is able to quantize model with this argument,supported values can be found [here](https://docs.vllm.ai/en/latest/serving/engine_args.html).


medium

There is a full-width comma in this line, which is a typo in an English document. Please replace it with a standard comma ,.

Suggested change
- vllm_quantization: vllm is able to quantize model with this argument,supported values can be found [here](https://docs.vllm.ai/en/latest/serving/engine_args.html).
- vllm_quantization: vllm is able to quantize model with this argument, supported values can be found [here](https://docs.vllm.ai/en/latest/serving/engine_args.html).
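The `--vllm_limit_mm_per_prompt` flag quoted earlier takes a JSON object as its value; a small sketch of constructing that argument programmatically so the shell quoting stays correct (illustrative only):

```python
import json
import shlex

limits = {"image": 5, "video": 2}  # example per-prompt media caps
arg_value = json.dumps(limits)
cli = ["--vllm_limit_mm_per_prompt", arg_value]

# shlex.join adds shell-safe quoting around the JSON value
print(shlex.join(cli))
```

Building the value with `json.dumps` and quoting with `shlex.join` avoids hand-escaping the braces and double quotes inside the shell command.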

@Jintao-Huang
Collaborator Author

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This PR is primarily a large-scale update and refinement of the documentation, especially the command-line parameter reference, making it clearer, more detailed, and easier to understand. It also fixes a number of typos and formatting issues and adds more examples and notes. Overall, these changes greatly improve the documentation's quality and usability.

I found a few minor formatting issues and typos and have proposed fixes in the specific review comments. Please review and adopt them.



medium

The word 注册 here appears to be a typo for 注意 ("note"); fixing it is recommended to improve the documentation's clarity.

Suggested change
- 注册:该参数适用于所有的多模态模型。而Qwen2.5-VL特有的模型参数`MAX_PIXELS`(你可以在文档最下面找到)只针对Qwen2.5-VL模型。
- 注意:该参数适用于所有的多模态模型。而Qwen2.5-VL特有的模型参数`MAX_PIXELS`(你可以在文档最下面找到)只针对Qwen2.5-VL模型。

- 🔥use_liger_kernel: Whether to enable the [Liger](https://github.com/linkedin/Liger-Kernel) kernel to accelerate training and reduce GPU memory consumption. Defaults to False. Example shell script can be found [here](https://github.com/modelscope/ms-swift/blob/main/examples/train/liger).
- Note: liger_kernel does not support device_map. Please use DDP/DeepSpeed for multi-GPU training.
- average_tokens_across_devices: Whether to average the number of tokens across devices. If set to True, `num_tokens_in_batch` will be synchronized using all_reduce for accurate loss calculation. Default is False.
-- Note: Liger kernel does not support `device_map`. Use DDP or DeepSpeed for multi-GPU training.


medium

The note for use_liger_kernel seems to have incorrect formatting. It starts with -- which is not standard for a sub-list item in markdown. It should probably be - for correct rendering.

Suggested change
-- Note: Liger kernel does not support `device_map`. Use DDP or DeepSpeed for multi-GPU training.
- Note: Liger kernel does not support `device_map`. Use DDP or DeepSpeed for multi-GPU training.

@Jintao-Huang
Collaborator Author

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request provides a comprehensive update to the documentation for command-line parameters in both Chinese and English. The changes significantly improve clarity, add more details, examples, and links, and fix several typos and inconsistencies. The overall quality of the documentation is much improved. I have one point of feedback regarding the removal of a note about the default behavior of use_logits_to_keep for multimodal models, which might be important for users.



medium

The note about the default value of use_logits_to_keep for multimodal models being False for stability has been removed. This seems like important information for users training multimodal models. Was this removal intentional? If so, it might be good to clarify the new behavior. If not, consider re-adding the note.

- cached_dataset: Use a cached dataset (generated with `swift export --to_cached_dataset true ...`) during training to avoid GPU time spent on tokenizing large datasets. Default: `[]`.
- cached_dataset: Use cached datasets during training (generated via the command `swift export --to_cached_dataset true ...`) to avoid GPU memory being occupied by tokenization when training with large datasets. Default is `[]`. Example: [here](https://github.com/modelscope/ms-swift/tree/main/examples/export/cached_dataset).
- Note: cached_dataset supports `--packing` but does not support `--lazy_tokenize` or `--streaming`.
- use_logits_to_keep: Pass `logits_to_keep` in the `forward` method based on labels to reduce the computation and storage of unnecessary logits, thereby reducing memory usage and accelerating training. The default is `None`, which enables automatic selection.
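The memory saving behind `use_logits_to_keep` comes from computing logits only for the labeled suffix of each sequence. A toy sketch of how that count could be derived from the labels (my own simplification; the real transformers/ms-swift logic differs in details):

```python
IGNORE_INDEX = -100  # standard label-masking value in transformers-style training

def logits_to_keep(labels):
    """Count the trailing logit positions actually needed for the loss.

    Positions before the first non-ignored label contribute nothing to the
    loss, so their logits need not be computed or stored. Toy sketch only.
    """
    for i, lab in enumerate(labels):
        if lab != IGNORE_INDEX:
            # +1 accounts for the causal shift (the logit at i-1 predicts i),
            # clamped so we never exceed the sequence length.
            return min(len(labels), len(labels) - i + 1)
    return 0  # fully masked sample: no logits needed

# Prompt tokens are masked with -100; only the 3-token response is labeled.
labels = [-100, -100, -100, -100, -100, 11, 22, 33]
print(logits_to_keep(labels))  # -> 4
```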

@Jintao-Huang Jintao-Huang merged commit f5c0c70 into modelscope:main Oct 10, 2025
1 of 2 checks passed
