[bugfix] fix packing_length #6594
Conversation
Summary of Changes — Hello @Jintao-Huang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request primarily focuses on enhancing documentation clarity and integrating a new parameter related to sequence packing. It clarifies how data parallelism is calculated and introduces a `packing_length` parameter.
Code Review
This pull request introduces a fix for packing_length by correctly passing it to the PackingDataset. It also includes several documentation updates, such as adding the packing_length parameter, clarifying the data parallelism formula, and providing guidance on precision testing for multimodal models. The changes are logical and improve the codebase. I have one suggestion to enhance the clarity of the new packing_length documentation.
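For reference, here is a minimal sketch of the pattern the fix describes: resolving `packing_length` (falling back to `max_length` when unset) and forwarding it explicitly when the packing dataset is constructed. The class and argument names below are illustrative stand-ins, not the actual ms-swift internals.

```python
class PackingDataset:
    """Illustrative stand-in for a packing dataset; not the real ms-swift class."""

    def __init__(self, dataset, packing_length: int):
        self.dataset = dataset
        self.packing_length = packing_length


def build_packing_dataset(dataset, args):
    # Resolve packing_length: fall back to max_length when the user did not set it.
    packing_length = args.packing_length if args.packing_length is not None else args.max_length
    # The fix described above amounts to forwarding this resolved value explicitly,
    # instead of letting the packing dataset fall back to its own default.
    return PackingDataset(dataset, packing_length=packing_length)
```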
- 🔥packing: Whether to use sequence packing to improve computational efficiency (achieving better load balancing across nodes and processes, and higher GPU utilization), at the cost of additional preprocessing time, while also stabilizing GPU memory usage. Defaults to `False`. Currently supported for CPT, SFT, DPO, KTO and RM.
  - Note: **Sequences within the same batch remain mutually invisible**, except for Qwen3-Next.
  - Note: **Packing reduces the number of samples in the dataset; please adjust the gradient accumulation steps and learning rate accordingly**.
- packing_length: the length to use for packing. Defaults to None, in which case it is set to max_length.
The description for `packing_length` is a bit brief. For better clarity, consider expanding it to explain what the 'length' refers to in the context of packing. It might also be helpful to add the 🔥 emoji to highlight it as an important parameter, similar to `packing`.

Suggested change:

```diff
- - packing_length: the length to use for packing. Defaults to None, in which case it is set to max_length.
+ - 🔥packing_length: Specifies the target sequence length for packing. When packing is enabled, multiple short sequences are concatenated into a single sequence of this length to improve efficiency. Defaults to `None`, in which case it is set to `max_length`.
```
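To make the suggested wording concrete, the following is a simplified, hypothetical sketch of greedy sequence packing up to `packing_length`. It only illustrates what the parameter controls and is not the packing algorithm ms-swift actually uses.

```python
from typing import List


def pack_sequences(sequences: List[List[int]], packing_length: int) -> List[List[int]]:
    """Greedily concatenate tokenized samples until the next one would exceed packing_length."""
    packed, current = [], []
    for seq in sequences:
        if current and len(current) + len(seq) > packing_length:
            packed.append(current)
            current = []
        # A sequence longer than packing_length becomes its own pack here;
        # a real implementation would typically truncate to max_length first.
        current = current + seq
    if current:
        packed.append(current)
    return packed


# Example: with packing_length=8, short samples are merged into longer packs.
print(pack_sequences([[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]], packing_length=8))
# -> [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
```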