Fix bug of sp/cp #6163

tastelikefeet · 2025-10-16T04:40:58Z

PR type

Bug Fix
New Feature
Document Updates
More Models or Datasets Support

PR information

Write the detailed information that belongs to this PR.

Experiment results

Paste your experiment results here (if needed).

gemini-code-assist · 2025-10-16T04:41:15Z

Summary of Changes

Hello @tastelikefeet, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses several bugs related to sequence parallelism (sp/cp) in the training pipeline. The changes include refining the train_emb.sh script by removing an unnecessary argument, enhancing the ulysses.py module to correctly handle causal attention and sequence length parameters for Flash Attention 2, generalizing tensor gathering operations, and optimizing tensor splitting. Additionally, the utils.py file was updated to ensure proper gathering of logits during loss computation in a distributed setting. These fixes aim to improve the stability and correctness of sequence parallel training.

Highlights

Training Script Simplification: The --label_names labels argument was removed from the train_emb.sh script, streamlining the training command.
Causal Attention Enforcement: In ulysses.py, a fix was introduced to explicitly set module.is_causal = True for non-text model modules when using Flash Attention 2, ensuring correct causal masking behavior.
Flash Attention 2 Sequence Length Handling: The _attention function in ulysses.py was updated to correctly derive cu_seq_lens_q, cu_seq_lens_k, max_length_q, and max_length_k from position_ids when cu_seq_lens_q is present in kwargs, which is crucial for efficient Flash Attention 2 operation with packed sequences.
Generalized Tensor Gathering: The gather method in ulysses.py was modified to allow gathered_sp to be initialized with arbitrary dimensions beyond just 2D, making the sequence parallel gathering more flexible and robust.
Tensor Splitting Optimization: Redundant .detach().clone() calls were removed from the _split_packed method in ulysses.py, potentially improving performance by avoiding unnecessary tensor copies.
Corrected Logits Gathering: In utils.py, an additional call to GatherLoss.apply was added for outputs.logits before the loss calculation, ensuring that logits are correctly gathered in sequence parallel training scenarios.
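The sequence-length handling described in the highlights can be illustrated with a small framework-free sketch. It is not the code from ulysses.py: the helper name `cu_seqlens_from_position_ids` is hypothetical, plain Python lists stand in for tensors, and it assumes that position IDs restart at 0 at the start of every packed sequence. In self-attention the same values would presumably serve for both the query and key sides, which is also an assumption here.

```python
from typing import List, Tuple


def cu_seqlens_from_position_ids(position_ids: List[int]) -> Tuple[List[int], int]:
    """Derive cumulative sequence lengths and the max length for one packed row.

    Assumption: position_ids restart at 0 wherever a new packed sequence begins.
    """
    # Every 0 marks the first token of a packed sequence.
    starts = [i for i, p in enumerate(position_ids) if p == 0]
    if not starts or starts[0] != 0:
        raise ValueError("packed position_ids must begin with 0")

    # Boundaries are the sequence starts plus the total token count.
    cu_seqlens = starts + [len(position_ids)]

    # The longest packed sequence is the largest gap between boundaries.
    max_len = max(b - a for a, b in zip(cu_seqlens, cu_seqlens[1:]))
    return cu_seqlens, max_len


# Three sequences of lengths 3, 2 and 4 packed into one row.
position_ids = [0, 1, 2, 0, 1, 0, 1, 2, 3]
cu_seqlens, max_len = cu_seqlens_from_position_ids(position_ids)
print(cu_seqlens, max_len)  # [0, 3, 5, 9] 4
```

Deriving the boundaries from position IDs rather than from the attention mask is what lets packed sequences be recovered after the row has been flattened, since the mask alone no longer separates neighbouring sequences.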

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces several bug fixes related to sequence parallelism (sp) and context parallelism (cp). The changes correctly handle N-dimensional tensors during the gather operation, fix gradient flow by removing unnecessary .detach().clone() calls, and ensure flash attention receives the correct parameters for packed sequences. Overall, these are good improvements. I have one suggestion in swift/trainers/utils.py regarding a potentially redundant line of code that affects readability.
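As a rough illustration of the N-dimensional gather point, the sketch below computes the shape of a gathered buffer when every rank holds an equally sized shard along the sequence dimension. The helper name `gathered_shape` and the default of dimension 1 as the sequence axis are assumptions for illustration; how ulysses.py actually allocates its buffer is not shown in this thread.

```python
from typing import Sequence, Tuple


def gathered_shape(local_shape: Sequence[int], world_size: int, dim: int = 1) -> Tuple[int, ...]:
    """Shape of the full tensor after gathering equal shards along `dim`.

    Works for any number of dimensions instead of assuming a 2D input.
    """
    ndim = len(local_shape)
    if not -ndim <= dim < ndim:
        raise IndexError(f"dim {dim} is out of range for a {ndim}-D shape")
    if world_size < 1:
        raise ValueError("world_size must be at least 1")

    shape = list(local_shape)
    # Only the sharded dimension grows; all other dimensions are preserved.
    shape[dim % ndim] *= world_size
    return tuple(shape)


# 2D labels or losses: (batch, local_seq)
print(gathered_shape((2, 128), world_size=4))         # (2, 512)
# 3D logits: (batch, local_seq, vocab)
print(gathered_shape((2, 128, 32000), world_size=4))  # (2, 512, 32000)
```

Keeping the trailing dimensions intact is what would allow the same gather path to serve both per-token losses and full logits.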

swift/trainers/utils.py

tastelikefeet added 2 commits October 15, 2025 13:52

fix

bfade1c

fix

f887ff1

gemini-code-assist bot reviewed Oct 16, 2025

swift/trainers/utils.py (outdated, resolved)

fix

d34078b

hjh0119 approved these changes Oct 16, 2025

Jintao-Huang approved these changes Oct 16, 2025

fix

fecb206

tastelikefeet merged commit 0081f8b into modelscope:main Oct 16, 2025
1 of 2 checks passed

Jintao-Huang pushed a commit that referenced this pull request Oct 16, 2025

Fix bug of sp/cp (#6163)

b9c2a47
