@tastelikefeet
Collaborator

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

Write the detailed information that belongs to this PR.

Experiment results

Paste your experiment results here (if needed).

tastelikefeet added 2 commits October 15, 2025 13:52
@gemini-code-assist
Contributor

Summary of Changes

Hello @tastelikefeet, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses several bugs related to sequence parallelism (sp/cp) in the training pipeline. The changes include refining the train_emb.sh script by removing an unnecessary argument, enhancing the ulysses.py module to correctly handle causal attention and sequence length parameters for Flash Attention 2, generalizing tensor gathering operations, and optimizing tensor splitting. Additionally, the utils.py file was updated to ensure proper gathering of logits during loss computation in a distributed setting. These fixes aim to improve the stability and correctness of sequence parallel training.

Highlights

  • Training Script Simplification: The --label_names labels argument was removed from the train_emb.sh script, streamlining the training command.
  • Causal Attention Enforcement: In ulysses.py, a fix was introduced to explicitly set module.is_causal = True for non-text model modules when using Flash Attention 2, ensuring correct causal masking behavior.
  • Flash Attention 2 Sequence Length Handling: The _attention function in ulysses.py was updated to correctly derive cu_seq_lens_q, cu_seq_lens_k, max_length_q, and max_length_k from position_ids when cu_seq_lens_q is present in kwargs, which is crucial for efficient Flash Attention 2 operation with packed sequences.
  • Generalized Tensor Gathering: The gather method in ulysses.py was modified to allow gathered_sp to be initialized with arbitrary dimensions beyond just 2D, making the sequence parallel gathering more flexible and robust.
  • Tensor Splitting Optimization: Redundant .detach().clone() calls were removed from the _split_packed method in ulysses.py, potentially improving performance by avoiding unnecessary tensor copies.
  • Corrected Logits Gathering: In utils.py, an additional call to GatherLoss.apply was added for outputs.logits before the loss calculation, ensuring that logits are correctly gathered in sequence parallel training scenarios.
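
The Flash Attention 2 highlight above relies on recovering sequence boundaries from position_ids. The diff itself is not shown here, so the following is only a minimal sketch of the general idea, assuming every packed sequence restarts its position ids at 0. The function name cu_seqlens_from_position_ids is hypothetical and is not taken from ulysses.py, and plain Python lists stand in for tensors so the example runs without any dependencies.

```python
from typing import List, Tuple


def cu_seqlens_from_position_ids(position_ids: List[int]) -> Tuple[List[int], int]:
    """Return (cu_seqlens, max_length) for one flattened pack of sequences.

    Assumption: each packed sequence restarts its position ids at 0, so every
    0 marks the first token of a new sequence. This helper is illustrative and
    is not the code from the pull request.
    """
    # Start offset of each packed sequence.
    starts = [i for i, pos in enumerate(position_ids) if pos == 0]
    # Cumulative boundaries: all start offsets plus the total token count.
    cu_seqlens = starts + [len(position_ids)]
    # Per-sequence lengths are the differences between adjacent boundaries.
    lengths = [end - start for start, end in zip(cu_seqlens, cu_seqlens[1:])]
    max_length = max(lengths) if lengths else 0
    return cu_seqlens, max_length


# Three packed sequences of lengths 3, 2 and 4.
position_ids = [0, 1, 2, 0, 1, 0, 1, 2, 3]
cu_seqlens, max_length = cu_seqlens_from_position_ids(position_ids)
print(cu_seqlens, max_length)  # [0, 3, 5, 9] 4
```

For self-attention over a packed batch, the same boundaries and maximum length would typically be used for both the query and key sides, which would correspond to the cu_seq_lens_q/cu_seq_lens_k and max_length_q/max_length_k values named in the summary.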

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces several bug fixes related to sequence parallelism (sp) and context parallelism (cp). The changes correctly handle N-dimensional tensors during the gather operation, fix gradient flow by removing unnecessary .detach().clone() calls, and ensure flash attention receives the correct parameters for packed sequences. Overall, these are good improvements. I have one suggestion in swift/trainers/utils.py regarding a potentially redundant line of code that affects readability.
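
To illustrate what handling N-dimensional tensors in the gather step can mean, here is a small sketch of the shape arithmetic only. It assumes that every sequence-parallel rank holds an equally sized shard along the sequence dimension and that all other dimensions are carried over unchanged. The helper gathered_shape and its parameters are hypothetical and are not part of ulysses.py.

```python
from typing import Sequence, Tuple


def gathered_shape(local_shape: Sequence[int], sp_world_size: int,
                   seq_dim: int = 1) -> Tuple[int, ...]:
    """Shape of the buffer that holds shards gathered from all ranks.

    Assumption: each rank holds a shard of identical shape, and only the
    sequence dimension grows when the shards are concatenated. This is an
    illustrative helper, not the pull request's implementation.
    """
    if not 0 <= seq_dim < len(local_shape):
        raise ValueError("seq_dim is out of range for local_shape")
    shape = list(local_shape)
    # Only the sequence dimension is multiplied; trailing dims are preserved.
    shape[seq_dim] *= sp_world_size
    return tuple(shape)


# 2D input, e.g. per-token values of shape (batch, local_seq).
print(gathered_shape((2, 128), 4))         # (2, 512)
# 3D input, e.g. logits of shape (batch, local_seq, vocab).
print(gathered_shape((2, 128, 32000), 4))  # (2, 512, 32000)
```

A buffer sized from only the first two dimensions would drop the trailing vocabulary dimension in the second case, which is the kind of limitation that a gather restricted to 2D inputs would have.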

@tastelikefeet tastelikefeet merged commit 0081f8b into modelscope:main Oct 16, 2025
1 of 2 checks passed
Jintao-Huang pushed a commit that referenced this pull request Oct 16, 2025
3 participants