Conversation
Add `to_transformers_dict` function to convert InputFeature instances into a dictionary compatible with transformers models. The function extracts relevant keys and ensures values are either numpy arrays or torch tensors as required by the transformers library.
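A minimal sketch of what such a conversion could look like (the key list comes from the change description; `InputFeature` is assumed to be dict-like, and the real implementation also passes torch tensors through unchanged — this is an illustration, not the actual Twinkle code):

```python
import numpy as np

# Keys a transformers forward pass accepts, per the change description.
_TRANSFORMERS_KEYS = {
    'input_ids', 'input_embeddings', 'attention_mask', 'position_ids',
    'labels', 'completion_mask', 'logits_to_keep', 'num_items_in_batch',
}

def to_transformers_dict(feature):
    """Filter a dict-like InputFeature down to transformers-compatible kwargs."""
    out = {}
    for key, value in feature.items():
        if key not in _TRANSFORMERS_KEYS:
            continue
        # Plain Python lists become numpy arrays; arrays (and, in the real
        # code, torch tensors) pass through unchanged.
        out[key] = np.asarray(value) if isinstance(value, list) else value
    return out
```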
- Update `eval` function to pass `adapter_name="default"` to `forward_only`, `calculate_loss`, and `calculate_metric` methods
- In `train` function, set optimizer for adapter and include `adapter_name` in `get_train_configs`, `forward_backward`, and `clip_grad_and_step` calls
- Ensures proper adapter-specific operations during training and evaluation
- Precompute decay and no-decay parameter name lists before optimizer group creation
- Add explicit param_names field to optimizer groups for better debugging and transparency
- Maintain identical functional behavior while improving code readability
Modify TransformersModel to only apply sp_strategy.postprocess_outputs when labels are None, preventing unintended postprocessing during training or evaluation with labels present. This ensures postprocessing is reserved for inference scenarios.
Add conditional loss reduction using sp_strategy when labels are present in inputs. This ensures that the loss calculation accounts for the sp_strategy's specific reduction logic, improving model training consistency and alignment with the strategy's objectives.
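Together with the previous change, the two label-gated behaviors (postprocess outputs only when labels are absent; apply the strategy's loss reduction only when they are present) could be sketched as follows. `DummySPStrategy` and its method names are illustrative stand-ins, not the actual Twinkle API:

```python
class DummySPStrategy:
    # Hypothetical stand-in for sp_strategy; method names are assumed.
    def postprocess_outputs(self, outputs):
        return {**outputs, 'postprocessed': True}

    def reduce_loss(self, loss, sp_world_size=2):
        # e.g. average a summed loss across sequence-parallel ranks
        return loss / sp_world_size

def forward_step(outputs, loss, labels, sp_strategy):
    if labels is None:
        # Inference path: postprocess outputs; there is no loss.
        return sp_strategy.postprocess_outputs(outputs), None
    # Training/eval path: keep outputs raw, reduce the loss per the strategy.
    return outputs, sp_strategy.reduce_loss(loss)
```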
- Add comprehensive docstring to `_get_sp_group_from_device_mesh` explaining how SP groups are derived when no explicit "sp" mesh dimension exists
- Include inline comments in backward passes and attention logic to clarify gradient handling and layout transformations
- Improve readability and maintainability of sequence parallel implementation
Summary of Changes

Hello @meichangsu1, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request refines the implementation of sequence parallelism.
Code Review
This pull request introduces support for sequence parallelism, primarily by adding a NativeFSDPStrategy and integrating it into the TransformersModel. The changes also involve extensive additions of comments and docstrings to clarify the new sequence parallel logic, especially in src/twinkle/model/transformers/strategy/sequence_parallel.py. My review focuses on improving code efficiency and maintainability in the new and modified code sections. I've suggested using constants for magic strings, optimizing data handling functions, and improving the efficiency of parameter grouping logic.
```python
model.forward_only(inputs=batch, adapter_name="default")
model.calculate_loss(adapter_name="default")
metrics = model.calculate_metric(is_training=False, adapter_name="default")
```
The string "default" is used as a magic string for adapter_name in multiple places in this file (e.g., lines 76, 77, 78, 101, 102, 104, 115, 119). It would be better to define this as a constant at the beginning of the file to improve readability and maintainability. For example: DEFAULT_ADAPTER_NAME = "default".
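For instance (a sketch; `forward_only` here is a stub standing in for the real model method, just to show the call shape):

```python
DEFAULT_ADAPTER_NAME = "default"

def forward_only(inputs, adapter_name=DEFAULT_ADAPTER_NAME):
    # Stub illustrating how call sites read once the constant exists;
    # the real method lives on the model object.
    return f"forward pass with adapter '{adapter_name}'"
```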
```python
_keys = ['input_ids', 'input_embeddings', 'attention_mask', 'position_ids', 'labels', 'completion_mask', 'logits_to_keep', 'num_items_in_batch']
for key in list(feature.keys()):
```
For performance, it's better to use a set for _keys for O(1) average time complexity for membership testing. Also, iterating over list(feature.keys()) is inefficient as it creates a new list. You can iterate directly over the dictionary keys.
Additionally, import torch is inside the function. According to PEP 8, imports should usually be at the top of the file, unless there's a specific reason for lazy loading.
```diff
-_keys = ['input_ids', 'input_embeddings', 'attention_mask', 'position_ids', 'labels', 'completion_mask', 'logits_to_keep', 'num_items_in_batch']
-for key in list(feature.keys()):
+_keys = {'input_ids', 'input_embeddings', 'attention_mask', 'position_ids', 'labels', 'completion_mask', 'logits_to_keep', 'num_items_in_batch'}
+for key in feature:
```
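One caveat on this suggestion: if the loop body deletes non-matching keys from `feature`, iterating over the dict directly raises a RuntimeError (dictionary changed size during iteration) — the `list(...)` copy is what made deletion safe. Building a new dict sidesteps both the copy and the mutation issue, e.g. (a sketch with illustrative data and an abbreviated key set):

```python
_keys = {'input_ids', 'attention_mask', 'labels'}  # abbreviated for the example

feature = {'input_ids': [1, 2], 'labels': [0], 'debug_info': 'not for the model'}

# A dict comprehension filters in one pass without mutating `feature`
# mid-iteration, which `del feature[key]` inside `for key in feature:` would do.
filtered = {k: v for k, v in feature.items() if k in _keys}
```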
```python
decay_param_names = [
    n for n, p in params.items() if n in decay_parameters and p.requires_grad
]
no_decay_param_names = [
    n for n, p in params.items() if n not in decay_parameters and p.requires_grad
]
```
This logic iterates over params.items() twice to create decay_param_names and no_decay_param_names. You could achieve the same result with a single loop for better performance, especially if params is large.
Consider this alternative:

```python
decay_param_names = []
no_decay_param_names = []
for n, p in params.items():
    if p.requires_grad:
        if n in decay_parameters:
            decay_param_names.append(n)
        else:
            no_decay_param_names.append(n)
```

Also, for better performance of `n in decay_parameters`, consider converting `decay_parameters` to a set after it's created.
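A runnable illustration of the single-pass version with `decay_parameters` as a set (the parameter names and `FakeParam` class are made up for the example; the real code uses `torch.nn.Parameter`):

```python
class FakeParam:
    # Minimal stand-in for torch.nn.Parameter, carrying only requires_grad.
    def __init__(self, requires_grad=True):
        self.requires_grad = requires_grad

params = {
    'linear.weight': FakeParam(),
    'linear.bias': FakeParam(),
    'frozen.weight': FakeParam(requires_grad=False),
}
decay_parameters = {'linear.weight'}  # a set gives O(1) membership tests

decay_param_names = []
no_decay_param_names = []
for n, p in params.items():
    if p.requires_grad:
        if n in decay_parameters:
            decay_param_names.append(n)
        else:
            no_decay_param_names.append(n)
```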