Support dpo/grpo/gkd/sft padding_free #181

Merged
tastelikefeet merged 20 commits into modelscope:main from tastelikefeet:feat/padding_free_fix
Apr 27, 2026

@tastelikefeet (Collaborator) commented Apr 23, 2026

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

  1. Support padding-free
  2. Support no-argument constructor on Dataset
  3. Support printing logger output from non-master processes
  4. Change the default value of variable_seq_lengths to True to support padding-free
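Item 3 in the list above could look roughly like the following. This is a minimal sketch using only the standard `logging` module, not the code from this PR: the `RankFilter` class, its `all_ranks` flag, and the use of the `RANK` environment variable (as set by launchers such as `torchrun`) are all assumptions made for illustration.

```python
import logging
import os


class RankFilter(logging.Filter):
    """Hypothetical filter: tag each record with the process rank and
    optionally let records from non-master ranks through."""

    def __init__(self, all_ranks: bool = False):
        super().__init__()
        self.all_ranks = all_ranks

    def filter(self, record: logging.LogRecord) -> bool:
        # Assumption: the launcher exports RANK; default to 0 (master).
        rank = int(os.environ.get("RANK", "0"))
        record.rank = rank  # exposed to formatters as %(rank)s
        return self.all_ranks or rank == 0


def get_rank_logger(name: str, all_ranks: bool = False) -> logging.Logger:
    """Build a logger whose handler prefixes messages with the rank."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.addFilter(RankFilter(all_ranks=all_ranks))
    handler.setFormatter(logging.Formatter("[rank %(rank)s] %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

With `all_ranks=False` only rank 0 prints; with `all_ranks=True` every process prints, each line prefixed by its rank.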

Experiment results

  1. Tested padding-free on megatron/transformers grpo/dpo/sft/gkd
  2. Tested dataset constructor on grpo

@gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request centralizes the logic for unpacking packed sequences (padding-free mode) into the InputProcessor class, moving it out of specific loss implementations like GRPO. It introduces a canonical method to detect packing and unpack tensors such as log-probabilities and labels into a per-sequence batch format. These changes are integrated into both Megatron and Transformers sequence parallel strategies. The review feedback identifies several improvement opportunities: ensuring the boundary detection logic explicitly includes the first sequence, relaxing the packing detection heuristic to support sequences of length one, and optimizing the unpacking process for better performance and consistency.
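To make the unpacking step concrete, here is a framework-free sketch of the idea described in the review above. It is not the `InputProcessor` code from this PR: the function names `is_packed` and `unpack_packed` are hypothetical, plain Python lists stand in for tensors, and it assumes sequence boundaries are marked by `position_ids` restarting at 0. It also tries to reflect the review feedback: the first sequence is always included as a boundary, and sequences of length one are handled.

```python
from typing import List, Sequence


def is_packed(position_ids: Sequence[int]) -> bool:
    """Heuristic (assumption): a row is packed if position ids restart
    at 0 more than once. Length-one sequences still count."""
    return sum(1 for p in position_ids if p == 0) > 1


def unpack_packed(values: Sequence[float],
                  position_ids: Sequence[int],
                  pad_value: float = 0) -> List[List[float]]:
    """Split one packed row (e.g. log-probs or labels) into a padded
    per-sequence batch of shape [num_sequences, max_len]."""
    if len(values) != len(position_ids):
        raise ValueError("values and position_ids must have equal length")
    if not values:
        return []  # nothing to unpack; avoids max() on an empty list

    # Every position id equal to 0 starts a new sequence.
    starts = [i for i, p in enumerate(position_ids) if p == 0]
    # Always treat offset 0 as a boundary so the first sequence is kept.
    if not starts or starts[0] != 0:
        starts = [0] + starts
    ends = starts[1:] + [len(values)]

    seqs = [list(values[s:e]) for s, e in zip(starts, ends)]
    max_len = max(len(s) for s in seqs)
    return [s + [pad_value] * (max_len - len(s)) for s in seqs]
```

For example, `unpack_packed([10, 11, 12, 20, 30, 31], [0, 1, 2, 0, 0, 1], pad_value=-100)` yields three rows, `[10, 11, 12]`, `[20, -100, -100]`, and `[30, 31, -100]`, with the middle row being a sequence of length one.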

Comment thread src/twinkle/processor/base.py Outdated
Comment thread src/twinkle/processor/base.py
Comment thread src/twinkle/processor/base.py
@tastelikefeet changed the title from "[WIP]Support dpo/grpo simple padding_free" to "Support dpo/grpo simple padding_free" on Apr 26, 2026
@tastelikefeet (Collaborator, Author) commented

/gemini review

@gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request refactors the handling of packed sequences by moving unpacking logic into the InputProcessor and introduces a require_logits attribute to loss classes to optimize memory usage. It also updates Megatron and Transformers models to support variable sequence lengths by default, implements rank-aware logging, and includes FSDP2 compatibility fixes for LoRA dtypes. Feedback points out an inconsistent error message in the sampling configuration and a potential crash in the sequence unpacking utility when processing empty lists.
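One way a `require_logits` attribute can save memory is sketched below. Only the attribute name `require_logits` comes from the review summary above; the class names, the `select_outputs` helper, and the output dictionary keys are hypothetical and chosen purely for illustration.

```python
from typing import Any, Dict


class BaseLoss:
    # Default: the loss works from per-token log-probabilities only.
    require_logits = False


class LogProbOnlyLoss(BaseLoss):
    """Hypothetical loss that never touches the full logits."""


class FullDistributionLoss(BaseLoss):
    """Hypothetical loss that compares full vocabulary distributions."""
    require_logits = True


def select_outputs(loss: BaseLoss, outputs: Dict[str, Any]) -> Dict[str, Any]:
    """Drop the large [tokens, vocab] logits entry unless the loss
    declares that it needs it."""
    if getattr(loss, "require_logits", False):
        return outputs
    return {k: v for k, v in outputs.items() if k != "logits"}
```

Declaring the requirement on the loss class lets the caller make the decision once, without each loss having to know how the model output is assembled.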

Comment thread src/twinkle/data_format/sampling.py
Comment thread src/twinkle/processor/base.py
@tastelikefeet changed the title from "Support dpo/grpo simple padding_free" to "Support dpo/grpo/gkd padding_free" on Apr 27, 2026
@tastelikefeet changed the title from "Support dpo/grpo/gkd padding_free" to "Support dpo/grpo/gkd/sft padding_free" on Apr 27, 2026
Comment thread cookbook/rl/dpo_lora.py Outdated
@tastelikefeet tastelikefeet merged commit 55d377a into modelscope:main Apr 27, 2026
1 of 3 checks passed
3 participants