[Chord] bug fix in phi loss #6952

Birdylx · 2025-12-08T11:38:25Z

PR type

Bug Fix

PR information

Fix a bug in the phi loss: the coefficient should have its gradient stopped, as the paper describes.

gemini-code-assist · 2025-12-08T11:38:36Z

Summary of Changes

Hello @Birdylx, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a bug in the phi loss computation within the Chord RLHF trainer. The fix detaches chord_sft_loss from the gradient computation graph when it is used to calculate per_token_probs for the phi function, so the resulting phi coefficient acts as a constant weight. This keeps gradients from flowing through the weight and brings the training process in line with the original paper's specification.

Highlights

Bug Fix in Phi Loss Calculation: Addressed a bug in the phi loss calculation within the Chord RLHF trainer, specifically ensuring that a coefficient's gradient is properly stopped as described in the relevant paper.
Gradient Detachment: Implemented chord_sft_loss.detach() when computing per_token_probs for the phi function, preventing unintended gradient flow through this term and aligning the implementation with the theoretical model.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request correctly fixes a critical bug in the CHORD-ϕ loss calculation. By detaching chord_sft_loss before computing the phi weighting factor, it ensures that gradients are calculated correctly, treating phi as a constant weight as described in the CHORD paper. This change is essential for the correct implementation of the algorithm.

gemini-code-assist · 2025-12-08T11:39:53Z

swift/trainers/rlhf_trainer/utils.py

        chord_sft_loss = per_token_loss_func(outputs, labels)

        if trainer.args.chord_enable_phi_function:
-            per_token_probs = torch.exp(-chord_sft_loss)


The phi weighting factor should be treated as a constant during backpropagation, as its purpose is to scale the SFT loss based on the model's confidence. Without detaching chord_sft_loss here, gradients also propagate through phi, which can lead to training instability or failure to converge. The gradient of the final SFT loss should come only from the original loss term, scaled by phi.

Suggested change

-            per_token_probs = torch.exp(-chord_sft_loss)
+            per_token_probs = torch.exp(-chord_sft_loss.detach())
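
The effect of the detach can be checked numerically without PyTorch. The following is a minimal sketch, not the trainer's code: it assumes the phi function is phi(p) = p * (1 - p) (the form is not stated in this thread), treats a single per-token loss value `ell` as the variable, and uses hypothetical helper names. Freezing the weight plays the role of `.detach()`.

```python
import math


def phi(p):
    # Assumed weighting function phi(p) = p * (1 - p); not taken from this PR.
    return p * (1.0 - p)


def weighted_loss(ell, weight_ell):
    # Per-token SFT loss `ell` scaled by phi evaluated at p = exp(-weight_ell).
    return phi(math.exp(-weight_ell)) * ell


def grad_with_stop(ell, eps=1e-6):
    # Weight frozen at the current loss value (analogue of .detach()):
    # only the loss term itself varies in the finite difference.
    return (weighted_loss(ell + eps, ell) - weighted_loss(ell - eps, ell)) / (2 * eps)


def grad_without_stop(ell, eps=1e-6):
    # Weight varies together with the loss (no stop-gradient).
    return (weighted_loss(ell + eps, ell + eps)
            - weighted_loss(ell - eps, ell - eps)) / (2 * eps)


for ell in (0.1, 0.7, 2.0):
    print(f"ell={ell}: with stop={grad_with_stop(ell):+.4f}, "
          f"without stop={grad_without_stop(ell):+.4f}")
```

Under these assumptions, at `ell = 2.0` the frozen-weight derivative is positive while the unfrozen one is negative, so minimizing the undetached loss would push the loss on that token up rather than down.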

hjh0119 · 2025-12-08T12:02:22Z

Thank you for your contribution

Could you share a reference for why the phi function should stop gradients?

Birdylx · 2025-12-08T12:09:31Z

@hjh0119 Here is the original implementation: https://github.com/modelscope/Trinity-RFT/blob/613194d45fee0eef9145fb73dbda69cab17fd6f4/trinity/algorithm/policy_loss_fn/chord_policy_loss.py#L97
phi is a weight computed from the log-probabilities. It is not a learnable term, so we do not want gradients to flow through it.
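
As a rough illustration of the point above, assume the weighting is $\phi(p) = p(1-p)$ (an assumption; the form is not given in this thread), write $\ell_t$ for the per-token SFT loss with $p_t = e^{-\ell_t}$, and let $\mathrm{sg}(\cdot)$ denote stop-gradient (`.detach()` in PyTorch). The two gradients then differ by an extra term:

```latex
\begin{align*}
\nabla_\theta\big[\mathrm{sg}(\phi(p_t))\,\ell_t\big]
  &= \phi(p_t)\,\nabla_\theta \ell_t, \\
\nabla_\theta\big[\phi(p_t)\,\ell_t\big]
  &= \big[\phi(p_t) - \ell_t\,p_t\,(1 - 2p_t)\big]\,\nabla_\theta \ell_t .
\end{align*}
```

The second line follows from $\frac{d\phi}{d\ell_t} = \phi'(p_t)\,\frac{dp_t}{d\ell_t} = (1 - 2p_t)(-p_t)$. With the stop-gradient, phi only rescales the usual SFT gradient; without it, the extra term can change the gradient's magnitude and even its sign.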

fix chord phi loss bug

ae10b77

gemini-code-assist bot reviewed Dec 8, 2025

hjh0119 approved these changes Dec 8, 2025

hjh0119 merged commit d77d409 into modelscope:main Dec 8, 2025
3 checks passed
