Fix: Sync ref_model's seq_len with trainer configuration in GRPO YAML #438
Summary:

This change fixes a runtime `ValueError` in GRPO training that occurred when using a sequence length different from the default. The `ref_model` was not inheriting the `seq_len` from the main trainer configuration, causing it to fall back to the job config default (e.g., 2048). This led to a dimension mismatch with the rotary position embeddings when the trainer was configured with a longer sequence length.

The fix explicitly sets `seq_len: ${trainer.training.seq_len}` in the `ref_model.training` section of the relevant GRPO YAML files, like this one. This ensures the reference model always uses the same sequence length as the trainer, resolving the crash.
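For concreteness, a minimal sketch of the resulting config shape (the surrounding keys are abbreviated for illustration and may not match the actual GRPO config files exactly):

```yaml
# Illustrative excerpt only; real configs contain additional model/optimizer keys.
trainer:
  training:
    seq_len: 8192                           # trainer-level sequence length

ref_model:
  training:
    seq_len: ${trainer.training.seq_len}    # inherit the trainer's value instead of the 2048 default
```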
Error Log:

If `trainer.training.seq_len != 2048`, the error looks like this:
Test Plan:

Run any of the modified GRPO configurations with a `seq_len` in the `trainer.training` section that is different from the default (e.g., 8192). Training now proceeds without the dimension mismatch error.

Test Log:
Training successful: