support truncation_strategy split & cached_dataset (qwen3.5) #9211
Code Review
This pull request enables the use of the split truncation strategy in conjunction with cached datasets during pre-training. The changes include documentation updates in both Chinese and English, an updated example script demonstrating the correct workflow, and a modification in sft.py to relax restrictions on multimodal models for this strategy. Feedback was provided to ensure the Chinese documentation is consistent with the English version by adding a warning that the split strategy must be explicitly set during both the dataset export and training phases when using cached datasets.
/gemini review
Code Review
This pull request enables compatibility between the split truncation strategy and cached_dataset for pre-training. It updates the documentation to specify that when using this combination, input_ids are stored and consistent parameters must be used across export and training. The changes also include updates to the validation logic in sft.py and the example pre-training script. I have no feedback to provide.
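As a rough illustration of why stored input_ids matter for this combination, the sketch below shows one plausible reading of a split truncation strategy: a token sequence longer than the maximum length is cut into consecutive chunks instead of being dropped or trimmed. The function name `split_truncate` and its `max_length` parameter are hypothetical and are not taken from sft.py or the PR; the actual implementation may differ.

```python
from typing import List


def split_truncate(input_ids: List[int], max_length: int) -> List[List[int]]:
    """Cut a token sequence into consecutive chunks of at most max_length tokens.

    Hypothetical helper for illustration only; not the implementation in the PR.
    """
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    # Step through the sequence in strides of max_length; the last chunk may be shorter.
    return [input_ids[i:i + max_length] for i in range(0, len(input_ids), max_length)]


# Ten tokens with max_length=4 yield two full chunks and one partial chunk.
chunks = split_truncate(list(range(10)), max_length=4)
print(chunks)
```

Under this reading, every chunk becomes its own training sample, so the chunk boundaries depend on the maximum length in effect when the cache is built. That would be consistent with the review's note that the same parameters must be used at export and at training time.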
No description provided.