Description
After encountering several errors and incorrect results, I'd like to share my experience reproducing open-r1/OpenR1-Qwen-7B based on Qwen/Qwen2.5-Math-7B-Instruct.
The training commands below are configured for a node of 8 x H100s (80GB).
1. Modify the config file of Qwen/Qwen2.5-Math-7B-Instruct
After downloading Qwen/Qwen2.5-Math-7B-Instruct, modify its config.json to match https://huggingface.co/open-r1/OpenR1-Qwen-7B/blob/main/config.json
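A minimal sketch of applying such config edits programmatically. The field and value below are illustrative placeholders only; copy the actual differences from the linked OpenR1-Qwen-7B config.json rather than trusting them.

```python
import json
from pathlib import Path

def apply_config_edits(config_path, edits):
    """Load a model config.json, apply key/value overrides, and write it back."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    config.update(edits)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config

# Demo on a throwaway file; in practice point config_path at the downloaded
# Qwen2.5-Math-7B-Instruct/config.json.
demo = Path("demo_config.json")
demo.write_text(json.dumps({"max_position_embeddings": 4096}))
# Illustrative edit only -- take the exact fields from the linked config.json.
updated = apply_config_edits(demo, {"max_position_embeddings": 32768})
print(updated["max_position_embeddings"])  # 32768
```

Editing the file on disk (rather than overriding at load time) keeps the change visible to every script that reads the model directory.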
2. Modify the training recipes correctly
If you follow the official installation steps and run the following training command:
accelerate launch --config_file recipes/accelerate_configs/zero3.yaml src/open_r1/sft.py \
--config recipes/OpenR1-Qwen-7B/sft/config.yaml
The first issue you’ll encounter is #366.
The effective fix is to edit the recipe https://github.com/huggingface/open-r1/blob/main/recipes/OpenR1-Qwen-7B/sft/config.yaml by changing line 29:
- use_liger_kernel: true
+ use_liger: true
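If you prefer to patch the recipe from a script, a small sketch of the key rename (demonstrated on a throwaway file; in practice point it at recipes/OpenR1-Qwen-7B/sft/config.yaml):

```python
from pathlib import Path

def rename_yaml_key(path, old_key, new_key):
    """Rename a YAML key via plain text substitution.

    Plain string replacement avoids a YAML round-trip that would
    reorder keys and drop comments in the recipe file.
    """
    p = Path(path)
    patched = p.read_text().replace(f"{old_key}:", f"{new_key}:")
    p.write_text(patched)
    return patched

demo = Path("demo_recipe.yaml")
demo.write_text("use_liger_kernel: true\ngradient_accumulation_steps: 2\n")
print(rename_yaml_key(demo, "use_liger_kernel", "use_liger"))
```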
Other solutions lead to further issues:
- Downgrading `trl` to 0.15.x: it might not work, or you may hit this issue: https://huggingface.co/open-r1/OpenR1-Qwen-7B/discussions/7 ;
- Setting `use_liger_kernel: false`: then we cannot use `gradient_accumulation_steps: 2`, since that causes an OOM error.
3. Modify the sft.py file
Following this issue: #494, we should change sft.py (https://github.com/huggingface/open-r1/blob/main/src/open_r1/sft.py) as follows:
- tokenizer.pad_token = tokenizer.eos_token
+ if tokenizer.pad_token is None:
+ tokenizer.pad_token = tokenizer.eos_token
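The guard can be exercised without downloading the real tokenizer. The class below is a hypothetical stand-in; only the pad_token/eos_token attributes mirror the Hugging Face tokenizer interface:

```python
class FakeTokenizer:
    """Hypothetical stand-in exposing only the two attributes the patch touches."""
    def __init__(self, pad_token, eos_token):
        self.pad_token = pad_token
        self.eos_token = eos_token

def ensure_pad_token(tokenizer):
    # Only fall back to eos_token when no pad token is defined, so tokenizers
    # that ship a dedicated pad token keep it instead of having it overwritten.
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    return tokenizer

with_pad = ensure_pad_token(FakeTokenizer("<|pad|>", "<|eos|>"))
without_pad = ensure_pad_token(FakeTokenizer(None, "<|eos|>"))
print(with_pad.pad_token, without_pad.pad_token)  # <|pad|> <|eos|>
```

Unconditionally overwriting pad_token with eos_token is the original bug: it changes the padding id for tokenizers that already define one.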
4. The reproducing results on AIME24, MATH-500
I evaluated my saved model (trained for 3219 steps in total), and the performance is:
- AIME24: 46.7;
- MATH-500: 92.4;
For comparison, I evaluated the official open-r1/OpenR1-Qwen-7B, and the performance is:
- AIME24: 50.0;
- MATH-500: 92.8;
It seems that the official model was saved at step 3150.