Description
After encountering several errors and incorrect results, I'd like to share my experience reproducing open-r1/OpenR1-Qwen-7B based on Qwen/Qwen2.5-Math-7B-Instruct.
The training commands below are configured for a node of 8 x H100s (80GB).
1. Modify the config file of Qwen/Qwen2.5-Math-7B-Instruct
After downloading Qwen/Qwen2.5-Math-7B-Instruct, modify its config.json to match https://huggingface.co/open-r1/OpenR1-Qwen-7B/blob/main/config.json
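A minimal sketch of applying such config edits programmatically. The field and value below are illustrative placeholders only; copy the actual differences from the linked OpenR1-Qwen-7B config.json rather than trusting them.

```python
import json
from pathlib import Path

def apply_config_edits(config_path, edits):
    """Load a model config.json, apply key/value overrides, and write it back."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    config.update(edits)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config

# Demo on a throwaway file; in practice point config_path at the downloaded
# Qwen2.5-Math-7B-Instruct/config.json.
demo = Path("demo_config.json")
demo.write_text(json.dumps({"max_position_embeddings": 4096}))
# Illustrative edit only -- take the exact fields from the linked config.json.
updated = apply_config_edits(demo, {"max_position_embeddings": 32768})
print(updated["max_position_embeddings"])  # 32768
```

Editing the file on disk (rather than overriding at load time) keeps the change visible to every script that reads the model directory.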
2. Modify the training recipes correctly
If you follow the official installation steps and run the following training command:
accelerate launch --config_file recipes/accelerate_configs/zero3.yaml src/open_r1/sft.py \
--config recipes/OpenR1-Qwen-7B/sft/config.yaml
The first issue you’ll encounter is #366.
The effective fix is to edit the recipe https://github.com/huggingface/open-r1/blob/main/recipes/OpenR1-Qwen-7B/sft/config.yaml by changing line 29:
- use_liger_kernel: true
+ use_liger: true
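If you prefer to patch the recipe from a script, a small sketch of the key rename (demonstrated on a throwaway file; in practice point it at recipes/OpenR1-Qwen-7B/sft/config.yaml):

```python
from pathlib import Path

def rename_yaml_key(path, old_key, new_key):
    """Rename a YAML key via plain text substitution.

    Plain string replacement avoids a YAML round-trip that would
    reorder keys and drop comments in the recipe file.
    """
    p = Path(path)
    patched = p.read_text().replace(f"{old_key}:", f"{new_key}:")
    p.write_text(patched)
    return patched

demo = Path("demo_recipe.yaml")
demo.write_text("use_liger_kernel: true\ngradient_accumulation_steps: 2\n")
print(rename_yaml_key(demo, "use_liger_kernel", "use_liger"))
```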
Other solutions lead to further issues:
- Downgrading `trl` to 0.15.x: it might not work, or you may hit this issue: https://huggingface.co/open-r1/OpenR1-Qwen-7B/discussions/7 ;
- Setting `use_liger_kernel: false`: then we cannot use `gradient_accumulation_steps: 2`, since that causes an OOM error.
3. Modify the sft.py file
Following this issue: #494, we should change sft.py (https://github.com/huggingface/open-r1/blob/main/src/open_r1/sft.py) as follows:
- tokenizer.pad_token = tokenizer.eos_token
+ if tokenizer.pad_token is None:
+ tokenizer.pad_token = tokenizer.eos_token
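The guard can be exercised without downloading the real tokenizer. The class below is a hypothetical stand-in; only the pad_token/eos_token attributes mirror the Hugging Face tokenizer interface:

```python
class FakeTokenizer:
    """Hypothetical stand-in exposing only the two attributes the patch touches."""
    def __init__(self, pad_token, eos_token):
        self.pad_token = pad_token
        self.eos_token = eos_token

def ensure_pad_token(tokenizer):
    # Only fall back to eos_token when no pad token is defined, so tokenizers
    # that ship a dedicated pad token keep it instead of having it overwritten.
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    return tokenizer

with_pad = ensure_pad_token(FakeTokenizer("<|pad|>", "<|eos|>"))
without_pad = ensure_pad_token(FakeTokenizer(None, "<|eos|>"))
print(with_pad.pad_token, without_pad.pad_token)  # <|pad|> <|eos|>
```

Unconditionally overwriting pad_token with eos_token is the original bug: it changes the padding id for tokenizers that already define one.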
4. The reproducing results on AIME24, MATH-500
I evaluated my saved model (trained for 3219 steps in total), and the performance is:
- AIME24: 46.7;
- MATH-500: 92.4;
For comparison, I evaluated the official open-r1/OpenR1-Qwen-7B, and the performance is:
- AIME24: 50.0;
- MATH-500: 92.8;
It seems that the official model was saved at step 3150.