
[GRPO] Reorganize letter counting configs #1570

Merged

wizeng23 merged 10 commits into main from wizeng/o1122-refactor-configs on Mar 26, 2025

Conversation

@wizeng23 (Contributor)

Description

  • Changed the directory structure in preparation for a future PR adding letter counting evaluation.
  • Changed the model for GRPO letter counting to DeepSeek-R1-Distill-Qwen-1.5B, as reasoning models should perform better on this task.
  • Removed some unnecessary shard_for_eval params for smaller models (see the sketch after this list).
  • Fixed a broken documentation link.
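
As an illustrative sketch of the shard_for_eval removal (only the field name and the new model name come from this PR; the placement under model: and the True value are assumptions):

model:
  model_name: "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
-  shard_for_eval: True  # assumed prior value; removed as unnecessary for smaller models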

Related issues

Towards OPE-1122

Before submitting

  • This PR only changes documentation. (You can ignore the following checks in that case)
  • Did you read the contributor guideline's Pull Request guidelines?
  • Did you link the issue(s) related to this PR in the section above?
  • Did you add / update tests where needed?


model:
-  model_name: "Qwen/Qwen2-0.5B-Instruct"
+  model_name: "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

Does this model train without errors? How much slower is it compared to the 0.5B model?

@wizeng23 (Contributor, Author)

There was an unrelated model-training error, which I'm fairly sure didn't exist when I submitted my last PR. After fixing that, it trains. It's a bit slower, at 10 min for 5 steps instead of 6 min. Training speed seems rather variable, though; for the 1.5B model it goes from 5 steps after 10 min to 200 steps after 43 min. It's not just the first step that's slow (as with compilation), but the first couple.
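
(For reference, that works out to roughly 2 min/step over the first 5 steps versus about 10 s/step afterward: ~195 steps in the remaining ~33 min.)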

wizeng23 merged commit a550278 into main on Mar 26, 2025
2 checks passed
wizeng23 deleted the wizeng/o1122-refactor-configs branch on March 26, 2025 at 19:58