RLHF Flow-GRPO implementation POC by ifilipis · Pull Request #808 · ostris/ai-toolkit

ifilipis · 2026-04-28T22:44:55Z

Adds reinforcement learning (Flow-GRPO) training, which seems to work fairly universally across models.

It implements Flow-GRPO and lets you vote on generated samples live; the votes are then used to train a LoRA.

The default parameters seem well tuned for quick results within a few iterations.
For extra fast results, set the learning rate (LR) to 0.005 and behold the speed.
The only difference between this implementation and the original: rewards are binary (your votes) instead of relying on a ranking model.
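
To illustrate how binary rewards can stand in for a ranking model, here is a minimal sketch of the group-relative advantage step that GRPO-style methods use, applied to votes for one prompt's candidates. The function name `group_advantages`, the `None` encoding for skipped candidates, and the `eps` value are assumptions made for illustration and are not taken from this PR's code.

```python
from statistics import mean, pstdev
from typing import List, Optional, Sequence


def group_advantages(
    votes: Sequence[Optional[bool]], eps: float = 1e-6
) -> List[Optional[float]]:
    """Convert one prompt's votes into group-normalized advantages.

    True = upvote (reward 1.0), False = downvote (reward 0.0),
    None = skipped candidate (hypothetical encoding, excluded from the group).
    """
    # Binary rewards for the candidates that actually received a vote.
    rewards = [1.0 if v else 0.0 for v in votes if v is not None]
    if not rewards:
        return [None] * len(votes)

    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the voted group

    # Standard GRPO normalization: (reward - group mean) / (group std + eps).
    return [
        None if v is None else ((1.0 if v else 0.0) - mu) / (sigma + eps)
        for v in votes
    ]


if __name__ == "__main__":
    print(group_advantages([True, False, False, True]))
    print(group_advantages([True, None, False]))
    print(group_advantages([True, True]))
```

Under this normalization, a group where every candidate gets the same vote yields zero advantages, so it contributes no training signal; mixed votes within a group are what drive the update.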

There's a new job type dropdown for creating Flow-GRPO tasks, and a GRPO job has a voting interface that lets you generate samples and vote on them.

Stuff yet to do:

- Manual checkpoints
- Reduce memory usage (Z-Image takes 40+ GB) and improve speed
- UI polishing
  - Prevent prompt settings from resetting on leaving the tab
  - Fix skip button color when selected
  - Add fixed/random seed
  - Improve progress indication when generating candidates
  - Maybe fix concurrency of training/generation
  - Improve candidate images layout
  - Add input images for editing models
- Keep testing the algorithm on all models

Update Flow GRPO sampling flow and job UI/API handling

Co-authored-by: Codex <codex@openai.com>

codex and others added 13 commits April 17, 2026 14:24

Add native online Flow-GRPO session backend

98d1924

Fix online GRPO session stability

8c4bd41

Add Flow-GRPO trainer and live voting UI

124d4fe

Fix Flow-GRPO config/runtime validation and voting UI behavior

5d25250

Fix Flow-GRPO live voting and rollout handling

36364a0

Fix Flow GRPO candidate grouping

10ffaf8

Fix Flow-GRPO quantized text encoder restore

bde0bf7

Update Flow GRPO sampling flow and job UI/API handling

3b242ff

Update Flow GRPO sampling flow and job UI/API handling

7622daf

Update Flow GRPO sampling flow and job UI/API handling

Match Flow-GRPO CPS log-prob reduction

a25e747

Co-authored-by: Codex <codex@openai.com>

Fix Flow-GRPO rollout initial latents

973613d

Reduce Flow-GRPO training memory

9a5f633

Log Flow-GRPO training timestep progress

147d67e

ifilipis wants to merge 13 commits into ostris:main from ifilipis:rlhf
