RLHF Flow-GRPO implementation POC #808

Open
ifilipis wants to merge 13 commits into ostris:main from ifilipis:rlhf

Conversation

ifilipis commented Apr 28, 2026

Added reinforcement learning (Flow-GRPO) that seems to work quite universally across models.

It implements Flow-GRPO and lets you vote on generated samples live, building a LoRA from your feedback.

  • Default parameters seem well tuned for quick results within a few iterations.
  • For extra-fast results, set the learning rate to 0.005 and behold the speed.
  • The only difference in this implementation vs. the original: rewards are binary instead of coming from a ranking model.
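To illustrate the binary-reward idea: GRPO normalizes each candidate's reward against the other candidates generated for the same prompt, so a plain upvote/downvote signal can stand in for a learned ranking model. A minimal sketch (the function name and shapes here are illustrative, not the PR's actual code):

```python
import numpy as np

def grpo_advantages(votes, eps=1e-6):
    """Group-relative advantages from binary votes (1 = upvoted, 0 = not).

    Each candidate's reward is standardized against the group of
    candidates sampled for the same prompt, which is what lets a
    thumbs-up / thumbs-down signal replace a reward/ranking model.
    """
    r = np.asarray(votes, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# Four candidates for one prompt; the user upvoted the first two.
adv = grpo_advantages([1, 1, 0, 0])
# Upvoted samples get positive advantage, the rest negative,
# and the advantages sum to ~0 within the group.
```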

There's a new job-type dropdown for creating Flow-GRPO tasks, and the GRPO job has a voting interface that lets you generate samples and vote on them.

Stuff yet to do:

  • Manual checkpoints
  • Reduce memory usage (Z-Image takes 40+ GB) and improve speed
  • UI polishing
    • Prevent prompt settings from resetting on leaving the tab
    • Fix skip button color when selected
    • Add fixed/random seed
    • Improve progress indication when generating candidates
    • Maybe fix concurrency of training/generation
    • Improve candidate images layout
    • Add input images for editing models
  • Keep testing the algorithm on all models
