feat: wire training.target_kl + training.early_stopping #37

Merged
timzsu merged 3 commits into main from zsu/rfc28-ppo-early-stop on May 12, 2026

Conversation


@timzsu timzsu commented May 12, 2026

Purpose

Closes the TODO from #34: wire training.target_kl and training.early_stopping end-to-end in PPO. When early_stopping=true and per-step KL exceeds target_kl, training halts cleanly and is reported as successful.

Changes

  • src/worker/executors/ppo_executor.py: add _EarlyStopSignal and _EarlyStopPPOTrainer(PPOTrainer). The subclass overrides log() and raises _EarlyStopSignal when logs["objective/kl"] > self.target_kl. Both PPOTrainer(...) sites in build_trainer() now produce the subclass; _install_kl_early_stopping stamps the threshold, and ppo_trainer.train() is wrapped in try/except _EarlyStopSignal.
  • examples/templates/ppo_training_llama_1b.yaml: flip early_stopping: true so the smoke template exercises the new path; the multi-gpu and ministral templates stay false to keep the disabled case demonstrable.
  • tests/worker/test_ppo_early_stopping.py (new): above-target raises; below / equal / missing key / threshold-unset don't (a rough sketch follows this list).
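A rough sketch of the shape of these tests, assuming an src-layout import path and bypassing PPOTrainer.__init__ for unit isolation; the actual test file may structure things differently:

```python
import pytest
from unittest.mock import patch

# Import path assumed from the src layout; adjust to the real package name if different.
from worker.executors.ppo_executor import _EarlyStopPPOTrainer, _EarlyStopSignal


def _trainer_with(target_kl):
    # Bypass PPOTrainer.__init__ so no model, tokenizer, or dataset is needed here.
    trainer = _EarlyStopPPOTrainer.__new__(_EarlyStopPPOTrainer)
    trainer.target_kl = target_kl
    return trainer


@patch("trl.PPOTrainer.log", lambda self, logs, *args, **kwargs: None)  # exercise only the override
def test_above_target_raises():
    with pytest.raises(_EarlyStopSignal):
        _trainer_with(0.1).log({"objective/kl": 0.5})


@pytest.mark.parametrize(
    "logs, target_kl",
    [
        ({"objective/kl": 0.05}, 0.1),  # below threshold
        ({"objective/kl": 0.1}, 0.1),   # equal to threshold
        ({}, 0.1),                      # objective/kl key missing
        ({"objective/kl": 0.5}, None),  # threshold unset
    ],
)
@patch("trl.PPOTrainer.log", lambda self, logs, *args, **kwargs: None)
def test_no_early_stop(logs, target_kl):
    _trainer_with(target_kl).log(logs)  # must not raise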

Design

TRL's PPOTrainer.train() calls self.log(metrics) per step but never checks control.should_training_stop, so a stock TrainerCallback cannot end PPO training. Raising from an overridden log() lets the exception unwind the loop naturally; the executor catches it so the save / metadata path runs as usual.
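For reference, a minimal sketch of that mechanism. Class and attribute names follow this PR; the exact log() signature and logging details depend on the installed TRL/Transformers versions:

```python
from trl import PPOTrainer


class _EarlyStopSignal(Exception):
    """Raised from log() to unwind TRL's PPO training loop once KL crosses the threshold."""


class _EarlyStopPPOTrainer(PPOTrainer):
    """PPOTrainer that stops training when objective/kl exceeds the stamped target_kl."""

    target_kl = None  # float threshold stamped by _install_kl_early_stopping; None = disabled

    def log(self, logs, *args, **kwargs):
        super().log(logs, *args, **kwargs)
        kl = logs.get("objective/kl")
        if self.target_kl is not None and kl is not None and kl > self.target_kl:
            # Strictly greater than: an exactly-equal KL does not trip the stop.
            raise _EarlyStopSignal(
                f"PPO early stop: objective/kl={kl:.4f} > target_kl={self.target_kl:.4f}"
            )
```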

Activation rules:

  • early_stopping: true + target_kl > 0 → arm.
  • early_stopping: false (or missing) → no-op; non-zero target_kl is logged as ignored.
  • early_stopping: true without a positive target_kl → ExecutionError (see the sketch after this list).
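A minimal sketch of how _install_kl_early_stopping applies these rules. to_bool comes from shared.utils.parsing (see the follow-up commit below); safe_float and ExecutionError are project-internal helpers whose imports are omitted here, and the exact signatures are illustrative:

```python
def _install_kl_early_stopping(trainer, training_config, logger):
    """Arm KL early stopping on the trainer according to the activation rules above (sketch)."""
    enabled = to_bool(training_config.get("early_stopping", False))
    target_kl = safe_float(training_config.get("target_kl"))  # minimum=0 dropped during review

    if not enabled:
        if target_kl:
            # A non-zero target_kl without the flag is not an error; it is logged and ignored.
            logger.info("training.target_kl=%s ignored: early_stopping is disabled", target_kl)
        return

    if target_kl is None or target_kl <= 0:
        raise ExecutionError("early_stopping=true requires a positive training.target_kl")

    trainer.target_kl = target_kl  # read by _EarlyStopPPOTrainer.log()
    logger.info("PPO KL early-stop enabled at target_kl=%.4f", target_kl)
```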

Test Plan

  • uv run pytest tests/worker/test_ppo_early_stopping.py tests/worker/test_ppo_config_mapping.py tests/server tests/shared tests/sdk tests/cli.
  • Live e2e: rebuild flowmesh_worker_gpu, restart one GPU worker, submit examples/templates/ppo_training_llama_1b.yaml, expect the early-stop log lines and a shorter runtime.

Test Result

  • 492 server/shared/sdk/cli + 20 PPO unit tests pass; mypy clean.
  • Live e2e on ppo_training_llama_1b.yaml (1 GPU worker): training_successful=True, runtime 59 s vs ~153 s in #34 (refactor: clean up legacy template fields and surface remaining knobs). Log lines observed:
    • PPO KL early-stop enabled at target_kl=0.1000
    • PPO early stop: objective/kl=0.4548 > target_kl=0.1000
    • PPO training stopped early by KL threshold

Pre-submission Checklist
  • I have read the contribution guidelines.
  • I have run pre-commit run --all-files and fixed any issues.
  • I have added or updated tests covering my changes (if applicable).
  • I have verified that uv run pytest tests/ passes locally.
  • If I changed shared schemas or proto definitions, I have checked downstream compatibility across Server and Worker.
  • If I changed the SDK or CLI, I have verified the affected packages work (uv sync --all-packages --group ci --frozen).
  • If this is a breaking change, I have prefixed the PR title with [BREAKING] and described migration steps above.
  • I have updated documentation or config examples if user-facing behavior changed.

PPO-side early stopping via a PPOTrainer subclass. RFC #28 PR-3 closes
the TODO carried over from PR-34.

- ppo_executor: add _EarlyStopSignal exception and _EarlyStopPPOTrainer
  subclass that overrides log(); it calls super().log() then raises
  _EarlyStopSignal when objective/kl exceeds the trainer's target_kl.
  TRL's PPO loop ignores TrainerControl.should_training_stop, so a
  stock TrainerCallback can't end training — letting an exception
  propagate out of log() unwinds the loop cleanly. Both build_trainer
  construction sites now produce the subclass; it behaves identically
  when target_kl is unset.
- _install_kl_early_stopping reads training.early_stopping and
  training.target_kl and stamps target_kl on the trainer. Misconfig
  (early_stopping=true without a positive target_kl) raises
  ExecutionError; target_kl set without the flag is logged as ignored.
- ppo_trainer.train() now runs under try/except _EarlyStopSignal so
  the trip is logged and training_successful stays True.
- examples/templates/ppo_training_llama_1b.yaml: flip early_stopping
  to true so the smoke template exercises the new path; the multi-gpu
  and ministral templates stay false to demonstrate the disabled
  case.
- tests/worker/test_ppo_early_stopping.py: five focused tests
  (above-target raises, below/equal/missing-key/threshold-unset don't)
  that patch PPOTrainer.log so only the override is exercised.

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
@timzsu timzsu changed the title from feat(ppo): wire training.target_kl + training.early_stopping to feat: wire training.target_kl + training.early_stopping on May 12, 2026
@timzsu timzsu marked this pull request as ready for review May 12, 2026 10:29
- ppo_executor: bool(training_config.get('early_stopping', False)) read
  string flags like 'false' as truthy. Switch to to_bool from
  shared.utils.parsing so YAML-stringified booleans canonicalise
  correctly (illustrated below this commit message).
- tests: add coverage for _install_kl_early_stopping activation —
  flag missing / disabled across False, 'false', 'False', 0, 'no',
  'off'; armed across True, 'true', 'True', 1, 'yes', 'on'; enabled
  with no positive target_kl raises ExecutionError.

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
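For context on the truthiness pitfall fixed above, a quick illustration; only the to_bool behaviour paraphrased from the commit is shown, the helper itself lives in shared.utils.parsing:

```python
# The old check treated any non-empty string as True:
bool("false")   # True  -> early stopping armed even though the template said "false"

# to_bool canonicalises YAML-stringified flags instead (per the commit above):
#   "false" / "False" / 0 / "no" / "off"  -> False
#   "true"  / "True"  / 1 / "yes" / "on"  -> True
```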
@timzsu timzsu requested a review from kaiitunnz May 12, 2026 10:57

@kaiitunnz kaiitunnz left a comment


Looks good overall with minor comments.

Two review comment threads on src/worker/executors/ppo_executor.py (now outdated).
… log

- _install_kl_early_stopping: drop minimum=0 on the safe_float call;
  the explicit `target_kl <= 0` check below already rejects bad
  values without silently coercing negatives to 0.
- Remove the outer "PPO training stopped early by KL threshold" log;
  the inner log from _EarlyStopPPOTrainer.log ("PPO early stop:
  objective/kl=... > target_kl=...") already records the trip with
  the actual KL value.

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
@timzsu timzsu requested a review from kaiitunnz May 12, 2026 13:10

@kaiitunnz kaiitunnz left a comment


LGTM.

@timzsu timzsu merged commit b5f80d1 into main May 12, 2026
11 checks passed
@timzsu timzsu deleted the zsu/rfc28-ppo-early-stop branch May 12, 2026 13:15
@timzsu timzsu mentioned this pull request May 12, 2026
@timzsu timzsu linked an issue May 12, 2026 that may be closed by this pull request

Successfully merging this pull request may close this issue: [RFC]: Cleanup workflow templates.
