refactor: clean up legacy template fields and surface remaining knobs (#34)
Merged
Remove resources.replicas, spec.sloSeconds, and spec.parallel from templates. Per RFC #28, none of these fields are consumed by the dispatcher or worker selection at runtime, so leaving them in example workflows misrepresents supported behavior. Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
Remove generation.do_sample and training.optimize_cuda_cache from PPO templates. TRL's PPOTrainer.train() hardcodes do_sample=True, and the optimize_cuda_cache flag has no step-based contract under the custom PPO loop, so neither field reaches a backend. Inference templates keep do_sample because transformers_executor consumes it. Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
AgentExecutor resolves spec.agent.timeout (default 600s) once per run and threads it into both the single-task and batch streaming paths. Validates that the value is a positive number and surfaces a clear ExecutionError otherwise. Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
PPOExecutor now maps the legacy template knobs onto TRL PPOConfig: training.log_with -> report_to (null/empty disables logging via "none"; strings/lists pass through). training.tracker_project_name -> project. training.padding_side already threaded through the tokenizer kwargs by build_hf_load_kwargs; locked down with a test. Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
VLLMExecutor only forwarded "revision" when set under model.vllm, silently dropping the "model.source.revision" that templates were already setting. Pick the vllm-scoped key first, otherwise fall back to the source revision, otherwise omit the kwarg. Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
Drop quotes from string scalars where YAML allows, and convert quoted ints (cpu, count) to bare integers. Quotes are retained where the YAML parser needs them (structural chars, leading whitespace, embedded escapes, apostrophes, reserved literals like "no" used as a string value). Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
The save_model wrapper unconditionally read ppo_trainer.deepspeed to back it up around save, but HF Trainer only sets that attribute when deepspeed is actually enabled, so the single-GPU PPO path crashed on the first _save_checkpoint with AttributeError. Read via getattr with a None fallback; the restore is already guarded by is_deepspeed_enabled, so the placeholder is never read back. Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
- Templates: rename `training.log_with` to `training.report_to` and `training.tracker_project_name` to `training.project` so the YAML keys match the TRL `PPOConfig` fields verbatim.
- PPO executor: read the canonical keys, simplify `_resolve_report_to` (string mode returns the string, list mode keeps only str items), and remove the `getattr` in the save wrapper by moving the deepspeed backup inside the existing `is_deepspeed_enabled` guard.
- vLLM revision helper: tighten the parameter type to `str | None` and drop redundant `str()` casts (callers already produce strings).

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
Other executors only read `spec.model_revision` (i.e. model.source.revision); vLLM kept a redundant model.vllm.revision lookup that no template used. Stamp the engine kwarg directly from spec.model_revision, drop the helper and the duplicate accepted_engine_args entry, and move the revision line in inference_vllm_all_args.yaml under model.source so it documents the only supported location. Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
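For reference, a minimal sketch of the resulting vLLM resolution, assuming a dict-like `spec` and an `engine_kwargs` dict (both names illustrative, not the executor's real structures):

```python
def stamp_revision(spec: dict, engine_kwargs: dict) -> None:
    # Hypothetical helper: forward the canonical model.source.revision
    # (exposed to executors as spec.model_revision) to the engine kwargs.
    # No vllm-scoped fallback remains; the kwarg is omitted when unset.
    revision = spec.get("model_revision")
    if revision is not None:
        engine_kwargs["revision"] = revision
```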
- AgentExecutor: `spec.agent.timeout` must be a positive int; reject floats outright (including whole-valued floats like `600.0`) so we don't silently truncate sub-second values like `0.5` to `0`.
- PPO `_build_ppo_config`: add a TODO marker next to the `kl_coef` read noting that `training.target_kl` and `training.early_stopping` are intentionally unwired pending RFC #28 PR-3; templates keep the fields so the spec doesn't churn between PRs.

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
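A minimal sketch of the int-only timeout contract described above, assuming the raw template value and the executor's `ExecutionError` type (the resolver name and exact messages are illustrative):

```python
class ExecutionError(Exception):
    """Stand-in for the worker's real error type."""


def resolve_agent_timeout(raw: object, default: int = 600) -> int:
    """Resolve spec.agent.timeout to a positive integer number of seconds."""
    if raw is None:
        return default
    # bool is a subclass of int in Python, so reject it explicitly; floats
    # are rejected outright (even whole-valued ones like 600.0) so that a
    # sub-second value like 0.5 can never silently truncate to 0.
    if isinstance(raw, bool) or not isinstance(raw, int):
        raise ExecutionError(
            f"spec.agent.timeout must be a positive integer, got {raw!r}"
        )
    if raw <= 0:
        raise ExecutionError(f"spec.agent.timeout must be positive, got {raw}")
    return raw
```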
kaiitunnz requested changes on May 12, 2026.
Purpose
PR-2 of RFC #28. Removes template fields the runtime ignores, surfaces remaining legacy knobs under names that match the underlying backend, and unifies YAML scalar quoting across
`examples/templates/`. Also fixes a latent crash in the single-GPU PPO save wrapper that surfaced when validating the PPO config wiring end-to-end. PPO `target_kl`/`early_stopping` are deferred to PR-3 because they need a real KL-watching early-stop loop; templates keep the fields and the executor carries a `TODO` marker so the spec doesn't churn between PRs.

Changes
Template cleanup:
- `examples/templates/*.yaml` — drop `resources.replicas`, `spec.sloSeconds`, and `spec.parallel` blocks; none of those fields reaches the dispatcher or worker selection.
- `examples/templates/ppo_*.yaml` — drop `generation.do_sample` and `training.optimize_cuda_cache`; TRL's `PPOTrainer.train` hardcodes `do_sample=True`, and the cache flag has no step-based contract under the custom PPO loop.
- `examples/templates/*.yaml` — unify YAML scalars on bare form: drop unnecessary quotes, promote quoted ints (e.g. `cpu: "8"` → `cpu: 8`), keep quotes only where YAML grammar requires them (`split: "train[:1%]"`, `save_strategy: "no"` so it doesn't parse as `False`, etc.; see the PyYAML check after this list).
- `examples/templates/inference_vllm_all_args.yaml` — move `revision` from `model.vllm` to `model.source` so all vLLM templates use the same canonical revision location.
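The reserved-literal rule is easy to verify with PyYAML, which (like other YAML 1.1 parsers) resolves bare `no` to a boolean:

```python
import yaml

# Unquoted "no" is a YAML 1.1 boolean; quoting keeps it a string.
print(yaml.safe_load("save_strategy: no"))    # {'save_strategy': False}
print(yaml.safe_load('save_strategy: "no"'))  # {'save_strategy': 'no'}

# Quoted ints parse as strings; bare ints parse as ints.
print(yaml.safe_load('cpu: "8"'))  # {'cpu': '8'}
print(yaml.safe_load("cpu: 8"))    # {'cpu': 8}
```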
Executor surfacing:

- `src/worker/executors/agent_executor.py` — honour `spec.agent.timeout` (default 600 s); requires a positive integer, rejecting floats outright so sub-second values like `0.5` don't silently truncate to `0`. Threaded through both the single-task and batch streaming paths.
- `src/worker/executors/ppo_executor.py` + PPO templates — adopt the canonical `training.report_to` and `training.project` field names directly (matching `PPOConfig`); a small helper normalises `null`/empty values to TRL's `"none"` sentinel (sketched after this list). `training.padding_side` was already wired through `build_hf_load_kwargs`; locked down with a test. A `TODO(rfc28-pr3)` next to the `kl_coef` read flags `training.target_kl`/`training.early_stopping` as intentionally unwired pending PR-3.
- `src/worker/executors/vllm_executor.py` — stamp `revision` from `spec.model_revision` (the canonical `model.source.revision`); drop the redundant `model.vllm.revision` lookup and the resolver helper, aligning vLLM with the other executors. `tokenizer_revision` stays under `model.vllm` since it's a distinct knob.
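A sketch of the `report_to` normalisation, with the helper name taken from the commits and the exact edge-case handling (e.g. a list with no string items) assumed:

```python
def _resolve_report_to(value: object) -> str | list[str]:
    """Normalise training.report_to for TRL's PPOConfig.

    null / "" disable logging via TRL's "none" sentinel; a string passes
    through unchanged; a list keeps only its str items.
    """
    if value is None or value == "":
        return "none"
    if isinstance(value, str):
        return value
    if isinstance(value, list):
        kept = [item for item in value if isinstance(item, str)]
        return kept if kept else "none"  # assumption: empty list disables too
    return "none"  # assumption: any other type also disables logging
```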
PPO save fix:

- `src/worker/executors/ppo_executor.py` — `_wrapped_save_model` unconditionally read `ppo_trainer.deepspeed`, but HF Trainer only sets that attribute when DeepSpeed is enabled. Single-GPU PPO therefore crashed on the first `_save_checkpoint` with `AttributeError`. The backup/restore now lives inside the existing `is_deepspeed_enabled` guard.
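A sketch of the fixed wrapper's shape; `original_save_model` and the exact work done around the save are illustrative, but the guard mirrors the description above:

```python
def _wrapped_save_model(ppo_trainer, original_save_model, *args, **kwargs):
    # Only touch ppo_trainer.deepspeed when DeepSpeed is actually enabled:
    # HF Trainer never sets the attribute otherwise, which is what crashed
    # single-GPU PPO with AttributeError on the first _save_checkpoint.
    if ppo_trainer.is_deepspeed_enabled:
        backup = ppo_trainer.deepspeed
        try:
            return original_save_model(*args, **kwargs)
        finally:
            ppo_trainer.deepspeed = backup
    return original_save_model(*args, **kwargs)
```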
Tests:

- `tests/worker/test_agent_timeout.py` — agent timeout resolution (int-only contract).
- `tests/worker/test_ppo_config_mapping.py` — `report_to`/`project`/`padding_side` wiring.

Design
The fields handled here split into two groups:
- Dead fields (`resources.replicas`, `spec.sloSeconds`, `spec.parallel`, `generation.do_sample`, `training.optimize_cuda_cache`): nothing in the runtime consumes them, so the templates simply stop advertising them.
- Supported but unwired: the backend (`PPOConfig`, vLLM engine, `AgentExecutor`) supports the knob, but the worker never threaded the template value through, or the template used a non-canonical name. The fix is a small, local change; where the legacy template names diverged from the backend (`log_with`, `tracker_project_name`, vLLM-scoped `revision`), we align on the backend name rather than maintaining an alias.

`target_kl`/`early_stopping` are a separate case — TRL doesn't take them directly, and FlowMesh-owned KL early-stopping is a non-trivial change that warrants its own PR. The TODO marker keeps the connection between templates and executor explicit without dropping the fields.

Test Plan
- `uv run pytest tests/server tests/shared tests/sdk tests/cli` plus the focused unit tests for the executor wiring and the existing template-parse coverage in `tests/server/task/test_template_validation.py`.
- `echo_local`, `inference_hf_tiny`, `inference_vllm_tiny`, `inference_vllm_all_args` (after the revision relocation), `lora_sft_llama`, `ppo_training_llama_1b` end-to-end on a single GPU worker.

Test Result
- `echo_local` — DONE.
- `inference_hf_tiny` — DONE; transformers path unchanged.
- `inference_vllm_tiny` — DONE; vLLM picked up the revision from `model.source.revision`.
- `inference_vllm_all_args` — DONE; confirms `model.source.revision` is the only revision source after the helper was removed.
- `lora_sft_llama` — DONE; `training_successful=True`, `final_lora_archive` produced (`padding_side` wiring exercised).
- `ppo_training_llama_1b` — DONE; `training_successful=True`, runtime ~122 s, ran past `save_steps=25`/`50`, confirming both the `PPOConfig` wiring and the `_save_checkpoint` fix.
- `agent_simple_test` — skipped; pre-existing requirement on the worker's `secrets.yaml` (`UTU_LLM_API_KEY`), unrelated to this PR.

Pre-submission Checklist
- Ran `pre-commit run --all-files` and fixed any issues.
- `uv run pytest tests/` passes locally.
- Dependencies are in sync with the frozen lockfile (`uv sync --all-packages --group ci --frozen`).
- If breaking, the PR is titled `[BREAKING]` and migration steps are described above.