Skip to content

change create rft command to use the selector#325

Merged
xzrderek merged 7 commits intomainfrom
derekx/fix-create-rft-command
Nov 10, 2025
Merged

change create rft command to use the selector#325
xzrderek merged 7 commits intomainfrom
derekx/fix-create-rft-command

Conversation

@xzrderek
Copy link
Contributor

@xzrderek xzrderek commented Nov 10, 2025

Note

Refactors create rft to use the upload selector, adds robust dataset inference (dataloader/input_dataset or builder), simplifies selection UX, and removes evaluator trace persistence.

  • CLI: create rft
    • Uses upload’s selector (_prompt_select) to pick exactly one evaluation test; derives evaluator_id and entry accordingly.
    • Adds _resolve_selected_test to map evaluator_id → source file and function.
    • Improves dataset inference: try dataloader JSONL, then input_dataset, then auto-detect and materialize a dataset builder.
    • Skips upload when evaluator exists; polls until ACTIVE before proceeding.
    • Cleans up last-used evaluator logic and related persistence.
  • CLI: upload
    • Simplifies interactive selection to single-select (arrow + enter); fallback prompt takes a single number.
    • Removes evaluator trace saving.
  • fireworks_rft
    • Removes evaluator trace load/save utilities and updates __all__.
  • Tests
    • Overhauls and adds cases for selector-driven flow, evaluator existence paths, dataset inference (dataloader/input_dataset/builder), and ambiguity handling.

Written by Cursor Bugbot for commit 2d75acf. This will update automatically on new commits. Configure here.

# Stub selector to return the single test; stub upload and polling
import eval_protocol.cli_commands.upload as upload_mod

monkeypatch.setattr(upload_mod, "_prompt_select", lambda tests, non_interactive=False: tests[:1])
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Imported Functions Evade Monkeypatch

The monkeypatch targets upload_mod._prompt_select, but create_rft.py imports _prompt_select directly into its namespace with from .upload import _prompt_select. This means the monkeypatch won't affect the function that create_rft_command actually calls. The patch should target cr._prompt_select instead to properly mock the imported function.

Fix in Cursor Fix in Web

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Trust Explicit Evaluator IDs for RFT.

When a user explicitly provides --evaluator-id for an existing evaluator and the evaluator check fails (network error or evaluator doesn't exist yet), the code errors out if multiple tests exist and the evaluator ID doesn't match any discovered test. This prevents users from creating RFT jobs for existing evaluators that were uploaded with custom IDs or from different projects. The code should trust the explicitly provided evaluator_id instead of requiring it to match a discovered test.

eval_protocol/cli_commands/create_rft.py#L420-L429

break
# If still unresolved and multiple tests exist, fail fast to avoid uploading unintended evaluators
if selected_entry is None and len(tests) > 1:
print(
f"Error: Multiple evaluation tests found, and the selected evaluator_id {evaluator_id} does not match any discovered test.\n"
" Please re-run specifying the evaluator id.\n"
" Hints:\n"
" - eval-protocol create rft --evaluator-id <existing-evaluator-id>\n"
)
return 1

Fix in Cursor Fix in Web


Bug: Auto-Extraction Forgets User's Test Choice

When auto-extracting dataset JSONL without --dataset-jsonl, the code re-discovers all tests and only extracts if exactly one test exists. However, if the user selected a specific test via the selector earlier (when --evaluator-id was not provided) from multiple tests, this logic fails because it doesn't use the originally selected test. The selected test information from lines 334-356 is not preserved, causing dataset extraction to fail when multiple tests exist even though the user already selected one.

eval_protocol/cli_commands/create_rft.py#L483-L506

# Prefer explicit --dataset-jsonl, else attempt to extract from data loader or input_dataset of the single discovered test
if not dataset_jsonl:
tests = _discover_tests(project_root)
if len(tests) == 1:
func_name = tests[0].qualname.split(".")[-1]
# Try data_loaders first (existing behavior)
dataset_jsonl = _extract_jsonl_from_dataloader(tests[0].file_path, func_name)
if dataset_jsonl:
# Display relative path for readability
try:
rel = os.path.relpath(dataset_jsonl, project_root)
except Exception:
rel = dataset_jsonl
print(f"✓ Using JSONL from data loader: {rel}")
else:
# Fall back to input_dataset (dataset_path)
dataset_jsonl = _extract_jsonl_from_input_dataset(tests[0].file_path, func_name)
if dataset_jsonl:
# Display relative path for readability
try:
rel = os.path.relpath(dataset_jsonl, project_root)
except Exception:
rel = dataset_jsonl
print(f"✓ Using JSONL from input_dataset: {rel}")

Fix in Cursor Fix in Web


@xzrderek xzrderek merged commit 270dc4e into main Nov 10, 2025
8 checks passed
@xzrderek xzrderek deleted the derekx/fix-create-rft-command branch November 10, 2025 08:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant