change create rft command to use the selector by xzrderek · Pull Request #325 · eval-protocol/python-sdk

xzrderek · 2025-11-10T07:08:46Z

Note

Refactors create rft to use the upload selector, adds robust dataset inference (dataloader/input_dataset or builder), simplifies selection UX, and removes evaluator trace persistence.

CLI: create rft
- Uses upload’s selector (_prompt_select) to pick exactly one evaluation test; derives evaluator_id and entry accordingly.
- Adds _resolve_selected_test to map evaluator_id → source file and function.
- Improves dataset inference: try dataloader JSONL, then input_dataset, then auto-detect and materialize a dataset builder.
- Skips upload when evaluator exists; polls until ACTIVE before proceeding.
- Cleans up last-used evaluator logic and related persistence.
CLI: upload
- Simplifies interactive selection to single-select (arrow + enter); fallback prompt takes a single number.
- Removes evaluator trace saving.
fireworks_rft
- Removes evaluator trace load/save utilities and updates __all__.
Tests
- Overhauls and adds cases for selector-driven flow, evaluator existence paths, dataset inference (dataloader/input_dataset/builder), and ambiguity handling.

^{Written by Cursor Bugbot for commit 2d75acf. This will update automatically on new commits. Configure here.}

eval_protocol/cli_commands/create_rft.py

tests/test_cli_create_rft_infer.py

cursor · 2025-11-10T07:25:00Z

tests/test_cli_create_rft_infer.py

+    # Stub selector to return the single test; stub upload and polling
+    import eval_protocol.cli_commands.upload as upload_mod
+
+    monkeypatch.setattr(upload_mod, "_prompt_select", lambda tests, non_interactive=False: tests[:1])


Bug: Imported Functions Evade Monkeypatch

The monkeypatch targets upload_mod._prompt_select, but create_rft.py imports _prompt_select directly into its namespace with from .upload import _prompt_select. This means the monkeypatch won't affect the function that create_rft_command actually calls. The patch should target cr._prompt_select instead to properly mock the imported function.

cursor

Bug: Trust Explicit Evaluator IDs for RFT.

When a user explicitly provides --evaluator-id for an existing evaluator and the evaluator check fails (network error or evaluator doesn't exist yet), the code errors out if multiple tests exist and the evaluator ID doesn't match any discovered test. This prevents users from creating RFT jobs for existing evaluators that were uploaded with custom IDs or from different projects. The code should trust the explicitly provided evaluator_id instead of requiring it to match a discovered test.

eval_protocol/cli_commands/create_rft.py#L420-L429

python-sdk/eval_protocol/cli_commands/create_rft.py

Lines 420 to 429 in 480fe2a

    
                       break 
        
           # If still unresolved and multiple tests exist, fail fast to avoid uploading unintended evaluators 
        
           if selected_entry is None and len(tests) > 1: 
        
               print( 
        
                   f"Error: Multiple evaluation tests found, and the selected evaluator_id {evaluator_id} does not match any discovered test.\n" 
        
                   "       Please re-run specifying the evaluator id.\n" 
        
                   "       Hints:\n" 
        
                   "         - eval-protocol create rft --evaluator-id <existing-evaluator-id>\n" 
        
               ) 
        
               return 1

Bug: Auto-Extraction Forgets User's Test Choice

When auto-extracting dataset JSONL without --dataset-jsonl, the code re-discovers all tests and only extracts if exactly one test exists. However, if the user selected a specific test via the selector earlier (when --evaluator-id was not provided) from multiple tests, this logic fails because it doesn't use the originally selected test. The selected test information from lines 334-356 is not preserved, causing dataset extraction to fail when multiple tests exist even though the user already selected one.

eval_protocol/cli_commands/create_rft.py#L483-L506

python-sdk/eval_protocol/cli_commands/create_rft.py

Lines 483 to 506 in 480fe2a

    
           # Prefer explicit --dataset-jsonl, else attempt to extract from data loader or input_dataset of the single discovered test 
        
           if not dataset_jsonl: 
        
               tests = _discover_tests(project_root) 
        
               if len(tests) == 1: 
        
                   func_name = tests[0].qualname.split(".")[-1] 
        
                   # Try data_loaders first (existing behavior) 
        
                   dataset_jsonl = _extract_jsonl_from_dataloader(tests[0].file_path, func_name) 
        
                   if dataset_jsonl: 
        
                       # Display relative path for readability 
        
                       try: 
        
                           rel = os.path.relpath(dataset_jsonl, project_root) 
        
                       except Exception: 
        
                           rel = dataset_jsonl 
        
                       print(f"✓ Using JSONL from data loader: {rel}") 
        
                   else: 
        
                       # Fall back to input_dataset (dataset_path) 
        
                       dataset_jsonl = _extract_jsonl_from_input_dataset(tests[0].file_path, func_name) 
        
                       if dataset_jsonl: 
        
                           # Display relative path for readability 
        
                           try: 
        
                               rel = os.path.relpath(dataset_jsonl, project_root) 
        
                           except Exception: 
        
                               rel = dataset_jsonl 
        
                           print(f"✓ Using JSONL from input_dataset: {rel}")

eval_protocol/cli_commands/create_rft.py

change create rft command to use the selector

fa46b85