
auto select evaluators correctly #323

Merged
xzrderek merged 4 commits into main from auto_select_evaluator_correctly
Nov 9, 2025

Contributor

@benjibc benjibc commented Nov 9, 2025

  • Changes made:

    • Added robust evaluator-id inference and interactive selection in eval_protocol/cli_commands/create_rft.py (handles last-used, project/home traces, multiple traces with interactive/--yes behavior, fallback to single test).
    • Persist last used evaluator id for seamless subsequent eval-protocol create rft runs.
    • Added tests covering all branches in tests/test_cli_create_rft_infer.py.
  • You can now run eval-protocol create rft without specifying --evaluator-id. It will:

    • Use the last selected evaluator if available.
    • Pick the only trace if just one exists.
    • Prompt to choose when multiple are available (or auto-pick most recent with --yes).
    • Fall back to a single discovered test when no traces exist.
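The selection order above can be sketched as a small pure function. This is an illustration only, not the code in `create_rft.py`: the function name `infer_evaluator_id`, its parameters, the `choose` callback standing in for the interactive prompt, and the assumption that `trace_ids` arrives ordered most recent first are all hypothetical.

```python
from typing import Callable, Optional, Sequence


def infer_evaluator_id(
    last_used: Optional[str],
    trace_ids: Sequence[str],  # assumption: ordered most recent first
    test_ids: Sequence[str],
    assume_yes: bool = False,
    choose: Optional[Callable[[Sequence[str]], str]] = None,
) -> Optional[str]:
    """Hypothetical sketch of the fallback chain described in the PR text."""
    # 1. Prefer the last selected evaluator when one was persisted.
    if last_used:
        return last_used
    # 2. A single trace is unambiguous.
    if len(trace_ids) == 1:
        return trace_ids[0]
    # 3. Several traces: prompt, or take the most recent when --yes is set
    #    (or when no prompt callback is available).
    if len(trace_ids) > 1:
        if assume_yes or choose is None:
            return trace_ids[0]
        return choose(trace_ids)
    # 4. No traces: fall back to the only discovered test, if exactly one exists.
    if len(test_ids) == 1:
        return test_ids[0]
    # Nothing could be inferred; the caller would need an explicit --evaluator-id.
    return None
```

Keeping the prompt behind a callback is one way to make every branch testable without a terminal; whether the real implementation is structured this way is not stated in this PR.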

Note

Adds robust evaluator auto-selection (last-used pointer or discovered traces, with an interactive prompt or a most-recent pick), persists the last evaluator, skips the upload when the evaluator already exists and polls until it is ACTIVE, and adds comprehensive tests.

  • CLI create_rft:
    • Evaluator inference: Auto-select via last-used pointer, project/home trace discovery, interactive prompt (or most-recent when --yes).
    • Persistence: Save last-used evaluator to .eval_protocol/last_evaluator.json after successful ACTIVE ensure.
    • Upload short-circuit: If evaluator exists (via GET), skip upload; poll until ACTIVE, with dashboard guidance on timeout.
    • Entry resolution: Map evaluator_id to discovered tests; fail fast if multiple tests and no match.
    • Dataset ID: _build_trimmed_dataset_id hardened to handle empty/non-alpha starts.
  • Tests:
    • Add tests/test_cli_create_rft_infer.py covering last-used loading/saving, trace selection (single/multiple, interactive/non-interactive), fallback to single test, end-to-end create_rft paths, and dataset-id derivation.
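As a rough illustration of the persistence step, the last-used pointer could be written and read as below. Only the path `.eval_protocol/last_evaluator.json` comes from the summary above; the helper names `save_last_evaluator` and `load_last_evaluator` and the `{"evaluator_id": ...}` JSON shape are assumptions made for this sketch.

```python
import json
from pathlib import Path
from typing import Optional

# Path taken from the PR summary; everything else here is hypothetical.
POINTER = Path(".eval_protocol") / "last_evaluator.json"


def save_last_evaluator(project_root: Path, evaluator_id: str) -> Path:
    """Write the pointer file; intended to be called once the evaluator is ACTIVE."""
    path = project_root / POINTER
    path.parent.mkdir(parents=True, exist_ok=True)
    # Assumed schema: a single "evaluator_id" key.
    path.write_text(json.dumps({"evaluator_id": evaluator_id}), encoding="utf-8")
    return path


def load_last_evaluator(project_root: Path) -> Optional[str]:
    """Return the saved evaluator id, or None if the file is missing or malformed."""
    path = project_root / POINTER
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):  # missing file, unreadable file, or invalid JSON
        return None
    value = data.get("evaluator_id") if isinstance(data, dict) else None
    return value if isinstance(value, str) and value else None
```

Treating a missing or corrupt pointer as "no last evaluator" lets inference fall through to trace discovery instead of failing the command.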

Written by Cursor Bugbot for commit cca18e6.

@xzrderek xzrderek merged commit c1df8b5 into main Nov 9, 2025
8 checks passed
@xzrderek xzrderek deleted the auto_select_evaluator_correctly branch November 9, 2025 03:28

2 participants