james-rl (Contributor) approved these changes on Mar 5, 2026 and left a comment:
Looks good. Some questions for you
src/commands/benchmark-job/status.ts
Outdated
// Polling config
const POLL_INTERVAL_MS = 10 * 1000; // 10 seconds
const MAX_WAIT_MS = 60 * 60 * 1000; // 1 hour
Contributor
Is this long enough? It looks like MAX_WAIT_MS caps the wait for the entire job to complete.
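To make the concern concrete, here is a minimal sketch of how a wait loop bounded by these two constants might behave. It is not the code under review: `waitForJob`, `fetchState`, and the `JobState` values are hypothetical names introduced for illustration, and the real status API may look different.

```typescript
// Hypothetical job states; the real API's status values may differ.
type JobState = "queued" | "running" | "completed" | "failed";

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Poll `fetchState` until the job reaches a terminal state or `maxWaitMs` elapses.
// The defaults mirror POLL_INTERVAL_MS and MAX_WAIT_MS from the diff.
async function waitForJob(
  fetchState: () => Promise<JobState>,
  pollIntervalMs: number = 10 * 1000,
  maxWaitMs: number = 60 * 60 * 1000,
): Promise<JobState | "timed_out"> {
  const deadline = Date.now() + maxWaitMs;
  while (true) {
    const state = await fetchState();
    if (state === "completed" || state === "failed") return state;
    // Give up rather than sleep past the deadline.
    if (Date.now() + pollIntervalMs > deadline) return "timed_out";
    await sleep(pollIntervalMs);
  }
}
```

With this shape, a job that legitimately runs longer than `maxWaitMs` is reported as timed out even though it is still progressing, which is why the cap has to cover the whole job and not a single trial.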
.option(
  "--scenarios <ids...>",
  "Scenario IDs to run (alternative to --benchmark)",
)
Contributor
Consider adding short flags -b and -s for --benchmark and --scenarios.
src/utils/commands.ts
Outdated
benchmarkJob
  .command("status <id>")
  .description("Get benchmark job status and results")
  .option("-w, --wait", "Wait for job to complete before showing results")
Contributor
I saw -w and assumed it meant "watch". I think this letter is confusing.
ross-rl pushed a commit that referenced this pull request on Mar 5, 2026
🤖 I have created a release *beep* *boop*

---

## [1.12.0](v1.11.2...v1.12.0) (2026-03-05)

### Features

* **benchmark:** add benchmark job run, status ([#142](#142)) ([80e26c1](80e26c1))
* **blueprint:** support blueprint create metadata ([#141](#141)) ([4579d91](4579d91))
* **cli:** add llms.txt ([#139](#139)) ([db21f81](db21f81))

### Bug Fixes

* using the new format for mcp-configs ([#132](#132)) ([9deeb1c](9deeb1c))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Description
rli bmj run - Run a benchmark job with an agent
rli bmj status - Get benchmark job status and results
Features
Auto-upsert secrets: Automatically creates BMJ_* secrets from environment variables
- E.g., ANTHROPIC_API_KEY → BMJ_ANTHROPIC_API_KEY
- Skips creation if secret already exists
- Logs all secret operations
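The upsert behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: the `SecretStore` interface, `upsertSecrets`, and the choice to skip unset variables are all hypothetical, and only the `BMJ_` prefix and the skip-if-exists rule come from the description.

```typescript
// Hypothetical secret store; the real client API is not shown in this PR description.
interface SecretStore {
  has(name: string): boolean;
  create(name: string, value: string): void;
}

const SECRET_PREFIX = "BMJ_";

// Create a BMJ_* secret for each listed env var that is set and not already stored.
// Returns the names of the secrets that were created.
function upsertSecrets(
  envVarNames: string[],
  env: Record<string, string | undefined>,
  store: SecretStore,
  log: (msg: string) => void = console.log,
): string[] {
  const created: string[] = [];
  for (const name of envVarNames) {
    const secretName = SECRET_PREFIX + name;
    const value = env[name];
    if (!value) {
      // Assumption: unset variables are skipped rather than treated as errors.
      log(`skip ${secretName}: ${name} is not set`);
      continue;
    }
    if (store.has(secretName)) {
      log(`skip ${secretName}: secret already exists`);
      continue;
    }
    store.create(secretName, value);
    log(`created ${secretName}`);
    created.push(secretName);
  }
  return created;
}
```

Passing `env` and `store` as parameters keeps the sketch testable without touching real environment variables or a real secrets backend.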
Agent configurations with automatic env var handling:
| Agent | Env Vars | Required |
|-------------|---------------------------------------------------|-----------|
| claude-code | ANTHROPIC_API_KEY, CLAUDE_CODE_OAUTH_TOKEN | Yes (any) |
| codex | OPENAI_API_KEY | Yes |
| opencode | ANTHROPIC_API_KEY, OPENAI_API_KEY, GOOGLE_API_KEY | No |
| goose | ANTHROPIC_API_KEY, OPENAI_API_KEY, GOOGLE_API_KEY | No |
| gemini-cli | GEMINI_API_KEY, GOOGLE_API_KEY | Yes (any) |
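One way to read the table is as a lookup from agent name to an env var requirement. The sketch below encodes it that way; `AGENT_ENV`, `checkAgentEnv`, and the `"any" | "all" | "none"` encoding are assumptions made for illustration (in particular, "Yes" for codex is modeled as "all", which coincides with "any" for a single variable).

```typescript
// "any": at least one listed variable must be set; "all": every one; "none": all optional.
type Requirement = "any" | "all" | "none";

interface AgentEnvSpec {
  envVars: string[];
  required: Requirement;
}

// Values transcribed from the table above; the encoding itself is hypothetical.
const AGENT_ENV: Record<string, AgentEnvSpec> = {
  "claude-code": { envVars: ["ANTHROPIC_API_KEY", "CLAUDE_CODE_OAUTH_TOKEN"], required: "any" },
  codex: { envVars: ["OPENAI_API_KEY"], required: "all" },
  opencode: { envVars: ["ANTHROPIC_API_KEY", "OPENAI_API_KEY", "GOOGLE_API_KEY"], required: "none" },
  goose: { envVars: ["ANTHROPIC_API_KEY", "OPENAI_API_KEY", "GOOGLE_API_KEY"], required: "none" },
  "gemini-cli": { envVars: ["GEMINI_API_KEY", "GOOGLE_API_KEY"], required: "any" },
};

// Report which of an agent's variables are set and whether its requirement is met.
function checkAgentEnv(
  agent: string,
  env: Record<string, string | undefined>,
): { ok: boolean; present: string[]; missing: string[] } {
  if (!Object.prototype.hasOwnProperty.call(AGENT_ENV, agent)) {
    throw new Error(`Unknown agent: ${agent}`);
  }
  const spec = AGENT_ENV[agent];
  const present = spec.envVars.filter((name) => Boolean(env[name]));
  const missing = spec.envVars.filter((name) => !env[name]);
  let ok = true;
  if (spec.required === "any") ok = present.length > 0;
  if (spec.required === "all") ok = missing.length === 0;
  return { ok, present, missing };
}
```

For example, `checkAgentEnv("claude-code", { CLAUDE_CODE_OAUTH_TOKEN: "t" })` reports `ok: true` because either variable satisfies the "Yes (any)" rule.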
Benchmark resolution: Searches both list and listPublic endpoints when resolving benchmark names
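A rough sketch of that resolution step follows. The description only says both endpoints are searched, so the lookup order (own benchmarks first, then public), the matching on either ID or name, and the names `Benchmark`, `Lister`, and `resolveBenchmark` are assumptions; pagination is ignored.

```typescript
// Hypothetical benchmark shape and endpoint wrappers.
interface Benchmark {
  id: string;
  name: string;
}
type Lister = () => Promise<Benchmark[]>;

// Look for a benchmark by ID or name, first in the caller's list, then in the public list.
async function resolveBenchmark(
  nameOrId: string,
  listOwn: Lister,
  listPublic: Lister,
): Promise<Benchmark | undefined> {
  for (const lister of [listOwn, listPublic]) {
    const items = await lister();
    const matches = items.filter((b) => b.id === nameOrId || b.name === nameOrId);
    if (matches.length > 0) return matches[0];
  }
  return undefined;
}
```

Returning `undefined` leaves it to the caller to decide how to report an unknown benchmark.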
Default orchestrator config: n_concurrent_trials=10, n_attempts=1, timeout_multiplier=1.0, quiet=false
Default agent timeout: 1800 seconds (30 minutes)
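The defaults listed above could be expressed as a config object merged with user overrides. The field names and values are taken from the description; `OrchestratorConfig`, `buildOrchestratorConfig`, and the merge strategy are hypothetical.

```typescript
interface OrchestratorConfig {
  n_concurrent_trials: number;
  n_attempts: number;
  timeout_multiplier: number;
  quiet: boolean;
}

// Default values as stated in the PR description.
const DEFAULT_ORCHESTRATOR_CONFIG: OrchestratorConfig = {
  n_concurrent_trials: 10,
  n_attempts: 1,
  timeout_multiplier: 1.0,
  quiet: false,
};

const DEFAULT_AGENT_TIMEOUT_SECONDS = 1800; // 30 minutes

// Overlay caller-supplied overrides on the defaults without mutating them.
function buildOrchestratorConfig(
  overrides: Partial<OrchestratorConfig> = {},
): OrchestratorConfig {
  return { ...DEFAULT_ORCHESTRATOR_CONFIG, ...overrides };
}
```

For instance, `buildOrchestratorConfig({ n_attempts: 3 })` keeps the other three defaults and changes only the attempt count.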