Merged
2 changes: 1 addition & 1 deletion docs/guides/about_the_framework.md
@@ -39,7 +39,7 @@ The Python CLI entry point is `run-evals`, defined in
run-evals --json ./eval_set.json

# Mode 2: Direct CLI arguments (what you used in Part 1)
-run-evals --task question_answer --model google/gemini-2.0-flash --dataset samples.json
+run-evals --task question_answer --model google/gemini-2.5-flash --dataset samples.json
```

### JSON runner
2 changes: 1 addition & 1 deletion docs/guides/using_the_cli.md
@@ -35,7 +35,7 @@ my-project/
- The starter task uses `func: analyze_codebase` — fine for a smoke test, but
you'll want to change `func` to match your eval type (`question_answer`,
`bug_fix`, `code_gen`, etc.)
-- The job defaults to `google/gemini-2.0-flash`. Update `models:` to the
+- The job defaults to `google/gemini-2.5-flash`. Update `models:` to the
provider(s) you want to test.
- `files` points at `../../` (your project root). Update if your workspace
lives elsewhere.
2 changes: 1 addition & 1 deletion packages/devals_cli/example/evals/jobs/local_dev.yaml
@@ -40,7 +40,7 @@
# Which models to evaluate. Format: "provider/model-name"
# If omitted, falls back to DEFAULT_MODELS from the Python registries.
models:
-  - google/gemini-2.0-flash
+  - google/gemini-2.5-flash
Review comment from a contributor, marked high:

The model google/gemini-2.5-flash is likely a typo for google/gemini-2.0-flash. Using an incorrect model name in the example configuration will lead to errors for users trying to run the example.

  - google/gemini-2.0-flash


# =============================================================================
# VARIANTS (Optional)
2 changes: 1 addition & 1 deletion packages/devals_cli/lib/src/commands/init_command.dart
@@ -67,7 +67,7 @@ class InitCommand extends Command<int> {
File(jobPath).writeAsStringSync(
initJobTemplate(
name: 'local_dev',
-models: ['google/gemini-2.0-flash'],
+models: ['google/gemini-2.5-flash'],
tasks: ['get_started'],
),
);