Should agent completion be evaluated from final response or external evidence? #1673
Unanswered
productmakerjason
asked this question in
Q&A
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
Hi OpenAI Evals community — I’m testing an agent-eval boundary:
a model/agent saying “completed” is not the same as verified completion.
For agent workflows, what should an eval treat as the source of truth?
Final response, tool trace, target-system state, external scorer, or receipt-like evidence?
I’m trying to understand whether “completion proof” is better modeled as an eval template or app-level harness logic.
Beta Was this translation helpful? Give feedback.
All reactions