fix(github): Resolve action targets before defaults #285
Teach the prompt and GitHub skills that explicit provider targets win while ambient configuration fills omitted targets. Add fake-repo evals for contextual references and explicit issue targets without exercising live GitHub resources.

Refs GH-284

Co-Authored-By: GPT-5 Codex <noreply@openai.com>
  "GitHub issue creation from a multi-user Slack thread preserves the original reporter separately from the action requester.",
  pass: [
    "The assistant posts exactly one reply.",
-   "The reply reports a created GitHub issue in getsentry/warden with an issue URL or issue number.",
+   "The reply reports a created GitHub issue in getsentry/junior-eval-warden-never-exists with an issue URL or issue number.",
🔴 Attribution eval expects successful issue creation against a nonexistent repository
The PR changed the attribution test's target repo from getsentry/warden (a real repo) to getsentry/junior-eval-warden-never-exists (explicitly nonexistent, per the -never-exists naming convention). However, the pass criteria at line 112 still require "The reply reports a created GitHub issue in getsentry/junior-eval-warden-never-exists with an issue URL or issue number." Since the repo doesn't exist, gh issue create --repo getsentry/junior-eval-warden-never-exists will fail with a "not found" error from the GitHub API, causing the assistant to report an error instead of a created issue. The eval will therefore always fail this criterion.
Contrast with the two new tests added in this PR
The two new tests (lines 330–419) correctly handle fake repos by explicitly requiring that no GitHub commands are run: "observed_tool_invocations does not include a bash invocation with gh issue create" and "Do not run GitHub commands against either fake repo." The attribution test doesn't follow this pattern — it still expects the issue to be created successfully.
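The quoted criteria read as natural-language checks, so how they are actually evaluated is not shown here. If the "no gh issue create" condition were checked programmatically, a minimal sketch might look like the following. The `ToolInvocation` shape and the `ranGhIssueCreate` helper are assumptions made for illustration; they are not taken from the PR or the eval harness.

```typescript
// Hypothetical shape for one recorded tool call; the real harness may differ.
interface ToolInvocation {
  tool: string;    // e.g. "bash"
  command: string; // the command line the assistant ran
}

// Matches "gh issue create" with any amount of whitespace between the words.
const GH_ISSUE_CREATE = /\bgh\s+issue\s+create\b/;

// Returns true when any bash invocation attempted to create an issue.
function ranGhIssueCreate(invocations: ToolInvocation[]): boolean {
  return invocations.some(
    (inv) => inv.tool === "bash" && GH_ISSUE_CREATE.test(inv.command),
  );
}

// Usage: a run that only lists files passes the "no issue creation" check.
const observed: ToolInvocation[] = [{ tool: "bash", command: "ls -la" }];
console.log(ranGhIssueCreate(observed)); // false
```

A check like this only inspects recorded commands, so it works the same whether or not the target repo exists.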
(Refers to lines 96-112)
Prompt for agents
The attribution eval at line 66 was changed to use fake/nonexistent repos (getsentry/junior-eval-warden-never-exists and getsentry/junior-eval-ops-reference-never-exists) to comply with the new eval quality rule about not hitting real external targets. However, the pass criteria still expect a successfully created GitHub issue (line 112: 'The reply reports a created GitHub issue in getsentry/junior-eval-warden-never-exists with an issue URL or issue number').
Since gh issue create will fail against a nonexistent repo, this eval needs to be restructured. Two approaches:
1. Similar to the two new target-classification tests (lines 330-419): reword the user prompt to ask for a draft issue instead of actual creation, add a criterion that no gh commands are run, and validate the attribution from the draft content.
2. Keep the test using a real repo (like getsentry/junior) since attribution verification genuinely requires live issue creation, and document it as an eval that explicitly opts into live provider access (which the README rule allows).
Approach 1 is preferred since it fully aligns with the new eval quality rule, but it requires reworking the user prompt and criteria to test attribution from draft content rather than a created issue.
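As a rough illustration of approach 1, the attribution eval could be restated along these lines. Only the `pass` list appears in the diff; the `EvalCase` interface, the `description` and `prompt` field names, the prompt wording, and the third criterion are all assumptions, so treat this as a sketch rather than the repo's actual eval schema.

```typescript
// Hypothetical eval shape: only the `pass` list is visible in the diff.
interface EvalCase {
  description: string;
  prompt: string;
  pass: string[];
}

const attributionDraftEval: EvalCase = {
  description:
    "GitHub issue drafting from a multi-user Slack thread preserves the original reporter separately from the action requester.",
  // Assumed wording: ask for a draft so no provider mutation is needed.
  prompt:
    "Draft, but do not file, a GitHub issue for getsentry/junior-eval-warden-never-exists from this thread.",
  pass: [
    "The assistant posts exactly one reply.",
    // Same pattern the two new fake-repo tests use.
    "observed_tool_invocations does not include a bash invocation with gh issue create.",
    // Attribution is validated from the draft text, not from a created issue.
    "The reply contains a draft issue that credits the original reporter separately from the action requester.",
  ],
};
```

With this shape, no criterion depends on the nonexistent repo accepting a write.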
Clarify GitHub target resolution so explicit repo or issue targets win while channel defaults fill omitted targets. This addresses the config ambiguity from GH-284 without adding broad prompt prose.
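The PR applies this rule through prompt and skill wording, not code, but the precedence it describes can be sketched as a small function. Everything named below (`TargetRequest`, `ChannelDefaults`, `ResolvedTarget`, `resolveTarget`, and the example repo names) is hypothetical and exists only to illustrate "explicit wins, defaults fill what was omitted".

```typescript
// All names here are hypothetical; they are not taken from the PR.
interface TargetRequest {
  repo?: string;  // "owner/name" stated explicitly in the request, if any
  issue?: number; // issue number stated explicitly in the request, if any
}

interface ChannelDefaults {
  repo?: string; // ambient default configured for the channel
}

interface ResolvedTarget {
  repo: string | undefined;
  issue: number | undefined;
  repoSource: "explicit" | "default" | "none";
}

// Explicit targets win; the channel default is used only when the
// request left the repo out.
function resolveTarget(request: TargetRequest, defaults: ChannelDefaults): ResolvedTarget {
  if (request.repo !== undefined) {
    return { repo: request.repo, issue: request.issue, repoSource: "explicit" };
  }
  if (defaults.repo !== undefined) {
    return { repo: defaults.repo, issue: request.issue, repoSource: "default" };
  }
  return { repo: undefined, issue: request.issue, repoSource: "none" };
}

// Usage: an explicit repo overrides the channel default.
const resolved = resolveTarget({ repo: "acme/explicit", issue: 7 }, { repo: "acme/default" });
console.log(resolved.repo, resolved.repoSource); // acme/explicit explicit
```

Recording where the repo came from (`repoSource`) is a design choice of this sketch; it makes the explicit-versus-default decision easy to assert on.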
Fake Repo Evals
Adds two fake/nonexistent repo evals covering contextual references versus explicit issue targets, and documents that evals should avoid live external URLs or provider mutations unless they explicitly opt in.
Refs #284