
fix(github): Resolve action targets before defaults #285

Merged

dcramer merged 1 commit into main from fix/github-config-target-semantics on May 5, 2026

dcramer commented May 5, 2026

Clarify GitHub target resolution so explicit repo or issue targets win while channel defaults fill omitted targets. This addresses the config ambiguity from GH-284 without adding broad prompt prose.
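The precedence rule reads most clearly as a small resolver. The sketch below is hypothetical: `ExplicitTarget`, `ChannelDefaults`, and `resolveTarget` are illustrative names, not Junior's actual types or functions. It assumes a channel default only supplies a repo, and it shows an explicit repo always winning while the channel default fills the repo only when the request omits one.

```typescript
// Hypothetical shapes for illustration; not Junior's real config types.
interface ExplicitTarget {
  repo?: string;        // "owner/name" named directly in the request
  issueNumber?: number; // issue named directly in the request
}

interface ChannelDefaults {
  repo?: string;        // ambient repo configured for the channel
}

interface ResolvedTarget {
  repo: string;
  issueNumber?: number;
  repoSource: "explicit" | "channel-default";
}

// Explicit targets always win; the channel default only fills a repo the
// request left out. Returns undefined when no repo can be determined, so
// the caller can ask the user instead of guessing.
function resolveTarget(
  explicit: ExplicitTarget,
  defaults: ChannelDefaults,
): ResolvedTarget | undefined {
  if (explicit.repo) {
    return { repo: explicit.repo, issueNumber: explicit.issueNumber, repoSource: "explicit" };
  }
  if (defaults.repo) {
    // Only the repo is borrowed from the default; an issue number must
    // still come from the request itself.
    return { repo: defaults.repo, issueNumber: explicit.issueNumber, repoSource: "channel-default" };
  }
  return undefined;
}
```

For example, `resolveTarget({ issueNumber: 12 }, { repo: "example-org/default" })` targets issue 12 in the channel's default repo, while a request that names `example-org/explicit` ignores the default entirely.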

Fake Repo Evals

Adds two fake/nonexistent repo evals covering contextual references versus explicit issue targets, and documents that evals should avoid live external URLs or provider mutations unless they explicitly opt in.

Refs #284
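One way to enforce the "no live targets unless opted in" rule is a lint over eval cases. This is a hypothetical sketch: the `EvalCase` shape, the `allowLiveProvider` flag, and `liveTargetViolations` are assumptions, and only the `-never-exists` suffix comes from the repo names used in this PR. The `owner/name` match is a heuristic and would also flag ordinary text such as "and/or".

```typescript
// Hypothetical eval-case shape; the real eval harness may differ.
interface EvalCase {
  name: string;
  prompt: string;
  allowLiveProvider?: boolean; // assumed explicit opt-in for live GitHub access
}

// Fake repos use the "-never-exists" suffix seen in this PR's evals.
const FAKE_REPO_SUFFIX = "-never-exists";
const URL_RE = /https?:\/\/\S+/g;
const REPO_RE = /\b[\w.-]+\/([\w.-]+)\b/g;

// Return live-looking targets that a case references without opting in.
function liveTargetViolations(c: EvalCase): string[] {
  if (c.allowLiveProvider) return [];
  const violations: string[] = [];
  // Any http(s) URL counts as a live external URL.
  for (const m of c.prompt.matchAll(URL_RE)) violations.push(m[0]);
  // Strip URLs so their paths are not re-reported as owner/name pairs.
  const withoutUrls = c.prompt.replace(URL_RE, " ");
  for (const m of withoutUrls.matchAll(REPO_RE)) {
    const repoName = m[1] ?? "";
    if (!repoName.endsWith(FAKE_REPO_SUFFIX)) violations.push(m[0]);
  }
  return violations;
}
```

A case that only mentions `getsentry/junior-eval-warden-never-exists` passes, while one naming `getsentry/warden` is flagged unless it sets the opt-in flag.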

Teach the prompt and GitHub skills that explicit provider targets win while ambient configuration fills omitted targets. Add fake-repo evals for contextual references and explicit issue targets without exercising live GitHub resources.

Refs GH-284
Co-Authored-By: GPT-5 Codex <noreply@openai.com>
vercel Bot commented May 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| junior-docs | Ready | Preview, Comment | May 5, 2026 7:31pm |


dcramer marked this pull request as ready for review May 5, 2026 19:38
dcramer merged commit 129a533 into main May 5, 2026
13 of 14 checks passed
dcramer deleted the fix/github-config-target-semantics branch May 5, 2026 19:38
devin-ai-integration Bot left a comment

Devin Review found 1 potential issue.

View 4 additional findings in Devin Review.


Comment on lines 109 to +112
  "GitHub issue creation from a multi-user Slack thread preserves the original reporter separately from the action requester.",
  pass: [
  "The assistant posts exactly one reply.",
- "The reply reports a created GitHub issue in getsentry/warden with an issue URL or issue number.",
+ "The reply reports a created GitHub issue in getsentry/junior-eval-warden-never-exists with an issue URL or issue number.",

🔴 Attribution eval expects successful issue creation against a nonexistent repository

The PR changed the attribution test's target repo from getsentry/warden (a real repo) to getsentry/junior-eval-warden-never-exists (explicitly nonexistent, per the -never-exists naming convention). However, the pass criteria at line 112 still require "The reply reports a created GitHub issue in getsentry/junior-eval-warden-never-exists with an issue URL or issue number." Since the repo doesn't exist, gh issue create --repo getsentry/junior-eval-warden-never-exists will fail with a "not found" error from the GitHub API, causing the assistant to report an error instead of a created issue. The eval will therefore always fail this criterion.

Contrast with the two new tests added in this PR

The two new tests (lines 330–419) correctly handle fake repos by explicitly requiring that no GitHub commands are run: "observed_tool_invocations does not include a bash invocation with gh issue create" and "Do not run GitHub commands against either fake repo." The attribution test doesn't follow this pattern — it still expects the issue to be created successfully.

(Refers to lines 96-112)
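The "no `gh issue create`" criterion that the new tests rely on can be checked mechanically. In this sketch, `observed_tool_invocations` is the name quoted in the review, but the `ToolInvocation` record shape and the `ranGhIssueCreate` helper are assumptions for illustration; the real harness may store commands differently.

```typescript
// Hypothetical record shape for entries in observed_tool_invocations.
interface ToolInvocation {
  tool: string;                // e.g. "bash"
  input: { command?: string }; // shell command for bash invocations
}

// True if any bash invocation ran `gh issue create`, regardless of repo,
// flags, or extra whitespace between the subcommand words.
function ranGhIssueCreate(invocations: ToolInvocation[]): boolean {
  return invocations.some(
    (inv) =>
      inv.tool === "bash" &&
      /\bgh\s+issue\s+create\b/.test(inv.input.command ?? ""),
  );
}
```

Read-only commands such as `gh issue view` do not trip the check, which keeps it aligned with the criterion's intent of forbidding provider mutations rather than all GitHub access.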

Prompt for agents
The attribution eval at line 66 was changed to use fake/nonexistent repos (getsentry/junior-eval-warden-never-exists and getsentry/junior-eval-ops-reference-never-exists) to comply with the new eval quality rule about not hitting real external targets. However, the pass criteria still expect a successfully created GitHub issue (line 112: 'The reply reports a created GitHub issue in getsentry/junior-eval-warden-never-exists with an issue URL or issue number').

Since gh issue create will fail against a nonexistent repo, this eval needs to be restructured. Two approaches:

1. Similar to the two new target-classification tests (lines 330-419): reword the user prompt to ask for a draft issue instead of actual creation, add a criterion that no gh commands are run, and validate the attribution from the draft content.

2. Keep the test using a real repo (like getsentry/junior) since attribution verification genuinely requires live issue creation, and document it as an eval that explicitly opts into live provider access (which the README rule allows).

Approach 1 is preferred since it fully aligns with the new eval quality rule, but it requires reworking the user prompt and criteria to test attribution from draft content rather than a created issue.
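The contradiction Devin flagged (a fake target combined with a "created issue" expectation) can also be caught before the eval runs. The sketch below is hypothetical: the `EvalCase` shape beyond `pass`, the `contradictoryCriteria` helper, and the reworded draft case are illustrative, not the fix that was merged. It shows one possible shape of approach 1, where attribution is checked from draft content and no issue is filed.

```typescript
// Hypothetical eval-case shape; field names other than `pass` are assumptions.
interface EvalCase {
  name: string;
  prompt: string;
  pass: string[];
}

const FAKE_REPO = /\b[\w.-]+\/[\w.-]+-never-exists\b/;
const EXPECTS_CREATION = /\bcreated GitHub issue\b/i;

// Flags criteria that expect a created issue in a case that targets a fake
// repo; such criteria can never pass because `gh issue create` fails there.
function contradictoryCriteria(c: EvalCase): string[] {
  const text = c.prompt + "\n" + c.pass.join("\n");
  if (!FAKE_REPO.test(text)) return [];
  return c.pass.filter((p) => EXPECTS_CREATION.test(p));
}

// Illustrative approach-1 rewrite: attribution is judged from a draft.
const attributionDraft: EvalCase = {
  name: "GitHub issue draft from a multi-user Slack thread preserves the original reporter separately from the action requester.",
  prompt: "Draft, but do not file, a GitHub issue in getsentry/junior-eval-warden-never-exists for the bug reported above.",
  pass: [
    "The assistant posts exactly one reply.",
    "The reply contains a draft issue for getsentry/junior-eval-warden-never-exists.",
    "The draft credits the original reporter separately from the requester.",
    "observed_tool_invocations does not include a bash invocation with gh issue create",
  ],
};
```

Run against the criterion from lines 109 to 112, `contradictoryCriteria` would return that line, while the draft-based case returns an empty list.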