Skip to content

test: pin AI-review workflow Codex-action contract#528

Merged
igerber merged 4 commits into
mainfrom
test/openai-review-workflow-contract
Jun 6, 2026
Merged

test: pin AI-review workflow Codex-action contract#528
igerber merged 4 commits into
mainfrom
test/openai-review-workflow-contract

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented Jun 6, 2026

Summary

  • Add TestWorkflowCodexActionContract to tests/test_openai_review.py (7 tests) pinning the AI-review workflow (.github/workflows/ai_pr_review.yml) Codex-action wiring as producer↔consumer couplings, not just string presence:
    • openai/codex-action@v1 pin + prompt-file input (scoped to the Run Codex step)
    • compiled-prompt path agrees between the build step (PROMPT=) and the action input (prompt-file:)
    • final-message output wired from the id:-tagged Codex step → CODEX_FINAL_MESSAGE env → JS body of the post-comment step (the ref id must equal the Codex step's actual id:)
    • all three unified-diff exclude pathspecs (real-data JSON/CSV + notebook .ipynb)
    • comment markers + the invariant that the prev-review fetch filter is a prefix shared by the canonical and rerun markers
  • Each assertion binds to its specific step block (via a _step_block helper mirroring the sibling TestWorkflowDoesNotExecutePRHeadCode._extract_step_block) and anchors to the live key line / JS assignment (start-of-line, MULTILINE), so a literal left in a same-step comment can't make the pin pass spuriously.
  • Close the TODO row "AI review CI: pin workflow contract via test" (Revert AI review CI to Codex + gpt-5.4 (reverts #404, #415) #416): removed from the Testing/Docs table and the Tier B backlog.

Methodology references (required if estimator / math changes)

  • N/A — no estimator / math / methodology changes. Test + TODO only.

Validation

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

🤖 Generated with Claude Code

igerber and others added 3 commits June 6, 2026 07:51
Add TestWorkflowCodexActionContract to tests/test_openai_review.py pinning
the load-bearing wiring of .github/workflows/ai_pr_review.yml as
producer<->consumer couplings (not just string presence):

- openai/codex-action@v1 pin + prompt-file input
- compiled-prompt path agrees between the build step (PROMPT=) and the
  action input (prompt-file:)
- final-message output flows from the id:-tagged Codex step into the
  post-comment step (broken id => blank comment, caught)
- the three unified-diff exclude pathspecs (real-data JSON/CSV + .ipynb)
- comment markers + the invariant that the prev-review fetch filter is a
  prefix shared by both the canonical and rerun markers

These cover the elements not already guarded by TestWorkflowPromptHardening
(wrapper/close-tag sanitization), TestWorkflowCommentPosting (rerun gates),
or TestWorkflowDoesNotExecutePRHeadCode (sandbox: read-only).

Closes the TODO row "AI review CI: pin workflow contract via test" (#416);
removes it from the Testing/Docs table and the Tier B backlog.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Address local AI-review feedback on the new TestWorkflowCodexActionContract:

- P2: assertions scanned the whole workflow text, so drift could pass if a
  literal appeared in a comment or an unrelated step. Bind each assertion to
  the specific step block (Run Codex / Build review prompt / Post PR comment /
  Fetch previous AI review) via a `_step_block` helper that mirrors the sibling
  TestWorkflowDoesNotExecutePRHeadCode._extract_step_block. The final-message
  test now asserts the post-comment ref id equals the Codex step's actual `id:`
  (not just "some `id:` exists somewhere"). Verified: a stray action pin in a
  comment with a different action in the real step now fails (global check
  would have passed).

- P3: reworded the class docstring "Tracks TODO.md ..." -> "Covers the former
  TODO.md item ..." since the row is removed in the same change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Address local AI-review R2-P2 (test robustness): within a step block the
assertions still used bare substrings, so a literal left in a same-step
comment after the real key was commented out / moved could pass spuriously.

Anchor each check to the live key line / JS assignment (start-of-line,
MULTILINE): `^\s*uses: openai/codex-action@v1$`, `^\s*prompt-file: ...$`,
`^\s*PROMPT=...$`, the `^\s*CODEX_FINAL_MESSAGE: ${{ steps.<id>.outputs.
final-message }}$` env mapping (plus a check that the JS reads
process.env.CODEX_FINAL_MESSAGE), `const marker =`/`const rerunMarker =`
JS assignments, and the `.includes("<!-- ai-pr-review:codex:")` filter.
The pathspec check now requires a live (non-comment) command line.

Verified: a commented-out action pin / marker that the bare-substring
check accepted now fails the anchored check.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 6, 2026

Overall Assessment

✅ Looks good

Executive Summary

  • No estimator, math, weighting, variance/SE, identification, or default-behavior changes.
  • Methodology impact: none; docs/methodology/REGISTRY.md does not apply to these test/TODO-only changes.
  • The new workflow contract tests are scoped to the intended workflow steps and cover the former TODO item.
  • TODO cleanup is consistent: the workflow-contract test row was removed from both the Testing/Docs table and Tier B backlog.
  • I could not execute the tests because pytest is not installed in this environment.

Methodology

No findings.

Severity: N/A
Impact: No causal estimators or methodology contracts are changed.
Concrete fix: N/A

Code Quality

No P0/P1 findings.

Severity: P3
Impact: Minor test-hardening opportunity: tests/test_openai_review.py:L2574-L2578 checks process.env.CODEX_FINAL_MESSAGE with a bare substring, while nearby comments say assertions are anchored to live lines. A same-step JS comment containing that literal could satisfy the assertion.
Concrete fix: Replace the substring assertion with a multiline regex anchored to the live assignment, e.g. matching const msg = (process.env.CODEX_FINAL_MESSAGE || "").trim();.

Performance

No findings.

Severity: N/A
Impact: Test-only changes; no library runtime path affected.
Concrete fix: N/A

Maintainability

No findings.

Severity: N/A
Impact: The helper mirrors existing step-block extraction style and keeps assertions localized.
Concrete fix: N/A

Tech Debt

No findings.

Severity: N/A
Impact: The removed TODO row appears to be addressed by the new TestWorkflowCodexActionContract coverage.
Concrete fix: N/A

Security

No findings.

Severity: N/A
Impact: No secrets or production workflow behavior changes were introduced.
Concrete fix: N/A

Documentation/Tests

No blocking findings.

Severity: P3
Impact: Targeted verification could not be run here: pytest is unavailable (pytest: command not found). Static inspection shows the new tests match the checked-out workflow blocks.
Concrete fix: Ensure CI runs tests/test_openai_review.py::TestWorkflowCodexActionContract.

CI Codex R1 P3: this was the one assertion still using a bare substring
while the class comment claims all checks anchor to live lines. Anchor it
to the live `const msg = (process.env.CODEX_FINAL_MESSAGE ...` JS
assignment (start-of-line, MULTILINE) for internal consistency — a
same-step JS comment containing the literal no longer satisfies it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 6, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 784f3a17fa08d8e5f56e8ff2af88276d9b945979


Overall Assessment

✅ Looks good

Executive Summary

  • No estimator, math, weighting, variance/SE, identification, or default-behavior changes.
  • Methodology Registry review found no affected causal method; this PR is test/TODO-only.
  • Prior P3 concern is resolved: the CODEX_FINAL_MESSAGE JS read is now anchored to the live assignment at tests/test_openai_review.py:L2577-L2580.
  • The workflow contract tests bind to specific step blocks and match the current workflow wiring.
  • The TODO cleanup is consistent with the added coverage.
  • I could not execute pytest locally because pytest is not installed.

Methodology

No findings.

Severity: N/A
Impact: No methodology surface is changed.
Concrete fix: N/A

Code Quality

No findings.

Severity: N/A
Impact: The added helper and assertions are scoped to named workflow blocks, matching the existing pattern in TestWorkflowDoesNotExecutePRHeadCode.
Concrete fix: N/A

Performance

No findings.

Severity: N/A
Impact: Test-only changes; no runtime path affected.
Concrete fix: N/A

Maintainability

No findings.

Severity: N/A
Impact: The new tests make the workflow producer/consumer couplings explicit and reduce future drift risk.
Concrete fix: N/A

Tech Debt

No findings.

Severity: N/A
Impact: The removed TODO row is now covered by TestWorkflowCodexActionContract, and the backlog no longer lists the completed item.
Concrete fix: N/A

Security

No findings.

Severity: N/A
Impact: No secrets or workflow permission changes are introduced.
Concrete fix: N/A

Documentation/Tests

No blocking findings.

Severity: P3
Impact: Targeted test execution could not be verified in this environment: pytest is unavailable (pytest: command not found). Static inspection shows the new tests align with .github/workflows/ai_pr_review.yml.
Concrete fix: Ensure CI runs tests/test_openai_review.py::TestWorkflowCodexActionContract.

@igerber igerber added the ready-for-ci Triggers CI test workflows label Jun 6, 2026
@igerber igerber merged commit 36350c9 into main Jun 6, 2026
25 of 26 checks passed
@igerber igerber deleted the test/openai-review-workflow-contract branch June 6, 2026 15:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant