test: pin AI-review workflow Codex-action contract by igerber · Pull Request #528 · igerber/diff-diff

igerber · 2026-06-06T12:06:14Z

Summary

Add TestWorkflowCodexActionContract to tests/test_openai_review.py (7 tests) pinning the AI-review workflow (.github/workflows/ai_pr_review.yml) Codex-action wiring as producer↔consumer couplings, not just string presence:
- openai/codex-action@v1 pin + prompt-file input (scoped to the Run Codex step)
- compiled-prompt path agrees between the build step (PROMPT=) and the action input (prompt-file:)
- final-message output wired from the id:-tagged Codex step → CODEX_FINAL_MESSAGE env → JS body of the post-comment step (the ref id must equal the Codex step's actual id:)
- all three unified-diff exclude pathspecs (real-data JSON/CSV + notebook .ipynb)
- comment markers + the invariant that the prev-review fetch filter is a prefix shared by the canonical and rerun markers
Each assertion binds to its specific step block (via a _step_block helper mirroring the sibling TestWorkflowDoesNotExecutePRHeadCode._extract_step_block) and anchors to the live key line / JS assignment (start-of-line, MULTILINE), so a literal left in a same-step comment can't make the pin pass spuriously.
Close the TODO row "AI review CI: pin workflow contract via test" (Revert AI review CI to Codex + gpt-5.4 (reverts #404, #415) #416): removed from the Testing/Docs table and the Tier B backlog.

Methodology references (required if estimator / math changes)

N/A — no estimator / math / methodology changes. Test + TODO only.

Validation

Tests added/updated: tests/test_openai_review.py — new TestWorkflowCodexActionContract (7 tests); full file 250 passed.
Falsifiability mutation-verified: anchored checks reject commented-out / drifted keys, markers, and broken id: wiring that bare-substring checks would have accepted.
3-round local AI review (codex / gpt-5.5 @ xhigh) converged fully clean (no findings in any category).
No CHANGELOG entry (matches test-only precedent chore: add CI integrity check for docs/doc-deps.yaml #519/ci: only re-run gated matrices on ready-for-ci label transitions #509); no doc-deps mapping (neither path is a public source).

Security / privacy

Confirm no secrets/PII in this PR: Yes

🤖 Generated with Claude Code

Add TestWorkflowCodexActionContract to tests/test_openai_review.py pinning the load-bearing wiring of .github/workflows/ai_pr_review.yml as producer<->consumer couplings (not just string presence): - openai/codex-action@v1 pin + prompt-file input - compiled-prompt path agrees between the build step (PROMPT=) and the action input (prompt-file:) - final-message output flows from the id:-tagged Codex step into the post-comment step (broken id => blank comment, caught) - the three unified-diff exclude pathspecs (real-data JSON/CSV + .ipynb) - comment markers + the invariant that the prev-review fetch filter is a prefix shared by both the canonical and rerun markers These cover the elements not already guarded by TestWorkflowPromptHardening (wrapper/close-tag sanitization), TestWorkflowCommentPosting (rerun gates), or TestWorkflowDoesNotExecutePRHeadCode (sandbox: read-only). Closes the TODO row "AI review CI: pin workflow contract via test" (#416); removes it from the Testing/Docs table and the Tier B backlog. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Address local AI-review feedback on the new TestWorkflowCodexActionContract: - P2: assertions scanned the whole workflow text, so drift could pass if a literal appeared in a comment or an unrelated step. Bind each assertion to the specific step block (Run Codex / Build review prompt / Post PR comment / Fetch previous AI review) via a `_step_block` helper that mirrors the sibling TestWorkflowDoesNotExecutePRHeadCode._extract_step_block. The final-message test now asserts the post-comment ref id equals the Codex step's actual `id:` (not just "some `id:` exists somewhere"). Verified: a stray action pin in a comment with a different action in the real step now fails (global check would have passed). - P3: reworded the class docstring "Tracks TODO.md ..." -> "Covers the former TODO.md item ..." since the row is removed in the same change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Address local AI-review R2-P2 (test robustness): within a step block the assertions still used bare substrings, so a literal left in a same-step comment after the real key was commented out / moved could pass spuriously. Anchor each check to the live key line / JS assignment (start-of-line, MULTILINE): `^\s*uses: openai/codex-action@v1$`, `^\s*prompt-file: ...$`, `^\s*PROMPT=...$`, the `^\s*CODEX_FINAL_MESSAGE: ${{ steps.<id>.outputs. final-message }}$` env mapping (plus a check that the JS reads process.env.CODEX_FINAL_MESSAGE), `const marker =`/`const rerunMarker =` JS assignments, and the `.includes("<!-- ai-pr-review:codex:")` filter. The pathspec check now requires a live (non-comment) command line. Verified: a commented-out action pin / marker that the bare-substring check accepted now fails the anchored check. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-06T12:08:12Z

Overall Assessment

✅ Looks good

Executive Summary

No estimator, math, weighting, variance/SE, identification, or default-behavior changes.
Methodology impact: none; docs/methodology/REGISTRY.md does not apply to these test/TODO-only changes.
The new workflow contract tests are scoped to the intended workflow steps and cover the former TODO item.
TODO cleanup is consistent: the workflow-contract test row was removed from both the Testing/Docs table and Tier B backlog.
I could not execute the tests because pytest is not installed in this environment.

Methodology

No findings.

Severity: N/A
Impact: No causal estimators or methodology contracts are changed.
Concrete fix: N/A

Code Quality

No P0/P1 findings.

Severity: P3
Impact: Minor test-hardening opportunity: tests/test_openai_review.py:L2574-L2578 checks process.env.CODEX_FINAL_MESSAGE with a bare substring, while nearby comments say assertions are anchored to live lines. A same-step JS comment containing that literal could satisfy the assertion.
Concrete fix: Replace the substring assertion with a multiline regex anchored to the live assignment, e.g. matching const msg = (process.env.CODEX_FINAL_MESSAGE || "").trim();.

Performance

No findings.

Severity: N/A
Impact: Test-only changes; no library runtime path affected.
Concrete fix: N/A

Maintainability

No findings.

Severity: N/A
Impact: The helper mirrors existing step-block extraction style and keeps assertions localized.
Concrete fix: N/A

Tech Debt

No findings.

Severity: N/A
Impact: The removed TODO row appears to be addressed by the new TestWorkflowCodexActionContract coverage.
Concrete fix: N/A

Security

No findings.

Severity: N/A
Impact: No secrets or production workflow behavior changes were introduced.
Concrete fix: N/A

Documentation/Tests

No blocking findings.

Severity: P3
Impact: Targeted verification could not be run here: pytest is unavailable (pytest: command not found). Static inspection shows the new tests match the checked-out workflow blocks.
Concrete fix: Ensure CI runs tests/test_openai_review.py::TestWorkflowCodexActionContract.

CI Codex R1 P3: this was the one assertion still using a bare substring while the class comment claims all checks anchor to live lines. Anchor it to the live `const msg = (process.env.CODEX_FINAL_MESSAGE ...` JS assignment (start-of-line, MULTILINE) for internal consistency — a same-step JS comment containing the literal no longer satisfies it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-06T12:11:54Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 784f3a17fa08d8e5f56e8ff2af88276d9b945979

Overall Assessment

✅ Looks good

Executive Summary

No estimator, math, weighting, variance/SE, identification, or default-behavior changes.
Methodology Registry review found no affected causal method; this PR is test/TODO-only.
Prior P3 concern is resolved: the CODEX_FINAL_MESSAGE JS read is now anchored to the live assignment at tests/test_openai_review.py:L2577-L2580.
The workflow contract tests bind to specific step blocks and match the current workflow wiring.
The TODO cleanup is consistent with the added coverage.
I could not execute pytest locally because pytest is not installed.

Methodology

No findings.

Severity: N/A
Impact: No methodology surface is changed.
Concrete fix: N/A

Code Quality

No findings.

Severity: N/A
Impact: The added helper and assertions are scoped to named workflow blocks, matching the existing pattern in TestWorkflowDoesNotExecutePRHeadCode.
Concrete fix: N/A

Performance

No findings.

Severity: N/A
Impact: Test-only changes; no runtime path affected.
Concrete fix: N/A

Maintainability

No findings.

Severity: N/A
Impact: The new tests make the workflow producer/consumer couplings explicit and reduce future drift risk.
Concrete fix: N/A

Tech Debt

No findings.

Severity: N/A
Impact: The removed TODO row is now covered by TestWorkflowCodexActionContract, and the backlog no longer lists the completed item.
Concrete fix: N/A

Security

No findings.

Severity: N/A
Impact: No secrets or workflow permission changes are introduced.
Concrete fix: N/A

Documentation/Tests

No blocking findings.

Severity: P3
Impact: Targeted test execution could not be verified in this environment: pytest is unavailable (pytest: command not found). Static inspection shows the new tests align with .github/workflows/ai_pr_review.yml.
Concrete fix: Ensure CI runs tests/test_openai_review.py::TestWorkflowCodexActionContract.

igerber and others added 3 commits June 6, 2026 07:51

igerber added the ready-for-ci Triggers CI test workflows label Jun 6, 2026

igerber merged commit 36350c9 into main Jun 6, 2026
25 of 26 checks passed

igerber deleted the test/openai-review-workflow-contract branch June 6, 2026 15:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: pin AI-review workflow Codex-action contract#528

test: pin AI-review workflow Codex-action contract#528
igerber merged 4 commits into
mainfrom
test/openai-review-workflow-contract

igerber commented Jun 6, 2026

Uh oh!

github-actions Bot commented Jun 6, 2026

Uh oh!

github-actions Bot commented Jun 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented Jun 6, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions Bot commented Jun 6, 2026

Overall Assessment

Executive Summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Uh oh!

github-actions Bot commented Jun 6, 2026

Overall Assessment

Executive Summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant