evalops · haasonsaas · Apr 29, 2026
diff --git a/.github/codex/prompts/ci-failure-triage.md b/.github/codex/prompts/ci-failure-triage.md
@@ -0,0 +1,25 @@
+# EvalOps Codex CI Failure Triage
+
+Investigate the failing GitHub Actions run for this repository and produce a
+minimal fix plan or patch.
+
+Required checks:
+
+- Start from the exact failing run, job, and step. Do not infer from workflow
+  names alone.
+- Fetch failed logs with `gh run view --log-failed` and fall back to the
+  Actions jobs API when the log output is empty.
+- Distinguish stale failures on superseded SHAs from failures on the live PR or
+  `main` tip.
+- Group related failures by root cause and avoid unrelated refactors.
+- If the failure is a workflow issue, inspect path filters, generated workflow
+  surfaces, branch protection expectations, and pinned action policy.
+- If the failure is test or code behavior, run the smallest local reproduction
+  before proposing broader gates.
+
+Output:
+
+- Root cause with run/job evidence.
+- Minimal fix or the exact reason no code change is appropriate.
+- Commands run locally.
+- Remaining CI or review-thread work.
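Reviewer sketch for the log-fetch and stale-SHA checks in the prompt above. This is a minimal illustration, not part of the diff: `fetch_failed_logs` and `classify_run_sha` are hypothetical helper names, and the fallback assumes that listing failed jobs from the Actions jobs endpoint is an acceptable substitute when `--log-failed` prints nothing.

```shell
# Print failed-step logs for a run. If `gh run view --log-failed` returns
# nothing, fall back to listing failed jobs from the Actions jobs API.
# Arguments: <run_id> <owner/repo>
fetch_failed_logs() {
  run_id="$1"
  repo="$2"
  logs="$(gh run view "$run_id" --repo "$repo" --log-failed 2>/dev/null || true)"
  if [ -n "$logs" ]; then
    printf '%s\n' "$logs"
    return 0
  fi
  # Fallback: one line per failed job, "<job name><TAB><job URL>".
  gh api "repos/${repo}/actions/runs/${run_id}/jobs" --paginate \
    --jq '.jobs[] | select(.conclusion == "failure") | "\(.name)\t\(.html_url)"'
}

# Report "live" when the failing run's head SHA is still the tip being
# reviewed, and "stale" when it has been superseded.
# Arguments: <run_head_sha> <current_tip_sha>
classify_run_sha() {
  if [ "$1" = "$2" ]; then
    echo live
  else
    echo stale
  fi
}
```

A caller might run `fetch_failed_logs "$RUN_ID" "$GITHUB_REPOSITORY"` and compare the run's `headSha` against the PR head or `main` tip with `classify_run_sha` before spending time on the failure.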
diff --git a/.github/codex/prompts/label-churn-audit.md b/.github/codex/prompts/label-churn-audit.md
@@ -0,0 +1,20 @@
+# EvalOps Codex Label Churn Audit
+
+Audit PR labels that are being added and removed repeatedly by automation.
+
+Required checks:
+
+- Inspect the PR timeline, issue events, workflow runs, bot comments, and
+  repository workflows that can mutate labels.
+- Group label changes by actor, label, timestamp, and likely workflow source.
+- Distinguish intended mutually exclusive labels from automation loops.
+- Check whether human-committed code is expected to be agent-authored in this
+  repo before treating agent labels as suspicious.
+- Identify the smallest durable fix: workflow condition, label ownership rule,
+  branch filter, debounce, or documentation update.
+
+Output:
+
+- A concise timeline of label mutations.
+- The likely source workflow or automation.
+- The durable fix and how to verify it.
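Reviewer sketch for the "group label changes by actor, label, timestamp" check above. It is an illustration under assumptions, not part of the diff: `label_events_tsv` and `summarize_label_churn` are hypothetical names, and counting mutations per actor and label is only one plausible way to surface a loop.

```shell
# Emit one TSV row per label mutation on a PR:
# <created_at><TAB><actor><TAB><labeled|unlabeled><TAB><label>
# Arguments: <owner/repo> <pr_number>
label_events_tsv() {
  gh api "repos/$1/issues/$2/events" --paginate \
    --jq '.[] | select(.event == "labeled" or .event == "unlabeled")
          | [.created_at, .actor.login, .event, .label.name] | @tsv'
}

# Read the TSV rows from stdin and print
# <count><TAB><actor><TAB><label>, most frequent first.
summarize_label_churn() {
  tab="$(printf '\t')"
  awk -F '\t' '
    { count[$2 "\t" $4]++ }
    END { for (k in count) printf "%d\t%s\n", count[k], k }
  ' | sort -t "$tab" -k1,1nr -k2,3
}
```

Piping `label_events_tsv "$GITHUB_REPOSITORY" "$PR_NUMBER"` into `summarize_label_churn` puts the actor and label pairs with the most mutations at the top, which is usually where an automation loop shows up first.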
diff --git a/.github/codex/prompts/local-traffic-canary.md b/.github/codex/prompts/local-traffic-canary.md
@@ -0,0 +1,22 @@
+# EvalOps Codex Local Traffic Canary
+
+Investigate a failure in local developer tooling, traffic simulation, or
+distributed tracing.
+
+Required checks:
+
+- Start from the failing command and preserve its output.
+- Inspect `AGENTS.md`, Makefile targets, local compose files, traffic profiles,
+  and tracing docs before changing behavior.
+- Prefer dry-run validations first, then dependency-backed local smoke only
+  when Docker and local ports are available.
+- Verify that generated trace IDs, `traceparent`, NATS subjects, and manifest
+  paths match the repo contract.
+- Keep fixes local-tooling focused unless the failure exposes a production
+  contract bug.
+
+Output:
+
+- Failing command and root cause.
+- Patch or precise follow-up if credentials/local services are unavailable.
+- Verification commands that future developers can run.
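Reviewer sketch for the `traceparent` check above. The repo contract itself is not shown in this diff, so this only validates the version-00 shape from the W3C Trace Context format (`version-traceid-parentid-flags`, lowercase hex, no all-zero IDs). `is_valid_traceparent` is a hypothetical helper name.

```shell
# Return 0 when the argument looks like a well-formed traceparent header
# value, 1 otherwise. Simplification: any version other than "ff" is
# accepted as long as it has the four-field version-00 layout.
is_valid_traceparent() {
  tp="$1"
  # Shape: 2 hex, 32 hex, 16 hex, 2 hex, all lowercase, dash separated.
  printf '%s\n' "$tp" |
    grep -Eq '^[0-9a-f]{2}-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$' || return 1
  case "$tp" in
    ff-*) return 1 ;;                                  # reserved version
    *-00000000000000000000000000000000-*) return 1 ;;  # all-zero trace-id
    *-0000000000000000-*) return 1 ;;                  # all-zero parent-id
  esac
  return 0
}
```

A dry-run target could call this on each generated header before any dependency-backed smoke run, so format errors surface without Docker or local ports.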
diff --git a/.github/codex/prompts/post-merge-verify.md b/.github/codex/prompts/post-merge-verify.md
@@ -0,0 +1,21 @@
+# EvalOps Codex Post-Merge Verification
+
+Verify that a recently merged PR is actually healthy on the default branch.
+
+Required checks:
+
+- Identify the merge commit and affected workflows on `main`.
+- Check the latest default-branch GitHub Actions runs, not stale PR checks.
+- For deploy or runtime changes, describe the GitOps or live-state validation
+  path and whether credentials were available.
+- For local tooling, run the relevant local smoke or dry-run target.
+- For tracing/event-bus work, verify trace propagation, subject/catalog
+  alignment, and local simulation manifests.
+- If a follow-up is needed, create or describe a precise issue with acceptance
+  criteria.
+
+Output:
+
+- Healthy / unhealthy / inconclusive status.
+- Evidence links or command outputs summarized in prose.
+- Follow-up PR or issue recommendations.
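Reviewer sketch for the healthy / unhealthy / inconclusive decision above. This is an assumption-laden illustration, not part of the diff: `runs_for_sha` and `classify_health` are hypothetical names, and the mapping of conclusions to each status (for example, treating `cancelled` as inconclusive) is a guess at a reasonable policy rather than something the prompt specifies.

```shell
# List <status><TAB><conclusion> for default-branch runs of one commit.
# Arguments: <owner/repo> <sha>
runs_for_sha() {
  gh run list --repo "$1" --branch main --commit "$2" --limit 50 \
    --json status,conclusion --jq '.[] | [.status, .conclusion] | @tsv'
}

# Read <status><TAB><conclusion> rows from stdin and print one word:
#   unhealthy     any completed run failed, timed out, or failed to start
#   inconclusive  no runs, runs still pending, or an unrecognized conclusion
#   healthy       every run completed with success, skipped, or neutral
classify_health() {
  awk -F '\t' '
    $1 != "completed" { pending = 1; next }
    $2 == "failure" || $2 == "timed_out" || $2 == "startup_failure" { bad = 1; next }
    $2 == "success" || $2 == "skipped" || $2 == "neutral" { ok = 1; next }
    { pending = 1 }
    END {
      if (bad) print "unhealthy"
      else if (pending || !ok) print "inconclusive"
      else print "healthy"
    }
  '
}
```

For example, `runs_for_sha "$GITHUB_REPOSITORY" "$MERGE_SHA" | classify_health` gives a first-pass status that the written report can then back with run links.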
diff --git a/.github/codex/prompts/pr-review.md b/.github/codex/prompts/pr-review.md
@@ -0,0 +1,28 @@
+# EvalOps Codex PR Review
+
+Review the pull request as an EvalOps maintainer. Focus on defects, behavioral
+regressions, missing tests, generated artifact drift, security footguns, and
+operational risk. Prefer concise findings over broad summaries.
+
+Required checks:
+
+- Inspect the diff against the PR base and identify the affected repos,
+  services, workflows, contracts, generated files, and deployment surfaces.
+- Read any `AGENTS.md` files that apply to changed paths before reviewing.
+- Use live GitHub context when available: PR description, labels, checks,
+  review comments, unresolved review threads, and recent CI failures.
+- For generated code, verify whether the generator or checked-in output is the
+  source of truth before recommending direct edits.
+- For infrastructure or workflow changes, call out whether the change affects
+  labels, branch protection, automation, release trains, or GitOps desired
+  state.
+- For tracing or event-bus changes, verify trace context, subject/catalog
+  alignment, and local simulation coverage.
+
+Output:
+
+- Start with actionable findings ordered by severity.
+- Include file paths and line references when possible.
+- Include a short residual-risk note when the diff looks clean.
+- Do not approve a PR solely because tests pass if unresolved review threads or
+  failing checks remain.
diff --git a/.github/workflow-templates/codex-ci-triage.properties.json b/.github/workflow-templates/codex-ci-triage.properties.json
@@ -0,0 +1,9 @@
+{
+  "name": "Codex CI failure triage",
+  "description": "Manually run Codex against a failed GitHub Actions run and optionally post the fix summary to a PR.",
+  "iconName": "octicon pulse",
+  "categories": [
+    "Automation",
+    "Continuous integration"
+  ]
+}
diff --git a/.github/workflow-templates/codex-ci-triage.yml b/.github/workflow-templates/codex-ci-triage.yml
@@ -0,0 +1,74 @@
+name: Codex CI failure triage
+
+on:
+  workflow_dispatch:
+    inputs:
+      run_id:
+        description: "GitHub Actions run id to triage"
+        required: true
+        type: string
+      pr_number:
+        description: "Optional PR number to comment on"
+        required: false
+        type: string
+
+permissions:
+  contents: write
+  actions: read
+  pull-requests: write
+  issues: write
+
+jobs:
+  triage:
+    runs-on: ubuntu-latest
+    timeout-minutes: 45
+    outputs:
+      final_message: ${{ steps.run-codex.outputs.final-message }}
+    steps:
+      - uses: actions/checkout@v5
+        with:
+          fetch-depth: 0
+
+      - name: Capture failed run evidence
+        env:
+          GH_TOKEN: ${{ github.token }}
+          RUN_ID: ${{ inputs.run_id }}
+        run: |
+          {
+            echo "# GitHub Actions failure"
+            echo
+            gh run view "${RUN_ID}" --repo "${GITHUB_REPOSITORY}" --json url,name,displayTitle,event,headBranch,headSha,conclusion,createdAt,updatedAt
+            echo
+            gh run view "${RUN_ID}" --repo "${GITHUB_REPOSITORY}" --log-failed || true
+          } > codex-ci-evidence.md
+
+      - name: Run Codex CI triage
+        id: run-codex
+        uses: openai/codex-action@5c3f4ccdb2b8790f73d6b21751ac00e602aa0c02
+        with:
+          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
+          prompt: |
+            Investigate the failure described in codex-ci-evidence.md. Start
+            from the exact failed run/job/step, distinguish stale failures from
+            live failures, and make the smallest safe patch when appropriate.
+            Report commands run and remaining CI or review-thread work.
+          codex-args: '["--full-auto"]'
+          output-file: codex-ci-triage.md
+          safety-strategy: drop-sudo
+          sandbox: workspace-write
+
+      - name: Post triage summary
+        if: ${{ inputs.pr_number != '' && steps.run-codex.outputs.final-message != '' }}
+        uses: actions/github-script@v7
+        env:
+          CODEX_FINAL_MESSAGE: ${{ steps.run-codex.outputs.final-message }}
+          PR_NUMBER: ${{ inputs.pr_number }}
+        with:
+          github-token: ${{ github.token }}
+          script: |
+            await github.rest.issues.createComment({
+              owner: context.repo.owner,
+              repo: context.repo.repo,
+              issue_number: Number(process.env.PR_NUMBER),
+              body: process.env.CODEX_FINAL_MESSAGE,
+            });
diff --git a/.github/workflow-templates/codex-label-churn-audit.properties.json b/.github/workflow-templates/codex-label-churn-audit.properties.json
@@ -0,0 +1,9 @@
+{
+  "name": "Codex label churn audit",
+  "description": "Have Codex inspect PR label mutation events and identify automation loops.",
+  "iconName": "octicon tag",
+  "categories": [
+    "Automation",
+    "Code review"
+  ]
+}
diff --git a/.github/workflow-templates/codex-label-churn-audit.yml b/.github/workflow-templates/codex-label-churn-audit.yml
@@ -0,0 +1,68 @@
+name: Codex label churn audit
+
+on:
+  workflow_dispatch:
+    inputs:
+      pr_number:
+        description: "Pull request number to audit"
+        required: true
+        type: string
+
+permissions:
+  contents: read
+  pull-requests: write # the final step comments on a PR thread, which typically needs this scope
+  issues: write
+
+jobs:
+  audit:
+    runs-on: ubuntu-latest
+    timeout-minutes: 30
+    steps:
+      - uses: actions/checkout@v5
+        with:
+          fetch-depth: 0
+
+      - name: Capture label timeline
+        env:
+          GH_TOKEN: ${{ github.token }}
+          PR_NUMBER: ${{ inputs.pr_number }}
+        run: |
+          {
+            echo "# Label timeline"
+            echo
+            gh api "repos/${GITHUB_REPOSITORY}/issues/${PR_NUMBER}/events" --paginate
+            echo
+            echo "# Workflows that mention labels"
+            rg -n "add-label|remove-label|gh pr edit|issues.addLabels|issues.removeLabel|labels" .github/workflows scripts || true
+          } > codex-label-churn-evidence.md
+
+      - name: Run Codex label audit
+        id: run-codex
+        uses: openai/codex-action@5c3f4ccdb2b8790f73d6b21751ac00e602aa0c02
+        with:
+          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
+          prompt: |
+            Audit codex-label-churn-evidence.md. Identify which automation is
+            adding and removing labels, whether the churn is intentional, and
+            the smallest durable fix. Remember that human-committed EvalOps
+            code is usually LLM-authored, so agent-authorship labels should not
+            be treated as suspicious by default.
+          output-file: codex-label-churn-audit.md
+          safety-strategy: drop-sudo
+          sandbox: read-only
+
+      - name: Comment with audit
+        if: ${{ steps.run-codex.outputs.final-message != '' }}
+        uses: actions/github-script@v7
+        env:
+          CODEX_FINAL_MESSAGE: ${{ steps.run-codex.outputs.final-message }}
+          PR_NUMBER: ${{ inputs.pr_number }}
+        with:
+          github-token: ${{ github.token }}
+          script: |
+            await github.rest.issues.createComment({
+              owner: context.repo.owner,
+              repo: context.repo.repo,
+              issue_number: Number(process.env.PR_NUMBER),
+              body: process.env.CODEX_FINAL_MESSAGE,
+            });
diff --git a/.github/workflow-templates/codex-post-merge-verify.properties.json b/.github/workflow-templates/codex-post-merge-verify.properties.json
@@ -0,0 +1,9 @@
+{
+  "name": "Codex post-merge verification",
+  "description": "Have Codex inspect recent main-branch runs and summarize post-merge health.",
+  "iconName": "octicon checklist",
+  "categories": [
+    "Automation",
+    "Continuous integration"
+  ]
+}
diff --git a/.github/workflow-templates/codex-post-merge-verify.yml b/.github/workflow-templates/codex-post-merge-verify.yml
@@ -0,0 +1,70 @@
+name: Codex post-merge verification
+
+on:
+  workflow_dispatch:
+    inputs:
+      merge_sha:
+        description: "Merge commit or main-branch SHA to verify"
+        required: false
+        type: string
+  schedule:
+    - cron: "37 */6 * * *"
+
+permissions:
+  contents: read
+  actions: read
+  issues: write
+
+jobs:
+  verify:
+    runs-on: ubuntu-latest
+    timeout-minutes: 30
+    steps:
+      - uses: actions/checkout@v5
+        with:
+          fetch-depth: 0
+
+      - name: Capture main-branch evidence
+        env:
+          GH_TOKEN: ${{ github.token }}
+          MERGE_SHA: ${{ inputs.merge_sha }}
+        run: |
+          {
+            echo "# Default-branch verification"
+            echo
+            echo "repository=${GITHUB_REPOSITORY}"
+            echo "merge_sha=${MERGE_SHA:-${GITHUB_SHA}}"
+            echo
+            gh run list --repo "${GITHUB_REPOSITORY}" --branch main --limit 20 \
+              --json databaseId,name,event,status,conclusion,headSha,createdAt,updatedAt,url
+          } > codex-post-merge-evidence.md
+
+      - name: Run Codex verifier
+        id: run-codex
+        uses: openai/codex-action@5c3f4ccdb2b8790f73d6b21751ac00e602aa0c02
+        with:
+          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
+          prompt: |
+            Verify default-branch health using codex-post-merge-evidence.md and
+            the repository's local guidance. Decide whether the latest main
+            state is healthy, unhealthy, or inconclusive. If unhealthy, propose
+            the smallest follow-up with acceptance criteria.
+          output-file: codex-post-merge-verify.md
+          safety-strategy: drop-sudo
+          sandbox: read-only
+
+      - name: Publish verification report
+        if: ${{ steps.run-codex.outputs.final-message != '' }}
+        env:
+          GH_TOKEN: ${{ github.token }}
+        run: |
+          title="[codex] Post-merge verification"
+          if issue_number="$(gh issue list --state open --search "\"${title}\" in:title" --limit 1 --json number --jq '.[0].number // empty')" && [ -n "${issue_number}" ]; then
+            gh issue comment "${issue_number}" --body-file codex-post-merge-verify.md
+          else
+            gh issue create --title "${title}" --body-file codex-post-merge-verify.md
+          fi
+
+      - name: Append report to summary
+        if: ${{ always() && hashFiles('codex-post-merge-verify.md') != '' }}
+        run: cat codex-post-merge-verify.md >> "${GITHUB_STEP_SUMMARY}"
diff --git a/.github/workflow-templates/codex-pr-review.properties.json b/.github/workflow-templates/codex-pr-review.properties.json
@@ -0,0 +1,10 @@
+{
+  "name": "Codex pull request review",
+  "description": "Run OpenAI Codex on PRs with EvalOps review guidance and post the findings back to the thread.",
+  "iconName": "octicon code-review",
+  "categories": [
+    "Automation",
+    "Code review",
+    "Continuous integration"
+  ]
+}