From 1af7d7429a4ed54f314267bf603fc496eff931ba Mon Sep 17 00:00:00 2001 From: Pierre Wizla <4233866+pwizla@users.noreply.github.com> Date: Thu, 16 Apr 2026 18:23:32 +0200 Subject: [PATCH 1/4] Add explicit rule: never write to strapi/strapi Prompt-level guardrail to prevent the Claude agent from making any write API calls to strapi/strapi via the GH_TOKEN. Co-Authored-By: Claude Opus 4.6 (1M context) --- .github/prompts/docs-self-healing.md | 1 + 1 file changed, 1 insertion(+) diff --git a/.github/prompts/docs-self-healing.md b/.github/prompts/docs-self-healing.md index dc8554edea..1ee7c554ad 100644 --- a/.github/prompts/docs-self-healing.md +++ b/.github/prompts/docs-self-healing.md @@ -186,3 +186,4 @@ After processing all PRs (or if none qualify), write a JSON summary to `/tmp/sel - **Max 5 PRs per run** — log skipped PRs to stdout - **Max 3000 lines per diff** — skip and log oversized diffs - **Never modify workflow files, configuration files, or sidebars.js** +- **NEVER run any write operation on strapi/strapi** — no issues, no comments, no PRs, no pushes, no API calls that modify state. The GH_TOKEN has write access but this workflow ONLY writes to strapi/documentation. Read-only access to strapi/strapi (diffs, PR bodies) is the only permitted use. From 232cece1263e3fa43265696b0a4d4de09f5f390b Mon Sep 17 00:00:00 2001 From: Pierre Wizla <4233866+pwizla@users.noreply.github.com> Date: Thu, 16 Apr 2026 21:26:45 +0200 Subject: [PATCH 2/4] Move PR filtering and idempotency check before Claude The workflow now handles title-based filtering (chore, test, docs, security, translations, typos) and idempotency checks in bash before launching Claude. This avoids spending tokens on PRs that never need documentation. Claude receives only the pre-filtered list via $FILTERED_PRS and starts directly at diff fetching + Router. Based on analysis of 100+ merged strapi/strapi PR titles. 
Co-Authored-By: Claude Opus 4.6 (1M context) --- .github/prompts/docs-self-healing.md | 51 +++++---------- .github/workflows/docs-self-healing.yml | 82 ++++++++++++++++++++----- 2 files changed, 83 insertions(+), 50 deletions(-) diff --git a/.github/prompts/docs-self-healing.md b/.github/prompts/docs-self-healing.md index 1ee7c554ad..5174caaea5 100644 --- a/.github/prompts/docs-self-healing.md +++ b/.github/prompts/docs-self-healing.md @@ -7,41 +7,24 @@ Your job is to process each PR merged into `strapi/strapi` in the last 24 hours - `$STRAPI_SOURCE` — local checkout of `strapi/strapi` (read-only, for diffs) - `$DOC_REPO` — local checkout of `strapi/documentation` (read + write, for creating PRs) +- `$FILTERED_PRS` — JSON array of pre-filtered PRs (chores, CI, deps, tests already excluded by the workflow) - GitHub CLI (`gh`) is authenticated via `GH_TOKEN` - Model: `claude-sonnet-4-6` (set in the workflow YAML; optimized for cost on batch automation) -## Step 1 — Identify merged PRs (last 24 hours) +## Step 1 — Read the pre-filtered PR list -Use the GitHub API to list PRs merged into `develop` in the last 24 hours: +The workflow has already fetched and filtered merged PRs. The list is in `$FILTERED_PRS` as a JSON array: -```bash -gh api repos/strapi/strapi/pulls \ - --jq '[.[] | select(.merged_at != null and .base.ref == "develop") | {number, title, body, merged_at, html_url}]' \ - -f state=closed \ - -f sort=updated \ - -f direction=desc \ - -f per_page=50 +```json +[{"number": 12345, "title": "feat: add feature X", "html_url": "https://github.com/strapi/strapi/pull/12345"}, ...] ``` -Filter to only those whose `merged_at` is within the last 24 hours. +Parse this list. **Do NOT re-fetch PRs from the GitHub API** — the workflow already did that. **Rate limit:** Process a maximum of 1 PR per run (testing mode). If more qualify, log the skipped ones to stdout and they will be picked up on the next run. 
-## Step 2 — Check idempotency (per PR) - -Before processing each PR, check if a doc PR already exists for it by searching -the body of open PRs with the `auto-doc-healing` label: - -```bash -gh pr list --repo strapi/documentation --label auto-doc-healing --state all \ - --json body --jq '.[].body' | grep -q "strapi/strapi/pull/" -``` - -If a match is found, skip the PR entirely. This ensures the workflow is idempotent -and recovers gracefully from partial failures. - -## Step 3 — Get PR context (per PR) +## Step 2 — Get PR context (per PR) For each PR to process, fetch the description and the diff: @@ -53,7 +36,7 @@ gh api repos/strapi/strapi/pulls/.diff > /tmp/pr-.diff **Diff size threshold:** If the diff exceeds 3000 lines, skip this PR and log: "PR # skipped — diff too large (X lines), flag for manual /autodoc". -## Step 4 — Run the Router (per PR) +## Step 3 — Run the Router (per PR) **Read these files once at the start of the run** (not per PR): - Router prompt: `$DOC_REPO/agents/prompts/router.md` @@ -61,18 +44,19 @@ gh api repos/strapi/strapi/pulls/.diff > /tmp/pr-.diff - Page index: `$DOC_REPO/docusaurus/static/llms.txt` Then, for each PR, apply the Router logic using: -- PR title and description from Step 3 -- The diff from Step 3 +- PR title and description from Step 2 +- The diff from Step 2 The Router will produce a YAML `targets` block. **Skip the PR if:** - The Router finds no targets - The Router sets `ask_user` (log the question to stdout for manual handling) -- The PR is purely: tests, dependency bumps, internal refactors, chore commits, CI changes, - translations, typo fixes in code comments -## Step 5 — Run the documentation pipeline (per PR with targets) +Note: chores, CI, deps, tests, and translations are already filtered out by the workflow +before Claude runs. The Router only sees PRs that passed the pre-filter. 
+ +## Step 4 — Run the documentation pipeline (per PR with targets) For each PR where the Router identified targets, run the Create/Update Mode pipeline. @@ -111,7 +95,7 @@ Authoring guides are small and target-specific — read them per target, not upf **Templates:** For `create_page` targets, load the relevant template from `$DOC_REPO/agents/templates/` based on the Router's `doc_type`. -## Step 6 — Create branch and draft PR (per PR) +## Step 5 — Create branch and draft PR (per PR) After the Drafter has produced output for all targets: @@ -151,7 +135,7 @@ git clean -fd git reset --hard origin/main ``` -## Step 7 — Write run summary +## Step 6 — Write run summary After processing all PRs (or if none qualify), write a JSON summary to `/tmp/self-healing-summary.json`: @@ -164,9 +148,6 @@ After processing all PRs (or if none qualify), write a JSON summary to `/tmp/sel {"number": 12346, "title": "Fix typo in test", "reason": "Router: no doc update needed"}, {"number": 12347, "title": "Massive refactor", "reason": "Diff too large (4200 lines)"} ], - "already_processed": [ - {"number": 12340, "title": "Update middleware", "reason": "Existing PR found with auto-doc-healing label"} - ], "errors": [ {"number": 12348, "title": "Add plugin Y", "error": "Drafter failed after retry"} ] diff --git a/.github/workflows/docs-self-healing.yml b/.github/workflows/docs-self-healing.yml index be2340bafd..aece6c4696 100644 --- a/.github/workflows/docs-self-healing.yml +++ b/.github/workflows/docs-self-healing.yml @@ -29,25 +29,85 @@ jobs: token: ${{ secrets.PAT_TOKEN_PIWI }} fetch-depth: 50 - - name: Check for merged PRs in last 24 hours + - name: List and filter merged PRs in last 24 hours id: check-prs env: GH_TOKEN: ${{ secrets.PAT_TOKEN_PIWI }} run: | echo "Checking for PRs merged into strapi/strapi develop in the last 24h..." 
SINCE=$(date -u -d '24 hours ago' '+%Y-%m-%dT%H:%M:%SZ' 2>/dev/null || date -u -v-24H '+%Y-%m-%dT%H:%M:%SZ') - PR_COUNT=$(gh api search/issues \ + + # Fetch all merged PRs with metadata + gh api search/issues \ --method GET \ -f q="repo:strapi/strapi is:pr is:merged base:develop merged:>=$SINCE" \ -f per_page=50 \ - --jq '.total_count') - echo "Found $PR_COUNT merged PRs in the last 24h" - if [ "$PR_COUNT" -eq 0 ]; then + --jq '.items | [.[] | {number, title, html_url: .pull_request.html_url}]' > /tmp/all-prs.json + + TOTAL=$(jq 'length' /tmp/all-prs.json) + echo "Found $TOTAL merged PRs in the last 24h" + + # Filter out PRs that never need documentation. + # Based on analysis of 100+ merged PR titles in strapi/strapi. + # + # EXCLUDED: chore(*), test(*), docs:, security: package, + # *translation(s), *typo, Remove/Update yarn/README + # + # KEPT (Router decides): feat, fix, enhancement, and anything else + jq '[.[] | select( + (.title | test("^chore"; "i") | not) and + (.title | test("^test[:(\\s]"; "i") | not) and + (.title | test("^docs:"; "i") | not) and + (.title | test("^security: package"; "i") | not) and + (.title | test("translation[s]?$"; "i") | not) and + (.title | test("typo"; "i") | not) and + (.title | test("^(Remove|Update) (yarn|README)"; "i") | not) + )]' /tmp/all-prs.json > /tmp/filtered-prs.json + + FILTERED=$(jq 'length' /tmp/filtered-prs.json) + EXCLUDED=$((TOTAL - FILTERED)) + echo "After title filtering: $FILTERED candidates ($EXCLUDED excluded as chore/CI/deps/test)" + + # Check idempotency: remove PRs that already have a doc PR + EXISTING_BODIES=$(gh pr list --repo strapi/documentation --label auto-doc-healing --state all \ + --json body --jq '.[].body' 2>/dev/null || echo "") + + jq --arg bodies "$EXISTING_BODIES" '[.[] | select( + ($bodies | contains("strapi/strapi/pull/" + (.number | tostring))) | not + )]' /tmp/filtered-prs.json > /tmp/new-prs.json + + ALREADY=$((FILTERED - $(jq 'length' /tmp/new-prs.json))) + FINAL=$(jq 'length' 
/tmp/new-prs.json) + + if [ "$ALREADY" -gt 0 ]; then + echo "Idempotency: $ALREADY PRs already have a doc PR, skipped" + fi + + echo "Final candidates for Claude: $FINAL" + + # Set outputs based on final filtered list + if [ "$FINAL" -eq 0 ]; then echo "has_prs=false" >> $GITHUB_OUTPUT else echo "has_prs=true" >> $GITHUB_OUTPUT fi + # Pass the final list to Claude + { + echo "pr_list<> $GITHUB_OUTPUT + + # Log pre-filtering results in summary + echo "### Pre-filtering results" >> $GITHUB_STEP_SUMMARY + echo "" >> $GITHUB_STEP_SUMMARY + echo "- **Total merged PRs (24h):** $TOTAL" >> $GITHUB_STEP_SUMMARY + echo "- **Excluded (chore/CI/deps/test/typo):** $EXCLUDED" >> $GITHUB_STEP_SUMMARY + echo "- **Already processed (idempotency):** $ALREADY" >> $GITHUB_STEP_SUMMARY + echo "- **Candidates for Claude:** $FINAL" >> $GITHUB_STEP_SUMMARY + echo "" >> $GITHUB_STEP_SUMMARY + - name: Load prompt from file if: steps.check-prs.outputs.has_prs == 'true' id: load-prompt @@ -73,6 +133,7 @@ jobs: STRAPI_SOURCE: ${{ github.workspace }}/.strapi-source DOC_REPO: ${{ github.workspace }} GH_TOKEN: ${{ secrets.PAT_TOKEN_PIWI }} + FILTERED_PRS: ${{ steps.check-prs.outputs.pr_list }} - name: Save Claude log as artifact if: always() && steps.check-prs.outputs.has_prs == 'true' @@ -121,15 +182,6 @@ jobs: echo "" >> $GITHUB_STEP_SUMMARY fi - # Already processed - ALREADY=$(jq -r '.already_processed | length' "$SUMMARY_FILE") - if [ "$ALREADY" -gt 0 ]; then - echo "### Already processed ($ALREADY)" >> $GITHUB_STEP_SUMMARY - echo "" >> $GITHUB_STEP_SUMMARY - jq -r '.already_processed[] | "- strapi/strapi#\(.number) — \(.title): \(.reason)"' "$SUMMARY_FILE" >> $GITHUB_STEP_SUMMARY - echo "" >> $GITHUB_STEP_SUMMARY - fi - # Errors ERRORS=$(jq -r '.errors | length' "$SUMMARY_FILE") if [ "$ERRORS" -gt 0 ]; then @@ -140,6 +192,6 @@ jobs: fi # Totals - TOTAL=$((PROCESSED + SKIPPED + ALREADY + ERRORS)) + TOTAL=$((PROCESSED + SKIPPED + ERRORS)) echo "---" >> $GITHUB_STEP_SUMMARY echo "**Total PRs 
scanned:** $TOTAL | **Doc PRs created:** $PROCESSED | **Skipped:** $SKIPPED | **Errors:** $ERRORS" >> $GITHUB_STEP_SUMMARY From 66fc0fa395a623fccd0c03d9e6c852b2ff43d0aa Mon Sep 17 00:00:00 2001 From: Pierre Wizla <4233866+pwizla@users.noreply.github.com> Date: Thu, 16 Apr 2026 21:27:39 +0200 Subject: [PATCH 3/4] Detail pre-filtering in summary and remove artifact step The summary now lists every PR by name: excluded by title filter (strikethrough), already processed (idempotency), and candidates sent to Claude (bold). Removed the artifact upload step. Co-Authored-By: Claude Opus 4.6 (1M context) --- .github/workflows/docs-self-healing.yml | 46 ++++++++++++++++--------- 1 file changed, 30 insertions(+), 16 deletions(-) diff --git a/.github/workflows/docs-self-healing.yml b/.github/workflows/docs-self-healing.yml index aece6c4696..d0a135eb10 100644 --- a/.github/workflows/docs-self-healing.yml +++ b/.github/workflows/docs-self-healing.yml @@ -99,15 +99,38 @@ jobs: echo "PR_LIST_EOF" } >> $GITHUB_OUTPUT - # Log pre-filtering results in summary - echo "### Pre-filtering results" >> $GITHUB_STEP_SUMMARY - echo "" >> $GITHUB_STEP_SUMMARY - echo "- **Total merged PRs (24h):** $TOTAL" >> $GITHUB_STEP_SUMMARY - echo "- **Excluded (chore/CI/deps/test/typo):** $EXCLUDED" >> $GITHUB_STEP_SUMMARY - echo "- **Already processed (idempotency):** $ALREADY" >> $GITHUB_STEP_SUMMARY - echo "- **Candidates for Claude:** $FINAL" >> $GITHUB_STEP_SUMMARY + # Log detailed pre-filtering results in summary + echo "### Pre-filtering ($TOTAL merged PRs in 24h)" >> $GITHUB_STEP_SUMMARY echo "" >> $GITHUB_STEP_SUMMARY + # List excluded PRs + if [ "$EXCLUDED" -gt 0 ]; then + echo "**Excluded by title filter ($EXCLUDED):**" >> $GITHUB_STEP_SUMMARY + echo "" >> $GITHUB_STEP_SUMMARY + jq -r --argjson filtered "$(cat /tmp/filtered-prs.json)" \ + '[.[] | select(.number as $n | $filtered | map(.number) | index($n) | not)] + | .[] | "- ~~#\(.number) — \(.title)~~"' /tmp/all-prs.json >> 
$GITHUB_STEP_SUMMARY + echo "" >> $GITHUB_STEP_SUMMARY + fi + + # List already processed PRs + if [ "$ALREADY" -gt 0 ]; then + echo "**Already processed ($ALREADY):**" >> $GITHUB_STEP_SUMMARY + echo "" >> $GITHUB_STEP_SUMMARY + jq -r --argjson newprs "$(cat /tmp/new-prs.json)" \ + '[.[] | select(.number as $n | $newprs | map(.number) | index($n) | not)] + | .[] | "- #\(.number) — \(.title) *(doc PR already exists)*"' /tmp/filtered-prs.json >> $GITHUB_STEP_SUMMARY + echo "" >> $GITHUB_STEP_SUMMARY + fi + + # List candidates sent to Claude + if [ "$FINAL" -gt 0 ]; then + echo "**Sent to Claude ($FINAL):**" >> $GITHUB_STEP_SUMMARY + echo "" >> $GITHUB_STEP_SUMMARY + jq -r '.[] | "- **#\(.number) — \(.title)**"' /tmp/new-prs.json >> $GITHUB_STEP_SUMMARY + echo "" >> $GITHUB_STEP_SUMMARY + fi + - name: Load prompt from file if: steps.check-prs.outputs.has_prs == 'true' id: load-prompt @@ -135,15 +158,6 @@ jobs: GH_TOKEN: ${{ secrets.PAT_TOKEN_PIWI }} FILTERED_PRS: ${{ steps.check-prs.outputs.pr_list }} - - name: Save Claude log as artifact - if: always() && steps.check-prs.outputs.has_prs == 'true' - uses: actions/upload-artifact@v4 - with: - name: self-healing-log-${{ github.run_number }} - path: /tmp/self-healing-summary.json - retention-days: 30 - if-no-files-found: warn - - name: Summary if: always() run: | From 5c33452933456d0dbd74f5a377ffe05e7a3dfb15 Mon Sep 17 00:00:00 2001 From: Pierre Wizla <4233866+pwizla@users.noreply.github.com> Date: Thu, 16 Apr 2026 21:39:06 +0200 Subject: [PATCH 4/4] Pre-fetch PR diffs and bodies in bash before Claude New workflow step fetches body and diff for each candidate PR and stores them in /tmp/pr--body.txt and /tmp/pr-.diff. Claude reads from disk instead of calling gh api, saving tool calls. 
Co-Authored-By: Claude Opus 4.6 (1M context) --- .github/prompts/docs-self-healing.md | 12 ++++++------ .github/workflows/docs-self-healing.yml | 19 +++++++++++++++++++ 2 files changed, 25 insertions(+), 6 deletions(-) diff --git a/.github/prompts/docs-self-healing.md b/.github/prompts/docs-self-healing.md index 5174caaea5..35a0e991a1 100644 --- a/.github/prompts/docs-self-healing.md +++ b/.github/prompts/docs-self-healing.md @@ -24,14 +24,14 @@ Parse this list. **Do NOT re-fetch PRs from the GitHub API** — the workflow al **Rate limit:** Process a maximum of 1 PR per run (testing mode). If more qualify, log the skipped ones to stdout and they will be picked up on the next run. -## Step 2 — Get PR context (per PR) +## Step 2 — Read pre-fetched PR context (per PR) -For each PR to process, fetch the description and the diff: +The workflow has already fetched the body and diff for each PR. Read them from: -```bash -gh api repos/strapi/strapi/pulls/ --jq '.body' > /tmp/pr--body.txt -gh api repos/strapi/strapi/pulls/.diff > /tmp/pr-.diff -``` +- `/tmp/pr--body.txt` — PR description +- `/tmp/pr-.diff` — full diff + +**Do NOT fetch these from the GitHub API** — they are already on disk. **Diff size threshold:** If the diff exceeds 3000 lines, skip this PR and log: "PR # skipped — diff too large (X lines), flag for manual /autodoc". diff --git a/.github/workflows/docs-self-healing.yml b/.github/workflows/docs-self-healing.yml index d0a135eb10..3d414e3a60 100644 --- a/.github/workflows/docs-self-healing.yml +++ b/.github/workflows/docs-self-healing.yml @@ -131,6 +131,25 @@ jobs: echo "" >> $GITHUB_STEP_SUMMARY fi + - name: Pre-fetch PR diffs and bodies + if: steps.check-prs.outputs.has_prs == 'true' + env: + GH_TOKEN: ${{ secrets.PAT_TOKEN_PIWI }} + run: | + for NUMBER in $(jq -r '.[].number' /tmp/new-prs.json); do + echo "Fetching body and diff for PR #$NUMBER..." 
+ gh api "repos/strapi/strapi/pulls/$NUMBER" --jq '.body' > "/tmp/pr-${NUMBER}-body.txt" 2>/dev/null || echo "" > "/tmp/pr-${NUMBER}-body.txt" + gh api "repos/strapi/strapi/pulls/$NUMBER.diff" > "/tmp/pr-${NUMBER}.diff" 2>/dev/null || echo "" > "/tmp/pr-${NUMBER}.diff" + + DIFF_LINES=$(wc -l < "/tmp/pr-${NUMBER}.diff") + echo " PR #$NUMBER: body $(wc -c < "/tmp/pr-${NUMBER}-body.txt") bytes, diff $DIFF_LINES lines" + + # Flag oversized diffs + if [ "$DIFF_LINES" -gt 3000 ]; then + echo " ⚠️ PR #$NUMBER diff exceeds 3000 lines, will be skipped by Claude" + fi + done + - name: Load prompt from file if: steps.check-prs.outputs.has_prs == 'true' id: load-prompt
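
Reviewer note on the title pre-filter from PATCH 2/4: the exclusion rules can be tried locally against sample titles before the workflow runs. The following is a minimal sketch, not the workflow's implementation. The workflow applies these patterns with jq's `test(...; "i")`; this version assumes bash's `[[ =~ ]]` with `nocasematch` behaves equivalently for these patterns, and rewrites `\s` as `[[:space:]]` because POSIX ERE has no `\s`. The names `EXCLUDE_PATTERNS` and `needs_review` are hypothetical and do not appear in the patch.

```shell
#!/usr/bin/env bash
# Hypothetical local re-implementation of the workflow's title pre-filter.
# The workflow itself uses jq; this sketch only mirrors its regexes in bash.

# Mirror the "i" (case-insensitive) flag passed to jq's test().
shopt -s nocasematch

# Patterns copied from the jq filter in the patch.
# Only change: "^test[:(\s]" is written with [[:space:]] for POSIX ERE.
EXCLUDE_PATTERNS=(
  '^chore'
  '^test([:(]|[[:space:]])'
  '^docs:'
  '^security: package'
  'translation[s]?$'
  'typo'
  '^(Remove|Update) (yarn|README)'
)

# needs_review TITLE
# Returns 0 if the title survives the filter (would be sent to the Router),
# 1 if any exclusion pattern matches.
needs_review() {
  local title="$1" pattern
  for pattern in "${EXCLUDE_PATTERNS[@]}"; do
    if [[ "$title" =~ $pattern ]]; then
      return 1
    fi
  done
  return 0
}

# Example run over a few made-up titles.
for title in "feat: add feature X" "chore(deps): bump lodash" "Fix typo in test"; do
  if needs_review "$title"; then
    echo "KEEP  $title"
  else
    echo "SKIP  $title"
  fi
done
```

One thing this makes easy to see: the `typo` pattern is an unanchored substring match, so a title such as "feat: typography settings" would also be excluded. Whether that matters depends on the titles that actually occur in `strapi/strapi`.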