refactor(crossref_review): consume TSV from mathlib4's post-build lint by kim-em · Pull Request #39 · leanprover-community/mathlib-ci

kim-em · 2026-05-22T08:28:36Z

The mathlib4 cross-reference review pipeline is moving from a text-based extractor running independently in a pull_request_target workflow to an elaboration-based lint step that runs in the build pipeline's post_steps job. The lint step emits a machine-readable TSV (db / tag / declName / file / comment) as an artifact, and a workflow_run-triggered companion downloads it.

This PR rewrites scripts/crossref_review/crossref-pr-comment.py to consume the new TSV instead of driving extraction itself:

Takes --tsv (the artifact path) instead of two checkout paths.
Defensively parses the TSV (unknown databases dropped, missing fields padded). Nothing from PR-derived input is executed.
Fetches snippets directly via Python's urllib — Wikidata via the batched wbgetentities API, Stacks/Kerodon via the documented Gerby /data/tag/<TAG>/content/statement endpoint. Gerby fetches are parallelised with ThreadPoolExecutor(max_workers=8) to keep the workflow under a minute on PRs that touch large files.
Constructs the Markdown comment in trusted code (no longer relies on PR-derived Markdown), still using the ### Cross-reference tags added by this PR sentinel so update_PR_comment.sh can dedupe.
Exit codes unchanged: 0 = no rows, 1 = all resolved, 2 = some missing (fail CI), 3 = transient network only (warn).

Stacked on the companion mathlib4 PR: leanprover-community/mathlib4#39666

🤖 Prepared with Claude Code

When a PR touches `Mathlib/**/*.lean` and adds `@[stacks ...]`, `@[kerodon ...]`, or `@[wikidata ...]` attributes, this workflow posts (or updates, or deletes) a single bot comment with one row per added tag, showing the upstream label / description fetched from the source database, the Mathlib declaration's first-line signature, and the author's optional attribute comment. CI fails when any tag cannot be resolved upstream and emits a warning (not a failure) when only transient network errors occur. Trusted-script execution pattern (same as `PR_summary.yml`): * The PR head is checked out into `pr-branch/` and used as DATA only — no code from it ever runs in this `pull_request_target` context. * The base ref is checked out into `base/`; its `scripts/extract-crossref-tags.lean` and `scripts/crossref-snippet.lean` are the trusted versions we actually invoke, and its `lean-toolchain` is what `lake env lean --run` picks up. * `mathlib-ci` is checked out via the shared `get-mathlib-ci` action; its `scripts/crossref_review/crossref-pr-comment.py` orchestrator (added in leanprover-community/mathlib-ci#39) drives the extract → snippet → format pipeline. * The resulting comment file is piped through `scripts/pr_summary/update_PR_comment.sh` for post-once / update-in-place behaviour. A startup gate exits cleanly when the base ref pre-dates the companion Lean scripts (e.g. against an older master), so the workflow doesn't fail spuriously. `docs/workflows.md` gains a new entry; `scripts/README.md` points at the mathlib-ci location for the orchestrator. Stacked on leanprover-community#39664 (LSP widget) and depends on the orchestrator landing in leanprover-community/mathlib-ci#39. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

kim-em · 2026-05-22T08:36:49Z

Here's what the bot's comment renders as, generated by feeding the orchestrator a synthetic file with one valid + one bogus tag from each database. (No quoting trick — this is the literal contents of comment.md from a real run, dropped into a GitHub comment so you can see how the table actually displays.)

Cross-reference tags added by this PR

Wikidata

Tag	Declaration	Title	Snippet	Author comment
Q11518	`theorem pythagoras : True`	Pythagorean theorem	relation in Euclidean geometry among the three sides of a right triangle	named after Pythagoras
QFAKE99999	`theorem bad : True`	⚠ NOT FOUND		intentionally bogus

Stacks

Tag	Declaration	Title	Snippet	Author comment
09FS	`theorem cohen_first (n : ℕ) [NeZero n] : True`	Example 9.5.2	Example 9.5.2. The characteristic of $\mathbf{F}_ p$ is $p$, and that of $\mathbf{Q}$ is $0$.	First part. We don't require `p` to be a prime in mathlib.
ZZZZ	`theorem worse : True`	⚠ NOT FOUND		also bogus

Kerodon

Tag	Declaration	Title	Snippet	Author comment
0009	`theorem first_notation : True`	Notation 1.1.0.1	Notation 1.1.0.1. For every nonnegative integer $n$, we let $[n]$ denote the linearly ordered set ${ 0 < 1 < 2 < \cdots < n-1 < n } $.

⚠ One or more tags above could not be found upstream. Please verify the tag identifier or remove the attribute.

When a PR touches `Mathlib/**/*.lean` and adds `@[stacks ...]`, `@[kerodon ...]`, or `@[wikidata ...]` attributes, this workflow posts (or updates, or deletes) a single bot comment with one row per added tag, showing the upstream label / description fetched from the source database, the Mathlib declaration's first-line signature, and the author's optional attribute comment. CI fails when any tag cannot be resolved upstream and emits a warning (not a failure) when only transient network errors occur. Trusted-script execution pattern (same as `PR_summary.yml`): * The PR head is checked out into `pr-branch/` and used as DATA only — no code from it ever runs in this `pull_request_target` context. * The base ref is checked out into `base/`; its `scripts/extract-crossref-tags.lean` and `scripts/crossref-snippet.lean` are the trusted versions we actually invoke, and its `lean-toolchain` is what `lake env lean --run` picks up. * `mathlib-ci` is checked out via the shared `get-mathlib-ci` action; its `scripts/crossref_review/crossref-pr-comment.py` orchestrator (added in leanprover-community/mathlib-ci#39) drives the extract → snippet → format pipeline. * The resulting comment file is piped through `scripts/pr_summary/update_PR_comment.sh` for post-once / update-in-place behaviour. A startup gate exits cleanly when the base ref pre-dates the companion Lean scripts (e.g. against an older master), so the workflow doesn't fail spuriously. `docs/workflows.md` gains a new entry; `scripts/README.md` points at the mathlib-ci location for the orchestrator. Stacked on leanprover-community#39664 (LSP widget) and depends on the orchestrator landing in leanprover-community/mathlib-ci#39. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…tory

…prover-community#6) Add a local composite action that mints a GitHub App installation token using OIDC and signing from an Azure Key Vault. This lets us avoid storing private keys as GitHub secrets and use federated credentials to control access to them. To make this fast, we avoid using azure/login + az keyvault key sign (this is very slow) and use straight HTTP calls. Our flow is simple enough that this is manageable without a lot of complexity.

Mint an Azure storage access token via GitHub OIDC for cache uploads. To be used by the mathlib CI.

…unity#12)

…ommunity#8) Add retry-with-backoff for all Zulip API calls, and exit gracefully (instead of crashing) when the API returns errors after retries. Previously, the script would crash with `KeyError: 'messages'` when Zulip rate-limited a `get_messages` call, because the error response has no `messages` key. This caused CI failures on unrelated PRs. Also documents that consuming workflows should use `continue-on-error: true` as a safety net. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

…r-community#14) The Zulip instructions tell users to clone mathlib4, but `git checkout nightly-testing` fails in a fresh mathlib4 clone since that branch only exists in mathlib4-nightly-testing. Fix by fetching nightly-testing from the URL first, then creating the local branch if it doesn't exist. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…over-community#15) The regex `[A-Za-z0-9_]*` excluded dots, so `backward.isDefEq.respectTransparency` was reported as just `backward.isDefEq`, and `backward.privateInPublic.warn` was lumped into `backward.privateInPublic`. Add `.` to the character class so the full option name is captured, and add a trailing space to the count pattern to prevent substring overlap between e.g. `backward.privateInPublic` and `backward.privateInPublic.warn`. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…r-community#16) This script created the wrong diffs in lean-community/mathlib4#36982. This bug can be fixed by simply adding `public` to the list of modifiers to be dropped.

…munity#18) This PR makes `merge-lean-testing-pr.sh` create or reuse a real `nightly-testing` remote and ensures the local `nightly-testing` branch tracks `nightly-testing/nightly-testing`. This avoids the broken state where running the script in a fresh `mathlib4` clone leaves a later plain `git push` targeting `origin/nightly-testing` instead of `leanprover-community/mathlib4-nightly-testing`. :robot: Prepared with OpenAI Codex

…ommunity#19) Info messages don't indicate problems: the detailed breakdown in the Zulip report is pure noise (entries like "96 × info: stdout:" or "1 × info: Found git remote for proofwidgets at ..."). The top-line "Info messages: N" summary is retained in the counts bullet list. See https://leanprover.zulipchat.com/#narrow/channel/428973-nightly-testing/topic/Docgen.20status.20updates/near/586226267

…identity (leanprover-community#20) The comment-deduplication filter still searched for comments authored by `leanprover-community-bot`, but comments are now posted via the `mathlib-lean-pr-testing` GitHub App, whose user login is `mathlib-lean-pr-testing[bot]`. The lookup never matched, so the script always fell through to posting a new comment instead of editing the existing one. See https://leanprover.zulipchat.com/#narrow/channel/428973-nightly-testing/topic/Bot.20messages/near/585690908

…eanprover-community#22) The PR summary comment previously advised running `./scripts/reporting/technical-debt-metrics.sh pr_summary`, which was a stale relative path assuming the script was in `mathlib4`. Update both the in-script comment and the printed instructions to give a copy-pasteable recipe that clones `mathlib-ci` as a sibling of `mathlib4` and invokes the script from there. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…rface errors in PR comment (leanprover-community#26) * fix: replay auto commits in isolated worktrees and surface errors in PR comment The commit verification script had several bugs that combined to produce unhelpful PR comments and cascading false-positive failures: - Per-commit replay shared the main checkout, so any untracked file an earlier auto commit's command produced (e.g., from a `> hits.txt` redirect) blocked the next commit's `git checkout --detach $parent` with "untracked working tree files would be overwritten". The "fix" to earlier commit N+1 then reported a confusing checkout error instead of its real failure. - JSON output was hand-rolled with `echo "{ \"...\": \"$subject\" }"` and only escaped quotes. Subjects containing literal newlines (created by some commit-message generators) produced unparseable JSON, which then caused the comment-rendering step to fail silently with `jq: parse error: Invalid escape`. - The comment surfaced no actual error output. Authors saw "❌ Failed" and had to dig through CI logs to learn what went wrong. This rewrite: - Replays each auto commit in a fresh `git worktree add --detach $parent` under a single mktemp'd parent. Worktrees are removed after each iteration, on EXIT (trap), and `git worktree prune` runs at startup so any stale state from interrupted runs is cleared. The same isolation applies to the transient-commit cherry-pick replay. - Captures combined stdout+stderr via `tee` so live CI logs still show the command's output while we also save it for the JSON report. - Builds JSON via `jq -nc --arg ...` so commit subjects with arbitrary bytes (newlines, backslashes, control chars) round-trip safely. - Records a `failure_kind` (`command_failed`, `timed_out`, or `tree_mismatch`) plus the last 4KB / 40 lines of output, and for tree mismatches a truncated `git diff-tree --stat`. The summary script dispatches on `failure_kind` (no keyword matching on error text — the underlying tool's message is the diagnosis) and renders each failed commit as a `<details>` block containing the actual output. - Caps total comment size at 50KB, falling back to "see CI logs" links for late-truncated content. - Replaces the literal `✅`/`❌` escape strings (which only rendered correctly inside `jq` output) with real UTF-8 ✅/❌. - Drops the bash-4 `mapfile` and associative-array dependencies; both scripts now run on bash 3.2+, easing local testing. - Recovers from unparseable JSON: if `verify_commits.sh` crashes before writing the JSON file, the summary script emits a fallback comment pointing to the CI logs rather than producing empty output. Smoke-tested against the failing run for leanprover-community/mathlib4#38480. The new comment shows the actual bash parse error / xargs / sed / git rm output for each commit independently, with no cascading "checkout would overwrite" noise. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: allow TIMEOUT_SECONDS to be overridden via environment variable Hard-coded 600s makes the timeout-handling path effectively unreachable from test harnesses that build a synthetic git history and need a quick artificial timeout. Use the same `${VAR:-default}` pattern already used for TRANSIENT_PREFIX et al., so a test scenario can set TIMEOUT_SECONDS=1 and exercise the timed_out failure_kind path in seconds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…y#24) * feat(olean_diff.py): script for comparing oleans and generating a markdown comment * adjust error messages

…nprover-community#25) If we start counting this number as part of the tech debt metrics, this should encourage us to make more effective use of the module system. Tested locally on Mathlib by running `grep '^@\[expose\] | grep -v ...` and building the regex until it matched all the forms of exposed public sections.

Fix the local run instructions of declarations_diff.sh and import_trans_difference.sh so that they point so this repo. This style is already used in technical-debt-metrics.sh, which had the correct instructions.

…er-community#23) leanprover-community#19 removed info messages, but the weekly linting report in both Mathlib and CSLib make use of this (see leanprover-community/mathlib4#37236). Instead, make this an off by default option. I will follow up with Mathlib and CSLib PRs enabling this for the weekly linting reports.

…rover-community#21) This PR adds a counter for `set_option warning.simp.varHead false` occurrences to the weekly tech debt report. The `warning.simp.varHead` option silences a warning introduced in Lean about simp lemmas whose LHS has a metavariable as its head. Places where this warning is silenced are tech debt we want to track over time. Prompted by discussion on leanprover-community/mathlib4-nightly-testing#209 which adds several `set_option warning.simp.varHead false in` uses. 🤖 Prepared with Claude Code Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

leanprover-community#29) * fix: require all CI outcomes to succeed before adding `builds-mathlib` Previously the `builds-mathlib` label was added as soon as `TEST_OUTCOME == "success"`, regardless of whether `LINT_OUTCOME`, `BUILD_OUTCOME`, `NOISY_OUTCOME`, `ARCHIVE_OUTCOME`, or `COUNTEREXAMPLES_OUTCOME` had failed. The success-message branch had the same bug, short-circuiting all the failure-message cases below it. As a result, lean4 PRs whose Mathlib CI ran with a failing `simp` linter step (but a succeeding `test` step) were labelled `builds-mathlib` and shown a green "successfully built against this PR" comment, hiding the lint failure from PR authors and reviewers. This PR requires all six step outcomes (`BUILD`, `NOISY`, `ARCHIVE`, `COUNTEREXAMPLES`, `LINT`, `TEST`) to be `success` before the `builds-mathlib` label is added or the green-check status comment is posted. The `breaks-mathlib` branch already triggered on any failure, so its behaviour is unchanged; the asymmetry is now closed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor: drive labels and status comment from one aggregate state Previously the label-mutation block and the comment-message block ran their own outcome classifications. After the previous commit, the label block knew only `all-success` and `any-failure`, while the message block also handled `cancelled` and a fall-through case. That left two real problems: * `cancelled`, `skipped`, or empty outcomes posted a yellow comment but did not touch labels, so a stale `builds-mathlib` or `breaks-mathlib` from an earlier run would persist after a reattempt. * When one outcome was `failure` and another `cancelled`, labels said `breaks-mathlib` but the comment said "was cancelled", hiding the failure. This refactor classifies the run once into one of `success`, `failure`, `cancelled`, or `other`, with `failure` taking precedence over `cancelled`. Both the labels and the message branch off the same state. The non-success/non-failure paths now explicitly remove both `builds-mathlib` and `breaks-mathlib` so labels cannot drift out of sync with reality. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

This PR adds a "level" attribute to the tech debt reports posted weekly on Zulip and under every PR. The intention is that "weak" tech debt is allowed to increase in a PR (when reasonable), while "strong" tech debt should only increase in exceptional cases (for example, Lean updates or because we expose a new source of tech debt). I played around with a few formatting options for the report and settled on this one: show the strong tech debt and then the weak tech debt in two different tables. That should make it clear for the reader (and possible automation) if we care about a particular change or not.

…eanprover-community#31) This category of tech debt will give false positives until leanprover/lean4#13595 is merged (hopefully in the next 2 weeks).

…ver-community#32) The label/comment automation used `curl -L -s ...`, where `-s` swallows both progress output and error responses. When the GitHub App token lacked permission for the target repo (as happened with batteries-pr-testing branches before mathlib4#38945), `curl` would print the API's 4xx body nowhere and exit 0, so the script's `set -e` never tripped and the script silently misreported success ("Removing label X" / "Adding label Y" with no actual API effect). Switch all 7 curl calls to `-sS --fail-with-body`: silent on success, prints the response body and exits non-zero on HTTP errors. Combined with the script's existing `set -euo pipefail`, any future API failure now fails the workflow step loudly with a diagnostic message. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…er-community#33) * chore: route Zulip notifications to nightly-testing-mathlib This PR retargets the regression-report and docgen-status workflows, and the adaptation-PR helper script, from the shared `nightly-testing` channel to the new `nightly-testing-mathlib` channel, in line with the proposal to split `#nightly-testing` into per-project channels for Mathlib, Batteries, and Cslib. 🤖 Prepared with Claude Code * chore: update echo string to reference new channel name Codex review caught that the prose `echo` still referenced the old `#nightly-testing` channel even though the actual `zulip-send` call was already retargeted at `#nightly-testing-mathlib`. 🤖 Prepared with Claude Code

…eanprover-community#34) After leanprover-community#32 added `--fail-with-body` to the curl calls, every label-API 4xx aborts the script under `set -euo pipefail`. That's the desired behaviour for `add_label`, but `remove_label` returns 404 when the label isn't currently on the PR — which is the script's desired post-condition. The whole labelling step then aborts before reaching the `add_label` calls, so PRs end up with no `builds-mathlib` / `breaks-mathlib` label even when the build's outcome is clear. Inspect the HTTP status manually instead and accept 200 or 404; keep `add_label` strict. Co-authored-by: Kim Morrison <kim.morrison@anu.edu.au> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…community#36) In GNU grep BRE, `\|` is the alternation operator, so `grep "\|strong$"` matches every line (the empty alternative matches anything). As a result both `report strong` and `report weak` emitted the full table, and the weekly Zulip post showed every category twice. Use a literal `|` so only rows ending in `|strong` or `|weak` are kept. Co-authored-by: Kim Morrison <kim@lean-fro.org> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…nprover-community#35) The technical debt metrics seem to have broken recently, and in the CI log there is a fatal Git error about not detaching `HEAD`. The log does not show which command precisely failed. Looking at the set of error messages that Git produces, this can be traced to `git checkout` (which `git switch` is the modern version of). So replacing `git switch` and `git checkout` with `git switch --detach` should hopefully fix the issue. Example of broken tech debt reports: * https://github.com/leanprover-community/mathlib4/actions/runs/25331716666/job/74266774924#step:11:133 Interestingly `git checkout` in the transitive imports script does not seem to break anything. My suspicion is the `git checkout -q` switch turns the advice message into a fatal error? I'd have to look at the Git source code in more detail since the man page is not very clear. In any case, let's just turn it into `git switch --detach` there too for consistency.

Add `scripts/crossref_review/crossref-pr-comment.py` — the trusted orchestrator driven by mathlib4's `.github/workflows/crossref_review.yml` (introduced in a stacked mathlib4 PR). It runs the Lean extract / snippet scripts from the mathlib4 base checkout against the PR checkout as data, formats a Markdown table of every `@[stacks ...]`, `@[kerodon ...]`, and `@[wikidata ...]` attribute added by the PR, and exits 0 / 1 / 2 / 3 so the workflow can distinguish "nothing to post" / "post and pass" / "post and fail" / "post but treat as a network blip." The script writes its comment body to a file whose first line is the sentinel title used by `scripts/pr_summary/update_PR_comment.sh` to dedupe — so the workflow can pipe it straight through that helper for post-once / update-in-place behaviour. Pure stdlib only, so the CI job can run it without `pip install`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

When a PR touches `Mathlib/**/*.lean` and adds `@[stacks ...]`, `@[kerodon ...]`, or `@[wikidata ...]` attributes, this workflow posts (or updates, or deletes) a single bot comment with one row per added tag, showing the upstream label / description fetched from the source database, the Mathlib declaration's first-line signature, and the author's optional attribute comment. CI fails when any tag cannot be resolved upstream and emits a warning (not a failure) when only transient network errors occur. Trusted-script execution pattern (same as `PR_summary.yml`): * The PR head is checked out into `pr-branch/` and used as DATA only — no code from it ever runs in this `pull_request_target` context. * The base ref is checked out into `base/`; its `scripts/extract-crossref-tags.lean` and `scripts/crossref-snippet.lean` are the trusted versions we actually invoke, and its `lean-toolchain` is what `lake env lean --run` picks up. * `mathlib-ci` is checked out via the shared `get-mathlib-ci` action; its `scripts/crossref_review/crossref-pr-comment.py` orchestrator (added in leanprover-community/mathlib-ci#39) drives the extract → snippet → format pipeline. * The resulting comment file is piped through `scripts/pr_summary/update_PR_comment.sh` for post-once / update-in-place behaviour. A startup gate exits cleanly when the base ref pre-dates the companion Lean scripts (e.g. against an older master), so the workflow doesn't fail spuriously. `docs/workflows.md` gains a new entry; `scripts/README.md` points at the mathlib-ci location for the orchestrator. Stacked on leanprover-community#39664 (LSP widget) and depends on the orchestrator landing in leanprover-community/mathlib-ci#39. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…t step The mathlib4 cross-reference review pipeline is moving from a text-based extractor running independently in a `pull_request_target` workflow to an elaboration-based lint step that runs in the build pipeline's `post_steps` job after `lake exe cache get`. The lint step emits a machine-readable TSV (db / tag / declName / file / comment) as an artifact, and a `workflow_run`-triggered companion downloads it here. This commit rewrites `scripts/crossref_review/crossref-pr-comment.py` so it: * Takes `--tsv` (the path to the artifact) instead of two checkout paths. * Defensively parses the TSV (unknown databases dropped, missing fields padded). Nothing from PR-derived input is executed. * Fetches snippets directly via Python's `urllib` — Wikidata via the batched `wbgetentities` API, Stacks/Kerodon via the documented Gerby `/data/tag/<TAG>/content/statement` endpoint. Gerby fetches are parallelised with `ThreadPoolExecutor(max_workers=8)` to keep the workflow under a minute on PRs that touch large files. * Constructs the Markdown comment in trusted code (no longer relies on PR-derived Markdown), still using the `### Cross-reference tags added by this PR` sentinel so `update_PR_comment.sh` can dedupe. * Exits 0 / 1 / 2 / 3 with the same grammar as before. Stacked on leanprover-community/mathlib4#39666 upcoming force-push, which adds the post-build lint step and rewires `crossref_review.yml` as `workflow_run`-triggered. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replace the text-based cross-reference extractor with an elaboration- based lint step that runs in the build pipeline's `post_steps` job and a `workflow_run`-triggered companion that posts the bot comment. **Why.** The previous text-based extractor (a ~250 LOC byte-level scanner in `scripts/crossref.lean`) was a workaround for not having a built Mathlib environment in the pull_request_target workflow. Now that we run as a post-build lint step, we have a fully elaborated environment and can read `Mathlib.CrossRef.tagExt` directly — cutting ~250 LOC and eliminating the fidelity gap with the Lean parser. **Pieces:** * `scripts/crossref.lean` — the `extract` subcommand is gone. New `check --diff <range> --tsv <path>` subcommand uses `importModules (loadExts := true)` (NOT `withImportModules`, which passes `loadExts := false` and would leave `tagExt` empty), walks `tagExt`, looks up each tagged declaration's source module via `env.getModuleFor?`, and filters by `git diff --name-only` of the PR range. Output is machine-readable TSV (`db\ttag\tdeclName\tfile\tcomment`). No Markdown, no network, no snippets — those are the privileged workflow's job. * `.github/workflows/build_template.yml` — adds three new steps to `post_steps`, after the import-graph upload: - Run `lake env lean --run scripts/crossref.lean check ...` against the diff between the PR base and HEAD. - Write a `crossref-context.json` with PR number + head SHA so the companion workflow doesn't have to rely on the unreliable `workflow_run.pull_requests[0].number`. - Upload both as a single `crossref-context` artifact. * `.github/workflows/crossref_review.yml` — rewritten. Old trigger `pull_request_target` → new trigger `workflow_run`. Downloads the artifact, resolves the PR number through a three-stage fallback chain (artifact JSON → workflow_run payload → `gh pr list` head SHA match), calls the mathlib-ci orchestrator (which now consumes the TSV and fetches snippets in Python), posts/updates/deletes via `update_PR_comment.sh`, and fails the check on `missing` tags. **Privilege model.** The lint step runs with the PR's permissions and produces only machine-readable TSV. The `workflow_run` job runs in the base ref's permission scope (with `pull-requests: write`) and never executes anything from the artifact — it parses the TSV defensively, fetches snippets server-side from three fixed upstream APIs, and builds the Markdown comment in trusted code. Companion mathlib-ci PR: leanprover-community/mathlib-ci#39 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Codex's review of the stacked refactor flagged that `md_escape` only handled pipes and newlines, leaving PR-author comment strings (the optional `"…"` after a `@[wikidata QID "…"]` attribute) free to render arbitrary Markdown / HTML / `@mentions` in the bot's PR comment. * Split into `md_cell_escape` (for upstream-derived snippets — Wikidata labels, Stacks/Kerodon statements — that already passed through the upstream's own sanitiser; we just collapse line breaks and neutralise pipes) and `md_author_escape` (for PR-author comments — backslash- escapes Markdown metacharacters, HTML-entity-escapes angle brackets, inserts a zero-width space after `@` to defang mentions, and neutralises pipes). * `_doc_url(file, decl_name)` replaces the inline string interpolation for the doc-gen link. It does a suffix-only `.lean → .html` rewrite (so `foo.leanblah.lean` is handled) and URL-encodes the path and fragment so unusual characters in declaration names don't break the link. * Wikipedia / Stacks / Kerodon tag URLs are also URL-encoded. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replace the text-based cross-reference extractor with an elaboration- based lint step that runs in the build pipeline's `post_steps` job and a `workflow_run`-triggered companion that posts the bot comment. **Why.** The previous text-based extractor (a ~250 LOC byte-level scanner in `scripts/crossref.lean`) was a workaround for not having a built Mathlib environment in the pull_request_target workflow. Now that we run as a post-build lint step, we have a fully elaborated environment and can read `Mathlib.CrossRef.tagExt` directly — cutting ~250 LOC and eliminating the fidelity gap with the Lean parser. **Pieces:** * `scripts/crossref.lean` — the `extract` subcommand is gone. New `check --diff <range> --tsv <path>` subcommand uses `importModules (loadExts := true)` (NOT `withImportModules`, which passes `loadExts := false` and would leave `tagExt` empty), walks `tagExt`, looks up each tagged declaration's source module via `env.getModuleFor?`, and filters by `git diff --name-only` of the PR range. Output is machine-readable TSV (`db\ttag\tdeclName\tfile\tcomment`). No Markdown, no network, no snippets — those are the privileged workflow's job. * `.github/workflows/build_template.yml` — adds three new steps to `post_steps`, after the import-graph upload: - Run `lake env lean --run scripts/crossref.lean check ...` against the diff between the PR base and HEAD. - Write a `crossref-context.json` with PR number + head SHA so the companion workflow doesn't have to rely on the unreliable `workflow_run.pull_requests[0].number`. - Upload both as a single `crossref-context` artifact. * `.github/workflows/crossref_review.yml` — rewritten. Old trigger `pull_request_target` → new trigger `workflow_run`. Downloads the artifact, resolves the PR number through a three-stage fallback chain (artifact JSON → workflow_run payload → `gh pr list` head SHA match), calls the mathlib-ci orchestrator (which now consumes the TSV and fetches snippets in Python), posts/updates/deletes via `update_PR_comment.sh`, and fails the check on `missing` tags. **Privilege model.** The lint step runs with the PR's permissions and produces only machine-readable TSV. The `workflow_run` job runs in the base ref's permission scope (with `pull-requests: write`) and never executes anything from the artifact — it parses the TSV defensively, fetches snippets server-side from three fixed upstream APIs, and builds the Markdown comment in trusted code. Companion mathlib-ci PR: leanprover-community/mathlib-ci#39 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

kim-em mentioned this pull request May 22, 2026

feat(ci): cross-reference tag review via post-build lint + workflow_run leanprover-community/mathlib4#39666

Open

2 tasks

kim-em marked this pull request as ready for review May 22, 2026 08:32

kim-em force-pushed the crossref-review branch from 68028aa to 5b09178 Compare May 22, 2026 08:35

kim-em force-pushed the crossref-review branch from 5b09178 to fdad732 Compare May 22, 2026 10:02

marcelolynch and others added 19 commits May 22, 2026 10:13

Move nightly-docgen.yml, nightly-regression-report.yml to this reposi…

c462909

…tory

Incorporate changes from Anne's PR

506e9eb

step

6e9ad67

step

f5605f5

step

8966673

change api key variable

7e01f4d

fix: Fix quoting in env variables

2ab44c8

Correct email

47a8ad8

feat: Add action to mint Azure Storage token (leanprover-community#3)

65845ae

Mint an Azure storage access token via GitHub OIDC for cache uploads. To be used by the mathlib CI.

fix: Set pipefail in scripts that pipe builds to tee (leanprover-comm…

e1a6b46

…unity#12)

fix: declarations_diff.sh can't handle public modifier (leanprove…

00cad20

…r-community#16) This script created the wrong diffs in lean-community/mathlib4#36982. This bug can be fixed by simply adding `public` to the list of modifiers to be dropped.

bryangingechen and others added 15 commits May 22, 2026 10:13

feat(olean_diff.py): script for comparing oleans (leanprover-communit…

7f761b0

…y#24) * feat(olean_diff.py): script for comparing oleans and generating a markdown comment * adjust error messages

fix local run instructions (leanprover-community#27)

a30f1a2

Fix the local run instructions of declarations_diff.sh and import_trans_difference.sh so that they point so this repo. This style is already used in technical-debt-metrics.sh, which had the correct instructions.

chore: temporarily mark disabled deprecation lints as weak tech debt (l…

6d26b57

…eanprover-community#31) This category of tech debt will give false positives until leanprover/lean4#13595 is merged (hopefully in the next 2 weeks).

kim-em force-pushed the crossref-review branch from fdad732 to 3c72af0 Compare May 22, 2026 10:15

kim-em changed the title ~~feat: orchestrator for cross-reference tag review bot~~ refactor(crossref_review): consume TSV from mathlib4's post-build lint May 22, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(crossref_review): consume TSV from mathlib4's post-build lint#39

refactor(crossref_review): consume TSV from mathlib4's post-build lint#39
kim-em wants to merge 36 commits into
leanprover-community:masterfrom
kim-em:crossref-review

kim-em commented May 22, 2026 •

edited

Loading

Uh oh!

kim-em commented May 22, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

Conversation

kim-em commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kim-em commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Cross-reference tags added by this PR

Wikidata

Stacks

Kerodon

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

kim-em commented May 22, 2026 •

edited

Loading

kim-em commented May 22, 2026 •

edited

Loading