Skip to content

refactor(crossref_review): consume TSV from mathlib4's post-build lint#39

Open
kim-em wants to merge 36 commits into
leanprover-community:masterfrom
kim-em:crossref-review
Open

refactor(crossref_review): consume TSV from mathlib4's post-build lint#39
kim-em wants to merge 36 commits into
leanprover-community:masterfrom
kim-em:crossref-review

Conversation

@kim-em
Copy link
Copy Markdown
Contributor

@kim-em kim-em commented May 22, 2026

The mathlib4 cross-reference review pipeline is moving from a text-based extractor running independently in a pull_request_target workflow to an elaboration-based lint step that runs in the build pipeline's post_steps job. The lint step emits a machine-readable TSV (db / tag / declName / file / comment) as an artifact, and a workflow_run-triggered companion downloads it.

This PR rewrites scripts/crossref_review/crossref-pr-comment.py to consume the new TSV instead of driving extraction itself:

  • Takes --tsv (the artifact path) instead of two checkout paths.
  • Defensively parses the TSV (unknown databases dropped, missing fields padded). Nothing from PR-derived input is executed.
  • Fetches snippets directly via Python's urllib — Wikidata via the batched wbgetentities API, Stacks/Kerodon via the documented Gerby /data/tag/<TAG>/content/statement endpoint. Gerby fetches are parallelised with ThreadPoolExecutor(max_workers=8) to keep the workflow under a minute on PRs that touch large files.
  • Constructs the Markdown comment in trusted code (no longer relies on PR-derived Markdown), still using the ### Cross-reference tags added by this PR sentinel so update_PR_comment.sh can dedupe.
  • Exit codes unchanged: 0 = no rows, 1 = all resolved, 2 = some missing (fail CI), 3 = transient network only (warn).

Stacked on the companion mathlib4 PR: leanprover-community/mathlib4#39666

🤖 Prepared with Claude Code

kim-em added a commit to kim-em/mathlib4 that referenced this pull request May 22, 2026
When a PR touches `Mathlib/**/*.lean` and adds `@[stacks ...]`,
`@[kerodon ...]`, or `@[wikidata ...]` attributes, this workflow posts (or
updates, or deletes) a single bot comment with one row per added tag,
showing the upstream label / description fetched from the source database,
the Mathlib declaration's first-line signature, and the author's optional
attribute comment. CI fails when any tag cannot be resolved upstream and
emits a warning (not a failure) when only transient network errors occur.

Trusted-script execution pattern (same as `PR_summary.yml`):

* The PR head is checked out into `pr-branch/` and used as DATA only — no
  code from it ever runs in this `pull_request_target` context.
* The base ref is checked out into `base/`; its
  `scripts/extract-crossref-tags.lean` and `scripts/crossref-snippet.lean`
  are the trusted versions we actually invoke, and its `lean-toolchain`
  is what `lake env lean --run` picks up.
* `mathlib-ci` is checked out via the shared `get-mathlib-ci` action;
  its `scripts/crossref_review/crossref-pr-comment.py` orchestrator
  (added in leanprover-community/mathlib-ci#39)
  drives the extract → snippet → format pipeline.
* The resulting comment file is piped through
  `scripts/pr_summary/update_PR_comment.sh` for post-once / update-in-place
  behaviour.

A startup gate exits cleanly when the base ref pre-dates the companion
Lean scripts (e.g. against an older master), so the workflow doesn't
fail spuriously.

`docs/workflows.md` gains a new entry; `scripts/README.md` points at the
mathlib-ci location for the orchestrator.

Stacked on
leanprover-community#39664
(LSP widget) and depends on the orchestrator landing in
leanprover-community/mathlib-ci#39.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kim-em kim-em marked this pull request as ready for review May 22, 2026 08:32
@kim-em kim-em force-pushed the crossref-review branch from 68028aa to 5b09178 Compare May 22, 2026 08:35
kim-em added a commit to kim-em/mathlib4 that referenced this pull request May 22, 2026
When a PR touches `Mathlib/**/*.lean` and adds `@[stacks ...]`,
`@[kerodon ...]`, or `@[wikidata ...]` attributes, this workflow posts (or
updates, or deletes) a single bot comment with one row per added tag,
showing the upstream label / description fetched from the source database,
the Mathlib declaration's first-line signature, and the author's optional
attribute comment. CI fails when any tag cannot be resolved upstream and
emits a warning (not a failure) when only transient network errors occur.

Trusted-script execution pattern (same as `PR_summary.yml`):

* The PR head is checked out into `pr-branch/` and used as DATA only — no
  code from it ever runs in this `pull_request_target` context.
* The base ref is checked out into `base/`; its
  `scripts/extract-crossref-tags.lean` and `scripts/crossref-snippet.lean`
  are the trusted versions we actually invoke, and its `lean-toolchain`
  is what `lake env lean --run` picks up.
* `mathlib-ci` is checked out via the shared `get-mathlib-ci` action;
  its `scripts/crossref_review/crossref-pr-comment.py` orchestrator
  (added in leanprover-community/mathlib-ci#39)
  drives the extract → snippet → format pipeline.
* The resulting comment file is piped through
  `scripts/pr_summary/update_PR_comment.sh` for post-once / update-in-place
  behaviour.

A startup gate exits cleanly when the base ref pre-dates the companion
Lean scripts (e.g. against an older master), so the workflow doesn't
fail spuriously.

`docs/workflows.md` gains a new entry; `scripts/README.md` points at the
mathlib-ci location for the orchestrator.

Stacked on
leanprover-community#39664
(LSP widget) and depends on the orchestrator landing in
leanprover-community/mathlib-ci#39.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kim-em
Copy link
Copy Markdown
Contributor Author

kim-em commented May 22, 2026

Here's what the bot's comment renders as, generated by feeding the orchestrator a synthetic file with one valid + one bogus tag from each database. (No quoting trick — this is the literal contents of comment.md from a real run, dropped into a GitHub comment so you can see how the table actually displays.)


Cross-reference tags added by this PR

Wikidata

Tag Declaration Title Snippet Author comment
Q11518 theorem pythagoras : True Pythagorean theorem relation in Euclidean geometry among the three sides of a right triangle named after Pythagoras
QFAKE99999 theorem bad : True NOT FOUND intentionally bogus

Stacks

Tag Declaration Title Snippet Author comment
09FS theorem cohen_first (n : ℕ) [NeZero n] : True Example 9.5.2 Example 9.5.2. The characteristic of $\mathbf{F}_ p$ is $p$, and that of $\mathbf{Q}$ is $0$. First part. We don't require p to be a prime in mathlib.
ZZZZ theorem worse : True NOT FOUND also bogus

Kerodon

Tag Declaration Title Snippet Author comment
0009 theorem first_notation : True Notation 1.1.0.1 Notation 1.1.0.1. For every nonnegative integer $n$, we let $[n]$ denote the linearly ordered set ${ 0 &lt; 1 &lt; 2 &lt; \cdots &lt; n-1 &lt; n } $.

⚠ One or more tags above could not be found upstream. Please verify the tag identifier or remove the attribute.

kim-em added a commit to kim-em/mathlib4 that referenced this pull request May 22, 2026
When a PR touches `Mathlib/**/*.lean` and adds `@[stacks ...]`,
`@[kerodon ...]`, or `@[wikidata ...]` attributes, this workflow posts (or
updates, or deletes) a single bot comment with one row per added tag,
showing the upstream label / description fetched from the source database,
the Mathlib declaration's first-line signature, and the author's optional
attribute comment. CI fails when any tag cannot be resolved upstream and
emits a warning (not a failure) when only transient network errors occur.

Trusted-script execution pattern (same as `PR_summary.yml`):

* The PR head is checked out into `pr-branch/` and used as DATA only — no
  code from it ever runs in this `pull_request_target` context.
* The base ref is checked out into `base/`; its
  `scripts/extract-crossref-tags.lean` and `scripts/crossref-snippet.lean`
  are the trusted versions we actually invoke, and its `lean-toolchain`
  is what `lake env lean --run` picks up.
* `mathlib-ci` is checked out via the shared `get-mathlib-ci` action;
  its `scripts/crossref_review/crossref-pr-comment.py` orchestrator
  (added in leanprover-community/mathlib-ci#39)
  drives the extract → snippet → format pipeline.
* The resulting comment file is piped through
  `scripts/pr_summary/update_PR_comment.sh` for post-once / update-in-place
  behaviour.

A startup gate exits cleanly when the base ref pre-dates the companion
Lean scripts (e.g. against an older master), so the workflow doesn't
fail spuriously.

`docs/workflows.md` gains a new entry; `scripts/README.md` points at the
mathlib-ci location for the orchestrator.

Stacked on
leanprover-community#39664
(LSP widget) and depends on the orchestrator landing in
leanprover-community/mathlib-ci#39.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kim-em added a commit to kim-em/mathlib4 that referenced this pull request May 22, 2026
When a PR touches `Mathlib/**/*.lean` and adds `@[stacks ...]`,
`@[kerodon ...]`, or `@[wikidata ...]` attributes, this workflow posts (or
updates, or deletes) a single bot comment with one row per added tag,
showing the upstream label / description fetched from the source database,
the Mathlib declaration's first-line signature, and the author's optional
attribute comment. CI fails when any tag cannot be resolved upstream and
emits a warning (not a failure) when only transient network errors occur.

Trusted-script execution pattern (same as `PR_summary.yml`):

* The PR head is checked out into `pr-branch/` and used as DATA only — no
  code from it ever runs in this `pull_request_target` context.
* The base ref is checked out into `base/`; its
  `scripts/extract-crossref-tags.lean` and `scripts/crossref-snippet.lean`
  are the trusted versions we actually invoke, and its `lean-toolchain`
  is what `lake env lean --run` picks up.
* `mathlib-ci` is checked out via the shared `get-mathlib-ci` action;
  its `scripts/crossref_review/crossref-pr-comment.py` orchestrator
  (added in leanprover-community/mathlib-ci#39)
  drives the extract → snippet → format pipeline.
* The resulting comment file is piped through
  `scripts/pr_summary/update_PR_comment.sh` for post-once / update-in-place
  behaviour.

A startup gate exits cleanly when the base ref pre-dates the companion
Lean scripts (e.g. against an older master), so the workflow doesn't
fail spuriously.

`docs/workflows.md` gains a new entry; `scripts/README.md` points at the
mathlib-ci location for the orchestrator.

Stacked on
leanprover-community#39664
(LSP widget) and depends on the orchestrator landing in
leanprover-community/mathlib-ci#39.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kim-em kim-em force-pushed the crossref-review branch from 5b09178 to fdad732 Compare May 22, 2026 10:02
kim-em added a commit to kim-em/mathlib4 that referenced this pull request May 22, 2026
When a PR touches `Mathlib/**/*.lean` and adds `@[stacks ...]`,
`@[kerodon ...]`, or `@[wikidata ...]` attributes, this workflow posts (or
updates, or deletes) a single bot comment with one row per added tag,
showing the upstream label / description fetched from the source database,
the Mathlib declaration's first-line signature, and the author's optional
attribute comment. CI fails when any tag cannot be resolved upstream and
emits a warning (not a failure) when only transient network errors occur.

Trusted-script execution pattern (same as `PR_summary.yml`):

* The PR head is checked out into `pr-branch/` and used as DATA only — no
  code from it ever runs in this `pull_request_target` context.
* The base ref is checked out into `base/`; its
  `scripts/extract-crossref-tags.lean` and `scripts/crossref-snippet.lean`
  are the trusted versions we actually invoke, and its `lean-toolchain`
  is what `lake env lean --run` picks up.
* `mathlib-ci` is checked out via the shared `get-mathlib-ci` action;
  its `scripts/crossref_review/crossref-pr-comment.py` orchestrator
  (added in leanprover-community/mathlib-ci#39)
  drives the extract → snippet → format pipeline.
* The resulting comment file is piped through
  `scripts/pr_summary/update_PR_comment.sh` for post-once / update-in-place
  behaviour.

A startup gate exits cleanly when the base ref pre-dates the companion
Lean scripts (e.g. against an older master), so the workflow doesn't
fail spuriously.

`docs/workflows.md` gains a new entry; `scripts/README.md` points at the
mathlib-ci location for the orchestrator.

Stacked on
leanprover-community#39664
(LSP widget) and depends on the orchestrator landing in
leanprover-community/mathlib-ci#39.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kim-em added a commit to kim-em/mathlib4 that referenced this pull request May 22, 2026
When a PR touches `Mathlib/**/*.lean` and adds `@[stacks ...]`,
`@[kerodon ...]`, or `@[wikidata ...]` attributes, this workflow posts (or
updates, or deletes) a single bot comment with one row per added tag,
showing the upstream label / description fetched from the source database,
the Mathlib declaration's first-line signature, and the author's optional
attribute comment. CI fails when any tag cannot be resolved upstream and
emits a warning (not a failure) when only transient network errors occur.

Trusted-script execution pattern (same as `PR_summary.yml`):

* The PR head is checked out into `pr-branch/` and used as DATA only — no
  code from it ever runs in this `pull_request_target` context.
* The base ref is checked out into `base/`; its
  `scripts/extract-crossref-tags.lean` and `scripts/crossref-snippet.lean`
  are the trusted versions we actually invoke, and its `lean-toolchain`
  is what `lake env lean --run` picks up.
* `mathlib-ci` is checked out via the shared `get-mathlib-ci` action;
  its `scripts/crossref_review/crossref-pr-comment.py` orchestrator
  (added in leanprover-community/mathlib-ci#39)
  drives the extract → snippet → format pipeline.
* The resulting comment file is piped through
  `scripts/pr_summary/update_PR_comment.sh` for post-once / update-in-place
  behaviour.

A startup gate exits cleanly when the base ref pre-dates the companion
Lean scripts (e.g. against an older master), so the workflow doesn't
fail spuriously.

`docs/workflows.md` gains a new entry; `scripts/README.md` points at the
mathlib-ci location for the orchestrator.

Stacked on
leanprover-community#39664
(LSP widget) and depends on the orchestrator landing in
leanprover-community/mathlib-ci#39.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
marcelolynch and others added 19 commits May 22, 2026 10:13
…prover-community#6)

Add a local composite action that mints a GitHub App installation token using OIDC and signing from an Azure Key Vault. This lets us avoid storing private keys as GitHub secrets and use federated credentials to control access to them.

To make this fast, we avoid using azure/login + az keyvault key sign (this is very slow) and use straight HTTP calls. Our flow is simple enough that this is manageable without a lot of complexity.
Mint an Azure storage access token via GitHub OIDC for cache uploads. To be used by the mathlib CI.
…ommunity#8)

Add retry-with-backoff for all Zulip API calls, and exit gracefully
(instead of crashing) when the API returns errors after retries.

Previously, the script would crash with `KeyError: 'messages'` when
Zulip rate-limited a `get_messages` call, because the error response
has no `messages` key. This caused CI failures on unrelated PRs.

Also documents that consuming workflows should use
`continue-on-error: true` as a safety net.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…r-community#14)

The Zulip instructions tell users to clone mathlib4, but
`git checkout nightly-testing` fails in a fresh mathlib4 clone
since that branch only exists in mathlib4-nightly-testing.

Fix by fetching nightly-testing from the URL first, then creating
the local branch if it doesn't exist.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…over-community#15)

The regex `[A-Za-z0-9_]*` excluded dots, so
`backward.isDefEq.respectTransparency` was reported as just
`backward.isDefEq`, and `backward.privateInPublic.warn` was lumped
into `backward.privateInPublic`.

Add `.` to the character class so the full option name is captured,
and add a trailing space to the count pattern to prevent substring
overlap between e.g. `backward.privateInPublic` and
`backward.privateInPublic.warn`.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…r-community#16)

This script created the wrong diffs in lean-community/mathlib4#36982.

This bug can be fixed by simply adding `public` to the list of modifiers to be dropped.
…munity#18)

This PR makes `merge-lean-testing-pr.sh` create or reuse a real `nightly-testing` remote and ensures the local `nightly-testing` branch tracks `nightly-testing/nightly-testing`. This avoids the broken state where running the script in a fresh `mathlib4` clone leaves a later plain `git push` targeting `origin/nightly-testing` instead of `leanprover-community/mathlib4-nightly-testing`.

:robot: Prepared with OpenAI Codex
…ommunity#19)

Info messages don't indicate problems: the detailed breakdown in the
Zulip report is pure noise (entries like "96 × info: stdout:" or
"1 × info: Found git remote for proofwidgets at ..."). The top-line
"Info messages: N" summary is retained in the counts bullet list.

See https://leanprover.zulipchat.com/#narrow/channel/428973-nightly-testing/topic/Docgen.20status.20updates/near/586226267
…identity (leanprover-community#20)

The comment-deduplication filter still searched for comments authored by
`leanprover-community-bot`, but comments are now posted via the
`mathlib-lean-pr-testing` GitHub App, whose user login is
`mathlib-lean-pr-testing[bot]`. The lookup never matched, so the script
always fell through to posting a new comment instead of editing the
existing one.

See https://leanprover.zulipchat.com/#narrow/channel/428973-nightly-testing/topic/Bot.20messages/near/585690908
…eanprover-community#22)

The PR summary comment previously advised running
`./scripts/reporting/technical-debt-metrics.sh pr_summary`, which was
a stale relative path assuming the script was in `mathlib4`. Update
both the in-script comment and the printed instructions to give a
copy-pasteable recipe that clones `mathlib-ci` as a sibling of
`mathlib4` and invokes the script from there.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bryangingechen and others added 15 commits May 22, 2026 10:13
…rface errors in PR comment (leanprover-community#26)

* fix: replay auto commits in isolated worktrees and surface errors in PR comment

The commit verification script had several bugs that combined to produce
unhelpful PR comments and cascading false-positive failures:

- Per-commit replay shared the main checkout, so any untracked file an
  earlier auto commit's command produced (e.g., from a `> hits.txt`
  redirect) blocked the next commit's `git checkout --detach $parent`
  with "untracked working tree files would be overwritten". The "fix" to
  earlier commit N+1 then reported a confusing checkout error instead of
  its real failure.

- JSON output was hand-rolled with `echo "{ \"...\": \"$subject\" }"` and
  only escaped quotes. Subjects containing literal newlines (created by
  some commit-message generators) produced unparseable JSON, which then
  caused the comment-rendering step to fail silently with `jq: parse
  error: Invalid escape`.

- The comment surfaced no actual error output. Authors saw "❌ Failed"
  and had to dig through CI logs to learn what went wrong.

This rewrite:

- Replays each auto commit in a fresh `git worktree add --detach $parent`
  under a single mktemp'd parent. Worktrees are removed after each
  iteration, on EXIT (trap), and `git worktree prune` runs at startup so
  any stale state from interrupted runs is cleared. The same isolation
  applies to the transient-commit cherry-pick replay.

- Captures combined stdout+stderr via `tee` so live CI logs still show
  the command's output while we also save it for the JSON report.

- Builds JSON via `jq -nc --arg ...` so commit subjects with arbitrary
  bytes (newlines, backslashes, control chars) round-trip safely.

- Records a `failure_kind` (`command_failed`, `timed_out`, or
  `tree_mismatch`) plus the last 4KB / 40 lines of output, and for tree
  mismatches a truncated `git diff-tree --stat`. The summary script
  dispatches on `failure_kind` (no keyword matching on error text — the
  underlying tool's message is the diagnosis) and renders each failed
  commit as a `<details>` block containing the actual output.

- Caps total comment size at 50KB, falling back to "see CI logs" links
  for late-truncated content.

- Replaces the literal `✅`/`❌` escape strings (which only
  rendered correctly inside `jq` output) with real UTF-8 ✅/❌.

- Drops the bash-4 `mapfile` and associative-array dependencies; both
  scripts now run on bash 3.2+, easing local testing.

- Recovers from unparseable JSON: if `verify_commits.sh` crashes before
  writing the JSON file, the summary script emits a fallback comment
  pointing to the CI logs rather than producing empty output.

Smoke-tested against the failing run for leanprover-community/mathlib4#38480.
The new comment shows the actual bash parse error / xargs / sed / git rm
output for each commit independently, with no cascading "checkout would
overwrite" noise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: allow TIMEOUT_SECONDS to be overridden via environment variable

Hard-coded 600s makes the timeout-handling path effectively unreachable
from test harnesses that build a synthetic git history and need a quick
artificial timeout. Use the same `${VAR:-default}` pattern already used
for TRANSIENT_PREFIX et al., so a test scenario can set
TIMEOUT_SECONDS=1 and exercise the timed_out failure_kind path in
seconds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…y#24)

* feat(olean_diff.py): script for comparing oleans and generating a markdown comment

* adjust error messages
…nprover-community#25)

If we start counting this number as part of the tech debt metrics, this should encourage us to make more effective use of the module system.

Tested locally on Mathlib by running `grep '^@\[expose\] | grep -v ...` and building the regex until it matched all the forms of exposed public sections.
Fix the local run instructions of declarations_diff.sh and import_trans_difference.sh so that they point so this repo. This style is already used in technical-debt-metrics.sh, which had the correct instructions.
…er-community#23)

leanprover-community#19 removed info messages, but the weekly linting report in both Mathlib and CSLib make use of this (see leanprover-community/mathlib4#37236). Instead, make this an off by default option. I will follow up with Mathlib and CSLib PRs enabling this for the weekly linting reports.
…rover-community#21)

This PR adds a counter for `set_option warning.simp.varHead false`
occurrences to the weekly tech debt report.

The `warning.simp.varHead` option silences a warning introduced in Lean
about simp lemmas whose LHS has a metavariable as its head. Places where
this warning is silenced are tech debt we want to track over time.

Prompted by discussion on
leanprover-community/mathlib4-nightly-testing#209
which adds several `set_option warning.simp.varHead false in` uses.

🤖 Prepared with Claude Code

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
leanprover-community#29)

* fix: require all CI outcomes to succeed before adding `builds-mathlib`

Previously the `builds-mathlib` label was added as soon as `TEST_OUTCOME == "success"`,
regardless of whether `LINT_OUTCOME`, `BUILD_OUTCOME`, `NOISY_OUTCOME`, `ARCHIVE_OUTCOME`,
or `COUNTEREXAMPLES_OUTCOME` had failed. The success-message branch had the same bug,
short-circuiting all the failure-message cases below it. As a result, lean4 PRs whose
Mathlib CI ran with a failing `simp` linter step (but a succeeding `test` step) were
labelled `builds-mathlib` and shown a green "successfully built against this PR" comment,
hiding the lint failure from PR authors and reviewers.

This PR requires all six step outcomes (`BUILD`, `NOISY`, `ARCHIVE`, `COUNTEREXAMPLES`,
`LINT`, `TEST`) to be `success` before the `builds-mathlib` label is added or the
green-check status comment is posted. The `breaks-mathlib` branch already triggered on
any failure, so its behaviour is unchanged; the asymmetry is now closed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: drive labels and status comment from one aggregate state

Previously the label-mutation block and the comment-message block ran
their own outcome classifications. After the previous commit, the label
block knew only `all-success` and `any-failure`, while the message block
also handled `cancelled` and a fall-through case. That left two real
problems:

* `cancelled`, `skipped`, or empty outcomes posted a yellow comment but
  did not touch labels, so a stale `builds-mathlib` or `breaks-mathlib`
  from an earlier run would persist after a reattempt.
* When one outcome was `failure` and another `cancelled`, labels said
  `breaks-mathlib` but the comment said "was cancelled", hiding the
  failure.

This refactor classifies the run once into one of `success`, `failure`,
`cancelled`, or `other`, with `failure` taking precedence over
`cancelled`. Both the labels and the message branch off the same state.
The non-success/non-failure paths now explicitly remove both
`builds-mathlib` and `breaks-mathlib` so labels cannot drift out of sync
with reality.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This PR adds a "level" attribute to the tech debt reports posted weekly on Zulip and under every PR. The intention is that "weak" tech debt is allowed to increase in a PR (when reasonable), while "strong" tech debt should only increase in exceptional cases (for example, Lean updates or because we expose a new source of tech debt).

I played around with a few formatting options for the report and settled on this one: show the strong tech debt and then the weak tech debt in two different tables. That should make it clear for the reader (and possible automation) if we care about a particular change or not.
…eanprover-community#31)

This category of tech debt will give false positives until leanprover/lean4#13595 is merged (hopefully in the next 2 weeks).
…ver-community#32)

The label/comment automation used `curl -L -s ...`, where `-s` swallows
both progress output and error responses. When the GitHub App token
lacked permission for the target repo (as happened with batteries-pr-testing
branches before mathlib4#38945), `curl` would print the API's 4xx body
nowhere and exit 0, so the script's `set -e` never tripped and the script
silently misreported success ("Removing label X" / "Adding label Y" with
no actual API effect).

Switch all 7 curl calls to `-sS --fail-with-body`: silent on success,
prints the response body and exits non-zero on HTTP errors. Combined with
the script's existing `set -euo pipefail`, any future API failure now
fails the workflow step loudly with a diagnostic message.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…er-community#33)

* chore: route Zulip notifications to nightly-testing-mathlib

This PR retargets the regression-report and docgen-status workflows, and the
adaptation-PR helper script, from the shared `nightly-testing` channel to the
new `nightly-testing-mathlib` channel, in line with the proposal to split
`#nightly-testing` into per-project channels for Mathlib, Batteries, and Cslib.

🤖 Prepared with Claude Code

* chore: update echo string to reference new channel name

Codex review caught that the prose `echo` still referenced the old
`#nightly-testing` channel even though the actual `zulip-send` call was
already retargeted at `#nightly-testing-mathlib`.

🤖 Prepared with Claude Code
…eanprover-community#34)

After leanprover-community#32 added `--fail-with-body` to the curl calls, every label-API
4xx aborts the script under `set -euo pipefail`. That's the desired
behaviour for `add_label`, but `remove_label` returns 404 when the
label isn't currently on the PR — which is the script's desired
post-condition. The whole labelling step then aborts before reaching
the `add_label` calls, so PRs end up with no `builds-mathlib` /
`breaks-mathlib` label even when the build's outcome is clear.

Inspect the HTTP status manually instead and accept 200 or 404;
keep `add_label` strict.

Co-authored-by: Kim Morrison <kim.morrison@anu.edu.au>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…community#36)

In GNU grep BRE, `\|` is the alternation operator, so `grep "\|strong$"`
matches every line (the empty alternative matches anything). As a result
both `report strong` and `report weak` emitted the full table, and the
weekly Zulip post showed every category twice.

Use a literal `|` so only rows ending in `|strong` or `|weak` are kept.

Co-authored-by: Kim Morrison <kim@lean-fro.org>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nprover-community#35)

The technical debt metrics seem to have broken recently, and in the CI log there is a fatal Git error about not detaching `HEAD`. The log does not show which command precisely failed. Looking at the set of error messages that Git produces, this can be traced to `git checkout` (which `git switch` is the modern version of). So replacing `git switch` and `git checkout` with `git switch --detach` should hopefully fix the issue.

Example of broken tech debt reports:
* https://github.com/leanprover-community/mathlib4/actions/runs/25331716666/job/74266774924#step:11:133

Interestingly `git checkout` in the transitive imports script does not seem to break anything. My suspicion is the `git checkout -q` switch turns the advice message into a fatal error? I'd have to look at the Git source code in more detail since the man page is not very clear. In any case, let's just turn it into `git switch --detach` there too for consistency.
Add `scripts/crossref_review/crossref-pr-comment.py` — the trusted
orchestrator driven by mathlib4's
`.github/workflows/crossref_review.yml` (introduced in a stacked
mathlib4 PR). It runs the Lean extract / snippet scripts from the
mathlib4 base checkout against the PR checkout as data, formats a
Markdown table of every `@[stacks ...]`, `@[kerodon ...]`, and
`@[wikidata ...]` attribute added by the PR, and exits 0 / 1 / 2 / 3
so the workflow can distinguish "nothing to post" / "post and pass" /
"post and fail" / "post but treat as a network blip."

The script writes its comment body to a file whose first line is the
sentinel title used by `scripts/pr_summary/update_PR_comment.sh` to
dedupe — so the workflow can pipe it straight through that helper for
post-once / update-in-place behaviour.

Pure stdlib only, so the CI job can run it without `pip install`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kim-em kim-em force-pushed the crossref-review branch from fdad732 to 3c72af0 Compare May 22, 2026 10:15
kim-em added a commit to kim-em/mathlib4 that referenced this pull request May 22, 2026
When a PR touches `Mathlib/**/*.lean` and adds `@[stacks ...]`,
`@[kerodon ...]`, or `@[wikidata ...]` attributes, this workflow posts (or
updates, or deletes) a single bot comment with one row per added tag,
showing the upstream label / description fetched from the source database,
the Mathlib declaration's first-line signature, and the author's optional
attribute comment. CI fails when any tag cannot be resolved upstream and
emits a warning (not a failure) when only transient network errors occur.

Trusted-script execution pattern (same as `PR_summary.yml`):

* The PR head is checked out into `pr-branch/` and used as DATA only — no
  code from it ever runs in this `pull_request_target` context.
* The base ref is checked out into `base/`; its
  `scripts/extract-crossref-tags.lean` and `scripts/crossref-snippet.lean`
  are the trusted versions we actually invoke, and its `lean-toolchain`
  is what `lake env lean --run` picks up.
* `mathlib-ci` is checked out via the shared `get-mathlib-ci` action;
  its `scripts/crossref_review/crossref-pr-comment.py` orchestrator
  (added in leanprover-community/mathlib-ci#39)
  drives the extract → snippet → format pipeline.
* The resulting comment file is piped through
  `scripts/pr_summary/update_PR_comment.sh` for post-once / update-in-place
  behaviour.

A startup gate exits cleanly when the base ref pre-dates the companion
Lean scripts (e.g. against an older master), so the workflow doesn't
fail spuriously.

`docs/workflows.md` gains a new entry; `scripts/README.md` points at the
mathlib-ci location for the orchestrator.

Stacked on
leanprover-community#39664
(LSP widget) and depends on the orchestrator landing in
leanprover-community/mathlib-ci#39.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t step

The mathlib4 cross-reference review pipeline is moving from a text-based
extractor running independently in a `pull_request_target` workflow to an
elaboration-based lint step that runs in the build pipeline's `post_steps`
job after `lake exe cache get`. The lint step emits a machine-readable
TSV (db / tag / declName / file / comment) as an artifact, and a
`workflow_run`-triggered companion downloads it here.

This commit rewrites `scripts/crossref_review/crossref-pr-comment.py` so
it:

* Takes `--tsv` (the path to the artifact) instead of two checkout paths.
* Defensively parses the TSV (unknown databases dropped, missing fields
  padded). Nothing from PR-derived input is executed.
* Fetches snippets directly via Python's `urllib` — Wikidata via the
  batched `wbgetentities` API, Stacks/Kerodon via the documented Gerby
  `/data/tag/<TAG>/content/statement` endpoint. Gerby fetches are
  parallelised with `ThreadPoolExecutor(max_workers=8)` to keep the
  workflow under a minute on PRs that touch large files.
* Constructs the Markdown comment in trusted code (no longer relies on
  PR-derived Markdown), still using the `### Cross-reference tags added
  by this PR` sentinel so `update_PR_comment.sh` can dedupe.
* Exits 0 / 1 / 2 / 3 with the same grammar as before.

Stacked on leanprover-community/mathlib4#39666
upcoming force-push, which adds the post-build lint step and rewires
`crossref_review.yml` as `workflow_run`-triggered.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kim-em added a commit to kim-em/mathlib4 that referenced this pull request May 22, 2026
Replace the text-based cross-reference extractor with an elaboration-
based lint step that runs in the build pipeline's `post_steps` job
and a `workflow_run`-triggered companion that posts the bot comment.

**Why.** The previous text-based extractor (a ~250 LOC byte-level
scanner in `scripts/crossref.lean`) was a workaround for not having a
built Mathlib environment in the pull_request_target workflow. Now
that we run as a post-build lint step, we have a fully elaborated
environment and can read `Mathlib.CrossRef.tagExt` directly — cutting
~250 LOC and eliminating the fidelity gap with the Lean parser.

**Pieces:**

* `scripts/crossref.lean` — the `extract` subcommand is gone. New
  `check --diff <range> --tsv <path>` subcommand uses
  `importModules (loadExts := true)` (NOT `withImportModules`, which
  passes `loadExts := false` and would leave `tagExt` empty), walks
  `tagExt`, looks up each tagged declaration's source module via
  `env.getModuleFor?`, and filters by `git diff --name-only` of the
  PR range. Output is machine-readable TSV
  (`db\ttag\tdeclName\tfile\tcomment`). No Markdown, no network,
  no snippets — those are the privileged workflow's job.

* `.github/workflows/build_template.yml` — adds three new steps to
  `post_steps`, after the import-graph upload:
  - Run `lake env lean --run scripts/crossref.lean check ...`
    against the diff between the PR base and HEAD.
  - Write a `crossref-context.json` with PR number + head SHA so
    the companion workflow doesn't have to rely on the unreliable
    `workflow_run.pull_requests[0].number`.
  - Upload both as a single `crossref-context` artifact.

* `.github/workflows/crossref_review.yml` — rewritten. Old trigger
  `pull_request_target` → new trigger `workflow_run`. Downloads the
  artifact, resolves the PR number through a three-stage fallback
  chain (artifact JSON → workflow_run payload → `gh pr list` head
  SHA match), calls the mathlib-ci orchestrator (which now consumes
  the TSV and fetches snippets in Python), posts/updates/deletes
  via `update_PR_comment.sh`, and fails the check on `missing` tags.

**Privilege model.** The lint step runs with the PR's permissions and
produces only machine-readable TSV. The `workflow_run` job runs in the
base ref's permission scope (with `pull-requests: write`) and never
executes anything from the artifact — it parses the TSV defensively,
fetches snippets server-side from three fixed upstream APIs, and
builds the Markdown comment in trusted code.

Companion mathlib-ci PR:
leanprover-community/mathlib-ci#39

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kim-em kim-em changed the title feat: orchestrator for cross-reference tag review bot refactor(crossref_review): consume TSV from mathlib4's post-build lint May 22, 2026
Codex's review of the stacked refactor flagged that `md_escape` only
handled pipes and newlines, leaving PR-author comment strings (the
optional `"…"` after a `@[wikidata QID "…"]` attribute) free to render
arbitrary Markdown / HTML / `@mentions` in the bot's PR comment.

* Split into `md_cell_escape` (for upstream-derived snippets — Wikidata
  labels, Stacks/Kerodon statements — that already passed through the
  upstream's own sanitiser; we just collapse line breaks and neutralise
  pipes) and `md_author_escape` (for PR-author comments — backslash-
  escapes Markdown metacharacters, HTML-entity-escapes angle brackets,
  inserts a zero-width space after `@` to defang mentions, and
  neutralises pipes).

* `_doc_url(file, decl_name)` replaces the inline string interpolation
  for the doc-gen link. It does a suffix-only `.lean → .html` rewrite
  (so `foo.leanblah.lean` is handled) and URL-encodes the path and
  fragment so unusual characters in declaration names don't break the
  link.

* Wikipedia / Stacks / Kerodon tag URLs are also URL-encoded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kim-em added a commit to kim-em/mathlib4 that referenced this pull request May 22, 2026
Replace the text-based cross-reference extractor with an elaboration-
based lint step that runs in the build pipeline's `post_steps` job
and a `workflow_run`-triggered companion that posts the bot comment.

**Why.** The previous text-based extractor (a ~250 LOC byte-level
scanner in `scripts/crossref.lean`) was a workaround for not having a
built Mathlib environment in the pull_request_target workflow. Now
that we run as a post-build lint step, we have a fully elaborated
environment and can read `Mathlib.CrossRef.tagExt` directly — cutting
~250 LOC and eliminating the fidelity gap with the Lean parser.

**Pieces:**

* `scripts/crossref.lean` — the `extract` subcommand is gone. New
  `check --diff <range> --tsv <path>` subcommand uses
  `importModules (loadExts := true)` (NOT `withImportModules`, which
  passes `loadExts := false` and would leave `tagExt` empty), walks
  `tagExt`, looks up each tagged declaration's source module via
  `env.getModuleFor?`, and filters by `git diff --name-only` of the
  PR range. Output is machine-readable TSV
  (`db\ttag\tdeclName\tfile\tcomment`). No Markdown, no network,
  no snippets — those are the privileged workflow's job.

* `.github/workflows/build_template.yml` — adds three new steps to
  `post_steps`, after the import-graph upload:
  - Run `lake env lean --run scripts/crossref.lean check ...`
    against the diff between the PR base and HEAD.
  - Write a `crossref-context.json` with PR number + head SHA so
    the companion workflow doesn't have to rely on the unreliable
    `workflow_run.pull_requests[0].number`.
  - Upload both as a single `crossref-context` artifact.

* `.github/workflows/crossref_review.yml` — rewritten. Old trigger
  `pull_request_target` → new trigger `workflow_run`. Downloads the
  artifact, resolves the PR number through a three-stage fallback
  chain (artifact JSON → workflow_run payload → `gh pr list` head
  SHA match), calls the mathlib-ci orchestrator (which now consumes
  the TSV and fetches snippets in Python), posts/updates/deletes
  via `update_PR_comment.sh`, and fails the check on `missing` tags.

**Privilege model.** The lint step runs with the PR's permissions and
produces only machine-readable TSV. The `workflow_run` job runs in the
base ref's permission scope (with `pull-requests: write`) and never
executes anything from the artifact — it parses the TSV defensively,
fetches snippets server-side from three fixed upstream APIs, and
builds the Markdown comment in trusted code.

Companion mathlib-ci PR:
leanprover-community/mathlib-ci#39

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants