[Lean Squad] feat(fv): Task 3+9 — Lean 4 formal spec for TreeNodeFilter.MatchFilterPattern + CI fix by Evangelink · Pull Request #8111 · microsoft/testfx

Evangelink · 2026-05-11T13:31:12Z

feat: add Lean 4 formal spec for TreeNodeFilter.MatchFilterPattern (Task 3)
fix(lean-ci): fix elan version v3.1.0, drop missing sha256 check, add sorry-checker step

…ask 3) 🔬 Lean Squad — formal verification artifact Target: TreeNodeFilter.MatchFilterPattern (Target #7, Phase 2→3) File: formal-verification/lean/FVSquad/TreeNodeFilter.lean ## What's included - FilterExpr inductive type modelling the C# FilterExpression hierarchy (leaf/nop/and/or/not/withProps constructors) - Mutual recursive evaluators: evalFilter, evalFilterAll, evalFilterAny mirroring the MatchFilterPattern switch statement - 12 definitional @[simp] equation lemmas (proved via rw [*.eq_def]) - 2 De Morgan helper lemmas (proved by induction on List FilterExpr) - 12 Boolean-algebra theorems B1-B12 from the informal spec (all proved, no sorry) - 7 additional structural properties (proved) ## Proof status All 21 theorems are proved. No sorry. lake build passes clean. ## Approximations - matchesGlob is an opaque Bool-valued axiom (real impl uses Regex) - Property matching abstracted as String -> Bool inside withProps - MatchesFilter (public entry) not modelled in this PR ## Infrastructure changes - lakefile.toml: removed Mathlib dependency (firewall blocks cache) - lean-toolchain: updated v4.14.0 -> v4.29.1 (stable) - FVSquad.lean: root module file created Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

… sorry-checker step - Change elan install from v4.2.1 (non-existent) to v3.1.0 (verified) - Remove sha256sum check (no .sha256 file published for v3.1.0); add file-size sanity check instead - Include lake-manifest.json in cache key for better cache invalidation - Add 'Check for sorry' step that lists any unfinished proof stubs - Improve proof summary: add proof-status row and fix sorry grep pattern - Update README to reflect current toolchain (v4.29.1, no Mathlib) and current FVSquad targets table Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

This PR advances the formal-verification (Lean 4) effort by adding a Lean specification/proof module for TreeNodeFilter.MatchFilterPattern and updating the Lean CI workflow/tooling docs to build and report proof status in GitHub Actions.

Changes:

Add FVSquad.TreeNodeFilter Lean module modeling filter-expression evaluation plus accompanying theorems.
Update Lean project plumbing (root module import, toolchain pin, and manifest) and refresh the Lean README.
Adjust .github/workflows/lean-proofs.yml to install a pinned elan version, update caching, and add a “sorry” scan + proof summary output.

Show a summary per file

File	Description
`formal-verification/lean/README.md`	Updates Lean project structure/build instructions and adds a targets/status table.
`formal-verification/lean/lean-toolchain`	Pins the Lean toolchain to `leanprover/lean4:v4.29.1`.
`formal-verification/lean/lake-manifest.json`	Adds the Lake dependency manifest to support reproducibility/caching.
`formal-verification/lean/FVSquad/TreeNodeFilter.lean`	New Lean formal spec + theorems for filter evaluation semantics.
`formal-verification/lean/FVSquad.lean`	Adds root module importing the new `TreeNodeFilter` module.
`.github/workflows/lean-proofs.yml`	Updates CI cache key, pinned `elan` install logic, and adds sorry/proof summary reporting.

Copilot's findings

Comments suppressed due to low confidence (1)

.github/workflows/lean-proofs.yml:127

In the proof summary, SORRY_COUNT uses \bsorry\b, which won’t act as a word-boundary with default grep and can undercount to 0 even when sorry exists. Also, THEOREM_COUNT only matches lines starting with theorem|lemma, so it won’t count declarations preceded by attributes like @[simp] theorem, making the reported theorem total misleading once such declarations exist (as in the new TreeNodeFilter.lean).

          LEAN_DIR="formal-verification/lean/FVSquad"
          THEOREM_COUNT=$(grep -rEc '^(theorem|lemma) ' "${LEAN_DIR}" --include='*.lean' 2>/dev/null || echo 0)
          SORRY_COUNT=$(grep -rc '\bsorry\b' "${LEAN_DIR}" --include='*.lean' 2>/dev/null \
            | awk -F: '{sum += $2} END {print sum+0}')

Files reviewed: 6/6 changed files
Comments generated: 2

+          SORRY_FILES=$(grep -rl '\bsorry\b' "${LEAN_DIR}" --include='*.lean' 2>/dev/null || true)
+          if [ -n "${SORRY_FILES}" ]; then
+            echo "⚠️ Files containing sorry (unfinished proofs):"
+            echo "${SORRY_FILES}"
+            grep -rn '\bsorry\b' "${LEAN_DIR}" --include='*.lean'


+| File | Target | Phase | Theorems | sorry |
+|------|--------|-------|----------|-------|
+| `FVSquad/TreeNodeFilter.lean` | `TreeNodeFilter.MatchFilterPattern` | 3 | 21 | 0 |


Evangelink

Summary

Workflow: PR Nitpick Reviewer
Date: 2026-05-11
Repository: microsoft/testfx

Overview

This is a well-structured PR adding a Lean 4 formal specification for TreeNodeFilter.MatchFilterPattern alongside CI improvements. The Lean file is thorough — 21 theorems covering B1–B12 Boolean-algebra invariants plus structural properties, all with zero sorry. The CI changes are pragmatic (downgrading to elan v3.1.0 with a file-size sanity check instead of the missing SHA-256).

Key Findings

Terminology mismatch (TreeNodeFilter.lean:16) — Comments say "Regex matching" but the abstraction is named matchesGlob, creating ambiguity about what is being modelled.
Naming inconsistency (TreeNodeFilter.lean:38, 48) — Constructor parameter propPred in the type definition becomes f in the pattern match; value for a sub-expression is also ambiguous.
Magic number (lean-proofs.yml:83) — 1000000 byte threshold lacks a named constant or comment quantifying expected archive size.
Unused GITHUB_OUTPUT (lean-proofs.yml:115) — sorry_found is written to step outputs but never consumed downstream; either wire it to a conditional failure or drop the output infrastructure.
Misleading "auto-generated" label (FVSquad.lean:2) — The root module appears to be hand-maintained, making the label potentially confusing for future contributors.

Positive Highlights

Clean mutual-recursion design for structural termination — a subtle Lean 4 requirement handled correctly.
Comprehensive @[simp] definitional lemmas make the main theorems very readable.
The precedence pitfall (= vs &&/||) is well-documented in the file header and per-theorem notes.
The CI sorry-checker is a nice addition for automated proof-completeness tracking.

🔍 Meticulously inspected by PR Nitpick Reviewer

🔍 Meticulously inspected by PR Nitpick Reviewer 🔍

Evangelink · 2026-05-11T13:37:56Z

+  - Proved theorems for Boolean-algebra invariants B1-B12 from informal spec
+
+  ## Approximations / limitations
+  1. Regex matching is abstracted as opaque Bool function `matchesGlob`.


Nitpick (Minor): The comment says "Regex matching is abstracted as opaque Bool function matchesGlob", but the function is named matchesGlob — suggesting a glob/wildcard abstraction, not regex. The C# implementation uses regex under the hood, but the Lean model uses glob semantics. This naming mismatch could confuse readers about what is actually being abstracted. Consider aligning the language: either call it "glob/wildcard matching" or rename the opaque function to matchesRegex to match the C# reality.

Evangelink · 2026-05-11T13:37:56Z

+  | and (subExprs : List FilterExpr) : FilterExpr
+  | or  (subExprs : List FilterExpr) : FilterExpr
+  | not (inner : FilterExpr) : FilterExpr
+  | withProps (value : FilterExpr) (propPred : String -> Bool) : FilterExpr


Nitpick (Minor): The constructor parameter is named value but represents a sub-filter expression. The name value is generic and ambiguous — it could mean the evaluated result rather than a sub-FilterExpr. Consider a more descriptive name like subExpr or filterExpr to match the naming used in and/or constructors (subExprs).

Evangelink · 2026-05-11T13:37:56Z

+  | .and es,        s => evalFilterAll es s
+  | .or es,         s => evalFilterAny es s
+  | .not inner,     s => ! evalFilter inner s
+  | .withProps v f, s => evalFilter v s && f s


Nitpick (Minor): The pattern-match variable f doesn't match the parameter name propPred used in the FilterExpr.withProps constructor declaration (line 38). Consistent naming between the type definition and match arms reduces the cognitive overhead of cross-referencing the two. Consider using propPred here as well:

| .withProps v propPred, s => evalFilter v s && propPred s

Evangelink · 2026-05-11T13:37:57Z

          curl -sSfL "${ELAN_BASE_URL}/${ELAN_ARCHIVE}" -o "${ELAN_TMP_DIR}/${ELAN_ARCHIVE}"
-          curl -sSfL "${ELAN_BASE_URL}/${ELAN_ARCHIVE}.sha256" -o "${ELAN_TMP_DIR}/${ELAN_ARCHIVE}.sha256"
+          ARCHIVE_SIZE=$(stat -c%s "${ELAN_TMP_DIR}/${ELAN_ARCHIVE}" 2>/dev/null || stat -f%z "${ELAN_TMP_DIR}/${ELAN_ARCHIVE}")
+          if [ "${ARCHIVE_SIZE:-0}" -lt 1000000 ]; then


Nitpick (Minor): 1000000 (1 MB) is a magic number. While the intent — catching a suspiciously small or empty archive — is clear from the surrounding comment, a named variable would make the threshold explicit and easier to adjust:

MIN_ARCHIVE_BYTES=1000000 # elan archives are typically ~5 MB; 1 MB signals corruption if [ "\$\{ARCHIVE_SIZE:-0}" -lt "\$\{MIN_ARCHIVE_BYTES}" ]; then

This is especially helpful since the threshold is the only integrity check replacing the now-absent SHA-256 verification.

Evangelink · 2026-05-11T13:37:57Z

+            echo "⚠️ Files containing sorry (unfinished proofs):"
+            echo "${SORRY_FILES}"
+            grep -rn '\bsorry\b' "${LEAN_DIR}" --include='*.lean'
+            echo "sorry_found=true" >> "$GITHUB_OUTPUT"


Nitpick (Minor): sorry_found is written to $GITHUB_OUTPUT on both branches, but no subsequent step references steps.sorry-check.outputs.sorry_found. If it is only used for human-readable console output, the id: sorry-check and the >> "$GITHUB_OUTPUT" lines are dead infrastructure — misleading future maintainers into thinking some downstream step depends on this value. Either wire it up to a conditional step (e.g. fail the job when sorry_found=true) or remove the id and output lines and keep the step purely informational.

Evangelink · 2026-05-11T13:37:57Z

@@ -0,0 +1,4 @@
+-- FVSquad: Lean 4 formal verification artifacts for microsoft/testfx
+-- 🔬 Lean Squad — auto-generated


Nitpick (Minor): The comment -- 🔬 Lean Squad — auto-generated is slightly misleading. This file appears to be hand-crafted (it selectively imports only FVSquad.TreeNodeFilter). If it will be regenerated automatically as targets grow, add a note about what generates it. If it's maintained by hand, drop "auto-generated" to avoid confusion and unintentional overwriting by future tooling.

Evangelink

Summary

Workflow: Expert Code Reviewer
Date: 2026-05-11
Repository: microsoft/testfx

Key Findings

[Security] Weak integrity check on elan binary download (.github/workflows/lean-proofs.yml line 83)
The previous flow fetched a .sha256 companion file from GitHub Releases and verified the archive with sha256sum -c. The new flow replaces this with a file-size sanity check (≥ 1 MB), which provides no cryptographic guarantee. The elan binary is executed directly after download (./elan-init -y), making this a supply-chain risk. Suggest hardcoding the SHA-256 digest of the v3.1.0 binary.

[Correctness] sorry-checker step is informational only — build will not fail on incomplete proofs (.github/workflows/lean-proofs.yml line 115)
The step detects sorry placeholders and sets sorry_found=true output, but neither calls exit 1 nor is its output gated on by any subsequent step. Lean 4's lake build does not fail on sorry by default either, so a PR with incomplete proofs will pass CI. This defeats the purpose of the checker.

Positive Observations

The Lean 4 formal spec (FVSquad/TreeNodeFilter.lean) is well-structured with 21 machine-verified theorems covering B1–B12 Boolean algebra invariants and additional structural properties. All proofs use only Lean 4 core tactics (no Mathlib dependency), which is appropriate given CI network constraints.
The mutual recursion block (evalFilter / evalFilterAll / evalFilterAny) correctly mirrors the structural termination argument from the C# implementation.
Adding lake-manifest.json to the cache key is a correct fix — this ensures cache invalidation when the dependency manifest changes.
The \bsorry\b word-boundary regex in grep is an improvement over the previous \<sorry\> ERE boundary, as it works consistently across both GNU and BSD grep.

Recommendations

Hardcode the SHA-256 hash of the elan-x86_64-unknown-linux-gnu.tar.gz v3.1.0 archive and restore sha256sum -c verification.
Add exit 1 in the sorry-checker's if [ -n "${SORRY_FILES}" ] branch (or a downstream gate step) to enforce proof completeness in CI.

Generated by Expert Code Reviewer

🧠 Reviewed by Expert Code Reviewer 🧠

Evangelink · 2026-05-11T13:38:12Z

          curl -sSfL "${ELAN_BASE_URL}/${ELAN_ARCHIVE}" -o "${ELAN_TMP_DIR}/${ELAN_ARCHIVE}"
-          curl -sSfL "${ELAN_BASE_URL}/${ELAN_ARCHIVE}.sha256" -o "${ELAN_TMP_DIR}/${ELAN_ARCHIVE}.sha256"
+          ARCHIVE_SIZE=$(stat -c%s "${ELAN_TMP_DIR}/${ELAN_ARCHIVE}" 2>/dev/null || stat -f%z "${ELAN_TMP_DIR}/${ELAN_ARCHIVE}")
+          if [ "${ARCHIVE_SIZE:-0}" -lt 1000000 ]; then


[Security] The file-size check (≥ 1 MB) is not a meaningful integrity guarantee — any malicious or corrupted archive that exceeds 1 MB will pass silently.

Mechanism: The original flow fetched a .sha256 companion file from the same GitHub Release and called sha256sum -c to cryptographically verify the binary. That guarantee is now gone. A MITM or a compromised CDN serving a large-but-malicious tarball would not be caught.

Impact: A tampered elan binary is executed directly via ./elan-init -y, which installs toolchain binaries into the runner's ~/.elan path and is subsequently used to build Lean proofs. Supply-chain risk in CI.

Suggestion: Hardcode the known-good SHA-256 digest of elan-x86_64-unknown-linux-gnu.tar.gz for v3.1.0 and verify against it, even if GitHub doesn't publish a companion file:

EXPECTED_SHA256="<precomputed-sha256-of-v3.1.0-archive>" echo "\$\{EXPECTED_SHA256} \$\{ELAN_TMP_DIR}/\$\{ELAN_ARCHIVE}" | sha256sum -c

This is a one-time cost per elan version bump and provides real supply-chain protection.

Evangelink · 2026-05-11T13:38:12Z

+            echo "⚠️ Files containing sorry (unfinished proofs):"
+            echo "${SORRY_FILES}"
+            grep -rn '\bsorry\b' "${LEAN_DIR}" --include='*.lean'
+            echo "sorry_found=true" >> "$GITHUB_OUTPUT"


[Correctness] The sorry-check step detects sorry stubs but never fails the build — the CI will report green even if proofs are incomplete.

Mechanism: When sorry is found, the step prints a warning and sets sorry_found=true as an output, but there is no exit 1 and no downstream step gates on steps.sorry-check.outputs.sorry_found. The lake build step itself also does not fail on sorry by default in Lean 4 (it emits a warning, not an error).

Impact: A future PR that introduces sorry placeholders (intentional or accidental) will pass CI, undermining the purpose of the checker.

Suggestion: Either add exit 1 inside the if [ -n "\$\{SORRY_FILES}" ] block, or add an explicit gate step after the sorry-check:

- name: Fail on sorry if: steps.sorry-check.outputs.sorry_found == 'true' run: | echo "CI failed: unfinished sorry proofs found." >&2 exit 1

If in-progress proofs with sorry should be allowed on feature branches but blocked on main, the condition could check github.ref == 'refs/heads/main'.

…er.MatchFilterPattern + CI fix (#8111) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions Bot and others added 2 commits May 11, 2026 15:30

Copilot AI review requested due to automatic review settings May 11, 2026 13:31

Evangelink merged commit f19fac3 into main May 11, 2026
17 checks passed

Evangelink deleted the lean-squad/task3-treenodefilter-lean-spec-2026-05-07-9d6ff18f7f576241 branch May 11, 2026 13:31

Copilot started reviewing on behalf of Evangelink May 11, 2026 13:32 View session

Copilot AI reviewed May 11, 2026

View reviewed changes

Evangelink commented May 11, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Lean Squad] feat(fv): Task 3+9 — Lean 4 formal spec for TreeNodeFilter.MatchFilterPattern + CI fix#8111

[Lean Squad] feat(fv): Task 3+9 — Lean 4 formal spec for TreeNodeFilter.MatchFilterPattern + CI fix#8111
Evangelink merged 2 commits into
mainfrom
lean-squad/task3-treenodefilter-lean-spec-2026-05-07-9d6ff18f7f576241

Evangelink commented May 11, 2026

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Evangelink left a comment

Uh oh!

Evangelink May 11, 2026

Uh oh!

Evangelink May 11, 2026

Uh oh!

Evangelink May 11, 2026

Uh oh!

Evangelink May 11, 2026

Uh oh!

Evangelink May 11, 2026

Uh oh!

Evangelink May 11, 2026

Uh oh!

Evangelink left a comment

Uh oh!

Evangelink May 11, 2026

Uh oh!

Evangelink May 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		@@ -0,0 +1,4 @@
		-- FVSquad: Lean 4 formal verification artifacts for microsoft/testfx
		-- 🔬 Lean Squad — auto-generated

Conversation

Evangelink commented May 11, 2026

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

Evangelink left a comment

Choose a reason for hiding this comment

Summary

Overview

Key Findings

Positive Highlights

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Evangelink left a comment

Choose a reason for hiding this comment

Summary

Key Findings

Positive Observations

Recommendations

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants