microsoft · Evangelink · Apr 27, 2026 · Apr 26, 2026 · Apr 27, 2026 · Apr 27, 2026
diff --git a/formal-verification/REPORT.md b/formal-verification/REPORT.md
@@ -4,25 +4,29 @@
 
 ## Status
 
-**Phase**: Early — Research complete, informal spec for `ArgumentArity` extracted.
+**Phase**: Early — Research expanded to 7 targets; informal specs extracted for `ArgumentArity` (merged) and `CommandLineParser.TryUnescape` (PR open). Lean toolchain unavailable in runner (network firewall), blocking Task 3+.
 
 ## Summary
 
-The Lean Squad has surveyed the `microsoft/testfx` codebase and identified five high-quality FV targets in the command-line infrastructure of Microsoft.Testing.Platform. All five targets are pure or near-pure functions with clear algebraic properties, making them suitable for Lean 4 formal verification.
+The Lean Squad has surveyed the `microsoft/testfx` codebase and identified **seven** high-quality FV targets. Two informal specifications have been extracted. The Lean toolchain cannot currently be installed in the runner environment due to network restrictions.
 
-An informal specification has been extracted for `ArgumentArity` (highest-priority target): 14 properties across four groups (predefined constants, well-formedness, equality, distinctness). One open question identified: the constructor does not enforce `Min ≤ Max`, so ill-formed arities are silently accepted. This is a potential source of undefined validator behavior.
+Key findings:
+- **`ArgumentArity`**: informal spec extracted (14 properties, 4 groups). Constructor does not enforce `Min ≤ Max`; `CommandLineOption` enforces this on construction, acting as the real guard.
+- **`CommandLineParser.TryUnescape`**: informal spec extracted (24 properties, 5 groups, 2 confirmed bugs for single-char quote inputs). PR open.
+- **New targets identified**: `ResponseFileHelper.SplitCommandLine` (pure tokeniser) and `TreeNodeFilter.MatchFilterPattern` (Boolean algebra — ideal for structural induction proofs of De Morgan, double negation, idempotence).
 
-The next steps are:
-1. Write the Lean 4 formal spec for `ArgumentArity` (types + theorem stubs).
-2. Attempt decidable proofs with `decide`.
-3. Advance to `CommandLineParser.TryUnescape`.
+Next steps (once Lean toolchain available):
+1. Write Lean 4 formal spec for `ArgumentArity` (Task 3).
+2. Write Lean 4 formal spec for `TreeNodeFilter.MatchFilterPattern` — Boolean algebra model with abstract match predicate.
 
 ## Targets Identified
 
 | Target | Phase | Notes |
 |--------|-------|-------|
-| `ArgumentArity` | 2 | Informal spec extracted — 14 properties, open question on ill-formed arity |
-| `CommandLineParser.TryUnescape` | 1 | Security-relevant string unescaping |
+| `ArgumentArity` | 2 | Informal spec extracted — 14 properties, open question on ill-formed arity (merged PR) |
+| `CommandLineParser.TryUnescape` | 2 | Informal spec extracted — 24 properties, 2 confirmed bugs (PR open) |
+| `TreeNodeFilter.MatchFilterPattern` | 1 | Boolean algebra; De Morgan, double negation, idempotence provable by `simp` |
+| `ResponseFileHelper.SplitCommandLine` | 1 | Pure tokeniser; state machine; grammar-based properties |
 | `CommandLineParser.ParseOptionAndSeparators` | 1 | Pure option-splitting function |
 | `CommandLineOptionsValidator` arity validation | 1 | Arity bounds checking |
 | `CommandLineParseResult.Equals` | 1 | Structural equality laws |
@@ -31,5 +35,7 @@ The next steps are:
 
 | Date | Tasks | Outcome |
 |------|-------|---------|
+| 2026-04-26 | Task 1 (Research expansion), Task 3 (blocked — Lean unavailable) | Added 2 new targets: SplitCommandLine and TreeNodeFilter.MatchFilterPattern; updated priority order; Lean toolchain still blocked by runner network |
+| 2026-04-25 | Task 2 (Informal Spec — TryUnescape) | Extracted 24-property informal spec; discovered 2 confirmed bugs for single-char quote inputs |
 | 2026-04-24 | Task 1 (Research), Task 2 (Informal Spec), Task 9 (CI Automation) | Identified 5 targets; extracted informal spec for ArgumentArity (14 properties, open question on ill-formed arity); set up CI workflow |
 | 2026-04-24 | Task 1 (Research), Task 9 (CI Automation) | Identified targets, created FV directory, set up CI workflow |
diff --git a/formal-verification/RESEARCH.md b/formal-verification/RESEARCH.md
@@ -140,6 +140,59 @@ Structural equality over a parse result: tool name, list of options (name + argu
 
 ---
 
+### Target 6 — `ResponseFileHelper.SplitCommandLine` ★★★★☆
+
+**File**: `src/Platform/Microsoft.Testing.Platform/CommandLine/ResponseFileHelper.cs`
+**Type**: `string → IEnumerable<string>`
+
+A pure function that tokenises a command-line string into whitespace-separated tokens, treating double-quoted substrings as single tokens (with their surrounding quotes stripped).
+
+**Why good for FV**:
+- Pure function; all state is local variables.
+- State machine structure with two orthogonal state dimensions (`seeking` ∈ {TokenStart, WordEnd} × `seekingQuote` ∈ {QuoteStart, QuoteEnd}).
+- Well-defined contract: whitespace → token separator outside quotes; double-quote toggles quoting mode.
+- Termination is obvious (index advances on every iteration).
+
+**Properties to verify**:
+1. Empty input → empty output.
+2. Whitespace-only input → empty output.
+3. Non-whitespace input with no quotes → splits into whitespace-delimited tokens.
+4. Double-quoted string → single token with quotes stripped.
+5. Double-quoted string containing spaces → single token, spaces preserved.
+6. All output tokens have their `"` characters removed.
+7. Two adjacent quoted strings (`"a""b"`) → may produce one or two tokens (document the chosen semantics).
+8. A trailing non-terminated quote (e.g., `"foo`) → yields what remains as a token.
+
+**Approximations**: Lean model uses `List Char`; whitespace is modelled as `Char.isWhitespace`; does not model the `SplitLine` comment-filter layer.
+
+---
+
+### Target 7 — `TreeNodeFilter.MatchFilterPattern` — Boolean Filter Algebra ★★★★★
+
+**File**: `src/Platform/Microsoft.Testing.Platform/Requests/TreeNodeFilter/TreeNodeFilter.cs`
+**Type**: `FilterExpression × String × PropertyBag → Bool`
+
+A pure recursive function that evaluates whether a test-node path and property bag match a structured Boolean filter expression built from `And`, `Or`, `Not`, and `Nop` operators.
+
+**Why excellent for FV**:
+- Pure recursive function; ideal for Lean structural induction.
+- The semantics exactly model a propositional Boolean algebra.
+- Classic algebraic laws (De Morgan, double negation, idempotence, absorption) are provable purely by structural induction on the `FilterExpression` type.
+- Abstracting away the `Regex` pattern matcher gives a clean propositional model.
+
+**Properties to verify**:
+1. `NopExpression` always returns `true`.
+2. `Not(Not(A))` is semantically equivalent to `A`.
+3. De Morgan (AND): `Not(And(A, B)) ↔ Or(Not(A), Not(B))`.
+4. De Morgan (OR): `Not(Or(A, B)) ↔ And(Not(A), Not(B))`.
+5. Idempotence: `And([A, A]) ↔ A`, `Or([A, A]) ↔ A`.
+6. Identity: `And([Nop, A]) ↔ A`, `Or([Nop, ...]) ↔ true`.
+7. Commutativity of binary `And`/`Or` (over two-element lists): `And([A, B]) ↔ And([B, A])`.
+
+**Approximations**: The Lean model abstracts `Regex.IsMatch` as an opaque predicate parameter `match : String → Bool`. This captures the Boolean-algebra structure independently of the regex engine semantics. `PropertyBag` matching is similarly abstracted.
+
+---
+
 ## Approach Notes
 
 - We use Lean 4 with Mathlib for all proofs.
@@ -153,3 +206,5 @@ Structural equality over a parse result: tool name, list of options (name + argu
 - Should we model `int.MaxValue` as an explicit Lean constant (e.g., `def maxInt32 : Int := 2^31 - 1`) or leave it as an opaque constant?
 - The `TryUnescape` function handles environment `NewLine` — should we abstract over this or assume `"\n"`?
 - Is the lack of a `Min ≤ Max` invariant in `ArgumentArity` a real bug or an accepted API choice? Worth filing an issue if a "bad" arity causes unexpected validator behaviour.
+- For `SplitCommandLine`, is the two-adjacent-quoted-strings case (`"a""b"`) intended to produce one token `ab` or two tokens? The current state-machine code yields `a` then starts a new token at the opening quote, so it produces two tokens. This is worth documenting in the informal spec.
+- For `TreeNodeFilter.MatchFilterPattern`, should commutativity be stated over _ordered_ lists or only over two-element binary forms? In the abstracted Lean model, `List.All` / `List.Any` are order-independent only with respect to the final Boolean result, assuming a pure and total predicate; concrete evaluation order still affects short-circuiting, exceptions, and performance, so any commutativity claim should be stated at the denotational level.
diff --git a/formal-verification/TARGETS.md b/formal-verification/TARGETS.md
@@ -21,18 +21,23 @@
 | 3 | `CommandLineParser.ParseOptionAndSeparators` | `src/Platform/Microsoft.Testing.Platform/CommandLine/Parser.cs` | 1 | Identified | — |
 | 4 | `CommandLineOptionsValidator` arity validation | `src/Platform/Microsoft.Testing.Platform/CommandLine/CommandLineOptionsValidator.cs` | 1 | Identified | — |
 | 5 | `CommandLineParseResult.Equals` | `src/Platform/Microsoft.Testing.Platform/CommandLine/ParseResult.cs` | 1 | Identified | — |
+| 6 | `ResponseFileHelper.SplitCommandLine` | `src/Platform/Microsoft.Testing.Platform/CommandLine/ResponseFileHelper.cs` | 1 | Identified | — |
+| 7 | `TreeNodeFilter.MatchFilterPattern` | `src/Platform/Microsoft.Testing.Platform/Requests/TreeNodeFilter/TreeNodeFilter.cs` | 1 | Identified | — |
 
 ## Priority Order
 
-1. **`ArgumentArity`** — highest priority. Smallest self-contained target; decidable properties; good warm-up for setting up the Lean environment.
-2. **`CommandLineParser.TryUnescape`** — second priority. Pure function with clear specification; security-relevant string processing.
-3. **`CommandLineParser.ParseOptionAndSeparators`** — third priority. Small pure function; useful for verifying parser correctness.
-4. **`CommandLineOptionsValidator` arity validation** — fourth priority. Validation logic with clear input/output contract.
-5. **`CommandLineParseResult.Equals`** — fifth priority. Structural equality; good for verifying equivalence-relation laws.
+1. **`ArgumentArity`** — highest priority. Smallest self-contained target; decidable properties; good warm-up for setting up the Lean environment. Informal spec done.
+2. **`CommandLineParser.TryUnescape`** — second priority. Pure function with clear specification; security-relevant string processing. Informal spec extracted (PR open).
+3. **`TreeNodeFilter.MatchFilterPattern`** — **elevated third priority**. Pure recursive Boolean algebra; structural induction proofs; De Morgan and double negation provable by `simp`. Excellent Lean target.
+4. **`ResponseFileHelper.SplitCommandLine`** — fourth priority. Pure tokeniser with state machine; clear grammar-based properties.
+5. **`CommandLineParser.ParseOptionAndSeparators`** — fifth priority. Small pure function; useful for verifying parser correctness.
+6. **`CommandLineOptionsValidator` arity validation** — sixth priority. Validation logic with clear input/output contract.
+7. **`CommandLineParseResult.Equals`** — seventh priority. Structural equality; good for verifying equivalence-relation laws.
 
 ## Notes
 
-- All five targets are in the command-line infrastructure (`src/Platform/Microsoft.Testing.Platform/CommandLine/`).
-- This focus makes sense: command-line parsing is pure and testable, with clear specification from the POSIX/CLI conventions.
-- MSTest assertion APIs (e.g., `Assert.AreEqual`, `Assert.IsTrue`) are interesting but harder to model formally due to generic type constraints and exception-based control flow.
-- Future targets may include the test-filter grammar and the server-mode protocol state machine.
+- Six of the seven targets are in the command-line infrastructure of Microsoft.Testing.Platform; one target is in the tree-node filter subsystem.
+- `TreeNodeFilter.MatchFilterPattern` (Target 7) is the highest-complexity target but also the most mathematically rich: proofs of Boolean-algebra laws give immediate, meaningful results.
+- `ResponseFileHelper.SplitCommandLine` (Target 6) is derived from `dotnet/command-line-api` — the upstream source is noted in comments; cross-referencing it may reveal documented invariants.
+- MSTest assertion APIs (e.g., `Assert.AreEqual`, `Assert.IsTrue`) remain interesting but harder to model formally due to generic type constraints and exception-based control flow.
+- Future targets may include the server-mode protocol state machine and the test-filter tokeniser (`TreeNodeFilter.TokenizeFilter`).