[Lean Squad] feat(fv): Task 2 — informal spec for ResponseFileHelper.SplitCommandLine#7899
Conversation
Extracts a precise informal specification for the SplitCommandLine tokeniser in ResponseFileHelper. Documents: - State machine with two variables (seeking, seekingQuote) - 23 named properties across 5 groups (trivial, whitespace, quotes, structural invariants, composition) - Edge case analysis including unclosed-quote behaviour (Issue 2/3) - Asymmetry: unclosed opening quote silently discards content; unclosed mid-word quote is included in output token - Adjacent quoted segments emit two separate tokens (Issue 4) - No escape sequence support (Issue 1) - No existing unit tests (Issue 5) - Approximations for Lean 4 model (List Char, decidable isWhitespace) Advances ResponseFileHelper.SplitCommandLine from phase 1 to phase 2. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Adds formal-verification documentation that captures the current behavior of ResponseFileHelper.SplitCommandLine as an informal (human-readable) specification, and updates the FV target tracking to reflect progress.
Changes:
- Adds an informal spec document describing the tokenizer’s state machine, properties, and edge cases.
- Updates FV target phase/status for
ResponseFileHelper.SplitCommandLinefrom Phase 1 to Phase 2.
Show a summary per file
| File | Description |
|---|---|
formal-verification/specs/responsefilehelper_splitcommandline_informal.md |
New informal spec for SplitCommandLine including transitions, properties, and edge-case analysis. |
formal-verification/TARGETS.md |
Marks ResponseFileHelper.SplitCommandLine as Phase 2 (“Informal spec extracted”). |
Copilot's findings
Comments suppressed due to low confidence (2)
formal-verification/specs/responsefilehelper_splitcommandline_informal.md:113
- Property 13 title says "Adjacent quoted segments merge" but the example/output and later Issue 4 indicate they do not merge; they emit two separate tokens (e.g., ""a""b"" → ["a", "b"]). Please rename/reword this property to match the described behavior to avoid a self-contradicting spec.
13. **Adjacent quoted segments merge**: `SplitCommandLine("\"a\"\"b\"") = ["a", "b"]` — NOTE: separate yields because closing `"` triggers a yield when `seeking == TokenStart`
formal-verification/specs/responsefilehelper_splitcommandline_informal.md:149
- This edge-case row appears to show the closed quoted input (
"abc") but labels it as unclosed ("abc) and expects[]. The unclosed-quote-at-start case in the later analysis isSplitCommandLine("\"abc")(no trailing quote). Please fix the input representation so the table matches the described behavior.
| `"\"abc\""` (unclosed: `"abc`) | `[]` | Unclosed quote, `seeking==TokenStart` at end → no token emitted! |
| `"abc\""` (unclosed: `abc"`) | `["abc"]` | Word started before quote; toggle to QuoteEnd mid-word; end yields word |
- Files reviewed: 2/2 changed files
- Comments generated: 1
There was a problem hiding this comment.
Summary
Workflow: Expert Code Reviewer
Date: 2026-04-28
Repository: microsoft/testfx
PR: #7899 — Informal spec for ResponseFileHelper.SplitCommandLine
This PR adds only .md files (a phase-status update in TARGETS.md and a new informal specification). The spec itself is the deliverable, so accuracy is the primary concern.
I verified all state-machine traces in the spec against the actual implementation in ResponseFileHelper.cs. The overall structure, state-machine description, transition tables, and most edge cases are correct. Two inaccuracies were found that would corrupt the Lean formalisation if carried forward.
Key Findings
-
[Correctness] Line 113 — Property 13 heading contradicts its own output
Property 13 is named "Adjacent quoted segments merge", but the expected output["a", "b"]shows they produce separate tokens, not a merged one. The inline note acknowledges separation, making the heading directly misleading. In a proof assistant, this property name risks encoding the wrong invariant. -
[Correctness] Line 148 — Unclosed-quote edge-case example string is wrong
The notation`"\"abc\""`is the C# string literal for"abc"(5 chars, properly closed), which yields["abc"]— not[]. The intended input is the 4-char unclosed string"abc, whose C# literal is"\"abc". The parenthetical "(unclosed:"abc)" clarifies intent but the table cell itself is incorrect, which is what a tool or prover would consume.
Positive Observations
- The state-machine description accurately reflects the implementation's two-variable approach (
seeking/seekingQuote). - All other traced edge cases (empty string, whitespace-only,
""," ",abc"def"ghi,"abc"def) produce the correct expected outputs. - The asymmetry between Issue 2 (unclosed opening quote → silent discard) and Issue 3 (unclosed mid-word quote → content included) is correctly identified and explained.
- The note about
"foo""bar"→ two tokens (Issue 4) is correct and well-explained. - The Lean modelling approximations section is sound.
Recommendations
- Rename Property 13 to reflect actual behavior, e.g. "Adjacent quoted segments are NOT concatenated — each emits a separate token".
- Fix line 148 edge-case input to use
`"abc`(raw 4-char value) or the correct C# literal"\"abc"(no trailing\").
Generated by Expert Code Reviewer 🧠
🧠 Reviewed by Expert Code Reviewer 🧠
Extracts a precise informal specification for the SplitCommandLine tokeniser in ResponseFileHelper. Documents:
Advances ResponseFileHelper.SplitCommandLine from phase 1 to phase 2.
Fixes #7880