Fix Windows backslash path separators in JSON output and diagnostics#354
Conversation
Confirms root cause: ASTContext::with_filename stores raw filename strings with no normalization; both JSON writer emission sites read them verbatim. Single-point fix at with_filename ingress using the existing quarto_util::to_forward_slashes helper.
…lve add_filename question
…-dff27o04) Windows callers pass backslash-separated paths into ASTContext::with_filename (from glob() results, CLI args, etc). The JSON writer and ariadne diagnostics both read that string verbatim, so output diverges from insta snapshots recorded on Unix and from the forward-slash convention used everywhere else in the codebase (quarto_util::to_forward_slashes, already used for Lua/HTML/ DocumentProfile paths). TypeScript Quarto normalizes the same way via pathWithForwardSlashes for the same reason. Normalizes at with_filename/add_filename (the ASTContext ingress point, covers the JSON writer and SourceContext-based diagnostics for free), plus a second ingress point found during end-to-end verification: main.rs's ad hoc fallback SourceContext on the hard-parse-error path, which bypassed ASTContext entirely. Left the "Missing Newline" CLI warning alone since it echoes the user's own typed path back to them rather than emitting a portable identifier.
roborev review (job 1726) caught that both readers rebuild or re-add to context.source_context using the raw filename right after ASTContext::with_filename already normalized it — so a successful parse that emits warnings still rendered ariadne diagnostics with Windows backslashes, even though the earlier fix covered the JSON writer and the hard-parse-error fallback in main.rs. qmd.rs's case was live: it fully replaces context.source_context, so the normalized entry with_filename created was discarded. commonmark.rs's case was latent (a second, unused add_file call creates a dead entry at index 1 since FileId(0) is hardcoded) but fixed anyway for consistency, since leaving raw backslashes there is a footgun if that assumption ever changes. Both now reuse context.filenames[0], the value with_filename already normalized, rather than re-deriving from the raw filename argument.
roborev review (job 1730) caught that commonmark_reader_source_context_uses_forward_slashes asserted FileId(0), but ASTContext::with_filename populates FileId(0) internally before commonmark::read's own add_file call ever runs — the entry that call actually writes is FileId(1). The test passed regardless of whether that call normalized its filename, giving zero protection for the fix in f5a93fa. Verified by reverting just that one line and confirming the test still passed; asserting FileId(1) instead now fails correctly when reverted.
|
@cscheid asking for review to check this is the right approach: Normalize to '/' all the paths. It think this is the right call to make our life easier. |
cscheid
left a comment
There was a problem hiding this comment.
Looks good to me, excellent. I'm slightly terrified of the following question but I will ask it anyway, ha: are forward slashes valid characters in windows file names? I love the idea of normalizing everything to / at the deepest level of filename resolution, but I'm genuinely unsure what the actual rules for filenames on windows are.
|
Good question! Here is what I remember from last time I checked this:
As a path separator though, However, and this is the big BUT: under the Also, remote paths on network drives (UNC paths) also accept |
On Windows, pampa's JSON writer and ariadne diagnostics show backslash-separated file paths, diverging from the forward-slash convention used everywhere else in the codebase (
quarto_util::to_forward_slashes, already used for Lua/HTML/DocumentProfile paths) and from insta snapshots recorded on Unix. TypeScript Quarto normalizes the same way viapathWithForwardSlashesfor the same reason.Root Cause
ASTContext::with_filenamestores whatever filename string it's handed verbatim. On Windows, callers that derive the filename from a real path (glob()results in the snapshot-test harness, CLI arguments) produce backslash-separated strings, which the JSON writer andSourceContext-based diagnostics then read back out unchanged.Fix
Normalizes at the
ASTContextingress points (with_filename,add_filename), which covers the JSON writer and diagnostics built fromSourceContextfor free. Two more un-normalized ingress points surfaced during end-to-end verification and review, and got the same treatment:main.rs's ad hoc fallbackSourceContexton the hard-parse-error CLI path, built directly from the raw CLI argument, bypassingASTContextentirely.context.source_contextright afterwith_filenamehad already normalized it — for qmd this fully replaced the normalized entry, so warnings on an otherwise-successful parse still showed backslashes.Left
main.rs's "Missing Newline at End of File" CLI warning alone — it interpolates the raw CLI argument into a plain sentence, echoing the user's own typed path back to them rather than emitting a portable identifier, which matches howrustc/cathandle this.Test Plan
astContext.files[].namein JSON output uses forward slashes for a backslash-separated input pathpampacrate test suite passes (aside from the pre-existing, unrelated CRLF byte-offset failures)