fix: Windows-compatible path handling in JS/Go parsers#46
fix: Windows-compatible path handling in JS/Go parsers#46joshbouncesecurity wants to merge 5 commits intoknostic:masterfrom
Conversation
Three Windows-only failures preventing OpenAnt from running correctly on Windows. Ports the path/encoding fixes from #3 with a more nuanced Unicode strategy. 1. ts-morph backslash interpretation. path.relative() and path.resolve() produce backslash-separated paths on Windows; ts-morph treats those backslashes as escape characters when matching paths it has already added, so the JS analyzer silently emitted 0 functions. Normalise to forward slashes via a toPosixPath() helper before handing paths to ts-morph and before storing them in functionId values. 2. CRLF-tainted file lists. The --files-from consumer split on \n only, leaving a trailing \r on every path when the list was written on Windows. Switch to a /\r?\n/ split with per-line trim(). 3. cp1252 console crashes. The "Pipeline Test" status output used the Unicode glyphs U+2713 / U+2717 / U+2192, which crash on cp1252-encoded stdouts (the Windows default). Detect at import time whether stdout can encode those symbols and fall back to ASCII (OK / FAIL / ->) only when it cannot, so UTF-8 terminals keep the prettier output. Also remove the now-stale Windows xfail/skip markers from tests/ test_js_parser.py and tests/test_go_cli.py — the underlying bugs are fixed, so those guards no longer apply. Adds tests/test_windows_path_handling.py covering all three fixes: - ts-morph accepts a backslash-only file list and produces non-empty POSIX-form functionId values - the analyzer accepts a CRLF-terminated file list with a trailing blank line - both pipeline modules pick the ASCII fallback under a cp1252 stdout and the Unicode form under a UTF-8 stdout Refs knostic#16 (item 8). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous review (click to expand)🤖 Automated Claude Code ReviewFindingsF1 -- Critical bug: 14 print calls in both test_pipeline.py files emit literal {SYM_FAIL} / {SYM_OK} / {SYM_ARROW} strings instead of the variable values This is the most significant defect in the PR. A number of Affected sites in
The same 7 patterns appear in All 14 must be f-strings: The These regressions are not caught by the new tests because the tests only inspect F2 -- The PR adds A secondary issue: lines ~740 also call Fix: add F3 --
Additionally, Recommended fix: normalise at construction time: constructor(repoPath) {
this.repoPath = toPosixPath(path.resolve(repoPath));
...
}F4 -- The test constructs a backslash path by replacing every abs_path = str(src / "module.js") # /tmp/pytest-xxx/repo/src/module.js
backslash_path = abs_path.replace("/", "\\") # \tmp\pytest-xxx\repo\src\module.jsOn Linux/macOS, Node's If the Node/npm skip guard fires (no The PR description explicitly claims these tests are "runnable on any platform." That claim is incorrect for this test as written. Options:
F5 -- After The same issue affects Fix: add F6 -- Unicode detection fails silently when stdout is piped or redirected on UTF-8 systems When Python stdout is redirected (piped to a file, log aggregator, or subprocess), Additionally, the if not encoding:
return FalseIf degrading to ASCII for piped/redirected output is the intended behaviour, the docstring should say so explicitly. F7 -- The UTF-8 branch test (lines 862-884) loads only F8 -- The removed code included: if "UnicodeEncodeError" in result.stderr:
if sys.platform == "win32":
pytest.skip(...)
else:
pytest.fail("UnicodeEncodeError from JS parser on non-Windows (unexpected regression)")After the PR, if a The Windows-specific if result.returncode != 0 and "UnicodeEncodeError" in result.stderr:
pytest.fail("UnicodeEncodeError from JS parser (unexpected regression)")F9 -- When a Windows UNC path Positive notes
|
…h gaps, unicode clarity F1: fix 7 broken non-f-string print calls in go/test_pipeline.py (SYM_FAIL, SYM_OK, SYM_ARROW literals emitted verbatim) and 7 matching regressions in javascript/test_pipeline.py. status/marker variables now use the SYM_* variables directly instead of wrapping them in string literals. F2: apply toPosixPath to extractSingleFunction's filePath and to the nested requiredProject.addSourceFileAtPath resolvedPath so single-function CLI mode normalises Windows paths the same way batch mode does. F3: normalise this.repoPath in the TypeScriptAnalyzer constructor via toPosixPath(path.resolve(repoPath)) so all later path.relative / path.join calls operate on a consistent forward-slash base. F6: replace the `or ""` encoding fallback in _stdout_supports_unicode() with an explicit `if not encoding: return False` guard in both pipeline files, and add a docstring noting the piped/redirected-stdout behaviour. F9: add UNC path behaviour note to toPosixPath JSDoc. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
F4: gate test_typescript_analyzer_accepts_backslash_paths to Windows only. Replacing all forward slashes in a POSIX absolute path produces a non-absolute path on Linux/macOS, making the test fail on those platforms. F5: pop freshly-loaded cp1252 and utf-8 pipeline modules from sys.modules in finally blocks (pipeline_module fixture and unicode test) to prevent stale module-level state leaking into later tests. F7: parametrize test_pipeline_uses_unicode_when_stdout_supports_it to cover both the JS and Go pipeline modules, closing the coverage gap for the Go Unicode code path. F8: restore UnicodeEncodeError regression guard in test_go_cli.py test_parse_js_repo. The Windows-specific pytest.skip was correct to remove (bug is fixed), but the non-Windows pytest.fail provides a diagnostic on future regressions. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
🤖 Automatically written by ClaudeThanks for the thorough review. All 9 findings addressed across two commits:
All 16 tests pass (12 skipped — Go binary not built, as expected in this environment). |
Previous review (click to expand)🤖 Automated Claude Code ReviewFindingsThis is a follow-up review. F1 through F9 from the previous review are addressed; new findings continue from F10. Status of previous findings:
F10 —
The comment "Remove the freshly loaded module so its stale cp1252-patched module-level state doesn't leak into later tests" is therefore misleading; the cleanup it describes does not happen. To make the fix effective, the module must first be registered: add Note: the practical consequence is small because the module-level F11 — Double The test now has two independent @pytest.mark.skipif(
sys.platform != "win32",
reason="...",
)
@pytest.mark.skipif(
not shutil.which("node") or not JS_NODE_MODULES.exists(),
reason="Node.js or JS parser npm dependencies not available",
)
def test_typescript_analyzer_accepts_backslash_paths(tmp_path):When running on Linux/macOS, the first decorator skips the test with the platform reason. That is the intended behaviour. However, pytest evaluates decorators from bottom to top (inner to outer at runtime), and multiple The concern is discoverability in CI output: on a non-Windows CI agent that also lacks Node, the skip message will say "backslash path simulation only meaningful on Windows" rather than "Node.js or JS parser npm dependencies not available", hiding the missing-dependency condition. More importantly, on Windows without Node installed, the platform guard is This is a minor readability/diagnostic issue rather than a correctness bug, but it is actionable: combining the two conditions into one @pytest.mark.skipif(
sys.platform != "win32"
or not shutil.which("node")
or not JS_NODE_MODULES.exists(),
reason="Windows only (backslash path test) and requires Node.js + JS parser npm deps",
)F12 — Lines 523-527: if (!fs.existsSync(filePath)) {
console.error(`File not found: ${filePath}`);
process.exit(1);
}
const normalisedFilePath = toPosixPath(path.resolve(filePath));
However, there is a subtle inconsistency: the error message reports const normalisedFilePath = toPosixPath(path.resolve(filePath));
if (!fs.existsSync(normalisedFilePath)) {
console.error(`File not found: ${normalisedFilePath}`);
process.exit(1);
}This also catches a (unlikely) edge case: if F13 — The parametrised Since A clean fix: check before inserting ( F14 — Lines 250-253: assert mod.SYM_OK == "✓"
assert mod.SYM_FAIL == "✗"
assert mod.SYM_ARROW == "→"The test file does not declare a An alternative assertion form — Positive notes
|
…istency, sys.path dedup, assertion safety F10: register module in sys.modules inside _load_pipeline_module so that pre/post sys.modules.pop() calls in the fixture and unicode test are meaningful and prevent stale module-level state from leaking. F11: combine double @pytest.mark.skipif decorators on test_typescript_analyzer_accepts_backslash_paths into a single condition so the skip reason is always accurate and neither guard can mask the other. F12: move toPosixPath/path.resolve above the fs.existsSync check in extractSingleFunction so the existence check and error message both use the normalised path, consistent with the rest of the function. F13: guard sys.path.insert with already_on_path check in both the pipeline_module fixture and test_pipeline_uses_unicode_when_stdout_supports_it to prevent duplicate path entries across parametrized test runs; only remove from sys.path if we added it. F14: replace literal Unicode glyph strings in assertions (✓ ✗ →) with \u-escape sequences so pytest assertion error messages remain safe on cp1252 consoles. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
🤖 Automatically written by ClaudeAll 5 new findings from iteration 2 addressed in commit 22e6b56:
|
🤖 Automated Claude Code ReviewFindingsThis is a follow-up review. F1 through F14 from the previous two reviews are addressed; new findings continue from F15. Status of previous findings (F10–F14):
F15 — The F13 fix guards against the test harness inserting a duplicate # Line 71 of both parsers/javascript/test_pipeline.py and parsers/go/test_pipeline.py:
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
Trace for the first parametrized test call (e.g., cp1252 + javascript):
On the second call (cp1252 + go):
Each subsequent test call adds one more copy. After 4 parametrized tests, A correct fix requires either (a) snapshotting path_snapshot = list(sys.path)
try:
mod = _load_pipeline_module(mod_name, path)
...
finally:
sys.path[:] = path_snapshot
sys.modules.pop(mod_name, None)F16 — The F10 fix registers the module in sys.modules[name] = module
try:
spec.loader.exec_module(module)
except BaseException:
try:
del sys.modules[name]
except KeyError:
pass
raiseThe current sys.modules[name] = mod
spec.loader.exec_module(mod) # if this raises, sys.modules[name] is a partial module
return modIf The callers' The fix is a three-line change inside sys.modules[name] = mod
try:
spec.loader.exec_module(mod)
except BaseException:
sys.modules.pop(name, None)
raise
return modF17 — Trailing-whitespace misalignment in The three new assertion lines have inconsistent alignment before the inline comments: assert mod.SYM_OK == "✓" # CHECK MARK (U+2713) ← 4 spaces
assert mod.SYM_FAIL == "✗" # BALLOT X (U+2717) ← 2 spaces
assert mod.SYM_ARROW == "→" # RIGHTWARDS ARROW (U+2192) ← 2 spaces
Positive notes
|
…rollback, comment alignment F15: snapshot and restore sys.path around exec_module in _load_pipeline_module so that module-level sys.path.insert calls in the pipeline files do not accumulate across parametrized test runs. F16: roll back sys.modules[name] on exec_module failure (CPython importlib convention) so a failed load does not leave a partially-initialized module poisoning the cache. F17: fix inconsistent inline comment spacing on the three \u-escape assertion lines — normalise to 2 spaces before # for all three. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
🤖 Automatically written by ClaudeAll 3 final findings from iteration 3 addressed in commit fb31f2a:
|
Manual verification (Windows-specific)
|
Local test resultsRan the JS pipeline on Windows against Commands run: Outcome:
Sample unit ids (note POSIX separators): |
Summary
Three Windows-only failures preventing OpenAnt from running correctly on Windows:
path.relative()produces backslashes on Windows; ts-morph treats them as escape characters when matching paths it has already added, so the JS parser silently emits 0 files. Fix: convert separators to forward slashes via atoPosixPath()helper before handing paths to ts-morph, and use POSIX-formrelativePathwhen buildingfunctionIdvalues.\r\n-tainted file lists — splitting a newline-delimited list on Windows leaves a trailing\ron every path, whichaddSourceFileAtPaththen fails to resolve. Fix: split the--files-fromcontent on/\r?\n/and.trim()each line.✓ ✗ →, which crash on cp1252-encoded stdouts (the Windows default). Fix: detect at import time whethersys.stdout.encodingcan encode those glyphs and fall back to ASCII (OK/FAIL/->) only when it cannot. UTF-8 terminals keep the prettier Unicode output — this is more nuanced than an unconditional ASCII downgrade.Stale Windows
xfail/skipmarkers intests/test_js_parser.pyandtests/test_go_cli.pyare removed since the underlying bugs are fixed.These are the parser path/encoding pieces of issue #16 item 8. The companion venv-binary path fix (
Scripts\python.exevsbin/pythoninapps/openant-cli/internal/python/runtime.go) is intentionally not included here — it's a separate concern and worth its own PR.Addresses item 8 from #16 (does not close the issue).
Test plan
New
libs/openant-core/tests/test_windows_path_handling.pycovers all three fixes, runnable on any platform:functionIdvalues that contain only forward slashes.pytest tests/→99 passed, 12 skipped(the 12 skips aretest_go_cli.pycases that need the compiled Go binary; pre-existing).openant parse <ts-repo>on Windows and confirm a non-zero file count is reported.