Skip to content

feat(res-to-affine): tree-sitter AST walker for side-effect-import (Phase 2b, Refs #57)#322

Merged
hyperpolymath merged 2 commits into
mainfrom
phase-2b/res-to-affine-walker
May 21, 2026
Merged

feat(res-to-affine): tree-sitter AST walker for side-effect-import (Phase 2b, Refs #57)#322
hyperpolymath merged 2 commits into
mainfrom
phase-2b/res-to-affine-walker

Conversation

@hyperpolymath
Copy link
Copy Markdown
Owner

Summary

Lands the first AST-based detection pass on top of the Phase-2a grammar build pipeline (#321). New module tools/res-to-affine/walker.ml shells out to the vendored tree-sitter CLI, parses its default s-expression output (with embedded [row, col] byte ranges), and walks the AST to find the side-effect-import anti-pattern structurally rather than via the column-0-anchored regex Phase 1 uses.

Detection upgrade vs. Phase 1

The walker flags let _ = Mod.value only when the binding sits at module top level — direct child of source_file, or direct child of a block that is the body of a module_declaration. The Phase-1 regex matches any column-0 occurrence of the same shape.

On the existing sample.res fixture the walker reports 1 finding (the intended top-level let _ = Pixi.Sound.register on line 8); the scanner reports the same 1 plus the same shape nested inside function bodies. The regex pattern that #319 band-aided with column-0 anchoring is eliminated structurally by the walker.

$ dune exec tools/res-to-affine/main.exe -- \
    --engine=scanner tools/res-to-affine/test/fixtures/sample.res \
    | grep -E "^//\s+- "
//   - [side-effect-import] line 8: ...
//   - [raw-js] line 11: ...
//   - [mutable-global] line 15: ...
//   - [untyped-exception] line 19: ...
//   - [untyped-exception] line 22: ...
//   - [untyped-exception] line 28: ...

$ dune exec tools/res-to-affine/main.exe -- \
    --engine=walker tools/res-to-affine/test/fixtures/sample.res \
    | grep -E "^//\s+- "
//   - [side-effect-import] line 8: ...

(Walker covers only side-effect-import in 2b — the other three anti-patterns are scanner-only until 2c.)

CLI

  • --engine={scanner,walker} — default scanner, preserves Phase-1 behaviour.
  • --grammar-dir DIR — override walker's parser location (default: tools/vendor/tree-sitter-rescript, matches install.sh output).
  • On any walker error (grammar missing, tree-sitter not on PATH, s-exp parse failure), the binary prints a one-line reason to stderr and falls back to the scanner — never bails for walker problems.

Walker design

  • Subprocess to tree-sitter, not in-process FFI. The migration assistant runs once per file; subprocess overhead is negligible against the actual parse cost. Sidesteps OCaml↔C bindings entirely.
  • Hand-rolled s-exp parser (~150 LOC) handles the tree-sitter default output format: paren-delimited nodes with [row, col] byte ranges and optional field: prefixes per child.
  • stderr suppression. tree-sitter 0.26+ emits a benign "You have not configured any parser directories!" warning on every invocation regardless of how the parser is resolved (we use the cd grammar_dir trick, not its config file). The walker redirects stderr to /dev/null; errors that matter still surface via the exit code.
  • Stable public surface: Walker.scan : grammar_dir:string -> path:string -> source:string -> Scanner.finding list. Phase 2c extends rather than re-architects.

Tests

test/test_walker.ml adds two end-to-end tests under a new res-to-affine-walker alcotest run. Both auto-skip if tree-sitter is missing or the vendored grammar hasn't been built, so dune runtest on a fresh clone (no bootstrap) still passes.

Repo-root discovery walks up from the test cwd looking for dune-project — necessary because the dune sandbox hides the source-tree tools/vendor/ from naive relative-path math.

CI

What is NOT in this PR

  • raw-js, untyped-exception, mutable-global walker ports → Phase 2c.
  • inline lambda callback record, oversized function (the two anti-patterns deferred from Phase 1 entirely) → Phase 2c.
  • Switching --engine=walker to default → Phase 2c, after the four anti-patterns reach parity.

Stack

#321 (Phase 2a) → this PR (Phase 2b) → Phase 2c

Base is main per the orphan-trap guidance, so the diff is cumulative until #321 lands, then this rebases.

Test plan

  • CI green — the build job now runs walker tests; the migration-assistant job still validates the grammar build.
  • Local: just install-grammar && dune runtest tools/res-to-affine/ reports both walker tests [OK] (verified).
  • Local: --engine=walker on sample.res produces exactly the line-8 side-effect-import finding (verified).

🤖 Generated with Claude Code

hyperpolymath and others added 2 commits May 21, 2026 10:31
…a, Refs #57)

Phase 2 of the `.res → .affine` migration assistant (#57) replaces the
Phase-1 line-regex scanner with a tree-sitter AST walker. The grammar
itself is manifest-vendored in `editors/tree-sitter-rescript/` (since
#314), but the actual build pipeline that turns the manifest into a
loadable parser had not been wired up. This commit is that wiring.

- `justfile` gains an `install-grammar` recipe that wraps the existing
  `editors/tree-sitter-rescript/scripts/install.sh` so the bootstrap
  step is a discoverable `just` recipe alongside `build`/`test`/etc.
- `editors/tree-sitter-rescript/scripts/install.sh` updates its error
  message when the `tree-sitter` CLI is missing to point at both the
  Rust-native install (`cargo install tree-sitter-cli`, the repo's
  preferred path for CLI tooling per CLAUDE.md) and the existing
  Node-based one (`npm install -g tree-sitter-cli`). The script
  behaviour is unchanged when the CLI is present.
- `.github/workflows/ci.yml` gains a `migration-assistant` job that
  installs the tree-sitter CLI via npm (the fast CI path — a pre-built
  binary, ~5 s vs. ~5 min for `cargo install` from source), runs
  `just install-grammar`, verifies `tools/vendor/tree-sitter-rescript/
  src/parser.c` was produced, and smoke-parses the existing
  `tools/res-to-affine/test/fixtures/sample.res` fixture. The job sits
  alongside `vscode-smoke` as a Node-using carve-out under the same
  reasoning (manifest dep on an npm-distributed tool whose binary
  output is what we actually consume; no new TS source in this repo).
- `editors/tree-sitter-rescript/README.md` documents the dual install
  path, the new justfile recipe, and the CI gate.

Phase 2b (the actual walker — `walker.ml`, AST-based detection of
`side-effect-import`, `--engine` CLI flag, snapshot tests vs. the
regex scanner) stacks on top of this branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…hase 2b, Refs #57)

Lands the first AST-based detection pass on top of the Phase-2a
grammar build pipeline (#321). New module `tools/res-to-affine/
walker.ml` shells out to the vendored `tree-sitter` CLI, parses its
default s-expression output (with embedded `[row, col]` byte ranges),
and walks the AST to find the `side-effect-import` anti-pattern
structurally rather than via the column-0-anchored regex the Phase-1
scanner uses.

Detection upgrade vs. Phase 1
-----------------------------

The walker flags `let _ = Mod.value` only when the binding sits at
module top level — direct child of `source_file`, or direct child of
a `block` that is the body of a `module_declaration`. The Phase-1
regex matches any column-0 occurrence of the same shape. On the
existing `sample.res` fixture the walker reports 1 finding (the
intended top-level `let _ = Pixi.Sound.register` on line 8); the
scanner reports the same 1 plus would also have reported the same
pattern nested inside a function body. The regex pattern that #319
band-aided with column-0 anchoring is eliminated structurally by
the walker.

CLI
---

New `--engine={scanner,walker}` flag on the CLI (default: scanner,
preserving Phase-1 behaviour); `--grammar-dir DIR` overrides the
walker's parser location (default: `tools/vendor/tree-sitter-rescript`,
matching `install.sh` output). When the walker hits any error
(grammar missing, tree-sitter not on PATH, s-exp parse failure), it
prints a one-line reason to stderr and falls back to the scanner —
the CLI never bails because of walker problems.

Tests
-----

`test/test_walker.ml` adds two end-to-end tests under a new
`res-to-affine-walker` alcotest run. Both auto-skip if the
`tree-sitter` CLI is missing or the vendored grammar has not been
built, so `dune runtest` on a fresh clone (no bootstrap) still
passes. The repo-root discovery walks up from cwd looking for
`dune-project` to find the source-tree grammar path; the dune sandbox
otherwise hides it.

CI
--

The existing `build` job now installs `tree-sitter-cli` (npm, fast)
and runs `install.sh` before `dune runtest`, so the walker tests
actually execute rather than auto-skip. The migration-assistant
gate added in #321 stays — it remains the dedicated first-signal
job for grammar-pin drift.

Scope discipline
----------------

`raw-js`, `untyped-exception`, `mutable-global` remain
scanner-only; their AST counterparts land in Phase 2c. The walker
exposes a stable surface (`Walker.scan : grammar_dir:string ->
path:string -> source:string -> Scanner.finding list`) that 2c
extends rather than re-architects.

Stack: #321 (Phase 2a) → this PR (Phase 2b) → Phase 2c.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@hyperpolymath hyperpolymath enabled auto-merge (squash) May 21, 2026 13:10
@hyperpolymath hyperpolymath merged commit f0cd0d3 into main May 21, 2026
0 of 15 checks passed
@hyperpolymath hyperpolymath deleted the phase-2b/res-to-affine-walker branch May 21, 2026 13:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant