diff --git a/tools/res-to-affine/README.md b/tools/res-to-affine/README.md index c4a72e6b..98fdbb6d 100644 --- a/tools/res-to-affine/README.md +++ b/tools/res-to-affine/README.md @@ -16,14 +16,14 @@ and the broader `idaptik` migration. ## Usage ```sh -# print skeleton to stdout (default: regex scanner) +# print skeleton to stdout (default: tree-sitter AST walker, Phase 2c) dune exec tools/res-to-affine/main.exe -- path/to/Foo.res # or write to a file dune exec tools/res-to-affine/main.exe -- path/to/Foo.res -o Foo.affine -# opt into the Phase-2 tree-sitter AST walker -dune exec tools/res-to-affine/main.exe -- --engine=walker path/to/Foo.res +# opt back into the Phase-1 line-regex scanner (no grammar required) +dune exec tools/res-to-affine/main.exe -- --engine=scanner path/to/Foo.res ``` The output is **not compilable**. It is a starting point for the human: @@ -36,8 +36,8 @@ needs re-decomposing. | `--engine` | Implementation | When to use | |---|---|---| -| `scanner` (default) | Line-anchored regex over the raw source (`scanner.ml`). | Default — no prerequisites, ships with the binary. | -| `walker` | Shells out to the vendored `tree-sitter` CLI, walks the AST (`walker.ml`). | When the regex's false-positive surface matters — eliminates the `let _ = chained.call()` class of misfire that the column-0 anchor in #319 had to band-aid. | +| `walker` (default) | Shells out to the vendored `tree-sitter` CLI, walks the AST (`walker.ml`). | Default since Phase 2c — covers all six anti-patterns including the two that the scanner cannot see (inline callback records, oversized functions) and eliminates the `let _ = chained.call()` / line-anchored false-positive classes. | +| `scanner` | Line-anchored regex over the raw source (`scanner.ml`). | Fallback when the vendored grammar is unavailable (no `tree-sitter` CLI, missing `tools/vendor/tree-sitter-rescript/`). Detects four of the six anti-patterns only. | The walker requires the vendored `tree-sitter-rescript` grammar to be built first: @@ -50,42 +50,37 @@ just install-grammar If the grammar isn't built or the `tree-sitter` CLI isn't on PATH, the walker auto-falls-back to the scanner and prints the reason to stderr. -## What gets flagged (Phase 1) +## What gets flagged The six anti-patterns surfaced in the -[idaptik Wave 3 pilot](https://github.com/hyperpolymath/idaptik/blob/main/migration/main/LESSONS.md), -of which the line-based scanner reliably detects four: +[idaptik Wave 3 pilot](https://github.com/hyperpolymath/idaptik/blob/main/migration/main/LESSONS.md): -| Tag | Detection | AffineScript answer | +| Tag | Detection (walker, default) | AffineScript answer | |---|---|---| -| `side-effect-import` | `let _ = Mod.foo` at top level | Explicit registration call | -| `raw-js` | `%raw(...)` or `[%bs.raw ...]` | Typed extern (`ABI-FFI-README.md`) | -| `untyped-exception` | `Promise.catch`, `Js.Exn`, `raise`, `try` | `Result[E, A]` / `Validation[E, A]` | -| `mutable-global` | `:=` operator | Affine record threaded through | +| `side-effect-import` | `let _ = Mod.foo` at module top level (structural — not nested inside a function body) | Explicit registration call | +| `raw-js` | `extension_expression` node — any `%name(...)` or `[%bs.name ...]` | Typed extern (`ABI-FFI-README.md`) | +| `untyped-exception` | `try_expression`, `raise(...)` call, `Js.Exn.*` reference, `Promise.catch` member access | `Result[E, A]` / `Validation[E, A]` | +| `mutable-global` | Top-level `let x = ref(...)` (call-of-`ref` body) OR top-level `mutation_expression` (`x := y`) | Affine record threaded through | +| `inline-callback-record` | ≥ 3 inline `function` values in one `record` literal OR one call's `arguments` list (via `labeled_argument` or direct) | Row-polymorphic handler record (LESSONS.md §callback-record) | +| `oversized-function` | `function` node whose row span exceeds 50 source lines | Re-decompose before porting; do not transliterate | -Deferred to Phase 2 (need real AST): - -- **inline lambda callback record** — N ≥ 3 `~handler: (...) =>` lambdas - inside one record literal (collapse to a row-polymorphic record). -- **oversized function** — function body > ~50 LOC (decompose). - -### Walker coverage (Phase 2) +### Walker vs scanner coverage | Anti-pattern | Scanner (regex) | Walker (AST) | |---|---|---| -| `side-effect-import` | ✓ | ✓ — Phase 2b | -| `raw-js` | ✓ | — Phase 2c | -| `untyped-exception` | ✓ | — Phase 2c | -| `mutable-global` | ✓ | — Phase 2c | -| inline lambda callback record | — | — Phase 2c | -| oversized function | — | — Phase 2c | - -The walker improves on the regex by being structural: it only reports -`side-effect-import` when `let _ = Mod.value` sits at module top level -(direct child of `source_file` or a `module_declaration` body), not when -the same shape appears inside a function body — where it is normally a -ReScript "discard the return value of a chained call" idiom, not a -module-load side effect. +| `side-effect-import` | ✓ | ✓ (since Phase 2b, #322) | +| `raw-js` | ✓ | ✓ (since Phase 2c) | +| `untyped-exception` | ✓ | ✓ (since Phase 2c) | +| `mutable-global` | ✓ | ✓ (since Phase 2c) | +| `inline-callback-record` | — | ✓ (since Phase 2c, walker-only by construction) | +| `oversized-function` | — | ✓ (since Phase 2c, walker-only by construction) | + +The walker improves on the regex by being structural: it reports +`side-effect-import` only when `let _ = Mod.value` sits at module top +level, distinguishes a `try { ... }` expression from the identifier +`try`, only flags `Mutable_global` for module-scoped state (not local +refs inside a function body), and dedupes structurally-overlapping +findings on the same line. ## Why a skeleton and not a transliteration @@ -119,18 +114,27 @@ to tree-sitter in Phase 2 behind something that already pays its way. ### Phase 2 — tree-sitter AST walker -- Install the pinned grammar from - `editors/tree-sitter-rescript/` (manifest-only vendoring of - `rescript-lang/tree-sitter-rescript@990214a`). -- Replace `Scanner` with a walker over the s-expression output of - `tree-sitter parse --quiet`, parsed by the existing `sexplib0` - dependency. -- Adds the two deferred patterns (callback records, oversized - functions) and unlocks **structural** translation of trivial forms - (e.g. `option` → `Option[X]`, `result` → `Result[Y, X]`, - `switch x { | A => ... }` → `match x { A => ... }`). -- The `Emitter` interface does not change: same skeleton shape, same - marker schema, richer body. +Vendoring of the pinned grammar +(`rescript-lang/tree-sitter-rescript@990214a`) lives in +`editors/tree-sitter-rescript/`; `install.sh` materialises the +parser into `tools/vendor/tree-sitter-rescript/`. + +- **Phase 2a (#321)** — `just install-grammar`, the + `migration-assistant` CI job that runs it, dual install path + (`cargo install tree-sitter-cli` or `npm install -g + tree-sitter-cli`). +- **Phase 2b (#322)** — the walker itself: subprocess to the + `tree-sitter` CLI, hand-rolled s-expression parser over the + default `[row, col]`-annotated output, AST-based detection of + `side-effect-import` only. +- **Phase 2c (this revision)** — walker covers all six anti-patterns + including the two that the scanner cannot see; `--engine=walker` + becomes the CLI default. The `Emitter` interface does not change; + the marker schema is the same. + +Walker output is deduplicated by `(kind, line)` so structurally- +overlapping AST matches don't inflate the bullet count above what +the line-based scanner would produce on the same file. ### Phase 3 — partial translation @@ -166,9 +170,12 @@ cd tools/res-to-affine/test ``` The fixture under `test/fixtures/sample.res` is synthetic and exercises -every Phase-1 anti-pattern. Real `.res` files from the estate (e.g. -`gitbot-fleet/bots/sustainabot/bot-integration/src/*.res`) can be run -ad hoc through the CLI without changes to the test suite. +every Phase-1 anti-pattern; `test/fixtures/phase2c.res` exercises the +two anti-patterns that are walker-only by construction +(`inline-callback-record`, `oversized-function`). Real `.res` files +from the estate (e.g. `gitbot-fleet/bots/sustainabot/bot-integration/ +src/*.res`) can be run ad hoc through the CLI without changes to the +test suite. ## Non-goals diff --git a/tools/res-to-affine/main.ml b/tools/res-to-affine/main.ml index 89128409..8eb22706 100644 --- a/tools/res-to-affine/main.ml +++ b/tools/res-to-affine/main.ml @@ -87,8 +87,8 @@ let output_arg = let engine_arg = let doc = - "Detection engine: 'scanner' (default, line-regex, Phase 1) or \ - 'walker' (tree-sitter AST, Phase 2). The walker requires the \ + "Detection engine: 'walker' (default, tree-sitter AST, Phase 2c) \ + or 'scanner' (line-regex, Phase 1). The walker requires the \ vendored grammar to be built — see `just install-grammar`. \ Falls back to 'scanner' if the grammar is missing or \ tree-sitter parse fails." @@ -96,7 +96,7 @@ let engine_arg = Cmdliner.Arg.( value & opt (enum ["scanner", Scanner_engine; "walker", Walker_engine]) - Scanner_engine & + Walker_engine & info ["engine"] ~docv:"ENGINE" ~doc) let grammar_dir_arg = diff --git a/tools/res-to-affine/scanner.ml b/tools/res-to-affine/scanner.ml index 07b24d70..bd69adbc 100644 --- a/tools/res-to-affine/scanner.ml +++ b/tools/res-to-affine/scanner.ml @@ -6,12 +6,16 @@ type kind = | Raw_js | Untyped_exception | Mutable_global + | Inline_callback_record + | Oversized_function let kind_to_label = function - | Side_effect_import -> "side-effect-import" - | Raw_js -> "raw-js" - | Untyped_exception -> "untyped-exception" - | Mutable_global -> "mutable-global" + | Side_effect_import -> "side-effect-import" + | Raw_js -> "raw-js" + | Untyped_exception -> "untyped-exception" + | Mutable_global -> "mutable-global" + | Inline_callback_record -> "inline-callback-record" + | Oversized_function -> "oversized-function" let kind_to_guidance = function | Side_effect_import -> @@ -29,6 +33,14 @@ let kind_to_guidance = function "Top-level mutable global. AffineScript does not encourage \ module-scoped mutation; pass state as an affine record or \ through an effect handler." + | Inline_callback_record -> + "3+ inline callbacks in one record/call. Lift to a row-polymorphic \ + handler record (see LESSONS.md §callback-record) so each handler \ + is named and individually overridable." + | Oversized_function -> + "Function spans >50 source lines. Re-decompose before porting; a \ + direct translation will preserve the size and the implicit \ + contract it bakes in." type finding = { kind : kind; diff --git a/tools/res-to-affine/scanner.mli b/tools/res-to-affine/scanner.mli index f0e4889f..d87fbb6e 100644 --- a/tools/res-to-affine/scanner.mli +++ b/tools/res-to-affine/scanner.mli @@ -10,21 +10,24 @@ using the vendored [editors/tree-sitter-rescript] grammar. The [Emitter] interface is stable across the two implementations. - Patterns detected today: + Patterns detected today by the Phase-1 scanner (line/regex): - {b side-effect import} : [let _ = Mod.foo] (ReScript module-load hack) - {b raw JS} : any line containing [%raw] (typed FFI required) - {b untyped exception} : [Promise.catch], [Js.Exn], [raise], [try] - {b mutable global} : top-level [ref] or [:=] assignment - Deferred to Phase 2 (need real AST): - - {b inline lambda callback record} : N>=3 [~handler: (...) =>] in a record - - {b oversized function} : function body >50 LOC *) + Additional kinds detected by the Phase-2 walker only (need real AST): + - {b inline callback record} : N>=3 inline function values in a record + literal or a single call's labelled-argument list + - {b oversized function} : function spans more than 50 source lines *) type kind = | Side_effect_import | Raw_js | Untyped_exception | Mutable_global + | Inline_callback_record + | Oversized_function val kind_to_label : kind -> string (** Short tag used in emitted comment markers (e.g. ["side-effect-import"]). *) diff --git a/tools/res-to-affine/test/fixtures/phase2c.res b/tools/res-to-affine/test/fixtures/phase2c.res new file mode 100644 index 00000000..9dddb3c3 --- /dev/null +++ b/tools/res-to-affine/test/fixtures/phase2c.res @@ -0,0 +1,83 @@ +// SPDX-License-Identifier: MIT +// Synthetic fixture for the two anti-patterns that were explicitly +// deferred from Phase 1 entirely because they need real AST: +// 1. inline-callback-record — 3+ inline function values in a single +// record literal or a single call's labelled-argument list +// 2. oversized-function — function spans more than 50 source rows +// Walker-only by construction; the line-regex scanner does not detect +// either pattern. + +open Types + +// --- inline-callback-record: a record literal with 4 inline lambdas +let handlers = { + onMount: () => Js.log("mounted"), + onUnmount: () => Js.log("unmounted"), + onClick: e => Js.log(e), + onHover: e => Js.log(e), +} + +// --- inline-callback-record at a call site: 3 labelled-argument lambdas +let _ = Widget.make( + ~onMount=() => Js.log("mounted"), + ~onUnmount=() => Js.log("unmounted"), + ~onClick=e => Js.log(e), +) + +// --- oversized-function: 60 source-row span. Body intentionally tedious +// to surface the row-span proxy the walker uses. +let huge = id => { + let a01 = id + 1 + let a02 = a01 + 1 + let a03 = a02 + 1 + let a04 = a03 + 1 + let a05 = a04 + 1 + let a06 = a05 + 1 + let a07 = a06 + 1 + let a08 = a07 + 1 + let a09 = a08 + 1 + let a10 = a09 + 1 + let a11 = a10 + 1 + let a12 = a11 + 1 + let a13 = a12 + 1 + let a14 = a13 + 1 + let a15 = a14 + 1 + let a16 = a15 + 1 + let a17 = a16 + 1 + let a18 = a17 + 1 + let a19 = a18 + 1 + let a20 = a19 + 1 + let a21 = a20 + 1 + let a22 = a21 + 1 + let a23 = a22 + 1 + let a24 = a23 + 1 + let a25 = a24 + 1 + let a26 = a25 + 1 + let a27 = a26 + 1 + let a28 = a27 + 1 + let a29 = a28 + 1 + let a30 = a29 + 1 + let a31 = a30 + 1 + let a32 = a31 + 1 + let a33 = a32 + 1 + let a34 = a33 + 1 + let a35 = a34 + 1 + let a36 = a35 + 1 + let a37 = a36 + 1 + let a38 = a37 + 1 + let a39 = a38 + 1 + let a40 = a39 + 1 + let a41 = a40 + 1 + let a42 = a41 + 1 + let a43 = a42 + 1 + let a44 = a43 + 1 + let a45 = a44 + 1 + let a46 = a45 + 1 + let a47 = a46 + 1 + let a48 = a47 + 1 + let a49 = a48 + 1 + let a50 = a49 + 1 + let a51 = a50 + 1 + let a52 = a51 + 1 + a52 +} diff --git a/tools/res-to-affine/test/test_walker.ml b/tools/res-to-affine/test/test_walker.ml index a83d21cd..6b329b80 100644 --- a/tools/res-to-affine/test/test_walker.ml +++ b/tools/res-to-affine/test/test_walker.ml @@ -105,6 +105,86 @@ let test_walker_only_module_toplevel () = "walker reports the line-8 import and only that one" [8] side_effect_lines +(* ---- Phase 2c parity: scanner-detected kinds via walker ------------------- + + The synthetic [fixtures/sample.res] exercises all four Phase-1 kinds. + After Phase 2c, the walker must report each of them too (lines per + the fixture comments). *) + +let scan_sample () = + let source = read_file fixture in + let path = Filename.concat (Sys.getcwd ()) fixture in + Walker.scan ~grammar_dir:(grammar_dir ()) ~path ~source + +let lines_for_kind findings k = + List.filter_map + (fun (f : Scanner.finding) -> + if f.kind = k then Some f.line else None) + findings + +let test_walker_finds_raw_js () = + skip_unless_ready (); + let findings = scan_sample () in + (* sample.res line 11: `let host = %raw(`globalThis.location.host`)`. *) + Alcotest.(check (list int)) + "walker reports raw-js on line 11" + [11] (lines_for_kind findings Scanner.Raw_js) + +let test_walker_finds_mutable_global () = + skip_unless_ready (); + let findings = scan_sample () in + let got = lines_for_kind findings Scanner.Mutable_global in + (* sample.res line 14: `let currentUser = ref(None)` — top-level + ref-binding; line 15: `currentUser := Some("alice")` — top-level + mutation. Both are mutable-global. *) + Alcotest.(check bool) + "walker reports mutable-global on lines 14 and 15" + true (List.mem 14 got && List.mem 15 got) + +let test_walker_finds_untyped_exception () = + skip_unless_ready (); + let findings = scan_sample () in + let got = lines_for_kind findings Scanner.Untyped_exception in + (* sample.res has untyped-exception flavours at: line 19 (`try {`), + line 22 (`Js.Exn.Error(_)`), line 28 (`Promise.catch(...)`). *) + let want_subset = [19; 22; 28] in + let missing = List.filter (fun l -> not (List.mem l got)) want_subset in + Alcotest.(check (list int)) + "walker reports untyped-exception on lines 19, 22, 28" + [] missing + +(* ---- Phase 2c new kinds: inline-callback-record + oversized-function ------ + + Synthetic fixture [phase2c.res] specifically exercises the two + anti-patterns deferred from Phase 1 entirely. These are walker-only + by construction; the scanner does not detect them. *) + +let phase2c_fixture = "fixtures/phase2c.res" + +let test_walker_finds_inline_callback_record () = + skip_unless_ready (); + let source = read_file phase2c_fixture in + let path = Filename.concat (Sys.getcwd ()) phase2c_fixture in + let findings = + Walker.scan ~grammar_dir:(grammar_dir ()) ~path ~source + in + let got = lines_for_kind findings Scanner.Inline_callback_record in + Alcotest.(check bool) + "walker reports inline-callback-record at least once on phase2c.res" + true (got <> []) + +let test_walker_finds_oversized_function () = + skip_unless_ready (); + let source = read_file phase2c_fixture in + let path = Filename.concat (Sys.getcwd ()) phase2c_fixture in + let findings = + Walker.scan ~grammar_dir:(grammar_dir ()) ~path ~source + in + let got = lines_for_kind findings Scanner.Oversized_function in + Alcotest.(check bool) + "walker reports oversized-function at least once on phase2c.res" + true (got <> []) + (* ---- s-exp parser sanity (NOT gated; pure OCaml) --------------------------- The walker subprocess is shelled out in the gated tests above. The @@ -117,11 +197,28 @@ let test_walker_only_module_toplevel () = let () = Alcotest.run "res-to-affine-walker" [ - ( "walker", + ( "walker-side-effect-import", [ Alcotest.test_case "side-effect-import found on sample.res" `Quick test_walker_finds_side_effect_import; Alcotest.test_case "module-toplevel-only, correct line" `Quick test_walker_only_module_toplevel; ] ); + ( "walker-phase2c-parity", + [ + Alcotest.test_case "raw-js found on sample.res line 11" + `Quick test_walker_finds_raw_js; + Alcotest.test_case "mutable-global found on sample.res lines 14+15" + `Quick test_walker_finds_mutable_global; + Alcotest.test_case + "untyped-exception found on sample.res lines 19/22/28" + `Quick test_walker_finds_untyped_exception; + ] ); + ( "walker-phase2c-new-kinds", + [ + Alcotest.test_case "inline-callback-record found on phase2c.res" + `Quick test_walker_finds_inline_callback_record; + Alcotest.test_case "oversized-function found on phase2c.res" + `Quick test_walker_finds_oversized_function; + ] ); ] diff --git a/tools/res-to-affine/walker.ml b/tools/res-to-affine/walker.ml index 7845d060..c6b4ff5c 100644 --- a/tools/res-to-affine/walker.ml +++ b/tools/res-to-affine/walker.ml @@ -221,25 +221,32 @@ let run_tree_sitter ~grammar_dir ~path : (string, string) result = Error "tree-sitter parse stopped" end -(* ---- detection: side-effect-import ---------------------------------------- - - AST shape we look for: - - (let_declaration ... - (let_binding ... - pattern: (value_identifier ...) <- text is exactly "_" - body: (...))) <- subtree contains a module- - qualified access (value_- - identifier_path or - member_expression with - uppercase head) - - The let_declaration must be at *module top level*: parent is - source_file, or parent is a block that is the body of a - module_declaration. let _ = X.f() inside a function body is a - normal "discard return value" idiom, not the module-load side- - effect anti-pattern; the regex band-aid in #319 was specifically - to suppress these — the walker eliminates them structurally. *) +(* ---- detection ------------------------------------------------------------ + + Phase 2b (PR #322) ported Side_effect_import alone. Phase 2c + extends the walker to the remaining three Phase-1 kinds + (Raw_js, Untyped_exception, Mutable_global) plus the two kinds + that were explicitly deferred from Phase 1 because they need + real AST (Inline_callback_record, Oversized_function), and flips + --engine=walker to the default in main.ml. + + Node types used below come from rescript-lang/tree-sitter-rescript + at the pinned commit 990214a (v6.0.0). The grammar names: + - let_declaration / let_binding (pattern + body fields) + - extension_expression (the `%raw` shape) + - try_expression (try/catch) + - call_expression (function/arguments fields) (raise(), ref()) + - member_expression / value_identifier_path (Js.Exn, Promise.catch) + - mutation_expression (x := y) + - record / record_field + - labeled_argument (label/value fields) + - function (arrow + named fns) + + "Module top level" means: walking ancestors outward from the + current node, no [function] or [let_binding] body wrapper + appears before [source_file] or [module_declaration]. Anything + inside a function body, or inside the body of an outer + binding, is by definition not a module-load anti-pattern. *) (* Slice source text out by [pos] range. *) let slice ~source ~start ~stop = @@ -320,39 +327,38 @@ let pattern_and_body ~lb = in pattern, body -(* Walk with ancestor context. [ancestors] lists ancestor node types from - immediate parent outward; the head is the immediate parent. *) -let rec collect ~source ancestors acc node = - let acc = - if node.ntype = "let_declaration" then - let at_module_toplevel = - match ancestors with - | "source_file" :: _ -> true - | "block" :: parent :: _ when parent = "module_declaration" -> true - | _ -> false - in - if at_module_toplevel then - List.fold_left - (fun acc c -> - if c.ntype = "let_binding" then check_binding ~source acc c - else acc) - acc node.children - else acc - else acc - in - List.fold_left - (fun acc c -> collect ~source (node.ntype :: ancestors) acc c) - acc node.children +(* True iff the current node sits at module top level — i.e. no + [function] or [let_binding] body intervenes between the current + node and the enclosing [source_file] or [module_declaration]. + [ancestors] is parent-first (head is the immediate parent). *) +let rec at_module_toplevel = function + | [] -> true + | ("function" | "let_binding") :: _ -> false + | "source_file" :: _ -> true + | "module_declaration" :: _ -> true + | _ :: rest -> at_module_toplevel rest + +let node_text ~source n = + String.trim (slice ~source ~start:n.start ~stop:n.stop) + +let starts_with prefix s = + let pn = String.length prefix in + String.length s >= pn && String.sub s 0 pn = prefix -and check_binding ~source acc lb = +let ends_with suffix s = + let sn = String.length suffix in + let n = String.length s in + n >= sn && String.sub s (n - sn) sn = suffix + +(* ---- side-effect-import (Phase 2b, preserved) ---- *) + +let check_let_binding_side_effect ~source acc lb = match pattern_and_body ~lb with | Some pat, Some body when is_underscore_pattern ~source pat && body_has_module_access ~source body -> let excerpt = - truncate - (String.trim - (slice ~source ~start:lb.start ~stop:lb.stop)) + truncate (String.trim (slice ~source ~start:lb.start ~stop:lb.stop)) in let finding : Scanner.finding = { kind = Scanner.Side_effect_import @@ -363,6 +369,200 @@ and check_binding ~source acc lb = finding :: acc | _ -> acc +let detect_side_effect_import ~source ancestors acc node = + if node.ntype = "let_declaration" && at_module_toplevel ancestors then + List.fold_left + (fun acc c -> + if c.ntype = "let_binding" + then check_let_binding_side_effect ~source acc c + else acc) + acc node.children + else acc + +(* ---- raw-js: any extension_expression (covers %raw, %bs.raw, ...) ---- *) + +let detect_raw_js ~source acc node = + if node.ntype = "extension_expression" then + let finding : Scanner.finding = + { kind = Scanner.Raw_js + ; line = node.start.row + 1 + ; excerpt = truncate (node_text ~source node) + } + in + finding :: acc + else acc + +(* ---- untyped-exception: try, raise(), Promise.catch, Js.Exn ---- *) + +let detect_untyped_exception ~source acc node = + let push acc = + let finding : Scanner.finding = + { kind = Scanner.Untyped_exception + ; line = node.start.row + 1 + ; excerpt = truncate (node_text ~source node) + } + in + finding :: acc + in + match node.ntype with + | "try_expression" -> push acc + | "call_expression" -> + let fn = + List.find_opt (fun c -> c.field = Some "function") node.children + in + (match fn with + | Some f when node_text ~source f = "raise" -> push acc + | _ -> acc) + | "member_expression" | "value_identifier_path" -> + let text = node_text ~source node in + let hits_js_exn = starts_with "Js.Exn" text in + let hits_promise_catch = + text = "Promise.catch" || ends_with "Promise.catch" text + in + if hits_js_exn || hits_promise_catch then push acc + else acc + | _ -> acc + +(* ---- mutable-global: top-level [let x = ref(...)] or top-level [:=] ---- *) + +let is_ref_call ~source n = + if n.ntype <> "call_expression" then false + else + match List.find_opt (fun c -> c.field = Some "function") n.children with + | Some f -> node_text ~source f = "ref" + | None -> false + +let detect_mutable_global ~source ancestors acc node = + if not (at_module_toplevel ancestors) then acc + else + match node.ntype with + | "let_declaration" -> + List.fold_left + (fun acc c -> + if c.ntype <> "let_binding" then acc + else + match pattern_and_body ~lb:c with + | _, Some body when is_ref_call ~source body -> + let excerpt = + truncate + (String.trim (slice ~source ~start:c.start ~stop:c.stop)) + in + let finding : Scanner.finding = + { kind = Scanner.Mutable_global + ; line = c.start.row + 1 + ; excerpt + } + in + finding :: acc + | _ -> acc) + acc node.children + | "mutation_expression" -> + let finding : Scanner.finding = + { kind = Scanner.Mutable_global + ; line = node.start.row + 1 + ; excerpt = truncate (node_text ~source node) + } + in + finding :: acc + | _ -> acc + +(* ---- inline-callback-record: >=3 inline function values in a single + record literal or a single call's argument list ---- *) + +let count_inline_function_values children = + List.fold_left + (fun n c -> + match c.ntype with + | "function" -> n + 1 + | "labeled_argument" | "record_field" | "field" -> + if List.exists (fun cc -> cc.ntype = "function") c.children + then n + 1 + else n + | _ -> n) + 0 children + +let call_argument_children node = + match List.find_opt (fun c -> c.field = Some "arguments") node.children with + | Some args -> args.children + | None -> [] + +let inline_callback_threshold = 3 + +let detect_inline_callback_record ~source acc node = + let container_children = + match node.ntype with + | "call_expression" -> Some (call_argument_children node) + | "record" -> Some node.children + | _ -> None + in + match container_children with + | None -> acc + | Some children -> + if count_inline_function_values children >= inline_callback_threshold then + let finding : Scanner.finding = + { kind = Scanner.Inline_callback_record + ; line = node.start.row + 1 + ; excerpt = truncate (node_text ~source node) + } + in + finding :: acc + else acc + +(* ---- oversized-function: row span > 50 ---- + + Source-row span as a proxy for "function body >50 LOC". A direct + line count of the body subtree would be more precise but spans + are cheap and sufficient for surfacing decomposition candidates; + precision belongs to Phase 3 where the tool actually rewrites + bodies. The threshold matches LESSONS.md's stated heuristic. *) + +let oversized_row_threshold = 50 + +let detect_oversized_function ~source acc node = + if node.ntype <> "function" then acc + else + let span = node.stop.row - node.start.row + 1 in + if span > oversized_row_threshold then + let finding : Scanner.finding = + { kind = Scanner.Oversized_function + ; line = node.start.row + 1 + ; excerpt = truncate (node_text ~source node) + } + in + finding :: acc + else acc + +(* ---- master walker -------------------------------------------------------- *) + +(* Visit every node; dispatch to every detector. [ancestors] is the + list of ancestor node types from immediate parent outward. *) +let rec collect ~source ancestors acc node = + let acc = detect_side_effect_import ~source ancestors acc node in + let acc = detect_raw_js ~source acc node in + let acc = detect_untyped_exception ~source acc node in + let acc = detect_mutable_global ~source ancestors acc node in + let acc = detect_inline_callback_record ~source acc node in + let acc = detect_oversized_function ~source acc node in + List.fold_left + (fun acc c -> collect ~source (node.ntype :: ancestors) acc c) + acc node.children + +(* Dedupe by (kind, line). The walker visits structural overlaps that + the line-based scanner would not — e.g. a Js.Exn member expression + nested inside a try_expression yields two Untyped_exception + findings on the same line; we keep one. *) +let dedupe (findings : Scanner.finding list) : Scanner.finding list = + let seen = Hashtbl.create 16 in + List.filter + (fun (f : Scanner.finding) -> + let key = (f.kind, f.line) in + if Hashtbl.mem seen key then false + else begin + Hashtbl.add seen key (); + true + end) + findings + (* ---- public entry point --------------------------------------------------- *) let scan ~grammar_dir ~path ~source = @@ -370,12 +570,13 @@ let scan ~grammar_dir ~path ~source = | Error msg -> failwith ("res-to-affine walker: " ^ msg) | Ok output -> let root = - try parse_sexp output with Parse_error m -> + try parse_sexp output + with Parse_error m -> failwith ("res-to-affine walker: s-exp parse failed — " ^ m) in let findings = collect ~source [] [] root in - (* Findings accumulated in reverse order; emitter expects - source order (line ascending). *) + let findings = dedupe findings in List.sort - (fun (a : Scanner.finding) (b : Scanner.finding) -> compare a.line b.line) + (fun (a : Scanner.finding) (b : Scanner.finding) -> + compare a.line b.line) findings diff --git a/tools/res-to-affine/walker.mli b/tools/res-to-affine/walker.mli index 17fc6af0..2e214fc7 100644 --- a/tools/res-to-affine/walker.mli +++ b/tools/res-to-affine/walker.mli @@ -8,14 +8,25 @@ [editors/tree-sitter-rescript/] grammar. The two pipelines share [Scanner.kind] and [Scanner.finding] so the emitter is unchanged. - Phase 2b ports a single anti-pattern: [Side_effect_import]. Phase - 2c ports the remaining three. The walker's discovery of an - anti-pattern is strictly stronger than the regex's: it requires - the [let _ = Mod.value] shape to live at *module top level* (the - actual anti-pattern's structural shape), not just to match a - column-0 prefix on any line. This eliminates the [let _ = - chained.call()] false-positive class that the Phase-1 regex - band-aided in #319. *) + Phase 2b ported a single anti-pattern: [Side_effect_import]. + Phase 2c (this revision) extends the walker to: + + - the remaining three Phase-1 kinds — [Raw_js], + [Untyped_exception], [Mutable_global] — via AST shapes + ([extension_expression], [try_expression]/[call_expression] + [raise]/member-expression [Promise.catch]/[Js.Exn], + [let_declaration] with [ref] body and [mutation_expression]); + - the two kinds that were explicitly deferred from Phase 1 + because they need real AST — [Inline_callback_record] + (≥3 inline function values in a single record or call) and + [Oversized_function] (function spans >50 source rows); + + The walker's discovery is strictly stronger than the regex's + on the three ported kinds: it requires structural module-top- + level placement for [Mutable_global], it tells [try {…}] apart + from the identifier [try] in scanner-only false-positive + contexts, and it does not double-fire on the same line for + structurally-nested matches. *) val scan : grammar_dir:string ->