Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
103 changes: 55 additions & 48 deletions tools/res-to-affine/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,14 +16,14 @@ and the broader `idaptik` migration.
## Usage

```sh
# print skeleton to stdout (default: regex scanner)
# print skeleton to stdout (default: tree-sitter AST walker, Phase 2c)
dune exec tools/res-to-affine/main.exe -- path/to/Foo.res

# or write to a file
dune exec tools/res-to-affine/main.exe -- path/to/Foo.res -o Foo.affine

# opt into the Phase-2 tree-sitter AST walker
dune exec tools/res-to-affine/main.exe -- --engine=walker path/to/Foo.res
# opt back into the Phase-1 line-regex scanner (no grammar required)
dune exec tools/res-to-affine/main.exe -- --engine=scanner path/to/Foo.res
```

The output is **not compilable**. It is a starting point for the human:
Expand All @@ -36,8 +36,8 @@ needs re-decomposing.

| `--engine` | Implementation | When to use |
|---|---|---|
| `scanner` (default) | Line-anchored regex over the raw source (`scanner.ml`). | Default — no prerequisites, ships with the binary. |
| `walker` | Shells out to the vendored `tree-sitter` CLI, walks the AST (`walker.ml`). | When the regex's false-positive surface matters — eliminates the `let _ = chained.call()` class of misfire that the column-0 anchor in #319 had to band-aid. |
| `walker` (default) | Shells out to the vendored `tree-sitter` CLI, walks the AST (`walker.ml`). | Default since Phase 2c — covers all six anti-patterns including the two that the scanner cannot see (inline callback records, oversized functions) and eliminates the `let _ = chained.call()` / line-anchored false-positive classes. |
| `scanner` | Line-anchored regex over the raw source (`scanner.ml`). | Fallback when the vendored grammar is unavailable (no `tree-sitter` CLI, missing `tools/vendor/tree-sitter-rescript/`). Detects four of the six anti-patterns only. |

The walker requires the vendored `tree-sitter-rescript` grammar to be
built first:
Expand All @@ -50,42 +50,37 @@ just install-grammar
If the grammar isn't built or the `tree-sitter` CLI isn't on PATH, the
walker auto-falls-back to the scanner and prints the reason to stderr.

## What gets flagged (Phase 1)
## What gets flagged

The six anti-patterns surfaced in the
[idaptik Wave 3 pilot](https://github.com/hyperpolymath/idaptik/blob/main/migration/main/LESSONS.md),
of which the line-based scanner reliably detects four:
[idaptik Wave 3 pilot](https://github.com/hyperpolymath/idaptik/blob/main/migration/main/LESSONS.md):

| Tag | Detection | AffineScript answer |
| Tag | Detection (walker, default) | AffineScript answer |
|---|---|---|
| `side-effect-import` | `let _ = Mod.foo` at top level | Explicit registration call |
| `raw-js` | `%raw(...)` or `[%bs.raw ...]` | Typed extern (`ABI-FFI-README.md`) |
| `untyped-exception` | `Promise.catch`, `Js.Exn`, `raise`, `try` | `Result[E, A]` / `Validation[E, A]` |
| `mutable-global` | `:=` operator | Affine record threaded through |
| `side-effect-import` | `let _ = Mod.foo` at module top level (structural — not nested inside a function body) | Explicit registration call |
| `raw-js` | `extension_expression` node — any `%name(...)` or `[%bs.name ...]` | Typed extern (`ABI-FFI-README.md`) |
| `untyped-exception` | `try_expression`, `raise(...)` call, `Js.Exn.*` reference, `Promise.catch` member access | `Result[E, A]` / `Validation[E, A]` |
| `mutable-global` | Top-level `let x = ref(...)` (call-of-`ref` body) OR top-level `mutation_expression` (`x := y`) | Affine record threaded through |
| `inline-callback-record` | ≥ 3 inline `function` values in one `record` literal OR one call's `arguments` list (via `labeled_argument` or direct) | Row-polymorphic handler record (LESSONS.md §callback-record) |
| `oversized-function` | `function` node whose row span exceeds 50 source lines | Re-decompose before porting; do not transliterate |

Deferred to Phase 2 (need real AST):

- **inline lambda callback record** — N ≥ 3 `~handler: (...) =>` lambdas
inside one record literal (collapse to a row-polymorphic record).
- **oversized function** — function body > ~50 LOC (decompose).

### Walker coverage (Phase 2)
### Walker vs scanner coverage

| Anti-pattern | Scanner (regex) | Walker (AST) |
|---|---|---|
| `side-effect-import` | ✓ | ✓ Phase 2b |
| `raw-js` | ✓ | Phase 2c |
| `untyped-exception` | ✓ | Phase 2c |
| `mutable-global` | ✓ | Phase 2c |
| inline lambda callback record | — | Phase 2c |
| oversized function | — | Phase 2c |

The walker improves on the regex by being structural: it only reports
`side-effect-import` when `let _ = Mod.value` sits at module top level
(direct child of `source_file` or a `module_declaration` body), not when
the same shape appears inside a function body — where it is normally a
ReScript "discard the return value of a chained call" idiom, not a
module-load side effect.
| `side-effect-import` | ✓ | ✓ (since Phase 2b, #322) |
| `raw-js` | ✓ | ✓ (since Phase 2c) |
| `untyped-exception` | ✓ | ✓ (since Phase 2c) |
| `mutable-global` | ✓ | ✓ (since Phase 2c) |
| `inline-callback-record` | — | ✓ (since Phase 2c, walker-only by construction) |
| `oversized-function` | — | ✓ (since Phase 2c, walker-only by construction) |

The walker improves on the regex by being structural: it reports
`side-effect-import` only when `let _ = Mod.value` sits at module top
level, distinguishes a `try { ... }` expression from the identifier
`try`, only flags `Mutable_global` for module-scoped state (not local
refs inside a function body), and dedupes structurally-overlapping
findings on the same line.

## Why a skeleton and not a transliteration

Expand Down Expand Up @@ -119,18 +114,27 @@ to tree-sitter in Phase 2 behind something that already pays its way.

### Phase 2 — tree-sitter AST walker

- Install the pinned grammar from
`editors/tree-sitter-rescript/` (manifest-only vendoring of
`rescript-lang/tree-sitter-rescript@990214a`).
- Replace `Scanner` with a walker over the s-expression output of
`tree-sitter parse --quiet`, parsed by the existing `sexplib0`
dependency.
- Adds the two deferred patterns (callback records, oversized
functions) and unlocks **structural** translation of trivial forms
(e.g. `option<X>` → `Option[X]`, `result<X, Y>` → `Result[Y, X]`,
`switch x { | A => ... }` → `match x { A => ... }`).
- The `Emitter` interface does not change: same skeleton shape, same
marker schema, richer body.
Vendoring of the pinned grammar
(`rescript-lang/tree-sitter-rescript@990214a`) lives in
`editors/tree-sitter-rescript/`; `install.sh` materialises the
parser into `tools/vendor/tree-sitter-rescript/`.

- **Phase 2a (#321)** — `just install-grammar`, the
`migration-assistant` CI job that runs it, dual install path
(`cargo install tree-sitter-cli` or `npm install -g
tree-sitter-cli`).
- **Phase 2b (#322)** — the walker itself: subprocess to the
`tree-sitter` CLI, hand-rolled s-expression parser over the
default `[row, col]`-annotated output, AST-based detection of
`side-effect-import` only.
- **Phase 2c (this revision)** — walker covers all six anti-patterns
including the two that the scanner cannot see; `--engine=walker`
becomes the CLI default. The `Emitter` interface does not change;
the marker schema is the same.

Walker output is deduplicated by `(kind, line)` so structurally-
overlapping AST matches don't inflate the bullet count above what
the line-based scanner would produce on the same file.

### Phase 3 — partial translation

Expand Down Expand Up @@ -166,9 +170,12 @@ cd tools/res-to-affine/test
```

The fixture under `test/fixtures/sample.res` is synthetic and exercises
every Phase-1 anti-pattern. Real `.res` files from the estate (e.g.
`gitbot-fleet/bots/sustainabot/bot-integration/src/*.res`) can be run
ad hoc through the CLI without changes to the test suite.
every Phase-1 anti-pattern; `test/fixtures/phase2c.res` exercises the
two anti-patterns that are walker-only by construction
(`inline-callback-record`, `oversized-function`). Real `.res` files
from the estate (e.g. `gitbot-fleet/bots/sustainabot/bot-integration/
src/*.res`) can be run ad hoc through the CLI without changes to the
test suite.

## Non-goals

Expand Down
6 changes: 3 additions & 3 deletions tools/res-to-affine/main.ml
Original file line number Diff line number Diff line change
Expand Up @@ -87,16 +87,16 @@ let output_arg =

let engine_arg =
let doc =
"Detection engine: 'scanner' (default, line-regex, Phase 1) or \
'walker' (tree-sitter AST, Phase 2). The walker requires the \
"Detection engine: 'walker' (default, tree-sitter AST, Phase 2c) \
or 'scanner' (line-regex, Phase 1). The walker requires the \
vendored grammar to be built — see `just install-grammar`. \
Falls back to 'scanner' if the grammar is missing or \
tree-sitter parse fails."
in
Cmdliner.Arg.(
value & opt
(enum ["scanner", Scanner_engine; "walker", Walker_engine])
Scanner_engine &
Walker_engine &
info ["engine"] ~docv:"ENGINE" ~doc)

let grammar_dir_arg =
Expand Down
20 changes: 16 additions & 4 deletions tools/res-to-affine/scanner.ml
Original file line number Diff line number Diff line change
Expand Up @@ -6,12 +6,16 @@ type kind =
| Raw_js
| Untyped_exception
| Mutable_global
| Inline_callback_record
| Oversized_function

let kind_to_label = function
| Side_effect_import -> "side-effect-import"
| Raw_js -> "raw-js"
| Untyped_exception -> "untyped-exception"
| Mutable_global -> "mutable-global"
| Side_effect_import -> "side-effect-import"
| Raw_js -> "raw-js"
| Untyped_exception -> "untyped-exception"
| Mutable_global -> "mutable-global"
| Inline_callback_record -> "inline-callback-record"
| Oversized_function -> "oversized-function"

let kind_to_guidance = function
| Side_effect_import ->
Expand All @@ -29,6 +33,14 @@ let kind_to_guidance = function
"Top-level mutable global. AffineScript does not encourage \
module-scoped mutation; pass state as an affine record or \
through an effect handler."
| Inline_callback_record ->
"3+ inline callbacks in one record/call. Lift to a row-polymorphic \
handler record (see LESSONS.md §callback-record) so each handler \
is named and individually overridable."
| Oversized_function ->
"Function spans >50 source lines. Re-decompose before porting; a \
direct translation will preserve the size and the implicit \
contract it bakes in."

type finding = {
kind : kind;
Expand Down
11 changes: 7 additions & 4 deletions tools/res-to-affine/scanner.mli
Original file line number Diff line number Diff line change
Expand Up @@ -10,21 +10,24 @@
using the vendored [editors/tree-sitter-rescript] grammar. The
[Emitter] interface is stable across the two implementations.

Patterns detected today:
Patterns detected today by the Phase-1 scanner (line/regex):
- {b side-effect import} : [let _ = Mod.foo] (ReScript module-load hack)
- {b raw JS} : any line containing [%raw] (typed FFI required)
- {b untyped exception} : [Promise.catch], [Js.Exn], [raise], [try]
- {b mutable global} : top-level [ref] or [:=] assignment

Deferred to Phase 2 (need real AST):
- {b inline lambda callback record} : N>=3 [~handler: (...) =>] in a record
- {b oversized function} : function body >50 LOC *)
Additional kinds detected by the Phase-2 walker only (need real AST):
- {b inline callback record} : N>=3 inline function values in a record
literal or a single call's labelled-argument list
- {b oversized function} : function spans more than 50 source lines *)

type kind =
| Side_effect_import
| Raw_js
| Untyped_exception
| Mutable_global
| Inline_callback_record
| Oversized_function

val kind_to_label : kind -> string
(** Short tag used in emitted comment markers (e.g. ["side-effect-import"]). *)
Expand Down
83 changes: 83 additions & 0 deletions tools/res-to-affine/test/fixtures/phase2c.res
Original file line number Diff line number Diff line change
@@ -0,0 +1,83 @@
// SPDX-License-Identifier: MIT
// Synthetic fixture for the two anti-patterns that were explicitly
// deferred from Phase 1 entirely because they need real AST:
// 1. inline-callback-record — 3+ inline function values in a single
// record literal or a single call's labelled-argument list
// 2. oversized-function — function spans more than 50 source rows
// Walker-only by construction; the line-regex scanner does not detect
// either pattern.

open Types

// --- inline-callback-record: a record literal with 4 inline lambdas
let handlers = {
onMount: () => Js.log("mounted"),
onUnmount: () => Js.log("unmounted"),
onClick: e => Js.log(e),
onHover: e => Js.log(e),
}

// --- inline-callback-record at a call site: 3 labelled-argument lambdas
let _ = Widget.make(
~onMount=() => Js.log("mounted"),
~onUnmount=() => Js.log("unmounted"),
~onClick=e => Js.log(e),
)

// --- oversized-function: 60 source-row span. Body intentionally tedious
// to surface the row-span proxy the walker uses.
let huge = id => {
let a01 = id + 1
let a02 = a01 + 1
let a03 = a02 + 1
let a04 = a03 + 1
let a05 = a04 + 1
let a06 = a05 + 1
let a07 = a06 + 1
let a08 = a07 + 1
let a09 = a08 + 1
let a10 = a09 + 1
let a11 = a10 + 1
let a12 = a11 + 1
let a13 = a12 + 1
let a14 = a13 + 1
let a15 = a14 + 1
let a16 = a15 + 1
let a17 = a16 + 1
let a18 = a17 + 1
let a19 = a18 + 1
let a20 = a19 + 1
let a21 = a20 + 1
let a22 = a21 + 1
let a23 = a22 + 1
let a24 = a23 + 1
let a25 = a24 + 1
let a26 = a25 + 1
let a27 = a26 + 1
let a28 = a27 + 1
let a29 = a28 + 1
let a30 = a29 + 1
let a31 = a30 + 1
let a32 = a31 + 1
let a33 = a32 + 1
let a34 = a33 + 1
let a35 = a34 + 1
let a36 = a35 + 1
let a37 = a36 + 1
let a38 = a37 + 1
let a39 = a38 + 1
let a40 = a39 + 1
let a41 = a40 + 1
let a42 = a41 + 1
let a43 = a42 + 1
let a44 = a43 + 1
let a45 = a44 + 1
let a46 = a45 + 1
let a47 = a46 + 1
let a48 = a47 + 1
let a49 = a48 + 1
let a50 = a49 + 1
let a51 = a50 + 1
let a52 = a51 + 1
a52
}
Loading
Loading