Normalize CLI v0.3.2

Released by github-actions on 10 May at 21:57.

Fixed

  • musl release binary now runs on NixOS and non-FHS distros. The bundled
    runtime/libgcc_s.so.1 was previously Ubuntu's glibc-linked copy, which
    depends on ld-linux-x86-64.so.2 — absent on NixOS. The release workflow
    now sources libgcc_s.so.1 from Alpine Linux (a musl-based distro), so the
    library depends on libc.so (musl) instead. The zig/cargo-zigbuild approach
    introduced to work around this is reverted; the build uses musl-gcc again.

Changed

  • normalize context --help now shows a comprehensive inline reference. The help output
    covers the frontmatter format, --match dot-path syntax, --stdin/--prefix JSON injection,
    --file structured file loading, and examples. Previously the description was a one-liner.

Fixed

  • normalize context --help no longer shows a duplicate context subcommand.
    The default action method was named context (same as the parent service), causing
    normalize context context to appear in help output. The method is now hidden from
    the subcommand list (#[cli(hidden)]); normalize context continues to work as
    the default action with all flags hoisted to the parent command.

  • cargo xtask build-grammars --cc "zig cc -target x86_64-linux-musl" now works.
    The --cc argument is now split on whitespace, so compound compiler commands such as
    zig cc -target x86_64-linux-musl are parsed into a program plus its arguments.
    Previously Command::new was called with the entire string as the binary name, causing
    "No such file or directory" for every grammar. zig's lld linker also requires
    --allow-shlib-undefined instead of --unresolved-symbols=ignore-in-shared-libs; the
    xtask now detects zig cc and emits the correct flag.
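
    A minimal sketch of the --cc handling described above, assuming the xtask builds
    its compiler invocation with std::process::Command. The helper names split_cc,
    compiler_command, and allow_undefined_flag are hypothetical, and how the chosen
    flag is forwarded to the linker (for example via -Wl,) is not shown.

    ```rust
    use std::path::Path;
    use std::process::Command;

    /// Split a --cc value such as "zig cc -target x86_64-linux-musl" into the
    /// program to run and the arguments that must precede every invocation.
    fn split_cc(cc: &str) -> Option<(String, Vec<String>)> {
        let mut parts = cc.split_whitespace().map(str::to_owned);
        let program = parts.next()?; // None for an empty or all-whitespace value
        Some((program, parts.collect()))
    }

    /// Build a Command for the compiler; per-grammar arguments would be appended later.
    fn compiler_command(cc: &str) -> Option<Command> {
        let (program, args) = split_cc(cc)?;
        let mut cmd = Command::new(program);
        cmd.args(args);
        Some(cmd)
    }

    /// zig's lld rejects --unresolved-symbols=ignore-in-shared-libs, so pick the
    /// "allow undefined symbols in shared libraries" flag per toolchain.
    fn allow_undefined_flag(program: &str, args: &[String]) -> &'static str {
        let is_zig = Path::new(program).file_name().map_or(false, |n| n == "zig")
            && args.first().map(String::as_str) == Some("cc");
        if is_zig {
            "--allow-shlib-undefined"
        } else {
            "--unresolved-symbols=ignore-in-shared-libs"
        }
    }
    ```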

  • Grammar ABI mismatch after normalize update. ensure_grammars_first_use now
    reads the .installed-version stamp and compares it against the running binary's
    version. If they differ (e.g. after a self-update), the stamp is deleted and grammars
    are re-downloaded for the current binary before any command runs. Previously, an
    existing stamp caused the check to short-circuit unconditionally, leaving stale 0.2.x
    .so files loaded by a 0.3.x binary.
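
    A sketch of the version-stamp comparison, assuming the .installed-version file
    holds only the version string written at install time. grammars_need_refresh is a
    hypothetical name; in the binary, current_version would typically come from
    env!("CARGO_PKG_VERSION"), and the caller would re-download grammars when it
    returns true.

    ```rust
    use std::fs;
    use std::io;
    use std::path::Path;

    /// Returns true when grammars must be (re)downloaded: the stamp is missing,
    /// or it records a different version than the running binary.
    /// A mismatched stamp is deleted before returning.
    fn grammars_need_refresh(stamp: &Path, current_version: &str) -> io::Result<bool> {
        match fs::read_to_string(stamp) {
            // Stamp matches this binary: grammars are compatible, nothing to do.
            Ok(installed) if installed.trim() == current_version => Ok(false),
            // Stamp from another version (e.g. after a self-update): invalidate it.
            Ok(_) => {
                fs::remove_file(stamp)?;
                Ok(true)
            }
            // No stamp yet: first use, grammars must be fetched.
            Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(true),
            Err(e) => Err(e),
        }
    }
    ```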

  • normalize update now invalidates the grammar stamp immediately after replacing
    the binary, so the next invocation triggers a grammar re-download even if the process
    exits before ensure_grammars_first_use runs.

  • Friendly error for removed [embeddings] config key. Loading .normalize/config.toml
    now pre-checks for [embeddings] (removed in 0.3.0) and exits with a clear migration
    message instead of a generic parse error.
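
    A sketch of a pre-parse check for the removed key. It is a simplified line scan:
    it does not recognise spaced headers such as [ embeddings ], array tables, or
    dotted keys such as embeddings.model = ..., and both function names and the
    message text are hypothetical.

    ```rust
    /// True if the config text declares an [embeddings] table (or a subtable of it).
    fn has_removed_embeddings_section(config: &str) -> bool {
        config.lines().any(|line| {
            // Drop a trailing comment, then compare the bare header.
            let header = line.split('#').next().unwrap_or("").trim();
            header == "[embeddings]" || header.starts_with("[embeddings.")
        })
    }

    /// Run before full TOML parsing so users get a migration hint, not a parse error.
    fn check_config(config: &str) -> Result<(), String> {
        if has_removed_embeddings_section(config) {
            return Err(
                "[embeddings] was removed in normalize 0.3.0; delete it from .normalize/config.toml"
                    .to_string(),
            );
        }
        Ok(())
    }
    ```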

Added

  • normalize edit extract-function <file> --lines <start>-<end> --name <name> [--apply] command. Extracts a line range from a function into a new function using CFG liveness analysis. Infers parameters (variables live into the region from outside) and return values (variables defined inside the region and live after it) via backward-dataflow fixed-point over the facts index. Checks cfg_effects for async, generator, defer, and acquire/release semantics; emits warnings for defer crossing boundary, unbalanced resource lifetime, and escaping exception edges. Generates language-appropriate source for Rust, Python, Go, TypeScript/JavaScript, and Java. Default is dry-run; --apply writes the changes. Requires normalize structure rebuild.
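
    The parameter and return-value inference can be illustrated on straight-line code:
    parameters are the region's upward-exposed uses (variables read before any write
    inside the region), and return values are variables the region defines that are
    still live afterwards. This sketch ignores branches, loops, and the facts index
    that the real command works over; Stmt, live_before, and infer_signature are
    hypothetical names.

    ```rust
    use std::collections::BTreeSet;

    /// One statement's variable effects: names written and names read.
    struct Stmt<'a> {
        defs: Vec<&'a str>,
        uses: Vec<&'a str>,
    }

    /// Backward liveness over straight-line statements: the set live before
    /// `stmts`, given the set live after them.
    fn live_before<'a>(stmts: &[Stmt<'a>], live_after: &BTreeSet<&'a str>) -> BTreeSet<&'a str> {
        let mut live = live_after.clone();
        for s in stmts.iter().rev() {
            for d in &s.defs {
                live.remove(d); // a write kills liveness above it
            }
            for u in &s.uses {
                live.insert(*u); // a read makes the name live above it
            }
        }
        live
    }

    /// Infer (parameters, return values) for extracting stmts[start..end].
    fn infer_signature<'a>(
        stmts: &[Stmt<'a>],
        start: usize,
        end: usize,
    ) -> (BTreeSet<&'a str>, BTreeSet<&'a str>) {
        let region = &stmts[start..end];
        // Variables still needed by the code after the region.
        let live_after = live_before(&stmts[end..], &BTreeSet::new());
        // Upward-exposed uses of the region: values that must flow in.
        let params = live_before(region, &BTreeSet::new());
        // Variables written in the region that the rest of the function reads.
        let defined: BTreeSet<&'a str> = region.iter().flat_map(|s| s.defs.iter().copied()).collect();
        let returns: BTreeSet<&'a str> = defined.intersection(&live_after).copied().collect();
        (params, returns)
    }
    ```

    For a = 1; b = a + 2; c = b * a; print(c), extracting the middle two statements
    yields one parameter (a) and one return value (c); b stays local to the new function.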

  • CFG Phase 4: type-refined exception flow. Edge now carries exception_type: Option<String> for EdgeKind::Exception edges (None = conservative/unknown; Some("T") = typed). The CFG builder captures @cfg.exit.throw.type and @cfg.try.catch.type from .cfg.scm queries to emit typed exception edges. Exception edges in build_try are emitted per catch type; ExitThrow edges carry the thrown type.
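
    A sketch of the edge shape described here, with a filter equivalent in spirit to
    the Cfg::throw_edges() helper listed later in these notes. The EdgeKind variants
    follow the lists in this release; the from/to field names and the Cfg layout are
    assumptions.

    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum EdgeKind {
        Fallthrough,
        ConditionalTrue,
        ConditionalFalse,
        BackEdge,
        Break,
        Continue,
        Return,
        Exception,
        Suspend,
        Resume,
    }

    #[derive(Debug, Clone)]
    struct Edge {
        from: usize,
        to: usize,
        kind: EdgeKind,
        /// Only meaningful for Exception edges: None = conservative/unknown, Some(T) = typed.
        exception_type: Option<String>,
    }

    struct Cfg {
        edges: Vec<Edge>,
    }

    impl Cfg {
        /// All exception edges, typed or not.
        fn throw_edges(&self) -> impl Iterator<Item = &Edge> + '_ {
            self.edges.iter().filter(|e| e.kind == EdgeKind::Exception)
        }
    }
    ```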

  • Exception type captures in 5 languages. Java: thrown type from object_creation_expression.type; catch type from catch_formal_parameter/catch_type/type_identifier (handles multi-catch IOException | SQLException). Python: thrown type from raise_statement/call.function; catch type from except_clause/identifier (single) and except_clause/as_pattern/tuple/identifier (multi). JavaScript/TypeScript/TSX: thrown type from throw_statement/new_expression.constructor; catch clauses are untyped (catches all). C++: thrown type from throw_statement/call_expression.function (identifier or qualified_identifier); catch type from catch_clause/parameter_list/parameter_declaration.type. C#: thrown type from throw_statement/object_creation_expression.type; catch type from catch_clause/catch_declaration.type.

  • exception_type column in cfg_edges SQLite table. Nullable TEXT column added to cfg_edges. Schema version bumped to 15. Both refresh_call_graph and reindex_files paths updated.

  • CfgEdgeFact.exception_type field. cfg_edge Datalog preamble relation extended to 7 fields: cfg_edge(file, func, func_line, from, to, kind, exception_type). liveness.dl updated to use the 7-field form (_, _ for the two new wildcards). relations.add_cfg_edge gains the exception_type parameter. Both facts.rs and runner.rs now load all_cfg_edges() from the index into Relations.

  • exception_flow.dl builtin Datalog rule. Derives exception_reaches(file, func, func_line, throw_block, catch_block, type), unhandled_exception(file, func, func_line, throw_block, type), and can_throw(file, func, func_line). Disabled by default; enable with normalize rules enable exception_flow.

  • Mermaid renderer shows exception type on edges. Exception edges now render as b3 -->|"exception: IOException"| b5 when a type is known, and b3 -->|"exception"| exit when conservative.
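
    A sketch of the label formatting that reproduces the two edge forms quoted above;
    the function name is hypothetical.

    ```rust
    /// Render one exception edge as a Mermaid flowchart line, labelling it with the
    /// exception type when the CFG builder captured one.
    fn mermaid_exception_edge(from: &str, to: &str, exception_type: Option<&str>) -> String {
        match exception_type {
            Some(ty) => format!("{from} -->|\"exception: {ty}\"| {to}"),
            None => format!("{from} -->|\"exception\"| {to}"), // conservative edge
        }
    }
    ```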

  • normalize analyze exceptions <file> [--function <name>] command. Reports throw sites with their exception type and the catch clauses they route to. Flags unhandled throws (escaping to function exit). Shows catch clauses with types and handled-throw counts (including "dead catch?" annotation for clauses that handle 0 throws). Requires normalize structure rebuild.
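
    A simplified sketch of the report logic, assuming each throw's exception edge
    targets either a catch block or the function exit directly (nested try blocks and
    finally clauses are ignored). Throw, Report, analyze, and dead_catches are
    hypothetical names.

    ```rust
    use std::collections::BTreeMap;

    /// A throw site: the throwing block, where its exception edge lands, and its type.
    struct Throw<'a> {
        block: usize,
        target: usize,
        ty: Option<&'a str>,
    }

    struct Report<'a> {
        /// Throws whose exception edge escapes to the function exit.
        unhandled: Vec<(usize, Option<&'a str>)>,
        /// Catch block -> number of throws routed to it.
        handled_per_catch: BTreeMap<usize, usize>,
    }

    fn analyze<'a>(throws: &[Throw<'a>], catch_blocks: &[usize], exit: usize) -> Report<'a> {
        let mut handled_per_catch: BTreeMap<usize, usize> =
            catch_blocks.iter().map(|&c| (c, 0)).collect();
        let mut unhandled = Vec::new();
        for t in throws {
            if t.target == exit {
                unhandled.push((t.block, t.ty));
            } else if let Some(count) = handled_per_catch.get_mut(&t.target) {
                *count += 1;
            }
        }
        Report { unhandled, handled_per_catch }
    }

    /// Catch clauses that no throw reaches (the "dead catch?" case).
    fn dead_catches(report: &Report<'_>) -> Vec<usize> {
        report
            .handled_per_catch
            .iter()
            .filter(|&(_, &n)| n == 0)
            .map(|(&c, _)| c)
            .collect()
    }
    ```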

  • Cfg::throw_edges() helper. Returns an iterator over all EdgeKind::Exception edges in the CFG.

  • CFG Phase 3: effects tracking. BasicBlock now carries effects: Vec<Effect>. New EffectKind enum: Await, Defer, Yield, Acquire, Release, Send, Receive. New BlockKind variants: Deferred, Acquire, Release. New EdgeKind variants: Suspend, Resume. The builder collects @cfg.effect.* captures from .cfg.scm queries and assigns them to the enclosing block.

  • Effect queries for Rust, Python, TypeScript, JavaScript, Go. @cfg.effect.await on await_expression (Rust, TS, JS) and (await) (Python). @cfg.effect.yield on yield_expression (TS, JS) and (yield) (Python). @cfg.effect.acquire on with_statement (Python). Go effect queries: @cfg.effect.defer on defer_statement, @cfg.effect.send on go_statement and send_statement, @cfg.effect.receive on unary <- expressions.

  • cfg_effects SQLite table. New table in the structural index: cfg_effects (file, function_qname, function_start_line, block_id, kind, byte_offset, line, label). Populated by normalize structure rebuild. Schema version bumped to 14.

  • CfgEffectFact Datalog relation. New cfg_effect(file, func, func_line, block, kind, line, label) relation in normalize-facts-rules-api, exposed in the Datalog preamble and loaded in the rules run and facts pipelines.

  • effects.dl builtin rule. Derives async_function, defer_function, generator_function, resource_acquire, resource_leak from cfg_effect. Disabled by default; enable with normalize rules enable effects.
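
    A sketch of per-function classification from the collected effect kinds. The
    unbalanced-resource check (different numbers of Acquire and Release effects) is a
    hypothetical heuristic, not necessarily how effects.dl defines resource_leak; for
    example, a Python with statement is captured only as an acquire.

    ```rust
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    enum EffectKind {
        Await,
        Defer,
        Yield,
        Acquire,
        Release,
        Send,
        Receive,
    }

    #[derive(Debug, PartialEq)]
    struct FunctionEffects {
        is_async: bool,
        is_generator: bool,
        has_defer: bool,
        unbalanced_resources: bool,
    }

    fn classify(effects: &[EffectKind]) -> FunctionEffects {
        // Count how many effects of one kind the function's blocks carry.
        let count = |kind: EffectKind| effects.iter().filter(|&&e| e == kind).count();
        FunctionEffects {
            is_async: count(EffectKind::Await) > 0,
            is_generator: count(EffectKind::Yield) > 0,
            has_defer: count(EffectKind::Defer) > 0,
            unbalanced_resources: count(EffectKind::Acquire) != count(EffectKind::Release),
        }
    }
    ```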

  • normalize analyze effects <file> [--function <name>] command. Reports suspension points, deferred calls, yields, resource acquisitions, and channel operations for functions in a file. Requires normalize structure rebuild.

  • normalize analyze liveness <file> --function <name> command. Computes live-in and live-out variable sets per basic block using standard backward-dataflow liveness analysis. Requires the structural index (normalize structure rebuild). Returns a LivenessReport with per-block BlockLiveness entries showing which variables are live at block entry and exit.
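
    A sketch of the standard backward-dataflow fixed point, iterating
    live_out[b] = ∪ live_in[s] over successors s and
    live_in[b] = uses[b] ∪ (live_out[b] − defs[b]) until no set changes. Block is a
    hypothetical stand-in for the per-block def/use sets, and uses are assumed to be
    upward-exposed (read before any write in the block).

    ```rust
    use std::collections::BTreeSet;

    struct Block<'a> {
        uses: BTreeSet<&'a str>,
        defs: BTreeSet<&'a str>,
        succs: Vec<usize>,
    }

    /// Returns (live_in, live_out), indexed by block id.
    fn liveness<'a>(blocks: &[Block<'a>]) -> (Vec<BTreeSet<&'a str>>, Vec<BTreeSet<&'a str>>) {
        let n = blocks.len();
        let mut live_in = vec![BTreeSet::new(); n];
        let mut live_out = vec![BTreeSet::new(); n];
        let mut changed = true;
        while changed {
            changed = false;
            // Reverse order converges faster for a backward analysis.
            for b in (0..n).rev() {
                let out: BTreeSet<&'a str> = blocks[b]
                    .succs
                    .iter()
                    .flat_map(|&s| live_in[s].iter().copied())
                    .collect();
                let inn: BTreeSet<&'a str> = blocks[b]
                    .uses
                    .iter()
                    .copied()
                    .chain(out.difference(&blocks[b].defs).copied())
                    .collect();
                if out != live_out[b] || inn != live_in[b] {
                    live_out[b] = out;
                    live_in[b] = inn;
                    changed = true;
                }
            }
        }
        (live_in, live_out)
    }
    ```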

  • CFG Phase 2: def/use captures. BasicBlock now carries defs: Vec<DefSite> and uses: Vec<UseSite>. The builder recognises @cfg.def/@cfg.def.name and @cfg.use/@cfg.use.name captures from .cfg.scm queries and assigns them to the enclosing block. Rust, Python, and Go .cfg.scm files updated with variable definition captures.

  • CFG SQLite persistence. Four new tables in the structural index: cfg_blocks, cfg_edges, cfg_defs, cfg_uses. Populated by normalize structure rebuild for all files whose language has a .cfg.scm query. Schema version bumped to 13.

  • Datalog CFG relations. Four new relations in normalize-facts-rules-api: cfg_block(file, func, func_line, block, kind), cfg_edge(file, func, func_line, from, to, kind), cfg_def(file, func, func_line, block, name), cfg_use(file, func, func_line, block, name). Available in .dl rule files via the preamble.

  • liveness.dl builtin rule. Registers standard backward-dataflow liveness (live_in/live_out derived relations) as a disabled-by-default built-in Datalog rule. Enable with normalize rules enable liveness.

  • normalize cfg command. New normalize-cfg crate with a control flow graph builder and Mermaid renderer. normalize cfg <file> [-f <function>] builds a CFG from any file with a supported .cfg.scm query and renders it as a Mermaid flowchart TD. Data model: Cfg, BasicBlock (with BlockKind: Entry, Exit, Statement, Branch, LoopHead, LoopBody, LoopExit, Catch, Unreachable), Edge (with EdgeKind: Fallthrough, ConditionalTrue, ConditionalFalse, BackEdge, Break, Continue, Return, Exception). Rust, Python, and Go have bundled queries.

  • GrammarLoader::get_cfg(name) in normalize-languages. Parallel to get_complexity, loads .cfg.scm query files from the grammar search path with fallback to bundled queries.

  • Rust CFG query (rust.cfg.scm). Captures if_expression (branch + condition/then/else), match_expression (match + arms), while_expression/for_expression (loop + condition/body), loop_expression (unconditional loop + body), return_expression, break_expression, continue_expression, and panic!/todo!/unreachable! macros (throw). Snapshot tests for 6 fixtures: linear, branch, loop, nested, early_return, match.

  • Python CFG query (python.cfg.scm). Captures if_statement (branch + condition/then/else), match_statement/case_clause (Python 3.10+), for_statement/while_statement (loop + condition/body), try_statement/except_clause/finally_clause, return_statement, break_statement, continue_statement, raise_statement. Snapshot tests for 4 fixtures: linear, branch, loop, early_return.

  • Go CFG query (go.cfg.scm). Captures if_statement (branch + then/else), expression_switch_statement/expression_case_clause (match), for_statement (loop + body; covers range, condition, and unconditional forms), return_statement, break_statement, continue_statement. Snapshot tests for 4 fixtures (skipped gracefully when Go grammar not installed).

  • TypeScript and TSX CFG queries (typescript.cfg.scm, tsx.cfg.scm). Captures if_statement/else_clause (branch), switch_statement/switch_case/switch_default (match), for_statement/for_in_statement/while_statement/do_statement (loop), try_statement/catch_clause/finally_clause, return_statement, break_statement, continue_statement, throw_statement. Verified against arborium-typescript and arborium-tsx node-types.json. Snapshot tests for 6 fixtures: linear, branch, loop, early_return, try_catch, switch.

  • JavaScript CFG query (javascript.cfg.scm). Uses the same control-flow grammar as TypeScript (shared arborium base), with the same captures; verified against arborium-javascript node-types.json. Snapshot tests for 4 fixtures: linear, branch, loop, early_return.

  • Java CFG query (java.cfg.scm). Captures if_statement (branch), switch_expression/switch_block_statement_group/switch_rule (match — covers both statement and expression form), for_statement/enhanced_for_statement/while_statement/do_statement (loop), try_statement/try_with_resources_statement/catch_clause/finally_clause, return_statement, break_statement, continue_statement, throw_statement. Labeled break/continue are captured as exits; label resolution deferred. Verified against arborium-java node-types.json. Snapshot tests for 5 fixtures (skipped gracefully when Java grammar not installed).

  • CFG coverage matrix test. New coverage_matrix.rs test in normalize-cfg/tests/ classifies all registered languages as HAS_CFG (rust, python, go, typescript, tsx, javascript, java), NOT_APPLICABLE (data/markup/config formats: json, yaml, toml, xml, html, css, scss, graphql, sql, and others), or DEFERRED (languages with control flow but no query yet). The cfg_has_cfg_languages_return_some test asserts all HAS_CFG grammar names return Some from GrammarLoader::get_cfg.

  • CFG Phase 1: batch queries for 69 additional languages. Added .cfg.scm query files for all DEFERRED languages: C-family (C, C++, ObjC, C#, Kotlin, Swift, Dart), JVM/functional (Scala, Groovy, VB, Haskell, OCaml, F#, Elixir, Erlang, Clojure, Gleam, ReScript, Idris, Agda, Lean, CommonLisp, Scheme, Elisp), scripting (Ruby, Lua, PHP, Perl, Bash, Fish, Awk, Zsh, PowerShell, Batch, Vim), systems/scientific (Zig, Ada, D, Prolog, R, Julia, MATLAB, GLSL, HLSL, Verilog, VHDL), and domain/config (Nix, HCL, Starlark, Elm, Jinja2, Svelte, Vue, CMake, Meson, TLA+, jq). Moved dockerfile and query (tree-sitter query language) to NOT_APPLICABLE. Remaining DEFERRED: asm, x86asm (assembly — need grammar inspection), uiua (array language). Snapshot tests added for Lua and Jinja2 (grammars installed); all other tests skip gracefully when grammar not installed. Coverage matrix cfg_has_cfg_languages_return_some now verifies 76 languages.

  • ModuleResolver for 7 additional languages. Elm (elm.json source-directories, module name → file path), Nix (relative ./path.nix resolution; <nixpkgs> → NotFound), R (source("./file.R") relative load; library(pkg) → NotFound), Julia (include("file.jl") relative include + workspace Project.toml package lookup), MATLAB (filename stem = function name; searches workspace root + src/ + lib/), Prolog (relative use_module + bare name search in workspace root; library(...) → NotFound), D (dub.json sourcePaths, mypackage.utils → mypackage/utils.d). Resolver matrix test updated.

  • ModuleResolver for 20 additional languages. JVM languages: Java, Kotlin, Groovy, Scala (Maven/Gradle src/main/<lang> conventions). .NET languages: C#, VB, F# (namespace→file path mapping). Swift (SPM Sources/<target> directory targets). Dart (pubspec.yaml package: import resolution). Zig (@import relative path resolution). Elixir (Mix lib/ CamelCase↔snake_case). Erlang (1:1 module=file). Haskell (Cabal hs-source-dirs). OCaml (capitalized stem convention). Lua (require dot-path). PHP (composer.json PSR-4 autoload). Perl (lib/ :: path). Clojure (src/ dot-namespace). Common Lisp (workspace stem). Scheme (R7RS .sld/.scm). Gleam (gleam.toml src/). ReScript (bsconfig.json sources).
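
    A sketch of the Mix naming convention for Elixir, mapping a module alias to a path
    under lib/. The helpers are hypothetical, and the naive snake-casing splits
    acronyms letter by letter (HTTPClient would become h_t_t_p_client), unlike
    Elixir's own Macro.underscore.

    ```rust
    /// Convert one alias segment, e.g. "UserController" -> "user_controller".
    fn snake_case(segment: &str) -> String {
        let mut out = String::new();
        for (i, ch) in segment.chars().enumerate() {
            if ch.is_ascii_uppercase() {
                if i > 0 {
                    out.push('_'); // word boundary before each inner capital
                }
                out.push(ch.to_ascii_lowercase());
            } else {
                out.push(ch);
            }
        }
        out
    }

    /// Map "MyApp.Accounts.User" to "lib/my_app/accounts/user.ex".
    fn elixir_module_path(module: &str) -> String {
        let parts: Vec<String> = module.split('.').map(snake_case).collect();
        format!("lib/{}.ex", parts.join("/"))
    }
    ```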

  • Phase 0 scaffold: cross-file name resolution infrastructure. New Datalog predicates in normalize-facts-rules-api: resolved_import, module, export, reexport, symbol_use, resolved_reference, resolved_call, module_search_path. New ModuleResolver trait in normalize-languages::traits for per-language import resolution, with supporting types ImportSpec, ModuleId, Resolution, ResolverConfig. New crate normalize-module-resolve re-exporting the trait and types. New resolution.dl Datalog rules deriving resolved_reference and resolved_call (disabled by default, requires normalize structure rebuild).

  • Rust ModuleResolver (RustModuleResolver). Resolves use/mod import specifiers to file paths within Cargo workspaces. Handles workspace_config (parses Cargo.toml for workspace members and crate names), module_of_file (derives canonical module path from file's position under src/), and resolve (maps crate::module::name to the .rs file, handles super::/self:: relative paths, returns NotFound for stdlib and external crates). Tested with a 3-file fixture in normalize-refactor/tests/.
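
    A sketch of turning a use path into candidate files under src/, assuming the
    importing file's own module path is already known (the role module_of_file plays
    above). Every segment is treated as a module; a real resolver also has to handle
    paths whose last segment names an item. The function name and return shape are
    hypothetical.

    ```rust
    use std::path::PathBuf;

    /// Candidate files for a path such as "crate::net::http" or "super::tls".
    /// `current_module` is the importing file's module path, e.g. ["net", "http"]
    /// for src/net/http.rs. Returns None for std and external crates.
    fn rust_module_candidates(spec: &str, current_module: &[&str]) -> Option<Vec<PathBuf>> {
        let mut segs = spec.split("::");
        let mut module: Vec<&str> = match segs.next()? {
            "crate" => Vec::new(),
            "self" => current_module.to_vec(),
            "super" => {
                let mut m = current_module.to_vec();
                m.pop()?; // super:: at the crate root is invalid
                m
            }
            _ => return None, // std, core, or an external crate
        };
        for seg in segs {
            if seg == "super" {
                module.pop()?;
            } else {
                module.push(seg);
            }
        }
        if module.is_empty() {
            return Some(vec![PathBuf::from("src/lib.rs"), PathBuf::from("src/main.rs")]);
        }
        let joined = module.join("/");
        // Both file layouts are legal: src/a/b.rs and src/a/b/mod.rs.
        Some(vec![
            PathBuf::from(format!("src/{joined}.rs")),
            PathBuf::from(format!("src/{joined}/mod.rs")),
        ])
    }
    ```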

  • TypeScript/TSX ModuleResolver (TsModuleResolver). Resolves relative imports (./, ../), tsconfig.json compilerOptions.paths aliases and baseUrl, and .js → .ts extension elision. Returns NotApplicable for non-TS/TSX files, NotFound for node_modules.
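
    A sketch of relative-import candidates with .js → .ts elision: TypeScript sources
    are tried before the literal .js file, and extensionless specifiers fall back to a
    directory index. tsconfig paths/baseUrl handling is omitted, and the candidate
    order and function name are assumptions.

    ```rust
    use std::path::{Path, PathBuf};

    /// Candidate files for a relative import in a .ts/.tsx file.
    /// Returns None for bare specifiers (node_modules), which are not modeled.
    fn ts_relative_candidates(importer_dir: &Path, spec: &str) -> Option<Vec<PathBuf>> {
        if !(spec.starts_with("./") || spec.starts_with("../")) {
            return None;
        }
        let rel = spec.strip_prefix("./").unwrap_or(spec);
        let mut out = Vec::new();
        if let Some(stem) = rel.strip_suffix(".js") {
            // ESM-style import naming the emitted .js file: prefer the TS source.
            for ext in ["ts", "tsx"] {
                out.push(importer_dir.join(format!("{stem}.{ext}")));
            }
            out.push(importer_dir.join(rel));
        } else {
            // Extensionless import: try a sibling file, then a directory index.
            for ext in ["ts", "tsx"] {
                out.push(importer_dir.join(format!("{rel}.{ext}")));
            }
            for ext in ["ts", "tsx"] {
                out.push(importer_dir.join(rel).join(format!("index.{ext}")));
            }
        }
        Some(out)
    }
    ```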

  • JavaScript ModuleResolver (JsModuleResolver). Resolves relative imports (.js, .mjs, /index.js), jsconfig.json compilerOptions.paths and baseUrl. Returns NotFound for bare specifiers (node_modules).

  • Python ModuleResolver (PythonModuleResolver). Resolves relative imports (from . import, from ..pkg import), detects src/ layout, searches workspace root for absolute imports. Returns NotFound for stdlib/third-party.

  • Go ModuleResolver (GoModuleResolver). Parses go.mod to extract the module path; resolves import paths to directory targets within the module. Returns NotFound for stdlib and third-party packages.
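
    A sketch of the go.mod step, assuming the module directive appears on its own line;
    trailing comments and other directives are ignored. Both function names are
    hypothetical.

    ```rust
    /// Extract the module path from go.mod contents (the `module` directive).
    fn go_module_path(go_mod: &str) -> Option<&str> {
        go_mod.lines().find_map(|line| {
            let rest = line.trim().strip_prefix("module")?;
            // Require whitespace after the keyword so e.g. "modulex" doesn't match.
            if !rest.starts_with(char::is_whitespace) {
                return None;
            }
            Some(rest.trim().trim_matches('"'))
        })
    }

    /// Map an import path to a directory relative to the module root, or None if it
    /// lies outside this module (stdlib or third-party).
    fn go_import_dir(module: &str, import: &str) -> Option<String> {
        if import == module {
            return Some(".".to_string());
        }
        import.strip_prefix(module)?.strip_prefix('/').map(str::to_string)
    }
    ```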

  • Ruby ModuleResolver (RubyModuleResolver). Resolves require_relative to .rb files relative to the caller. Returns NotFound for bare require (gems not modeled).

  • ModuleResolver pass in structure rebuild pipeline. After the existing resolve_all_imports pass, resolve_imports_via_module_resolver() runs as a second pass, using per-language resolvers to populate resolved_file for any imports that are still unresolved. All three rebuild paths include this pass: full rebuild, incremental update, and single-file update_file.

  • find_references confidence tagging. CallerRef and ImportRef now carry a confidence field ("resolved" | "heuristic"). Results are tagged "resolved" when the definition file's language has a ModuleResolver, and "heuristic" otherwise. Downstream consumers (rename, safe-delete) can filter on this field.


Installation

POSIX shell:
curl -fsSL https://rhi.zone/normalize/install.sh | sh

PowerShell:
irm https://rhi.zone/normalize/install.ps1 | iex

Manual download: pick the archive for your platform from the assets below and verify with SHA256SUMS.txt.