v1.5.0

dannote released this 04 May 10:31
· 7 commits to master since this release

New

  • Guard-aware normalization — in :abstract mode, all calls inside
    when guard clauses are abstracted so that functions differing only in
    guard predicates are detected as clones. Covers Kernel guards, Erlang
    BIFs, defguard macros, and library guards like Integer.is_even/1.
  • Boolean operator canonicalization: &&/||/! are rewritten to
    and/or/not so stylistic choice between short-circuit and keyword
    operators doesn’t prevent clone matching.
  • Sigil ~w expansion: ~w(foo bar)a is expanded to [:foo, :bar]
    so sigil word-lists match their literal equivalents.
  • MinHash-accelerated fuzzy detection — large posting lists (>50
    entries) now use MinHash signatures for O(k) approximate Jaccard instead
    of O(|A|+|B|) exact set operations. Removes the hard posting-list cap,
    improving recall for large monorepos without sacrificing precision.
  • HTML report syntax highlighting via Makeup — proper Elixir
    tokenization with dark/light theme support, replacing the regex-based
    highlighter.
  • Configurable detection tuning — previously hardcoded constants are
    now available as config options and CLI flags:
    • max_window_size (--max-window-size, default: 4) — max consecutive
      sibling functions combined into a single fingerprint for cross-module
      clone detection.
    • mass_tolerance (--mass-tolerance, default: 0.3) — max relative size
      difference for Type-III comparison.
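To illustrate the MinHash item in the list above, here is a minimal sketch of how a fixed-length signature can stand in for exact set operations when estimating Jaccard similarity. It is not ExDNA's implementation: the module name `MinHashSketch`, the signature length of 128, and the use of `:erlang.phash2/2` with a seed tuple as a family of hash functions are all assumptions made for illustration. Inputs are assumed to be non-empty.

```elixir
defmodule MinHashSketch do
  # Hypothetical module for illustration only; not part of ExDNA.

  # :erlang.phash2/2 accepts a range of up to 2^32.
  @hash_range 4_294_967_296

  # For each of k seeded hash functions, keep the minimum hash seen over the set.
  # Assumes `items` is non-empty (Enum.min/1 raises on an empty list).
  def signature(items, k \\ 128) do
    for seed <- 1..k do
      items
      |> Enum.map(fn item -> :erlang.phash2({seed, item}, @hash_range) end)
      |> Enum.min()
    end
  end

  # Estimated Jaccard similarity: the fraction of signature positions that agree.
  # Cost is O(k) regardless of how large the original sets were.
  def estimate(sig_a, sig_b) when length(sig_a) == length(sig_b) do
    matches =
      sig_a
      |> Enum.zip(sig_b)
      |> Enum.count(fn {a, b} -> a == b end)

    matches / length(sig_a)
  end

  # Exact Jaccard similarity for comparison: |A ∩ B| / |A ∪ B|.
  def exact(a, b) do
    set_a = MapSet.new(a)
    set_b = MapSet.new(b)

    MapSet.size(MapSet.intersection(set_a, set_b)) /
      MapSet.size(MapSet.union(set_a, set_b))
  end
end

list_a = Enum.to_list(1..100)
list_b = Enum.to_list(51..150)

# The exact value here is 50/150; the estimate should land near it.
IO.inspect(MinHashSketch.exact(list_a, list_b), label: "exact")

IO.inspect(
  MinHashSketch.estimate(
    MinHashSketch.signature(list_a),
    MinHashSketch.signature(list_b)
  ),
  label: "estimate"
)
```

Signatures can be computed once per posting list and reused across comparisons, which is where the saving over repeated set intersection comes from. A longer signature reduces the variance of the estimate at the cost of more hashing up front.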

Changed

  • ignored_attributes default — derived from
    Module.reserved_attributes/0 instead of a hardcoded list. Picks up
    5 previously missing attributes and stays current with future Elixir
    versions automatically.
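As a rough sketch of what deriving the default from the standard library looks like: `Module.reserved_attributes/0` (available since Elixir 1.12, as far as I know) returns a map keyed by attribute name. The `hardcoded` list below is a made-up stand-in for a hand-maintained list, not the list ExDNA previously shipped.

```elixir
# Module.reserved_attributes/0 returns a map of attribute name => metadata.
reserved =
  Module.reserved_attributes()
  |> Map.keys()
  |> Enum.sort()

# Hypothetical hand-maintained list, for comparison only.
hardcoded = [:doc, :moduledoc, :spec, :type]

# Attributes the standard library reserves that the hardcoded list misses.
missing = reserved -- hardcoded

IO.inspect(missing, label: "not covered by the hardcoded list")
```

Because the list is read from the running Elixir version, attributes reserved in later releases are picked up without a code change.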

Performance

  • Fused normalizer — metadata stripping, boolean canonicalization, sigil
    expansion, pipe normalization, and variable renaming run in a single AST
    walk instead of 4 separate traversals. Ash (572 files) ~14% faster.
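The idea of fusing several rewrites into one traversal can be sketched with `Macro.prewalk/2` and a single multi-clause function. This is an illustration under assumptions, not ExDNA's normalizer: the module name `FusedNormalizerSketch` is hypothetical, only metadata stripping, boolean canonicalization, and `~w(...)a` expansion are shown, and pipe normalization and variable renaming are left out.

```elixir
defmodule FusedNormalizerSketch do
  # Hypothetical module for illustration only; not part of ExDNA.

  # One traversal: every node passes through rewrite/1 exactly once.
  def normalize(ast), do: Macro.prewalk(ast, &rewrite/1)

  # ~w(foo bar)a without interpolation becomes a literal list of atoms.
  # The sigil's modifiers arrive as a charlist, so the `a` modifier is [?a].
  defp rewrite({:sigil_w, _meta, [{:<<>>, _, [words]}, [?a]]}) when is_binary(words) do
    words |> String.split() |> Enum.map(&String.to_atom/1)
  end

  # Short-circuit operators become their keyword equivalents.
  defp rewrite({:&&, _meta, args}), do: {:and, [], args}
  defp rewrite({:||, _meta, args}), do: {:or, [], args}
  defp rewrite({:!, _meta, [arg]}), do: {:not, [], [arg]}

  # Any other call or variable node: drop line/column/context metadata.
  defp rewrite({form, _meta, args}), do: {form, [], args}

  # Literals, lists, and two-element tuples pass through unchanged.
  defp rewrite(other), do: other
end

short_circuit = FusedNormalizerSketch.normalize(quote(do: x && !y))
keyword_style = FusedNormalizerSketch.normalize(quote(do: x and not y))
IO.inspect(short_circuit == keyword_style, label: "boolean forms match")

sigil_form = FusedNormalizerSketch.normalize(quote(do: ~w(foo bar)a))
IO.inspect(sigil_form == [:foo, :bar], label: "sigil matches literal list")
```

Each additional normalization becomes another clause of `rewrite/1` rather than another pass over the tree, so the traversal cost is paid once per file.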

Benchmarked on real-world projects with full Type-I/II/III detection
(literal_mode: :abstract, min_similarity: 0.85, normalize_pipes: true):

Project         Files  Clones  Time
Broadway           22       1  45ms
Nx                 42      12  674ms
Nerves             50       2  172ms
Ecto               56      19  525ms
Commanded          63       8  147ms
Oban               66      16  193ms
Phoenix            74      14  607ms
Elixir stdlib     105      84  1.6s
Surface           109      31  513ms
Absinthe          263      63  590ms
Livebook          265      62  2.1s
Plausible         465      80  2.4s
Ash               572     535  5.8s