Skip to content

elixir-vibe/ex_dna

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

71 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

ExDNA 🧬

Code duplication detector for Elixir, inspired by jscpd but built on Elixir's native AST instead of token matching.

Because ExDNA understands code structure β€” not just text β€” fn(a, b) -> a + b end and fn(x, y) -> x + y end are recognized as the same code. It also tells you how to fix each clone: extract a function, a macro, or a behaviour callback.

Features

  • Three clone types β€” exact copies (I), renamed variables / changed literals (II), and near-miss clones via structural similarity (III)
  • Multi-clause awareness β€” consecutive def/defp clauses with the same name/arity are analyzed as a single unit, catching duplicated pattern-matching functions that individual clauses are too small to flag
  • Delegation pattern detection β€” def foo(x), do: foo(x, []) followed by def foo(x, opts) are grouped as one unit, catching duplicated wrapper+body pairs across modules
  • Sibling window detection β€” adjacent functions copied between modules are caught even when the surrounding code differs
  • Refactoring suggestions β€” extract function, extract macro, extract behaviour with @callback
  • Smart naming β€” suggestions are named after the dominant struct, call, or pattern (build_changeset, contact_step) instead of extracted_function
  • Pipe normalization β€” x |> f() and f(x) match as the same code
  • Field order normalization β€” %User{name: x, age: y} and %User{age: y, name: x} match in Type-II mode
  • Cross-file grouping β€” actions/ ↔ tools/ (6 clones, 298 nodes) instead of listing each pair
  • @no_clone annotation β€” suppress known/intentional duplicates
  • Incremental Mix.Task.Compiler β€” only re-analyzes changed files
  • LSP server β€” pushes clone diagnostics to your editor alongside Expert or ElixirLS
  • Credo integration β€” drop-in replacement for DuplicatedCode, reuses Credo's parsed ASTs
  • CI-ready β€” exits with code 1 when clones are found, or use --max-clones for a clone budget
  • Four output formats β€” Credo-style console, JSON, self-contained HTML, and SARIF for GitHub Code Scanning
  • Fast β€” parallel file parsing, Plausible (465 files) in ~1 second, Ash (554 files) in ~6 seconds with full Type-I/II/III detection

Installation

def deps do
  [{:ex_dna, "~> 1.3", only: [:dev, :test], runtime: false}]
end

Usage

mix ex_dna                              # scan lib/
mix ex_dna lib/accounts lib/admin       # specific paths
mix ex_dna --literal-mode abstract      # enable Type-II (renamed vars)
mix ex_dna --min-similarity 0.85        # enable Type-III (near-miss)
mix ex_dna --min-mass 50                # fewer, larger clones
mix ex_dna --max-clones 10              # fail only above budget
mix ex_dna --format json                # machine-readable
mix ex_dna --format html                # browsable report
mix ex_dna --format sarif               # GitHub Code Scanning

Deep-dive into a specific clone:

mix ex_dna.explain 3

Shows the full anti-unification breakdown β€” common structure, divergence points, and the suggested extraction with call sites.

Programmatic API

report = ExDNA.analyze("lib/")
report = ExDNA.analyze(["lib/", "test/"])
report = ExDNA.analyze(paths: ["lib/"], min_mass: 20, literal_mode: :abstract)

report.clones   #=> [%ExDNA.Detection.Clone{}, ...]
report.stats    #=> %{files_analyzed: 42, total_clones: 3, ...}

Configuration

Options are layered: defaults β†’ .ex_dna.exs β†’ CLI flags.

Create .ex_dna.exs in your project root:

%{
  min_mass: 25,
  ignore: ["lib/my_app_web/templates/**"],
  excluded_macros: [:schema, :pipe_through, :plug],
  normalize_pipes: true
}
Option CLI flag Default Description
min_mass --min-mass 30 Minimum AST nodes for a fragment
min_similarity --min-similarity 1.0 Threshold for Type-III (set < 1.0 to enable)
literal_mode --literal-mode keep keep = Type-I only, abstract = also Type-II
normalize_pipes --normalize-pipes false Treat x |> f() same as f(x)
excluded_macros --exclude-macro [] Macro calls to skip entirely
ignored_attributes --ignore-attribute (see below) Module attribute names to skip
parse_timeout β€” 5000 Max ms per file (kills hung parses)
ignore --ignore [] Glob patterns to exclude
β€” --max-clones β€” Clone budget (exit 1 only above this)
β€” --format console console, json, html, or sarif

Default ignored attributes: moduledoc, doc, typedoc, type, typep, opaque, spec, callback, macrocallback, impl, behaviour, optional_callbacks, deprecated, derive, enforce_keys, before_compile, after_compile, after_verify, compile, dialyzer, external_resource, on_load, on_definition, vsn, no_clone.

Custom module attributes like @extensions, @timeout, or @fields are fingerprinted and will be reported as duplicates when they appear with the same value in multiple modules.

Suppressing clones

@no_clone true
def validate(params) do
  # intentional duplication, won't be flagged
end

Incremental detection

Add ExDNA as a compiler for automatic detection on mix compile:

def project do
  [compilers: Mix.compilers() ++ [:ex_dna]]
end

Only changed files are re-analyzed. Cache is stored in .ex_dna_cache (add to .gitignore).

Editor integration

ExDNA ships an LSP server that pushes warnings inline on every save. It runs alongside your primary Elixir LSP.

mix ex_dna.lsp

Neovim

vim.lsp.config('ex_dna', {
  cmd = { 'mix', 'ex_dna.lsp' },
  root_markers = { 'mix.exs' },
  filetypes = { 'elixir' },
})

Credo integration

ExDNA ships a Credo check that replaces the built-in DuplicatedCode with full Type-I/II/III detection and refactoring suggestions. It reuses Credo's already-parsed ASTs β€” no double parsing.

Use as a Credo plugin (recommended) β€” automatically registers the check and disables the built-in DuplicatedCode:

# .credo.exs
%{
  configs: [
    %{
      name: "default",
      plugins: [{ExDNA.Credo, []}]
    }
  ]
}

Or add directly to the :enabled checks list:

{ExDNA.Credo, []}

And disable the built-in check:

{Credo.Check.Design.DuplicatedCode, false}

All ExDNA options are available as check/plugin params. By default the Credo check uses the same path scope as mix ex_dna (lib/); pass paths: ["lib/", "test/"] if you want Credo to include test files too.

{ExDNA.Credo, [
  paths: ["lib/", "test/"],
  min_mass: 40,
  literal_mode: :abstract,
  excluded_macros: [:schema, :pipe_through],
  normalize_pipes: true,
  min_similarity: 0.85
]}

How it works

  1. Parse β€” Code.string_to_quoted/2 on every .ex/.exs file (parallel, with per-file timeout)
  2. Normalize β€” strip line/column metadata β†’ rename variables to positional placeholders ($0, $1) β†’ optionally abstract literals β†’ optionally flatten pipes β†’ sort struct/map fields
  3. Fingerprint β€” walk every subtree above min_mass nodes, hash with BLAKE2b; also generate sliding windows over module-level sibling sequences and compute structural sub-hashes for fuzzy candidate pruning
  4. Detect β€” group by hash (Type I/II); use inverted index on sub-hashes + Jaccard similarity + tree edit distance for Type III
  5. Filter β€” prune nested clones, keep the largest match per location
  6. Suggest β€” anti-unify each clone pair to compute the common structure, generate extract-function/macro/behaviour suggestions

License

MIT

About

Code duplication detector powered by Elixir AST analysis

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors