godedup

Find structurally duplicate functions in Go code.

godedup detects copy-pasted Go functions even after variable names, literals, and package qualifiers have been changed.


Example

godedup ./...
$ godedup --output=table --no-tests
 
GROUP  TYPE   SIM   FUNCTION                     LOCATION                     STMTS  LINES
------------------------------------------------------------------------------------------
1      EXACT  100%  store.(*UserRepo).findByID   internal/store/user.go:84    5      14
1      EXACT  100%  store.(*OrderRepo).findByID  internal/store/order.go:91   5      14
------------------------------------------------------------------------------------------
2      EXACT  100%  api.writeJSON                internal/api/user.go:31      4      9
2      EXACT  100%  api.writeJSON                internal/api/order.go:28     4      9
------------------------------------------------------------------------------------------
3      NEAR   88%   worker.(*EmailJob).validate  internal/worker/email.go:55  7      19
3      NEAR   88%   worker.(*SMSJob).validate    internal/worker/sms.go:48    8      21

How It Works

godedup hashes every function's AST structure, normalizing away:

  • Identifier names - userID and orderID are treated the same
  • Literal values - "users" and "orders" are treated the same
  • Package qualifiers - fmt.Println and log.Println are treated the same

Two functions with the same structure but different variable names, string literals, and package qualifiers therefore hash identically.

Exact clones - the top-level hashes of two functions match: identical control flow and structure.

Near clones - the per-statement hash sequences of two functions are within a small edit distance: the same structure with a few statements added or removed.
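The near-clone idea can be sketched as a Levenshtein distance over two sequences of per-statement hashes. This is a minimal illustration, not godedup's actual code: the names editDistance and similarity are hypothetical, and the formula 1 - distance / max(len) is an assumption. Under that assumption, a 7-statement function and an 8-statement copy with one inserted statement score 0.875, which is consistent with the 88% row in the example output, though the tool may compute it differently.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// editDistance returns the Levenshtein distance between two sequences of
// per-statement hashes, using two rolling rows of the DP table.
func editDistance(a, b []uint64) int {
	prev := make([]int, len(b)+1)
	cur := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		cur[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev, cur = cur, prev
	}
	return prev[len(b)]
}

// similarity maps the distance to [0, 1]. The formula is an assumption
// made for this sketch, not taken from godedup.
func similarity(a, b []uint64) float64 {
	longest := len(a)
	if len(b) > longest {
		longest = len(b)
	}
	if longest == 0 {
		return 1
	}
	return 1 - float64(editDistance(a, b))/float64(longest)
}

func main() {
	// Made-up statement hashes: the second function has one extra statement.
	f1 := []uint64{1, 2, 3, 4, 5, 6, 7}
	f2 := []uint64{1, 2, 3, 9, 4, 5, 6, 7}
	fmt.Println(editDistance(f1, f2), similarity(f1, f2))
}
```

A pair would then be reported when its score is at or above the --min-similarity threshold.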


Install

go install github.com/hashmap-kz/godedup@latest
brew tap hashmap-kz/homebrew-tap
brew install godedup

Usage

godedup [flags] [path ...]

Flags:
  --min-similarity  float    minimum similarity threshold (default: 0.85)
  --min-stmts       int      minimum statements to analyze (default: 3)
  --exact                    report only exact structural clones
  --no-tests                 exclude test files
  --output          string   output format: text, table, json (default: text)
  --version                  print version

CI integration - godedup exits with code 1 if any clones are found:

- name: Check for duplicate code
  run: godedup --exact ./...

JSON output for custom tooling:

godedup --output=json ./... | jq '.[] | select(.exact) | .functions[].name'
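The same filter can be written in Go for tooling that does not use jq. This sketch assumes only what the jq expression above implies: the output is a JSON array of groups, each with a boolean exact field and a functions array whose entries have a name. The type cloneGroup, the function exactNames, and the sample document are hypothetical; the real output may carry more fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cloneGroup mirrors only the fields referenced by the jq filter.
// Everything else in the real output is ignored by encoding/json.
type cloneGroup struct {
	Exact     bool `json:"exact"`
	Functions []struct {
		Name string `json:"name"`
	} `json:"functions"`
}

// exactNames returns the function names from every exact clone group.
func exactNames(data []byte) ([]string, error) {
	var groups []cloneGroup
	if err := json.Unmarshal(data, &groups); err != nil {
		return nil, fmt.Errorf("decode godedup output: %w", err)
	}
	var names []string
	for _, g := range groups {
		if !g.Exact {
			continue
		}
		for _, f := range g.Functions {
			names = append(names, f.Name)
		}
	}
	return names, nil
}

// sample is a hypothetical document shaped after the jq filter,
// not captured from a real godedup run.
const sample = `[
  {"exact": true,  "functions": [{"name": "api.writeJSON"}, {"name": "api.writeJSON"}]},
  {"exact": false, "functions": [{"name": "worker.(*EmailJob).validate"}]}
]`

func main() {
	names, err := exactNames([]byte(sample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, n := range names {
		fmt.Println(n)
	}
}
```

In real use, the bytes would come from the command's standard output rather than from an embedded sample.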

Normalization Rules

What                    How normalized
-----------------------------------------------------------
Variable names          all -> same token
String literals         all -> ""
Numeric literals        all -> 0
Package qualifiers      fmt.X and log.X -> same structure
nil / true / false      preserved (semantic meaning)
Operator types          preserved (+ and - stay distinct)
Statement order         preserved (order matters)
Control flow structure  preserved (the whole point)
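The rules above can be approximated with the standard go/ast and go/parser packages. The sketch below is not godedup's implementation: shape and funcHashes are hypothetical names, every identifier (including selector names) is collapsed to one token, character literals are treated like numbers, and only function bodies are hashed. It shows that functions a and b hash identically while c, which differs by one operator, does not.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// shape serializes a function body into a normalized string: node types and
// nesting are kept, identifier names and literal values are dropped.
func shape(body *ast.BlockStmt) string {
	var b strings.Builder
	ast.Inspect(body, func(n ast.Node) bool {
		if n == nil { // Inspect calls f(nil) after a node's children
			b.WriteByte(')')
			return true
		}
		fmt.Fprintf(&b, "(%T", n)
		switch x := n.(type) {
		case *ast.Ident:
			// nil, true and false are plain identifiers in the Go AST.
			if x.Name == "nil" || x.Name == "true" || x.Name == "false" {
				b.WriteString(" " + x.Name)
			}
		case *ast.BasicLit:
			if x.Kind == token.STRING {
				b.WriteString(` ""`)
			} else {
				b.WriteString(" 0") // assumption: all other literals -> 0
			}
		case *ast.BinaryExpr:
			b.WriteString(" " + x.Op.String())
		case *ast.UnaryExpr:
			b.WriteString(" " + x.Op.String())
		case *ast.AssignStmt:
			b.WriteString(" " + x.Tok.String())
		}
		return true
	})
	return b.String()
}

// funcHashes parses src and returns a short structural hash per function name.
func funcHashes(src string) (map[string]string, error) {
	file, err := parser.ParseFile(token.NewFileSet(), "demo.go", src, 0)
	if err != nil {
		return nil, err
	}
	hashes := make(map[string]string)
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Body == nil {
			continue
		}
		sum := sha256.Sum256([]byte(shape(fn.Body)))
		hashes[fn.Name.Name] = fmt.Sprintf("%x", sum[:8])
	}
	return hashes, nil
}

const src = `package demo

import (
	"fmt"
	"log"
)

func a(userID int) {
	if userID > 0 {
		fmt.Println("users", userID)
	}
}

func b(orderID int) {
	if orderID > 0 {
		log.Println("orders", orderID)
	}
}

func c(orderID int) {
	if orderID < 0 {
		log.Println("orders", orderID)
	}
}
`

func main() {
	h, err := funcHashes(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("a == b:", h["a"] == h["b"])
	fmt.Println("a == c:", h["a"] == h["c"])
}
```

Hashing a serialized shape keeps the sketch short; a real tool would more likely feed node kinds directly into a hasher and handle name shadowing of nil, true, and false.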

Limitations

godedup is intentionally simple. It does not try to prove semantic equivalence.

It finds structural duplication, not all possible duplication.

For example, these may not match:

  • same behavior implemented with different control flow
  • helper functions extracted in only one copy
  • loops rewritten as recursion
  • code generated by tools, unless you include it in the scanned paths
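As an illustration of the "loops rewritten as recursion" case, the two functions below return the same result for every input but share no control-flow structure, so a structural comparison has nothing to match on. The function names are hypothetical and chosen for this example only.

```go
package main

import "fmt"

// sumLoop adds the elements with a range loop.
func sumLoop(xs []int) int {
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}

// sumRecursive computes the same sum with recursion instead of a loop.
// Its body is an if statement plus a recursive call, so its AST shape
// has nothing in common with sumLoop's.
func sumRecursive(xs []int) int {
	if len(xs) == 0 {
		return 0
	}
	return xs[0] + sumRecursive(xs[1:])
}

func main() {
	xs := []int{1, 2, 3}
	fmt.Println(sumLoop(xs), sumRecursive(xs))
}
```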

License

MIT License. See LICENSE for details.
