Code duplication analyzer and refactoring planner for LLMs.
reDUP scans codebases for duplicated functions, blocks, and structural patterns — then builds a prioritized refactoring map that LLMs can consume to eliminate redundancy systematically.
- Exact duplicate detection via SHA-256 block hashing
- Structural clone detection — same AST shape, different variable names
- Fuzzy near-duplicate matching via SequenceMatcher / rapidfuzz
- Function-level analysis using Python AST extraction
- Impact scoring: prioritizes duplicates by saved_lines × similarity
- Refactoring planner: generates concrete extract/inline suggestions
- Three output formats: JSON (tooling), YAML (humans), TOON (LLMs)
- CLI with typer + rich for interactive use
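To make the exact-detection and impact-scoring features listed above more concrete, here is a minimal sketch that hashes sliding windows of whitespace-normalized lines with SHA-256 and scores a duplicate group. It is not reDUP's actual implementation: `window_hashes`, `find_exact_duplicates`, and `impact_score` are hypothetical names, and the formula saved_lines = block_lines × (occurrences − 1) is an assumption inferred from the sample report later in this README (L=8, N=3, saved=16).

```python
import hashlib
from collections import defaultdict


def window_hashes(source, min_lines=3):
    """Yield (start_line, sha256_hex) for each window of `min_lines` non-blank lines."""
    # Strip each line so indentation differences do not change the hash,
    # and remember the original 1-based line number of every kept line.
    lines = [(i + 1, raw.strip()) for i, raw in enumerate(source.splitlines())]
    lines = [(number, text) for number, text in lines if text]
    for i in range(len(lines) - min_lines + 1):
        window = lines[i:i + min_lines]
        joined = "\n".join(text for _, text in window)
        yield window[0][0], hashlib.sha256(joined.encode("utf-8")).hexdigest()


def find_exact_duplicates(files, min_lines=3):
    """Map each digest to its locations, keeping digests seen at 2+ locations."""
    buckets = defaultdict(list)
    for name, source in files.items():
        for start, digest in window_hashes(source, min_lines):
            buckets[digest].append((name, start))
    return {d: locs for d, locs in buckets.items() if len(locs) > 1}


def impact_score(block_lines, occurrences, similarity):
    """Assumed scoring: lines saved by keeping one copy, weighted by similarity."""
    saved_lines = block_lines * (occurrences - 1)
    return saved_lines * similarity


if __name__ == "__main__":
    files = {
        "a.py": "x = 1\ny = 2\nz = x + y\n",
        "b.py": "  x = 1\n  y = 2\n  z = x + y\nprint(z)\n",
    }
    for digest, locations in find_exact_duplicates(files).items():
        print(digest[:12], locations)
    print(impact_score(block_lines=8, occurrences=3, similarity=1.0))
```

Hashing every window keeps lookup cheap (one dictionary insert per window), at the cost of reporting overlapping windows separately; a real tool would merge or deduplicate those afterwards.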
pip install redup
With optional dependencies:
pip install redup[all] # Everything
pip install redup[fuzzy] # rapidfuzz for better similarity matching
pip install redup[ast] # tree-sitter for multi-language AST
pip install redup[lsh] # datasketch for LSH near-duplicate detection
# Scan current directory, output TOON to stdout
redup scan .
# Scan with JSON output saved to file
redup scan ./src --format json --output ./reports/
# Scan with all formats
redup scan . --format all --output ./redup_output/
# Only function-level duplicates (faster)
redup scan . --functions-only
# Custom thresholds
redup scan . --min-lines 5 --min-sim 0.9
# Show installed optional dependencies
redup info
from pathlib import Path
from redup import ScanConfig, analyze
from redup.reporters.toon_reporter import to_toon
from redup.reporters.json_reporter import to_json
config = ScanConfig(
    root=Path("./my_project"),
    extensions=[".py"],
    min_block_lines=3,
    min_similarity=0.85,
)
result = analyze(config=config, function_level_only=True)
print(f"Found {result.total_groups} duplicate groups")
print(f"Lines recoverable: {result.total_saved_lines}")
# For LLM consumption
print(to_toon(result))
# For tooling / CI
Path("duplication.json").write_text(to_json(result))
# redup/duplication | 3 groups | 12f 4200L | 2026-03-22
SUMMARY:
  files_scanned: 12
  total_lines: 4200
  dup_groups: 3
  saved_lines: 84
DUPLICATES[3] (ranked by impact):
  [E0001] !! EXAC calculate_tax L=8 N=3 saved=16 sim=1.00
      billing.py:1-8 (calculate_tax)
      shipping.py:1-8 (calculate_tax)
      returns.py:1-8 (calculate_tax)
REFACTOR[1] (ranked by priority):
  [1] ○ extract_function → utils/calculate_tax.py
      WHY: 3 occurrences of 8-line block across 3 files — saves 16 lines
      FILES: billing.py, shipping.py, returns.py
{
  "summary": {
    "total_groups": 3,
    "total_saved_lines": 84
  },
  "groups": [
    {
      "id": "E0001",
      "type": "exact",
      "normalized_name": "calculate_tax",
      "fragments": [
        {"file": "billing.py", "line_start": 1, "line_end": 8},
        {"file": "shipping.py", "line_start": 1, "line_end": 8}
      ],
      "saved_lines_potential": 16
    }
  ],
  "refactor_suggestions": [
    {
      "priority": 1,
      "action": "extract_function",
      "new_module": "utils/calculate_tax.py",
      "risk_level": "low"
    }
  ]
}
src/redup/
├── __init__.py # Public API
├── __main__.py # python -m redup
├── core/
│   ├── models.py # Pydantic data models
│   ├── scanner.py # File discovery + block extraction
│   ├── hasher.py # SHA-256 / structural fingerprinting
│   ├── matcher.py # Fuzzy similarity comparison
│   ├── planner.py # Refactoring suggestion generator
│   └── pipeline.py # Orchestrator: scan → hash → match → plan
├── reporters/
│   ├── json_reporter.py # JSON output
│   ├── yaml_reporter.py # YAML output
│   └── toon_reporter.py # TOON output (LLM-optimized)
└── cli_app/
    └── main.py # Typer CLI
1. SCAN Walk project, read files, extract function-level + sliding-window blocks
2. HASH Generate exact (SHA-256) and structural (normalized AST) fingerprints
3. GROUP Bucket by hash, keep only groups with 2+ blocks from different locations
4. MATCH Verify candidates with fuzzy similarity (SequenceMatcher / rapidfuzz)
5. DEDUP Remove overlapping groups (keep highest-impact)
6. PLAN Generate prioritized refactoring suggestions with risk assessment
7. REPORT Export to JSON / YAML / TOON
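The HASH, GROUP, and MATCH steps above can be sketched with the standard library alone. This is an illustrative approximation, not the code in `hasher.py` or `matcher.py`: the names `structural_fingerprint`, `group_by_fingerprint`, and `similarity` are hypothetical, and replacing every identifier with a single placeholder is one assumed way to normalize an AST (it is coarser than position-aware renaming, so `a + b` and `a + a` get the same fingerprint).

```python
import ast
import hashlib
from collections import defaultdict
from difflib import SequenceMatcher


class _Normalizer(ast.NodeTransformer):
    """Replace identifiers with a placeholder so only the tree shape remains."""

    def visit_Name(self, node):
        return ast.copy_location(ast.Name(id="_", ctx=node.ctx), node)

    def visit_arg(self, node):
        node.arg = "_"
        node.annotation = None  # ignore type hints for structural comparison
        return node

    def visit_FunctionDef(self, node):
        node.name = "_"
        self.generic_visit(node)  # descend into arguments and body
        return node


def structural_fingerprint(func_source):
    """SHA-256 of the normalized AST dump (line numbers are not part of the dump)."""
    tree = _Normalizer().visit(ast.parse(func_source))
    return hashlib.sha256(ast.dump(tree).encode("utf-8")).hexdigest()


def group_by_fingerprint(functions):
    """Bucket {location: source} by fingerprint, keeping buckets with 2+ members."""
    buckets = defaultdict(list)
    for location, source in functions.items():
        buckets[structural_fingerprint(source)].append(location)
    return [sorted(locs) for locs in buckets.values() if len(locs) > 1]


def similarity(a, b):
    """Fuzzy text similarity in [0, 1] using difflib's SequenceMatcher."""
    return SequenceMatcher(None, a, b).ratio()


if __name__ == "__main__":
    functions = {
        "billing.py:tax": "def tax(amount, rate):\n    total = amount * rate\n    return total + 1\n",
        "shipping.py:vat": "def vat(price, pct):\n    result = price * pct\n    return result + 1\n",
        "returns.py:other": "def other(x):\n    return x - 1\n",
    }
    print(group_by_fingerprint(functions))
    print(round(similarity(functions["billing.py:tax"], functions["shipping.py:vat"]), 2))
```

Bucketing by fingerprint first keeps the expensive pairwise `similarity` call limited to candidates that already share a hash, which is the usual reason for ordering the steps as GROUP before MATCH.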
reDUP is part of the wronai developer toolchain:
- code2llm — static analysis engine (health diagnostics, complexity)
- reDUP — deep duplication analysis and refactoring planning
- code2docs — automatic documentation generation
- vallm — validation of LLM-generated code proposals
Typical workflow:
- code2llm analyzes the project → .toon diagnostics
- redup finds duplicates → duplication.toon
- Feed both to an LLM for targeted refactoring
- vallm validates the LLM's proposals before merging
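For the "feed to an LLM" step, one possible approach is to condense the JSON report into a few ranked lines before putting it in a prompt. The sketch below assumes only the keys visible in the sample JSON earlier in this README (`groups`, `id`, `type`, `normalized_name`, `fragments`, `saved_lines_potential`); `summarize_report` is a hypothetical helper, not part of the redup package.

```python
import json


def summarize_report(report, top_n=5):
    """Return one line per duplicate group, highest potential savings first."""
    groups = sorted(
        report.get("groups", []),
        key=lambda g: g.get("saved_lines_potential", 0),
        reverse=True,
    )
    lines = []
    for group in groups[:top_n]:
        locations = ", ".join(
            f"{frag['file']}:{frag['line_start']}-{frag['line_end']}"
            for frag in group.get("fragments", [])
        )
        lines.append(
            f"{group['id']} {group['type']} {group['normalized_name']} "
            f"saves {group['saved_lines_potential']} lines: {locations}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Inline sample mirroring the report shape shown in this README.
    sample = json.loads("""
    {
      "groups": [
        {
          "id": "E0001",
          "type": "exact",
          "normalized_name": "calculate_tax",
          "fragments": [
            {"file": "billing.py", "line_start": 1, "line_end": 8},
            {"file": "shipping.py", "line_start": 1, "line_end": 8}
          ],
          "saved_lines_potential": 16
        }
      ]
    }
    """)
    print(summarize_report(sample))
```

In practice the report would be loaded from the file written by `to_json(result)`, for example with `json.loads(Path("duplication.json").read_text())`.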
git clone https://github.com/semcod/redup.git
cd redup
pip install -e ".[dev]"
pytest
Apache License 2.0 - see LICENSE for details.
Created by Tom Sapletta - tom@sapletta.com