reDUP

Code duplication analyzer and refactoring planner for LLMs.

reDUP scans codebases for duplicated functions, blocks, and structural patterns — then builds a prioritized refactoring map that LLMs can consume to eliminate redundancy systematically.

Features

  • Exact duplicate detection via SHA-256 block hashing
  • Structural clone detection — same AST shape, different variable names
  • Fuzzy near-duplicate matching via SequenceMatcher / rapidfuzz
  • Function-level analysis using Python AST extraction
  • Impact scoring — prioritizes duplicates by saved_lines × similarity
  • Refactoring planner — generates concrete extract/inline suggestions
  • Three output formats: JSON (tooling), YAML (humans), TOON (LLMs)
  • CLI with typer + rich for interactive use
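
The fuzzy matching and impact scoring above can be sketched with the standard library's difflib (rapidfuzz is a drop-in upgrade when installed). Note that `impact_score` and its exact weighting are illustrative assumptions, not reDUP's internal API:

```python
from difflib import SequenceMatcher

def impact_score(block_a: str, block_b: str,
                 block_lines: int, occurrences: int) -> float:
    """Illustrative impact metric: recoverable lines weighted by similarity.

    Extracting one shared copy removes (occurrences - 1) copies of the block,
    so that is the number of lines a refactor could save.
    """
    similarity = SequenceMatcher(None, block_a, block_b).ratio()
    saved_lines = block_lines * (occurrences - 1)
    return saved_lines * similarity

# Identical 2-line blocks appearing 3 times: 4 lines saved at similarity 1.0
score = impact_score("total = a + b\nreturn total",
                     "total = a + b\nreturn total", 2, 3)  # -> 4.0
```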

Installation

pip install redup

With optional dependencies:

pip install redup[all]       # Everything
pip install redup[fuzzy]     # rapidfuzz for better similarity matching
pip install redup[ast]       # tree-sitter for multi-language AST
pip install redup[lsh]       # datasketch for LSH near-duplicate detection

Quick Start

CLI

# Scan current directory, output TOON to stdout
redup scan .

# Scan with JSON output saved to file
redup scan ./src --format json --output ./reports/

# Scan with all formats
redup scan . --format all --output ./redup_output/

# Only function-level duplicates (faster)
redup scan . --functions-only

# Custom thresholds
redup scan . --min-lines 5 --min-sim 0.9

# Show installed optional dependencies
redup info

Python API

from pathlib import Path
from redup import ScanConfig, analyze
from redup.reporters.toon_reporter import to_toon
from redup.reporters.json_reporter import to_json

config = ScanConfig(
    root=Path("./my_project"),
    extensions=[".py"],
    min_block_lines=3,
    min_similarity=0.85,
)

result = analyze(config=config, function_level_only=True)

print(f"Found {result.total_groups} duplicate groups")
print(f"Lines recoverable: {result.total_saved_lines}")

# For LLM consumption
print(to_toon(result))

# For tooling / CI
Path("duplication.json").write_text(to_json(result))

Output Formats

TOON (LLM-optimized)

# redup/duplication | 3 groups | 12f 4200L | 2026-03-22

SUMMARY:
  files_scanned: 12
  total_lines:   4200
  dup_groups:    3
  saved_lines:   84

DUPLICATES[3] (ranked by impact):
  [E0001] !! EXAC  calculate_tax  L=8 N=3 saved=16 sim=1.00
      billing.py:1-8  (calculate_tax)
      shipping.py:1-8  (calculate_tax)
      returns.py:1-8  (calculate_tax)

REFACTOR[1] (ranked by priority):
  [1] ○ extract_function   → utils/calculate_tax.py
      WHY: 3 occurrences of 8-line block across 3 files — saves 16 lines
      FILES: billing.py, shipping.py, returns.py

JSON (machine-readable)

{
  "summary": {
    "total_groups": 3,
    "total_saved_lines": 84
  },
  "groups": [
    {
      "id": "E0001",
      "type": "exact",
      "normalized_name": "calculate_tax",
      "fragments": [
        {"file": "billing.py", "line_start": 1, "line_end": 8},
        {"file": "shipping.py", "line_start": 1, "line_end": 8}
      ],
      "saved_lines_potential": 16
    }
  ],
  "refactor_suggestions": [
    {
      "priority": 1,
      "action": "extract_function",
      "new_module": "utils/calculate_tax.py",
      "risk_level": "low"
    }
  ]
}
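
In CI, the JSON report can gate merges on a duplication budget. A minimal sketch, assuming only the `summary` keys shown above (`check_budget` is a hypothetical helper, not part of reDUP):

```python
import json

def check_budget(report_json: str, budget: int) -> bool:
    """Pass if the report's recoverable duplicate lines stay within budget."""
    summary = json.loads(report_json)["summary"]
    return summary["total_saved_lines"] <= budget

report = '{"summary": {"total_groups": 3, "total_saved_lines": 84}}'
check_budget(report, 100)  # within budget -> True
check_budget(report, 50)   # over budget  -> False
```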

Architecture

src/redup/
├── __init__.py            # Public API
├── __main__.py            # python -m redup
├── core/
│   ├── models.py          # Pydantic data models
│   ├── scanner.py         # File discovery + block extraction
│   ├── hasher.py          # SHA-256 / structural fingerprinting
│   ├── matcher.py         # Fuzzy similarity comparison
│   ├── planner.py         # Refactoring suggestion generator
│   └── pipeline.py        # Orchestrator: scan → hash → match → plan
├── reporters/
│   ├── json_reporter.py   # JSON output
│   ├── yaml_reporter.py   # YAML output
│   └── toon_reporter.py   # TOON output (LLM-optimized)
└── cli_app/
    └── main.py            # Typer CLI
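
The structural fingerprinting in hasher.py can be approximated with the standard library's ast module: normalize identifiers away, then hash the dumped tree, so renamed-but-identical code collides. This is an illustrative sketch, not reDUP's actual implementation:

```python
import ast
import hashlib

def structural_fingerprint(source: str) -> str:
    """Hash the AST shape with identifiers normalized away."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # Erase names so differently-named clones hash identically
        if isinstance(node, ast.Name):
            node.id = "_"
        elif isinstance(node, ast.arg):
            node.arg = "_"
        elif isinstance(node, ast.FunctionDef):
            node.name = "_"
    return hashlib.sha256(ast.dump(tree).encode()).hexdigest()

# Same shape, different names -> same fingerprint
a = "def f(x, y):\n    return x + y"
b = "def g(a, b):\n    return a + b"
```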

Analysis Pipeline

1. SCAN      Walk project, read files, extract function-level + sliding-window blocks
2. HASH      Generate exact (SHA-256) and structural (normalized AST) fingerprints
3. GROUP     Bucket by hash, keep only groups with 2+ blocks from different locations
4. MATCH     Verify candidates with fuzzy similarity (SequenceMatcher / rapidfuzz)
5. DEDUP     Remove overlapping groups (keep highest-impact)
6. PLAN      Generate prioritized refactoring suggestions with risk assessment
7. REPORT    Export to JSON / YAML / TOON
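
Steps 1–3 can be sketched as a sliding-window SHA-256 bucketing pass. This is a simplified assumption about the pipeline, using a fixed window size and whitespace-stripped lines:

```python
import hashlib
from collections import defaultdict

def exact_duplicate_groups(files: dict[str, str], window: int = 3) -> dict:
    """Bucket every window-sized block by SHA-256; keep buckets with 2+ hits."""
    buckets: dict[str, list] = defaultdict(list)
    for path, text in files.items():
        lines = [line.strip() for line in text.splitlines()]
        for i in range(len(lines) - window + 1):
            digest = hashlib.sha256(
                "\n".join(lines[i:i + window]).encode()).hexdigest()
            buckets[digest].append((path, i + 1, i + window))
    # A group needs at least two occurrences to count as a duplicate
    return {h: locs for h, locs in buckets.items() if len(locs) >= 2}

files = {
    "billing.py":  "rate = 0.2\ntax = price * rate\nreturn tax\n",
    "shipping.py": "rate = 0.2\ntax = price * rate\nreturn tax\n",
}
groups = exact_duplicate_groups(files)  # one group, two locations
```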

Integration with wronai Toolchain

reDUP is part of the wronai developer toolchain:

  • code2llm — static analysis engine (health diagnostics, complexity)
  • reDUP — deep duplication analysis and refactoring planning
  • code2docs — automatic documentation generation
  • vallm — validation of LLM-generated code proposals

Typical workflow:

  1. code2llm analyzes the project → .toon diagnostics
  2. redup finds duplicates → duplication.toon
  3. Feed both to an LLM for targeted refactoring
  4. vallm validates the LLM's proposals before merging

Development

git clone https://github.com/semcod/redup.git
cd redup
pip install -e ".[dev]"
pytest

License

Apache License 2.0 - see LICENSE for details.

Author

Created by Tom Sapletta - tom@sapletta.com
