dryscope


dryscope helps you find the parts of a large repository that are actually worth reading before you ask an AI agent, stronger model, or human reviewer to clean it up.

The name is a conflation of DRY ("Don't Repeat Yourself") and telescope: DRY for repeated code or overlapping knowledge, and telescope for looking across a large repository to spot the places worth inspecting.

It scans code and docs for repeated implementation shapes, copy-pasted helpers, overlapping document sections, and scattered documentation topics. The output is a ranked shortlist of files and sections that deserve attention first, not a claim that every match should be refactored.

dryscope is a narrowing tool:

  • for code, Code Match (code-match) surfaces structural duplicate candidates and Code Review (code-review) filters them down to a shortlist
  • for docs, it has three named tracks:
    • Docs Map (docs-map): profiles documents, discovers canonical labels, builds a topic/facet view, and suggests multi-document consolidation clusters
    • Section Match (docs-section-match): compares heading-based sections and ranks concrete section-level consolidation/link recommendations
    • Doc Pair Review (docs-pair-review): uses an LLM to review selected related document pairs

dryscope process diagram

Motivation

Large-repo cleanup usually starts with a vague but painful question: "where is the duplication, and what should I look at first?" That is hard to answer by searching for keywords or asking an agent to inspect the whole repository.

Developers run into a few common problems:

  • an agent burns context reading unrelated files before it finds the repeated pattern
  • a refactor starts from one obvious copy-paste case and misses nearby structural clones
  • duplicate-code tools report a wall of boilerplate, generated code, and harmless conventions
  • agent-driven development creates requirements, design, research, planning, and status docs spread across the repository
  • documentation may overlap in intent even when it does not repeat the same text
  • reviewers know something is repeated, but not which files form the smallest useful work batch

Code Match Motivation

Coding agents are only as good as the context they can see. If an agent does not notice an existing helper, service method, parser, validation path, or UI branch, it may solve the same local problem again somewhere else. Fast agent-generated or vibe-coded projects are especially prone to this: the code works, but similar logic accumulates across commands, endpoints, components, tests, and migration scripts.

That redundancy is a code health problem, not just an aesthetic one. DRY is about keeping one reason to change in one place. When the same behavior exists in several functions or classes, a bug fix may land in only one copy, edge-case handling can drift, and future agents may copy the wrong version because there is no obvious canonical implementation.

Code Match is built to find those candidates before a cleanup pass starts. It uses tree-sitter to parse supported languages into code units such as functions, classes, methods, Java constructors, and JavaScript/TypeScript function-valued declarations. It normalizes each unit by removing comments/docstrings and replacing identifiers and literals with placeholders, then embeds the normalized code and ranks similar units with hybrid semantic and token similarity. The result is a shortlist of exact, near-identical, and structural duplicate candidates that a human developer, coding agent, or optional Code Review pass can inspect before refactoring.
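
As a rough illustration of the normalization step (a simplified sketch, not dryscope's tree-sitter-based implementation), the following Python-only example collapses identifiers and literals to placeholders so that a copy-pasted helper with renamed variables still normalizes to the same token stream:

```python
# Simplified sketch of identifier/literal normalization, assuming plain Python
# source and using the standard tokenize module instead of tree-sitter.
import io
import keyword
import tokenize

SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
        tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def normalize_python(source: str) -> list[str]:
    """Return a placeholder token stream for one Python code unit."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIP:
            continue
        if tok.type == tokenize.NAME:
            out.append(tok.string if keyword.iskeyword(tok.string) else "ID")
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")  # docstrings also collapse to LIT in this sketch
        else:
            out.append(tok.string)
    return out


# Two copy-pasted helpers with renamed variables normalize identically.
a = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
b = "def add_up(nums):\n    acc = 0\n    for n in nums:\n        acc += n\n    return acc\n"
assert normalize_python(a) == normalize_python(b)
```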

Docs Map Motivation

Documentation sprawl is especially common in spec-driven work with coding agents. A repo can accumulate many documents that all address similar requirements, designs, or research questions from different angles. When those documents are handed back to an agent as context, the model gets a large, unfocused pile of partially overlapping intent instead of a clear source of truth.

Docs Map is built for that problem. It profiles documents, normalizes aboutness and reader-intent labels, builds a topic/facet view of the corpus, and suggests document groups that a human developer or agent can consolidate. The goal is to make documentation context smaller, sharper, and easier to trust before it becomes input to more agent work.

dryscope makes that first pass cheaper. It parses code into comparable units, chunks docs by section, ranks similar items, and can run an LLM-backed review pass to separate likely refactor, review, and noise findings. The result is a concrete starting point: a smaller set of files, sections, and reasons that you can hand to an agent or reviewer before spending expensive attention on the full repository.

Docs Map Taxonomy Example

Suppose a repo has agent-written docs like:

| Document | Raw signals |
| --- | --- |
| docs/search-requirements.md | product requirements, search filters, ranking expectations |
| docs/search-design.md | search architecture, indexing pipeline, query API |
| research/vector-search.md | embeddings, retrieval quality, ranking experiments |
| plans/search-rollout.md | rollout checklist, implementation status, open risks |

Docs Map turns those local descriptions into a corpus-level taxonomy:

| Taxonomy area | Example output |
| --- | --- |
| Canonical aboutness labels | search experience, indexing pipeline, ranking quality, vector retrieval |
| Reader intents | define requirements, explain architecture, compare approaches, track rollout |
| Facets | doc_role: requirements/design/research/plan, lifecycle: current/draft, audience: maintainer/agent |
| Topic tree | Search -> Query behavior, Indexing, Ranking, Rollout |
| Consolidation cluster | group the requirements, design, research, and rollout docs around search experience when they should share one source of truth or cross-reference each other |

That taxonomy is what lets dryscope report "these docs overlap in purpose" even when the same paragraphs were not copied between files.

Section Match Example

Docs Map works at the document and corpus level. Section Match works at the section level.

Two documents can be different in purpose but still repeat the same supporting material. For example, a requirements spec and a corresponding design doc should not be merged just because they belong to the same feature. But both might contain a Configuration section that explains the same environment variables, .rc files, feature flags, or deployment settings.

| Document | Document-level purpose | Repeated section-level content |
| --- | --- | --- |
| docs/search-requirements.md | define user-visible behavior and constraints | Configuration: required environment variables and defaults |
| docs/search-design.md | explain architecture, components, and data flow | Configuration: same environment variables, rc file, and feature flags |

Section Match is built to find that microscopic redundancy. It points to specific repeated sections where one shared reference, one canonical configuration page, or a cross-link would reduce drift. It does not imply the whole documents have the same purpose.

How The Tracks Fit Together

| Track | Motivation |
| --- | --- |
| Code Match | Find repeated implementation shapes before a refactor starts, especially in agent-generated code where similar logic may be recreated in different files. |
| Code Review | Filter Code Match output so framework boilerplate, coincidental structure, and low-payoff matches do not consume expensive human or model attention. |
| Docs Map | Find document-level intent overlap across requirements, designs, research, plans, and status docs so a repo can recover clearer sources of truth. |
| Section Match | Find repeated section-level material inside otherwise distinct documents, such as duplicated configuration or deployment explanations. |
| Doc Pair Review | Add deeper judgment for selected related document pairs when the relationship is not obvious from taxonomy or section similarity alone. |
| Docs Report Pack | Package the docs findings as Markdown, HTML, and JSON so humans can review them and agents can consume the same narrowed context. |

Features

  • Code Match — Python, Go, Java, JavaScript, JSX, TypeScript, and TSX duplicate-code candidates via tree-sitter + embeddings
  • Code Review — optional LLM/policy pass that classifies Code Match findings as refactor, review, or noise
  • Docs Map — LLM document descriptors, canonical label taxonomy, topic tree, facets, diagnostics, and consolidation clusters
  • Section Match — Markdown, MDX, RST, AsciiDoc, and plaintext section-level redundancy via heading chunks and embedding similarity
  • Doc Pair Review — optional LLM analysis of selected related document pairs
  • Docs Report Pack — HTML/Markdown/JSON docs reports with numbered collapsible sections and the same structure across formats
  • Saved report cleanup — prune old .dryscope/runs outputs by count or date, dry-run first by default
  • Hybrid similarity — 70% embedding cosine + 30% token Jaccard with size-ratio filtering
  • Deterministic escalation policy — keeps review findings plus higher-value refactor findings for expensive follow-up
  • Project profiles — auto-detects Django and pytest-factories, applies smart exclusions
  • Agent skills — install as both a Claude Code and Codex skill
  • Unified JSON output — structured findings[] schema for agent consumption

Positioning

dryscope is best used as:

  • a first-pass scanner before repository-wide refactors
  • a repo narrowing tool before handing work to an agent or stronger model
  • a Code Match candidate generator for structural refactor opportunities
  • a Docs Map and Section Match aid for answering "how should these docs be organized?"
  • a prefilter that helps decide what a deeper reviewer should read first

It is not positioned as:

  • a general-purpose lint replacement
  • a universal duplicate-code product for every developer workflow
  • a perfect semantic clone detector
  • a final refactor oracle
  • a complete replacement for human or stronger-model judgment

The strongest use case is not "find every duplicate." It is "before I ask an agent to clean this up, show me the small set of likely duplicate code and docs consolidation targets worth spending attention on."

Installation

dryscope is published on PyPI: https://pypi.org/project/dryscope/.

For one-off CLI runs without a persistent install, use uvx or pipx run:

uvx dryscope --help
uvx dryscope scan .
pipx run dryscope --help

For a persistent isolated tool install, use either uv tool install or pipx:

uv tool install dryscope
dryscope --help
pipx install dryscope
dryscope --help

For a project virtual environment, use pip or uv pip:

python -m venv .venv
source .venv/bin/activate
python -m pip install dryscope
dryscope --help
uv venv
uv pip install dryscope
uv run dryscope --help

There is no separate uv pipx command. The uv equivalents are uvx for one-off tool runs and uv tool install for persistent tool installs.

The default install supports API embedding models through LiteLLM. Set the provider API key for your embedding model, such as OPENAI_API_KEY for text-embedding-3-small. Local sentence-transformer embeddings are optional because they pull in PyTorch. Install the extra only when you need local embeddings:

uv tool install "dryscope[local-embeddings]"
pipx install "dryscope[local-embeddings]"
python -m pip install "dryscope[local-embeddings]"

For repository development from a source checkout:

uv sync --extra dev
uv run dryscope --help

Development Quality Gates

For repository development, install the dev extra and enable the checked-in pre-commit hooks:

uv sync --extra dev
uv run pre-commit install
uv run pre-commit run --all-files

The default commit hooks are intentionally fast and low-noise:

  • standard file checks for Python syntax, JSON/TOML/YAML, merge conflicts, large files, private keys, case conflicts, executable/shebang consistency, broken symlinks, debug statements, trailing whitespace, final newlines, and LF line endings
  • uv lock --check when pyproject.toml or uv.lock changes
  • ruff check --fix for lint, import sorting, pyupgrade-style rewrites, bugbear checks, comprehensions, and McCabe complexity rule coverage
  • ruff format for Python formatting

Stricter gates are available as manual hooks while the current baseline is being tightened:

uv run pre-commit run ty-check --hook-stage manual --all-files
uv run pre-commit run xenon-complexity --hook-stage manual --all-files

ty-check runs static type analysis over dryscope, tests, and benchmarks. xenon-complexity reports cyclomatic-complexity hot spots and is configured as a ratchet for functions, modules, and repository average complexity. These manual checks are useful before larger refactors even when they are not yet suitable as default commit blockers.

Quick Start

# Progressive help
dryscope --help
dryscope help output
dryscope help json
dryscope scan --help

# Code Match (default)
dryscope scan /path/to/project

# Section Match (docs default)
dryscope scan /path/to/docs --docs

# Code Match with local embeddings, after installing dryscope[local-embeddings]
dryscope scan /path/to/project --embedding-model all-MiniLM-L6-v2

# Docs Report Pack: Docs Map + Section Match + Doc Pair Review
dryscope scan /path/to/docs --docs --stage docs-report-pack --backend cli -f html

# Code Match + Section Match
dryscope scan /path/to/project --code --docs

# Code Match JSON output for agents
dryscope scan /path/to/project -f json

# Code Match filtered by language
dryscope scan /path/to/project --lang python

# Code Review
dryscope scan /path/to/project --verify

# Bounded Code Review for large duplicate-rich repos
dryscope scan /path/to/project --verify --max-findings 15

# Stricter Code Match threshold and token filter
dryscope scan /path/to/project -t 0.95 --min-tokens 15

Real-World Examples

Public examples from recent validation passes:

  • kvsankar/sattosat

    • code scan produced a 2-item shortlist
    • one clear refactor candidate survived: duplicated TLE epoch parsing logic across two scripts and one library module
    • docs scan produced 0 recommendations
  • stellar/stellar-docs

    • docs scan found real overlap in repeated sequence-diagram flows
    • grouped Section Match output reduced noisy pairwise suggestions into a compact 4-item shortlist
  • gethomepage/homepage

    • docs scan found 0 overlap pairs
    • with the old large-repo guard enabled, the pipeline exited early instead of spending LLM work on a large negative repo

Recent AI-generated / agent-oriented public repo checks show the code path doing the intended narrowing job:

| Repo | Structural candidates | Verified shortlist from top 15 |
| --- | --- | --- |
| CLI-Anything-WEB | 94 | 5 |
| nanowave | 82 | 10 |
| ClaudeCode_generated_app | 51 | 6 |
| VibesOS | 23 | 4 |

These are candidate shortlists, not precision/recall claims. The benchmark pack keeps reviewed labels for selected findings, including real refactor candidates and at least one false-positive regression case.

For docs-heavy repositories, the current docs report is organized around named docs tracks:

  1. Docs Map (docs-map): document descriptors -> canonical label normalization -> topic tree/facets -> docs map clusters.
  2. Section Match (docs-section-match): document sections -> embeddings -> matched section pairs -> section match recommendations.
  3. Doc Pair Review (docs-pair-review): selected related document pairs -> LLM relationship/action review.

Configuration

Generate a default config file:

dryscope init

This creates .dryscope.toml:

[code]
min_lines = 6
min_tokens = 0
max_cluster_size = 15
threshold = 0.90
embedding_model = "text-embedding-3-small"
escalate_refactor_min_lines = 40
escalate_refactor_min_actionability = 2.0
escalate_refactor_min_units = 3
keep_same_file_refactors = false
# exclude = ["**/test_*.py"]
# exclude_type = ["BaseModel"]

[docs]
include = ["*.md", "*.mdx", "*.rst", "*.txt", "*.adoc"]
exclude = ["node_modules", "venv", ".git", ".dryscope", "*.lock"]
threshold_similarity = 0.9
threshold_intent = 0.8
min_content_words = 15
include_intra = false
token_weight = 0.3
# Same embedding backend choices as [code].
embedding_model = "text-embedding-3-small"
intent_max_docs = 0
llm_max_doc_pairs = 250
intent_skip_without_similarity_min_docs = 0

[docs.map]
# Generic seed dimensions shown to the LLM. These are suggestions, not a
# product-specific taxonomy; dryscope still infers the corpus topic tree.
facet_dimensions = ["doc_role", "audience", "lifecycle", "content_type", "surface", "canonicality"]

[docs.map.facet_values]
doc_role = ["guide", "reference", "tutorial", "spec", "plan", "status", "research", "changelog", "architecture", "decision", "overview", "troubleshooting"]
audience = ["user", "contributor", "maintainer", "operator", "internal", "agent"]
lifecycle = ["current", "proposed", "historical", "deprecated", "draft", "unknown"]
content_type = ["concept", "workflow", "api", "troubleshooting", "decision", "benchmark", "example", "architecture", "requirements"]
surface = ["public", "internal", "generated", "extension", "package", "integration"]
canonicality = ["primary", "supporting", "archive", "duplicate", "index", "unknown"]

[llm]
model = "claude-haiku-4-5-20251001"
backend = "cli"       # "cli" (claude -p), "codex-cli", "litellm" (provider API keys), or "ollama" (local Ollama)
max_cost = 5.00
concurrency = 8
# ollama_host = "http://localhost:11434"
# cli_strip_api_key = true
# cli_permission_mode = "bypassPermissions"
# cli_dangerously_skip_permissions = false

[cache]
enabled = true
path = "~/.cache/dryscope/cache.db"

Configuration layers: defaults → .dryscope.toml → CLI flags.
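
A minimal sketch of what that layering means in practice (assumed semantics based on the description above, not dryscope internals): a key set in .dryscope.toml overrides the default, and a CLI flag overrides both.

```python
# Later layers win for keys they set explicitly (illustrative only).
defaults = {"threshold": 0.90, "min_lines": 6}
from_toml = {"threshold": 0.92}   # from .dryscope.toml
from_cli = {"min_lines": 10}      # from -m/--min-lines

effective = {**defaults, **from_toml, **from_cli}
assert effective == {"threshold": 0.92, "min_lines": 10}
```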

LLM Backend Configuration

dryscope supports four verification backends:

  • cli
    • shells out to claude -p
    • good when you use Claude CLI with OAuth/session auth
  • codex-cli
    • shells out to codex exec
    • good when you use Codex CLI directly
  • litellm
    • uses provider APIs through LiteLLM
    • good for OpenAI, Anthropic, Gemini, Azure OpenAI, Bedrock, OpenRouter, and other LiteLLM-supported providers
  • ollama
    • uses the local Ollama HTTP API
    • good for local/private verification without a cloud provider

Claude CLI

[llm]
backend = "cli"
model = "claude-haiku-4-5-20251001"
# cli_strip_api_key = true
# cli_permission_mode = "bypassPermissions"
# cli_dangerously_skip_permissions = false
dryscope scan /path/to/project --verify --backend cli --llm-model claude-haiku-4-5-20251001

Codex CLI

[llm]
backend = "codex-cli"
# Use the Codex default model, or set one your Codex auth supports.
model = "gpt-5.4"
dryscope scan /path/to/project --verify --backend codex-cli --llm-model gpt-5.4

codex-cli shells out to codex exec. In local testing, explicitly requested mini models such as gpt-4o-mini were rejected under ChatGPT-account Codex auth, while the default Codex model worked. If you want mini models through Codex CLI and your account supports them, log in with an API key via codex login --with-api-key.

LiteLLM Providers

Use litellm when you want hosted provider APIs.

OpenAI example:

[llm]
backend = "litellm"
model = "gpt-4o"
OPENAI_API_KEY=... dryscope scan /path/to/project --verify --backend litellm --llm-model gpt-4o

Anthropic example:

[llm]
backend = "litellm"
model = "claude-3-5-sonnet-latest"
ANTHROPIC_API_KEY=... dryscope scan /path/to/project --verify --backend litellm --llm-model claude-3-5-sonnet-latest

Ollama

[llm]
backend = "ollama"
model = "qwen2.5:3b"
# ollama_host = "http://localhost:11434"
dryscope scan /path/to/project --verify --backend ollama --llm-model qwen2.5:3b

Agent Skills

dryscope install    # Install as both Claude Code and Codex skills
dryscope uninstall  # Remove the skill

dryscope install creates a shared skill venv under $XDG_DATA_HOME/dryscope/skill-venv or ~/.local/share/dryscope/skill-venv, then renders SKILL.md into both ~/.claude/skills/dryscope and ~/.codex/skills/dryscope.

CLI Reference

dryscope scan <path> [OPTIONS]
| Option | Default | Description |
| --- | --- | --- |
| --code / --no-code | --code | Run Code Match |
| --docs / --no-docs | off | Run docs tracks |
| --lang | all | Filter: python, go, java, js, jsx, ts, tsx |
| -t, --threshold | 0.90 | Similarity threshold (0.0-1.0) |
| -f, --format | terminal | Output: terminal, json, markdown, html |
| -m, --min-lines | 6 | Minimum lines per code unit |
| --min-tokens | 0 | Minimum unique normalized tokens |
| --max-cluster-size | 15 | Drop clusters larger than this |
| --max-findings | | Limit Code Match/Code Review to the top N code findings |
| -e, --exclude | | Glob patterns to exclude; applies to Code Match and docs tracks |
| --exclude-type | | Base class types to exclude (code) |
| --embedding-model | text-embedding-3-small | Embedding model; API models use LiteLLM, local sentence-transformers such as all-MiniLM-L6-v2 require the dryscope[local-embeddings] extra |
| --verify | off | Run Code Review for code; run full docs tracks for docs |
| --llm-model | claude-haiku-4-5-20251001 | LLM model for Code Review and Doc Pair Review |
| --stage | docs-section-match | Docs stage: docs-section-match runs Section Match; docs-report-pack adds Docs Map and Doc Pair Review |
| --resume | off | Resume from latest docs run |
| --intra | off | Include intra-document overlap (docs) |
| --threshold-intent | 0.8 | Docs Map topic-pair threshold |
| --llm-max-doc-pairs | config | Maximum document pairs for Doc Pair Review |
| --concurrency | config | Max parallel LLM calls for docs full stage |
| --backend | config | LLM backend: cli, codex-cli, litellm, or ollama |

Report cleanup:

| Command | Description |
| --- | --- |
| dryscope reports clean <path> --keep-last N | Keep the newest N saved report runs |
| dryscope reports clean <path> --keep-since YYYY-MM-DD | Keep runs on or after a calendar date |
| dryscope reports clean <path> --keep-since YYYY-MM | Keep runs on or after the first day of a month |
| dryscope reports clean <path> --keep-days N | Keep runs from the last N days |
| --force | Actually delete runs; without this, cleanup is preview-only |

dryscope init         # Generate .dryscope.toml
dryscope install      # Install Claude Code and Codex skills
dryscope uninstall    # Remove Claude Code and Codex skills
dryscope cache stats  # Show cache statistics
dryscope cache clear  # Clear the cache
dryscope reports clean /path/to/project --keep-last 5          # Preview deleting older saved runs
dryscope reports clean /path/to/project --keep-days 30 --force # Delete runs older than 30 days

Saved Report Cleanup

Docs scans are saved under .dryscope/runs/<run-id>/ with report.md, report.html, report.json, and resumable stage artifacts. Cleanup is dry-run by default:

# Keep the newest 10 runs; preview only
dryscope reports clean /path/to/project --keep-last 10

# Keep reports from April 2026 onward; preview only
dryscope reports clean /path/to/project --keep-since 2026-04-01

# Keep reports from the last 30 days and actually delete older runs
dryscope reports clean /path/to/project --keep-days 30 --force

When multiple keep rules are supplied, dryscope keeps the union. For example, --keep-last 5 --keep-days 30 preserves the newest five runs plus any run from the last 30 days. After deletion, .dryscope/latest is repointed to the newest remaining run.
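
The union behavior can be pictured with a small sketch (illustrative only, not dryscope's implementation): each rule independently marks runs to keep, and a run survives if any rule keeps it.

```python
# Hypothetical helper: runs are (run_id, timestamp) pairs.
from datetime import datetime, timedelta


def runs_to_keep(runs, keep_last=None, keep_days=None, now=None):
    now = now or datetime.now()
    keep = set()
    if keep_last:
        newest_first = sorted(runs, key=lambda r: r[1], reverse=True)
        keep.update(run_id for run_id, _ in newest_first[:keep_last])
    if keep_days:
        cutoff = now - timedelta(days=keep_days)
        keep.update(run_id for run_id, ts in runs if ts >= cutoff)
    return keep  # everything not in this set is eligible for deletion
```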

Report Format Structure

report.md, report.html, and report.json use the same top-level section order: Run Overview, Docs Map, Docs Map Clusters, Section Match, optional Doc Pair Review, Docs Map Taxonomy, and Methodology.

For machine-readable output contracts, see JSON output.

At the top level, JSON keeps only run metadata, a compact summary, and the ordered report_structure; detailed payloads live under their owning sections.

Each detailed list is owned by one section. For example, topic documents live inside Docs Map, consolidation documents live inside Docs Map Clusters, and canonical labels/aliases live inside Docs Map Taxonomy. The report avoids "sample first, full list later" output; long lists are collapsible in Markdown/HTML and nested under the corresponding section in JSON.

Code findings use file paths relative to the scan root. That keeps JSON output stable across clone locations and prevents external artifact directories from affecting Code Review context.
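
For example, an agent-side consumer might resolve those relative paths back against the scan root. The snippet below is a hedged sketch: the findings[] array is part of the documented contract, but the individual field names used here (files, category) are placeholders; see the JSON output documentation for the real schema.

```python
import json
from pathlib import Path

scan_root = Path("/path/to/project")
report = json.loads(Path("report.json").read_text())

for finding in report.get("findings", []):
    # Field names are illustrative; resolve scan-root-relative paths.
    files = [scan_root / rel for rel in finding.get("files", [])]
    print(finding.get("category", "unknown"), [str(p) for p in files])
```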

How It Works

Code Pipeline

  1. Parse — tree-sitter extracts functions, classes, and methods
  2. Normalize — identifiers/literals replaced with placeholders; comments stripped
  3. Embed — API embeddings through LiteLLM or local sentence-transformers embeddings
  4. Compare — hybrid similarity (70% cosine + 30% token Jaccard) with size-ratio filtering
  5. Cluster — Union-Find groups similar pairs, scored by actionability
  6. Code Review (optional) — LLM classifies each cluster as refactor, review, or noise
  7. Escalate (with --verify) — deterministic policy keeps all review findings and only higher-value refactor findings
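
A minimal sketch of the hybrid score from steps 3-4 (illustrative; the real pipeline operates on normalized, embedded code units, and the exact size-ratio cutoff used here is an assumption):

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def jaccard(tokens_a: set[str], tokens_b: set[str]) -> float:
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def hybrid_score(emb_a, emb_b, tokens_a, tokens_b, min_size_ratio=0.5):
    # Size-ratio filter: skip pairs whose units differ too much in length.
    ratio = min(len(tokens_a), len(tokens_b)) / max(len(tokens_a), len(tokens_b))
    if ratio < min_size_ratio:
        return 0.0
    # 70% embedding cosine + 30% token Jaccard, as described above.
    return 0.7 * cosine(emb_a, emb_b) + 0.3 * jaccard(set(tokens_a), set(tokens_b))
```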

Docs Pipeline

  1. Chunk — split documents into heading-based sections
  2. Embed — API embeddings through LiteLLM or local sentence-transformers embeddings
  3. Section Match — hybrid similarity finds cross-document section overlap
  4. Docs Map descriptors (full stage) — LLM profiles each document with title, summary, aboutness labels, reader intents, role, audience, lifecycle, content type, surface, and canonicality
  5. Docs Map taxonomy (full stage) — deterministic matching plus optional LLM canonicalization turns raw aboutness/intent labels into a corpus-level canonical label taxonomy
  6. Docs Map discovery (full stage) — LLM builds a candidate topic tree, facets, diagnostics, and consolidation clusters
  7. Match intent pairs (full stage) — canonical labels are embedded to find related document pairs for optional deeper pair analysis
  8. Doc Pair Review (full stage) — LLM classifies selected related document pairs with action recommendations when within cost limits
  9. Docs Report Pack — markdown, HTML, and JSON share the same top-down structure: run overview, Docs Map, Docs Map Clusters, Section Match, optional Doc Pair Review, and Docs Map Taxonomy
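
A minimal sketch of the chunking in step 1, for Markdown ATX headings only (illustrative; dryscope also handles MDX, RST, AsciiDoc, and plaintext, and this sketch ignores `#` lines inside fenced code blocks):

```python
def chunk_markdown(text: str) -> list[dict]:
    """Split a Markdown document into heading-based sections."""
    sections, current = [], {"heading": "(preamble)", "lines": []}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            if current["lines"] or current["heading"] != "(preamble)":
                sections.append(current)
            current = {"heading": line.strip("# ").strip(), "lines": []}
        else:
            current["lines"].append(line)
    sections.append(current)
    return [
        {"heading": s["heading"], "text": "\n".join(s["lines"]).strip()}
        for s in sections
    ]
```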

What Good Output Looks Like

For code:

  • a small shortlist of refactor and review findings
  • exact or near-exact helpers extracted across files
  • borderline same-file or low-payoff duplicates left as review

For docs:

  • a Docs Map section showing topic groups, facets, and diagnostics
  • Docs Map clusters from canonical labels shared by multiple documents
  • Section Match recommendations only when section-level overlap exists
  • 0 Section Match recommendations on clean negative repos, while Docs Map may still report organizational signals
  • a few grouped Section Match recommendations on docs-heavy repos
  • one family recommendation for many near-identical sibling docs, rather than many pairwise duplicates

Documentation

  • Architecture — how the code, docs, reporting, cache, and CLI pieces fit together
  • Analysis — positioning, alternatives, benchmark notes, and product-readiness context
  • Process image brief — single-file brief for generating the dryscope engineering process diagram
  • JSON output — machine-readable output contracts for agents and scripts
  • Roadmap — forward-looking planning notes kept separate from architecture
  • Synthetic examples — small exposition-only examples for similarity, Code Match, Docs Map, and Section Match
  • Benchmark pack — public benchmark harness, artifact locations, and refresh commands
  • Benchmark quality report — readable TP/FP/FN summary generated from public labels
  • Agent guidance and Claude guidance — repository-specific instructions for coding agents
  • Packaged dryscope skill — skill instructions installed for agent workflows

Benchmarking

dryscope includes a checked-in public benchmark pack under benchmarks/README.md.

It only references public repositories and reviewed public labels. Private repo evaluation should remain local and out of the checked-in benchmark files.

The current benchmark evidence supports public alpha positioning: dryscope can find and narrow repeated implementation shapes in AI-generated or agent-oriented repositories. The labels are still intentionally sparse, so the benchmark pack should be read as regression evidence for the narrowing workflow, not as a precision/recall claim.

For quality assessment, run:

uv run python benchmarks/run_quality_report.py

That report scores generated benchmark outputs against curated public labels using TP/FP/FN, labeled precision, curated recall, F1, and precision@K/recall@K. True negatives are intentionally omitted because the non-duplicate search space is too large to enumerate.
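
As a sketch of how the headline numbers relate (the script itself is the source of truth for how findings are matched against labels), the metrics reduce to the standard definitions over labeled findings:

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Example: 8 labeled true positives, 2 false positives, 4 missed findings.
print(precision_recall_f1(8, 2, 4))  # approximately (0.8, 0.667, 0.727)
```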

Benchmark clones and generated outputs are kept outside the repo under ${DRYSCOPE_BENCHMARK_ROOT:-~/.dryscope/benchmarks} by default. Result and report directories are run-specific, and filenames/metadata identify benchmark inputs as <repo>@<commit>. The runners refuse to reuse a non-empty result directory unless --overwrite is passed.

License

MIT
