feat: add `diecut extract` command (engine-only) by raiderrobert · Pull Request #117 · raiderrobert/diecut

raiderrobert · 2026-02-27T05:32:02Z

Summary

Adds the diecut extract command — creates reusable templates from existing projects by replacing literal values with template variables.

This PR contains the extraction engine only (explicit --var flags). Auto-detection, interactive prompts, and conditional files are deferred to #118.

What it does

- `diecut extract ./my-project --var project_name=my-app --output ./template` scans the project, replaces matching values with `{{ project_name }}`, and writes a ready-to-use template
- Generates a `diecut.toml` config with discovered variables and computed variants (e.g. snake_case, PascalCase)
- Handles file/directory renaming, content stubbing for deep files (`--stub-depth`), and gitignore-style excludes
- Supports `--dry-run` to preview changes without writing
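
As an illustration of the explicit `--var name=value` form, here is a minimal sketch of how such an argument could be split and validated. The names `parse_var` and `VarParseError` are hypothetical and not taken from the PR; the only thing the commit list confirms is that malformed `--var` arguments get a dedicated error variant.

```rust
/// Hypothetical error type for a malformed `--var` argument.
#[derive(Debug, PartialEq)]
pub enum VarParseError {
    /// The argument contained no `=` separator.
    MissingEquals(String),
    /// The part before `=` was empty.
    EmptyName(String),
    /// The part after `=` was empty.
    EmptyValue(String),
}

/// Split `name=value` at the first `=`, so values may themselves contain `=`.
pub fn parse_var(arg: &str) -> Result<(String, String), VarParseError> {
    let (name, value) = arg
        .split_once('=')
        .ok_or_else(|| VarParseError::MissingEquals(arg.to_string()))?;
    let name = name.trim();
    if name.is_empty() {
        return Err(VarParseError::EmptyName(arg.to_string()));
    }
    if value.is_empty() {
        return Err(VarParseError::EmptyValue(arg.to_string()));
    }
    Ok((name.to_string(), value.to_string()))
}
```

Splitting at the first `=` only is a design choice of this sketch: it keeps values such as `a=b=c` intact as `b=c`.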

New modules

- `src/extract/`: orchestrator (`mod.rs`), scanner, replacer, config generator, variant detection, exclusion, content stubbing
- `src/commands/extract.rs`: CLI handler
- 27 new integration tests

Follow-up: #118 adds auto-detection, interactive prompts, and conditional files (~2,000 additional lines).

Test plan

- `cargo fmt --check` passes
- `cargo clippy -- -D warnings` passes
- All 226 tests pass (195 unit + 31 integration)
- Smoke tested: `diecut extract /tmp/test-project --var project_name=my-app --output /tmp/test-template`

…projects

Automates the biggest friction point in diecut: turning an existing project into a reusable template. Point it at a project, tell it which values are variables, and it produces a ready-to-use template with diecut.toml, .die suffixed files, and computed case variants.

Key capabilities:

- Auto-detects case variants (kebab, snake, PascalCase, SCREAMING_SNAKE, etc.)
- Longest-match-first replacement prevents overlapping value corruption
- Templates path components (my-app/src/ → {{ project_name }}/src/)
- Detects conditional files (.github/, Dockerfile, etc.) for optional inclusion
- Interactive by default with --batch for CI/scripting
- --dry-run to preview without writing
- Generates commented diecut.toml with prompted + computed variables

…sions Add a custom `camelcase` Tera filter that properly lowercases the first word and title-cases the rest (e.g., "my-app" -> "myApp"). Register it via tera_with_filters() in the prompt engine and render walker. Fix computed variable expressions in generated diecut.toml to include {{ }} delimiters so they evaluate as Tera templates rather than being treated as literal text.
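
The casing rule described above (lowercase the first word, title-case the rest) can be sketched as a plain function. This is an assumption about the behavior, not the PR's code: the set of separators and the function name `camelcase` as a free function are illustrative, and the Tera filter registration via `tera_with_filters()` is deliberately left out.

```rust
/// Convert a kebab-, snake-, or space-separated identifier to camelCase.
/// Assumed separators: '-', '_', and whitespace.
pub fn camelcase(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let words = input
        .split(|c: char| c == '-' || c == '_' || c.is_whitespace())
        .filter(|w| !w.is_empty());
    for (i, word) in words.enumerate() {
        let lower = word.to_lowercase();
        if i == 0 {
            // First word stays fully lowercase.
            out.push_str(&lower);
        } else {
            // Later words get an uppercase first character.
            let mut chars = lower.chars();
            if let Some(first) = chars.next() {
                out.extend(first.to_uppercase());
                out.push_str(chars.as_str());
            }
        }
    }
    out
}
```

Note that this sketch does not try to split an input that is already camel-cased: `myApp` contains no separator, so it is treated as one word and lowercased.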

Use word-boundary-aware matching so short variable values like "app" don't get replaced inside longer words like "application". A match is only accepted when the characters immediately before and after it are not word-like (alphanumeric, underscore, or hyphen).
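
A minimal sketch of that boundary check, assuming byte-offset matches and the word-like character set named above (alphanumeric, underscore, hyphen). The helper names `is_word_like` and `find_bounded` are hypothetical.

```rust
/// Characters that count as part of a word for boundary purposes.
fn is_word_like(c: char) -> bool {
    c.is_alphanumeric() || c == '_' || c == '-'
}

/// Return the byte offsets of every occurrence of `needle` in `haystack`
/// whose neighbouring characters are not word-like.
pub fn find_bounded(haystack: &str, needle: &str) -> Vec<usize> {
    if needle.is_empty() {
        return Vec::new();
    }
    haystack
        .match_indices(needle)
        .filter(|(start, m)| {
            // Character immediately before and after the candidate match.
            let before = haystack[..*start].chars().next_back();
            let after = haystack[start + m.len()..].chars().next();
            !before.is_some_and(is_word_like) && !after.is_some_and(is_word_like)
        })
        .map(|(start, _)| start)
        .collect()
}
```

With this rule, `app` is accepted in `an app,` but rejected inside `application` and after the hyphen in `my-app`. `match_indices` reports non-overlapping candidates only, which is sufficient for a sketch.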

Instead of repeating verbose inline filter chains like {{ project_name | replace(from="-", to=" ") | title | replace(from=" ", to="") }} in every template file, reference the computed variable names already defined in diecut.toml (e.g., {{ project_name_pascal }}). Extract is_canonical_variant() as a public helper to deduplicate the canonical-variant check between replacement rule building and computed variable generation.

Add 4-tier automatic variable detection for `diecut extract`:

- Tier 1: Directory name (0.95 confidence)
- Tier 2: Ecosystem configs: Cargo.toml, package.json, pyproject.toml, go.mod (0.85-0.90 confidence)
- Tier 3: Git metadata: remote org, user.name (0.65-0.70 confidence)
- Tier 4: Frequency analysis with Levenshtein merging (scored 0.30-1.0)

Auto-detection runs when no --var flags are provided. Includes noise filtering for language keywords, common libraries, file format words, and stopwords. Scoring emphasizes variant diversity to prefer identifiers that appear in multiple case forms.

Auto-detect now always runs when no --var is provided instead of requiring --auto. Renamed --batch to -y/--yes to align with CLI conventions. Added --min-confidence threshold flag. Name collisions from multiple detection sources are now preserved for interactive resolution instead of silently deduplicating.

Run cargo fmt to fix formatting issues and fix two clippy lints:

- Remove redundant closure in strip_email call
- Remove identity map on first.as_str()

… safety

Replace Regex::new() calls inside function bodies with std::sync::LazyLock statics so regexes are compiled once instead of on every invocation. Bumps MSRV to 1.80.

- Guard against infinite loop in merge-chain resolution by tracking visited nodes when walking the merge map
- Count path-only occurrences in file_count so confidence scoring doesn't miss values that appear only in file paths
- Rewrite apply_replacements as a single-pass algorithm that collects all match positions first, preventing later rules from corrupting Tera expressions inserted by earlier rules
- Propagate IO errors (e.g. permission denied) from scan instead of silently dropping unreadable files; only downgrade to binary on InvalidData (UTF-8 decode failure)
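
The single-pass idea can be sketched as follows: collect every match of every rule against the original input, resolve overlaps in favor of the longest match, then build the output once so inserted text is never rescanned. This is an illustrative reconstruction under those assumptions, not the PR's `apply_replacements`; the `(from, to)` rule shape is hypothetical, and word-boundary checks are omitted for brevity.

```rust
/// Apply `(from, to)` rules to `input` in one pass over collected matches.
pub fn apply_replacements(input: &str, rules: &[(&str, &str)]) -> String {
    // Every candidate match as (start, end, replacement), found in the ORIGINAL text.
    let mut matches: Vec<(usize, usize, &str)> = Vec::new();
    for &(from, to) in rules {
        if from.is_empty() {
            continue;
        }
        for (start, m) in input.match_indices(from) {
            matches.push((start, start + m.len(), to));
        }
    }

    // Longest match first; ties broken by earliest start.
    matches.sort_by(|a, b| (b.1 - b.0).cmp(&(a.1 - a.0)).then(a.0.cmp(&b.0)));

    // Keep a match only if it does not overlap one already accepted.
    let mut accepted: Vec<(usize, usize, &str)> = Vec::new();
    for m in matches {
        if accepted.iter().all(|a| m.1 <= a.0 || m.0 >= a.1) {
            accepted.push(m);
        }
    }
    accepted.sort_by_key(|m| m.0);

    // Build the output once; replacement text is never scanned again.
    let mut out = String::with_capacity(input.len());
    let mut cursor = 0;
    for (start, end, to) in accepted {
        out.push_str(&input[cursor..start]);
        out.push_str(to);
        cursor = end;
    }
    out.push_str(&input[cursor..]);
    out
}
```

Because matching happens before any text is rewritten, rules `a → b` and `b → c` applied to `ab` give `bc` rather than `cc`. The overlap check here is quadratic in the number of matches, which is fine for a sketch but would likely need an interval structure for large inputs.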

detect_excludes only checked if exclude patterns existed at the project root, missing patterns like node_modules at deeper levels (e.g. docs/node_modules/). Always include all DEFAULT_EXCLUDES since should_exclude already handles nested matching via path components. Also skip symlinks that resolve to directories during scan. pnpm's node_modules/.pnpm uses symlinks to directories, and walkdir reports these as non-directory entries, causing read_to_string to fail with "Is a directory".
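
To show why component-wise matching catches nested directories such as docs/node_modules/, here is a simplified sketch. It assumes exact name comparison against each path component, whereas the PR describes gitignore-style excludes, so the real `should_exclude` is presumably richer than this.

```rust
use std::path::{Component, Path};

/// Return true if any component of `rel_path` exactly equals an exclude name.
/// Simplification: no globbing or gitignore semantics.
pub fn should_exclude(rel_path: &Path, excludes: &[&str]) -> bool {
    rel_path.components().any(|c| match c {
        Component::Normal(name) => name.to_str().is_some_and(|n| excludes.contains(&n)),
        // Root, prefix, `.` and `..` components never match an exclude name.
        _ => false,
    })
}
```

Since every component is checked, `node_modules` is excluded at any depth without needing to exist at the project root, while a file merely containing the name (for example `node_modules_notes.md`) is kept.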

Git worktrees are working copies, not part of the project source. Without this, extract would template duplicate files from any active worktrees in the project.

Classify text files with 0 template replacements as boilerplate (config, dotfiles, CI) or content (prose, source). Boilerplate is copied in full; content files are stubbed to minimal placeholders so templates preserve structure without project-specific prose. Interactive confirmation now shows three categories: Templated, Boilerplate, and Stubbed.

The rename of detect_excludes to all_default_excludes and the new relevant_config_excludes function were already referenced by mod.rs but the file itself was not staged in the previous commit.

Content files deeper than N path components (default 2) are now dropped entirely instead of being stubbed. Shallow content files like README.md or docs/guide.md are still stubbed as before. The threshold is configurable via --stub-depth.
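
The depth rule can be sketched as below, assuming depth counts every component of the relative path including the file name, which is consistent with README.md (1) and docs/guide.md (2) both being stubbed at the default of 2. The `ContentAction` enum and `content_action` function are hypothetical names.

```rust
use std::path::Path;

/// What to do with a content file that has no template value.
#[derive(Debug, PartialEq)]
pub enum ContentAction {
    /// Keep the file as a minimal placeholder.
    Stub,
    /// Leave the file out of the template entirely.
    Drop,
}

/// Decide between stubbing and dropping based on path depth.
pub fn content_action(rel_path: &Path, stub_depth: usize) -> ContentAction {
    // Depth is the number of path components, file name included.
    let depth = rel_path.components().count();
    if depth > stub_depth {
        ContentAction::Drop
    } else {
        ContentAction::Stub
    }
}
```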

- Extract 6 interactive UI functions from mod.rs into interactive.rs
- Deduplicate config parsers with push_config_candidate helper
- Replace Tier 4 frequency analysis (~770 lines of noise-filter lists) with a ~60-line multi-variant heuristic requiring ≥2 case forms
- Remove strsim dependency (no longer needed)

Files deeper than stub_depth were only dropped when they had 0 template replacements. Deep files with incidental replacements (e.g. a project name appearing in a nested reference doc) were still kept as .die templates. Now the depth check applies regardless of replacement count.

Non-boilerplate files deeper than stub_depth are now removed from the scan result before frequency analysis runs. This prevents detecting variables that only appear in files that would be dropped anyway.

cloudflare-workers-and-pages · 2026-03-04T16:15:16Z

Deploying diecut with Cloudflare Pages

Latest commit:	`2f62404`
Status:	✅ Deploy successful!
Preview URL:	https://9515e3bd.diecut.pages.dev
Branch Preview URL:	https://feat-extract-auto-detect.diecut.pages.dev

Remove auto-detect, interactive prompts, and conditional files to reduce PR scope. These features are preserved on feat/extract-auto-detect for a follow-up PR.

- Delete auto_detect.rs (1,140 lines), interactive.rs (426 lines), conditional.rs (170 lines)
- Remove --yes and --min-confidence CLI flags
- Move count_occurrences to scan.rs (test-only)
- Remove 4 auto-detect integration tests
- Strip dead params and deduplicate DEFAULT_EXCLUDES

Defer variants, stub/drop, copy-without-render, camelcase filter, and config_gen module to a follow-up PR. Inline minimal config generation. Remove --stub-depth flag. 991 lines changed (down from 2,539).

Move exclude patterns to default_excludes.txt (Rust + macOS only). Add --exclude-from flag to use a custom exclude file. Replace all_default_excludes() with load_excludes(Option<&Path>).
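
A rough sketch of the embedded-defaults-with-override pattern follows. Only the `load_excludes(Option<&Path>)` shape comes from the commit message; the return type, the comment and blank-line handling, and the sample patterns in `DEFAULT_EXCLUDES` are assumptions. In the real code the defaults would presumably be embedded from default_excludes.txt (for example with `include_str!`), whose actual contents are not shown in the PR.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Stand-in for the embedded default_excludes.txt; these patterns are assumed.
const DEFAULT_EXCLUDES: &str = "# Rust\ntarget/\n\n# macOS\n.DS_Store\n";

/// Parse one pattern per line, skipping blank lines and `#` comments.
fn parse_excludes(text: &str) -> Vec<String> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .map(String::from)
        .collect()
}

/// Load patterns from a custom file if given, otherwise from the embedded defaults.
pub fn load_excludes(custom: Option<&Path>) -> io::Result<Vec<String>> {
    match custom {
        Some(path) => Ok(parse_excludes(&fs::read_to_string(path)?)),
        None => Ok(parse_excludes(DEFAULT_EXCLUDES)),
    }
}
```

In this sketch a custom file replaces the defaults rather than extending them; whether `--exclude-from` merges or replaces is not stated in the source.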

Cover word-boundary matching, longest-match-first ordering, overlap resolution, no-rescan guarantee, Unicode handling, and path replacements.

raiderrobert and others added 25 commits February 27, 2026 00:31

fix: resolve cargo fmt and clippy warnings

375616a

fix(extract): resolve merge chains in cluster deduplication

cc93d61
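
Merge-chain resolution with a cycle guard, as mentioned in the notes on tracking visited nodes, might look like the sketch below. The map shape (`HashMap<String, String>` from a merged name to its target) and the function name `resolve_merge` are assumptions for illustration.

```rust
use std::collections::{HashMap, HashSet};

/// Follow `merge_map` from `start` until a name with no further target is
/// reached, or until a name repeats (which indicates a cycle).
pub fn resolve_merge<'a>(merge_map: &'a HashMap<String, String>, start: &'a str) -> &'a str {
    let mut visited: HashSet<&str> = HashSet::new();
    let mut current = start;
    // `insert` returns false when `current` was already seen, ending the walk.
    while visited.insert(current) {
        match merge_map.get(current) {
            Some(next) => current = next.as_str(),
            None => break,
        }
    }
    current
}
```

For a chain a → b → c this returns `c`. For a cycle a → b → c → a it stops when it arrives back at `a` instead of looping forever.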

refactor(extract): use enum for PlannedExtractFile content

bdd420b

fix(extract): replace partial_cmp().unwrap() with total_cmp() for NaN…

917ea77

… safety
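
For context on this change: `partial_cmp` returns `None` when either operand is NaN, so `.unwrap()` on it can panic during a sort, while `f64::total_cmp` defines an order for every value. A minimal sketch, assuming candidates are `(name, confidence)` pairs (a hypothetical shape, not the PR's type):

```rust
/// Sort candidates by confidence, highest first, without panicking on NaN.
pub fn sort_by_confidence_desc(scores: &mut [(String, f64)]) {
    // total_cmp implements the IEEE 754 total order, so NaN is comparable.
    scores.sort_by(|a, b| b.1.total_cmp(&a.1));
}
```

Under the total order a NaN lands at one end of the list depending on its sign bit, and all ordinary scores keep their expected relative order.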

fix(extract): use dedicated error variant for malformed --var arguments

a35eb2f

fix(extract): disable git terminal prompts during auto-detection

9bb5be9

refactor(extract): consolidate duplicate count_occurrences functions

b490f8c

perf(extract): use LazyLock for Regex compilation

c163f04

fix(extract): exclude .worktrees/ from template extraction

e28b004

fix(extract): commit missing exclude.rs refactor

fc75d15

refactor: autodetect

b0d69da

fix(extract): filter deep files before auto-detect

0760e23

refactor: improve extraction

2f62404

raiderrobert changed the title ~~feat: add diecut extract command~~ feat: add diecut extract command (engine-only) Mar 4, 2026

raiderrobert mentioned this pull request Mar 4, 2026

feat(extract): add auto-detection, interactive prompts, and conditional files #123

Open

5 tasks

refactor(extract): trim to verbatim-only for PR review

094222a

raiderrobert added 2 commits March 4, 2026 23:24

refactor(extract): load excludes from embedded file with override

e9e76b8

test(extract): add unit tests for replace.rs

13c313a

raiderrobert wants to merge 29 commits into main from feat/extract-command
