feat: add diecut extract command (engine-only) #117

Open
raiderrobert wants to merge 29 commits into main from feat/extract-command

Conversation

raiderrobert commented Feb 27, 2026

Summary

Adds the diecut extract command — creates reusable templates from existing projects by replacing literal values with template variables.

This PR contains the extraction engine only (explicit --var flags). Auto-detection, interactive prompts, and conditional files are deferred to #118.

What it does

  • diecut extract ./my-project --var project_name=my-app --output ./template scans the project, replaces matching values with {{ project_name }}, and writes a ready-to-use template
  • Generates diecut.toml config with discovered variables and computed variants (e.g. snake_case, PascalCase)
  • Handles file/directory renaming, content stubbing for deep files (--stub-depth), and gitignore-style excludes
  • Supports --dry-run to preview changes without writing

New modules

  • src/extract/ — orchestrator (mod.rs), scanner, replacer, config generator, variant detection, exclusion, content stubbing
  • src/commands/extract.rs — CLI handler
  • 27 new integration tests

Follow-up: #118 adds auto-detection, interactive prompts, and conditional files (~2,000 additional lines).

Test plan

  • cargo fmt --check passes
  • cargo clippy -- -D warnings passes
  • All 226 tests pass (195 unit + 31 integration)
  • Smoke tested: diecut extract /tmp/test-project --var project_name=my-app --output /tmp/test-template

raiderrobert and others added 25 commits February 27, 2026 00:31
…projects

Automates the biggest friction point in diecut: turning an existing project
into a reusable template. Point it at a project, tell it which values are
variables, and it produces a ready-to-use template with diecut.toml,
.die-suffixed files, and computed case variants.

Key capabilities:
- Auto-detects case variants (kebab, snake, PascalCase, SCREAMING_SNAKE, etc.)
- Longest-match-first replacement prevents overlapping value corruption
- Templates path components (my-app/src/ → {{ project_name }}/src/)
- Detects conditional files (.github/, Dockerfile, etc.) for optional inclusion
- Interactive by default with --batch for CI/scripting
- --dry-run to preview without writing
- Generates commented diecut.toml with prompted + computed variables
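The path templating mentioned above (my-app/src/ → {{ project_name }}/src/) could look roughly like the sketch below. `template_path` and its parameters are hypothetical names, not taken from the PR, and the sketch uses a plain substring replace per path component, without the word-boundary handling added in a later commit.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: rewrite each path component, replacing a literal
/// value with a template expression. Components are rebuilt into a PathBuf.
fn template_path(relative: &Path, literal: &str, expr: &str) -> PathBuf {
    relative
        .iter()
        // Each component is handled on its own, so separators are never touched.
        .map(|part| part.to_string_lossy().replace(literal, expr))
        .collect()
}
```

Called with `my-app/src/main.rs`, the literal `my-app`, and the expression `{{ project_name }}`, this yields `{{ project_name }}/src/main.rs`.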
…sions

Add a custom `camelcase` Tera filter that properly lowercases the first
word and title-cases the rest (e.g., "my-app" -> "myApp"). Register it
via tera_with_filters() in the prompt engine and render walker.
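A rough sketch of the case conversion described above, written as a plain function so it needs only the standard library. The Tera filter registration is left out, and the word-splitting rules (hyphen, underscore, whitespace) are an assumption rather than something the commit states.

```rust
/// Lowercase the first word and title-case the rest, e.g. "my-app" -> "myApp".
/// Word separators here (hyphen, underscore, whitespace) are assumed.
fn camelcase(input: &str) -> String {
    let mut out = String::new();
    let words = input
        .split(|c: char| c == '-' || c == '_' || c.is_whitespace())
        .filter(|w| !w.is_empty());
    for (i, word) in words.enumerate() {
        let lower = word.to_lowercase();
        if i == 0 {
            // First word stays fully lowercase.
            out.push_str(&lower);
        } else {
            // Later words get an uppercase first character.
            let mut chars = lower.chars();
            if let Some(first) = chars.next() {
                out.extend(first.to_uppercase());
                out.push_str(chars.as_str());
            }
        }
    }
    out
}
```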

Fix computed variable expressions in generated diecut.toml to include
{{ }} delimiters so they evaluate as Tera templates rather than being
treated as literal text.

Use word-boundary-aware matching so short variable values like "app"
don't get replaced inside longer words like "application". A match is
only accepted when the characters immediately before and after it are
not word-like (alphanumeric, underscore, or hyphen).
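The boundary rule above can be sketched as follows. The function names are hypothetical; only the definition of "word-like" (alphanumeric, underscore, or hyphen) comes from the commit message.

```rust
/// Characters the commit message treats as word-like.
fn is_word_like(c: char) -> bool {
    c.is_alphanumeric() || c == '_' || c == '-'
}

/// Hypothetical helper: byte offsets where `needle` occurs in `haystack`
/// with no word-like character immediately before or after it.
fn find_bounded(haystack: &str, needle: &str) -> Vec<usize> {
    let mut hits = Vec::new();
    if needle.is_empty() {
        return hits;
    }
    for (start, _) in haystack.match_indices(needle) {
        let end = start + needle.len();
        // Start or end of input counts as a boundary.
        let before_ok = haystack[..start]
            .chars()
            .next_back()
            .map_or(true, |c| !is_word_like(c));
        let after_ok = haystack[end..]
            .chars()
            .next()
            .map_or(true, |c| !is_word_like(c));
        if before_ok && after_ok {
            hits.push(start);
        }
    }
    hits
}
```

With this rule, "app" is found in `name = "app"` but not inside "application" or "my-app", since the hyphen counts as word-like.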
Instead of repeating verbose inline filter chains like
{{ project_name | replace(from="-", to=" ") | title | replace(from=" ", to="") }}
in every template file, reference the computed variable names already
defined in diecut.toml (e.g., {{ project_name_pascal }}).

Extract is_canonical_variant() as a public helper to deduplicate the
canonical-variant check between replacement rule building and computed
variable generation.

Add 4-tier automatic variable detection for `diecut extract`:
- Tier 1: Directory name (0.95 confidence)
- Tier 2: Ecosystem configs - Cargo.toml, package.json, pyproject.toml,
  go.mod (0.85-0.90 confidence)
- Tier 3: Git metadata - remote org, user.name (0.65-0.70 confidence)
- Tier 4: Frequency analysis with Levenshtein merging (scored 0.30-1.0)

Auto-detection runs when no --var flags are provided. Includes noise
filtering for language keywords, common libraries, file format words,
and stopwords. Scoring emphasizes variant diversity to prefer
identifiers that appear in multiple case forms.

Auto-detect now always runs when no --var is provided instead of
requiring --auto. Renamed --batch to -y/--yes to align with CLI
conventions. Added --min-confidence threshold flag. Name collisions
from multiple detection sources are now preserved for interactive
resolution instead of silently deduplicating.

Run cargo fmt to fix formatting issues and fix two clippy lints:
- Remove redundant closure in strip_email call
- Remove identity map on first.as_str()

Replace Regex::new() calls inside function bodies with
std::sync::LazyLock statics so regexes are compiled once
instead of on every invocation. Bumps MSRV to 1.80.
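A minimal illustration of the LazyLock pattern, assuming Rust 1.80 or later. To stay within the standard library, a HashSet stands in for the compiled Regex; the static name, the counter, and `is_keyword` are all made up for this sketch. The counter is only there to show that the initializer runs once.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

/// Counts how many times the initializer below has run (illustration only).
static INIT_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for a compiled regex: built on first access, then reused.
static KEYWORDS: LazyLock<HashSet<&'static str>> = LazyLock::new(|| {
    INIT_COUNT.fetch_add(1, Ordering::SeqCst);
    HashSet::from(["fn", "let", "struct"])
});

/// Every call shares the same lazily built set.
fn is_keyword(word: &str) -> bool {
    KEYWORDS.contains(word)
}
```

With the regex crate, the same shape would presumably hold with a `LazyLock<Regex>` whose initializer calls `Regex::new(...)`.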
- Guard against infinite loop in merge-chain resolution by tracking
  visited nodes when walking the merge map
- Count path-only occurrences in file_count so confidence scoring
  doesn't miss values that appear only in file paths
- Rewrite apply_replacements as a single-pass algorithm that collects
  all match positions first, preventing later rules from corrupting
  Tera expressions inserted by earlier rules
- Propagate IO errors (e.g. permission denied) from scan instead of
  silently dropping unreadable files; only downgrade to binary on
  InvalidData (UTF-8 decode failure)
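The last bullet's error handling might be sketched like this. `FileContent` and `read_for_scan` are hypothetical names. The behaviour relied on is that `std::fs::read_to_string` reports invalid UTF-8 as `ErrorKind::InvalidData`.

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::path::Path;

/// Hypothetical scan result for a single file.
#[derive(Debug, PartialEq)]
enum FileContent {
    Text(String),
    Binary,
}

/// Read a file for scanning. Invalid UTF-8 downgrades the file to Binary;
/// any other IO error (permission denied, not found, ...) is propagated.
fn read_for_scan(path: &Path) -> io::Result<FileContent> {
    match fs::read_to_string(path) {
        Ok(text) => Ok(FileContent::Text(text)),
        Err(e) if e.kind() == ErrorKind::InvalidData => Ok(FileContent::Binary),
        Err(e) => Err(e),
    }
}
```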
detect_excludes only checked if exclude patterns existed at the project
root, missing patterns like node_modules at deeper levels (e.g.
docs/node_modules/). Always include all DEFAULT_EXCLUDES since
should_exclude already handles nested matching via path components.
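Nested matching via path components could work along these lines. `should_exclude` is named in the commit, but this signature and the exact-name comparison are assumptions; real gitignore-style patterns would need glob handling that is not shown here.

```rust
use std::path::{Component, Path};

/// Returns true if any component of `path` equals one of the exclude names,
/// so "docs/node_modules/pkg" matches "node_modules" at any depth.
fn should_exclude(path: &Path, excludes: &[&str]) -> bool {
    path.components().any(|c| match c {
        Component::Normal(name) => name
            .to_str()
            .map_or(false, |n| excludes.contains(&n)),
        // Root, prefix, "." and ".." components never match an exclude name.
        _ => false,
    })
}
```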

Also skip symlinks that resolve to directories during scan. pnpm's
node_modules/.pnpm uses symlinks to directories, and walkdir reports
these as non-directory entries, causing read_to_string to fail with
"Is a directory".

Git worktrees are working copies, not part of the project source.
Without this, extract would template duplicate files from any
active worktrees in the project.

Classify text files with 0 template replacements as boilerplate
(config, dotfiles, CI) or content (prose, source). Boilerplate is
copied in full; content files are stubbed to minimal placeholders
so templates preserve structure without project-specific prose.

Interactive confirmation now shows three categories: Templated,
Boilerplate, and Stubbed.

The rename of detect_excludes to all_default_excludes and the new
relevant_config_excludes function were already referenced by mod.rs
but the file itself was not staged in the previous commit.

Content files deeper than N path components (default 2) are now dropped
entirely instead of being stubbed. Shallow content files like README.md
or docs/guide.md are still stubbed as before. The threshold is
configurable via --stub-depth.
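One way to read the depth rule in this commit, as a sketch: the type and function names are hypothetical, and counting the file name itself as a path component is an assumption chosen so that README.md (1) and docs/guide.md (2) stay within the default of 2.

```rust
use std::path::Path;

/// Hypothetical outcome for a content file with no template replacements.
#[derive(Debug, PartialEq)]
enum ContentAction {
    Stub,
    Drop,
}

/// Files with more than `stub_depth` path components are dropped;
/// shallower content files are stubbed.
fn classify_content_file(relative: &Path, stub_depth: usize) -> ContentAction {
    let depth = relative.components().count();
    if depth > stub_depth {
        ContentAction::Drop
    } else {
        ContentAction::Stub
    }
}
```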
- Extract 6 interactive UI functions from mod.rs into interactive.rs
- Deduplicate config parsers with push_config_candidate helper
- Replace Tier 4 frequency analysis (~770 lines of noise-filter lists)
  with a ~60-line multi-variant heuristic requiring ≥2 case forms
- Remove strsim dependency (no longer needed)

Files deeper than stub_depth were only dropped when they had 0 template
replacements. Deep files with incidental replacements (e.g. a project
name appearing in a nested reference doc) were still kept as .die
templates. Now the depth check applies regardless of replacement count.

Non-boilerplate files deeper than stub_depth are now removed from the
scan result before frequency analysis runs. This prevents detecting
variables that only appear in files that would be dropped anyway.

cloudflare-workers-and-pages bot commented Mar 4, 2026

Deploying diecut with Cloudflare Pages

Latest commit: 2f62404
Status: ✅  Deploy successful!
Preview URL: https://9515e3bd.diecut.pages.dev
Branch Preview URL: https://feat-extract-auto-detect.diecut.pages.dev

Remove auto-detect, interactive prompts, and conditional files to
reduce PR scope. These features are preserved on feat/extract-auto-detect
for a follow-up PR.

- Delete auto_detect.rs (1,140 lines), interactive.rs (426 lines),
  conditional.rs (170 lines)
- Remove --yes and --min-confidence CLI flags
- Move count_occurrences to scan.rs (test-only)
- Remove 4 auto-detect integration tests
- Strip dead params and deduplicate DEFAULT_EXCLUDES

raiderrobert changed the title from "feat: add diecut extract command" to "feat: add diecut extract command (engine-only)" on Mar 4, 2026

Defer variants, stub/drop, copy-without-render, camelcase filter,
and config_gen module to a follow-up PR. Inline minimal config
generation. Remove --stub-depth flag.

991 lines changed (down from 2,539).

Move exclude patterns to default_excludes.txt (Rust + macOS only).
Add --exclude-from flag to use a custom exclude file.
Replace all_default_excludes() with load_excludes(Option<&Path>).
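A sketch of how load_excludes(Option<&Path>) might behave. The return type, the comment and blank-line handling, and the placeholder default list are assumptions; the actual contents of default_excludes.txt are not shown in this PR.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Placeholder for the bundled default_excludes.txt (real contents not shown).
const DEFAULT_EXCLUDES: &str = "# Rust\ntarget/\n\n# macOS\n.DS_Store\n";

/// Parse one pattern per line, skipping blank lines and '#' comments.
fn parse_excludes(text: &str) -> Vec<String> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .map(String::from)
        .collect()
}

/// Use the custom file when one is given (as with --exclude-from),
/// otherwise fall back to the bundled defaults.
fn load_excludes(custom: Option<&Path>) -> io::Result<Vec<String>> {
    match custom {
        Some(path) => Ok(parse_excludes(&fs::read_to_string(path)?)),
        None => Ok(parse_excludes(DEFAULT_EXCLUDES)),
    }
}
```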
Cover word-boundary matching, longest-match-first ordering, overlap
resolution, no-rescan guarantee, Unicode handling, and path replacements.
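The properties these tests cover (longest-match-first ordering, overlap resolution, no rescanning of inserted text) can be illustrated with a simplified single-pass sketch. `Rule` and this `apply_replacements` signature are hypothetical, and the word-boundary check is omitted to keep the example short.

```rust
/// Hypothetical rule: a literal value and the template expression replacing it.
struct Rule<'a> {
    literal: &'a str,
    template: &'a str,
}

/// Collect all matches against the original text, keep the longest
/// non-overlapping ones, then build the output in one left-to-right pass.
fn apply_replacements(input: &str, rules: &[Rule<'_>]) -> String {
    // (start, end, rule index) for every occurrence of every literal.
    let mut matches: Vec<(usize, usize, usize)> = Vec::new();
    for (i, rule) in rules.iter().enumerate() {
        if rule.literal.is_empty() {
            continue;
        }
        for (start, _) in input.match_indices(rule.literal) {
            matches.push((start, start + rule.literal.len(), i));
        }
    }
    // Longest match first; ties broken by earliest start.
    matches.sort_by(|a, b| (b.1 - b.0).cmp(&(a.1 - a.0)).then(a.0.cmp(&b.0)));
    let mut accepted: Vec<(usize, usize, usize)> = Vec::new();
    for m in matches {
        // Keep a match only if it overlaps nothing already accepted.
        if accepted.iter().all(|a| m.1 <= a.0 || m.0 >= a.1) {
            accepted.push(m);
        }
    }
    accepted.sort_by_key(|m| m.0);
    // Inserted templates are never searched again, so they cannot be corrupted.
    let mut out = String::with_capacity(input.len());
    let mut cursor = 0;
    for (start, end, i) in accepted {
        out.push_str(&input[cursor..start]);
        out.push_str(rules[i].template);
        cursor = end;
    }
    out.push_str(&input[cursor..]);
    out
}
```

Because every match position comes from the original input, a rule for "name" cannot fire inside a `{{ project_name }}` expression that another rule inserted.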