feat: add diecut extract command (engine-only) #117
Open
raiderrobert wants to merge 29 commits into main
…projects
Automates the biggest friction point in diecut: turning an existing project
into a reusable template. Point it at a project, tell it which values are
variables, and it produces a ready-to-use template with diecut.toml,
.die-suffixed files, and computed case variants.
Key capabilities:
- Auto-detects case variants (kebab, snake, PascalCase, SCREAMING_SNAKE, etc.)
- Longest-match-first replacement prevents overlapping value corruption
- Templates path components (my-app/src/ → {{ project_name }}/src/)
- Detects conditional files (.github/, Dockerfile, etc.) for optional inclusion
- Interactive by default with --batch for CI/scripting
- --dry-run to preview without writing
- Generates commented diecut.toml with prompted + computed variables
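As a rough illustration of the case-variant idea listed above, the sketch below derives four variants from one value. The function names (`words`, `capitalize`, `case_variants`) and the variant labels are hypothetical and are not taken from the diecut source; the real detection logic may differ.

```rust
/// Split a value into lowercase words on '-', '_' and spaces.
/// (Assumption: case boundaries such as "MyApp" are not split here.)
fn words(value: &str) -> Vec<String> {
    value
        .split(|c: char| c == '-' || c == '_' || c == ' ')
        .filter(|w| !w.is_empty())
        .map(|w| w.to_lowercase())
        .collect()
}

/// Uppercase the first character of a word and keep the rest unchanged.
fn capitalize(word: &str) -> String {
    let mut chars = word.chars();
    match chars.next() {
        Some(first) => first.to_uppercase().collect::<String>() + chars.as_str(),
        None => String::new(),
    }
}

/// Produce (label, value) pairs for a handful of case variants.
fn case_variants(value: &str) -> Vec<(&'static str, String)> {
    let w = words(value);
    vec![
        ("kebab", w.join("-")),
        ("snake", w.join("_")),
        ("pascal", w.iter().map(|s| capitalize(s)).collect::<String>()),
        ("screaming_snake", w.join("_").to_uppercase()),
    ]
}
```

For `my-app`, this yields `my-app`, `my_app`, `MyApp`, and `MY_APP`.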
…sions
Add a custom `camelcase` Tera filter that properly lowercases the first
word and title-cases the rest (e.g., "my-app" -> "myApp"). Register it
via tera_with_filters() in the prompt engine and render walker.
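The conversion this filter performs can be sketched as a plain function, independent of Tera. This is an illustration of the described behaviour only: the free function `camelcase` below is hypothetical, and the actual filter is registered through Tera's filter API, which is not shown.

```rust
/// Lowercase the first word and title-case the rest, joining without separators.
/// Assumption: words are separated by '-', '_' or whitespace only.
fn camelcase(input: &str) -> String {
    let mut out = String::new();
    let parts = input
        .split(|c: char| c == '-' || c == '_' || c.is_whitespace())
        .filter(|w| !w.is_empty());
    for (i, word) in parts.enumerate() {
        let lower = word.to_lowercase();
        if i == 0 {
            // First word stays fully lowercase.
            out.push_str(&lower);
        } else {
            // Later words get an uppercase first character.
            let mut chars = lower.chars();
            if let Some(first) = chars.next() {
                out.extend(first.to_uppercase());
                out.push_str(chars.as_str());
            }
        }
    }
    out
}
```

With this sketch, `camelcase("my-app")` gives `myApp`. An input that is already PascalCase (for example `MyApp`) is treated as a single word and lowercased, which a real filter might handle differently.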
Fix computed variable expressions in generated diecut.toml to include
{{ }} delimiters so they evaluate as Tera templates rather than being
treated as literal text.
Use word-boundary-aware matching so short variable values like "app" don't get replaced inside longer words like "application". A match is only accepted when the characters immediately before and after it are not word-like (alphanumeric, underscore, or hyphen).
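A minimal sketch of that boundary check, assuming a plain substring search; `is_word_like` and `find_bounded` are illustrative names, not the functions in the PR.

```rust
/// Characters that count as part of a word for boundary purposes:
/// alphanumeric, underscore, or hyphen.
fn is_word_like(c: char) -> bool {
    c.is_alphanumeric() || c == '_' || c == '-'
}

/// Return the byte offsets of every occurrence of `needle` in `haystack`
/// whose neighbouring characters are not word-like.
fn find_bounded(haystack: &str, needle: &str) -> Vec<usize> {
    let mut hits = Vec::new();
    if needle.is_empty() {
        return hits;
    }
    for (start, _) in haystack.match_indices(needle) {
        let end = start + needle.len();
        // Look at the character just before and just after the match.
        let before = haystack[..start].chars().next_back();
        let after = haystack[end..].chars().next();
        let ok_before = before.map_or(true, |c| !is_word_like(c));
        let ok_after = after.map_or(true, |c| !is_word_like(c));
        if ok_before && ok_after {
            hits.push(start);
        }
    }
    hits
}
```

In `"app = application; run app"`, only the standalone occurrences at offsets 0 and 23 are accepted. The `app` inside `application` is rejected, and so is the `app` in `my-app`, because the hyphen counts as word-like.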
Instead of repeating verbose inline filter chains like
{{ project_name | replace(from="-", to=" ") | title | replace(from=" ", to="") }}
in every template file, reference the computed variable names already
defined in diecut.toml (e.g., {{ project_name_pascal }}).
Extract is_canonical_variant() as a public helper to deduplicate the
canonical-variant check between replacement rule building and computed
variable generation.
Add 4-tier automatic variable detection for `diecut extract`:
- Tier 1: Directory name (0.95 confidence)
- Tier 2: Ecosystem configs - Cargo.toml, package.json, pyproject.toml, go.mod (0.85-0.90 confidence)
- Tier 3: Git metadata - remote org, user.name (0.65-0.70 confidence)
- Tier 4: Frequency analysis with Levenshtein merging (scored 0.30-1.0)
Auto-detection runs when no --var flags are provided. Includes noise filtering for language keywords, common libraries, file format words, and stopwords. Scoring emphasizes variant diversity to prefer identifiers that appear in multiple case forms.
Auto-detect now always runs when no --var is provided instead of requiring --auto. Renamed --batch to -y/--yes to align with CLI conventions. Added --min-confidence threshold flag. Name collisions from multiple detection sources are now preserved for interactive resolution instead of silently deduplicating.
Run cargo fmt to fix formatting issues and fix two clippy lints:
- Remove redundant closure in strip_email call
- Remove identity map on first.as_str()
Replace Regex::new() calls inside function bodies with std::sync::LazyLock statics so regexes are compiled once instead of on every invocation. Bumps MSRV to 1.80.
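The pattern can be sketched with the standard library alone. Because the `regex` crate is not available in a self-contained snippet, `compile_pattern` below is a hypothetical stand-in for `Regex::new(...)`, and the counter exists only to show that initialization happens once. `SEPARATORS` and `split_words` are likewise illustrative names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

/// Counts how many times the "expensive" compile step runs.
static COMPILE_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for an expensive constructor such as Regex::new(..).unwrap().
fn compile_pattern(src: &str) -> Vec<char> {
    COMPILE_COUNT.fetch_add(1, Ordering::SeqCst);
    src.chars().collect()
}

/// Initialized lazily on first access, then reused by every caller.
/// Requires Rust 1.80 or later for std::sync::LazyLock.
static SEPARATORS: LazyLock<Vec<char>> = LazyLock::new(|| compile_pattern("-_ "));

/// Split a value on any separator character, skipping empty pieces.
fn split_words(value: &str) -> Vec<&str> {
    value
        .split(|c: char| SEPARATORS.contains(&c))
        .filter(|w| !w.is_empty())
        .collect()
}
```

However many times `split_words` is called, `compile_pattern` runs once, which is the property the commit relies on for its regexes.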
- Guard against infinite loop in merge-chain resolution by tracking visited nodes when walking the merge map
- Count path-only occurrences in file_count so confidence scoring doesn't miss values that appear only in file paths
- Rewrite apply_replacements as a single-pass algorithm that collects all match positions first, preventing later rules from corrupting Tera expressions inserted by earlier rules
- Propagate IO errors (e.g. permission denied) from scan instead of silently dropping unreadable files; only downgrade to binary on InvalidData (UTF-8 decode failure)
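The single-pass idea in the third item can be sketched as follows. This is an assumption-laden illustration, not the PR's `apply_replacements`: rules are modelled as `(literal, replacement)` pairs, word-boundary checks are omitted, and overlaps are resolved by preferring the longest match.

```rust
/// Apply all rules in one pass over `input`.
/// Every match position is collected against the original text first,
/// so text inserted by one rule is never rescanned by another.
fn apply_replacements(input: &str, rules: &[(&str, &str)]) -> String {
    // (start, end, rule index) for every occurrence of every literal.
    let mut matches: Vec<(usize, usize, usize)> = Vec::new();
    for (idx, (literal, _)) in rules.iter().enumerate() {
        if literal.is_empty() {
            continue;
        }
        for (start, m) in input.match_indices(*literal) {
            matches.push((start, start + m.len(), idx));
        }
    }

    // Longest match first; ties broken by earliest start.
    matches.sort_by(|a, b| (b.1 - b.0).cmp(&(a.1 - a.0)).then(a.0.cmp(&b.0)));

    // Keep a match only if it does not overlap one already accepted.
    let mut accepted: Vec<(usize, usize, usize)> = Vec::new();
    for m in matches {
        if accepted.iter().all(|a| m.1 <= a.0 || m.0 >= a.1) {
            accepted.push(m);
        }
    }
    accepted.sort_by_key(|m| m.0);

    // Stitch the output together from untouched text and replacements.
    let mut out = String::with_capacity(input.len());
    let mut cursor = 0;
    for (start, end, idx) in accepted {
        out.push_str(&input[cursor..start]);
        out.push_str(rules[idx].1);
        cursor = end;
    }
    out.push_str(&input[cursor..]);
    out
}
```

With the rules `app` and `my-app`, the input `my-app is an app` becomes `{{ project_name }} is an {{ kind }}` (using illustrative variable names): the longer literal wins the overlapping span, and the `name` inside an inserted `{{ project_name }}` is never matched by a later rule.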
detect_excludes only checked if exclude patterns existed at the project root, missing patterns like node_modules at deeper levels (e.g. docs/node_modules/). Always include all DEFAULT_EXCLUDES since should_exclude already handles nested matching via path components. Also skip symlinks that resolve to directories during scan. pnpm's node_modules/.pnpm uses symlinks to directories, and walkdir reports these as non-directory entries, causing read_to_string to fail with "Is a directory".
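Matching on path components is what makes nested directories such as docs/node_modules/ excludable without listing every location. A minimal sketch of that check, assuming exact name matches only (the signature is a guess, and real exclude patterns may be richer):

```rust
use std::path::Path;

/// Return true if any component of `path` equals one of the exclude names.
/// Assumption: excludes are plain directory or file names, not glob patterns.
fn should_exclude(path: &Path, excludes: &[&str]) -> bool {
    path.components().any(|component| {
        component
            .as_os_str()
            .to_str()
            .map_or(false, |name| excludes.contains(&name))
    })
}
```

With excludes `["node_modules", ".git"]`, `docs/node_modules/pkg/index.js` is excluded at any depth, while `src/main.rs` is kept.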
Git worktrees are working copies, not part of the project source. Without this, extract would template duplicate files from any active worktrees in the project.
Classify text files with 0 template replacements as boilerplate (config, dotfiles, CI) or content (prose, source). Boilerplate is copied in full; content files are stubbed to minimal placeholders so templates preserve structure without project-specific prose. Interactive confirmation now shows three categories: Templated, Boilerplate, and Stubbed.
The rename of detect_excludes to all_default_excludes and the new relevant_config_excludes function were already referenced by mod.rs but the file itself was not staged in the previous commit.
Content files deeper than N path components (default 2) are now dropped entirely instead of being stubbed. Shallow content files like README.md or docs/guide.md are still stubbed as before. The threshold is configurable via --stub-depth.
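The depth rule can be read as a simple component count on the project-relative path. The sketch below is one interpretation of the description; `ContentAction` and `content_action` are hypothetical names.

```rust
use std::path::Path;

/// What to do with a content file that has no template replacements.
#[derive(Debug, PartialEq)]
enum ContentAction {
    /// Keep the file as a minimal placeholder.
    Stub,
    /// Leave the file out of the template entirely.
    Drop,
}

/// Files with more than `stub_depth` path components are dropped.
/// `rel` is assumed to be relative to the project root.
fn content_action(rel: &Path, stub_depth: usize) -> ContentAction {
    if rel.components().count() > stub_depth {
        ContentAction::Drop
    } else {
        ContentAction::Stub
    }
}
```

With the default depth of 2, `README.md` (one component) and `docs/guide.md` (two) are stubbed, while `docs/api/reference.md` (three) is dropped.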
- Extract 6 interactive UI functions from mod.rs into interactive.rs
- Deduplicate config parsers with push_config_candidate helper
- Replace Tier 4 frequency analysis (~770 lines of noise-filter lists) with a ~60-line multi-variant heuristic requiring ≥2 case forms
- Remove strsim dependency (no longer needed)
Files deeper than stub_depth were only dropped when they had 0 template replacements. Deep files with incidental replacements (e.g. a project name appearing in a nested reference doc) were still kept as .die templates. Now the depth check applies regardless of replacement count.
Non-boilerplate files deeper than stub_depth are now removed from the scan result before frequency analysis runs. This prevents detecting variables that only appear in files that would be dropped anyway.
Deploying diecut:
| Latest commit: | 2f62404 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://9515e3bd.diecut.pages.dev |
| Branch Preview URL: | https://feat-extract-auto-detect.diecut.pages.dev |
Remove auto-detect, interactive prompts, and conditional files to reduce PR scope. These features are preserved on feat/extract-auto-detect for a follow-up PR.
- Delete auto_detect.rs (1,140 lines), interactive.rs (426 lines), conditional.rs (170 lines)
- Remove --yes and --min-confidence CLI flags
- Move count_occurrences to scan.rs (test-only)
- Remove 4 auto-detect integration tests
- Strip dead params and deduplicate DEFAULT_EXCLUDES
Title changed from "diecut extract command" to "diecut extract command (engine-only)"
Defer variants, stub/drop, copy-without-render, camelcase filter, and config_gen module to a follow-up PR. Inline minimal config generation. Remove --stub-depth flag. 991 lines changed (down from 2,539).
Move exclude patterns to default_excludes.txt (Rust + macOS only). Add --exclude-from flag to use a custom exclude file. Replace all_default_excludes() with load_excludes(Option<&Path>).
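One possible shape for `load_excludes(Option<&Path>)` is sketched below. Only the function name and parameter type come from the commit message; the return type, the `#` comment handling, and the embedded default list are assumptions (the real crate presumably reads default_excludes.txt, and its contents are not shown here).

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Stand-in for the contents of default_excludes.txt (hypothetical entries).
const DEFAULT_EXCLUDES: &str = "# Rust\ntarget\n\n# macOS\n.DS_Store\n";

/// Parse one pattern per line, skipping blank lines and '#' comments.
/// (Assumption: the file format is not specified in the commit message.)
fn parse_excludes(text: &str) -> Vec<String> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .map(String::from)
        .collect()
}

/// Use the custom file when one is given (--exclude-from), else the defaults.
/// IO errors from reading the custom file are returned to the caller.
fn load_excludes(custom: Option<&Path>) -> io::Result<Vec<String>> {
    match custom {
        Some(path) => Ok(parse_excludes(&fs::read_to_string(path)?)),
        None => Ok(parse_excludes(DEFAULT_EXCLUDES)),
    }
}
```

In this sketch a custom file replaces the defaults rather than extending them; whether the PR merges the two lists is not stated.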
Cover word-boundary matching, longest-match-first ordering, overlap resolution, no-rescan guarantee, Unicode handling, and path replacements.
Summary
Adds the `diecut extract` command, which creates reusable templates from existing projects by replacing literal values with template variables.
This PR contains the extraction engine only (explicit `--var` flags). Auto-detection, interactive prompts, and conditional files are deferred to #118.
What it does
- `diecut extract ./my-project --var project_name=my-app --output ./template` scans the project, replaces matching values with `{{ project_name }}`, and writes a ready-to-use template
- `diecut.toml` config with discovered variables and computed variants (e.g. snake_case, PascalCase)
- … (`--stub-depth`), and gitignore-style excludes
- `--dry-run` to preview changes without writing
New modules
- `src/extract/`: orchestrator (`mod.rs`), scanner, replacer, config generator, variant detection, exclusion, content stubbing
- `src/commands/extract.rs`: CLI handler
Follow-up: #118 adds auto-detection, interactive prompts, and conditional files (~2,000 additional lines).
Test plan
- `cargo fmt --check` passes
- `cargo clippy -- -D warnings` passes
- `diecut extract /tmp/test-project --var project_name=my-app --output /tmp/test-template`