feat: eng-71 recursive glob#2089

Merged
cherkanovart merged 11 commits into main from feat/eng-71-recursive-glob on May 18, 2026

Conversation

Contributor

@cherkanovart cherkanovart commented May 13, 2026

Summary

Adds recursive glob (**) support to bucket include/exclude in i18n.json. A pattern like config/locales/**/[locale].yml now matches files at any depth, replacing the previous manual enumeration of each depth level.

Implementation highlights:

  • DFS + memoization in mapPatternToSource aligns variable-length ** segments against source paths.
  • Recursive intra-segment matcher (restoreLocaleInSegment) restores [locale] placeholders when a segment contains multiple [locale] tokens (e.g. [locale]-fixed-[locale].json).
  • Two-stage ambiguity detection: a hard error is raised when ** admits multiple valid [locale] positions, or when [locale] cannot be unambiguously restored inside a segment. This replaces the silent fallback that closed previous attempts (PRs #1178 and #1179, both "feat: recursive glob patterns", and #1180, "feat: more flexible glob patterns").
  • Default ignore list (node_modules, .git, dist, build, .next, .turbo) applied only for patterns containing **. Concrete and single-* patterns keep the previous traversal behavior exactly.
  • follow: false for ** patterns to avoid symlink cycles.
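
The first highlight can be illustrated with a minimal sketch of DFS plus memoization over (pattern index, source index) pairs. This is not the code in buckets.ts: the names alignPattern and restorePlaceholderPath are hypothetical, and segment matching is simplified to literal comparison after substituting the locale (no `*` handling, no ambiguity check).

```typescript
// Hypothetical sketch, not the PR's implementation.
// Returns, for each pattern segment, the source index where it starts,
// or null when the pattern cannot be aligned with the source path.
function alignPattern(
  patternSegs: string[],
  sourceSegs: string[],
  locale: string,
): number[] | null {
  const memo = new Map<string, boolean>(); // (i, j) -> can the rest be aligned?
  const next = new Map<string, number>(); // (i, j) -> source index after segment i

  const dfs = (i: number, j: number): boolean => {
    const key = `${i}:${j}`;
    const cached = memo.get(key);
    if (cached !== undefined) return cached;
    let ok = false;
    const seg = patternSegs[i];
    if (seg === undefined) {
      // Pattern exhausted: succeed only if the source is exhausted too.
      ok = j === sourceSegs.length;
    } else if (seg === "**") {
      // "**" may consume zero or more source segments; take the first fit.
      for (let k = j; k <= sourceSegs.length; k++) {
        if (dfs(i + 1, k)) {
          next.set(key, k);
          ok = true;
          break;
        }
      }
    } else if (j < sourceSegs.length && seg.split("[locale]").join(locale) === sourceSegs[j]) {
      ok = dfs(i + 1, j + 1);
      if (ok) next.set(key, j + 1);
    }
    memo.set(key, ok);
    return ok;
  };

  if (!dfs(0, 0)) return null;
  const starts: number[] = [];
  let j = 0;
  for (let i = 0; i < patternSegs.length; i++) {
    starts.push(j);
    j = next.get(`${i}:${j}`) as number;
  }
  return starts;
}

// Rebuilds the placeholdered path: "**" is expanded to the concrete
// directories it consumed, every other segment is kept from the pattern.
function restorePlaceholderPath(pattern: string, sourcePath: string, locale: string): string | null {
  const p = pattern.split("/");
  const s = sourcePath.split("/");
  const starts = alignPattern(p, s, locale);
  if (starts === null) return null;
  const out: string[] = [];
  p.forEach((seg, i) => {
    if (seg === "**") {
      const end = i + 1 < p.length ? starts[i + 1] : s.length;
      out.push(...s.slice(starts[i], end));
    } else {
      out.push(seg);
    }
  });
  return out.join("/");
}
```

Under these assumptions, restorePlaceholderPath("config/locales/**/[locale].yml", "config/locales/a/b/en.yml", "en") yields config/locales/a/b/[locale].yml, and a path that does not fit the pattern yields null.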

Manual verification (real filesystem, not mocks)

Run via lingo.dev show files on a scratch fixture with locale files at depths 0/1/2, plus decoys in node_modules/ and dist/.

Main scenarios

| # | Pattern | Expected | Result |
| --- | --- | --- | --- |
| 1 | `config/locales/[locale].yml` (legacy concrete) | 1 file at depth 0 | |
| 2 | `config/locales/*/[locale].yml` (legacy `*`) | 1 file at depth 1 only | |
| 3 | `config/locales/**/[locale].yml` | 4 files across depths 0/1/2/2 | |
| 4 | `**/[locale].yml` from repo root | 4 files, node_modules and dist filtered out | ✅ default-ignore works |
| 5 | `config/locales/**/[locale].yml` + exclude: `admin/**/[locale].yml` | 2 files, admin subtree dropped | |
| 6 | `config/locales/**/**/[locale].yml` (consecutive `**`) | Same as #3: redundant but handled | ✅ algorithm doesn't choke on degenerate input |
| 7 | `[locale][locale].txt` against `enen.txt` (no separator) | Resolves deterministically to `[locale][locale].txt` | ✅ recursive intra-segment matcher unwinds it |
| 8 | `**/[locale]/**/dummy.txt` against `en/x/en/dummy.txt` (locale value appears twice as a directory) | Hard throw: `[locale]` has two valid positions | ✅ actionable error, no silent corruption |
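
Scenario 8 can be reproduced with a small sketch that enumerates every way the pattern aligns with the path and records where the [locale] segments land. The function names are hypothetical and this is an illustration of the idea only: it has no memoization, uses literal segment comparison, and throws a plain Error rather than the CLIError used in the PR.

```typescript
// Hypothetical sketch: collect every distinct placement of [locale] segments.
function localePlacements(patternSegs: string[], sourceSegs: string[], locale: string): string[] {
  const found = new Set<string>();
  const walk = (i: number, j: number, seen: number[]): void => {
    const seg = patternSegs[i];
    if (seg === undefined) {
      // A full alignment was found; remember which source indices held the locale.
      if (j === sourceSegs.length) found.add(seen.join(","));
      return;
    }
    if (seg === "**") {
      // Try every number of segments that "**" could consume.
      for (let k = j; k <= sourceSegs.length; k++) walk(i + 1, k, seen);
      return;
    }
    if (j >= sourceSegs.length) return;
    if (seg.split("[locale]").join(locale) === sourceSegs[j]) {
      walk(i + 1, j + 1, seg.includes("[locale]") ? [...seen, j] : seen);
    }
  };
  walk(0, 0, []);
  return Array.from(found).sort();
}

// Refuses to pick arbitrarily when more than one placement is valid.
function assertUnambiguous(pattern: string, sourcePath: string, locale: string): string {
  const placements = localePlacements(pattern.split("/"), sourcePath.split("/"), locale);
  if (placements.length === 0) {
    throw new Error(`"${sourcePath}" does not match "${pattern}"`);
  }
  if (placements.length > 1) {
    throw new Error(
      `Ambiguous [locale] position for "${pattern}" against "${sourcePath}": ${placements.join(" | ")}`,
    );
  }
  return placements[0] as string;
}
```

With locale "en", the pattern **/[locale]/**/dummy.txt against en/x/en/dummy.txt produces two placements (source indices 0 and 2), so assertUnambiguous throws, while config/locales/**/[locale].yml against config/locales/a/en.yml produces exactly one.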

Edge cases

| # | Pattern | Concern | Result |
| --- | --- | --- | --- |
| 1 | `multi/**/[locale]/[locale].json` | Multiple `[locale]` segments separated by `**` | ✅ resolves at depth 0 and depth 2 |
| 2 | `locales/**/[locale]/strings.json`, source en-US → target de-DE | Dash in locale name with `**` | ✅ both depths, target substituted |
| 3 | `{"path":"locales/**/[locale]/file.json","delimiter":"-"}` (config uses _, on-disk uses -) | resolveOverriddenLocale flows through `**` | ✅ source matches disk casing, target paths use dash |
| 4 | `.config/locales/**/[locale].yml` | Dotfile directories under `**` | dot: true lets minimatch descend |

Round-trip (Moses' check)

Created fr.yml and es.yml next to every en.yml in the fixture, then ran the CLI. Every CLI-resolved path matches a real file on disk — diff between CLI output and find output is empty.

Cross-platform end-to-end verification (macOS + Windows)

Standalone fixture project (config/locales/**/[locale].yml at depths 0/1/2, decoys in node_modules/, dist/, admin/ excluded, plus legacy patterns for backward compat) translated end-to-end via lingo.dev run against the real API on both platforms with the locally-built CLI from this branch.

| Platform | Shape A (`**`) | Shape B (legacy enumerated depths) | Backward compat |
| --- | --- | --- | --- |
| macOS (darwin) | 12/12 tasks processed, 0 failed | 12/12 tasks processed, 0 failed | ✅ identical output |
| Windows 11 (PowerShell) | 12/12 tasks processed, 0 failed | 12/12 tasks processed, 0 failed | ✅ identical output |

Both shapes produced the same set of 12 target files. Windows-side checks:

  • Path separators (\) round-trip correctly through mapPatternToSource and restoreLocaleInSegment — depth-0/1/2 yaml files all resolved.
  • exclude: ["config/locales/admin/**/[locale].yml"] filtered admin/ on Windows (this was the candidate Windows-specific risk).
  • DEFAULT_GLOB_IGNORE filtered node_modules\some-pkg\locales\en.yml and dist\locales\en.yml on Windows.
  • Legacy patterns (legacy/[locale]/messages.json, legacy/single-star/*/[locale].json) unchanged.
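
The separator round-trip in the first check above can be pictured with a minimal sketch, assuming paths are normalized to forward slashes before being split into segments. The PR text does not show how normalization is done, so the helper name toPosixSegments and its behavior are assumptions for illustration.

```typescript
// Hypothetical helper: normalize Windows separators, then split into segments.
function toPosixSegments(filePath: string): string[] {
  return filePath
    .replace(/\\/g, "/") // backslashes become forward slashes
    .split("/")
    .filter((segment) => segment.length > 0); // drop empty segments from doubled or trailing slashes
}

// Both spellings of the same path produce the same segment array, so the
// segment-based alignment sees identical input on macOS and Windows.
const windowsSegments = toPosixSegments("config\\locales\\admin\\en.yml");
const posixSegments = toPosixSegments("config/locales/admin/en.yml");
console.log(windowsSegments.join("/") === posixSegments.join("/"));
```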

Backward compatibility guarantees (non-** patterns)

To ensure existing customer configs are not affected by the algorithmic rewrite, the new path-resolution code is gated behind a strict **-segment check; non-** patterns are routed through a near-verbatim copy of the pre-PR implementation.

  • Gate: pathPatternChunks.includes("**") — checks the segment array, not the joined string, so a literal foo**bar inside a segment is not mistaken for a globstar.
  • Legacy branch: For patterns without a ** segment, restoration uses the original regex with /i (added in this PR to fix mixed-case locales on macOS / Windows) and the original silent-fallback behavior. glob.sync is invoked with follow: true, ignore: undefined — identical to pre-PR.
  • Exclude semantics: differenceBy keys on pathPattern only (not delimiter), preserving the pre-PR contract that an exclude entry cancels an include with a matching path regardless of delimiter shape.

This combination guarantees that any existing valid i18n.json produces identical bucket output before and after this PR. The new strict mapping algorithm, ambiguity probes, and DEFAULT_GLOB_IGNORE only activate when the user explicitly opts in by writing ** in their pattern.
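
The difference between checking the segment array and checking the joined string can be shown in a few lines. The expression pathPatternChunks.includes("**") comes from the description above; the wrapper function names and the split on "/" are assumptions made for this sketch.

```typescript
// Segment-level gate: true only when some whole segment is exactly "**".
function hasGlobstarSegment(pathPattern: string): boolean {
  const pathPatternChunks = pathPattern.split("/");
  return pathPatternChunks.includes("**");
}

// String-level check, shown only for contrast: it would also fire on a
// literal "**" embedded inside a segment such as "foo**bar".
function containsDoubleStarAnywhere(pathPattern: string): boolean {
  return pathPattern.includes("**");
}

console.log(hasGlobstarSegment("config/locales/**/[locale].yml")); // true
console.log(hasGlobstarSegment("locales/foo**bar/[locale].json")); // false
console.log(containsDoubleStarAnywhere("locales/foo**bar/[locale].json")); // true
```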

Backward-compat coverage in the test suite:

  • does not treat literal ** inside a segment as a recursive globstar (foo**bar stays on the legacy path)
  • silently leaves the source chunk untouched when [locale] cannot be matched (legacy fallback)
  • leaves source chunk untouched when legacy regex cannot match (multi-[locale] segment)
  • exclude subtracts include by pathPattern regardless of delimiter mismatch
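
The last test case, exclude keyed on pathPattern only, can be sketched without any library. The description mentions differenceBy; the version below is a hand-written stand-in so the example stays self-contained, and the BucketPath shape is an assumption rather than the CLI's actual type.

```typescript
// Assumed shape for illustration; the real type in the CLI may differ.
interface BucketPath {
  pathPattern: string;
  delimiter?: "-" | "_" | null;
}

// Stand-in for differenceBy(include, exclude, item => item.pathPattern):
// an exclude entry cancels any include entry with the same pathPattern,
// whatever delimiter either side declares.
function subtractExcludes(include: BucketPath[], exclude: BucketPath[]): BucketPath[] {
  const excludedPatterns = new Set(exclude.map((item) => item.pathPattern));
  return include.filter((item) => !excludedPatterns.has(item.pathPattern));
}

const remaining = subtractExcludes(
  [
    { pathPattern: "locales/[locale]/a.json", delimiter: "-" },
    { pathPattern: "locales/[locale]/b.json", delimiter: "_" },
  ],
  [{ pathPattern: "locales/[locale]/a.json", delimiter: "_" }],
);
console.log(remaining.map((item) => item.pathPattern));
```

Here the exclude entry uses a different delimiter than the include entry it targets, yet a.json is still dropped and only b.json remains.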

Algorithmic safeguards

Two ambiguity layers, both with explicit CLIError instead of the silent fallback that closed PR #1180:

  • Segment-level ambiguity (via mapPatternToSource + forbid probe). When ** admits two source-index positions for the [locale] segment, the second placement is detected and the CLI refuses to pick arbitrarily. Example: **/[locale]/**/dummy.txt against en/x/en/dummy.txt.
  • Intra-segment failure (via restoreLocaleInSegment). When the locale value cannot be located inside a matched segment such that the rest of the segment still matches the pattern, the CLI throws rather than returning the literal pattern as a fallback. Deterministic multi-[locale] patterns ([locale]-fixed-[locale].json, [locale][locale].txt) resolve correctly thanks to recursive matching.
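
The intra-segment layer can be sketched as a recursive backtracking matcher over the tokens of one segment. This is a simplified illustration under stated assumptions, not restoreLocaleInSegment itself: tokenize and restoreSegment are hypothetical names, only `*` and [locale] are treated specially, and a plain Error stands in for CLIError.

```typescript
type Token = { kind: "lit"; text: string } | { kind: "locale" } | { kind: "star" };

// Splits one pattern segment into literal text, [locale] tokens and "*" wildcards.
function tokenize(patternSeg: string): Token[] {
  const tokens: Token[] = [];
  const re = /\[locale\]|\*/g;
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(patternSeg)) !== null) {
    if (m.index > last) tokens.push({ kind: "lit", text: patternSeg.slice(last, m.index) });
    tokens.push(m[0] === "*" ? { kind: "star" } : { kind: "locale" });
    last = m.index + m[0].length;
  }
  if (last < patternSeg.length) tokens.push({ kind: "lit", text: patternSeg.slice(last) });
  return tokens;
}

// Restores [locale] inside a matched segment, or throws when there is no
// restoration or more than one distinct restoration.
function restoreSegment(patternSeg: string, sourceSeg: string, locale: string): string {
  const tokens = tokenize(patternSeg);
  const results = new Set<string>();
  const walk = (t: number, pos: number, out: string): void => {
    const tok = tokens[t];
    if (tok === undefined) {
      // All tokens consumed: valid only if the whole source segment was consumed.
      if (pos === sourceSeg.length) results.add(out);
      return;
    }
    if (tok.kind === "lit") {
      if (sourceSeg.startsWith(tok.text, pos)) walk(t + 1, pos + tok.text.length, out + tok.text);
    } else if (tok.kind === "locale") {
      if (sourceSeg.startsWith(locale, pos)) walk(t + 1, pos + locale.length, out + "[locale]");
    } else {
      // "*" may match any run of characters, including the empty string.
      for (let end = pos; end <= sourceSeg.length; end++) {
        walk(t + 1, end, out + sourceSeg.slice(pos, end));
      }
    }
  };
  walk(0, 0, "");
  const restored = Array.from(results);
  if (restored.length !== 1) {
    throw new Error(
      `Cannot unambiguously restore [locale] in "${sourceSeg}" for "${patternSeg}" (${restored.length} candidates)`,
    );
  }
  return restored[0] as string;
}
```

In this sketch, [locale][locale].txt against enen.txt and [locale]-fixed-[locale].json against en-fixed-en.json both resolve to a single candidate, while a segment such as *[locale]*.json against en-en.json has two candidates and therefore throws.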

Rollback plan

A single revert is sufficient: git revert <merge-sha> restores packages/cli/src/cli/utils/buckets.ts and packages/cli/package.json (dropping minimatch). No data migrations or lockfile changes are involved (lock file keys are MD5 of the pattern string, which stays stable across the change).
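
The stability claim about lock file keys can be pictured with a short sketch. It assumes the key is the hex MD5 digest of the raw pattern string, as the sentence above states; the function name lockKeyFor is hypothetical and the real key derivation in the CLI may include details not shown here.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: hex MD5 of the pattern string exactly as written in i18n.json.
function lockKeyFor(pathPattern: string): string {
  return createHash("md5").update(pathPattern).digest("hex");
}

// The key depends only on the pattern text, not on how the pattern is expanded,
// so changing the expansion algorithm (or reverting it) leaves existing keys untouched.
const keyBefore = lockKeyFor("config/locales/[locale].yml");
const keyAfter = lockKeyFor("config/locales/[locale].yml");
console.log(keyBefore === keyAfter);
```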


Summary by CodeRabbit

  • New Features

    • Support recursive glob patterns (**) in bucket include/exclude (e.g., config/locales/**/[locale].yml) with clear errors when a match can't unambiguously map to the locale placeholder.
    • Automatic exclusion of common vendored/build directories when using **.
  • Documentation

    • Updated changeset and config field descriptions clarifying ** behavior and safety rules.
  • Tests

    • Expanded test coverage for recursive glob handling, dedupe, exclusion, and edge cases.

Contributor

coderabbitai Bot commented May 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Enable recursive ** glob patterns in bucket include/exclude, add minimatch, rewrite glob expansion to restore [locale] via DFS/memoized mapping, introduce ambiguousPathPattern CLI error, and add tests and docs.

Changes

Bucket Path Glob Expansion with Recursive Pattern Support

| Layer / File(s) | Summary |
| --- | --- |
| Dependencies and error definitions: packages/cli/package.json, packages/cli/src/cli/utils/errors.ts | minimatch (10.2.5) is added as a dependency; exported docLinks map gains ambiguousPathPattern key to support new error cases. |
| Core recursive glob expansion logic: packages/cli/src/cli/utils/buckets.ts | minimatch and DEFAULT_GLOB_IGNORE are introduced to avoid descending into vendored/build directories. Include patterns are deduplicated by (pathPattern, delimiter); exclusions use the same dedupe key. Prior `**` rejection is removed; expandPlaceholderedGlob normalizes paths, adjusts glob.sync options when `**` is present, and restores `[locale]` via DFS-memoized helpers (mapPatternToSource, alignPatternToSource, buildLocalePlaceholderSegment, restoreLocaleInSegment) that throw CLIError with ambiguousPathPattern on ambiguous alignments. |
| Test coverage for glob patterns and edge cases: packages/cli/src/cli/utils/buckets.spec.ts | Test helper makeI18nConfig gains optional exclude. New Vitest cases cover recursive `**` with varied locale placements, deduplication/overlap, `**` in exclude patterns, empty results, unmappable-path errors, and glob.sync option behaviors (ignore, follow, non-recursive backcompat). |
| Schema and changeset documentation updates: packages/spec/src/config.ts, .changeset/recursive-glob-patterns.md | Zod .describe(...) strings for bucket include/exclude are reformatted; changeset documents `**` support, default `**`-scoped ignores, and ambiguous-mapping error behavior. |
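
The option switch described for glob.sync can be sketched as a small pure function. The directory names and the follow/ignore values come from the PR description; the function name, the option interface, and the `**/<dir>/**` shape of the ignore globs are assumptions, since the contents of DEFAULT_GLOB_IGNORE are not shown in this conversation.

```typescript
// Directory names listed in the PR description.
const DEFAULT_IGNORED_DIRS = ["node_modules", ".git", "dist", "build", ".next", ".turbo"];

// Assumed subset of the options passed to glob.sync.
interface GlobOptionsSketch {
  follow: boolean;
  ignore: string[] | undefined;
}

// Hypothetical helper: recursive patterns get the default ignore list and do
// not follow symlinks; every other pattern keeps the pre-PR options.
function globOptionsFor(pathPattern: string): GlobOptionsSketch {
  const isRecursive = pathPattern.split("/").includes("**");
  if (!isRecursive) {
    return { follow: true, ignore: undefined };
  }
  return {
    follow: false, // avoids symlink cycles while descending
    ignore: DEFAULT_IGNORED_DIRS.map((dir) => `**/${dir}/**`), // assumed glob shape
  };
}

console.log(globOptionsFor("config/locales/**/[locale].yml"));
console.log(globOptionsFor("config/locales/*/[locale].yml"));
```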

Sequence Diagram(s)

sequenceDiagram
  participant CLI
  participant GlobSync
  participant Mapper
  participant SegmentBuilder
  CLI->>GlobSync: call glob.sync(pattern, ignore/follow tuned for **)
  GlobSync->>Mapper: return matched file paths
  Mapper->>SegmentBuilder: align pattern segments to path segments (DFS + memo)
  SegmentBuilder->>CLI: emit restored `[locale]` paths or throw ambiguousPathPattern

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • vrcprl

Poem

🐰 Globs bloom with ** stars so bright,
I hop through paths to find locale light,
Minimatch guides each segment's art,
Tests and docs keep each step smart,
Ambiguity checked, no paths depart.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'feat: eng-71 recursive glob' directly describes the main feature (recursive glob support) and references the issue ticket, clearly summarizing the primary change. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | PR description is comprehensive and addresses all required template sections with substantial detail and manual verification evidence. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

@cherkanovart changed the title from "Feat/eng 71 recursive glob" to "feat: eng-71 recursive glob" on May 13, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/cli/src/cli/utils/buckets.ts`:
- Around line 263-270: The code currently chooses the first successful k in the
isDoubleStar branch (inside dfs), which can silently pick the wrong alignment
when multiple k values match; change the loop to detect ambiguity: iterate all k
from j..sourceLength, collect/count successful dfs(i+1,k) results, if count==0
leave matched false (return false), if count==1 set parent.set(memoKey, { i2:
i+1, j2: kFound }) and matched=true, if count>1 throw an explicit Error (or
return a hard-failure) indicating an ambiguous `**` alignment for the current
memoKey/pattern segment so callers know the match is ambiguous rather than
arbitrarily picking the first. Ensure you still memoize only the unique mapping
when setting parent.
- Around line 317-336: The function buildLocalePlaceholderSegment currently only
handles a single placeholder occurrence because it computes leftGlob/rightGlob
from the first placeholder and only restores the first matching locale in
sourceChunk; detect multiple occurrences of the placeholder by counting
occurrences in patternChunk (e.g., split or indexOf loop) and either (a)
explicitly reject them with a clear thrown Error mentioning
buildLocalePlaceholderSegment, placeholder and patternChunk, or (b) implement
full support by iterating over each placeholder instance: for each placeholder
occurrence compute its leftGlob/rightGlob relative to that position, scan
sourceChunk for matching locale occurrences (using leftMatches/rightMatches
logic) and replace them with placeholder while preserving other matches; choose
one approach and add/update tests accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d466f1b9-3a86-4ecc-826c-781ab15275d3

📥 Commits

Reviewing files that changed from the base of the PR and between 19955fb and 62c8ea1.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (5)
  • packages/cli/package.json
  • packages/cli/src/cli/utils/buckets.spec.ts
  • packages/cli/src/cli/utils/buckets.ts
  • packages/cli/src/cli/utils/errors.ts
  • packages/spec/src/config.ts

Comment thread packages/cli/src/cli/utils/buckets.ts
Comment thread packages/cli/src/cli/utils/buckets.ts Outdated
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/cli/src/cli/utils/buckets.spec.ts`:
- Around line 461-503: Reset the shared glob mock before each test and tighten
assertions to only check the current test call: add a beforeEach that
clears/reset vi.mocked(glob.sync) (or call mockGlobSync reset helper) and then
replace toHaveBeenCalledWith assertions with either
expect(vi.mocked(glob.sync)).toHaveBeenLastCalledWith(...) or assert
expect(vi.mocked(glob.sync)).toHaveBeenCalledTimes(1) plus toHaveBeenCalledWith,
keeping references to mockGlobSync and getBuckets so you clear the mock before
invoking getBuckets and then validate the glob.sync call in that single test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f8b7f873-f7eb-4021-8603-c8610190cace

📥 Commits

Reviewing files that changed from the base of the PR and between eaf624e and 38dc784.

📒 Files selected for processing (2)
  • packages/cli/src/cli/utils/buckets.spec.ts
  • packages/cli/src/cli/utils/buckets.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/cli/src/cli/utils/buckets.ts

Comment thread packages/cli/src/cli/utils/buckets.spec.ts
Contributor

coderabbitai Bot commented May 13, 2026

Caution

Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted.

Error details
{}

@cherkanovart cherkanovart merged commit 0106b48 into main May 18, 2026
4 checks passed
@cherkanovart cherkanovart deleted the feat/eng-71-recursive-glob branch May 18, 2026 09:00
2 participants