Switch tarx to go-git's gitignore matcher#777
Conversation
A vanilla `bridgetown new` ships with `tmp/pids/.keep` plus a `.gitignore` that uses `/tmp/*` and a `!/tmp/pids/` negation to keep the pidfile directory tracked. Deploys would drop `tmp/` from the tar entirely, so Puma had nowhere to write `tmp/pids/server.pid` at boot. Root cause was `shibumi/go-pathspec`, which exposes a single-string path API with no way to indicate the path is a directory. It silently mishandles trailing-slash patterns (even basic positive matches like `node_modules/` against directory paths). Our `isGitignored` used a dual-form workaround (try both `relPath` and `relPath + "/"`) that papered over simple cases but broke negation: the no-slash form matches `/tmp/*` and short-circuits before the trailing-slash negation `!/tmp/pids/` can apply. The walker then `SkipDir`s `tmp/pids/`, exits `tmp/` having accepted zero files inside, and lazy directory emission drops `tmp/` from the tar entirely. `go-git/v5/plumbing/format/gitignore` is the parser go-git itself uses, with a `Match(path []string, isDir bool) bool` API. Binary delta is +20KB after dead-code elimination. All 19 existing tarx tests pass with zero modifications. `TestMakeTarBridgetownTmpPids` is the new regression guard. End-to-end verified: `/app/tmp/pids/.keep` now lands in the deployed image as `app:app`, and Puma boots without any `PIDFILE` workaround. Fixes MIR-1075.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThe tarx package replaces go-pathspec with go-git’s Review rate limit: 3/5 reviews remaining, refill in 12 minutes and 32 seconds. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/tarx/tarx.go`:
- Around line 32-35: The file currently swallows all errors from
os.ReadFile(path) when loading .gitignore, which hides real read failures;
change the reader so it only treats a missing file (os.IsNotExist(err)) as
optional and returns patterns, nil in that case, but for other errors return the
error to the caller; update the .gitignore-reading function to return (patterns,
error) accordingly and propagate that error through ComputeManifest and
makeTarWithFilter instead of treating every failure as “no patterns”.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ac632f55-18d2-4ead-bbff-d280e9216905
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (3)
go.modpkg/tarx/tarx.gopkg/tarx/tarx_test.go
parseGitignoreFile was returning nil on any read error, which made a permission-denied or transient I/O failure on a .gitignore look identical to the file not existing. That silently turned "couldn't read your ignore rules" into "no patterns, ship everything." Only treat os.IsNotExist as the optional case now and propagate everything else up through ComputeManifest and makeTarWithFilter.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/tarx/tarx.go`:
- Around line 290-293: The goroutine that calls filepath.Walk currently ignores
the error returned by Walk, causing nested .gitignore parse errors (from
parseGitignoreFile) to be lost and the reader to see EOF; modify the goroutine
that uses os.Pipe to capture the error returned by filepath.Walk and propagate
it to the reader by writing the error through the pipe (or closing the writer
with the error) so callers of the tar generation observe the failure; locate the
Walk invocation and its callback, ensure any non-nil error from filepath.Walk is
forwarded (not swallowed) and that parseGitignoreFile errors returned from the
callback are returned by Walk and sent across the pipe to the reader side.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
The goroutine in makeTarWithFilter discarded filepath.Walk's return value, so any error from inside the walk (now including the gitignore parse errors the previous commit started returning) reached the reader as a silent EOF. Switched the pipe from os.Pipe to io.Pipe so we have CloseWithError available, captured the walk error, and made sure the tar and gzip trailers flush before we close.
evanphx
left a comment
There was a problem hiding this comment.
All good, but I think we need to fix the cross-contanimation of the gitignore patterns from sibling directories.
We were calling gitignore.NewMatcher inside the walker callback for both the includes and ignores, so we'd rebuild a matcher per file. Hoist them out: the includes matcher never changes during the walk, and the ignores matcher only needs rebuilding when a nested .gitignore adds new patterns. Same shape applied in ComputeManifest and makeTarWithFilter. Also add TestMakeTarSiblingGitignoresDoNotCrossPollute to lock down that nested .gitignores stay scoped to their own subtree. The accumulating slice mechanic looks fishy at a glance (patterns from subdirA's .gitignore stay in the slice when we walk subdirB), but go-git's Pattern carries a domain set at parse time and Match early-returns NoMatch if the path doesn't start with that domain. The test confirms cross-pollution doesn't happen, and stripping the domain makes the test fail with the expected symptom.
evanphx
left a comment
There was a problem hiding this comment.
Ah, the pattern domain thing makes sense. I had to go through it, basically it's clamping the pattern to paths that are inside the path to the gitignore.
A user report came in about a vanilla
bridgetown newapp failing to boot after deploy withErrno::ENOENT - tmp/pids/server.pid. Bridgetown ships withtmp/pids/.keepplus a.gitignorethat uses/tmp/*and!/tmp/pids/to keep the pidfile directory tracked. When miren bundles source for deploy, the gitignore walker dropstmp/from the tar entirely, sotmp/pids/never makes it into the image, and Puma has nowhere to write its pidfile.The walker was using
shibumi/go-pathspec, a generic glob library whoseGitIgnorefunction takes a single string path with no way to indicate it's a directory. OurisGitignoredworked around that by checking bothrelPathandrelPath + "/"and OR-ing the result, but the dual-form check breaks negations: the no-slash form matches/tmp/*and short-circuits before the trailing-slash negation!/tmp/pids/ever gets a chance. Once the walkerSkipDirstmp/pids/, lazy directory emission dropstmp/from the tar. Pathspec is a generic glob library, not a real gitignore implementation, and we'd been bending it into the wrong shape.Swapping to
go-git/v5/plumbing/format/gitignore, the parser go-git itself uses, gets us a properMatch(path []string, isDir bool) boolAPI. All 19 existing tarx tests pass unchanged, which is a strong signal the swap is behavior-preserving for everything except the bug we were after.TestMakeTarBridgetownTmpPidsis the new regression guard. Binary delta is +20KB after dead-code elimination.Closes MIR-1075.