Avoid per-match allocations in matchPattern and walkRecursive #11
  }

- if m.Match(matchPath) {
+ if m.MatchPath(filepath.ToSlash(entryRel), entry.IsDir()) {
I don't know if this is feasible, but would it be possible to compile the patterns from the gitignore files anticipating the hosting operating system's (or file system's) path separator, such that all of the subsequent match tests don't need to attempt this path separator conversion?
It would be nice if we could do some extra work up front, and then save perhaps many thousands of calls to filepath.ToSlash later—even if those calls mostly terminate with a single byte comparison.
On anything other than Windows filepath.ToSlash already short-circuits to return path after the Separator == '/' check, so there is nothing to save here on Unix.
On Windows it is a real allocation per entry. Avoiding it is awkward though: the matcher splits the input on / and the public Match/MatchPath contract is slash-separated paths, so compiling patterns to the OS separator would either change that contract or require match to split on both. The narrower fix is to keep rel in slash form inside walkRecursive and only FromSlash when calling fn, which trades one conversion for another in a different spot.
Happy to look at the walkRecursive rework as a follow-up if Windows is a target you care about, but I would rather not widen this PR further.
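A minimal sketch of that narrower fix, for illustration only. walkSlash, its signature, and the ignored callback are hypothetical stand-ins, not the repository's actual walkRecursive or matcher. The relative path stays slash-separated for the whole walk and is converted with filepath.FromSlash only when the callback is invoked, so on Windows the conversion still happens, just in a different place.

```go
package main

import (
	"fmt"
	"os"
	"path"
	"path/filepath"
)

// walkSlash is a hypothetical stand-in for walkRecursive. rel stays
// slash-separated for the whole walk, so the matcher never needs ToSlash.
func walkSlash(root, rel string, ignored func(slashRel string, isDir bool) bool, fn func(osRel string, isDir bool)) error {
	entries, err := os.ReadDir(filepath.Join(root, filepath.FromSlash(rel)))
	if err != nil {
		return err
	}
	for _, e := range entries {
		entryRel := path.Join(rel, e.Name()) // path.Join always uses '/'
		if ignored(entryRel, e.IsDir()) {
			continue
		}
		// Convert to the OS separator only at the callback boundary.
		fn(filepath.FromSlash(entryRel), e.IsDir())
		if e.IsDir() {
			if err := walkSlash(root, entryRel, ignored, fn); err != nil {
				return err
			}
		}
	}
	return nil
}

// demo builds a tiny tree in a temp dir and returns the visited paths.
func demo() ([]string, error) {
	root, err := os.MkdirTemp("", "walkslash")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)
	for _, f := range []string{"a/b.txt", "skip/c.txt"} {
		full := filepath.Join(root, filepath.FromSlash(f))
		if err := os.MkdirAll(filepath.Dir(full), 0o755); err != nil {
			return nil, err
		}
		if err := os.WriteFile(full, nil, 0o644); err != nil {
			return nil, err
		}
	}
	var visited []string
	// Assumed ignore rule for the demo: skip the directory named "skip".
	ignored := func(slashRel string, isDir bool) bool { return isDir && slashRel == "skip" }
	err = walkSlash(root, "", ignored, func(osRel string, _ bool) {
		visited = append(visited, osRel)
	})
	return visited, err
}

func main() {
	visited, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(visited)
}
```

On Unix both conversions are no-ops, so a rework along these lines would only pay off if Windows is a target.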
Thank you for the explanation.
In my own program, since I know that I'm not going to use it on anything other than Linux and macOS, I'm not bothering with the path separator conversion calls.
- if p.prefix != "" {
- prefixSegs := strings.Split(p.prefix, "/")
- if len(segs) < len(prefixSegs) {
+ if len(p.prefix) > 0 {
This may just disappear in the compiler output, but consider saving this length for several subsequent uses in this block.
- if len(p.prefix) > 0 {
+ if prefixLength := len(p.prefix); prefixLength > 0 {
Then s/len\(p\.prefix\)/prefixLength/g.
Were it not for the assignment to segs at line 354, we could drop this check for a positive length, because all of the statements that follow would still work correctly even when p.prefix is empty. It's wasteful, though, to get to the segs = segs[len(p.prefix):] slicing and assignment statement unnecessarily.
Done in c4aa225 (went with n rather than prefixLength to keep the lines short).
  return false
  }
- for i, ps := range prefixSegs {
+ for i, ps := range p.prefix {
This could use slices.Equal, but we then pay the cost of the length comparison again.
if !slices.Equal(p.prefix, segs[:len(p.prefix)]) {
Agreed, sticking with the explicit loop since it avoids the redundant length check and the extra import.
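For reference, a small sketch comparing the two forms discussed above. hasPrefixLoop and hasPrefixEqual are hypothetical helper names, not functions from the PR. The sketch assumes prefix segments are compared by plain string equality, as the loop in the diff suggests. Both variants need the len(segs) < n guard before indexing or slicing; slices.Equal then compares lengths once more internally, which is the redundancy mentioned above.

```go
package main

import (
	"fmt"
	"slices"
)

// hasPrefixLoop mirrors the explicit loop: one length check, then a
// segment-by-segment comparison.
func hasPrefixLoop(segs, prefix []string) bool {
	n := len(prefix)
	if len(segs) < n {
		return false
	}
	for i, ps := range prefix {
		if segs[i] != ps {
			return false
		}
	}
	return true
}

// hasPrefixEqual is the slices.Equal variant. The guard is still needed so
// that segs[:n] cannot panic; slices.Equal re-checks the lengths itself.
func hasPrefixEqual(segs, prefix []string) bool {
	n := len(prefix)
	if len(segs) < n {
		return false
	}
	return slices.Equal(prefix, segs[:n])
}

func main() {
	segs := []string{"src", "pkg", "file.go"}
	for _, prefix := range [][]string{{"src", "pkg"}, {"src", "lib"}, nil} {
		fmt.Println(prefix, hasPrefixLoop(segs, prefix), hasPrefixEqual(segs, prefix))
	}
}
```

The two functions should agree on every input, so the choice comes down to the extra length comparison and the extra import.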
Three small allocation reductions on the match path, raised by @seh while reviewing #10.
- matchPattern was calling strings.Split(p.prefix, "/") on every invocation for patterns that came from a nested .gitignore. The prefix never changes after compilation, so compilePattern now splits it once and stores the segments on the pattern struct.
- walkRecursive was building a match path with a trailing / for directory entries and calling Match, which then strips the slash off again. It now calls MatchPath with entry.IsDir() directly and skips the concatenation.
- walkRecursive was also calling os.Stat on each directory's .gitignore before AddFromFile, but AddFromFile already handles a missing file by returning early on the os.ReadFile error. Dropping the guard saves a syscall per directory walked.
Added BenchmarkMatchNestedPatterns and BenchmarkWalk to cover these paths. On an M1 Pro:
The existing match benchmarks also dropped a bit (around 10%), though those have no prefixed patterns so that may partly be noise.
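A rough sketch of the first change, splitting the prefix once at compile time. The names pattern, prefix, and compilePattern come from the description above, but the fields, signatures, and the stripPrefix helper are assumptions made for illustration and will differ from the real code.

```go
package main

import (
	"fmt"
	"strings"
)

// pattern is a simplified stand-in for the matcher's pattern struct.
type pattern struct {
	prefix []string // directory segments of the nested .gitignore, split once
	glob   string   // hypothetical field holding the rest of the pattern
}

// compilePattern splits the prefix a single time, at compile time, so the
// match path never calls strings.Split on it again.
func compilePattern(prefix, glob string) pattern {
	p := pattern{glob: glob}
	if prefix != "" {
		p.prefix = strings.Split(prefix, "/")
	}
	return p
}

// stripPrefix (hypothetical helper) reports whether segs starts with the
// pattern's prefix and returns the remaining segments. It performs no
// allocation: the result is a sub-slice of segs.
func (p pattern) stripPrefix(segs []string) ([]string, bool) {
	if n := len(p.prefix); n > 0 {
		if len(segs) < n {
			return nil, false
		}
		for i, ps := range p.prefix {
			if segs[i] != ps {
				return nil, false
			}
		}
		segs = segs[n:]
	}
	return segs, true
}

func main() {
	p := compilePattern("vendor/lib", "*.o")
	rest, ok := p.stripPrefix([]string{"vendor", "lib", "a.o"})
	fmt.Println(rest, ok)
}
```

The cost of the split moves from every match call to a one-time cost per compiled pattern, which is the trade the description relies on.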