Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port owners-discovery PathGlob matching to rust #7299

Merged
merged 1 commit into from Mar 1, 2019

Conversation

Projects
None yet
2 participants
@stuhood
Copy link
Member

commented Mar 1, 2019

Problem

As described in #6795, we have python code with functionality that duplicates rust PathGlob matching, which is undesirable. So undesirable, in fact, that we recently noticed a bug where --changed and --owner-of (using that code) would not correct identify a file as being owned (globs like ./*.py would never match).

Solution

PathGlob expansion is currently implemented by lazily traversing a filesystem, and it is unclear how to decouple the lazy traversal from the glob matching without double-matching paths (once to implement the laziness, and then again to accept a complete path). Additionally, the codepath we are replacing is not particularly performance sensitive.

So: this patch takes the approach of implementing a small impl VFS for MemFS against which to expand PathGlobs. This means we pay the cost of the laziness, but it ensures an identical implementation of glob matching (assuming that the small MemFS is correct).

Result

Fixes #6795, and ensures that matching PathGlobs against a set of in memory file paths has identical behaviour to expanding against a filesystem.

@stuhood stuhood force-pushed the twitter:stuhood/rust-globs-memfs branch from 31a6ebc to 40e38a4 Mar 1, 2019

@cosmicexplorer
Copy link
Contributor

left a comment

This is awesome!!

return True
return False
path_globs = PathGlobs(include=patterns, exclude=exclude_patterns)
return Native().match_path_globs(path_globs, paths)

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Mar 1, 2019

Contributor

I didn't realize you could just construct a Native() in any random python code and it's giving me ideas!

This comment has been minimized.

Copy link
@stuhood

stuhood Mar 1, 2019

Author Member

It got much easier recentlyish: #6979

@stuhood

This comment has been minimized.

Copy link
Member Author

commented Mar 1, 2019

Failed on a known flake (#6838). Merging.

@stuhood stuhood merged commit dea6270 into pantsbuild:master Mar 1, 2019

1 check failed

continuous-integration/travis-ci/pr The Travis CI build failed
Details

@stuhood stuhood deleted the twitter:stuhood/rust-globs-memfs branch Mar 1, 2019

stuhood added a commit that referenced this pull request Mar 20, 2019

Simplify glob matching and directly match in-memory globs as Patterns. (
#7402)

### Problem

#7299 aligned the behaviour of in-memory (eager) and on-disk (lazy) glob matching, but unfortunately caused a serious performance regression in detection of file owners.

The on-disk (lazy) implementation of glob matching from `GlobMatchingImplementation::expand` was imposing about a 30x overhead when matching globs in memory... and unfortunately, we do that more often than I had realized.

### Solution

In two independently reviewable commits:
1. Simplify `GlobMatchingImplementation::expand` based on the realization that the `PathGlob` caching that we had been doing was a holdout from when we used to memoize them in the graph, and which would never hit due to the unique trailing components of each `PathGlob`.
2. Introduce and use `PathGlobs::matches(Vec<PathBuf>) -> bool` to replace `MemFS::expand` with an eager operation. Expand `FilespecTest` to confirm that this redundant implementation does not go out of sync.

### Result

`GlobMatchingImplementation::expand` is ~3x faster (representing a 20% speedup for `BuildGraph` hydration in a large repository), and owners detection for a command like:
```
time ./pants --changed-diffspec='2c9c338cd7^..2c9c338' list
```
... runs at approximately the speed that it did before #7299.

stuhood added a commit that referenced this pull request Mar 20, 2019

Simplify glob matching and directly match in-memory globs as Patterns. (
#7402)

### Problem

#7299 aligned the behaviour of in-memory (eager) and on-disk (lazy) glob matching, but unfortunately caused a serious performance regression in detection of file owners.

The on-disk (lazy) implementation of glob matching from `GlobMatchingImplementation::expand` was imposing about a 30x overhead when matching globs in memory... and unfortunately, we do that more often than I had realized.

### Solution

In two independently reviewable commits:
1. Simplify `GlobMatchingImplementation::expand` based on the realization that the `PathGlob` caching that we had been doing was a holdout from when we used to memoize them in the graph, and which would never hit due to the unique trailing components of each `PathGlob`.
2. Introduce and use `PathGlobs::matches(Vec<PathBuf>) -> bool` to replace `MemFS::expand` with an eager operation. Expand `FilespecTest` to confirm that this redundant implementation does not go out of sync.

### Result

`GlobMatchingImplementation::expand` is ~3x faster (representing a 20% speedup for `BuildGraph` hydration in a large repository), and owners detection for a command like:
```
time ./pants --changed-diffspec='2c9c338cd7^..2c9c338' list
```
... runs at approximately the speed that it did before #7299.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.