-
-
Notifications
You must be signed in to change notification settings - Fork 605
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of SnapshotSubset
#9706
Comments
cosmicexplorer
added a commit
that referenced
this issue
Jun 23, 2020
[ci skip-jvm-tests] ### Problem Fixes #9706. This PR blocks #9781. ### Solution This PR is split into two sets of commits: 1. Add a [criterion](https://github.com/bheisler/criterion.rs) benchmark to test the issue described in #9706. - We add a benchmark for both the merge and subset operations. 2. Move many higher-level operations on `Snapshot` into a separate file `snapshot_ops.rs`, managed by a `SnapshotOps` trait. - Improve the performance of the snapshot `subset()` operation as described below. In more depth: #### Subsetting Snapshot subsetting currently (with a glob) will have to recursively walk all subdirectories of a digest. When we select with the glob `src/**`, we needlessly traverse all files recursively within the directory `src/`, when really all we need to do is just select the single `src/` DirectoryNode from the containing Directory and create a new Directory proto from that, which is much, much less work. The above was performed by partially reimplementing glob matching logic to support early termination in `snapshot_ops.py`. Notably, this solution requires no extra caches. ### Result This output was produced by running `./build-support/bin/native/cargo bench --manifest-path=src/rust/engine/Cargo.toml --package store` on the first commit, then the last commit: ``` materialize_directory/materialize_directory time: [79.315 ms 82.801 ms 90.041 ms] change: [+85.021% +102.67% +126.80%] (p = 0.00 < 0.05) Performance has regressed. Found 2 outliers among 10 measurements (20.00%) 2 (20.00%) high severe snapshot_subset/wildcard time: [1.0343 ms 1.0643 ms 1.0909 ms] change: [-99.882% -99.875% -99.866%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 10 measurements (10.00%) 1 (10.00%) high severe snapshot_merge/snapshot_merge time: [1.2689 s 1.3645 s 1.4771 s] change: [+17.873% +23.450% +31.007%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 10 measurements (10.00%) 1 (10.00%) high severe ```
Thanks @cosmicexplorer for fixing this!! |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
As first discovered in #9702 (comment), calling
await Get[Snapshot](SnapshotSubset(original_snapshot.digest, PathGlobs(original_snapshot.files)
with 10k files in the snapshot takes 35 seconds.To reproduce, use this diff and run
./v2 test src/python/pants/core/util_rules/strip_source_roots_test.py
:The text was updated successfully, but these errors were encountered: