New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use ThreadPool for cache fetching and rust tar for artifact extration #6748

Merged
merged 45 commits into from Nov 19, 2018

Conversation

Projects
None yet
5 participants
@wisechengyi
Contributor

wisechengyi commented Nov 9, 2018

Problem

response = self._request('GET', cache_key)
would hang and thus never exiting the multiprocessing pool. The closest issue I can find is https://stackoverflow.com/questions/25943923/requests-get-hangs-when-called-in-a-multiprocessing-pool which doesn't have any solution.

I was able to repro the issue almost consistently with PyCharm debug mode with

clean-all compile --cache-read-from="https://dummy_url/" examples/tests/java/org/pantsbuild/example/hello/greet/:
...
[zinc]
  [cache] (stuck here)

and the issue seems to be resolved by using ThreadPool.

Solution

Since most of the untar work was done in Python and was CPU bound, we chose to move that to rust for the performance.

@wisechengyi wisechengyi changed the title from [wip] use threadpool instead multiprocessing pool due to hang to [wip] use threadpool instead of multiprocessing pool due to hang Nov 9, 2018

@wisechengyi

This comment has been minimized.

Contributor

wisechengyi commented Nov 9, 2018

If there is no objection to the idea, I can go refactor the proc terminologies to thread.

@stuhood

This comment has been minimized.

Member

stuhood commented Nov 9, 2018

@wisechengyi : One of the reasons that this is a multiprocessing pool is to avoid the GIL while unpacking cached artifacts. So you'd want to benchmark this carefully to ensure that it didn't regress.

But yea, if it doesn't regress, using a threadpool here would be strictly simpler.

@wisechengyi

This comment has been minimized.

Contributor

wisechengyi commented Nov 9, 2018

Looks like there is certain perf impact. Alternatively we can do the requests part with ThreadPool, but the unzipping part with subproc

@stuhood

This comment has been minimized.

Member

stuhood commented Nov 10, 2018

Looks like there is certain perf impact. Alternatively we can do the requests part with ThreadPool, but the unzipping part with subproc.

A nearby middle ground would be to continue to use requests, switch to a threadpool, but then expose a rust method to do the un-tar'ing. The GIL is released while you are in native code, so if we exposed a native method for just the untar'ing, it would likely have advantages both for performance and for portability.

We have used the tar crate in the past, but it doesn't directly support compressed archives. The rust cookbook shows this combo though: https://rust-lang-nursery.github.io/rust-cookbook/compression/tar.html ... which looks fairly straightforward.

@wisechengyi

This comment has been minimized.

Contributor

wisechengyi commented Nov 10, 2018

Thanks, would probably give it a go!

CMLivingston and others added some commits Nov 8, 2018

Minor cleanups to integration tests (#6734)
Minor comment cleanups to integration tests in the Python backend.
Rename DirectoryDigest to Digest (#6740)
I'm about to use it to reference raw file contents
Add intrinsic to download a file (#6660)
* Allow temporary http server to be started

* Disable HTTPS support in grpc

We're finding it hard to link both grpcio (which statically links
openssl) and reqwest (which dynamically links openssl).

We plan to switch to Tower soon, so just disabling HTTPS support at the
grpcio layer for now, which I don't think anyone is relying on.

* Add intrinsic to download a file
Update to rust 1.30 (#6741)
* Update to rust 1.30

* Re-fmt

* Ditch macro_uses: explicitly using specific macros is more consistent, and should
marginally speed up the compiler.
Allow tests to run with isolated stores (#6743)
Also, add a decorator to explicitly use an empty store for a block of
code.

Each unique scheduler will have a unique store. This will be important
soon, as I'm about to add tests for caching of file downloads which
relies on the in-store state.
Run integration test against Python 3 (#6732)
### Problem
As we push towards python 3 support for pants, we need integration tests to also run against python 3. 

### Solution
Run integration tests against python 3

### Notes:
- As the cost of running integration test against both python 2 and 3 is too costly, it was decided we would switch CI to run python 3 only with a daily cron job running same tests against python 2. This Cron job is managed in the Travis dashboard.
- We have a blacklist at `build-support/known_py3_integration_failures.txt` for all tests failing in python3. We do run those tests against python2.
- We added 1 shard and reshuffled integration shards. So we now have 8 integration tests shards, 6 for python 3 and 2 for python 2 (blacklisted tests).
Ignore paths more deeply to avoid graph impact when ignored files are…
… added/removed (#6752)

### Problem

The path ignoring that is controlled by the `--pants-ignore` global option currently ignores paths during glob expansion. But glob expansion consumes cached filesystem operations, meaning that those cached filesystem operations are not pre-filtered for ignored files. The effect is that the cached `DirectoryListing` for a directory still changes when ignored files are added and removed.

### Solution

Push path filtering down to `scandir_sync`, so that it is below the `Scandir`/`DirectoryListing` caching. In a separate commit, log both the number of `cleared` and `dirtied` nodes.

### Result

Node dirtying should be effective in more cases, because creating/removing a file at an ignored path will only clear a single node, while all other nodes can be cleaned without re-running.

Example for creating a file `ignore_this_file` after running `./pants --enable-pantsd --pants-ignore='+["ignore_this_file"]' list ::` :
```
# Before
I1109 12:14:50.272815 86270 scheduler_service.py:88] enqueuing 1 changes for subscription all_files
I1109 12:14:50.306777 86270 scheduler.py:282] invalidated 10433 nodes for: set([u'', u'ignore_this_file'])
I1109 12:14:54.746129 86270 pailgun_server.py:72] handling pailgun request: `./pants --enable-pantsd --pants-ignore=+["ignore_this_file"] list ::`
W1109 12:14:54.918494 86270 native.py:413] Globs did not match. Excludes were: []. Unmatched globs were: ["testprojects/src/java/org/pantsbuild/testproject/bundle/file1.aaaa", "testprojects/src/java/org/pantsbuild/testproject/bundle/file2.aaaa"].
<snip more running globs>
I1109 12:14:56.162152 86270 pailgun_server.py:81] pailgun request completed: `./pants --enable-pantsd --pants-ignore=+["ignore_this_file"] list ::`

# After
I1109 12:11:46.683013 83652 scheduler_service.py:88] enqueuing 1 changes for subscription all_files
W1109 12:11:46.719578 83652 native.py:413] invalidation: cleared 1 and dirtied 10432 nodes for: {"", "ignore_this_file"}
I1109 12:11:51.063949 83652 pailgun_server.py:72] handling pailgun request: `./pants --enable-pantsd --pants-ignore=+["ignore_this_file"] list ::`
I1109 12:11:51.748687 83652 pailgun_server.py:81] pailgun request completed: `./pants --enable-pantsd --pants-ignore=+["ignore_this_file"] list ::`
```
@wisechengyi

This comment has been minimized.

Contributor

wisechengyi commented Nov 12, 2018

Hooked up the rust tar api, relatively straight forward. Currently digging on some discrepancies that caused some caches to be missed.

@wisechengyi

This comment has been minimized.

Contributor

wisechengyi commented Nov 15, 2018

There are some legit errors in CI. Will dig more.

Show resolved Hide resolved src/python/pants/cache/artifact.py Outdated
Show resolved Hide resolved src/rust/engine/tar_api/Cargo.toml Outdated
Show resolved Hide resolved src/rust/engine/tar_api/src/tar_api.rs

@wisechengyi wisechengyi changed the title from [wip] Use ThreadPool for cache fetching and Rust tar to unzip the artifact to [wip] Use ThreadPool for cache fetching and rust tar to unzip the artifact Nov 16, 2018

wisechengyi added some commits Nov 16, 2018

@wisechengyi wisechengyi changed the title from [wip] Use ThreadPool for cache fetching and rust tar to unzip the artifact to Use ThreadPool for cache fetching and rust tar to unzip the artifact Nov 16, 2018

@wisechengyi wisechengyi changed the title from Use ThreadPool for cache fetching and rust tar to unzip the artifact to Use ThreadPool for cache fetching and rust tar for artifact extration Nov 16, 2018

@illicitonion

Looks great :) Thanks!

Show resolved Hide resolved src/python/pants/cache/artifact.py Outdated
Show resolved Hide resolved src/rust/engine/tar_api/src/tar_api.rs Outdated
Show resolved Hide resolved src/rust/engine/tar_api/src/tar_api.rs Outdated
Show resolved Hide resolved src/rust/engine/tar_api/src/tar_api.rs Outdated

wisechengyi added some commits Nov 16, 2018

@@ -787,6 +789,10 @@ def buffer(self, cdata):
def to_ids_buf(self, types):
return self.context.type_ids_buf([TypeId(self.context.to_id(t)) for t in types])
def decompress_tarball(self, tarfile_path, dest_dir):
result = self.lib.decompress_tarball(tarfile_path, dest_dir)
self.context.raise_or_return(result)

This comment has been minimized.

@illicitonion

illicitonion Nov 16, 2018

Contributor

You'll need the return, still :)

This comment has been minimized.

@wisechengyi

wisechengyi Nov 16, 2018

Contributor

Will do.

In this particular case, it is just a Ok(()) coming from rust, so I can keep the caller as is, right? as in not having to handle the result, but only catching the exception.

  def extract(self):
    # Note(yic): unlike the python implementation before, now we do not update self._relpath
    # after the extraction.
    try:
      self.NATIVE_BINARY.decompress_tarball(self._tarfile.encode('utf-8'),
                                                   self._artifact_root.encode('utf-8'))
    except Exception as e:
      raise ArtifactError("Extracting artifact failed:\n{}".format(e))

This comment has been minimized.

@illicitonion

illicitonion Nov 19, 2018

Contributor

Yeah, in this case it doesn't actually make a difference, but when someone changes the rust to return something, it will be totally non-obvious to them that the python bridge code is just swallowing the value - I've made this mistake a bunch of times, and it's super frustrating :)

wisechengyi added some commits Nov 17, 2018

@wisechengyi

This comment has been minimized.

Contributor

wisechengyi commented Nov 18, 2018

Plan to land this Sunday evening before the release process on Monday.

@wisechengyi

This comment has been minimized.

Contributor

wisechengyi commented Nov 19, 2018

Merging. Feel free to leave a comment in case I missed anything.

@wisechengyi wisechengyi merged commit c83f6f1 into pantsbuild:master Nov 19, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@wisechengyi wisechengyi deleted the wisechengyi:nohang branch Nov 19, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment