
Improve Noop resolution performance #6577

Merged
merged 10 commits into from Oct 3, 2018

Conversation

@blorente
Contributor

blorente commented Oct 1, 2018

Problem

In preparation for remote execution, #6544 introduced many calls to capture_snapshots, which added significant overhead to coursier fetching.

Solution

A partial solution is to unify all the snapshot capturing into a single call to capture_snapshots.
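As a rough sketch of the batching idea (import paths and call signatures here are recalled from pants at the time and should be treated as assumptions, not the exact PR code):

from pants.base.build_environment import get_buildroot
from pants.engine.fs import PathGlobs, PathGlobsAndRoot
from pants.util.dirutil import fast_relpath


def capture_jar_snapshots(scheduler, jar_paths):
  """Capture all resolved jars in one engine round-trip.

  jar_paths: absolute paths of every resolved jar, across all targets.
  """
  buildroot = get_buildroot()
  globs = tuple(
    PathGlobsAndRoot(PathGlobs((fast_relpath(path, buildroot),)), buildroot)
    for path in jar_paths
  )
  # A single synchronous capture_snapshots call instead of one call per jar.
  return scheduler.capture_snapshots(globs)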

Result

  • Both the coursier and ivy resolvers now make a single call to capture_snapshots, which reduced the overhead somewhat, though not completely. The synchronous call to capture_snapshots still adds 4-5 seconds of overhead to fetching from coursier on a relatively big project. A solution is discussed in a separate comment below.

EDIT: This is no longer true; it was a faulty measurement on my part:

These are the measured noop times on my laptop. Every run does a clean-all and resets the daemon:

Measured execution time: 73s for commit 46e9dd901bb768a3256966762f7f1b82063a09d9 (this pr)
Measured execution time: 96s for commit 6268a77962ff8227360e25d2c50831f0ac773d94 (master with the regression)
Measured execution time: 38s for commit f60c40a929a70bfd9e8aa3f9b4b954c4bd06883e (master straight before the regression) 
  • Separate test cases to cover ivy and coursier resolution (coursier wasn't covered before).
  • The test cases now include two 3rdparty dependencies, to guard against a logic bug introduced while developing this.

Acknowledgements

Credit to @dotordogh for pairing on the issue and coming up with the final fix.

@@ -373,7 +373,7 @@ def main(args: Array[String]) {
       path = os.path.join(compile_dir, path_suffix)
       self.assertTrue(os.path.exists(path), "Want path {} to exist".format(path))

-  def test_hermetic_binary_with_3rdparty_dependencies(self):
+  def test_hermetic_binary_with_3rdparty_dependencies_ivy(self):


@dotordogh

dotordogh Oct 2, 2018

Contributor

In case the default becomes coursier, could you please explicitly set the resolver to ivy in the config?

@blorente

Contributor

blorente commented Oct 2, 2018

@dotordogh Addressed comments

@blorente

Contributor

blorente commented Oct 2, 2018

A temporary solution to the fixed overhead is to hide the computations behind the --execution-strategy=hermetic flag, so that they only run when we care about them.
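A hypothetical sketch of that gating (the execution_strategy property and HERMETIC constant on the task are assumed from the NailgunTask API, not taken from this diff):

# Hypothetical: only pay for the synchronous snapshot capture when the
# hermetic execution strategy is actually requested.
if self.execution_strategy == self.HERMETIC:
  snapshots = self.context._scheduler.capture_snapshots(path_globs)
else:
  # Non-hermetic execution never consumes the digests, so skip the capture.
  snapshots = []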

@illicitonion
Contributor

illicitonion left a comment

Looks great, thanks! I just have one or two super minor cleanups.


jar_paths = []
for target, jars_to_snapshot in targets_and_jars:
  [jar_paths.append(fast_relpath(jar.pants_path, get_buildroot())) for jar in jars_to_snapshot]


@illicitonion

illicitonion Oct 2, 2018

Contributor

It's a little odd to construct this list just to throw it away... Should this just be a for loop?


@blorente

blorente Oct 2, 2018

Contributor

That's a big oopsie on my part.


@blorente

blorente Oct 2, 2018

Contributor

I'll solve it inline, but I cannot test it locally; I'm on the run.
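For reference, the plain loop being suggested would look something like this (same names as in the quoted snippet, not necessarily the exact committed fix):

jar_paths = []
for target, jars_to_snapshot in targets_and_jars:
  for jar in jars_to_snapshot:
    # Collect each jar path relative to the buildroot for later snapshotting.
    jar_paths.append(fast_relpath(jar.pants_path, get_buildroot()))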

# We assume that (1) jars_to_snapshot has the same number of ResolveJars as snapshots does Snapshots,
# and that (2) capture_snapshots preserves ordering.
digests = [snapshot.directory_digest for snapshot in snapshots]
digest_iterator = iter(digests)


@illicitonion

illicitonion Oct 2, 2018

Contributor

It may be slightly cleaner to use:
for (target, jars_to_snapshot), digest in zip(targets_and_jars, digests):

rather than to extract an explicit iter and call next on it, but probably not sufficiently much to actually change it :)


@blorente

blorente Oct 2, 2018

Contributor

I am not sure of the behavior of that; does it create one ((target, [jars_to_snapshot]), digest) tuple for each digest? If that is the case, it wouldn't help, because we need to map digests to jars, not to (target, [jars]) pairs.

If zip has another behavior, then it could be great! I can look into it tomorrow.
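For reference, a small standalone example of the zip pairing in question (toy data, not from the PR):

# zip pairs element i of each sequence, so each (target, jars) tuple is paired
# with exactly one digest; it does not pair digests with individual jars.
targets_and_jars = [("t1", ["jar_a", "jar_b"]), ("t2", ["jar_c"])]
digests = ["d0", "d1", "d2"]
for (target, jars_to_snapshot), digest in zip(targets_and_jars, digests):
  print(target, jars_to_snapshot, digest)
# t1 ['jar_a', 'jar_b'] d0
# t2 ['jar_c'] d1   (d2 is never consumed)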

@blorente

Contributor

blorente commented Oct 2, 2018

That temporary solution can only be applied to coursier for now.

blorente added some commits Oct 2, 2018

@blorente

Contributor

blorente commented Oct 3, 2018

I modified the class hierarchy a bit, which is probably worth discussing.
In order to include the flag only for coursier_resolve and ivy_task_mixin, I think including it in JvmResolverBase makes the most sense. For that, JvmResolverBase had to extend either NailgunTask or TaskBase. Since IvyTaskMixin does not extend NailgunTask, choosing NailgunTask would be a bit fiddly. Therefore, I thought it best to make JvmResolverBase extend TaskBase, and to make the two client classes extend JvmResolverBase instead of TaskBase.
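A rough sketch of the resulting hierarchy (class bodies elided; the coursier-side class name and the TaskBase import path are assumptions for illustration):

from pants.task.task import TaskBase  # import path assumed


class JvmResolverBase(TaskBase):
  """Shared base for ivy and coursier resolution; home for the new flag."""


class IvyTaskMixin(JvmResolverBase):
  pass


class CoursierMixin(JvmResolverBase):  # coursier-side class name assumed
  pass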

@illicitonion illicitonion merged commit d86c3cc into pantsbuild:master Oct 3, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

@blorente blorente deleted the blorente:blorente/fix-resolution-performance branch Oct 29, 2018
