Reduce hashing during v2 transitive graph walks #4109

Merged
stuhood merged 2 commits into pantsbuild:master on Dec 1, 2016

stuhood commented Dec 1, 2016

Problem

Transitive graph walks in the v2 engine perform many deduplicating merges of collections of HydratedTarget objects. Before this change, those merges accounted for about 70% of the total runtime of ./pants --enable-v2-engine list $target for a highly connected $target.

Solution

OrderedSet is significantly slower than a plain set for individual dedupe operations (particularly when it is converted back into a tuple afterward), so we switch to deduping a generator using a throwaway set and collecting the result into a tuple.
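
A minimal sketch of the throwaway-set approach described above, not the code from this change. The helper name `dedupe_to_tuple` is hypothetical, and preserving first-seen order is an assumption based on the fact that an OrderedSet was being replaced.

```python
def dedupe_to_tuple(items):
    """Collect `items` into a tuple, dropping repeats.

    Hypothetical helper for illustration only. Assumes the items are
    hashable and that first-seen order should be preserved.
    """
    seen = set()  # throwaway set: only used while the tuple is being built

    def unique():
        for item in items:
            if item not in seen:
                seen.add(item)
                yield item

    # The generator is consumed exactly once, directly into the tuple.
    return tuple(unique())


merged = dedupe_to_tuple(x for group in ([3, 1], [3, 2], [1]) for x in group)
```

The membership test and insertion each cost one hash of the item, so cheap hashing on the items matters as much as the choice of container.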

Additionally, because HydratedTarget objects are guaranteed to have an Address, we implement equality/hash checks using an Address lifted from the inner structs.
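
As a rough illustration of address-based equality, here is a simplified stand-in rather than the real class. `Address` is modeled as a namedtuple, and the names `HydratedTargetSketch` and `payload` are assumptions made for this example.

```python
from collections import namedtuple

# Hypothetical stand-in for the real Address type.
Address = namedtuple("Address", ["spec_path", "target_name"])


class HydratedTargetSketch(object):
    """Simplified stand-in: identity is defined by the address alone."""

    def __init__(self, address, payload):
        self.address = address  # lifted out once, at construction time
        self.payload = payload  # hypothetical: the potentially large inner data

    def __eq__(self, other):
        return type(self) == type(other) and self.address == other.address

    def __ne__(self, other):
        return not (self == other)

    def __hash__(self):
        # Hash only the small Address instead of the whole nested structure.
        return hash(self.address)


a = HydratedTargetSketch(Address("src/python/foo", "lib"), payload={"sources": ["a.py"]})
b = HydratedTargetSketch(Address("src/python/foo", "lib"), payload={"sources": ["b.py"]})
c = HydratedTargetSketch(Address("src/python/bar", "lib"), payload={})
```

In this sketch two objects with the same address compare equal even when their payloads differ, which is only safe when an address uniquely identifies a target.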

Result

./pants --enable-v2-engine list $target now runs approximately 2x faster.

@kwlzn

kwlzn approved these changes Dec 1, 2016

lgtm!

@JieGhost

Looks good! This reminds me of a similar change I made in fs.py. What I found there matches your finding, i.e., using a set is much faster than using an OrderedSet.

@stuhood stuhood merged commit ce807a2 into pantsbuild:master Dec 1, 2016

lenucksi added a commit to lenucksi/pants that referenced this pull request Apr 25, 2017

Reduce hashing during v2 transitive graph walks (#4109)