refactor command line target spec resolution and check that all target roots exist #6480

Merged
cosmicexplorer merged 22 commits into pantsbuild:master from cosmicexplorer:fix-command-line-target-spec-validation Sep 16, 2018

4 participants
@cosmicexplorer
Contributor

cosmicexplorer commented Sep 10, 2018

Problem

This should fail regardless of the order of the command-line target specs (the target named a does not exist):

> ./pants test tests/python/pants_test/option:a
# ...
ResolveError: "a" was not found in namespace "tests/python/pants_test/option". Did you mean one of:
          :options_integration
          :testing
> ./pants test tests/python/pants_test/option:{testing,a}
# ...
 ===== 158 passed, 1 xfailed in 2.77 seconds ======
                     
                   tests/python/pants_test/option:testing                                          .....   SUCCESS
23:25:48 00:12     [junit]
23:25:48 00:12     [go]
23:25:48 00:12     [node]
               Waiting for background workers to finish.
23:25:49 00:13   [complete]
               SUCCESS

Solution

  • Move AddressFamily and Address resolution logic into the Spec class and its subclasses instead of doing a massive if chain.
  • Introduce the _MappedSpecs datatype holding the target Specs, and the AddressFamilys covering the directories indicated by the Specs.
  • Move all of the address resolution logic into _MappedSpecs for clarity, and replace all if type(obj) is SingleAddress chains with abstract methods implemented in Spec subclasses.
  • Add test_fails_on_nonexistent_specs() to test the issue described in the Problem section.
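The move from type-checking chains to abstract methods can be sketched as follows. This is a minimal illustration, not the Pants implementation: the classes are simplified stand-ins, and `matching_addresses`, `resolve`, and the `names_by_directory` mapping are hypothetical names chosen for the example.

```python
from abc import ABC, abstractmethod


class Spec(ABC):
    """Simplified stand-in for the Spec base class (assumption, not the Pants code)."""

    @abstractmethod
    def matching_addresses(self, names_by_directory):
        """Return the 'dir:name' addresses this spec selects, or raise if none exist."""


class SingleAddress(Spec):
    def __init__(self, directory, name):
        self.directory = directory
        self.name = name

    def matching_addresses(self, names_by_directory):
        names = names_by_directory.get(self.directory, [])
        if self.name not in names:
            # Every spec is validated on its own, so argument order cannot hide a miss.
            raise LookupError(
                '"{}" was not found in namespace "{}"'.format(self.name, self.directory))
        return ['{}:{}'.format(self.directory, self.name)]


class SiblingAddresses(Spec):
    def __init__(self, directory):
        self.directory = directory

    def matching_addresses(self, names_by_directory):
        names = names_by_directory.get(self.directory, [])
        return ['{}:{}'.format(self.directory, name) for name in names]


def resolve(specs, names_by_directory):
    # No `if type(spec) is ...` chain: each subclass supplies its own behavior.
    resolved = []
    for spec in specs:
        resolved.extend(spec.matching_addresses(names_by_directory))
    return resolved
```

With this shape, supporting a new kind of spec means adding a subclass rather than extending a conditional in the resolver.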

@cosmicexplorer cosmicexplorer requested review from stuhood, jsirois, ity and CMLivingston Sep 10, 2018

@stuhood
Member

stuhood left a comment

Thanks!


class AddressResolutionError(Exception): pass

def all_address_target_pairs(self, address_families):

@stuhood

stuhood Sep 10, 2018

Member

Rather than overriding concrete methods, it would be preferable to extract this as an optional helper as you did with address_families_for_dir, that some abstract method can optionally call.

@cosmicexplorer

cosmicexplorer Sep 11, 2018

Contributor

This is great! Done in c4d2edc!

addresses.append(a)
included.add(a)
return matched
yield MappedSpecs(address_families, specs)

@stuhood

stuhood Sep 10, 2018

Member

While splitting this rule into two makes testing easier, it's basically pure overhead at runtime.

The method is called ~once per run though, so not a big deal. But if it's possible to remove the testing boilerplate by just adding a test helper method instead, that would be preferable.

@cosmicexplorer

cosmicexplorer Sep 11, 2018

Contributor

The rules were merged back together in a694488, and a helper method was created in e47a710!

def __new__(cls, address_families, specs):
  return super(MappedSpecs, cls).__new__(cls, tuple(address_families), specs)

@memoized_property

@stuhood

stuhood Sep 10, 2018

Member

I don't think there is really any advantage to memoizing these on the class, but the downside is that all of these memoized properties will survive on the object between runs, because the MappedSpecs object is itself memoized.

If you were to undo the @rule split, it would be a non-issue, because the MappedSpecs object would be a local-only construct. But as it stands, you're holding on to a lot of stuff that could/should be ephemeral.
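The lifetime concern can be illustrated with a small sketch. It uses `functools.cached_property` from the standard library as a stand-in for `@memoized_property`; the Pants decorator may store its values differently, and `MappedSpecsSketch` and `resolve_directories` are hypothetical names.

```python
from functools import cached_property


class MappedSpecsSketch:
    """Hypothetical stand-in for MappedSpecs; not the Pants datatype."""

    def __init__(self, address_families):
        # Each entry is assumed to be a (directory, target_names) pair.
        self.address_families = tuple(address_families)

    @cached_property
    def names_by_directory(self):
        # Computed on first access, then stored on the instance.
        return {directory: tuple(names) for directory, names in self.address_families}


def resolve_directories(address_families):
    # The helper object is local, so its cached values can be collected after return.
    local = MappedSpecsSketch(address_families)
    return sorted(local.names_by_directory)


# An object that is itself kept alive (for example by a cache) keeps its cached values too.
long_lived = MappedSpecsSketch([('src/a', ['a']), ('src/b', ['b'])])
long_lived.names_by_directory
cached_on_instance = 'names_by_directory' in vars(long_lived)
```

In this sketch the cached dictionary lives exactly as long as the object holding it, which is why a local-only helper avoids the retention described above.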

@cosmicexplorer

cosmicexplorer Sep 11, 2018

Contributor

I'm not too attached to the @rule split (lol). I'll try merging them back, I don't think it's necessary for this to work.

@cosmicexplorer

cosmicexplorer Sep 11, 2018

Contributor

Btw, this was a very helpful explanation re: memoization. Thanks.

@cosmicexplorer

cosmicexplorer Sep 11, 2018

Contributor

The rules have been merged back together into one, maintaining the MappedSpecs object, but not having it anywhere inside the rule graph (and therefore memoized/etc).

@stuhood

stuhood Sep 11, 2018

Member

Thanks, that's great. If you need to run it through CI again, you could rename it to _MappedSpecs to highlight that it's not public.

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:fix-command-line-target-spec-validation branch from c867eff to e47a710 Sep 11, 2018

@cosmicexplorer

Contributor

cosmicexplorer commented Sep 11, 2018

There's one remaining non-spurious CI failure, which I will address tomorrow.

@stuhood
Member

stuhood left a comment

Thanks, looks good! Just nits.


def setUp(self):
  self._default_address_mapper = AddressMapper(JsonParser(TestTable()))
  self._default_snapshot = Snapshot(DirectoryDigest('xx', 2),

@stuhood

stuhood Sep 11, 2018

Member

Rather than making these properties, prefer methods: that makes them easier to template later, e.g. self._snapshot() vs. self._snapshot(with_more_stuff=True).
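A minimal sketch of why a helper method is easier to extend than a property. `SnapshotFixtures`, `_snapshot`, and the `extra_files` parameter are hypothetical and only illustrate the call shapes mentioned in the comment.

```python
class SnapshotFixtures:
    """Hypothetical test helpers; the names and return values are illustrative only."""

    def _snapshot(self, extra_files=()):
        # A method can grow keyword arguments without changing existing call sites.
        return ('root/BUILD',) + tuple(extra_files)


fixtures = SnapshotFixtures()
default_snapshot = fixtures._snapshot()
bigger_snapshot = fixtures._snapshot(extra_files=['root/a/BUILD'])
```

A property has no place to accept `extra_files`, so every test that needed a variation would have to build its own fixture by hand.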

@cosmicexplorer

cosmicexplorer Sep 12, 2018

Contributor

Done in 5801d95! Let me know if these need any explanatory comments or docstrings.

@cosmicexplorer

cosmicexplorer Sep 12, 2018

Contributor

GitHub isn't letting me reply to your other comment, but I also made _MappedSpecs private in 11bd1c6.

cosmicexplorer added some commits Sep 10, 2018

Revert "rename addresses_from_address_families -> addresses_from_mapped_specs"

This reverts commit 81db14f7f38bb83ebfcb2f0f48e2a9f03dab2791.

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:fix-command-line-target-spec-validation branch from ea343af to 5801d95 Sep 12, 2018

@cosmicexplorer

Contributor

cosmicexplorer commented Sep 12, 2018

The remaining failure was because we were dropping all the matching addresses into a frozenset and of course totally screwing up the order (which matters for e.g. calculating paths). Changing that to an OrderedSet made the test pass. This also shows that the paths integration testing does indeed cover more than the unit tests do, so I expanded the TODO in that file to point to this PR.
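The ordering problem can be reproduced with the standard library alone. This sketch uses `dict.fromkeys` as a stand-in for `OrderedSet`; the address strings and the `dedupe_preserving_order` helper are made up for the example.

```python
def dedupe_preserving_order(addresses):
    # dict keys are unique and keep insertion order (guaranteed since Python 3.7).
    return list(dict.fromkeys(addresses))


matched = ['src/c:c', 'src/a:a', 'src/c:c', 'src/b:b']
ordered = dedupe_preserving_order(matched)

# A frozenset holds the same members but makes no promise about iteration order.
unordered = frozenset(matched)
```

Both collections remove the duplicate, but only the ordered one is safe to hand to code that depends on the order in which specs were given.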

cosmicexplorer added some commits Sep 12, 2018

@Eric-Arellano
Contributor

Eric-Arellano left a comment

Very cool! Thanks for this!

@@ -201,108 +202,99 @@ def _hydrate(item_type, spec_path, **kwargs):
return item


class _MappedSpecs(datatype([
('address_families', tuple),

@Eric-Arellano

Eric-Arellano Sep 12, 2018

Contributor

We don’t have a way to say a tuple of what, do we?

With type hints it’s Tuple[Foo, Foo, Bar]

At a minimum, would be helpful to add a comment for what’s in the tuple.

@cosmicexplorer

cosmicexplorer Sep 12, 2018

Contributor

We don't yet have a way to say what the tuple contains -- I would love to modify Collection in objects.py to make these clearer, and to type check the elements of the collection.

@cosmicexplorer

cosmicexplorer Sep 12, 2018

Contributor

(for now I'll add a comment)

@cosmicexplorer

cosmicexplorer Sep 13, 2018

Contributor

Added a comment in e991a34 -- you'll note it's List[AddressFamily], the tuple part is an implementation detail -- lists aren't hashable, and this class used to be the output of an @rule, so it was hashed.

Now that I think of it, I should just change the type to list.

@cosmicexplorer

cosmicexplorer Sep 13, 2018

Contributor

Ok, I just tried that, and it turns out that @memoized_property is what requires inputs to be hashable. That's an implementation detail that could maybe be fixed at some point, but because tuple is used elsewhere in the codebase for the same reason, I'm fine with leaving it as tuple for now.
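The hashability constraint is easy to see with any cache keyed on its inputs. This sketch uses `functools.lru_cache` as a stand-in; it is an assumption that `@memoized_property` fails in a comparable way, and `count_families` is a hypothetical function.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_families(address_families):
    # The cache key is built from the argument, so the argument must be hashable.
    return len(address_families)


families = ['src/a', 'src/b']
count_from_tuple = count_families(tuple(families))

try:
    count_families(families)
    list_was_accepted = True
except TypeError:
    # Lists are mutable and define no hash, so they cannot be used as cache keys.
    list_was_accepted = False
```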

return [re.compile(pattern) for pattern in set(self.specs.exclude_patterns or [])]

def _excluded_by_pattern(self, address):
  return any(p.search(address.spec) is not None for p in self._exclude_compiled_regexps)

@Eric-Arellano

Eric-Arellano Sep 12, 2018

Contributor

Nice! Very Pythonic 🐍

@cosmicexplorer

cosmicexplorer Sep 12, 2018

Contributor

This was just stolen from elsewhere in the file so it is indeed pythonic but not quite my idea.

try:
  addr_families_for_spec = spec.matching_address_families(self._address_family_by_directory)
except Spec.AddressFamilyResolutionError as e:
  raise ResolveError(e)

@Eric-Arellano

Eric-Arellano Sep 12, 2018

Contributor

iirc, this will lose a lot of the stack trace from the original exception. We get more descriptive debugging if we use raise from syntax.

Using the Future library,

from future.utils import raise_from
raise_from(ResolveError(e), e)

Although I see after writing this that you’re already passing the original exception. So maybe this isn’t necessary?
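For comparison, a sketch of the same pattern in Python 3 syntax, where `raise ... from` is built in and `raise_from` is only needed for Python 2 compatibility. The exception classes, function names, and error message here are simplified stand-ins, not the Pants definitions.

```python
class AddressFamilyResolutionError(Exception):
    """Stand-in for Spec.AddressFamilyResolutionError."""


class ResolveError(Exception):
    """Stand-in for the engine's ResolveError."""


def matching_address_families(directory, families_by_directory):
    if directory not in families_by_directory:
        raise AddressFamilyResolutionError(
            'no address family for "{}"'.format(directory))
    return families_by_directory[directory]


def resolve(directory, families_by_directory):
    try:
        return matching_address_families(directory, families_by_directory)
    except AddressFamilyResolutionError as e:
        # `from e` records the original exception as __cause__, keeping its traceback.
        raise ResolveError(str(e)) from e
```

Passing the original exception as the message preserves its text, while chaining additionally preserves where it was raised, so the two are complementary.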

@cosmicexplorer

cosmicexplorer Sep 12, 2018

Contributor

I think raise_from looks like it makes sense here and will use it.

@jsirois

Member

jsirois commented Sep 12, 2018

Plenty of feedback / reviewers on this one so I'll bow out.

@jsirois jsirois removed their request for review Sep 12, 2018

cosmicexplorer added some commits Sep 13, 2018

@cosmicexplorer cosmicexplorer merged commit d6a9490 into pantsbuild:master Sep 16, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

stuhood added a commit that referenced this pull request Sep 25, 2018

Remove usage of @memoized_property on MappedSpecs. (#6551)
### Problem

Due to some unexpected behaviour of `@memoized_property`, #6480 made spec parsing accidentally quadratic. See #6550 for a general overview of the problem.

### Solution

Because `@memoized_property` was still a good strategy for the purposes of compiled regexes and compiled tag matchers, this extracts the inputs to those properties to a datatype `SpecsMatcher`, which will have a much smaller instance. It then removes `_MappedSpecs` by inlining its two remaining methods into `addresses_from_specs` to avoid other unnecessary memoization.
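A rough sketch of the idea of keeping memoization on a small object that holds only the matcher inputs. `SpecsMatcherSketch` is hypothetical and is not the `SpecsMatcher` from the patch; `functools.cached_property` stands in for `@memoized_property`, and tag matching is left out.

```python
import re
from functools import cached_property


class SpecsMatcherSketch:
    """Hypothetical small holder for matcher inputs (assumption, not the Pants class)."""

    def __init__(self, exclude_patterns=()):
        self.exclude_patterns = tuple(exclude_patterns)

    @cached_property
    def _exclude_compiled_regexps(self):
        # Compiled once per matcher instance, on first use.
        return [re.compile(pattern) for pattern in set(self.exclude_patterns)]

    def excluded_by_pattern(self, address_spec):
        return any(p.search(address_spec) is not None for p in self._exclude_compiled_regexps)


matcher = SpecsMatcherSketch(exclude_patterns=[r'^tests/'])
kept = [spec for spec in ['src/a:a', 'tests/a:a'] if not matcher.excluded_by_pattern(spec)]
```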

### Result

For:
```
time ./pants --changed-diffspec='59d1632aa7bd4d7713d5ac18ae401adc7631e070^..59d1632' --changed-include-dependees=transitive list | wc -l
```

```
# before #6480
real	0m7.574s
user	0m8.323s
sys	0m2.848s
 
# after #6480
real	0m17.996s
user	0m18.529s
sys	0m3.000s
 
# after this patch
real	0m7.603s
user	0m8.233s
sys	0m2.786s
```

#6550 discusses auditing other usages of `@memoized_property`.