New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor command line target spec resolution and check that all target roots exist #6480
refactor command line target spec resolution and check that all target roots exist #6480
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
src/python/pants/base/specs.py
Outdated
|
||
class AddressResolutionError(Exception): pass | ||
|
||
def all_address_target_pairs(self, address_families): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than overriding concrete methods, it would be preferable to extract this as an optional helper as you did with address_families_for_dir
, that some abstract method can optionally call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! Done in c4d2edc!
addresses.append(a) | ||
included.add(a) | ||
return matched | ||
yield MappedSpecs(address_families, specs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While splitting this rule into two makes testing easier, it's basically pure overhead at runtime.
The method is called ~once per run though, so not a big deal. But if it's possible to remove the testing boilerplate by just adding a test helper method instead, that would be preferable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def __new__(cls, address_families, specs): | ||
return super(MappedSpecs, cls).__new__(cls, tuple(address_families), specs) | ||
|
||
@memoized_property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is really any advantage to memoizing these on the class, but the downside is that all of these memoized properties will survive on the object between runs, because the MappedSpecs
object is itself memoized.
If you were to undo the @rule split, it would be a non-issue, because the MappedSpecs
object would be a local-only construct. But as it stands, you're holding on to a lot of stuff that could/should be ephemeral.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not too attached to the @rule
split (lol). I'll try merging them back, I don't think it's necessary for this to work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, this was a very helpful explanation re: memoization. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rules have been merged back together into one, maintaining the MappedSpecs
object, but not having it anywhere inside the rule graph (and therefore memoized/etc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that's great. If you need to run it through CI again, could rename it to _MappedSpecs
to highlight that it's not public.
c867eff
to
e47a710
Compare
There's one remaining non-spurious CI failure, which I will address tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good! Just nits.
def __new__(cls, address_families, specs): | ||
return super(MappedSpecs, cls).__new__(cls, tuple(address_families), specs) | ||
|
||
@memoized_property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that's great. If you need to run it through CI again, could rename it to _MappedSpecs
to highlight that it's not public.
|
||
def setUp(self): | ||
self._default_address_mapper = AddressMapper(JsonParser(TestTable())) | ||
self._default_snapshot = Snapshot(DirectoryDigest('xx', 2), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than making these properties, prefer methods: that makes them easier to template later: ie self._snapshot()
vs self._snapshot(with_more_stuff=True)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 5801d95! Let me know if these need any explanatory comments or docstrings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Github isn't letting me reply to your other comment, but I also made _MappedSpecs
private in 11bd1c6.
…ed_specs" This reverts commit 81db14f7f38bb83ebfcb2f0f48e2a9f03dab2791.
…hods added to Spec
ea343af
to
5801d95
Compare
The remaining failure was because we were dropping all the matching addresses into a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool! Thanks for this!
@@ -201,108 +202,99 @@ def _hydrate(item_type, spec_path, **kwargs): | |||
return item | |||
|
|||
|
|||
class _MappedSpecs(datatype([ | |||
('address_families', tuple), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don’t have a way to say a tuple of what, do we?
With type hints it’s Tuple[Foo, Foo, Bar]
At a minimum, would be helpful to add a comment for what’s in the tuple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not yet about a tuple of what -- I would love to modify Collection
in objects.py
to make these more clear, and to type check the elements of the collection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(for now I'll add a comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment in e991a34 -- you'll note it's List[AddressFamily]
, the tuple
part is an implementation detail -- lists aren't hashable, and this class used to be the output of an @rule
, so it was hashed.
Now that I think of it, I should just change the type to list
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I just tried that, and it turns out that @memoized_property
is what requires inputs to be hashable, so I'm fine with leaving it as tuple
, but that's an implementation detail that could maybe be fixed at some point. Because tuple
is used elsewhere in the codebase for the same reason, I'm fine with this for now.
return [re.compile(pattern) for pattern in set(self.specs.exclude_patterns or [])] | ||
|
||
def _excluded_by_pattern(self, address): | ||
return any(p.search(address.spec) is not None for p in self._exclude_compiled_regexps) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Very Pythonic 🐍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was just stolen from elsewhere in the file so it is indeed pythonic but not quite my idea.
try: | ||
addr_families_for_spec = spec.matching_address_families(self._address_family_by_directory) | ||
except Spec.AddressFamilyResolutionError as e: | ||
raise ResolveError(e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iirc, this will lose a lot of the stack trace from the original exception. We get more descriptive debugging if we use raise from syntax.
Using the Future library,
from future.utils import raise_from
raise_from(ResolveError(e), e)
Although I see after writing this that you’re already passing the original exception. So maybe this isn’t necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think raise_from
looks like it makes sense here and will use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in a146c99!
Plenty of feedback / reviewers on this one so I'll bow out. |
### Problem Due to some unexpected behaviour of `@memoized_property`, #6480 made spec parsing accidentally quadratic. See #6550 for a general overview of the problem. ### Solution Because `@memoized_property` was still a good strategy for the purposes of compiled regexes and compiled tag matchers, this extracts the inputs to those properties to a datatype `SpecsMatcher`, which will have a much smaller instance. It then removes `_MappedSpecs` by inlining its two remaining methods into `addresses_from_specs` to avoid other unnecessary memoization. ### Result For: ``` time ./pants --changed-diffspec='59d1632aa7bd4d7713d5ac18ae401adc7631e070^..59d1632' --changed-include-dependees=transitive list | wc -l ``` ``` # before #6480 real 0m7.574s user 0m8.323s sys 0m2.848s # after #6480 real 0m17.996s user 0m18.529s sys 0m3.000s # after this patch real 0m7.603s user 0m8.233s sys 0m2.786s ``` #6550 discusses auditing other usages of `@memoized_property`.
Problem
This should fail regardless of the order of the command-line target specs (the target named
a
does not exist):Solution
AddressFamily
andAddress
resolution logic into theSpec
class and its subclasses instead of doing a massiveif
chain._MappedSpecs
datatype holding the targetSpecs
, and theAddressFamily
s covering the directories indicated by theSpecs
._MappedSpecs
for clarity, and replace allif type(obj) is SingleAddress
chains with abstract methods implemented inSpec
subclasses.test_fails_on_nonexistent_specs()
to test the issue described in the Problem section.