Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor command line target spec resolution and check that all target roots exist #6480

Conversation

cosmicexplorer
Copy link
Contributor

@cosmicexplorer cosmicexplorer commented Sep 10, 2018

Problem

This should fail regardless of the order of the command-line target specs (the target named a does not exist):

> ./pants test tests/python/pants_test/option:a
# ...
ResolveError: "a" was not found in namespace "tests/python/pants_test/option". Did you mean one of:
          :options_integration
          :testing
> ./pants test tests/python/pants_test/option:{testing,a}
# ...
 ===== 158 passed, 1 xfailed in 2.77 seconds ======
                     
                   tests/python/pants_test/option:testing                                          .....   SUCCESS
23:25:48 00:12     [junit]
23:25:48 00:12     [go]
23:25:48 00:12     [node]
               Waiting for background workers to finish.
23:25:49 00:13   [complete]
               SUCCESS

Solution

  • Move AddressFamily and Address resolution logic into the Spec class and its subclasses instead of doing a massive if chain.
  • Introduce the _MappedSpecs datatype holding the target Specs, and the AddressFamilys covering the directories indicated by the Specs.
  • Move all of the address resolution logic into _MappedSpecs for clarity, and replace all if type(obj) is SingleAddress chains with abstract methods implemented in Spec subclasses.
  • Add test_fails_on_nonexistent_specs() to test the issue described in the Problem section.

Copy link
Sponsor Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!


class AddressResolutionError(Exception): pass

def all_address_target_pairs(self, address_families):
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than overriding concrete methods, it would be preferable to extract this as an optional helper as you did with address_families_for_dir, that some abstract method can optionally call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! Done in c4d2edc!

addresses.append(a)
included.add(a)
return matched
yield MappedSpecs(address_families, specs)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While splitting this rule into two makes testing easier, it's basically pure overhead at runtime.

The method is called ~once per run though, so not a big deal. But if it's possible to remove the testing boilerplate by just adding a test helper method instead, that would be preferable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rules were merged back together in a694488, and a helper method was created in e47a710!

def __new__(cls, address_families, specs):
return super(MappedSpecs, cls).__new__(cls, tuple(address_families), specs)

@memoized_property
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is really any advantage to memoizing these on the class, but the downside is that all of these memoized properties will survive on the object between runs, because the MappedSpecs object is itself memoized.

If you were to undo the @rule split, it would be a non-issue, because the MappedSpecs object would be a local-only construct. But as it stands, you're holding on to a lot of stuff that could/should be ephemeral.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too attached to the @rule split (lol). I'll try merging them back, I don't think it's necessary for this to work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, this was a very helpful explanation re: memoization. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rules have been merged back together into one, maintaining the MappedSpecs object, but not having it anywhere inside the rule graph (and therefore memoized/etc).

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that's great. If you need to run it through CI again, could rename it to _MappedSpecs to highlight that it's not public.

@cosmicexplorer cosmicexplorer force-pushed the fix-command-line-target-spec-validation branch from c867eff to e47a710 Compare September 11, 2018 08:07
@cosmicexplorer
Copy link
Contributor Author

There's one remaining non-spurious CI failure, which I will address tomorrow.

Copy link
Sponsor Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good! Just nits.

def __new__(cls, address_families, specs):
return super(MappedSpecs, cls).__new__(cls, tuple(address_families), specs)

@memoized_property
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that's great. If you need to run it through CI again, could rename it to _MappedSpecs to highlight that it's not public.


def setUp(self):
self._default_address_mapper = AddressMapper(JsonParser(TestTable()))
self._default_snapshot = Snapshot(DirectoryDigest('xx', 2),
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than making these properties, prefer methods: that makes them easier to template later: ie self._snapshot() vs self._snapshot(with_more_stuff=True).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 5801d95! Let me know if these need any explanatory comments or docstrings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Github isn't letting me reply to your other comment, but I also made _MappedSpecs private in 11bd1c6.

@cosmicexplorer cosmicexplorer force-pushed the fix-command-line-target-spec-validation branch from ea343af to 5801d95 Compare September 12, 2018 02:34
@cosmicexplorer
Copy link
Contributor Author

The remaining failure was because we were dropping all the matching addresses into a frozenset and of course totally screwing up the order (which matters for e.g. calculating paths). Changing that to an OrderedSet made the test pass. This also shows that the paths integration testing does indeed cover more than the unit tests do, so I expanded the TODO in that file to point to this PR.

Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool! Thanks for this!

src/python/pants/base/specs.py Outdated Show resolved Hide resolved
src/python/pants/base/specs.py Outdated Show resolved Hide resolved
@@ -201,108 +202,99 @@ def _hydrate(item_type, spec_path, **kwargs):
return item


class _MappedSpecs(datatype([
('address_families', tuple),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don’t have a way to say a tuple of what, do we?

With type hints it’s Tuple[Foo, Foo, Bar]

At a minimum, would be helpful to add a comment for what’s in the tuple.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yet about a tuple of what -- I would love to modify Collection in objects.py to make these more clear, and to type check the elements of the collection.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(for now I'll add a comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment in e991a34 -- you'll note it's List[AddressFamily], the tuple part is an implementation detail -- lists aren't hashable, and this class used to be the output of an @rule, so it was hashed.

Now that I think of it, I should just change the type to list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I just tried that, and it turns out that @memoized_property is what requires inputs to be hashable, so I'm fine with leaving it as tuple, but that's an implementation detail that could maybe be fixed at some point. Because tuple is used elsewhere in the codebase for the same reason, I'm fine with this for now.

return [re.compile(pattern) for pattern in set(self.specs.exclude_patterns or [])]

def _excluded_by_pattern(self, address):
return any(p.search(address.spec) is not None for p in self._exclude_compiled_regexps)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Very Pythonic 🐍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just stolen from elsewhere in the file so it is indeed pythonic but not quite my idea.

try:
addr_families_for_spec = spec.matching_address_families(self._address_family_by_directory)
except Spec.AddressFamilyResolutionError as e:
raise ResolveError(e)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc, this will lose a lot of the stack trace from the original exception. We get more descriptive debugging if we use raise from syntax.

Using the Future library,

from future.utils import raise_from
raise_from(ResolveError(e), e)

Although I see after writing this that you’re already passing the original exception. So maybe this isn’t necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think raise_from looks like it makes sense here and will use it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in a146c99!

@jsirois
Copy link
Member

jsirois commented Sep 12, 2018

Plenty of feedback / reviewers on this one so I'll bow out.

@jsirois jsirois removed their request for review September 12, 2018 16:53
@cosmicexplorer cosmicexplorer merged commit d6a9490 into pantsbuild:master Sep 16, 2018
stuhood pushed a commit that referenced this pull request Sep 25, 2018
### Problem

Due to some unexpected behaviour of `@memoized_property`, #6480 made spec parsing accidentally quadratic. See #6550 for a general overview of the problem.

### Solution

Because `@memoized_property` was still a good strategy for the purposes of compiled regexes and compiled tag matchers, this extracts the inputs to those properties to a datatype `SpecsMatcher`, which will have a much smaller instance. It then removes `_MappedSpecs` by inlining its two remaining methods into `addresses_from_specs` to avoid other unnecessary memoization.

### Result

For:
```
time ./pants --changed-diffspec='59d1632aa7bd4d7713d5ac18ae401adc7631e070^..59d1632' --changed-include-dependees=transitive list | wc -l
```

```
# before #6480
real	0m7.574s
user	0m8.323s
sys	0m2.848s
 
# after #6480
real	0m17.996s
user	0m18.529s
sys	0m3.000s
 
# after this patch
real	0m7.603s
user	0m8.233s
sys	0m2.786s
```

#6550 discusses auditing other usages of `@memoized_property`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants