Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix some dependencies-like fields not showing up with project introspection #10923

Merged
merged 2 commits into from Oct 9, 2020

Conversation

Eric-Arellano
Copy link
Contributor

Problem

Closes #10888.

We now have several fields that act like dependencies, but are not quite it. In most code, we don't want the dependencies showing up with TransitiveTargets and DependenciesRequest; for example, with relocated_files(), it's important that the original files targets are not included when we're hydrating sources for the transitive closure.

But with project introspection, we do need these dependencies-like fields to be used. This is important for --changed-since --changed-dependees to work properly.

Solution

Add a new SpecialCasedDependencies superclass. This is not a normal field, e.g. not setting alias. It's closer in spirit to a "Field template", e.g. StringField and BoolField.

Both DependenciesRequest and TransitiveTargetsRequest get new boolean flags to determine if they should include any subclasses of SpecialCasedDependencies or not. Then, we teach the project introspection goals to use the new behavior.

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
[ci skip-rust]
[ci skip-build-wheels]
@coveralls
Copy link

Coverage Status

Coverage remained the same at 0.0% when pulling f8fb91d on Eric-Arellano:special-case-deps into ff02ef0 on pantsbuild:master.

Copy link
Contributor

@cosmicexplorer cosmicexplorer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, but I have concerns about the term "special cased", which I try to describe in comments -- I think "indirect" might be more appropriate.

More importantly (this part is blocking, unless I'm missing something), since it seems that most users will not want to retrieve these dependencies, it seems unnecessary to add this complexity into the normal DependenciesRequest class (especially considering whether we might want to add more types of "special" dependencies later). Rather, since the use cases for these two different types of dependencies requests do not seem to overlap, I would think we'd want to consider breaking it out into a separate dataclass.

Let me know what you think about those two ideas.

# `files` and `packages` fields, then we possibly include those too. We don't want to always
# include those dependencies because they should often be excluded from the result due to
# being handled elsewhere in the calling code.
special_cased: Tuple[Address, ...] = ()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this would be incredibly difficult to move to a separate rule for DependenciesWithIndirectDeps, which would append the special_cased to the result as is done here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we added a new rule, we would probably want its result type to be Addresses to avoid having to invent more wrapper dataclasse. But, that will result in graph ambiguity to use await Get(Addresses, Foo) in this rule, unless we added the new dataclass.

I don't think there's strong enough of motivation yet to factor this out and pay the cost of Yet Another Dataclass and rule. Note that we don't really expect call sites to resolve all their special deps in one single call, without their normal deps included. Instead, we expect them to resolve each distinct field separately, because each field has its own magical handling. archive is a good example of this:

package_targets, files_targets = await MultiGet(
Get(
Targets,
UnparsedAddressInputs(field_set.packages.value or (), owning_address=field_set.address),
),
Get(
Targets,
UnparsedAddressInputs(field_set.files.value or (), owning_address=field_set.address),
),
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because each field has its own magical handling

I'm not sure I understand. If this is the case, why aren't we separating the calls for Dependencies and SpecialCasedDependencies then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because there are very select places that we want all dependencies, both normal and special cased. Specifically, project introspection needs all dependencies.

It is simpler in those places to have one call await Get(Addresses, DependenciesRequest) than a call for normal deps + call for all the special cased. In particular, this is apparent with TransitiveTargets. With this current design, we don't need to implement a separate transitive targets rule that duplicates the normal rule. It sounds like you tried implementing new distinct dataclasses yesterday and found this to be the case?

My above comment was trying to explain that in almost all other contexts, you do not want all dependencies, both normal and magical, to be in the same result. You want to separate them out. I was pointing to how that it possible with that design.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like you tried implementing new distinct dataclasses yesterday and found this to be the case?

It would have been extremely trivial if #8542 had been merged, which would have allowed providing a @union base as an @rule argument. We could have then made a TransitiveTargetsRequestBase union, and it would have generated a version of that rule for each type. I have asked you and others to take a look at that PR many times to no avail. Would you consider looking at it again if I updated it?

Copy link
Sponsor Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take a look at #8542 and comment there

src/python/pants/engine/internals/graph.py Show resolved Hide resolved
if request.include_special_cased_deps:
wrapped_tgt = await Get(WrappedTarget, Address, request.field.address)
# Unlike normal, we don't use `tgt.get()` because there may be >1 subclass of
# SpecialCasedDependencies.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we link to #10888 here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, could we add a TODO pointing to #10924?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's any TODO we need to do here, tbh. This is idiomatic Python and idiomatic usage of the Target API. We get the functionality we need in 4 lines of code, without needing to invent any new abstractions.

The comment is solely to draw attention to why we don't use tgt.get() like normal.

Is there something you think we could improve with these 4 lines?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's just confusing to me why we would iterate over the target's fields again here instead of making that part of the original DependenciesRequest. I think it was a good decision to make sure each Field has an Address, but it feels like that's useful for when we discover new things we need to introspect about the original target. I'm not sure it's really a problem, but it feels like that might be part of why the graph cycle is happening here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really following the argument here. At this point, we had only discovered that there is a Dependencies field (or subclass). We need to check if there are any other special cased dependencies fields that are in play, and the way to do that is to iterate over all the fields to see if they subclass this marker.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, we had only discovered that there is a Dependencies field (or subclass).

What if we included the special-cased dependencies fields in the original DependenciesRequest?

@@ -1667,6 +1655,27 @@ def __iter__(self) -> Iterator[Address]:
return iter(self.dependencies)


class SpecialCasedDependencies(AsyncStringSequenceField):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would definitely prefer something like IndirectDependencies for this class. I don't feel like SpecialCased is very clear -- it also describes what we do with the field, as opposed to what it is, which I think is not what we want to be doing (please feel free to correct me if I'm wrong).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel like SpecialCased is very clear

I argue this is a feature, not a bug. There is very little unifying about this parent class, outside of a) these are like the normal dependencies field, and b) you are going to do something magical with the values, such that you didn't think it was appropriate to use the simpler, normal Dependencies field.

We don't know what the magic might be. It could be running the spiritual equivalent of ./pants package, or it could be that you want to express this solely works with files targets, etc.

My concern with IndirectDependencies is that it might be trying to generalize more than we can do in this situation. These are direct dependencies—rather than indirect—only that they have been split out from the normal Dependencies field for some reason the plugin author decided to do.

Wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't know what the magic might be. It could be running the spiritual equivalent of ./pants package, or it could be that you want to express this solely works with files targets, etc.

I said above that I would expect it to be better to describe what it is rather than what we do with it in the name, since as you describe here there are many different things we might want to do with it. I'm concerned that naming things based on how they're used leads to God Classes. I believe every other type of class in target.py follows the "what it is" naming scheme.

These are direct dependencies—rather than indirect

Ok, I think I was mostly trying to go off of this description from the OP:

or example, with relocated_files(), it's important that the original files targets are not included when we're hydrating sources for the transitive closure.

Could we consider a name that more accurately represents this distinction? The reason I thought about IndirectDependenciesRequest was because it chose to include the original files() target as well. Perhaps IndirectDependencies isn't appropriate for the field name itself -- maybe WrappedDependencies?

@@ -612,9 +612,13 @@ class TransitiveTargetsRequest:
"""

roots: Tuple[Address, ...]
include_special_cased_deps: bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't especially like how we have to have this include_special_cased_deps field wherever we want to propagate the behavior. It feels harder to search for and remove later than introducing a new dataclass which performs the desired behavior via @rules.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, forget the above -- I tried to implement a new dataclass and it was too hard.

Could we make this an Enum instance instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would the Enum be? There are only two variants: include or exclude.

While I generally like Enums for modeling (especially over strings), they add new complexity. Unless we believe there are going to be additional states beyond this binary in the future, an enum seems like unnecessary complexity imo, such as adding yet another class you need to import from target.py, which is already gargantuan.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would the Enum be? There are only two variants: include or exclude.

You said in this comment #10923 (comment):

There is very little unifying about this parent class, outside of a) these are like the normal dependencies field, and b) you are going to do something magical with the values

That seems to me to imply we are going to want to do many other things with this field in the future. An enum allows for that. boolean fields are often a code smell: see https://softwareengineering.stackexchange.com/a/148058.

such as adding yet another class you need to import from target.py, which is already gargantuan.

I think this is necessary complexity, because each element is individually cacheable. When I responded to a user in the slack yesterday who mentioned that it was a little off-putting to import so many things by mentioning how the point is to make a 100% cacheable and performant build tool, they seemed to accept that very easily.

I think if this is a concern, we could consider methods to bring elements into scope automatically, perhaps by linking classes to other classes. #10924 generalizes the inject_for/infer_from pattern with a ConvertibleFrom mixin, which makes a lot of the small machinery fall away, for example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

boolean fields are often a code smell:

Sure, but premature abstraction is also a code smell.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think that giving each choice of an option a name is "premature abstraction" and I'm confused as to why you've decided this is the case given that it was decided a whole separate issue needed to be created to figure out how to handle this problem of the kind of dependencies to retrieve. It's clearly not a trivial distinction. I don't care enough to continue this discussion.

src/python/pants/engine/target.py Show resolved Hide resolved
Copy link
Contributor

@cosmicexplorer cosmicexplorer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I tried to implement the solution I suggested with the separate dataclass, and it was too hard. I would like to make the include_special_cased_deps field into an Enum if possible, but otherwise this looks great!

@benjyw
Copy link
Sponsor Contributor

benjyw commented Oct 9, 2020

Thanks for the deep dive Danny, and for acking this. I think simple is better than complex here (and in most cases...)

@Eric-Arellano Eric-Arellano merged commit 7607e5f into pantsbuild:master Oct 9, 2020
@Eric-Arellano Eric-Arellano deleted the special-case-deps branch October 9, 2020 17:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Teach project introspection goals about special-cased dependencies fields
4 participants