Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow @goal_rules to request Targets and demo with ./v2 binary run #9345

Merged
merged 8 commits into from Mar 20, 2020

Conversation

Eric-Arellano
Copy link
Contributor

@Eric-Arellano Eric-Arellano commented Mar 19, 2020

This adds rules to allow rules to request TargetWithOrigin, Targets, and TargetsWithOrigins, then hooks it up to ./v2 binary and ./v2 run.

Result: ./v2 binary run works with custom target types

So long as those target types have the field PythonBinarySources and EntryPoint registered, we can now run ./v2 binary and ./v2 run on the targets.

Result: better error messages for invalid target types:

Before:

▶ ./v2 --no-print-exception-stacktrace run src/python/pants/util:strutil

ERROR: Type PythonTargetAdaptor is not a member of the BinaryTarget @union

After:

▶ ./v2 --no-print-exception-stacktrace run src/python/pants/util:strutil

ERROR: The run and binary goals only work with the following target types: ['python_binary']

You used src/python/pants/util:strutil with target type 'python_library'.

This improvement is made possible because of the Target API for two reasons:

  1. Field-driven API. Rules now only say what fields they need and ignore all others.
  2. We can access all registered target types by requesting the singleton RegisteredTargetTypes.

These two reasons allow us to iterate over every registered BinaryImplementation to determine which target types they work with, i.e. which target types have their required fields. We then flatten this into a single list of valid targets.

This error message works with custom target types, too!

▶ ./v2 --no-print-exception-stacktrace run src/python/pants/util:strutil

ERROR: The run and binary goals only work with the following target types: ['custom_python_binary', 'python_binary']

You used src/python/pants/util:strutil with target type 'python_library'.

Implementation: filtering targets in @goal_rules

An @goal_rule will typically request SourcesSnapshots, Addresses, AddressesWithOrigins, Targets, or TargetsWithOrigins. It hasn't yet filtered out which targets it actually cares about.

The downstream rules, meanwhile will already express which exact fields they need to operate through their rule signature, e.g. python_create_binary.py saying it needs PythonBinaryImplementation (previously PythonBinaryFields). The @goal_rule must filter out the relevant targets then pass those fields down to the rule.

--

Here, we use the same pattern used in fmt.py, lint.py, and test.py to do that filtering.

It no longer makes sense to have a union of target types, like UnionRule(BinaryTarget, PythonBinary). We no longer care about target types, but rather only the required fields for the rule to operate.

Instead, we use UnionRule(BinaryImplementation, PythonBinaryImplementation), where PythonBinaryImplementation is a subclass of BinaryImplementation that has the methods is_valid_target() and create(). For any given target, this allows us to check "are there any registered binary implementations that work with this target?" If so, we now have a standardized way of extracting the relevant Fields from that target by calling implementation.create(target) and using a union rule to call the appropriate downstream rule.

This is the equivalent of test.py having UnionRule(TestRunner, PytestRunner), lint.py having UnionRule(Linter, BanditLinter), and fmt.py having UnionRule(Formatter, IsortFormatter).

Comment on lines -21 to -24
# TODO: consider replacing this with sugar like `SelectFields(EntryPoint, PythonBinarySources)` so
# that the rule would request that instead of this dataclass. Note that this syntax must support
# both optional_fields (see the below TODO) and opt-out `SentinelField`s
# (see https://github.com/pantsbuild/pants/pull/9316#issuecomment-600152573).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This TODO is superseded by a new TODO in binary.py. The abstraction has now moved one layer up.

@dataclass(frozen=True)
class PythonBinaryFields:
class PythonBinaryImplementation(BinaryImplementation):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep reading the PR, especially binary.py, for why I found BinaryImplementation to be much more natural than BinaryFields.

As mentioned in the PR description, another reason I think BinaryImplementation is better than BinaryFields is that it fits better with test.py, fmt.py, and lint.py, which use TestRunner, Formatter, and Linter, respectively, for this same construct.

Copy link
Member

@jsirois jsirois Mar 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both this form and the prior seem like more boilerplate than needed. Could non Optional fields be used as the signal to an external tester / creator? Given a @dataclass, the meaning of Optional and a Target instance, it seems you could have one piece of code that knows if that @dataclass can be created from the Target and can then create an instance of the @dataclass if so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given a @DataClass, the meaning of Optional and a Target instance, it seems you could have one piece of code that knows if that @DataClass can be created from the Target and can then create an instance of the @DataClass if so.

I don't follow. Do you mean factoring this out into some abstraction that removes most the boilerplate here? Or do you mean a more fundamental redesign?

If the former, almost certainly, yes, we can reduce boilerplate by refactoring this. But, I'm intentionally punting on boilerplate reduction until we finish porting all of V2 Python to the Target API so that we better understand all the usages we need to handle.

Copy link
Member

@jsirois jsirois Mar 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. The change from PythonBinaryFields / PythonBinaryImplementation just looks like a non-obvious win and the refactor looks like the more obvious win just reading this PR. Happy to ride out the thrash to get more data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change from PythonBinaryFields / PythonBinaryImplementation just looks like a non-obvious win

My bad with a confusing PR comment at the start of this thread. There are two things going on with this PR in regards to PythonBinaryFields:

  1. Rename PythonBinaryFields to PythonBinaryImplementation. This is solely a name change to reflect what ended up being far more natural in real-world usage, e.g. what sounded better in the error message in the PR description.
    • This is what the original comment in this thread was referring to.
  2. Have PythonBinaryImplementation subclass BinaryImplementation, whereas previously it was a stand-alone dataclass.
    • This removes some boilerplate. We no longer need to define is_valid_target() and valid_target_types() because the superclass gives us those.
    • This all still works with MyPy. We could probably remove even more boilerplate by putting more logic into the superclass, e.g. create(), but MyPy wouldn't be able to know what each field is, anymore.

I think #1 is stable. I don't think we'll want to change the name again. (But, we can, need be).

#2 is very fluid. I expect for there to be much churn the next week as we encounter new real-world use cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - I think #2 can be completely ditched given a @dataclass, the meaning of Optional and a Target instance. Time will tell.

f"type {repr(target.alias)}). All registered binary implementations:\n\n"
f"{all_implementations}."
)
if len(valid_binary_implementations) > 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this pre-empt some less well displayed failure from somewhere else in the pipeline? If not it seems like an artificial restrction. It seems completely reasonable to have a python_binary in a BUILD file and 1 rule that uses a PythonPEXBinary to extract fields needed to build a PEX from that target and another rule that uses a PythonIPEXBinary to extract fields needed to build an iPEX from that target and have both rules run and produce two binaries for the python_binary input target.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This invariant was previously enforced because we had UnionRule(BinaryTarget, PythonBinaryTargetAdaptor) and it's impossible for a TargetAdaptor to be more than one specific TargetAdaptor type. It either is a PythonBinaryTargetAdaptor or it's a MyCustomPythonBinaryTargetAdaptor; it can't be both.

We had a rule that went from PythonBinaryTargetAdaptor -> CreatedBinary, so we knew when calling await Get[CreatedBinary](BinaryTarget, target_adaptor) that we could, by definition, only have one single rule that would go from PythonBinaryTargetAdaptor -> CreatedBinary.

Here, we're opening up the flood gates that now it's possible for multiple binary implementations for any given target.

--

Sometimes, we like that there may be multiple implementations. For example, with lint.py, we depend on that to be able to run, say, 4 linters on the exact same target:

results = await MultiGet(
Get[LintResult](Linter, linter((adaptor_with_origin,)))
for adaptor_with_origin in adaptors_with_origins
for linter in linters
if linter.is_valid_target(adaptor_with_origin)
)

In contrast, with ./v2 test --debug, we must only have one target, so we have this validation with --debug, but not normally:

if options.values.debug:
target_with_origin = targets_with_origins.expect_single()
adaptor_with_origin = TargetAdaptorWithOrigin.create(
target_with_origin.target.adaptor, target_with_origin.origin
)
address = adaptor_with_origin.adaptor.address
valid_test_runners = [
test_runner
for test_runner in test_runners
if test_runner.is_valid_target(adaptor_with_origin)
]
if not valid_test_runners:
raise ValueError(f"No valid test runner for {address}.")
if len(valid_test_runners) > 1:
raise ValueError(
f"Multiple possible test runners for {address} "
f"({', '.join(test_runner.__name__ for test_runner in valid_test_runners)})."
)

--

Here, maybe we are okay with multiple binary implementations? I'm not sure if we'd want to run both Pex and IPex. Maybe, we'd instead want you to do something like ./v2 binary --implementation='ipex' to disambiguate (which is the current prototype for ./v2 repl). I'm not sure. (cc @gshuflin)

For this PR, I think we should punt on it and stick with the prior semantics. But rest assured that it is indeed possible to work with multiple binary implementations for a single target.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK - sg for now.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be interesting to see whether the "filter targets and implementations, and limit to N" code that goals tend to need to write can be made generic (or perhaps even declarative). But walking the line of premature generalization is always challenging.

@benjyw
Copy link
Sponsor Contributor

benjyw commented Mar 19, 2020

The messaging improvement is great! Even greater would be something like:

The `run` goal requires one of the following target types: python_binary.
You provided `src/python/pants/util:strutil` which is of type python_library.

Just thinking about the end user here...

@Eric-Arellano
Copy link
Contributor Author

Even greater would be something like:

I love it! I'll include that as part of this PR.

Comment on lines +111 to +113
# TODO: this is a leaky abstraction that the error message knows this rule is being used
# by `run` and `binary`. How should this be handled? A custom goal author could depend on
# this error message too and the error would now be lying.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what to do about this TODO. We could rewrite this error message to remove the leak but still be meaningful, e.g. "Cannot create a binary for this target type...".

Although, that's not as helpful or direct as "The run goal only works with ...". It's an implementation detail that ./v2 run creates a binary to be able to then run that binary.

We could work around this by adding a new rule parameter with the error message so that call sites can customize the behavior. But, then ./v2 run couldn't leverage the work of ./v2 binary and vice versa.

Alternatively, we could somehow factor this coordinator_of_binaries rule out so that each call site can choose an error message that's meaningful to it? I'm not sure how trivial this would be.

--

Generally, I'd bias towards optimizing for the better error message and figuring out whatever gymnastics we need to support that great error message.

Copy link
Member

@jsirois jsirois Mar 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the active goal name or even the full parsed argv be installed as a root rule so that rules wanting to know about that data could just request it? If our invalidation no-opping cuts the mustard, this should be performant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I believe at a minimum that we could have a singleton for all invoked goals. I don't think we could ask for the "active" goal because that requires accessing mutable global state.

Copy link
Sponsor Member

@stuhood stuhood Mar 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like it would be reasonably foolproof to add the context in run_goal_rules above any raised exception (when --no-print-exception-stacktrace)... it'd be on two lines, but:

Exception: While executing the `run` goal:                                   # rendered by run_goal_rule
This implementation can only be used with the following target types:        # rendered by a rule

Passing down context is fine, but then you have to rely on rule implementers to actually use it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I like that! It's probably only worth rendering that message when multiple goals were given, though. If you only run ./v2 run, that context wouldn't really help for anything.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it would probably still help in that case. It adds context that differentiates this from pants "just being broken" in some way, before getting to the goal.

@Eric-Arellano Eric-Arellano merged commit 8a19737 into pantsbuild:master Mar 20, 2020
@Eric-Arellano Eric-Arellano deleted the targets branch March 20, 2020 06:39
Copy link
Sponsor Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Comment on lines +111 to +113
# TODO: this is a leaky abstraction that the error message knows this rule is being used
# by `run` and `binary`. How should this be handled? A custom goal author could depend on
# this error message too and the error would now be lying.
Copy link
Sponsor Member

@stuhood stuhood Mar 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like it would be reasonably foolproof to add the context in run_goal_rules above any raised exception (when --no-print-exception-stacktrace)... it'd be on two lines, but:

Exception: While executing the `run` goal:                                   # rendered by run_goal_rule
This implementation can only be used with the following target types:        # rendered by a rule

Passing down context is fine, but then you have to rely on rule implementers to actually use it.

f"type {repr(target.alias)}). All registered binary implementations:\n\n"
f"{all_implementations}."
)
if len(valid_binary_implementations) > 1:
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be interesting to see whether the "filter targets and implementations, and limit to N" code that goals tend to need to write can be made generic (or perhaps even declarative). But walking the line of premature generalization is always challenging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants