First round of Target Field Externalization #4535

Closed
stuhood opened this issue May 1, 2017 · 22 comments

2 participants
@stuhood
Member

commented May 1, 2017

The first round will involve implementing APIs like those described in https://docs.google.com/document/d/102EFbwk6cpM9-_4ZSYhMQYA0zL1UKbXzW-DCd8KsFeg/edit?usp=sharing, and then using them in a few places. Migrating all target parameters to this API will take a significant amount of time, so the first round should only update enough Target subclasses to demonstrate that the API works.


A quick explanation of the status quo as of April 2019 (originally written as of March 2018):

None of these APIs are public yet, and they're all going to need to change... they came out of the engine experiment, and they support more features than we need. So we need to figure out which bits to prune.

Currently the hierarchy of construction (bottom to top) is:

  • HydratedStruct - A Struct that has been deserialized from a BUILD file. Currently this type supports inlining other Structs via address references (see this example: https://github.com/pantsbuild/pants/blob/master/tests/python/pants_test/engine/examples/graph_test/self_contained.BUILD#L74-L84), where, because the configurations field expects to receive Structs, the inline address reference is expanded and inlined into
  • TargetAdaptor - A Struct subclass representing a target. So when expanding a legacy build graph, the above step operates on concrete subclasses of TargetAdaptor. A TargetAdaptor still has its sources represented as a PathGlobs object in a SourcesField wrapper, i.e., the source globs haven't been expanded yet.
  • HydratedTarget - An object containing enough information to actually construct the legacy BuildGraph interface (with an EagerFilesetWithSpec object representing fully expanded and fingerprinted globs). As described on #4769 and in the document linked above, this is not the interface we want to expose to anyone: because it always carries fully expanded sources, it cannot be used to avoid expanding them when the use case doesn't require them.
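To make the three layers concrete, here is a minimal sketch using hypothetical, heavily simplified dataclasses that only borrow the names from the list above (the real Pants classes carry far more state, and `to_adaptor`, `hydrate`, and the in-memory file list are assumptions for illustration). The point it shows is where the expensive step sits: producing a TargetAdaptor leaves the globs untouched, and only producing a HydratedTarget expands them.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Dict, List, Tuple

# Hypothetical, heavily simplified stand-ins for the three layers described
# above; the real Pants classes carry much more state.


@dataclass
class HydratedStruct:
    """A BUILD file entry, deserialized (address references already inlined)."""
    type_alias: str
    kwargs: Dict[str, object]


@dataclass(frozen=True)
class PathGlobs:
    include: Tuple[str, ...]


@dataclass(frozen=True)
class TargetAdaptor:
    """A target whose sources are still unexpanded globs."""
    name: str
    sources: PathGlobs


@dataclass(frozen=True)
class HydratedTarget:
    """A target whose globs have been expanded to concrete files."""
    name: str
    files: Tuple[str, ...]


def to_adaptor(struct: HydratedStruct) -> TargetAdaptor:
    # Cheap: nothing here touches the filesystem.
    globs = tuple(str(g) for g in struct.kwargs.get("sources", []))
    return TargetAdaptor(name=str(struct.kwargs["name"]), sources=PathGlobs(globs))


def hydrate(adaptor: TargetAdaptor, all_files: List[str]) -> HydratedTarget:
    # The expensive step in practice: glob expansion (fingerprinting omitted here).
    files = sorted(
        f for f in all_files if any(fnmatch(f, g) for g in adaptor.sources.include)
    )
    return HydratedTarget(adaptor.name, tuple(files))


struct = HydratedStruct("python_library", {"name": "lib", "sources": ["src/*.py"]})
adaptor = to_adaptor(struct)
hydrated = hydrate(adaptor, ["src/a.py", "src/b.py", "README.md"])
```

A caller that only needs target names could stop at `adaptor` and never pay for `hydrate`, which is the kind of laziness the HydratedTarget layer makes impossible.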
@stuhood

Member Author

commented Jun 7, 2017

Relates to #4641: people are running into the partial coverage of the previous target API in 1.3.0.

@stuhood

Member Author

commented Feb 6, 2018

Relates to #3991 ... it's possible that with v1 build file parsing "gone" in master, that approach might be more feasible than before.

@stuhood

Member Author

commented Mar 29, 2018

We should reevaluate this post #5580: it's possible that @jsirois's idea to make Structs inheritable/mergeable/etc with typed fields is totally excellent once we have easier @rule APIs.

@stuhood

Member Author

commented Apr 12, 2018

@jsirois : I had another thought here around how to do extensibility without subclassing (which remained an open question in the doc linked in the description).

I think that a somewhat natural way to constrain the types that are legal in a particular field, while still allowing for extension, would be to have "named TypeConstraint unions". As an example: for determining which Configuration/Field types are legal on Structs, rule registration could mutate or aggregate a TypeConstraint called "LegalFieldTypes" (or "legal_field_types", depending). @rules that needed to consume "all legal field types" would use that constraint by name. Meanwhile, the @rule graph would still need to have @rules installed that could satisfy each of the unioned members of the named constraint: so if you had a yield Get(HydratedField, LegalFieldTypes, x) call, each member of the LegalFieldTypes union would need to have an @rule that could provide a HydratedField.

This "named type unions" concept could also apply to other plugin extension points: @rule authors would add some possible type to a union, and then add an @rule to satisfy it.
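A minimal sketch of the aggregation half of this idea, assuming a plain registry in place of a real TypeConstraint: any rule module may add member types under a shared name, and consumers check values against the name without knowing the members. `NamedUnions`, `add_member`, `members`, `satisfied_by`, and `DeployField` are all hypothetical; the requirement that every member also have an @rule providing HydratedField is not modeled here.

```python
from collections import defaultdict
from typing import DefaultDict, FrozenSet, Set


class NamedUnions:
    """Hypothetical registry of named, open type unions.

    Any rule module may add members to a union during registration; consumers
    refer to the union only by its name (e.g. "LegalFieldTypes").
    """

    def __init__(self) -> None:
        self._members: DefaultDict[str, Set[type]] = defaultdict(set)

    def add_member(self, union_name: str, member: type) -> None:
        self._members[union_name].add(member)

    def members(self, union_name: str) -> FrozenSet[type]:
        return frozenset(self._members[union_name])

    def satisfied_by(self, union_name: str, value: object) -> bool:
        # Exact-type membership: subclasses of a member are not implicitly allowed.
        return type(value) in self._members[union_name]


# Core registers the field types it knows about...
class SourcesField:
    pass


class BundleField:
    pass


unions = NamedUnions()
unions.add_member("LegalFieldTypes", SourcesField)
unions.add_member("LegalFieldTypes", BundleField)


# ...and a plugin can extend the same union without touching core code.
class DeployField:
    pass


unions.add_member("LegalFieldTypes", DeployField)
```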

@stuhood

Member Author

commented Jun 16, 2018

This "named type unions" concept could also apply to other plugin extension points: @rule authors would add some possible type to a union, and then add an @rule to satisfy it.

It occurs to me that in order for consuming rules to consume the output of this type of union, they'd need to be oblivious to the union members... and in order for that to happen and still be useful, the union members would probably all need a shared parent class. So... this might actually look a bit like subtyping with a closed universe of subclasses?

@stuhood

Member Author

commented Sep 9, 2018

This is related to #6449, because the implementation of dep inference relies on implementers being able to inject new Field types and rules to provide them.

I think that all of the sketches described above have begun to solidify in my mind around "subtyping with a closed universe of subclasses". But that will require a bit of refactoring of how unions work now, and I don't think we should start that until #5788 is completed.

So while it is a bit of a yak shave, I'm going to consider this blocked by #5788.

@cosmicexplorer

Contributor

commented Sep 10, 2018

I'm probably missing something, but:

This "named type unions" concept could also apply to other plugin extension points: @rule authors would add some possible type to a union, and then add an @rule to satisfy it.

I don't understand what the named union approach provides -- the requirement of a closed universe of possible types of input is already achieved by requiring a unique path from subject to product in the rule graph for every subj/prod pair requested.

It occurs to me that in order for consuming rules to consume the output of this type of union, they'd need to be oblivious to the union members... and in order for that to happen and still be useful, the union members would probably all need a shared parent class. So... this might actually look a bit like subtyping with a closed universe of subclasses?

Rules consuming some product A are already oblivious to the path from the subject to A? If we're thinking about requiring some common parent class anyway, it seems like instead we could make A just a normal datatype containing all of the info that rules consuming A would need, and rely on the fact that there must be a single unique path from subject to A for polymorphism? In the case of A = Field, I would think a rule author can make whatever kind of type F they want to represent their fields, and then define an @rule from F to A.
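To make the "unique path" model concrete, here is a toy rule graph (a hypothetical `RuleGraph` class with single-input rules only) in which a request succeeds only when exactly one chain of rules connects the subject's type to the requested product. Consumers depend only on the shared datatype `Field` (the "A" above), and a hypothetical plugin type `GlobSpec` reaches it through an intermediate rule. This illustrates the argument; it is not how the Pants rule graph is implemented.

```python
from typing import Callable, Dict, List, Tuple

# A rule is (input type, output type, function from input to output).
Rule = Tuple[type, type, Callable[[object], object]]


class RuleGraph:
    """Hypothetical toy rule graph with single-input rules.

    A request for `product` from `subject` succeeds only if exactly one chain
    of rules leads from type(subject) to `product`.
    """

    def __init__(self, rules: List[Rule]) -> None:
        self._by_input: Dict[type, List[Rule]] = {}
        for r in rules:
            self._by_input.setdefault(r[0], []).append(r)

    def _paths(self, start: type, goal: type, seen: Tuple[type, ...]) -> List[List[Rule]]:
        if start is goal:
            return [[]]
        paths: List[List[Rule]] = []
        for r in self._by_input.get(start, []):
            if r[1] in seen:  # ignore cycles
                continue
            for rest in self._paths(r[1], goal, seen + (r[1],)):
                paths.append([r] + rest)
        return paths

    def request(self, product: type, subject: object) -> object:
        start = type(subject)
        paths = self._paths(start, product, (start,))
        if len(paths) != 1:
            raise ValueError(
                f"expected exactly one path from {start.__name__} to "
                f"{product.__name__}, found {len(paths)}"
            )
        value = subject
        for _, _, fn in paths[0]:
            value = fn(value)
        return value


class Field:  # the shared datatype "A" that consumers depend on
    def __init__(self, description: str) -> None:
        self.description = description


class SourcesField:
    def __init__(self, globs: List[str]) -> None:
        self.globs = globs


class BundleField:
    def __init__(self, dirs: List[str]) -> None:
        self.dirs = dirs


class GlobSpec:  # a plugin type that only knows how to become a SourcesField
    def __init__(self, pattern: str) -> None:
        self.pattern = pattern


rules: List[Rule] = [
    (SourcesField, Field, lambda s: Field("sources: " + ",".join(s.globs))),
    (BundleField, Field, lambda b: Field("bundle: " + ",".join(b.dirs))),
    (GlobSpec, SourcesField, lambda g: SourcesField([g.pattern])),
]
graph = RuleGraph(rules)

# A second SourcesField -> Field rule makes the request ambiguous.
ambiguous = RuleGraph(rules + [(SourcesField, Field, lambda s: Field("other"))])
try:
    ambiguous.request(Field, SourcesField(["*.py"]))
    ambiguous_error = ""
except ValueError as e:
    ambiguous_error = str(e)
```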

each member of the LegalFieldTypes union would need to have an @rule that could provide a HydratedField

This seems like exactly the guarantee we already have by requiring a unique path from product to subject. I'm looking at the doc linked up top and I cannot understand what the explicit, closed union concept provides over the implicit closed union that the rule graph already provides.

I have no problem with the idea, I just think requiring a unique path in the rule graph is a very simple model to use and debug, and I would prefer to use the polymorphism we already get from that rather than require anyone to start subclassing anything. The property "this type subclasses this other type" means "this type provides some known set of methods/attrs"; it seems like that can be achieved by using a datatype A which provides the methods/attrs you require, and then requiring that there be a unique path from whatever type the rule author can think of to A. I don't see what utility further narrowing that has as of now; it seems to make the model more complex. I could very easily be missing something obvious.

@stuhood

Member Author

commented Sep 10, 2018

Re-reading my own comments, I think that I flip-flopped a few times between discussing "output" types (like HydratedField) and "input" types (like Field / SourcesField / BundleField). It's not the case that the input types need to be subclasses of any particular type, because the @rule needn't actually consume them, and can just pass them from place to place.

Given that confusion, I think I'll probably refresh the design doc for another round of review.

@stuhood

Member Author

commented Sep 10, 2018

I don't understand what the named union approach provides -- the requirement of a closed universe of possible types of input is already achieved by requiring a unique path from subject to product in the rule graph for every subj/prod pair requested.

@cosmicexplorer : It provides extensibility into existing @rules that might not be aware of all of the types of a union. Concretely, it allows replacing:

if isinstance(x, SourcesField):
  hf = yield Get(HydratedField, SourcesField, x)
elif isinstance(x, BundleField):
  hf = yield Get(HydratedField, BundleField, x)
else:
  ...  # and so on: one more branch for every other Field type

with:

hf = yield Get(HydratedField, FieldsUnionType, x)

The former has a hardcoded list of legal fields, and so you cannot add additional Field types (or @rules to satisfy them) without editing that rule.

@stuhood

Member Author

commented Sep 20, 2018

I rebased the AggregationRule concept out of #6170 before landing it, because it doesn't quite fit this usecase, but was in the same area. Pushed it over here for reference: 25322b1

@stuhood stuhood added the P3 - M3 label Sep 28, 2018

@cosmicexplorer

Contributor

commented Jan 18, 2019

@cosmicexplorer : It provides extensibility into existing @rules that might not be aware of all of the types of a union. Concretely, it allows replacing:

if isinstance(x, SourcesField):
  hf = yield Get(HydratedField, SourcesField, x)
elif isinstance(x, BundleField):
  hf = yield Get(HydratedField, BundleField, x)
else:
  ...  # and so on: one more branch for every other Field type

with:

hf = yield Get(HydratedField, FieldsUnionType, x)

The former has a hardcoded list of legal fields, and so you cannot add additional Field types (or @rules to satisfy them) without editing that rule.

I'm thinking about something a lot like the second snippet, e.g.:

# This type is purely used as a tag, and is never instantiated.
# The @union decorator ensures this.
@union
class FieldsUnionType: pass

@union_rule(FieldsUnionType)
class SourcesField(...):
  # ...

@rule(HydratedField, [Select(SourcesField)])
def hydrate_sources(sources_field):
  # ...
  yield HydratedField(...)

@union_rule(FieldsUnionType)
class BundlesField(...):
  # ...

@rule(HydratedField, [Select(BundlesField)])
def hydrate_bundles(bundles_field):
  # ...
  yield HydratedField(...)

@rule(HydratedTarget, [Select(TargetAdaptorContainer)])
def hydrate_target(target_adaptor_container):
  target_adaptor = target_adaptor_container.value
  # The above @union_rule decls state that the allowed types here are:
  # Exactly(SourcesField) => if satisfied, the rule graph runs `hydrate_sources()`
  # Exactly(BundlesField) => `hydrate_bundles()`
  # _ => the same error we already give if a `Get(...)` input doesn't match the
  #         declared input type
  hydrated_fields = yield [
    Get(HydratedField, FieldsUnionType, x)
    for x in target_adaptor.field_adaptors
  ]
  # ...
  yield HydratedTarget(...)

def rules():
  return [
    FieldsUnionType,
    SourcesField,
    # If SourcesField is already defined in another file without
    # the decorator, the below would lead to the exact same behavior:
    # union_rule(FieldsUnionType)(SourcesField)
    hydrate_sources,
    BundlesField,
    hydrate_bundles,
    hydrate_target,
  ]
@stuhood

Member Author

commented Jan 18, 2019

@cosmicexplorer : That looks like it would work, yea. Nice.

The syntax is pretty reasonable, although I think that the @union vs @union_rule split could maybe be cleaned up a bit: since they'll both be going in the list of def rules(): .., having some uniformity across the types that end up in there would be good. Strawman (which I don't like at all: just an example): @union_rule and @union_member_rule.

@cosmicexplorer

Contributor

commented Jan 18, 2019

Right -- I forgot that all of the v2 decorators end in rule. I like @union_rule and @union_member_rule(UnionType) actually? I feel like having decorators is preferable to doing something like returning a UnionMember(...) in the rules() method just because it makes the interface look the same, and then rules() is just one thing -- a list of exports. Is your concern with the name @union_member_rule, or something about the splitting of @union vs @union_rule logic?

@stuhood

Member Author

commented Jan 18, 2019

Is your concern with the name @union_member_rule, or something about the splitting of @union vs @union_rule logic?

Just naming, and the fact that we parse all def rules(): .. into a list of class Rule instances via the RuleIndex:

class RuleIndex(datatype(['rules', 'roots'])):

@stuhood

Member Author

commented Jan 18, 2019

I am also excited because I think that this representation of unions is a much better one than what we used to use TypeConstraint for (and which @illicitonion removed here: #6980), and so we could probably remove usage of TypeConstraint in favor of just TypeId everywhere. It's already unused, but this looks like the nail in the coffin.

@cosmicexplorer

Contributor

commented Jan 18, 2019

we parse all def rules(): .. into a list of class Rule instances via the RuleIndex

Great, this makes sense. If I were to toss up a PR with the names @union_rule and @union_member_rule, would that be fine? We can absolutely bikeshed now or later, names matter.

remove usage of TypeConstraint in favor of just TypeId everywhere

I'm in favor of this because then we can start getting more creative with types (e.g. in a fantasy world, auto-deriving @union_rules from subclass relationships). This could be done with TypeConstraints too, but starting with a less general construct like TypeId sounds neat. This also makes it very natural to have hash tables of types, which again can be done with TypeConstraints, but I feel like we can introduce that additional structure (knowledge of subclassing/etc) into the engine later.

@stuhood

Member Author

commented Jan 19, 2019

remove usage of TypeConstraint in favor of just TypeId everywhere

I'm in favor of this ... This could be done with TypeConstraints too, but starting with a less general construct like TypeId sounds neat.

On a related note: there is a TODO a few lines down in the RuleIndex link above referring to this exact awkwardness and linking #4005. It's possible that removing usage of TypeConstraint here would make it slightly easier to add this improved mechanism.

@cosmicexplorer

Contributor

commented Jan 19, 2019

It's possible that removing usage of TypeConstraint here would make it slightly easier to add this improved mechanism.

Currently running with this hypothesis and turning output_constraint into a type and reviewing the carnage.

@cosmicexplorer

Contributor

commented Jan 19, 2019

Currently running with this hypothesis and turning output_constraint into a type and reviewing the carnage.

Managed to remove almost a net 100 lines with #7114! Looking at #6936 next because it is useful and we can do more fun things with TypeConstraints now.

@stuhood stuhood closed this in #7116 Mar 6, 2019

Python Pipeline Porting automation moved this from To Do to Done Mar 6, 2019

stuhood added a commit that referenced this issue Mar 6, 2019

@union / UnionRule for letting the engine figure out paths to products not known in advance (#7116)

### Problem

*Resolves #4535.*

It's currently difficult for rules to tell the engine "give me an instance of `y` from this `x` according to the registered rules in the rule graph" if `x` is not a specific type known in advance (see #4535 for use cases for this). The alternative is that upstream rules have to use conditional logic outside of the engine, which is very difficult to write in a way that combines with arbitrary rulesets from downstream plugins, and code written to get around this can be error-prone. This is unfortunate because the rule graph is generated by combining rules in pants core as well as in plugins, so all the necessary information is there; we just can't make use of it.

This PR introduces the `@union` decorator and `UnionRule` type to allow for static resolution of "union" types in `yield Get(Product, UnionType, subject)` statements as per [this comment in #4535 and below](#4535 (comment)).

### Solution

#### Python
- Add a `subject_declared_type` field to `pants.engine.selectors.Get` instead of inferring it from the subject type (the 2-argument form of `Get` is still supported).
- Introduce `Get.create_statically_from_rule_graph()` classmethod to make it clear when the `Get` subject is populated or not.
- Introduce `@union` and `UnionRule` to describe union types which can be requested with a `yield Get(Product, UnionType, subject)` in rule bodies (as per [this comment on #4535](#4535 (comment))).
  - **Note that `@union` classes are not registered in `def rules(): ...` -- this distinction seems to make sense as union classes are never instantiated.**
- Create a really simple `union_rules` dict field in `RuleIndex` which registers `@union_rule()` decorators as a map from `union base type -> [union members]`.
- Propagate the `union_rules` dict to the scheduler, and when adding `Get` edges (in `_register_task()` in `scheduler.py`), check if the `subject_declared_type` is a union base, and if so add edges to all union members.
- Create a `HydrateableField` union base which is used to resolve (for now) either `SourcesField` or `BundlesField` in a `yield Get()`.
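The graph-construction step can be sketched as follows, assuming a simplified representation in which each `Get` is just a `(product, declared subject type)` pair. The helper names `expand_get_edges` and `check_edges` and the `JarsField` member are hypothetical; the PR places the real logic in `RuleIndex` and `_register_task()` in `scheduler.py`. Rejecting a member that has no providing rule up front is a choice made by this sketch, following the earlier observation in this thread that each union member needs an @rule that can satisfy it.

```python
from typing import Dict, List, Set, Tuple

# (product, subject type): one edge in the rule graph.
Edge = Tuple[type, type]


def expand_get_edges(gets: List[Edge], union_rules: Dict[type, List[type]]) -> List[Edge]:
    """Turn a rule's declared Gets into concrete graph edges.

    A Get whose declared subject type is a union base fans out into one edge
    per registered union member; any other Get becomes a single edge.
    """
    edges: List[Edge] = []
    for product, declared in gets:
        if declared in union_rules:
            edges.extend((product, member) for member in union_rules[declared])
        else:
            edges.append((product, declared))
    return edges


def check_edges(edges: List[Edge], provided: Set[Edge]) -> None:
    """Fail at graph-construction time if some edge has no rule behind it."""
    missing = [(p, s) for p, s in edges if (p, s) not in provided]
    if missing:
        detail = ", ".join(f"{p.__name__} for {s.__name__}" for p, s in missing)
        raise ValueError(f"no rule provides: {detail}")


class HydratedField: ...
class HydrateableField: ...  # union base
class SourcesField: ...
class BundlesField: ...
class JarsField: ...  # hypothetical plugin member with no rule yet


union_rules = {HydrateableField: [SourcesField, BundlesField]}
provided = {(HydratedField, SourcesField), (HydratedField, BundlesField)}
hydrate_target_gets = [(HydratedField, HydrateableField)]

edges = expand_get_edges(hydrate_target_gets, union_rules)
check_edges(edges, provided)  # passes: every member has a rule

# A plugin that adds a member without a matching rule is caught before any rule runs.
union_rules[HydrateableField].append(JarsField)
try:
    check_edges(expand_get_edges(hydrate_target_gets, union_rules), provided)
    error = ""
except ValueError as e:
    error = str(e)
```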

### Result

Users can now dynamically add union members in plugins and backends to be processed by upstream rules using `yield Get(...)` which don't know anything about them, and with a static universe of known union members which the engine uses to type-check the subject of a `yield Get(...)` at rule execution time.

stuhood added a commit that referenced this issue Mar 6, 2019

convert usages of TypeConstraint to TypeId for rule products in the engine (#7114)

### Problem

See #4535 and #4005, in particular [this comment on #4535](#4535 (comment)). `TypeConstraint` is a pretty general construct that we would like to do more with, for example #6936, and as of [the current discussion in #4535](#4535 (comment)) we realize we can emulate union types in `@rule`s without it, matching just against a specific type.

### Solution

- Convert `output_constraint` in the `Rule` subclasses in `rules.py` into `output_type`, and add a `SubclassesOf(type)` type check in `datatype` fields in those classes to ensure this.
- Remove `satisfied_by()` and `satisfied_by_type()` externs, and add a `product_type()` extern used to intern a python `type` as a `TypeId`.
- Convert all `TypeConstraint`s passed to the engine for intrinsics (e.g. `Snapshot`) into `TypeId`s.
- Match whether a rule's result matches its declared output type by simply using `==`!
- Manually implement `fmt::Debug` for `TypeId` to be the same as `Display` (we may want to do these differently in the future, but it is very useful to see the actual type name when debugging).
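A minimal Python sketch of the TypeId approach, assuming types are interned into integers once and output checks become plain equality. `TypeInterner`, `check_rule_output`, and the `Snapshot`/`Digest` placeholder classes are hypothetical stand-ins; the actual `TypeId` and the checks using it live on the Rust side of the engine.

```python
from typing import Dict, NewType

TypeId = NewType("TypeId", int)


class TypeInterner:
    """Hypothetical sketch: give each distinct Python type a stable small integer."""

    def __init__(self) -> None:
        self._ids: Dict[type, TypeId] = {}
        self._types: Dict[TypeId, type] = {}

    def product_type(self, t: type) -> TypeId:
        if t not in self._ids:
            tid = TypeId(len(self._ids))
            self._ids[t] = tid
            self._types[tid] = t
        return self._ids[t]

    def name(self, tid: TypeId) -> str:
        return self._types[tid].__name__


def check_rule_output(interner: TypeInterner, declared: TypeId, result: object) -> None:
    # Plain equality on ids: no TypeConstraint, and no subclass matching.
    actual = interner.product_type(type(result))
    if actual != declared:
        raise TypeError(
            f"rule declared output {interner.name(declared)} "
            f"but returned {interner.name(actual)}"
        )


class Snapshot: ...
class Digest: ...
class SnapshotSubclass(Snapshot): ...


interner = TypeInterner()
snapshot_id = interner.product_type(Snapshot)
check_rule_output(interner, snapshot_id, Snapshot())  # accepted

rejected = []
for bad in (Digest(), SnapshotSubclass()):
    try:
        check_rule_output(interner, snapshot_id, bad)
    except TypeError:
        rejected.append(type(bad).__name__)
```

One consequence of pure equality, at least in this sketch: an instance of a subclass of the declared output type is rejected as well.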

### Result

Everything is the same, but now we don't have the additional complexity of `TypeConstraint` down in the engine when we can do simple `TypeId` equality. This lets us get more creative with `TypeConstraint` on the python side, while type checking on the rust side is a little less complex (and probably more efficient to some degree).

stuhood added a commit that referenced this issue Apr 11, 2019

Fuse hydrated and unhydrated Struct parsing (#7523)
### Problem

As described in #4535, there are a bunch of different layers to our BUILD file parsing currently, mostly because we were working up from a set of powerful experiments that @jsirois started, and down from the existing (legacy) `BuildGraph` model.

Now that `@union` and `UnionRule` are in place, it's almost time to take a holistic look at what our target API should look like. Before doing that, we can remove at least one of the parsing layers to simplify things a bit.

### Solution

Merge the first two layers described in #4535 (`UnhydratedStruct` and `HydratedStruct` (nee `TargetAdaptorContainer`)) into `HydratedStruct`. This removes one datatype and one rule, and prunes some implementation-specific test code.

### Result

Fewer rules/nodes, one less concept, and slightly simpler code.

stuhood added a commit that referenced this issue May 16, 2019

Use @union to make the v2 test runner generic (#7661)
### Problem

Fixing #4535 moved us closer to supporting a `Target` API for v2, but did not begin to use `@union` and `UnionRule` to make the v2 test `@console_rule` generic.

### Solution

Add `@union TestTarget`, and consume it in the v2 `test` `@console_rule`. The result is that a language implementer that wants to add support for testing a target type `MyTestTarget` can declare a `UnionRule(TestTarget, MyTestTarget)` in their registered rules, which would in turn require a declared `@rule` to produce a `TestResult` for `MyTestTarget`.
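A rough sketch of what the generic test goal gains, using plain dictionaries in place of `UnionRule` and `@rule` registration. `run_tests`, `UNION_MEMBERS`, `RULES`, and `ResourcesTarget` are hypothetical, and skipping targets that are not `TestTarget` members is an assumption of this sketch rather than something the PR describes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class TestResult:
    status: str  # "SUCCESS" or "FAILURE"


class TestTarget:
    """Union base: a tag type that is never instantiated."""


# Hypothetical registries standing in for UnionRule and @rule registration.
UNION_MEMBERS: Dict[type, List[type]] = {TestTarget: []}
RULES: Dict[Tuple[type, type], Callable[[object], TestResult]] = {}


def run_tests(targets: List[object]) -> Tuple[int, List[str]]:
    """Generic `test` goal: it never names a concrete target type."""
    exit_code, report = 0, []
    for target in targets:
        if type(target) not in UNION_MEMBERS[TestTarget]:
            continue  # assumption of this sketch: non-test targets are skipped
        result = RULES[(TestResult, type(target))](target)
        report.append(f"{target.name}: {result.status}")
        if result.status != "SUCCESS":
            exit_code = 1
    return exit_code, report


# A language plugin adds its own target type and the rule that tests it.
@dataclass(frozen=True)
class MyTestTarget:
    name: str
    passes: bool


UNION_MEMBERS[TestTarget].append(MyTestTarget)  # ~ UnionRule(TestTarget, MyTestTarget)
RULES[(TestResult, MyTestTarget)] = lambda t: TestResult("SUCCESS" if t.passes else "FAILURE")


@dataclass(frozen=True)
class ResourcesTarget:  # not a TestTarget member
    name: str


exit_code, report = run_tests(
    [MyTestTarget("a", True), ResourcesTarget("r"), MyTestTarget("b", False)]
)
```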

In order to use `PythonTestAdaptor` as a member of the `TestTarget` union, the bottom commit refactors the `SymbolTable` manipulation that we do to preserve the concrete `TargetAdaptor` classes.

### Result

Although we're not quite ready to "bless" `TargetAdaptor` by requiring it in the `SymbolTable` or coupling it to the v1 `Target` class, this change represents an incremental step in the direction of using (a likely renamed) `TargetAdaptor` incarnation as the v2 Target class.