New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

@rules as coroutines #5580

Merged
merged 7 commits into from Apr 6, 2018

Conversation

@stuhood
Copy link
Member

stuhood commented Mar 9, 2018

Problem

The "complex" Selector types used to select @rule inputs for subjects other than the current subject of an @rule have proven to be entirely too complicated. Additionally, their lack of flexibility meant that we needed to introduce many intermediate types to provide links between @rules in cases where imperative logic was needed for filtering or conditionals.

The only remaining motivating factor for these complex selectors was that they allow for closed-world compilation of the @rule graph, which avoids the need for error checking in rule bodies and provides for easier composition of sets of @rules.

Solution

After experimenting with using coroutines (ie: functions containing yield statements that receive a result on their left hand side), it became clear both that:

  1. Coroutines were universally easier to understand than the Selector DSL
  2. The dependency requests made by coroutines could still easily be statically checked via a small amount of runtime introspection.

And so, this change removes all usage of SelectProjection and most usage of SelectDependencies in favor of @rule coroutines. It does not yet remove the implementation of those Selectors (although doing so will shed a massive amount of complexity from the codebase).

Result

@rules are still statically checked, but are now much easier to write. Performance is unaffected.

@stuhood stuhood requested review from baroquebobcat , illicitonion and kwlzn Mar 9, 2018

Select(Address)])
def resolve_unhydrated_struct(address_mapper, address_family, address):
@rule(UnhydratedStruct, [Select(AddressMapper), Select(Address)])
def resolve_unhydrated_struct(address_mapper, address):

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Mar 9, 2018

Contributor

I was just looking at this exact rule today to try to understand how to use SelectProjection and this change is RIDICULOUSLY cool imo

"""Given an Address and its AddressFamily, resolve an UnhydratedStruct.
Recursively collects any embedded addressables within the Struct, but will not walk into a
dependencies field, since those are requested explicitly by tasks using SelectDependencies.
"""

address_family = yield Get(AddressFamily, Dir(address.spec_path))

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Mar 9, 2018

Contributor

If no Dir -> AddressFamily path exists in the rule graph, does it error at this line? Would the previous use of SelectProjection have errored out earlier, during the rule graph construction?

This comment has been minimized.

@stuhood

stuhood Mar 9, 2018

Member

Yep, exactly. And that's the primary downside of this change as it stands: we would lose the validation that all rules are satisfiable.

Have been thinking more about this though, and I think that with some very, very basic parsing (essentially, "capture the RHS of yield statements"), we could get back the static checking... which might be justified.

This comment has been minimized.

@stuhood

stuhood Mar 9, 2018

Member

...and also, as it stands the exception would be created in the rust code rather than literally on this line. It looks like it is easy to send an exception to the generator though with generator.throw().

EDIT: Hmm, but actually: the downside of this would be that it would introduce the possibility of people accidentally/intentionally implementing error checking. And frankly, I'd rather get the static checks fixed and discourage all error handling in @rules.

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Mar 9, 2018

Contributor

I think "capture the RHS of yield statements" is a highly reasonable goal, but I don't know if it needs to be done by parsing -- that being said I remember python's ast module is supposed to be not awful? I can't see what the exact code would look like right now, but since we know we can get the desired info we want before invoking the python rule definition (with the previous example using SelectProjection), I think there's hopefully a middle ground where we can specify rule graphs without too much ceremony (probably exactly the same in spirit as parsing for yield statements). I would love to parse some python for yield statements if it gets us there, but I think there might be a neater way -- will think on it.

This comment has been minimized.

@stuhood

stuhood Mar 28, 2018

Member

This is now resolved: all Get statements in the body of an @rule are statically checked at RuleGraph construction time.

@stuhood

This comment has been minimized.

Copy link
Member

stuhood commented Mar 10, 2018

Parsing yield statements from @rules appears to be straightforward: have added one more commit on top that demonstrates it in a single file.

So I think that with one relatively minor restriction (which technically already existed) we can preserve the static checks; namely: the request/Get/yield would need to include both the product and subject type explicitly. An example syntax might be:

# verbose form: Get(product_type, subject_type, subject)
classpath = yield Get(Classpath, ScalaSources, sources)
# potential shorthand: Get(product_type, subject_type(subject))
classpath = yield Get(Classpath, ScalaSources(snapshot))

EDIT: And after thinking about it for a few minutes more: we'd probably want to parse the calls to the Get/"whatever" constructor rather than the yield statements themselves... that would more naturally allow for (for example) building up a mutable list of requests and then yielding them. And what we need are the arguments to the constructor anyway.
EDIT2: Did that. Looks good.

@stuhood stuhood force-pushed the twitter:stuhood/enginerator branch 2 times, most recently from ec40733 to 051c0e2 Mar 10, 2018

@illicitonion

This comment has been minimized.

Copy link
Contributor

illicitonion commented Mar 12, 2018

I like it, much cleaner than SelectProjection imo :)

@stuhood stuhood force-pushed the twitter:stuhood/enginerator branch from 051c0e2 to c568241 Mar 20, 2018

@stuhood stuhood referenced this pull request Mar 27, 2018

Closed

@rules to merge Snapshots #5502

@stuhood stuhood force-pushed the twitter:stuhood/enginerator branch 2 times, most recently from 3c86fa8 to 73f73b5 Mar 27, 2018

@stuhood

This comment has been minimized.

Copy link
Member

stuhood commented Mar 28, 2018

I've implemented the static checks for Get calls, so this is almost ready to graduate from experiment to recommended API.

The next order of business: bikeshedding. What should Get's final name be?

@illicitonion

This comment has been minimized.

Copy link
Contributor

illicitonion commented Mar 28, 2018

Very cool :) Though I'm not sure I entirely follow the fine details of everything going on.

I'm happy to keep the name Get - we can always change it in the future if we need to (I'm assuming this will be an "unstable" API for a bit :))

My next request would be a few Python unit tests which should this actually in use

@@ -356,7 +372,7 @@ impl Select {
product: self.product().clone(),
variants: self.variants.clone(),
task: task,
entry: entry.clone(),
entry: Arc::new(entry.clone()),

This comment has been minimized.

@illicitonion

illicitonion Mar 28, 2018

Contributor

Seems a little strange for entry to be both Arc'd and clone'd here. My instinct (which may be wrong) is that either it's a cheap to clone type, and we shouldn't be Arcing it, or it's an expensive to clone type, and we should always be Arcing it...

This comment has been minimized.

@stuhood

stuhood Mar 28, 2018

Member

Yea, I have a separate review that attempts to Arc up the rule-graph, since it is full of tiny Vecs. On the other hand, it does not appear to impact performance in a significant way... so perhaps this would not either. It's very surprising to me that all of the allocation doesn't have a noticeable impact.

@stuhood stuhood force-pushed the twitter:stuhood/enginerator branch 2 times, most recently from c4b40a3 to b51dae5 Mar 29, 2018

@stuhood stuhood requested a review from benjyw Mar 29, 2018

@stuhood stuhood force-pushed the twitter:stuhood/enginerator branch 3 times, most recently from 9e22ec5 to 2b70546 Mar 29, 2018

@stuhood stuhood changed the title [experiment] @rule generators @rules as coroutines Mar 31, 2018

@stuhood stuhood requested review from jsirois and dotordogh Mar 31, 2018

@stuhood

This comment has been minimized.

Copy link
Member

stuhood commented Mar 31, 2018

I haven't written any new unit tests, but I updated the documentation in the topmost commit and added validation to the Get constructor, so I believe that this is now reviewable.

@stuhood stuhood force-pushed the twitter:stuhood/enginerator branch from 2b70546 to 56dd7bd Mar 31, 2018

@stuhood

This comment has been minimized.

@jsirois

This comment has been minimized.

Copy link
Member

jsirois commented Apr 2, 2018

I like the direction, but I'm not in love with the new syntax / concept / mixed-point dependency injection. Perhaps best to explain with a summary example. From the current change we now have:

class Get(datatype('Get', ['product_type', 'subject'])):
  pass

@rule(UnhydratedStruct, [Select(AddressMapper), Select(Address)])
def resolve_unhydrated_struct(address_mapper, address):
  address_family = yield Get(AddressFamily, Dir(address.spec_path)
  ...

So we have dependency injection through function arguments and, now, co-routine results.

This could alternatively be modeled with just dependency injection through function arguments if we allow for both terminal values and rules (functions) to be injected:

class Rule(datatype('Rule', ['product_type', 'subject_type'])):
  def __call__(subject):
    pass  # IE: Get
  
  def map(subjects):
    pass  # IE: GetMulti

@rule(UnhydratedStruct, [Select(AddressMapper), Select(Address), Rule(AddressFamily, Dir)])
def resolve_unhydrated_struct(address_mapper, address, address_family_rule):
  address_family = address_family_rule(address.spec_path)
  ...

IOW, I'm a Rule author who knows how to operate on certain selected values but I need to enlist the help of other rules imperatively. The proposed syntax change says this more directly I think and allows eliminating AST gymnastics.

Taken further (but importantly ignoring variant selects), this just means @rule becomes nothing but type annotations for a python function; ie: things eventuially could just boil down to:

@rule(UnhydratedStruct, [AddressMapper, Address, Rule(AddressFamily, Dir)])
def resolve_unhydrated_struct(address_mapper, address, address_family_rule):
@stuhood

This comment has been minimized.

Copy link
Member

stuhood commented Apr 2, 2018

Another advantage that @illicitonion might point out is that avoiding coroutines would mean that we could more easily use Skylark at some point (which doesn't support them, afaik).

EDIT: Then again, skylark probably doesn't support decorators either.

@jsirois

This comment has been minimized.

Copy link
Member

jsirois commented Apr 2, 2018

I may be missing something here, but couldn't the implementation of the call into the engine be masked, deferring the optimizations for behind-the-scenes work?:

>>> def coroutine(value):
...   result = yield rust(value)
...   yield result
... 
>>> def rust(value):
...   return value * 2
... 
>>> def using_coroutine(value):
...   c = coroutine(value)
...   for r in c:
...     if r:
...       return r
... 
>>> using_coroutine(3)
6

Where using_coroutine is what is passed into the example above as the Rule implementation stub.

@stuhood

This comment has been minimized.

Copy link
Member

stuhood commented Apr 2, 2018

I may be missing something here, but couldn't the implementation of the call into the engine be masked, deferring the optimizations for behind-the-scenes work?:

result = yield rust(value) is still a "python calls rust" situation... so the coroutine would not actually be doing anything useful with the yield. In this PR the @rule yields to the rust code which sends back the value. Again, really not certain that it buys us less time under the GIL. But it's recursion safe at least?

@jsirois

jsirois approved these changes Apr 3, 2018

Copy link
Member

jsirois left a comment

Finished some discussion about coroutines vs rule stubs injected as values like all other values in the function arguments in slack #engine. I'd still prefer the latter on behalf of the average rule writer, but my instincts may be wrong and I don't strongly object.

@stuhood

This comment has been minimized.

Copy link
Member

stuhood commented Apr 3, 2018

After dancing around the realization a bit, it occurred to us this morning that if we have synchronous calls back and forth between python and rust, we can't have async io. While fully switching to synchronous io via rayon would be a possibility, it would be a pretty massive refactor... and there is a lot of good stuff coming down the pipe for async in rust.

@stuhood stuhood force-pushed the twitter:stuhood/enginerator branch from 56dd7bd to 9879ff2 Apr 5, 2018

@stuhood

This comment has been minimized.

Copy link
Member

stuhood commented Apr 5, 2018

The top commit now adds a test utility function that makes it easier to test @rules that make Get requests.

@stuhood stuhood force-pushed the twitter:stuhood/enginerator branch from 9879ff2 to f62743a Apr 5, 2018

@baroquebobcat
Copy link
Contributor

baroquebobcat left a comment

I really like the direction this is going.

The yield syntax in rules is a lot more understandable than the projecting selects.

I'd like to think a bit more about how we can validate that rules construct the Gets in expected ways going forward, but I don't think that blocks getting this in.

I do have a few minor requests about the doc updates, but I don't think they're blocking. I think doing a couple copy edit passes on the docs might be worthwhile as a separate task.

@rule(StringType, [Select(IntType)])
def int_to_str(an_int):
return str(an_int)
The first argument to the `@rule` decorator is the Product (ie, return) type for the @rule. The

This comment has been minimized.

@baroquebobcat

baroquebobcat Apr 5, 2018

Contributor

nit: second @rule needs back ticks

it will next look for any other registered Rules that can compute an IntType Product for the
the subject is already of `type(subject) == IntType`, then the @rule will be satisfiable without
any other depenencies. On the other hand, if the type _doesn't_ match, the engine doesn't give up:
it will next look for any other registered @rules that can compute an IntType Product for the
Subject (and so on, recursively.)

This comment has been minimized.

@baroquebobcat

baroquebobcat Apr 5, 2018

Contributor

There's a couple more @rule references here without backticks. Also s/depenencies/dependencies/

@rule(FormattedInt, [Select(IntType)])
def int_to_str(an_int):
return FormattedInt('{}'.format(an_int))
```

This comment has been minimized.

@baroquebobcat

baroquebobcat Apr 5, 2018

Contributor

I think it'd make sense to add a paragraph here explaining the FormattedInt, and maybe providing an example of what a subject would be that would cause it to be evaluated. I think discussing subjects here would help the next section do its job of explaining how to change subjects.

In cases where this is necessary, `@rule`s may be written as coroutines (ie, using the python
`yield` statement) that yield "`Get` requests" that request products for other subjects. Just like
`@rule` parameter Selectors, `Get` requests instatiated in the body of an `@rule` are statically
checked to be satisfiable in the set of installed `@rules`.

This comment has been minimized.

@baroquebobcat

baroquebobcat Apr 5, 2018

Contributor

nit: s/instatiated/instantiated/

pass
This @rule declares that: "for any Subject for which we can compute `Files`, we can also compute
`ConcattedFiles`". Each yielded `Get` request results in FileContent for a different File Subject
from the Files list.

This comment has been minimized.

@baroquebobcat

baroquebobcat Apr 5, 2018

Contributor

This feels like it could use a little more explanation in it. Maybe something like

This `@rule` declares that: "for any Subject for which we can compute `Files`, we can also compute
`ConcattedFiles`". In order to do that, it needs the `FileContent` for each of the files in the selected 
`Files`. It gets those contents by yielding `Get` requests that ask for `FileContent`s for with each 
file as a Subject. Once those have been collected, it concatenates the contents and yields the result.
if isinstance(node, ast.FunctionDef) and node.name == func.__name__:
rule_visitor = _RuleVisitor()
rule_visitor.visit(node)
gets.extend(Get(resolve_type(p), resolve_type(s)) for p, s in rule_visitor.gets)

This comment has been minimized.

@baroquebobcat

baroquebobcat Apr 5, 2018

Contributor

This is pretty slick. After we nail down the API further, or as part of doing so, I'd like to see a bunch of tests that cover different ways that Gets are defined and the error cases defined here.

return (product_type.id, subject_type.id)
else:
raise Exception('Invalid {}; expected either two or three args, but '
'got: ({})'.format(Get.__name__, render_args()))

This comment has been minimized.

@baroquebobcat

baroquebobcat Apr 5, 2018

Contributor

Perhaps these should raise ValueError?

@stuhood stuhood force-pushed the twitter:stuhood/enginerator branch from f62743a to febb531 Apr 5, 2018

@stuhood stuhood merged commit 99c2a50 into pantsbuild:master Apr 6, 2018

1 check was pending

continuous-integration/travis-ci/pr The Travis CI build is in progress
Details

@stuhood stuhood deleted the twitter:stuhood/enginerator branch Apr 6, 2018

@stuhood stuhood referenced this pull request Apr 6, 2018

Merged

Remove SelectProjection #5672

stuhood added a commit that referenced this pull request Apr 6, 2018

Remove SelectProjection. (#5672)
`SelectProjection` is entirely superseded by the new `@rule` coroutine syntax from #5580.

@stuhood stuhood added this to To Do in Python Pipeline Porting via automation Apr 25, 2018

@stuhood stuhood moved this from To Do to Done in Python Pipeline Porting Apr 25, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment