Merge subjects and variants into Params, and remove Noop #6170

Merged
stuhood merged 13 commits into pantsbuild:master from twitter:stuhood/subsingliant Sep 20, 2018

4 participants
@stuhood
Member

stuhood commented Jul 17, 2018

Problem

As described in #5788: @rules need a way to rely on values that are provided at request time, but which do not necessarily participate in the signatures of all @rules between the root and the consuming @rule. This is related to the existing concept of "variants", but requires an implementation that does not introduce variants into Node identities in subgraphs where they are not required.

Additionally, as described a while back in #4304, it should be possible to generate concrete subgraphs by removing ambiguity from the RuleGraph... but ambiguity is currently a "feature" required for composability of @rules that do not know about one another.

Solution

This change merges Variants and "subjects" into Params, and statically determines which Params are required in each subgraph. In order to handle cases where multiple providers of an @rule type dependency are available with different required input Params, the change "monomorphizes" (duplicates) RuleGraph entries per used parameter set. This allows us to remove runtime Noops, because every RuleGraph entry (and thus Node) has exactly one @rule provider for each of its declared dependencies.
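A rough sketch of what "monomorphizing" an entry could look like, assuming heavily simplified stand-in types: `Provider`, `MonoEntry` and `monomorphize` are hypothetical names for illustration, not the ones used in this change. Each dependency of a rule may have several providers with different required Params; the sketch emits one copy of the rule per combination of providers, so that every copy has exactly one provider per dependency.

```rust
use std::collections::BTreeSet;

// Hypothetical stand-ins: the engine's real TypeId and entry types differ.
pub type TypeId = u32;
pub type ParamTypes = BTreeSet<TypeId>;

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Provider {
    pub name: &'static str,
    pub params: ParamTypes,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct MonoEntry {
    pub rule: &'static str,
    // Union of the Params required by the chosen providers.
    pub params: ParamTypes,
    // Exactly one chosen provider per declared dependency.
    pub deps: Vec<&'static str>,
}

/// `deps[i]` lists the available providers for the rule's i-th dependency.
/// Returns one entry per combination of providers; the result is empty if
/// any dependency has no provider at all.
pub fn monomorphize(rule: &'static str, deps: &[Vec<Provider>]) -> Vec<MonoEntry> {
    let mut entries = vec![MonoEntry { rule, params: ParamTypes::new(), deps: Vec::new() }];
    for providers in deps {
        let mut next = Vec::new();
        for partial in &entries {
            for provider in providers {
                let mut entry = partial.clone();
                entry.params.extend(provider.params.iter().cloned());
                entry.deps.push(provider.name);
                next.push(entry);
            }
        }
        entries = next;
    }
    entries
}
```

Because each resulting entry names a single provider per dependency, nothing is left to resolve at runtime, which is the property that makes a Noop-style fallback unnecessary.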

Result

Lays groundwork for #4020 and #5788. Fixes #4304 by monomorphizing RuleGraph entries and removing Noop. Fixes #4027 by... deleting that code.

This change does not yet expose any sort of UX for providing more than one Param in a Get or root request, but it was already way too large, so I've opened #6478 for followup.

@illicitonion
Contributor

illicitonion left a comment

Looks like this is getting a lot simpler :) Thanks!

A few questions from having a look through (definitely not close review, as it's WIP :))...

/// as sorted string tuples.
/// Params represent a TypeId->Key map.
///
/// For efficiency and hashability, they're stored as sorted Keys (with distinct TypeIds), and

@illicitonion

illicitonion Jul 20, 2018

Contributor

This description made me assume this would be a Cow<BTreeSet<Key>> rather than an Arc<Vec<Key>>...

@stuhood

stuhood Jul 25, 2018

Member

If we were to use a BTreeSet, we'd still need to verify unique typeids, so would need a wrapper around Key to redefine Ord in terms of typeids.

It's possible that the Arc is overkill here though (since Key is so small).
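For reference, the wrapper mentioned above might look roughly like the following. This is only a sketch with stand-in `TypeId` and `Key` types, and `ByType` and `insert_param` are hypothetical names: equality and ordering consider only the type id, so a `BTreeSet` can hold at most one `Key` per type.

```rust
use std::cmp::Ordering;
use std::collections::BTreeSet;

// Hypothetical stand-ins for the engine's types; the real Key and TypeId differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct TypeId(pub u64);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Key {
    pub id: u64,
    pub type_id: TypeId,
}

// Wrapper whose equality and ordering look only at the TypeId.
#[derive(Clone, Copy, Debug)]
pub struct ByType(pub Key);

impl PartialEq for ByType {
    fn eq(&self, other: &Self) -> bool {
        self.0.type_id == other.0.type_id
    }
}

impl Eq for ByType {}

impl PartialOrd for ByType {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for ByType {
    fn cmp(&self, other: &Self) -> Ordering {
        self.0.type_id.cmp(&other.0.type_id)
    }
}

/// Returns false, leaving the set unchanged, if a Key of the same type is
/// already present.
pub fn insert_param(params: &mut BTreeSet<ByType>, key: Key) -> bool {
    params.insert(ByType(key))
}
```

The trade-off is that two `ByType` values compare equal even when their underlying Keys differ, which is easy to misuse; that is one argument for keeping a sorted `Vec<Key>` behind a constructor instead.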

impl Params {
  pub fn new(mut params: Vec<Key>) -> Params {
    params.sort_by(|l, r| l.type_id().cmp(r.type_id()));
    params.dedup_by(|l, r| l.type_id() == r.type_id());

@illicitonion

illicitonion Jul 20, 2018

Contributor

Feels like it should be an error for this line to actually do anything?

@stuhood

stuhood Jul 25, 2018

Member

Agreed. Will add.

@illicitonion

illicitonion Sep 10, 2018

Contributor

TODO?
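One way the suggestion in this thread could be realised, sketched with simplified stand-in types (the `Result<Params, String>` signature and the `find` helper are assumptions, not the code that landed): sort by type id, then report an error if two adjacent Keys share a type instead of silently dropping one with `dedup_by`.

```rust
use std::sync::Arc;

// Hypothetical stand-ins for the engine's types.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct TypeId(pub u64);

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Key {
    pub id: u64,
    pub type_id: TypeId,
}

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Params(Arc<Vec<Key>>);

impl Params {
    /// Stores the Keys sorted by TypeId, and fails if any TypeId occurs twice.
    pub fn new(mut params: Vec<Key>) -> Result<Params, String> {
        params.sort_by(|l, r| l.type_id.cmp(&r.type_id));
        // After sorting, Keys of the same type are adjacent.
        if let Some(pair) = params.windows(2).find(|w| w[0].type_id == w[1].type_id) {
            return Err(format!(
                "Params must have distinct types, but got {:?} and {:?}",
                pair[0], pair[1]
            ));
        }
        Ok(Params(Arc::new(params)))
    }

    /// Looks up the Key for a type by binary search over the sorted Vec.
    pub fn find(&self, type_id: TypeId) -> Option<&Key> {
        self.0
            .binary_search_by(|k| k.type_id.cmp(&type_id))
            .ok()
            .map(|idx| &self.0[idx])
    }
}
```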

entries: edges.entries_for(&select_key),
}
pub fn new(product: TypeConstraint, params: Params, edges: &rule_graph::RuleEdges) -> Select {
Self::new_with_selector(selectors::Select::without_variant(product), params, edges)

@illicitonion

illicitonion Jul 20, 2018

Contributor

Is the plan for Select to disappear in favour of just a TypeConstraint (or maybe a wrapper-type around it), and for tasks_add_select_variant to disappear?

@illicitonion

illicitonion Jul 25, 2018

Contributor

Yeah, now that without_variant doesn't contrast with anything, this feels like a weird API...

@stuhood

stuhood Jul 25, 2018

Member

Agreed: I'll clean these up today.

use std::io;

use core::{Function, Key, TypeConstraint, TypeId, Value, ANY_TYPE};
use externs;
use selectors::{Get, Select};
use tasks::{Intrinsic, Task, Tasks};

type ParamTypes = BTreeSet<TypeId>;

@illicitonion

illicitonion Jul 20, 2018

Contributor

What's the big difference between this and the Params?

@stuhood

stuhood Jul 20, 2018

Member

The key difference is that Params contains Keys (i.e., actual Python values), while ParamTypes contains only TypeIds. Also, in theory, performance is less critical in this case, because the vast majority of Entries should be created and destroyed during RuleGraph initialization.

@stuhood stuhood force-pushed the twitter:stuhood/subsingliant branch 2 times, most recently from 5378da5 to 438e607 Jul 25, 2018

@stuhood stuhood changed the title from "WIP: Merge subjects and variants into Params" to "Merge subjects and variants into Params, and remove Noop" Jul 25, 2018

@stuhood

Member

stuhood commented Jul 25, 2018

This is now reviewable.

It contains a ton of TODOs that I'll either open issues for or resolve, and still has some (trivially, I think) failing tests. But it shouldn't need any fundamental changes.

The "Compute used parameters..." and "Add "AggregationRule"..." commits are highly overlapping, and should probably be reviewed together. The rest are relatively independent. Sorry for the big mess!

@illicitonion
Contributor

illicitonion left a comment

Where are Aggregations actually used? Can you point at an example (which is tested)?

I'm not convinced all of the tests you're deleting should be deleted... Are some of those to be restored?

I think this could do with a decent prosaic description of what "kinds" of rules we support. Would you be able to put one together (or does one already exist)?

I also have some outstanding comments from my first pass which I think still stand.

///
fn gen_nodes(&self, context: &Context) -> Vec<NodeFuture<Value>> {
/// TODO: This could take `self` by value and avoid cloning.

@illicitonion

illicitonion Jul 25, 2018

Contributor

Sounds good

@stuhood

stuhood Jul 25, 2018

Member

Was planning to leave this TODO here long enough to land this, and then inline this method into Select, as the TODO near the end of the method suggests. Reasonable?

.core
.rule_graph
.edges_for_inner(&self.entry)
.expect("Expected edges to exist for Aggregation.");

@illicitonion

illicitonion Jul 25, 2018

Contributor

Include a debug output to say what aggregation it is?

@stuhood

stuhood Jul 25, 2018

Member

Will do. And will try to move most of edges_for_inner(..).expect calls into RuleGraph, as dependency edges not existing would represent a bug there rather than here.
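A minimal sketch of that idea, assuming stand-in `Entry`, `RuleEdges` and `RuleGraph` types; `edges_for` is a hypothetical name for the wrapper. The lookup that must succeed lives on the graph itself and includes the entry's `Debug` output in the panic message, so that callers no longer need their own `.expect(..)`.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the engine's types.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Entry {
    pub rule: String,
}

#[derive(Clone, Debug, Default, PartialEq)]
pub struct RuleEdges {
    pub dependencies: Vec<Entry>,
}

#[derive(Default)]
pub struct RuleGraph {
    pub edges: HashMap<Entry, RuleEdges>,
}

impl RuleGraph {
    /// Fallible lookup, for callers that can handle a missing entry.
    pub fn edges_for_inner(&self, entry: &Entry) -> Option<&RuleEdges> {
        self.edges.get(entry)
    }

    /// Lookup for entries that must exist: a miss indicates a bug in graph
    /// construction, so panic here and say which entry was missing.
    pub fn edges_for(&self, entry: &Entry) -> &RuleEdges {
        self.edges_for_inner(entry)
            .unwrap_or_else(|| panic!("Expected edges to exist for {:?}.", entry))
    }
}
```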

let externs::Get(product, subject) = get;
let entries = context
.map(|externs::Get(product, subject)| {
// TODO: The subject of the get is a new parameter, but params from the context should be

@illicitonion

illicitonion Jul 25, 2018

Contributor

Definitely needs a ticket :)

@stuhood

stuhood Jul 25, 2018

Member

Agreed. I may do this before landing, but will open a ticket otherwise.

@@ -20,24 +24,27 @@ impl UnreachableError {
UnreachableError {
task_rule: task_rule,
diagnostic: Diagnostic {
subject_type: ANY_TYPE,
params: Default::default(),

@illicitonion

illicitonion Jul 25, 2018

Contributor

My pet peeve: ParamTypes::default() :)

@stuhood

stuhood Jul 25, 2018

Member

Will fix.
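To illustrate the nit with stand-in types (the `reason` field and `unreachable_diagnostic` are assumptions for the example): both spellings build the same empty set, but naming the type at the call site tells the reader what is being constructed without looking up the field's declaration.

```rust
use std::collections::BTreeSet;

// Hypothetical stand-ins for the engine's types.
pub type TypeId = u32;
pub type ParamTypes = BTreeSet<TypeId>;

#[derive(Debug)]
pub struct Diagnostic {
    pub params: ParamTypes,
    pub reason: String,
}

pub fn unreachable_diagnostic() -> Diagnostic {
    Diagnostic {
        // Equivalent to `Default::default()`, but self-describing.
        params: ParamTypes::default(),
        reason: "Unreachable".to_string(),
    }
}
```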

return Ok(ConstructGraphResult::Unfulfillable);
}
};
let simplified_entries_only = simplified_entries

@illicitonion

illicitonion Jul 25, 2018

Contributor

simplified_entries.keys().cloned().collect()?

let mut diagnostics = Vec::new();
for available_params in params_powerset {
let available_params = available_params.into_iter().collect();
// If a subset of these parameters is already satisfied, skip. This has the effect of

@illicitonion

illicitonion Jul 25, 2018

Contributor

I don't follow this... Is this required? Or an optimisation?

@stuhood

stuhood Jul 25, 2018

Member

It is required. This dedupes entries that use subsets of one another's parameters. I.e., if an @rule is satisfiable with either Address alone or with Address+OtherThing, we prefer the subset (Address alone). This also differentiates between the rule that would use only OtherThing and the rule that would use only Address.

But I think it needs unification with the logic in choose_dependency that does a similar thing... and choose_dependency likely needs some sort of unification with rhs. Will look into what can be deduped here before landing.

@illicitonion

illicitonion Sep 10, 2018

Contributor

Did you follow up on this? Did it turn out no unification could happen, or should there be a ticket to follow up?

@stuhood

stuhood Sep 13, 2018

Member

I followed up, and didn't see an obvious unification. It's possible that when I dive into the tests tomorrow I can tease apart a unification, but I'm not optimistic.
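The subset-skipping behaviour discussed in this thread can be sketched as follows. This is an illustration under simplifying assumptions, not the code in the PR: `powerset_by_size` and `minimal_satisfying_sets` are hypothetical names, satisfiability is abstracted into a caller-supplied closure, and the bitmask enumeration assumes fewer than 32 available parameter types.

```rust
use std::collections::BTreeSet;

// Hypothetical stand-ins for the engine's types.
pub type TypeId = u32;
pub type ParamTypes = BTreeSet<TypeId>;

/// All subsets of `params`, smallest first. Assumes fewer than 32 params.
pub fn powerset_by_size(params: &ParamTypes) -> Vec<ParamTypes> {
    let items: Vec<TypeId> = params.iter().cloned().collect();
    let mut subsets: Vec<ParamTypes> = (0..(1u32 << items.len()))
        .map(|mask| {
            items
                .iter()
                .enumerate()
                .filter(|(i, _)| mask & (1u32 << *i) != 0)
                .map(|(_, t)| *t)
                .collect()
        })
        .collect();
    // Stable sort: subsets of equal size keep their enumeration order.
    subsets.sort_by_key(|s| s.len());
    subsets
}

/// Returns the minimal parameter sets that satisfy `is_satisfiable`: a
/// candidate is skipped if a subset of it has already been satisfied.
pub fn minimal_satisfying_sets<F>(available: &ParamTypes, is_satisfiable: F) -> Vec<ParamTypes>
where
    F: Fn(&ParamTypes) -> bool,
{
    let mut satisfied: Vec<ParamTypes> = Vec::new();
    for candidate in powerset_by_size(available) {
        if satisfied.iter().any(|s| s.is_subset(&candidate)) {
            continue;
        }
        if is_satisfiable(&candidate) {
            satisfied.push(candidate);
        }
    }
    satisfied
}
```

With Address and OtherThing available, a rule that only needs Address yields just the Address set, while two alternatives that need one type each yield two distinct single-element sets, matching the two cases described above.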

return Ok(None);
}
1 => {
combination.push((key, *chosen_entries.last().unwrap()));

@illicitonion

illicitonion Jul 25, 2018

Contributor

combination.push((key, *chosen_entries[0]));?

@illicitonion

illicitonion Sep 10, 2018

Contributor

TODO


// Prefer a Singleton, then a Param, then the non-ambiguous rule with the smallest set of
// input Params.
// TODO: We should likely prefer Rules to Params.

@illicitonion

illicitonion Jul 25, 2018

Contributor

Why should we prefer Rules to Params? In fact; why are these directly comparable at all? They feel like they're separate; what's an @rule signature with an example of a Params and Rules where we need to resolve between the two?

@stuhood

stuhood Jul 25, 2018

Member

Good question. This needs unification with rhs, but: the non-intuitive thing I discovered while implementing this is that in order to keep Node identities small (ie, containing the fewest Params possible), we want to prefer Entries that need fewer parameters. This biases toward avoiding propagating information from callers to callees unless they really need it to be executed (and avoids the case where absolutely anything used anywhere in the graph is potentially a parameter used at the root of the graph: that will continue to be bounded by RootRule).

In short: we would generally prefer a @rule to a Param, because depending on a rule does not affect our identity, while using a Param does (because it comes from the caller).

@illicitonion

illicitonion Sep 10, 2018

Contributor

Maybe add this comment as a code comment?
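The preference described in this thread (choose the entry that pulls in the fewest Params, so that a zero-Param @rule beats consuming a Param directly) could be sketched like this. The `Entry` enum, `params_used` and the tie handling are assumptions for illustration; the real `choose_dependency` has a different signature and more cases.

```rust
use std::collections::BTreeSet;

// Hypothetical stand-ins for the engine's types.
pub type TypeId = u32;

#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Entry {
    Singleton,
    Param(TypeId),
    Rule { name: &'static str, params: BTreeSet<TypeId> },
}

impl Entry {
    /// How many Params the consumer's identity would have to carry in order
    /// to use this entry.
    pub fn params_used(&self) -> usize {
        match self {
            Entry::Singleton => 0,
            Entry::Param(_) => 1,
            Entry::Rule { params, .. } => params.len(),
        }
    }
}

/// Picks the candidate that consumes the fewest Params. Returns Ok(None) if
/// there are no candidates, and an error if the minimum is not unique.
pub fn choose_dependency(candidates: &[Entry]) -> Result<Option<&Entry>, String> {
    let min = match candidates.iter().map(Entry::params_used).min() {
        Some(m) => m,
        None => return Ok(None),
    };
    let best: Vec<&Entry> = candidates
        .iter()
        .filter(|e| e.params_used() == min)
        .collect();
    if best.len() == 1 {
        Ok(Some(best[0]))
    } else {
        Err(format!("Ambiguous dependency: {:?}", best))
    }
}
```

As the stale TODO elsewhere in this review notes, set size alone may not be enough to fully differentiate alternatives, which is why the sketch reports ties as errors rather than picking arbitrarily.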

rules_by_kind
.entry(wd.simplified(BTreeSet::new()))
.and_modify(|e| {
// TODO: Param set size isn't sufficient to fully differentiate alternatives: would

@illicitonion

illicitonion Jul 25, 2018

Contributor

Why do we prefer larger not smaller?

@stuhood

stuhood Jul 25, 2018

Member

This comment is stale: this code is actually preferring the smaller set. As mentioned above, it needs some unification.

@stuhood
Member

stuhood left a comment

> Where are Aggregations actually used? Can you point at an example (which is tested)?

They were used in (and necessary for) planners.py, but there was at least one more source of ambiguity in those rules (related to #4005), and I couldn't justify adding more features here.

I plan to heavily extend tests/python/pants_test/engine/test_rules.py today to cover use of AggregationRule, and the new graph ambiguity errors.

> I'm not convinced all of the tests you're deleting should be deleted... Are some of those to be restored?

All of the deleted tests consumed planners.py, and about 50% of them were already skipped as we deleted expensive features that they depended on. While I think that there was only one additional feature needed in this diff (a resolution to #4005) to support them, the tests in test_rules.py etc. are significantly easier to maintain, while providing more useful examples of behaviour.

If we feel strongly that we should keep them, I think I'd need to dive in on #4005 to allow planners.py to continue to use the weird ducktyping it is using currently. But I believe that that would actually be redundant with #4535, which proposes to use a more principled approach to extension of a union type.

> I think this could do with a decent prosaic description of what "kinds" of rules we support. Would you be able to put one together (or does one already exist)?

That exists in https://github.com/pantsbuild/pants/blob/master/src/python/pants/engine/README.md : I will definitely extend it here to account for AggregationRule.

I also fully expect to be able to remove SingletonRule as the UX for Params is fleshed out (since a singleton is simply a zero-Param @rule).

@@ -517,9 +974,9 @@ impl RuleGraph {
}

pub fn find_root_edges(&self, subject_type: TypeId, select: Select) -> Option<RuleEdges> {
// TODO return Result instead
// TODO: Support more than one root parameter... needs some API work.

@stuhood

stuhood Jul 25, 2018

Member

Will open a followup ticket for multiple-parameter-Get UX.

@baroquebobcat
Contributor

baroquebobcat left a comment

Did a first pass at this. Most of my comments are for small nitpicks.

I like this change quite a bit. I think that, conceptually, param is a much easier thing to explain than subject and variant. And, I think param aligns better with what those things were used for.

"""A rule that receives the results of all other rules for a product to aggregate them.
An AggregationRule supports composability of @rules by allowing additional rules to be installed
to provide some type without removing or otherwise modifying the installed rules.

@baroquebobcat

baroquebobcat Jul 26, 2018

Contributor

Instead of "to provide some type", maybe "that provide the same type"?

// The Entry was satisfiable without waiting for any additional nodes to be satisfied. The result
// contains copies of the input Entry for each set subset of the parameters that satisfy it.
Fulfilled(Vec<EntryWithDeps>),
// The Entry was not satifiable with installed rules.

@baroquebobcat

baroquebobcat Jul 26, 2018

Contributor

nit: sp satisfiable

pub fn sub_graph(&self, subject_type: TypeId, product_type: &TypeConstraint) -> RuleGraph {
if let Some(beginning_root) = self.gen_root_entry(subject_type, product_type) {
self._construct_graph(vec![beginning_root])
pub fn sub_graph(&self, subject_type: &TypeId, product_type: &TypeConstraint) -> RuleGraph {

@baroquebobcat

baroquebobcat Jul 26, 2018

Contributor

nit: rename subject_type to param_type?

@ity
Member

ity left a comment

reading through this was a learning experience for me! thank you for the absolutely useful changes with a lot of complexity removed. And, would love to see smaller/more digestible changes in the future :)

@stuhood

Member

stuhood commented Sep 10, 2018

I've rebased this and applied review feedback: only the topmost commit contains anything new.

Having thought a bit more about what rule graph extensibility should look like, I decided to rebase AggregationRule out of this branch, and to stash it so that it can be considered in the context of implementing #4535 (which needs something that is shaped slightly differently).

I'd like to clean up the tests and land this tomorrow if reviewers agree.

@illicitonion
Contributor

illicitonion left a comment

Handful of todos left to go, then let's get this thing merged :D

@stuhood stuhood force-pushed the twitter:stuhood/subsingliant branch from ac9c002 to ae9a71c Sep 19, 2018

@stuhood stuhood force-pushed the twitter:stuhood/subsingliant branch from ae9a71c to 3eb21d1 Sep 20, 2018

@stuhood stuhood merged commit c99cb60 into pantsbuild:master Sep 20, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

@stuhood stuhood deleted the twitter:stuhood/subsingliant branch Sep 20, 2018