Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

convert usages of TypeConstraint to TypeId for rule products in the engine #7114

Merged
merged 10 commits into from Mar 6, 2019

Conversation

Projects
None yet
2 participants
@cosmicexplorer
Copy link
Contributor

commented Jan 19, 2019

Problem

See #4535 and #4005, in particular this comment on #4535. TypeConstraint is a pretty general construct that we would like to do more with, for example #6936, and as of the current discussion in #4535 we realize we can emulate union types in @rules without it, matching just against a specific type.

Solution

  • Convert output_constraint in the Rule subclasses in rules.py into output_type, and add a SubclassesOf(type) type check in datatype fields in those classes to ensure this.
  • Remove satisfied_by() and satisfied_by_type() externs, and add a product_type() extern used to intern a python type as a TypeId.
  • Convert all TypeConstraints passed to the engine for intrinsics (e.g. Snapshot) into TypeIds.
  • Match whether a rule's result matches its declared output type by simply using ==!
  • Manually implement fmt::Debug for TypeId to be the same as Display (we may want to do these differently in the future, but it is very useful to see the actual type name when debugging).

Result

Everything is the same, but now we don't have the additional complexity of TypeConstraint down in the engine when we can do simple TypeId equality. This lets us get more creative with TypeConstraint on the python side, while type checking on the rust side is a little less complex (and probably more efficient to some degree).

@cosmicexplorer cosmicexplorer requested review from stuhood and illicitonion Jan 19, 2019

@cosmicexplorer cosmicexplorer changed the title convert usages of TypeConstraint to TypeId for rule products in the engine [WIP] convert usages of TypeConstraint to TypeId for rule products in the engine Jan 20, 2019

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor Author

commented Jan 20, 2019

I'm marking this WIP because I'm seeing failures locally induced by things in the native backend not being tuples (among other things) and would love to get #7115 in first so I can avoid that mishap. The code here works. One thing I noticed is that I would get:

# ...
From cffi callback <bound method _FFISpecification.extern_identify of <pants.engine.native._FFISpecification object at 0x7f66ef26c190>>:
Traceback (most recent call last):
  File "/home/cosmicexplorer/tools/pants/src/python/pants/engine/native.py", line 296, in extern_identify
    hash_ = hash(obj)
  File "/home/cosmicexplorer/tools/pants/src/python/pants/util/objects.py", line 108, in __hash__
    return super(DataType, self).__hash__()
  File "/home/cosmicexplorer/tools/pants/src/python/pants/util/objects.py", line 108, in __hash__
    return super(DataType, self).__hash__()
TypeError: unhashable type: 'list'
# ...
Exception message: 1 Exception encountered:
Computing Select(ToolchainVariantRequest(toolchain<=NativeToolchain>=<pants.backend.native.subsystems.native_toolchain.NativeToolchain object at 0x7c33821fd390>, variant<=ToolchainVariant>=ToolchainVariant(descriptor=llvm)), CToolchain)
  Computing Task(select_c_toolchain, ToolchainVariantRequest(toolchain<=NativeToolchain>=<pants.backend.native.subsystems.native_toolchain.NativeToolchain object at 0x7c33821fd390>, variant<=ToolchainVariant>=ToolchainVariant(descriptor=llvm)), CToolchain, true)
    Computing Task(select_llvm_c_toolchain, <pants.backend.native.subsystems.native_toolchain.NativeToolchain object at 0x7c33821fd390>, LLVMCToolchain, true)
      Computing Task(select_llvm_linker, <pants.backend.native.subsystems.native_toolchain.NativeToolchain object at 0x7c33821fd390>, LLVMLinker, true)
        Throw(Function(Key { id: 26, type_id: function }) returned a result value LLVMLinker(linker<=Linker>=Linker(path_entries=[u'/home/cosmicexplorer/.cache/pants/bin/binutils/linux/x86_64/2.30/binutils/bin'], exe_filename=ld, library_dirs=[], linking_library_dirs=[], extra_args=[], extra_object_files=[])) (type Any) that did not satisfy its constraints: LLVMLinker)
          Traceback (no traceback):
            <pants native internals>
          Exception: Function(Key { id: 26, type_id: function }) returned a result value LLVMLinker(linker<=Linker>=Linker(path_entries=[u'/home/cosmicexplorer/.cache/pants/bin/binutils/linux/x86_64/2.30/binutils/bin'], exe_filename=ld, library_dirs=[], linking_library_dirs=[], extra_args=[], extra_object_files=[])) (type Any) that did not satisfy its constraints: LLVMLinker

So I believe the fact that we are running these datatypes from the native backend through the extern_identify() function, and because they contain lists, they can't be hashed, and that never came up before because we were using the now-removed extern_satisfied_by() method, which didn't try to hash its input. So this would be a lot easier to get in after #7115 makes it easier to tell datatypes to convert fields into tuples.

@stuhood

This comment has been minimized.

Copy link
Member

commented Jan 24, 2019

So I believe the fact that we are running these datatypes from the native backend through the extern_identify() function, and because they contain lists, they can't be hashed, and that never came up before because we were using the now-removed extern_satisfied_by() method, which didn't try to hash its input. So this would be a lot easier to get in after #7115 makes it easier to tell datatypes to convert fields into tuples.

Hm... it should not be necessary to hash the return value of an @rule. I expect that the failing call shown in that stack should be converted into a call to externs::satisfied_by, which doesn't need to make a Key out of the Value in order to confirm that it satisfies a type.

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor Author

commented Jan 28, 2019

I expect that the failing call shown in that stack should be converted into a call to externs::satisfied_by

Yes, that is the issue -- externs::satisfied_by was removed in this PR in order to convert TypeConstraint to TypeId. I believe this is the correct move, exchanging the ability to check for types satisfying constraints without hashing for a much simpler type checking model which runs much less python code to check whether a constraint is satisfied. If this is blocking other PRs then making datatypes hashable isn't the most difficult thing to do, I just thought it was a good excuse to work on something else for a bit.

@stuhood
Copy link
Member

left a comment

I expect that the failing call shown in that stack should be converted into a call to externs::satisfied_by

Yes, that is the issue -- externs::satisfied_by was removed in this PR in order to convert TypeConstraint to TypeId. I believe this is the correct move, exchanging the ability to check for types satisfying constraints without hashing for a much simpler type checking model which runs much less python code to check whether a constraint is satisfied.

So, assuming that this PR were to swap externs::satisfied_by from taking a TypeConstraint to taking a TypeId (and if it were renamed to externs::type_equals, or something), it should be an equivalent amount of python code running... possibly less.

Also, note that there is a dict of interned types on the python side that doesn't appear to have been changed here: https://github.com/pantsbuild/pants/blob/master/src/python/pants/engine/native.py#L499-L500.


Having said that, since externs::product_type doesn't hash the object instance, I don't see where any additional hashing of object instances is being introduced, so it's not clear why LLVMLinker would go from not-hashed to hashed.

So perhaps there is some accidental additional hashing happening here? If so, it would be good to remove the additional hashing rather than blocking this on #7115.

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor Author

commented Jan 28, 2019

Having said that, since externs::product_type doesn't hash the object instance, I don't see where any additional hashing of object instances is being introduced, so it's not clear why LLVMLinker would go from not-hashed to hashed.
So perhaps there is some accidental additional hashing happening here? If so, it would be good to remove the additional hashing rather than blocking this on #7115.

Thank you for diving into this! That seems correct.

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:remove-type-constraints branch from 54051b7 to 95de1ba Mar 3, 2019

@cosmicexplorer cosmicexplorer changed the title [WIP] convert usages of TypeConstraint to TypeId for rule products in the engine convert usages of TypeConstraint to TypeId for rule products in the engine Mar 3, 2019

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor Author

commented Mar 3, 2019

I tried to repro the errors I was seeing earlier, but they are currently not happening locally (and I didn't make anything else into tuples in the meantime, so ???). Once I can get this through Travis this should be reviewable.

@stuhood

This comment has been minimized.

Copy link
Member

commented Mar 3, 2019

Very excited for this one! Thanks.

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor Author

commented Mar 3, 2019

We are seeing the error again, e.g. in this travis run -- will refer to the discussion above to diagnose and fix.

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor Author

commented Mar 3, 2019

@stuhood The error was because like you mentioned above, externs::identify is overloaded to return both type and hash, and we only need the type when we check that the return type is equal to the stated product type. The most recent commit should fix this.

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:remove-type-constraints branch from fd2f4bd to 1cc87cb Mar 4, 2019

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor Author

commented Mar 4, 2019

@stuhood Green means go!

@stuhood

stuhood approved these changes Mar 4, 2019

Copy link
Member

left a comment

Thanks!

Some optional tweaks to the FFI boundary.

///
#[repr(C)]
pub struct Ident {
pub struct ProductTypeId {
pub hash: i64,

This comment has been minimized.

Copy link
@stuhood

stuhood Mar 4, 2019

Member

It probably isn't strictly necessary to have a hash on ProductTypeId... it should be ok to just use the derived hashcode (based on hashing the TypeId, which is distinct).

And given that, I'm not sure that ProductTypeId needs to exist... could just be TypeId?

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Mar 5, 2019

Author Contributor

Was able to convert this to just TypeId by hashing it manually -- let me know if the PRODUCT_TYPE_ID_HASH_BUILDER lazy static in interns.rs looks right: https://github.com/pantsbuild/pants/pull/7114/files#diff-8ab5c4db2ca5c0325da0cb5aecefaff7R43

This comment has been minimized.

Copy link
@stuhood

stuhood Mar 6, 2019

Member

I meant just #[derive(Hash)], I think. But I might have been missing something. Regardless, not worth another CI burn.

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Mar 6, 2019

Author Contributor

Oh...well TypeId already derives Hash? This was just my best attempt at turning the TypeId into an i64 for the InternKey (I was actually extremely confused about this part and almost messaged the engine channel), and since we were already using the FNV hash it seemed reasonable to use again. I'm not sure if the as i64 affects any of the hashing properties, or if there's an easier way to get an i64 from an instance of a hashable type.

@@ -43,14 +43,32 @@ impl Interns {
Interns::default()
}

pub fn insert_product(&mut self, v: Value) -> Key {

This comment has been minimized.

Copy link
@stuhood

stuhood Mar 4, 2019

Member

Does this need to be a separate method, or could just using insert work? AFAIK, the type(some_type) should just be type, so calling insert should work.

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Mar 5, 2019

Author Contributor

This was added in a hurry when I didn't know why identify() was causing errors, this edit sounds right.

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Mar 5, 2019

Author Contributor

Well, we do want to make sure the type is stored in self._types in native.py, right? Are you thinking of using insert without removing extern_product_type() somehow?

This comment has been minimized.

Copy link
@stuhood

stuhood Mar 5, 2019

Member

Hm. Should the generator code (in https://github.com/pantsbuild/pants/blob/3a81e86eca33bcd16ca68838052ffeb25fdcc803/src/python/pants/engine/native.py#L417-450) memoize these in self._types before returning them?

We have TypeIdBuffer, although it doesn't look like it's being used anywhere:

// Points to an array of TypeIds.
#[repr(C)]
pub struct TypeIdBuffer {
ids_ptr: *mut TypeId,
ids_len: u64,
// A Handle to hold the underlying array alive.
handle_: Handle,
}
impl TypeIdBuffer {
pub fn to_vec(&self) -> Vec<TypeId> {
with_vec(self.ids_ptr, self.ids_len as usize, |vec| vec.clone())
}
}

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Mar 6, 2019

Author Contributor

I was also thinking that we could add another argument to extern_identify() which says to consider the handle as a type and intern that instead of taking the type() of the handle.

But I think the solution you're suggesting here with memoizing in extern_generator_send() sounds like we would end up being able to remove all of the ffi calls in interning.rs? Because that sounds pretty neat.

This comment has been minimized.

Copy link
@stuhood

stuhood Mar 6, 2019

Member

I've pulled this thread in another PR: should have it out in a few minutes.

This comment has been minimized.

Copy link
@stuhood

stuhood Mar 6, 2019

Member

#7318

Regarding completely eliminating the need to call back to python in interning.rs: I think that you could almost do it, but we'd probably need to make the PyGeneratorResponse type more complicated, because we only hash the keys in Gets, and not in Breaks.

Show resolved Hide resolved src/rust/engine/src/interning.rs Outdated
Show resolved Hide resolved src/rust/engine/src/lib.rs Outdated
@@ -812,6 +807,8 @@ impl Task {
.expect("edges for task exist.")
.entry_for(&select_key)
.unwrap_or_else(|| {
// TODO: convert this to a Result, as well as all other panic!()s and .expects() that
// assert graph logic which is not clearly going to be correct in all future cases.

This comment has been minimized.

Copy link
@stuhood

stuhood Mar 4, 2019

Member

While I agree in general, the fix here is probably to improve the RuleGraph's enums and types to make these operations infalliable, rather than turning this into a user-visible error. A failure here would be a bug in the RuleGraph, rather than in a user's rules.

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Mar 5, 2019

Author Contributor

This sounds good, will edit the comment to match.

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Mar 5, 2019

Author Contributor

Turned into a comment about returning enums to avoid flattening failure modes, let me know if that captures the intent.

Show resolved Hide resolved src/rust/engine/src/nodes.rs Outdated
Show resolved Hide resolved src/rust/engine/src/rule_graph.rs Outdated

class RuleIndex(datatype(['rules', 'roots'])):
"""Holds a normalized index of Rules used to instantiate Nodes."""

@classmethod
def create(cls, rule_entries):
"""Creates a RuleIndex with tasks indexed by their output type."""
# TODO: make a defaultdict-like wrapper for OrderedDict (and other types?)!

This comment has been minimized.

Copy link
@stuhood

stuhood Mar 4, 2019

Member

Sortof off-topic I think.

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Mar 5, 2019

Author Contributor

Ended up making #7311 and moving it down to the part where we index into an OrderedDict since if we do more stuff like in #7116 we probably do want to do something like that.

cosmicexplorer added some commits Jan 19, 2019

convert usages of TypeConstraint to TypeId for rule products
- remove `extern_satisfied_by()`
- remove `extern_satisfied_by_type()`
- add `extern_product_type()`

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:remove-type-constraints branch from 12a5152 to 82e04cc Mar 6, 2019

@stuhood

stuhood approved these changes Mar 6, 2019

Copy link
Member

left a comment

Thanks!

@stuhood stuhood merged commit 27a13d4 into pantsbuild:master Mar 6, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

stuhood added a commit that referenced this pull request Mar 6, 2019

Have extern::generator_send directly produce TypeIds for product types (
#7318)

### Problem

#7114 set us up to use lighter-weight identification of product types in `externs::generator_send`. We can additionally avoid calling back to python to memoize product types by having methods return `TypeIds` rather than `Values` representing types. 

### Solution

Rather than a `ValueBuffer` containing type instances, memoize the types and return a `TypeIdBuffer`.

### Result

Fewer calls back to python.

stuhood added a commit that referenced this pull request Apr 10, 2019

Improve Get errors for unhashable or mismatched types (#7502)
### Problem

The error for unhashable types used as `Get` params is unfortunate, because the call to `identify` fails and returns a null pointer to rust, which triggers an error... somewhere else. See #7499.

### Solution

As suggested in #7499, have `extern_generator_send` directly `identify` `Get` params, meaning that failures to hash take advantage of the fact that `extern_generator_send` is already fallible. This avoids a significant number of calls back to python to identify `Values` (an optimization discussed in #7318 and #7114).

### Result

Improved usability. Fixes #7499.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.