Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a DSL for generating layouts for the object model #40

Merged
merged 66 commits into from
Mar 4, 2016
Merged

Add a DSL for generating layouts for the object model #40

merged 66 commits into from
Mar 4, 2016

Conversation

chrisseaton
Copy link
Contributor

Ruby has a relatively Java-style class hierarchy, and many complex built-in classes which we implement in Java and need many fields to store state. The OM is more suited to simpler prototypical objects with few internal properties, so we found it a little verbose to use in JRuby. This DSL allows the declaration of Java-style classes, but backed using the OM. In JRuby this made it possible to move to an OM-only object style, where previously we had our own classes for internal fields and so two objects for every Ruby object (the OM object and the instance of our class).

This PR isn't ready for merge yet as the implementation of the generator is not yet high enough quality, but I'd like a review of the API and its semantics before I spend time refining the implementation, so please take a look at the com.oracle.truffle.api.object.dsl and com.oracle.truffle.api.object.dsl.test packages. Please ignore the implementation in the com.oracle.truffle.api.object.dsl.generator packages.

The API already works very well for JRuby.

@chumer @jtulach please review for API, and @woess please check I'm using the OM correctly. Please don't merge (unless you're happy with the poor implementation of the generator).

stream.printf(" return ((AtomicReference<%s>) %s_PROPERTY.get(object, true)).get();%n", property.getType(), NameUtils.identifierToConstant(property.getName()));
}
} else {
stream.printf(" return (%s) %s_PROPERTY.get(object, true);%n", property.getType(), NameUtils.identifierToConstant(property.getName()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition should not be true, but probably is%s(object), where %s is layout.getName(). (applies to all the gets)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume ideally it would be a boolean data edge from the earlier is%s(object) call that the user is presumably doing? But we don't have access to that in the DSL.

Can you tell me about the difference between doing PROPERTY.get(object, true), PROPERTY.get(object, false) and PROPERTY.get(object, isFoo(object)), and PROPERTY.get(object, object.getShape())? I think there are some subtleties here that I may not be aware of.

Do we expect Graal to be able to GVN an earlier call to isFoo(object) with one that I do in my DSL so that there should be no cost if I do it again later?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but Graal can GVN to an earlier isFoo guard.

PROPERTY.get(object, condition):

  • false: fixed in control flow, always safe
  • true: floating unguarded (unsafe)
  • isFoo(object) or result of an earlier guard: dependency edge on the guard (GVN-able)
  • shape: equivalent to object.getShape() == shape

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@chumer
Copy link
Member

chumer commented Feb 9, 2016

In general this is a good idea.

One problem of this approach is that layout implementations cannot speculate because they are not implemented in the context of a Node. Or do I miss something?

For the object model accesses I think I'd prefer standardized but still flexible nodes that languages reuse to perform certain operations like reading writing, iterating.

I had in mind some abstract datatype pattern for OM, similar to interop but simpler, that lets you create nodes instead of calling methods directly. We could still use a DSL to generate the node constructors invocations, but a bit different like this:

   public final class BasicTestLayoutImpl implements BaseTestLayout {

        AllocateBasicTestNode createBasicTest(int value);

        public PropertyGetNode getValue(DynamicObject object) {
            return PropertyGetNode.create(object, "value")
       ]

        public PropertySetNode setValue(DynamicObject object, int value) {
           return PropertySetNode.create(object, "value")
      }

      public abstract static class AllocateBasicTestNode extends DynamicObjectConstrutor {
            @Child PropertySetNode setValue;

            public DynamicObject allocate(int value) {
               DynamicObject object = super.allocate();
               setValue.executeSet(object, value);
               return object;
            }
       }   


    }

      public abstract static class PropertyGetNode extends Node {

         public abstract Object executeGet(DynamicObject object)
          ...
       }   

public abstract static class PropertySetNode extends Node {

         public abstract Object executeSet(DynamicObject object, Object value)
          ...
       }   

@chrisseaton What do you think?

@chrisseaton
Copy link
Contributor Author

The main motivation here is for something that works without nodes and doesn't need to speculate. I could just store all my implementation fields (by which I mean things like the byte[] for a String - things the user never sees but I need to store in objects) like normal Ruby instance variables but with special names, and use normal instance variable field nodes to read and write them. That would have two problems.

First of all, it would mean an explosion of nodes and code. All the places where I currently get the size of an array, or its storage field, or the bytes in a string, go from being one line to being at least another @Cached parameter.

Imagine this code using nodes instead of Layouts.ARRAY.get...: https://github.com/jruby/jruby/blob/c2787507d3f07ee9c847f6b89cf2174fc5241a56/truffle/src/main/java/org/jruby/truffle/core/array/ArrayAppendManyNode.java#L142-L158. That's not even a @Specialization, so the nodes would either have to be fields (more code) or passed in as parameters (yet more code).

This one method alone would need nine new nodes (or one slow, polymorphic one) https://github.com/jruby/jruby/blob/ff9e6968d47127a0f5bc4beccb032b9f75074e6a/truffle/src/main/java/org/jruby/truffle/core/proc/ProcNodes.java#L141-L149. If I have lots of fields why not store them in a single object? Because then that's another object to allocate.

Secondly, I can't execute nodes in code behind a @TruffleBoundary. Imagine have to somehow hoist code like this out from behind a boundary in order to be able to execute a property read node: https://github.com/jruby/jruby/blob/28a97d92915a90edf71255d0985c2b6d7eaaaba9/truffle/src/main/java/org/jruby/truffle/core/encoding/EncodingNodes.java#L193-L193. I'd have to split the method into two - one that did property reads, and other that was the boundary method. I'm not sure that would be practical in many cases.

Graal.js already uses the same approach as this DSL, just manually written out (see the implementation of JSString and the Property members). Graal.js would also have the same problems with the approach you are suggesting - such as property reads behind boundaries.

So I don't think using nodes for implementation fields is tractable.

What's more, I believe that my current implementation wouldn't even benefit from any speculation, as everything is as static as the OM allows and the code should compile to not much more than a Java field or array access. Maybe @woess can confirm or deny?

@nirvdrum
Copy link
Contributor

nirvdrum commented Feb 9, 2016

I'm perhaps biased by having used this in JRuby, but I'd like to be able to access these fields easily without having to rely on passing nodes around. From what I've seen, the static helpers compile down to simple field loads. I don't know if doing that in a node would allow speculative optimizations, but we're mostly representing mutable state in a guest language object . . . it seems unlikely that we'll see values that are constant enough to even speculate on. JRuby has a handful of these (e.g., a String's encoding), but we always end up using them as guards in specializations.

Having said that, if the node approach can make things faster I don't think there's a problem in also generating helper nodes. I don't see the two approaches as being at odds with one another.

private static final BaseLayout BASE_LAYOUT = BaseLayoutImpl.INSTANCE;
private static final MiddleLayout MIDDLE_LAYOUT = MiddleLayoutImpl.INSTANCE;
private static final TopLayout TOP_LAYOUT = TopLayoutImpl.INSTANCE;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like all the implementations in one package to be generated into single class, with a default name, for example Layouts

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean just the instance fields, so it would be Layouts.BASE or something like that? I could do that. Why do you prefer it though?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into this and can't see any way to implement it correctly. I don't have any control over when my processing rounds are run, or when new source files with @Layout annotations in them may be generated by other processors that I don't know about, so I can't see any way to generate one file for all layouts in a package at the same time.

Do you know a way to make it possible?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I know. One always needs to process the whole package. I thought I even wrote a blog post about that, but I cannot find it. I used the code at http://hg.netbeans.org/html4j/file/ed4b25eb66f3/boot/src/main/java/org/netbeans/html/boot/impl/JavaScriptProcesor.java which also needs to generate single $JsCallbacks$ file per package.

@chumer
Copy link
Member

chumer commented Feb 9, 2016

@chrisseaton There is a solution for property reads (or nodes) behind boundaries in FastR
https://github.com/graalvm/fastr/blob/master/com.oracle.truffle.r.nodes/src/com/oracle/truffle/r/nodes/access/vector/ExtractVectorNode.java#L147-L172

Regarding:
https://github.com/jruby/jruby/blob/c2787507d3f07ee9c847f6b89cf2174fc5241a56/truffle/src/main/java/org/jruby/truffle/core/array/ArrayAppendManyNode.java#L142-L158

This is a case where you redo the caching in the operation instead of the property accesses. This way you could probably express this code with a generic cache without most of the specializations. I can offer doing a sketch if needed.

BTW.: As a rule of thumb: Many specializations that all forward to the same method indicates that a generic cache might be useful.

Having nodes for property accesses might be useful also for instrumenting the AST (eg instrument all object writes).

Another interesting thing is that without nodes you cannot speculate on constant fields. We might want to decide to do such kind of optimizations in Ruby in the future.

Regarding footprint: You can reuse the standard single read and write node implementation for property accesses. You can and probably should preinitialize the node with the specialization for the shape and attribute on creation. So the call to the node can become monomorph in the interpreter. The class can even be shared across language implementations. Additional nodes need to be allocated in this case which is a clear downside with the nodes approach.

So I don't think using nodes for implementation fields is tractable.

To sum up:

  • On the use side its an additional child field or @cached attribute for the property access nodes.
  • You can read a property from any object without type check which enables more generic implementations of some operations closer to Ruby code.
  • Property accesses become instrumentable
  • You can use TruffleBoundaryNodes to access files behind boundaries.
  • Yes nodes need to be allocated and require additional space.
  • Using generic field access nodes hides more details about the OM. No need to know about shapes at all.

I think the most important issue is that we can hide bigger parts of the OM implementation.

You have good points, but I think regarding future evolution generic attribute access nodes might be a better choice. I also don't agree with the code that is in graal.js in that regard.

@chrisseaton
Copy link
Contributor Author

If the proposed benefits of your approach, such as speculative constants and instrumentation, are nice-to-have rather than nessecary immediately for correctness or performance, then maybe the DSL could provide my static methods now and in the future, but add node getters and setters later on? Then we could transition code gradually.

You could get a FooLayoutImpl.INSTANCE.getBar() and a FooLayoutImpl.BarGetterNode.

@chumer
Copy link
Member

chumer commented Feb 9, 2016

Yes that could be a way forward. We should look on the R side if we can use this as well. I will have another look at the implementation tomorrow.

@woess
Copy link
Member

woess commented Feb 9, 2016

maybe the DSL could provide my static methods now and in the future, but add node getters and setters later on

I think that's a good compromise. Currently, there does not seem to be much benefit in channeling all accesses through nodes.

@woess
Copy link
Member

woess commented Feb 9, 2016

What's more, I believe that my current implementation wouldn't even benefit from any speculation, as everything is as static as the OM allows and the code should compile to not much more than a Java field or array access.

I agree, although type checks could benefit from specialization.

What I like most about the DSL is implementation detail hiding and less boilerplate code.

@woess
Copy link
Member

woess commented Feb 9, 2016

@chrisseaton I see that you have commented code in various places. We should try to avoid that.

@chrisseaton
Copy link
Contributor Author

Yes I'll re-do the implementation (where the comments are) from scratch with CodeTree.

*/
@Target(ElementType.PARAMETER)
@Retention(RetentionPolicy.CLASS)
public @interface Volatile {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retention "class"? That's surprising. There is an annotation processor, so "source" ought to be enough, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, looks like it works as SOURCE (same for Nullable and Layout).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@mlvdv
Copy link
Contributor

mlvdv commented Feb 9, 2016

I vote for any approach that allows straightforward instrumentation of all assignments. This would permit a reasonable implementation of debugger watchpoints, for example, with the same performance advantages that we have for breakpoints. Thomas asked me once, quite a while back, whether we could ever do "value tracing"; I hadn't thought about it since then, but this would also be the most practical way to approach that as well. This would be a pretty impressive capability of the Truffle platform.

@chrisseaton
Copy link
Contributor Author

I vote for any approach that allows straightforward instrumentation of all assignments.

But these aren't source-level assignments. They're internal, implementation fields. They aren't visible to the person writing a Ruby program. It's like wanting to set a breakpoint on the header word in a Java object. That may be interesting to us as VM implementors, but it's a failure of encapsulation otherwise.

@chrisseaton
Copy link
Contributor Author

The snippets class has been moved to top-level and package-private at the end of the file containing the Layout interface.

Final opinions, please. I'll merge at close of play tomorrow unless there are objections.

}

for (AnnotationMirror annotationMirror : layoutElement.getAnnotationMirrors()) {
if (annotationMirror.getAnnotationType().toString().equals(Layout.class.getCanonicalName())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are lucky that this works. toString is undefined for TypeMirror and in fact works different between ECJ and the eclipse compiler. See Truffle DSL ElementUtils where all the bad hacks including getQualifiedName(TypeMirror) is hidden.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixing this. What is the correct way to get a Name as a String? toString isn't specifically defined there either.

@chumer
Copy link
Member

chumer commented Mar 2, 2016

CH1) I'd like to see the hardcoded addAllowedImplicitCast somehow fixed (no good idea how though).

CH2) Also I don't feel very good with TypeMirror#toString uses in the generator. Maybe you can copy getQualifiedName(TypeMirror) from ElementUtils?

CH3) (can be done later as well) We introduce a new distribution for this DSL that needs to be configured and distributed. Maybe we should put this processor into the TRUFFLE_DSL_PROCESSOR distribution as well? So we always have one truffle-api jar and one processor jar in the end. For a guest language developer granularity this fine does not make much sense. @chrisseaton What do you think?

CH4) (definitely for later) recast the generator with CodeTreeBuilder. Maybe we should track this with an issue?

Besides these small issues, I think this is a significant simplification compared to using the object model API directly.
If the issues are resolved Accept.

@chrisseaton
Copy link
Contributor Author

CH1 resolved in df3c888ca86f5d1adc34e4dec941488992243d39

CH2 resolved in 90424f3

@jtulach
Copy link
Contributor

jtulach commented Mar 3, 2016

Re. CH3 - I must have been blind to not notice this myself. The vision so far was to have one truffle-api.jar and one truffle-dsl-processor.jar - originally the DSL, but also all interop, TruffleLanguage and co. processors are in the second one. I believe there is no need to increase the number of distributions for Truffle - two is enough. Please include the new processor into the existing truffle-dsl-processor.jar.

Btw. should the new processor be in a separate JAR, we need to control imported/exported API between the new processor and existing truffle-dsl-processor.jar as they can be distributed separately and it could happen one is using 0.11 of truffle-dsl-processor.jar with 0.12 of the new one, or even some wilder combinations. That is a lot of headache which we can avoid by packaging all the processors into one distribution.

@woess
Copy link
Member

woess commented Mar 3, 2016

I feel like it's in a good shape and ready for inclusion.
Two more comments, though:

There seems to be a problem with generating the factory method (create*) for the (Inherited)ShapePropertiesTest when the processor is run from eclipse (4.4).

Do you think it could make sense to create the *LayoutImpl with optionally, say, a Layout parameter in the future? e.g.,
CAST_LAYOUT = CastLayoutImpl.INSTANCE
could then potentially be:
CAST_LAYOUT = CastLayoutImpl.create(Layout.newLayout().addAllowedImplicitCast(IntToLong).build())
I don't really see a need for it right now (your solution for CH1 is fine), but I'm thinking that maybe we should still use an instantiation method instead of a field from an API flexibility perspective. Initially it would simply return INSTANCE (private field). We could then more easily change it later to not return a singleton. Feel free to ignore or defer it, though.

@chrisseaton
Copy link
Contributor Author

CH3 resolved in 8967b83.

Can't test with JRuby until the iterate frames discussion is resolved.

@chumer
Copy link
Member

chumer commented Mar 3, 2016

CH1, CH2, and CH3 resolved. Thanks. I am good with the change now.

@woess
Copy link
Member

woess commented Mar 3, 2016

Please have a look at the eclipse problem, though.

@chrisseaton
Copy link
Contributor Author

@woess I did see the problem you're describing before, but not after I fixed CH2. Tell me if you see it again.

chrisseaton added a commit that referenced this pull request Mar 4, 2016
Add a DSL for generating layouts for the object model
@chrisseaton chrisseaton merged commit b9d26cf into oracle:master Mar 4, 2016
@chrisseaton chrisseaton deleted the om-dsl branch March 4, 2016 01:34
@woess
Copy link
Member

woess commented Mar 4, 2016

Yes, it's still there (eclipse 4.4.2).
expected (after mx build):

    public DynamicObject createShapeBase(DynamicObjectFactory factory) {
        assert factory != null;
        CompilerAsserts.partialEvaluationConstant(factory);
        assert createsShapeBase(factory);
        return factory.newInstance();
    }

actual (after cleaning the object.dsl.test project in eclipse):

    public DynamicObject createShapeBase(
            DynamicObjectFactory factory,
            com.oracle.truffle.api.object.DynamicObjectFactory factory,
        assert factory != null;
        CompilerAsserts.partialEvaluationConstant(factory);
        assert createsShapeBase(factory);
        assert factory.getShape().hasProperty(FACTORY_IDENTIFIER);
        assert factory != null;
        return factory.newInstance(
            factory,
    }

@chrisseaton
Copy link
Contributor Author

Yes sorry it broke all the builders as well :( I'll revert.

@chrisseaton chrisseaton restored the om-dsl branch March 4, 2016 01:48
dougxc pushed a commit that referenced this pull request May 18, 2016
…/truffle:DebuggerPause to master

* commit 'f85e4933b5e6b1f0cf63868da6984493da5ffd7e':
  Removed unnecessary catch.
  Properly synchronize debugger access, start and finish.
numberpi added a commit to numberpi/graal that referenced this pull request Nov 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants