Add a DSL for generating layouts for the object model #40

chrisseaton · 2016-02-07T17:16:40Z

Ruby has a relatively Java-style class hierarchy, and many complex built-in classes which we implement in Java and need many fields to store state. The OM is more suited to simpler prototypical objects with few internal properties, so we found it a little verbose to use in JRuby. This DSL allows the declaration of Java-style classes, but backed using the OM. In JRuby this made it possible to move to an OM-only object style, where previously we had our own classes for internal fields and so two objects for every Ruby object (the OM object and the instance of our class).

This PR isn't ready for merge yet as the implementation of the generator is not yet high enough quality, but I'd like a review of the API and its semantics before I spend time refining the implementation, so please take a look at the com.oracle.truffle.api.object.dsl and com.oracle.truffle.api.object.dsl.test packages. Please ignore the implementation in the com.oracle.truffle.api.object.dsl.generator packages.

The API already works very well for JRuby.

@chumer @jtulach please review for API, and @woess please check I'm using the OM correctly. Please don't merge (unless you're happy with the poor implementation of the generator).

woess · 2016-02-08T14:58:45Z

...object.dsl.processor/src/com/oracle/truffle/object/dsl/processor/layout/LayoutGenerator.java

+                            stream.printf("        return ((AtomicReference<%s>) %s_PROPERTY.get(object, true)).get();%n", property.getType(), NameUtils.identifierToConstant(property.getName()));
+                        }
+                    } else {
+                        stream.printf("        return (%s) %s_PROPERTY.get(object, true);%n", property.getType(), NameUtils.identifierToConstant(property.getName()));


The condition should not be true, but probably is%s(object), where %s is layout.getName(). (applies to all the gets)

I presume ideally it would be a boolean data edge from the earlier is%s(object) call that the user is presumably doing? But we don't have access to that in the DSL.

Can you tell me about the difference between doing PROPERTY.get(object, true), PROPERTY.get(object, false) and PROPERTY.get(object, isFoo(object)), and PROPERTY.get(object, object.getShape())? I think there are some subtleties here that I may not be aware of.

Do we expect Graal to be able to GVN an earlier call to isFoo(object) with one that I do in my DSL so that there should be no cost if I do it again later?

yes, but Graal can GVN to an earlier isFoo guard.

PROPERTY.get(object, condition):

false: fixed in control flow, always safe

true: floating unguarded (unsafe)

isFoo(object) or result of an earlier guard: dependency edge on the guard (GVN-able)

shape: equivalent to object.getShape() == shape

chumer · 2016-02-09T10:22:30Z

In general this is a good idea.

One problem of this approach is that layout implementations cannot speculate because they are not implemented in the context of a Node. Or do I miss something?

For the object model accesses I think I'd prefer standardized but still flexible nodes that languages reuse to perform certain operations like reading writing, iterating.

I had in mind some abstract datatype pattern for OM, similar to interop but simpler, that lets you create nodes instead of calling methods directly. We could still use a DSL to generate the node constructors invocations, but a bit different like this:

   public final class BasicTestLayoutImpl implements BaseTestLayout {

        AllocateBasicTestNode createBasicTest(int value);

        public PropertyGetNode getValue(DynamicObject object) {
            return PropertyGetNode.create(object, "value")
       ]

        public PropertySetNode setValue(DynamicObject object, int value) {
           return PropertySetNode.create(object, "value")
      }

      public abstract static class AllocateBasicTestNode extends DynamicObjectConstrutor {
            @Child PropertySetNode setValue;

            public DynamicObject allocate(int value) {
               DynamicObject object = super.allocate();
               setValue.executeSet(object, value);
               return object;
            }
       }   


    }

      public abstract static class PropertyGetNode extends Node {

         public abstract Object executeGet(DynamicObject object)
          ...
       }   

public abstract static class PropertySetNode extends Node {

         public abstract Object executeSet(DynamicObject object, Object value)
          ...
       }

@chrisseaton What do you think?

chrisseaton · 2016-02-09T13:10:45Z

The main motivation here is for something that works without nodes and doesn't need to speculate. I could just store all my implementation fields (by which I mean things like the byte[] for a String - things the user never sees but I need to store in objects) like normal Ruby instance variables but with special names, and use normal instance variable field nodes to read and write them. That would have two problems.

First of all, it would mean an explosion of nodes and code. All the places where I currently get the size of an array, or its storage field, or the bytes in a string, go from being one line to being at least another @Cached parameter.

Imagine this code using nodes instead of Layouts.ARRAY.get...: https://github.com/jruby/jruby/blob/c2787507d3f07ee9c847f6b89cf2174fc5241a56/truffle/src/main/java/org/jruby/truffle/core/array/ArrayAppendManyNode.java#L142-L158. That's not even a @Specialization, so the nodes would either have to be fields (more code) or passed in as parameters (yet more code).

This one method alone would need nine new nodes (or one slow, polymorphic one) https://github.com/jruby/jruby/blob/ff9e6968d47127a0f5bc4beccb032b9f75074e6a/truffle/src/main/java/org/jruby/truffle/core/proc/ProcNodes.java#L141-L149. If I have lots of fields why not store them in a single object? Because then that's another object to allocate.

Secondly, I can't execute nodes in code behind a @TruffleBoundary. Imagine have to somehow hoist code like this out from behind a boundary in order to be able to execute a property read node: https://github.com/jruby/jruby/blob/28a97d92915a90edf71255d0985c2b6d7eaaaba9/truffle/src/main/java/org/jruby/truffle/core/encoding/EncodingNodes.java#L193-L193. I'd have to split the method into two - one that did property reads, and other that was the boundary method. I'm not sure that would be practical in many cases.

Graal.js already uses the same approach as this DSL, just manually written out (see the implementation of JSString and the Property members). Graal.js would also have the same problems with the approach you are suggesting - such as property reads behind boundaries.

So I don't think using nodes for implementation fields is tractable.

What's more, I believe that my current implementation wouldn't even benefit from any speculation, as everything is as static as the OM allows and the code should compile to not much more than a Java field or array access. Maybe @woess can confirm or deny?

nirvdrum · 2016-02-09T13:58:47Z

I'm perhaps biased by having used this in JRuby, but I'd like to be able to access these fields easily without having to rely on passing nodes around. From what I've seen, the static helpers compile down to simple field loads. I don't know if doing that in a node would allow speculative optimizations, but we're mostly representing mutable state in a guest language object . . . it seems unlikely that we'll see values that are constant enough to even speculate on. JRuby has a handful of these (e.g., a String's encoding), but we always end up using them as guards in specializations.

Having said that, if the node approach can make things faster I don't think there's a problem in also generating helper nodes. I don't see the two approaches as being at odds with one another.

jtulach · 2016-02-09T15:13:34Z

....truffle.api.object.dsl.test/src/com/oracle/truffle/api/object/dsl/test/InheritanceTest.java

+    private static final BaseLayout BASE_LAYOUT = BaseLayoutImpl.INSTANCE;
+    private static final MiddleLayout MIDDLE_LAYOUT = MiddleLayoutImpl.INSTANCE;
+    private static final TopLayout TOP_LAYOUT = TopLayoutImpl.INSTANCE;
+


I'd like all the implementations in one package to be generated into single class, with a default name, for example Layouts

Do you mean just the instance fields, so it would be Layouts.BASE or something like that? I could do that. Why do you prefer it though?

I looked into this and can't see any way to implement it correctly. I don't have any control over when my processing rounds are run, or when new source files with @Layout annotations in them may be generated by other processors that I don't know about, so I can't see any way to generate one file for all layouts in a package at the same time.

Do you know a way to make it possible?

Yes, I know. One always needs to process the whole package. I thought I even wrote a blog post about that, but I cannot find it. I used the code at http://hg.netbeans.org/html4j/file/ed4b25eb66f3/boot/src/main/java/org/netbeans/html/boot/impl/JavaScriptProcesor.java which also needs to generate single $JsCallbacks$ file per package.

chumer · 2016-02-09T15:14:35Z

@chrisseaton There is a solution for property reads (or nodes) behind boundaries in FastR
https://github.com/graalvm/fastr/blob/master/com.oracle.truffle.r.nodes/src/com/oracle/truffle/r/nodes/access/vector/ExtractVectorNode.java#L147-L172

Regarding:
https://github.com/jruby/jruby/blob/c2787507d3f07ee9c847f6b89cf2174fc5241a56/truffle/src/main/java/org/jruby/truffle/core/array/ArrayAppendManyNode.java#L142-L158

This is a case where you redo the caching in the operation instead of the property accesses. This way you could probably express this code with a generic cache without most of the specializations. I can offer doing a sketch if needed.

BTW.: As a rule of thumb: Many specializations that all forward to the same method indicates that a generic cache might be useful.

Having nodes for property accesses might be useful also for instrumenting the AST (eg instrument all object writes).

Another interesting thing is that without nodes you cannot speculate on constant fields. We might want to decide to do such kind of optimizations in Ruby in the future.

Regarding footprint: You can reuse the standard single read and write node implementation for property accesses. You can and probably should preinitialize the node with the specialization for the shape and attribute on creation. So the call to the node can become monomorph in the interpreter. The class can even be shared across language implementations. Additional nodes need to be allocated in this case which is a clear downside with the nodes approach.

So I don't think using nodes for implementation fields is tractable.

To sum up:

On the use side its an additional child field or @cached attribute for the property access nodes.
You can read a property from any object without type check which enables more generic implementations of some operations closer to Ruby code.
Property accesses become instrumentable
You can use TruffleBoundaryNodes to access files behind boundaries.
Yes nodes need to be allocated and require additional space.
Using generic field access nodes hides more details about the OM. No need to know about shapes at all.

I think the most important issue is that we can hide bigger parts of the OM implementation.

You have good points, but I think regarding future evolution generic attribute access nodes might be a better choice. I also don't agree with the code that is in graal.js in that regard.

chrisseaton · 2016-02-09T15:25:41Z

If the proposed benefits of your approach, such as speculative constants and instrumentation, are nice-to-have rather than nessecary immediately for correctness or performance, then maybe the DSL could provide my static methods now and in the future, but add node getters and setters later on? Then we could transition code gradually.

You could get a FooLayoutImpl.INSTANCE.getBar() and a FooLayoutImpl.BarGetterNode.

chumer · 2016-02-09T15:51:21Z

Yes that could be a way forward. We should look on the R side if we can use this as well. I will have another look at the implementation tomorrow.

woess · 2016-02-09T15:51:25Z

maybe the DSL could provide my static methods now and in the future, but add node getters and setters later on

I think that's a good compromise. Currently, there does not seem to be much benefit in channeling all accesses through nodes.

woess · 2016-02-09T15:57:57Z

What's more, I believe that my current implementation wouldn't even benefit from any speculation, as everything is as static as the OM allows and the code should compile to not much more than a Java field or array access.

I agree, although type checks could benefit from specialization.

What I like most about the DSL is implementation detail hiding and less boilerplate code.

woess · 2016-02-09T16:00:51Z

@chrisseaton I see that you have commented code in various places. We should try to avoid that.

chrisseaton · 2016-02-09T16:03:42Z

Yes I'll re-do the implementation (where the comments are) from scratch with CodeTree.

jtulach · 2016-02-09T16:12:54Z

truffle/com.oracle.truffle.api.object.dsl/src/com/oracle/truffle/api/object/dsl/Volatile.java

+ */
+@Target(ElementType.PARAMETER)
+@Retention(RetentionPolicy.CLASS)
+public @interface Volatile {


Retention "class"? That's surprising. There is an annotation processor, so "source" ought to be enough, I think.

Yes, looks like it works as SOURCE (same for Nullable and Layout).

mlvdv · 2016-02-09T16:50:47Z

I vote for any approach that allows straightforward instrumentation of all assignments. This would permit a reasonable implementation of debugger watchpoints, for example, with the same performance advantages that we have for breakpoints. Thomas asked me once, quite a while back, whether we could ever do "value tracing"; I hadn't thought about it since then, but this would also be the most practical way to approach that as well. This would be a pretty impressive capability of the Truffle platform.

chrisseaton · 2016-02-09T16:55:54Z

I vote for any approach that allows straightforward instrumentation of all assignments.

But these aren't source-level assignments. They're internal, implementation fields. They aren't visible to the person writing a Ruby program. It's like wanting to set a breakpoint on the header word in a Java object. That may be interesting to us as VM implementors, but it's a failure of encapsulation otherwise.

…error case yet.

chrisseaton · 2016-03-02T17:11:11Z

The snippets class has been moved to top-level and package-private at the end of the file containing the Layout interface.

Final opinions, please. I'll merge at close of play tomorrow unless there are objections.

chumer · 2016-03-02T18:59:45Z

...e.truffle.object.dsl.processor/src/com/oracle/truffle/object/dsl/processor/LayoutParser.java

+        }
+
+        for (AnnotationMirror annotationMirror : layoutElement.getAnnotationMirrors()) {
+            if (annotationMirror.getAnnotationType().toString().equals(Layout.class.getCanonicalName())) {


You are lucky that this works. toString is undefined for TypeMirror and in fact works different between ECJ and the eclipse compiler. See Truffle DSL ElementUtils where all the bad hacks including getQualifiedName(TypeMirror) is hidden.

Fixing this. What is the correct way to get a Name as a String? toString isn't specifically defined there either.

chumer · 2016-03-02T19:37:43Z

CH1) I'd like to see the hardcoded addAllowedImplicitCast somehow fixed (no good idea how though).

CH2) Also I don't feel very good with TypeMirror#toString uses in the generator. Maybe you can copy getQualifiedName(TypeMirror) from ElementUtils?

CH3) (can be done later as well) We introduce a new distribution for this DSL that needs to be configured and distributed. Maybe we should put this processor into the TRUFFLE_DSL_PROCESSOR distribution as well? So we always have one truffle-api jar and one processor jar in the end. For a guest language developer granularity this fine does not make much sense. @chrisseaton What do you think?

CH4) (definitely for later) recast the generator with CodeTreeBuilder. Maybe we should track this with an issue?

Besides these small issues, I think this is a significant simplification compared to using the object model API directly.
If the issues are resolved Accept.

chrisseaton · 2016-03-03T02:26:21Z

CH1 resolved in df3c888ca86f5d1adc34e4dec941488992243d39

CH2 resolved in 90424f3

jtulach · 2016-03-03T03:38:03Z

Re. CH3 - I must have been blind to not notice this myself. The vision so far was to have one truffle-api.jar and one truffle-dsl-processor.jar - originally the DSL, but also all interop, TruffleLanguage and co. processors are in the second one. I believe there is no need to increase the number of distributions for Truffle - two is enough. Please include the new processor into the existing truffle-dsl-processor.jar.

Btw. should the new processor be in a separate JAR, we need to control imported/exported API between the new processor and existing truffle-dsl-processor.jar as they can be distributed separately and it could happen one is using 0.11 of truffle-dsl-processor.jar with 0.12 of the new one, or even some wilder combinations. That is a lot of headache which we can avoid by packaging all the processors into one distribution.

woess · 2016-03-03T04:10:09Z

I feel like it's in a good shape and ready for inclusion.
Two more comments, though:

There seems to be a problem with generating the factory method (create*) for the (Inherited)ShapePropertiesTest when the processor is run from eclipse (4.4).

Do you think it could make sense to create the *LayoutImpl with optionally, say, a Layout parameter in the future? e.g.,
CAST_LAYOUT = CastLayoutImpl.INSTANCE
could then potentially be:
CAST_LAYOUT = CastLayoutImpl.create(Layout.newLayout().addAllowedImplicitCast(IntToLong).build())
I don't really see a need for it right now (your solution for CH1 is fine), but I'm thinking that maybe we should still use an instantiation method instead of a field from an API flexibility perspective. Initially it would simply return INSTANCE (private field). We could then more easily change it later to not return a singleton. Feel free to ignore or defer it, though.

chrisseaton · 2016-03-03T12:29:06Z

CH3 resolved in 8967b83.

Can't test with JRuby until the iterate frames discussion is resolved.

chumer · 2016-03-03T19:07:57Z

CH1, CH2, and CH3 resolved. Thanks. I am good with the change now.

woess · 2016-03-03T22:50:37Z

Please have a look at the eclipse problem, though.

chrisseaton · 2016-03-04T00:37:21Z

@woess I did see the problem you're describing before, but not after I fixed CH2. Tell me if you see it again.

Add a DSL for generating layouts for the object model

woess · 2016-03-04T01:47:20Z

Yes, it's still there (eclipse 4.4.2).
expected (after mx build):

    public DynamicObject createShapeBase(DynamicObjectFactory factory) {
        assert factory != null;
        CompilerAsserts.partialEvaluationConstant(factory);
        assert createsShapeBase(factory);
        return factory.newInstance();
    }

actual (after cleaning the object.dsl.test project in eclipse):

    public DynamicObject createShapeBase(
            DynamicObjectFactory factory,
            com.oracle.truffle.api.object.DynamicObjectFactory factory,
        assert factory != null;
        CompilerAsserts.partialEvaluationConstant(factory);
        assert createsShapeBase(factory);
        assert factory.getShape().hasProperty(FACTORY_IDENTIFIER);
        assert factory != null;
        return factory.newInstance(
            factory,
    }

chrisseaton · 2016-03-04T01:48:12Z

Yes sorry it broke all the builders as well :( I'll revert.

…/truffle:DebuggerPause to master * commit 'f85e4933b5e6b1f0cf63868da6984493da5ffd7e': Removed unnecessary catch. Properly synchronize debugger access, start and finish.

chrisseaton added the oracle-emp label Feb 7, 2016

chrisseaton assigned jtulach Feb 7, 2016

woess reviewed Feb 8, 2016
View reviewed changes

jtulach reviewed Feb 9, 2016
View reviewed changes

chrisseaton added 12 commits February 15, 2016 22:12

Extract OM DSL from JRuby+Truffle.

eee83d3

OM DSL package-info.

af2ba3a

Add a first test for the OM DSL and fix a few issues it raised.

51710c4

OM-DSL: fix extra newline.

e7fa278

OM-DSL: allow layout interfaces to be nested in classes.

c3a914b

OM-DSL: test for read-only properties.

b959f8b

OM-DSL: nullability test, but it uses assertions so doesn't test the …

bcbedbe

…error case yet.

OM-DSL: fix generation for layouts with no properties.

1b8b01c

OM-DSL: guards tests.

6c03f40

OM-DSL: test for volatile.

661c58f

OM-DSL: semi-final properties test.

e427406

OM-DSL: inheritance test.

589c3b7

chumer reviewed Mar 2, 2016
View reviewed changes

chrisseaton added 2 commits March 2, 2016 19:09

OM-DSL: add package info.

9a6f897

OM-DSL: use ElementUtils to convert mirrors to strings.

90424f3

OM-DSL: make implicit casts configurable.

b0f71b7

chrisseaton added 2 commits March 3, 2016 12:23

OM-DSL: fix javadoc.

a105c89

OM-DSL: move processor into the same jar as the existing DSL processors.

8967b83

Merge branch 'master' into om-dsl

9ad5998

OM-DSL: add @SInCE tags.

88675f9

chrisseaton added a commit that referenced this pull request Mar 4, 2016

Merge pull request #40 from chrisseaton/om-dsl

b9d26cf

Add a DSL for generating layouts for the object model

chrisseaton merged commit b9d26cf into oracle:master Mar 4, 2016

chrisseaton deleted the om-dsl branch March 4, 2016 01:34

chrisseaton restored the om-dsl branch March 4, 2016 01:48

This was referenced Mar 4, 2016

Revert "Add a DSL for generating layouts for the object model" #93

Merged

Om dsl #97

Merged

jtulach mentioned this pull request Mar 18, 2016

OM-DSL: make the generated ObjectType public #138

Merged

brunoborges unassigned jtulach Jan 8, 2018

numberpi added a commit to numberpi/graal that referenced this pull request Nov 14, 2018

Remove one test case until SL issue oracle#40 is fixed

55fe5c6

Add a DSL for generating layouts for the object model #40

Add a DSL for generating layouts for the object model #40

Conversation

chrisseaton commented Feb 7, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

chumer commented Feb 9, 2016

chrisseaton commented Feb 9, 2016

nirvdrum commented Feb 9, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

chumer commented Feb 9, 2016

chrisseaton commented Feb 9, 2016

chumer commented Feb 9, 2016

woess commented Feb 9, 2016

woess commented Feb 9, 2016

woess commented Feb 9, 2016

chrisseaton commented Feb 9, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mlvdv commented Feb 9, 2016

chrisseaton commented Feb 9, 2016

chrisseaton commented Mar 2, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

chumer commented Mar 2, 2016

chrisseaton commented Mar 3, 2016

jtulach commented Mar 3, 2016

woess commented Mar 3, 2016

chrisseaton commented Mar 3, 2016

chumer commented Mar 3, 2016

woess commented Mar 3, 2016

chrisseaton commented Mar 4, 2016

woess commented Mar 4, 2016

chrisseaton commented Mar 4, 2016