Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8261642: improve jextract source generation mode and remove class generation #450

Conversation

@mcimadamore
Copy link
Collaborator

@mcimadamore mcimadamore commented Feb 12, 2021

This patch replaces the existing code generators used by jextract by a single, unified, source generation strategy.

As discussed in [1], there are some trade-offs between source generation and classfile generation; for instance, when using classfile translation, we can take advantage of certain features of the JVM bytecode, such as dynamic constants, which makes it easy to generate big files full of constants which are only initialized on a by-need basis.

Classfile generation is not without issue though; first, it is almost always easier to deal with sourcefiles rather than classes - both in terms of building environment and also in terms of sharing (e.g. uploading generated bindings on a GitHub project). This is for instance why we have always used source-based bindings for libclang to power jextract. We feel that users will probably be subject to similar limitations.

Secondly, IDEs can sometimes be picky about what's in the classfile - and I have seen cases of IntelliJ failing to decompile classes containing dynamic constants.

Last, but not least, classfiles cannot contain information which would be helpful for the user, such as javadoc. Adding javadoc comments, containing at the very least the original signatures of the C functions/structs would be very helpful when developing applications which use jextract generated code.

For these reasons, I have decided to explore a source-based strategy for jextract. Note that this does not alter the set of options available in the command line, or the default behavior of jextract: that is, jextract will still emit classfiles if you do not specify --sources; but instead of having two different backends (as we do now), this patch replaces both backends with a single source-based backend. We can discuss, as followup changes, as to whether we should just make source generation the default and drop classfile generation, but that's for another day.

The basic idea behind the approach is to stash constants in a separate class; that is, when we need a native method handle, we can generate a class which contains all the constants relative to that native method handle. This allows us to only load the constants that are used for a given native invocation.

One obvious problem of this approach is the sheer number of classes that it generates (although only a subset of them are truly loaded and initialized). To counteract that, this patch adopts several strategies:

  1. first, it reuses a container class, if that's already available, for storing the constants. For instance, if we are generating a struct class, we can just emit the constants inside the struct class, since the constants are private to that struct anyway (we cannot do that trick for functional interfaces, unfortunately, as these are interfaces, and all fields in interfaces must be public - so there's no way to hide the constant fields in there - we could do that if Java supported private static fields in interfaces, which is probably not so unreasonable to consider).

  2. The new code avoids some duplication which was present in the old approach when it comes to #define constants; these had getters both in the header Java API and in the internal constant source files, therefore almost doubling the size of code required to support a single constant. This is now gone in the new scheme; since constants are pretty uniquitous in header files, this move saves quite a bit of footprint.

  3. To further reduce the number of constant holder classes, I added a very simple grouping logic, so that the same holder class can be reused across N different API points (where N can be tweaked using a JDK property). In other words, if you turn up N to a very high value (e.g. 1000) you end up with the same generation scheme we had before, where all constants end up in the same class (or very few different classes). The default is set to 5, which, in my experiments has proven to be a good balance between fast startup (negligible difference when compared to the dynamic constant classfile approach) and artifact footprint (what comes out of jextract has basically the same size as before).

In terms of implementation, the main idea was to double down on the XYZBuilder interface and, at the same time, simplify it. We now have the following abstractions:

  • JavaSourceBuilder: basic abstraction which deals with creating java sources, maintains a backing string buffer, etc. It has a main API, which is used by OutputFactory to add variables, functions, constants, structs, functional interfaces, typedefs. This probably represents the biggest architectural shift in the code: previously, OutputFactory was responsible for low-level operation, such as starting and ending source generation - now all these details are encapsulated in the builders and OutputFactory deals with a much simplified API (with the result that OutputFactory became a lot simpler in places, such as visitVariable and visitScoped). Note that, since this now includes the former StringSourceBuilder featuring the string buffer, a lot of indirections of the like builder.xyz... have been replaced with just xyz which makes the code easier to follow.

  • HeaderFileBuilder: as before, this is the builder that is responsible for holding the contents of an header file. It can be used to add vars (globals), functions, structs, typedefs and functional interfaces.

  • NestedClassBuilder: as before, this is a builder for all things that are nested inside the main header file.

  • ConstantBuilder: this is a new class which partly plays the role played by the former ConstantHelper - it is now a type of NestedClassBuilder which has logic to emit constants. Note that JavaSourceBuilder has a method, namely emitWithConstantClass which spins a new constant holder on the fly, and allows some code generation to depends on the constants being added to that holder. ConstantBuilder also has a main API for adding method/var handles, segment/address constants, etc. Each of these methods return a Constant instance, which has few methods that can be useful when generating code (e.g. to generate the access expression needed to get the constant, or to emit a getter for the constant inside another builder).

  • StructBuilder, FunctionalInterfaceBuilder, TypedefBuilder; as before - the only notable difference being that now StructBuilder extends ConstantBuilder, so it can be the holder of its own constants.

I think this covers the main things that have been touched by this patch - feel free to ask questions, in case I missed something!

Cheers

[1] - https://mail.openjdk.java.net/pipermail/panama-dev/2021-January/011887.html


Progress

  • Change must not contain extraneous whitespace
  • Change must be properly reviewed

Issue

  • JDK-8261642: improve jextract source generation mode and remove class generation

Reviewers

Download

$ git fetch https://git.openjdk.java.net/panama-foreign pull/450/head:pull/450
$ git checkout pull/450

mcimadamore added 25 commits Feb 8, 2021
Most tests pass but 6.
Add workaround for when layout path get fails
This gives us ability to generate code with same footprint as before.
…lders (this "leaks" constant fields outside).
Cleanup code for header file generation (e.g. libraries/constructors)
Fix all tests
@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Feb 12, 2021

👋 Welcome back mcimadamore! A progress list of the required criteria for merging this PR into foreign-jextract will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

Loading

@openjdk openjdk bot added the rfr label Feb 12, 2021
@mlbridge
Copy link

@mlbridge mlbridge bot commented Feb 12, 2021

Loading

@mlbridge
Copy link

@mlbridge mlbridge bot commented Feb 14, 2021

Mailing list message from Duncan Gittins on panama-dev:

The new generation does look cleaner and it's very helpful that the 2
output types are consistent. I've a few small observations from my
initial tests:

1) When compiling my --source output I noticed that my jars were still
quite different sizes to jar of .class file output. This is simply
because jextract uses these javac parameters: "-parameters" and
"-g:lines". Could you tell me whether the extra meta-data added by
"-parameters" is needed for improving the runtime speed of code, and
therefore should be always be added when compiling --source output?

It might help others if jextract had some way to supply -g:none setting
though it's not important because -g:none can be handled when javac the
--source output.

2) generator adds C.class / C.java but I think @C annotations was
removed in --source output by earlier fixes. My app ran fine without
C.class, can it be removed?

I still think having an option to cut down the symbols will help when
run against the monster Windows headers, but Eclipse is happy with what
I get now even though the jars could be x20 smaller.

Kind regards

Duncan

On 12/02/2021 22:33, Maurizio Cimadamore wrote:

Loading

@mlbridge
Copy link

@mlbridge mlbridge bot commented Feb 15, 2021

Mailing list message from Maurizio Cimadamore on panama-dev:

Hi Duncan

On Sun, 2021-02-14 at 13:34 +0000, Duncan Gittins wrote:

The new generation does look cleaner and it's very helpful that the
2
output types are consistent. I've a few small observations from my
initial tests:

1) When compiling my --source output I noticed that my jars were
still
quite different sizes to jar of .class file output. This is simply
because jextract uses these javac parameters: "-parameters" and
"-g:lines". Could you tell me whether the extra meta-data added by
"-parameters" is needed for improving the runtime speed of code, and
therefore should be always be added when compiling --source output?

few things to notice here:

* I believe that longer term, we'll just drop classfile generation -
e.g. --source will just become the default - so that users can just use
whatever option they want to compile the bindings down to classes -
deciding how much overhead they want.

* Some of the options (e.g. parameter names) were there to support the
C annotation, which is now gone (and will be replaced by more
accessible javadoc) - as you noted.

That said, in principle I don't see anything wrong with disabling debug
info on the generated classes, at least as a transient step; the
bindings are relatively low level and I can't see why people would want
to debug them with full info.

It might help others if jextract had some way to supply -g:none
setting
though it's not important because -g:none can be handled when javac
the
--source output.

2) generator adds C.class / C.java but I think @C annotations was
removed in --source output by earlier fixes. My app ran fine without
C.class, can it be removed?

Sure - that is an issue in the patch - I'll look into it, probably just
an hiccup with merging the latest bits.

ThanksMaurizio

I still think having an option to cut down the symbols will help
when
run against the monster Windows headers, but Eclipse is happy with
what
I get now even though the jars could be x20 smaller.

Kind regards

Duncan

On 12/02/2021 22:33, Maurizio Cimadamore wrote:

Loading

Remove debugging/parameter info from jextract compilation
@mlbridge
Copy link

@mlbridge mlbridge bot commented Feb 15, 2021

Mailing list message from Maurizio Cimadamore on panama-dev:

On Sun, 2021-02-14 at 13:34 +0000, Duncan Gittins wrote:

I still think having an option to cut down the symbols will help
when

run against the monster Windows headers, but Eclipse is happy with
what

I get now even though the jars could be x20 smaller.

We will definitively add more filtering capabilities - but we wanted to
"clean up the house" before doing that :-)

CheersMaurizio

Loading

@mcimadamore
Copy link
Collaborator Author

@mcimadamore mcimadamore commented Feb 15, 2021

I did some experiments with removing parameter info and debugging information - honestly the gain were much less than what I was hoping for (around 100K max). I ended up disabling parameter info (which has been added to support the now gone annotation) and remove g:lines which is the default for javac anyway (e.g. line number tables will still be emitted) as I did not find a significant bloat coming out of that.

Loading

Copy link
Member

@JornVernee JornVernee left a comment

LGTM, some minor comments inline

Loading

Loading
if (l instanceof ValueLayout) {
append(typeToLayoutName((ValueLayout) l));
} else if (l instanceof SequenceLayout) {
append("MemoryLayout.ofSequence(");
if (((SequenceLayout) l).elementCount().isPresent()) {
append(((SequenceLayout) l).elementCount().getAsLong() + ", ");
}
emitLayoutString(((SequenceLayout) l).elementLayout());
append(")");
} else if (l instanceof GroupLayout) {
if (((GroupLayout) l).isStruct()) {
Copy link
Member

@JornVernee JornVernee Feb 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good place to use instanceof pattern matching.

Loading

Copy link
Collaborator Author

@mcimadamore mcimadamore Feb 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true

Loading

Copy link
Member

@JornVernee JornVernee Feb 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You missed the ValueLayout and SequenceLayout cases?

Loading

Loading
Loading
@@ -49,104 +54,117 @@
this.structLayout = structLayout;
this.structType = structType;
prefixElementNames = new ArrayDeque<>();
classBegin();
Copy link
Member

@JornVernee JornVernee Feb 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kinda strange that the constructor here calls classBegin() but OutputFactory calls classEnd(). Could both calls be moved to OutputFactory?

Loading

Copy link
Collaborator Author

@mcimadamore mcimadamore Feb 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can fix this

Loading

Copy link
Collaborator Author

@mcimadamore mcimadamore Feb 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forgot about this - I cannot really do this, as nested structs don't return a new builder - in other words, the OutputFactory code doesn't really know if it should begin a class or not. I could do if (newBuilder != currentBuilder) newBuilder.classBegin() ... but I don't think that improves much?

Loading

Copy link
Member

@JornVernee JornVernee Feb 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was looking at classEnd() where you use if (prefixElementNames.isEmpty()) to check if the struct builder is for an anonymous nested struct or not. So I thought you could do the same for classBegin() and have:

    @Override
    void classBegin() {
        if (prefixElementNames.isEmpty()) {
            super.classBegin();
            addLayout(layoutField(), ((Type.Declared)structType).tree().layout().get())
                    .emitGetter(this, MEMBER_MODS, Constant.SUFFIX_ONLY);
        }
    }

?

Loading

Copy link
Member

@JornVernee JornVernee Feb 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it might make things clearer by creating an isAnonymousNested() predicate for that (just for reader's sake).

Loading

Copy link
Collaborator Author

@mcimadamore mcimadamore Feb 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the predicate - in the sense that the builder is not an anonymous nested - it is just being reused/recycled for an anonymous nested.

Loading

@openjdk
Copy link

@openjdk openjdk bot commented Feb 15, 2021

@mcimadamore This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8261642: improve jextract source generation mode and remove class generation

Reviewed-by: jvernee, sundar

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 1 new commit pushed to the foreign-jextract branch:

  • 4d7180a: 8261578: jextract crashes with Crossing storage unit boundaries

Please see this link for an up-to-date comparison between the source branch of this pull request and the foreign-jextract branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the foreign-jextract branch, type /integrate in a new comment.

Loading

@openjdk openjdk bot added the ready label Feb 15, 2021
Copy link
Member

@JornVernee JornVernee left a comment

Looks good. Thanks for addressing.

Loading

@mcimadamore
Copy link
Collaborator Author

@mcimadamore mcimadamore commented Feb 15, 2021

/integrate

Loading

@openjdk openjdk bot closed this Feb 15, 2021
@openjdk openjdk bot added integrated and removed ready rfr labels Feb 15, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Feb 15, 2021

@mcimadamore Since your change was applied there has been 1 commit pushed to the foreign-jextract branch:

  • 4d7180a: 8261578: jextract crashes with Crossing storage unit boundaries

Your commit was automatically rebased without conflicts.

Pushed as commit e4cd13d.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
3 participants