Skip to content

8308293: A linker should expose the layouts it supports#14037

Closed
mcimadamore wants to merge 8 commits intoopenjdk:masterfrom
mcimadamore:linker_types
Closed

8308293: A linker should expose the layouts it supports#14037
mcimadamore wants to merge 8 commits intoopenjdk:masterfrom
mcimadamore:linker_types

Conversation

@mcimadamore
Copy link
Contributor

@mcimadamore mcimadamore commented May 17, 2023

This patch adds an instance method on Linker, namely Linker::canonicalLayouts which returns all the layouts known by the linker as implementing some ABI type. For instance, if I call this on my machine (Linux/x64) I get this:

jshell> import java.lang.foreign.*;

jshell> Linker.nativeLinker().canonicalLayouts()
$2 ==> {char16_t=c16, int8_t=b8, long=j64, size_t=j64, bool=z8, int=i32, long long=j64, int64_t=j64, void*=a64, float=f32, char=b8, int16_t=s16, int32_t=i32, short=s16, double=d64}

This can be useful to discover the ABI types supported by a linker implementation, as well as for, in the future, add support for more exotic (and platform-dependent) linker types, such as long double or complex long.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change requires CSR request JDK-8308304 to be approved

Issues

  • JDK-8308293: A linker should expose the layouts it supports
  • JDK-8308304: A linker should expose the layouts it supports (CSR)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/14037/head:pull/14037
$ git checkout pull/14037

Update a local copy of the PR:
$ git checkout pull/14037
$ git pull https://git.openjdk.org/jdk.git pull/14037/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 14037

View PR using the GUI difftool:
$ git pr show -t 14037

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/14037.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented May 17, 2023

👋 Welcome back mcimadamore! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented May 17, 2023

@mcimadamore The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs core-libs-dev@openjdk.org label May 17, 2023
SymbolLookup defaultLookup();

/**
* {@return a mapping between the names of data types used by the ABI implemented by this linker and their
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much of the verbiage here is carried over from defaultLookup as we need to do the usual dance of saying that the set of returned types is not specified, but should be (a) sensible and (b) stable.

* <p>
* All the native linker implementations limit the function descriptors that they support to those that contain
* only so-called <em>canonical</em> layouts. A canonical layout has the following characteristics:
* All the native linker implementations can only operate on a subset of memory layouts, called <em>supported layouts</em>.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I revamped this section as I realized that what we had did not cover things in the recursive case - e.g. a struct layout is only supported if it contains other supported layouts. This new text should hopefully capture everything in a more mathematical form.

@mcimadamore
Copy link
Contributor Author

@mcimadamore mcimadamore marked this pull request as ready for review May 17, 2023 17:47
@openjdk openjdk bot added csr Pull request needs approved CSR before integration rfr Pull request is ready for review labels May 17, 2023
@mlbridge
Copy link

mlbridge bot commented May 17, 2023

Webrevs

@openjdk
Copy link

openjdk bot commented May 17, 2023

@mcimadamore Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.

* <li>It does not contain padding other than what is strictly required to align its non-padding layout elements,
* or to satisfy constraint 3</li>
* <li>the alignment constraint of {@code G} is set to its <a href="MemoryLayout.html#layout-align">natural alignment</a>;</li>
* <li>the size of {@code G} is a multiple of its alignment constraint;</li>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think this is a constraint that will hold across all linker implementations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All "native linkers" as the text says. Other linkers (e.g. not for C) might obey other rules. These rules are basically constraining struct layouts to what can come out of a native compiler in the absence of pragma pack directives.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, i see now, i missed the relevance of that term, introduced earlier in non-modified text.

Copy link
Member

@PaulSandoz PaulSandoz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we need to be stronger on the compatibility requirements for the supported canonical layouts. It seems we can make stronger claims than symbols made available by the defaultLookup because the ABI supported by the native linker will make strong (specification) claims.

So maybe this comes down to the linker supporting a subset ABI's data types, and that subset might increase over time, but never decrease? In this respect could we present a table for each supported linker ABI listing the ABI type and its canonical layout type? (in practice it might just be one table with noted adjustments.)

Then there is the possibility that a linker might change the layout corresponding to a data type. Ideally the linker support the prior layout and the new layout. I don't think this will arise with the current support set of supported ABI data types, but it might if we choose to add support for say the optional __int128 data type prior to the platform adding support for a primitive value class of Int128?

[1] https://gitlab.com/x86-psABIs/x86-64-ABI

@mcimadamore
Copy link
Contributor Author

So maybe this comes down to the linker supporting a subset ABI's data types, and that subset might increase over time, but never decrease? In this respect could we present a table for each supported linker ABI listing the ABI type and its canonical layout type? (in practice it might just be one table with noted adjustments.)

I see what you mean and I'm not sure about this. On the one hand, having a set of "trusted" type names would be handy - but I don't know how much commitment we want to put in there? I'm also a bit skeptical at listing all possible ABIs, since I suspect the set of supported platforms will change quickly.

Is what you are after some kind of guarantee of "at least these type names will be available" ?

As for a linker possibly having multiple different layouts for the same ABI type, that is true, and, in a way, already the case with ValueLayout.OfChar/ValueLayout.OfShort. I worked around that by using different type names - e.g. int16_t and char16_t.

For more exotic types which might be modeled initially opaquely with MemorySegment, and later on with some other ValueLayout.OfFooBar, I believe we'd need to provide a way to go from the opaque layout to the less opaque one.

The other option would be to admit that a single ABI type can map to multiple layouts, and have Map<String, List<MemoryLayout>>. It seemed a bit on the convoluted side of things (esp. given that we can get away without it, at least in this patch), but if you think that would be more robust, I can change that.

@PaulSandoz
Copy link
Member

Here's the crux of what i am wondering about. Can we specify native linker support for a subset of the System V Application Binary Interface (e.g., LP64 and ILP32 programming models for all non-optional scalar types, sequences of, and groups of) such that a developer can write code using the FFM API and it will work across all JDK implementations supporting that native linker?

AFAICT the closest we have to that is the table in the Linker doc, and that table references C type names. Perhaps we can use C type name as the ABI type name for the System V Application Binary Interface? (literally copy the name used in Figure 3.1 C column of the ABI specification).

Then can do we the same and specify the equivalent native linker support for ABIs of Windows 64/32 and ARM?

@mcimadamore
Copy link
Contributor Author

mcimadamore commented May 18, 2023

Here's the crux of what i am wondering about. Can we specify native linker support for a subset of the System V Application Binary Interface (e.g., LP64 and ILP32 programming models for all non-optional scalar types, sequences of, and groups of) such that a developer can write code using the FFM API and it will work across all JDK implementations supporting that native linker?

AFAICT the closest we have to that is the table in the Linker doc, and that table references C type names. Perhaps we can use C type name as the ABI type name for the System V Application Binary Interface? (literally copy the name used in Figure 3.1 C column of the ABI specification).

Then can do we the same and specify the equivalent native linker support for ABIs of Windows 64/32 and ARM?

Consider that, at the time of writing we support (or might support soon):

  • Sysv (Linux and MacOS)
  • Aarch64(Linux, MacOs, Windows) - these have actually some ABI differences (e.g. variadic calls)
  • PowerPC (which might come in big/little endian flavors)
  • RiscV

That's quite a lot of ABIs and tables to have.

Also, if we wanted to tighten up the spec a little bit, what the user cares about is some minimum guarantees about the supported ABI types across platforms. E.g. you don't want a table-per-ABI, precisely because you want to know (I think): "if I call linker.canonicalLayout().get("int")" is this call guaranteed to succeed?

So, pulling on the string, IMHO we should:

  • define which common subset of C types we support (e.g. list the C type names) - probably the one we use now is a good set;
  • Give an example, on x64, of how these types might be realized using layouts (e.g. current table)

More pulling, the char16_t type is probably not defined on all ABIs (e.g. I can't find it here [1]). Which seems to suggest that perhaps canonical layout should just return mapping from string to lists of layouts, even if inconvenient?

[1] - https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#arm-c-and-c-language-mappings

@PaulSandoz
Copy link
Member

I agree focusing on a subset of C types is the way to go. That avoids the unnecessary verbosity of many tables, and we can enumerate the types differing by data model (e.g., LP64 and ILP32).

As a developer i would like to know for all C-based native linkers (which is all native linkers? what else would they be based on?) if:

  1. the C int type etc is supported; and
  2. whether i can use the JAVA_INT layout in my function descriptors, and know it corresponds to a C int.

It seems obvious that i should be able to but AFAICT the specification is more example based, so it's not clear to me if different Java implementations can deviate in such behaviour.

Requiring the use of the C type names in the canonical mapping does help, because then i can more directly ask the C-based linker "Hey what's your canonical layout for the C char type?" .

I don't see char16_t defined either and i believe that type is an alias for uint_least16_t, which is a type of at least 16 bits. We need to use a C type of unsigned short or uint16_t:

  • Java ValueLayout.JAVA_SHORT <-> C short
  • Java ValueLayout.JAVA_CHAR <-> C unsigned short

?

For canonical type names we may want to prefer types specified by the C language over those defined by the C library in standard headers?

@mcimadamore
Copy link
Contributor Author

I don't see char16_t defined either and i believe that type is an alias for uint_least16_t, which is a type of at least 16 bits. We need to use a C type of unsigned short or uint16_t:

* Java `ValueLayout.JAVA_SHORT` <-> C `short`

* Java `ValueLayout.JAVA_CHAR` <-> C `unsigned short`

?
Good point on "unsigned short".

For canonical type names we may want to prefer types specified by the C language over those defined by the C library in standard headers?

Yes, I think better to stick with standard C types - IMHO with the exception of size_t which is very ubiquitous.

@mcimadamore
Copy link
Contributor Author

I've addressed the comments and tweaked the javadoc. Now we list all the C types that a linker is guaranteed to support (and state that the canonical layout associated with those types can vary depending on data model). Then we roll in the usual/more concrete table for Linux/x64.

@mcimadamore
Copy link
Contributor Author

* via the generation of {@linkplain #upcallStub(MethodHandle, FunctionDescriptor, Arena, Option...) upcall stubs}.</li>
* </ul>
* A linker provides a way to lookup up the <em>canonical layouts</em> associated with the data types used by the ABI.
* For example, the canonical layout for the C {@code size_t} type is equal to {@link ValueLayout#JAVA_LONG}. The canonical
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* For example, the canonical layout for the C {@code size_t} type is equal to {@link ValueLayout#JAVA_LONG}. The canonical
* For example, the canonical layout for the C {@code size_t} type is equal to {@link ValueLayout#JAVA_LONG} on 64-bit platforms. The canonical

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct in calling this out. I think this should be spelled out more (similarly to what we do for default lookup) since we're still in the "general" linker section. E.g.

A linker provides a way to lookup up the <em>canonical layouts</em> associated with the data types used by the ABI.
For example, a linker implementing the C ABI might chose to provide a canonical layout for the C {@code size_t} type. On 64-bit platforms, this canonical layout might be equal to {@link ValueLayout#JAVA_LONG}. The canonical
layouts supported by a linker are exposed via the {@link #canonicalLayouts()} method, which returns a map from
ABI type names to canonical layouts.

Copy link
Member

@PaulSandoz PaulSandoz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This look much better. Can we strengthen the specification of canonicalLayouts in accordance with the class specification

@mcimadamore
Copy link
Contributor Author

This look much better. Can we strengthen the specification of canonicalLayouts in accordance with the class specification

We can't do more in that method javadoc, think, as that has to be general enough for all linkers. I think the rules set up in that method javadoc are good - e.g. the set of layouts should be stable (both in terms of names and layout types).

What we can do is to sprinkle some wording in the nativeLinker factory - e.g. the native linker is guaranteed to provide canonical layouts for basic C types <link to the class section>.

@PaulSandoz
Copy link
Member

This look much better. Can we strengthen the specification of canonicalLayouts in accordance with the class specification

We can't do more in that method javadoc, think, as that has to be general enough for all linkers. I think the rules set up in that method javadoc are good - e.g. the set of layouts should be stable (both in terms of names and layout types).

What we can do is to sprinkle some wording in the nativeLinker factory - e.g. the native linker is guaranteed to provide canonical layouts for basic C types <link to the class section>.

Yes, that's better.

SymbolLookup defaultLookup();

/**
* {@return a mapping between the names of data types used by the ABI implemented by this linker and their
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should state we return "an unmodifiable mapping".

Copy link
Contributor

@minborg minborg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Are there any additional types we might consider apart from the basic ones and size_t? Maybe one for an address pointing at errno?

@mcimadamore
Copy link
Contributor Author

LGTM. Are there any additional types we might consider apart from the basic ones and size_t? Maybe one for an address pointing at errno?

I think there is a certain gravitational pull towards keeping the set of guaranteed canonical layouts as minimal as possible. In that sense pointer to errno seems not to meet the bar IMHO. (also note that one could always ask the captureStateLayout, and figure out how to express the errno type from there).

@mcimadamore
Copy link
Contributor Author

@PaulSandoz after thinking some more, it seems a bit ad-hoc to guarantee a canonical for "unsigned short", but not for other unsigned types? Possible alternatives (beside keeping what we have in this PR):

  • guarantee also all the other unsigned types (e.g. char, int, long and long long)
  • do not guarantee unsigned short - but provide a mapping for it anyways
  • do not guarantee unsigned short, and do not provide a mapping for it - this means that JAVA_CHAR will not be usable when linking

What do you think?

@PaulSandoz
Copy link
Member

On further reflection i think mapping C unsigned short only to Java char is a misleading. Although Java char is an integral type is not really intended to be used generally as an unsigned 16-bit integer, so arguably Java char is not as useful a carrier type for native interoperation as Java short might be even though it is signed.
Thus i am inclined to remove that mapping.

What if we say something to the effect of:

unless explicitly declared in the canonical layouts C's unsigned integral types are mapped to the layouts associated with the required C's signed integral types of the same bit sizes.

?

Arguably C unsigned short could map to carriers Java short or Java char, but i am inclined to say the user should cast between Java short to char in such cases.

FWIW i checked what the FFM API and jextract does today and it maps unsigned C types to signed Java types.

@mcimadamore
Copy link
Contributor Author

On further reflection i think mapping C unsigned short only to Java char is a misleading. Although Java char is an integral type is not really intended to be used generally as an unsigned 16-bit integer, so arguably Java char is not as useful a carrier type for native interoperation as Java short might be even though it is signed. Thus i am inclined to remove that mapping.

What if we say something to the effect of:

unless explicitly declared in the canonical layouts C's unsigned integral types are mapped to the layouts associated with the required C's signed integral types of the same bit sizes.

?

Arguably C unsigned short could map to carriers Java short or Java char, but i am inclined to say the user should cast between Java short to char in such cases.

FWIW i checked what the FFM API and jextract does today and it maps unsigned C types to signed Java types.

On further reflection i think mapping C unsigned short only to Java char is a misleading. Although Java char is an integral type is not really intended to be used generally as an unsigned 16-bit integer, so arguably Java char is not as useful a carrier type for native interoperation as Java short might be even though it is signed. Thus i am inclined to remove that mapping.

What if we say something to the effect of:

unless explicitly declared in the canonical layouts C's unsigned integral types are mapped to the layouts associated with the required C's signed integral types of the same bit sizes.

?

Arguably C unsigned short could map to carriers Java short or Java char, but i am inclined to say the user should cast between Java short to char in such cases.

FWIW i checked what the FFM API and jextract does today and it maps unsigned C types to signed Java types.

I tend to agree with your conclusion. And I confirm that we do not use "char" anywhere in jextract. The only "problem" with that approach is that if we go down that path, JAVA_CHAR is no longer a canonical type, so users cannot mention it in function descriptors. Apart from requiring few test updates, I don't see many other problems with it - if one really really wanted the result of a native call to be converted to char, a MH adapter can be used. Effectively, what you suggest amount at saying: we do have a JAVA_CHAR layout, which is mostly there for Java interop. But a native linker (which only cares about native interop) doesn't really care much about that. Does that sound good?

@mcimadamore
Copy link
Contributor Author

Arguably C unsigned short could map to carriers Java short or Java char, but i am inclined to say the user should cast between Java short to char in such cases.

(Another advantage of this is that, should we get proper unsigned carriers from Valhalla one day, native linkers could be updated to support those en masse - not just for short)

@PaulSandoz
Copy link
Member

Effectively, what you suggest amount at saying: we do have a JAVA_CHAR layout, which is mostly there for Java interop. But a native linker (which only cares about native interop) doesn't really care much about that. Does that sound good?

Yes.

Beef up javadoc
@openjdk
Copy link

openjdk bot commented May 24, 2023

@mcimadamore this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout linker_types
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added the merge-conflict Pull request has merge conflict with target branch label May 24, 2023
@mcimadamore
Copy link
Contributor Author

@openjdk openjdk bot removed the merge-conflict Pull request has merge conflict with target branch label May 24, 2023
{ JAVA_BYTE, byteToInt((byte) 42), BYTE_HOB_MASK, BYTE_TO_INT, SAVE_BYTE_AS_INT },
{ JAVA_SHORT, shortToInt((short) 42), SHORT_HOB_MASK, SHORT_TO_INT, SAVE_SHORT_AS_INT },
{ JAVA_CHAR, charToInt('a'), CHAR_HOB_MASK, CHAR_TO_INT, SAVE_CHAR_AS_INT }
{ JAVA_SHORT, shortToInt((short) 42), SHORT_HOB_MASK, SHORT_TO_INT, SAVE_SHORT_AS_INT }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since arrays support trailing commas, this can use that:

Suggested change
{ JAVA_SHORT, shortToInt((short) 42), SHORT_HOB_MASK, SHORT_TO_INT, SAVE_SHORT_AS_INT }
{ JAVA_SHORT, shortToInt((short) 42), SHORT_HOB_MASK, SHORT_TO_INT, SAVE_SHORT_AS_INT },

Copy link
Member

@PaulSandoz PaulSandoz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks reasonable. I say let's soak it and then see if we need refine based on feedback and further research (e.g., if we find we need to declare multiple layouts per ABI type for extensibility reasons).

@mcimadamore
Copy link
Contributor Author

mcimadamore commented May 26, 2023

Thanks for taking the time to review. After some more consideration, I will withdraw this PR. While this API is largely not problematic, we need to make sure that this API fits with how the FFM API will be evolved to support other types besides the C basic types we know and love (e.g. long double, vectors, complex types). So adding this API point now seems premature.

I will bring over relevant javadoc improvements in the other javadoc PR I have open: https://git.openjdk.org/jdk/pull/14098

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core-libs core-libs-dev@openjdk.org csr Pull request needs approved CSR before integration rfr Pull request is ready for review

Development

Successfully merging this pull request may close these issues.

4 participants