8308293: A linker should expose the layouts it supports #14037
mcimadamore wants to merge 8 commits into openjdk:master
Add char type
SymbolLookup defaultLookup();

/**
 * {@return a mapping between the names of data types used by the ABI implemented by this linker and their
Much of the verbiage here is carried over from defaultLookup as we need to do the usual dance of saying that the set of returned types is not specified, but should be (a) sensible and (b) stable.
   * <p>
-  * All the native linker implementations limit the function descriptors that they support to those that contain
-  * only so-called <em>canonical</em> layouts. A canonical layout has the following characteristics:
+  * All the native linker implementations can only operate on a subset of memory layouts, called <em>supported layouts</em>.
I revamped this section as I realized that what we had did not cover things in the recursive case - e.g. a struct layout is only supported if it contains other supported layouts. This new text should hopefully capture everything in a more mathematical form.
-  * <li>It does not contain padding other than what is strictly required to align its non-padding layout elements,
-  * or to satisfy constraint 3</li>
+  * <li>the alignment constraint of {@code G} is set to its <a href="MemoryLayout.html#layout-align">natural alignment</a>;</li>
+  * <li>the size of {@code G} is a multiple of its alignment constraint;</li>
Do you think this is a constraint that will hold across all linker implementations?
All "native linkers" as the text says. Other linkers (e.g. not for C) might obey other rules. These rules are basically constraining struct layouts to what can come out of a native compiler in the absence of pragma pack directives.
Ah, i see now, i missed the relevance of that term, introduced earlier in non-modified text.
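The recursive rule discussed in this thread (a struct is supported only if its members are supported, it uses its natural alignment, its size is a multiple of that alignment, and it carries no padding beyond what alignment requires) can be sketched with a toy model. This is not the FFM API: `Layout`, `Value`, `Padding`, `Struct` and `isSupported` are hypothetical names, and the checks only approximate the javadoc conditions quoted above.

```java
import java.util.List;

public class SupportedLayoutSketch {
    // Hypothetical stand-ins for memory layouts; these are NOT the java.lang.foreign types.
    interface Layout {}
    record Value(long size, long align) implements Layout {}
    record Padding(long size) implements Layout {}
    record Struct(List<Layout> members, long align) implements Layout {}

    static long align(Layout l) {
        if (l instanceof Value v) return v.align();
        if (l instanceof Struct s) return s.align();
        return 1; // padding has no alignment requirement of its own
    }

    static long size(Layout l) {
        if (l instanceof Value v) return v.size();
        if (l instanceof Padding p) return p.size();
        long total = 0;
        for (Layout m : ((Struct) l).members()) total += size(m);
        return total;
    }

    static long alignUp(long offset, long align) {
        return (offset + align - 1) / align * align;
    }

    // Assumption of this sketch: a value layout is "naturally aligned" when align == size.
    static boolean isSupported(Layout l) {
        if (l instanceof Value v) return v.size() == v.align();
        if (!(l instanceof Struct s)) return false; // bare padding is never supported
        List<Layout> ms = s.members();
        long natural = 1;
        for (Layout m : ms) natural = Math.max(natural, align(m));
        if (s.align() != natural) return false;
        long offset = 0;
        for (int i = 0; i < ms.size(); i++) {
            Layout m = ms.get(i);
            if (m instanceof Padding p) {
                // Padding must be exactly what the next member (or the struct's end) requires.
                long target = (i + 1 < ms.size()) ? align(ms.get(i + 1)) : natural;
                if (p.size() == 0 || alignUp(offset, target) - offset != p.size()) return false;
            } else if (!isSupported(m) || offset % align(m) != 0) {
                return false; // recursive case: every member must itself be supported and aligned
            }
            offset += size(m);
        }
        return offset % natural == 0;
    }

    public static void main(String[] args) {
        Layout padded = new Struct(List.of(new Value(4, 4), new Padding(4), new Value(8, 8)), 8);
        Layout packed = new Struct(List.of(new Value(4, 4), new Value(8, 8)), 8);
        System.out.println(isSupported(padded));
        System.out.println(isSupported(packed));
    }
}
```

In this model the `packed` struct plays the role of a layout produced under a pack pragma: its second member sits at a misaligned offset, so the check rejects it.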
I wonder if we need to be stronger on the compatibility requirements for the supported canonical layouts. It seems we can make stronger claims than symbols made available by the defaultLookup because the ABI supported by the native linker will make strong (specification) claims.
So maybe this comes down to the linker supporting a subset of the ABI's data types, and that subset might increase over time, but never decrease? In this respect could we present a table for each supported linker ABI listing the ABI type and its canonical layout type? (in practice it might just be one table with noted adjustments.)
Then there is the possibility that a linker might change the layout corresponding to a data type. Ideally the linker would support both the prior layout and the new layout. I don't think this will arise with the current set of supported ABI data types, but it might if we choose to add support for, say, the optional __int128 data type prior to the platform adding support for a primitive value class of Int128?
I see what you mean and I'm not sure about this. On the one hand, having a set of "trusted" type names would be handy - but I don't know how much commitment we want to put in there? I'm also a bit skeptical about listing all possible ABIs, since I suspect the set of supported platforms will change quickly. Is what you are after some kind of guarantee of "at least these type names will be available" ? As for a linker possibly having multiple different layouts for the same ABI type, that is true, and, in a way, already the case with ValueLayout.OfChar/ValueLayout.OfShort. I worked around that by using different type names - e.g. For more exotic types which might be modeled initially opaquely with MemorySegment, and later on with some other ValueLayout.OfFooBar, I believe we'd need to provide a way to go from the opaque layout to the less opaque one. The other option would be to admit that a single ABI type can map to multiple layouts, and have
Here's the crux of what i am wondering about. Can we specify native linker support for a subset of the System V Application Binary Interface (e.g., LP64 and ILP32 programming models for all non-optional scalar types, sequences of, and groups of) such that a developer can write code using the FFM API and it will work across all JDK implementations supporting that native linker? AFAICT the closest we have to that is the table in the Linker doc, and that table references C type names. Perhaps we can use the C type name as the ABI type name for the System V Application Binary Interface? (literally copy the name used in the C column of Figure 3.1 of the ABI specification). Then can we do the same and specify the equivalent native linker support for the ABIs of Windows 64/32 and ARM?
Consider that, at the time of writing we support (or might support soon):
That's quite a lot of ABIs and tables to have. Also, if we wanted to tighten up the spec a little bit, what the user cares about is some minimum guarantees about the supported ABI types across platforms. E.g. you don't want a table-per-ABI, precisely because you want to know (I think): "if I call So, pulling on the string, IMHO we should:
More pulling, the
[1] - https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#arm-c-and-c-language-mappings
I agree focusing on a subset of C types is the way to go. That avoids the unnecessary verbosity of many tables, and we can enumerate the types differing by data model (e.g., LP64 and ILP32). As a developer i would like to know for all C-based native linkers (which is all native linkers? what else would they be based on?) if:
It seems obvious that i should be able to but AFAICT the specification is more example based, so it's not clear to me if different Java implementations can deviate in such behaviour. Requiring the use of the C type names in the canonical mapping does help, because then i can more directly ask the C-based linker "Hey what's your canonical layout for the C I don't see
? For canonical type names we may want to prefer types specified by the C language over those defined by the C library in standard headers?
Yes, I think it is better to stick with standard C types - IMHO with the exception of size_t, which is ubiquitous.
I've addressed the comments and tweaked the javadoc. Now we list all the C types that a linker is guaranteed to support (and state that the canonical layout associated with those types can vary depending on data model). Then we roll in the usual/more concrete table for Linux/x64.
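To illustrate how one fixed list of C type names can map to different layouts depending on the data model, here is a minimal sketch. It models a "layout" as just a byte size, the set of names is illustrative rather than the list from the PR, and `DataModel` and `canonicalSizes` are hypothetical names. The sizes are the conventional ones for the ILP32, LP64 and LLP64 data models.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class DataModelSketch {
    // Hypothetical enum naming three common C data models.
    enum DataModel { ILP32, LP64, LLP64 }

    // Returns C type name -> size in bytes; a stand-in for "canonical layout" in this sketch.
    static Map<String, Integer> canonicalSizes(DataModel model) {
        Map<String, Integer> sizes = new LinkedHashMap<>();
        // These sizes are the same in all three data models.
        sizes.put("char", 1);
        sizes.put("short", 2);
        sizes.put("int", 4);
        sizes.put("float", 4);
        sizes.put("long long", 8);
        sizes.put("double", 8);
        // These vary: only LP64 widens long, and only ILP32 has 4-byte pointers.
        sizes.put("long", model == DataModel.LP64 ? 8 : 4);
        int pointerSize = model == DataModel.ILP32 ? 4 : 8;
        sizes.put("size_t", pointerSize);
        sizes.put("void*", pointerSize);
        return Collections.unmodifiableMap(sizes);
    }

    public static void main(String[] args) {
        for (DataModel model : DataModel.values()) {
            System.out.println(model + " -> " + canonicalSizes(model));
        }
    }
}
```

The names stay stable across platforms while the values differ, which is the kind of guarantee discussed above: a caller asks for `long` or `size_t` by name and lets the linker decide the width.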
Updated javadoc and specdiffs:
 * via the generation of {@linkplain #upcallStub(MethodHandle, FunctionDescriptor, Arena, Option...) upcall stubs}.</li>
 * </ul>
 * A linker provides a way to lookup up the <em>canonical layouts</em> associated with the data types used by the ABI.
 * For example, the canonical layout for the C {@code size_t} type is equal to {@link ValueLayout#JAVA_LONG}. The canonical
-  * For example, the canonical layout for the C {@code size_t} type is equal to {@link ValueLayout#JAVA_LONG}. The canonical
+  * For example, the canonical layout for the C {@code size_t} type is equal to {@link ValueLayout#JAVA_LONG} on 64-bit platforms. The canonical
?
You are correct in calling this out. I think this should be spelled out more (similarly to what we do for default lookup) since we're still in the "general" linker section. E.g.
A linker provides a way to look up the <em>canonical layouts</em> associated with the data types used by the ABI.
For example, a linker implementing the C ABI might choose to provide a canonical layout for the C {@code size_t} type. On 64-bit platforms, this canonical layout might be equal to {@link ValueLayout#JAVA_LONG}. The canonical
layouts supported by a linker are exposed via the {@link #canonicalLayouts()} method, which returns a map from
ABI type names to canonical layouts.
PaulSandoz left a comment
This looks much better. Can we strengthen the specification of canonicalLayouts in accordance with the class specification?
We can't do more in that method javadoc, I think, as that has to be general enough for all linkers. I think the rules set up in that method javadoc are good - e.g. the set of layouts should be stable (both in terms of names and layout types). What we can do is to sprinkle some wording in the
Yes, that's better.
SymbolLookup defaultLookup();

/**
 * {@return a mapping between the names of data types used by the ABI implemented by this linker and their
I think we should state we return "an unmodifiable mapping".
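As a minimal sketch of what "an unmodifiable mapping" would mean for callers, the following uses a plain `Map<String, Integer>` as a stand-in for the name-to-layout map. The class, its field and the `rejectsWrites` helper are hypothetical; only `Map.copyOf` and its documented behaviour come from the JDK.

```java
import java.util.HashMap;
import java.util.Map;

public class UnmodifiableMappingSketch {
    // Hypothetical internal table: type name -> size in bytes (stand-in for a layout).
    private final Map<String, Integer> layouts;

    UnmodifiableMappingSketch() {
        Map<String, Integer> mutable = new HashMap<>();
        mutable.put("int", 4);
        mutable.put("double", 8);
        // Map.copyOf returns an unmodifiable snapshot, so callers cannot alter the table.
        this.layouts = Map.copyOf(mutable);
    }

    // Mirrors the shape of the accessor under review; the name is reused for illustration only.
    Map<String, Integer> canonicalLayouts() {
        return layouts;
    }

    // Returns true if an attempted write is rejected with UnsupportedOperationException.
    static boolean rejectsWrites(Map<String, Integer> map) {
        try {
            map.put("long double", 16);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        UnmodifiableMappingSketch sketch = new UnmodifiableMappingSketch();
        System.out.println(rejectsWrites(sketch.canonicalLayouts()));
    }
}
```

Stating the property in the `@return` text lets callers rely on this behaviour instead of defensively copying the map.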
minborg left a comment
LGTM. Are there any additional types we might consider apart from the basic ones and size_t? Maybe one for an address pointing at errno?
I think there is a certain gravitational pull towards keeping the set of guaranteed canonical layouts as minimal as possible. In that sense pointer to errno seems not to meet the bar IMHO. (also note that one could always ask the captureStateLayout, and figure out how to express the errno type from there).
@PaulSandoz after thinking some more, it seems a bit ad-hoc to guarantee a canonical for "unsigned short", but not for other unsigned types? Possible alternatives (beside keeping what we have in this PR):
What do you think?
On further reflection i think mapping C What if we say something to the effect of:
? Arguably C FWIW i checked what the FFM API and jextract do today and they map unsigned C types to signed Java types.
I tend to agree with your conclusion. And I confirm that we do not use "char" anywhere in jextract. The only "problem" with that approach is that if we go down that path, JAVA_CHAR is no longer a canonical type, so users cannot mention it in function descriptors. Apart from requiring a few test updates, I don't see many other problems with it - if one really really wanted the result of a native call to be converted to
(Another advantage of this is that, should we get proper unsigned carriers from Valhalla one day, native linkers could be updated to support those en masse - not just for
Yes.
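If unsigned C types are surfaced as signed Java types, a caller who wants the unsigned reading of a 16-bit value can still recover it on the Java side. A minimal sketch using only plain Java conversions (nothing here is FFM API, and the method names are hypothetical):

```java
public class UnsignedShortSketch {
    // Reinterprets the 16 bits of a signed short as a Java char (an unsigned 16-bit value).
    static char asChar(short raw) {
        return (char) raw;
    }

    // Widens the same 16 bits to an int without sign extension.
    static int asUnsignedInt(short raw) {
        return Short.toUnsignedInt(raw);
    }

    public static void main(String[] args) {
        short raw = (short) 0xFFFF; // what a signed carrier would hold for the C value 65535
        System.out.println(raw);                // prints -1
        System.out.println((int) asChar(raw));  // prints 65535
        System.out.println(asUnsignedInt(raw)); // prints 65535
    }
}
```

The conversion is a bit-for-bit reinterpretation, so no information is lost by using the signed carrier at the linker boundary.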
Beef up javadoc
Updated javadoc and specdiffs (v3):
  { JAVA_BYTE, byteToInt((byte) 42), BYTE_HOB_MASK, BYTE_TO_INT, SAVE_BYTE_AS_INT },
- { JAVA_SHORT, shortToInt((short) 42), SHORT_HOB_MASK, SHORT_TO_INT, SAVE_SHORT_AS_INT },
- { JAVA_CHAR, charToInt('a'), CHAR_HOB_MASK, CHAR_TO_INT, SAVE_CHAR_AS_INT }
+ { JAVA_SHORT, shortToInt((short) 42), SHORT_HOB_MASK, SHORT_TO_INT, SAVE_SHORT_AS_INT }
Since arrays support trailing commas, this can use that:
- { JAVA_SHORT, shortToInt((short) 42), SHORT_HOB_MASK, SHORT_TO_INT, SAVE_SHORT_AS_INT }
+ { JAVA_SHORT, shortToInt((short) 42), SHORT_HOB_MASK, SHORT_TO_INT, SAVE_SHORT_AS_INT },
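For reference, a minimal sketch of the trailing-comma point: Java array initializers accept a comma after the last element, so removing or appending a row changes only one line of the diff. The table contents below are hypothetical and unrelated to the test data in this PR.

```java
public class TrailingCommaSketch {
    // Each row: a type name and its width in bits (illustrative data only).
    static final Object[][] CASES = {
        { "byte", 8 },
        { "short", 16 }, // the trailing comma after the last row is legal and adds no element
    };

    public static void main(String[] args) {
        System.out.println(CASES.length);
    }
}
```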
PaulSandoz left a comment
This all looks reasonable. I say let's soak it and then see if we need to refine it based on feedback and further research (e.g., if we find we need to declare multiple layouts per ABI type for extensibility reasons).
Thanks for taking the time to review. After some more consideration, I will withdraw this PR. While this API is largely not problematic, we need to make sure that this API fits with how the FFM API will be evolved to support other types besides the C basic types we know and love (e.g. I will bring over relevant javadoc improvements in the other javadoc PR I have open: https://git.openjdk.org/jdk/pull/14098
This patch adds an instance method on Linker, namely Linker::canonicalLayouts, which returns all the layouts known by the linker as implementing some ABI type. For instance, if I call this on my machine (Linux/x64) I get this:
This can be useful to discover the ABI types supported by a linker implementation, as well as to add support, in the future, for more exotic (and platform-dependent) linker types, such as long double or complex long.