8314949: linux PPC64 Big Endian: Implementation of Foreign Function & Memory API #15417

Closed

Conversation

TheRealMDoerr
Contributor

@TheRealMDoerr TheRealMDoerr commented Aug 24, 2023

I've found a way to solve the remaining FFI problem on linux PPC64 Big Endian. Large structs (>8 Bytes) which are passed in registers or on stack require shifting the Bytes in the last slot if the size is not a multiple of 8. This PR adds the required functionality to the Java code.

Please review and provide feedback. There may be better ways to implement it. I just found one which works and makes the tests pass:

Test summary
==============================
   TEST                                              TOTAL  PASS  FAIL ERROR   
   jtreg:test/jdk/java/foreign                          88    88     0     0   

Note: This PR should be considered as preparation work for AIX which also uses ABIv1.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8314949: linux PPC64 Big Endian: Implementation of Foreign Function & Memory API (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/15417/head:pull/15417
$ git checkout pull/15417

Update a local copy of the PR:
$ git checkout pull/15417
$ git pull https://git.openjdk.org/jdk.git pull/15417/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 15417

View PR using the GUI difftool:
$ git pr show -t 15417

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/15417.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Aug 24, 2023

👋 Welcome back mdoerr! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Aug 24, 2023
@openjdk

openjdk bot commented Aug 24, 2023

@TheRealMDoerr The following labels will be automatically applied to this pull request:

  • core-libs
  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added hotspot-compiler hotspot-compiler-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Aug 24, 2023
@mlbridge

mlbridge bot commented Aug 24, 2023

/**
 * PPC64 CallArranger specialized for ABI v1.
 */
public class ABIv1CallArranger extends CallArranger {
Contributor

Wouldn't it be more natural for CallArranger to have an abstract method (or even a kind() accessor for the different kinds of ABI supported) and then have these specialized subclasses return the correct kind? It seems to me that setting the useXYZAbi flag using an instanceof test is a little dirty.
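
A minimal sketch of the accessor-based alternative suggested here, assuming a hypothetical `AbiKind` enum and simplified arranger classes. None of these names are taken from the JDK sources, and the PR may well have resolved the point differently.

```java
final class AbiKindSketch {
    // Hypothetical enum naming the ABI variants under discussion.
    enum AbiKind { V1, V2 }

    // Simplified stand-in for the shared base class; not the JDK's CallArranger.
    abstract static class Arranger {
        // Each specialized subclass reports which ABI it implements.
        abstract AbiKind kind();

        // The base class derives its flag from kind() instead of an instanceof test.
        boolean useAbiV2() {
            return kind() == AbiKind.V2;
        }
    }

    static final class V1Arranger extends Arranger {
        @Override
        AbiKind kind() {
            return AbiKind.V1;
        }
    }

    static final class V2Arranger extends Arranger {
        @Override
        AbiKind kind() {
            return AbiKind.V2;
        }
    }
}
```

With this shape, adding a further ABI variant means adding an enum constant and a subclass, without touching any type checks in the base class.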

Contributor Author

I had something like that, but another reviewer didn't like it, either. Originally, I had thought that the v1 and v2 CallArrangers would get more content, but they're still empty. Would it be better to remove these special CallArrangers and distinguish in the base class?

Contributor

Contributor Author

Makes sense. I've changed it with the 2nd commit.

 * Positive [shiftAmount] converts to long if needed and shifts left.
 * Negative [shiftAmount] shifts right and converts to int if needed.
 */
record ShiftLeft(int shiftAmount, Class<?> type) implements Binding {
Contributor

Given the situation you are facing, perhaps adding the new binding here is unavoidable. Let's wait to hear from @JornVernee. In the meantime, can you point me to a document which explains this behavior? I'm curious and I'd like to know more :-)

Contributor

Maybe I'm starting to see it - it's not a special rule, as much as it is a consequence of the endianness. E.g. if you have a struct that is 64 + 32 bits, you can store the first 64 bits as a long. Then, there's an issue as we have to fill another long, but we have only 32 bits of value. Is the problem that if we just copy the value into the long word "as is", it will be stored in the "wrong" 32 bits? So the shift takes care of that, I guess?

Contributor

If my assumption above is correct, then maybe another way to solve the problem would be, instead of adding a new shift binding, to generalize the VM store binding we have so that it can write a smaller value into a bigger storage, at an offset. Correct?

Contributor Author

The ABI says: "An aggregate or union smaller than one doubleword in size is padded so that it appears in the least significant bits of the doubleword. All others are padded, if necessary, at their tail." [https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html#PARAM-PASS].
I have written examples which pass 9 and 15 Bytes.
In the first case, we need to get 0x0001020304050607 in the first argument and 0x08XXXXXXXXXXXXXX into the second argument (X is "don't care"). Shift amount is 7.
In the second case, we need to get 0x0001020304050607 in the first argument and 0x08090a0b0c0d0eXX into the second argument. Shift amount is 1.
In other words, we need shift amounts between 1 and 7. Stack slots and registers are always 64 bit on PPC64.
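
The two examples above can be reproduced with a small sketch. It assumes a hypothetical helper `lastSlot` that builds the 64-bit value for the final, partially filled doubleword, and it fills the "don't care" bytes with zeros. This only illustrates the arithmetic and is not the code from the PR.

```java
final class TailPaddingSketch {
    // Number of Bytes the last chunk must be shifted left (1..7),
    // assuming the struct size is not a multiple of 8.
    static int shiftAmountBytes(int structSize) {
        return 8 - (structSize % 8);
    }

    // Builds the 64-bit value for the last, partially filled slot:
    // the remaining Bytes are left-justified and the tail is zero.
    static long lastSlot(byte[] struct) {
        int remaining = struct.length % 8;
        if (remaining == 0) {
            throw new IllegalArgumentException("size is a multiple of 8, no shift needed");
        }
        int start = struct.length - remaining;
        long value = 0;
        for (int i = 0; i < remaining; i++) {
            // Big-endian accumulation: earlier Bytes are more significant.
            value = (value << 8) | (struct[start + i] & 0xFFL);
        }
        return value << (shiftAmountBytes(struct.length) * 8);
    }

    // Test data: Bytes 0x00, 0x01, ..., size - 1.
    static byte[] sequence(int size) {
        byte[] bytes = new byte[size];
        for (int i = 0; i < size; i++) {
            bytes[i] = (byte) i;
        }
        return bytes;
    }
}
```

For `sequence(9)` the helper yields `0x0800000000000000L` with a shift amount of 7 Bytes, and for `sequence(15)` it yields `0x08090a0b0c0d0e00L` with a shift amount of 1 Byte.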

Contributor

Got it - I found these representations:

https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi-1.7.html#BYTEORDER

Very helpful. So you have e.g. a short value (loaded from somewhere) and you have to store it in a doubleword. Now, if you just stored it at offset 0, you would write bits 0-15, which are the "most" significant bits in big-endian representation. So, it's backwards. I believe FFM will take care of endianness, so that bits 0-7 and 8-15 will be "swapped" when writing into the doubleword (right?) but their base offset (0) is still off, as they should really start at offset 48. Hence the shift.

Member

@JornVernee JornVernee Sep 4, 2023

After looking for a while, I think having the new binding operator is needed. e.g. for stores: we load a 32 bit value from the end of a struct, then the low-order bits of the value need to be padded with zeros to get a 64 bit register value, leaving the original 32 bit value in the high order bits. This can't be handled by the current cast operator. e.g. if we had an int -> long cast conversion, then in the resulting value the low-order bits would be occupied by the 32 bit value, which is incorrect.
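
The difference between a plain widening cast and a cast followed by a shift can be shown in plain Java. The method names below are hypothetical and only model the arithmetic, not the binding operators themselves.

```java
final class CastVersusShiftSketch {
    // Plain int -> long widening (zero-extended here):
    // the 32 bit value ends up in the low-order half.
    static long widenOnly(int tail) {
        return tail & 0xFFFF_FFFFL;
    }

    // Widening followed by a left shift: the 32 bit value ends up in the
    // high-order half and the low-order half is zero.
    static long widenAndShift(int tail) {
        return ((long) tail) << 32;
    }

    // Opposite direction: shift right, then narrow back to int.
    static int shiftAndNarrow(long register) {
        return (int) (register >>> 32);
    }
}
```

For the input `0x11223344`, `widenOnly` gives `0x0000000011223344L` while `widenAndShift` gives `0x1122334400000000L`, which is the layout described above.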

Contributor

@mcimadamore mcimadamore left a comment

Overall these changes look good - as commented I'd like to learn a bit more of the underlying ABI, to get a sense of whether adding a new binding is ok. But overall it's great to see support for a big-endian ABI - apart from the linker, I am pleased to see that you did not encounter too many issues in the memory-side of the FFM API.

@Override
public void interpret(Deque<Object> stack, StoreFunc storeFunc,
                      LoadFunc loadFunc, SegmentAllocator allocator) {
    if (shiftAmount > 0) {
Contributor

Why do we assume we can only deal with ints or longs?

Contributor Author

I have inserted casts into public Binding.Builder shiftLeft(int shiftAmount, Class<?> type) (similar to other bindings). The VM handles integral types smaller than int like int and uses 4 Bytes for arithmetic operations.

Contributor

Ah I see that now - it's done in the binding "builder".

@TheRealMDoerr
Contributor Author

@mcimadamore: Thanks for your feedback! Jorn and I had already resolved the other issues when we worked on the linux little endian part. It already contains some ABIv1 code. Note that we already have one big endian platform: s390. But that one doesn't pass structs >8 Bytes in registers.

Member

@JornVernee JornVernee left a comment

Sorry for the delay, I've been on vacation.

Comment on lines 395 to 398
if (shiftAmount > 0 && isSubIntType(type)) {
    bindings.add(Binding.cast(type, int.class));
    type = int.class;
}
Member

Part of the casts are handled here with explicit cast bindings, but the widening from int -> long, and narrowing from long -> int are handled implicitly as part of the ShiftLeft implementation. I'd much prefer if all the type conversions are handled with explicit cast bindings. This would also semantically simplify the shift operator, since it would just handle the actual shifting.

Contributor Author

I guess we would need to add additional bindings for that? Is it worth adding more just for a big endian corner case? Or can that be done with the existing ones?

Member

A Cast case for int -> long and long -> int would need to be added. Given the existing setup, that should only be a few lines of code for each. (See e.g. for int -> long master...JornVernee:jdk:I2L). I don't think the cost is that high.

Is it worth adding more just for a big endian corner case?

I think it's worth it in order to have a cleaner contract for the shift ops, should we want to use them for anything else in the future, but also just to make them easier to understand for future readers.

Contributor

I think it's worth it in order to have a cleaner contract for the shift ops, should we want to use them for anything else in the future, but also just to make them easier to understand for future readers.

I agree that having a cleaner contract for the shift binding would prove useful in the long run. If we do that, we can also simplify the binding itself, as it would no longer need an input type?

Member

as it would no longer need an input type?

Yes. Then both shift ops would always operate on long.
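
The contract agreed on here (all type conversions as explicit casts, shifts operating on long only) could be modelled roughly as follows. The `Op`, `Cast`, `ShiftLeft` and `ShiftRight` types are hypothetical, simplified stand-ins and not the JDK's internal `Binding` classes; the shift amount is given in bits in this sketch.

```java
import java.util.List;

final class ExplicitCastSketch {
    // Hypothetical operator interface; each step maps one value to the next.
    interface Op {
        Object apply(Object value);
    }

    // The only operator that changes the type of the value.
    // 'from' is kept for readability; the conversion is driven by 'to'.
    record Cast(Class<?> from, Class<?> to) implements Op {
        public Object apply(Object value) {
            Number n = (Number) value;
            if (to == long.class) return n.longValue();
            if (to == int.class) return n.intValue();
            if (to == short.class) return n.shortValue();
            throw new IllegalArgumentException("unsupported cast to " + to);
        }
    }

    // Shifts take and return a long; any other input type fails the cast to Long.
    record ShiftLeft(int bits) implements Op {
        public Object apply(Object value) {
            return (Long) value << bits;
        }
    }

    record ShiftRight(int bits) implements Op {
        public Object apply(Object value) {
            return (Long) value >>> bits;
        }
    }

    // Applies the operators in order, feeding each result into the next one.
    static Object run(List<Op> ops, Object input) {
        Object value = input;
        for (Op op : ops) {
            value = op.apply(value);
        }
        return value;
    }
}
```

A 2-Byte tail would then be handled as `Cast(short, long)` followed by `ShiftLeft(48)` in one direction, and `ShiftRight(48)` followed by `Cast(long, short)` in the other, so the shift operators themselves never need an input type.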

Contributor Author

I've changed it. Note that I need many more conversions because buffer load/store also use subtypes of int. Please take a look at my updated version (after commit number 5).

Member

Thanks, your changes look great.

@TheRealMDoerr
Contributor Author

Sorry for the delay, I've been on vacation.

No problem. Hope you had a good time! Thanks for your feedback.

@JornVernee
Member

@TheRealMDoerr We've been discussing the shifts in order to wrap our heads around it, and we've ended up with this diagram in order to try and visualize what happens:

Let's say we have a struct with 3 ints:

struct Foo {
    int x;
    int y;
    int z;
};

If this struct is passed as an argument, then the load of the second 'half' of the struct would look like this:

offset         : 0 .... 32 ..... 64 ..... 96 .... 128
values         : xxxxxxxx|yyyyyyyy|zzzzzzzz|????????   (can't touch bits 96..128)
Load int       :                  V        +--------+
                                  |                 |
                                  +--------+        |
                                           V        V
In register    :                   ????????|zzzzzzzz   (MSBs are 0)
Shift left     :                   zzzzzzzz|00000000   (LSBs are zero)
Write long     :                  V                 V
Result         : xxxxxxxx|yyyyyyyy|zzzzzzzz|00000000

So, the 'Result' is padded at the tail with zeros.

Does that seem right? Does it seem useful to add this diagram as a comment somewhere, for us when we come back to this code a year from now? Thanks
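
The diagram can be checked with a short sketch based on `java.nio.ByteBuffer`. The class and method names are hypothetical; the sketch compares the load-and-shift result against a 16-byte, zero-padded copy of the struct.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class StructFooSketch {
    // Produces the two 64-bit values for struct Foo { int x; int y; int z; }
    // without reading past the 12 Bytes of the struct.
    static long[] loadAndShift(int x, int y, int z) {
        ByteBuffer struct = ByteBuffer.allocate(12).order(ByteOrder.BIG_ENDIAN);
        struct.putInt(0, x).putInt(4, y).putInt(8, z);

        long first = struct.getLong(0);               // x and y
        long tail = struct.getInt(8) & 0xFFFF_FFFFL;  // "Load int": z in the low half
        long second = tail << 32;                     // "Shift left": z in the high half
        return new long[] { first, second };
    }

    // Reference result: copy into a zero-padded 16-byte buffer and read two longs.
    static long[] viaPaddedCopy(int x, int y, int z) {
        ByteBuffer padded = ByteBuffer.allocate(16).order(ByteOrder.BIG_ENDIAN);
        padded.putInt(0, x).putInt(4, y).putInt(8, z);
        return new long[] { padded.getLong(0), padded.getLong(8) };
    }
}
```

Both methods return the same pair of values, which matches the 'Result' row of the diagram: the second doubleword holds z followed by 32 zero bits.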

@mcimadamore
Contributor

If this struct is passed as an argument, then the load of the second 'half' of the struct would look like this:

It would perhaps be cleaner if in the MSB/LSB comments we said:

LSBs are zzz...z
LSBs are 000...0

(e.g. avoid referring to MSBs in the first, since those bytes are not exactly zero, they are the padding bytes)

@TheRealMDoerr
Contributor Author

TheRealMDoerr commented Sep 4, 2023

Hmm. Do you see a good place for such a comment? Maybe it would be better to use a different size for the last chunk. Maybe three or five Bytes. That's even less straightforward.

@JornVernee
Member

Do you see a good place for such a comment?

PPC CallArranger seems like a good place to me. We have a similar explanation comment in the AArch64 CallArranger.

Maybe it would be better to use a different size for the last chunk. Maybe three or five Bytes. That's even less straightforward.

Does the size matter that much? It just changes the shift amount, right? We could use short as the type for z as well.

type = int.class;
if (type == int.class || isSubIntType(type)) {
    bindings.add(Binding.cast(type, long.class));
    type = long.class;
Member

This line seems redundant now (type is not used below).

Contributor Author

Good catch! Removed.

@TheRealMDoerr
Contributor Author

Thank you! I have added an example to the CallArranger. Please take a look.

@JornVernee
Member

Latest version looks great! I've started a CI job as well. Will approve when that comes back green.

@TheRealMDoerr
Contributor Author

Thanks! test/jdk/java/foreign tests have passed on linux x86_64, ppc64 and ppc64le on my side.

Member

@JornVernee JornVernee left a comment

Just some known failures in CI (1). So, this is good to go from my perspective.

@openjdk

openjdk bot commented Sep 5, 2023

@TheRealMDoerr This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8314949: linux PPC64 Big Endian: Implementation of Foreign Function & Memory API

Reviewed-by: mcimadamore, jvernee

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 129 new commits pushed to the master branch:

  • 939d7c5: 8161536: sun/security/pkcs11/sslecc/ClientJSSEServerJSSE.java fails with ProviderException
  • ebe3127: 8315717: ProblemList serviceability/sa/TestHeapDumpForInvokeDynamic.java with ZGC
  • 969fcdb: 8314191: C2 compilation fails with "bad AD file"
  • cef9fff: 8305507: Add support for grace period before AbortVMOnSafepointTimeout triggers
  • ed2b467: 8315499: build using devkit on Linux ppc64le RHEL puts path to devkit into libsplashscreen
  • 4b44575: 8305637: Remove Opaque1 nodes for Parse Predicates and clean up useless predicate elimination
  • 8647f00: 8293850: need a largest_committed metric for each category of NMT's output
  • 5a2e151: 8315548: G1: Document why VM_G1CollectForAllocation::doit() may allocate without completing a GC
  • 9013b03: 8315442: Enable parallelism in vmTestbase/nsk/monitoring/stress/thread tests
  • 744b397: 8312491: Update Classfile API snippets and examples
  • ... and 119 more: https://git.openjdk.org/jdk/compare/e36620d80ed837b50cb37e1cf0b66a5eb36e4d46...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Sep 5, 2023
@TheRealMDoerr
Contributor Author

@mcimadamore: May I ask you or somebody else from the Panama team to provide a 2nd review? This PR requires Panama knowledge, not really PPC knowledge.

@mcimadamore
Contributor

@mcimadamore: May I ask you or somebody else from the Panama team to provide a 2nd review? This PR requires Panama knowledge, not really PPC knowledge.

Sorry - I have already reviewed it - but didn't approve as I was waiting for @JornVernee to chime in. Now he has, and I will add my approval as well.

Contributor

@mcimadamore mcimadamore left a comment

Looks great - thanks!

@TheRealMDoerr
Contributor Author

Thanks for your outstanding support! I'm planning to integrate tomorrow.

@TheRealMDoerr
Contributor Author

/integrate

@openjdk

openjdk bot commented Sep 6, 2023

Going to push as commit f6c203e.
Since your change was applied there have been 138 commits pushed to the master branch:

  • a01b3fb: 8288660: JavaDoc should be more helpful if it doesn't recognize a tag
  • ba1a463: 8315377: C2: assert(u->find_out_with(Op_AddP) == nullptr) failed: more than 2 chained AddP nodes?
  • a258fc4: 8315648: Add test for JDK-8309979 changes
  • 5d3fdc1: 8315612: RISC-V: intrinsic for unsignedMultiplyHigh
  • 5cbff24: 8315406: [REDO] serviceability/jdwp/AllModulesCommandTest.java ignores VM flags
  • 7a08e6b: 8313575: Refactor PKCS11Test tests
  • d3ee704: 8315563: Remove references to JDK-8226420 from problem list
  • aba89f2: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set
  • 1f4cdb3: 8315127: CDSMapTest fails with incorrect number of oop references
  • 939d7c5: 8161536: sun/security/pkcs11/sslecc/ClientJSSEServerJSSE.java fails with ProviderException
  • ... and 128 more: https://git.openjdk.org/jdk/compare/e36620d80ed837b50cb37e1cf0b66a5eb36e4d46...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Sep 6, 2023
@openjdk openjdk bot closed this Sep 6, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Sep 6, 2023
@openjdk

openjdk bot commented Sep 6, 2023

@TheRealMDoerr Pushed as commit f6c203e.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@TheRealMDoerr TheRealMDoerr deleted the 8314949_PPC64_Panama_ABIv1 branch September 6, 2023 08:27
Labels
core-libs core-libs-dev@openjdk.org hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated
3 participants