Skip to content

Conversation

@magicus
Copy link
Member

@magicus magicus commented Feb 21, 2024

Summary: Finally get rid of the mapfiles in Hotspot, and replace it with compiler options and JNIEXPORT on all platforms.

The bug that this PR solves, JDK-8017234, was created in 2013. Even back then the use of mapfiles in Hotspot was dated, so this is really good riddance with old rubbish.

This code touches on central but not well understood parts of the Hotspot dynamic library, which has contributed to why this bug has stayed unresolved for so long. I will need to explain this fix in more detail than usually necessary. (Please bare with me if this gets long.) I also anticipate that not all solutions that I've picked will be accepted, and we'll have to discuss how to proceed. I think it is better to have actual concrete code to discuss around, rather than starting by an abstract discussion. To keep this description short, I will post the discussion as a comment to the PR.

I have run this PR through tier 1-3 in our CI system. I have also carefully checked how the resulting dynamic library differs with this patch (not much; see discussion below). For build system changes, this is often the most relevant metric.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8017234: Hotspot should stop using mapfiles (Enhancement - P3)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/17955/head:pull/17955
$ git checkout pull/17955

Update a local copy of the PR:
$ git checkout pull/17955
$ git pull https://git.openjdk.org/jdk.git pull/17955/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 17955

View PR using the GUI difftool:
$ git pr show -t 17955

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/17955.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Feb 21, 2024

👋 Welcome back ihse! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@magicus
Copy link
Member Author

magicus commented Feb 21, 2024

First some general notes on related build changes: This was the last place in the build system where mapfiles where used. It will now be possible to clean up a lot of this framework, but I will do that separately after this has been integrated. Also, Hotspot is now built much closer to the rest of the native libraries in the JDK, so there is potential for more unification (e.g. placing --exclude-libs,ALL in a common location). This too will be done separately.

On Windows, we still need to do the dance of finding symbols from the object files, and add them to a list of exported files (included by -def:). I decoupled this logic away from the old mapfile stuff, so we can safely delete the mapfile code later on. So what might look like new code in CompileJvm.gmk is actually just old code from JvmMapfile.gmk that I have cleaned up somewhat. These symbols are consumed by SA. It might be that we can limit the amount of symbols exported this way, or maybe that we can find another solution completely in SA, but that will have to be a future topic for research. For now, we keep the same exported symbols.

I have checked the resulting libjvm.so/jvm.dll using the COMPARE_BUILD functionality, which (among other things) analyzes differences between native libraries before and after a patch. This has verified that there is no change in the symbols of the Hotspot dynamic library on macOS/x64, macOS/aarch64 or Windows/x64. I have also gotten help from @MBaesken and @JoKern65 to confirmed that there is no effect on AIX (in fact, AIX never used the mapfile it created...). There are, however, a few minor changes on linux/x64 and linux/aarch64. I will discuss these below.

When I first started doing this conversion, I got a lot more differences.

First of all, the -fvisibility=hidden flag does not apply to assembly files, so the symbols marked .global in these files also needed to be marked .hidden. (On macOS, Apple's clang assembler uses the directive .private_extern instead.) The assembly files had some inconsistent whitespace (using tabs instead of spaces every once in a while), which my editor kept changing, so I gave up and fixed all of them. This makes the diff artificially larger; if someone objects to this I can revert the whitespace changes.

Secondly, it turned out that there were several symbols declared JNIEXPORT that were not included in the symbol list, and hence were not exported on Linux. In contrast to Windows, which only relies on JNIEXPORT, on Linux it was necessary to both use JNIEXPORT and declare the symbol in the symbol list to get it actually exported.

To be consistent with the old behavior, I have undefined JNIEXPORT in these files. It works to keep the output the same, but it does not look very nice. I suggest we either:

a) say we want to remove this #undef and actually export the functions annotated JNIEXPORT. This is what I would recommend. If so, let's keep the ugly undef for now, and I'll remove it in a follow-up bug. (I really would like not to change any behavior with this patch; it is intrusive enough as it is.)

b) say we really don't want to export these functions on Linux, only on Windows. In that case, we should probably replace JNIEXPORT with FOO_EXPORT, and define FOO_EXPORT to be JNIEXPORT on Windows, and blank on Linux.

This is a decision for the Hotspot team; let me know which solution you chose. (It can be different in different places, of course).

And thirdly, there seem to be a bug in gcc, which ignores the -fvisibility=hidden flag in certain circumstances involving templates. I could not pinpoint what conditions were necessary to trigger this bug, but in accessBackend.cpp I have manually added a HIDDEN attribute to a bunch of arraycopy_conjoint functions to keep them from being exported. It looks ugly; and we might be willing to accept that these functions are exported even though they should not be, to get rid of this. Or it might be that someone finds out if what is different with this declarations that trigger this bug, and constructs a work-around. (I gave up on it, though...)

About the changes on linux/x64 and linux/aarch64: I believe the new values are in some way more correct, but I also think the difference is insignificant. The difference is that the data (variable) symbols __bss_start, _edata and _end, and the text (function) symbols _fini and _init has changed from local to global. Afaik, these are symbols created by the linker. Also, when we used a mapfile, the symbol SUNWprivate_1.1 that was part of the mapfile definition was included in libjvm.so, and this is no longer the case.

Finally, in the gtest libjvmo.so there are some additional differences as well. I did not really care about these, since this is only used for testing, but for the sake of completeness, here are the differences. They belong to two different categories.

Group 1 includes four symbols like _Z9type_nameIiEPKcv. These come from this definition:

  template <typename T> const char* type_name();

in test_parse_memory_size.cpp, and is seemingly the same issue as with the arraycopy_conjoint functions in accessBackend.cpp. They have changed from local to global.

Group 2 includes symbols that seem to originate in the googletest library. They have changed
from local to global, but also to weak symbols. They match some different patterns:

  • Seven symbols like _ZNSt15_Sp_counted_ptrIPKN7testing8internal2REELN9__gnu_cxx12_Lock_policyE2EED2Ev.
    Demangled, these read std::_Sp_counted_ptr<testing::internal::RE const*, (__gnu_cxx::_Lock_policy)2>.

  • One symbol _ZN7testing15AssertionResultlsIA12_cEERS0_RKT_ (demangled: testing::AssertionResult& testing::AssertionResult::operator<< <char [12]>(char const (&) [12]))

  • And finally, only on x64 (but not aarch64), one symbol _ZN7testing4Test10HasFailureEv (demangled: testing::Test::HasFailure()).

@openjdk openjdk bot added the rfr Pull request is ready for review label Feb 21, 2024
@openjdk
Copy link

openjdk bot commented Feb 21, 2024

@magicus The following labels will be automatically applied to this pull request:

  • build
  • graal
  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added graal graal-dev@openjdk.org build build-dev@openjdk.org hotspot hotspot-dev@openjdk.org labels Feb 21, 2024
@mlbridge
Copy link

mlbridge bot commented Feb 21, 2024

@dholmes-ora
Copy link
Member

it turned out that there were several symbols declared JNIEXPORT that were not included in the symbol list,

There appear to be two main categories of functions here:

  1. Those intended for use from a debugger

Have you verified those functions can still be called from the debugger?

  1. JVMCI

The Graal folk will need to chime in here as it may be that these are needed for libgraal.

@dholmes-ora
Copy link
Member

My main concern with a change like this as that it likely has little effect on the OpenJDK itself, but we may be changing something that 3rd party applications linking with libjvm are relying on. I realize this is a bit "hand wavy" but are we sure this presents the exact same view of libjvm as before?

Copy link
Member

@djelinski djelinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this! This is a good improvement IMO.

If the SUNWprivate_1.1 part of the symbols is still needed, it can be added using --default-symver ld parameter.

You might want to run tier1-5 tests; compilation-related changes are often detected by higher-level builds and tests only.

@dougxc
Copy link
Member

dougxc commented Feb 22, 2024

The Graal folk will need to chime in here as it may be that these are needed for libgraal.

They should be fine. JNI linkage is only used by the native methods in CompilerToVM.java.

@magicus
Copy link
Member Author

magicus commented Feb 22, 2024

it turned out that there were several symbols declared JNIEXPORT that were not included in the symbol list,

There appear to be two main categories of functions here:

1. Those intended for use from a debugger

Have you verified those functions can still be called from the debugger?

2. JVMCI

The Graal folk will need to chime in here as it may be that these are needed for libgraal.

I tried to be clear, but apparently I failed: There is no change to these symbols in the patch! So everything should work just as before!

The reason I even brought it up to discussion is that it required a bit of a "hack" to keep the functionality identical, and this looks suspicious in the source code.

@magicus
Copy link
Member Author

magicus commented Feb 22, 2024

Thanks for doing this! This is a good improvement IMO.

Thanks!

If the SUNWprivate_1.1 part of the symbols is still needed, it can be added using --default-symver ld parameter.

Ah, good to know. In the spirit of making as minimal changes as possible in this PR, I will add it.

You might want to run tier1-5 tests; compilation-related changes are often detected by higher-level builds and tests only.

I can run tier 4 and 5 also. But once again, the entire point of this PR being so convoluted is that there should be a minimum of differences in compilation.

On macOS and Windows, the binaries before and after this patch are almost identical; just a few hundred assembler instructions have been reordered slightly. (Why this is even so I don't know, but in the end I did not care to dig into the details.)

On Linux, more code is shuffled around, making for a greater bit-by-bit difference. Still, the symbol tables are (almost) identical, and these changes should only affect symbol visibility, so I believe this is mostly an artifact of how the linker organises stuff.

@magicus
Copy link
Member Author

magicus commented Feb 22, 2024

The Graal folk will need to chime in here as it may be that these are needed for libgraal.

They should be fine. JNI linkage is only used by the native methods in CompilerToVM.java.

Once again, it is a misunderstanding that anything here changes. These functions will stay the same before and after this PR.

However, I interpret your comment as that it is possible to remove the JNIEXPORT from these functions in a follow-up PR. And that is good to know!

@magicus
Copy link
Member Author

magicus commented Feb 22, 2024

My main concern with a change like this as that it likely has little effect on the OpenJDK itself, but we may be changing something that 3rd party applications linking with libjvm are relying on. I realize this is a bit "hand wavy" but are we sure this presents the exact same view of libjvm as before?

I realize that my essay-style comment was a bit TL;DR, but I tried to answer questions like this ahead of time in it. Let me quote:

I have checked the resulting libjvm.so/jvm.dll using the COMPARE_BUILD functionality, which (among other things) analyzes differences between native libraries before and after a patch. This has verified that there is no change in the symbols of the Hotspot dynamic library on macOS/x64, macOS/aarch64 or Windows/x64. I have also gotten help from @MBaesken and @JoKern65 to confirmed that there is no effect on AIX.

On linux/x64 and linux/aarch64 there are some very minor changes:

The difference is that the data (variable) symbols __bss_start, _edata and _end, and the text (function) symbols _fini and _init has changed from local to global. Afaik, these are symbols created by the linker. Also, when we used a mapfile, the symbol SUNWprivate_1.1 that was part of the mapfile definition was included in libjvm.so, and this is no longer the case.

So yes, I have worked hard to make sure the resulting libjvm.so/jvm.dll is as close as humanly possible before and after this PR. I cannot think of a way in which 3rd party applications could even spot the difference before and after this patch. (Except for a contrived counterexample like explicitly looking up the visibility of _init and checking that it is indeed local...)

@magicus
Copy link
Member Author

magicus commented Feb 22, 2024

Let me also add some background to the why of this PR.

A good reason is to get rid of a lot of weird code and special cases in the build. It will also speed up the build since we needed to scan all .o files with nm after they were compiled but before they were linked, adding to the slow path of linking hotspot. But for the 10+ years this has been a JBS issue it has never completely pushed the scales.

The triggering factor now is the Hermetic Java project. We need to make properly supported static libraries for all native code in the JDK. One of the main issues with creating a single static library from all the native code is symbol visibility. If the same symbol name is exported from two different libraries, this will cause a failure. This weird "dance" in Hotspot makes it much harder to reason about, or fix, issues related to symbol visibility. That is why I want to have this fixed, once and for all.

@magicus
Copy link
Member Author

magicus commented Feb 22, 2024

If the SUNWprivate_1.1 part of the symbols is still needed, it can be added using --default-symver ld parameter.

Ah, good to know. In the spirit of making as minimal changes as possible in this PR, I will add it.

Unfortunately, that was not so easy. :( --default-symver is only available on the bfd ld, not on gold (which is what Oracle uses). :-(

https://sourceware.org/bugzilla/show_bug.cgi?id=15910

@dougxc
Copy link
Member

dougxc commented Feb 22, 2024

However, I interpret your comment as that it is possible to remove the JNIEXPORT from these functions in a follow-up PR. And that is good to know!

I believe you're right. In any case, most of the JVMCI tests will fail if JNIEXPORT is removed and it turns out to be needed.

@magicus
Copy link
Member Author

magicus commented Feb 22, 2024

I just realized I could keep an extremely simplified linker script ("mapfile") for gcc, and thereby keeping the @SUNWprivate_1.1 on the exported symbols, and keeping __bss_start and friends local. This further minimizes the difference to the existing libjvm.so for gcc builds.

I think we can get rid of this as well going forward, but as with other cleanups, let's do that separately.

@erikj79
Copy link
Member

erikj79 commented Feb 22, 2024

The difference is that the data (variable) symbols __bss_start, _edata and _end, and the text (function) symbols _fini and _init has changed from local to global. Afaik, these are symbols created by the linker. Also, when we used a mapfile, the symbol SUNWprivate_1.1 that was part of the mapfile definition was included in libjvm.so, and this is no longer the case.

Regarding SUNWprivate_1.1, reading JDK-4916160 it seems like it could cause problems for native user libraries linked against an older JDK version if we remove it. I think this is what David Holmes was referring to. So it seems like a good idea to figure out a way to keep it around.

@magicus
Copy link
Member Author

magicus commented Feb 22, 2024

Regarding SUNWprivate_1.1, reading JDK-4916160 it seems like it could cause problems for native user libraries linked against an older JDK version if we remove it. I think this is what David Holmes was referring to. So it seems like a good idea to figure out a way to keep it around.

"Forward compatibility issue : 1.4.1_03 onwards". Wow, that's a blast from the past! The sins of the fathers, etc. :)

I'm not sure I understand JDK-4916160 completely, but it seems they had the opposite situation -- prior to 1.4.1_03 we did not link with version scripts, and from 1.4.1_03 we did. And this was closed as "not an issue".

Nevertheless, I have found a way to keep the SUNWprivate_1.1 that is not too ugly, so with the latest commit to this PR I am doing that. I have now verified that this makes no visible difference in the symbol tables at all between the current libjvm.so and the one produced with this patch on gcc. There is still a difference in the ordering of symbols; it seems like the linker is using a hash table and something is making it different -- perhaps the explicit ordering given in the old mapfile affected the way the linker produced the output. So it is still not a bit-by-bit equality, but it is as far as it is reasonable to get.

@magicus
Copy link
Member Author

magicus commented Feb 22, 2024

Just a thought: Perhaps worth introducing some kind of macro that captures defining a symbol in assembly code (something that does .globl + .hidden on Linux and .private_extern on APPLE?

I have thought the same. :) There seem to be room for improvement in a simple "framework" to reduce duplication across the .S files. (There is also plenty of room for improvement in coordinating the style across the .S files...) I believe one reason this has not yet happened is that up until a year or two ago, most Hotspot assembly files where .s (which are not preprocessed) as opposed to .S. I standardized on .S for all assembly code in the JDK project in JDK-8264188, and after that, the door has been open for the Hotspot developers to create such a macro.

Also, separate from this change: perhaps worth turning on whitespace checking (no tabs) for .S files?

I'm way ahead of you there. :) That is the next one up my pipeline in "turning on jcheck for all text files, one file type at the time". But is a chore, really, to get those kinds of fixes approved. And I wanted to get progress in the mapfile question first.

@dholmes-ora
Copy link
Member

I tried to be clear, but apparently I failed: There is no change to these symbols in the patch! So everything should work just as before!

Yes but "should work" and "does work" may not be the same unless you have actually verified the functionality. That said I now see that we only actually need to export on Windows for the debugging functions to work, so I agree everything should work just fine in that regard.

Apologies - It took me a little while to click to the fact not all JNIEXPORT symbols were actually being exported due to their omission in the mapfile - hence the need to now actually hide them to keep things working the same. (FWIW I don't think it matters if we export the debug functions unnecessarily on some platforms - I'd go for code simplicity/consistency. EDIT and I see you went that way too.)

@magicus
Copy link
Member Author

magicus commented Feb 23, 2024

Just to confirm: this PR passes tier 1-5.

@MBaesken
Copy link
Member

MBaesken commented Feb 23, 2024

Just to confirm: this PR passes tier 1-5.

We had the PR in our SAP internal build/test queue, no issues seen with it.

@magicus
Copy link
Member Author

magicus commented Feb 23, 2024

We had the PR in our SAP internal build/test queue, so issues seen with it.

What issues did you see? Or was that a typo for "no issues"?

@MBaesken
Copy link
Member

We had the PR in our SAP internal build/test queue, so issues seen with it.

What issues did you see? Or was that a typo for "no issues"?

Sorry Magnus, tests were fine no issues were observed.

@djelinski
Copy link
Member

Great. The only remaining difference I see is that the PR adds the following export:

_ZGRN14AsyncLogWriter4NoneE_@@SUNWprivate_1.1

Any idea what it might be? If not, I guess we can live with that.

@erikj79
Copy link
Member

erikj79 commented Feb 23, 2024

I just realized I could keep an extremely simplified linker script ("mapfile") for gcc, and thereby keeping the @SUNWprivate_1.1 on the exported symbols, and keeping __bss_start and friends local. This further minimizes the difference to the existing libjvm.so for gcc builds.

Why do this just for GCC? Shouldn't this be for Linux as we are doing it for backwards compatibility with user JNI libraries.

Copy link
Member

@djelinski djelinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Windows-x64 and Linux-x64 look fine to me.

@openjdk
Copy link

openjdk bot commented Feb 23, 2024

@magicus This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8017234: Hotspot should stop using mapfiles

Reviewed-by: djelinski, erikj, dholmes

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 9 new commits pushed to the master branch:

  • 0901ded: 8326433: Make file-local functions static in src/jdk.jdwp.agent/unix/native/libjdwp/exec_md.c
  • d22d890: 8325898: ChoiceFormat returns erroneous result when formatting bad pattern
  • 93feda3: 8322962: Upcall stub might go undetected when freezing frames
  • fc67c2b: 8325530: Vague error message when com.sun.tools.attach.VirtualMachine fails to load agent library
  • b87d9cf: 8325506: Ensure randomness is only read from provided SecureRandom object
  • 0963a4e: 8326699: Problemlist CertMsgCheck.java
  • bf13a4e: 8322881: java/nio/file/Files/CopyMoveVariations.java fails with AccessDeniedException due to permissions of files in /tmp
  • f62b578: 8311644: Server should not send bad_certificate alert when the client does not send any certificates
  • 9a9cfbe: 8325340: Add ASCII fast-path to Data-/ObjectInputStream.readUTF

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Feb 23, 2024
@magicus
Copy link
Member Author

magicus commented Feb 26, 2024

Great. The only remaining difference I see is that the PR adds the following export:

_ZGRN14AsyncLogWriter4NoneE_@@SUNWprivate_1.1

Any idea what it might be? If not, I guess we can live with that.

No. I vaguely recognize this symbol as sticking out, but can't find it now in my runs. Which gcc version are you using?

I think this is another example of the gcc bug. c++filt says reference temporary #0 for AsyncLogWriter::None; I guess this is from logAsyncWriter.cpp line 47-49:

// LogDecorator::None applies to 'constant initialization' because of its constexpr constructor.
const LogDecorations& AsyncLogWriter::None = LogDecorations(LogLevel::Warning, LogTagSetMapping<LogTag::__NO_TAG>::tagset(),
                                      LogDecorators::None);

But I think we can live with that anyway, even if it gets exported for some gcc versions after this PR.

@djelinski
Copy link
Member

This is on GCC 9.4.0, Ubuntu 20.04.
I wonder if the reference to a temporary was intentional... but I guess that's a subject for another PR.

@magicus
Copy link
Member Author

magicus commented Feb 26, 2024

I just realized I could keep an extremely simplified linker script ("mapfile") for gcc, and thereby keeping the @SUNWprivate_1.1 on the exported symbols, and keeping __bss_start and friends local. This further minimizes the difference to the existing libjvm.so for gcc builds.

Why do this just for GCC? Shouldn't this be for Linux as we are doing it for backwards compatibility with user JNI libraries.

Good point. I did not really consider the case of clang on linux, since this is a secondary option, and I did not believe keeping consistency there was important. But as you say, the fix is trivial and makes a lot of sense. I have now verified that the symbols look identical when compiling with clang, between this PR and the latest (and hopefully last!) update, and mainline.

@magicus
Copy link
Member Author

magicus commented Feb 26, 2024

Can I get a second Hotspot reviewer for the Hotspot changes?

Copy link
Member

@erikj79 erikj79 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a build point of view, I think this looks good now.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given discussions and additional testing/validation, the hotspot changes appear okay. Thanks.

@magicus
Copy link
Member Author

magicus commented Feb 27, 2024

/integrate

@openjdk
Copy link

openjdk bot commented Feb 27, 2024

Going to push as commit da14aa4.
Since your change was applied there have been 19 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Feb 27, 2024
@openjdk openjdk bot closed this Feb 27, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Feb 27, 2024
@openjdk
Copy link

openjdk bot commented Feb 27, 2024

@magicus Pushed as commit da14aa4.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@magicus magicus deleted the hotspot-symbols-proper branch February 27, 2024 10:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

build build-dev@openjdk.org graal graal-dev@openjdk.org hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated

Development

Successfully merging this pull request may close these issues.

9 participants