8273967: gtest os.dll_address_to_function_and_library_name_vm fails on macOS12 #6193

Closed

@dcubed-ojdk dcubed-ojdk commented Nov 1, 2021

macOS12 has changed the dladdr() function to accept "-1" as a valid address and
we have functions that use dladdr() to convert DLL addresses into function or
library names. We also have a gtest that verifies that "-1" is not a valid value to use
as a symbol address.

As you might imagine, existing code that uses os::dll_address_to_function_name()
or os::dll_address_to_library_name() can get quite confused (and sometimes crash)
if an addr parameter of -1 is allowed to be used.

I've also made two cleanup changes as part of this fix:

  1. In src/hotspot/os/bsd/os_bsd.cpp there is some macOS specific code that should
    be properly #ifdef'ed. There is also some code that makes sense for ELF format
    files, but not for Mach-O format files so that code needs to be excluded on macOS.

  2. In src/hotspot/share/runtime/os.cpp I noticed a simple typo in a comment on an
    #endif that I fixed. That typo does not appear anywhere else in the HotSpot code
    base so I'd like to fix it with this bug ID since I'm in related areas.

This fix has been tested with Mach5 Tier[1-6].


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8273967: gtest os.dll_address_to_function_and_library_name_vm fails on macOS12

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6193/head:pull/6193
$ git checkout pull/6193

Update a local copy of the PR:
$ git checkout pull/6193
$ git pull https://git.openjdk.java.net/jdk pull/6193/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 6193

View PR using the GUI difftool:
$ git pr show -t 6193

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6193.diff

@dcubed-ojdk
Member Author

/label add hotspot-runtime

@dcubed-ojdk
Member Author

/label add serviceability

@bridgekeeper

bridgekeeper bot commented Nov 1, 2021

👋 Welcome back dcubed! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@dcubed-ojdk dcubed-ojdk marked this pull request as ready for review November 1, 2021 15:35
@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Nov 1, 2021
@openjdk

openjdk bot commented Nov 1, 2021

@dcubed-ojdk
The hotspot-runtime label was successfully added.

@openjdk openjdk bot added rfr Pull request is ready for review serviceability serviceability-dev@openjdk.org labels Nov 1, 2021
@openjdk

openjdk bot commented Nov 1, 2021

@dcubed-ojdk
The serviceability label was successfully added.

@mlbridge

mlbridge bot commented Nov 1, 2021

Webrevs

@dcubed-ojdk
Member Author

Ping! Any takers?

Member

@tstuefe tstuefe left a comment

Hi Daniel,

looks good. Small remarks below. I leave it up to you if you take my suggestions.

Cheers, Thomas

// The Mach-O binary format does not contain a "list of files" with address
// ranges like ELF. That makes sense since Mach-O can contain binaries for
// more than one instruction set so there can be more than one address range for
// each "file".
Member

Small nit, it seems confusing to have a Mac-specific comment in the BSD section.

Maybe this would be better in MachDecoder? E.g. implement the 6-arg version of decode() but stubbed out returning false, and with your comment there.

Member Author

It's actually fairly common to have Mac-specific stuff in the BSD files. The macOS
port was built on top of the BSD port and the BSD port was built by copying a LOT
of code from Linux into BSD specific files with modifications as needed.

If I pushed this change down into MachDecoder, then I would have to lose the
ShouldNotReachHere() call in order to not assert in non-release bits. I don't
think I want to do that since this may not be the only place that calls the
6-arg version of decode().

Member

> It's actually fairly common to have Mac-specific stuff in the BSD files. The macOS port was built on top of the BSD port and the BSD port was built by copying a LOT of code from Linux into BSD specific files with modifications as needed.

I always wondered whether anyone actually builds the BSDs in head. I assume Oracle does not, right? I know there are downstream porters somewhere, but only for old releases, or am I wrong?

> If I pushed this change down into MachDecoder, then I would have to lose the ShouldNotReachHere() call in order to not assert in non-release bits. I don't think I want to do that since this may not be the only place that calls the 6-arg version of decode().

Fair enough, thanks for the clarification.

Member Author

Oracle does not build BSD in head. At one point, Dmitry Samersoff used to build BSD
in his lab, but I don't know if he still does that.

if (offset) *offset = -1;
return false;
}
#endif
Member

We use dladdr() in several places in this code. I wonder whether it would make sense to fix all of those with a wrapper instead:

     static int my_dladdr(const void* addr, Dl_info* info) {
       if (addr != (void*)-1) {
         return dladdr(addr, info);
       } else {
         // add comment here
         return 0;
       }
     }
     #define dladdr my_dladdr

Member Author

I'll take a look at the other calls to dladdr(). I'm trying to limit what I change
here to things that actually failed in our test on macOS12 on X64 and aarch64.

Member Author

I took a quick look at the other calls to dladdr() in src/hotspot/os/bsd/os_bsd.cpp
and I'm not comfortable with changing those uses without having a specific test
case that I can use to investigate those code paths.

We are fairly early in our testing on macOS12 so I may run into a reason to revisit
this choice down the road.

Member

> I took a quick look at the other calls to dladdr() in src/hotspot/os/bsd/os_bsd.cpp and I'm not comfortable with changing those uses without having a specific test case that I can use to investigate those code paths.
>
> We are fairly early in our testing on macOS12 so I may run into a reason to revisit this choice down the road.

Okay!

Member Author

I've had a chance to think about this overnight and I'm not liking
my duplication of code, so I'm going to look at adding a wrapper
that is called by the two call sites where I know I need the special
handling.

Member Author

@dcubed-ojdk dcubed-ojdk left a comment

@tstuefe - Thanks for your review.


@gerard-ziemski gerard-ziemski left a comment

Looks good, just a few optional nitpicks that I personally would have done, if it were me doing the change.

Comment on lines 927 to 929
#endif

#if defined(__APPLE__)

Why not just do:

#else

here instead and collapse these 3 lines into 1?
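
The suggestion can be sketched as follows. This is only an illustration, not the code from os_bsd.cpp: it assumes the first block is guarded by the opposite condition (`#if !defined(__APPLE__)`), which the quoted lines do not show, and the function names are made up.

```cpp
#include <cassert>

// Form A: two adjacent conditional blocks. The first block's condition is an
// assumption; the lines under review only show its closing #endif.
static bool uses_macho_path_two_blocks() {
#if !defined(__APPLE__)
  return false;
#endif

#if defined(__APPLE__)
  return true;
#endif
}

// Form B: the same selection collapsed into a single #if/#else/#endif.
static bool uses_macho_path_if_else() {
#if !defined(__APPLE__)
  return false;
#else
  return true;
#endif
}

// Usage: both forms pick the same branch on whichever platform compiles them.
static void check_forms_agree() {
  assert(uses_macho_path_two_blocks() == uses_macho_path_if_else());
}
```

The collapse is only equivalent when the two conditions are exact opposites; if the first block tests something else, the separate blocks have to stay.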

Member Author

Hmmm... I'll take a look at doing that.


#if defined(__APPLE__)
char localbuf[MACH_MAXSYMLEN];

This __APPLE__ section is the only one that I can see using MACH_MAXSYMLEN, so why not move:

#if defined(__APPLE__)
#define MACH_MAXSYMLEN 256
#endif

here (i.e. just the #define MACH_MAXSYMLEN 256) and minimize the need for __APPLE__ sections?

Member Author

Hmmm.... I'll take a look at cleaning this up a bit.

Comment on lines 958 to 964
if (addr == (address)(intptr_t)-1) {
// dladdr() in macOS12/Monterey returns success for -1, but that addr
// value should not be allowed to work to avoid confusion.
buf[0] = '\0';
if (offset) *offset = -1;
return false;
}

Do we need this here? Wouldn't the earlier call to Decoder::decode(addr, localbuf, MACH_MAXSYMLEN, offset, dlinfo.dli_fbase) catch this with ShouldNotReachHere?

Member Author

That's the 5-parameter version of decode() and it doesn't have ShouldNotReachHere.

So if that code site is called and returns false, then we get into
dll_address_to_library_name() and reach this dladdr() call which
will accept the "-1"...
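
The fall-through described above can be sketched as a simplified model. None of this is HotSpot code: `decoder_stage`, `dladdr_stage`, and both lookup functions are hypothetical stand-ins, with the second stage hard-wired to accept every address the way dladdr() is reported to behave on macOS12.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration; none of these are HotSpot functions.

// Models a symbol decoder that finds nothing for the given address.
static bool decoder_stage(const void* addr) {
  (void)addr;
  return false;
}

// Models a dladdr()-style lookup that reports success for every address,
// including -1.
static bool dladdr_stage(const void* addr) {
  (void)addr;
  return true;
}

static bool is_minus_one(const void* addr) {
  return addr == reinterpret_cast<const void*>(static_cast<std::intptr_t>(-1));
}

// Without a guard, a stage-1 miss falls through and stage 2 accepts -1.
static bool lookup_unguarded(const void* addr) {
  if (decoder_stage(addr)) return true;
  return dladdr_stage(addr);
}

// With the guard, -1 is rejected before stage 2 is consulted.
static bool lookup_guarded(const void* addr) {
  if (decoder_stage(addr)) return true;
  if (is_minus_one(addr)) return false;
  return dladdr_stage(addr);
}

// Usage: only the guarded flow rejects -1; ordinary addresses still resolve.
static void demo_fallthrough() {
  const void* bad = reinterpret_cast<const void*>(static_cast<std::intptr_t>(-1));
  int local = 0;
  assert(lookup_unguarded(bad));
  assert(!lookup_guarded(bad));
  assert(lookup_guarded(&local));
}
```

The guard sits after the first stage because that stage already fails for -1 in this model; it is the permissive second stage that needs protecting.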

@openjdk

openjdk bot commented Nov 4, 2021

@dcubed-ojdk This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8273967: gtest os.dll_address_to_function_and_library_name_vm fails on macOS12

Reviewed-by: stuefe, gziemski

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 85 new commits pushed to the master branch:

  • c393ee8: 8276632: Use blessed modifier order in security-libs code
  • 7023b3f: 8276628: Use blessed modifier order in serviceability code
  • b933136: 8276641: Use blessed modifier order in jshell
  • 0616d86: 8276635: Use blessed modifier order in compiler code
  • d95299a: 8276634: Remove usePlainDatagramSocketImpl option from the test DatagramChannel/SendReceiveMaxSize.java
  • 3c0faa7: 8276173: Clean up and remove unneeded casts in HeapDumper
  • 323d201: 8275506: Rename allocated_on_stack to allocated_on_stack_or_embedded
  • 96c396b: 8276151: AArch64: Incorrect result for double to int vector conversion
  • 7281861: 8272065: jcmd cannot rely on the old core reflection implementation which will be changed after JEP 416
  • 8e17ce0: 8275185: Remove dead code and clean up jvmstat LocalVmManager
  • ... and 75 more: https://git.openjdk.java.net/jdk/compare/b7104ba9a9006ab65e08ea9d7db22e72611ed07c...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Nov 4, 2021
Member Author

@dcubed-ojdk dcubed-ojdk left a comment

@tstuefe - Thanks for closing the loop on my previous replies.

@gerard-ziemski - Thanks for the review!

I'm going to make more tweaks to this fix and will update the
PR after my test cycle is complete.

@dcubed-ojdk
Member Author

@tstuefe and @gerard-ziemski - please re-review when you get the chance.

@dcubed-ojdk
Member Author

This version has been tested with Mach5 Tier1 and with runs of the specific
tests using release and debug bits on macosx-aarch64 and macosx-x64
test machines running macOS12.

Member

@tstuefe tstuefe left a comment

Looks good!

@dcubed-ojdk
Member Author

@tstuefe - Thanks for the re-review!

@gerard-ziemski

Thank you Dan for the fix!

@dcubed-ojdk
Member Author

@gerard-ziemski - Thanks for the re-review!

@dcubed-ojdk
Member Author

/integrate

@openjdk

openjdk bot commented Nov 5, 2021

Going to push as commit 92d2176.
Since your change was applied there have been 86 commits pushed to the master branch:

  • a74a839: 8276571: C2: pass compilation options as structure
  • c393ee8: 8276632: Use blessed modifier order in security-libs code
  • 7023b3f: 8276628: Use blessed modifier order in serviceability code
  • b933136: 8276641: Use blessed modifier order in jshell
  • 0616d86: 8276635: Use blessed modifier order in compiler code
  • d95299a: 8276634: Remove usePlainDatagramSocketImpl option from the test DatagramChannel/SendReceiveMaxSize.java
  • 3c0faa7: 8276173: Clean up and remove unneeded casts in HeapDumper
  • 323d201: 8275506: Rename allocated_on_stack to allocated_on_stack_or_embedded
  • 96c396b: 8276151: AArch64: Incorrect result for double to int vector conversion
  • 7281861: 8272065: jcmd cannot rely on the old core reflection implementation which will be changed after JEP 416
  • ... and 76 more: https://git.openjdk.java.net/jdk/compare/b7104ba9a9006ab65e08ea9d7db22e72611ed07c...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Nov 5, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Nov 5, 2021
@openjdk

openjdk bot commented Nov 5, 2021

@dcubed-ojdk Pushed as commit 92d2176.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@dcubed-ojdk dcubed-ojdk deleted the JDK-8273967 branch November 5, 2021 19:04
@dnadlinger

dnadlinger commented Dec 31, 2021

@dcubed-ojdk: I just ran into this in the Julia runtime and reached a similar conclusion/fix. Did you find out anything more about why this happens? At a glance, I didn't see any blatant addr == -1-style checks in the latest dyld open-source dump.

@dcubed-ojdk
Member Author

@dnadlinger

@dcubed-ojdk: Thanks for taking the time to reply regardless! The dlopen/dladdr/… man pages all show up fine on my machine, by the way (macOS 12.1).
