Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8273373: Zero: Cannot invoke JVM in primordial threads on Zero #5376

Closed

Conversation

shipilev
Copy link
Member

@shipilev shipilev commented Sep 6, 2021

See the bug for more discussion.

Additional testing:

  • Linux Zero x86_64 passes most gtests now (there are unrelated failures)
  • Linux Zero x86_32 passes most gtests now (there are unrelated failures)

Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8273373: Zero: Cannot invoke JVM in primordial threads on Zero

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/5376/head:pull/5376
$ git checkout pull/5376

Update a local copy of the PR:
$ git checkout pull/5376
$ git pull https://git.openjdk.java.net/jdk pull/5376/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 5376

View PR using the GUI difftool:
$ git pr show -t 5376

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/5376.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Sep 6, 2021

👋 Welcome back shade! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Sep 6, 2021
@openjdk
Copy link

openjdk bot commented Sep 6, 2021

@shipilev The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Sep 6, 2021
@mlbridge
Copy link

mlbridge bot commented Sep 6, 2021

Webrevs

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Aleksey,

The assert in the shared code seems fine. I can't really comment on the Zero changes and the dropping of IA64 support here.

Thanks,
David

Copy link
Member

@tstuefe tstuefe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,

I do not see a reason why this would be dependent on !zero. Maybe @GoeLin did not want to deal with it, or the problem did not show up on zero in the first place.

I think you are right in that gtestlauncher is the only launcher where we end up running in the primordial thread.

Removing ia64 code block is fine. Arguably zero may be the last variant where ia64 would still be useful. Debian seems still to build Zero+ia64 in openjdk15: https://buildd.debian.org/status/logs.php?pkg=openjdk-15&arch=ia64

About the zero gtest errors, do you understand why they only happen in other_vm resp. death tests? Something must be special here. I am curious since invoking JNI on the primordial thread may not be that well tested.

I tried to understand the primordial-thread-stack-boundaries coding, but it left a number of questions. For instance we try to find the primordial stack dimensions by:

  • read /proc/self/stat, column 28 ("stackstart", bottom address)
  • then we go and find the VMA containing this address, and assume the highest address of this VM is the stack top.

I am not sure the latter always works. Is that not dependent on memory layout? It relies on the process stack having its own VMA, but could the kernel not fold it with a neighboring VMA and show it as one line in /proc/self/maps? In that case, the stack start address we calculate could be bogus.

src/hotspot/os/linux/os_linux.cpp Show resolved Hide resolved
@tstuefe
Copy link
Member

tstuefe commented Sep 8, 2021

I got curious and analyzed this further. Let's see if I got this correctly.

When we create the VM on the primordial thread, we locate the stack boundaries for the primordial thread in os::Linux::capture_initial_stack():

There, we basically look for the boundaries of the primordial thread stack in /proc/pid/maps. We manage to locate the correct VMA for the primordial stack. But then we do not simply take the boundary of that VMA as stack boundary. Instead, we take the high VMA address, subtract an assumed stack size (calculated from -Xss and RLIMIT_STACK), then take that address as stack bottom (_initial_thread_stack_bottom).

Seems that that assumed stack size can be smaller than the real size of the stack, resulting in an offset between the real stack VMA low address and _initial_thread_stack_bottom.

That in itself would be not a problem, but in StackOverflow::create_stack_guard_pages() we place the guard pages at an address which we calculate from os::current_stack_base(). os::current_stack_base() calls current_stack_region(), which on Zero just calls pthread_attr_getstack(3), ignoring the previously captured _initial_thread_stack_bottom.

In os::pd_create_stack_guard_pages() we then essentially use both values - os::current_stack_base() and _initial_thread_stack_bottom - to calculate the arguments to get_stack_commited_bottom(). The resulting assert is caused by mixing those conflicting values.

Bottomline, we should either be using pthread_attr_getstack(3) or our _initial_thread_stack_(bottom|size) but not mix them since they may differ. Therefore the proposed patch is correct. The non-zero version of current_stack_region() in os_linux.cpp handles the primordial thread stack correctly.


There are a number of additional beauty spots, like the fact that in get_stack_commited_bottom() the pages variable is 32bit unsigned. So the assert itself is random in that it depends on whether or not the overflow produces a negative overflow.

Also, I don't understand the logic in os::pd_create_stack_guard_pages() where the "fallback" is to call mincore() with what basically would be a zero size? Since both addr and stack_extend are the stack bottom.


I would probably rename the issue to "Cannot invoke JVM in primordial threads on Zero".

Cheers, Thomas

@shipilev shipilev changed the title 8273373: Zero: Handle thread stack sizes with a generic Linux code 8273373: Zero: Cannot invoke JVM in primordial threads on Zero Sep 8, 2021
@shipilev
Copy link
Member Author

shipilev commented Sep 8, 2021

Bottomline, we should either be using pthread_attr_getstack(3) or our _initial_thread_stack_(bottom|size) but not mix them since they may differ. Therefore the proposed patch is correct. The non-zero version of current_stack_region() in os_linux.cpp handles the primordial thread stack correctly.

Yes, that seems to be the gist of it. In fact, if I reply _initial_thread_stack_(bottom|size) in Zero version of current_stack_region() for a primordial thread, gtests also start to pass. But I gather there is no need to continue supporting Zero-specific current_stack_region(), and thus dodge the issue altogether by using the code the rest of VM code uses.

@shipilev
Copy link
Member Author

shipilev commented Sep 8, 2021

I looked around Debian's support for IA64, and I think we don't want to drop the support for it just yet. Zero VM is one of those last stands for weird arches portability :) Instead of adding the IA64 block to a shared code, I shall try to make a more conservative fix to Zero-specific code then...

@tstuefe
Copy link
Member

tstuefe commented Sep 9, 2021

Bottomline, we should either be using pthread_attr_getstack(3) or our _initial_thread_stack_(bottom|size) but not mix them since they may differ. Therefore the proposed patch is correct. The non-zero version of current_stack_region() in os_linux.cpp handles the primordial thread stack correctly.

Yes, that seems to be the gist of it. In fact, if I reply _initial_thread_stack_(bottom|size) in Zero version of current_stack_region() for a primordial thread, gtests also start to pass. But I gather there is no need to continue supporting Zero-specific current_stack_region(), and thus dodge the issue altogether by using the code the rest of VM code uses.

Yes, and zero also needs Goetz' fix for the glibc guard size problem in JDK-8169373.

@tstuefe
Copy link
Member

tstuefe commented Sep 9, 2021

I looked around Debian's support for IA64, and I think we don't want to drop the support for it just yet. Zero VM is one of those last stands for weird arches portability :) Instead of adding the IA64 block to a shared code, I shall try to make a more conservative fix to Zero-specific code then...

I'd be surprised if it still works in JDK 18 though. The latest they build is 15. ia64 breaks so easily, and no one took care of it in the recent past.

@shipilev
Copy link
Member Author

shipilev commented Sep 9, 2021

I'd be surprised if it still works in JDK 18 though. The latest they build is 15. ia64 breaks so easily, and no one took care of it in the recent past.

Yes, but I also want the fix to be backportable to JDK 11. Which means I should probably keep whatever IA64 support there is. I'll figure this out.

@tstuefe
Copy link
Member

tstuefe commented Sep 9, 2021

I'd be surprised if it still works in JDK 18 though. The latest they build is 15. ia64 breaks so easily, and no one took care of it in the recent past.

Yes, but I also want the fix to be backportable to JDK 11. Which means I should probably keep whatever IA64 support there is. I'll figure this out.

Oh right. Makes sense.

@shipilev
Copy link
Member Author

I pushed the most conservative version I could think of now. Debian folks seem to also patch this method for HPPA: it seems to have unusual stack arrangements too, so I left the original method body as untouched as possible. This still makes gtests happier for Zero.

Copy link
Member

@tstuefe tstuefe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@openjdk
Copy link

openjdk bot commented Sep 14, 2021

@shipilev This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8273373: Zero: Cannot invoke JVM in primordial threads on Zero

Reviewed-by: stuefe

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 3 new commits pushed to the master branch:

  • ed7789d: 8256977: Bump minimum GCC from 5.x to 6 for JDK
  • 5bfd043: 8273497: building.md should link to both md and html
  • 3884580: 8273597: Rectify Thread::is_ConcurrentGC_thread()

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Sep 14, 2021
@shipilev
Copy link
Member Author

Thank you!

/integrate

@openjdk
Copy link

openjdk bot commented Sep 14, 2021

Going to push as commit 0f31d0f.
Since your change was applied there have been 7 commits pushed to the master branch:

  • fe89dd3: 8271254: javac generates unreachable code when using empty semicolon statement
  • 8974b95: 8273730: WorkGangBarrierSync constructor unused
  • 1d3eb14: 8273635: Attempting to acquire lock StackWatermark_lock/9 out of order with lock tty_lock/3
  • 31667da: 8273491: java.util.spi.LocaleServiceProvider spec contains statement that is too strict
  • ed7789d: 8256977: Bump minimum GCC from 5.x to 6 for JDK
  • 5bfd043: 8273497: building.md should link to both md and html
  • 3884580: 8273597: Rectify Thread::is_ConcurrentGC_thread()

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Sep 14, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated labels Sep 14, 2021
@openjdk openjdk bot removed the rfr Pull request is ready for review label Sep 14, 2021
@openjdk
Copy link

openjdk bot commented Sep 14, 2021

@shipilev Pushed as commit 0f31d0f.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@shipilev shipilev deleted the JDK-8273373-zero-initial-stack branch September 14, 2021 17:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org integrated Pull request has been integrated
3 participants