Skip to content

8214097: Rework thread initialization and teardown logic #2411

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

sunny868
Copy link
Contributor

@sunny868 sunny868 commented Dec 19, 2023

runtime/InternalApi/ThreadCpuTimesDeadlock.java can trigger exception occasionally on jdk11u fastdebug, the reason is that when fast_cpu_time(Thread *thread) is called, a non-java-thread state is ALLOCATED, so set_osthread (osthread) or osthread ->set_pthread_id (tid) is not yet completed.

14:13:40 #
14:13:40 # A fatal error has been detected by the Java Runtime Environment:
14:13:40 #
14:13:40 #  SIGSEGV (0xb) at pc=0x000000fff454e840, pid=954448, tid=954670
14:13:40 #
14:13:40 # JRE version: OpenJDK Runtime Environment ls-internal (11.0.22+6) (fastdebug build 11.0.22+6-1)
14:13:40 # Java VM: OpenJDK 64-Bit Server VM ls-internal (fastdebug 11.0.22+6-1, mixed mode, tiered, compressed oops, g1 gc, linux-loongarch64)
14:13:40 # Problematic frame:
14:13:40 # C  [libpthread.so.0+0x12840]  pthread_getcpuclockid+0x0
14:13:40 #

so need backport this patch.
GHA and tier2-3 locally has finish, the failed items are not related to this patch.

==============================
Test summary
==============================
   TEST                                              TOTAL  PASS  FAIL ERROR   
   jtreg:test/hotspot/jtreg:tier2                      323   323     0     0   
>> jtreg:test/jdk:tier2                               3480  3478     2     0 <<
   jtreg:test/langtools:tier2                           11    11     0     0   
   jtreg:test/nashorn:tier2                             36    36     0     0   
>> jtreg:test/jaxp:tier2                               443    24   419     0 <<
==============================
==============================                                                  
Test summary                                                                    
==============================                                                  
   TEST                                              TOTAL  PASS  FAIL ERROR    
   jtreg:test/hotspot/jtreg:tier3                       48    48     0     0    
>> jtreg:test/jdk:tier3                               2114  2113     1     0 << 
   jtreg:test/langtools:tier3                            0     0     0     0    
   jtreg:test/nashorn:tier3                              0     0     0     0    
   jtreg:test/jaxp:tier3                                 0     0     0     0    
============================== 

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • JDK-8214097 needs maintainer approval

Issue

  • JDK-8214097: Rework thread initialization and teardown logic (Enhancement - P4 - Rejected)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk11u-dev.git pull/2411/head:pull/2411
$ git checkout pull/2411

Update a local copy of the PR:
$ git checkout pull/2411
$ git pull https://git.openjdk.org/jdk11u-dev.git pull/2411/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 2411

View PR using the GUI difftool:
$ git pr show -t 2411

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk11u-dev/pull/2411.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Dec 19, 2023

👋 Welcome back sunny868! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot changed the title Backport 526f854cc1ef8e88f9abd0b8eb38e418e77ff97e 8214097: Rework thread initialization and teardown logic Dec 19, 2023
@openjdk
Copy link

openjdk bot commented Dec 19, 2023

This backport pull request has now been updated with issue from the original commit.

@openjdk openjdk bot added backport rfr Pull request is ready for review labels Dec 19, 2023
@mlbridge
Copy link

mlbridge bot commented Dec 19, 2023

Webrevs

@sunny868
Copy link
Contributor Author

/approval request Fix Request jdk11u
Applies not cleanly. manually modify src/hotspot/share/runtime/thread.cpp, src/hotspot/share/services/management.cpp. Do not merged test/hotspot/gtest/threadHelper.inline.hpp. test tier1-3 passed.

@openjdk
Copy link

openjdk bot commented Dec 19, 2023

@sunny868
8214097: The approval request has been created successfully.

@openjdk openjdk bot added the approval label Dec 19, 2023
@GoeLin
Copy link
Member

GoeLin commented Dec 19, 2023

Hi,
please only label fix-request if you have a review. Also, your tests are failing badly! SEe also https://wiki.openjdk.org/display/JDKUpdates/How+to+contribute+or+backport+a+fix

@openjdk openjdk bot removed the approval label Dec 19, 2023
@jerboaa
Copy link
Contributor

jerboaa commented Dec 19, 2023

@sunny868 Why are you backporting this patch to OpenJDK 11u? It's an intrusive patch that changes thread initialization which isn't suitable for backport at this stage in the lifecycle of JDK 11. There must be a very good reason to even consider it.

@sunny868
Copy link
Contributor Author

@sunny868 Why are you backporting this patch to OpenJDK 11u? It's an intrusive patch that changes thread initialization which isn't suitable for backport at this stage in the lifecycle of JDK 11. There must be a very good reason to even consider it.

runtime/InternalApi/ThreadCpuTimesDeadlock.java can trigger exception occasionally. I traced the cause:

// src/hotspot/os/linux/os_linux.cpp
// be called by com.sun.jmx.mbeanserver.JmxMBeanServer

6171 static jlong fast_cpu_time(Thread *thread) {                                    
6172     clockid_t clockid;                                                          
6173     int rc = os::Linux::pthread_getcpuclockid(thread->osthread()->pthread_id(), 
6174                                               &clockid);        //  thread->osthread()  maybe NULL or pthread_id() is 0.
6175     if (rc == 0) {                                                              
6176       return os::Linux::fast_thread_cpu_time(clockid);                          
6177     } else {                                                                    
6178       // It's possible to encounter a terminated native thread that failed      
6179       // to detach itself from the VM - which should result in ESRCH.           
6180       assert_status(rc == ESRCH, rc, "pthread_getcpuclockid failed");           
6181       return -1;                                                                
6182     }                                                                           
6183 } 

Here. Line 6173 thread->osthread() is a possibility of being nullptr because the non-java-thread was placed on the _the_list too early.

// share/runtime/thread.cpp
1309 NonJavaThread::NonJavaThread() : Thread(), _next(NULL) {                        
1310   // Add this thread to _the_list.                                              
1311   MutexLockerEx lock(NonJavaThreadsList_lock, Mutex::_no_safepoint_check_flag); 
1312   _next = _the_list._head;                                                      
1313   OrderAccess::release_store(&_the_list._head, this);          // Do early. It should be done after thread state is RUNNED
1314 } 

I see this patch do add thread to _the_list in NonJavaThread::pre_run(), which can make sure the non-java-thread has finished set_osthread and osthread->set_pthread_id on os::create_thread. The correct order is

os::create_thread(Thread* thread ...) {
      thread->set_osthread(osthread);
      pthread_create(&tid, &attr, (void* (*)(void*)) thread_native_entry, thread);
      osthread->set_pthread_id(tid);
      wait until osthread state equal INITIALIZED.
}

thread_native_entry(Thread *thread) {
  osthread->set_state(INITIALIZED);
  wait until os::start_thread() // set thread state RUNNED
  thread->call_run();  // call NonJavaThread::pre_run(), add thread to _the_list
}

@sunny868
Copy link
Contributor Author

Hi, please only label fix-request if you have a review. Also, your tests are failing badly! SEe also https://wiki.openjdk.org/display/JDKUpdates/How+to+contribute+or+backport+a+fix

Thank you for the reminder. I haven't tested the debug mode locally, so it seems that I still need relevant patch support.

// and that happens just before Thread::current is set. No other thread
// can attach as the VM is not created yet, so they can't execute this code.
// If the main thread creates other threads before the barrier set that is an error.
assert(Thread::current_or_null() == NULL, "creating thread before barrier set");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Failed here in fastdebug mode, I'm not sure if this is correct

@jerboaa
Copy link
Contributor

jerboaa commented Dec 19, 2023

@sunny868 Why are you backporting this patch to OpenJDK 11u? It's an intrusive patch that changes thread initialization which isn't suitable for backport at this stage in the lifecycle of JDK 11. There must be a very good reason to even consider it.

runtime/InternalApi/ThreadCpuTimesDeadlock.java can trigger exception occasionally.

I'm sorry, but this isn't a good enough reason to accept the risk of doing such an invasive backport.

@bridgekeeper
Copy link

bridgekeeper bot commented Jan 17, 2024

@sunny868 This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@bridgekeeper
Copy link

bridgekeeper bot commented Feb 14, 2024

@sunny868 This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

@bridgekeeper bridgekeeper bot closed this Feb 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport rfr Pull request is ready for review
Development

Successfully merging this pull request may close these issues.

3 participants