Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add JFR ThreadCPULoad event implementation #6053

Merged

Conversation

AlexanderScherbatiy
Copy link
Contributor

This is a proposal for JFR ThreadCPULoad event implementation in GraalVM.

The ThreadCPULoad event is emitted from two places in HotSpot: Thread exit and JFR periodic events.
The former ThreadCPULoad event is created for the current thread, the latter one for all VM threads.

The proposed JFR ThreadCPULoad event implementation is based on HotSpot jfrThreadCPULoadEvent code and is made as close as possible to the original one.

  • JfrNativeEventWriter.putFloat(...) method is added to put float user/system mode CPU load events values.
  • cpuTime/userTime/wallClockTime thread local variables are added to JfrThreadLocal class
  • wallClockTime initialization is added into JfrThreadLocal.beforeThreadStart() method (wallClockTime variable is initialized in the jfrThreadLocal class constructor in HotSpot).
  • IsolateThreadConsumer interface and iterateIsolateThreads(IsolateThreadConsumer) method are added to PlatformThreads class to emit ThreadCPULoad periodic events for all threads.
  • user ThreadCPULoad value is zero in current implementation because GraalVM LinuxThreadCpuTimeSupport.getThreadCpuTime(OSThreadHandle osThreadHandle, boolean includeSystemTime) method does not support user thread cpu time.

Example of ThreadCPULoad events for a Java program with 2 threads which reads and writes data into a RandomAccessFile.

HotSpot:

jdk.ThreadCPULoad {
  startTime = 21:24:14.292 (2023-02-22)
  user = 15.28%
  system = 0.53%
  eventThread = "C2 CompilerThread1" (javaThreadId = 14)
}

jdk.ThreadCPULoad {
  startTime = 21:24:17.125 (2023-02-22)
  user = 11.61%
  system = 11.31%
  eventThread = "Hello Thread: 1" (javaThreadId = 27)
}

jdk.ThreadCPULoad {
  startTime = 21:24:17.140 (2023-02-22)
  user = 11.02%
  system = 12.18%
  eventThread = "Hello Thread: 0" (javaThreadId = 26)
}

jdk.ThreadCPULoad {
  startTime = 21:24:17.140 (2023-02-22)
  user = 4.76%
  system = 0.54%
  eventThread = "main" (javaThreadId = 1)
}

GraalVM:
Note: ThreadCPULoad event user field has zero value.

jdk.ThreadCPULoad {
  startTime = 21:25:15.030 (2023-02-22)
  user = 0.00%
  system = 23.81%
  eventThread = "Hello Thread: 0" (javaThreadId = 36)
}

jdk.ThreadCPULoad {
  startTime = 21:25:15.033 (2023-02-22)
  user = 0.00%
  system = 23.59%
  eventThread = "Hello Thread: 1" (javaThreadId = 37)
}

jdk.ThreadCPULoad {
  startTime = 21:25:15.033 (2023-02-22)
  user = 0.00%
  system = 0.00%
  eventThread = "main" (javaThreadId = 1)
}

jdk.ThreadCPULoad {
  startTime = 21:25:15.033 (2023-02-22)
  user = 0.00%
  system = 25.00%
  eventThread = "JFR Shutdown Hook" (javaThreadId = 35)
}

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Feb 22, 2023
@fniephaus
Copy link
Member

@roberttoyonaga could you take a look at this please? #5410.

Copy link
Collaborator

@roberttoyonaga roberttoyonaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes seem good to me. I wonder if the Javadaoc for com.oracle.svm.core.posix.linux.LinuxThreadCpuTimeSupport.getThreadCpuTime(OSThreadHandle, bool) is a bit misleading since it does not actually account for user time at all.

@AlexanderScherbatiy
Copy link
Contributor Author

Could you review the updated fix?
The mayBeInlined = true is removed from @Uninterruptible annotation for IsolateThreadConsumer.accept(IsolateThread) method.

@roberttoyonaga
Copy link
Collaborator

The mayBeInlined = true is removed from @Uninterruptible annotation for IsolateThreadConsumer.accept(IsolateThread) method.

Yes I think it's fine to remove mayBeInlined = true. I don't think it matters too much in this case because IsolateThreadConsumer.accept(IsolateThread) is only called from one method (which is itself uninterruptible and will ensure all callees are also uninterruptible).

Generally, it's more important to ensure that the method that acquires and releases the lock (the one that defines the critical section) never gets inlined. Otherwise, if it's inlined into a interruptible code, it becomes interruptible as well. This is already the case for iterateIsolateThreads(IsolateThreadConsumer), so everything should be ok.

@oracle-contributor-agreement
Copy link

Thank you for your pull request and welcome to our community! To contribute, please sign the Oracle Contributor Agreement (OCA).
The following contributors of this PR have not signed the OCA:

To sign the OCA, please create an Oracle account and sign the OCA in Oracle's Contributor Agreement Application.

When signing the OCA, please provide your GitHub username. After signing the OCA and getting an OCA approval from Oracle, this PR will be automatically updated.

If you are an Oracle employee, please make sure that you are a member of the main Oracle GitHub organization, and your membership in this organization is public.

@oracle-contributor-agreement oracle-contributor-agreement bot added OCA Required At least one contributor does not have an approved Oracle Contributor Agreement. and removed OCA Verified All contributors have signed the Oracle Contributor Agreement. labels Mar 14, 2023
@oracle-contributor-agreement oracle-contributor-agreement bot added OCA Verified All contributors have signed the Oracle Contributor Agreement. and removed OCA Required At least one contributor does not have an approved Oracle Contributor Agreement. labels Mar 15, 2023
@AlexanderScherbatiy
Copy link
Contributor Author

Merge conflicts with following fixes are resolved:

  • Added more documentation and changed the locking concept. 39d9907
  • Fix JFR teardown. 1081ab5

The TestThreadCPULoadEvent is updated according to the fix More fixes and refactorings. 51d19cb

  • parent test is changed from JfrTest to JfrRecordingTest
  • abstract validateEvents(List<RecordedEvent> events) method is overridden

@roberttoyonaga
Copy link
Collaborator

Hi @christianhaeubl, this PR seems ok to me. When you have time, can you see if it's ready for integration?

@AlexanderScherbatiy
Copy link
Contributor Author

I built the fix with the latest graal changes and found that the AbstractJfrTest had been updated by Refactored and fixed issues in the JFR test cases. dea8f7f

I merged the fix with the master and updated the TestThreadCPULoadEvent:

  • removed getTestedEvents() method
  • added startRecording(...)/stopRecording(...) to the test method

@christianhaeubl christianhaeubl self-assigned this Apr 25, 2023
Copy link
Member

@christianhaeubl christianhaeubl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, I added a few comments.

@@ -125,6 +131,7 @@ public void teardown() {
public void beforeThreadStart(IsolateThread isolateThread, Thread javaThread) {
if (SubstrateJVM.get().isRecording()) {
SubstrateJVM.getThreadRepo().registerThread(javaThread);
ThreadCPULoadEvent.initializeWallClockTime(isolateThread);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code will only be executed for threads that are started while JFR recording is active. So, this won't work for threads that are already running when JFR recording is started.

Does this value need to be reset when JFR recording is stopped and started again? Or is it sufficient to just set the value once?

Copy link
Contributor Author

@AlexanderScherbatiy AlexanderScherbatiy May 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also added the current time initialization to the JfrThreadRepository.registerRunningThreads() method to to init the time when JFR begins a recording operation.
The ThreadCPULoadEvent.initCurrentTime() method is updated to init the current time only the previous time has not been already set.

I would say that the current time value need not to be reset when JFR recording is stopped and started again. In this way the total available time is calculated for the whole period of time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a look at the HotSpot sources and as far as I can see, the value is only reset when the event is emitted. I changed your code accordingly.


@Uninterruptible(reason = "Called from uninterruptible code.", mayBeInlined = true)
private static int getProcessorCount() {
int currProcessorCount = Jvm.JVM_ActiveProcessorCount();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the returned value take the container support into account? If so, then it is necessary to call Containers.activeProcessorCount().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JDK JfrThreadCPULoadEvent::get_processor_count() method calls os::active_processor_count() which includes fixes on linux:
8140793 getAvailableProcessors may incorrectly report the number of cpus in Docker container
6515172 Runtime.availableProcessors() ignores Linux taskset command

It looks like the Containers.activeProcessorCount() needs to be used to emit ThreadCPULoad event.

What I see the Containers.activeProcessorCount() is not uninterruptible:

What is the right way to call interruptible Containers.activeProcessorCount() method from an uninterruptible code?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be good enough to call and cache the result of Containers.activeProcessorCount() from interruptible code before you enqueue the VMoperation. I think this should be fine because the processor count you get will still be with respect to the interval of time you are calculating the thread CPU load over. The difference is that you poll somewhere in the middle of the interval instead of at the end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should solve the case where the event is emitted from the interruptible EndChunkNativePeriodicEvents.emit() path.
It is still not clear for me what can be done with the case where the event is emitted from the pure uninterruptible JfrThreadLocal.afterThreadExit(IsolateThread, Thread) method.

public void afterThreadExit(IsolateThread isolateThread, Thread javaThread) {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, Containers.activeProcessorCount() is currently not implemented in a way that it can be called from uninterruptible code. We will need to change that at some point. For now, I added a comment.

@christianhaeubl
Copy link
Member

Thanks for the changes. Your PR should get merged in the next few days.

@graalvmbot graalvmbot merged commit 7280fe8 into oracle:master May 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OCA Verified All contributors have signed the Oracle Contributor Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants