Add JFR ThreadCPULoad event implementation #6053
@roberttoyonaga could you take a look at this please? #5410.
These changes seem good to me. I wonder if the Javadoc for com.oracle.svm.core.posix.linux.LinuxThreadCpuTimeSupport.getThreadCpuTime(OSThreadHandle, boolean) is a bit misleading, since the method does not actually account for user time at all.
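For comparison, on a standard JVM the user/system split discussed here can be observed with the java.lang.management.ThreadMXBean API: the total thread CPU time minus the user-mode time gives the system-mode time. The following is a minimal sketch, not SubstrateVM code; the class and method names are made up for illustration, and clamping the system time to zero is an assumption made because the two clocks may be read at slightly different moments or resolutions.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuTimeSplit {
    /**
     * Returns {totalCpuNanos, userNanos, systemNanos} for the current thread,
     * or null if thread CPU time measurement is unsupported or disabled.
     */
    static long[] currentThreadCpuSplit() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported()) {
            return null;
        }
        long total = bean.getCurrentThreadCpuTime(); // user + system mode
        long user = bean.getCurrentThreadUserTime(); // user mode only
        if (total < 0 || user < 0) {
            return null; // -1 means CPU time measurement is disabled
        }
        // Clamp: the two values are read separately, so user may slightly exceed total.
        long system = Math.max(0, total - user);
        return new long[] {total, user, system};
    }

    public static void main(String[] args) {
        long[] split = currentThreadCpuSplit();
        if (split == null) {
            System.out.println("thread CPU time not supported");
        } else {
            System.out.printf("total=%d user=%d system=%d%n", split[0], split[1], split[2]);
        }
    }
}
```

An implementation that cannot measure user time separately effectively reports user = 0, which is why the user load ends up as zero.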
Could you review the updated fix?
Yes, I think it's fine to remove … Generally, it's more important to ensure that the method that acquires and releases the lock (the one that defines the critical section) never gets inlined. Otherwise, if it is inlined into interruptible code, it becomes interruptible as well. This is already the case for …
Thank you for your pull request and welcome to our community! To contribute, please sign the Oracle Contributor Agreement (OCA). To sign the OCA, please create an Oracle account and sign the OCA in Oracle's Contributor Agreement Application. When signing the OCA, please provide your GitHub username. After signing the OCA and getting an OCA approval from Oracle, this PR will be automatically updated. If you are an Oracle employee, please make sure that you are a member of the main Oracle GitHub organization, and your membership in this organization is public.
Branch updated from 4f6ef90 to a1d2f03.
Resolved the merge conflicts with the following fixes: TestThreadCPULoadEvent is updated according to the fix.
Hi @christianhaeubl, this PR seems OK to me. When you have time, can you check whether it's ready for integration?
I built the fix with the latest Graal changes and found that the … I merged the fix with master and updated the …
Thanks for the PR, I added a few comments.
substratevm/src/com.oracle.svm.test/src/com/oracle/svm/test/jfr/TestThreadCPULoadEvent.java
@@ -125,6 +131,7 @@ public void teardown() {
public void beforeThreadStart(IsolateThread isolateThread, Thread javaThread) {
if (SubstrateJVM.get().isRecording()) {
SubstrateJVM.getThreadRepo().registerThread(javaThread);
ThreadCPULoadEvent.initializeWallClockTime(isolateThread);
This code will only be executed for threads that are started while JFR recording is active. So, this won't work for threads that are already running when JFR recording is started.
Does this value need to be reset when JFR recording is stopped and started again? Or is it sufficient to just set the value once?
I also added the current-time initialization to the JfrThreadRepository.registerRunningThreads() method, so that the time is initialized when JFR starts a recording.
The ThreadCPULoadEvent.initCurrentTime() method is updated to initialize the current time only if the previous time has not already been set.
I would say that the current time value does not need to be reset when JFR recording is stopped and started again. This way, the total available time is calculated over the whole period.
I had a look at the HotSpot sources and as far as I can see, the value is only reset when the event is emitted. I changed your code accordingly.
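The initialization rules discussed in this thread (set the wall-clock baseline once if it is still unset, and reset it only when an event is emitted) can be sketched as follows. This is a hypothetical, simplified class: ThreadCpuLoadState, initWallClockTimeIfUnset and takeElapsedWallClock are invented names, not the actual JfrThreadLocal or ThreadCPULoadEvent API, and using 0 as the "unset" marker is an assumption.

```java
/** Hypothetical per-thread state; not the SubstrateVM JfrThreadLocal class. */
public class ThreadCpuLoadState {
    // Wall-clock time (ns) of the last emitted event; 0 is assumed to mean "not set yet".
    private long wallClockTime;

    /** Called at thread start, or for already-running threads when a recording starts. */
    void initWallClockTimeIfUnset(long nowNanos) {
        if (wallClockTime == 0) {
            wallClockTime = nowNanos;
        }
    }

    /**
     * Called only when an event is emitted: returns the wall-clock time elapsed since
     * the previous baseline and moves the baseline to nowNanos.
     */
    long takeElapsedWallClock(long nowNanos) {
        long elapsed = nowNanos - wallClockTime;
        wallClockTime = nowNanos;
        return elapsed;
    }

    long wallClockTime() {
        return wallClockTime;
    }
}
```

Because stopping a recording does not touch the baseline, a later recording keeps measuring from the last emission, and calling the initializer again for a running thread has no effect.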
...src/com.oracle.svm.core/src/com/oracle/svm/core/jfr/events/EndChunkNativePeriodicEvents.java (outdated)
substratevm/src/com.oracle.svm.test/src/com/oracle/svm/test/jfr/TestThreadCPULoadEvent.java (outdated)
substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/jfr/events/ThreadCPULoadEvent.java (outdated)
substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/thread/PlatformThreads.java (outdated)
substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/jfr/events/ThreadCPULoadEvent.java (outdated)
@Uninterruptible(reason = "Called from uninterruptible code.", mayBeInlined = true)
private static int getProcessorCount() {
int currProcessorCount = Jvm.JVM_ActiveProcessorCount();
Should the returned value take container support into account? If so, then it is necessary to call Containers.activeProcessorCount().
In the JDK, the JfrThreadCPULoadEvent::get_processor_count() method calls os::active_processor_count(), which includes the following fixes on Linux:
- 8140793 getAvailableProcessors may incorrectly report the number of cpus in Docker container
- 6515172 Runtime.availableProcessors() ignores Linux taskset command
So it looks like Containers.activeProcessorCount() needs to be used to emit the ThreadCPULoad event.
As far as I can see, however, Containers.activeProcessorCount() is not uninterruptible:
graal/substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/Containers.java, line 67 in 4e04678:
public static int activeProcessorCount() {
What is the right way to call the interruptible Containers.activeProcessorCount() method from uninterruptible code?
Maybe it would be good enough to call Containers.activeProcessorCount() and cache the result from interruptible code before you enqueue the VM operation. I think this should be fine because the processor count you get still falls within the interval of time over which you are calculating the thread CPU load. The difference is that you poll somewhere in the middle of the interval instead of at the end.
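A minimal sketch of this suggestion, using hypothetical names: the processor count is read once in interruptible code and captured by the per-thread callback, so the code that iterates over the threads never calls the interruptible method. ThreadConsumer, iterateThreads and emitPeriodic are illustrative stand-ins for IsolateThreadConsumer and PlatformThreads.iterateIsolateThreads(...), threads are modeled as plain names, and Runtime.availableProcessors() replaces Containers.activeProcessorCount() so the example runs on a standard JVM.

```java
import java.util.List;

public class PeriodicCpuLoadEmitter {
    /** Hypothetical stand-in for the IsolateThreadConsumer interface. */
    @FunctionalInterface
    interface ThreadConsumer {
        void accept(String threadName);
    }

    /**
     * Hypothetical stand-in for PlatformThreads.iterateIsolateThreads(...). In SubstrateVM
     * this would run inside a VM operation, where interruptible calls are not allowed.
     */
    static void iterateThreads(List<String> threads, ThreadConsumer consumer) {
        for (String thread : threads) {
            consumer.accept(thread);
        }
    }

    static void emitPeriodic(List<String> threads, StringBuilder log) {
        // Interruptible part: read the processor count once, before the iteration starts.
        final int processorCount = Runtime.getRuntime().availableProcessors();

        // Iteration part: only the cached value is used; nothing interruptible is called.
        iterateThreads(threads, t -> log.append(t).append(':').append(processorCount).append('\n'));
    }
}
```

All events from one periodic pass share the same processor count, which matches the reasoning above that polling once per interval is acceptable.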
This should solve the case where the event is emitted from the interruptible EndChunkNativePeriodicEvents.emit() path.
It is still not clear to me what can be done in the case where the event is emitted from the purely uninterruptible JfrThreadLocal.afterThreadExit(IsolateThread, Thread) method.
graal/substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/jfr/JfrThreadLocal.java, line 134 in 16913ce:
public void afterThreadExit(IsolateThread isolateThread, Thread javaThread) {
Unfortunately, Containers.activeProcessorCount() is currently not implemented in a way that allows it to be called from uninterruptible code. We will need to change that at some point. For now, I added a comment.
substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/jfr/JfrThreadLocal.java (outdated)
Thanks for the changes. Your PR should get merged in the next few days.
This is a proposal for JFR ThreadCPULoad event implementation in GraalVM.
In HotSpot, the ThreadCPULoad event is emitted from two places: on thread exit and as a JFR periodic event.
The former creates a ThreadCPULoad event for the current thread, the latter creates one for every VM thread.
The proposed JFR ThreadCPULoad event implementation is based on HotSpot jfrThreadCPULoadEvent code and is made as close as possible to the original one.
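The core of the load calculation can be sketched as a pure function: the user and system CPU time consumed since the previous event are each divided by the total available time, i.e. the elapsed wall-clock time multiplied by the processor count. This is a simplified sketch under that assumption, not a port of the HotSpot or GraalVM code; ThreadCpuLoad, Load and compute are hypothetical names, and any edge-case handling in the real implementations is omitted.

```java
public final class ThreadCpuLoad {
    /** User and system load as fractions of the total available CPU time. */
    record Load(double user, double system) {}

    /**
     * @param cpuDelta       total (user + system) thread CPU time since the last event, in ns
     * @param userDelta      user-mode thread CPU time since the last event, in ns
     * @param wallDelta      wall-clock time since the last event, in ns
     * @param processorCount number of processors available to the process
     */
    static Load compute(long cpuDelta, long userDelta, long wallDelta, int processorCount) {
        long totalAvailable = wallDelta * processorCount;
        if (totalAvailable <= 0) {
            return new Load(0.0, 0.0); // no elapsed time: nothing meaningful to report
        }
        // System time is derived as total minus user; clamp in case the clocks disagree.
        long systemDelta = Math.max(0, cpuDelta - userDelta);
        return new Load((double) userDelta / totalAvailable, (double) systemDelta / totalAvailable);
    }
}
```

For example, compute(500, 300, 1000, 2) yields a user load of 0.15 and a system load of 0.1. Passing userDelta = 0 attributes all measured CPU time to the system load in this sketch, so compute(500, 0, 1000, 2) yields 0.0 and 0.25.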
- The JfrNativeEventWriter.putFloat(...) method is added to write the float user/system mode CPU load event values.
- cpuTime/userTime/wallClockTime thread-local variables are added to the JfrThreadLocal class.
- wallClockTime initialization is added to the JfrThreadLocal.beforeThreadStart() method (in HotSpot, the wallClockTime variable is initialized in the JfrThreadLocal class constructor).
- The IsolateThreadConsumer interface and the iterateIsolateThreads(IsolateThreadConsumer) method are added to the PlatformThreads class to emit ThreadCPULoad periodic events for all threads.
- The user ThreadCPULoad value is zero in the current implementation because the GraalVM LinuxThreadCpuTimeSupport.getThreadCpuTime(OSThreadHandle osThreadHandle, boolean includeSystemTime) method does not support user thread CPU time.

Example of ThreadCPULoad events for a Java program with 2 threads that read and write data to a RandomAccessFile:
HotSpot:
GraalVM:
Note: the user field of the GraalVM ThreadCPULoad event is zero.