Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GC Pause time metric includes concurrent cycle time for Generational ZGC #4764

Closed
rreimann opened this issue Feb 21, 2024 · 1 comment
Closed
Assignees
Labels
bug A general bug module: micrometer-core An issue that is related to our core module
Milestone

Comments

@rreimann
Copy link

Describe the bug
For Generational ZGC in Java 21 the jvm.gc.pause metric doesn't only measure actual pause times but also concurrent GC cycle times.

The gcName(s) provided by GarbageCollectorMXBean / GarbageCollectionNotificationInfo for Generational ZGC are different from the gcName(s) for single-generation ZGC...

gcNames for ZGC:

  • ZGC Pauses
  • ZGC Cycles

gcNames for GenZGC:

  • ZGC Minor Pauses
  • ZGC Major Pauses
  • ZGC Minor Cycles
  • ZGC Major Cycles

The condition "ZGC Cycles".equals(name) to detect concurrent phases for ZGC in JvmMemory#isConcurrentPhase doesn't work for the above mentioned gcNames for GenZGC. Replacing aforementioned condition with something like (name.startsWith("ZGC") && !name.endsWith("Pauses")) should IMHO work for both variants of ZGC.

Environment

  • Micrometer version: 1.12.2
  • Micrometer registry: prometheus
  • OS: macOS
  • Java version: openjdk version "21.0.2" 2024-01-16 LTS

To Reproduce
Run the following code with the JVM options -XX:+UseZGC -XX:+ZGenerational ...

import java.lang.management.ManagementFactory;

public class ExampleGarbageCollectorMXBean {
    public static void main(String[] args) {
        // Run 100 GCs
        for (int i = 0; i < 100; i++) {
            System.gc();
        }

        // Print basic information from available beans
        for (final var bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(bean.getName());
            System.out.println("   Count: " + bean.getCollectionCount());
            System.out.println("   Total Time: " + bean.getCollectionTime() + "ms");
            System.out.println("   Average Time: " + (bean.getCollectionTime() / (double)bean.getCollectionCount()) + "ms");
        }
    }
}

... which should result in an output like this:

ZGC Minor Cycles
   Count: 0
   Total Time: 0ms
   Average Time: NaNms
ZGC Minor Pauses
   Count: 0
   Total Time: 0ms
   Average Time: NaNms
ZGC Major Cycles
   Count: 101
   Total Time: 2292ms
   Average Time: 22.693069306930692ms
ZGC Major Pauses
   Count: 805
   Total Time: 3ms
   Average Time: 0.0037267080745341614ms

Expected behavior
For Generational ZGC ...

  • the time measured by jvm.gc.pause should be the same as ZGC Minor Pauses + ZGC Major Pauses from the output
  • the time measured by jvm.gc.concurrent.phase.time should be the same as ZGC Minor Cycles + ZGC Major Cycles from the output
@shakuzen shakuzen added bug A general bug module: micrometer-core An issue that is related to our core module and removed waiting-for-triage labels Feb 22, 2024
@shakuzen shakuzen added this to the 1.12.x milestone Feb 22, 2024
@shakuzen shakuzen changed the title GC Pause time metric includes concurrent cycle time for Generational ZGC in Java 21 GC Pause time metric includes concurrent cycle time for Generational ZGC Feb 28, 2024
@shakuzen
Copy link
Member

Thanks for reporting the issue with all the details. I missed this when working on #4258, and although we have a test that, if working, should have caught this, it's disabled because GC can happen before JvmGcMetrics is registered and begins tracking GC metrics:

@Test
@Disabled("Garbage collection can happen before JvmGcMetrics are registered, making our metrics not match overall counts/timings")
// available for some platforms and distributions earlier, but broadest availability
// in an LTS is 17
@EnabledForJreRange(min = JRE.JAVA_17)
void gcTimingIsCorrect() {
System.gc();
long pausePhaseCount = 0;
long pauseTimeMs = 0;
long concurrentPhaseCount = 0;
long concurrentTimeMs = 0;
for (GarbageCollectorMXBean mbean : ManagementFactory.getGarbageCollectorMXBeans()) {
if (mbean.getName().contains("Pauses")) {
pausePhaseCount += mbean.getCollectionCount();
pauseTimeMs += mbean.getCollectionTime();
}
else if (mbean.getName().contains("Cycles")) {
concurrentPhaseCount += mbean.getCollectionCount();
concurrentTimeMs += mbean.getCollectionTime();
}
System.out
.println(mbean.getName() + " (" + mbean.getCollectionCount() + ") " + mbean.getCollectionTime() + "ms");
}
checkPhaseCount(pausePhaseCount, concurrentPhaseCount);
checkCollectionTime(pauseTimeMs, concurrentTimeMs);
}

I'll improve that test so it can run at least with some collectors and make sure this doesn't regress.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A general bug module: micrometer-core An issue that is related to our core module
Projects
None yet
Development

No branches or pull requests

2 participants