Java route harvesting causes prolonged freezes in Java apps with a huge number of symbols #2301
Replies: 3 comments
Ouch, that doesn't look good. I'm going to turn it off by default until we can investigate it. I'm guessing the holdup is related to how long it takes for OBI to pull down the data, and that holds up the JVM. I had seen this kind of delay once in a test k8s cluster, but I was never able to reproduce it. We probably need another approach.
This should indeed be filed as a bug. I have created #2304 to track it. @miloszivkovic, please take a look at that issue and add any details you can about what caused the freeze. Ideally, a way to reproduce it would be very helpful.
Awesome, thanks @MrAlias and @grcevski. The issue describes things perfectly; I don't have anything to add. We could probably find a publicly available beefy Java app (Kafka, maybe?), enable safepoint logging and see how long the pause lasts. The log I pasted in my original post is from when I triggered the command manually, so I don't expect that OBI reading the output would make it any better. Either way, I think an alternative approach is needed. @grcevski, I see you picked this up, many thanks! If help is needed, let me know.
Hello folks,
I don't think this should be submitted as a bug, so I'm opening a discussion instead. I hope that's okay.
For an app where jcmd <PID> VM.symboltable -verbose prints hundreds of thousands of lines, Java route harvesting takes way too long. This is a stop-the-world event, and the app is fully unresponsive for its whole duration. I've observed freezes lasting anywhere between 20 and 100 seconds. For example:
[150.815s][info ][safepoint ] Safepoint "DumpHashtable", Time since last: 87138853667 ns, Reaching safepoint: 102250 ns, At safepoint: 19877671666 ns, Leaving safepoint: 1486542 ns, Total: 19879260458 ns, Threads: 0 runnable, 781 total

This behavior (and the configuration around route harvesting) should at least be documented well, and if possible we should figure out a different way to do it, because this is probably not viable for production apps going forward.
I'll give it some thought and maybe follow up here; for now I just wanted to suggest documenting this in detail and to start a discussion about possible alternatives. I'm no expert by any means, so I don't want to propose half-baked solutions at the moment (if there are any alternatives at all).