Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Defer initialization of JGroups after logging is set up by Quarkus #29131

Merged

Conversation

ahus1
Copy link
Contributor

@ahus1 ahus1 commented Apr 29, 2024

Closes #29129

Closes keycloak#29129

Signed-off-by: Alexander Schwartz <aschwart@redhat.com>
@ahus1 ahus1 self-assigned this Apr 29, 2024
Copy link
Contributor

@vmuzikar vmuzikar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ahus1 Thank you for the PR! The change makes sense to me but let's wait for @mabartos as he has more context about the original change.

@vmuzikar vmuzikar requested a review from mabartos April 29, 2024 08:33
This was referenced Apr 29, 2024
@ahus1 ahus1 marked this pull request as ready for review April 29, 2024 09:02
@ahus1 ahus1 requested review from a team as code owners April 29, 2024 09:02
@mabartos
Copy link
Contributor

@ahus1 Thanks for the investigation around it, good job!

This is unfortunate that these 'revert' changes are required, as they removed the no-work CPU gap as shown here (don't focus on the red circle, but rather on the CPU utilization during the time ~2.3s-4.2s):
image

It was expected that the CPU usage would be higher because these tasks for different threads leverage more parallel executions than in the figure above.

However, as you mentioned, the issue is more critical with the trace logs, assumedly written into the DelayedHandler.

I'm just wondering if we have some possibilities to avoid providing changes in this PR.
What about setting the log level for the logger in the org.jgroups.stack.Protocol class before its initialization and avoiding using the trace level?

At the time of executing the build step, we already have initialized the whole configuration, so if the trace level for JGroups is required, we can allow it.

To summarize it, we could:

  1. Check if the trace level for JGroups is required from the configuration, otherwise, change the level to info or whatever
  2. Set the level for the org.jgroups.stack.Protocol log via its config option?
  3. Start the new cache thread
  4. Do not move the build step into the RUNTIME_INIT phase :)

@ahus1 I haven't tried it, but it might be feasible, right?

@ahus1
Copy link
Contributor Author

ahus1 commented May 14, 2024

@mabartos - the pause you see, is that maybe the pause where the discovery of ISPN takes place? It only applies when the first node is started.

My usual comment about optimizations applies here: If it makes things more fragile and complex, it is a trade-off with maintainability, and I'd rather not do it. Even if we fix it for the logging of the protocol, we might miss other locations, as the JGroups library is also "optimizing".

@pruivo - is the a way using the regular Java APIs to get hold of the instance of org.jgroups.protocols.TP to set the log level? If that is possible, one would also need to verify what kind of log framework JGroups detects here, as the Slf4j implementation doesn't allow for setting the log level and will throw an exception :-(

@pruivo
Copy link
Contributor

pruivo commented May 14, 2024

@ahus1 after the CacheManager is instantiated:

var t = GlobalComponentRegistry.componentOf(cacheManager, Transport.class);
((JGroupsTransport)t).getChannel().getProtocolStack().getTransport().isTrace(false);

@mabartos
Copy link
Contributor

@mabartos - the pause you see, is that maybe the pause where the discovery of ISPN takes place? It only applies when the first node is started.

Yep, AFAIK, during that phase, the cluster is analyzed and checks the existence of the coordinator. If there is any, the node will become the coordinator after exceeding the time. So, it should apply only to the first node, as you've mentioned.

Perhaps it'd be really better to stick with these changes you've provided, as the CPU gap for the first node is not a big deal. It'll probably be better in terms of maintainability, as you've mentioned.

Copy link
Contributor

@mabartos mabartos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my previous comment, I'm ok with these changes.

@vmuzikar vmuzikar merged commit 701e49e into keycloak:main May 14, 2024
65 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JGroups creates log messages as it switched internally to "trace"
4 participants