Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-38692] Add extra logging to help diagnosing Thread spike. #116

Merged
merged 2 commits into from Oct 25, 2016

Conversation

@jtnord
Copy link
Member

commented Oct 6, 2016

JENKINS-38692 Add extra logging statements to try and get to the root cause
of the IOHub thread spike.

@reviewbybees esp @stephenc

[JENKINS-38692] Add extra logging to help diagnosing Thread spike.
Add extra logging statements to try and get to the root cause
of the IOHub thread spike.
@jtnord

This comment has been minimized.

Copy link
Member Author

commented Oct 6, 2016

pending running through the stress test benchmark to check for performance regressions.

@jtnord

This comment has been minimized.

Copy link
Member Author

commented Oct 6, 2016

Stress test results
Fix uses more RAM, but has a lower VM load.

@@ -292,6 +292,10 @@ public final void removeInterestConnect(SelectionKey key) {
* @param key the key.
*/
public final void addInterestRead(SelectionKey key) {
if (LOGGER.isLoggable(Level.FINEST)) {

This comment has been minimized.

Copy link
@stephenc

stephenc Oct 6, 2016

Member

I thought you were going to pull this out to a class level constant initialized on classloading so that hotspot can reduce to no-op

This comment has been minimized.

Copy link
@batmat

batmat Oct 15, 2016

Member

I wouldn't have added the if here. It's adding verbosity for no reason in that case. I suppose you added that because of the TODO below...

This comment has been minimized.

Copy link
@jtnord

jtnord Oct 24, 2016

Author Member

@stephenc so the performance without inlining didn't seem to bad - and if we leave it as is then you can use the standard Jenkins UI (or JMX for agents) to change the logging level

This comment has been minimized.

Copy link
@jtnord

jtnord Oct 25, 2016

Author Member

I suppose you added that because of the TODO below...

yes - easier to do it now and know the performance hit - also prevents someone forgetting to do it later.

@stephenc

This comment has been minimized.

Copy link
Member

commented Oct 6, 2016

Single measurement results

agent-load-test --protocol JNLP4-plaintext --warmup 60 --collect 180 on OS-X, java version "1.8.0_102"

CPU load before: 1.53
CPU load after: 1.49

Avg Memory before: 195592±43263
Avg Memory after: 190434±41279

No significant difference observed... I'll need to collect a few more runs to get some measure on CPU load effects

@oleg-nenashev

This comment has been minimized.

Copy link
Member

commented Oct 9, 2016

I think this PR is fine. More diagnostics rarely hurts, and additional condition check should not cause a significant performance degradation. So 👍 form me. BTW, it's going into remoting 3.1+ IMHO

@amuniz

This comment has been minimized.

Copy link
Member

commented Oct 10, 2016

👍

@oleg-nenashev

This comment has been minimized.

Copy link
Member

commented Oct 14, 2016

Retriggering

@batmat
batmat approved these changes Oct 15, 2016
Copy link
Member

left a comment

🐝

@@ -292,6 +292,10 @@ public final void removeInterestConnect(SelectionKey key) {
* @param key the key.
*/
public final void addInterestRead(SelectionKey key) {
if (LOGGER.isLoggable(Level.FINEST)) {

This comment has been minimized.

Copy link
@batmat

batmat Oct 15, 2016

Member

I wouldn't have added the if here. It's adding verbosity for no reason in that case. I suppose you added that because of the TODO below...

@@ -480,7 +496,11 @@ public final void run() {
* Process the scheduled tasks list.
*/
private void processScheduledTasks() {
if (scheduledTasks.size() > 4) {
final int tasksWaiting = scheduledTasks.size();
if (LOGGER.isLoggable(Level.FINEST)) {

This comment has been minimized.

Copy link
@batmat

This comment has been minimized.

Copy link
@jtnord

jtnord Oct 25, 2016

Author Member

test saves auto-boxing and GC churn

@stephenc

This comment has been minimized.

Copy link
Member

commented Oct 15, 2016

@batmat those if statements are critical for performance as this is hot code and they avoid constructing the new Object[]

@batmat

This comment has been minimized.

Copy link
Member

commented Oct 15, 2016

@stephenc my comment wasn't blindly targeting all ifs. I'm aware of what you say. The ones I commented on weren't constructing objects. Most here actually currently only pass references and don't construct anything.

@stephenc

This comment has been minimized.

Copy link
Member

commented Oct 15, 2016

Still having to set up the stack frame to pass through... the testing I did on a few of these showed a negative impact where the if was removed... there was negligible difference vs where I moved the LOGGER.isLoggable(Level.FINEST) to a class level final boolean though

@jtnord

This comment has been minimized.

Copy link
Member Author

commented Oct 24, 2016

@reviewbybees done - @oleg-nenashev over to you.

@oleg-nenashev

This comment has been minimized.

Copy link
Member

commented Oct 24, 2016

is it still WiP? I doubt

@oleg-nenashev oleg-nenashev merged commit 46c72a5 into jenkinsci:master Oct 25, 2016

1 of 2 checks passed

continuous-integration/jenkins/pr-head This commit is being built
Details
Jenkins This pull request looks good
Details
oleg-nenashev added a commit to oleg-nenashev/jenkins that referenced this pull request Nov 9, 2016
Update remoting to 2.31 in the Jenkins core.
The change introduces one serious bugfix (JENKINS-39596) and a bunch of various diagnostics improvements.

Bugfixes:

* [JENKINS-39596](https://issues.jenkins-ci.org/browse/JENKINS-39596) -
Jenkins URL in `hudson.remoting.Engine` was always `null` since `3.0`.
It was causing connection failures of Jenkins JNLP agents when using Java Web Start.
([PR jenkinsci#131](jenkinsci/remoting#131))
* [JENKINS-39617](https://issues.jenkins-ci.org/browse/JENKINS-39617) -
`hudson.remoting.Engine` was failing to establish connection if one of the URLs parameter in parameters was malformed.
([PR jenkinsci#131](jenkinsci/remoting#131))

Improvements:

* [JENKINS-39150](https://issues.jenkins-ci.org/browse/JENKINS-39150) -
Add logic for dumping diagnostics across all the channels.
([PR jenkinsci#122](jenkinsci/remoting#122), [PR jenkinsci#125](jenkinsci/remoting#125))
* [JENKINS-39543](https://issues.jenkins-ci.org/browse/JENKINS-39543) -
Improve the caller/callee correlation diagnostics in thread dumps.
([PR jenkinsci#119](jenkinsci/remoting#119))
* [JENKINS-39290](https://issues.jenkins-ci.org/browse/JENKINS-39290) -
Add the `org.jenkinsci.remoting.nio.NioChannelHub.disabled` flag for disabling NIO (mostly for debugging purposes).
([PR jenkinsci#123](jenkinsci/remoting#123))
* [JENKINS-38692](https://issues.jenkins-ci.org/browse/JENKINS-38692) -
Add extra logging to help diagnosing `IOHub` Thread spikes.
([PR jenkinsci#116](jenkinsci/remoting#116))
* [JENKINS-39289](https://issues.jenkins-ci.org/browse/JENKINS-39289) -
 When a proxy fails, report what caused the channel to go down.
([PR jenkinsci#128](jenkinsci/remoting#128))
oleg-nenashev added a commit to jenkinsci/jenkins that referenced this pull request Nov 10, 2016
Update remoting to 2.31 in the Jenkins core. (#2628)
The change introduces one serious bugfix (JENKINS-39596) and a bunch of various diagnostics improvements.

Bugfixes:

* [JENKINS-39596](https://issues.jenkins-ci.org/browse/JENKINS-39596) -
Jenkins URL in `hudson.remoting.Engine` was always `null` since `3.0`.
It was causing connection failures of Jenkins JNLP agents when using Java Web Start.
([PR #131](jenkinsci/remoting#131))
* [JENKINS-39617](https://issues.jenkins-ci.org/browse/JENKINS-39617) -
`hudson.remoting.Engine` was failing to establish connection if one of the URLs parameter in parameters was malformed.
([PR #131](jenkinsci/remoting#131))

Improvements:

* [JENKINS-39150](https://issues.jenkins-ci.org/browse/JENKINS-39150) -
Add logic for dumping diagnostics across all the channels.
([PR #122](jenkinsci/remoting#122), [PR #125](jenkinsci/remoting#125))
* [JENKINS-39543](https://issues.jenkins-ci.org/browse/JENKINS-39543) -
Improve the caller/callee correlation diagnostics in thread dumps.
([PR #119](jenkinsci/remoting#119))
* [JENKINS-39290](https://issues.jenkins-ci.org/browse/JENKINS-39290) -
Add the `org.jenkinsci.remoting.nio.NioChannelHub.disabled` flag for disabling NIO (mostly for debugging purposes).
([PR #123](jenkinsci/remoting#123))
* [JENKINS-38692](https://issues.jenkins-ci.org/browse/JENKINS-38692) -
Add extra logging to help diagnosing `IOHub` Thread spikes.
([PR #116](jenkinsci/remoting#116))
* [JENKINS-39289](https://issues.jenkins-ci.org/browse/JENKINS-39289) -
 When a proxy fails, report what caused the channel to go down.
([PR #128](jenkinsci/remoting#128))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.