-
-
Notifications
You must be signed in to change notification settings - Fork 272
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[JENKINS-48130] - Improve handling and diagnostics of RejectedExecutionException in the code #156
[JENKINS-48130] - Improve handling and diagnostics of RejectedExecutionException in the code #156
Conversation
This pull request originates from a CloudBees employee. At CloudBees, we require that all pull requests be reviewed by other CloudBees employees before we seek to have the change accepted. If you want to learn more about our process please see this explanation. |
Various timeouts on both CI instances :( |
@Override | ||
public void run() { | ||
try { | ||
// Deduplication: There is a risk that multiple downloadables get scheduled, hence we check if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic has been moved to a nested class, The deduplication section below is the only difference
ExecutorServiceUtils.submitAsync(downloader, new DownloadRunnable(channel, sum1, sum2, key, promise)); | ||
// Now we are sure that the task has been accepted to the queue, hence we cache the promise | ||
// if nobody else caches it before. | ||
inprogress.putIfAbsent(key, promise); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this logic I am not fully convinced that ConcurrentMap
is still a good idea here, maybe a common lock would be preferable since it allows to avoid duplication on race conditions and then the deduplication logic.
In this case there is also a risk that DownloadRunnable
gets executed before putting the promise into the cache.
DEL: No return value bug
So the change in NioHub and SingleLaneExecutorService somehow makes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit code-blind but AIUI seems OK (assuming the build failures get resolved)
# Conflicts: # src/main/java/org/jenkinsci/remoting/nio/NioChannelHub.java
After the patch
|
@reviewbybees I have finally discovered why the tests hang. 🤦♂️ , but now it works. needs review |
Created https://issues.jenkins-ci.org/browse/JENKINS-48130 for this issue |
|
@@ -644,6 +646,11 @@ public void run() { | |||
// It causes the channel failure, hence it is severe | |||
LOGGER.log(SEVERE, "Communication problem in " + t + ". NIO Transport will be aborted.", e); | |||
t.abort(e); | |||
} catch (ExecutorServiceUtils.ExecutionRejectedException e) { | |||
// TODO: should we try to reschedule the task if the issue is not fatal? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still not sure about that... But it's still better than killing the entire NioChannelHub, right?
@reviewbybees a second review would be useful |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🐝
I am feeling lucky. 🚢 🇮🇹 |
Background: I was analysing JIRA issues related to the NIOHub fatal channel termination causing massive disconnection of agents. It appears that the
SingleLaneExecutor
is not completely correctly used there...TL;DR: A single packet sent to the channel with pending shutdown may cause the termination of all remoting channels in
JNLP1
,JNLP2
,CLI
, andCLI2
protocols.JNLP4
does not seem to be affected.NioTransport
, message parts are are being submitted to itsSingleLaneExecutor
RejectedExecutionException
is being thrown. E.g. hereNioTransport
level and gets proparated to the top level ofNioChannelHub#run()
NioChannelHub#run()
catches the unhandledRuntimeException
... and terminates the entireNioHub
. code is herehttps://issues.jenkins-ci.org/browse/JENKINS-48130
Applied changes
ExecutorServiceUtils
class, which encapsulates the unsafe task submission and its runtime exception** FindBugs was raising the flag about the method handling BTW. Here is also a discussion about this feature in FindBugs
RejectedExecutionException
and FATALRejectedExecutionException
in the logicNioChannelHub
to handle the task submission errors without BOOMJarCacheSupport
to handle the task submission errors (e.g. when remoting gets shutdown in parallel with classloading) - same causeSingleLaneExecutorService
to properly handle the issues happening in its proxied Executor ServiceRelated issues