New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[JENKINS-21251] Slave channel timeout seemingly caused by usage of System.currentTimeMillis() #33
Conversation
clock drift. The calculation on how long to wait in various places on the code was suseptable to the clock changing as multiple calls where made to System.currentTimeMillis() for the initial calculation. Changed this so that we use a single call to System.nanoTime and made just a single call for the initial calculation. There is still a potential issue for any callers of Channel.getLastHeard() this will be addressed in a future change and has only been noted in this commit.
q.wait(end-System.currentTimeMillis()); | ||
long now = System.nanoTime(); | ||
long end = now + unit.toNanos(timeout); | ||
while (isAlive() && (end - now > 0L)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that for timeout == 0
, this will now wait(0)
, whereas the original code would have skipped the loop altogether. Probably does not matter, just FYI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if timeout == 0
then end = now + 0
so (end - now > 0 == false
) so we never enter the loop and call wait(0)
👍 |
while(response==null && (now=System.currentTimeMillis())<end) { | ||
long now = System.nanoTime(); | ||
long end = now + unit.toNanos(timeout); | ||
while (response == null && (end - now > 0L)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's wrong with end > now
BTW
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to account for overflow of the 64bit number. It starts from an undetermined number - could easily be Long.MAX_VALUE - 3
so end could be negative and now could be positive yet we still need to sleep.
Thank you for a pull request! Please check this document for how the Jenkins project handles pull requests |
In the loop set the minimum sleep duration to 1 ns.
👍 |
[JENKINS-21251] Slave channel timeout seemingly caused by usage of System.currentTimeMillis()
Initial calculation of wait time was susceptible to clock drift.
The calculation on how long to wait in various places on the code was
susceptible to the clock changing as multiple calls where made to
System.currentTimeMillis() for the initial calculation.
Changed this so that we use a single call to System.nanoTime and made just
a single call for the initial calculation.
There is still a potential issue for any callers of Channel.getLastHeard()
this will be addressed in a future change and has only been noted in this
commit.
Replaces Pull #29 which was getting multiple different fixes intertwined.
@reviewbybees