Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-21251] Slave channel timeout seemingly caused by usage of System.currentTimeMillis() #33

Merged
merged 3 commits into from Mar 13, 2015

Conversation

jtnord
Copy link
Member

@jtnord jtnord commented Feb 20, 2015

Initial calculation of wait time was susceptible to clock drift.

The calculation on how long to wait in various places on the code was
susceptible to the clock changing as multiple calls where made to
System.currentTimeMillis() for the initial calculation.

Changed this so that we use a single call to System.nanoTime and made just
a single call for the initial calculation.

There is still a potential issue for any callers of Channel.getLastHeard()
this will be addressed in a future change and has only been noted in this
commit.

Replaces Pull #29 which was getting multiple different fixes intertwined.

@reviewbybees

clock drift.

The calculation on how long to wait in various places on the code was
suseptable to the clock changing as multiple calls where made to
System.currentTimeMillis() for the initial calculation.

Changed this so that we use a single call to System.nanoTime and made just
a single call for the initial calculation.

There is still a potential issue for any callers of Channel.getLastHeard()
this will be addressed in a future change and has only been noted in this
commit.
q.wait(end-System.currentTimeMillis());
long now = System.nanoTime();
long end = now + unit.toNanos(timeout);
while (isAlive() && (end - now > 0L)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that for timeout == 0, this will now wait(0), whereas the original code would have skipped the loop altogether. Probably does not matter, just FYI.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if timeout == 0 then end = now + 0 so (end - now > 0 == false) so we never enter the loop and call wait(0)

@jglick
Copy link
Member

jglick commented Feb 20, 2015

👍

while(response==null && (now=System.currentTimeMillis())<end) {
long now = System.nanoTime();
long end = now + unit.toNanos(timeout);
while (response == null && (end - now > 0L)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's wrong with end > now BTW

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to account for overflow of the 64bit number. It starts from an undetermined number - could easily be Long.MAX_VALUE - 3 so end could be negative and now could be positive yet we still need to sleep.

@jenkinsadmin
Copy link
Member

Thank you for a pull request! Please check this document for how the Jenkins project handles pull requests

In the loop set the minimum sleep duration to 1 ns.
@stephenc
Copy link
Member

stephenc commented Mar 3, 2015

👍

jtnord added a commit that referenced this pull request Mar 13, 2015
[JENKINS-21251] Slave channel timeout seemingly caused by usage of System.currentTimeMillis()
@jtnord jtnord merged commit 7f35e25 into master Mar 13, 2015
@jtnord jtnord deleted the JENKINS-21251_take2 branch March 13, 2015 10:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants