Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use System.nanoTime for elapsed time #11058

Closed
wants to merge 3 commits into from

Conversation

mikemccand
Copy link
Contributor

I'm running a "power loss" ES test and noticed this strange log line from ES after reboot:

crashnode: [2015-05-08 05:10:57,381][DEBUG][index.gateway            ] [crashnode] [logs][4] recovery completed from [shard_gateway], took [4h]

The [4h] is completely wrong ... the recovery only took a minute or two. Digging into it, I noticed that for some reason, on my Fedora Linux box, System.currentTimeMillis suddenly jumps 4 hours into the future, thus messing up this log line. This is reproducible...

I think when we want to measure elapsed time we should use System.nanoTime, unless it's in some crazy hot spot (nanoTime is somewhat more costly). currentTimeMillis pulls from the local clock and timezone, which can easily change... whereas nanoTime, on modern JVM/OS/hardware, should only monotonically increase.

When I changed RecoveryState.time to use nanoTime instead, the wrong [4h] output is fixed.

We use System.currentTimeMillis in a number of places in ES, and unfortunately I was only able to fix a subset of them in this PR because in the other places the timestamp is part of the API and I'm not sure if someone "relies" on that timestamp having come from currentTimeMillis...

@mikemccand mikemccand added v2.0.0-beta1 :Core/Infra/Core Core issues without another label v1.6.0 labels May 8, 2015
@mikemccand mikemccand self-assigned this May 8, 2015
@dakrone
Copy link
Member

dakrone commented May 8, 2015

whereas nanoTime, on modern JVM/OS/hardware, should only monotonically increase

There's a bug for this http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6458294 also, so we should make sure we don't rely on it monotonically increasing anywhere in the code.

@@ -400,16 +402,18 @@ public synchronized long time() {
if (time >= 0) {
return time;
}
return Math.max(0, System.currentTimeMillis() - startTime);
long t = TimeValue.nsecToMSec(System.nanoTime());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the assignment to t here? All of the other replacements didn't use any temporary variables (just curious)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woops, leftover from debugging; I'll put it back (no assignment to t)

@dakrone
Copy link
Member

dakrone commented May 8, 2015

Left some minor comments, but this looks good.

That TODO looks really dangerous though... I think we should fix that in a separate issue?

@mikemccand
Copy link
Contributor Author

There's a bug for this http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6458294 also, so we should make sure we don't rely on it monotonically increasing anywhere in the code.

Yeah, timestamps are really messy :) In general we should never rely on them for anything "important".

Since I'm replacing System.currentTimeMillis() here, ES should already not be relying on a monotonic clock... nanoTime should only be an improvement since it "tries harder" to be monotonic (JVM falls back to currentTimeMillis if monotonic clock isn't available).

@mikemccand
Copy link
Contributor Author

That TODO looks really dangerous though... I think we should fix that in a separate issue?

The TODO is really scary ... if clock jumps backwards by an hour suddenly writing legacy checksums can spin in a busy loop for an hour before finishing!

But: I think this code may be dead in master? Do we really ever call LegacyChecksums.writeChecksums after opening an ancient index?

@mikemccand
Copy link
Contributor Author

Really I'd love to ban the System.currentTimeMillis API (Rob's idea) but in this PR I've only addressed maybe 25% of our usage because we expose the timestamps through public APIs in the other ~75% cases.

@mikemccand
Copy link
Contributor Author

Thanks for the feedback @dakrone, I pushed another commit folding it in.

@mikemccand mikemccand added the >bug label May 8, 2015
@dakrone
Copy link
Member

dakrone commented May 8, 2015

LGTM

About whether we write the checksums, it looks like we do call it when restoring an older version of a shard from a snapshot as well as when we clean files at the end of phase1 recovery.

Maybe we can just use lastVersion + 1? Looping to set it to currentTimeMillis seems really arbitrary.

@mikemccand
Copy link
Contributor Author

Maybe we can just use lastVersion + 1? Looping to set it to currentTimeMillis seems really arbitrary.

OK, I just pushed commit to use Math.max(lastVersion+1, System.currentTimeMillis())

mikemccand added a commit that referenced this pull request May 11, 2015
System.currentTimeMillis is more vulnerable when the clock shifts.

Closes #11058
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Core/Infra/Core Core issues without another label v1.6.0 v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants