std: Force `Instant::now()` to be monotonic #56988

Merged
merged 1 commit into from Jan 8, 2019


@alexcrichton
Member

alexcrichton commented Dec 19, 2018

This commit is an attempt to force Instant::now to be monotonic
through any means possible. We tried relying on OS/hardware/clock
implementations, but those seem buggy enough that we can't rely on them
in practice. This commit implements the same hammer Firefox recently
implemented (noted in #56612) which is to just keep whatever the latest
Instant::now() return value was in memory, returning that instead if
the OS looks like it's moving backwards.

Closes #48514
Closes #49281
cc #51648
cc #56560
Closes #56612
Closes #56940
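
For readers skimming the thread, the approach described above can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: it assumes a `std::sync::Mutex` and a `Duration` since some arbitrary origin in place of the platform-specific `Instant` and the internal mutex, and the names `LAST_NOW_SKETCH` and `clamp_monotonic` are made up for the example.

```rust
use std::cmp;
use std::sync::Mutex;
use std::time::Duration;

// Hypothetical global holding the largest reading handed out so far.
static LAST_NOW_SKETCH: Mutex<Duration> = Mutex::new(Duration::ZERO);

/// Takes the raw reading from the OS clock and returns a value that is
/// never smaller than any value previously returned by this function.
fn clamp_monotonic(os_now: Duration) -> Duration {
    let mut last = LAST_NOW_SKETCH.lock().unwrap();
    // If the OS clock appears to have gone backwards, reuse the old value.
    let now = cmp::max(*last, os_now);
    *last = now;
    now
}
```

Feeding it the readings 5s, 3s, 7s yields 5s, 5s, 7s: the backwards step is hidden from the caller at the cost of taking a lock on every call.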

@rust-highfive

Collaborator

rust-highfive commented Dec 19, 2018

r? @Kimundi

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive

Collaborator

rust-highfive commented Dec 19, 2018

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

travis_time:end:353f0f89:start=1545244467577054256,finish=1545244468694615649,duration=1117561393
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
Setting environment variables from .travis.yml
$ export IMAGE=x86_64-gnu-llvm-6.0
---
tidy check
[00:03:00] * 568 error codes
[00:03:00] * highest error code: E0721
[00:03:00] * 244 features
[00:03:00] tidy error: /checkout/src/libstd/time.rs:192: platform-specific cfg: cfg!(target_os = "macos")
[00:03:00] tidy error: /checkout/src/libstd/time.rs:193: platform-specific cfg: cfg!(target_os = "linux")
[00:03:00] tidy error: /checkout/src/libstd/time.rs:194: platform-specific cfg: cfg!(target_os = "linux")
[00:03:01] some tidy checks failed
[00:03:01] 
[00:03:01] 
[00:03:01] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:03:01] 
[00:03:01] 
[00:03:01] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:03:01] Build completed unsuccessfully in 0:00:45
[00:03:01] Makefile:79: recipe for target 'tidy' failed
[00:03:01] make: *** [tidy] Error 1
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:262a6f76
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
Wed Dec 19 18:37:39 UTC 2018
---
travis_time:end:0cee41d8:start=1545244659972411142,finish=1545244659977875228,duration=5464086
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:0ad9f916
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:116c0f86
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:09d86d7a
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@wesleywiser

Member

wesleywiser commented Dec 19, 2018

Thanks for fixing this so quickly @alexcrichton!

@alexcrichton alexcrichton force-pushed the alexcrichton:monotonic-instant branch from cf6b327 to 0f132c6 Dec 19, 2018

let now = cmp::max(LAST_NOW, os_now);
LAST_NOW = now;
Instant(now)
}

@the8472

the8472 Dec 19, 2018

A mutex seems a bit heavy-handed. Many uses of rdtsc (where available) are for minimal-overhead, thread-local timing of functions.
Wouldn't rdtsc + some atomic ops that prevent things from going backwards be much lighter than potential thread suspension?

@alexcrichton

alexcrichton Dec 19, 2018

Author Member

I agree that a full-on mutex is quite a heavy hammer for this use case! I wasn't sure though how best to minimize the cost here.

The Windows documentation at least "strongly discourages" rdtsc for handling VM migration issues as well as some supposed hardware. If that's the case I think we probably want to avoid that?

I figured it'd probably be best to start from a conservative position and we can always come in later as necessary and try to use atomics and/or different tricks.

@the8472

the8472 Dec 19, 2018

I didn't mean to use rdtsc directly. We can still defer to QueryPerformanceCounter/clock_gettime, i.e. the current Instant implementation which in the end boils down to rdtsc on many x86 systems.
Just tack on a sanity check/correction with atomics instead of a mutex.

I'm mostly concerned that thread-contention might unexpectedly hit people if they litter their code with instants because it used to be fast. I have no concrete examples, just experience with gratuitous use of timing functions that were fast on linux suddenly making an application slow on windows because it decided to not use rdtsc.

@alexcrichton

alexcrichton Dec 19, 2018

Author Member

Oh I'm all for adding a more lightweight implementation, do you have one in mind? Instant has a varying size across platforms, which makes it difficult to select an appropriate atomic and/or more lightweight method

@the8472

the8472 Dec 19, 2018

I had sizeof checks, some type punning and AtomicU128/U64 in mind. Beyond that it would be your standard read, check, CAS, similar to what you're now doing in the lock's critical section, except in a loop.

The mutex would still be needed as fallback if the checks don't work out.

@alexcrichton

alexcrichton Dec 20, 2018

Author Member

Ah ok, that's what I thought, and yeah my worry about that is that it wouldn't solve the original problem of monotonic clocks going backwards, so I'm afraid we'd still end up at the solution proposed in this PR.

We, as far as I know, don't have a great handle on how big the errors are.

@the8472

the8472 Dec 20, 2018

I have thought some more about optimization potential:

  1. We can use relaxed atomics everywhere. Justification: one thread cannot observe another thread's Instants without some external synchronization happening, e.g. other ordered loads and stores. Until those happen only intra-thread ordering is relevant, and Relaxed is sufficient for that.
  2. In the good case we only need to do a test and return, from the perspective of the main sequence of instructions. There's no dependency on the writes to global state happening, so this should be friendly to instruction parallelism.
  3. We can limit the XCHG loop by bailing out early if it fails because another thread updated it to a larger value than the one we are trying to store. That doesn't prevent the cache line from bouncing around, but it at least allows multiple threads to make progress simultaneously.

It could approximately look like this:

static LAST_NOW: AtomicU128 = AtomicU128::new(0);
let mut last_now = LAST_NOW.load(Relaxed);  // insert type punning here
let os_now = time::Instant::now();
if likely(os_now > last_now) {
  loop {
    match LAST_NOW.compare_exchange_weak(last_now, os_now, Relaxed, Relaxed) {
      Ok(_) => break,
      Err(x) if x >= os_now => break, // some other thread is ahead of us, no need to update
      Err(x) => last_now = x, // spurious failure or a smaller value was stored: retry against it
    }
  }
  return os_now;
}
return last_now

It's a slightly smaller hammer, but still not the rubber kind. To soften it further we either need an "I (don't) care about broken systems" switch somewhere or better platform detection.
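
A compilable variant of the read, check, CAS idea might look like the sketch below. It is only an illustration under stated assumptions: the reading is taken to fit in a `u64` (say, nanoseconds since some process-local origin), which sidesteps the 128-bit question, and `LAST_NOW_NANOS` and `clamp_monotonic_atomic` are hypothetical names. Unlike the pseudocode above, it returns the larger stored value when another thread is already ahead.

```rust
use std::sync::atomic::{AtomicU64, Ordering::Relaxed};

// Hypothetical global: the largest reading published so far, in nanoseconds.
static LAST_NOW_NANOS: AtomicU64 = AtomicU64::new(0);

/// Lock-free clamp: returns `os_now` if it advances the global maximum,
/// otherwise returns the (larger or equal) value already stored.
fn clamp_monotonic_atomic(os_now: u64) -> u64 {
    let mut last_now = LAST_NOW_NANOS.load(Relaxed);
    // Only try to publish while our reading is ahead of the stored one.
    while os_now > last_now {
        match LAST_NOW_NANOS.compare_exchange_weak(last_now, os_now, Relaxed, Relaxed) {
            Ok(_) => return os_now,
            // Spurious failure or another thread stored a value: re-check against it.
            Err(current) => last_now = current,
        }
    }
    // The clock went backwards, or another thread is already ahead of us.
    last_now
}
```

The loop exits as soon as the stored value is at least `os_now`, so a thread that loses the race to a later reading does not keep retrying.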

@alexcrichton

alexcrichton Dec 21, 2018

Author Member

@the8472 yes that's all possible but 128-bit atomics are only available on a few platforms, so we can't use them generally.

@nikic

nikic Dec 21, 2018

Contributor

> Ah ok, that's what I thought, and yeah my worry about that is that it wouldn't solve the original problem of monotonic clocks going backwards, so I'm afraid we'd still end up at the solution proposed in this PR.

Wouldn't it still make sense to try the cheaper thread-local version first and switch to a full lock if it does turn out to be insufficient? If we directly go to the lock, then we will not be able to determine whether a cheaper thread-local variant would also have been sufficient. @the8472's theory that this arises only due to migration between cores at least sounds very plausible, so I think it would be worthwhile to try this.

@alexcrichton

alexcrichton Dec 21, 2018

Author Member

@nikic I think it's incorrect to avoid the full lock though? If the time is less than the thread-local version then you definitely have to acquire the lock, but even if it's greater than the thread-local version you need to check the global one under the lock as well. Right now the bug primarily happens on one thread, but the documented guarantees of this API are that it works across all threads.
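
To make the cross-thread point concrete, here is a small sketch of a purely thread-local clamp and how it can hand a second thread a value smaller than one the first thread already saw. The readings are plain `u64` values supplied by the caller, and `THREAD_LAST`, `clamp_thread_local` and `reading_from_other_thread` are hypothetical names used only for this illustration.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Each thread remembers only the largest reading it has seen itself.
    static THREAD_LAST: Cell<u64> = Cell::new(0);
}

/// Clamps against the calling thread's own history only.
fn clamp_thread_local(os_now: u64) -> u64 {
    THREAD_LAST.with(|last| {
        let now = last.get().max(os_now);
        last.set(now);
        now
    })
}

/// Runs the clamp on a fresh thread, which starts with no history.
fn reading_from_other_thread(os_now: u64) -> u64 {
    thread::spawn(move || clamp_thread_local(os_now))
        .join()
        .unwrap()
}
```

If the current thread has already observed 10 and the OS then reports 7, the current thread still gets 10, but a new thread gets 7, so a value passed between threads can appear to go backwards.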

@rust-highfive

Collaborator

rust-highfive commented Dec 19, 2018

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

travis_time:end:15d5c828:start=1545245110625471104,finish=1545245113003082919,duration=2377611815
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
Setting environment variables from .travis.yml
$ export IMAGE=x86_64-gnu-llvm-6.0
---
[00:06:03]    Compiling syntax_ext v0.0.0 (/checkout/src/libsyntax_ext)
[00:06:08] error: unused import: `Duration`
[00:06:08]   --> src/librustc/util/profiling.rs:15:17
[00:06:08]    |
[00:06:08] 15 | use std::time::{Duration, Instant};
[00:06:08]    |
[00:06:08]    = note: `-D unused-imports` implied by `-D warnings`
[00:06:08] 
[00:06:34] error: aborting due to previous error
---
[00:06:34] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "build" "--target" "x86_64-unknown-linux-gnu" "-j" "4" "--release" "--locked" "--color" "always" "--features" "" "--manifest-path" "/checkout/src/rustc/Cargo.toml" "--message-format" "json"
[00:06:34] expected success, got: exit code: 101
[00:06:34] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap build
[00:06:34] Build completed unsuccessfully in 0:03:46
[00:06:34] make: *** [all] Error 1
[00:06:34] Makefile:28: recipe for target 'all' failed
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:0af64a28
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
Wed Dec 19 18:51:56 UTC 2018

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@alexcrichton alexcrichton force-pushed the alexcrichton:monotonic-instant branch from 0f132c6 to 4ce8d27 Dec 19, 2018

@Mark-Simulacrum

Member

Mark-Simulacrum commented Dec 19, 2018

Is there a reason we're using the "internal" Mutex for the implementation though? I'd kind of expect that we could just use std::sync::Mutex?

@alexcrichton

Member Author

alexcrichton commented Dec 19, 2018

@Mark-Simulacrum the sync::Mutex type doesn't have a const constructor, whereas the internal mutex type does

// * https://bugzilla.mozilla.org/show_bug.cgi?id=1487778 - a similar
// Firefox bug
//
// It simply seems that this it just happens so that a lot in the wild

@stjepang

stjepang Dec 20, 2018

Contributor

Delete 'this'?

@the8472

the8472 commented Dec 27, 2018

Maybe we should get a perf run on this for a system where actually_monotonic == false?

return Instant(os_now)
}

static LOCK: Mutex = Mutex::new();

@gnzlbg

gnzlbg Jan 2, 2019

Contributor

Do we have dead-code elimination in MIR debug builds ? Otherwise LLVM-IR for this code will always be emitted independently of the result of actually_monotonic, and whether that code will end up generating machine code will depend on the optimization level.

@alexcrichton

alexcrichton Jan 3, 2019

Author Member

We don't, now, but actually_monotonic is a function that'll be trivially inlined so LLVM will optimize this away
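
The pattern being discussed can be sketched as below. This is an illustration only: the platform policy inside `actually_monotonic` is invented for the example (the real per-platform answers live in std's platform modules), and `LAST_SEEN` and `now_nanos` are hypothetical names. Because `cfg!` expands to a literal `true` or `false`, the early return is a compile-time constant once the function is inlined, so an optimizing build can drop the locked path on trusted platforms; unoptimized builds may still emit it.

```rust
use std::sync::Mutex;

// Hypothetical policy, for illustration only: pretend that x86_64 Linux is
// the one platform whose clock we trust.
#[inline]
fn actually_monotonic() -> bool {
    cfg!(all(target_os = "linux", target_arch = "x86_64"))
}

// Hypothetical global used only on platforms we do not trust.
static LAST_SEEN: Mutex<u64> = Mutex::new(0);

/// `os_now` stands in for the raw OS reading, in nanoseconds.
fn now_nanos(os_now: u64) -> u64 {
    // Fast path: no shared state is touched at all.
    if actually_monotonic() {
        return os_now;
    }
    // Slow path: clamp against the last value handed out.
    let mut last = LAST_SEEN.lock().unwrap();
    *last = (*last).max(os_now);
    *last
}
```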

@alexcrichton

Member Author

alexcrichton commented Jan 3, 2019

@rust-highfive rust-highfive assigned sfackler and unassigned Kimundi Jan 3, 2019

@sfackler

Member

sfackler commented Jan 3, 2019

Hardware Is Bad

@bors r+

@bors

Contributor

bors commented Jan 3, 2019

📌 Commit 4ce8d27 has been approved by sfackler

@bors

Contributor

bors commented Jan 5, 2019

⌛️ Testing commit 4ce8d27 with merge 1896be8...

bors added a commit that referenced this pull request Jan 5, 2019

Auto merge of #56988 - alexcrichton:monotonic-instant, r=sfackler
std: Force `Instant::now()` to be monotonic

@bors

Contributor

bors commented Jan 5, 2019

💔 Test failed - status-travis

@rust-highfive

Collaborator

rust-highfive commented Jan 5, 2019

The job dist-various-2 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

[01:00:21]    Compiling panic_unwind v0.0.0 (/checkout/src/libpanic_unwind)
[01:00:21] [RUSTC-TIMING] panic_unwind test:false 0.280
[01:00:21] warning: dropping unsupported crate type `dylib` for target `x86_64-unknown-cloudabi`
[01:00:21] 
[01:00:24] error[E0599]: no function or associated item named `actually_monotonic` found for type `sys::cloudabi::time::Instant` in the current scope
[01:00:24]    --> src/libstd/time.rs:182:27
[01:00:24]     |
[01:00:24] 182 |         if time::Instant::actually_monotonic() {
[01:00:24]     |            |
[01:00:24]     |            |
[01:00:24]     |            function or associated item not found in `sys::cloudabi::time::Instant`
[01:00:24]     | 
[01:00:24]    ::: src/libstd/sys/cloudabi/time.rs:8:1
[01:00:24] 8   | pub struct Instant {
[01:00:24]     | ------------------ function or associated item `actually_monotonic` not found for this
[01:00:26] error: aborting due to previous error
[01:00:26] 
[01:00:26] For more information about this error, try `rustc --explain E0599`.
[01:00:26] [RUSTC-TIMING] std test:false 5.086
---
travis_time:end:0119c2a7:start=1546672709559221610,finish=1546672709566034009,duration=6812399
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:001f5858
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:01862410
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:082b124f
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@bors

Contributor

bors commented Jan 8, 2019

⌛️ Testing commit 255a3f3 with merge 2f19f8c...

bors added a commit that referenced this pull request Jan 8, 2019

Auto merge of #56988 - alexcrichton:monotonic-instant, r=sfackler
std: Force `Instant::now()` to be monotonic

@bors

Contributor

bors commented Jan 8, 2019

☀️ Test successful - status-appveyor, status-travis
Approved by: sfackler
Pushing 2f19f8c to master...

@bors bors merged commit 255a3f3 into rust-lang:master Jan 8, 2019

2 checks passed

continuous-integration/travis-ci/pr: The Travis CI build passed
homu: Test successful
@rust-highfive

Collaborator

rust-highfive commented Jan 8, 2019

📣 Toolstate changed by #56988!

Tested on commit 2f19f8c.
Direct link to PR: #56988

🎉 rls on linux: test-fail → test-pass (cc @nrc @Xanewok, @rust-lang/infra).

rust-highfive added a commit to rust-lang-nursery/rust-toolstate that referenced this pull request Jan 8, 2019

📣 Toolstate changed by rust-lang/rust#56988!
Tested on commit rust-lang/rust@2f19f8c.
Direct link to PR: <rust-lang/rust#56988>

🎉 rls on linux: test-fail → test-pass (cc @nrc @Xanewok, @rust-lang/infra).

VardhanThigle pushed a commit to VardhanThigle/rust that referenced this pull request Jan 9, 2019

VardhanThigle pushed a commit to VardhanThigle/rust that referenced this pull request Jan 11, 2019

VardhanThigle pushed a commit to VardhanThigle/rust that referenced this pull request Jan 11, 2019

VardhanThigle pushed a commit to VardhanThigle/rust that referenced this pull request Jan 13, 2019

VardhanThigle pushed a commit to VardhanThigle/rust that referenced this pull request Jan 13, 2019

VardhanThigle pushed a commit to VardhanThigle/rust that referenced this pull request Jan 13, 2019

}
}

pub fn actually_monotonic() -> bool {

@lu-zero

lu-zero Jan 14, 2019

Contributor

@edelsonh Can you confirm ppc/ppc64 has a reliable monotonic clock?

@Forty-Bot

Forty-Bot commented Jan 14, 2019

Why not switch to CLOCK_MONOTONIC_RAW on linux? CLOCK_MONOTONIC is affected by "... the incremental adjustments performed by adjtime(3) and NTP," which is likely the cause of some of your monotonicity problems.

@sanxiyn

Member

sanxiyn commented Jan 14, 2019

My understanding of adjtime is that its adjustments specifically do not cause any monotonicity problems. That is, adjtime can't be the cause, if we trust the manual pages.

@Forty-Bot

Forty-Bot commented Jan 14, 2019

FWIW CLOCK_MONOTONIC is derived from CLOCK_REALTIME with an offset to keep it monotonic, and CLOCK_MONOTONIC_RAW is from a different tk_read_base.

@spacejam

spacejam commented Jan 14, 2019

NTP can just yank a clock backwards of the higher stratum side if the delta is over 128ms. https://github.com/ntp-project/ntp/blob/stable/parseutil/dcfd.c#L1059

@Forty-Bot

Forty-Bot commented Jan 14, 2019

Hm, that should fail with -EINVAL if the new offset is less than the current monotonic offset...

If anyone wants to investigate this further, I made a short program to test the different clocks on linux. Specifically, @IntrepidPig seems to be able to reproduce it.
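
For anyone who wants to poke at this from Rust, a rough sketch of such a check follows. It is not the program linked above. It assumes Linux with a libc that exports `clock_gettime`, a `timespec` made of two `long` fields (true on common 64-bit Linux targets), and the Linux clock ids 1 (`CLOCK_MONOTONIC`) and 4 (`CLOCK_MONOTONIC_RAW`). `read_clock` and `count_backward_steps` are hypothetical helper names.

```rust
use std::os::raw::{c_int, c_long};
use std::time::Duration;

// Assumed layout of `struct timespec` on common 64-bit Linux targets.
#[repr(C)]
struct Timespec {
    tv_sec: c_long,
    tv_nsec: c_long,
}

extern "C" {
    // Provided by the platform libc that std already links against.
    fn clock_gettime(clock_id: c_int, tp: *mut Timespec) -> c_int;
}

// Linux clock ids (assumed values from <linux/time.h>).
const CLOCK_MONOTONIC: c_int = 1;
const CLOCK_MONOTONIC_RAW: c_int = 4;

/// Reads one clock, returning None if the call fails.
fn read_clock(clock_id: c_int) -> Option<Duration> {
    let mut ts = Timespec { tv_sec: 0, tv_nsec: 0 };
    // SAFETY: `ts` is a valid, writable timespec for the duration of the call.
    let rc = unsafe { clock_gettime(clock_id, &mut ts) };
    if rc != 0 {
        return None;
    }
    Some(Duration::new(ts.tv_sec as u64, ts.tv_nsec as u32))
}

/// Samples a clock repeatedly and counts how often it stepped backwards.
fn count_backward_steps(clock_id: c_int, samples: usize) -> Option<usize> {
    let mut previous = read_clock(clock_id)?;
    let mut backwards = 0;
    for _ in 0..samples {
        let current = read_clock(clock_id)?;
        if current < previous {
            backwards += 1;
        }
        previous = current;
    }
    Some(backwards)
}
```

Comparing `count_backward_steps(CLOCK_MONOTONIC, n)` with `count_backward_steps(CLOCK_MONOTONIC_RAW, n)` on an affected machine would show whether the two clocks behave differently there.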

GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Jan 15, 2019

@programmerjake

programmerjake commented Jan 16, 2019

I know this is a little late, but an atomic umax would be better than the compare-exchange this currently uses. LLVM has the atomicrmw umax instruction and RISC-V and probably other architectures have an instruction for that. Using the atomic umax instruction allows the atomic operation to be executed wherever the memory is cached (TileLink has special support for that) instead of having to move the cached memory, saving time.
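
In Rust terms, the atomic-max idea could be sketched with `AtomicU64::fetch_max`, as below. This assumes the reading fits in a `u64`, and `LAST_NOW_MAX` and `clamp_with_fetch_max` are hypothetical names. Whether this lowers to a single hardware instruction or to a compare-exchange loop depends on the target.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical global: the largest reading published so far.
static LAST_NOW_MAX: AtomicU64 = AtomicU64::new(0);

/// Stores max(stored, os_now) atomically and returns that maximum.
fn clamp_with_fetch_max(os_now: u64) -> u64 {
    // fetch_max returns the value that was stored before the update.
    let previous = LAST_NOW_MAX.fetch_max(os_now, Ordering::Relaxed);
    previous.max(os_now)
}
```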

@vi

Contributor

vi commented Jan 17, 2019

Does it affect cargo bench? Will it start benching a synchronisation primitive instead of actual code?

@programmerjake

programmerjake commented Jan 18, 2019

It would have an atomic op either way.

@alexcrichton alexcrichton deleted the alexcrichton:monotonic-instant branch Jan 18, 2019

@alexcrichton

Member Author

alexcrichton commented Jan 18, 2019

@vi cargo bench does indeed use Instant::now(), and afaik measurements haven't been done to evaluate the impact of this.
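
One rough way to get a first number is to time a tight loop of `Instant::now()` calls, as in the sketch below. It is a crude single-threaded estimate rather than a proper benchmark, it says nothing about contention between threads, and `average_now_cost` is a hypothetical helper name.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Returns the average wall-clock cost of one `Instant::now()` call,
/// measured over `iterations` back-to-back calls on the current thread.
fn average_now_cost(iterations: u32) -> Duration {
    assert!(iterations > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iterations {
        // black_box keeps the optimizer from discarding the call.
        black_box(Instant::now());
    }
    start.elapsed() / iterations
}
```

Running it on toolchains from before and after this change, and again from several threads at once, would give a first idea of the added overhead.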

@Saruspete

Saruspete commented Jan 20, 2019

Hello there,

It's a bit late, and I'm more on the system side than the dev side, but here are some hints / events / configurations on x86 & Linux that may (or may not) help you work on this matter (to be aware of the bad things that can happen).

So for Linux, you have multiple timing functions (some are only available on more recent kernels).
Each of these has its caveats, and the choice depends on your use case:

CS = Clock Source (incremental monotonic counter)
TO = Time offset (value to add to CS to get the real human time)
ADJ = Minor Adjustments to CS (ntp, ptp)
TZ = Timezone

| Source | Precision | Get from | Value |
|---|---|---|---|
| clock_gettime(CLOCK_REALTIME) | ns | vdso & syscall fallback | CS + TO + ADJ |
| clock_gettime(CLOCK_REALTIME_COARSE) | ms | vdso | CS + TO + ADJ |
| clock_gettime(CLOCK_MONOTONIC) | ns | vdso | CS + ADJ |
| clock_gettime(CLOCK_MONOTONIC_COARSE) | ms | vdso | CS + ADJ |
| clock_gettime(CLOCK_MONOTONIC_RAW) | | vdso | CS |
| gettimeofday() | us | vdso override | CS + TO + ADJ + TZ |
| time() | sec | syscall | compat over gettimeofday |
| Assembly RDTSC | cpu base freq, e.g. 3GHz = 0.33ns | memory read | CS |
| Assembly RDTSCP | cpu base freq | memory read | CS |

On modern x86, most of the selection work is done in arch/x86/entry/vdso/vclock_gettime.c :: __vdso_clock_gettime().
The vDSO aims to return results faster than a syscall when available, using shared values held in the structure vsyscall_gtod_data (arch/x86/include/asm/vgtod.h), which is used for timing among other things.
This structure and the system-wide clocks are managed by the kernel's timekeeper structure, located in kernel/time/timekeeping.c.
You may want to check the timekeeping_update() code (called by settimeofday, change_clocksource, and many other events) to see how the system time values are updated.
You can check/set the underlying timekeeper clocksource through /sys/devices/system/clocksource/clocksource0/current_clocksource.

The standard (non-coarse) calls are meant to deliver precise timing: they compute the time delta between the last timekeeper update and the actual function call. To speed things up, these calls also try to read the hardware directly: vread_pvclock() (para-virtualized), vread_hvclock() (Hyper-V), vread_tsc(), or vendor-supplied timekeeping in VMware.
The coarse types simply return the value of the last timekeeper update (whose rate is set by the kernel's HZ constant, most of the time 1000Hz). This means just 1 or 2 values to read, no syscall and no time diff, so super fast... but also far less precise.

So, here's how I choose which one depending on the use-case:

  • REALTIME : precise wall-clock time (precise log, events timediff)
  • REALTIME_COARSE : simple wall-clock time (to the second or ms) like for system logs.
  • MONOTONIC : precise duration measurement (benchmark, security related...)
  • MONOTONIC_COARSE : Not used.
  • MONOTONIC_RAW : system-wide stable counter.
  • RDTSC : precise function timing (but not for micro-benchmark) on stable CPUs (avoid process migration and VM)
  • RDTSCP / RDTSC + [lm]fence : precise benchmark. The serializing instruction avoids CPU reordering, which would otherwise skew the micro-benchmark results.

Some pitfalls:

  • Beware of "human time clocks", as they include the leap-second special case. The behaviour also depends on the type of adjustment requested by the sysadmin: let the kernel step back, let the ntp daemon step back, slew the clock during the whole day, or add a new second (23:59:60).
  • The 128ms clock limit for slowing the time is only valid for ntpd; some other daemons, like chrony, do not have this limit.
  • The RealTime Linux variant (many of whose features were included upstream) also has special features, like a dedicated thread for doing the timekeeping, tickless CPUs, and other constraints related to the handling of time. You might want to ask these guys, especially Thomas Gleixner and Steven Rostedt.
  • Some CPUs might go offline, in which case the processes that were in their runqueues are migrated to another one.
  • The clock calls might get stuck in unbounded retries.
  • Some very exotic x86 hardware, like Bullion or SuperDome, is made of multiple independent systems / blades combined into a single one. This might have implications for timing measurement depending on the OS & version.

Whatever the OS, at the lower level, timing may be found from multiple hardware sources:

TimeStampCounter (TSC)

An incremental counter kept by each CPU, whose value is read by RDTSC or RDTSCP.
As it's CPU based, when your reading process is migrated from one socket to another, or when the underlying hardware changes (as with a Virtual Machine), the TSC may have a very different value.
Some other scenarios can also lead to issues when using it (with the cpuflags to check whether the feature is safe):

  • The CPU uses frequency scaling (P-states) or sleep states (C-states) to dynamically adjust frequency and/or save power; the TSC rate stays constant across these as long as the CPU is kept in ACPI S0 (cpuflag: constant_tsc).
  • The TSC will stop and reset upon standby (S3) or hibernation (S4) (cpuflag: nonstop_tsc).
  • The underlying hypervisor stops the VM and resumes it, e.g. for motion, snapshot, etc.

High Precision Event Timer (HPET)

It's composed of a single counter (fixed frequency), and many comparators that will generate an interrupt when the counter reaches a value they are waiting for. The counter frequency should just be higher than 10MHz.
This has multiple issues, such as time drift, skew, and missed interrupts.

Real Time Clock (RTC)

Well, not used anymore...

I know you have to cover multiple arches, and OS types, and OS versions, so always keep a skeptical eye when you implement a low-level feature :)

A feature I implement in every language I use: a timing function/class that asks what I intend to measure and chooses the right function for me. Maybe you can integrate it directly.
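
As a sketch of what such a helper could look like in Rust, the enum below maps a stated measurement intent to the Linux clock suggested in the list above. The `Purpose` type, its variants and `clock_for` are hypothetical names, and the function only returns the clock's name rather than reading it.

```rust
/// What the caller intends to measure (hypothetical categories taken from
/// the use-case list above).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Purpose {
    /// Precise wall-clock time, e.g. precise logs or event time differences.
    PreciseWallClock,
    /// Simple wall-clock time to the second or millisecond, e.g. system logs.
    CoarseWallClock,
    /// Precise duration measurement, e.g. benchmarks.
    PreciseDuration,
    /// A system-wide stable counter.
    StableCounter,
}

/// Returns the name of the Linux clock suggested for the given purpose.
fn clock_for(purpose: Purpose) -> &'static str {
    match purpose {
        Purpose::PreciseWallClock => "CLOCK_REALTIME",
        Purpose::CoarseWallClock => "CLOCK_REALTIME_COARSE",
        Purpose::PreciseDuration => "CLOCK_MONOTONIC",
        Purpose::StableCounter => "CLOCK_MONOTONIC_RAW",
    }
}
```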
