fix: fence the tsc() call and prevent crash from _cycles_per_sec by BewareMyPower · Pull Request #11 · fast/fastant

BewareMyPower · 2026-05-12T13:11:43Z

fixes #5
fixes #7

As #7 points out, __rdtsc can be executed out of order; see https://doc.rust-lang.org/beta/core/arch/x86/fn._rdtsc.html

The RDTSC instruction is not a serializing instruction. It does not necessarily wait until all previous instructions have been executed before reading the counter. Similarly, subsequent instructions may begin execution before the read operation is performed.

This can make tsc() return an inaccurate value, especially when heavy math or a memory load follows the call, because any subsequent instruction may execute before the counter is actually read.

Hence, this PR uses the rdtscp instruction followed by a fence (_mm_lfence when SSE2 is enabled, falling back to compiler_fence otherwise) to prevent instruction reordering.
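For illustration, a minimal sketch of such a fenced read on x86_64. The names `ordered_tsc` and `fence_after_read` are hypothetical and this is not necessarily the exact code in the PR; it also assumes the CPU supports RDTSCP.

```rust
/// Fence used after the counter read when SSE2 is available.
#[cfg(all(target_arch = "x86_64", target_feature = "sse2"))]
#[allow(unused_unsafe)]
fn fence_after_read() {
    // LFENCE keeps later instructions from starting before the read completes.
    unsafe { core::arch::x86_64::_mm_lfence() }
}

/// Fallback without SSE2: only stops the compiler from reordering.
#[cfg(all(target_arch = "x86_64", not(target_feature = "sse2")))]
fn fence_after_read() {
    std::sync::atomic::compiler_fence(std::sync::atomic::Ordering::SeqCst);
}

/// Reads the TSC with ordering on both sides of the read.
/// Hypothetical helper, not fastant's actual function.
#[cfg(target_arch = "x86_64")]
pub fn ordered_tsc() -> u64 {
    // Receives IA32_TSC_AUX; unused in this sketch.
    let mut aux = 0u32;
    // SAFETY: assumes the CPU supports RDTSCP; real code should verify that first.
    // RDTSCP waits for prior instructions to execute before reading the counter.
    let tsc = unsafe { core::arch::x86_64::__rdtscp(&mut aux) };
    fence_after_read();
    tsc
}
```

The fence after the read is what keeps a following load or computation from being executed ahead of the timestamp.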

However, #5 could still happen in rare cases:

let tsc1 = tsc();
let tsc2 = tsc();

The thread might be migrated to a different core between the two calls, and that core's counter might not be in sync with the old one. tsc2 could then be slightly smaller than tsc1, the subtraction would overflow, and the application would crash as in #5.
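One way to keep such a pair of readings from panicking is to avoid a plain `tsc2 - tsc1`. This is only a sketch with hypothetical helper names; it is not necessarily how this PR or #12 handles it.

```rust
/// Cycles elapsed between two raw TSC readings, clamped to zero when the
/// second reading is smaller (for example after a migration between cores).
pub fn elapsed_cycles(tsc1: u64, tsc2: u64) -> u64 {
    tsc2.saturating_sub(tsc1)
}

/// Same idea, but returns `None` so the caller can notice the anomaly
/// and retry the measurement instead of silently using zero.
pub fn checked_elapsed_cycles(tsc1: u64, tsc2: u64) -> Option<u64> {
    tsc2.checked_sub(tsc1)
}
```

For example, `elapsed_cycles(1_000, 990)` yields 0, whereas `990u64 - 1_000` would panic in a debug build and wrap to a huge value in a release build.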

BewareMyPower · 2026-05-12T13:19:46Z

After addressing the safety issue, fastant::Instant is no longer faster than std::Instant:

| Implementation | Time (ns) | vs std |
|---|---|---|
| `fastant::Instant::now()` | 28.94 ns | ~1.03x (3% slower) |
| `quanta::Instant::now()` | 11.46 ns | ~2.46x faster |
| `std::Instant::now()` | 28.16 ns | baseline |

quanta is ~2.5x faster than both fastant and std. fastant and std are essentially tied (within noise/overlap of confidence intervals).
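The harness behind these numbers is not shown in this thread. As a rough, std-only way to sanity-check the baseline figure, something like the following could be used; `avg_ns_per_call` is a hypothetical helper, and a real benchmark framework would give confidence intervals that this sketch does not.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Average cost in nanoseconds of one call to `f`, measured over `iters` calls.
/// `iters` is assumed to be non-zero.
pub fn avg_ns_per_call<T>(iters: u32, mut f: impl FnMut() -> T) -> f64 {
    // Warm up so caches and lazy initialization don't dominate the result.
    for _ in 0..1_000 {
        black_box(f());
    }
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from discarding the call.
        black_box(f());
    }
    start.elapsed().as_nanos() as f64 / f64::from(iters)
}
```

Calling `avg_ns_per_call(1_000_000, std::time::Instant::now)` gives a number comparable to the `std` row; the same helper can wrap any other clock's `now()`.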

BewareMyPower · 2026-05-12T13:39:45Z

After removing the fence, it's still much slower than quanta.

| Implementation | Time (ns) | vs std |
|---|---|---|
| fastant | 22.46 ns (22.44-22.49) | 0.80x the time (~1.25x faster) |
| quanta | 11.48 ns (11.43-11.55) | 0.41x the time (~2.44x faster) |
| std | 28.00 ns (28.00-28.01) | baseline |

The results come from a CI run on a different branch: https://github.com/BewareMyPower/fastant/actions/runs/25737350865/job/75577818458

I haven't had time to look into quanta's source, but according to an LLM's analysis, the serializing overhead of __rdtscp seems to be the major factor.

I will mark this PR as a draft and start a discussion here. @tisonkun @andylokandy

quanta is faster than fastant for one primary reason:

`rdtsc` vs `rdtscp`

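fastant uses __rdtscp (tsc_now.rs:211), the ordered variant (loosely called "serializing" here). It waits for all prior instructions to complete before reading the counter, so the TSC is read after everything that precedes it in program order. By itself it does not stop later instructions from starting early, which is why a fence follows it.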

quanta uses __rdtsc (counter.rs:7) — the non-serializing variant. It just reads the counter immediately, no ordering guarantee.

On modern x86, rdtscp costs ~20+ cycles vs rdtsc at ~10 cycles. That alone accounts for most of the 2.5x gap (28.9ns → 11.5ns).

Secondary differences

| | fastant | quanta |
|---|---|---|
| TSC instruction | `__rdtscp` (serializing) | `__rdtsc` (non-serializing) |
| Per-call check | `is_tsc_available()` read + branch every call | Amortized via `OnceCell` init |
| Hot-path math | `wrapping_sub` (anchor offset) | `saturating_sub` + `u128::mul` + shift |
| Calibration | `f64` pre-computed `nanos_per_cycle` | Power-of-two integer scaling (no float in hot path) |

Note that fastant's wrapping_sub is applied to raw TSC values to normalize them, while quanta's saturating_sub + mul + shift converts TSC ticks → nanoseconds. Both do subtraction per call. The extra mul + shift in quanta is still cheaper than the serializing overhead of rdtscp.
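To make the `saturating_sub` + mul + shift path concrete, here is a sketch of power-of-two integer scaling. The `Scale` type, its fields, and the fixed shift of 32 are assumptions for illustration, not quanta's actual code.

```rust
/// Hypothetical fixed-point scale: nanos = (ticks * mult) >> shift.
pub struct Scale {
    mult: u64,
    shift: u32,
}

impl Scale {
    /// Derives the multiplier from a calibrated TSC frequency in Hz.
    /// `tsc_hz` is assumed to be non-zero.
    pub fn from_frequency(tsc_hz: u64) -> Self {
        let shift = 32;
        // mult = (1e9 << shift) / tsc_hz, computed in u128 to avoid overflow.
        let mult = ((1_000_000_000u128 << shift) / tsc_hz as u128) as u64;
        Scale { mult, shift }
    }

    /// Converts a raw reading into nanoseconds since `anchor`.
    pub fn nanos_since(&self, anchor: u64, raw: u64) -> u64 {
        // Clamp to zero if the reading is older than the anchor.
        let ticks = raw.saturating_sub(anchor);
        // Widen to u128 so the multiplication cannot overflow, then shift back.
        ((ticks as u128 * self.mult as u128) >> self.shift) as u64
    }
}
```

With a 2 GHz counter the multiplier is exactly 2^31, so 2,000,000,000 ticks convert to 1,000,000,000 ns. For frequencies that do not divide evenly, the result is rounded down slightly.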

BewareMyPower added 3 commits May 12, 2026 20:57

fix: fence the tsc() call and prevent crash from _cycles_per_sec

cd3f3a4

fix clippy check

e96c833

use unused_unsafe instead

efd6e91

BewareMyPower marked this pull request as draft May 12, 2026 13:39

BewareMyPower mentioned this pull request May 12, 2026

Is necessary to introduce rdtsc_ordered? #7

Open

tisonkun mentioned this pull request May 13, 2026

fix: _cycles_per_sec might crash due to overflow by reordering of rdtsc #12

Merged

BewareMyPower wants to merge 3 commits into fast:main from BewareMyPower:fix-rdtsc-crash
