Fluctuating CPU MHz makes `tsc` profiling lie #23

nakedible · 2023-11-05T21:29:45Z

This isn't a fully debugged issue report yet, nor do I have a clean reproduction recipe yet...

Sometimes when I benchmark, attempting to force divan to use TSC, I get results where timer accuracy is 20 ns (this is Github Codespaces so extremely heavily virtualized):

$ cargo bench --bench basic -- --timer tsc --sample-size 81920
    Finished bench [optimized] target(s) in 0.02s
     Running benches/basic.rs (target/release/deps/basic-2115956f1e3e5fd7)
Timer precision: 20.03 ns
basic               fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ get_bounds       0.306 ns      │ 0.564 ns      │ 0.306 ns      │ 0.322 ns      │ 100     │ 8192000
├─ get_cold         0.306 ns      │ 0.505 ns      │ 0.306 ns      │ 0.313 ns      │ 100     │ 8192000
├─ get_sat          0.307 ns      │ 0.697 ns      │ 0.307 ns      │ 0.315 ns      │ 100     │ 8192000
├─ get_unreachable  0.306 ns      │ 0.924 ns      │ 0.338 ns      │ 0.407 ns      │ 100     │ 8192000
├─ get_unsafe       0.306 ns      │ 0.468 ns      │ 0.306 ns      │ 0.308 ns      │ 100     │ 8192000
├─ harp             1.847 ns      │ 3.775 ns      │ 1.848 ns      │ 2.088 ns      │ 100     │ 8192000
╰─ wrapping_add     1.848 ns      │ 2.068 ns      │ 1.848 ns      │ 1.862 ns      │ 100     │ 8192000

However, every now and then, divan figures out the timer accuracy is 29.85 ns, in which case I see wildly differing values:

$ cargo bench --bench basic -- --timer tsc --sample-size 81920
    Finished bench [optimized] target(s) in 0.05s
     Running benches/basic.rs (target/release/deps/basic-2115956f1e3e5fd7)
Timer precision: 29.85 ns
basic               fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ get_bounds       0.228 ns      │ 0.364 ns      │ 0.228 ns      │ 0.237 ns      │ 100     │ 8192000
├─ get_cold         0.228 ns      │ 0.823 ns      │ 0.228 ns      │ 0.238 ns      │ 100     │ 8192000
├─ get_sat          0.229 ns      │ 0.354 ns      │ 0.229 ns      │ 0.232 ns      │ 100     │ 8192000
├─ get_unreachable  0.228 ns      │ 0.502 ns      │ 0.26 ns       │ 0.26 ns       │ 100     │ 8192000
├─ get_unsafe       0.228 ns      │ 1.162 ns      │ 0.228 ns      │ 0.269 ns      │ 100     │ 8192000
├─ harp             1.769 ns      │ 3.097 ns      │ 1.892 ns      │ 1.972 ns      │ 100     │ 8192000
╰─ wrapping_add     1.77 ns       │ 3.783 ns      │ 1.77 ns       │ 1.97 ns       │ 100     │ 8192000

This happens also if I don't specify a sample size, but I wanted to fix it for this example so that it's clear the difference isn't because of a different iters value.

This is an extreme microbenchmark, but I think I was seeing similar on larger benchmarks at well.

Actually, while writing this, I think I might've figure out the issue – different CPUs on this system have different MHz, and they seem to be constantly changing.

Not sure what divan can do about that :-)

$ cat /proc/cpuinfo  | grep MHz
cpu MHz         : 3242.971
cpu MHz         : 2785.335
cpu MHz         : 3234.546
cpu MHz         : 3245.406
cpu MHz         : 2913.887
cpu MHz         : 2933.662
cpu MHz         : 3096.559
cpu MHz         : 2445.426

The text was updated successfully, but these errors were encountered:

vlovich · 2023-11-17T21:52:45Z

Yes, when benchmarking / profiling code it's not uncommon to override the governor to use the performance governor to remove the "random" clock changes that normal governors will make. That's less about TSC & more about the fact that your code is actually taking a variable amount of time. And honestly, +/- 30% for something taking < 1ns isn't surprising. At that point, the loop counter used for doing the repeated iteration within the timing code is going to be a meaningful part of the measurement. Running within a VM is going to be another contributing factor for measurement error.

nakedible · 2023-11-18T11:36:59Z

Ultimately I spent quite a while in testing different configurations for this. What I need is benchmarks for something that's in the nanosecond range, where the benchmark will reliably tell if a new version is faster or slower than the old one, where reliably means maybe only 1 in twenty mistakes, but preferably 1 in a hundred.

I tried a lot of options with divan, also on a quiet desktop PC, but I was unable to achieve such reliable benchmarks. In the end, criterion ended up being the better choice for me, even though the benchmarks are really slow.

I don't know if there's any sense in keeping this "bug" open, so I'll close it.

nvzqz · 2023-11-18T16:22:27Z

I tried a lot of options with divan, also on a quiet desktop PC, but I was unable to achieve such reliable benchmarks.

So is this issue not just with the TSC then?

vlovich · 2023-11-18T23:17:40Z

I don't think so. I have observed result instability For what it's worth as well. I suspect criterion has similar behavior and the main differentiating factor is the statistical analysis done within that crate more than the fact that they iterate so much.

nakedible · 2023-11-19T12:31:23Z

Yes, it happens also without TSC. But I need to spend a bit more time on this so I can actually give reasonable comments and not just guesswork.

nakedible changed the title ~~Sometimes gives wildly inaccurate values~~ Fluctuating CPU MHz makes tsc profiling lie Nov 5, 2023

nakedible closed this as completed Nov 18, 2023

nvzqz mentioned this issue Dec 2, 2023

Refactoring Database: const and faster loading, dropping bincode-loading whisperfish/rust-phonenumber#63

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fluctuating CPU MHz makes `tsc` profiling lie #23

Fluctuating CPU MHz makes `tsc` profiling lie #23

nakedible commented Nov 5, 2023

vlovich commented Nov 17, 2023

nakedible commented Nov 18, 2023

nvzqz commented Nov 18, 2023

vlovich commented Nov 18, 2023

nakedible commented Nov 19, 2023

Fluctuating CPU MHz makes tsc profiling lie #23

Fluctuating CPU MHz makes tsc profiling lie #23

Comments

nakedible commented Nov 5, 2023

vlovich commented Nov 17, 2023

nakedible commented Nov 18, 2023

nvzqz commented Nov 18, 2023

vlovich commented Nov 18, 2023

nakedible commented Nov 19, 2023

Fluctuating CPU MHz makes `tsc` profiling lie #23

Fluctuating CPU MHz makes `tsc` profiling lie #23