Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fluctuating CPU MHz makes tsc profiling lie #23

Closed
nakedible opened this issue Nov 5, 2023 · 5 comments
Closed

Fluctuating CPU MHz makes tsc profiling lie #23

nakedible opened this issue Nov 5, 2023 · 5 comments

Comments

@nakedible
Copy link

This isn't a fully debugged issue report yet, nor do I have a clean reproduction recipe yet...

Sometimes when I benchmark, attempting to force divan to use TSC, I get results where timer accuracy is 20 ns (this is Github Codespaces so extremely heavily virtualized):

$ cargo bench --bench basic -- --timer tsc --sample-size 81920
    Finished bench [optimized] target(s) in 0.02s
     Running benches/basic.rs (target/release/deps/basic-2115956f1e3e5fd7)
Timer precision: 20.03 ns
basic               fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ get_bounds       0.306 ns      │ 0.564 ns      │ 0.306 ns      │ 0.322 ns      │ 100     │ 8192000
├─ get_cold         0.306 ns      │ 0.505 ns      │ 0.306 ns      │ 0.313 ns      │ 100     │ 8192000
├─ get_sat          0.307 ns      │ 0.697 ns      │ 0.307 ns      │ 0.315 ns      │ 100     │ 8192000
├─ get_unreachable  0.306 ns      │ 0.924 ns      │ 0.338 ns      │ 0.407 ns      │ 100     │ 8192000
├─ get_unsafe       0.306 ns      │ 0.468 ns      │ 0.306 ns      │ 0.308 ns      │ 100     │ 8192000
├─ harp             1.847 ns      │ 3.775 ns      │ 1.848 ns      │ 2.088 ns      │ 100     │ 8192000
╰─ wrapping_add     1.848 ns      │ 2.068 ns      │ 1.848 ns      │ 1.862 ns      │ 100     │ 8192000

However, every now and then, divan figures out the timer accuracy is 29.85 ns, in which case I see wildly differing values:

$ cargo bench --bench basic -- --timer tsc --sample-size 81920
    Finished bench [optimized] target(s) in 0.05s
     Running benches/basic.rs (target/release/deps/basic-2115956f1e3e5fd7)
Timer precision: 29.85 ns
basic               fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ get_bounds       0.228 ns      │ 0.364 ns      │ 0.228 ns      │ 0.237 ns      │ 100     │ 8192000
├─ get_cold         0.228 ns      │ 0.823 ns      │ 0.228 ns      │ 0.238 ns      │ 100     │ 8192000
├─ get_sat          0.229 ns      │ 0.354 ns      │ 0.229 ns      │ 0.232 ns      │ 100     │ 8192000
├─ get_unreachable  0.228 ns      │ 0.502 ns      │ 0.26 ns       │ 0.26 ns       │ 100     │ 8192000
├─ get_unsafe       0.228 ns      │ 1.162 ns      │ 0.228 ns      │ 0.269 ns      │ 100     │ 8192000
├─ harp             1.769 ns      │ 3.097 ns      │ 1.892 ns      │ 1.972 ns      │ 100     │ 8192000
╰─ wrapping_add     1.77 ns       │ 3.783 ns      │ 1.77 ns       │ 1.97 ns       │ 100     │ 8192000

This happens also if I don't specify a sample size, but I wanted to fix it for this example so that it's clear the difference isn't because of a different iters value.

This is an extreme microbenchmark, but I think I was seeing similar on larger benchmarks at well.

Actually, while writing this, I think I might've figure out the issue – different CPUs on this system have different MHz, and they seem to be constantly changing.

Not sure what divan can do about that :-)

$ cat /proc/cpuinfo  | grep MHz
cpu MHz         : 3242.971
cpu MHz         : 2785.335
cpu MHz         : 3234.546
cpu MHz         : 3245.406
cpu MHz         : 2913.887
cpu MHz         : 2933.662
cpu MHz         : 3096.559
cpu MHz         : 2445.426
@nakedible nakedible changed the title Sometimes gives wildly inaccurate values Fluctuating CPU MHz makes tsc profiling lie Nov 5, 2023
@vlovich
Copy link

vlovich commented Nov 17, 2023

Yes, when benchmarking / profiling code it's not uncommon to override the governor to use the performance governor to remove the "random" clock changes that normal governors will make. That's less about TSC & more about the fact that your code is actually taking a variable amount of time. And honestly, +/- 30% for something taking < 1ns isn't surprising. At that point, the loop counter used for doing the repeated iteration within the timing code is going to be a meaningful part of the measurement. Running within a VM is going to be another contributing factor for measurement error.

@nakedible
Copy link
Author

Ultimately I spent quite a while in testing different configurations for this. What I need is benchmarks for something that's in the nanosecond range, where the benchmark will reliably tell if a new version is faster or slower than the old one, where reliably means maybe only 1 in twenty mistakes, but preferably 1 in a hundred.

I tried a lot of options with divan, also on a quiet desktop PC, but I was unable to achieve such reliable benchmarks. In the end, criterion ended up being the better choice for me, even though the benchmarks are really slow.

I don't know if there's any sense in keeping this "bug" open, so I'll close it.

@nvzqz
Copy link
Owner

nvzqz commented Nov 18, 2023

I tried a lot of options with divan, also on a quiet desktop PC, but I was unable to achieve such reliable benchmarks.

So is this issue not just with the TSC then?

@vlovich
Copy link

vlovich commented Nov 18, 2023

I don't think so. I have observed result instability For what it's worth as well. I suspect criterion has similar behavior and the main differentiating factor is the statistical analysis done within that crate more than the fact that they iterate so much.

@nakedible
Copy link
Author

Yes, it happens also without TSC. But I need to spend a bit more time on this so I can actually give reasonable comments and not just guesswork.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants