Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x.py bench should consider warning/informing the user if they have a configuration that may produce slower/inaccurate benchmakrs #93683

Open
thomcc opened this issue Feb 5, 2022 · 6 comments
Labels
A-contributor-roadblock Area: Makes things more difficult for new contributors to rust itself T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap)

Comments

@thomcc
Copy link
Member

thomcc commented Feb 5, 2022

Should x.py bench emit a message if settings that are bad for performance are enabled? I think my configuration was confusing me quite a bit in #90414 -- specifically rust.debug-info-std (which IIUC impacts which variables get spilled to the stack) and rust.incremental (which impacts codegen units). There are definitely others relevant here.

Note that I do think there are reasons that you may want to have these on when running the benchmark, so it should be an unobtrusive warning or message. Alternatively, it could be configured in some way automatically, if not overridden, similar to [proflie.bench] in cargo.

Discussion carried over from that PR.

@Mark-Simulacrum
Copy link
Member

I would rather not try to produce diagnostics about arbitrary configuration deltas, since that feels pretty hard to me; we have a lot of options, and the level to which they are benign is hard to tell (and somewhat dependent on local environment). In general, it is also difficult to effectively surface warnings in x.py, because we typically produce a lot of output, and so each line individually is not too noticeable.

It also seems like for benchmarking in particular, beyond just the options in config.toml, you'd presumably likely want to make sure the local machine is otherwise idle, etc. Maybe that's more obvious than config.toml changes, not sure.

@thomcc
Copy link
Member Author

thomcc commented Feb 5, 2022

Continuing from @the8472's comment: #90414 (comment)

I wonder if we could build std and dependencies with incremental but the final benchmark crate in non-incremental mode.

I'm not sure what that would help? I think there are two cases that are common for me to want (I cannot speak to others, but they may generalize):

  1. Compile and link std and the benchmark them as realistically as possible with respect to usage of std in code in the wild built with --release.
  2. Compile both together, and optimize them with full LTOed, so that I can compare with the algorithm in std versus a comparison impl in the benchmark, without additional overhead caused by which one is where.

Arguably, it would be better to always do the first, and compare separate runs. This kinda makes it a pain to regenerate benchmarks (maybe its fine, and you just would not include old benchmarks in the updates), but more importantly it causes the inaccurate results on bursty machines like some laptops (mine). Measuring everything together increases the likelihood that the numbers will be comparable with each-other by a lot, IME.

That said, I think you're asking for neither, and are interested in faster iteration? That's reasonable too (esp. if you have a stable benchmarking machine), but IDK if it makes sense as a default, since people probably expect at least one of the above situations to be the case (or it it just me? maybe my expectations don't generalize).

Anyway I think there are probably good reasons to have any possible build configuration when benchmarking, so even a warning is possibly too strong.

I will note that if the message is too subtle, I'd have missed it -- I can't check but I suspect the build said something about [optimized+debug] somewhere in the information output.

OTOH maybe the configuration/x.py/whatever should be set up so that this does something slightly more reasonable? That wouldn't help me for the debug info case, but TBH that one is my fault, since I literally forgot I turned it on. (It also may not even matter, if I'm wrong about the spills or if LLVM handles it).

@thomcc
Copy link
Member Author

thomcc commented Feb 5, 2022

It also seems like for benchmarking in particular, beyond just the options in config.toml, you'd presumably likely want to make sure the local machine is otherwise idle, etc. Maybe that's more obvious than config.toml changes, not sure.

Perhaps another set of options for benchmarking is useful? Cargo has [profile.bench], for example, and largely my handwavey notion of "it should do right thing" probably looks a similar "x.py bench should be closer to [profile.bench] unless you override it".

you'd presumably likely want to make sure the local machine is otherwise idle

This matters, is pretty obvious, and you can address it with more runs, but if the codegen settings, you can have very high confidence that one impl is faster than another, when really things are just compiled in a way that optimizes a case that wouldn't get optimized in the real world.

For example, in my case (which I don't want to lean on too hard, since my confidence that this is indeed what happened is at most 80%), the build settings seemed such that:

  • Code was consistently slower if it was implemented in std and called from the benchmarks
  • And it was faster if it was both implemented and called from the benchmarks

Presumably because one was a cross-crate call and one was not? Dunno!


Anyway, if the answer is we shouldn't do this, fair enough. Hopefully perf will notice any issues anyway?

Also, in my case, I had already kinda noticed that the numbers in std were less reliable than in a separate crate (which I used to write the code without having to constantly build std), but mostly chalked it up to "the stdlib is weird".

@the8472
Copy link
Member

the8472 commented Feb 5, 2022

My suggestion is aimed at achieving a middle-ground (perhaps even improving the pareto frontier) between benchmark noise and iteration speed. I can already disable turbo clocks and compile with 1CGU to reduce noise, but this is extremely tedious, especially when running stage1 benchmarks.

@Mark-Simulacrum
Copy link
Member

I think adding per-profile options definitely seems like it's probably too much: x.py already has a lot of knobs that confuse folks, and if we start trying to expand to per-profile options. Plus, having x.py bench blow away your whole x.py build cache (or need to cache separately) feels subtly annoying too.

Hopefully perf will notice any issues anyway?

We don't run std benchmarks anywhere right now, so not unless the functions happen to be used (and somewhat hot) in the compiler. It's a long time desire to start running those benchmarks somewhere, but that just hasn't happened yet.

I wonder if we could build std and dependencies with incremental but the final benchmark crate in non-incremental mode.

Found this comment from the PR now -- I think this may be a viable step. I'm not sure the extent to which it's a false hope -- if your function is core::str::Chars, that won't get monomorphized outside std (unless it's #[inline]), so you're still getting all the negatives of incremental mode I think. I'd not be opposed to a PR making this slight adjustment, but I think it's probably not very helpful in terms of practical improvement here.

@the8472
Copy link
Member

the8472 commented Feb 6, 2022

It might not help in this particular case, but I have had std benches involving generic code where non-incremental mode still made them less noisy.

@jyn514 jyn514 added T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) A-contributor-roadblock Area: Makes things more difficult for new contributors to rust itself labels Apr 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-contributor-roadblock Area: Makes things more difficult for new contributors to rust itself T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap)
Projects
None yet
Development

No branches or pull requests

4 participants