Drop tracy from CI benchmarks? #16856
Could run an experiment with and without tracing to quantify the cost. For Linux, the code to change appears to be https://github.com/openxla/iree/blob/767a6112abffebd896ceb2f0107c0d603c2a338a/build_tools/benchmarks/run_benchmarks_on_linux.py#L87-L110
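A minimal sketch of how such a gate could look, assuming a hypothetical `--capture-tracy` flag and helper name (the repetition counts of 10 and 4 come from the discussion below; nothing here is the actual IREE script API):

```python
import argparse


def build_benchmark_args(capture_tracy: bool,
                         repetitions_default: int = 10,
                         repetitions_tracy: int = 4) -> list:
    """Hypothetical helper: pick repetition count and optionally enable
    the extra Tracy-instrumented pass. Flag names are illustrative."""
    args = [
        "--benchmark_repetitions",
        str(repetitions_tracy if capture_tracy else repetitions_default),
    ]
    if capture_tracy:
        args.append("--capture-tracy")
    return args


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--capture-tracy", action="store_true")
    opts = parser.parse_args()
    print(build_benchmark_args(opts.capture_tracy))
```

Running once with the flag and once without would give a direct timing comparison for the same vmfbs.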
Another thing we could do would be to set … But yeah - not sure if enough people are monitoring the current CI benchmark infra to justify the ongoing costs, and dropping trace collection would likely save quite a bit of time.
I'm okay with dropping it. But IMO it is useful in cases where you don't have access to the devices (e.g. Pixel phones, Nvidia cards). Recently I was debugging the regressions in #16731 and was able to download the baseline and PR Tracy files to quickly spot the slowest dispatches and find the problems (in this case the regressions were large, so they were easy to spot).
the cost should only be in execution time, as the same vmfbs are used - I thought we weren't bounded by execution time? having them nightly/on releases would still be useful, if we have a nightly runner - when you need them, you need them, and finding out you need to compare against something older and have to generate the traces then can waste a day
and yeah, we should probably always have debug info on the bots - it has zero runtime overhead, but it does introduce a text printing step during compilation - probably worth timing.
Watch the logs while the benchmarks run and you'll see we are definitely waiting on execution :-)
The other thing we can do is make it opt-in for particular benchmarks. The major benefit to me is looking for improvements/regressions over time that show up in end-to-end performance, and I don't care about having it on every device or every configuration. Not having to check out an ancient branch, get things going, get it onto the hardware we measure on, and perfectly replicate the setup is important. But one run of each model architecture on each machine is worth it. We've got bigger fish to fry in our process than removing some of the tiny amount of information we actually have.
Ran the experiment at #16857... found that wholesale dropping all the related code would be a -345 lines of code shrink, and found the following timings from comparing that PR with another PR I ran today against the same …
What @pzread said: it is useful when you don't have access to the devices. I'd use it to debug Pixel regressions when the need arises, which mostly happens when I'm on LLVM integrate rotation. I think it is okay to drop, because we can reassign those regressions to ARM and Google folks.
yeah, I don't think we care about the Pixel phones anymore - I am mostly concerned with losing the ability to look at what fusion decisions/whole-architecture decisions we are changing over time. If we can do that on one bot for one configuration (with the option to always add more), and only for the major architectures, that'd be enough for me. I don't like the idea of losing coverage entirely, though. We have a really bad story around visibility into memory consumption, latency, and scheduling, and the traces are literally all we have.
What if we only disabled Tracy captures on PR runs?
That seems reasonable to me. If someone wants to see them on presubmit, they could even just edit the code that makes them postsubmit-only (if it's a switch localized to 1-2 files).
@pzread how do you feel about disabling Tracy on PR runs, or making it an extra opt-in?
opt-in on PR via a label and then continuously run on main SGTM
Making it optional and off by default on PR runs SGTM.
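A rough sketch of the label-based gate, assuming a hypothetical `benchmarks:tracy` label name and using the standard `GITHUB_EVENT_PATH` payload that GitHub Actions provides (nothing here is the actual workflow code):

```python
import json
import os

# Hypothetical opt-in label; the real name would be whatever the repo adopts.
TRACY_LABEL = "benchmarks:tracy"


def should_capture_tracy(event_path=None):
    """Return True if this run should collect Tracy captures.

    PR-triggered runs must opt in via the label; anything else
    (e.g. continuous post-submit runs on main) keeps captures on.
    """
    event_path = event_path or os.environ.get("GITHUB_EVENT_PATH")
    if not event_path or not os.path.exists(event_path):
        return True  # no event payload: treat as a non-PR run, keep captures
    with open(event_path) as f:
        event = json.load(f)
    pr = event.get("pull_request")
    if pr is None:
        return True  # push/schedule event: always capture
    labels = {label["name"] for label in pr.get("labels", [])}
    return TRACY_LABEL in labels
```

The benchmark driver would then skip the second (Tracy-instrumented) pass whenever this returns False.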
Our benchmarks on CI run twice, once without and once with Tracy. This is a substantial cost and latency hit (not quite 2x, as I can see that the Tracy pass runs 4 repetitions where the non-Tracy pass runs 10). The motivation was that we could download and inspect traces from regressions caught on CI, but does anyone do that in practice? We don't inspect CI benchmark regressions every day, and when we do, we usually just reproduce the regression locally, which is often necessary anyway to resolve it.
@pzread @hanhanW @benvanik @ScottTodd