Drop tracy from CI benchmarks? #16856

Open · bjacob opened this issue Mar 20, 2024 · 15 comments
Labels: infrastructure/benchmark (Relating to benchmarking infrastructure)

Comments

@bjacob
Contributor

bjacob commented Mar 20, 2024

Our benchmarks on CI run twice, without and with Tracy. This is a substantial cost and latency hit (not quite 2x, since the Tracy pass runs 4 repetitions while the non-Tracy pass runs 10). The motivation was that we could download and inspect traces from regressions caught on CI, but does anyone do that in practice? We don't inspect CI benchmark regressions every day, and when we do, we usually just reproduce the regression locally, which is often necessary to resolve it anyway.

@pzread @hanhanW @benvanik @ScottTodd

@bjacob added the infrastructure/benchmark label Mar 20, 2024
@ScottTodd
Collaborator

We could run an experiment with and without tracing to quantify the cost. For Linux, the code to change appears to be https://github.com/openxla/iree/blob/767a6112abffebd896ceb2f0107c0d603c2a338a/build_tools/benchmarks/run_benchmarks_on_linux.py#L87-L110
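
For reference, a minimal sketch (not the actual run_benchmarks_on_linux.py code) of what gating the Tracy pass behind a command-line switch could look like. The flag name and the driver helper are hypothetical; the 10/4 repetition counts are the ones mentioned above.

```python
# Hypothetical sketch only: the real script's structure and flag names may differ.
import argparse


def run_pass(enable_tracy: bool, repetitions: int) -> None:
    # Stand-in for the real benchmark driver, which runs each benchmark
    # `repetitions` times with either the default or the Tracy-instrumented runtime.
    mode = "tracy" if enable_tracy else "default"
    print(f"running benchmarks: mode={mode}, repetitions={repetitions}")


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--capture-tracy",
        action="store_true",
        help="Also run the Tracy-instrumented pass and keep the captured traces.",
    )
    args = parser.parse_args()
    run_pass(enable_tracy=False, repetitions=10)
    if args.capture_tracy:
        run_pass(enable_tracy=True, repetitions=4)


if __name__ == "__main__":
    main()
```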

@ScottTodd
Collaborator

Another thing we could do would be to set --iree-hal-executable-debug-level=3 to embed source files, since that works now (thanks to #16757). Traces would be more useful with that.

But yeah - not sure if enough people are monitoring the current CI benchmark infra to justify the ongoing costs, and dropping trace collection would likely save quite a bit of time.
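
For context, a hedged sketch of what compiling with that flag could look like; the input file, output file, and target backend below are placeholders rather than anything from this thread, and only the --iree-hal-executable-debug-level=3 flag comes from the comment above.

```python
# Hypothetical sketch: compile a module with executable debug info embedded so
# that Tracy traces can resolve source locations. File names and the target
# backend are placeholders.
import subprocess

subprocess.run(
    [
        "iree-compile",
        "model.mlir",
        "--iree-hal-target-backends=llvm-cpu",
        "--iree-hal-executable-debug-level=3",  # embed source files in executables
        "-o",
        "model.vmfb",
    ],
    check=True,
)
```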

@pzread
Member

pzread commented Mar 20, 2024

I'm okay with dropping it, but IMO it is useful in cases where you don't have access to the devices (e.g. Pixel phones, Nvidia cards).

Recently I was debugging the regressions in #16731, and I was able to download the baseline and PR Tracy files to quickly spot the slowest dispatches and find the problems (in that case the regressions were large, so they were easy to spot).

@benvanik
Collaborator

the cost should only be in execution time as the same vmfbs are used - I thought we weren't bounded by execution time?

having them nightly/on releases would still be useful if we have a nightly runner - when you need them you need them, and finding out that you need to compare against something older and have to generate traces for it can waste a day

@benvanik
Collaborator

and yeah, we should probably always have debug info on the bots - it has zero runtime overhead but does introduce a text printing step to compilation - probably worth timing.

@bjacob
Contributor Author

bjacob commented Mar 20, 2024

Watch the logs while the benchmarks run and you'll see we are definitely waiting on execution :-)
While the vmfbs are shared, there is also the overhead of transferring and storing the traces.
Great data point @pzread, good to hear you actually had a recent use case.

@benvanik
Collaborator

benvanik commented Mar 20, 2024

The other thing we can do is make it opt-in for particular benchmarks. The major benefit to me is looking for improvements/regressions over time that show up in end-to-end performance, and I don't care about having it on every device or every configuration. Not having to check out an ancient branch, get things going, get it onto the hardware we measure on, and perfectly replicate the setup is important. But one run of each model architecture on each machine is worth it. We've got bigger fish to fry in our process than removing some of the tiny amount of information we actually have.

@bjacob
Contributor Author

bjacob commented Mar 20, 2024

I ran the experiment at #16857 and found that wholesale dropping all the related code would be a 345-line shrink. Comparing that PR with another PR I ran today against the same main commit (#16855) gave the following timings:

| CI job | With Tracy capture (#16855) | Without Tracy capture (#16857) | Speedup |
| --- | --- | --- | --- |
| run_benchmarks (c2-standard-60, linux, x86_64, 0, 2) | 22 min 2 s | 11 min 46 s | 1.87x |
| run_benchmarks (c2-standard-60, linux, x86_64, 1, 2) | 21 min 1 s | 13 min 36 s | 1.54x |
| run_benchmarks (pixel-6-pro, android, armv8.2-a, 0, 1) | 21 min 12 s | 11 min 5 s | 1.92x |

@hanhanW
Contributor

hanhanW commented Mar 20, 2024

As @pzread said, it is useful when you don't have access to the devices. I'd use it to debug Pixel regressions when the need arises, which mostly happens when I'm on LLVM integrate rotation. I think it is okay to drop because we can reassign those regressions to ARM and Google folks.

@benvanik
Collaborator

yeah I don't think we care about the pixel phones anymore - I am mostly concerned with losing the ability to look at what fusion decisions/whole architecture decisions we are changing over time. If we can do that on one bot for one configuration (with the option to always add more) and only for the major architectures that'd be enough for me. I don't like the idea of losing coverage entirely, though. We have a really bad story around visibility into memory consumption, latency, and scheduling and the traces are literally all we have.

@bjacob
Contributor Author

bjacob commented Mar 21, 2024

What if we only disabled Tracy captures on PR runs?

@ScottTodd
Collaborator

> What if we only disabled Tracy captures on PR runs?

That seems reasonable to me. If someone wants to see them on presubmit they could even just edit the code that makes them postsubmit only (if it's a switch localized to 1-2 files).

@ScottTodd
Collaborator

@pzread how do you feel about disabling Tracy on PR runs, or making it an extra opt-in?

@benvanik
Collaborator

opt-in on PR via a label and then continuously run on main SGTM

@pzread
Member

pzread commented Mar 26, 2024

Make it optional and off by default on PR runs SGTM
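
To summarize the agreement, an illustrative sketch of how the gating could be decided inside the benchmark scripts, assuming they run under GitHub Actions (GITHUB_EVENT_NAME and GITHUB_EVENT_PATH are standard Actions environment variables); the opt-in label name is hypothetical.

```python
# Hypothetical sketch of the agreed policy: capture Tracy traces on postsubmit
# runs of main, and on presubmit only when the PR opts in via a label.
import json
import os


def should_capture_tracy() -> bool:
    if os.environ.get("GITHUB_EVENT_NAME", "") != "pull_request":
        # Postsubmit / scheduled runs (e.g. on main): always capture.
        return True
    # Presubmit: only capture when the PR carries the opt-in label.
    event_path = os.environ.get("GITHUB_EVENT_PATH")
    if not event_path:
        return False
    with open(event_path) as f:
        payload = json.load(f)
    labels = {
        label["name"]
        for label in payload.get("pull_request", {}).get("labels", [])
    }
    return "benchmarks:capture-tracy" in labels  # hypothetical label name
```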
