Drop tracy from CI benchmarks? #16856

Open · bjacob opened this issue Mar 20, 2024 · 15 comments
Labels: infrastructure/benchmark (Relating to benchmarking infrastructure)

Comments

@bjacob
Contributor

bjacob commented Mar 20, 2024

Our benchmarks on CI run twice, without and with Tracy. This is a substantial cost and latency hit (not quite 2x, since the Tracy pass runs 4 repetitions while the non-Tracy pass runs 10). The motivation was that we could download and inspect traces from regressions caught on CI, but does anyone do that in practice? We don't inspect CI benchmark regressions every day, and when we do, we usually just reproduce the regression locally, which is often necessary to resolve it anyway.

@pzread @hanhanW @benvanik @ScottTodd

@bjacob added the infrastructure/benchmark label Mar 20, 2024
@ScottTodd
Collaborator

We could run an experiment with and without tracing to quantify the cost. For Linux, the code to change appears to be https://github.com/openxla/iree/blob/767a6112abffebd896ceb2f0107c0d603c2a338a/build_tools/benchmarks/run_benchmarks_on_linux.py#L87-L110
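
For reference, a minimal sketch (not the actual run_benchmarks_on_linux.py code) of what gating the Tracy pass behind a command-line switch could look like. The flag name and the driver helper are hypothetical; the 10/4 repetition counts are the ones mentioned above.

```python
# Hypothetical sketch only: the real script's structure and flag names may differ.
import argparse


def run_pass(enable_tracy: bool, repetitions: int) -> None:
    # Stand-in for the real benchmark driver, which runs each benchmark
    # `repetitions` times with either the default or the Tracy-instrumented runtime.
    mode = "tracy" if enable_tracy else "default"
    print(f"running benchmarks: mode={mode}, repetitions={repetitions}")


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--capture-tracy",
        action="store_true",
        help="Also run the Tracy-instrumented pass and keep the captured traces.",
    )
    args = parser.parse_args()
    run_pass(enable_tracy=False, repetitions=10)
    if args.capture_tracy:
        run_pass(enable_tracy=True, repetitions=4)


if __name__ == "__main__":
    main()
```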

@ScottTodd
Collaborator

Another thing we could do would be to set --iree-hal-executable-debug-level=3 to embed source files, since that works now (thanks to #16757). Traces would be more useful with that.

But yeah - not sure if enough people are monitoring the current CI benchmark infra to justify the ongoing costs, and dropping trace collection would likely save quite a bit of time.
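
For context, a hedged sketch of what compiling with that flag could look like; the input file, output file, and target backend below are placeholders rather than anything from this thread, and only the --iree-hal-executable-debug-level=3 flag comes from the comment above.

```python
# Hypothetical sketch: compile a module with executable debug info embedded so
# that Tracy traces can resolve source locations. File names and the target
# backend are placeholders.
import subprocess

subprocess.run(
    [
        "iree-compile",
        "model.mlir",
        "--iree-hal-target-backends=llvm-cpu",
        "--iree-hal-executable-debug-level=3",  # embed source files in executables
        "-o",
        "model.vmfb",
    ],
    check=True,
)
```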

@pzread
Member

pzread commented Mar 20, 2024

I'm okay with dropping it, but IMO it is useful in cases where you don't have access to the devices (e.g. Pixel phones, Nvidia cards).

Recently I was debugging the regressions in #16731, and I was able to download the baseline and PR Tracy files to quickly spot the slowest dispatches and find the problems (in that case the regressions were large, so they were easy to spot).

@benvanik
Collaborator

the cost should only be in execution time as the same vmfbs are used - I thought we weren't bounded by execution time?

having them nightly/on releases would still be useful if we have a nightly runner - when you need them you need them, and finding out that you need to compare against something older and have to generate traces for it can waste a day

@benvanik
Collaborator

and yeah, we should probably always have debug info on the bots - it has zero runtime overhead but does introduce a text printing step to compilation - probably worth timing.

@bjacob
Contributor Author

bjacob commented Mar 20, 2024

Watch the logs while the benchmarks run and you'll see we are definitely waiting on execution :-)
While the vmfbs are shared, there is also the overhead of transferring and storing the traces.
Great data point @pzread, good to hear you actually had a recent use case.

@benvanik
Collaborator

benvanik commented Mar 20, 2024

The other thing we can do is make it opt-in for particular benchmarks. The major benefit to me is looking for improvements/regressions over time that show up in end-to-end performance, and I don't care about having it on every device or every configuration. Not having to check out an ancient branch, get things going, get it onto the hardware we measure on, and perfectly replicate the setup is important. But one run of each model architecture on each machine is worth it. We've got bigger fish to fry in our process than removing some of the tiny amount of information we actually have.

@bjacob
Contributor Author

bjacob commented Mar 20, 2024

I ran the experiment at #16857 and found that wholesale dropping all the related code would be a 345-line shrink. Comparing that PR with another PR I ran today against the same main commit (#16855) gave the following timings:

| CI job | With Tracy capture (#16855) | Without Tracy capture (#16857) | Speedup |
| --- | --- | --- | --- |
| run_benchmarks (c2-standard-60, linux, x86_64, 0, 2) | 22 min 2 s | 11 min 46 s | 1.87x |
| run_benchmarks (c2-standard-60, linux, x86_64, 1, 2) | 21 min 1 s | 13 min 36 s | 1.54x |
| run_benchmarks (pixel-6-pro, android, armv8.2-a, 0, 1) | 21 min 12 s | 11 min 5 s | 1.92x |

@hanhanW
Contributor

hanhanW commented Mar 20, 2024

As @pzread said, it is useful when you don't have access to the devices. I'd use it to debug Pixel regressions when the need arises, which mostly happens when I'm on LLVM integrate rotation. I think it is okay to drop because we can reassign those regressions to ARM and Google folks.

@benvanik
Collaborator

yeah I don't think we care about the pixel phones anymore - I am mostly concerned with losing the ability to look at what fusion decisions/whole architecture decisions we are changing over time. If we can do that on one bot for one configuration (with the option to always add more) and only for the major architectures that'd be enough for me. I don't like the idea of losing coverage entirely, though. We have a really bad story around visibility into memory consumption, latency, and scheduling and the traces are literally all we have.

@bjacob
Contributor Author

bjacob commented Mar 21, 2024

What if we only disabled Tracy captures on PR runs?

@ScottTodd
Collaborator

> What if we only disabled Tracy captures on PR runs?

That seems reasonable to me. If someone wants to see them on presubmit they could even just edit the code that makes them postsubmit only (if it's a switch localized to 1-2 files).

@ScottTodd
Collaborator

@pzread how do you feel about disabling Tracy on PR runs, or making it an extra opt-in?

@benvanik
Collaborator

opt-in on PR via a label and then continuously run on main SGTM

@pzread
Member

pzread commented Mar 26, 2024

Make it optional and off by default on PR runs SGTM
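
To summarize the agreement, an illustrative sketch of how the gating could be decided inside the benchmark scripts, assuming they run under GitHub Actions (GITHUB_EVENT_NAME and GITHUB_EVENT_PATH are standard Actions environment variables); the opt-in label name is hypothetical.

```python
# Hypothetical sketch of the agreed policy: capture Tracy traces on postsubmit
# runs of main, and on presubmit only when the PR opts in via a label.
import json
import os


def should_capture_tracy() -> bool:
    if os.environ.get("GITHUB_EVENT_NAME", "") != "pull_request":
        # Postsubmit / scheduled runs (e.g. on main): always capture.
        return True
    # Presubmit: only capture when the PR carries the opt-in label.
    event_path = os.environ.get("GITHUB_EVENT_PATH")
    if not event_path:
        return False
    with open(event_path) as f:
        payload = json.load(f)
    labels = {
        label["name"]
        for label in payload.get("pull_request", {}).get("labels", [])
    }
    return "benchmarks:capture-tracy" in labels  # hypothetical label name
```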
