Add structured trace profiler #2181
!bench

Here are the benchmark results for commit d1ed048.

  Benchmark   Metric             Change
  ===============================================
- liasolver   maxrss             61.8% (16.5 σ)
- qsort       maxrss             61.8% (16.5 σ)
- rbmap       maxrss             32.1% (10.5 σ)
- stdlib      instructions        4.8% (613.2 σ)
- stdlib      tactic execution    8.3% (90.5 σ)
- stdlib      task-clock          6.5% (47.3 σ)
- stdlib      wall-clock          7.4% (50.5 σ)
!bench

Here are the benchmark results for commit d10d5fe.

  Benchmark   Metric             Change
  ===============================================
- stdlib      instructions        4.1% (615.3 σ)
- stdlib      tactic execution    3.2% (76.4 σ)
- stdlib      task-clock          4.3% (25.7 σ)
!bench

Here are the benchmark results for commit a0d1720.

Essentially no overhead anymore when not using the option, great. Since this change is purely additive (functionally) and not a critical component, I think we can merge this and then refine later on if needed.
The `--profile` output is nice for getting a coarse summary of where time is spent within a file or package, but it is not as helpful for finding specific locations, especially since the profiler categories are disjoint (so a tactic invocation spending most of its time in many small typeclass synthesis calls will never show up in the profiler). The timing information in the structured traces is better for this use case, but you need to know which trace class to activate in the first place, and you will likely still get lost in its pages of output.

This PR introduces a new option `trace.profiler` (and `trace.profiler.threshold`, defaulting to 10ms) that is independent of the existing profiler option and aims to make finding and exploring expensive locations in traces easier by activating trace nodes not by class but by whether their (inclusive!) execution time exceeds the configured threshold.

The new pretty printing option `pp.oneline` replaces anything but the first line with `[...]` to keep the output easily scannable. Note that because this option works on the pretty printer level, there may still be multi-line trace nodes, but the output seems reasonable enough. `pp.oneline` currently produces non-interactive traces only.

Finally, as the output is otherwise regular trace output, we can in fact use @hargoniX's FlameTC explorer (after one tiny generalization) to print nice-looking and even easier-to-explore flame graphs from it.
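To make the "inclusive" activation rule concrete, here is a small Python model of the idea. It is only a sketch under stated assumptions, not the Lean implementation: `TraceNode`, `profile`, `oneline`, the output format, and the example trace classes are all hypothetical names chosen for illustration. A node is kept only if its own time plus that of all its descendants exceeds the threshold, and messages are cut to their first line in the spirit of `pp.oneline`.

```python
from dataclasses import dataclass, field


@dataclass
class TraceNode:
    """Hypothetical stand-in for a structured trace node."""
    cls: str                 # trace class label (illustrative only)
    msg: str                 # possibly multi-line message
    self_ms: float           # time spent in this node, excluding children
    children: list["TraceNode"] = field(default_factory=list)

    def inclusive_ms(self) -> float:
        # Inclusive time: this node plus everything below it.
        return self.self_ms + sum(c.inclusive_ms() for c in self.children)


def oneline(msg: str) -> str:
    # Keep the first line; mark anything that was dropped with "[...]".
    first, *rest = msg.split("\n")
    return f"{first} [...]" if rest else first


def profile(node: TraceNode, threshold_ms: float = 10.0, depth: int = 0) -> list[str]:
    # Activate a node by inclusive time, not by trace class.
    total = node.inclusive_ms()
    if total <= threshold_ms:
        return []
    lines = [f"{'  ' * depth}[{node.cls}] {total:.1f}ms {oneline(node.msg)}"]
    for child in node.children:
        lines += profile(child, threshold_ms, depth + 1)
    return lines


# A tactic whose time is spread over many small instance-synthesis calls.
tactic = TraceNode(
    "Elab.step", "simp at h\n  (more detail)", 0.5,
    [TraceNode("Meta.synthInstance", f"Inst {i}", 1.0) for i in range(12)],
)
print("\n".join(profile(tactic)))
```

In this model the tactic node is reported because its inclusive time is above the threshold, even though no single child is; that is exactly the situation that a profiler with disjoint categories would hide.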
![image](https://user-images.githubusercontent.com/109126/229504835-11d75650-47c7-40bc-87e0-93eac1b8e8b6.png)
In the end, the output structure is not too different from what e.g. perf+hotspot can give us, but on a slightly more abstract, component-based level and importantly with more information about inputs and no danger of stacks getting cut down from too-deep recursion.
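As a rough illustration of why nested, timed trace output maps well onto flame graphs, the following Python sketch turns a tree of `(name, inclusive_ms, children)` tuples into the "folded stacks" text format (`frame;frame;frame value`) that common flame graph tools such as `flamegraph.pl` accept. This is an assumption-laden sketch: the tuple layout and the `folded` helper are hypothetical, and it does not claim to describe how FlameTC actually processes trace output.

```python
def folded(node, prefix=()):
    """Emit folded-stack lines; each value is the node's exclusive time in microseconds."""
    name, inclusive_ms, children = node
    stack = prefix + (name,)
    # Exclusive time: inclusive time minus what the children account for.
    self_ms = inclusive_ms - sum(child[1] for child in children)
    lines = []
    if self_ms > 0:
        lines.append(f"{';'.join(stack)} {round(self_ms * 1000)}")
    for child in children:
        lines += folded(child, stack)
    return lines


# Illustrative tree; names and timings are made up.
tree = ("Elab.command", 30.0, [
    ("Elab.step", 20.0, [
        ("Meta.synthInstance", 15.0, []),
    ]),
])
print("\n".join(folded(tree)))
```

Because each frame here is a trace node rather than a native stack frame, the stack depth follows the trace nesting, which matches the point above about working at a component level instead of on raw call stacks.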