Profiler fixes #2097
Conversation
!bench
Here are the benchmark results for commit 9fa36fc.

    Benchmark                 Metric        Change
    ==============================================
  - tests/bench/ interpreted  instructions   7.6% (922.0 σ)
  - tests/bench/ interpreted  task-clock     6.9% (14.6 σ)
  - tests/bench/ interpreted  wall-clock     8.0% (25.6 σ)
  - workspaceSymbols          branches       9.6% (1360.8 σ)
  - workspaceSymbols          instructions  10.2% (2190.1 σ)
  - workspaceSymbols          task-clock    15.3% (36.0 σ)
  - workspaceSymbols          wall-clock    15.3% (36.0 σ)
I guess the profiling mutex is not quite insignificant!
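(For illustration, a minimal sketch of the problem; the names below are made up and this is not Lean's actual profiler code. A profiler that takes a global mutex per recorded event pays an uncontended lock/unlock pair plus a hash-map update on every interpreted call, which is easily a few percent of a tight dispatch loop.)

```cpp
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical profiler guarded by a single global mutex (illustrative
// names only). The lock/unlock pair and the hash-map update run on every
// interpreted call, even without contention.
class Profiler {
    std::mutex m_mutex;
    std::unordered_map<std::string, std::uint64_t> m_callCounts;
public:
    void recordCall(std::string const & fn) {
        std::lock_guard<std::mutex> lock(m_mutex); // paid on *every* call
        m_callCounts[fn]++;
    }
};

Profiler g_profiler;

void interpretCall(std::string const & fn) {
    g_profiler.recordCall(fn); // hot-path overhead, even single-threaded
    // ... dispatch the actual call ...
}
```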
IIUC this adds a KVMap lookup to every native function call (even …).

I don't think we're going to get representative results from benchmarking core (as virtually everything is compiled). IIRC you said that mathlib didn't have a high percentage of interpreted runtime either; could you benchmark mathlib with this change? (leanprover-community/mathlib4#2113 fixes the panic, btw.)
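(A sketch of the concern, with illustrative stand-in names rather than Lean's actual KVMap API: a per-call lookup in a string-keyed options map costs a string hash plus comparisons, which can dominate a cheap native call.)

```cpp
#include <string>
#include <unordered_map>

// Stand-in for a string-keyed options map; names are illustrative only.
using Options = std::unordered_map<std::string, bool>;

bool profilerEnabled(Options const & opts) {
    auto it = opts.find("profiler"); // string hash + probe, per call
    return it != opts.end() && it->second;
}

void invokeNative(Options const & opts /*, fn, args */) {
    if (profilerEnabled(opts)) {
        // start timing ...
    }
    // ... perform the native call ...
    if (profilerEnabled(opts)) {
        // stop timing and record ...
    }
}
```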
!bench |
Here are the benchmark results for commit 6431b81.
!bench |
Here are the benchmark results for commit 42e4d74.

    Benchmark                 Metric        Change
    ==============================================
  - tests/bench/ interpreted  instructions   7.8% (3573.2 σ)
  - tests/bench/ interpreted  task-clock     9.3% (10.9 σ)
  - tests/bench/ interpreted  wall-clock    10.3% (31.3 σ)
  - workspaceSymbols          branches       4.8% (777.8 σ)
  - workspaceSymbols          instructions   5.2% (1391.1 σ)
  - workspaceSymbols          task-clock    12.6% (45.2 σ)
  - workspaceSymbols          wall-clock    12.6% (45.1 σ)
(force-pushed from 5bf377e to 50b30e3)
!bench |
Here are the benchmark results for commit 50b30e3. |
!bench |
Here are the benchmark results for commit 50b30e3. |
!bench |
Here are the benchmark results for commit 50b30e3. |
Ohh, it's caching the failure...
!bench |
Here are the benchmark results for commit 84d0252.

    Benchmark                 Metric        Change
    ==============================================
  - tests/bench/ interpreted  instructions   4.7% (16077.6 σ)
  - tests/bench/ interpreted  wall-clock     5.1% (21.7 σ)
  - workspaceSymbols          branches       4.8% (1130.8 σ)
  - workspaceSymbols          instructions   5.2% (1706.4 σ)
  - workspaceSymbols          task-clock    10.5% (36.3 σ)
  - workspaceSymbols          wall-clock    10.5% (36.2 σ)
Interesting, so even if the class doesn't do much at all when the profiler is disabled, there is a very measurable slowdown. So it seems we would either have to accept this overhead or live with wildly inaccurate profiling data.

I'll do a benchmark run of mathlib4 after the #2003 changes are ready.
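(A sketch of how a scope that "does nothing when disabled" can still be measurable; illustrative names, not Lean's implementation. The constructor and destructor each still execute a branch plus a flag load per interpreted call, and the live object constrains the optimizer around the call.)

```cpp
#include <atomic>
#include <chrono>

std::atomic<bool> g_profilerEnabled{false};

// RAII profiling scope (illustrative): cheap but not free when disabled,
// since the ctor and dtor each still execute a branch and the flag load.
class ProfileScope {
    std::chrono::steady_clock::time_point m_start;
    bool m_active;
public:
    ProfileScope() : m_active(g_profilerEnabled.load(std::memory_order_relaxed)) {
        if (m_active)
            m_start = std::chrono::steady_clock::now();
    }
    ~ProfileScope() {
        if (m_active) {
            auto elapsed = std::chrono::steady_clock::now() - m_start;
            (void)elapsed; // a real profiler would record this
        }
    }
};

void interpretedCall() {
    ProfileScope scope; // two branches + a flag load, even when disabled
    // ... interpret the call ...
}
```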
!bench |
Here are the benchmark results for commit 775ef14.

    Benchmark                 Metric        Change
    ==============================================
  + stdlib                    type checking  -1.1% (-20.2 σ)
  - tests/bench/ interpreted  instructions    4.9% (15962.7 σ)
  - tests/bench/ interpreted  wall-clock      4.3% (21.0 σ)
  - workspaceSymbols          branches        4.7% (345.1 σ)
  - workspaceSymbols          instructions    5.2% (559.4 σ)
  - workspaceSymbols          task-clock     10.4% (30.7 σ)
  - workspaceSymbols          wall-clock     10.4% (30.7 σ)
!bench |
I missed this the first time: we only pay the cost when entering the interpreter (though calling an interpreted closure from the interpreter via …).

I'm tending towards merging this, as it's better to make the interpreter a little slower than to have no idea how much time it actually takes. I can do another mathlib4 run with the optimization I just pushed, though.
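(A sketch of paying the cost only on interpreter entry, with hypothetical names: the option is read once when the interpreter is entered, so the per-call dispatch only tests a cached boolean instead of consulting the options map.)

```cpp
#include <string>
#include <unordered_map>

using Options = std::unordered_map<std::string, bool>;

struct InterpreterState {
    bool profile; // snapshot of the profiler option, taken once at entry
};

void dispatchCall(InterpreterState const & st /*, fn, args */) {
    if (st.profile) { // cheap bool test on the hot path
        // start/stop timing around the call ...
    }
    // ... perform the call ...
}

void enterInterpreter(Options const & opts) {
    auto it = opts.find("profiler"); // the only map lookup, at entry
    InterpreterState st{it != opts.end() && it->second};
    dispatchCall(st); // the main loop now only tests a cached bool per call
}
```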
Here are the benchmark results for commit d4bc7fc.

    Benchmark                 Metric        Change
    ==============================================
  + stdlib                    type checking  -1.2% (-71.4 σ)
  - tests/bench/ interpreted  instructions    4.3% (7660.0 σ)
  - workspaceSymbols          branches        4.6% (732.0 σ)
  - workspaceSymbols          instructions    5.0% (1335.2 σ)
  - workspaceSymbols          task-clock      9.4% (27.2 σ)
  - workspaceSymbols          wall-clock      9.4% (27.1 σ)
Yes, please do that. The last version of this PR was a pretty significant regression, and I think we should only merge this if the impact is much smaller. IIUC leanprover-community/mathlib4#3048 (comment) correctly, it was a ~10% slowdown across the board. And tactic execution was even 50% slower.
I'm officially declaring this PR cursed. I noticed today that on my laptop, I don't get the ~10% slowdown from above for …
2-3% on average judging from instructions/task-clock, without …
Only superficially: it's time that was previously incorrectly attributed to the interpreter. Same with import and linting, presumably, which are all categories with many interpreted extensions.
(it's probably …)
The "interpretation" metric is now more accurate by subtracting run times of invoked native code. As a side effect, an interpreted closure is now reported as a separate task (regardless of whether the caller is the interpreter or native code), but I think the cumulative metric is more important here.