
Profiler fixes #2097

Closed
wants to merge 4 commits into from

Conversation

@Kha (Member) commented Feb 7, 2023

The "interpretation" metric is now more accurate: run times of invoked native code are subtracted from it. As a side effect, an interpreted closure is now reported as a separate task (regardless of whether the caller is the interpreter or native code), but I think the cumulative metric is the more important one here.
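The idea can be illustrated with a minimal, single-threaded sketch. This is hypothetical Python, not the actual C++ interpreter code; `InterpProfiler`, `enter_interpreter`, and `call_native` are made-up names standing in for the real hooks. Per interpreter frame, we accumulate the time spent in nested native calls and subtract it before crediting "interpretation":

```python
import time

class InterpProfiler:
    """Hypothetical sketch: credit only non-native time to 'interpretation'."""

    def __init__(self):
        self.interp_time = 0.0
        self._frames = []  # per frame: [start_time, time_spent_in_native_calls]

    def enter_interpreter(self):
        self._frames.append([time.perf_counter(), 0.0])

    def leave_interpreter(self):
        start, native = self._frames.pop()
        elapsed = time.perf_counter() - start
        # Only the non-native share of this frame counts as interpretation.
        self.interp_time += elapsed - native

    def call_native(self, fn, *args):
        t0 = time.perf_counter()
        try:
            return fn(*args)
        finally:
            # Credit the whole native call to the enclosing frame's native bucket.
            if self._frames:
                self._frames[-1][1] += time.perf_counter() - t0
```

For example, a frame that spends almost all of its wall time inside `call_native(time.sleep, 0.02)` would report an `interp_time` close to zero rather than the full 20 ms.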

@Kha (Member, Author) commented Feb 7, 2023

!bench

@leanprover-bot (Collaborator)

Here are the benchmark results for commit 9fa36fc.
There were significant changes against commit 4b974fd:

  Benchmark                  Metric         Change
  ===========================================================
- tests/bench/ interpreted   instructions     7.6%  (922.0 σ)
- tests/bench/ interpreted   task-clock       6.9%   (14.6 σ)
- tests/bench/ interpreted   wall-clock       8.0%   (25.6 σ)
- workspaceSymbols           branches         9.6% (1360.8 σ)
- workspaceSymbols           instructions    10.2% (2190.1 σ)
- workspaceSymbols           task-clock      15.3%   (36.0 σ)
- workspaceSymbols           wall-clock      15.3%   (36.0 σ)

@Kha (Member, Author) commented Feb 7, 2023

I guess the profiling mutex is not quite insignificant!

@gebner (Member) commented Feb 7, 2023

IIUC this adds a KVMap lookup to every native function call (even Array.size?) and every pap (partial application), even if the profiler is turned off. That feels like a very high cost for typical interpreted code.

I don't think we're going to get representative results from benchmarking core (as virtually everything is compiled). IIRC you said that mathlib didn't have a high percentage of interpreted runtime either; could you benchmark mathlib with this change (leanprover-community/mathlib4#2113 fixes the panic btw)?

@Kha (Member, Author) commented Feb 9, 2023

!bench

@leanprover-bot (Collaborator)

Here are the benchmark results for commit 6431b81.
The entire run failed.
Found no significant differences.

@Kha (Member, Author) commented Feb 9, 2023

!bench

@leanprover-bot (Collaborator)

Here are the benchmark results for commit 42e4d74.
There were significant changes against commit 4b974fd:

  Benchmark                  Metric         Change
  ===========================================================
- tests/bench/ interpreted   instructions     7.8% (3573.2 σ)
- tests/bench/ interpreted   task-clock       9.3%   (10.9 σ)
- tests/bench/ interpreted   wall-clock      10.3%   (31.3 σ)
- workspaceSymbols           branches         4.8%  (777.8 σ)
- workspaceSymbols           instructions     5.2% (1391.1 σ)
- workspaceSymbols           task-clock      12.6%   (45.2 σ)
- workspaceSymbols           wall-clock      12.6%   (45.1 σ)

@Kha force-pushed the fix-prof branch 2 times, most recently from 5bf377e to 50b30e3 on February 9, 2023
@Kha (Member, Author) commented Feb 9, 2023

!bench

@leanprover-bot (Collaborator)

Here are the benchmark results for commit 50b30e3.
The entire run failed.
Found no significant differences.

@Kha (Member, Author) commented Feb 9, 2023

!bench

@leanprover-bot (Collaborator)

Here are the benchmark results for commit 50b30e3.
The entire run failed.
Found no significant differences.

@Kha (Member, Author) commented Feb 9, 2023

!bench

@leanprover-bot (Collaborator)

Here are the benchmark results for commit 50b30e3.
The entire run failed.
Found no significant differences.

@Kha (Member, Author) commented Feb 9, 2023

Ohh, it's caching the failure...

@Kha (Member, Author) commented Feb 9, 2023

!bench

@leanprover-bot (Collaborator)

Here are the benchmark results for commit 84d0252.
There were significant changes against commit 4b974fd:

  Benchmark                  Metric         Change
  ============================================================
- tests/bench/ interpreted   instructions     4.7% (16077.6 σ)
- tests/bench/ interpreted   wall-clock       5.1%    (21.7 σ)
- workspaceSymbols           branches         4.8%  (1130.8 σ)
- workspaceSymbols           instructions     5.2%  (1706.4 σ)
- workspaceSymbols           task-clock      10.5%    (36.3 σ)
- workspaceSymbols           wall-clock      10.5%    (36.2 σ)

@Kha (Member, Author) commented Feb 10, 2023

Interesting: even though the code does almost nothing when the profiler is disabled, there is a very measurable slowdown. So it seems we would either have to accept this overhead or live with wildly inaccurate interpretation metrics.

I'll do a benchmark run of mathlib4 after the 2003 changes are ready.

@Kha (Member, Author) commented Mar 23, 2023

!bench

@leanprover-bot (Collaborator)

Here are the benchmark results for commit 775ef14.
There were significant changes against commit 158d58f:

  Benchmark                  Metric          Change
  =============================================================
+ stdlib                     type checking    -1.1%   (-20.2 σ)
- tests/bench/ interpreted   instructions      4.9% (15962.7 σ)
- tests/bench/ interpreted   wall-clock        4.3%    (21.0 σ)
- workspaceSymbols           branches          4.7%   (345.1 σ)
- workspaceSymbols           instructions      5.2%   (559.4 σ)
- workspaceSymbols           task-clock       10.4%    (30.7 σ)
- workspaceSymbols           wall-clock       10.4%    (30.7 σ)

Kha added a commit to Kha/mathlib4 that referenced this pull request Mar 23, 2023
@Kha (Member, Author) commented Mar 23, 2023

!bench

@Kha (Member, Author) commented Mar 23, 2023

> IIUC this adds a KVMap-lookup to every native function call (even Array.size?) and every pap, even if the profiler is turned off.

I missed this the first time: we only pay the cost when entering the interpreter (though calling an interpreted closure from the interpreter via ap does count as entering it, yes; in theory, we could cache the value of profiler in the closure).

I'm tending towards merging this as it's better to make the interpreter a little slower than to have no idea how much time it actually takes. I can do another mathlib4 run with the optimization I just pushed though.
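The caching idea mentioned above could look roughly like this. This is a hypothetical Python sketch, not the real implementation; `Options`, `InterpretedClosure`, and the `"profiler"` key are stand-ins for Lean's actual C++ structures. The point is that the option lookup happens once, when the closure is created, instead of on every call:

```python
import time

class Options:
    """Stand-in for the KVMap of options discussed above."""
    def __init__(self, kv):
        self._kv = dict(kv)

    def get_bool(self, name, default=False):
        # This is the per-call lookup we want to avoid paying.
        return self._kv.get(name, default)

class InterpretedClosure:
    """Cache the `profiler` option at closure-creation time."""
    def __init__(self, body, opts):
        self.body = body
        self.profile = opts.get_bool("profiler")  # looked up once, not per call
        self.last_elapsed = 0.0

    def __call__(self, *args):
        if not self.profile:
            return self.body(*args)
        t0 = time.perf_counter()
        try:
            return self.body(*args)
        finally:
            # In the real interpreter this would be reported to the profiler.
            self.last_elapsed = time.perf_counter() - t0
```

The trade-off is the usual one for cached configuration: a closure created before the option changes keeps its stale value for its lifetime.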

@leanprover-bot (Collaborator)

Here are the benchmark results for commit d4bc7fc.
There were significant changes against commit 158d58f:

  Benchmark                  Metric          Change
  ============================================================
+ stdlib                     type checking    -1.2%  (-71.4 σ)
- tests/bench/ interpreted   instructions      4.3% (7660.0 σ)
- workspaceSymbols           branches          4.6%  (732.0 σ)
- workspaceSymbols           instructions      5.0% (1335.2 σ)
- workspaceSymbols           task-clock        9.4%   (27.2 σ)
- workspaceSymbols           wall-clock        9.4%   (27.1 σ)

@gebner (Member) commented Mar 24, 2023

> I'm tending towards merging this as it's better to make the interpreter a little slower than to have no idea how much time it actually takes. I can do another mathlib4 run with the optimization I just pushed though.

Yes, please do that. The last version of this PR was a pretty significant regression, and I think we should only merge this if the impact is much smaller. IIUC leanprover-community/mathlib4#3048 (comment) correctly, it was a ~10% slowdown across the board. And tactic execution was even 50% slower.

@Kha (Member, Author) commented Mar 24, 2023

I'm officially declaring this PR cursed. I noticed today that on my laptop, I don't get the ~10% slowdown from above for workspaceSymbols but instead 450%. I can only surmise that clock_gettime overhead very much depends on the hardware. CLOCK_MONOTONIC_COARSE (~4ms precision?) could help with that but is, as usual, Linux-only.
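To see why the hardware matters here, one can measure the per-read cost of the clock directly. This is a rough Python sketch under stated assumptions: the real code calls clock_gettime from C, and CLOCK_MONOTONIC_COARSE is a Linux-only clock; the helper name below is made up.

```python
import time

def clock_read_cost_ns(clock, n=200_000):
    """Estimate the average cost of a single clock read, in nanoseconds.

    On machines where the read is a cheap vDSO call this tends to be tens of
    nanoseconds; where it falls back to a real syscall it can be an order of
    magnitude more, which would explain wildly different profiler overheads
    across hardware.
    """
    t0 = time.perf_counter_ns()
    for _ in range(n):
        clock()
    return (time.perf_counter_ns() - t0) / n

# Example: clock_read_cost_ns(time.monotonic)
```

Running this on two machines with different timer implementations would make the 10% vs. 450% discrepancy plausible without any change to the profiler itself.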

@Kha closed this Mar 24, 2023
@Kha (Member, Author) commented Mar 24, 2023

> IIUC leanprover-community/mathlib4#3048 (comment) correctly, it was a ~10% slowdown across the board

2-3% on average judging from instructions/task-clock; without --profile it should definitely be smaller. Judging the individual file regressions as a set can be misleading, since those might be mostly small files, which indeed are probably dominated by interpretation in import. Sorting the commit comparison by total value shows that the regression for most big files is <1%, though interestingly the slowest file, Mathlib.GroupTheory.MonoidLocalization, leads with +25%. I wouldn't be surprised if it spent most of its time at a single location.

> And tactic execution was even 50% slower

Only superficially; it's time that was previously incorrectly attributed to the interpreter. The same presumably goes for import and linting, which are all categories with many interpreted extensions.

@Kha (Member, Author) commented Mar 25, 2023

> interestingly the slowest file, Mathlib.GroupTheory.MonoidLocalization, leads with +25%

(it's probably [to_additive])
