Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add structured trace profiler #2181

Merged
merged 10 commits into from
Apr 10, 2023
Merged

Add structured trace profiler #2181

merged 10 commits into from
Apr 10, 2023

Conversation

Kha
Copy link
Member

@Kha Kha commented Apr 3, 2023

The --profile output is nice for getting a coarse summary of what time is spent on within a file or package, but not as helpful for finding specific locations, especially since the profiler categories are disjoint (so a tactic invocation spending most of the time in many small typeclass synthesis calls will never show up in the profiler). The timing information in the structured traces is better for this use case, but you need to know what trace class to activate in the first place and you likely will still get lost in its pages of output.

This PR introduces a new option trace.profiler (and trace.profiler.threshold defaulting to 10ms) that is independent of the existing profiler option and aims to make finding and exploring expensive locations in traces easier by activating trace nodes not by class but by whether their (inclusive!) execution time exceeds the configured threshold.

$ lean Mathlib/Data/Polynomial/FieldDivision.lean -Dtrace.profiler=true -Dpp.oneline=true
[Elab.input] [5.410904s] theorem derivative_rootMultiplicity_of_root [CharZero R] {p : R[X]} {t : R} (hpt : p.IsRoot t) : [...]
  [Elab.step] [5.312021s] rcases eq_or_ne p 0 with (rfl | hp) [...]
    [Elab.step] [5.311969s] rcases eq_or_ne p 0 with (rfl | hp) [...]
      [Elab.step] [0.065163s] · simp
        [Elab.step] [0.065141s] simp
          [Elab.step] [0.065134s] simp
            [Meta.isDefEq] [0.023725s] ❌ ↑?f (OfNat.ofNat ?n) =?= ↑Polynomial.derivative 0
              [Meta.isDefEq] [0.023450s] ❌ OfNat.ofNat ?n =?= 0
                [Meta.isDefEq] [0.023335s] ❌ instOfNat =?= Zero.toOfNat0
                  [Meta.isDefEq] [0.023310s] ❌ { ofNat := ↑0 } =?= { ofNat := Zero.zero }
                    [Meta.isDefEq] [0.023263s] ❌ ↑0 =?= Zero.zero
                      [Meta.isDefEq] [0.023224s] ❌ NatCast.natCast 0 =?= Zero.zero
                        [Meta.isDefEq] [0.023184s] ❌ NonAssocSemiring.toNatCast.1 0 =?= Polynomial.zero.1
                          [Meta.isDefEq] [0.023157s] ❌ NonAssocSemiring.toNatCast.1 0 =?= { toFinsupp := 0 }
                            [Meta.isDefEq.onFailure] [0.013963s] ❌ NonAssocSemiring.toNatCast.1 0 =?= { toFinsupp := 0 }
                              [Meta.isDefEq] [0.013937s] ❌ NonAssocSemiring.toNatCast.1 0 =?= { toFinsupp := 0 }
                                [Meta.isDefEq] [0.013796s] ❌ ↑0 =?= { toFinsupp := 0 }
            [Meta.isDefEq] [0.017557s] ✅ ↑?f 0 =?= ↑Polynomial.derivative 0
              [Meta.synthInstance] [0.016535s] ✅ ZeroHomClass (R[X] →ₗ[R] R[X]) R[X] R[X]
            [Meta.synthInstance] [0.014352s] ✅ ZeroHomClass (R[X] →ₗ[R] R[X]) R[X] R[X]
      [Elab.step] [0.131247s] simp only [derivative_pow, derivative_mul, derivative_sub, derivative_X, derivative_C, sub_zero, mul_one]
        [Meta.isDefEq] [0.048698s] ✅ ↑Polynomial.derivative (?p ^ ?n) =?= ↑Polynomial.derivative ((Polynomial.X - ↑Polynomial.C t) ^ Polynomial.rootMultiplicity t p)
          [Meta.isDefEq] [0.019443s] ✅ R[X] →ₗ[R] R[X] =?= R[X] →ₗ[R] R[X]
            [Meta.isDefEq] [0.016730s] ✅ NonUnitalNonAssocSemiring.toAddCommMonoid =?= NonUnitalNonAssocSemiring.toAddCommMonoid
              [Meta.isDefEq] [0.016721s] ✅ NonAssocSemiring.toNonUnitalNonAssocSemiring.1 =?= NonAssocSemiring.toNonUnitalNonAssocSemiring.1
                [Meta.isDefEq] [0.016591s] ✅ NonUnitalNonAssocSemiring.toAddCommMonoid =?= NonUnitalNonAssocSemiring.toAddCommMonoid
                  [Meta.isDefEq] [0.016581s] ✅ (Function.Injective.nonUnitalNonAssocSemiring Polynomial.toFinsupp (_ : Function.Injective Polynomial.toFinsupp) [...] =?= (Function.Injective.nonUnitalNonAssocSemiring Polynomial.toFinsupp (_ : Function.Injective Polynomial.toFinsupp) [...]
                    [Meta.isDefEq] [0.016527s] ✅ AddCommMonoid.mk (_ : ∀ (a b : R[X]), a + b = b + a) =?= AddCommMonoid.mk (_ : ∀ (a b : R[X]), a + b = b + a)
                      [Meta.isDefEq] [0.015472s] ✅ AddCommMonoid.toAddMonoid =?= AddCommMonoid.toAddMonoid
                        [Meta.isDefEq] [0.015461s] ✅ (Function.Injective.addCommMonoid Polynomial.toFinsupp (_ : Function.Injective Polynomial.toFinsupp) [...] =?= (Function.Injective.addCommMonoid Polynomial.toFinsupp (_ : Function.Injective Polynomial.toFinsupp) [...]
                          [Meta.isDefEq] [0.015388s] ✅ AddMonoid.mk (_ : ∀ (a : R[X]), 0 + a = a) (_ : ∀ (a : R[X]), a + 0 = a) AddMonoid.nsmul =?= AddMonoid.mk (_ : ∀ (a : R[X]), 0 + a = a) (_ : ∀ (a : R[X]), a + 0 = a) AddMonoid.nsmul
          [Meta.isDefEq] [0.017521s] ✅ LinearMap.instFunLikeLinearMap =?= LinearMap.instFunLikeLinearMap
            [Meta.isDefEq.delta] [0.017476s] ✅ LinearMap.instFunLikeLinearMap =?= LinearMap.instFunLikeLinearMap
              [Meta.isDefEq] [0.015240s] ✅ NonUnitalNonAssocSemiring.toAddCommMonoid =?= NonUnitalNonAssocSemiring.toAddCommMonoid
                [Meta.isDefEq] [0.015231s] ✅ NonAssocSemiring.toNonUnitalNonAssocSemiring.1 =?= NonAssocSemiring.toNonUnitalNonAssocSemiring.1
                  [Meta.isDefEq] [0.015127s] ✅ NonUnitalNonAssocSemiring.toAddCommMonoid =?= NonUnitalNonAssocSemiring.toAddCommMonoid
                    [Meta.isDefEq] [0.015118s] ✅ (Function.Injective.nonUnitalNonAssocSemiring Polynomial.toFinsupp (_ : Function.Injective Polynomial.toFinsupp) [...] =?= (Function.Injective.nonUnitalNonAssocSemiring Polynomial.toFinsupp (_ : Function.Injective Polynomial.toFinsupp) [...]
                      [Meta.isDefEq] [0.015092s] ✅ AddCommMonoid.mk (_ : ∀ (a b : R[X]), a + b = b + a) =?= AddCommMonoid.mk (_ : ∀ (a b : R[X]), a + b = b + a)
                        [Meta.isDefEq] [0.014201s] ✅ AddCommMonoid.toAddMonoid =?= AddCommMonoid.toAddMonoid
                          [Meta.isDefEq] [0.014191s] ✅ (Function.Injective.addCommMonoid Polynomial.toFinsupp (_ : Function.Injective Polynomial.toFinsupp) [...] =?= (Function.Injective.addCommMonoid Polynomial.toFinsupp (_ : Function.Injective Polynomial.toFinsupp) [...]
                            [Meta.isDefEq] [0.014165s] ✅ AddMonoid.mk (_ : ∀ (a : R[X]), 0 + a = a) (_ : ∀ (a : R[X]), a + 0 = a) AddMonoid.nsmul =?= AddMonoid.mk (_ : ∀ (a : R[X]), 0 + a = a) (_ : ∀ (a : R[X]), a + 0 = a) AddMonoid.nsmul
        [Meta.isDefEq] [0.036693s] ✅ ↑Polynomial.derivative (?f - ?g) =?= ↑Polynomial.derivative (Polynomial.X - ↑Polynomial.C t)
          [Meta.isDefEq] [0.018497s] ✅ R[X] →ₗ[R] R[X] =?= R[X] →ₗ[R] R[X]
            [Meta.isDefEq] [0.015961s] ✅ NonUnitalNonAssocSemiring.toAddCommMonoid =?= NonUnitalNonAssocSemiring.toAddCommMonoid
              [Meta.isDefEq] [0.015952s] ✅ NonAssocSemiring.toNonUnitalNonAssocSemiring.1 =?= NonAssocSemiring.toNonUnitalNonAssocSemiring.1
                [Meta.isDefEq] [0.015822s] ✅ NonUnitalNonAssocSemiring.toAddCommMonoid =?= NonUnitalNonAssocSemiring.toAddCommMonoid
                  [Meta.isDefEq] [0.015812s] ✅ (Function.Injective.nonUnitalNonAssocSemiring Polynomial.toFinsupp (_ : Function.Injective Polynomial.toFinsupp) [...] =?= (Function.Injective.nonUnitalNonAssocSemiring Polynomial.toFinsupp (_ : Function.Injective Polynomial.toFinsupp) [...]
                    [Meta.isDefEq] [0.015762s] ✅ AddCommMonoid.mk (_ : ∀ (a b : R[X]), a + b = b + a) =?= AddCommMonoid.mk (_ : ∀ (a b : R[X]), a + b = b + a)
                      [Meta.isDefEq] [0.014808s] ✅ AddCommMonoid.toAddMonoid =?= AddCommMonoid.toAddMonoid
                        [Meta.isDefEq] [0.014798s] ✅ (Function.Injective.addCommMonoid Polynomial.toFinsupp (_ : Function.Injective Polynomial.toFinsupp) [...] =?= (Function.Injective.addCommMonoid Polynomial.toFinsupp (_ : Function.Injective Polynomial.toFinsupp) [...]
                          [Meta.isDefEq] [0.014761s] ✅ AddMonoid.mk (_ : ∀ (a : R[X]), 0 + a = a) (_ : ∀ (a : R[X]), a + 0 = a) AddMonoid.nsmul =?= AddMonoid.mk (_ : ∀ (a : R[X]), 0 + a = a) (_ : ∀ (a : R[X]), a + 0 = a) AddMonoid.nsmul
          [Meta.isDefEq] [0.017675s] ✅ LinearMap.instFunLikeLinearMap =?= LinearMap.instFunLikeLinearMap
            [Meta.isDefEq.delta] [0.017629s] ✅ LinearMap.instFunLikeLinearMap =?= LinearMap.instFunLikeLinearMap
              [Meta.isDefEq] [0.015387s] ✅ NonUnitalNonAssocSemiring.toAddCommMonoid =?= NonUnitalNonAssocSemiring.toAddCommMonoid
                [Meta.isDefEq] [0.015377s] ✅ NonAssocSemiring.toNonUnitalNonAssocSemiring.1 =?= NonAssocSemiring.toNonUnitalNonAssocSemiring.1
                  [Meta.isDefEq] [0.015273s] ✅ NonUnitalNonAssocSemiring.toAddCommMonoid =?= NonUnitalNonAssocSemiring.toAddCommMonoid
                    [Meta.isDefEq] [0.015264s] ✅ (Function.Injective.nonUnitalNonAssocSemiring Polynomial.toFinsupp (_ : Function.Injective Polynomial.toFinsupp) [...] =?= (Function.Injective.nonUnitalNonAssocSemiring Polynomial.toFinsupp (_ : Function.Injective Polynomial.toFinsupp) [...]
                      [Meta.isDefEq] [0.015239s] ✅ AddCommMonoid.mk (_ : ∀ (a b : R[X]), a + b = b + a) =?= AddCommMonoid.mk (_ : ∀ (a b : R[X]), a + b = b + a)
                        [Meta.isDefEq] [0.014329s] ✅ AddCommMonoid.toAddMonoid =?= AddCommMonoid.toAddMonoid
                          [Meta.isDefEq] [0.014319s] ✅ (Function.Injective.addCommMonoid Polynomial.toFinsupp (_ : Function.Injective Polynomial.toFinsupp) [...] =?= (Function.Injective.addCommMonoid Polynomial.toFinsupp (_ : Function.Injective Polynomial.toFinsupp) [...]
                            [Meta.isDefEq] [0.014292s] ✅ AddMonoid.mk (_ : ∀ (a : R[X]), 0 + a = a) (_ : ∀ (a : R[X]), a + 0 = a) AddMonoid.nsmul =?= AddMonoid.mk (_ : ∀ (a : R[X]), 0 + a = a) (_ : ∀ (a : R[X]), a + 0 = a) AddMonoid.nsmul
        [Meta.isDefEq] [0.020091s] ✅ ?a - 0 =?= 1 - 0
          [Meta.isDefEq] [0.019314s] ✅ 0 =?= 0
            [Meta.isDefEq] [0.019267s] ✅ Zero.toOfNat0 =?= Zero.toOfNat0
              [Meta.isDefEq.delta] [0.019253s] ✅ Zero.toOfNat0 =?= Zero.toOfNat0
                [Meta.isDefEq] [0.019221s] ✅ NegZeroClass.toZero =?= Polynomial.zero
                  [Meta.isDefEq] [0.019177s] ✅ NegZeroClass.toZero =?= { zero := { toFinsupp := 0 } }
                    [Meta.isDefEq] [0.019155s] ✅ Zero.zero =?= { toFinsupp := 0 }
                      [Meta.isDefEq] [0.019135s] ✅ Zero.zero.toFinsupp =?= 0
                        [Meta.isDefEq] [0.018959s] ✅ Zero.zero.1 =?= 0
                          [Meta.isDefEq] [0.018903s] ✅ Zero.zero.1 =?= Zero.toOfNat0.1
                            [Meta.isDefEq] [0.018875s] ✅ Zero.zero.1 =?= Zero.zero
                              [Meta.isDefEq] [0.018829s] ✅ Zero.zero.1 =?= MonoidWithZero.toZero.1
                                [Meta.isDefEq] [0.018763s] ✅ Zero.zero.1 =?= 0
                                  [Meta.isDefEq] [0.018714s] ✅ Zero.zero.1 =?= Zero.toOfNat0.1
                                    [Meta.isDefEq] [0.018685s] ✅ Zero.zero.1 =?= Zero.zero
                                      [Meta.isDefEq] [0.018638s] ✅ Zero.zero.1 =?= MulZeroClass.toZero.1
                                        [Meta.isDefEq] [0.018586s] ✅ Zero.zero.1 =?= 0
                                          [Meta.isDefEq] [0.018538s] ✅ Zero.zero.1 =?= Zero.toOfNat0.1
                                            [Meta.isDefEq] [0.018511s] ✅ Zero.zero.1 =?= Zero.zero
                                              [Meta.isDefEq] [0.018465s] ✅ Zero.zero.1 =?= AddMonoid.toZero.1
                                                [Meta.isDefEq] [0.018393s] ✅ Zero.zero.1 =?= Zero.zero
                                                  [Meta.isDefEq] [0.018344s] ✅ Zero.zero.1 =?= Finsupp.zero.1
                                                    [Meta.isDefEq] [0.018314s] ✅ Zero.zero.1 =?= { support := ∅, toFun := 0, mem_support_toFun := (_ : ∀ (x : ℕ), x ∈ ∅ ↔ OfNat.ofNat 0 x ≠ 0) }
                                                      [Meta.isDefEq] [0.016660s] ✅ Zero.zero.1.support =?= ∅
                                                        [Meta.isDefEq] [0.016634s] ✅ Zero.zero.1.1 =?= ∅
                                                          [Meta.isDefEq] [0.016585s] ✅ Zero.zero.1.1 =?= Finset.instEmptyCollectionFinset.1
                                                            [Meta.isDefEq] [0.016545s] ✅ Zero.zero.1.1 =?= Finset.empty
                                                              [Meta.isDefEq] [0.016500s] ✅ Zero.zero.1.1 =?= { val := 0, nodup := (_ : Multiset.Nodup 0) }
                                                                [Meta.isDefEq] [0.015981s] ✅ Zero.zero.1.1.val =?= 0
                                                                  [Meta.isDefEq] [0.015942s] ✅ Zero.zero.1.1.1 =?= 0
                                                                    [Meta.isDefEq] [0.015885s] ✅ Zero.zero.1.1.1 =?= Zero.toOfNat0.1
                                                                      [Meta.isDefEq] [0.015842s] ✅ Zero.zero.1.1.1 =?= Zero.zero
                                                                        [Meta.isDefEq] [0.015796s] ✅ Zero.zero.1.1.1 =?= Multiset.instZeroMultiset.1
                                                                          [Meta.isDefEq] [0.015753s] ✅ Zero.zero.1.1.1 =?= Multiset.zero
                                                                            [Meta.isDefEq] [0.015709s] ✅ Zero.zero.1.1.1 =?= ↑[]
                                                                              [Meta.isDefEq] [0.015664s] ✅ Zero.zero.1.1.1 =?= Quot.mk Setoid.r []
                                                                                [Meta.isDefEq.onFailure] [0.015593s] ✅ Zero.zero.1.1.1 =?= Quot.mk Setoid.r []
                                                                                  [Meta.synthInstance] [0.014838s] ✅ SubNegZeroMonoid R[X]
(...)

The new pretty printing option pp.oneline replaces anything but the first line with [...] to keep the output easily scannable. Note that because this option works on the pretty printer level, there may still be multi-line trace nodes, but the output seems reasonable enough.

                  [Elab.step] [0.018801s] refine if h : p.IsRoot t then ?pos✝ else ?neg✝
                    [Elab.step] [0.018756s] expected type: Polynomial.rootMultiplicity t p - 1 ≤ Polynomial.rootMultiplicity t (↑Polynomial.derivative p), term
                        if h : p.IsRoot t then ?pos✝ else ?neg✝
                      [Elab.step] [0.018747s] expected type: Polynomial.rootMultiplicity t p - 1 ≤ Polynomial.rootMultiplicity t (↑Polynomial.derivative p), term
                          let_mvar% ?m✝ := p.IsRoot t; wait_if_type_mvar% ?m✝; dite✝ ?m✝ (fun h => ?pos✝) (fun h => ?neg✝)
                        [Elab.step] [0.014474s] expected type: Polynomial.rootMultiplicity t p - 1 ≤ Polynomial.rootMultiplicity t (↑Polynomial.derivative p), term
                            wait_if_type_mvar% ?m✝; dite✝ ?m✝ (fun h => ?pos✝) (fun h => ?neg✝)
                          [Elab.step] [0.014434s] expected type: Polynomial.rootMultiplicity t p - 1 ≤ Polynomial.rootMultiplicity t (↑Polynomial.derivative p), term
                              dite✝ ?m✝ (fun h => ?pos✝) (fun h => ?neg✝)
                            [Meta.synthInstance] [0.014015s] ✅ Decidable (Polynomial.IsRoot p t)

pp.oneline currently produces non-interactive traces only.

Finally, as the output is otherwise regular trace output, we can in fact use @hargoniX's FlameTC explorer (after one tiny generalization) to print nice-looking and even easier-to-explore flame graphs from it.
image

In the end, the output structure is not too different from what e.g. perf+hotspot can give us, but on a slightly more abstract, component-based level and importantly with more information about inputs and no danger of stacks getting cut down from too-deep recursion.

src/Lean/PrettyPrinter/Formatter.lean Outdated Show resolved Hide resolved
src/Lean/Util/Trace.lean Outdated Show resolved Hide resolved
@Kha
Copy link
Member Author

Kha commented Apr 4, 2023

!bench

@leanprover-bot
Copy link
Collaborator

Here are the benchmark results for commit d1ed048.
There were significant changes against commit 4048455:

  Benchmark   Metric             Change
  ===============================================
- liasolver   maxrss              61.8%  (16.5 σ)
- qsort       maxrss              61.8%  (16.5 σ)
- rbmap       maxrss              32.1%  (10.5 σ)
- stdlib      instructions         4.8% (613.2 σ)
- stdlib      tactic execution     8.3%  (90.5 σ)
- stdlib      task-clock           6.5%  (47.3 σ)
- stdlib      wall-clock           7.4%  (50.5 σ)

@Kha
Copy link
Member Author

Kha commented Apr 7, 2023

Oh, I think I found the reason the diff doesn't load anymore...
image

@Kha
Copy link
Member Author

Kha commented Apr 7, 2023

!bench

@leanprover-bot
Copy link
Collaborator

Here are the benchmark results for commit d10d5fe.
There were significant changes against commit 4048455:

  Benchmark   Metric             Change
  ===============================================
- stdlib      instructions         4.1% (615.3 σ)
- stdlib      tactic execution     3.2%  (76.4 σ)
- stdlib      task-clock           4.3%  (25.7 σ)

@Kha
Copy link
Member Author

Kha commented Apr 8, 2023

!bench

@leanprover-bot
Copy link
Collaborator

Here are the benchmark results for commit a0d1720.
There were no significant changes against commit 4048455.

@Kha
Copy link
Member Author

Kha commented Apr 8, 2023

Essentially no overhead anymore when not using the option, great. Since this change is purely additive (functionally) and not a critical component, I think we can merge this and then refine later on if needed.

@Kha Kha changed the title Structured trace profiler Add structured trace profiler Apr 8, 2023
@Kha
Copy link
Member Author

Kha commented Apr 8, 2023

Interactive view of Mathlib.Data.Polynomial.FieldDivision, using @hargoniX' hargoniX/Flame#1 (comment):
https://www.speedscope.app/#profileURL=https://pp.ipd.kit.edu/~ullrich/tmp/flame.gz

@Kha Kha merged commit a0b960b into leanprover:master Apr 10, 2023
@Kha Kha deleted the prof-trace branch April 10, 2023 14:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants