
Optimize maybe_macroexpand #1991

Merged: 2 commits merged into main on Mar 17, 2022

Conversation

@rikhuijzer (Collaborator) commented on Mar 17, 2022:

This PR optimizes and fixes inference for maybe_macroexpand and split_funcname. This reduces the time to first X (TTFX) and running time.

Better type inference lowers TTFX because maybe_macroexpand is called somewhere inside SessionActions.open. Via the precompile call that we've placed on SessionActions.open, the compiler uses type inference to figure out which methods it needs to compile. However, if the compiler cannot infer some type while stepping through the function bodies, it cannot compile the called methods. For example, we can make a function g which calls f, make the object passed into f non-inferable via Base.inferencebarrier, and call precompile on the outer function:

julia> f(x) = x;

julia> g(x) = f(Base.inferencebarrier(x));

julia> precompile(g, (Int,))
true

Now, we can see that Julia didn't compile any specializations:

julia> methods(f)[1].specializations
svec()

The compilation will happen just-in-time when we call g:

julia> g(1)
1

julia> methods(f)[1].specializations
svec(MethodInstance for f(::Int64), nothing, nothing, nothing, nothing, nothing, nothing, nothing)

On the other hand, without the inference barrier, precompilation compiles both the outer and the inner function straight away:

julia> a(x) = x;

julia> b(x) = a(x);

julia> precompile(b, (Int,))
true

julia> methods(a)[1].specializations
svec(MethodInstance for a(::Int64), nothing, nothing, nothing, nothing, nothing, nothing, nothing)
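The two experiments above can also be scripted end to end. This is a sketch with hypothetical function names; it uses Base.specializations (available on Julia 1.10+) to count compiled specializations instead of poking at the internal specializations field directly as the transcript does:

```julia
# Scripted version of the two REPL experiments above (hypothetical names).
f1(x) = x
g1(x) = f1(Base.inferencebarrier(x))   # barrier hides the argument's type
precompile(g1, (Int,))

f2(x) = x
g2(x) = f2(x)                          # fully inferable call chain
precompile(g2, (Int,))

# Count specializations of a function's only method (Julia 1.10+).
nspecs(f) = length(collect(Base.specializations(only(methods(f)))))

nspecs(f1)  # 0: the barrier stopped precompile from reaching f1
nspecs(f2)  # at least 1: precompile compiled the callee too
```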

Another thing that saved a surprising amount of compilation time is getting rid of broadcasting. It sounds a bit annoying, and I love broadcasting too, but in some cases it is worth it if it avoids extra compilation time for every person running Pluto. See also Chris saying "I'm the compiler now!" (SciML/OrdinaryDiffEq.jl#1465). I don't think we have to get rid of all broadcasts, mainly the ones in places where type inference falls back to Any and gives up precompiling. These cases can be found quite easily via JET.jl with @report_opt annotate_types=true SessionActions.open; just search for "failed to optimize" in the output.
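To illustrate the kind of rewrite involved, here is a sketch on a hypothetical one-level expression transform (not Pluto's actual code): the broadcasting version asks the compiler to specialize broadcast machinery over a Vector{Any}, while the explicit loop produces the same result with less machinery to compile:

```julia
# Hypothetical per-element transform: increment integer literals.
bump(x) = x isa Integer ? x + 1 : x

# Broadcasting over ex.args (a Vector{Any}): compiles a broadcast kernel.
with_broadcast(ex::Expr) = Expr(ex.head, bump.(ex.args)...)

# Explicit loop: identical result, less for the compiler to specialize.
function with_loop(ex::Expr)
    args = Vector{Any}(undef, length(ex.args))
    for (i, a) in enumerate(ex.args)
        args[i] = bump(a)
    end
    Expr(ex.head, args...)
end

with_broadcast(:(1 + 2)) == with_loop(:(1 + 2))  # same Expr either way
```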

Anyway, time for the benchmarks:

maybe_macroexpand

Branch main

$ julia --project -ie 'using Pluto, BenchmarkTools'

julia> @time @eval Pluto.ExpressionExplorer.maybe_macroexpand(:(@time 1))
  0.375900 seconds (891.63 k allocations: 48.112 MiB, 21.58% gc time, 99.68% compilation time)

julia> @benchmark Pluto.ExpressionExplorer.maybe_macroexpand(:(@time 1))
BenchmarkTools.Trial: 10000 samples with 180 evaluations.
 Range (min … max):  588.978 ns … 16.822 μs  ┊ GC (min … max): 0.00% … 89.32%
 Time  (median):     628.444 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   782.516 ns ± 671.607 ns  ┊ GC (mean ± σ):  3.75% ±  4.45%

  ▇█▅▄▃▃▃▂▂▃▄▃▂▂▂▂▂▃▂▂▂▁▁ ▁▁                                    ▂
  ███████████████████████████████▇████▇█▇█▇▆▆▆▇█▇▇▆▅▆▆▅▆▅▆▆▆▅▄▄ █
  589 ns        Histogram: log(frequency) by time       1.78 μs <

 Memory estimate: 608 bytes, allocs estimate: 13.
────────────────────────────────────────────────────────────────────────────
                                                 Time          Allocations
                                           ───────────────   ───────────────
              Total measured:                   51.1s            3.81GiB

Section                            ncalls     time    %tot     alloc    %tot
────────────────────────────────────────────────────────────────────────────
import Pluto                            1    570ms    1.3%   66.2MiB    1.8%
compiletimes.jl                         1    41.8s   98.7%   3.52GiB   98.2%
  Pluto.Cell                            1   28.2ms    0.1%   3.82MiB    0.1%
  Pluto.Notebook                        1    1.33s    3.1%    164MiB    4.5%
  SessionActions.open                   1    12.8s   30.3%   1.00GiB   27.7%
  SessionActions.shutdown               1    675ms    1.6%   71.4MiB    1.9%
  Configuration.from_flat_kwargs        1    2.38s    5.6%    192MiB    5.2%
  Pluto.run                             1    6.79s   16.1%    759MiB   20.7%
────────────────────────────────────────────────────────────────────────────

Branch rh/maybe_macroexpand (this PR)

$ julia --project -ie 'using Pluto, BenchmarkTools'

julia> @time @eval Pluto.ExpressionExplorer.maybe_macroexpand(:(@time 1));
  0.101043 seconds (87.07 k allocations: 4.655 MiB, 99.93% compilation time)

julia> @benchmark Pluto.ExpressionExplorer.maybe_macroexpand(:(@time 1))
BenchmarkTools.Trial: 10000 samples with 199 evaluations.
 Range (min … max):  419.814 ns … 16.994 μs  ┊ GC (min … max): 0.00% … 94.32%
 Time  (median):     448.852 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   572.488 ns ± 691.028 ns  ┊ GC (mean ± σ):  5.41% ±  4.49%

  ▇█▆▄▃▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁                                   ▂
  ███████████████████████████████▇▇██▇▇█▇▇▇▇▇▅▇▆▇▇▇▆▆▆▆▆▅▆▆▅▅▅▅ █
  420 ns        Histogram: log(frequency) by time        1.3 μs <

 Memory estimate: 512 bytes, allocs estimate: 11.
────────────────────────────────────────────────────────────────────────────
                                                 Time          Allocations
                                           ───────────────   ───────────────
              Total measured:                   49.6s            3.73GiB

Section                            ncalls     time    %tot     alloc    %tot
────────────────────────────────────────────────────────────────────────────
import Pluto                            1    549ms    1.3%   64.7MiB    1.8%
compiletimes.jl                         1    40.3s   98.7%   3.45GiB   98.2%
  Pluto.Cell                            1   29.8ms    0.1%   3.82MiB    0.1%
  Pluto.Notebook                        1    1.33s    3.3%    164MiB    4.6%
  SessionActions.open                   1    11.9s   29.0%    950MiB   26.4%
  SessionActions.shutdown               1    647ms    1.6%   71.5MiB    2.0%
  Configuration.from_flat_kwargs        1    2.36s    5.8%    192MiB    5.3%
  Pluto.run                             1    7.05s   17.3%    759MiB   21.1%
────────────────────────────────────────────────────────────────────────────

So, that's about 50 MiB reduction in allocations on SessionActions.open.

cc: @ghaetinger

@github-actions (Contributor) commented:

Try this Pull Request!

Open Julia and type:

julia> import Pkg
julia> Pkg.activate(temp=true)
julia> Pkg.add(url="https://github.com/fonsp/Pluto.jl", rev="rh/maybe_macroexpand")
julia> using Pluto

@@ -988,9 +992,15 @@ function maybe_macroexpand(ex::Expr; recursive = false, expand_bind = true)
end

if recursive && (result isa Expr)
Expr(result.head, maybe_macroexpand.(result.args; recursive = recursive, expand_bind = expand_bind)...)
A Collaborator commented:

Getting rid of broadcasting is totally fine if it speeds stuff up, go for loops!
BUT, did you see how it performed with map(result.args) do x maybe_macroexpand(x, recursive=recursive, expand_bind=expand_bind) end? Just curious if that takes similar compilation-related performance hits

@rikhuijzer (Collaborator, Author) replied:

Just curious if that takes similar compilation-related performance hits

Noice. Well spotted. map is fine too.

Benchmarks below. All after restarting Julia with julia --project -ie 'using Pluto'.

with broadcasting (original)

julia> @time @eval Pluto.ExpressionExplorer.maybe_macroexpand(:(@time 1));
  0.306775 seconds (894.16 k allocations: 48.205 MiB, 22.07% gc time, 99.59% compilation time)

I think this is so bad because the types are unknown, so multiple types show up in various calls in the recursion and need to be compiled just-in-time. Also, broadcasting takes a long time to compile; see @code_typed identity.([1, 2]) for an impression of the machinery involved.

with map

julia> @time @eval Pluto.ExpressionExplorer.maybe_macroexpand(:(@time 1));
  0.079647 seconds (86.74 k allocations: 4.639 MiB, 101.12% compilation time)

with a loop

julia> @time @eval Pluto.ExpressionExplorer.maybe_macroexpand(:(@time 1));
  0.090846 seconds (87.07 k allocations: 4.655 MiB, 99.57% compilation time)
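For completeness, the do-block form asked about above is just syntactic sugar for map with an anonymous function, so it infers and compiles the same way as a plain map call. A minimal sketch, with a hypothetical walker standing in for maybe_macroexpand's recursion step:

```julia
# Hypothetical recursive walker; leaves (symbols, literals) pass through.
walk(x) = x

function walk(ex::Expr)
    # `map(ex.args) do x ... end` lowers to map(anonymous_function, ex.args).
    mapped = map(ex.args) do x
        walk(x)
    end
    Expr(ex.head, mapped...)
end

walk(:(1 + 2)) == :(1 + 2)  # rebuilds an equal expression tree
```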

@dralletje dralletje merged commit cba6a50 into main Mar 17, 2022
@rikhuijzer rikhuijzer deleted the rh/maybe_macroexpand branch March 25, 2022 15:26
@fonsp fonsp added the backend Concerning the julia server and runtime label Apr 29, 2022
Labels: backend (Concerning the julia server and runtime), performance

3 participants