
Reduce time to first from_flat_kwargs #2005

Merged (12 commits) on Apr 5, 2022

Conversation

@rikhuijzer (Collaborator) commented Mar 25, 2022

🎶 Another second bites the dust. Another second bites the dust. And another one gone and another one gone. 🎶

This PR reduces the time to first from_flat_kwargs by a few seconds (120 MB allocations). In effect, this reduces the time to Pluto.run by a few seconds.

The reason that this PR allows the reduction is as follows. The function from_flat_kwargs used to use Configurations.jl to pass keyword arguments for different structs to them via metaprogramming. Specifically, the profiler showed that most time was spent in

https://github.com/Roger-luo/Configurations.jl/blob/5a23569281345564b082f7b0c09a26720ff68e76/src/from_dict.jl#L274-L336

Metaprogramming is all fun and games and very powerful, but terrible for the compiler. During the precompilation phase, the compiler will compile all kinds of things, except when they are hidden behind a non-trivial eval on some expression. This makes sense, because the compiler cannot know whether evaluating some arbitrary expression will cause unexpected side-effects. To avoid this metaprogramming problem, I've taken the boring road of handling the keyword arguments manually. This makes the compiler very happy: the types are now much easier to infer, which makes it much easier to precompile everything. Hence, the reduction in time to first X.
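To illustrate the idea, here is a minimal, self-contained sketch (the struct and field names are made up for illustration; they are not Pluto's actual option types): forwarding keyword arguments by hand keeps every call statically typed, so precompilation can cover the whole path.

```julia
# Illustrative sketch only -- these structs are stand-ins for Pluto's real
# option types. Nothing here goes through `eval` or generated code, so the
# compiler can infer and precompile the entire call chain.
struct ServerOpts
    host::String
    port::Int
end

struct Opts
    server::ServerOpts
end

# Keyword arguments are forwarded by hand; every type is concrete, so this
# method is fully inferable and precompilable.
function from_flat_kwargs_sketch(; host::String = "127.0.0.1", port::Int = 1234)
    return Opts(ServerOpts(host, port))
end

opts = from_flat_kwargs_sketch(; port = 8000)
```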

Benchmark

main

julia> using Pluto

julia> @time @eval Pluto.Configuration.from_flat_kwargs();
  3.835535 seconds (3.34 M allocations: 180.272 MiB, 8.76% gc time, 99.90% compilation time)

this PR

julia> using Pluto

julia> @time @eval Pluto.Configuration.from_flat_kwargs();
  0.054751 seconds (47.96 k allocations: 2.680 MiB, 99.33% compilation time)

@github-actions (Contributor)

Try this Pull Request!

Open Julia and type:

julia> import Pkg
julia> Pkg.activate(temp=true)
julia> Pkg.add(url="https://github.com/fonsp/Pluto.jl", rev="rh/from-flat-kwargs-perf")
julia> using Pluto

@rikhuijzer rikhuijzer marked this pull request as ready for review March 25, 2022 16:05
@fonsp (Owner) commented Mar 25, 2022

But!! We currently use Configurations.jl for an awesome feature: the Pluto configuration becomes a subconfiguration of PlutoSliderServer: https://github.com/JuliaPluto/PlutoSliderServer.jl#2-plutodeploymenttoml

So if we take this PR, we need to find a way to still support that 😯

@fonsp (Owner) commented Mar 25, 2022

What do you think about improving Configurations.jl itself? It feels like theoretically, the macro could return exactly the code that you wrote, which would give the same performance? Or do macros block precompilation?

@rikhuijzer (Collaborator, Author)

What do you think about improving Configurations.jl itself?

I've looked into it, but based on my experience with Pluto.jl/src/analysis/ExpressionExplorer.jl, it's really hard to get inference to infer expressions. I should look more into that.

It feels like theoretically, the macro could return exactly the code that you wrote, which would give the same performance? Or do macros block precompilation?

You've spotted a flaw in my text in the first post. Compilation can go through macros in some cases (Julia 1.7.2):

f() = 3

macro call_f()
    return :(f())
end

g() = @call_f()

function is_precompiled(f::Function)
    method = only(methods(f))
    return !isempty(method.specializations)
end

@show precompile(g, ())
@show is_precompiled(g)
@show is_precompiled(f)

# Output:
# precompile(g, ()) = true
# is_precompiled(g) = true
# is_precompiled(f) = true

Anyway, I'll think a bit about how we can move forward. Will probably take some days.

@Roger-luo (Contributor)

The generated function was made to improve parsing speed. If the latency is the main concern we can consider bringing the old dynamic parser back.

@rikhuijzer (Collaborator, Author) commented Mar 27, 2022

The generated function was made to improve parsing speed. If the latency is the main concern we can consider bringing the old dynamic parser back.

Hmm. Yes, I did say that that takes the most time, but it wouldn't be a problem if Julia could infer everything, because that generated code would be compiled during the precompilation phase anyway. Upon further inspection, it turns out that type inference starts to propagate Any types pretty quickly. To see this, I did the following:

julia> using Cthulhu, Pluto

julia> @descend Pluto.Configuration.from_flat_kwargs(; port=8000)

and disabled optimization (press o) and enabled debuginfo (press d). Then, I stepped down into the methods and looked at the types. At Configurations.from_kwargs:

https://github.com/Roger-luo/Configurations.jl/blob/5a23569281345564b082f7b0c09a26720ff68e76/src/from_kwargs.jl#L12-L16

Configurations.from_dict is called with

Configurations.from_dict($(Expr(:static_parameter, 1)), d)

where d::OrderedCollections.OrderedDict{String, Any}. This could be a problem, but it still seems okay, since isconcretetype(OrderedCollections.OrderedDict{String, Any}) and isconcretetype(Expr) both hold.

The Any types start at from_dict_specialize, because that is where the elements of d are accessed:

https://github.com/Roger-luo/Configurations.jl/blob/5a23569281345564b082f7b0c09a26720ff68e76/src/codegen.jl#L360-L368

Here, a lot of values are of type Any or have a union type with AbstractDict. My guess is that if inferability could be improved in that area, then that would allow us to precompile much more. However, I don't think that I will go on that deep dive.
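The effect described above can be reproduced in a few lines (a hedged, stand-alone sketch, not Configurations.jl code): once a value comes out of a Dict{String, Any}, inference gives up on it unless the type is asserted at the boundary.

```julia
# Illustrative sketch of the inference problem: a value read from a
# Dict{String, Any} is inferred as Any, so everything downstream of it
# becomes dynamic dispatch.
d = Dict{String, Any}("port" => 8000)

untyped(d) = d["port"] + 1          # return type inferred as Any

# A type assertion at the boundary restores inferability.
typed(d) = (d["port"]::Int) + 1     # return type inferred as Int

@show only(Base.return_types(untyped, (Dict{String, Any},)))
@show only(Base.return_types(typed, (Dict{String, Any},)))
```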

@Roger-luo (Contributor) commented Mar 27, 2022

I think that for Pluto's case parsing speed is unlikely to be a concern, so what if we just do a dynamic version? Have you checked any old version of Configurations for your benchmark?


Yeah, Dict{String, Any} is mainly because from_dict parses things from a dict which is produced by a TOML/JSON parser. So making more aggressive use of the type information from the option type definition in from_dict_specialize would be the way to speed up the parsing (but I'm unsure about the latency, since I haven't looked at the inferability much).

@rikhuijzer (Collaborator, Author)

I think that for Pluto's case parsing speed is unlikely to be a concern, so what if we just do a dynamic version? Have you checked any old version of Configurations for your benchmark?

Yes. It saved only 20% whereas this PR saves 97% or so.

Yeah, Dict{String, Any} is mainly because from_dict parses things from a dict which is produced by a TOML/JSON parser. So making more aggressive use of the type information from the option type definition in from_dict_specialize would be the way to speed up the parsing (but I'm unsure about the latency, since I haven't looked at the inferability much).

Well, like I said, I unfortunately don't have time to dive into that. I'll see whether I can fix the problem on the PlutoSliderServer side, and then see whether the Pluto developers are fine with merging this PR.

@rikhuijzer (Collaborator, Author)

But!! We currently use Configurations.jl for an awesome feature: the Pluto configuration becomes a subconfiguration of PlutoSliderServer: https://github.com/JuliaPluto/PlutoSliderServer.jl#2-plutodeploymenttoml

So if we take this PR, we need to find a way to still support that 😯

I've checked it and the tests in https://github.com/JuliaPluto/PlutoSliderServer.jl/blob/f8a561785503582a2f089852b1352a7143de6fe9/test/configuration.jl#L1-L12 did indeed fail after my changes in this PR. I've now fixed that in e45b11c by putting @option back on the struct definition.

@fonsp (Owner) commented Apr 4, 2022

Can we put extra macros/generated functions/magic into Configurations.jl to generate this code automatically? Or is this what you mean by the dynamic version?

@rikhuijzer (Collaborator, Author)

Can we put extra macros/generated functions/magic into Configurations.jl to generate this code automatically? Or is this what you mean by the dynamic version?

What does "this" refer to here (both times)?

@fonsp (Owner) commented Apr 4, 2022

Yeah that was confusing :)

It sounds like we get the big performance gain from writing this function by hand:

function from_flat_kwargs(;
    root_url::Union{Nothing,String} = ROOT_URL_DEFAULT,
    host::String = HOST_DEFAULT,
    port::Union{Nothing,Integer} = PORT_DEFAULT,
    launch_browser::Bool = LAUNCH_BROWSER_DEFAULT,
    dismiss_update_notification::Bool = DISMISS_UPDATE_NOTIFICATION_DEFAULT,
    show_file_system::Bool = SHOW_FILE_SYSTEM_DEFAULT,
    notebook_path_suggestion::String = notebook_path_suggestion(),
    disable_writing_notebook_files::Bool = DISABLE_WRITING_NOTEBOOK_FILES_DEFAULT,
    auto_reload_from_file::Bool = AUTO_RELOAD_FROM_FILE_DEFAULT,
    auto_reload_from_file_cooldown::Real = AUTO_RELOAD_FROM_FILE_COOLDOWN_DEFAULT,
    auto_reload_from_file_ignore_pkg::Bool = AUTO_RELOAD_FROM_FILE_IGNORE_PKG_DEFAULT,
    notebook::Union{Nothing,String,Vector{<:String}} = NOTEBOOK_DEFAULT,
    init_with_file_viewer::Bool = INIT_WITH_FILE_VIEWER_DEFAULT,
    simulated_lag::Real = SIMULATED_LAG_DEFAULT,
    injected_javascript_data_url::String = INJECTED_JAVASCRIPT_DATA_URL_DEFAULT,
    on_event::Function = ON_EVENT_DEFAULT,
    require_secret_for_open_links::Bool = REQUIRE_SECRET_FOR_OPEN_LINKS_DEFAULT,
    require_secret_for_access::Bool = REQUIRE_SECRET_FOR_ACESS_DEFAULT,
    run_notebook_on_load::Bool = RUN_NOTEBOOK_ON_LOAD_DEFAULT,
    workspace_use_distributed::Bool = WORKSPACE_USE_DISTRIBUTED_DEFAULT,
    lazy_workspace_creation::Bool = LAZY_WORKSPACE_CREATION_DEFAULT,
    compile::Union{Nothing,String} = COMPILE_DEFAULT,
    sysimage::Union{Nothing,String} = SYSIMAGE_DEFAULT,
    banner::Union{Nothing,String} = BANNER_DEFAULT,
    optimize::Union{Nothing,Int} = OPTIMIZE_DEFAULT,
    math_mode::Union{Nothing,String} = MATH_MODE_DEFAULT,
    startup_file::Union{Nothing,String} = STARTUP_FILE_DEFAULT,
    history_file::Union{Nothing,String} = HISTORY_FILE_DEFAULT,
    threads::Union{Nothing,String,Int} = default_number_of_threads()
)
    server = ServerOptions(;
        root_url,
        host,
        port,
        launch_browser,
        dismiss_update_notification,
        show_file_system,
        notebook_path_suggestion,
        disable_writing_notebook_files,
        auto_reload_from_file,
        auto_reload_from_file_cooldown,
        auto_reload_from_file_ignore_pkg,
        notebook,
        init_with_file_viewer,
        simulated_lag,
        injected_javascript_data_url,
        on_event
    )
    security = SecurityOptions(;
        require_secret_for_open_links,
        require_secret_for_access
    )
    evaluation = EvaluationOptions(;
        run_notebook_on_load,
        workspace_use_distributed,
        lazy_workspace_creation
    )
    compiler = CompilerOptions(;
        compile,
        sysimage,
        banner,
        optimize,
        math_mode,
        startup_file,
        history_file,
        threads
    )
    return Options(; server, security, evaluation, compiler)
end

instead of using Configurations.from_flat_kwargs. Can this be automated by Configurations.jl? In theory, a macro like @give_me_the_optimized_from_flat_kwargs_function(Options) should be able to generate it with the same result, right? (Obviously we want a nicer API.)
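As a proof of concept that this can be automated, here is a hypothetical sketch (not Configurations.jl's actual API; all names are made up): a small code generator reads the field names off a struct and emits exactly this kind of plain keyword-forwarding method.

```julia
# Hypothetical sketch (not Configurations.jl API): generate a flat-kwargs
# constructor from a struct's field names. The generated method body is
# ordinary code, so it is as inferable as the hand-written version.
struct ServerDemo
    host::String
    port::Int
end

function define_flat_kwargs(T::Type)
    names = fieldnames(T)
    # One `name = error(...)` keyword per field; callers must supply each one.
    kwargs = [Expr(:kw, n, :(error("missing keyword"))) for n in names]
    # Evaluate the method definition once, at load time; after that, calls
    # hit an ordinary compiled method with no metaprogramming left in it.
    @eval flat_kwargs(::Type{$T}; $(kwargs...)) = $T($(names...))
end

define_flat_kwargs(ServerDemo)
s = flat_kwargs(ServerDemo; host = "localhost", port = 8000)
```

In a real implementation this would also handle defaults and nested option types, but it shows why the generated method can match the hand-written one: the metaprogramming cost is paid once at definition time, not on every call.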

@Roger-luo (Contributor)

Yes, that's definitely possible. The current from_kwargs interface is not specialized to each type, but just reuses the from_dict mechanism. We can write a codegen to specialize it.

@fonsp (Owner) commented Apr 4, 2022

@Roger-luo Hey Roger! We discussed this PR on the call today, and because of the pretty large performance benefit, we were thinking of merging this PR now and reverting it in the future, after this has been fixed in Configurations.jl.

@fonsp (Owner) commented Apr 4, 2022

For me, Julia 1.7.0:

julia> @time @eval Pluto.Configuration.from_flat_kwargs();

# before:
  0.390142 seconds (1.40 M allocations: 76.890 MiB, 7.65% gc time, 98.63% compilation time)

# after:
  0.041981 seconds (47.97 k allocations: 2.660 MiB, 96.11% compilation time)

@Roger-luo (Contributor)

Yeah, I think you can merge this first; you can always specialize the Configurations API to bypass bad generated code. I probably won't have time to work on this soon, since I'm busy with a few other things.

@rikhuijzer rikhuijzer merged commit 1aaa44b into main Apr 5, 2022
@rikhuijzer rikhuijzer deleted the rh/from-flat-kwargs-perf branch April 5, 2022 16:12
@rikhuijzer (Collaborator, Author)

For future reference. Current state of allocations from the Julia 1.7 Ubuntu logs:

 ────────────────────────────────────────────────────────────────────────────
                                                  Time          Allocations  
                                            ───────────────   ───────────────
               Total measured:                    986s            17.3GiB    

 Section                            ncalls     time    %tot     alloc    %tot
 ────────────────────────────────────────────────────────────────────────────
 import Pluto                            1    746ms    0.1%   61.5MiB    0.4%
 compiletimes.jl                         1    48.4s    4.9%   3.62GiB   21.1%
   Pluto.Cell                            1   79.2ms    0.0%   6.60MiB    0.0%
   Pluto.Notebook                        1    1.37s    0.1%   95.2MiB    0.5%
   SessionActions.open                   1    15.0s    1.5%   1.02GiB    6.0%
   SessionActions.shutdown               1    1.28s    0.1%    119MiB    0.7%
   Configuration.from_flat_kwargs        1   42.1ms    0.0%   2.70MiB    0.0%
   Pluto.run                             1    7.59s    0.8%    775MiB    4.4%
 ...

So at least 3.62GiB still to go. This number doesn't include the allocations for some parts of Pluto such as the loader.

For Julia 1.8-beta3, the allocations are as follows (note: don't take the running times too seriously, since GitHub Runners don't all have the same CPU):

 ────────────────────────────────────────────────────────────────────────────
                                                  Time          Allocations  
                                            ───────────────   ───────────────
               Total measured:                    855s            14.7GiB    

 Section                            ncalls     time    %tot     alloc    %tot
 ────────────────────────────────────────────────────────────────────────────
 import Pluto                            1    776ms    0.1%   63.2MiB    0.4%
 compiletimes.jl                         1    37.7s    4.4%   2.57GiB   17.5%
   Pluto.Cell                            1   51.1ms    0.0%   62.3KiB    0.0%
   Pluto.Notebook                        1    359ms    0.0%   7.87MiB    0.1%
   SessionActions.open                   1    15.6s    1.8%    738MiB    4.9%
   SessionActions.shutdown               1    875ms    0.1%   82.5MiB    0.5%
   Configuration.from_flat_kwargs        1   17.5ms    0.0%   13.5KiB    0.0%
   Pluto.run                             1    6.64s    0.8%    690MiB    4.6%
 ...

@fonsp fonsp added the backend Concerning the julia server and runtime label Apr 29, 2022
Labels: backend (Concerning the julia server and runtime), performance, PlutoSliderServer