Improve performance with less specialization #160
Conversation
Amazing stuff, I haven't gone through the code in great detail but avoiding specialization seems a clear winner.
I'm not entirely clear from my scan of the PR at what level this invalidation happens - is this a reference to extended methods from another module? Do you mean that the addition of a package to an environment invalidates caches for all packages?
This was a blunt force attempt to ensure that relevant objects are attached to the module:

module A
Base.SomeNonExportedFunctionofBase() = 1
end

Similarly, we want … I appreciate that none of this is covered in the tests... |
Right now, it doesn't. What I meant to say (but didn't clearly spell out) was that if the LanguageServer has some persistence, and re-runs if the user adds packages to the environment, then to be safe the design here will require that you first empty the caches and start from scratch. But I guess you kind of do that anyway now, because you presumably call
Ah, interesting! I still suspect that there may be a more efficient way to do it; for example, we could go through all the method tables and then parcel the methods, once, into their separate modules-of-definition. I recently wrote a package MethodAnalysis.jl that covers some of the same "traversal" territory of this package, so it can be used to check the timing of that kind of operation. Using the same set of packages I used above:

julia> using Images, Profile, ProfileView # Revise was also loaded
Gtk-Message: 16:13:15.647: Failed to load module "canberra-gtk-module"
Gtk-Message: 16:13:15.647: Failed to load module "canberra-gtk-module"
julia> using MethodAnalysis
julia> function parcel_methods()
methdict = Dict{Module,Vector{Method}}()
covered = Base.IdSet{Any}()
visit() do item
if isa(item, Function) || isa(item, Type)
item ∈ covered && return false # don't descend any further
push!(covered, item)
return true # descend into the methods of this function/type
elseif isa(item, Method)
list = get!(Vector{Method}, methdict, item.module)
push!(list, item)
return false # don't descend into MethodInstances etc.
else
return true # descend by default (e.g., into modules)
end
end
return methdict
end
parcel_methods (generic function with 1 method)
julia> precompile(parcel_methods, ())
true
julia> @time methdict = parcel_methods()
WARNING: BaseTraits.IsFastLinearIndex is deprecated, use IsIndexLinear instead.
likely near REPL[5]:1
⋮ # ...many more...
9.769300 seconds (12.39 M allocations: 637.746 MiB, 1.23% gc time)
Dict{Module,Array{Method,1}} with 255 entries:
Xorg_libXdmcp_jll => Method[__init__() in Xorg_libXdmcp_jll at /home/tim/.julia/packages/Xorg_libXdmcp_jll/A64dc/src/wrappers/x86_64-linux-gnu.jl:27, eval(x) in Xorg_libXdmcp_jll at /home/ti…
Xorg_xkeyboard_config_jll => Method[__init__() in Xorg_xkeyboard_config_jll at /home/tim/.julia/packages/Xorg_xkeyboard_config_jll/G33Bc/src/wrappers/x86_64-linux-gnu.jl:12, eval(x) in Xorg_xkeyboar…
AbstractFFTs => Method[mul!(y::AbstractArray, p::AbstractFFTs.ScaledPlan, x::AbstractArray) in AbstractFFTs at /home/tim/.julia/packages/AbstractFFTs/mhQvY/src/definitions.jl:269, getin…
Base.Ryu => Method[append_c_digits(count, digits, buf, pos) in Base.Ryu at ryu/utils.jl:258, append_d_digits(olength, digits, buf, pos, decchar) in Base.Ryu at ryu/utils.jl:227, app…
⋮ # ...many more...

That 9.8s is a pretty big chunk of the 11.3s I quote above. However, 9.8s + 2.2s (the amount of time needed once all the caches are loaded) is just 12s, so I think this makes sense.

EDIT: oops, actually

julia> function parcel_methods()
methdict = Dict{Module,Vector{Method}}()
visit() do item
if isa(item, Method)
list = get!(Vector{Method}, methdict, item.module)
push!(list, item)
return false # don't descend into MethodInstances etc.
else
return true
end
end
return methdict
end

is fine, as MethodAnalysis already keeps track of which functions have been visited. It's no faster, because the total time is dominated by |
Fixing that might make it easier to try out new implementations with confidence. I'm on my third rewrite of Revise (timholy/Revise.jl#497), and the main reason I can do it with reasonable confidence is that its test suite is about 3700 loc. And my main worry is that I still don't have adequate coverage... Anyway, I know that writing a big test suite is a pain, but for Revise I've gotten to the point where I basically never fix a bug without first contributing a test that exposes the bug. Sometimes the test takes 3x as long to write as fixing the bug, but my sense is that it has on balance been the right decision. |
Absolutely, this was the motivation behind constructing the module tree prior to caching the symbols. Tests are eternally on my todo list but this may spur me on! Anyway, I'll have a look at the BuiltIn issue |
I think the BuiltIns issue is due to:

Core.bitcast === Core.Intrinsics.bitcast
parentmodule(Core.bitcast) === Core
parentmodule(Core.Intrinsics.bitcast) === Core
(:bitcast in names(Core, all = true)) === false
(:bitcast in names(Core.Intrinsics, all = true)) === true |
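If that's the cause, one hypothetical workaround (illustrative only — `owning_module` is a made-up helper, not code from this PR) would be to attribute a symbol to whichever candidate module actually lists it, rather than trusting `parentmodule`:

```julia
# Hypothetical helper (not from this PR): parentmodule() reports Core for
# intrinsics like bitcast, but the symbol only appears in
# names(Core.Intrinsics, all = true). Prefer the module whose name table
# actually contains the symbol.
function owning_module(name::Symbol, candidates = (Core, Core.Intrinsics))
    for m in candidates
        name in names(m, all = true) && return m
    end
    return nothing
end

owning_module(:bitcast)  # → Core.Intrinsics on the Julia version quoted above
```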
Actually, now that I think about it a bit more, what would you think about using MethodAnalysis in this package? First, one can trivially show that the simple implementation above gets even the non-exported names:

julia> module A
foo(x) = 1
end
Main.A
julia> module B
struct Foo end
Main.A.foo(::Foo) = 2
end
Main.B
julia> methdict = parcel_methods()
⋮
julia> methdict[Main.A]
[1] eval(x) in Main.A at REPL[3]:1
[2] foo(x) in Main.A at REPL[3]:2
[3] include(x) in Main.A at REPL[3]:1
julia> methdict[Main.B]
[1] foo(::Main.B.Foo) in Main.B at REPL[4]:3
[2] Main.B.Foo() in Main.B at REPL[4]:2
[3] eval(x) in Main.B at REPL[4]:1
[4] include(x) in Main.B at REPL[4]:1

More importantly, could you use its traversal of

julia> methods(findnext)
# 14 methods for generic function "findnext":
[1] findnext(pat::Base.RegexAndMatchData, str, i) in Base at regex.jl:460
[2] findnext(re::Regex, str::Union{String, SubString}, idx::Integer) in Base at regex.jl:299
[3] findnext(r::Regex, s::AbstractString, idx::Integer) in Base at regex.jl:324
[4] findnext(pred::Base.Fix2{var"#s92",Bool} where var"#s92"<:Union{typeof(==), typeof(isequal)}, B::BitArray, start::Integer) in Base at bitarray.jl:1444
[5] findnext(testf::Function, B::BitArray, start::Integer) in Base at bitarray.jl:1454
[6] findnext(pred::Base.Fix2{var"#s92",var"#s91"} where var"#s91"<:AbstractChar where var"#s92"<:Union{typeof(==), typeof(isequal)}, s::String, i::Integer) in Base at strings/search.jl:5
[7] findnext(pred::Base.Fix2{var"#s92",var"#s91"} where var"#s91"<:Union{Int8, UInt8} where var"#s92"<:Union{typeof(==), typeof(isequal)}, a::Union{Array{Int8,1}, Array{UInt8,1}}, i::Integer) in Base at strings/search.jl:25
[8] findnext(testf::Function, s::AbstractString, i::Integer) in Base at strings/search.jl:127
[9] findnext(f::Function, v::SparseArrays.AbstractSparseArray, i) in SparseArrays at /home/tim/src/julia-master/usr/share/julia/stdlib/v1.6/SparseArrays/src/abstractsparse.jl:79
[10] findnext(testf::Function, A, start) in Base at array.jl:1776
[11] findnext(B::BitArray, start::Integer) in Base at bitarray.jl:1402
[12] findnext(t::AbstractString, s::AbstractString, i::Integer) in Base at strings/search.jl:277
[13] findnext(ch::AbstractChar, string::AbstractString, ind::Integer) in Base at strings/search.jl:296
[14] findnext(A, start) in Base at array.jl:1694
julia> m = methods(findnext).ms[10]
findnext(testf::Function, A, start) in Base at array.jl:1776
julia> using MethodAnalysis
julia> mis = Core.MethodInstance[]
Core.MethodInstance[]
julia> visit(m) do item
isa(item, Core.MethodInstance) && (push!(mis, item); return false)
return true
end
julia> mis
15-element Array{Core.MethodInstance,1}:
MethodInstance for findnext(::Function, ::Array{Any,1}, ::Int64)
MethodInstance for findnext(::Base.Fix2{typeof(==),UnionAll}, ::Array{Any,1}, ::Int64)
MethodInstance for findnext(::Base.var"#62#63"{typeof(isempty)}, ::Array{Pkg.Types.VersionRange,1}, ::Int64)
MethodInstance for findnext(::Base.var"#408#410", ::Array{Any,1}, ::Int64)
MethodInstance for findnext(::Sockets.var"#1#2", ::Array{Sockets.IPv4,1}, ::Int64)
MethodInstance for findnext(::Pkg.Operations.var"#9#10"{Base.UUID}, ::Array{Pkg.Types.PackageSpec,1}, ::Int64)
MethodInstance for findnext(::Pkg.Operations.var"#6#7"{Base.UUID}, ::Array{Pkg.Types.PackageSpec,1}, ::Int64)
MethodInstance for findnext(::Pkg.Operations.var"#23#24"{Base.UUID}, ::Array{Pkg.Types.PackageSpec,1}, ::Int64)
MethodInstance for findnext(::Pkg.Operations.var"#29#34"{Base.UUID}, ::Array{Pkg.Types.PackageSpec,1}, ::Int64)
MethodInstance for findnext(::typeof(REPL.LineEdit._notspace), ::Array{UInt8,1}, ::Int64)
MethodInstance for findnext(::Pkg.Operations.var"#113#119"{Nothing}, ::Array{Pkg.Types.PackageSpec,1}, ::Int64)
MethodInstance for findnext(::Pkg.Operations.var"#113#119"{Base.UUID}, ::Array{Pkg.Types.PackageSpec,1}, ::Int64)
MethodInstance for findnext(::Base.StackTraces.var"#1#2", ::Array{Base.StackTraces.StackFrame,1}, ::Int64)
MethodInstance for findnext(::Base.Fix2{typeof(isequal),String}, ::Array{String,1}, ::Int64)
MethodInstance for findnext(::Pkg.Operations.var"#113#119"{_A} where _A, ::Array{Pkg.Types.PackageSpec,1}, ::Int64)

This is a list of all the cases in which this method has already been called & compiled. For each of these, the inferred code is trivially (and very quickly) available:

julia> mi = mis[2]
MethodInstance for findnext(::Base.Fix2{typeof(==),UnionAll}, ::Array{Any,1}, ::Int64)
julia> @time src = Base._uncompressed_ir(mi.def, mi.cache.inferred)
0.000048 seconds (134 allocations: 7.953 KiB)
CodeInfo(
@ array.jl:1777 within `findnext'
┌ @ abstractarray.jl:102 within `keys'
│┌ @ indices.jl:453 within `LinearIndices'
││┌ @ abstractarray.jl:75 within `axes'
│││┌ @ array.jl:155 within `size'
1 ─││││ Base.arraysize(A, 1)::Int64
│ └└└└
│ @ array.jl:1778 within `findnext'
│ ┌ @ abstractarray.jl:102 within `keys'
│ │┌ @ indices.jl:453 within `LinearIndices'
│ ││┌ @ abstractarray.jl:75 within `axes'
│ │││┌ @ array.jl:155 within `size'
│ ││││ %2 = Base.arraysize(A, 1)::Int64
│ │││└
│ │││┌ @ tuple.jl:157 within `map'
│ ││││┌ @ range.jl:326 within `OneTo' @ range.jl:317
│ │││││┌ @ promotion.jl:409 within `max'
│ ││││││┌ @ int.jl:83 within `<'
│ │││││││ %3 = Base.slt_int(%2, 0)::Bool
│ ││││││└
│ ││││││ %4 = Base.ifelse(%3, 0, %2)::Int64
│ └└└└└└
│ ┌ @ indices.jl:489 within `last'
│ │┌ @ abstractarray.jl:95 within `axes1'
│ ││┌ @ abstractarray.jl:75 within `axes'
│ │││┌ @ tuple.jl:157 within `map'
│ ││││┌ @ range.jl:326 within `OneTo' @ range.jl:317
│ │││││┌ @ promotion.jl:409 within `max'
│ ││││││┌ @ int.jl:83 within `<'
│ │││││││ %5 = Base.slt_int(%4, 0)::Bool
│ ││││││└
│ ││││││ %6 = Base.ifelse(%5, 0, %4)::Int64
│ └└└└└└
│ @ array.jl:1779 within `findnext'
│ ┌ @ operators.jl:303 within `>'
│ │┌ @ int.jl:83 within `<'
│ ││ %7 = Base.slt_int(%6, start)::Bool
│ └└
└── goto #3 if not %7
2 ─ return Base.nothing
@ array.jl:1780 within `findnext'
3 ┄ %10 = φ (#1 => _4, #8 => %19)::Int64
└── goto #9 if not true
@ array.jl:1781 within `findnext'
┌ @ array.jl:811 within `getindex'
4 ─│ %12 = Base.arrayref(true, A, %10)::Any
│ └
│ %13 = (testf)(%12)::Any
└── goto #6 if not %13
5 ─ return %10
@ array.jl:1782 within `findnext'
┌ @ promotion.jl:398 within `=='
6 ─│ %16 = (%10 === %6)::Bool
│ └
└── goto #8 if not %16
7 ─ goto #9
@ array.jl:1784 within `findnext'
┌ @ abstractarray.jl:150 within `nextind'
│┌ @ int.jl:87 within `+'
8 ─││ %19 = Base.add_int(%10, 1)::Int64
│ └└
└── goto #3
@ array.jl:1786 within `findnext'
9 ┄ return Base.nothing
)

It seems like it should be possible to use this to get hints about which specific methods of a callee get used in practice. Perhaps by comparing against multiple compiled instances you could get a prioritized list of candidates? Or you could have a UI tool that lets you select which of these MethodInstances you want to use for finding the correct callee? Even if you can solve the analysis problem, there will be some pain points. For example, the implementation of the above method is

function findnext(testf::Function, A, start)
i = oftype(first(keys(A)), start)
l = last(keys(A))
i > l && return nothing
while true
testf(A[i]) && return i
i == l && break
# nextind(A, l) can throw/overflow
i = nextind(A, i)
end
return nothing
end

and of the calls that appear in the source, you'll see the inferred code above links to

So another alternative might be to find them via their backedges. That's pretty quick too:

julia> findcallee(callee, mi) = Dict(reverse.(direct_backedges(callee)))[mi]
findcallee (generic function with 1 method)
julia> using BenchmarkTools
julia> @btime findcallee($first, $mi)
830.165 μs (8092 allocations: 419.84 KiB)
MethodInstance for first(::LinearIndices{1,Tuple{Base.OneTo{Int64}}})

All you need for this is a list of functions that get called by the method, and a specific MethodInstance to resolve the dispatch. Actually, come to think of it, you can do better than creating a dict and then extracting a single element:

julia> function findcallee2(callee, mi)
local meth
found = false
visit(callee) do item
if isa(item, Method) && !found
meth = item
visit(meth) do methmi
if isa(methmi, Core.MethodInstance)
visit_backedges(methmi) do edge
edge === mi && (found = true)
edge === methmi # follow back only one caller
end
false
else
true
end
end
end
item === callee
end
found ? meth : nothing
end
findcallee2 (generic function with 1 method)
julia> @btime findcallee2($first, $mi)
253.462 μs (1595 allocations: 198.83 KiB)
first(iter::LinearIndices{1,R} where R<:Tuple{AbstractUnitRange{Int64}}) in Base at indices.jl:487 |
sorry for jump in,
Does this package really run user package code (in some Julia session)? I thought everything the LS tries to do is collect information from static parsing, and so any kind of invalidation (that is, a method body loaded into a running session getting stale when some methods in its backedges get updated) doesn't happen, no?
So I guess |
We don't really have access to methods in this way. I'm not sure how much background you have on the general approach we take, so I'll try to summarise it briefly. The key point to make is that we don't actually compile any user code (as it appears in workspace documents within the editor). Everything that is 'live' in this way is represented as CST with annotations for bindings, scopes and the references of symbols. This makes no use of the Julia lowerer/compiler, etc. (A future aim is to refactor the CST structure so as to make it more directly substitutable for Julia's AST, allowing us to use some subset of those Julia internals). The SymbolServer cache then acts as the editor session's snapshot view of the outside world - all other code within the active environment that is not 'live'. Continuing with the example above, in terms of what we could do to narrow down which particular method is being called in any given instance we're far more restricted. We have a pseudo inference pass that runs on user-level bindings but nothing at the arbitrary expression level. Does that make sense? |
Yes, to create the caches the packages are loaded. This is done in a separate process to maintain that separation from the editor process |
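A simplified sketch of that pattern (illustrative only — not SymbolServer's actual implementation — using the Dates stdlib as a stand-in for a package from the user's environment):

```julia
# Sketch: collect a package's exported names in a child Julia process,
# so the parent process never has to load user packages itself.
code = "using Dates; foreach(println, names(Dates))"
out = read(`$(Base.julia_cmd()) --startup-file=no -e $code`, String)
syms = Symbol.(split(chomp(out), '\n'))
# syms now contains Dates' exports even though Dates was never
# loaded in this process
```

In the real design, the child process would of course serialize far richer information (bindings, method signatures, docstrings) back to a cache file rather than printing names.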
It's basically the same approach as the DocSeeker.jl cache, except that it pulls in more than just docstrings, @aviatesk. |
So one complication is that we don't really want to load any packages in the symbol server process for ourselves, because as soon as we do we restrict in some way what packages/version combos a user might use. The general philosophy here is that only packages from the users environment get loaded in the symbol server process. |
That makes sense. For people running in vscode or Atom's terminals, you could have a small server that could send serializations of MethodInstances, but I can understand why that might not be entirely desirable.
I'll finish this off sometime without MethodAnalysis. Next couple of days are pretty busy, but I should get to it by the end of the week. Feel free to run with it yourself, of course. |
I think that before the caching part of this can be addressed, we should discuss #161. So for now I've dropped that from this PR and submitted just the
Interesting that it seems to be a bit slower on the second run. But I'm not sure that's important in practice? I also threw in more whitespace changes and some tweaks to the manual. Let me know if you'd prefer those to be in a separate PR. |
We are now using the auto-formatting GitHub Actions, but we only turned it on after you opened this PR. So going forward, whatever whitespacing we do should hopefully be automatically made uniform by the bot :) I'm going to hold this PR for a few days because we are in release phase right now and need to keep

I think improving these caching times is great, but just be aware that we have plans to move the whole caching into the cloud, at which point the indexing performance will matter a lot less for the end-user experience :) |
Totally understood. But to make sure it's clear, now the only thing this PR does is add
That will be great! I presume it will still be useful to have good performance for |
Absolutely |
Sorry, I should have said that we are keeping |
Let's merge this PR for now? The conflicts need to be resolved though. |
Yes, now would be a good time to merge. Anytime before 7/1, really :) |
This marks key function arguments with atsign-nospecialize annotations, to prevent them from being compiled for many different types. Since this package has to use `isa` logic anyway, there is very little advantage in specialization and a big win in terms of compile time for avoiding it.
Should be ready now. |
Timely, I've just made a PR with the earlier aspects of this PR. Let's merge this first if possible then I'll fix resulting conflicts over there. |
Thanks. I can't merge, but perhaps someone can do the honors. |
This is "WIP" because it currently fails two tests (just for a handful of Core.Builtin and/or Core.Intrinsic functions, I think). I'd appreciate some help figuring out how to fix them.

This takes two main strategies, control over specialization and greater caching, to improve package performance.

Several functions have to operate on arguments that have a huge number of different types: any function that takes a function or a type as an argument might, in principle, be specialized, but then used only once or a small handful of times. This marks such arguments with atsign-nospecialize annotations, to prevent them from being recompiled many times over. Since this package has to use isa logic anyway, there is very little advantage in specialization and a big win in terms of compile time for avoiding it.

EDIT: the text below and much of the discussion below is no longer relevant to this PR (those issues will be resolved in a separate PR)
The other major strategy relies on caching. Here there are negatives, because we have to invalidate the cache if the LanguageServer has to process newly-added packages; I'm not sure if that ever happens, and if so where that invalidation should be done.
Of course, the advantage is performance. Two major bottlenecks were:
- Base._methods: the issue is that two modules might define methods for the same f (e.g., Base.convert might be extended by multiple packages). However, Base._methods looks up methods across the whole session; we might as well save the result.
- isdefined(m, n) where n came from allnames: this was wasteful since it was iterating over all names defined throughout the entire session rather than just the names defined in a specific module.

With all that said, let's get to the numbers. Here was my test:
And here are the timing results:
As you can see, the gains are pretty decent. I did the timing on top of #159, but I don't think that really changes this test very much.