MWE (I've used a connection to a private DB in this issue, so you'll have to construct your own DB in order to run locally):
```julia
using BenchmarkTools, JSON3, LibPQ, Tables

function get_conn()
    return <implement-connection>
end

function get_data()
    conn = get_conn()
    result = execute(conn, "SELECT s.data::float8 FROM generate_series(0,1000000,0.1) AS s(data);"; binary_format=true)
    data = columntable(result)
    close(conn)
    return data.data
end

function my_getindex(jl_result::LibPQ.Result, row::Integer, col::Integer)
    data_ptr = LibPQ.libpq_c.PQgetvalue(jl_result.result, row - 1, col - 1)
    return convert(Float64, LibPQ.pqparse(Float64, data_ptr))
end

function get_data(c)
    jl_result = c.result
    return map(1:length(c)) do n
        return my_getindex(jl_result, n, 1)
    end
end

lmp_column = get_data()

println("Current LibPQ performance")
display(@benchmark collect($lmp_column))
println()
println("Performance with hack")
display(@benchmark get_data($lmp_column))
println()
```
The above yields the following results:
```
Current LibPQ performance
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 103.319 s (4.50% GC) to evaluate,
 with a memory estimate of 23.19 GiB, over 699999498 allocations.

Performance with hack
BenchmarkTools.Trial: 5 samples with 1 evaluation.
 Range (min … max):  954.222 ms … 1.141 s  ┊ GC (min … max): 0.14% … 16.14%
 Time  (median):     998.131 ms            ┊ GC (median):    4.66%
 Time  (mean ± σ):   1.034 s ±  89.276 ms  ┊ GC (mean ± σ):  7.67% ±  7.77%

  ██           █                                     █      █
  ██▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁█ ▁
  954 ms          Histogram: frequency by time          1.14 s <

 Memory estimate: 762.94 MiB, allocs estimate: 6.
```
Currently, LibPQ generates about 7 allocations per call to `getindex` (to which `collect` falls back). It certainly appears possible to eliminate essentially all of these allocations, as shown by the results for `get_data(c)`, a quick hack I used to bypass some of LibPQ's internals. This change yields roughly a 100x performance improvement.
It looks like the construction of a `LibPQ.PQValue` is at least partially responsible, though that doesn't look like quite enough to account for all of the allocations, so presumably there's something else going on as well.
One possible approach to solving this would be to switch from the `PQValue` per-value typed wrapper to a typed column wrapper, which would only need to be constructed once per column (and in most cases, when parsing a whole column, the dynamic dispatch would only need to happen once as well). Another would be to generate a function at runtime when constructing `Result` that eliminates the `PQValue` construction and internal dynamic dispatch.
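A minimal sketch of the typed-column-wrapper idea might look like the following. All names here (`TypedColumn`, `parse_value`, `parse_column`) are hypothetical and not part of LibPQ's API; the point is only that the element type is resolved once, so the per-value loop is type-stable and allocation-free apart from the output vector.

```julia
# Hypothetical sketch: resolve the column's element type once, then parse
# every value without constructing a per-value wrapper.
struct TypedColumn{T}
    col::Int
end

# Dispatch on the element type happens once, at column construction time...
parse_value(::Type{Float64}, raw::AbstractString) = parse(Float64, raw)

# ...so this loop is fully type-stable: T is a compile-time constant.
function parse_column(::TypedColumn{T}, raw_column::Vector{String}) where {T}
    out = Vector{T}(undef, length(raw_column))
    @inbounds for i in eachindex(raw_column)
        out[i] = parse_value(T, raw_column[i])
    end
    return out
end
```

Here `raw_column` stands in for whatever per-value access the real implementation would do against the `PGresult`; the design point is the function barrier on `T`.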
We might also consider splitting the LibPQ `Result` into eager and lazy versions, where the eager version performs parsing before returning and the lazy version is just a very thin wrapper around `PGresult` that doesn't know anything about types or parsing.
Either way, it sounds like we need to move away from using `PQValue` dispatch as the way we add new parsing methods.
I probably do not have time to implement any fixes myself, but I am happy to help anyone who does have time to learn the relevant internals and implement them.