Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

collect on LibPQ.Column allocates a lot #250

Closed
willtebbutt opened this issue Jul 26, 2022 · 1 comment
Closed

collect on LibPQ.Column allocates a lot #250

willtebbutt opened this issue Jul 26, 2022 · 1 comment

Comments

@willtebbutt
Copy link
Contributor

willtebbutt commented Jul 26, 2022

MWE (I've used a connection to a private DB in this issue, so you'll have to construct your own DB in order to run locally):

using
    BenchmarkTools,
    JSON3,
    LibPQ,
    Tables

function get_conn()
    return <implement-connection>
end

function get_data()
    conn = get_conn()
    result = execute(conn, "SELECT s.data::float8 FROM generate_series(0,1000000,0.1) AS s(data);"; binary_format=true)
    data = columntable(result)
    close(conn)
    return data.data
end

function my_getindex(jl_result::LibPQ.Result, row::Integer, col::Integer)
    data_ptr = LibPQ.libpq_c.PQgetvalue(jl_result.result, row - 1, col - 1)
    return convert(Float64, LibPQ.pqparse(Float64, data_ptr))
end

function get_data(c)
    jl_result = c.result
    return map(1:length(c)) do n
        return my_getindex(jl_result, n, 1)
    end
end

lmp_column = get_data()
println("Current LibPQ performance")
display(@benchmark collect($lmp_column))
println()

println("Peformance with hack")
display(@benchmark get_data($lmp_column))
println()

The above yields the following results:

Current LibPQ performance
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 103.319 s (4.50% GC) to evaluate,
 with a memory estimate of 23.19 GiB, over 699999498 allocations.

Peformance with hack
BenchmarkTools.Trial: 5 samples with 1 evaluation.
 Range (min … max):  954.222 ms …   1.141 s  ┊ GC (min … max): 0.14% … 16.14%
 Time  (median):     998.131 ms              ┊ GC (median):    4.66%
 Time  (mean ± σ):      1.034 s ± 89.276 ms  ┊ GC (mean ± σ):  7.67% ±  7.77%

  ██            █                                     █      █  
  ██▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁█ ▁
  954 ms          Histogram: frequency by time          1.14 s <

 Memory estimate: 762.94 MiB, allocs estimate: 6.

Currently, LibPQ generates about 7 allocations per call to getindex (to which collect falls back). It certainly appears to be possible to elminiate essentially all allocations, as shown by the profiling results for get_data(c), which is a quick hack I used to avoid some LibPQ's internals. This change yields roughly a 100x performance improvement.

It looks the construction of a LibPQ.PQValue is at least partially responsible. Evidence:

julia> @benchmark LibPQ.PQValue{$lmp_column.oid}($lmp_column.result, 1, 1)

# BenchmarkTools.Trial: 10000 samples with 214 evaluations.
#  Range (min … max):  347.556 ns …  39.722 μs  ┊ GC (min … max): 0.00% … 98.59%
#  Time  (median):     358.355 ns               ┊ GC (median):    0.00%
#  Time  (mean ± σ):   393.463 ns ± 935.449 ns  ┊ GC (mean ± σ):  6.27% ±  2.61%

#   ▄██▇▆▅▄▄▅▇▇▅▃▂▃▃▂▁▁ ▁▁▁▂▂▂▁▁▁▂▁▁▁                    ▁▂       ▃
#   █████████████████████████████████▇██▇▆▆▃▆▅▅▄▁▃▄▄▄▅▄▆████▆▅▇▇█ █
#   348 ns        Histogram: log(frequency) by time        490 ns <

#  Memory estimate: 176 bytes, allocs estimate: 4.

This doesn't look like it's quite enough to account for all of the allocations though, so presumably there's something else going on as well.

@iamed2
Copy link
Collaborator

iamed2 commented Jul 26, 2022

One possible approach to solve this would be to switch from the "PQValue" typed wrapper to a typed column wrapper, which would only need to be constructed once per column (and in most cases, when parsing the whole column, the dynamic dispatch would only need to happen once as well). Another might be generating a function at runtime when constructing Result that removes the construction of PQValue and dynamic dispatch within itself.

We might also consider splitting the LibPQ Result into eager and lazy versions, where the eager performs parsing before returning and the lazy is just a very simple wrapper to PGresult and doesn't know anything about types or parsing.

Either way it sounds like we need to move away from using PQValue dispatch as the way we add new parsing methods.

I probably do not have time to implement any fixes but I am happy to facilitate anyone who does have time to learn and implement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants