
Large select queries running 5-10x slower than with psql #259

Closed
robsmith11 opened this issue Oct 26, 2022 · 7 comments

robsmith11 commented Oct 26, 2022

I realize this report is vague, but I'm curious whether this performance is to be expected with LibPQ.jl and whether there's any low-hanging fruit that could get the performance closer to that of psql.

I'm running a simple select query that returns 10-100 million records with 3 non-null Float64 values. When I run the query over a not-so-fast wifi connection, psql consistently returns the results 10x faster than LibPQ.jl (CPU usage is low). Over a 10 Gbit LAN connection, psql is consistently 5x faster (CPU usage is maxed out). Setting binary_format=true and not_null=true didn't make a significant difference.

@robsmith11 (Author)

As a fairer comparison, I ran the same query with python's psycopg2 execute() and fetchall(), and the timings are consistent with psql (5-10x faster than LibPQ.jl).

iamed2 (Collaborator) commented Oct 26, 2022

Are you on the latest version of LibPQ.jl? 1.14 introduced some significant performance improvements. Are you measuring parse time or just transit time (i.e., the time for execute to finish)?

@robsmith11 (Author)

[194296ae] LibPQ v1.14.1 on a recent nightly.
I was measuring both the transit time and the time to convert to Tables.rowtable, which I thought would be a lot faster. When split into two operations, the time breakdown is 70/30 over the slower connection, so it looks like transferring and parsing are each individually slower with LibPQ.jl than the total time with psycopg2.
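For reference, the split measurement described above might be sketched like this (a sketch only: `mytable` is a stand-in name, and the connection string assumes a local server like the later examples in this thread):

```julia
using LibPQ, Tables

conn = LibPQ.Connection("postgresql://postgres@localhost:5432/postgres")

# Transit time: how long until execute() returns the complete result set.
@time r = execute(conn, "SELECT a, b, c FROM mytable"; not_null=true)

# Parse/convert time: materializing the result as a vector of row NamedTuples.
@time rows = Tables.rowtable(r)
```

Timing the two steps separately is what yields the 70/30 breakdown mentioned above.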

iamed2 (Collaborator) commented Oct 26, 2022

Tables.columntable should be significantly faster. Parsing entire columns at once will be a type-stable operation, and we explicitly test for performance on that operation since 1.14.

On the transit time, it would be helpful for replication to have a sense of the schema being pulled. Can this be replicated by pulling 10 million rows of a single double precision column containing 3 unique non-null values?

Are your connections using SSL?
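A minimal setup for that replication attempt might look like this (a sketch; `bench` is a hypothetical table name, run against a local postgres with SSL disabled to match the non-SSL case):

```julia
using LibPQ

conn = LibPQ.Connection("postgresql://postgres@localhost:5432/postgres?sslmode=disable")

# 10 million rows of a single NOT NULL double precision column
# containing 3 unique values (0.0, 1.0, 2.0).
execute(conn, "CREATE TABLE bench (v double precision NOT NULL)")
execute(conn, """
    INSERT INTO bench
    SELECT (i % 3)::double precision FROM generate_series(1, 10000000) AS i
""")

@time r = execute(conn, "SELECT v FROM bench"; not_null=true)
```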

@robsmith11 (Author)

Ah, you're right: after making a minimal example locally on my laptop and testing more carefully, nearly all the time is taken up by Tables.rowtable.

But while it's true that Tables.columntable is extremely fast, it seems that's because it's a lazy operation: if I want to actually do anything with the columns, they must first be parsed, which allocates for every record.

For example:

```
postgres=# create table t (x integer, y integer); insert into t (select *,1 from generate_series(1,10000000));
CREATE TABLE
INSERT 0 10000000
```

```julia
julia> c = LibPQ.Connection("postgresql://postgres@localhost:5432/postgres");

julia> @time r = LibPQ.execute(c, "select * from t", binary_format=false, not_null=true);
  1.643387 seconds (85 allocations: 4.484 KiB)

julia> @time t = Tables.columntable(r);
  0.000096 seconds (23 allocations: 1.109 KiB)

julia> @time sum(t.x);
  1.116090 seconds (20.00 M allocations: 686.646 MiB, 8.03% gc time)

julia> typeof(t)
NamedTuple{(:x, :y), Tuple{LibPQ.Column{Int32, 0x00000017, Int32, LibPQ.Result{false}, LibPQ.ParseType{Int32}}, LibPQ.Column{Int32, 0x00000017, Int32, LibPQ.Result{false}, LibPQ.ParseType{Int32}}}}
```

(everything was already run once to compile)

Also, it seems like Tables.rowtable is much slower than it should be. In this example I had to run it on a smaller table because I was impatient, but extrapolating the timing, it would take about 500 seconds to complete.

@robsmith11 (Author)

Ah, never mind: I forgot to set binary_format=true in the last set of tests. With that set, the results of Tables.columntable don't need additional parsing and are fast.
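The earlier benchmark with binary_format=true enabled might look like this (a sketch against the same table t as above; timings will of course vary):

```julia
using LibPQ, Tables

c = LibPQ.Connection("postgresql://postgres@localhost:5432/postgres")

# With binary results, the column data arrives in a machine representation,
# so the materialized columns don't need per-cell text parsing.
@time r = execute(c, "select * from t"; binary_format=true, not_null=true)
@time t = Tables.columntable(r)
@time sum(t.x)  # should no longer allocate for every record
```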

I also noticed that it's much (orders of magnitude) faster to call Tables.rowtable(Tables.columntable(r)) than to call Tables.rowtable(r) directly.

iamed2 (Collaborator) commented Oct 27, 2022

> I also noticed that it's much (orders of magnitude) faster to call Tables.rowtable(Tables.columntable(r)) than to call Tables.rowtable(r) directly.

This is a really good catch, we should be able to make a change to get Tables to do that automatically.
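The effect can be mimicked without a database: building rows from concrete, fully materialized columns (the analogue of Tables.rowtable(Tables.columntable(r))) is a type-stable loop over plain vectors, whereas building each row from lazy columns parses and allocates per cell. A DB-free sketch with stand-in data:

```julia
# Stand-in for Tables.columntable(r): concrete, already-parsed column vectors.
cols = (x = collect(Int32, 1:5), y = fill(Int32(1), 5))

# Eager row construction from concrete columns, analogous to
# Tables.rowtable(Tables.columntable(r)): each row is a NamedTuple.
rows = [(x = cols.x[i], y = cols.y[i]) for i in eachindex(cols.x)]

sum(row.x for row in rows)  # 15
```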
