Postgres hanging on large DDL commands #431

zapov · 2015-11-25T08:32:21Z

I created a PR while ago: #365 because I had a problem with executing SQL scripts via JDBC. Recently Postgres started hanging while executing those scripts.
This happens on Windows/Linux and is reproducible with those scripts. I experimented with changing buffer sizes (send/receive) but it's not really a solution (while some scripts pass, other fails).

Looking at the JDBC code it looks like it splits those large sql statements into smaller parts and tries to send them using extended protocol. It looks like Postgres doesn't cope well with that... so first obvious solution looked like just send it using simple protocol (which includes sql length - so there would be no issues).

I still think that's far better solution than to rewrite that query into smaller parts.
Is there any reason I'm missing why we don't use simple protocol for non prepared statements. Most queries are executed with prepared statements (to pass in the parameters)... so I can't think of a good reason not to use simple v3 protocol in that case.

Executing those scripts using v2 protocol hangs also.

Regards,
Rikard

vlsi · 2015-11-25T08:35:54Z

Can you please provide a test case that exhibits your problem?

zapov · 2015-11-25T08:40:57Z

I've put the link in the PR back up again. And there is a simple code snippet which reproduces the issue. Or do you want a PR with specific test?
Because we create a lot of different scripts.... and that particular script worked on v2 protocol, but some others didn't.

I wanted to discuss that before submitting tests, but I can submit the test if that's what you need.

vlsi · 2015-11-25T08:42:42Z

Or do you want a PR with specific test?

Exactly.
I would prefer if you would address the issue as well, however having a PR that reproduces the issue is like having a half of the solution.

zapov · 2015-11-25T09:15:29Z

I've pushed the test to: #432
Still, I'm interested why not just use simple protocol for non prepared statements... this is unrelated to the issue at hand.

vlsi · 2015-11-25T09:17:24Z

I'm interested why not just use simple protocol for non prepared statements.

The smaller the code base, the smaller number of bugs we have.

vlsi · 2015-11-25T09:31:06Z

@zapov , you mean that when running simple query, postgresql supports semicolon-separated compound queries and never hangs, don't you?
If that is true, it makes sense to skip query split at jdbc side.

Does it require a specific backend version?

zapov · 2015-11-25T09:33:35Z

Yes, that is correct. No, it should work on all versions.

vlsi · 2015-11-25T09:34:41Z

Are you sure lots of select 1; select 1; select 1; select 1; select 1; select 1; ... (or some other repeated command) do not reproduce the issue with current implementation?

vlsi · 2015-11-25T09:37:56Z

Yes, that is correct. No it should work on all versions.

Then something like #365 might be worth including, however I do not like the way you hijack pendingExecuteQueue https://github.com/pgjdbc/pgjdbc/pull/365/files#diff-a6603b20557a46ead4a33cbf18159d34R1869 and pendingDescribePortalQueue: https://github.com/pgjdbc/pgjdbc/pull/365/files#diff-a6603b20557a46ead4a33cbf18159d34R2013

Can you please retry #365 without adding SimpleQuery field? (https://github.com/pgjdbc/pgjdbc/pull/365/files#diff-a6603b20557a46ead4a33cbf18159d34R2302)

zapov · 2015-11-25T09:39:53Z

Yes.
Although, select 1; select 1 doesn't really work as you might expect. It returns only the last query. But that's known and expected behavior.

I opened this issue for discussion if it's a good idea to use simple protocol for non prepared queries and extended protocol for prepared queries.

My previous PR was as such so I don't introduce any changes to the JDBC, but that's why it was a bit hacky. Can redo it if we are settled on the proposed behavior

vlsi · 2015-11-25T11:14:09Z

Although, select 1; select 1 doesn't really work as you might expect. It returns only the last query

Does it hang?
I wonder what makes your sql script special in terms of hanging.

I opened this issue for discussion if it's a good idea to use simple protocol for non prepared queries and extended protocol for prepared queries.

I see two aspects:

functional.

It seems "simple protocol" somehow solves "hanging on large SQLs" issue.
That is good provided that behavior is documented.

There might be similar issue with PreparedStatements, so we might need to solve that issue with extended protocol anyway.

non-functional.

Issue	Simple protocol	Extended protocol
Split query at jdbc side	+ not required	- required
Network roundtrips	+ probably, less	- probably, more
Send full sql on each execution	- have to	+ can send just statement name
Reuse execution plan	- not able	+ able to reuse plan
Memory consumption at backend	- probably higher	+ probably less

So, it makes sense to benchmark the impact of using simple protocol for non-prepared statements.

zapov · 2015-11-30T01:36:46Z

So I took a look at this more closely over the weekend... and came to the conclusion that as things are currently implemented, it doesn't seem worth changing non-prepared statements to simple protocol ;(

Not sure where to go from here...
The workaround Dave proposed (using v2 of the protocol - which uses simple protocol) has other issues. On 9.4 it throws warning that about autocommit (even since we don't specify anything related) and 9.5 throws error:

9.4 ERROR: SET AUTOCOMMIT TO OFF is no longer supported
9.5 ERROR: unrecognized configuration parameter "autocommit"

which I guess is kind-of ok since v2 is probably going to be deprecated.

I actually implemented change in the driver... and almost all of the test pass - only few issues with BEGIN/transactions behavior which I didn't look too closely ATM.

So now I'm thinking about using public API outside of javax interfaces to get such a behavior, but it seems that it all ends up in the same place... so not too much different than my implementation of simple protocol for non-prepared statements.

Regarding benches... I actually maintain one DB driver bench where JDBC is tested explicitly and implicitly (and I've noticed few performance regressions on some places in last few versions).

I'll probably take a look at v2 again next.

btw. I think it would be better to refactor the code in such a way that I can introduce such change without modifications of the current codebase - but that's unrelated discussion.

vlsi · 2015-11-30T05:11:47Z

and came to the conclusion that as things are currently implemented, it doesn't seem worth changing non-prepared statements to simple protocol

what is the showstopper there?

zapov · 2015-12-02T11:36:59Z

There is no real showstopper ... as in it can be refactored... just it looks too costly and risky.

I still think thats the best design (simple protocol for non-prepared queries, extended for prepared), but I couldn't change the current codebase cleanly. So I'll look into what can be done with hanging (either by calculating the whole query size before.... or something else). I tried to reproduce the bug with smaller test (which builds up the large query) but couldn't.

Issues I had while refactoring:

changing the processing loop for v3 to support SP felt like a hack. It would be more clean to have separate methods for extended/simple protocol processing loop, but that would feel like code duplication - although I think a worth one
the relationship between query executor, statements and connection didn't feel right. I wanted to use: https://github.com/pgjdbc/pgjdbc/blob/master/org/postgresql/jdbc2/AbstractJdbc2Connection.java#L372 but it goes through the statement execute instead of executing query on the executor.
statement executeWithFlags looks has too many assumptions... and I needed to split it... but this would result in many duplications

probably some others I can't remember just now.
Anyway... when I get to do a performance optimizations of one of my libraries which uses JDBC... i'll probably revisit SP changes. Until then it looks like it's easier to resolve hanging issues on the extended protocol.

zapov · 2015-12-08T23:48:35Z

So I found some time to finish up this CR. It can be seen in PR: #450

I maintain DB driver bench at: https://github.com/ngs-doo/dal-benchmark
and here are some numbers from it

test type	JDBC 1202	JDBC latest binary	JDBC latest simple	Revenj 1202	Revenj latest binary	Revenj latest simple
bulk_insert	900	813	798	146	159	176
bulk_update	796	864	898	224	237	244
loop_insert_half	4593	4629	4556	4636	4788	5007
loop_update_half	4622	4617	4507	4789	5235	4859
search_all	767	685	743	1753	1699	1674
search_subset	556	547	560	878	899	929
find_many	463	469	453	599	960	546
find_one	428	433	413	866	1370	723
report	645	665	652	602	664	676

While the numbers are not stable, on average they show if there are some regressions. Only noticeable regression is for Revenj find_many/find_one which uses simple protocol. It seems it became a lot slower on binary protocol, but it's fast again when simple queries is enabled.

All JDBC queries are using prepared statements.
Actual queries are vastly different so they lead to different numbers in some queries.

vlsi · 2015-12-09T06:33:00Z

Only noticeable regression is for Revenj find_many/find_one which uses simple protocol. It seems it became a lot slower on binary protocol

@zapov , did you investigate why is the regression?
Have you considered using jmh as a benchmark harness?

zapov · 2015-12-09T10:20:09Z

I looked into it now.
It seems I changed few things so there is no real regression.
The issue is that I run the old test without prepareThreshold=-1 and the new with that argument.
Reran it now and i get 9s (for large number of runs) across the versions without PT arg. My patched version with simple queries enabled (and without PT) runs at 7.5s.
With PT=-1 forcebinary is set to true which results in every simple query being prepared before executing. This is the cause of regression, not changes in driver.
It runs around 20+s with PT=-1

The bench is used for comparison with .NET so I need to gather formatted output somehow, so it's not an ideal candidate for JMH.

Thinking about it some more, even if we enable simple queries by default (in my patch) it will still be much slower due to PT=-1 argument. It seems to my that SQ should take precedence over PT=-1 for non-prepared statements.

vlsi · 2015-12-09T12:04:12Z

@zapov , it would be nice to have less tunable parameters as no one knows what values to set.

If "text query always improved performance", it would make sense to enable the feature by default.

Am I right your main concern is "thread stuck due to non-consumed input"?
Did you look into "using separate thread" approach to handle that?

zapov · 2015-12-09T12:20:28Z

I agree that there should be less parameters which goes in hand if non-prepared statements use simple queries. I'm fine with refactoring into that if we agree on that behavior.

Also I think PT=-1 should not force non-prepared statements into prepared statements. Not sure if forceBinary should do that, but explicit forceBinary is much better than PT=-1. I don't think PT parameter should have any influence on non-prepared statements.

If you look into the numbers, it's interesting to notice the difference between search_all numbers.
Prepared statements with binary protocol is 3x faster than prepared statements with text protocol, so it's not only about prepared statements/non-prepared statements. For simple queries (eg find part) overhead of non-prepared statements is not really large, so yeah, I would say using simple queries on non-prepared statements mostly improves performance.

I think prepared statements/non-prepared statements map naturally to Postgres v3 protocol extended/simple queries. And plus it will solves my main problem of having large sql query hang.

Not sure I'm understanding you fully, but yeah, Java thread get's stuck on sending data over the wire. Even if I start another thread to look onto the one executing the commands, it doesn't really solve my problem. The only thing I can do is kill that thread - which means I still can't apply that SQL to the server. And the simple solution to that problem is using simple queries which include length as part of the command, so Postgres copes with it.

vlsi · 2015-12-09T12:40:55Z

forceBinary

binary != prepared.

search_all numbers... Prepared statements with binary protocol is 3x faster than prepared statements with text protocol

Can you point to the code what is "prepared statements with text protocol"?
I do not quite understand what do you mean by "text protocol".

vlsi · 2015-12-09T12:46:35Z

Not sure I'm understanding you fully, but yeah, Java thread get's stuck on sending data over the wire.

I believe the thread gets stuck just because there is no thread that performs reads while the thread does its writes of the large query.

Here's description:

pgjdbc/org/postgresql/core/v3/QueryExecutorImpl.java

Line 207 in a194c5e

// Deadlock avoidance:

Here's a workaround for batch executions:

pgjdbc/org/postgresql/core/v3/QueryExecutorImpl.java

Line 1137 in a194c5e

flushIfDeadlockRisk(subquery, disallowBatching, trackingHandler, flags);

I wonder what if we add dedicated thread pool to perform reads, so no deadlock happens.
Well, that opens quite a new can of worms, but it might be interesting if that would solve those deadlocks.

vlsi · 2015-12-09T12:52:28Z

For simple queries (eg find part) overhead of non-prepared statements is not really large, so yeah, I would say using simple queries on non-prepared statements mostly improves performance.

Did you mean "using text queries on non-prepared statements mostly improves performance"?
It does confuse me when you say "simple" as there's already org.postgresql.core.v3.SimpleQuery.

zapov · 2015-12-09T13:32:08Z

When I say simple queries I'm referencing Postgres terminology. On v3 Postgres supports simple and extended queries.

I'm referring to this:

pgjdbc/org/postgresql/jdbc2/AbstractJdbc2Connection.java

Line 149 in 0e5ba31

forcebinary = true;

PT=-1 caused forceBinary to be enabled.

Postgres supports both binary and text protocol.
Extended queries support both.
Simple queries support only text protocol.

Ah I see now what you mean (reading using another thread). Did not try that.

vlsi · 2015-12-09T13:36:20Z

PT=-1 caused forceBinary to be enabled. I'm referring to this:

I wish I never saw that. That looks strange indeed.
So, the true way to use "always prepare" is PT=-42, isn't it?

zapov · 2015-12-10T06:40:58Z

So I looked at this today again... and managed to get it working... although it required a separate thread for flushing the output stream and buffering the entire stream in memory before sending.
I experimented with processing messages if they were available while looping through the subqueries collection, but it just kept hanging at the different place depending on the size of the output buffer.
I tried reading responses in separate thread, but it corrupted the message somehow since I got an error from Postgres about invalid message.

Also, I noticed strange behavior in this construct:

pgjdbc/org/postgresql/core/v3/QueryExecutorImpl.java

Line 1104 in a194c5e

sendSync();

since if I understand that correctly, that will actually commit transaction if queries are run in autocommit mode.

Right now the easiest solution for my problem is to get an instance of PGStream, but it doesn't seem available anywhere.
Any objections to exposing it in QueryExecutor?

vlsi · 2015-12-10T07:41:51Z

I tried reading responses in separate thread, but it corrupted the message somehow since I got an error from Postgres about invalid message

Can you share the code? I thought offloading "read path" to a separate thread should be the easiest approach.

Right now the easiest solution for my problem is to get an instance of PGStream, but it doesn't seem available anywhere. Any objections to exposing it in QueryExecutor?

Do you mean QueryExecutor.getPGStream?
I do not think that would be good. However, I do not see where are you heading, so it is hard to judge.

zapov · 2015-12-10T07:52:50Z

Sorry, don't have it around anymore. Only committed the final version ;(

Yeah, thats what I mean. I wanted an escape hatch so I can handle it myself without introducing additional complexity into the driver.

vlsi · 2015-12-10T07:55:12Z

I wanted an escape hatch so I can handle it myself without introducing additional complexity into the driver

If that works from functional point of view, we can then refactor it if required.

zapov · 2015-12-10T09:07:33Z

Yeah, it worked. If you are interested how that looked: ngs-doo/dsl-compiler-client@9195a04

Btw. it would be nice if there was a way to get connection without a cast, but not really a big issue.
Let me know if you want me to make a PR for it.

vlsi · 2015-12-10T09:11:56Z

Ah, you mean "get PGStream to the application so it could execute simple query"? Then it sounds very bad.
I'm not against using "simple query" (Q...) for .createStatement.

It thought you need PGStream for multithreaded code to work.

zapov · 2015-12-10T10:25:20Z

Ok. Tnx for your help. I guess I'll find some other workaround

vlsi · 2015-12-10T10:42:25Z

I guess as @davecramer returns back, something like #450 can be merged.

zapov closed this as completed Dec 10, 2015

polobo mentioned this issue Apr 15, 2016

feat: Support for Simple queries on v3 protocol. #450

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Postgres hanging on large DDL commands #431

Postgres hanging on large DDL commands #431

zapov commented Nov 25, 2015

vlsi commented Nov 25, 2015

zapov commented Nov 25, 2015

vlsi commented Nov 25, 2015

zapov commented Nov 25, 2015

vlsi commented Nov 25, 2015

vlsi commented Nov 25, 2015

zapov commented Nov 25, 2015

vlsi commented Nov 25, 2015

vlsi commented Nov 25, 2015

zapov commented Nov 25, 2015

vlsi commented Nov 25, 2015

zapov commented Nov 30, 2015

vlsi commented Nov 30, 2015

zapov commented Dec 2, 2015

zapov commented Dec 8, 2015

vlsi commented Dec 9, 2015

zapov commented Dec 9, 2015

vlsi commented Dec 9, 2015

zapov commented Dec 9, 2015

vlsi commented Dec 9, 2015

vlsi commented Dec 9, 2015

vlsi commented Dec 9, 2015

zapov commented Dec 9, 2015

vlsi commented Dec 9, 2015

zapov commented Dec 10, 2015

vlsi commented Dec 10, 2015

zapov commented Dec 10, 2015

vlsi commented Dec 10, 2015

zapov commented Dec 10, 2015

vlsi commented Dec 10, 2015

zapov commented Dec 10, 2015

vlsi commented Dec 10, 2015

Postgres hanging on large DDL commands #431

Postgres hanging on large DDL commands #431

Comments

zapov commented Nov 25, 2015

vlsi commented Nov 25, 2015

zapov commented Nov 25, 2015

vlsi commented Nov 25, 2015

zapov commented Nov 25, 2015

vlsi commented Nov 25, 2015

vlsi commented Nov 25, 2015

zapov commented Nov 25, 2015

vlsi commented Nov 25, 2015

vlsi commented Nov 25, 2015

zapov commented Nov 25, 2015

vlsi commented Nov 25, 2015

zapov commented Nov 30, 2015

vlsi commented Nov 30, 2015

zapov commented Dec 2, 2015

zapov commented Dec 8, 2015

vlsi commented Dec 9, 2015

zapov commented Dec 9, 2015

vlsi commented Dec 9, 2015

zapov commented Dec 9, 2015

vlsi commented Dec 9, 2015

vlsi commented Dec 9, 2015

vlsi commented Dec 9, 2015

zapov commented Dec 9, 2015

vlsi commented Dec 9, 2015

zapov commented Dec 10, 2015

vlsi commented Dec 10, 2015

zapov commented Dec 10, 2015

vlsi commented Dec 10, 2015

zapov commented Dec 10, 2015

vlsi commented Dec 10, 2015

zapov commented Dec 10, 2015

vlsi commented Dec 10, 2015