Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Postgres hanging on large DDL commands #431

Closed
zapov opened this issue Nov 25, 2015 · 32 comments
Closed

Postgres hanging on large DDL commands #431

zapov opened this issue Nov 25, 2015 · 32 comments

Comments

@zapov
Copy link
Contributor

zapov commented Nov 25, 2015

I created a PR while ago: #365 because I had a problem with executing SQL scripts via JDBC. Recently Postgres started hanging while executing those scripts.
This happens on Windows/Linux and is reproducible with those scripts. I experimented with changing buffer sizes (send/receive) but it's not really a solution (while some scripts pass, other fails).

Looking at the JDBC code it looks like it splits those large sql statements into smaller parts and tries to send them using extended protocol. It looks like Postgres doesn't cope well with that... so first obvious solution looked like just send it using simple protocol (which includes sql length - so there would be no issues).

I still think that's far better solution than to rewrite that query into smaller parts.
Is there any reason I'm missing why we don't use simple protocol for non prepared statements. Most queries are executed with prepared statements (to pass in the parameters)... so I can't think of a good reason not to use simple v3 protocol in that case.

Executing those scripts using v2 protocol hangs also.

Regards,
Rikard

@vlsi
Copy link
Member

vlsi commented Nov 25, 2015

Can you please provide a test case that exhibits your problem?

@zapov
Copy link
Contributor Author

zapov commented Nov 25, 2015

I've put the link in the PR back up again. And there is a simple code snippet which reproduces the issue. Or do you want a PR with specific test?
Because we create a lot of different scripts.... and that particular script worked on v2 protocol, but some others didn't.

I wanted to discuss that before submitting tests, but I can submit the test if that's what you need.

@vlsi
Copy link
Member

vlsi commented Nov 25, 2015

Or do you want a PR with specific test?

Exactly.
I would prefer if you would address the issue as well, however having a PR that reproduces the issue is like having a half of the solution.

@zapov
Copy link
Contributor Author

zapov commented Nov 25, 2015

I've pushed the test to: #432
Still, I'm interested why not just use simple protocol for non prepared statements... this is unrelated to the issue at hand.

@vlsi
Copy link
Member

vlsi commented Nov 25, 2015

I'm interested why not just use simple protocol for non prepared statements.

The smaller the code base, the smaller number of bugs we have.

@vlsi
Copy link
Member

vlsi commented Nov 25, 2015

@zapov , you mean that when running simple query, postgresql supports semicolon-separated compound queries and never hangs, don't you?
If that is true, it makes sense to skip query split at jdbc side.

Does it require a specific backend version?

@zapov
Copy link
Contributor Author

zapov commented Nov 25, 2015

Yes, that is correct. No, it should work on all versions.

@vlsi
Copy link
Member

vlsi commented Nov 25, 2015

Are you sure lots of select 1; select 1; select 1; select 1; select 1; select 1; ... (or some other repeated command) do not reproduce the issue with current implementation?

@vlsi
Copy link
Member

vlsi commented Nov 25, 2015

Yes, that is correct. No it should work on all versions.

Then something like #365 might be worth including, however I do not like the way you hijack pendingExecuteQueue https://github.com/pgjdbc/pgjdbc/pull/365/files#diff-a6603b20557a46ead4a33cbf18159d34R1869 and pendingDescribePortalQueue: https://github.com/pgjdbc/pgjdbc/pull/365/files#diff-a6603b20557a46ead4a33cbf18159d34R2013

Can you please retry #365 without adding SimpleQuery field? (https://github.com/pgjdbc/pgjdbc/pull/365/files#diff-a6603b20557a46ead4a33cbf18159d34R2302)

@zapov
Copy link
Contributor Author

zapov commented Nov 25, 2015

Yes.
Although, select 1; select 1 doesn't really work as you might expect. It returns only the last query. But that's known and expected behavior.

I opened this issue for discussion if it's a good idea to use simple protocol for non prepared queries and extended protocol for prepared queries.

My previous PR was as such so I don't introduce any changes to the JDBC, but that's why it was a bit hacky. Can redo it if we are settled on the proposed behavior

@vlsi
Copy link
Member

vlsi commented Nov 25, 2015

Although, select 1; select 1 doesn't really work as you might expect. It returns only the last query

Does it hang?
I wonder what makes your sql script special in terms of hanging.

I opened this issue for discussion if it's a good idea to use simple protocol for non prepared queries and extended protocol for prepared queries.

I see two aspects:

  1. functional.

It seems "simple protocol" somehow solves "hanging on large SQLs" issue.
That is good provided that behavior is documented.

There might be similar issue with PreparedStatements, so we might need to solve that issue with extended protocol anyway.

  1. non-functional.
Issue Simple protocol Extended protocol
Split query at jdbc side + not required - required
Network roundtrips + probably, less - probably, more
Send full sql on each execution - have to + can send just statement name
Reuse execution plan - not able + able to reuse plan
Memory consumption at backend - probably higher + probably less

So, it makes sense to benchmark the impact of using simple protocol for non-prepared statements.

@zapov
Copy link
Contributor Author

zapov commented Nov 30, 2015

So I took a look at this more closely over the weekend... and came to the conclusion that as things are currently implemented, it doesn't seem worth changing non-prepared statements to simple protocol ;(

Not sure where to go from here...
The workaround Dave proposed (using v2 of the protocol - which uses simple protocol) has other issues. On 9.4 it throws warning that about autocommit (even since we don't specify anything related) and 9.5 throws error:

  • 9.4 ERROR: SET AUTOCOMMIT TO OFF is no longer supported
  • 9.5 ERROR: unrecognized configuration parameter "autocommit"

which I guess is kind-of ok since v2 is probably going to be deprecated.

I actually implemented change in the driver... and almost all of the test pass - only few issues with BEGIN/transactions behavior which I didn't look too closely ATM.

So now I'm thinking about using public API outside of javax interfaces to get such a behavior, but it seems that it all ends up in the same place... so not too much different than my implementation of simple protocol for non-prepared statements.

Regarding benches... I actually maintain one DB driver bench where JDBC is tested explicitly and implicitly (and I've noticed few performance regressions on some places in last few versions).

I'll probably take a look at v2 again next.

btw. I think it would be better to refactor the code in such a way that I can introduce such change without modifications of the current codebase - but that's unrelated discussion.

@vlsi
Copy link
Member

vlsi commented Nov 30, 2015

and came to the conclusion that as things are currently implemented, it doesn't seem worth changing non-prepared statements to simple protocol

what is the showstopper there?

@zapov
Copy link
Contributor Author

zapov commented Dec 2, 2015

There is no real showstopper ... as in it can be refactored... just it looks too costly and risky.

I still think thats the best design (simple protocol for non-prepared queries, extended for prepared), but I couldn't change the current codebase cleanly. So I'll look into what can be done with hanging (either by calculating the whole query size before.... or something else). I tried to reproduce the bug with smaller test (which builds up the large query) but couldn't.

Issues I had while refactoring:

  • changing the processing loop for v3 to support SP felt like a hack. It would be more clean to have separate methods for extended/simple protocol processing loop, but that would feel like code duplication - although I think a worth one
  • the relationship between query executor, statements and connection didn't feel right. I wanted to use: https://github.com/pgjdbc/pgjdbc/blob/master/org/postgresql/jdbc2/AbstractJdbc2Connection.java#L372 but it goes through the statement execute instead of executing query on the executor.
  • statement executeWithFlags looks has too many assumptions... and I needed to split it... but this would result in many duplications

probably some others I can't remember just now.
Anyway... when I get to do a performance optimizations of one of my libraries which uses JDBC... i'll probably revisit SP changes. Until then it looks like it's easier to resolve hanging issues on the extended protocol.

@zapov
Copy link
Contributor Author

zapov commented Dec 8, 2015

So I found some time to finish up this CR. It can be seen in PR: #450

I maintain DB driver bench at: https://github.com/ngs-doo/dal-benchmark
and here are some numbers from it

test type JDBC 1202 JDBC latest binary JDBC latest simple Revenj 1202 Revenj latest binary Revenj latest simple
bulk_insert 900 813 798 146 159 176
bulk_update 796 864 898 224 237 244
loop_insert_half 4593 4629 4556 4636 4788 5007
loop_update_half 4622 4617 4507 4789 5235 4859
search_all 767 685 743 1753 1699 1674
search_subset 556 547 560 878 899 929
find_many 463 469 453 599 960 546
find_one 428 433 413 866 1370 723
report 645 665 652 602 664 676

While the numbers are not stable, on average they show if there are some regressions. Only noticeable regression is for Revenj find_many/find_one which uses simple protocol. It seems it became a lot slower on binary protocol, but it's fast again when simple queries is enabled.

All JDBC queries are using prepared statements.
Actual queries are vastly different so they lead to different numbers in some queries.

@vlsi
Copy link
Member

vlsi commented Dec 9, 2015

Only noticeable regression is for Revenj find_many/find_one which uses simple protocol. It seems it became a lot slower on binary protocol

@zapov , did you investigate why is the regression?
Have you considered using jmh as a benchmark harness?

@zapov
Copy link
Contributor Author

zapov commented Dec 9, 2015

I looked into it now.
It seems I changed few things so there is no real regression.
The issue is that I run the old test without prepareThreshold=-1 and the new with that argument.
Reran it now and i get 9s (for large number of runs) across the versions without PT arg. My patched version with simple queries enabled (and without PT) runs at 7.5s.
With PT=-1 forcebinary is set to true which results in every simple query being prepared before executing. This is the cause of regression, not changes in driver.
It runs around 20+s with PT=-1

The bench is used for comparison with .NET so I need to gather formatted output somehow, so it's not an ideal candidate for JMH.

Thinking about it some more, even if we enable simple queries by default (in my patch) it will still be much slower due to PT=-1 argument. It seems to my that SQ should take precedence over PT=-1 for non-prepared statements.

@vlsi
Copy link
Member

vlsi commented Dec 9, 2015

@zapov , it would be nice to have less tunable parameters as no one knows what values to set.

If "text query always improved performance", it would make sense to enable the feature by default.

Am I right your main concern is "thread stuck due to non-consumed input"?
Did you look into "using separate thread" approach to handle that?

@zapov
Copy link
Contributor Author

zapov commented Dec 9, 2015

I agree that there should be less parameters which goes in hand if non-prepared statements use simple queries. I'm fine with refactoring into that if we agree on that behavior.

Also I think PT=-1 should not force non-prepared statements into prepared statements. Not sure if forceBinary should do that, but explicit forceBinary is much better than PT=-1. I don't think PT parameter should have any influence on non-prepared statements.

If you look into the numbers, it's interesting to notice the difference between search_all numbers.
Prepared statements with binary protocol is 3x faster than prepared statements with text protocol, so it's not only about prepared statements/non-prepared statements. For simple queries (eg find part) overhead of non-prepared statements is not really large, so yeah, I would say using simple queries on non-prepared statements mostly improves performance.

I think prepared statements/non-prepared statements map naturally to Postgres v3 protocol extended/simple queries. And plus it will solves my main problem of having large sql query hang.

Not sure I'm understanding you fully, but yeah, Java thread get's stuck on sending data over the wire. Even if I start another thread to look onto the one executing the commands, it doesn't really solve my problem. The only thing I can do is kill that thread - which means I still can't apply that SQL to the server. And the simple solution to that problem is using simple queries which include length as part of the command, so Postgres copes with it.

@vlsi
Copy link
Member

vlsi commented Dec 9, 2015

forceBinary

binary != prepared.

search_all numbers... Prepared statements with binary protocol is 3x faster than prepared statements with text protocol

Can you point to the code what is "prepared statements with text protocol"?
I do not quite understand what do you mean by "text protocol".

@vlsi
Copy link
Member

vlsi commented Dec 9, 2015

Not sure I'm understanding you fully, but yeah, Java thread get's stuck on sending data over the wire.

I believe the thread gets stuck just because there is no thread that performs reads while the thread does its writes of the large query.

Here's description:

Here's a workaround for batch executions:

flushIfDeadlockRisk(subquery, disallowBatching, trackingHandler, flags);

I wonder what if we add dedicated thread pool to perform reads, so no deadlock happens.
Well, that opens quite a new can of worms, but it might be interesting if that would solve those deadlocks.

@vlsi
Copy link
Member

vlsi commented Dec 9, 2015

For simple queries (eg find part) overhead of non-prepared statements is not really large, so yeah, I would say using simple queries on non-prepared statements mostly improves performance.

Did you mean "using text queries on non-prepared statements mostly improves performance"?
It does confuse me when you say "simple" as there's already org.postgresql.core.v3.SimpleQuery.

@zapov
Copy link
Contributor Author

zapov commented Dec 9, 2015

When I say simple queries I'm referencing Postgres terminology. On v3 Postgres supports simple and extended queries.

I'm referring to this:

PT=-1 caused forceBinary to be enabled.

Postgres supports both binary and text protocol.
Extended queries support both.
Simple queries support only text protocol.

Ah I see now what you mean (reading using another thread). Did not try that.

@vlsi
Copy link
Member

vlsi commented Dec 9, 2015

PT=-1 caused forceBinary to be enabled. I'm referring to this:

I wish I never saw that. That looks strange indeed.
So, the true way to use "always prepare" is PT=-42, isn't it?

@zapov
Copy link
Contributor Author

zapov commented Dec 10, 2015

So I looked at this today again... and managed to get it working... although it required a separate thread for flushing the output stream and buffering the entire stream in memory before sending.
I experimented with processing messages if they were available while looping through the subqueries collection, but it just kept hanging at the different place depending on the size of the output buffer.
I tried reading responses in separate thread, but it corrupted the message somehow since I got an error from Postgres about invalid message.

Also, I noticed strange behavior in this construct:

since if I understand that correctly, that will actually commit transaction if queries are run in autocommit mode.

Right now the easiest solution for my problem is to get an instance of PGStream, but it doesn't seem available anywhere.
Any objections to exposing it in QueryExecutor?

@vlsi
Copy link
Member

vlsi commented Dec 10, 2015

I tried reading responses in separate thread, but it corrupted the message somehow since I got an error from Postgres about invalid message

Can you share the code? I thought offloading "read path" to a separate thread should be the easiest approach.

Right now the easiest solution for my problem is to get an instance of PGStream, but it doesn't seem available anywhere. Any objections to exposing it in QueryExecutor?

Do you mean QueryExecutor.getPGStream?
I do not think that would be good. However, I do not see where are you heading, so it is hard to judge.

@zapov
Copy link
Contributor Author

zapov commented Dec 10, 2015

Sorry, don't have it around anymore. Only committed the final version ;(

Yeah, thats what I mean. I wanted an escape hatch so I can handle it myself without introducing additional complexity into the driver.

@vlsi
Copy link
Member

vlsi commented Dec 10, 2015

I wanted an escape hatch so I can handle it myself without introducing additional complexity into the driver

If that works from functional point of view, we can then refactor it if required.

@zapov
Copy link
Contributor Author

zapov commented Dec 10, 2015

Yeah, it worked. If you are interested how that looked: ngs-doo/dsl-compiler-client@9195a04

Btw. it would be nice if there was a way to get connection without a cast, but not really a big issue.
Let me know if you want me to make a PR for it.

@vlsi
Copy link
Member

vlsi commented Dec 10, 2015

Ah, you mean "get PGStream to the application so it could execute simple query"? Then it sounds very bad.
I'm not against using "simple query" (Q...) for .createStatement.

It thought you need PGStream for multithreaded code to work.

@zapov
Copy link
Contributor Author

zapov commented Dec 10, 2015

Ok. Tnx for your help. I guess I'll find some other workaround

@zapov zapov closed this as completed Dec 10, 2015
@vlsi
Copy link
Member

vlsi commented Dec 10, 2015

I guess as @davecramer returns back, something like #450 can be merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants