
No closing connections mode #1845

Closed
wants to merge 22 commits into from

Conversation

zcolleen

Hi @jackc!

This is the long story of a performance issue with PgBouncer.

In high-load systems there are many context cancellations. When a context is canceled, pgx closes the connection asynchronously, and this is where the PgBouncer performance issue comes in. When there are a lot of connection closings, the bouncer spends so much effort serving them that it cannot keep up with the rest of the load, and the whole system starts failing. This issue has been discussed many times but is still not resolved, so, unfortunately, it is impossible to use pgxpool under high load with the bouncer when contexts carry deadlines.

In our company we solved this issue with a fork of pgx v4 (https://github.com/ozontech/pgx/tree/ozontech-patch), but v4 is quite old, so I decided to port the patch to v5 and bring it to you.

This issue is very hard to reproduce locally, as it only shows up under high load, but at least I made this example: https://github.com/zcolleen/pgx_test . I started Postgres behind the bouncer and ran the program. I chose a timeout at which roughly 20 percent of traffic would see context cancellations (I made a kind of "metric": the /failed/deadline handler shows the percentage of cancellations). As time passes the percentage grows, because the bouncer cannot cope with this type of load, so by the end it is about 90. Then I switched pgx to my fork, and with it the percentage of cancellations does not grow and stays where it started.

The patch is quite simple. When a context is canceled, we return the error to the user but do not close the connection. Instead, we asynchronously clean the socket of the data left over from the previous request and simply return the connection to the pool. As practice shows, this is much easier on PgBouncer than closing connections, so even with a lot of context cancellations the whole system doesn't fail.
I added a new config option that enables this mechanism; by default the behaviour is unchanged.

I hope you'll have some time to take a look at this PR and give some feedback. Thanks in advance!

P.S. This mechanism is not something new; database/sql does much the same thing.

@jackc
Owner

jackc commented Dec 23, 2023

Ugg... I had a long reply written and somehow, going back and forth between the "Conversation" and "Files Changed" tabs, I lost it. Sorry that this reply might be a bit terse.


pgx used to work this way years ago. It led to a lot of edge cases that were very difficult to be sure were reliably handled. They were also almost impossible to test. In addition, any context cancellation can cause an unrecoverable situation (e.g. interrupted write over TLS is fatal).

The current behavior is intentional. I agree it would be better if a context cancellation didn't close the underlying connection, but it is harder than it looks. I don't see any obvious bugs with the PR, but this is not a "no obvious bugs" feature -- it is an "obviously no bugs" feature.

I'm not actually sure what that would look like though -- which is why I switched to closing connections instead of improving the recover connections on context cancellation code that existed.

My concerns are more general than specific, but I do have a little concrete feedback regarding the PR:

  • It seems to put behavior that belongs in the pool in the connection (e.g. sessionResetQuery seems to be a pool level concern).
  • It seems there should be a way to not increase the public interface as much (or not at all).
  • I'm not sure how I feel about this being a mode. While it does preserve existing behavior and should mean that any bugs can be avoided by not using this mode -- it also kind of doubles the surface area for context cancellation because there are two paths. Ideally, there would be only one path.

But as I said above, I don't really have any better ideas either...

@zcolleen force-pushed the master branch 4 times, most recently from d60b138 to 8e99ffb on December 25, 2023
@zcolleen
Author

zcolleen commented Jan 2, 2024

@jackc thank you for your feedback! I left only one public function, LaunchCleanup, which does all the cleanup work. I removed the cleanup on commit/rollback of a transaction with an already-canceled context: it is a corner case, happens very rarely in practice, and is hard to handle without over-expanding the public interface. I have also added tests for the new functionality.

About the option concern: I think the only way here is to leave this as an option, and if it becomes widely used we could make it the default behaviour. As you said, there will be two paths, but we need them both: one with recovery for PgBouncer and one with closings, for example for working with TLS. Also, I don't think we want to break everything again right now by switching the default back to recovery.

We need the cleanup only for PgBouncer, since it is single-threaded and not very performant, but unfortunately it is already an industry standard. That's why I hope we can somehow solve this issue, as a lot of high-load systems need this fix.

@jackc
Owner

jackc commented Jan 13, 2024

I'm still not convinced this is the right approach. And while, unfortunately, I still do not have a complete picture of what the right approach is, my thoughts have coalesced a little bit.

I don't think that this should be a mode. If it is correct and reliable it would always be preferable to the current close on error behavior. The only reason for a mode is to limit the damage if there was a bug. This change would be backwards compatible. It is and will always be possible for a context cancellation to break the connection (e.g. an interrupted write over TLS) and it is and will always be possible for a context cancellation to not break the connection (e.g. the context is cancelled before a query even starts). So calling code always needs to handle an interrupted query resulting in a broken connection or the connection still being usable. This change would simply increase the fraction of non-broken connections.

I'd like to approach this primarily from the point of view of pgconn not of pgxpool. There may need to be a hook for pgxpool to determine the state of a released connection but I don't think it should need anything more than that. And ideally, someone using pgconn directly would never need to explicitly call any recovery logic.

I think we would want to implement this cancellation recovery logic at only very specific places where we can be sure of the state of the connection and how to clean it up. For example, only while waiting / reading results from a Query call. In that case we know for sure how to recover from the interruption -- read until ReadyForQuery. This background read would happen in place of asyncClose(). It could wait for X amount of time then fallback to closing the connection. I would envision updating PgConn.lock() to block while the recovery was in progress. Possibly a WaitForResync() would let the application or pool track this more closely.

I think this is as simple as we could get, but yet it still has complicated edge cases. In many cases it would be best to call CancelRequest() to free up resources on the server and to get the connection usable as soon as possible. But great care needs to be taken to avoid race conditions such as the cancel signal meant for query N actually cancelling query N+1. Also, since it requires an entirely new TCP connection, it may be desirable to wait N milliseconds to see if the query would finish naturally.

Exec, SendBatch, pipeline mode, and other cancellable methods could be implemented later if it could be figured out how to handle it reliably.


Or it may even be desirable to make the cancellation recovery process entirely customizable via an application-provided function. I'm not exactly sure what that would look like, but I am pretty sure it has been requested at some point in the past.

@zcolleen
Author

@jackc thank you for your feedback and explanations! I have reworked this PR and put all the recovery logic inside pgconn. The recovery lives in ResultReader and MultiResultReader, at the points where we try to read new messages (in fact we could put recovery everywhere we read data, since, as far as I know, every Postgres command ends with a ReadyForQuery message; but to start with it is in ResultReader and MultiResultReader only).

I wasn't able to make the whole recovery process customizable, but I did make the cancel-request part customizable. A cancel request is sent by default, but there is an option to supply a custom OnRecover function or disable it completely. For quick queries it is likely that the query has executed and the response has been sent before the cancel request goes out, so the request would just waste resources, which is why an option to disable it is useful. I think I remember an issue where making the cancellation recovery process customizable was requested precisely because of cancel requests: someone wanted to disable them.

P.S. Unfortunately, I didn't understand the last part of your feedback ("Exec, SendBatch, pipeline mode, and other cancellable methods could be implemented later"), since in pgconn we only have the Exec methods, so we can implement recovery only there. Could you please explain this part?

@zcolleen
Author

zcolleen commented Feb 3, 2024

Sorry, I just hadn't merged master. I fixed the build failure and verified it here: https://github.com/zcolleen/pgx/actions/runs/7769417704 . Now everything should be fine with the tests.

@jackc
Owner

jackc commented Feb 3, 2024

I looked over the code and it is an improvement to have the logic in pgconn. But as I mentioned in a previous reply, I'm still unconvinced this is the correct overall approach.

This was bringing back vague memories so I looked back in the previous versions. We had something like this in v3. There was async cancel query and Conn.WaitUntilReady(). But concurrency is really hard to reason about. That's why it was removed. It adds a large number of edge cases. You clearly have found and thought about a number of them. Methods like TxStatus now have to concern themselves with the background recovery, pgConn.status is now accessed atomically, there's also some conditional logic based on whether the connection was in a transaction, etc. But I have no idea if all of them are covered.

I don't see anything specifically wrong. But I also am having difficulty fully understanding how it works. Given my previous experience with finding it difficult to reason about a solution to this problem I am really nervous about adding the feature.


Maybe if we step back a bit there is another approach that will solve your underlying issue with PgBouncer and context cancellation.

I think one of the fundamental issues is there are multiple valid ways an application may want a context cancellation to be handled.

One is to unblock the Go code right now. Another is to ask the PostgreSQL server to cancel the query but do not unblock the Go code. Another approach is to wait a short time to see if the query will complete naturally, then send a cancel request, then wait a while longer, then finally interrupt the connection if the server-side cancellation hasn't finished.

Right now, pgx takes the first approach exclusively -- unblock the Go code at all costs. It uses SetDeadline and interrupts network I/O. However, as we've seen, recovering from an arbitrary network interruption is very difficult.

I think that recovering from arbitrary network interruption is too complex. However, I think we should be able to get the desired functionality without recovering from network interruption.

Perhaps we should consider exposing a function or object that allows the calling application to control what happens when the context is canceled. It could wait, send a cancellation request, or close the connection when and as it saw fit.

That should allow directly handling all cases except where the caller wants to immediately unblock the Go code and wants to preserve the connection. But that can be handled by the caller using a goroutine or wrapper object.

Another advantage of this approach is that alternative means for query cancellation could be used. e.g. another connection could be used to call pg_cancel_backend. That could drastically reduce the load of establishing the ephemeral connections to cancel requests. (This may not be usable with PgBouncer due to it not exposing the actual PostgreSQL PID though.)

@jackc
Owner

jackc commented Feb 4, 2024

Here's a proof of concept of the approach I described: #1894.

It doesn't prevent adding the ability to recover from a deadline interrupting a network read, but I think it makes that unnecessary.

@zcolleen
Author

zcolleen commented Feb 4, 2024

Okay, now I get your idea. But for me it has some disadvantages:

  1. If I simply disable HandleCancel or define it with other functionality, we won't get a read error, and therefore we won't get a context cancellation error. To me that is strange behaviour for the library: if the context is canceled we should return an error, and there should be no logical coupling between the handler and whether an error is returned.
  2. If I disable HandleCancel (which I will do in the PgBouncer case), I will have to write a lot of boilerplate with a goroutine and context checking, and everybody with the same problem will have to do the same. Moreover, this code will perform worse, since a goroutine is started on every request rather than only when recovery kicks in.
  3. It brings a slight inconsistency, because this API only works when the context watcher is started; it does nothing when the context is already canceled (though this point is less important than the first two).

But if you are sure we are not going down the recovery road and not giving it a second chance, then let's move forward with your idea.

@zcolleen
Author

zcolleen commented Feb 4, 2024

In fact, I would have to rewrite the entire library to support context cancellation with the handler disabled.

@jackc
Owner

jackc commented Feb 5, 2024

If I simply disable HandleCancel or define it with other functionality, we won't get a read error, and therefore we won't get a context cancellation error. To me that is strange behaviour for the library: if the context is canceled we should return an error, and there should be no logical coupling between the handler and whether an error is returned.

I don't think this is accurate. HandleCancel doesn't change the return value of the query method.

If I disable HandleCancel (which I will do in the PgBouncer case), I will have to write a lot of boilerplate with a goroutine and context checking, and everybody with the same problem will have to do the same. Moreover, this code will perform worse, since a goroutine is started on every request rather than only when recovery kicks in.

The default HandleCancel implementation has no changes in behavior from existing. It sets a deadline when the context is canceled and clears it at the end of the query method -- exactly as the current logic does.

It brings a slight inconsistency, because this API only works when the context watcher is started; it does nothing when the context is already canceled (though this point is less important than the first two).

It's true that this only happens when the context is canceled during the query, not when it is canceled before the query, but again, that is the same behavior that exists now. There is no need for canceling a query or recovering from an interruption because the query is never sent.

But if you are sure we are not going down the recovery road and not giving it a second chance, then let's move forward with your idea.

I am leaning against adding recovery logic at this time. But the proposed HandleCancel logic doesn't directly affect that. In fact, I don't think there would even be any merge conflicts between the two PRs. HandleCancel is about avoiding the connection even getting into a state that needs recovery. In the future it would still be possible to add recovery logic, though my inclination is for it to be some sort of wrapper rather than part of the core connection logic.

@zcolleen
Author

zcolleen commented Feb 5, 2024

I don't think this is accurate. HandleCancel doesn't change the return value of the query method.

I mean that if I redefine the handler, for example to set a long deadline and wait after cancellation for the query to finish naturally, then I won't get any context-canceled error.

That's why I thought of expanding the Handler interface just a little, so that HandleUnwatchAfterCancel returns an error, and of teaching Unwatch to return an error. With this we could at least get a delayed context cancellation error if we want to: see #1901 (the Unwatch error is checked only in the result reader and the multi-result reader, but it could be added everywhere else).

What do you think?

@jackc
Owner

jackc commented Feb 6, 2024

So the problem is that if a handler delays interrupting the query long enough that the query completes successfully, the query method still returns a context error even though no error should be returned?

So the problem isn't that Handler changes the returned error -- the problem is it doesn't or can't change the returned error. Do I understand correctly?

@zcolleen
Author

zcolleen commented Feb 6, 2024

The opposite. I made this short example to clarify everything: https://github.com/zcolleen/pgx_test/blob/no-error-example/main.go .

In this example no error is returned, whereas, as I see the correct behaviour of the library, there should be a context error.

@jackc
Owner

jackc commented Feb 7, 2024

Ah, I see now.

I actually don't think there should be an error. Returning the context error says the query failed because of a context cancellation. But the query did complete successfully. If the caller wants to know if the context is canceled they can check the context itself after the fact.

@zcolleen
Author

zcolleen commented Feb 7, 2024

Returning the context error says the query failed because of a context cancellation. But the query did complete successfully.

That's true on the one hand, but on the other it makes using the library very inconvenient, because the user has to check the context after every query. That's why I thought of adding an error to HandleUnwatchAfterCancel(): if you want to get an error after the context is canceled, return it (you may return any other error, not only the context-canceled one), and if you don't want an error, just return nil (the default). That's how I see this feature.

But anyway, the best outcome for me is the PR with recovery because, while it lets the query finish, it keeps two core features: control is returned to the caller immediately, and the context error is returned. Or even both of these PRs; that would be perfect.

@jackc
Owner

jackc commented Feb 9, 2024

That's true on the one hand, but on the other it makes using the library very inconvenient, because the user has to check the context after every query

I guess I still don't understand this use case. Can you give me an example or pseudo code where it would be useful to know that the context was canceled while the query method was executing in spite of the fact that the query method succeeded?

@zcolleen
Author

Sure: https://github.com/zcolleen/pgx_test/blob/context-check-example/main.go .
If we could add an error that is returned from rows.Err, then we wouldn't have to check the context after each query we make.

That is also why the PR with recovery is better for this case: because we return immediately after the context is canceled, we don't hold any other resources, such as a gRPC or HTTP connection, longer than the deadline in the context.

@jackc
Owner

jackc commented Feb 10, 2024

Thanks, I see what's going on. But IMO the correct solution is for someLongMathOperationOrSomeLongExternalCall() to accept a context and handle its own cancellation. Otherwise, a context cancellation right as the database call is finished will still be missed.

@zcolleen
Author

the correct solution is for someLongMathOperationOrSomeLongExternalCall() to accept a context and handle its own cancellation

That is the correct solution, but we do not always know that we have to handle cancellation on our own.

Imagine that someLongMathOperationOrSomeLongExternalCall is a call to Redis, Kafka, or any other external resource (say, redis.Set(ctx, "key", "val")), made through some external library. It could easily be that this library does not check the context before writing data to the (for example, Redis) socket, while you assume it handles context cancellation correctly.

As the developer of a wrapper over pgx that gives users a pre-configured pgx for working with the bouncer, I wouldn't want my wrapper code to rely on the code my users write after a pgx query call. So if there were an option to return an error from pgx, I would definitely use it to return the error to my wrapper's users.

@zcolleen
Author

Hi @jackc! Any updates here? Have you made up your mind about the approach to this issue? Do you think we could merge this PR into some experimental branch and give it an experimental tag, to see if it works well in production?

@jackc
Owner

jackc commented May 8, 2024

Any updates here? Have you made up your mind about the approach to this issue? Do you think we could merge this PR into some experimental branch and give it an experimental tag, to see if it works well in production?

@zcolleen Unfortunately, I still think it is too difficult to reason about. I've spent enough hours tracking down subtle concurrency issues in pgx to err on the side of caution for future changes. Sorry, but I don't think this PR can be merged.

@jackc jackc closed this May 8, 2024
@zcolleen
Author

zcolleen commented May 8, 2024

@jackc well, anyway, thanks for the discussion and the time spent on this issue
