Fix: FATAL in function disconnect_client(): bad client state: 0 #846
Did you test with [...]?
Shouldn't you [...]?
Add a comment explaining why you're cleaning the link (extract the words from the commit message). It is faster to read the code than to find the exact change reference in the commit logs.
During normal operation, cancel clients get closed because their linked server finished sending the cancel request. But this is not always the case. It's possible for the client to disconnect itself, which would cause the client object to be freed, before the linked server is. In those cases the client was not unlinked from the server, thus when the server was then closed it would try to close the already freed client again. This would then result in a fatal error. In passing this adds a test for cancel_wait_timeout. With the help of that test and some hacky code changes (including ones in postgres) I was able to reliably reproduce this issue.
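To illustrate the failure mode described above, here is a minimal sketch in C of a linked client/server pair. It is not PgBouncer's actual code: `struct sock`, `client_disconnect` and `server_disconnect` are hypothetical, simplified names. It only shows the idea that whichever side goes away first has to clear the link in both directions, so the other side never touches freed memory.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pooler socket object (not PgBouncer's struct). */
struct sock {
    struct sock *link;  /* peer socket: client <-> server, or NULL if unlinked */
};

static struct sock *sock_new(void)
{
    return calloc(1, sizeof(struct sock));
}

static void link_pair(struct sock *client, struct sock *server)
{
    client->link = server;
    server->link = client;
}

/*
 * The client disconnects on its own, before the server is done.
 * The link must be cleared on BOTH sides before the client is freed,
 * otherwise server->link would keep pointing at freed memory.
 */
static void client_disconnect(struct sock *client)
{
    struct sock *server = client->link;

    if (server != NULL) {
        server->link = NULL;
        client->link = NULL;
    }
    free(client);
}

/*
 * The server is closed later. It only closes a client that is still linked.
 * Returns 1 if a linked client was closed as well, 0 otherwise.
 */
static int server_disconnect(struct sock *server)
{
    struct sock *client = server->link;
    int closed_client = 0;

    if (client != NULL) {
        client->link = NULL;
        server->link = NULL;
        free(client);
        closed_client = 1;
    }
    free(server);
    return closed_client;
}
```

Without the unlinking step in `client_disconnect`, the later `server_disconnect` call would find a stale pointer in `server->link` and try to close the client a second time, which matches the double close this PR describes.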
Good call. I don't think I had cassert enabled. I enabled it now and reran my manual tests. All is good.
No, send_term is only part of the query protocol. The cancellation protocol would not understand it. It only expects 32 bits for the PID and 32 bits for the secret, and then a connection close.
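For reference, a rough sketch of what such a cancel packet looks like on the wire, assuming the PostgreSQL protocol 3.0 `CancelRequest` layout: a 4-byte length (16), a 4-byte cancel request code (80877102), then the 32-bit PID and the 32-bit secret key, all in network byte order. The helper names `put_be32` and `build_cancel_packet` are made up for this example and are not PgBouncer functions.

```c
#include <stddef.h>
#include <stdint.h>

/* CancelRequest code from the PostgreSQL protocol: (1234 << 16) | 5678 */
#define CANCEL_REQUEST_CODE 80877102u
#define CANCEL_PACKET_LEN   16

/* Write a 32-bit value in network (big-endian) byte order. */
static void put_be32(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t)(v >> 24);
    dst[1] = (uint8_t)(v >> 16);
    dst[2] = (uint8_t)(v >> 8);
    dst[3] = (uint8_t)v;
}

/*
 * Fill `out` with a cancel packet for the given backend PID and secret key.
 * Returns the number of bytes written. Nothing else is sent after this
 * packet; the sender simply closes the connection.
 */
static size_t build_cancel_packet(uint8_t out[CANCEL_PACKET_LEN],
                                  uint32_t pid, uint32_t secret)
{
    put_be32(out, CANCEL_PACKET_LEN);        /* total message length */
    put_be32(out + 4, CANCEL_REQUEST_CODE);  /* marks this as a cancel request */
    put_be32(out + 8, pid);                  /* backend process ID */
    put_be32(out + 12, secret);              /* secret key of that backend */
    return CANCEL_PACKET_LEN;
}
```

There is no room in this exchange for a Terminate message, which is why sending one on a cancel connection would not make sense.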
Done
Ok.
Hmm. Ok.
LGTM
During normal operation, cancel clients get closed because their linked server finished sending the cancel request. But this is not always the case. It's possible for the client to disconnect itself, which would cause the client object to be freed, before the linked server is. In those cases the client was not unlinked from the server, thus when the server was then closed it would try to close the already freed client again. This would then result in a fatal error. In passing this adds a test for cancel_wait_timeout. With the help of that test and some hacky code changes (including ones in postgres) I was able to reliably reproduce this issue. (cherry picked from commit 3869a85)
During normal operation, cancel clients get closed because their linked
server finished sending the cancel request. But this is not always the
case. It's possible for the client to disconnect itself, which would
cause the client object to be freed, before the linked server is. In
those cases the client was not unlinked from the server, thus when the
server was then closed it would try to close the already freed client
again. This would then result in a fatal error.
In passing this adds a test for cancel_wait_timeout. With the help of
that test and some hacky code changes (including ones in postgres) I was
able to reliably reproduce this issue. An actual test for this issue cannot
be implemented currently, because libpq (and thus also psycopg) requires
waiting for a cancel request to complete.
Fixes #801
Related to #717 and #815