Closing connection and/or channel hangs when NotifyPublish is used #21
Comments
Could this be a duplicate of #10?
Could be. Note that the above is related to closing the connection/channel, but the root cause might be the same.
It still reproduces on main, but I am confused as to what the test is trying to exhibit. It doesn't reproduce anywhere near as frequently for me as you said it does on your machine, but I was still able to get a deadlock and this stack dump from it. I think there's a race condition in the test between closing the channel and closing the connection, so the test itself is non-deterministic. It would be helpful if the test were a little more concise and clearer about what the condition for the deadlock is.
The example can be modified to run simpler tests, in addition to the original one.
Variation 2: In the "close connection" function, add
Here you can find dumps of the full test (dump0_full_test.log), variation 1 (dump1_no_close_channel_func.log), and variation 2 (dump2_120_second_before_conn_close.log).
Hi, can you confirm that the issue is still present? I tried several times to reproduce it but was never able to, with the handle goroutine active or not. Sometimes this panic happens, which seems to be due to the test itself.
Yes, the exact same test halts most of the time on my machine. Is there any additional information that I can provide to help with it?
Do you think this issue is also related to #59? |
Hi @samyonr, I don't think so. The issue in #59 happened only when there was a non-graceful termination of the connection (like a network issue). In that case the library was synchronously sending the error message back to the caller through the Go channel returned by NotifyClose, which wasn't being consumed, causing the deadlock. In your case you are terminating the connection gracefully, and you are not notified by NotifyClose.
Hi @samyonr, I think I was able to reproduce this deadlock, even if for me it wasn't trivial; I had to look at the code and force it to happen. Apparently, as you said, the situation is indeed similar to #59, and it is due to the synchronous nature of the library when managing channels. When one goroutine is closing the RabbitMQ channel, the shutdown function of the channel needs to acquire the confirms mutex when it closes its confirms structure: Line 143 in 900561c
https://github.com/rabbitmq/amqp091-go/blob/main/confirms.go#L104 But if at the same time we are sending a confirmation on the Go channel and there is no goroutine consuming it, the confirm gets blocked while holding the lock, preventing Shutdown from continuing. I think this happens because the closeChan can get notified and make the function return, preventing the confirmations from continuing to be consumed. Could you try something similar to the example provided for #59 and see what happens?
Hi @DanielePalaia, thanks for investigating it. I've tried your example and it indeed fixes the issue. The explanation also makes a lot of sense. Thanks. For me, the main conclusion here is that when dealing with Go channels, it's better to keep listening to them while they are open (i.e. don't stop listening due to external events, like a connection close). This applies both when using this library and as a general approach. It is especially true for unbuffered channels, but relevant to buffered channels as well.
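The conclusion above, drain a channel until the sender closes it rather than stopping on an external shutdown signal, can be sketched with plain Go (no broker needed; the channel here stands in for the one registered with NotifyPublish):

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	// Stand-in for the chan Confirmation registered via NotifyPublish.
	confirmations := make(chan int)

	var wg sync.WaitGroup
	wg.Add(1)

	// Drain until the channel is closed by the sender. Returning
	// early on a "we're shutting down" signal instead is what lets
	// a late delivery block the library's shutdown path.
	go func() {
		defer wg.Done()
		for c := range confirmations {
			fmt.Println("confirmed:", c)
		}
	}()

	// Simulate the library delivering one last confirmation during
	// shutdown, then closing the channel from its own shutdown code.
	confirmations <- 42
	close(confirmations)

	wg.Wait()
	fmt.Println("shutdown completed without deadlock")
}
```

Because the consumer only exits when the channel is closed, the sender can always complete its final delivery and shutdown terminates.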
Hi, we extended the doc to include this case in #68. This will be updated on the next release. Thanks! |
I'm using NotifyPublish to get confirms.
At the end of the run, I'm trying to close the connection and the associated channel, but it hangs indefinitely.
Here's a test to reproduce the issue (reproduces 80-90% of the time):
Here is the result:
=== RUN TestCloseHandBug
provider_test.go:125: connected
provider_test.go:224: sending. id=1, data=hello world!
provider_test.go:133: disconnecting
provider_test.go:187: closing channel. id=1
The dump shows where each goroutine is blocked:
- `l <- confirmation` in `func (c *confirms) confirm(confirmation Confirmation)`
- `c.destructor.Do(func() {` in `func (c *Connection) shutdown(err *Error)`, originating in connection.close
- `c.destructor.Do(func() {` in `func (c *Connection) shutdown(err *Error)`, originating in channel.close
- `c.m.Lock()` in `func (c *confirms) Close() error`, originating in `go c.shutdown(&Error{`
Notes:
=== RUN TestCloseHandBug
provider_test.go:125: connected
provider_test.go:225: sending. id=1, data=hello world!
provider_test.go:133: disconnecting
provider_test.go:187: closing channel. id=1
provider_test.go:142: disconnected
--- PASS: TestCloseHandBug (1.67s)
PASS