Finite LINGER for Zeromq blocks #1132
Can you do a pull request with your proposed changes?
Just a small update to this issue: when the GUI froze while trying to tear down the flowgraph, I attached to the process with gdb and got the following backtrace, which suggests that it is ZMQ waiting on unsent messages. Again, I cannot reliably reproduce this behavior; it appears to occur randomly after some number of flowgraph builds and teardowns.
Experiencing a similar issue: flowgraph teardowns during top_block.stop() are locking up processes. Related portion of the backtrace:
@jmcorgan I'm not a C dev, but I believe a
I believe this is related to this behavior as well: zeromq/libzmq@90ea11c (the actual timeout value in my backtrace shows as -1, which is consistent with the indefinite blocking behavior described above). A related test case showing example usage of zmq_setsockopt with ZMQ_LINGER: https://github.com/zeromq/libzmq/blob/master/tests/test_connect_rid.cpp#L62
Re: gnuradio#1132: the default value of ZMQ_LINGER was documented incorrectly in the ZMQ docs and is actually -1, which causes sockets with pending messages to block indefinitely during teardown, making flowgraphs lock up indefinitely during top_block.stop() if outbound data is pending.
Closes: gnuradio#1132. Per the ZMQ documentation update, the docs originally listed the default of ZMQ_LINGER as 30 seconds, but the real default was -1. This caused top_block.stop() to block indefinitely while the socket waited for abandoned messages to be read by a client. Ideally this value should be configurable; I've opened gnuradio#3872 as a follow-up.
Backport of gnuradio#3866 to maint-3.8. Closes: gnuradio#1132. Per the ZMQ documentation update, the docs originally listed the default of ZMQ_LINGER as 30 seconds, but the real default was -1. This caused top_block.stop() to block indefinitely while the socket waited for abandoned messages to be read by a client. Ideally this value should be configurable; I've opened gnuradio#3872 as a follow-up.
I have a dynamic distributed application in which multiple flowgraphs communicate using zeromq blocks. It is often the case that the set of operating flowgraphs changes from test to test, requiring complete teardown of one flowgraph and construction of a new one in its place. I cannot reliably produce a minimal example of this behavior, but almost surely after some number of reconfigurations (under 10) a flowgraph will fail to exit. The prevailing theory is that ZMQ is hanging during destruction due to unserved messages in its queue. The theory was somewhat validated by setting a finite LINGER sockopt in the zeromq base_impl constructor; after this change, flowgraphs reliably cleaned up. LINGER seems like a reasonable option to make available to users.
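The LINGER behavior described above can be demonstrated outside GNU Radio. The following is a minimal pyzmq sketch (not GNU Radio's C++ implementation; the endpoint address and the 1000 ms value are illustrative assumptions) showing how a finite ZMQ_LINGER bounds how long context termination waits for unsent messages:

```python
import zmq

ctx = zmq.Context()
sock = ctx.socket(zmq.PUSH)

# The library default LINGER is -1: close()/term() block indefinitely
# until every queued outbound message has been delivered.
# A finite LINGER (in milliseconds) bounds that wait instead.
sock.setsockopt(zmq.LINGER, 1000)  # illustrative value: wait at most 1 s

# Connecting to an endpoint with no listener queues messages locally.
sock.connect("tcp://127.0.0.1:5555")  # hypothetical, unbound endpoint
sock.send(b"pending message", zmq.NOBLOCK)

sock.close()
ctx.term()  # with LINGER=1000 this returns after at most ~1 s,
            # dropping the undelivered message, instead of hanging forever
```

With LINGER left at its real default of -1, the `ctx.term()` call above would block forever, which is exactly the hang observed in `top_block.stop()`.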