Finite LINGER for Zeromq blocks #1132

Closed
jaredd opened this issue Dec 6, 2016 · 3 comments
jaredd (Contributor) commented Dec 6, 2016

I have a dynamic distributed application where multiple flowgraphs communicate using zeromq blocks. From test to test the set of running flowgraphs often changes, requiring complete teardown of one flowgraph and construction of a new one in its place. I cannot reliably produce a minimal example of this behavior, but after some number of reconfigurations (almost always fewer than 10) a flowgraph will fail to exit. The prevailing theory is that ZMQ is hanging during destruction due to unserved messages in its queue. The theory was somewhat validated by placing a finite LINGER sockopt in the zeromq base_impl constructor; after this change flowgraphs reliably cleaned up. LINGER seems like a reasonable option to make available to users.
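
For reference, a minimal sketch of the kind of change described above, assuming the cppzmq (zmq.hpp) API that gr-zeromq uses; the class and member names here are illustrative, not the actual base_impl code:

// Illustrative sketch only: set a finite ZMQ_LINGER on the socket so that
// destroying the zmq::context_t cannot block forever on unsent messages.
#include <zmq.hpp>
#include <string>

class example_zmq_sink   // hypothetical stand-in for a gr-zeromq block
{
public:
    explicit example_zmq_sink(const std::string &address)
        : d_context(1), d_socket(d_context, ZMQ_PUSH)
    {
        // A finite linger (here 1000 ms) bounds how long zmq_ctx_term()
        // waits for pending messages when the flowgraph is torn down.
        const int linger_ms = 1000; // illustrative value
        d_socket.setsockopt(ZMQ_LINGER, &linger_ms, sizeof(linger_ms));
        d_socket.connect(address.c_str());
    }

private:
    zmq::context_t d_context;
    zmq::socket_t d_socket;
};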

jmcorgan added the ZMQ label Dec 6, 2016
jmcorgan (Contributor) commented

Can you do a pull request with your proposed changes?

jaredd (Contributor, Author) commented Feb 15, 2017

Just a small update on this issue: when the GUI froze while trying to tear down the flowgraph, I attached to the process with gdb and got the following backtrace, which suggests that it is zmq waiting on unsent messages:
#0 0x00007f235424bfdd in poll () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f22f9630a6a in ?? () from /usr/lib/x86_64-linux-gnu/libzmq.so.3
#2 0x00007f22f961d2f7 in ?? () from /usr/lib/x86_64-linux-gnu/libzmq.so.3
#3 0x00007f22f961463c in ?? () from /usr/lib/x86_64-linux-gnu/libzmq.so.3
#4 0x00007f22f9645388 in zmq_ctx_term () from /usr/lib/x86_64-linux-gnu/libzmq.so.3
#5 0x00007f22f987054e in close (this=0x37adbc0) at /usr/include/zmq.hpp:288
#6 ~context_t (this=0x37adbc0, __in_chrg=) at /usr/include/zmq.hpp:281
#7 gr::zeromq::base_impl::~base_impl (this=0x334e5f8, __vtt_parm=, __in_chrg=) at /usr/local/src/gnuradio/gr-zeromq/lib/base_impl.cc:54
#8 0x00007f22f987cdb9 in ~base_sink_impl (__vtt_parm=0x7f22f9a95778 <VTT for gr::zeromq::push_sink_impl+24>, this=, __in_chrg=) at /usr/local/src/gnuradio/gr-zeromq/lib/base_impl.h:47
#9 ~push_sink_impl (this=0x334e5f0, __in_chrg=, __vtt_parm=) at /usr/local/src/gnuradio/gr-zeromq/lib/push_sink_impl.h:34
#10 gr::zeromq::push_sink_impl::~push_sink_impl (this=0x334e5f0, __in_chrg=, __vtt_parm=) at /usr/local/src/gnuradio/gr-zeromq/lib/push_sink_impl.h:34

Again, I cannot reliably reproduce this behavior. It appears to occur randomly after some number of flowgraph builds and teardowns.

MattMills (Contributor) commented Oct 17, 2020

Experiencing a similar issue: flowgraph teardowns during top_block.stop() are locking up processes. Related portion of the backtrace:

#0  0x00007efcd77efaff in __GI___poll (fds=0x7ffe480eee20, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007efc09e4e56b in ?? () from /usr/lib/x86_64-linux-gnu/libzmq.so.5
#2  0x00007efc09e292be in ?? () from /usr/lib/x86_64-linux-gnu/libzmq.so.5
#3  0x00007efc09e1715c in ?? () from /usr/lib/x86_64-linux-gnu/libzmq.so.5
#4  0x00007efc09e6f7ce in zmq_ctx_term () from /usr/lib/x86_64-linux-gnu/libzmq.so.5
#5  0x00007efc09eba413 in ?? () from /usr/lib/x86_64-linux-gnu/libgnuradio-zeromq.so.3.8.1
#6  0x00007efc09ecaf1d in ?? () from /usr/lib/x86_64-linux-gnu/libgnuradio-zeromq.so.3.8.1
#7  0x00007efcd690eeea in gr::edge::~edge() () from /usr/lib/x86_64-linux-gnu/libgnuradio-runtime.so.3.8.1
#8  0x00007efcd690f3cc in gr::flowgraph::clear() () from /usr/lib/x86_64-linux-gnu/libgnuradio-runtime.so.3.8.1
#9  0x00007efcd691c487 in gr::hier_block2_detail::disconnect_all() () from /usr/lib/x86_64-linux-gnu/libgnuradio-runtime.so.3.8.1
#10 0x00007efcd691ad81 in gr::hier_block2::~hier_block2() () from /usr/lib/x86_64-linux-gnu/libgnuradio-runtime.so.3.8.1
#11 0x00007efcd693f5ed in gr::top_block::~top_block() () from /usr/lib/x86_64-linux-gnu/libgnuradio-runtime.so.3.8.1
#12 0x00007efcd6a2e30a in ?? () from /usr/lib/python3/dist-packages/gnuradio/gr/_runtime_swig.so
#13 0x00007efcd69f8d37 in ?? () from /usr/lib/python3/dist-packages/gnuradio/gr/_runtime_swig.so
#14 0x00000000005d31e8 in _Py_Dealloc (op=<optimized out>) at ../Objects/object.c:2215
... remainder of backtrace excluded...

@jmcorgan I'm not a C dev, but I believe a const int zero = 0; and a d_socket.setsockopt(ZMQ_LINGER, &zero, sizeof(zero)); would be appropriate somewhere around here: https://github.com/gnuradio/gnuradio/blob/master/gr-zeromq/lib/base_impl.cc#L74, according to the documentation for ZMQ_LINGER, http://api.zeromq.org/4-2:zmq-setsockopt#toc24, specifically:

A value of -1 specifies an infinite linger period. Pending messages shall not be discarded after a call to zmq_disconnect() or zmq_close(); attempting to terminate the socket's context with zmq_ctx_term() shall block until all pending messages have been sent to a peer.

I believe this is related to this behavior as well: zeromq/libzmq@90ea11c (as the actual timeout value in my backtrace is showing as -1, which would be consistent with the indefinite blocking behavior listed above).

Related test case showing example usage of zmq_setsockopt with ZMQ_LINGER: https://github.com/zeromq/libzmq/blob/master/tests/test_connect_rid.cpp#L62
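
For anyone who wants to see the behavior in isolation, here is a small self-contained sketch (assuming the cppzmq zmq.hpp API; this is not GNU Radio code): with the default linger of -1 the context destructor would block on the queued message, while a linger of 0 lets it return immediately.

// Standalone illustration of how ZMQ_LINGER affects zmq_ctx_term() blocking.
// Assumes the cppzmq header-only API (zmq.hpp); not GNU Radio code.
#include <zmq.hpp>
#include <cstring>

int main()
{
    zmq::context_t ctx(1);
    zmq::socket_t sock(ctx, ZMQ_PUSH);

    // No peer is listening on this port, so the message stays queued.
    sock.connect("tcp://127.0.0.1:5555");

    zmq::message_t msg(5);
    std::memcpy(msg.data(), "hello", 5);
    sock.send(msg);

    // With the default ZMQ_LINGER of -1, destroying ctx (zmq_ctx_term) would
    // block forever waiting for the queued message to be delivered.
    // A linger of 0 tells libzmq to drop pending messages on close instead.
    const int linger = 0;
    sock.setsockopt(ZMQ_LINGER, &linger, sizeof(linger));

    return 0; // sock and ctx are destroyed here without hanging
}

If the setsockopt line is removed, the program hangs in zmq_ctx_term(), matching the backtraces above.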

MattMills added a commit to MattMills/gnuradio that referenced this issue Oct 19, 2020
Re: gnuradio#1132. The default value of ZMQ_LINGER was incorrect in the ZMQ docs and is actually -1, which causes sockets with pending messages to block indefinitely during teardown, locking up flowgraphs during top_block.stop() if outbound data is pending.
MattMills added a commit to MattMills/gnuradio that referenced this issue Oct 27, 2020
Closes: gnuradio#1132
Per the ZMQ documentation update, the docs originally listed the default
of ZMQ_LINGER as 30 seconds; however, the real default was -1.

This caused top_block.stop() to block indefinitely while the socket
waited for abandoned messages to be read by a client.

Ideally this value should be configurable; I've opened gnuradio#3872 as a follow-up.
mormj closed this as completed in 15efb1e Oct 28, 2020
MattMills added a commit to MattMills/gnuradio that referenced this issue Nov 2, 2020
Backport of gnuradio#3866 to maint-3.8
Closes: gnuradio#1132
Per the ZMQ documentation update, the docs originally listed the default
of ZMQ_LINGER as 30 seconds; however, the real default was -1.

This caused top_block.stop() to block indefinitely while the socket
waited for abandoned messages to be read by a client.

Ideally this value should be configurable; I've opened gnuradio#3872 as a follow-up.
dkozel pushed a commit that referenced this issue Dec 16, 2020
Backport of #3866 to maint-3.8
Closes: #1132
Per the ZMQ documentation update, the docs originally listed the default
of ZMQ_LINGER as 30 seconds; however, the real default was -1.

This caused top_block.stop() to block indefinitely while the socket
waited for abandoned messages to be read by a client.

Ideally this value should be configurable; I've opened #3872 as a follow-up.