-
Notifications
You must be signed in to change notification settings - Fork 843
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tornado adapter: passive exchange.declare scenario where close hangs #945
Comments
Problem seems to have appeared between 0.9.14 and 0.10.0b2. Our code has not changed between those two releases. |
If you can provide a self-contained code sample and instructions to reproduce, that would help out. I'm not sure what you mean by "passively declare", for instance. If you find that |
Okay, I have a simplified test case here: https://gist.github.com/sjlongland/a8acc05e399ec1700d61fe11aba2f0f4 Basically, pypi releases 0.9.14 works on Python 2.7 only (not sure if it's Python 3 compatible, this could be a known issue); 0.10.0 works on both, 0.11.2 fails on both. I'm not seeing the connection getting closed due to double-ups on In our code base, it could also be a race condition, which is a hard thing to reproduce. This isn't the exact problem I saw in the original report, but on the other hand, when someone says |
@sjlongland I can reproduce this issue using your code and only 3 channels. Everything exits as expected when 2 channels are used. |
@sjlongland could you please edit your original description as it not the issue that your test code demonstrates? Another option would be to open a new GitHub issue for the |
Yeah, my other code base actually opens one channel per queue / exchange, an idea I got from QAMQP. It's a device driver which creates an exchange per device… and in the situation where we had the problem, there's nearly 200 devices. |
This is a tricky one, but I do see some cases with three channels where it runs successfully, and cases where it doesn't, with packet captures for each. If you're interested, I forked your test code here and addressed one bug in it ( |
If Pika receives a server-initiated Channel.Close method, it will not process blocked Channel.Close methods which could hang application code. See https://gist.github.com/lukebakken/ab49531e504dfd8f4c133fcb9994a7d7#file-testpika-py for code to reproduce. Fixes #945
@sjlongland if you'd like to test out my fix in the |
On 15/02/18 02:48, Luke Bakken wrote:
@sjlongland <https://github.com/sjlongland> if you'd like to test out my
fix in the |pika-945| branch
<https://github.com/pika/pika/tree/pika-945>, I would appreciate it.
Yep, I'll see if I can get that bundled into a Debian package and try it
out on the production instance affected.
It'll either work fine, or fail (safely), and in the latter case
roll-back is easy; we just install the python-pika package from the
official Debian repositories.
--
Stuart Longland (aka Redhatter, VK4MSL)
I haven't lost my mind...
...it's backed up on a tape somewhere.
|
I'm just trying that branch now… seems I get a 404 message looking for it. I'll try the |
The build breaks:
|
That's strange. What commands are you running? |
Well, the offending line appears to be this:
It doesn't get a chance to run anything else as it dies right there. |
@sjlongland since I am not seeing this error, nor do we see it on the build servers, I can't diagnose unless I know exactly what commands you're running in your environment. If you can provide those, great, otherwise there should be a beta release of |
Okay, we actually build The diff between this branch; and current
The other changes there concern a memory leak we had with The above branch gets built as a Debian package
That's just importing the Pika library, nothing else. That's where it fails. |
@sjlongland - could you add |
As per the advice in pika#945 (comment) This adds the `pika.compat` package to the distribution, which appears to have been absent in previous builds. Hopefully resolves upstream issue 945.
No worries, just doing a build now. |
Okay, looks like there's now an issue to address regarding the parameters we supply to
But otherwise, we're over the hump caused by |
I'll chase the above bug down in the morning, but I think now any further problems are small API changes that needs to be taken into account in my code. |
Thanks for letting me know, I will address the |
Fixes the issue raised here: #945 (comment)
Okay, so further testing today (on a dev instance, not production this time), it seems we're not out of the woods:
It seems declaring queues and exchanges is now broken because Python is confusing arguments.
and
I get the same thing if all arguments are keyword arguments (which is what I originally had). |
Okay, found that one easy enough…
It might be nice to have
Then |
PS @sjlongland thank you for all the testing you have been doing. |
No problems there… it's a package that my workplace depends rather heavily on, so it's in my interest to get things right. :-) |
Fixes the issue raised here: #945 (comment) (cherry picked from commit 48d8339)
…d while it's still running. Implement reenrancy tests IOLoopStartReentrancyNotAllowedTestSelect and IOLoopCloseBeforeStartReturnsTestSelect. Alexis Turpin (1): small typo in heartbeat Andrew Tonko (1): Aspell check fixes Aurélien Bompard (19): Add tests for the Twisted adapter Fix or ignore Pylint warnings Remove leftover classes Refactor the twisted connection classes TwistedChannel: don't use name-mangling Implement review suggestions The `disable-msg` tag is deprecated, use `disable` Handle expected disconnects The correct method to call to abort is loseConnection() Add a wrapper for the close() method on the connection Twisted: make the channel_closed method private Twisted: add some useful logging to ClosableDeferredQueue Improve the TwistedChannel class Don't conflict on attribute and method names Add logging in the Twisted example script Twisted example script: use confirm delivery Twisted: add a property for compatibility Fix Twisted tests on Python2 Twisted channel: relay the consumer_tags property Chris Laws (1): Fix examples for new API Daniil Fedotov (5): Examples to describe connection recovery. Indentation fixes, mention retry library in examples. Readme formatting Example of recovery with multiple hosts. Reduce duplication in connection recovery examples Darren Demicoli (15): BlockingConnection tune for heartbeat=0 properly Allowing heartbeat to be callable Fixing tests and adding new callable heartbeat test Fixing comments Fixing test for heartbeat checker Adding test for callable heartbeat parameter Rewording/fixing documentation strings Fixes to docstrings Further fixes to docstrings Adding assert in heartbeat callback and fixing comment Fix parameter test heartbeat callback function name Rework heartbeat callback checking logic Passing connection instance as first argument of heartbeat callback Fixing tests with connection instance and added ConnectionParameters() constructor test with heartbeat Fixing test error Gabriele Santomaggio (1): Change the rabbitmq-discuss link discussion Guillaume MICHEL (1): - Fix setup.py included packages Géry Ogam (24): Update the client and server configuration Change "SSL" to "TLS" Correct a typo, remove an unnecessary variable and print Update async_test_base.py Use positional args Use single quotes Use AMQPS port Change certificate and private key paths to that of the pika/testdata directory Remove periods in initialisms Correct the implementation of comparison methods Correct the implementation of comparison methods Add comparison methods to the Parameters class Replace type function with __class__ attribute for Python 2 classic-style classes Add and improve comparison unit tests for connection parameters and credentials classes, correct instance checks in comparison methods Invert operand order Add comparison tests for the _Timeout class, remove implicitly defined __le__ method by total_ordering, correct my erroneous inversion of operands for the __lt__ method of the _Timeout class that made select_connection tests fail Implement the 6 comparison methods for the Timeout class instead of relying on functools.total_ordering which is not properly implemented on Python 2 and made the Python 2 tests fail Update README.rst Update README.rst Update README.rst Update README.rst Update README.rst Update README.rst Improve docstrings consistency Jeremy Cline (1): Add an example for Twisted authentication with x509 certs Kamil Gałuszka (1): chore: add AsyncioConnection to pika package Luis Fernando Gomes (1): Fix setup package missing Luke Bakken (117): Consistently validate callback and nowait parameters (#941) Add method completion callback for basic.consume (#954) Ensure prefetch_size and prefetch_count are 0 or greater (#948) Ensure blocked Channel.Close methods are processed Document why self._blocking is checked and use isinstance() to determine type of method Add test for pika/pika#945. Without call to _drain_blocked_methods_on_remote_close, this new test will hang Add comments Add @vitaly-krugl documentation from PR Update comments from PR review Minor test changes pika.compat should be a module, not package Remove sudo requirement, use latest RabbitMQ Use variables for RabbitMQ version, enable caching Changes from PR review Fail fast if nosetests fails Minor changes from PR review Add a few more URLParameters tests Use Travis CI RabbitMQ until this is fixed: travis-ci/travis-ci#6302 Revert "Use Travis CI RabbitMQ until this is fixed: travis-ci/travis-ci#6302" Blocked method should not prevent closing a channel Ensure exceptions in marshaling happen earlier Ensure body frames are marshaled prior to output. Drain blocked methods differently on client-requested close Fix test broken from rebasing against master Fix unit tests broken by PR merge that had not been rebased on master Remove time.sleep call Raise exception when all _fd_events arrays are empty Remove use of _thread_id to check to see if poller should be woken up Add NOTE to BlockingConnection.add_callback_threadsafe docstring PR review updates Modify HeartbeatChecker tests to test that half the heartbeat interval is used within HeartbeatChecker, and implement change Add Pika version to connection logging Fixup for #1055 Copy example from stable branch Use ack_message from README Undo changes not related to this PR Changes from PR #1057 review PR 1072 review changes PR review change Heartbeat will use two timers, set up unit tests Separate intervals for sending and checking heartbeats. pylint PR review PR review fixes. Use "timeout" parameter name as suggested (patiently!) by @vitaly-krugl Pika heartbeat sending and checking now behaves exactly as the Java client Add comment with link to Bunny implementation for heartbeat connection check interval Rather than double the negotiated timeout to calculate the check interval, use 5 additional seconds pylint Pin Erlang at 20.3 because the newest from ESL (21) does not work with RMQ pylint Remove TwistedConnection Remove TODOs as catching Exception seems fine in that case Remove one more TODO Ensure that mis-passed tuples are formatted correctly Update erlang and RabbitMQ version Add CHANGELOG Better parsing of URL parameter "ssl_options" spelling Fix test missed in merge de1b40e Validate parameters to basic_consume Validate parameters to queue_declare Validate parameters to exchange_declare Validate parameters to queue_bind Add TODO of API changes Validate parameters to basic_cancel Validate parameters to basic_get Move validation functions to their own module. Use validation code in validators module Various renames and migration of validation code to pika/validators Add test for validation of basic_qos parameters Add test for validation of basic_return parameters Add test for validation of confirm_delivery parameters Add test for validation of exchange_bind, exchange_delete, exchange_unbind and flow legacy parameters Add test for validation of queue_delete, queue_unbind and queue_purge parameters yapf formatting 1.0.0b2 preparation Remove warnings when invalid SSL URL params are used Rename all_channels to global_qos Add validation for exchange_bind Additional changes from #1112 Additional changes from #1112 Remove deprecated connection parameters Remove deprecated immediate flag Remove legacy basic_publish and rename publish to basic_publish Remove deprecated DEFAULT_HEARTBEAT_INTERVAL variable Remove deprecated heartbeat_interval parameter Remove what appears to be the last of the deprecated stuff Update async consumer to back off on reconnect Ensure that codegen translates "global" into "global_qos" Re-generate spec.py Remove :py:class Use RabbitMQ 3.7.11 Remove _adapter_get_write_buffer_size from Connection Rename _adapter_get_write_buffer_size to _get_write_buffer_size Fix reconnect in async consumer example Make the GitHub issue template more to the point Update the issue template since it does not use mono font Remove extra whitespace Make pollers non-reentrant yapf formatting Add calls to basic_qos in examples where appropriate Fix tests and always allow close() to be called Run yapf and pylint on pika/adapters code Run yapf and pylint on code pylint changes yapf and pylint changes yapf and pylint changes yapf and pylint changes yapf and pylint changes yapf and pylint changes yapf and pylint changes yapf and pylint changes Run pylint and yapf on examples Rename add_timeout to call_later Use monotonic time where supported 1.0.0b2 Merge stable to master in progress 1.0.0 CHANGELOG Marat Sharafutdinov (6): Rename Qos.global_ to Qos.global_qos Fix augmented assignments Fix arguments with mutable default values Remove duplicate test Remove redundant parentheses Fix prints Martin Åsell (6): Add support for type "x" Added test, added encode Python 2.7 fixture, bytes is str Fixture Reformat FIELD_TBL_ENCODED Added test for decoding bytes Maxim (1): Tornado Consumer example in docs is fixed Michael Klishin (8): Rename no_ack => auto_ack to match Java and .NET clients (#955) Bump default socket connection timeout to 10s Wording, recovery example usability improvements Don't set prefetch for examples that use automatic ack mode Link to the acknowledgement mode guide here Use RabbitMQ 3.7.12 Wording Travis: use RabbitMQ 3.7.13 Rasmus Wriedt Larsen (1): Consistently use error log level when connecting fails (#952) Tamu (1): Correct small typo Tim Abbott (1): pika: Don't automatically import twisted and tornado. Vitaly Kruglikov (106): Implemented BlockingConnection test TestViabilityOfMultipleTimeoutsWithSameDeadlineAndCallback Implemented async connection test case TestViabilityOfMultipleTimeoutsWithSameDeadlineAndCallback Fix timer id collision in SelectConnection. Added debug code in add_timeout to display current time Use heap to manage timeouts in select_connection adapter. Fix up pylint warnings Log unexpected kevent in KQueuePoller._map_event fixup Fix merge error Move timer to select_connection's IOLoop fixup Fix invalid syntax Fixed incorrect calls to get_wait_seconds Add get_wait_seconds, process_timeouts args to PollPoller constructor signature. Fix _get_poller args in select_connection_ioloop_tests.py Implemented unit tests for select_connection's _Timeout and _Timer classes. Resolve pylint findings in new unit test Fix minor code formatting Added newline at the end Use weak references to save callbacks passed by select_connection's IOLoop to the poller in order to prevent cyclical references that interfere with garbage collection. Don't use weakref.proxy to store references to callback instance methods, since instance methods have temporary references Fix up select_connection._Timer._num_cancellations when processing tmeouts Minor comment fix Implement thread-safe IOLoop.add_callback() in select_connection adapter Fleshed out `close()` methods in select_connection's IOLoop and friends. Flesh out BlockingConnection.add_callback_threadsafe(); Fix blocking connection unit tests broken by prior commit. Implemented BlockingConnection acceptance tests for add_callback_threadsafe Implemented BlockingConnection acceptance test TestBasicConsumeWithAckFromAnotherThread Instrumented add_callback_threadsafe and TestBasicConsumeWithAckFromAnotherThread with debug logging to help debug test failure. Add missing timer.start() call to request message ACK Implemented BlockingConnection acceptance test TestConsumeGeneratorWithAckFromAnotherThread Use BlockingChannel.cancel() instead of basic_cancel to cancel the generator-based consumer. Fixed call to BlockingChannel.cancel() from TestConsumeGeneratorWithAckFromAnotherThread Implemented async adapter acceptance tests for add_callback_threadsafe. Release ioloop resources after class AsyncTestCase returns from ioloop.start. Fixed broken comparison of string with message body on python3 in the new "ack from other thread" BlockingConnection tests. Close the connection's ioloop before setting self.connection to None in AsyncTestCase. Close async adapter's ioloop from finally after start exits in AsyncTestCase. Add close method in asyncio_connection's IOLoopAdapter Provide async acceptance tests with a private IOLoop for each adapter so that Compare message bodies to byte literals for compatibility with python3 Close ioloops in tests to prevent resource leaks Close select_connection pollers in a couple more tests to eliminate corresponding RresourceWarning messages when running Python3 tests Implemented unit tests for select_connection's _Timer.close(), IOLoop.close(), poller close methods. Remove comment promising thread-safety from select_connection's IOLoop.stop and _PollerBase.stop. Emit warning from select_connection's IOLoop.stop() method when called add unittest cleanup callbacks to cancel threading.Timer instances in newly-added tests. Implemented BlockingConnection acceptance test TestConsumeGeneratorCancelEncountersCancelFromBroker Trying to get Basic.Cancel from server in TestConsumeGeneratorCancelEncountersCancelFromBroker Implemented TestConsumeGeneratorInterruptedByCancelFromBroker and simplified Eliminate the `stop_ioloop_on_close` arg, which was unnecessary and even error-prone feature Fixed check in test_tornado_connection_call_parent. Added mention of `add_callback_threadsafe()` in the FAQ. Revert "Fixed check in test_tornado_connection_call_parent." accidentally committed to wrong branch Fixed assertion in test_tornado_connection_call_parent to account for the removed arg stop_ioloop_on_close. In select_connection.py, raise AssertionError if loop is being Fixed infinite recursion in new tests that override AsyncTestCase._instantiate_connection() In `Connection.close`, don't call `self._close_channels()` if there are In TestAddCallbackThreadsafeRequestBeforeIOLoopStarts, don't attempt Facilitate safe cleanup after ioloop stops running in async adapter TestIOLoopStopBeforeIOLoopStarts acceptance test. Improved arg documentation of some callbacks in connection.py and channel.py. Renamed BaseConnection._handle_events to BaseConnection._handle_connection_sock_events Removed unused legacy arg write_only from BaseConnection._handle_connection_sock_events() Removed unused legacy `error` arg from BaseConnection._handle_connection_sock_events Cleaning up socket error handling. Just log info upon ERROR event in _handle_connection_sock_events. Attempt to fix leaking of a socket in async_adapter_tests by allowing Renamed _handle_connection_sock_error to _handle_connection_socket_error Added TODO note about the use of read instead of recv on SSL socket. Fix up minor pylint warnings. Pass connection instance as first arg to user callbacks registered via Facilitate dispatch of pending events in TestConnectionRegisterForBlockAndUnblock Facilitate dispatch of pending events in TestConnectionRegisterForBlockAndUnblock Use standard ssl.SSLContext instead of homegrown values in pika.SSLOptions. In URLParameters, use ssl.SSLSocket class instead of ssl.wrap_socket function to parse In test_good_parameters, cast ssl.PROTOCOL_SSLv23 and other enums to int to work around Add urlparse and url_parse_qs to pica.compat and substitute uses in code. Implemented BlockingConnection tests that exercise consumer behavior on incoming Channel.Close and Cancel from broker. Rename AsyncTestCase.should_test_tls() to AsyncTestCase.should_use_tls() Fix issue #708: don't perform graceful AMQP Connection.Close upon heartbeat timeout. Temporarily disabled failfast to see what else may break on CI. Revert "Temporarily disabled failfast to see what else may break on CI." Summary of New features: Implemented connection timeout tests in async_adapter_tests.py. Updated TwistedProtocolConnection to match the current BaseConnection interface. Still no clue whether it works since Added blurb in README concerning connection retries when using multiple connection params instances. Remove vestages of knowledge about "connection workflow" from Connection class. It shoud all be abstracted by the adapter. Rename _adapter_disconnect_stack, _adapter_disconnect, and friends for consistency. Rebase TwistedProtocolConnection on Connection instead of BaseConnection since Gracefully close connection in TestCreateConnectionViaDefaultConnectionWorkflow Fixed socketpair leak in test BlockingConnectionTests.test_process_io_for_connection_setup() Referenced TwistedProtocolConnection as an example of Updated on_open_error_callback and on_close_callback connection callbacks in examples In asyncio_consumer.rst example, call self._connection.ioloop.run_forever() instead Implemented async_adapter_tests.TestClosingAChannelPermitsBlockedRequestToComplete. Implemented unit tests: channel_tests.test_no_side_effects_from_send_method_error() and Backward-incompatible improvements: Fix race condition between BlockingConnection.add_callback_threasafe() and BlockingConnection._cleanup(). Backward-incompatible cleanup for v1.0.0: Backward-incompatible change for pika v1.0.0 Added note abot the limit of what we can do in BlockingConnection.add_callback_threadsafe(). Fixed indentation and enhanced description in README section "Extending to support additional I/O frameworks". Fixed issue 1036 - access to nonexistent attribute Connection.heartbeat (it was changed to _hearbeat_checker) Updated thread-safety disclosure in README. Enhance documentation of add_on_connection_blocked_callback and add_on_connection_unblocked_callback. Assert if select_connection's poller close is called while it's still running. Implement reenrancy tests IOLoopStartReentrancyNotAllowedTestSelect and IOLoopCloseBeforeStartReturnsTestSelect. elbaro (1): Typo meartbeats - heartbeats
Updated bug description:
When a channel is closed due to a passive declaration of an exchange that does not exist, processing of the resulting channel closure hangs.
Original description (kept so the context of earlier comments is not lost):
Not sure exactly when it started happening, but lately, we've been getting this message, a lot, when an AMQP client tries to passively declare a number of exchanges that are not present.
Correct behaviour is that the AMQP broker closes the channel with a "not found" exception, and our code retries opening the channel. It seems
pika
trips up on the latter bit, confusing the AMQP broker which then responds by closing the entire connection.We're using the Tornado client adapter on Python 2.7.
I'll also point out that the "Connection is closed but not stopping IOLoop" is annoying; the
IOLoop
does not belong topika
, so it isn'tpika
's place to say whether theIOLoop
should be running or stopped.The text was updated successfully, but these errors were encountered: