Initial POC of RabbitMQ transport (WIP) #60775
devkits wants to merge 3 commits into saltstack:master
Looking for high-level drive-by comments. Thanks.
salt/cli/caller.py
Outdated
There was a problem hiding this comment.
question is this the right place for this? I'm not super familiar with the transport layer, but it seems weird to have a ZeroMQCaller for rabbitmq. Or is it just poor/out-of-date naming? 🤔
tests/pytests/functional/transport/rabbitmq/test_async_pub_channel.py
Outdated
There was a problem hiding this comment.
FWIW this can be moved to the file level. I think it's
pytestmark = [pytest.mark.xfail(...)]
There are other examples throughout the test code
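A minimal sketch of the file-level form, assuming pytest is installed. pytest reads the module-level name `pytestmark` and applies each mark in it to every test in the file. The reason string and the test function are hypothetical placeholders, not taken from the PR.

```python
import pytest

# Module-level marks: pytest applies every mark listed in `pytestmark`
# to each test collected from this file.
pytestmark = [
    # Hypothetical reason text, for illustration only.
    pytest.mark.xfail(reason="RabbitMQ transport is a POC"),
]


def test_placeholder():
    # Hypothetical test body; a real test would exercise the transport.
    assert True
```

With this in place, the per-test xfail decorators can be dropped from the individual test functions.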
tests/pytests/functional/transport/rabbitmq/test_async_pub_channel.py
Outdated
There was a problem hiding this comment.
question I've seen this code in the test_async_pub_channel - would it be possible to refactor this somewhat?
Also, I wonder if it's possible to extract some/most/all of this functionality into the salt.transport.rabbitmq layer, if there's a lot of commonality in spinning up a pub/sub, especially if the only real difference is how the messages are handled.
There was a problem hiding this comment.
These tests were adopted from similar tests that test zeromq transport.
There's plenty of opportunity to refactor and eliminate some code duplication. Note: one school of thought on the Java side of things is that duplication in test code is more acceptable than product code. However, I'll try to refactor this more.
salt/transport/rabbitmq.py
Outdated
There was a problem hiding this comment.
I'm assuming this is the rabbitmq lib - if we end up making pika an optional dependency this will need an ImportError guard.
There was a problem hiding this comment.
Makes sense in general, but without pika this entire module is useless, so the import guard should probably be around salt.transport.rabbitmq
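For reference, the general shape of an optional-import guard looks roughly like the sketch below. The names HAS_PIKA and require_pika are hypothetical and not part of the PR. As noted above, the guard may belong where salt.transport.rabbitmq itself is imported rather than inside it.

```python
try:
    import pika  # third-party RabbitMQ client; may not be installed

    HAS_PIKA = True
except ImportError:
    pika = None
    HAS_PIKA = False


def require_pika():
    """Return the pika module, or fail with a clear message if it is missing.

    Hypothetical helper for illustration only.
    """
    if not HAS_PIKA:
        raise RuntimeError("The rabbitmq transport requires the 'pika' library")
    return pika
```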
salt/transport/rabbitmq.py
Outdated
There was a problem hiding this comment.
suggestion __del__ is never guaranteed to be called in Python - if guarantees are desired then with blocks are required.
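A small sketch of the with-block alternative. ConnectionWrapper here is a hypothetical stand-in, not the class from the PR. The point is that __exit__ runs deterministically when the block ends, even when an exception is raised.

```python
class ConnectionWrapper:
    """Hypothetical stand-in for a transport connection wrapper."""

    def __init__(self):
        self.closed = False

    def close(self):
        # Release the underlying resources (sockets, channels, ...).
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Always runs when the with block exits, unlike __del__.
        self.close()
        return False  # do not swallow exceptions


def use_connection():
    with ConnectionWrapper() as conn:
        pass  # publish/consume would happen here
    return conn
```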
salt/transport/rabbitmq.py
Outdated
There was a problem hiding this comment.
suggestion same "or" suggestion here - ... = correlation_id or uuid4().hex. It won't call the function unless correlation_id is falsy.
salt/transport/rabbitmq.py
Outdated
There was a problem hiding this comment.
question would it also make sense to make the queue types configurable? I don't know a lot about rabbitmq so that might be a ridiculous suggestion 🙃
salt/transport/rabbitmq.py
Outdated
There was a problem hiding this comment.
If this is more than just WIP debugging, it may make sense to dump this to trace instead 🤔 Might warrant some further discussion about how noisy things should be. I'd probably check other transport implementations for guidance there.
There was a problem hiding this comment.
After I posted this review, I changed all log statements in rabbitmq.py to dump as INFO; pika library itself is very noisy and dumps a lot of stuff in DEBUG, to the point that it's hard to make sense of what is happening due to noise. So yes, we need to tweak the log levels before this review is ready, but for now the most useful log level for the POC seems to be INFO.
salt/transport/rabbitmq.py
Outdated
There was a problem hiding this comment.
note Not sure if it's desirable, but internal functions are (re)created each time the outer function is called, i.e. each call to _on_queue_bind will create a new copy of _callback_wrapper function. This is fine if it's only meant to be called once, but if the outer function is called 100x then you'll be creating 100x new instances of the function (that may or may not be discarded, depending on what references are kept 🙃 )
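To make the note concrete, here is a minimal sketch with hypothetical function names (not the ones in rabbitmq.py) contrasting a nested callback with a module-level one.

```python
def on_queue_bind_nested():
    # A new function object is built on every call to the outer function.
    def _callback_wrapper(message):
        return message

    return _callback_wrapper


def _shared_callback_wrapper(message):
    # Defined once at import time and reused by every caller.
    return message


def on_queue_bind_shared():
    return _shared_callback_wrapper


first = on_queue_bind_nested()
second = on_queue_bind_nested()
# first and second are distinct objects with identical behavior,
# while on_queue_bind_shared() always returns the same object.
```

If the wrapper needs to close over per-call state, the nested form is still the natural choice; the cost only matters when the outer function is called often.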
salt/transport/rabbitmq.py
Outdated
There was a problem hiding this comment.
note I have no clue if it comes into play for RMQ, but seeing this brought it to mind - I know that there are certain times that we want FIPS compliance, and we've got some flags for that. If that's a consideration for any type of crypto that RMQ needs to blocklist, then that would be something to account for.
salt/transport/rabbitmq.py
Outdated
There was a problem hiding this comment.
suggestion this could also follow the pattern.
io_loop = kwargs.get("io_loop") or tornado.ioloop.IOLoop.current()
The typical dict.get fallback would be kwargs.get("io_loop", fallback_value), but unlike the "or" form or a ternary expression, function arguments are evaluated before the call, so if there's a cost associated with .current() it will be paid on every __new__.
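The difference can be seen with a small sketch that counts how often the fallback is computed. expensive_default and the two pick_* helpers are hypothetical names used only for this illustration.

```python
calls = {"count": 0}


def expensive_default():
    # Stand-in for a costly call such as looking up the current IO loop.
    calls["count"] += 1
    return "default-loop"


def pick_with_get(kwargs):
    # The default argument is evaluated before .get() runs, every time.
    return kwargs.get("io_loop", expensive_default())


def pick_with_or(kwargs):
    # The right-hand side is evaluated only if the left side is falsy.
    return kwargs.get("io_loop") or expensive_default()
```

One caveat: the "or" form also falls back when the stored value is falsy (None, 0, an empty container), which is usually fine for an IO loop object but not for every setting.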
salt/transport/rabbitmq.py
Outdated
There was a problem hiding this comment.
suggestion pretty sure that this can be replaced with loop_instance_map = cls.instance_map.setdefault(io_loop, weakref.WeakValueDictionary())
see >>> help(dict.setdefault) for more info.
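A rough sketch of the setdefault pattern, assuming instance_map is a plain dict keyed by the IO loop. FakeLoop, Channel, and get_loop_instance_map are hypothetical names for illustration.

```python
import weakref


class FakeLoop:
    """Hypothetical stand-in for an IO loop object."""


class Channel:
    """Hypothetical stand-in for a channel instance."""


instance_map = {}


def get_loop_instance_map(io_loop):
    # Returns the existing per-loop map, or stores and returns a new one.
    return instance_map.setdefault(io_loop, weakref.WeakValueDictionary())
```

Note that the WeakValueDictionary() argument is still constructed on every call, even when the key already exists, so this trades a little allocation for brevity.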
salt/transport/rabbitmq.py
Outdated
There was a problem hiding this comment.
question would copy.deepcopy(opts) be more appropriate?
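To illustrate why the question matters: a shallow copy shares nested values with the original, so a change to a nested dict leaks back. The sketch below assumes the code under review makes a shallow copy such as dict(opts). The opts content reuses the settings from the PR description.

```python
import copy

opts = {
    "transport": "rabbitmq",
    "transport_rabbitmq_auth": {"username": "user", "password": "bitnami"},
}

shallow = dict(opts)        # copies only the top-level mapping
deep = copy.deepcopy(opts)  # recursively copies nested containers too

# Mutating a nested dict through the shallow copy also changes the original.
shallow["transport_rabbitmq_auth"]["username"] = "changed"
```

After this runs, opts sees the changed username while the deep copy keeps the original value.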
salt/transport/rabbitmq.py
Outdated
There was a problem hiding this comment.
suggestion the "or" suggestion applies here as well.
There was a problem hiding this comment.
Sure. Note that some of this implementation came from zeromq.py. One of the next steps is to refactor common transport code into a base class.
salt/transport/rabbitmq.py
Outdated
There was a problem hiding this comment.
☝️ 😂 Yeah, that's exactly what I was talking about 🤣
salt/transport/rabbitmq.py
Outdated
There was a problem hiding this comment.
note I see a lot of tornado stuff, and I don't recall seeing a lot of tornado-based testing. I don't know if that's covered by the existing tests, but especially for things like authentication we should have some tests around that 👍
salt/transport/rabbitmq.py
Outdated
There was a problem hiding this comment.
question: do other things do anything with the send_queue? Perhaps it would make more sense to message = self.send_queue.pop(0)? (also, the collections module contains actual queues, if that would be a better approach here?)
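A minimal sketch of the collections-based approach, assuming nothing else needs list semantics on send_queue. collections.deque removes items from the left in constant time, whereas list.pop(0) shifts every remaining element. next_message is a hypothetical helper.

```python
from collections import deque

send_queue = deque()
send_queue.append("first")
send_queue.append("second")


def next_message(queue):
    # Take the oldest message, or None when the queue is empty.
    return queue.popleft() if queue else None
```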
salt/transport/rabbitmq.py
Outdated
There was a problem hiding this comment.
suggestion with the fallback, this if statement is unnecessary. The one just after this is already guarding for None, so... that's really all that's necessary.
(work-in-progress, but looking for early high-level feedback before the review gets even larger).
The new transport can be turned on by adding these lines to the corresponding master and minion config files:
transport: rabbitmq
transport_rabbitmq_address: localhost
transport_rabbitmq_auth: {username: user, password: bitnami}
Notes:
- RabbitMQ design considerations:
- username/password auth for now
- connection and channel reuse per thread
- RPC pattern is implemented with reply_to queues. Eventually we'll probably use a single queue per consumer and use correlation_id instead of a reply_to queue.
- fanout exchange is used for message broadcast
- direct exchange is used when replying
- Mark tests that test rabbitmq transport as "pytest.mark.xfail". RMQ transport is in POC; so we skip RMQ tests (for now) until RMQ dependencies are dealt with in the CI/CD pipeline.
- there are a number of TODOs in this review; all should be addressed before review is ready for prime time
- there are gaps in the implementation that will be addressed. This work is tracked in SSC-978
- a number of edge cases need to be tested/addressed (e.g. connection error recovery, etc.)
- added functional tests
- added integration tests
Testing Done: functional tests with one master and one minion
Initial POC of RabbitMQ transport (work-in-progress, but looking for early feedback before the review gets too large).
- there are a number of TODOs in this review; all should be addressed before review is ready for prime time
- there are gaps in the implementation that will be addressed. This work is tracked in SSC-978
- a number of edge cases need to be tested/addressed (e.g. connection error recovery, etc.)
- added functional tests
- added integration tests
Testing Done: functional tests with one master and one minion
- Addressed some/most code review comments
- Installed pre-commit hook and addressed issues identified (formatting, etc.)
- Added connection recovery to RMQ*ConnectionWrapper* classes
- Removed use of blocking connection in a separate thread. Now using non-blocking connection with custom io_loop
Testing Done: functional tests with one master and one minion
Testing Done:
- added functional tests
0c1d5f2 to 2aaca69
…this review:
- Separated RMQ topology creation and consumption so that exchanges, queues, etc. can be created out-of-band. It is controlled by this config setting:
transport_rabbitmq_create_topology_ondemand: True
- Refactored POC so that messages are published to a "publisher exchange" and consumed from a "consumer exchange/consumer queue".
- RPC pattern (where publisher and consumer share the same connection) was refactored to use RMQ's "direct-reply-to" pattern (https://www.rabbitmq.com/direct-reply-to.html).
- Moved RMQ topology object names and declaration metadata into config file. Config settings are as follows:
- transport_rabbitmq_publisher_exchange_name: salt_master_exchange
- transport_rabbitmq_publisher_exchange_declare_arguments:
- transport_rabbitmq_consumer_exchange_name: salt_master_exchange
- transport_rabbitmq_consumer_exchange_declare_arguments:
- transport_rabbitmq_consumer_queue_name: salt_minion_queue
- transport_rabbitmq_consumer_queue_declare_arguments: { "x-expires": 600000, "x-max-length": 10000, "x-queue-type": "quorum", "x-queue-mode": "lazy", "x-message-ttl": 259200000}
- Started using quorum queues with queue expiration and a message TTL of 72 hours (x-message-ttl: 259200000 ms), e.g. a configurable queue declaration looks like this: { "x-expires": 600000, "x-max-length": 10000, "x-queue-type": "quorum", "x-queue-mode": "lazy", "x-message-ttl": 259200000}
- Added additional error checking for cases when RMQ topology does not exist when Salt tries to use it.
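As a quick sanity check on the queue declaration arguments listed above: RabbitMQ expresses x-expires and x-message-ttl in milliseconds, so the values can be converted to readable durations as in the sketch below. The helper name ms_to_timedelta is hypothetical.

```python
from datetime import timedelta

# Values copied from the queue declaration arguments in this commit message.
declare_arguments = {
    "x-expires": 600000,
    "x-max-length": 10000,
    "x-queue-type": "quorum",
    "x-queue-mode": "lazy",
    "x-message-ttl": 259200000,
}


def ms_to_timedelta(milliseconds):
    # Both x-expires and x-message-ttl are given in milliseconds.
    return timedelta(milliseconds=milliseconds)


message_ttl = ms_to_timedelta(declare_arguments["x-message-ttl"])
queue_expiry = ms_to_timedelta(declare_arguments["x-expires"])
```

With these values, the message TTL works out to 72 hours and the unused-queue expiry to 10 minutes.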
Testing Done:
- Updated and ran functional tests. Specifically:
- test_pub_server_channel.test_publish_to_pubserv_ipc
- test_async_pub_channel.test_rmq_connection_recovery
- test_async_pub_channel.test_publish_to_pubserv_ipc_async_collector
- Ran an e2e test of master sending a ping to a minion
What does this PR do?
Initial POC of RabbitMQ transport. The goal is to test out (internally) this new transport as a candidate for RaaS/Salt on SaaS and solicit early feedback.
What issues does this PR fix or reference?
Fixes: VMware SSC-978
Merge requirements satisfied?
[NOTICE] Bug fixes or features added to Salt require tests.
Commits signed with GPG?
No
Please review Salt's Contributing Guide for best practices.
See GitHub's page on GPG signing for more information about signing commits with GPG.