Running RabbitMQ on Docker Swarm using DNS Discovery #1454

Closed
RehanSaeed opened this issue Dec 18, 2017 · 3 comments
Labels
mailing list material This belongs to the mailing list (rabbitmq-users on Google Groups) not-enough-information Please provide more information by starting a mailing list thread.

Comments

@RehanSaeed

I am trying to run RabbitMQ 3.7.0 on Docker Swarm using DNS based discovery. My RabbitMQ instances start but are unable to discover each other. Here is what I've tried:

rabbitmq.conf

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_dns
cluster_formation.dns.hostname = rabbitmq

docker-stack.yml

version: '3.3'

services:
  rabbitmq:
    image: rabbitmq:3.7.0
    deploy:
      replicas: 3
    environment:
      - RABBITMQ_DEFAULT_USER=admin
      - RABBITMQ_DEFAULT_PASS=password
      - RABBITMQ_ERLANG_COOKIE=foobar
      - RABBITMQ_USE_LONGNAME=true
    #healthcheck:
    #  test: ["rabbitmqctl", "node_health_check"]
    #  interval: 60s
    #  timeout: 5s
    #  retries: 3
    networks:
      - myoverlay
    ports:
      - mode: host
        target: 25672
        published: 25672
      - mode: host
        target: 15672
        published: 15672
      - mode: host
        target: 5672
        published: 5672
      - mode: host
        target: 4369
        published: 4369

networks:
  myoverlay:

Container Logs

2017-12-18 12:23:00.377 [info] <0.184.0>
 node           : rabbit@b133becbbdce
 home dir       : /var/lib/rabbitmq
 config file(s) : /etc/rabbitmq/rabbitmq.conf
 cookie hash    : Dt5w5AHl+BRD0+/VlXZGuQ==
 log(s)         : <stdout>
 database dir   : /var/lib/rabbitmq/mnesia/rabbit@b133becbbdce
2017-12-18 12:23:02.471 [info] <0.196.0> Memory high watermark set to 1578 MiB (1655075635 bytes) of 3946 MiB (4137689088 bytes) total
2017-12-18 12:23:02.475 [info] <0.198.0> Enabling free disk space monitoring
2017-12-18 12:23:02.475 [info] <0.198.0> Disk free limit set to 50MB
2017-12-18 12:23:02.479 [info] <0.200.0> Limiting to approx 1048476 file handles (943626 sockets)
2017-12-18 12:23:02.479 [info] <0.201.0> FHC read buffering:  OFF
2017-12-18 12:23:02.479 [info] <0.201.0> FHC write buffering: ON
2017-12-18 12:23:02.480 [info] <0.184.0> Node database directory at /var/lib/rabbitmq/mnesia/rabbit@b133becbbdce is empty. Assuming we need to join an existing cluster or initialise from scratch...
2017-12-18 12:23:02.480 [info] <0.184.0> Configured peer discovery backend: rabbit_peer_discovery_dns
2017-12-18 12:23:02.480 [info] <0.184.0> Will try to lock with peer discovery backend rabbit_peer_discovery_dns
2017-12-18 12:23:02.480 [info] <0.184.0> Peer discovery backend rabbit_peer_discovery_dns does not support registration, skipping randomized startup delay.
2017-12-18 12:23:02.481 [info] <0.184.0> Addresses discovered via A records of rabbitmq: 10.0.1.22, 10.0.1.7, 10.0.1.12
2017-12-18 12:23:02.484 [info] <0.184.0> Addresses discovered via AAAA records of rabbitmq:
2017-12-18 12:23:02.484 [info] <0.184.0> All discovered existing cluster peers: rabbit@bridge-apps-test_rabbitmq.3.txyqdhq5tag4ypvqxmrrv7xux.bridge-apps-test_bridgeoverlay, rabbit@9d08e408217e.bridge-apps-test_bridgeoverlay, rabbit@b133becbbdce
2017-12-18 12:23:02.484 [info] <0.184.0> Peer nodes we can cluster with: rabbit@bridge-apps-test_rabbitmq.3.txyqdhq5tag4ypvqxmrrv7xux.bridge-apps-test_bridgeoverlay, rabbit@9d08e408217e.bridge-apps-test_bridgeoverlay
2017-12-18 12:23:02.490 [warning] <0.184.0> Could not auto-cluster with node rabbit@bridge-apps-test_rabbitmq.3.txyqdhq5tag4ypvqxmrrv7xux.bridge-apps-test_bridgeoverlay: {badrpc,nodedown}
2017-12-18 12:23:02.496 [warning] <0.184.0> Could not auto-cluster with node rabbit@9d08e408217e.bridge-apps-test_bridgeoverlay: {badrpc,nodedown}
2017-12-18 12:23:02.496 [warning] <0.184.0> Could not successfully contact any node of: rabbit@bridge-apps-test_rabbitmq.3.txyqdhq5tag4ypvqxmrrv7xux.bridge-apps-test_bridgeoverlay,rabbit@9d08e408217e.bridge-apps-test_bridgeoverlay (as in Erlang distribution). Starting as a blank standalone node...
2017-12-18 12:23:02.499 [info] <0.33.0> Application mnesia exited with reason: stopped
2017-12-18 12:23:02.516 [info] <0.33.0> Application mnesia started on node rabbit@b133becbbdce
2017-12-18 12:23:02.590 [info] <0.184.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2017-12-18 12:23:02.622 [info] <0.184.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2017-12-18 12:23:02.653 [info] <0.184.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2017-12-18 12:23:02.654 [info] <0.184.0> Peer discovery backend rabbit_peer_discovery_dns does not support registration, skipping registration.
2017-12-18 12:23:02.655 [info] <0.184.0> Priority queues enabled, real BQ is rabbit_variable_queue
2017-12-18 12:23:02.659 [info] <0.373.0> Starting rabbit_node_monitor
2017-12-18 12:23:02.687 [info] <0.184.0> message_store upgrades: 1 to apply
2017-12-18 12:23:02.687 [info] <0.184.0> message_store upgrades: Applying rabbit_variable_queue:move_messages_to_vhost_store
2017-12-18 12:23:02.687 [info] <0.184.0> message_store upgrades: No durable queues found. Skipping message store migration
2017-12-18 12:23:02.688 [info] <0.184.0> message_store upgrades: Removing the old message store data
2017-12-18 12:23:02.688 [info] <0.184.0> message_store upgrades: All upgrades applied successfully
2017-12-18 12:23:02.721 [info] <0.184.0> Management plugin: using rates mode 'basic'
2017-12-18 12:23:02.722 [info] <0.184.0> Adding vhost '/'
2017-12-18 12:23:02.740 [info] <0.407.0> Making sure data directory '/var/lib/rabbitmq/mnesia/rabbit@b133becbbdce/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L' for vhost '/' exists
2017-12-18 12:23:02.745 [info] <0.407.0> Starting message stores for vhost '/'
2017-12-18 12:23:02.746 [info] <0.411.0> Message store "628WB79CIFDYO9LJI6DKMI09L/msg_store_transient": using rabbit_msg_store_ets_index to provide index
2017-12-18 12:23:02.747 [info] <0.407.0> Started message store of type transient for vhost '/'
2017-12-18 12:23:02.747 [info] <0.414.0> Message store "628WB79CIFDYO9LJI6DKMI09L/msg_store_persistent": using rabbit_msg_store_ets_index to provide index
2017-12-18 12:23:02.748 [warning] <0.414.0> Message store "628WB79CIFDYO9LJI6DKMI09L/msg_store_persistent": rebuilding indices from scratch
2017-12-18 12:23:02.749 [info] <0.407.0> Started message store of type persistent for vhost '/'
2017-12-18 12:23:02.750 [info] <0.184.0> Creating user 'admin'
2017-12-18 12:23:02.753 [info] <0.184.0> Setting user tags for user 'admin' to [administrator]
2017-12-18 12:23:02.755 [info] <0.184.0> Setting permissions for 'admin' in '/' to '.*', '.*', '.*'
2017-12-18 12:23:02.762 [info] <0.454.0> started TCP Listener on [::]:5672
2017-12-18 12:23:02.766 [info] <0.184.0> Setting up a table for connection tracking on this node: tracked_connection_on_node_rabbit@b133becbbdce
2017-12-18 12:23:02.769 [info] <0.184.0> Setting up a table for per-vhost connection counting on this node: tracked_connection_per_vhost_on_node_rabbit@b133becbbdce
2017-12-18 12:23:02.770 [info] <0.33.0> Application rabbit started on node rabbit@b133becbbdce
2017-12-18 12:23:02.770 [info] <0.33.0> Application amqp_client started on node rabbit@b133becbbdce
2017-12-18 12:23:02.770 [info] <0.33.0> Application cowboy started on node rabbit@b133becbbdce
2017-12-18 12:23:02.771 [info] <0.33.0> Application rabbitmq_web_dispatch started on node rabbit@b133becbbdce
2017-12-18 12:23:02.773 [info] <0.33.0> Application rabbitmq_management_agent started on node rabbit@b133becbbdce
2017-12-18 12:23:02.802 [info] <0.520.0> Management plugin started. Port: 15672
2017-12-18 12:23:02.802 [info] <0.626.0> Statistics database started.
 completed with 3 plugins.
2017-12-18 12:23:03.093 [info] <0.5.0> Server startup complete; 3 plugins started.
 * rabbitmq_management
 * rabbitmq_management_agent
 * rabbitmq_web_dispatch
2017-12-18 12:23:03.142 [warning] <0.32.0> lager_error_logger_h dropped 91 messages in the last second that exceeded the limit of 100 messages/sec
2017-12-18 12:23:17.812 [info] <0.740.0> accepting AMQP connection <0.740.0> (10.2.0.164:53602 -> 172.18.0.7:5672)
2017-12-18 12:23:17.812 [warning] <0.740.0> closing AMQP connection <0.740.0> (10.2.0.164:53602 -> 172.18.0.7:5672):

According to the logs, the node is able to discover the other nodes in my Docker Swarm but is unable to connect to them by their FQDNs. Interestingly, when I run hostname inside the RabbitMQ containers, it returns the short name rather than the FQDN:

rabbitmq@b133becbbdce:/$ hostname
b133becbbdce
rabbitmq@b133becbbdce:/$ hostname -f
b133becbbdce
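One way to make the short-versus-qualified difference visible is to group node names by the form of their host part. The sketch below is only an illustration: `split_node_name` and `group_by_host_form` are hypothetical helper names, and treating "contains a dot" as "qualified" is an assumption made for this comparison. Whether the difference in name form is what prevents clustering here is not established in this thread.

```python
from typing import Dict, Iterable, List, Tuple


def split_node_name(node: str) -> Tuple[str, str]:
    """Split an Erlang-style node name such as 'rabbit@host' into (name, host)."""
    name, sep, host = node.partition("@")
    if not sep or not name or not host:
        raise ValueError(f"not a name@host node name: {node!r}")
    return name, host


def group_by_host_form(nodes: Iterable[str]) -> Dict[str, List[str]]:
    """Group node names by whether the host part contains a dot.

    Assumption: a dot in the host part is used here as a rough stand-in
    for 'qualified name'; it is a heuristic for comparison only.
    """
    groups: Dict[str, List[str]] = {"short": [], "dotted": []}
    for node in nodes:
        _, host = split_node_name(node)
        groups["dotted" if "." in host else "short"].append(node)
    return groups
```

Applied to the node names in the logs above, the local node `rabbit@b133becbbdce` lands in the "short" group while the two peers it failed to reach land in the "dotted" group.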

Have the RabbitMQ team got RabbitMQ working on Docker Swarm? Is this a supported scenario? I'm assuming you'd want it to be. There are docs for Kubernetes but not Swarm.

@michaelklishin
Member

Thank you for your time.

Team RabbitMQ uses GitHub issues for specific actionable items engineers can work on. This assumes two things:

  1. GitHub issues are not used for questions, investigations, root cause analysis, discussions of potential issues, etc (as defined by this team)
  2. We have a certain amount of information to work with

We get at least a dozen questions through various venues every single day, often quite light on details.
At that rate, GitHub issues can very quickly turn into something impossible to navigate and make sense of, even for our team. Because of that, questions, investigations, root cause analysis, and discussions of potential features are all considered to be mailing list material by our team. Please post this to rabbitmq-users.

Getting all the details necessary to reproduce an issue, make a conclusion or even form a hypothesis about what's happening can take a fair amount of time. Our team is multiple orders of magnitude smaller than the RabbitMQ community. Please help others help you by providing a way to reproduce the behavior you're observing, or at least sharing as much relevant information as possible on the list:

  • Server, client library and plugin (if applicable) versions used
  • Server logs
  • A code example or terminal transcript that can be used to reproduce
  • Full exception stack traces (not a single line message)
  • rabbitmqctl status (and, if possible, rabbitmqctl environment output)
  • Other relevant things about the environment and workload, e.g. a traffic capture

Feel free to edit out hostnames and other potentially sensitive information.

When/if we have enough details and evidence we'd be happy to file a new issue.

Thank you.

@michaelklishin
Member

DNS-based discovery requires that the hostname specified in the config resolves to at least one A or AAAA record, and that a reverse DNS lookup works for each returned address. That's it. Docker Swarm doesn't change the way DNS works, so it shouldn't matter.
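The requirement above (forward resolution, then a reverse lookup per address) can be checked from inside a container with a short script. This is a minimal sketch, not RabbitMQ's own code: the function names are hypothetical, and it uses the system resolver through Python's `socket` module, which may not behave identically to the resolver the Erlang runtime uses.

```python
import socket
from typing import Callable, Dict, List, Optional


def forward_lookup(hostname: str) -> List[str]:
    """Return the unique IPv4/IPv6 addresses that `hostname` resolves to."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def reverse_lookup(address: str) -> Optional[str]:
    """Return the primary name for `address`, or None if reverse DNS fails."""
    try:
        return socket.gethostbyaddr(address)[0]
    except OSError:
        return None


def check_discovery_hostname(
    hostname: str,
    forward: Callable[[str], List[str]] = forward_lookup,
    reverse: Callable[[str], Optional[str]] = reverse_lookup,
) -> Dict[str, Optional[str]]:
    """Map each address behind `hostname` to its reverse-resolved name.

    A value of None marks an address with no working reverse lookup.
    The resolver functions are injectable so the logic can be exercised
    without network access.
    """
    return {address: reverse(address) for address in forward(hostname)}
```

Calling `check_discovery_hostname("rabbitmq")` inside one of the service's containers would show which names the discovered addresses map back to; those names can then be compared with the names the nodes actually run under.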

How you manage DNS records in a Docker environment is a question that is in no way specific to RabbitMQ.

@michaelklishin
Member

2017-12-18 12:23:02.480 [info] <0.184.0> Peer discovery backend rabbit_peer_discovery_dns does not support registration, skipping randomized startup delay.
2017-12-18 12:23:02.481 [info] <0.184.0> Addresses discovered via A records of rabbitmq: 10.0.1.22, 10.0.1.7, 10.0.1.12
2017-12-18 12:23:02.484 [info] <0.184.0> Addresses discovered via AAAA records of rabbitmq:
2017-12-18 12:23:02.484 [info] <0.184.0> All discovered existing cluster peers: rabbit@bridge-apps-test_rabbitmq.3.txyqdhq5tag4ypvqxmrrv7xux.bridge-apps-test_bridgeoverlay, rabbit@9d08e408217e.bridge-apps-test_bridgeoverlay, rabbit@b133becbbdce
2017-12-18 12:23:02.484 [info] <0.184.0> Peer nodes we can cluster with: rabbit@bridge-apps-test_rabbitmq.3.txyqdhq5tag4ypvqxmrrv7xux.bridge-apps-test_bridgeoverlay, rabbit@9d08e408217e.bridge-apps-test_bridgeoverlay
2017-12-18 12:23:02.490 [warning] <0.184.0> Could not auto-cluster with node rabbit@bridge-apps-test_rabbitmq.3.txyqdhq5tag4ypvqxmrrv7xux.bridge-apps-test_bridgeoverlay: {badrpc,nodedown}
2017-12-18 12:23:02.496 [warning] <0.184.0> Could not auto-cluster with node rabbit@9d08e408217e.bridge-apps-test_bridgeoverlay: {badrpc,nodedown}
2017-12-18 12:23:02.496 [warning] <0.184.0> Could not successfully contact any node of: rabbit@bridge-apps-test_rabbitmq.3.txyqdhq5tag4ypvqxmrrv7xux.bridge-apps-test_bridgeoverlay,rabbit@9d08e408217e.bridge-apps-test_bridgeoverlay (as in Erlang distribution). Starting as a blank standalone node...

These are the relevant log lines. Peer discovery succeeds, but cluster formation doesn't.
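To separate the two stages when reading longer logs, the discovered peers and the per-node join failures can be pulled out mechanically. The following is a rough sketch under stated assumptions: the regular expressions are written against the message wording seen in the lines quoted in this thread (RabbitMQ 3.7.0) and may not match other versions, and `summarize_peer_discovery` is a hypothetical helper name.

```python
import re
from typing import Dict, Iterable, List, Union

# Patterns based on the message wording quoted in this thread.
PEERS_RE = re.compile(r"All discovered existing cluster peers: (?P<peers>.+)$")
FAILED_RE = re.compile(
    r"Could not auto-cluster with node (?P<node>\S+): (?P<reason>.+)$"
)

Summary = Dict[str, Union[List[str], Dict[str, str]]]


def summarize_peer_discovery(log_lines: Iterable[str]) -> Summary:
    """Collect discovered peers and per-node auto-cluster failures."""
    discovered: List[str] = []
    failures: Dict[str, str] = {}
    for raw in log_lines:
        line = raw.strip()
        match = PEERS_RE.search(line)
        if match:
            # Peers are logged as a comma-separated list of node names.
            discovered = [p.strip() for p in match.group("peers").split(",")]
            continue
        match = FAILED_RE.search(line)
        if match:
            failures[match.group("node")] = match.group("reason")
    return {"discovered": discovered, "failures": failures}
```

For the excerpt quoted above, this reports three discovered peers and two failures, both with the reason `{badrpc,nodedown}`.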

@michaelklishin added the "not-enough-information" (Please provide more information by starting a mailing list thread.) and "mailing list material" (This belongs to the mailing list (rabbitmq-users on Google Groups)) labels Dec 18, 2017
@rabbitmq locked as off-topic and limited conversation to collaborators Dec 18, 2017