Multi-master DNS Issue - minion connect fail #49520

Closed
doesitblend opened this issue Sep 5, 2018 · 7 comments

@doesitblend
Contributor

commented Sep 5, 2018

Description of Issue/Question

When using multi-master with all keys accepted, a minion does not connect to the other masters when the first master in the list is down. The same problem also appears when any master in the list is down.

In a Docker container test environment with 4 masters and 2 minions, where each minion connects to all masters by DNS name, stop the first master and then restart the two minions. Both minions will likely log the following error and be unresponsive to any other master that is still up:

minion1_1  | [ERROR   ] DNS lookup or connection check of 'master1' failed.
minion1_1  | [ERROR   ] Master hostname: 'master1' not found or not responsive. Retrying in 30 seconds
minion1_1  | [ERROR   ] DNS lookup or connection check of 'master1' failed.
minion1_1  | [ERROR   ] Master hostname: 'master1' not found or not responsive. Retrying in 30 seconds
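To narrow down which names in the master list actually fail to resolve from inside a minion container, a quick check along these lines may help. This is only a diagnostic sketch and not part of Salt: `resolve_masters` is a hypothetical helper, and port 4506 is assumed to be the master's return port (the Salt default).

```python
import socket


def resolve_masters(hostnames, port=4506, resolver=socket.getaddrinfo):
    """Map each hostname to a sorted list of resolved IPs, or None on DNS failure.

    `resolver` is injectable so the function can be exercised without a network.
    """
    results = {}
    for name in hostnames:
        try:
            infos = resolver(name, port, proto=socket.IPPROTO_TCP)
        except socket.gaierror:
            # The name could not be resolved at all (e.g. the container is stopped).
            results[name] = None
        else:
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the IP address for both IPv4 and IPv6.
            results[name] = sorted({info[4][0] for info in infos})
    return results
```

Calling `resolve_masters(["master1", "master2", "master3", "master4"])` on the minion while master1 is stopped should show whether only `master1` maps to `None`, which would match the log lines above.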

Setup

My minion configuration for both minions is:

[root@minion1 /]# cat /etc/salt/minion
master: 
  - master1
  - master2
  - master3
  - master4

log_level_logfile: trace
master_alive_interval: 20

My master configuration for all masters is:

auto_accept: True
log_level_logfile: trace

Versions Report

Salt Version:
           Salt: 2018.3.2
 
Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: Not Installed
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.7.2
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.5.6
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 2.7.5 (default, Jul 13 2018, 13:06:57)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 15.3.0
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.1.4
 
System Versions:
           dist: centos 7.5.1804 Core
         locale: ANSI_X3.4-1968
        machine: x86_64
        release: 4.9.93-linuxkit-aufs
         system: Linux
        version: CentOS Linux 7.5.1804 Core

@doesitblend doesitblend added the ZD label Sep 5, 2018

@doesitblend

Contributor Author

commented Sep 5, 2018

ZD-2774

@Ch3LL

Contributor

commented Sep 6, 2018

Just to clarify, does this only occur when using DNS names for the masters in the list?

@Ch3LL Ch3LL added the Info Needed label Sep 6, 2018

@Ch3LL Ch3LL added this to the Blocked milestone Sep 6, 2018

@doesitblend

Contributor Author

commented Sep 6, 2018

@Ch3LL Yes, this only appears to happen when using DNS. If I specify all masters to use the IP address and leave master1 down the minions start up fine and I'm able to run commands from each master.
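For anyone wanting to try the same IP-based workaround, here is a small sketch that renders the `master:` list from literal addresses and rejects anything that is still a hostname. `render_master_block` is a hypothetical helper, and the addresses in the usage note are made up for illustration; they are not the ones from my test environment.

```python
import ipaddress


def render_master_block(addresses):
    """Render a minion `master:` list, accepting only literal IP addresses."""
    lines = ["master:"]
    for addr in addresses:
        # ip_address() raises ValueError for anything that is not a literal
        # IPv4/IPv6 address, so a leftover DNS name is caught early.
        ipaddress.ip_address(addr)
        lines.append("  - " + addr)
    return "\n".join(lines) + "\n"
```

For example, `render_master_block(["172.18.0.2", "172.18.0.3"])` returns a two-entry `master:` block that can be pasted into `/etc/salt/minion`, while passing `"master1"` raises `ValueError`.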

@Ch3LL

Contributor

commented Sep 7, 2018

thanks for clarifying :)

@Ch3LL Ch3LL added Bug Critical P3 team-core and removed Info Needed labels Sep 7, 2018

@Ch3LL Ch3LL modified the milestones: Blocked, Approved Sep 7, 2018

@Ch3LL

Contributor

commented Sep 7, 2018

Looks like I'm able to replicate this. In fact, I see this was not working in 2017.7.5, 2017.7.7, 2018.3.0, 2018.3.2, and the head of 2018.3. Do you know if this was ever working? I was hoping to bisect it, but from what I can see this was never working.

@rallytime

Contributor

commented Sep 7, 2018

@garethgreenaway Can you take a look here?

@rallytime

Contributor

commented Oct 1, 2018

@doesitblend This should be resolved with the changes in #49764 with the retry_dns_count option.
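To illustrate the idea behind a bounded DNS retry count, here is a rough sketch. It is not the code from #49764 and not Salt's implementation: `first_resolvable` is a hypothetical function, the `retry_dns_count` parameter simply borrows the option's name, and `retry_dns=30` is assumed from the "Retrying in 30 seconds" message in the log. The point is only that giving up on one name after a fixed number of attempts lets the loop move on to the next master instead of retrying the first one forever.

```python
import socket
import time


def first_resolvable(masters, retry_dns_count=3, retry_dns=30,
                     resolver=socket.gethostbyname, sleep=time.sleep):
    """Return (hostname, ip) for the first master that resolves, else None.

    Each hostname gets at most `retry_dns_count` lookups, with `retry_dns`
    seconds between failed attempts, before the next master is tried.
    """
    for master in masters:
        for attempt in range(retry_dns_count):
            try:
                return master, resolver(master)
            except socket.gaierror:
                # Wait before retrying the same name, but not after the
                # final attempt: fall through to the next master instead.
                if attempt + 1 < retry_dns_count:
                    sleep(retry_dns)
    return None
```

With master1 down and `retry_dns_count=2`, this sketch tries `master1` twice, sleeps once in between, and then returns the address of `master2`.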

@rallytime rallytime closed this Oct 11, 2018
