Event.send short-circuiting in multi-master mode #48415

Closed
doesitblend opened this Issue Jul 2, 2018 · 8 comments

Contributor

doesitblend commented Jul 2, 2018

Description of Issue/Question

In a multi-master setup, event.send appears to simply walk the list of masters, open a channel to each one, and fire the event onto that master's bus. This keeps working when one of the masters is down, logging [DEBUG ] SaltReqTimeoutError, retrying. (1/7). However, if any one of the masters has not yet accepted the minion key, the process dies at that master instead of continuing through the list.

The event appears on every master that precedes the failing one in the list, but the CLI command as a whole fails.
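The behavior described above can be illustrated with a small, self-contained sketch. It is not Salt's code: the exception classes, fire_on_all, and make_sender are all hypothetical names invented for illustration, and the only thing taken from this report is the pair of master addresses. It shows how a loop that tolerates only timeouts aborts on the first master with an unaccepted key, while a loop that also tolerates that error reaches the remaining masters.

```python
class KeyNotAccepted(Exception):
    """Hypothetical error: the master has not accepted the minion key."""


class MasterTimeout(Exception):
    """Hypothetical error: the master did not answer in time."""


def fire_on_all(masters, send, tolerate=(MasterTimeout,)):
    """Call send(master) for each master in order.

    Exceptions listed in `tolerate` are recorded and the loop continues.
    Any other exception propagates and aborts the loop, so masters later
    in the list are never attempted.
    Returns (delivered, failed).
    """
    delivered, failed = [], []
    for master in masters:
        try:
            send(master)
            delivered.append(master)
        except tolerate as exc:
            failed.append((master, type(exc).__name__))
    return delivered, failed


def make_sender(pending, log):
    """Build a fake send(): masters in `pending` reject the minion key."""
    def send(master):
        if master in pending:
            raise KeyNotAccepted(master)
        log.append(master)
    return send


# The two masters from the minion config in this report.
masters = ["172.17.0.2", "172.17.0.4"]
```

With the first master's key still pending, fire_on_all(masters, make_sender({"172.17.0.2"}, [])) raises KeyNotAccepted before the second master is tried. Passing tolerate=(MasterTimeout, KeyNotAccepted) instead records the failure and still delivers to 172.17.0.4.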

I've attached the trace logs from the CLI for reference. The first log shows that no connection to the second master is attempted. The second log shows that the command failed at the second master after sending the event to the first one.
salt-call_trace_eventsend.log
salt-call_trace_eventsend_master2_fail.log

My testing was primarily done in 2017.7.5, but I have also verified this in 2017.7.6.

Setup

I'm using three Docker containers: one minion and two masters. I installed Salt with the bootstrap script in each container.

Minion Config

master:
  -  172.17.0.2
  -  172.17.0.4

Steps to Reproduce Issue

  • Connect minion to two masters
  • Accept key on one of the masters
  • Leave the key in unaccepted state on the other master
  • From the minion, run: salt-call -ltrace event.send 'myco/mytag/someminion/testing4' "{'mydata':'this is neat'}"
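As a quick check on the quoting in the last step, the sketch below uses Python's shlex module to split the command the way a POSIX shell would. It only shows which arguments salt-call receives. It does not run Salt, and the names COMMAND and split_command are made up for this example.

```python
import shlex

# The reproduction command from the steps above, as one shell string.
COMMAND = (
    "salt-call -ltrace event.send "
    "'myco/mytag/someminion/testing4' "
    "\"{'mydata':'this is neat'}\""
)


def split_command(command):
    """Split a shell command line into argv using POSIX quoting rules."""
    return shlex.split(command)


argv = split_command(COMMAND)
for position, argument in enumerate(argv):
    print(position, argument)
```

The tag and the data dictionary each arrive as a single argument, so the space inside 'this is neat' does not split the data payload.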

Versions Report

Salt Version:
           Salt: 2017.7.5
 
Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: Not Installed
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.7.2
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.5.1
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 2.7.5 (default, Apr 11 2018, 07:36:10)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 15.3.0
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.1.4
 
System Versions:
           dist: centos 7.5.1804 Core
         locale: ANSI_X3.4-1968
        machine: x86_64
        release: 4.9.87-linuxkit-aufs
         system: Linux
        version: CentOS Linux 7.5.1804 Core

As a side note, I also verified this in 2017.7.6.

@doesitblend doesitblend added the ZD label Jul 2, 2018

Contributor

doesitblend commented Jul 2, 2018

ZD-2654

rickh563 commented Jul 6, 2018

@doesitblend, please test in 2017.7.7 and report back. This may already be fixed in that release.

Contributor

doesitblend commented Jul 9, 2018

@rickh563 This has been verified to also occur in 2017.7.7.

This is installed straight from the repository with install_salt.sh -X -M -P stable 2017.7.7.

Contributor

rallytime commented Jul 9, 2018

@garethgreenaway can you take a look at this for 2017.7.8?

Member

garethgreenaway commented Jul 12, 2018

@doesitblend I was able to get this working by adding the following options to the minion configuration:

master_type: failover
master_shuffle: True

Can you confirm that adding these two items solves the issue for you?

Member

garethgreenaway commented Jul 12, 2018

Looks like I may have spoken too soon.

Member

garethgreenaway commented Jul 13, 2018

Found the cause and have a fix available. I need a bit more testing to be 100% sure, but I should be able to provide a PR tomorrow.

Contributor

rallytime commented Jul 26, 2018

@doesitblend This is fixed with #48588
