Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Beacon events handled differently than event.send #49680

Closed
doesitblend opened this issue Sep 17, 2018 · 10 comments

Comments

@doesitblend
Copy link
Contributor

commented Sep 17, 2018

Description of Issue/Question

I would like to open a discussion about how events are fired from the minion side. This discussion is brought up in observance of a difference between how beacons emit events versus how event.fire emits events. The difference really shows up in HA environments where the first master listed in the minion configuration is down. The problem was noticed in an HA environment with 4 masters and a couple of minions. This especially shows up with the inotify beacon on Linux. So we will focus on this. However, I believe that this behavior is also manifested when using the process beacon.

In the above environment, using IP addresses to define the masters, when the inotify beacon fires, it will fire quickly to a single random master, even if a master is down. This is different from salt-call event.fire which seems to attempt to send the event serially to each master in the configured list. This is fine except that it causes a slowdown in sending the events when attempting to connect to the down master.

I will explain some observances below.

Setup

For testing:

  1. Configure 4 masters and at least 1 minion connected to all 4 masters in active/active:
  • Use IP address, not DNS, especially in Docker since DNS will not resolve for a stopped container (depending on your configuration)
  1. I have attached my files for configuring Docker. Note that the image is built locally.
    Docker_Setup.zip
  • If you run this on Linux you may run into permission issues that do not occur on Macos since the Docker daemon runs as an admin user, not root. To solve, just fix the permissions on the keys and files under your local saltstack directory.
  • Be sure to review the docker-compose.yml file before running docker-compose up to ensure that all needed local directories are created.

Behavior of beacon (inotify specifically)

  1. monitor events on all masters
  • run test.version and check events on all masters
  1. Minion runs through the serially and stops at first alive master for sending
    the event.
  • Takes timeout period to move on to the next
  • There will be a log entry showing that the new connection is being made to the
    new master
  1. Bring the master back up and re-run the command and this master will be re-added
    to the beginning. So the minion is talking to the masters serially every single time
    instead of tracking in memory which master it should talk to.

Versions report

    Salt Version:
               Salt: 2018.3.2
     
    Dependency Versions:
               cffi: Not Installed
           cherrypy: Not Installed
           dateutil: Not Installed
          docker-py: Not Installed
              gitdb: Not Installed
          gitpython: Not Installed
              ioflo: Not Installed
             Jinja2: 2.7.2
            libgit2: Not Installed
            libnacl: Not Installed
           M2Crypto: Not Installed
               Mako: Not Installed
       msgpack-pure: Not Installed
     msgpack-python: 0.5.6
       mysql-python: Not Installed
          pycparser: Not Installed
           pycrypto: 2.6.1
       pycryptodome: Not Installed
             pygit2: Not Installed
             Python: 2.7.5 (default, Jul 13 2018, 13:06:57)
       python-gnupg: Not Installed
             PyYAML: 3.11
              PyZMQ: 15.3.0
               RAET: Not Installed
              smmap: Not Installed
            timelib: Not Installed
            Tornado: 4.2.1
                ZMQ: 4.1.4
     
    System Versions:
               dist: centos 7.5.1804 Core
             locale: UTF-8
            machine: x86_64
            release: 4.9.93-linuxkit-aufs
             system: Linux
            version: CentOS Linux 7.5.1804 Core
@doesitblend doesitblend added the ZD label Sep 17, 2018
@doesitblend doesitblend changed the title Beacon events handled differently than event.fire Beacon events handled differently than event.send Sep 17, 2018
@austinpapp

This comment has been minimized.

Copy link
Contributor

commented Sep 17, 2018

fyi. multi-master minion hanging was fixed in #49292

@austinpapp

This comment has been minimized.

Copy link
Contributor

commented Sep 17, 2018

i think we need to better understand why a beacon should fire event to random master versus broadcast versus... something else?

its well understood that event.send broadcasts. is there a reason event.send would broadcast over picking a random master? i would ask the same question as to why a beacon would pick a random master over broadcasting.

also, salt-call takes a totally different path given it needs to run through the SAuthclass.

ultimately, there is different behaviors depending on what you are doing.

@austinpapp

This comment has been minimized.

Copy link
Contributor

commented Sep 17, 2018

@garethgreenaway made a comment here #49663 (comment) and he is right. most people don't want redundant execution of a reactor given an event. however, some people may want that to happen.

so, i think a potential thought is that you make the exec mod event.send have the option to randomly pick a master to send the event (if the minion is already auth'd and not through salt-call). this aligns with beacon.

@DmitryKuzmenko

This comment has been minimized.

Copy link
Contributor

commented Sep 18, 2018

I agree, we shouldn't change the default behavior here, but we should give people an option to notify the only one master at a time. Or maybe to have any other way to not activate reactors on all masters by one event.

@austinpapp

This comment has been minimized.

Copy link
Contributor

commented Sep 18, 2018

@DmitryKuzmenko thanks for your comments. @mattp- and i are working on something that may address your last comment. 👍 however, that solution may not be for everyone. anyway, i think an option to randomly pick an alive master is very useful.

@rickh563

This comment has been minimized.

Copy link

commented Sep 18, 2018

ZD-2774

@mattp-

This comment has been minimized.

Copy link
Contributor

commented Sep 18, 2018

beacons exist on a per minion object basis (in the context that there is a minion object per master in an N master configuration, managed by MinionManager). It's not that that the minion is handling things in serial, but the mininon is by default using sync=True with minion._return_pub(); zeromq blocks on the connection failure for each down master.

I think it makes sense for all minion interaction in a hot-standby multimaster to broadcast, all of the time (which is what it currently does). as mentioned my changes in #49292 should remove dead masters with an enabling of master_alive_interval scheduling for re-addition when it comes back up. Is that the root of this ticket?

@garethgreenaway

This comment has been minimized.

Copy link
Member

commented Sep 18, 2018

In speaking with @doesitblend about this, it's my understanding that the desired situation for beacon events is the following.

In the event of having multiple masters and multiple minion processes connected to those masters, we need a way for a minion process to detect whether the master that it is connected to is down before picking up a beacon event.

We also should allow the optional behavior that will send beacon events along to all masters.

@Ch3LL Ch3LL added this to the Approved milestone Sep 19, 2018
@gaoweisgp

This comment has been minimized.

Copy link

commented Sep 21, 2018

Recap the original objectives:

  1. standardize behaviors across different methods
    a) on master, salt minion event.send- currently broadcasts to all alive masters, if no preload option specified
    b) on minion, salt-call test.version - currently sends job return event to the first alive master, based on the sequence of the mast list defined in minion config, with timeout. Subsequent run will repeat the same.
    c) on minion, beacon event - currently sends notification event to ONE "random" alive master, without any timeout. The same master will be used for the subsequent beacon events, unless that master became unavailable. The another "random" alive master is selected
  2. make the behavior configurable
@DmitryKuzmenko

This comment has been minimized.

Copy link
Contributor

commented Oct 6, 2019

Fixed by #54247

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Neon
Backlog
9 participants
You can’t perform that action at this time.