Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple returns with syndic #21601

Closed
ev0rtex opened this issue Mar 12, 2015 · 14 comments
Closed

Multiple returns with syndic #21601

ev0rtex opened this issue Mar 12, 2015 · 14 comments
Labels
Bug broken, incorrect, or confusing behavior cannot-reproduce cannot be replicated with info/context provided Duplicate Duplicate of another issue or PR - will be closed severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around
Milestone

Comments

@ev0rtex
Copy link

ev0rtex commented Mar 12, 2015

I am trying to figure out if I've done something wrong or if this is actually a bug of some sort. My setup has a main salt master and lower-level masters running the syndic (I'll call these syndic-masters). Whenever I try to run something (e.g. test.ping) from the main master targeting minions under a syndic-master I am seeing multiple returns:

root@salt:~# salt ddmvarnish01-ssl.deseretdigital.com test.ping

ddmvarnish01-ssl.deseretdigital.com:
    True
ddmvarnish01-ssl.deseretdigital.com:
    True
ddmvarnish01-ssl.deseretdigital.com:
    True
ddmvarnish01-ssl.deseretdigital.com:
    True
ddmvarnish01-ssl.deseretdigital.com:
    True
ddmvarnish01-ssl.deseretdigital.com:
    True

in the debug output on the targeted minion I see a lot of this:

[INFO    ] User root Executing command test.ping with jid 20150312124653507569
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20150312124653507569', 'tgt': 'ddmvarnish01-ssl.deseretdigital.com', 'ret': '', 'user': 'root', 'arg': [], 'fun': 'test.ping'}
[INFO    ] User root Executing command test.ping with jid 20150312124653507569
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20150312124653507569', 'tgt': 'ddmvarnish01-ssl.deseretdigital.com', 'ret': '', 'user': 'root', 'arg': [], 'fun': 'test.ping'}
[INFO    ] Starting a new job with PID 20191
[INFO    ] User sudo_nbrooks Executing command test.ping with jid 20150312124653507569
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20150312124653507569', 'tgt': 'ddmvarnish01-ssl.deseretdigital.com', 'ret': '', 'user': 'sudo_nbrooks', 'arg': [], 'fun': 'test.ping'}
[INFO    ] Returning information for job: 20150312124653507569
[INFO    ] Starting a new job with PID 20194
[INFO    ] Returning information for job: 20150312124653507569
[INFO    ] User sudo_nbrooks Executing command test.ping with jid 20150312124653507569
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20150312124653507569', 'tgt': 'ddmvarnish01-ssl.deseretdigital.com', 'ret': '', 'user': 'sudo_nbrooks', 'arg': [], 'fun': 'test.ping'}
[INFO    ] Starting a new job with PID 20201
[INFO    ] Returning information for job: 20150312124653507569
[INFO    ] User sudo_nbrooks Executing command test.ping with jid 20150312124653507569
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20150312124653507569', 'tgt': 'ddmvarnish01-ssl.deseretdigital.com', 'ret': '', 'user': 'sudo_nbrooks', 'arg': [], 'fun': 'test.ping'}
[INFO    ] User root Executing command test.ping with jid 20150312124653507569
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20150312124653507569', 'tgt': 'ddmvarnish01-ssl.deseretdigital.com', 'ret': '', 'user': 'root', 'arg': [], 'fun': 'test.ping'}
[INFO    ] Starting a new job with PID 20206
[INFO    ] Returning information for job: 20150312124653507569
[INFO    ] Starting a new job with PID 20209
[INFO    ] Returning information for job: 20150312124653507569
[INFO    ] Starting a new job with PID 20215
[INFO    ] Returning information for job: 20150312124653507569
        .
        .
        .
[INFO    ] User root Executing command saltutil.find_job with jid 20150312124658526024
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20150312124658526024', 'tgt': 'ddmvarnish01-ssl.deseretdigital.com', 'ret': '', 'user': 'root', 'arg': ['20150312124653507569'], 'fun': 'saltutil.find_job'}
[INFO    ] User sudo_nbrooks Executing command saltutil.find_job with jid 20150312124658526024
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20150312124658526024', 'tgt': 'ddmvarnish01-ssl.deseretdigital.com', 'ret': '', 'user': 'sudo_nbrooks', 'arg': ['20150312124653507569'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 20232
[INFO    ] Returning information for job: 20150312124658526024
[INFO    ] User root Executing command saltutil.find_job with jid 20150312124658526024
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20150312124658526024', 'tgt': 'ddmvarnish01-ssl.deseretdigital.com', 'ret': '', 'user': 'root', 'arg': ['20150312124653507569'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 20235
[INFO    ] Returning information for job: 20150312124658526024
[INFO    ] User root Executing command saltutil.find_job with jid 20150312124658526024
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20150312124658526024', 'tgt': 'ddmvarnish01-ssl.deseretdigital.com', 'ret': '', 'user': 'root', 'arg': ['20150312124653507569'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 20240
[INFO    ] User sudo_nbrooks Executing command saltutil.find_job with jid 20150312124658526024
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20150312124658526024', 'tgt': 'ddmvarnish01-ssl.deseretdigital.com', 'ret': '', 'user': 'sudo_nbrooks', 'arg': ['20150312124653507569'], 'fun': 'saltutil.find_job'}
[INFO    ] Returning information for job: 20150312124658526024
[INFO    ] User sudo_nbrooks Executing command saltutil.find_job with jid 20150312124658526024
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20150312124658526024', 'tgt': 'ddmvarnish01-ssl.deseretdigital.com', 'ret': '', 'user': 'sudo_nbrooks', 'arg': ['20150312124653507569'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 20246
[INFO    ] Starting a new job with PID 20251
[INFO    ] Returning information for job: 20150312124658526024
[INFO    ] Starting a new job with PID 20253
[INFO    ] Returning information for job: 20150312124658526024
[INFO    ] Returning information for job: 20150312124658526024
        .
        .
        .
[DEBUG   ] Failed to authenticate message
[DEBUG   ] Waiting 7 seconds to re-authenticate
[DEBUG   ] Attempting to authenticate with the Salt Master at 10.250.13.19
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[INFO    ] Authentication with master at 10.250.13.19 successful!
[INFO    ] User root Executing command test.ping with jid 20150312125127215638
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20150312125127215638', 'tgt': 'ddmvarnish01-ssl.deseretdigital.com', 'ret': '', 'user': 'root', 'arg': [], 'fun': 'test.ping'}
[INFO    ] User sudo_nbrooks Executing command test.ping with jid 20150312125127215638
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20150312125127215638', 'tgt': 'ddmvarnish01-ssl.deseretdigital.com', 'ret': '', 'user': 'sudo_nbrooks', 'arg': [], 'fun': 'test.ping'}
[INFO    ] User sudo_nbrooks Executing command test.ping with jid 20150312125127215638
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20150312125127215638', 'tgt': 'ddmvarnish01-ssl.deseretdigital.com', 'ret': '', 'user': 'sudo_nbrooks', 'arg': [], 'fun': 'test.ping'}
[INFO    ] Starting a new job with PID 20320
[INFO    ] Returning information for job: 20150312125127215638
[INFO    ] User sudo_nbrooks Executing command test.ping with jid 20150312125127215638
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20150312125127215638', 'tgt': 'ddmvarnish01-ssl.deseretdigital.com', 'ret': '', 'user': 'sudo_nbrooks', 'arg': [], 'fun': 'test.ping'}
[INFO    ] Starting a new job with PID 20323
[INFO    ] Returning information for job: 20150312125127215638
[INFO    ] Starting a new job with PID 20328
[INFO    ] Returning information for job: 20150312125127215638
[INFO    ] User root Executing command test.ping with jid 20150312125127215638
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20150312125127215638', 'tgt': 'ddmvarnish01-ssl.deseretdigital.com', 'ret': '', 'user': 'root', 'arg': [], 'fun': 'test.ping'}
[INFO    ] Starting a new job with PID 20330
[INFO    ] Returning information for job: 20150312125127215638
[INFO    ] User root Executing command test.ping with jid 20150312125127215638
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20150312125127215638', 'tgt': 'ddmvarnish01-ssl.deseretdigital.com', 'ret': '', 'user': 'root', 'arg': [], 'fun': 'test.ping'}
[INFO    ] User root Executing command saltutil.find_job with jid 20150312125132228694
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20150312125132228694', 'tgt': 'ddmvarnish01-ssl.deseretdigital.com', 'ret': '', 'user': 'root', 'arg': ['20150312125127215638'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 20340
[INFO    ] Returning information for job: 20150312125127215638
[INFO    ] User sudo_nbrooks Executing command saltutil.find_job with jid 20150312125132228694
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20150312125132228694', 'tgt': 'ddmvarnish01-ssl.deseretdigital.com', 'ret': '', 'user': 'sudo_nbrooks', 'arg': ['20150312125127215638'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 20343
[INFO    ] Returning information for job: 20150312125127215638
[INFO    ] User sudo_nbrooks Executing command saltutil.find_job with jid 20150312125132228694
[INFO    ] Starting a new job with PID 20348
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20150312125132228694', 'tgt': 'ddmvarnish01-ssl.deseretdigital.com', 'ret': '', 'user': 'sudo_nbrooks', 'arg': ['20150312125127215638'], 'fun': 'saltutil.find_job'}
[INFO    ] Returning information for job: 20150312125132228694
[INFO    ] Starting a new job with PID 20352
[INFO    ] Returning information for job: 20150312125132228694
[INFO    ] User sudo_nbrooks Executing command saltutil.find_job with jid 20150312125132228694
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20150312125132228694', 'tgt': 'ddmvarnish01-ssl.deseretdigital.com', 'ret': '', 'user': 'sudo_nbrooks', 'arg': ['20150312125127215638'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 20360
[INFO    ] Returning information for job: 20150312125132228694
[INFO    ] User root Executing command saltutil.find_job with jid 20150312125132228694
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20150312125132228694', 'tgt': 'ddmvarnish01-ssl.deseretdigital.com', 'ret': '', 'user': 'root', 'arg': ['20150312125127215638'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 20365
[INFO    ] User root Executing command saltutil.find_job with jid 20150312125132228694
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20150312125132228694', 'tgt': 'ddmvarnish01-ssl.deseretdigital.com', 'ret': '', 'user': 'root', 'arg': ['20150312125127215638'], 'fun': 'saltutil.find_job'}
[INFO    ] Returning information for job: 20150312125132228694
[INFO    ] Starting a new job with PID 20368
[INFO    ] Returning information for job: 20150312125132228694
[INFO    ] Starting a new job with PID 20372
[INFO    ] Returning information for job: 20150312125132228694
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem

I have no idea what's happening here but if I'm not mistaken I think I'm seeing the same job actually being run multiple times on the minion. This issue does not happen if I run the same command from the syndic-master.

@rallytime
Copy link
Contributor

@ev0rtex Thanks for filing this issue @ev0rtex. Can you tell us what version of salt you're running by posting the output of salt --versions-report for your "Super Master", syndic, and affected minion?

@rallytime rallytime added the info-needed waiting for more info label Mar 13, 2015
@rallytime rallytime added this to the Blocked milestone Mar 13, 2015
@ev0rtex
Copy link
Author

ev0rtex commented Mar 16, 2015

Thanks for the reply @rallytime. Our servers are all running the same version:

Super Master:

           Salt: 2014.7.1
         Python: 2.7.3 (default, Mar 13 2014, 11:03:55)
         Jinja2: 2.6
       M2Crypto: 0.21.1
 msgpack-python: 0.1.10
   msgpack-pure: Not Installed
       pycrypto: 2.6
        libnacl: Not Installed
         PyYAML: 3.10
          ioflo: Not Installed
          PyZMQ: 13.1.0
           RAET: Not Installed
            ZMQ: 4.0.4
           Mako: 0.7.0

Syndic:

           Salt: 2014.7.1
         Python: 2.7.3 (default, Mar 13 2014, 11:03:55)
         Jinja2: 2.6
       M2Crypto: 0.21.1
 msgpack-python: 0.1.10
   msgpack-pure: Not Installed
       pycrypto: 2.6
        libnacl: Not Installed
         PyYAML: 3.10
          ioflo: Not Installed
          PyZMQ: 13.1.0
           RAET: Not Installed
            ZMQ: 4.0.4
           Mako: 0.7.0

Minion:

           Salt: 2014.7.1
         Python: 2.7.3 (default, Mar 13 2014, 11:03:55)
         Jinja2: 2.6
       M2Crypto: 0.21.1
 msgpack-python: 0.1.10
   msgpack-pure: Not Installed
       pycrypto: 2.6
        libnacl: Not Installed
         PyYAML: 3.10
          ioflo: Not Installed
          PyZMQ: 13.1.0
           RAET: Not Installed
            ZMQ: 4.0.4
           Mako: 0.7.0

@ev0rtex
Copy link
Author

ev0rtex commented Mar 16, 2015

Also, if it helps, here are the relevant config file settings (minus some defined nodegroups and some grains):

Super Master (/etc/salt/master):

file_roots:
  base:
    - /srv/salt
    - /srv/files

pillar_roots:
  base:
    - /srv/pillar

order_masters: True

Syndic (/etc/salt/master):

file_roots:
  base:
    - /srv/salt

pillar_roots:
  base:
    - /srv/pillar

# Modified to protect master hostname
syndic_master: salt.example.com

Syndic (/etc/salt/minion):

# Modified to protect master hostname
master: salt.example.com

Minion (/etc/salt/minion):

# Modified to protect syndic hostname
master: syndic.example.com

@rallytime
Copy link
Contributor

@ev0rtex Do you have a minion installed on the box that has the Syndic/Master? I am wondering if the config value you have listed in the Syndic's /etc/salt/minion file is causing these multiple returns. If you do not have a minion on the Syndic box, you shouldn't need an /etc/salt/minion file. The syndic_master config in /etc/salt/master should take care of "pointing" the Syndic to the "Super Master".

@ev0rtex
Copy link
Author

ev0rtex commented Mar 16, 2015

@rallytime I do run a minion on the syndic as evidenced by my config files above. I have tried shutting down the minion in case that was part of the problem but it makes no difference.

@rallytime
Copy link
Contributor

Ok I just wanted to make sure that wasn't it as I have been setting up many syndic configurations lately and I have not seen this problem. There isn't anything else standing out to me here in your configs.

@basepi or @UtahDave have either of you seen this behavior?

@basepi
Copy link
Contributor

basepi commented Mar 17, 2015

This is very odd. The only time I would expect to see behavior like this is in a multi-master syndic setup. If minions are connected to multiple masters, and those masters are all connected to a master of masters, then the minion will get the job multiple times (once from each master). Are you using multi-master at all?

@ev0rtex
Copy link
Author

ev0rtex commented Mar 19, 2015

@basepi We don't have a multi-master setup. As I'm thinking about it I know we switched the minions to point to the syndic in the past. Prior to doing so they originally pointed at our super master before we had the syndic master in place. When I did the switch I deleted the minion keys from the super master and then I pointed the minions to the syndic master. I can't imagine that anything would be wrong with how I did this though. I'm pretty stumped at this point and nothing I have tried thus far seems to make a difference.

@basepi
Copy link
Contributor

basepi commented Mar 20, 2015

@ev0rtex Nothing sounds wrong with that setup or the changes to your setup.

I wonder if the various fixes we have put in place for the syndic in the last week or so would help this issue? We're hoping to get 2015.2.0rc2 out today or Monday and it will have those changes, would be useful if you could test there.

Another thought I had -- you should double check that you only have one copy of the syndic running on your lower level masters. Otherwise I could see duplicate events coming through and potentially causing these issues.

If neither of those things helps, I'm a little stumped too....

@basepi basepi added Bug broken, incorrect, or confusing behavior severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around cannot-reproduce cannot be replicated with info/context provided and removed info-needed waiting for more info labels Mar 20, 2015
@basepi basepi modified the milestones: Approved, Blocked Mar 20, 2015
@claudiupopescu
Copy link
Contributor

Check if you have multiple syndic processes running, that was my problem. I even reported this in issue #19957

@ev0rtex
Copy link
Author

ev0rtex commented Mar 23, 2015

@claudiupopescu, @basepi I don't know why I didn't think to check that earlier. It does appear that I'm affected by that problem. I just checked and there were like 20 salt-syndic processes running. I definitely have that exact same issue where stopping the salt-syndic service does not kill the salt-syndic processes.

@rallytime
Copy link
Contributor

@ev0rtex Glad that we figured it out! I didn't think to check that either. I think that in this case we should close this in favor of the other issue and track the progress of this bug there, since it was filed earlier. That ok with you?

@rallytime rallytime added the Duplicate Duplicate of another issue or PR - will be closed label Mar 23, 2015
@ev0rtex
Copy link
Author

ev0rtex commented Mar 24, 2015

@rallytime yeah, let's watch the other issue. Thanks

@ev0rtex ev0rtex closed this as completed Mar 24, 2015
@rallytime
Copy link
Contributor

Cool! Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug broken, incorrect, or confusing behavior cannot-reproduce cannot be replicated with info/context provided Duplicate Duplicate of another issue or PR - will be closed severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around
Projects
None yet
Development

No branches or pull requests

4 participants