salt-minion cannot start if one of the masters is down (2014.1.5) #14099

Closed
rasturic opened this issue Jul 10, 2014 · 19 comments
Labels: Bug (broken, incorrect, or confusing behavior); Core (relates to code central or existential to Salt); fixed-pls-verify (fix is linked, bug author to confirm fix); Multi-Master; severity-medium (3rd level, incorrect or bad functionality, confusing and lacks a work around)

@rasturic

With Salt multi-master on 2014.1.5, if one of the masters is down, salt-minion crashes while trying to start. If we downgrade to 2014.1.4, the problem goes away.

Please advise.

[root@134 salt]# salt-minion --versions-report
Salt: 2014.1.5
Python: 2.6.6 (r266:84292, Jan 22 2014, 09:42:36)
Jinja2: unknown
M2Crypto: 0.20.2
msgpack-python: 0.1.13
msgpack-pure: Not Installed
pycrypto: 2.0.1
PyYAML: 3.10
PyZMQ: 2.2.0.1
ZMQ: 3.2.4

/etc/salt/minion

master: [192.168.1.10, 10.168.1.11]

One of the masters is powered off.

2014-07-10 11:30:10,398 [salt.loader ][DEBUG ] Loaded cmdmod as virtual cmd
2014-07-10 11:30:10,404 [salt.loader ][DEBUG ] Loaded virtualenv_mod as virtual virtualenv
2014-07-10 11:30:10,405 [salt.loader ][DEBUG ] Loaded djangomod as virtual django
2014-07-10 11:30:10,409 [salt.loader ][DEBUG ] Loaded linux_lvm as virtual lvm
2014-07-10 11:30:10,421 [salt.loader ][DEBUG ] Loaded syslog_return as virtual syslog
2014-07-10 11:30:10,421 [salt.loader ][DEBUG ] Loaded couchdb_return as virtual couchdb
2014-07-10 11:30:10,422 [salt.loader ][DEBUG ] Loaded carbon_return as virtual carbon
2014-07-10 11:30:10,422 [salt.loader ][DEBUG ] Loaded sqlite3_return as virtual sqlite3
2014-07-10 11:30:10,423 [salt.minion ][DEBUG ] I am qa134 and I am not supposed to start any proxies.
2014-07-10 11:30:10,425 [salt.log.setup ][ERROR ] An un-handled exception was caught by salt's global exception handler:
KeyError: 'minion'
Traceback (most recent call last):
  File "/usr/bin/salt-minion", line 14, in <module>
    salt_minion()
  File "/usr/lib/python2.6/site-packages/salt/scripts.py", line 35, in salt_minion
    minion.start()
  File "/usr/lib/python2.6/site-packages/salt/__init__.py", line 224, in start
    self.minion.tune_in()
  File "/usr/lib/python2.6/site-packages/salt/minion.py", line 465, in tune_in
    minion = minion['minion']
KeyError: 'minion'
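
The traceback ends at `minion = minion['minion']`, so a dict that was expected to carry a `'minion'` entry did not have one. The following is a minimal sketch of that failure pattern and a defensive alternative. The per-master dict layout and the names `entries` and `usable_minions` are assumptions made for illustration; they are not taken from Salt's source.

```python
# Hypothetical per-master bookkeeping; the layout is assumed, not Salt's.
entries = {
    "192.168.1.10": {"minion": "<minion object>"},  # master reachable
    "10.168.1.11": {},                              # master down, never populated
}


def usable_minions(entries):
    """Return the 'minion' values for masters whose setup completed."""
    found = []
    for master, entry in entries.items():
        minion = entry.get("minion")  # .get avoids KeyError on a missing key
        if minion is None:
            continue  # skip masters that never got a minion object
        found.append(minion)
    return found


# Direct indexing reproduces the same kind of KeyError as in the report.
try:
    entries["10.168.1.11"]["minion"]
except KeyError as exc:
    print("KeyError:", exc)

print(usable_minions(entries))
```

Whether the real fix took this shape is not stated in the thread; the sketch only shows why an unpopulated entry for the powered-off master would produce this exception.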

@basepi
Contributor

basepi commented Jul 11, 2014

Hrm, strange. Could you test with the newly-released 2014.1.7?

@rasturic
Author

I'd love to try this asap. Is there an rpm version? Looks like it has not yet hit EPEL, and for the life of me, I cannot build the rpm with the spec file included with the source tree. I'm not new to rpm building. Help please. :-)

@basepi
Contributor

basepi commented Jul 11, 2014

It's up on epel testing now, I think.

@rasturic
Author

Thanks.

Unfortunately 2014.1.7 does not fix it for me.

[root@hsm63 ~]# salt-minion

[ERROR ] Error while bring up minion for multi-master. Is master responding?
[ERROR ] An un-handled exception was caught by salt's global exception handler:
SaltClientError:
Traceback (most recent call last):
  File "/usr/bin/salt-minion", line 14, in <module>
    salt_minion()
  File "/usr/lib/python2.6/site-packages/salt/scripts.py", line 35, in salt_minion
    minion.start()
  File "/usr/lib/python2.6/site-packages/salt/__init__.py", line 224, in start
    self.minion.tune_in()
  File "/usr/lib/python2.6/site-packages/salt/minion.py", line 457, in tune_in
    minions = self.minions()
  File "/usr/lib/python2.6/site-packages/salt/minion.py", line 381, in minions
    minions = self._gen_minions()
  File "/usr/lib/python2.6/site-packages/salt/minion.py", line 373, in _gen_minions
    raise exc
SaltClientError
Traceback (most recent call last):
  File "/usr/bin/salt-minion", line 14, in <module>
    salt_minion()
  File "/usr/lib/python2.6/site-packages/salt/scripts.py", line 35, in salt_minion
    minion.start()
  File "/usr/lib/python2.6/site-packages/salt/__init__.py", line 224, in start
    self.minion.tune_in()
  File "/usr/lib/python2.6/site-packages/salt/minion.py", line 457, in tune_in
    minions = self.minions()
  File "/usr/lib/python2.6/site-packages/salt/minion.py", line 381, in minions
    minions = self._gen_minions()
  File "/usr/lib/python2.6/site-packages/salt/minion.py", line 373, in _gen_minions
    raise exc
salt.exceptions.SaltClientError
[root@hsm63 ~]#
[root@hsm63 ~]# grep ^master /etc/salt/minion
master: [ 10.44.1.62, 10.44.1.61, 10.44.1.63 ]
[root@hsm63 ~]#
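
When several masters are configured as above, it can help to confirm which ones the minion host can actually reach before starting salt-minion. Here is a rough sketch, assuming the masters listen on Salt's default return port 4506 and that a plain TCP connect is a good enough liveness signal. The function name `reachable_masters` is made up for this example, and the code targets Python 3 rather than the Python 2.6 shown in the report.

```python
import socket


def reachable_masters(masters, port=4506, timeout=3.0):
    """Split master addresses into (reachable, unreachable) by TCP connect."""
    up, down = [], []
    for host in masters:
        try:
            # create_connection raises OSError (timeouts included) on failure.
            with socket.create_connection((host, port), timeout=timeout):
                up.append(host)
        except OSError:
            down.append(host)
    return up, down
```

Calling `reachable_masters(["10.44.1.62", "10.44.1.61", "10.44.1.63"])` on the minion host would return the two lists. A master that accepts the TCP connection may still fail at the Salt protocol level, so this only narrows the problem down.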

@basepi
Contributor

basepi commented Jul 11, 2014

Thanks, we'll investigate this issue.

@basepi
Contributor

basepi commented Jul 29, 2014

For the record, I have been able to reproduce this. I don't have a fix yet, but will investigate this more this week.

@replicant0wnz
Contributor

Can confirm, seeing exact same behavior. Makes multi-master for HA pretty much useless :-(

@replicant0wnz
Contributor

[ERROR ] Error while bring up minion for multi-master. Is master responding?
[ERROR ] An un-handled exception was caught by salt's global exception handler:
SaltClientError:
Traceback (most recent call last):
  File "/usr/bin/salt-minion", line 14, in <module>
    salt_minion()
  File "/usr/lib/pymodules/python2.7/salt/scripts.py", line 35, in salt_minion
    minion.start()
  File "/usr/lib/pymodules/python2.7/salt/__init__.py", line 224, in start
    self.minion.tune_in()
  File "/usr/lib/pymodules/python2.7/salt/minion.py", line 457, in tune_in
    minions = self.minions()
  File "/usr/lib/pymodules/python2.7/salt/minion.py", line 381, in minions
    minions = self._gen_minions()
  File "/usr/lib/pymodules/python2.7/salt/minion.py", line 373, in _gen_minions
    raise exc
SaltClientError
Traceback (most recent call last):
  File "/usr/bin/salt-minion", line 14, in <module>
    salt_minion()
  File "/usr/lib/pymodules/python2.7/salt/scripts.py", line 35, in salt_minion
    minion.start()
  File "/usr/lib/pymodules/python2.7/salt/__init__.py", line 224, in start
    self.minion.tune_in()
  File "/usr/lib/pymodules/python2.7/salt/minion.py", line 457, in tune_in
    minions = self.minions()
  File "/usr/lib/pymodules/python2.7/salt/minion.py", line 381, in minions
    minions = self._gen_minions()
  File "/usr/lib/pymodules/python2.7/salt/minion.py", line 373, in _gen_minions
    raise exc
salt.exceptions.SaltClientError

@foxx

foxx commented Aug 16, 2014

This has been happening to me for the last 3 weeks as well :(

root@web10:/home/admin# salt-minion --version
salt-minion 2014.1.10 (Hydrogen)
2014-08-15 09:24:07,304 [salt.minion      ][ERROR   ] Error while bring up minion for multi-master. Is master responding?
2014-08-15 09:24:07,310 [salt.log.setup   ][ERROR   ] An un-handled exception was caught by salt's global exception handler:
SaltClientError: 
Traceback (most recent call last):
  File "/usr/bin/salt-minion", line 14, in <module>
    salt_minion()
  File "/usr/lib/pymodules/python2.7/salt/scripts.py", line 35, in salt_minion
    minion.start()
  File "/usr/lib/pymodules/python2.7/salt/__init__.py", line 224, in start
    self.minion.tune_in()
  File "/usr/lib/pymodules/python2.7/salt/minion.py", line 457, in tune_in
    minions = self.minions()
  File "/usr/lib/pymodules/python2.7/salt/minion.py", line 381, in minions
    minions = self._gen_minions()
  File "/usr/lib/pymodules/python2.7/salt/minion.py", line 373, in _gen_minions
    raise exc
SaltClientError

@basepi
Contributor

basepi commented Aug 18, 2014

Can anyone verify that this is fixed in the 2014.7 and develop branches? Assuming the fix works, we can investigate backporting it to the 2014.1 branch for 2014.1.11.

@cro modified the milestones: ww31, ww33 Aug 20, 2014
@dsumsky
Contributor

dsumsky commented Aug 27, 2014

I can confirm the same minion misbehavior in a multi-master SaltStack environment with versions 2014.1.7 and 2014.1.10 as well. Is there any update on the issue? Our SaltStack environment depends on the multi-master setup, and this is really a showstopper bug as it makes it almost unusable...

@basepi
Contributor

basepi commented Aug 27, 2014

I have backported some fixes in #15333. Would someone please test that patch and see if it fixes the issue? If so, it will be in 2014.1.11.

@dsumsky
Contributor

dsumsky commented Aug 28, 2014

Hello,
I have tested the fix on 2014.1.10 and it works. However, if all the configured masters are down or unavailable, the minion gets stuck and no error or exception is raised. IMHO, it would be nice to raise the SaltClientError exception under such conditions.

Please check my changes in the pull request: #15352
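
The behavior asked for in this comment (keep going when some masters are down, but fail loudly when none respond) can be sketched roughly as follows. This is not the code from either pull request: `SaltClientError` here is a local stand-in for `salt.exceptions.SaltClientError`, and `connect_to_masters`, `connect`, and `fake_connect` are hypothetical names used only for this illustration.

```python
class SaltClientError(Exception):
    """Local stand-in for salt.exceptions.SaltClientError (no Salt import)."""


def connect_to_masters(masters, connect):
    """Try every master; keep the ones that answer, fail only if none do.

    `connect` is a hypothetical callable that returns a connection object
    or raises SaltClientError when the master is unreachable.
    """
    connected = {}
    failures = {}
    for master in masters:
        try:
            connected[master] = connect(master)
        except SaltClientError as exc:
            failures[master] = exc  # remember why, but keep going
    if not connected:
        raise SaltClientError(
            "No configured master is responding: %s" % ", ".join(masters)
        )
    return connected, failures


def fake_connect(master):
    """Simulated connector: pretend 10.44.1.61 is powered off."""
    if master == "10.44.1.61":
        raise SaltClientError("timed out")
    return "connection to " + master


connected, failures = connect_to_masters(
    ["10.44.1.62", "10.44.1.61", "10.44.1.63"], fake_connect
)
print(sorted(connected))  # ['10.44.1.62', '10.44.1.63']
print(sorted(failures))   # ['10.44.1.61']
```

Collecting the failures instead of re-raising the first one is the design choice that lets a single powered-off master be tolerated, while the final check still surfaces the case where every master is down.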

@basepi
Contributor

basepi commented Aug 28, 2014

I commented; there's a small lint error that needs to be fixed, but otherwise I like your additions.

Could I convince you to also open a pull request against 2014.7 adding that change? We don't merge forward from 2014.1, so I want to make sure it gets into 2014.7 (which will be merged forward into develop)

@cachedout
Contributor

Based on what I'm seeing in the last few comments here, this issue seems to me to be resolved. Therefore, I'm going to go ahead and mark it as closed. If it's not truly resolved to everyone's satisfaction, just drop a comment here and we'll happily re-open it. Thanks!

@bbinet
Contributor

bbinet commented Oct 13, 2014

I still encounter this issue on Debian with Salt version 2014.1.10.

I will try to update to 2014.1.11 and try again.

@bbinet
Contributor

bbinet commented Oct 13, 2014

Sorry for the noise: this seems to be fixed in 2014.1.11.
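
For anyone checking a fleet against the release named here, a small sketch of comparing the output of `salt-minion --version` with 2014.1.11 follows. The function names are invented for this example, the second sample string is constructed rather than taken from the thread, and the comparison is only meaningful within the 2014.1.x line discussed above.

```python
import re


def parse_salt_version(text):
    """Extract (year, minor, patch) from e.g. 'salt-minion 2014.1.10 (Hydrogen)'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) for part in match.groups())


def has_multimaster_fix(text, fixed_in=(2014, 1, 11)):
    """True if the reported version is at or after the release named in the thread."""
    return parse_salt_version(text) >= fixed_in


print(has_multimaster_fix("salt-minion 2014.1.10 (Hydrogen)"))  # False
print(has_multimaster_fix("salt-minion 2014.1.11 (Hydrogen)"))  # True
```

Comparing integer tuples rather than strings matters here: as strings, "2014.1.10" would sort before "2014.1.5".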

@cachedout
Contributor

No worries, thanks @bbinet

@jfindlay removed the Core label May 26, 2015
@jfindlay added the Core label May 26, 2015