Ubuntu 12.04, salt 2014.1.3-1precise1, minion dead #12172

Closed
LeTink opened this issue Apr 21, 2014 · 12 comments · Fixed by #12969
Labels
Bug broken, incorrect, or confusing behavior severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around
Comments

@LeTink

LeTink commented Apr 21, 2014

After the latest salt-stack upgrade, minions fail to start:

[DEBUG   ] Loaded ldapmod as virtual ldap
[DEBUG   ] Loaded linux_lvm as virtual lvm
[DEBUG   ] Loaded memcache_return as virtual memcache
[DEBUG   ] Loaded couchdb_return as virtual couchdb
[DEBUG   ] Loaded syslog_return as virtual syslog
[DEBUG   ] Loaded carbon_return as virtual carbon
[DEBUG   ] Loaded sqlite3_return as virtual sqlite3
[DEBUG   ] I am playpen-andrej.domain.gone and I am not supposed to start any proxies.
[INFO    ] Minion is starting as user 'root'
[DEBUG   ] Minion 'playpen-andrej.domain.gone' trying to tune in
[DEBUG   ] Minion PUB socket URI: ipc:///var/run/salt/minion/minion_event_8a6de61b96fb02d932bc49f886927dcc944cc48b55b143ea84d16dee8877a076389177776c6bdc43ea3fea37b5fa8412740d18c6421b0b02b3e503e1c6e2c233_pub.ipc
[DEBUG   ] Minion PULL socket URI: ipc:///var/run/salt/minion/minion_event_8a6de61b96fb02d932bc49f886927dcc944cc48b55b143ea84d16dee8877a076389177776c6bdc43ea3fea37b5fa8412740d18c6421b0b02b3e503e1c6e2c233_pull.ipc
[ERROR   ] An un-handled exception was caught by salt's global exception handler:
ZMQError: File name too long
Traceback (most recent call last):
  File "/usr/bin/salt-minion", line 14, in <module>
    salt_minion()
  File "/usr/lib/pymodules/python2.7/salt/scripts.py", line 35, in salt_minion
    minion.start()
  File "/usr/lib/pymodules/python2.7/salt/__init__.py", line 224, in start
    self.minion.tune_in()
  File "/usr/lib/pymodules/python2.7/salt/minion.py", line 1232, in tune_in
    self.epub_sock.bind(epub_uri)
  File "socket.pyx", line 432, in zmq.core.socket.Socket.bind (zmq/core/socket.c:3894)
  File "checkrc.pxd", line 23, in zmq.core.checkrc._check_rc (zmq/core/socket.c:5754)
ZMQError: File name too long
Traceback (most recent call last):
  File "/usr/bin/salt-minion", line 14, in <module>
    salt_minion()
  File "/usr/lib/pymodules/python2.7/salt/scripts.py", line 35, in salt_minion
    minion.start()
  File "/usr/lib/pymodules/python2.7/salt/__init__.py", line 224, in start
    self.minion.tune_in()
  File "/usr/lib/pymodules/python2.7/salt/minion.py", line 1232, in tune_in
    self.epub_sock.bind(epub_uri)
  File "socket.pyx", line 432, in zmq.core.socket.Socket.bind (zmq/core/socket.c:3894)
  File "checkrc.pxd", line 23, in zmq.core.checkrc._check_rc (zmq/core/socket.c:5754)
zmq.error.ZMQError: File name too long
@LeTink LeTink changed the title Ubuntu 12.04, salt 2014.1.3-1precise1 Ubuntu 12.04, salt 2014.1.3-1precise1, minion dead Apr 21, 2014
@LeTink
Author

LeTink commented Apr 21, 2014

OK, did some more digging. Turns out that only my playpen's minion is karking it. Since UtahDave asked about the master's and minion's config file(s), I compared the playpen's file against a "plain" one, and it turns out that I had played with the default hash in the past, changing it from md5 to sha512 on the playpen.

This worked OK in the past, but must have suffered a regression in 2014.1.3 ... reverting to md5 fixed the issue of the minion not wanting to start any more.

@basepi
Contributor

basepi commented Apr 22, 2014

Any chance you could test this on the develop branch? I know there were some changes to the way we handle hashing, and I think the problem is that we missed one or more of the fixes while cherry-picking to 2014.1.3. Just want to make sure it's only a bug in 2014.1, and not in develop as well.

@basepi basepi added this to the Blocked milestone Apr 22, 2014
@LeTink
Author

LeTink commented Apr 22, 2014

I'd be happy to, but have to admit I don't know how to go about that?

@basepi
Contributor

basepi commented Apr 22, 2014

You can use salt bootstrap using flags to install from git, or you can just run python setup.py install --force from a salt repo to install over the top of an existing install from packages or whatever.

@LeTink
Author

LeTink commented Apr 22, 2014

OK - it would appear the regression is also present in develop, not just in the 2014.1.3 release.

After the bootstrap & changing the hash back to sha512 I still get the issues originally reported.

$ salt-minion --version
salt-minion 2014.1.0-5036-gedcf898

$ sudo salt-minion -l debug
[DEBUG ] Reading configuration from /etc/salt/minion
[DEBUG ] Including configuration from '/etc/salt/minion.d/mine.conf'
[DEBUG ] Reading configuration from /etc/salt/minion.d/mine.conf
[DEBUG ] Configuration file path: /etc/salt/minion
[INFO ] Setting up the Salt Minion "playpen-andrej.fqdn.removed
[DEBUG ] Created pidfile: /var/run/salt-minion.pid
[DEBUG ] Reading configuration from /etc/salt/minion
[DEBUG ] Including configuration from '/etc/salt/minion.d/mine.conf'
[DEBUG ] Reading configuration from /etc/salt/minion.d/mine.conf
[DEBUG ] Attempting to authenticate with the Salt Master at 210.7.46.248
[DEBUG ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[DEBUG ] Decrypting the current master AES key
[DEBUG ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[INFO ] Authentication with master successful!
[DEBUG ] Decrypting the current master AES key
[DEBUG ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[DEBUG ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[DEBUG ] Reading configuration from /etc/salt/minion
[DEBUG ] Including configuration from '/etc/salt/minion.d/mine.conf'
[DEBUG ] Reading configuration from /etc/salt/minion.d/mine.conf
[DEBUG ] Loaded groupadd as virtual group
[DEBUG ] Loaded localemod as virtual locale
[DEBUG ] Loaded linux_sysctl as virtual sysctl
[DEBUG ] Loaded debian_ip as virtual ip
[DEBUG ] Loaded parted as virtual partition
[DEBUG ] Loaded gnomedesktop as virtual gnome
[DEBUG ] Loaded aptpkg as virtual pkg
[DEBUG ] Loaded zcbuildout as virtual buildout
[DEBUG ] Loaded sysmod as virtual sys
[DEBUG ] Loaded djangomod as virtual django
[DEBUG ] Loaded upstart as virtual service
[DEBUG ] Loaded htpasswd as virtual webutil
[DEBUG ] Loaded useradd as virtual user
[DEBUG ] Loaded dpkg as virtual lowpkg
[DEBUG ] Loaded cmdmod as virtual cmd
[DEBUG ] Loaded ini_manage as virtual ini
[DEBUG ] Loaded debconfmod as virtual debconf
[DEBUG ] Loaded etcd_mod as virtual etcd
[DEBUG ] Loaded virtualenv_mod as virtual virtualenv
[DEBUG ] Loaded deb_apache as virtual apache
[WARNING ] /usr/lib/python2.7/dist-packages/salt/loader.py:1046: DeprecationWarning: The 'salt.loaded.int.module.deb_apache' module is renaming itself in it's __virtual__() function (deb_apache => apache). Please set it's virtual name as the '__virtualname__' module attribute. Example: "__virtualname__ = 'apache'"
virtual

[DEBUG ] Loaded ldapmod as virtual ldap
[DEBUG ] Loaded linux_lvm as virtual lvm
[DEBUG ] Loaded memcache_return as virtual memcache
[DEBUG ] Loaded etcd_return as virtual etcd
[DEBUG ] Loaded couchdb_return as virtual couchdb
[DEBUG ] Loaded smtp_return as virtual smtp
[DEBUG ] Loaded syslog_return as virtual syslog
[DEBUG ] Loaded carbon_return as virtual carbon
[DEBUG ] Loaded sqlite3_return as virtual sqlite3
[DEBUG ] I am playpen-andrej.fqdn.removed and I am not supposed to start any proxies. (Likely not a problem)
[INFO ] Minion is starting as user 'root'
[DEBUG ] Minion 'playpen-andrej.fqdn.removed' trying to tune in
[DEBUG ] Minion PUB socket URI: ipc:///var/run/salt/minion/minion_event_8a6de61b96fb02d932bc49f886927dcc944cc48b55b143ea84d16dee8877a076389177776c6bdc43ea3fea37b5fa8412740d18c6421b0b02b3e503e1c6e2c233_pub.ipc
[DEBUG ] Minion PULL socket URI: ipc:///var/run/salt/minion/minion_event_8a6de61b96fb02d932bc49f886927dcc944cc48b55b143ea84d16dee8877a076389177776c6bdc43ea3fea37b5fa8412740d18c6421b0b02b3e503e1c6e2c233_pull.ipc
[INFO ] Starting pub socket on ipc:///var/run/salt/minion/minion_event_8a6de61b96fb02d932bc49f886927dcc944cc48b55b143ea84d16dee8877a076389177776c6bdc43ea3fea37b5fa8412740d18c6421b0b02b3e503e1c6e2c233_pub.ipc
[ERROR ] An un-handled exception was caught by salt's global exception handler:
ZMQError: File name too long
Traceback (most recent call last):
  File "/usr/bin/salt-minion", line 14, in <module>
    salt_minion()
  File "/usr/lib/python2.7/dist-packages/salt/scripts.py", line 37, in salt_minion
    minion.start()
  File "/usr/lib/python2.7/dist-packages/salt/__init__.py", line 247, in start
    self.minion.tune_in()
  File "/usr/lib/python2.7/dist-packages/salt/minion.py", line 1296, in tune_in
    self._prepare_minion_event_system()
  File "/usr/lib/python2.7/dist-packages/salt/minion.py", line 377, in _prepare_minion_event_system
    self.epub_sock.bind(epub_uri)
  File "socket.pyx", line 432, in zmq.core.socket.Socket.bind (zmq/core/socket.c:3894)
  File "checkrc.pxd", line 23, in zmq.core.checkrc._check_rc (zmq/core/socket.c:5754)
ZMQError: File name too long
Traceback (most recent call last):
  File "/usr/bin/salt-minion", line 14, in <module>
    salt_minion()
  File "/usr/lib/python2.7/dist-packages/salt/scripts.py", line 37, in salt_minion
    minion.start()
  File "/usr/lib/python2.7/dist-packages/salt/__init__.py", line 247, in start
    self.minion.tune_in()
  File "/usr/lib/python2.7/dist-packages/salt/minion.py", line 1296, in tune_in
    self._prepare_minion_event_system()
  File "/usr/lib/python2.7/dist-packages/salt/minion.py", line 377, in _prepare_minion_event_system
    self.epub_sock.bind(epub_uri)
  File "socket.pyx", line 432, in zmq.core.socket.Socket.bind (zmq/core/socket.c:3894)
  File "checkrc.pxd", line 23, in zmq.core.checkrc._check_rc (zmq/core/socket.c:5754)
zmq.error.ZMQError: File name too long

@basepi
Contributor

basepi commented Apr 23, 2014

Awesome, thanks for testing that! We'll definitely investigate.

@dattas

dattas commented May 5, 2014

+1, can confirm that changing the hash function back to md5 fixes this for this package.

@smithjm

smithjm commented May 20, 2014

I can confirm this on Fedora 19 and Fedora 20 (salt 2014.1.4). Switching back to md5 solved the problem. Interestingly, switching to sha384 eliminated the error messages, but not the highstate hangs on the minion side when doing a salt-call.

@basepi
Contributor

basepi commented May 20, 2014

Thanks for the continued updates.

@terminalmage
Contributor

There is a known kernel limitation for socket path length, defined in <sys/un.h>:

/* Structure describing the address of an AF_LOCAL (aka AF_UNIX) socket.  */
struct sockaddr_un
  {
    __SOCKADDR_COMMON (sun_);
    char sun_path[108];         /* Path name.  */
  };

This path is limited to 107 characters, since C strings end with a null terminator. The path of the socket Salt is trying to use is 171 characters long, which is what is causing the exception.
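
The arithmetic above can be checked with a short Python sketch. It assumes the id hash is a hex digest of the minion id and that the path layout matches the one in the debug log; `event_socket_path` and `fits_in_sun_path` are hypothetical helpers written for illustration, not functions from Salt.

```python
import hashlib

SUN_PATH_SIZE = 108               # size of sun_path in <sys/un.h> on Linux
MAX_PATH_LEN = SUN_PATH_SIZE - 1  # one byte is reserved for the trailing NUL


def event_socket_path(minion_id, hash_type, suffix,
                      sock_dir="/var/run/salt/minion"):
    """Hypothetical helper: build a path shaped like the ones in the log."""
    # Assumption: the hash in the file name is a hex digest of the minion id.
    id_hash = hashlib.new(hash_type, minion_id.encode("utf-8")).hexdigest()
    return "{0}/minion_event_{1}_{2}.ipc".format(sock_dir, id_hash, suffix)


def fits_in_sun_path(path):
    """True if the path is short enough to be bound as an AF_UNIX socket."""
    return len(path.encode("utf-8")) <= MAX_PATH_LEN


for hash_type in ("md5", "sha512"):
    path = event_socket_path("playpen-andrej.domain.gone", hash_type, "pull")
    print("{0}: {1} chars, fits={2}".format(
        hash_type, len(path), fits_in_sun_path(path)))
# md5: 75 chars, fits=True
# sha512: 171 chars, fits=False
```

The digest length depends only on the hash type, so any minion id gives the same totals: a 32-character md5 digest stays under the limit, while a 128-character sha512 digest does not.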

The error can be tracked down to this block of code. It looks like some work had already been done in the past to shorten the path length in the event that sha256 was being used. However, that code shortened the id hash to 10 characters only in that case, whereas an md5 hash is 32 characters long.

I know what needs to be done to resolve this, but we need to decide on how we're going to handle this path name. If the idea is that we want the path to be the same every time, in my opinion we should still be using a hash, but we should always shorten it to the same length. If we don't care about the socket path being the same every time, then there is no reason we can't use tempfile.mktemp() or something like that to derive a unique path.
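
For illustration, the "unique path" option might look like the sketch below. It uses the standard library's `tempfile.mktemp()`, which only returns a name that did not exist at call time (the Python docs warn that this is racy); `unique_event_socket_path` is a hypothetical helper, and the temp directory default is an assumption standing in for the real sock_dir.

```python
import tempfile


def unique_event_socket_path(suffix, sock_dir=None):
    """Hypothetical helper: return a fresh, randomly named socket path.

    tempfile.mktemp() does not create anything on disk; it only picks a
    name that was unused at the moment of the call.
    """
    return tempfile.mktemp(
        suffix="_{0}.ipc".format(suffix),
        prefix="minion_event_",
        dir=sock_dir,  # None means the system temp directory
    )


first = unique_event_socket_path("pub")
second = unique_event_socket_path("pub")
print(first)
print(second)
```

The two printed paths differ on every call, which is the trade-off being weighed here: the name is short, but it is not stable across minion restarts.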

My guess is that the idea is to have a regular path name, because if the sockets are not cleaned up when the minion stops, or the minion does not exit cleanly, then you'll end up with a bunch of sockets accumulating in the sock_dir. So, I think the best course of action is to truncate the id hash to 10 characters in all cases.
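
A minimal sketch of the truncation idea, assuming the same path layout as in the debug log and a hex digest of the minion id; `short_event_socket_path` is a hypothetical name and this is not the code that was merged.

```python
import hashlib

ID_HASH_LEN = 10  # truncation length proposed in this comment


def short_event_socket_path(minion_id, hash_type, suffix,
                            sock_dir="/var/run/salt/minion"):
    """Hypothetical helper: stable path with the id hash cut to 10 chars."""
    digest = hashlib.new(hash_type, minion_id.encode("utf-8")).hexdigest()
    id_hash = digest[:ID_HASH_LEN]  # same length whatever hash_type is used
    return "{0}/minion_event_{1}_{2}.ipc".format(sock_dir, id_hash, suffix)


for hash_type in ("md5", "sha512"):
    path = short_event_socket_path(
        "playpen-andrej.domain.gone", hash_type, "pull")
    print("{0}: {1} chars".format(hash_type, len(path)))
# md5: 53 chars
# sha512: 53 chars
```

Because the digest is deterministic, the same minion id and hash type always map to the same path, so a restarted minion reuses the old socket name instead of leaving a new file behind.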

@terminalmage
Contributor

After some internal discussion, the choice will indeed be to truncate the ID hash to 10 chars.

terminalmage added a commit to terminalmage/salt that referenced this issue May 22, 2014
This helps avoid the sock path being longer than the max length allowed
by the kernel. See the following comment for more information:

saltstack#12172 (comment)
techhat added a commit that referenced this issue May 22, 2014
@terminalmage
Contributor

The fix has been merged and a unit test has been added; it should be in 2014.1.5.

terminalmage added a commit to terminalmage/salt that referenced this issue May 23, 2014
This helps avoid the sock path being longer than the max length allowed
by the kernel. See the following comment for more information:

saltstack#12172 (comment)

5 participants