Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

minion_id with BOM causes master to hang #12296

Closed
bkeroack opened this issue Apr 25, 2014 · 9 comments · Fixed by #13203
Closed

minion_id with BOM causes master to hang #12296

bkeroack opened this issue Apr 25, 2014 · 9 comments · Fixed by #13203
Labels
Bug broken, incorrect, or confusing behavior fixed-pls-verify fix is linked, bug author to confirm fix severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around
Milestone

Comments

@bkeroack
Copy link

If the file minion_id begins with a BOM (byte-order mark, FF FE in hex bytes) as is the default for Windows unicode file writing, when the minion tries to authenticate with the master it will cause the master to hang, requiring a restart.

The following messages are typical of what appears in the master log:

2014-04-25 17:29:43,104 [salt.master][INFO] Authentication request from ��web014
2014-04-25 17:29:44,152 [salt.master][INFO] Authentication request from ��web013
2014-04-25 17:40:11,787 [salt.master][INFO] Authentication request from ��web011
2014-04-25 17:40:12,866 [salt.master][INFO] Authentication request from ��web012
2014-04-25 17:40:13,116 [salt.master][INFO] Authentication request from ��web014

Note that the master hangs immediately after this log message.

The bug can be reproduced (using minion and master version 2014.1.3) by doing the following in Powershell on the minion:

PS> $($env:COMPUTERNAME).toLower() > C:\salt\conf\minion_id
PS> Restart-Service salt-minion

This will cause the master to hang.

Workaround is to ensure minion_id is ASCII:

PS> $($env:COMPUTERNAME).toLower() |Out-File -Encoding "ASCII" C:\salt\conf\minion_id
@basepi
Copy link
Contributor

basepi commented Apr 25, 2014

Wow, thanks for pointing this out! We'll get it fixed!

@basepi basepi added this to the Approved milestone Apr 25, 2014
@cachedout cachedout removed the Windows label Jun 2, 2014
@cachedout
Copy link
Contributor

I cannot replicate this.

Here's the method that I have used:

>>> f = codecs.open('/tmp/foo', 'w', 'utf-8')
>>> f.write(u'\uFEFFfoo')
>>> f.close()

I then cat /tmp/foo over to /etc/salt/minion_id.

I then start a minion, which starts and auths fine. I can see that it's not responding to simply salt foo test.ping but does respond with the minion ID preceeded by a BOM in the master logs:

[DEBUG   ] Sending event - data = {'fun_args': [], 'jid': '20140602150336641920', 'return': True, 'retcode': 0, 'success': True, 'cmd': '_return', '_stamp': '2014-06-02T15:03:36.681348', 'fun': 'test.ping', 'id': '\xef\xbb\xbffoo'}

And from the console:

root@silver:~# salt-run manage.up
foo

@bkeroack
Copy link
Author

bkeroack commented Jun 2, 2014

To reproduce this bug reliably, you will need to use a Windows minion and Powershell. See the script cited above.

@dsumsky
Copy link
Contributor

dsumsky commented Jul 16, 2014

Hello,
could anybody confirm me that this bug is resolved now?!? I'm running SaltStack 2014.1.7 (tested on 2014.5, 2014.1.3 with same results) in multi-master mode with heterogeneous group of Windows/Linux minions. Basically, the behavior is the same as described above. We have a Powershell script to bootstrap Windows minions which generates minion_id. By default, it is in unicode encoding. When converted to pure ASCII the issues is gone.

The point is that I can't ensure in our environment that all the Windows minions are bootstrapped correctly and so to keep SaltStack environment stable. Just one malign Windows minion can destroy whole setup!!!

Salt version report:
Salt: 2014.1.7
Python: 2.6.6 (r266:84292, Jan 22 2014, 09:42:36)
Jinja2: 2.2.1
M2Crypto: 0.20.2
msgpack-python: 0.1.13
msgpack-pure: Not Installed
pycrypto: 2.0.1
PyYAML: 3.10
PyZMQ: 2.2.0.1
ZMQ: 3.2.4

Malign authentication request from master log file:
2014-07-16 13:21:43,518 [salt.master ][INFO ] Clear payload received with command _auth
2014-07-16 13:21:43,519 [salt.master ][INFO ] Authentication request from ��i-09682422

2014-07-16 13:23:03,531 [salt.master ][INFO ] Clear payload received with command _auth
2014-07-16 13:23:03,532 [salt.master ][INFO ] Authentication request from ��i-09682422

2014-07-16 13:24:23,519 [salt.master ][INFO ] Clear payload received with command _auth
2014-07-16 13:24:23,520 [salt.master ][INFO ] Authentication request from ��i-09682422

Dead master threads:
root 23410 0.0 1.3 979908 22104 ? Sl 12:57 0:00 /usr/bin/python /usr/bin/salt-master -d
root 23411 0.0 1.8 395184 31768 ? Sl 12:57 0:01 /usr/bin/python /usr/bin/salt-master -d
root 23429 0.0 1.2 521156 21140 ? Sl 12:57 0:00 /usr/bin/python /usr/bin/salt-master -d
root 23432 0.0 1.2 521156 21884 ? Sl 12:57 0:00 /usr/bin/python /usr/bin/salt-master -d
root 23435 0.0 0.0 0 0 ? Z 12:57 0:02 [salt-master]
root 23438 0.0 0.0 0 0 ? Z 12:57 0:02 [salt-master]
root 23441 0.0 0.0 0 0 ? Z 12:57 0:02 [salt-master]
root 23444 0.0 0.0 0 0 ? Z 12:57 0:02 [salt-master]
root 23447 0.0 0.0 0 0 ? Z 12:57 0:02 [salt-master]
root 23450 0.0 0.0 0 0 ? Z 12:57 0:02 [salt-master]
root 23451 0.0 0.0 0 0 ? Z 12:57 0:02 [salt-master]
root 23454 0.0 0.0 0 0 ? Z 12:57 0:02 [salt-master]
root 23457 0.0 0.0 0 0 ? Z 12:57 0:02 [salt-master]
root 23460 0.0 0.0 0 0 ? Z 12:57 0:02 [salt-master]
root 23463 0.0 0.0 0 0 ? Z 12:57 0:02 [salt-master]
root 23468 0.0 0.0 0 0 ? Z 12:57 0:02 [salt-master]
root 23471 0.0 0.0 0 0 ? Z 12:57 0:02 [salt-master]
root 23475 0.0 0.0 0 0 ? Z 12:57 0:02 [salt-master]
root 23490 0.1 0.0 0 0 ? Z 12:57 0:02 [salt-master]

@basepi
Copy link
Contributor

basepi commented Jul 16, 2014

I don't think this got cherry-picked, so the fix wasn't in 2014.1.7. We need to evaluate backporting this fix for 2014.1.8. As a side note, 2014.7.0 is in feature freeze, and should have a release candidate in the next week or so so you'll be able to test this.

I've marked the fix for cherry-picking so I will evaluate it when 2014.1.8 is closer.

@dsumsky
Copy link
Contributor

dsumsky commented Jul 16, 2014

I don't plan to switch to 2014.7.x release immediately although I would like to. I was struggling with other problems with some regression bugs in 2014.1.x like not working built-in scheduler before 2014.1.5, multi-master bugs in 2014.1.0 etc. etc. With 2014.1.7, it seems quite stable apart from this bug and " Multi-master minion not failing over properly for state runs #13944 ".

Btw, from the code above, is the fix at minion side or on master side? I'm bit afraid of the first one as I can't assure that all the minions in our environment are updated. I hope the check/BOM removal is perform on master.

Anyway, I tried to update config.py (from 2014.1.7) with the merged changes by hand and masters died almost immediately. But this might have been my mistake so I will wait till 2014.1.8 is released.

I would appreciate to release 2014.1.8 asap as this is IMO really serious issue as it is pretty simple to destroy any heterogeneous Salt environment. Thanks.

@basepi
Copy link
Contributor

basepi commented Jul 16, 2014

The fix is, unfortunately, on the minion side. Since it's a problem with the minion ID, we have to change it right at the point of generating the minion ID, so that the master and minion both call the minion by the same name. (Otherwise all sorts of things would break, the biggest of which would be targeting)

@dsumsky
Copy link
Contributor

dsumsky commented Jul 17, 2014

By the chance, are you aware of any mechanism which would help to filter such "malformed" minion IDs out at the master level? I was thinking of reactor system but I guess that authentication request is already processed at this stage and so the master is dead now. What do you think?

@basepi
Copy link
Contributor

basepi commented Jul 17, 2014

Again, this could very well be done, but the problem is, now the minion knows itself as one name (the malformed name) and the master knows it by a different name. That's going to cause all sorts of targeting problems. We have to catch those malformed names on the minion side to keep them consistent throughout.

cachedout pushed a commit that referenced this issue Jul 30, 2014
Closes #12296

Conflicts:
	salt/config.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug broken, incorrect, or confusing behavior fixed-pls-verify fix is linked, bug author to confirm fix severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants