minion_id with BOM causes master to hang #12296
Comments
Wow, thanks for pointing this out! We'll get it fixed!
I cannot replicate this. Here's the method that I have used:
I then cat I then start a minion, which starts and auths fine. I can see that it's not responding to simply
And from the console:
To reproduce this bug reliably, you will need to use a Windows minion and PowerShell. See the script cited above.
Hello,

The point is that I can't ensure that every Windows minion in our environment is bootstrapped correctly, so I can't keep the SaltStack environment stable. A single malformed Windows minion can take down the whole setup!

Salt version report:

Malformed authentication request from the master log file:

2014-07-16 13:23:03,531 [salt.master ][INFO ] Clear payload received with command _auth
2014-07-16 13:24:23,519 [salt.master ][INFO ] Clear payload received with command _auth

Dead master threads:
I don't think this got cherry-picked, so the fix wasn't in 2014.1.7. We need to evaluate backporting this fix for 2014.1.8. As a side note, 2014.7.0 is in feature freeze and should have a release candidate in the next week or so, so you'll be able to test this. I've marked the fix for cherry-picking and will evaluate it when 2014.1.8 is closer.
I don't plan to switch to the 2014.7.x release immediately, although I would like to. I struggled with regression bugs in 2014.1.x, such as the built-in scheduler not working before 2014.1.5 and multi-master bugs in 2014.1.0. 2014.1.7 seems quite stable apart from this bug and "Multi-master minion not failing over properly for state runs #13944". By the way, from the code above, is the fix on the minion side or on the master side? I'm a bit afraid of the former, as I can't guarantee that all the minions in our environment are updated. I hope the check/BOM removal is performed on the master. Anyway, I tried to update config.py (from 2014.1.7) with the merged changes by hand and the masters died almost immediately. That might have been my mistake, so I will wait until 2014.1.8 is released. I would appreciate 2014.1.8 being released as soon as possible, since this is IMO a really serious issue: it is pretty simple to destroy any heterogeneous Salt environment. Thanks.
The fix is, unfortunately, on the minion side. Since it's a problem with the minion ID, we have to change it right at the point of generating the minion ID, so that the master and minion both call the minion by the same name. (Otherwise all sorts of things would break, the biggest of which would be targeting.)
By any chance, are you aware of any mechanism that would help filter such "malformed" minion IDs out at the master level? I was thinking of the reactor system, but I guess the authentication request has already been processed at that stage, so the master is already dead. What do you think?
Again, this could very well be done, but the problem is that the minion would then know itself by one name (the malformed name) while the master knows it by a different name. That's going to cause all sorts of targeting problems. We have to catch those malformed names on the minion side to keep them consistent throughout.
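To illustrate what catching the malformed name on the minion side could look like, here is a minimal Python sketch. It is not the code from the actual fix in salt/config.py; the helper name `clean_minion_id` and the list of encodings it checks are assumptions made for illustration. It takes the raw bytes read from a minion_id file, drops any leading BOM, and returns the decoded ID.

```python
import codecs

# Checked in this order so that a UTF-32 LE BOM (FF FE 00 00) is not
# mistaken for the shorter UTF-16 LE BOM (FF FE).
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def clean_minion_id(raw: bytes) -> str:
    """Hypothetical helper: decode raw minion_id bytes without any leading BOM."""
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            # Skip the BOM bytes, then decode the rest with the matching codec.
            return raw[len(bom):].decode(encoding).strip()
    # No BOM found: assume the file is plain UTF-8 (ASCII is a subset).
    return raw.decode("utf-8").strip()
```

Because the cleanup happens before the ID is ever used, the minion and the master would see the same name, which is the consistency requirement described above.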
Closes #12296
Conflicts: salt/config.py
If the minion_id file begins with a BOM (byte-order mark, FF FE in hex), as is the default when Windows writes a file as Unicode, the master hangs when the minion tries to authenticate and has to be restarted.
The following messages are typical of what appears in the master log:
Note that the master hangs immediately after this log message.
The bug can be reproduced (using minion and master version 2014.1.3) by doing the following in PowerShell on the minion:
This will cause the master to hang.
The workaround is to ensure minion_id is written as ASCII:
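One way to apply this workaround to a file that has already been written with a BOM is sketched below in Python, rather than in the PowerShell used in the report. The function name `rewrite_minion_id_as_ascii` is hypothetical, and the location of the minion_id file depends on the installation, so the path is passed in by the caller.

```python
import codecs
from pathlib import Path


def rewrite_minion_id_as_ascii(path):
    """Rewrite the file at `path` as BOM-free ASCII and return the ID."""
    raw = Path(path).read_bytes()
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order and drops it.
        text = raw.decode("utf-16")
    else:
        # "utf-8-sig" drops a UTF-8 BOM if one is present.
        text = raw.decode("utf-8-sig")
    minion_id = text.strip()
    # Raises UnicodeEncodeError if the ID contains non-ASCII characters,
    # leaving the original file untouched.
    Path(path).write_bytes(minion_id.encode("ascii"))
    return minion_id
```

The minion would presumably need to be stopped before the file is rewritten and restarted afterwards so that it picks up the cleaned ID.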