salt-server doesn't appropriately raise Permission Denied error in /etc/salt/master to the log causing a vague SaltReqTimeoutError in the minion #11783
Comments
These are both good proposed fixes. I've always been surprised that there's not more information when we raise a SaltReqTimeoutError -- though it's possible that ZeroMQ doesn't give us enough relevant information. I've never had a chance to look into it, though. You said it is logging the IOError, but only as DEBUG level? That surprises me -- we usually log tracebacks at warning or higher! We should definitely fix that.
Can you include the log that contains the IOError? If there's a whole traceback, it will save time figuring out where the problem is occurring.
Wow, we just have no IOError handling at all in that This should actually probably solve your minion reporting problem as well. I think the problem was that the master didn't properly exit when it couldn't read its config file, so it was just kind of limping along and not functioning properly. This is why the minion could attempt to auth, but it would never complete, and would timeout. If we just exit the master properly, then the minion will just get a connection refused, which should be more indicative of the problem.
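To illustrate the kind of handling described in the comment above, here is a minimal sketch of reading a config file so that an unreadable file is logged loudly and the process exits instead of limping along. This is not Salt's actual code: the function name `read_config_text` is hypothetical, and treating a missing file as empty configuration is an assumption made for the example.

```python
import errno
import logging
import sys

log = logging.getLogger(__name__)


def read_config_text(path):
    """Return the file's text, or '' if the file is missing (assumed policy).

    Any other read failure, such as Errno 13 (permission denied), is logged
    at CRITICAL level and terminates the process with exit code 1.
    """
    try:
        with open(path) as handle:
            return handle.read()
    except IOError as exc:
        if exc.errno == errno.ENOENT:
            # Hypothetical policy: a missing file means "use defaults".
            return ""
        # The file exists but cannot be read: report it and stop.
        log.critical("Cannot read configuration file %s: %s", path, exc)
        sys.exit(1)
```

Logging at CRITICAL rather than DEBUG means the message would reach the log file at default verbosity, which is the gap the original report points out.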
In my testing with Salt minions, even when I purposefully misconfigured the minion to localhost on a port that wasn't receiving connections at all, the SaltReqTimeoutError still appeared rather than a connection refused.
Which salt version did this happen in?
The latest, 2014.1.1
The file might exist, we just might not be able to read it. Refs saltstack#11783.
@combusean Does the above PR alleviate your issues?
As far as I can tell, this should fix the server side issues. I'm not running salt in an environment where I can readily patch it, so I'll try it this weekend and let you know. What about the SaltReqTimeoutError on the client?
That's more difficult. As far as the client was concerned, it tried to contact the master and the master didn't reply. So it raised a timeout error. There's really no way for the minion to diagnose what's wrong with the master if it gets into a weird state like that. I'm not sure it's fixable, unfortunately. =\
I'm going to close this for now, as there's really no way for the minion to diagnose problems on the master. Glad we got the root cause resolved, though.
The file might exist, we just might not be able to read it. Refs #11783. Conflicts: salt/config.py
I had expected salt-master to read its configuration files and then switch to the user declared in /etc/salt/master, as other services do. In fact, it appears to switch users very early on, so configuration files that were readable only by root were never readable by that user. The server did not log this critical error to /var/log/salt/master by default.
It wasn't until I ran it in debug mode that I saw the issue, specifically IOError: [Errno 13] Permission denied: '/etc/salt/master' in salt-master -l all. The fix was simple: chown -R [salt user] /etc/salt .
The misconfiguration caused the salt minion to raise a SaltReqTimeoutError (even with -l all) even though it could make TCP connections to the salt-master. As a test, and perhaps a separate issue in itself, I also found that the salt minion raises SaltReqTimeoutError even if the salt master refuses TCP connections.
I propose two fixes: that salt-master's failure to read its configuration files be logged prominently enough to appear in the default log, and that salt-minion trap connection issues more precisely rather than raising a generic, red-herring SaltReqTimeoutError.
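As a rough illustration of the second proposal, the sketch below probes the master's TCP port directly and reports whether the connection was refused, timed out, or succeeded. It uses only the Python standard library and is not how Salt or ZeroMQ connect; the function name `probe_master` and the returned strings are hypothetical. A plain TCP probe like this can only distinguish transport-level failures, so it would not detect a master that accepts connections but never replies.

```python
import socket


def probe_master(host, port, timeout=3.0):
    """Classify a plain TCP connection attempt to host:port.

    Returns one of "reachable", "refused", "timeout", or "unreachable".
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        # Nothing is listening on that port.
        return "refused"
    except socket.timeout:
        # No answer within the timeout (for example, packets dropped).
        return "timeout"
    except OSError:
        # Any other failure, such as name resolution or routing errors.
        return "unreachable"
    sock.close()
    return "reachable"
```

A minion-side diagnostic could call something like `probe_master("salt", 4506)` before reporting a timeout, assuming the master's request port is the default 4506, and mention a refused connection explicitly in the error message.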
I'd be happy to help contribute these fixes but would need some assistance from an experienced salt contributor as I am new to salt.