Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

salt-server doesn't appropriately raise Permission Denied error in /etc/salt/master to the log causing a vague SaltReqTimeoutError in the minion #11783

Closed
combusean opened this issue Apr 5, 2014 · 11 comments
Labels
Bug broken, incorrect, or confusing behavior fixed-pls-verify fix is linked, bug author to confirm fix help-wanted Community help is needed to resolve this severity-low 4th level, cosemtic problems, work around exists
Milestone

Comments

@combusean
Copy link

I had expected that salt read configuration files and switched to the user declared in /etc/salt/master like in other services, in fact it appears to switch very early on thus configuration files that were only readable by root were never readable by the user. The server did not log this critical error to /var/log/salt/master by default.

It wasn't until that I ran it in debug did I see the issue, specifically IOError: [Errno 13] Permission denied: '/etc/salt/master' in salt-master -l all. The fix was simple: chown -R [salt user] /etc/salt .

The misconfiguration caused the salt minion to raise a SaltReqTimeoutError (even in -l all) even tho it could make TCP connections to the salt-master, and, as a test and perhaps another separate issue in and of itself, salt minion raises SaltReqTimeoutError even if the salt master refuses TCP connections.

I propose two fixes: that salt-master's failure to read configuration files be raised better, and that salt-minion traps connection issues better rather than raising a generic and red herring SaltReqTimeoutError.

I'd be happy to help contribute these fixes but would need some assistance from an experienced salt contributor as I am new to salt.

@combusean combusean changed the title salt-server doesn't appropriately raise Permission Denied error in /etc/salt/master to the log causing a vague SaltReqTimeoutError salt-server doesn't appropriately raise Permission Denied error in /etc/salt/master to the log causing a vague SaltReqTimeoutError in the minion Apr 5, 2014
@basepi
Copy link
Contributor

basepi commented Apr 7, 2014

These are both good proposed fixes. I've always been surprised that there's not more information when we raise a SaltReqTimeoutError -- though it's possible that ZeroMQ doesn't give us enough relevant information. I've never had a chance to look into it, though.

You said it is logging the IOError, but only as DEBUG level? That surprises me -- we usually log tracebacks at warning or higher! We should definitely fix that.

@basepi basepi added Bug labels Apr 7, 2014
@basepi basepi added this to the Outstanding Bugs milestone Apr 7, 2014
@basepi
Copy link
Contributor

basepi commented Apr 7, 2014

Can you include the log that contains the IOError? If there's a whole traceback, it will save time figuring out where the problem is occurring.

@combusean
Copy link
Author

[root@ip-10-0-1-249 ~]# salt-master -l debug
[DEBUG   ] Reading configuration from /etc/salt/master
[DEBUG   ] Configuration file path: /etc/salt/master
[INFO    ] Setting up the Salt Master
[DEBUG   ] Loaded master key: /etc/salt/pki/master/master.pem
[INFO    ] Preparing the root key for local communication
[DEBUG   ] Removing stale keyfile: /var/cache/salt/master/.root_key
[INFO    ] Preparing the atweb key for local communication
[DEBUG   ] Removing stale keyfile: /var/cache/salt/master/.atweb_key
[DEBUG   ] Created pidfile: /var/run/salt-master.pid
[DEBUG   ] Chowned pidfile: /var/run/salt-master.pid to user: atweb
[INFO    ] salt-master is starting as user 'root'
[INFO    ] Current values for max open files soft/hard setting: 1024/4096
[INFO    ] The value for the 'max_open_files' setting, 100000, is higher than what the user running salt is allowed to raise to, 4096. Defaulting to 4096.
[INFO    ] Raising max open files value to 4096
[INFO    ] New values for max open files soft/hard values: 4096/4096
[DEBUG   ] Reading configuration from /etc/salt/master
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/multiprocessing/process.py", line 232, in _bootstrap
    self.run()
  File "/usr/lib64/python2.6/multiprocessing/process.py", line 88, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.6/site-packages/salt/master.py", line 196, in _clear_old_jobs
    search = salt.search.Search(self.opts)
  File "/usr/lib/python2.6/site-packages/salt/search/__init__.py", line 92, in __init__
    matcher=False)
  File "/usr/lib/python2.6/site-packages/salt/minion.py", line 303, in __init__
    self.opts = salt.config.minion_config(opts['conf_file'])
  File "/usr/lib/python2.6/site-packages/salt/config.py", line 676, in minion_config
    overrides = load_config(path, env_var, DEFAULT_MINION_OPTS['conf_file'])
  File "/usr/lib/python2.6/site-packages/salt/config.py", line 577, in load_config
    opts = _read_conf_file(path)
  File "/usr/lib/python2.6/site-packages/salt/config.py", line 502, in _read_conf_file
    with salt.utils.fopen(path, 'r') as conf_file:
  File "/usr/lib/python2.6/site-packages/salt/utils/__init__.py", line 1037, in fopen
    fhandle = open(*args, **kwargs)
IOError: [Errno 13] Permission denied: '/etc/salt/master'
[INFO    ] Starting the Salt Publisher on tcp://0.0.0.0:4505
[INFO    ] Starting the Salt Puller on ipc:///var/run/salt/master/publish_pull.ipc
[DEBUG   ] Halite: Unavailable.
[INFO    ] Setting up the master communication server
[INFO    ] Starting Salt worker process 0
[INFO    ] Starting Salt worker process 1
[DEBUG   ] MasterEvent PUB socket URI: ipc:///var/run/salt/master/master_event_pub.ipc
[DEBUG   ] MasterEvent PULL socket URI: ipc:///var/run/salt/master/master_event_pull.ipc
[DEBUG   ] Reading configuration from /etc/salt/master
Process MWorker-4:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/multiprocessing/process.py", line 232, in _bootstrap
    self.run()
  File "/usr/lib/python2.6/site-packages/salt/master.py", line 786, in run
    self.crypticle)
  File "/usr/lib/python2.6/site-packages/salt/master.py", line 1744, in __init__
    self.local = salt.client.LocalClient(self.opts['conf_file'])
  File "/usr/lib/python2.6/site-packages/salt/client/__init__.py", line 111, in __init__
    self.opts = salt.config.client_config(c_path)
  File "/usr/lib/python2.6/site-packages/salt/config.py", line 1974, in client_config
[INFO    ] Starting Salt worker process 2
    master_config(path, defaults=defaults)
  File "/usr/lib/python2.6/site-packages/salt/config.py", line 1851, in master_config
    overrides = load_config(path, env_var, DEFAULT_MASTER_OPTS['conf_file'])
  File "/usr/lib/python2.6/site-packages/salt/config.py", line 577, in load_config
    opts = _read_conf_file(path)
  File "/usr/lib/python2.6/site-packages/salt/config.py", line 502, in _read_conf_file
    with salt.utils.fopen(path, 'r') as conf_file:
  File "/usr/lib/python2.6/site-packages/salt/utils/__init__.py", line 1037, in fopen
    fhandle = open(*args, **kwargs)
IOError: [Errno 13] Permission denied: '/etc/salt/master'
IOError: [Errno 13] Permission denied: '/etc/salt/master'
[DEBUG   ] MasterEvent PUB socket URI: ipc:///var/run/salt/master/master_event_pub.ipc

@basepi
Copy link
Contributor

basepi commented Apr 8, 2014

Wow, we just have no IOError handling at all in that _read_conf_file function. I think the fix is to surround that with a try:except block, log the error with log.error, and then call sys.exit(2) (the master shouldn't keep running if it can't read its configuration)

This should actually probably solve your minion reporting problem as well. I think the problem was that the master didn't properly exit when it couldn't read its config file, so it was just kind of limping along and not functioning properly. This is why the minion could attempt to auth, but it would never complete, and would timeout. If we just exit the master properly, then the minion will just get a connection refused, which should be more indicative of the problem.

@combusean
Copy link
Author

In my testing with Salt minions, even when I purposefully misconfigured the minion to localhost on a port that wasn't receiving connections at all, the SaltReqTimeoutError still appeared rather than a connection refused.

@s0undt3ch
Copy link
Collaborator

Which salt version did this happen in?

@combusean
Copy link
Author

The latest, 2014.1.1

s0undt3ch added a commit to s0undt3ch/salt that referenced this issue Apr 9, 2014
The file might exist, we just might not be able to read it.
Refs saltstack#11783.
@rallytime
Copy link
Contributor

@combusean Does the above PR alleviate your issues?

@combusean
Copy link
Author

As far as I can tell, this should fix the server side issues. I'm not running salt in an environment where I can readily patch to it, I'll try it this weekend and let you know.

What about the SaltReqTimeoutError on the client?

@basepi
Copy link
Contributor

basepi commented Apr 17, 2014

That's more difficult. As far as the client was concerned, it tried to contact the master and the master didn't reply. So it raised a timeout error. There's really no way for the minion to diagnose what's wrong with the master if it gets into a weird state like that. I'm not sure it's fixable, unfortunately. =\

@basepi basepi modified the milestones: Approved, Outstanding Bugs Apr 21, 2014
@basepi
Copy link
Contributor

basepi commented Apr 24, 2014

I'm going to close this for now, as there's really no way for the minion to diagnose problems on the master. Glad we got the root cause resolved, though.

@basepi basepi closed this as completed Apr 24, 2014
s0undt3ch added a commit that referenced this issue Apr 25, 2014
The file might exist, we just might not be able to read it.
Refs #11783.

Conflicts:
	salt/config.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug broken, incorrect, or confusing behavior fixed-pls-verify fix is linked, bug author to confirm fix help-wanted Community help is needed to resolve this severity-low 4th level, cosemtic problems, work around exists
Projects
None yet
Development

No branches or pull requests

4 participants