Service state design flaw #24783
service.status depends on init, so your assumptions are mostly wrong.
If you want to fix something, fix your init script's status command.
(Really, I think your problem is a broken status command.)
@kiorky With respect, no: this has nothing to do with the init script; I wouldn't be filing a bug if it were. Please don't tell me my "assumptions" are wrong without at least checking first. It's not constructive to solving issues.

Digging in further, the problem is actually worse: it behaves inconsistently in the way I described, depending on the distribution. The summary version is that unless you're running on something it recognises as Debian, BSD or Red Hat, your results are going to be wrong in some way. However, none of the service modules have the capacity to return anything from the init script beyond a strict boolean, so your comment is wrong in all cases. The init script cannot cause the behaviour I described, no matter how broken it is. I've snipped the docstrings for sanity. However...
I can probably fix the Gentoo one, for openrc at any rate, but that doesn't solve the issue that the generic behaviour is going to break state reporting.

Debian (NetBSD and OpenBSD are the same except in how they generate the `cmd` variable):

```python
def status(name, sig=None):
    if sig:
        return bool(__salt__['status.pid'](sig))
    cmd = _service_cmd(name, 'status')
    return not __salt__['cmd.retcode'](cmd)
```

Gentoo:

```python
def status(name, sig=None):
    return __salt__['status.pid'](sig if sig else name)
```

Red Hat:

```python
def status(name, sig=None):
    if _service_is_upstart(name):
        cmd = 'status {0}'.format(name)
        return 'start/running' in __salt__['cmd.run'](cmd, python_shell=False)
    if sig:
        return bool(__salt__['status.pid'](sig))
    cmd = '/sbin/service {0} status'.format(name)
    return __salt__['cmd.retcode'](cmd, python_shell=False, ignore_retcode=True) == 0
```

service.py, for all platforms not supported elsewhere:

```python
def status(name, sig=None):
    return __salt__['status.pid'](sig if sig else name)
```

Windows:

```python
def status(name, sig=None):
    cmd = ['sc', 'query', name]
    statuses = __salt__['cmd.run'](cmd, python_shell=False).splitlines()
    for line in statuses:
        if 'RUNNING' in line:
            return True
        elif 'STOP_PENDING' in line:
            return True
    return False
```
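To make the failure mode concrete, here is a minimal sketch (the `dead_state_changed` name is hypothetical, not Salt code) of why the before/after comparison behaves when `status` returns a bool but misfires when it returns a pid string: a prefork server's workers are recycled between the two checks, so the pid strings differ even though the service never stopped.

```python
def dead_state_changed(status_before, status_after):
    """Hypothetical stand-in for the before/after comparison in
    states/service.py: reports a change whenever the two status
    values differ."""
    return status_before != status_after

# A boolean-returning status module compares cleanly:
assert dead_state_changed(True, False) is True    # running -> stopped: real change
assert dead_state_changed(False, False) is False  # stopped both times: no change

# A pid-returning status module (status.pid output) does not.
# Apache prefork recycles workers, so the pid string differs
# between two calls even though the service never stopped:
before = '1234 1235 1236'  # worker pids at the first check
after = '1234 1240 1241'   # two workers recycled by the second check
assert dead_state_changed(before, after) is True  # false alarm reported as a change
```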
I will stop responding to you; you didn't read my comments correctly. @jfindlay @basepi @rallytime I'm thinking that the bug is still invalid.
@basepi, the problem here is clearly the status command returning wrong information (a misimplemented status command in one of his services).
@basepi it's the job of the init process (sysv, or managers like systemd/upstart and so on) to manage processes; what service.* actually does (and correctly) is to ask them via the status command. The problem is in the raw status obtained from the underlying init system.
@kiorky I have just provided the actual code responsible for handling the service.status calls, none of which is capable of retrieving a list of pids from an init script. None. Every single call I could find from a service.status implementation to an init script is wrapped in either bool() or a boolean-producing operator (`not`, `in`, or `== 0` in the code above). Further, the code I provided clearly demonstrates that what I'm describing is intended behaviour under some conditions. If you're going to accuse me of not reading your comments, the least you could do is properly read my responses to said comments before making your assessment of my ability to read. Doing otherwise is simply wasting everyone's time, your own included.
The pid handling in Salt's status command is unrelated.
From a discussion we had on IRC, there is still a brittle point, but it's not in the state; it's more in the `__virtual__` selection. In other words, virtuals should rely less on grains and more on the running system.
If pids are being returned, can't we just force those before/after toggle statuses to bools? I think that's really what the function is expecting anyway. |
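A minimal sketch of that coercion, assuming the status value can be a bool, an integer return code, or a pid string from status.pid (empty when nothing matched); `to_bool` is a hypothetical helper, not an existing Salt function:

```python
def to_bool(status):
    """Hypothetical helper: coerce whatever a service.status
    implementation returned into the boolean the state logic expects.
    Empty or whitespace-only pid strings (no matching pids) become
    False; non-empty pid strings become True; bools pass through."""
    if isinstance(status, str):
        # status.pid returns a space-delimited pid string, '' if none
        return bool(status.strip())
    return bool(status)

assert to_bool(True) is True
assert to_bool('') is False
assert to_bool('1234 1235') is True
assert to_bool(0) is False
```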
@kiorky I know you're worried about this being changed and introducing regressions, but you need to be less confrontational. It's obviously not working for someone else, and we need to carefully investigate why, and fix it if we can, without causing regressions in such a popular state function.

@kaithar Thanks for the report, see my comment above.
@basepi there are no pids returned, and the function toggles are awaiting booleans already; the state logic, as I mentioned before, is correct and must not be changed at all costs for this reason. I never said that there was not a bug, but the bug is really not in the state, which I think is now in good shape. The problem is in what certain exec mods return.
I think I agree with you, any...
Sorry for the delay in replying, I was away for a week. I agree with the notion that it should be returning a bool; the disagreement arises from the status of the default module. The issue is further confused by the fact that there are both OS-specific service modules (such as `upstart.py` and `debian_service.py`) and the generic `service.py`.

upstart.py:

```python
def __virtual__():
    '''
    Only work on Ubuntu
    '''
    # Disable on these platforms, specific service modules exist:
    if salt.utils.systemd.booted(__context__):
        return False
    elif __grains__['os'] in ('Ubuntu', 'Linaro', 'elementary OS', 'Mint'):
        return __virtualname__
    elif __grains__['os'] in ('Debian', 'Raspbian'):
        debian_initctl = '/sbin/initctl'
        if os.path.isfile(debian_initctl):
            initctl_version = salt.modules.cmdmod._run_quiet(debian_initctl + ' version')
            if 'upstart' in initctl_version:
                return __virtualname__
    return False
```

debian_service.py:

```python
def __virtual__():
    '''
    Only work on Debian and when systemd isn't running
    '''
    if __grains__['os'] in ('Debian', 'Raspbian') and not salt.utils.systemd.booted(__context__):
        return __virtualname__
    return False
```
return False Something I'm not certain on is what happens when multiple modules have disable = set((
'RedHat',
'CentOS',
'Amazon',
'ScientificLinux',
'CloudLinux',
'Fedora',
'Gentoo',
'Ubuntu',
'Debian',
'Arch',
'Arch ARM',
'ALT',
'SUSE Enterprise Server',
'OEL',
'Linaro',
'elementary OS',
'McAfee OS Server',
'Mint'
)) So yeah, consistency is my first main objection to the current service implemenations. My second main objection is that the documentation for '''
The default service module, if not otherwise specified salt will fall back
to this basic module
''' That to me says "this is the reference module, specific service modules should provide a superset of my behaviours" ... if one of the service implementations differ in behaviour from that default service module, the default has to be considered correct by definition. |
Ok, I get the intent of this but it's horribly flawed and I can't think how to fix it. The problem stems from these lines in states/service.py:dead()
This seems logical enough: get the status beforehand, check whether it's actually running, then get the status again afterwards and compare, to try to determine whether we made a change.
Only that doesn't work, for an obvious reason: service.status returns the pids of the processes it thinks belong to that service. That might result in sane behaviour for something with a single-process model, but if you bring something like Apache prefork into the picture you have a reasonably constant failure scenario. Any service that forks worker processes off on a regular basis is going to trigger false alarms to a greater or lesser extent.
Any ideas how the logic can be adjusted to be more reliable? Parent process detection might be a workable approach?