Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add opkg/NILinuxRT 'reboot required' witnessing feature #46774

Merged
merged 6 commits into from May 9, 2018

Conversation

Projects
None yet
6 participants
@10ne1
Copy link
Contributor

commented Mar 29, 2018

What does this PR do?

Add new reboot required witnessed logic to opkg operations on NILRT systems similar to how win_system.py does it, using the same system API as on windows.

What issues does this PR fix or reference?

Doesn't have an issue assigned.

New Behavior

A new Linux system API is added for getting/setting reboot witness events which is called by the opkg module. The witness events are stored in volatile memory such that they're cleared upon reboot.

Tests written?

No

Commits signed with GPG?

No

Please review Salt's Contributing Guide for best practices.

See GitHub's page on GPG signing for more information about signing commits with GPG.

@rallytime rallytime requested review from terminalmage and skizunov Mar 29, 2018

@skizunov
Copy link
Contributor

left a comment

There are a few instances of inconsistent use of single-quotes (') vs doublequotes ("), but I don't know if that qualifies as an issue. It looks fine overall.

@terminalmage terminalmage removed their request for review Mar 29, 2018

@@ -412,12 +453,19 @@ def install(name=None,
ret.update({pkgname: {'old': old.get(pkgname, ''),
'new': new.get(pkgname, '')}})

rs_result = __salt__['restartcheck.restartcheck'](verbose=False)
if isinstance(rs_result, dict) and 'comment' in rs_result:

This comment has been minimized.

Copy link
@cachedout

cachedout Mar 30, 2018

Collaborator

We have this duplicated over on L527. Could we refactor it to a single block that's called?

This comment has been minimized.

Copy link
@10ne1

10ne1 Apr 2, 2018

Author Contributor

Done. I've ammended the commit.

@10ne1 10ne1 force-pushed the 10ne1:develop branch from 97309ea to a90b064 Apr 2, 2018

salt '*' system.set_reboot_required_witnessed
'''
if __grains__.get('os_family') == 'NILinuxRT':

This comment has been minimized.

Copy link
@cachedout

cachedout Apr 4, 2018

Collaborator

Hrm. I don't like that this function is essentially a no-op for everything but the NILinuxRT OS family but exists in the system module which is available and consumed by all types of systems. Is this the best place for this?

This comment has been minimized.

Copy link
@skizunov

skizunov Apr 4, 2018

Contributor

@cachedout : I think the idea behind this was to copy from the architecture already put in place in salt/modules/win_system.py, which is also a "system" module (by virtual name). So the function names and location are not new. I believe this exact same feature could apply to other Linux distributions, I would assume that the author was not motivated to implement and test it on other distributions. Although, by looking at things, it seems that it wouldn't be difficult to do so.

Perhaps the author should add at least one more distribution (eg CentOS) so that your automated tests would cover this new feature?

This comment has been minimized.

Copy link
@10ne1

10ne1 Apr 4, 2018

Author Contributor

I can move it somewhere else, no problem, we just need to figure out where is the best place to put them. How about the opkg module? It was written for the NILinuxRT family and it's the only place which calls these witness functions.

This comment has been minimized.

Copy link
@terminalmage

terminalmage Apr 4, 2018

Contributor

I think that opkg.py is probably the correct place to put it. I don't see any code here that actually calls these new functions, and since they only apply to NILinuxRT they shouldn't be in a more general module.

This comment has been minimized.

Copy link
@skizunov

skizunov Apr 4, 2018

Contributor

@terminalmage : I disagree, because I think these new functions are not private and really meant to be called at the job level. Eg system.get_reboot_required_witnessed. How else would the master find out if a reboot is required on a given minion? Because they are required to be called at the job level, putting them into a module with virtual name pkg doesn't make that much sense, because all modules with virtual name pkg should adhere to a predefined set of APIs, which this falls outside of.

Since there already is precedent for system.get_reboot_required_witnessed from the Windows side, it seems like the logical place to put these functions where they are right now. This feature can easily be extended to other Linux distributions, because the main logic that checks if a reboot is required is in salt/modules/restartcheck.py, which already supports Debian and Red Hat based systems.

The pkg modules for such systems could be modified in a similar way to check if a restart is required after a package operation, and invoke system.set_reboot_required_witnessed in such a case. Then, for such systems, a user has the option of checking if a reboot is required after a package operation (eg if a new kernel is installed). It would actually be a pretty useful feature for general use.

This comment has been minimized.

Copy link
@10ne1

10ne1 Apr 5, 2018

Author Contributor

I'm ok either way, I think we just need a consensus so I know what to do further.

This comment has been minimized.

Copy link
@terminalmage

terminalmage Apr 5, 2018

Contributor

@skizunov Thanks, I overlooked your earlier comment regarding the win_system module. Sorry about that! I agree, system.py is a good place for it.

This comment has been minimized.

Copy link
@isbm

isbm Apr 11, 2018

Contributor

@terminalmage Jain ("Yes, and no", German). Here is why:

@cachedout Remember our talk at SUSECon in Prague? Here is a good reason for the namespaces. E.g.:

salt foo system.nilinuxrt.set_reboot_required_witnessed

I would certainly move it there by simply mapping them in __virtual__. An example is Ansible gate how this is done. But currently everyone gets this function that does exactly nothing to their OS. To make things even worse, it even returns whole False with error code 0 (in case of state modules.run for example), and so nobody knows if it worked or minion was down or what else?.., especially if you call this via API. There is no clarity here. I still certainly on your side that this is no-op for others, and so this is not the best solution at the moment.

At least it doesn't even raises an exception "Not supported on Foo platform", once somebody still has a very strong opinion against the namespaces in this case.

salt '*' system.get_reboot_required_witnessed
'''
if __grains__.get('os_family') == 'NILinuxRT':

This comment has been minimized.

Copy link
@cachedout

cachedout Apr 4, 2018

Collaborator

Same concern as above here.

This comment has been minimized.

Copy link
@isbm

isbm Apr 11, 2018

Contributor

Same support of @cachedout 's concern as above.

@cachedout

This comment has been minimized.

Copy link
Collaborator

commented Apr 9, 2018

@terminalmage Re-review requested.

@rallytime rallytime requested a review from terminalmage Apr 9, 2018

@salt-jenkins salt-jenkins requested a review from saltstack/team-suse Apr 9, 2018

@terminalmage
Copy link
Contributor

left a comment

@cachedout sorry, my comment here was meant to signify that my concerns were satisfied, but I neglected to submit an "Approved" review. Sorry for holding things up.

if not (os.path.exists(os.path.join(NILRT_MODULE_STATE_PATH, 'modules.dep.timestamp')) and
os.path.exists(os.path.join(NILRT_MODULE_STATE_PATH, 'modules.dep.md5sum'))):
try:
os.makedirs(NILRT_MODULE_STATE_PATH)

This comment has been minimized.

Copy link
@isbm

isbm Apr 11, 2018

Contributor

@10ne1 this is inconsistent: you are checking for files, but creating directories.

🐞 Second, bug alert: if one of the files has been not created due to various reasons (e.g. "no space left on device" which could be temporary occurrence), then you will not update the modules anymore.

Here I would change the following:

  • Check if the directory does not exist, then try to create it.
  • Check if any of files do not exists, then flush them away and update the info.

This comment has been minimized.

Copy link
@10ne1

10ne1 Apr 24, 2018

Author Contributor

Done.

except OSError as exc:
if exc.errno != errno.EEXIST:
msg = 'Error creating {0} (-{1}): {2}'.format(NILRT_MODULE_STATE_PATH, exc.errno, exc.strerror)
return (False, msg)

This comment has been minimized.

Copy link
@isbm

isbm Apr 11, 2018

Contributor

Extra parenthesis is not needed here.

This comment has been minimized.

Copy link
@10ne1

10ne1 Apr 24, 2018

Author Contributor

Done.

'''
If this is an older version of NILinuxRT, return True. Otherwise, return False.
'''
return bool(os.path.exists('/usr/local/natinst/bin/nisafemodeversion'))

This comment has been minimized.

Copy link
@isbm

isbm Apr 11, 2018

Contributor

Why you are wrapping boolean into a boolean? The os.path.exists already returns type bool.

This comment has been minimized.

Copy link
@10ne1

10ne1 Apr 24, 2018

Author Contributor

Done.

kernel = os.path.basename(kernel)
kernel = kernel.strip('bzImage-')

if __grains__.get('os_family') == 'NILinuxRT' and _is_older_nilrt():

This comment has been minimized.

Copy link
@isbm

isbm Apr 11, 2018

Contributor

Can we move this magic word NILinuxRT into a constant, please?

This comment has been minimized.

Copy link
@10ne1

10ne1 Apr 24, 2018

Author Contributor

Done.

# backwards compatible so it'll just get more complicated.
kpath = '/boot/runmode/bzImage'
kvregex = r'[0-9]+\.[0-9]+\.[0-9]+-rt'
kernel = __salt__['cmd.shell']('strings {0} | awk \'$1 ~ /{1}/ {{print $1}}\''

This comment has been minimized.

Copy link
@isbm

isbm Apr 11, 2018

Contributor

What happens if gawk package is not installed? This has to be checked in __virtual__ and properly reported. Or you would rather drop this dependency and process the output here with Python (preferred).

This comment has been minimized.

Copy link
@10ne1

10ne1 Apr 24, 2018

Author Contributor

Done (ported the awk logic to python re)

if retcode != 0:
return True

return False

This comment has been minimized.

Copy link
@isbm

isbm Apr 11, 2018

Contributor

In fact:

return bool(__salt__['cmd.retcode']('md5sum -cs {0}'.format(depmodpath_md5sum), output_loglevel="quiet"))

😉

This comment has been minimized.

Copy link
@10ne1

10ne1 Apr 24, 2018

Author Contributor

Done.

@@ -359,8 +413,14 @@ def restartcheck(ignorelist=None, blacklist=None, excludepid=None, verbose=True)
kernel_current = __salt__['cmd.run']('uname -a')
for kernel in kernel_versions:
if kernel in kernel_current:
kernel_restart = False
break
if __grains__.get('os_family') == 'NILinuxRT':

This comment has been minimized.

Copy link
@isbm

isbm Apr 11, 2018

Contributor

The same hard-coded magic word. Would be nicer to keep it in a constant.

This comment has been minimized.

Copy link
@10ne1

10ne1 Apr 24, 2018

Author Contributor

Done.

salt '*' system.set_reboot_required_witnessed
'''
if __grains__.get('os_family') == 'NILinuxRT':

This comment has been minimized.

Copy link
@isbm

isbm Apr 11, 2018

Contributor

@terminalmage Jain ("Yes, and no", German). Here is why:

@cachedout Remember our talk at SUSECon in Prague? Here is a good reason for the namespaces. E.g.:

salt foo system.nilinuxrt.set_reboot_required_witnessed

I would certainly move it there by simply mapping them in __virtual__. An example is Ansible gate how this is done. But currently everyone gets this function that does exactly nothing to their OS. To make things even worse, it even returns whole False with error code 0 (in case of state modules.run for example), and so nobody knows if it worked or minion was down or what else?.., especially if you call this via API. There is no clarity here. I still certainly on your side that this is no-op for others, and so this is not the best solution at the moment.

At least it doesn't even raises an exception "Not supported on Foo platform", once somebody still has a very strong opinion against the namespaces in this case.

salt '*' system.get_reboot_required_witnessed
'''
if __grains__.get('os_family') == 'NILinuxRT':

This comment has been minimized.

Copy link
@isbm

isbm Apr 11, 2018

Contributor

Same support of @cachedout 's concern as above.

if __grains__.get('os_family') == 'NILinuxRT':
if os.path.exists(NILRT_REBOOT_WITNESS_PATH):
return True
return False

This comment has been minimized.

Copy link
@isbm

isbm Apr 11, 2018

Contributor

Also it can be much simpler to code:

def get_reboot_required_witnessed():
    return __grains__['os_family'] == 'NILinuxRT' and os.path.exists(NILRT_REBOOT_WITNESS_PATH)

But again, I totally dislike that this function behaves as supported on, say, BSD or Solaris, while it is actually lying what it says. It is available on BSD/Solaris/Windows and returns like it would be no reboot witnessed on NILinuxRT. This is a no go for me.

@terminalmage @cachedout thoughts?

This comment has been minimized.

Copy link
@terminalmage

terminalmage Apr 12, 2018

Contributor

My understanding from the OP was that the "witness" events were something unique to NILinuxRT.

If you're concerned about this function being available on other platforms, we can perhaps write a decorator similar to salt.utils.decorators.depends which either checks for the existence of a file or runs a command and analyzes the retcode. The failure of either of those actions would keep the function from being added to the __salt__ dunder.

This comment has been minimized.

Copy link
@terminalmage

terminalmage Apr 12, 2018

Contributor

Actually, I like the idea of such decorators even if we don't use them here, so I am looking into writing them now.

This comment has been minimized.

Copy link
@isbm

isbm Apr 12, 2018

Contributor

@terminalmage 👍 for the decorator. I would still map it separately into its own namespace. Because it is "special" stuff. Then in your state you can better segregate these calls, like:

salt * somemodule.special.do_something

Then you will also construct your states/commands differently, avoiding hitting systems that aren't in this space. It is just a matter of placing it to the module's dunder and nothing much else.

This comment has been minimized.

Copy link
@skizunov

skizunov Apr 12, 2018

Contributor

@isbm : Have you read my earlier comments? I have already made a case that this isn't "special" stuff. It is also something that other Linux distributions could benefit from if implemented for them. The implementation is likely to be the same or almost the same but with a different path. So, as before, I stand by my statement that it is already located in the correct place.

This comment has been minimized.

Copy link
@isbm

isbm Apr 12, 2018

Contributor

@skizunov are we on the same page? Sounds like you're talking about an interface. And it is yet to be implemented. What I am talking is a principal of an actual implementation of any module that on some OSes even do not have to exist. That said, if other systems needs to implement this, then exception raise "not yet implemented" or "cannot be at all supported" is needed here. And @terminalmage is about to finish this fixing.

But right now it is returning one of an identical return data of the actual implementation. And so this is a bug and confusion.

This comment has been minimized.

Copy link
@skizunov

skizunov Apr 12, 2018

Contributor

@isbm : This interface has already been implemented for Windows systems (in win_system.py which also implements the system virtual interface). But I understand your point about the behavior on other Linux distributions that may not necessarily implement the interface. So perhaps there are a few options:

  1. Use the solution that @terminalmage suggests (requires infrastructure not yet present)
  2. Just implement for the other Linux distributions (should be very easy, just need to confirm the path to volatile storage is valid for each Linux distribution)
  3. On other Linux distributions that don't (yet) support this, throw an exception instead of returning a value.

This comment has been minimized.

Copy link
@isbm

isbm Apr 12, 2018

Contributor

@skizunov I'd prefer @terminalmage 's suggestion as it is case-agnostic.

This comment has been minimized.

Copy link
@terminalmage

terminalmage Apr 16, 2018

Contributor

Implementing function namespaces is not going to happen right now. It may in a future release, but it's extremely unlikely that it will happen in time for Fluorine. So, we should adjust expectations here accordingly.

Late last week I added the needed support to the depends decorator to enable the running of a shell command (#47040). In the process of doing so, I discovered that this decorator already supports interpreting an expression that evaluates to a boolean (with True meaning that the function should be loaded and False meaning that it should not).

There are two approaches we could use:

OPTION 1 (depends on #47040, would have to wait for that PR to be merged and then rebase to gain access to this functonality)

from salt.utils.decorators import depends

@depends('some shell command', retcode=0)
def set_reboot_required_witnessed():
    '''

OPTION 2

from salt.utils.decorators import depends

@depends(some python expression)
def set_reboot_required_witnessed():
    '''

Option 2 could be done right now without waiting for my decorator modifications to be merged, but note that whatever is passed to the depends decorator must evaluate as boolean. So it has to be an expression or function that returns a bool.

The question is, I guess, what would be the best way to confirm that the minion is NILinuxRT? Keep in mind that we won't have access to the grains.

This comment has been minimized.

Copy link
@10ne1

10ne1 Apr 24, 2018

Author Contributor

I've rebased on top of master and have the code from #47040

Can I run any shell commands in the depends decorator? I can check if we're on NILRT using 'lsb-release -i | grep -q nilrt ' if yes.

@cachedout

This comment has been minimized.

Copy link
Collaborator

commented Apr 18, 2018

I think we're still waiting on some feedback for some of the concerns raised by @isbm here.

@isbm

This comment has been minimized.

Copy link
Contributor

commented Apr 23, 2018

@10ne1 any updates here?

@10ne1

This comment has been minimized.

Copy link
Contributor Author

commented Apr 23, 2018

Sorry for the delay, I've been overwhelmed with other work + conference travel, I'll get back to this ASAP (I hope to update the PR tomorrow).

@isbm

This comment has been minimized.

Copy link
Contributor

commented Apr 24, 2018

@10ne1 do not forget to push. 😉

@10ne1

This comment has been minimized.

Copy link
Contributor Author

commented Apr 24, 2018

@isbm Sure, I'll push, I just need to do a little more testing (got preempted again but I'm back on it 😃 )

@10ne1 10ne1 force-pushed the 10ne1:develop branch 2 times, most recently from 55451b3 to 14c430e Apr 24, 2018

@isbm
Copy link
Contributor

left a comment

Sorry, I found some more concerns. Could you look at them?

'''
If this is an older version of NILinuxRT, return True. Otherwise, return False.
'''
return os.path.exists('/usr/local/natinst/bin/nisafemodeversion')

This comment has been minimized.

Copy link
@isbm

isbm Apr 26, 2018

Contributor

This is strange. You're judging version by just a mere place of the binary installation. Why you're so sure that the nisafemodeversion isn't up to the latest version but just is a symlink for backward compatibility? If you want to look for version, then look for the version by actually calling it or refer to some data that would actually fetch the real version and compare it.

This comment has been minimized.

Copy link
@10ne1

10ne1 Apr 26, 2018

Author Contributor

We have two big incompatible versions (usually we call them older and newer NILRT) of the distro supported side by side and there is no way to distinguish them from the rootfs (no versioning info to make that distinction in any package). I could implement a machanism to do that but we'd still have to support already released images...

That nisafemodeversion file is not present t all in newer NILRT, as we have a unified bootflow with no "safe mode", so even though it looks ugly that's the most reliable way I could come up to distinguish older vs newer 😅

This comment has been minimized.

Copy link
@isbm

isbm May 9, 2018

Contributor

@10ne1 I hear you. It is still would be nicer if your distro would start using /etc/os-release like everyone else are doing.

.format(dir_path, ex.errno, ex.strerror))

__salt__['cmd.run']('touch {0}'.format(NILRT_REBOOT_WITNESS_PATH))
return True

This comment has been minimized.

Copy link
@isbm

isbm Apr 26, 2018

Contributor

And if that fails for whatever reason with error code non zero, you're still saying "all perfect". I would not return it here, but at the end, explicitly that the exit code is 0 and then bool it:

errcode = -1
...
if not os.path.exists(dir_path):
    ...
    errcode = ....

return errcode == 0

This comment has been minimized.

Copy link
@10ne1

10ne1 Apr 26, 2018

Author Contributor

Done.


NILRT_REBOOT_WITNESS_PATH = '/var/volatile/tmp/salt/reboot_witnessed'

NILRT_FAMILY_VERIFY_CMD = 'sh -c \'lsb_release -d | grep -qi "NI Linux Real-Time"\''

This comment has been minimized.

Copy link
@isbm

isbm Apr 26, 2018

Contributor

It has to be better than this...

$ lsb_release -d
-bash: lsb_release: command not found

My point is that the whole thing also depends on the /usr/bin/lsb_release script on its own, while you could just:

import lsb_release
distro = lsb_release.get_distro_information()

And in some OSes on Python 3.6+ it is sometimes broken, so you should likely look at the data more reliably as well as this would be a wheel re-implementation, just from the other end.

Why not just looking at grains that you already have?

This comment has been minimized.

Copy link
@10ne1

10ne1 Apr 26, 2018

Author Contributor

Done, I'm using the os family grain now.

@10ne1 10ne1 force-pushed the 10ne1:develop branch 2 times, most recently from 2488f75 to 5cc8f3e Apr 26, 2018

@10ne1 10ne1 force-pushed the 10ne1:develop branch from 5cc8f3e to b682ffe May 9, 2018

10ne1 added some commits Mar 20, 2018

restartcheck.py: Verify older NILRT kernels
Older NILRT distro versions don't use package management when
installing the kernel bzImage, nor symlinks to files containing the
version, so it's impossible to determine the version without using
ugly hacks like this "strings" one.

The need to maintain backwards compatibility complicates the matter
further because there's no easy way out: We need to wait until support
for older NILRT's is discontinued to avoid having both this hack and
the fixed older NILRT solution at the same time, besides the one
already working on newer distros..

Signed-off-by: Ioan-Adrian Ratiu <adrian.ratiu@ni.com>
restartcheck: update function doc
The doc comment is innacurate: in case of error, the returned
dictionary will have the 'result' key set to False and if no
packages are found to need restart, a string is returned.

Signed-off-by: Ioan-Adrian Ratiu <adrian.ratiu@ni.com>

@10ne1 10ne1 force-pushed the 10ne1:develop branch from b682ffe to ae29297 May 9, 2018

10ne1 added some commits Mar 20, 2018

restartcheck: verify NILRT kernel modules
NI's modules are special in the sense that once they're inserted in
a kernel they can't be removed so a reboot is required when upgrading
or uninstalling them. Some even ask for reboots when installing for
the first time.

NI kernel modules being a given source of reboots, we have to make
restartcheck aware of them by checking a md5sum of modules.dep and
its timestamp (modules.dep is modified/touched by all modules).

The timestamp is checked because on a module upgrade, the modules.dep
contents will be the same (identical md5 hashes).

Signed-off-by: Ioan-Adrian Ratiu <adrian.ratiu@ni.com>
opkg: add minion reboot & service restart support
Some packages on NILRT systems, especially kernel modules which are a
poignant exception in the Linux world for this use case, require
rebooting the minion after an opkg install/remove/upgrade/etc.

This adds support for tracking kernel module states to know when a
system reboot is necessary (via restartcheck) and calls a 'witness'
function to set a temporary 'reboot required' state which is cleared
on boot.

If restartcheck output indicates services require restart without
rebooting the system, said services are restarted without calling the
'reboot witness'.

Signed-off-by: Ioan-Adrian Ratiu <adrian.ratiu@ni.com>
system: add reboot API with inital NILRT implementation
This adds a UNIX 'reboot required' tracking mechanism similar to the
one in win_system via two witness get/set functions (same API).

Only NILinuxRT makes use of it for now via the opkg module.

Signed-off-by: Ioan-Adrian Ratiu <adrian.ratiu@ni.com>
restartcheck: fix initscripts detection
The fallback if's are reversed :)

__salt__['service.available'] currently returns only initscripts so
don't report them as systemd services by default (check first to see
if it's an initscript and if not add them to systemd because in the
future __salt__['service.available'] may return systemd services).

Signed-off-by: Ioan-Adrian Ratiu <adrian.ratiu@ni.com>
@10ne1

This comment has been minimized.

Copy link
Contributor Author

commented May 9, 2018

Rebased (resolved 1 conflict with develop branch) and added 1 more fix.

@isbm

isbm approved these changes May 9, 2018

@rallytime rallytime merged commit a72d6e3 into saltstack:develop May 9, 2018

4 of 10 checks passed

jenkins/PR/salt-pr-linode-cent7-py3 Pull Requests » Salt PR - Linode CentOS 7 - PY3 #4781 — ABORTED
Details
jenkins/PR/salt-pr-linode-ubuntu16-py3 Pull Requests » Salt PR - Linode Ubuntu16.04 - PY3 #9746 — ABORTED
Details
jenkins/PR/salt-pr-rs-cent7-n Pull Requests » Salt PR - RS CentOS 7 #18836 — ABORTED
Details
codeclimate 2 issues to fix
Details
default Build finished.
Details
jenkins/PR/salt-pr-linode-ubuntu14-n Pull Requests » Salt PR - Linode Ubuntu14.04 #22706 — FAILURE
Details
WIP ready for review
Details
jenkins/PR/salt-pr-clone Pull Requests » Salt PR - Clone #24956 — SUCCESS
Details
jenkins/PR/salt-pr-docs-n Pull Requests » Salt PR - Docs #17074 — SUCCESS
Details
jenkins/PR/salt-pr-lint-n Pull Requests » Salt PR - Code Lint #21694 — SUCCESS
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.