Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add opkg/NILinuxRT 'reboot required' witnessing feature #46774

Merged
merged 6 commits into from May 9, 2018

Conversation

@10ne1
Copy link
Contributor

@10ne1 10ne1 commented Mar 29, 2018

What does this PR do?

Add new reboot required witnessed logic to opkg operations on NILRT systems similar to how win_system.py does it, using the same system API as on windows.

What issues does this PR fix or reference?

Doesn't have an issue assigned.

New Behavior

A new Linux system API is added for getting/setting reboot witness events which is called by the opkg module. The witness events are stored in volatile memory such that they're cleared upon reboot.

Tests written?

No

Commits signed with GPG?

No

Please review Salt's Contributing Guide for best practices.

See GitHub's page on GPG signing for more information about signing commits with GPG.

@rallytime rallytime requested review from terminalmage and skizunov Mar 29, 2018
Copy link
Contributor

@skizunov skizunov left a comment

There are a few instances of inconsistent use of single-quotes (') vs doublequotes ("), but I don't know if that qualifies as an issue. It looks fine overall.

@terminalmage terminalmage removed their request for review Mar 29, 2018
@@ -412,12 +453,19 @@ def install(name=None,
ret.update({pkgname: {'old': old.get(pkgname, ''),
'new': new.get(pkgname, '')}})

rs_result = __salt__['restartcheck.restartcheck'](verbose=False)
if isinstance(rs_result, dict) and 'comment' in rs_result:
Copy link
Contributor

@cachedout cachedout Mar 30, 2018

We have this duplicated over on L527. Could we refactor it to a single block that's called?

Copy link
Contributor Author

@10ne1 10ne1 Apr 2, 2018

Done. I've ammended the commit.

salt '*' system.set_reboot_required_witnessed
'''
if __grains__.get('os_family') == 'NILinuxRT':
Copy link
Contributor

@cachedout cachedout Apr 4, 2018

Hrm. I don't like that this function is essentially a no-op for everything but the NILinuxRT OS family but exists in the system module which is available and consumed by all types of systems. Is this the best place for this?

Copy link
Contributor

@skizunov skizunov Apr 4, 2018

@cachedout : I think the idea behind this was to copy from the architecture already put in place in salt/modules/win_system.py, which is also a "system" module (by virtual name). So the function names and location are not new. I believe this exact same feature could apply to other Linux distributions, I would assume that the author was not motivated to implement and test it on other distributions. Although, by looking at things, it seems that it wouldn't be difficult to do so.

Perhaps the author should add at least one more distribution (eg CentOS) so that your automated tests would cover this new feature?

Copy link
Contributor Author

@10ne1 10ne1 Apr 4, 2018

I can move it somewhere else, no problem, we just need to figure out where is the best place to put them. How about the opkg module? It was written for the NILinuxRT family and it's the only place which calls these witness functions.

Copy link
Collaborator

@terminalmage terminalmage Apr 4, 2018

I think that opkg.py is probably the correct place to put it. I don't see any code here that actually calls these new functions, and since they only apply to NILinuxRT they shouldn't be in a more general module.

Copy link
Contributor

@skizunov skizunov Apr 4, 2018

@terminalmage : I disagree, because I think these new functions are not private and really meant to be called at the job level. Eg system.get_reboot_required_witnessed. How else would the master find out if a reboot is required on a given minion? Because they are required to be called at the job level, putting them into a module with virtual name pkg doesn't make that much sense, because all modules with virtual name pkg should adhere to a predefined set of APIs, which this falls outside of.

Since there already is precedent for system.get_reboot_required_witnessed from the Windows side, it seems like the logical place to put these functions where they are right now. This feature can easily be extended to other Linux distributions, because the main logic that checks if a reboot is required is in salt/modules/restartcheck.py, which already supports Debian and Red Hat based systems.

The pkg modules for such systems could be modified in a similar way to check if a restart is required after a package operation, and invoke system.set_reboot_required_witnessed in such a case. Then, for such systems, a user has the option of checking if a reboot is required after a package operation (eg if a new kernel is installed). It would actually be a pretty useful feature for general use.

Copy link
Contributor Author

@10ne1 10ne1 Apr 5, 2018

I'm ok either way, I think we just need a consensus so I know what to do further.

Copy link
Collaborator

@terminalmage terminalmage Apr 5, 2018

@skizunov Thanks, I overlooked your earlier comment regarding the win_system module. Sorry about that! I agree, system.py is a good place for it.

Copy link
Contributor

@isbm isbm Apr 11, 2018

@terminalmage Jain ("Yes, and no", German). Here is why:

@cachedout Remember our talk at SUSECon in Prague? Here is a good reason for the namespaces. E.g.:

salt foo system.nilinuxrt.set_reboot_required_witnessed

I would certainly move it there by simply mapping them in __virtual__. An example is Ansible gate how this is done. But currently everyone gets this function that does exactly nothing to their OS. To make things even worse, it even returns whole False with error code 0 (in case of state modules.run for example), and so nobody knows if it worked or minion was down or what else?.., especially if you call this via API. There is no clarity here. I still certainly on your side that this is no-op for others, and so this is not the best solution at the moment.

At least it doesn't even raises an exception "Not supported on Foo platform", once somebody still has a very strong opinion against the namespaces in this case.

salt '*' system.get_reboot_required_witnessed
'''
if __grains__.get('os_family') == 'NILinuxRT':
Copy link
Contributor

@cachedout cachedout Apr 4, 2018

Same concern as above here.

Copy link
Contributor

@isbm isbm Apr 11, 2018

Same support of @cachedout 's concern as above.

@cachedout
Copy link
Contributor

@cachedout cachedout commented Apr 9, 2018

@terminalmage Re-review requested.

@rallytime rallytime requested a review from terminalmage Apr 9, 2018
@ghost ghost self-requested a review Apr 9, 2018
Copy link
Collaborator

@terminalmage terminalmage left a comment

@cachedout sorry, my comment here was meant to signify that my concerns were satisfied, but I neglected to submit an "Approved" review. Sorry for holding things up.

if not (os.path.exists(os.path.join(NILRT_MODULE_STATE_PATH, 'modules.dep.timestamp')) and
os.path.exists(os.path.join(NILRT_MODULE_STATE_PATH, 'modules.dep.md5sum'))):
try:
os.makedirs(NILRT_MODULE_STATE_PATH)
Copy link
Contributor

@isbm isbm Apr 11, 2018

@10ne1 this is inconsistent: you are checking for files, but creating directories.

🪲 Second, bug alert: if one of the files has been not created due to various reasons (e.g. "no space left on device" which could be temporary occurrence), then you will not update the modules anymore.

Here I would change the following:

  • Check if the directory does not exist, then try to create it.
  • Check if any of files do not exists, then flush them away and update the info.

Copy link
Contributor Author

@10ne1 10ne1 Apr 24, 2018

Done.

except OSError as exc:
if exc.errno != errno.EEXIST:
msg = 'Error creating {0} (-{1}): {2}'.format(NILRT_MODULE_STATE_PATH, exc.errno, exc.strerror)
return (False, msg)
Copy link
Contributor

@isbm isbm Apr 11, 2018

Extra parenthesis is not needed here.

Copy link
Contributor Author

@10ne1 10ne1 Apr 24, 2018

Done.

'''
If this is an older version of NILinuxRT, return True. Otherwise, return False.
'''
return bool(os.path.exists('/usr/local/natinst/bin/nisafemodeversion'))
Copy link
Contributor

@isbm isbm Apr 11, 2018

Why you are wrapping boolean into a boolean? The os.path.exists already returns type bool.

Copy link
Contributor Author

@10ne1 10ne1 Apr 24, 2018

Done.

kernel = os.path.basename(kernel)
kernel = kernel.strip('bzImage-')

if __grains__.get('os_family') == 'NILinuxRT' and _is_older_nilrt():
Copy link
Contributor

@isbm isbm Apr 11, 2018

Can we move this magic word NILinuxRT into a constant, please?

Copy link
Contributor Author

@10ne1 10ne1 Apr 24, 2018

Done.

# backwards compatible so it'll just get more complicated.
kpath = '/boot/runmode/bzImage'
kvregex = r'[0-9]+\.[0-9]+\.[0-9]+-rt'
kernel = __salt__['cmd.shell']('strings {0} | awk \'$1 ~ /{1}/ {{print $1}}\''
Copy link
Contributor

@isbm isbm Apr 11, 2018

What happens if gawk package is not installed? This has to be checked in __virtual__ and properly reported. Or you would rather drop this dependency and process the output here with Python (preferred).

Copy link
Contributor Author

@10ne1 10ne1 Apr 24, 2018

Done (ported the awk logic to python re)

if retcode != 0:
return True

return False
Copy link
Contributor

@isbm isbm Apr 11, 2018

In fact:

return bool(__salt__['cmd.retcode']('md5sum -cs {0}'.format(depmodpath_md5sum), output_loglevel="quiet"))

😉

Copy link
Contributor Author

@10ne1 10ne1 Apr 24, 2018

Done.

@@ -359,8 +413,14 @@ def restartcheck(ignorelist=None, blacklist=None, excludepid=None, verbose=True)
kernel_current = __salt__['cmd.run']('uname -a')
for kernel in kernel_versions:
if kernel in kernel_current:
kernel_restart = False
break
if __grains__.get('os_family') == 'NILinuxRT':
Copy link
Contributor

@isbm isbm Apr 11, 2018

The same hard-coded magic word. Would be nicer to keep it in a constant.

Copy link
Contributor Author

@10ne1 10ne1 Apr 24, 2018

Done.

salt '*' system.set_reboot_required_witnessed
'''
if __grains__.get('os_family') == 'NILinuxRT':
Copy link
Contributor

@isbm isbm Apr 11, 2018

@terminalmage Jain ("Yes, and no", German). Here is why:

@cachedout Remember our talk at SUSECon in Prague? Here is a good reason for the namespaces. E.g.:

salt foo system.nilinuxrt.set_reboot_required_witnessed

I would certainly move it there by simply mapping them in __virtual__. An example is Ansible gate how this is done. But currently everyone gets this function that does exactly nothing to their OS. To make things even worse, it even returns whole False with error code 0 (in case of state modules.run for example), and so nobody knows if it worked or minion was down or what else?.., especially if you call this via API. There is no clarity here. I still certainly on your side that this is no-op for others, and so this is not the best solution at the moment.

At least it doesn't even raises an exception "Not supported on Foo platform", once somebody still has a very strong opinion against the namespaces in this case.

salt '*' system.get_reboot_required_witnessed
'''
if __grains__.get('os_family') == 'NILinuxRT':
Copy link
Contributor

@isbm isbm Apr 11, 2018

Same support of @cachedout 's concern as above.

if __grains__.get('os_family') == 'NILinuxRT':
if os.path.exists(NILRT_REBOOT_WITNESS_PATH):
return True
return False
Copy link
Contributor

@isbm isbm Apr 11, 2018

Also it can be much simpler to code:

def get_reboot_required_witnessed():
    return __grains__['os_family'] == 'NILinuxRT' and os.path.exists(NILRT_REBOOT_WITNESS_PATH)

But again, I totally dislike that this function behaves as supported on, say, BSD or Solaris, while it is actually lying what it says. It is available on BSD/Solaris/Windows and returns like it would be no reboot witnessed on NILinuxRT. This is a no go for me.

@terminalmage @cachedout thoughts?

Copy link
Collaborator

@terminalmage terminalmage Apr 12, 2018

My understanding from the OP was that the "witness" events were something unique to NILinuxRT.

If you're concerned about this function being available on other platforms, we can perhaps write a decorator similar to salt.utils.decorators.depends which either checks for the existence of a file or runs a command and analyzes the retcode. The failure of either of those actions would keep the function from being added to the __salt__ dunder.

Copy link
Collaborator

@terminalmage terminalmage Apr 12, 2018

Actually, I like the idea of such decorators even if we don't use them here, so I am looking into writing them now.

Copy link
Contributor

@isbm isbm Apr 12, 2018

@terminalmage 👍 for the decorator. I would still map it separately into its own namespace. Because it is "special" stuff. Then in your state you can better segregate these calls, like:

salt * somemodule.special.do_something

Then you will also construct your states/commands differently, avoiding hitting systems that aren't in this space. It is just a matter of placing it to the module's dunder and nothing much else.

Copy link
Contributor

@skizunov skizunov Apr 12, 2018

@isbm : Have you read my earlier comments? I have already made a case that this isn't "special" stuff. It is also something that other Linux distributions could benefit from if implemented for them. The implementation is likely to be the same or almost the same but with a different path. So, as before, I stand by my statement that it is already located in the correct place.

Copy link
Contributor

@isbm isbm Apr 12, 2018

@skizunov are we on the same page? Sounds like you're talking about an interface. And it is yet to be implemented. What I am talking is a principal of an actual implementation of any module that on some OSes even do not have to exist. That said, if other systems needs to implement this, then exception raise "not yet implemented" or "cannot be at all supported" is needed here. And @terminalmage is about to finish this fixing.

But right now it is returning one of an identical return data of the actual implementation. And so this is a bug and confusion.

Copy link
Contributor

@skizunov skizunov Apr 12, 2018

@isbm : This interface has already been implemented for Windows systems (in win_system.py which also implements the system virtual interface). But I understand your point about the behavior on other Linux distributions that may not necessarily implement the interface. So perhaps there are a few options:

  1. Use the solution that @terminalmage suggests (requires infrastructure not yet present)
  2. Just implement for the other Linux distributions (should be very easy, just need to confirm the path to volatile storage is valid for each Linux distribution)
  3. On other Linux distributions that don't (yet) support this, throw an exception instead of returning a value.

Copy link
Contributor

@isbm isbm Apr 12, 2018

@skizunov I'd prefer @terminalmage 's suggestion as it is case-agnostic.

Copy link
Collaborator

@terminalmage terminalmage Apr 16, 2018

Implementing function namespaces is not going to happen right now. It may in a future release, but it's extremely unlikely that it will happen in time for Fluorine. So, we should adjust expectations here accordingly.

Late last week I added the needed support to the depends decorator to enable the running of a shell command (#47040). In the process of doing so, I discovered that this decorator already supports interpreting an expression that evaluates to a boolean (with True meaning that the function should be loaded and False meaning that it should not).

There are two approaches we could use:

OPTION 1 (depends on #47040, would have to wait for that PR to be merged and then rebase to gain access to this functonality)

from salt.utils.decorators import depends

@depends('some shell command', retcode=0)
def set_reboot_required_witnessed():
    '''

OPTION 2

from salt.utils.decorators import depends

@depends(some python expression)
def set_reboot_required_witnessed():
    '''

Option 2 could be done right now without waiting for my decorator modifications to be merged, but note that whatever is passed to the depends decorator must evaluate as boolean. So it has to be an expression or function that returns a bool.

The question is, I guess, what would be the best way to confirm that the minion is NILinuxRT? Keep in mind that we won't have access to the grains.

Copy link
Contributor Author

@10ne1 10ne1 Apr 24, 2018

I've rebased on top of master and have the code from #47040

Can I run any shell commands in the depends decorator? I can check if we're on NILRT using 'lsb-release -i | grep -q nilrt ' if yes.

@cachedout
Copy link
Contributor

@cachedout cachedout commented Apr 18, 2018

I think we're still waiting on some feedback for some of the concerns raised by @isbm here.

@isbm
Copy link
Contributor

@isbm isbm commented Apr 23, 2018

@10ne1 any updates here?

@10ne1
Copy link
Contributor Author

@10ne1 10ne1 commented Apr 23, 2018

Sorry for the delay, I've been overwhelmed with other work + conference travel, I'll get back to this ASAP (I hope to update the PR tomorrow).

@isbm
Copy link
Contributor

@isbm isbm commented Apr 24, 2018

@10ne1 do not forget to push. 😉

@10ne1
Copy link
Contributor Author

@10ne1 10ne1 commented Apr 24, 2018

@isbm Sure, I'll push, I just need to do a little more testing (got preempted again but I'm back on it 😃 )

@10ne1 10ne1 force-pushed the develop branch 2 times, most recently from 55451b3 to 14c430e Apr 25, 2018
Copy link
Contributor

@isbm isbm left a comment

Sorry, I found some more concerns. Could you look at them?

'''
If this is an older version of NILinuxRT, return True. Otherwise, return False.
'''
return os.path.exists('/usr/local/natinst/bin/nisafemodeversion')
Copy link
Contributor

@isbm isbm Apr 26, 2018

This is strange. You're judging version by just a mere place of the binary installation. Why you're so sure that the nisafemodeversion isn't up to the latest version but just is a symlink for backward compatibility? If you want to look for version, then look for the version by actually calling it or refer to some data that would actually fetch the real version and compare it.

Copy link
Contributor Author

@10ne1 10ne1 Apr 26, 2018

We have two big incompatible versions (usually we call them older and newer NILRT) of the distro supported side by side and there is no way to distinguish them from the rootfs (no versioning info to make that distinction in any package). I could implement a machanism to do that but we'd still have to support already released images...

That nisafemodeversion file is not present t all in newer NILRT, as we have a unified bootflow with no "safe mode", so even though it looks ugly that's the most reliable way I could come up to distinguish older vs newer 😅

Copy link
Contributor

@isbm isbm May 9, 2018

@10ne1 I hear you. It is still would be nicer if your distro would start using /etc/os-release like everyone else are doing.

.format(dir_path, ex.errno, ex.strerror))

__salt__['cmd.run']('touch {0}'.format(NILRT_REBOOT_WITNESS_PATH))
return True
Copy link
Contributor

@isbm isbm Apr 26, 2018

And if that fails for whatever reason with error code non zero, you're still saying "all perfect". I would not return it here, but at the end, explicitly that the exit code is 0 and then bool it:

errcode = -1
...
if not os.path.exists(dir_path):
    ...
    errcode = ....

return errcode == 0

Copy link
Contributor Author

@10ne1 10ne1 Apr 26, 2018

Done.


NILRT_REBOOT_WITNESS_PATH = '/var/volatile/tmp/salt/reboot_witnessed'

NILRT_FAMILY_VERIFY_CMD = 'sh -c \'lsb_release -d | grep -qi "NI Linux Real-Time"\''
Copy link
Contributor

@isbm isbm Apr 26, 2018

It has to be better than this...

$ lsb_release -d
-bash: lsb_release: command not found

My point is that the whole thing also depends on the /usr/bin/lsb_release script on its own, while you could just:

import lsb_release
distro = lsb_release.get_distro_information()

And in some OSes on Python 3.6+ it is sometimes broken, so you should likely look at the data more reliably as well as this would be a wheel re-implementation, just from the other end.

Why not just looking at grains that you already have?

Copy link
Contributor Author

@10ne1 10ne1 Apr 26, 2018

Done, I'm using the os family grain now.

@10ne1 10ne1 force-pushed the develop branch 2 times, most recently from 2488f75 to 5cc8f3e Apr 26, 2018
Ioan-Adrian Ratiu added 2 commits May 9, 2018
Older NILRT distro versions don't use package management when
installing the kernel bzImage, nor symlinks to files containing the
version, so it's impossible to determine the version without using
ugly hacks like this "strings" one.

The need to maintain backwards compatibility complicates the matter
further because there's no easy way out: We need to wait until support
for older NILRT's is discontinued to avoid having both this hack and
the fixed older NILRT solution at the same time, besides the one
already working on newer distros..

Signed-off-by: Ioan-Adrian Ratiu <adrian.ratiu@ni.com>
The doc comment is innacurate: in case of error, the returned
dictionary will have the 'result' key set to False and if no
packages are found to need restart, a string is returned.

Signed-off-by: Ioan-Adrian Ratiu <adrian.ratiu@ni.com>
Ioan-Adrian Ratiu added 3 commits May 9, 2018
NI's modules are special in the sense that once they're inserted in
a kernel they can't be removed so a reboot is required when upgrading
or uninstalling them. Some even ask for reboots when installing for
the first time.

NI kernel modules being a given source of reboots, we have to make
restartcheck aware of them by checking a md5sum of modules.dep and
its timestamp (modules.dep is modified/touched by all modules).

The timestamp is checked because on a module upgrade, the modules.dep
contents will be the same (identical md5 hashes).

Signed-off-by: Ioan-Adrian Ratiu <adrian.ratiu@ni.com>
This adds a UNIX 'reboot required' tracking mechanism similar to the
one in win_system via two witness get/set functions (same API).

Only NILinuxRT makes use of it for now via the opkg module.

Signed-off-by: Ioan-Adrian Ratiu <adrian.ratiu@ni.com>
Some packages on NILRT systems, especially kernel modules which are a
poignant exception in the Linux world for this use case, require
rebooting the minion after an opkg install/remove/upgrade/etc.

This adds support for tracking kernel module states to know when a
system reboot is necessary (via restartcheck) and calls a 'witness'
function to set a temporary 'reboot required' state which is cleared
on boot.

If restartcheck output indicates services require restart without
rebooting the system, said services are restarted without calling the
'reboot witness'.

Signed-off-by: Ioan-Adrian Ratiu <adrian.ratiu@ni.com>
The fallback if's are reversed :)

__salt__['service.available'] currently returns only initscripts so
don't report them as systemd services by default (check first to see
if it's an initscript and if not add them to systemd because in the
future __salt__['service.available'] may return systemd services).

Signed-off-by: Ioan-Adrian Ratiu <adrian.ratiu@ni.com>
@10ne1
Copy link
Contributor Author

@10ne1 10ne1 commented May 9, 2018

Rebased (resolved 1 conflict with develop branch) and added 1 more fix.

isbm
isbm approved these changes May 9, 2018
@rallytime rallytime merged commit a72d6e3 into saltstack:develop May 9, 2018
4 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

6 participants