Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

win_system.reboot: can return True without rebooting #39243

Closed
morganwillcock opened this issue Feb 8, 2017 · 50 comments
Closed

win_system.reboot: can return True without rebooting #39243

morganwillcock opened this issue Feb 8, 2017 · 50 comments
Assignees
Labels
Pending-Discussion The issue or pull request needs more discussion before it can be closed or merged
Milestone

Comments

@morganwillcock
Copy link
Contributor

According to module documentation the reboot function returns True if successful, but doesn't say what success is meant to represent. I presumed that success should match whether it will be rebooting but when setting only_on_pending_reboot: True it will return True even if it doesn't reboot. The behaviour is confirmed by checking the code in the module:

if only_on_pending_reboot and not get_pending_reboot():
    return True

This makes it impossible to use the module output as dependency, so you can't fail a subsequent state to gracefully handle the reboot.

i.e. this can failhard when a reboot isn't occurring:

reboot_check:
  module.run:
    - name: system.reboot
    - only_on_pending_reboot: True
    - onchanges:
      - dism: install_a_package_that_might_need_a_reboot

failhard_for_reboot:
  test.fail_without_changes:
    - failhard: True
    - onchanges:
      - module: reboot_check

If this is a bug (success is meant to represent a reboot) I'd like to flip the return value to False:

if only_on_pending_reboot and not get_pending_reboot():
    return False

If it's by design then would it be okay to add an extra parameter, something like return_reboot=False, to optionally make success represent a reboot?

Steps to reproduce:

# salt salt-test system.get_pending_reboot
salt-test:
    False

# salt salt-test system.reboot only_on_pending_reboot=True
salt-test:
    True

Tested on 2016.11.2

@Ch3LL
Copy link
Contributor

Ch3LL commented Feb 8, 2017

@twangboy can I get your input here? Do you know the intended return value and behavior?

@Ch3LL Ch3LL added the Pending-Discussion The issue or pull request needs more discussion before it can be closed or merged label Feb 8, 2017
@Ch3LL Ch3LL added this to the Blocked milestone Feb 8, 2017
@twangboy
Copy link
Contributor

twangboy commented Feb 8, 2017

@morganwillcock @hrumph Is the one who added the code. Perhaps he would like to weigh in on this one. I can see the reason to return True if a reboot is not needed and only_on_pending_reboot is True. I think if you return False here, the state will think it failed.

@morganwillcock
Copy link
Contributor Author

I've tried a few ways of working around it and haven't found a good way. I follow your logic, but True might be rebooting the machine or True might not rebooting the machine, so success from the state isn't useful (if you do something based on the success, you might get a reboot in the middle of it).

Just thinking of the module itself, outside of any states:

salt salt-test system.reboot only_on_pending_reboot=True

salt-test:
    True

... you can't tell whether that is rebooting it or not, so currently True doesn't have much value.

@ghost
Copy link

ghost commented Feb 8, 2017

I realise that this may be causing problems and I don't mind if it gets changed one way or another. I had it return true in all cases because, to my mind, both outcomes (either rebooting or not rebooting) are successes. Anyway if indeed true should only be returned when a reboot is called, then the changes should be made.

Note that the use case I had in mind was a final housekeeping state where the result really didn't matter. Bottom line for me is make whatever changes you think are appropriate.

Edit: just make sure to document everything accurately.

Edit: I'd like it if @twangboy would make the decision as to what to do. I'm willing to do a PR to make the adjustments (if @twangboy decides that some are merited and no one else wants to do it).

@ghost
Copy link

ghost commented Feb 8, 2017

Perhaps None could be used for this case? Anyway others should way in.

@ghost
Copy link

ghost commented Feb 9, 2017

I should add that my understanding is that we were returning false to mean that an attempt to reboot had failed. In other words false is returned because the system fails to do what it's asked. If it doesn't attempt to reboot because only_on_pending_reboot is true and get_pending_reboot() is false then the system is doing exaclty what it has been asked to do, hence the return value of true.

BTW I'm not saying that this is an ideal state of affairs. I realise that it may be inadequate to not report what decision was made but, as I'm saying, I'd like for @twangboy to decide what should be done, if anything.

@ghost
Copy link

ghost commented Feb 9, 2017

Right now I'm thinking we could return "None" in this case, (if None isn't being used for other purporses). Another posibility, along the lines of what @morganwillcock proposed is to pass a special paramater called "return_boot_decision". WIth this flag set, instead of getting True or False you would get one of three values (e.g BOOT_FAILED, BOOT_SUCCEEDED, BOOT_NOT_NEEDED)

@morganwillcock
Copy link
Contributor Author

Could it be more straight-forward to add a state function win_system.reboot, to compliment the execution function?

states.win_system.reboot: return True if the system doesn't need to reboot or will reboot. If rebooting return why in the change data (either you passed a flag to force a reboot, or it was to clear a pending reboot).

modules.win_system.reboot: only return True if actually rebooting.

This way the module can stay in the realm of utility functions and a state failure would only show an actual command failure. State success would return True with changes (so handling the reboot can be done with the onchanges: requisite), or just True (not rebooting, nothing to see here).

@ghost
Copy link

ghost commented Feb 9, 2017

@morganwillcock, that sounds like some pretty good thinking that meshes well with the salt idiom. If @twangboy agrees to it then that's what we'll do.

@ghost
Copy link

ghost commented Feb 9, 2017

@morganwilcock but why not have the command return None on the decision to not reboot, and make your state function exactly as you described it? That way we'd get everything.

@ghost
Copy link

ghost commented Feb 9, 2017

Ideally we would drop the only_on_pending_reboot paramater altogether, and instead make it a paramater of the newly proposed state function. It was my original intention to make a state function but, at the time, I thought it was overkill. I don't think that's too much breakage because I think that @morganwillcock and I are probably the only two people to be using that paramater so far. Anyway, that's my two cents.

@morganwillcock
Copy link
Contributor Author

I gave it a try with a module that returns None and the state output is still true. Also gave it a go using watch instead, but since it's always true with changes that has the opposite problem of never failing on the second state.

reboot_if_needed:
  module.run:
    - name: module.that_returns_none

failhard_for_reboot:
  test.fail_without_changes:
    - failhard: True
    - onchanges:
      - module: reboot_if_needed

always_succeed:
  test.succeed_without_changes
salt-test:
----------
          ID: reboot_if_needed
    Function: module.run
        Name: module.that_returns_none
      Result: True
     Comment: Module function module.that_returns_none executed
     Started: 13:09:43.902000
    Duration: 0.0 ms
     Changes:   
              ----------
              ret:
                  None
----------
          ID: failhard_for_reboot
    Function: test.fail_without_changes
      Result: False
     Comment: Failure!
     Started: 13:09:43.902000
    Duration: 0.0 ms
     Changes:   

Summary for salt-test
------------
Succeeded: 1 (changed=1)
Failed:    1
------------
Total states run:     2
Total run time:   0.000 ms

@ghost
Copy link

ghost commented Feb 9, 2017

OK then I have a provisional plan as follows:

  1. Deprecate the only_on_pending_reboot paramater_ in the documentation
  2. Create a state function called win_system.reboot that has it's own only_on_pending_reboot paramater. When the paramater is set to true, it will check to see if there is a pending reboot. If none is required, the function will return True with no changes. If a reboot is required, it will return the return value of win_system.reboot and if that value is True it will also record a change in the changes dictionary.
    If the value of _only_on_pending_reboot is false the function will not make any check and will just call the reboot, reporting True with changes or False.

The default value of only_on_pending_reboot can be True since this is a state function and we don't want to reboot when we don't have to.

Since I would only be deprecating I wouldn't be breaking things anymore. Does everyone find this acceptable? @twangoy?

@morganwillcock
Copy link
Contributor Author

Yes, that is exactly what I was thinking, default the state with only_on_pending_reboot=True and if you really want to reboot every time you can optionally set it to False.

I think you could keep the only_on_pending_reboot parameter in the module though, if it was changed to return False here:

if only_on_pending_reboot and not get_pending_reboot():
    return False

...it can just be documented that the result represents a reboot. This way if someone wants to do a mass reboot of their servers after updates, they can call the module and know how many are restarting. If they want to reboot in a state they would use the state module instead.

@ghost
Copy link

ghost commented Feb 9, 2017

@morganwilcock, ok, although I don't really like returning False where I returned True before, I think it's reasonable. I'd like for @twangboy to weigh in now and then I (or whoever wants to do this) can proceed.

@ghost
Copy link

ghost commented Feb 9, 2017

@morganwillcock why can't we just return None? That won't stop people from counting instances of True.

@morganwillcock
Copy link
Contributor Author

@hrumph I don't mind, as long as it's clear what each value represents. Perhaps the question is, should False be reserved for use by the function itself?

True = Rebooting
None = Not Rebooting
False = Command Error

...or should it represent the final action?
True = Rebooting
False = Not Rebooting (say if it wasn't needed or if it was because of an error)

Personally I prefer the second way, but I'll go with whatever is decided (would like to assist with the state too).

@ghost
Copy link

ghost commented Feb 9, 2017

@morganwillcock at this point I consider this to not be important. EIther of the two choices is fine by me but I have a mild preference for choice 1. Technically both choices are breaking changes and may be resisted by the maintainers but hopefully it will be ok because this is a new feature anyway. Anyway I'm very happy to let let @twangboy have the final say in this. @twangboy, what's the delay?

When you are saying that you would like to assist with the state, if you are implying that you want to do these PR's yourself, I'm very happy with that. I have bigger fish to fry including documenting a new salt module pair that I plan on PR-ing shortly.

@thatch45
Copy link
Member

thatch45 commented Feb 9, 2017

@twangboy and I have been discussing this and this is what we think so far.

I would say that this is a situation where we should have a state to make the reboot happen. So overall I would say that if we call system.reboot and it decides not to issue a reboot then the module should return False, and the state should then interpret what the result should be based on that return. @twangboy might have some additional details

@twangboy
Copy link
Contributor

twangboy commented Feb 9, 2017

@morganwillcock @hrumph Sorry, I didn't see this thread until this morning. I think the logic for deciding to reboot or not should be handled by the state. The win_system.reboot command should do just that... reboot. Let the other logic be handled by the state.

@ghost
Copy link

ghost commented Feb 9, 2017

@twangboy I have no problem with that actually. Before I added the on_pending_reboot param that was actually my first impulse and then I thought that a state function was overkill. Therefore if I understand correctly, we should deprecate (or eliminate [if you guys are ok with the breakage] of the only_on_pending_reboot param). And add the state function I detailed above (although @morganwillcock brought it up first). I just want everyone to agree on something and then someone can proceed with the work (I have a vague impression that @morganwillcock is volunteering but I'll do it if no one else wants to).

@twangboy
Copy link
Contributor

twangboy commented Feb 9, 2017

Also, True should indicate that the reboot command was executed... False that it was not... raise a CommandExecutionError if it failed...

@twangboy
Copy link
Contributor

twangboy commented Feb 9, 2017

@hrumph I can see the value of the parameter from an execution module standpoint.... meaning if I want to reboot a minion only if it's pending, that's a nice switch.... I guess I'm saying it should return False instead of True. But don't use the only_on_pending_reboot option from the state module... Let the state decide if it needs a reboot.... which is a flip from my initial reply.

@ghost
Copy link

ghost commented Feb 9, 2017

@twangboy, ok I understand, and I am quite happy with your decisions.

@twangboy
Copy link
Contributor

twangboy commented Feb 9, 2017

@hrumph So, I guess this is what needs to happen:

  1. Change return in execution module from True to False
  2. Improve the documentation to clarify exactly what the return values mean
  3. Have the need to reboot or not be determined by the state module

@ghost
Copy link

ghost commented Feb 9, 2017

@morganwillcock I believe that we are in accord and you have enough information to proceed with a PR. Everyone on board now?

@twangboy
Copy link
Contributor

twangboy commented Feb 9, 2017

oui

@twangboy
Copy link
Contributor

twangboy commented Feb 9, 2017

Looks like we'll have to write a state for it... like @hrumph said in an earlier comment....

@morganwillcock
Copy link
Contributor Author

I'm happy to give it a go. I've not written a state module before (so it's not going to appear tomorrow) but leave it with me. I'm guessing that changing a return value of an execution module and adding a state module would have to be dev branch only, or would the change to the execution module be classed as a bug fix so it can go into stable branches without waiting for the state module?

@ghost
Copy link

ghost commented Feb 9, 2017

@morganwillcock, treat the module change as a bug fix and put it in stable. Treat the new state function as devel (since it's new). Just my two cents. I don't see why the new state function can't go into the existing win_system.py state module but I'm probably missing something.

@twangboy
Copy link
Contributor

twangboy commented Feb 9, 2017

@hrumph @morganwillcock Yeah put the new state function in salt.states.win_system.

@ghost
Copy link

ghost commented Feb 12, 2017

Went over both pr's. I think they can be merged and this issue can be considered resolved now.

@twangboy twangboy self-assigned this Feb 13, 2017
@twangboy twangboy modified the milestones: Nitrogen 4, Blocked Feb 13, 2017
@morganwillcock
Copy link
Contributor Author

@twangboy, thank you for merging.
@hrumph, thanks for looking over the changes.

@ghost
Copy link

ghost commented May 5, 2017

Pursuant to this i have a question that I don't know the answer to. How should we combine a final reboot state with a final pair of states to upgrade the salt minion? It seems like it's getting really messy when I think about how to approach it. The FAQ indicates that the following jinja should be used to update a minion (for Windows):

Upgrade Salt Minion:
  pkg.installed:
    - name: salt-minion
    - version: 2016.11.3{% if grains['os_family'] == 'Debian' %}+ds-1{% endif %}
    - order: last

Enable Salt Minion:
  service.enabled:
    - name: salt-minion
    - require:
      - pkg: Upgrade Salt Minion

However, it a minion that automatically does a highstate is restarted it will do the entire highstate again (without much good reason) [or am I wrong about that?] Anyway what should I do if I also want to do a reboot and a salt upgrade?

Right now I'm doing the salt upgrade first (using order:1). It seems that a salt run ends when as soon as the upgrade happens then (because my minions are doing highstate automatically) another salt run begins with the upgraded minion. Is this actualy ok? I can't really think of an elegant way to handle these things.

I think it would be nice (hypothetically) if salt upgrades could be handled completely independently from the salt state system.

@twangboy, @morganwillcock, what do you guys say?

@twangboy
Copy link
Contributor

twangboy commented May 5, 2017

@UtahDave ^^^

@morganwillcock
Copy link
Contributor Author

@hrumph for systems that do a highstate run on boot, I've just got this as the last state:

salt_minion:
  pkg.installed

...so I do get the whole highstate run again after the minion has upgraded.

Since the upgrade is running through the task scheduler there isn't really a way to know what happened during the upgrade (for anything else that needs a reboot I always have a 'failhard on changes' dependency to stop subsequent tasks from kicking in). I imagine you could skip the highstate run based on checking the installation date or version number of the minion. For the former would probably need a custom module because I don't think Jinja2 has functions for datetime logic, for the latter you could publish your target version as pillar data and bump the version number once you want highstates to run again.

@ghost
Copy link

ghost commented May 5, 2017

Everyone I botched my description. In the FAQ it says you have to do (in addition to the upgrade)
Restart Salt Minion:

cmd.run:
{%- if grains['kernel'] == 'Windows' %}
    - name: 'C:\salt\salt-call.bat --local service.restart salt-minion'
{%- else %}
    - name: 'salt-call --local service.restart salt-minion'
{%- endif %}
    - bg: True
    - onchanges:
      - pkg: Upgrade Salt Minion

Do we still have to do this, because the upgrade seems to cause the salt minion to restart all on it's own? @morganwillcock you indiciated that you are not doing this, right?

@morganwillcock I am averse to calling reboots in the middle of salt runs and they can proceed anyway even after returning errors after being invoked, so salt might not notice a change and the reboot will happen anyway so the failhard may not work in many cases.

I'm thinking that what I might like to do is, as follows

  • add a pillar entry for the desired windows minion version (thanks @morganwillock for this idea). Add the following jinja (expressed in pseudo-code) in a state file.
{% if current version <= desired version from pillar %}
salt minion installed:
    pkg.installed:
      - name: salt-minion
      - version: {% desired_version from pillar %}
      - order: last
Restart Salt Minion:
  cmd.run:
{%- if grains['kernel'] == 'Windows' %}
    - name: 'C:\salt\salt-call.bat --local service.restart salt-minion'
{%- else %}
    - name: 'salt-call --local service.restart salt-minion'
{%- endif %}
    - bg: True
    - onchanges:
      - pkg: salt minion installed
{%-else-%}
final housekeeping:
    system.reboot:
       - name: final housekeeping
       - order: last
{%-endif-%}

This way everything gets called last (as I had hoped). At the end of a state run if the version is not up to date, it will do the update. After the update is done another salt run will start (assuming we go into highstate) and if we need to reboot that will happen at the end of the second state run. Does this seem reasonable to anyone?

@morganwillcock
Copy link
Contributor Author

I don't think you can ever get changes generated (to be used with onchanges) because running the installer will kill the running version. If you run pkg.install manually you'll see the minion get disconnected and then reconnect with the new version (can confirm by watching the event bus). I've not had any problems running it as the last state in a highstate run or upgrading manually, plus I think there are issues with any long running process that hits the minion retry timeout (see #39469). I would imagine the only option to avoid the highstate run is to use the first if statement to bypass it entirely...
i.e.

if out-of-date
    do-minion-upgrade
else
   highstate

@twangboy
Copy link
Contributor

twangboy commented May 5, 2017

The only reason you would need to restart the minion would be if you made changes to the minion config...

@ghost
Copy link

ghost commented May 5, 2017

@twangboy ok so the FAQ needs to be changed, because it does tell you to restart the minion, even in the case of Windows. Agreed? However, @morganwillcock what if we ran the installer with /start-minion=0. Can we get away with this in windows at the end of a salt run? If we can do this then we could see the salt run end, and then do the restart as specified by the FAQ? What do you guys think of this? (I imagine that there's some reason why this won't work but I'm interested).

@morganwillcock
Copy link
Contributor Author

@hrumph when the installer runs the running minion version is removed (i.e. it has to be stopped as part of it's removal), that is why the installer has to be run via the task scheduler. If you specify /start-minion=0 it just means the newly installed service is never started, so the minion wouldn't reconnect without rebooting the system or remotely starting the service (by some other means).

@ghost
Copy link

ghost commented May 6, 2017

anyway I made a bug report about the FAQ. It's #41116

@ghost
Copy link

ghost commented May 8, 2017

OK, what I have done now is as follows:

salt minion installed:
    pkg.installed:
      - name: salt-minion
      - version: "2016.11.4"
      - order: last

reboot if needed:
   system.reboot:
     - require:
         - salt minion installed

I guess it's good enough for me. If salt gets updated, then there will be another highstate because salt will be restarted so will eventually get to the reboot.

Hypothetically it would be nice if there was someway (at install time) to set up the minion so it was auto-updating and the logic could be separated out of state logic. The installer could set up a task with windows scheduler. Some of the install paramaters could be used to determine an update schedule. A salt pillar value could be used to determine the target version.

@damon-atkins
Copy link
Contributor

damon-atkins commented May 8, 2017

Have you got a reactor in place to cause a high state when a minion connects?
Windows does not allow open exe/dll files to be replace while they are in use (unlike unix). So the minion must be shutdown by the installer/uninstaller.

@ghost
Copy link

ghost commented May 8, 2017

@damon-atkins I have

startup_states: highstate

in my minion configuration file. Is that inadequate?

@damon-atkins
Copy link
Contributor

That would be the reason why the upgrade causes a highstate. i.e. the minion is stop, uninstalled, installed and started.

@ghost
Copy link

ghost commented May 8, 2017

@damon-atkins yes i realise that. I assume that you are implying there is a more efficient way to go about this.

@ghost
Copy link

ghost commented May 8, 2017

I have now also added issue #41130.

@morganwillcock
Copy link
Contributor Author

@hrumph if you use a reactor formula to trigger the highstate it will be a lot more flexible for making the highstate optional. I've got something like this:

{% if data['id'].startswith('pc-') %}
highstate_run:
  local.state.apply:
    - tgt: {{ data['id'] }}
{% endif %}

...setup to match the event path salt/minion/*/start in the master's reactor config, rather than set startup_states in the minion's config.

@damon-atkins
Copy link
Contributor

damon-atkins commented May 9, 2017

It's a pity Salt does not have a reboot event. salt/minion/*/start_reboot.
i.e compares the uptime against the uptime of the last time salt was used which was saved in a file..
Or just sends start_reboot if salt is started within first 10 minutes of system being booted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Pending-Discussion The issue or pull request needs more discussion before it can be closed or merged
Projects
None yet
Development

No branches or pull requests

5 participants