
Proper way to upgrade salt-minions / salt-master packages without losing minion connectivity #7997

Closed
shantanub opened this issue Oct 21, 2013 · 71 comments
Labels: Documentation, help-wanted, P1

@shantanub

shantanub commented Oct 21, 2013

We ran through the right way of doing this in salt training with Seth, but I think I'm still missing something. I'm not sure if this is a bug or if I've missed something. I tried to run through the upgrade-the-master-first / use-salt-to-upgrade-the-minions steps to upgrade from v.17 to v.17.1 of salt and ended up losing access to most of my minions.

Long story short, I need a reliable way of upgrading all of the salt-minion and salt-master packages without losing access to the minions. From what I can tell, every time I perform such an upgrade I lose access to some if not all of my minions and need to log in to each host/VM and restart the salt-minion service. This is doable in test/dev where we have 30 nodes being managed but not when I move this infrastructure to prod where I have over 200 nodes to manage. I need the upgrade path not to break the remote execution framework established between minions and master.

So without further ado here's what I did:

Update the master.

[root@salt-master ~]# yum list updates
Loaded plugins: security 
epel                                                                                                | 3.0 kB     00:00
epel/primary_db                                                                                     | 6.2 MB     00:00
epel-testing                                                                                        | 2.9 kB     00:00
epel-testing/primary_db                                                                             | 2.2 MB     00:00
rhel-localrepo                                                                                      | 3.0 kB     00:00
rhel-localrepo/primary_db                                                                           |  26 MB     00:00
Updated Packages         
glibc.x86_64                                         2.12-1.107.el6_4.5                                      rhel-localrepo
glibc-common.x86_64                                  2.12-1.107.el6_4.5                                      rhel-localrepo
glibc-devel.x86_64                                   2.12-1.107.el6_4.5                                      rhel-localrepo
glibc-headers.x86_64                                 2.12-1.107.el6_4.5                                      rhel-localrepo
java-1.6.0-openjdk.x86_64                            1:1.6.0.0-1.65.1.11.13.el6_4                            rhel-localrepo
kernel.x86_64                                        2.6.32-358.23.2.el6                                     rhel-localrepo
kernel-firmware.noarch                               2.6.32-358.23.2.el6                                     rhel-localrepo
kernel-headers.x86_64                                2.6.32-358.23.2.el6                                     rhel-localrepo
libtar.x86_64                                        1.2.11-17.el6_4.1                                       rhel-localrepo
nscd.x86_64                                          2.12-1.107.el6_4.5                                      rhel-localrepo
perf.x86_64                                          2.6.32-358.23.2.el6                                     rhel-localrepo
salt.noarch                                          0.17.1-1.el6                                            epel-testing
salt-master.noarch                                   0.17.1-1.el6                                            epel-testing
salt-minion.noarch                                   0.17.1-1.el6                                            epel-testing
setup.noarch                                         2.8.14-20.el6_4.1                                       rhel-localrepo
tzdata.noarch                                        2013g-1.el6                                             rhel-localrepo
tzdata-java.noarch                                   2013g-1.el6                                             rhel-localrepo
You have new mail in /var/spool/mail/root
[root@salt-master ~]# yum update -y

I restart the master and minion on my master VM.

[root@salt-master ~]# service salt-master restart
Stopping salt-master daemon:                               [  OK  ]
Starting salt-master daemon:                               [  OK  ]
[root@salt-master ~]# service salt-minion restart
Stopping salt-minion daemon:                               [  OK  ]
Starting salt-minion daemon:                               [  OK  ]

Try to upgrade some of my test minion VMs.

[root@salt-master ~]# salt 'salt-minion*' pkg.upgrade
[root@salt-master ~]# salt 'salt-minion*' pkg.list_upgrades

[root@salt-master ~]# salt -v 'salt-minion*' test.ping
Executing job with jid 20131021102016190263
-------------------------------------------

salt-minion-00:
    Minion did not return
salt-minion-01:
    Minion did not return

I login to each minion VM and restart the salt-minion service.

[root@salt-minion-01 ~]# service salt-minion restart
Stopping salt-minion daemon:                               [FAILED]
Starting salt-minion daemon:                               [  OK  ]
[root@salt-minion-01 ~]# chkconfig --list | grep salt-minion
salt-minion     0:off   1:off   2:on    3:on    4:on    5:on    6:off

Now I can ping the VMs again.

[root@salt-master ~]# salt -v 'salt-minion*' test.ping
Executing job with jid 20131021102229314417
-------------------------------------------

salt-minion-01:
    True                 
salt-minion-00:
    True  

Versions reports:

[root@salt-master ~]# salt --versions-report
           Salt: 0.17.1
         Python: 2.6.6 (r266:84292, May 27 2013, 05:35:12)
         Jinja2: 2.2.1
       M2Crypto: 0.20.2
 msgpack-python: 0.1.13
   msgpack-pure: Not Installed
       pycrypto: 2.0.1
         PyYAML: 3.10
          PyZMQ: 2.2.0.1
            ZMQ: 3.2.4

[root@salt-minion-00 ~]# salt-call --versions-report
           Salt: 0.17.1
         Python: 2.6.6 (r266:84292, May 27 2013, 05:35:12)
         Jinja2: 2.2.1
       M2Crypto: 0.20.2
 msgpack-python: 0.1.13
   msgpack-pure: Not Installed
       pycrypto: 2.0.1
         PyYAML: 3.10
          PyZMQ: 2.2.0.1
            ZMQ: 3.2.4

[root@salt-minion-01 ~]# salt-call --versions-report
           Salt: 0.17.1
         Python: 2.6.8 (unknown, Nov  7 2012, 14:47:45)
         Jinja2: unknown
       M2Crypto: 0.21.1
 msgpack-python: 0.1.12
   msgpack-pure: Not Installed
       pycrypto: 2.3
         PyYAML: 3.08
          PyZMQ: 2.1.9
            ZMQ: 2.2.0

You'll notice that the upgrade proceeded correctly. The packages were upgraded, but the salt-minion services were not restarted as a part of the upgrade process (for both minion VMs - one is RHEL5 and the other is RHEL6). Unfortunately, I didn't think to run the upgrade packages command in verbose mode at the time.

Do I need to find some external remote-execution method to restart all of the minions post-upgrade (mussh/omnitty, etc...)? This is probably not a bug but it's still very frustrating... I'm unlikely to upgrade again until I can figure out how to do this properly.

@kiorky
Contributor

kiorky commented Oct 21, 2013

you may need my fix (already on git/develop) for #7987

@shantanub
Author

shantanub commented Oct 21, 2013

Was this always a problem, or is it something specific to the v.17 -> v.17.1 upgrade? I've personally never gotten it to work reliably across all minions with past version upgrades, but I had assumed I was going about it the wrong way.

I just tried an alternative upgrade approach. Unfortunately, it didn't work either.

[root@salt-master ~]# salt 'test*' cmd.run "yum update -y"
[root@salt-master ~]# salt 'test*' service.restart salt-minion
[root@salt-master ~]# salt 'test*' -v test.ping
Executing job with jid 20131021110335741192
-------------------------------------------

test:
    Minion did not return

[root@test ~]# service salt-minion status
salt-minion is stopped
[root@test ~]# service salt-minion start
Starting salt-minion daemon:                               [  OK  ]

[root@salt-master ~]# salt 'test*' -v test.ping
Executing job with jid 20131021110634961121
-------------------------------------------

test:
    True

The package upgrade seems to stop the running service.

@basepi
Contributor

basepi commented Oct 21, 2013

This process has never been as consistent or stable as we would like it to be. But the first thing you need to do is make sure all of your minions have ZMQ 3.2 or higher. That minion that you listed above with ZMQ 2 is definitely going to cause problems with keeping the connection alive or reconnecting.

The rest of the process tends to depend on the init system in question and a lot of other factors. As soon as we get the general bug count under control, we want to dedicate some resources to solving this upgrade problem for good.

@shantanub
Author

shantanub commented Oct 22, 2013

I don't have much of a choice here unless I deviate from the repos. ZMQ3 is only available in epel for RHEL6. I have the latest version of ZMQ offered for RHEL5 which happens to be the 2.2 version listed above.

Minion01 replicates our legacy RHEL5 nodes (which we have a lot of) and I'm using it as a control for testing issues.

Minion00 is running RHEL6 and mimics the configuration of our latest app/service deployments (and where we'd like to transition everything once I have enough cycles to complete migrations).

@shantanub
Author

shantanub commented Oct 22, 2013

I should also say, you're right. Most of my woes with upgrades, and with losing minions when the salt-master restarts, have been with RHEL5 and the legacy ZMQ. That said, this latest upgrade to v.17.1 hasn't worked cleanly for any client. Every single upgrade has stopped the salt-minion service and requires me to log in and restart the services.

Can I get around this by using salt-ssh to restart the minion services? I could use an example of how to use salt-ssh. I'm not entirely clear what user / keys / password it uses (or if this is all hidden under the covers with salt's key management system) or how to issue commands. The docs I've run across don't have too many examples of executing shell commands. I'm presuming using the -r flag (raw) works much like "ssh -t"?
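For what it's worth, salt-ssh doesn't use salt's key management at all; it reads connection details from its own roster file. A minimal sketch of such a roster, where the hostnames, user, and key path are placeholder assumptions for illustration, not values from this thread:

```yaml
# Hypothetical /etc/salt/roster entries for salt-ssh.
# host, user, and priv are illustrative assumptions.
salt-minion-00:
  host: salt-minion-00.example.org
  user: root
  priv: /root/.ssh/id_rsa
salt-minion-01:
  host: salt-minion-01.example.org
  user: root
  priv: /root/.ssh/id_rsa
```

With a roster like that in place, something along the lines of salt-ssh 'salt-minion*' -r 'service salt-minion restart' should run the raw shell command over plain SSH, independently of whether the minion daemon is up.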

@equinoxefr
Contributor

equinoxefr commented Oct 22, 2013

Hi,

I have a workaround for this:

For my Linux minions I set up a cron job to restart the minion every night.

{% if grains['kernel'] == 'Linux' %}
/etc/cron.daily/restart-minion.sh:
  file:                                     # state declaration
    - managed                               # function
    - mode: 755
    - source: salt://tools/restart-minion/restart-minion.sh   # function arg
{% endif %}

On Windows, same thing: I add a scheduled task to restart the minion.

{% if grains['os'] == 'Windows' %}
addtask:
  cmd.run:
    - name: schtasks /f /create /tn "restart_salt" /ru System /tr "c:\salt\salt-call.exe service.restart salt-minion" /sc daily /st 02:00
{% endif %}

Regards

@shantanub
Author

shantanub commented Oct 22, 2013

That's actually quite clever and I'm going to steal that idea.

That said, it would not have helped with the v.17 to v.17.1 upgrade via epel-testing packages. From what I'm seeing, the package upgrade itself stops the running minion service (whether via yum or from within salt's framework, which makes sense since salt calls yum's methods). That's very peculiar behavior; I don't recall seeing this happen before with any daemon installed from rpm packages. This seems to be a new upgrade artifact that I don't remember seeing before, but I've only done 4-5 upgrades so far.

I'm going to run the upgrade in verbose mode and see if I can find any other artifacts but need to troubleshoot some 10GbE networking problems we're having first.

@shantanub
Author

shantanub commented Oct 22, 2013

Maybe I dismissed the idea too early... I could extend your example and poll to see if the service is running (every 5 minutes or something) and if it is not, start the salt-minion.

That seems a little overkill but would resolve this specific issue.
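That polling idea could be expressed as a single Salt state using cron.present; a sketch, where the state ID and the exact check command are illustrative assumptions:

```yaml
# Sketch: cron-based watchdog that starts salt-minion if it has died.
# The state ID and the check command are illustrative, not from this thread.
salt-minion-watchdog:
  cron.present:
    - name: service salt-minion status >/dev/null 2>&1 || service salt-minion start
    - user: root
    - minute: '*/5'
```

The restart-on-a-schedule approach above and this start-if-down watchdog aren't mutually exclusive; the watchdog just bounds the downtime window to the polling interval.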

I should be able to use salt-ssh to login to all of minions and start the minion service manually however right?

@equinoxefr
Contributor

equinoxefr commented Oct 22, 2013

@shantanub, I looked at the spec file. There is some code to stop salt-minion before the upgrade and, if it's an upgrade, to restart the minion with service salt-minion condrestart.

I don't know why this part of the code doesn't work. Before version 0.17, the minion wasn't stopped during the upgrade.

@basepi
Contributor

basepi commented Oct 22, 2013

Strange. I wonder if we changed something between 0.17.0 and 0.17.1 that would have caused this change.

@equinoxefr I can't see any recent changes that changed whether the salt process was stopped or not. Looks like the stop of the salt-minion has been in there for awhile (at least for the 0.16 release, I didn't go earlier). Just wondering where you were looking to see that change.

@shantanub You've had successful upgrades from epel before, then? But not 0.17.0-0.17.1?

@equinoxefr
Contributor

equinoxefr commented Oct 22, 2013

@basepi I didn't see any change in the code, but I have seen a change in the behavior of minion upgrades (on Linux RPM).
Before 0.17.1, the salt-minion process wasn't restarted after an upgrade. If you upgraded from 0.16.3 to 0.17 and ran test.version just after, salt-minion would report 0.16.3, not 0.17. After a manual restart it works and reports 0.17...

Now with 0.17.1, salt-minion is stopped and not restarted. Perhaps the piece of code that uses the state of the RPM operation doesn't work.

0.16.0 -> 0.16.3 = Minion not restarted
0.16.3 -> 0.17 = Minion not restarted
0.17 -> 0.17.1 = Minion stopped

I did my tests on centos 6.4.

I don't know why but something has changed ;-)

@basepi
Contributor

basepi commented Oct 22, 2013

Hrm, well, it doesn't appear to be in the spec file, so it must be somewhere else. Maybe the init script?

@shantanub
Author

shantanub commented Oct 23, 2013

@basepi That's correct. I've always upgraded and used the epel/epel-testing rpms to install salt so far. This is the first time I've noticed the minion stopped after/during the upgrade (this is something that would have been obvious since I would have to login to every minion and restart the service). The upgrade itself seems to have executed fine in every other regard that I can tell (no errors, etc...).

Now, I have in the past lost all of the rhel5 minions when the salt-master service is restarted. I haven't had a problem with that in a few versions, but as I mentioned above, I very much would like to depart from rhel5 as soon as possible.

@shantanub
Author

shantanub commented Dec 7, 2013

Upon upgrading to v.17.2, it looks like the minion restarts as a part of the upgrade just fine.

Has this issue with the package upgrade been resolved?

As a fail-safe, I "start" my minions every 5 minutes via cron just in case they're down for some reason or another. I'll need to add the windows specific scheduler task as well.

I would still like a definitive guide for how-to-upgrade minions and the salt-master. We're moving salt to production once v.17 is available in epel (as opposed to epel-testing), and I'd very much like upgrades to go smoothly.

@basepi
Contributor

basepi commented Dec 9, 2013

In general, the upgrades themselves tend to go swimmingly. The problem is the restart after the upgrade. We have an open issue specifically for the restarting of the minion: #5721

The issue also varies in severity from system to system (specifically between different init systems). Making it so the salt minion can restart itself consistently is high on our priority list.

@ghost

ghost commented Dec 19, 2013

Are you sure the minion upgrade/restart problem isn't just a bug in the rpm post install script? That's where it should be restarted ... did you have a look at the source rpm?

@tkwilliams
Contributor

tkwilliams commented Dec 19, 2013

Restarting the minion process from inside a state running against that minion process (e.g. with a service / watch type state rule) will always fail. It's hard to imagine getting it to work without either a moderate re-architecture of the salt workflow or some cumbersome custom code inside the state machine to handle this special case.

That said, the following has been allowing flawless upgrades for me since I started deploying minions around 0.16.1, up to 0.17.4 we're running now:

salt-minion-reload:
  cmd.wait:
    - name: echo service salt-minion restart | at now + 5 minutes
    - watch:
      - file: /etc/salt/minion
      - pkg: salt-minion-pkgs

Obviously, you can set the wait time to whatever you want -- just make sure it's long enough that the minion proc doesn't get whacked during the current run...

@basepi
Contributor

basepi commented Dec 20, 2013

Using at is a great idea. And if you put order: last onto the state, then you can put it at + 1 minute or something without much fear of it cutting the process off.
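Combining the two suggestions, the state from the earlier comment might look like this; a sketch that keeps the same state and requisite IDs and just adds order: last with the shorter delay:

```yaml
# Sketch: at-based deferred restart, ordered last so the run finishes
# before the minion process is restarted. IDs follow the earlier example.
salt-minion-reload:
  cmd.wait:
    - name: echo service salt-minion restart | at now + 1 minutes
    - order: last
    - watch:
      - file: /etc/salt/minion
      - pkg: salt-minion-pkgs
```

Since order: last guarantees this state runs at the end of the highstate, the at delay only needs to cover the tail end of the run rather than the whole thing.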

@tkwilliams
Contributor

tkwilliams commented Dec 20, 2013

Ah, 'order: last' is a damned fine idea. Ashamed I didn't think of it myself...


@basepi
Contributor

basepi commented Dec 20, 2013

Well, I can't believe I've never thought of using at! Thanks for your awesome workaround (until we can get this working properly without it, of course)

@shantanub
Author

shantanub commented Dec 23, 2013

@tkwilliams: Woah cool. That's really helpful. Now you mentioned you guys watch your minion file as well. Does that mean you specify the contents of the minion file for every host/vm? Are you just pulling the fqdn from the environment for the contents of that file or doing something else?

I'm having a lot of trouble with renaming minions (I have the master copy keys and change the contents of the minion file on the minion but a simple service restart of the salt-minion doesn't seem to be sufficient to get the master talking to the minion with the new minion hostname).

This is an artifact of our kickstart setup. All of our hosts start up with a name that looks like "preconfig-macaddr.domain.org" where macaddr is the macaddress of the primary kickstarted interface (aa-bb-cc-dd-ee-ff). We then set the hostname and role of hosts via script but this has been a little painful since salt doesn't seem to readily want to move to the new hostname. Rebooting the host/VM after the change seems to work but I'd prefer not to have to do that.

I'll experiment with this little wrinkle and see if it helps with renaming minions.

@shantanub
Author

shantanub commented Dec 31, 2013

The "at now" trick didn't work for renaming minions on rhel6 unfortunately. Something is caching the old hostname even though I've changed it just about everywhere I can imagine.

Oh well, back to restarting vms/hosts when I rename them.

@basepi
Contributor

basepi commented Jan 3, 2014

Are you also deleting the /etc/salt/minion_id file so that the minion isn't caching the old name?

@shantanub
Author

shantanub commented Jan 3, 2014

Nope. I'm putting the new hostname in that file and it doesn't seem to be doing anything without a reboot.

@shantanub
Author

shantanub commented Jan 4, 2014

I've actually passed along to Seth the actual scripts I'm using to perform the name change. Feel free to see if I'm doing something silly. He thinks there may be a timing issue I've glossed over.

@basepi
Contributor

basepi commented Jan 6, 2014

You do need to restart the minion to change the minion ID. Don't know if "without a reboot" meant system reboot or minion restart.

@shantanub
Author

shantanub commented Jan 7, 2014

@basepi: I mean a system reboot/restart of the minion host/VM is required. As I mentioned, restarting the minion service post name change, even with the 'at now +1 minutes' trick, several different sleep lengths, etc., doesn't get the minion to show up on the master as up.

An interesting factoid: if I revert the keys on the master back to the original minion hostname and restart the minion, it shows up on the master just fine (this is without changing the contents of the minion_id file or the actual hostname of the minion, which now point to the new hostname). So something is being cached somewhere and I'm not sure why/what. I don't run nscd, if that matters.

@shantanub
Author

shantanub commented Jan 7, 2014

Just so we're all on the same page here's what I do: I have 2 hostname change scripts. One that resides on and is called from the master and one on each minion. The master calls the minion script as a part of its script via salt's remote execution framework:

rename-minion.sh script run on salt-master:

#!/bin/bash

DOMAIN={{ domainsuffix }}

die () {
    /bin/echo >&2 "$@"
    exit 1
}

[ "$#" -eq 2 ] || die "2 arguments required, $# provided"

/bin/echo $1
/bin/echo $2

orig_hostname="$1.${DOMAIN}"
new_hostname="$2.${DOMAIN}"

/bin/echo $orig_hostname
/bin/echo $new_hostname

path_to_keys="/etc/salt/pki/master/minions"

if [ -f "$path_to_keys/$orig_hostname" ]; then
    /bin/cp -a $path_to_keys/$orig_hostname $path_to_keys/$new_hostname

    # change name on minion
    /usr/bin/salt -v $orig_hostname cmd.run "/managed/scripts/set_hostname.sh $2 ${DOMAIN}"

    # /bin/sleep 10s
    /bin/rm -f $path_to_keys/$orig_hostname
fi

set_hostname.sh script called on minion:

#!/bin/bash

die () {
    /bin/echo >&2 "$@"
    exit 1
}

[ "$#" -eq 2 ] || die "2 arguments required, $# provided"

/bin/echo $1
/bin/echo $2

DOMAIN="$2"
hostname="$1.${DOMAIN}"

/bin/echo $hostname
/bin/cp -a /etc/sysconfig/network /etc/sysconfig/network.bak 
/usr/bin/chattr -i -V /etc/sysconfig/network 
/bin/sed "s/HOSTNAME=.*/HOSTNAME=${hostname}/" /etc/sysconfig/network.bak > /etc/sysconfig/network
/bin/hostname $hostname 
/usr/bin/chattr +i -V /etc/sysconfig/network 
/sbin/service salt-minion stop

/bin/echo $hostname > /etc/salt/minion_id

/sbin/reboot

# /bin/sleep 5
# /sbin/service salt-minion start

I've commented out the sleeps and the minion service start since they weren't doing anything post name change (in favor of a full host/VM restart of the minion), but I did experiment with a number of combinations of sleep times, calls to restart, and stop/start of the minion service, without avail.

@basepi
Contributor

basepi commented Jan 8, 2014

I think I see what's going on here. I think what you need to do is delete the keys on the master before the minion is restarted with the new ID.

Nevermind, though, that particular issue should be resolved with a minion restart, not a minion system restart. Still, would be something to try.

@shantanub
Author

shantanub commented Jan 8, 2014

Umm.. how exactly do I target the minion if I delete its key before restarting it?

That nested remote-execution call to the minion will never execute.

Are you implying this can't be done without an out-of-band restart of the salt minion (via salt-ssh or some other method?)?

@dragon788
Contributor

dragon788 commented May 26, 2015

@andrejohansson Do any of the workarounds in this fellow's repo work on 2012? https://github.com/markuskramerIgitt/LearnSalt/blob/master/learn-run-as.sls

@dragon788
Contributor

dragon788 commented May 29, 2015

We ended up creating a batch file to uninstall and reinstall the salt minion, since simply upgrading in place had weird behaviors going from 2014.7.x to newer 2014.7.x. We then schedule the batch file using the method above, which with /ru "SYSTEM" works quite well. The one thing we added to the batch was backing up minion.pem and minion.pub (you could also back up the minion config) so that when the minion talks back to the master it reuses its old key rather than colliding with it. We also had to trigger a service start after installing the new version; otherwise we were unable to connect to it from the salt master.

@andrejohansson

andrejohansson commented Jul 22, 2015

@dragon788 yes, I ended up doing something similar, but I haven't saved the key files yet. Smart!
What I've done is the following:

  1. Made a salt state that does the following:
    • Downloads the new minion and places in a temp folder
    • Downloads a scheduled task with multiple actions (xml file) to the same temp folder
    • Schedules the task (xml file) to run once on boot
    • Disables the minion so it doesn't autostart
    • Schedules a reboot
  2. Once the computer reboots the scheduled tasks
    • Uninstalls salt-minion
    • Removes the c:\salt dir completely (we don't want leftover cache stuff or other things)
    • Installs the new minion
    • Waits 10 minutes
    • Starts the minion

The reboot I've found necessary because even in the newest 2015.5.X releases sometimes nssm.exe won't be deleted by the uninstaller and remains active in c:\salt. This can prevent successful installs and startups later.

The wait 10 minutes I've found necessary because sometimes the installer won't start the minion after exit.

@douardda

douardda commented Dec 1, 2015

Under systemd systems, this upgrade of salt-minion via salt is really a pain in the neck.
On Debian jessie, not only does the salt call never return (which is not really a problem), but it leaves the salt-minion package half upgraded and the salt-minion stopped...
This is due to the fact that the salt-minion.service file uses the default cgroup mode for KillMode.

Setting this to KillMode=process helps there. I guess the salt-minion.service should be modified in this way. Meanwhile, I deploy the file /etc/systemd/system/salt-minion.service.d/killmode.conf with

[Service]
KillMode=process

It allows me to properly run

root@pw01:/srv/salt# salt 'pw02' service.restart salt-minion 
pw02:
    True
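A drop-in like the one above could also be pushed out from the master before attempting the upgrade; a sketch, assuming the same paths as in the comment (systemd needs a daemon-reload before the drop-in takes effect):

```yaml
# Sketch: deploy the KillMode=process drop-in and reload systemd.
# Paths follow the comment above; the second state ID is illustrative.
/etc/systemd/system/salt-minion.service.d/killmode.conf:
  file.managed:
    - makedirs: True
    - contents: |
        [Service]
        KillMode=process

salt-minion-daemon-reload:
  cmd.wait:
    - name: systemctl daemon-reload
    - watch:
      - file: /etc/systemd/system/salt-minion.service.d/killmode.conf
```

With KillMode=process, stopping the unit only kills the main minion process, so child processes it spawned (such as the one running the upgrade) survive.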

@basepi
Contributor

basepi commented Dec 1, 2015

@douardda Thanks for the update. Looks like your pull request is merged, and I'm working on merging it forward today.

@sjorge
Contributor

sjorge commented Jan 26, 2016

Having this issue too on SmartOS/Solaris.
My solution for now is also using at, but with this command: salt-call --local service.restart salt-minion. This works on both SmartOS and my Ubuntu test VM.

There should probably be a delay option for service.restart that uses at (or the Windows scheduler, where it can) to perform the action after X seconds. Or, better yet, a way for the salt-minion to survive a restart.

@JensRantil
Contributor

JensRantil commented Feb 5, 2016

Hm, this issue is labeled "documentation", but the discussion doesn't really seem to be about documentation. Is this labeled properly?

@basepi
Contributor

basepi commented Feb 5, 2016

This is the relevant comment: #7997 (comment)

@sjorge
Contributor

sjorge commented Feb 5, 2016

Ideally there would be a 'reload' command instead that would properly reload everything for the salt-master/salt-minion without actually stopping and starting the process.

@cachedout
Contributor

cachedout commented Feb 5, 2016

@sjorge That would be nice but that's a pretty tall order. We'd have to refresh the opts dict everywhere and that's non-trivial.

@sjorge
Contributor

sjorge commented Feb 5, 2016

Can't the bigger brush be used? Close the sockets on the current process and fork a new copy, then exit. Since all listening sockets are closed, the new minion process should start fine, while the old one can still send a reply to the master that all is OK?

@dragon788
Contributor

dragon788 commented Feb 18, 2016

The reload sounds really similar to how nginx handles config changes, it lets the sockets from the old config survive and spawns new ones with the new config until the old ones all expire.

One thing we've noticed is that upgrading on a Debian version forces the service restart due to how Debian derivatives handle services in general, i.e. "you requested this be installed so we are enabling/starting the service NOW!". We have worked around this when preseeding salt-minion on machines by using the policy-rc.d trick, which basically prevents a service from starting during apt-get operations, though it shouldn't affect running services. This could then possibly be followed up by a salt-call --local service.restart as mentioned above to flip from the old version (in memory) to the new version (on disk).

This is completely untested, I'm just going through my watched issues and seeing if I've found any new creative ways to fix them.

@basepi
Contributor

basepi commented Feb 24, 2016

I'm wondering if we should close this in favor of #5721?

@marbx
Contributor

marbx commented Aug 17, 2016

I need to update out-of-date (0.17.5) Ubuntu minions from an up-to-date (2016.3.2) salt-master; the minion ID must remain the same, the minion key must remain the same, and the minion cache should remain the same.
saltutil.update fails (see below).
What shall I do?

  • Install Esky on each Minion?
  • Uninstall each Minion and re-install it with pip?

Documentation at https://docs.saltstack.com/en/latest/ref/modules/all/salt.modules.saltutil.html#salt.modules.saltutil.update seems not to have been updated since 2014.
Is what mickep76 commented on Jul 7, 2014 documented somewhere else?
update_url for UNIX is missing.
dragon788 commented on May 29, 2015 that an undocumented script is needed to retain the minion key.

Does saltutil.update work?
If yes, what are the requirements?

mkramer@mgmt-bn-051:~$ sudo salt --version
salt 2016.3.2 (Boron)

mkramer@mgmt-bn-051:~$ sudo salt PC* test.versions_report
PC-LIN-01:
               Salt: 0.17.5
             Python: 2.7.6 (default, Jun 22 2015, 17:58:13)
             Jinja2: 2.7.2
           M2Crypto: 0.21.1
     msgpack-python: 0.3.0
       msgpack-pure: Not Installed
           pycrypto: 2.6.1
             PyYAML: 3.10
              PyZMQ: 14.0.1
                ZMQ: 4.0.4
PC-LIN-02:
               Salt: 0.17.5
             Python: 2.7.6 (default, Jun 22 2015, 17:58:13)
             Jinja2: 2.7.2
           M2Crypto: 0.21.1
     msgpack-python: 0.3.0
       msgpack-pure: Not Installed
           pycrypto: 2.6.1
             PyYAML: 3.10
              PyZMQ: 14.0.1
                ZMQ: 4.0.4
mkramer@mgmt-bn-051:~$ sudo salt PC* saltutil.update
PC-LIN-01:
    Esky not available as import
PC-LIN-02:
    Esky not available as import

@oliver-dungey

oliver-dungey commented Aug 19, 2016

The relatively new minion config option master_tries should help out in this area. I've recently set all my minions to a value of -1 (unlimited retries), which seems to be really helping to keep the minions connected to the master. But I may be talking nonsense, as I haven't had enough time to be definite about that.

@carsonoid
Contributor

carsonoid commented Aug 24, 2016

I've solved this by having salt simply fork off an upgrade script that lets the minion return instantly and has the service restart in the background. The script is Ubuntu 14.04 specific but could easily be adapted.

minion-upgrade.sls

/tmp/salt-minion-upgrade-deb.sh:
  cmd.script:
    - source: salt://salt/upgrade-minion-deb.sh

minion-upgrade-deb.sh

#!/bin/bash

# This script forks off and runs in the background so salt can continue

{
    DEBIAN_FRONTEND=noninteractive apt-get install -y -o Dpkg::Options::=--force-confold salt-minion
    service salt-minion restart
} >>/var/log/salt/minion-upgrade.log 2>&1 &

disown

@Reiner030

Reiner030 commented Aug 24, 2016

Just to complete the ideas:
Last week I found a good way to do this (using at or nohup), documented in the FAQ:
https://docs.saltstack.com/en/latest/faq.html#what-is-the-best-way-to-restart-a-salt-daemon-using-salt
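The core of both the at and nohup variants in that FAQ entry is detaching the restart from the calling job, so the salt command returns before the minion daemon goes down. A minimal sketch of the pattern, with a harmless placeholder command standing in for `service salt-minion restart`:

```shell
# Detach the (placeholder) restart so this script exits immediately;
# nohup plus & keeps the child alive after the parent returns.
nohup sh -c 'sleep 1; echo restarted > /tmp/minion-restart-demo.txt' \
    >/dev/null 2>&1 &
echo "salt job returned"
```

The parent prints "salt job returned" right away, while the detached child finishes a second later.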

@vutny
Contributor

vutny commented Aug 25, 2016

Speaking only about restarting Minion, I really like the solution from here: #5721
Being nice and tiny, it works with Salt 2016.3 on most Linux distros:

salt '*' cmd.run_bg 'sleep 10; service salt-minion restart'

As for Debian 8 and Ubuntu 16 with systemd on board: to prevent the salt-minion service from being restarted automatically after the upgrade, you need to mask it first.
So the "upgrade" procedure is as follows:

salt -C 'G@init:systemd and G@os_family:Debian' service.mask salt-minion
salt -C 'G@init:systemd and G@os_family:Debian' pkg.install salt-minion refresh=True
salt -C 'G@init:systemd and G@os_family:Debian' service.unmask salt-minion

There is another solution for Upstart and SysV init -- the policy-rc.d method. You need to temporarily deny the runlevel operations:

salt -C '( G@init:upstart or G@init:sysvinit ) and G@os_family:Debian' file.manage_file \
/usr/sbin/policy-rc.d '' '{}' '' '{}' root root '755' base '' contents=''
salt -C '( G@init:upstart or G@init:sysvinit ) and G@os_family:Debian' file.append \
/usr/sbin/policy-rc.d '#!/bin/sh' 'exit 101'
salt -C '( G@init:upstart or G@init:sysvinit ) and G@os_family:Debian' pkg.install \
salt-minion refresh=True
salt -C '( G@init:upstart or G@init:sysvinit ) and G@os_family:Debian' file.remove \
/usr/sbin/policy-rc.d
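To illustrate what those commands leave on each minion: invoke-rc.d consults /usr/sbin/policy-rc.d before acting on a service, and an exit status of 101 means "action forbidden", so the package maintainer scripts skip the restart. The file built above can be reproduced locally (in /tmp here, so no root is needed):

```shell
# Recreate the two-line policy file that the file.manage_file/file.append
# calls assemble on the minion, then run it to confirm the exit status.
cat > /tmp/policy-rc.d <<'EOF'
#!/bin/sh
exit 101
EOF
chmod 755 /tmp/policy-rc.d
/tmp/policy-rc.d
echo "exit status: $?"
```

Running it prints `exit status: 101`, the code invoke-rc.d interprets as "do not start or restart this service".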

I've found this to be the most reliable way to get the Salt Minion upgraded properly.
Afterwards you can safely restart Minions using the nohup method from the FAQ.

Also I've discovered that restarting Minions just with:

salt '*' service.restart salt-minion

works like a charm with recent Salt versions from the 2015.8 and 2016.3 branches, even after an upgrade. I believe this is because the systemd units were patched, and Salt does the trick by forking itself to run the restart command while keeping the connection to the Master.

I need to do more testing, but I think using at or nohup is only required when scripting upgrades from very old versions in Salt States.

@Trouble123

Trouble123 commented Aug 27, 2016

Hi, I had been trying to restart a Windows minion for ages, and finally worked out a way that works every time:

salt '*' cmd.run_bg 'Restart-Service salt-minion' shell=powershell

@pavankumar2203

pavankumar2203 commented Sep 29, 2016

Thanks @Trouble123 👍 That worked.

@vutny
Copy link
Contributor

vutny commented Sep 29, 2016

What about saltutil.update? It looks like a Windows-specific function:
https://docs.saltstack.com/en/latest/topics/tutorials/esky.html

Moreover, I see that esky has become unmaintained (again?): https://github.com/cloudmatrix/esky
Anyone from the Windows camp, does it still work?

cachedout pushed a commit that referenced this issue Mar 10, 2017
Fix #7997: describe how to upgrade Salt Minion in a proper way
gitebra pushed a commit to gitebra/salt that referenced this issue Mar 12, 2017
* upstream/develop: (57 commits)
  Gate class definitions
  Don't hardcode the webserver port number
  INFRA-4506 - fix indentation
  INFRA-4506 - test=True should not return False on success
  INFRA-4506 - add list_rules() function to boto_cloudwatch module.  Change 'Name' param of present and absent functions to default to the value of name if not provided.
  Update Azure ARM cache
  add utils to engines
  Disable mentionbot delay on develop
  Mention bot delay disable for 2016.11
  Add a function to list PRs to the GitHub execution module
  Fix saltstack#7997: describe how to upgrade Salt Minion in a proper way
  minionswarm.py: allow random UUID
  INFRA-4506 - add CLI example before lint complains :)
  INFRA-4506 - boto_lambda module is mysteriously missing a 'list_functions' function
  add specific docs for cmd_subset
  Avahi/Bonjour: Detect hostname or IP address change
  Add special token to insert the minion id into the default_include path
  Pylint fixes
  Correct comment lines output got list_hosts
  Code cleanup and make sure the beacons config file is deleted after testing
  ...