We ran through the right way of doing this in Salt training with Seth, but I'm not sure whether this is a bug or whether I've missed something. I tried to follow the "upgrade the master first, then use Salt to upgrade the minion service" steps to go from v0.17 to v0.17.1 of Salt, and ended up losing access to most of my minions.
Long story short: I need a reliable way of upgrading the salt-minion and salt-master packages without losing access to the minions. From what I can tell, every time I perform such an upgrade I lose access to some, if not all, of my minions and have to log in to each host/VM and restart the salt-minion service. That is doable in test/dev, where we manage 30 nodes, but not once I move this infrastructure to prod, where I have over 200 nodes to manage. I need an upgrade path that doesn't break the remote execution framework established between the minions and the master.
So, without further ado, here's what I did:
Update the master.
[root@salt-master ~]# yum list updates
Loaded plugins: security
epel | 3.0 kB 00:00
epel/primary_db | 6.2 MB 00:00
epel-testing | 2.9 kB 00:00
epel-testing/primary_db | 2.2 MB 00:00
rhel-localrepo | 3.0 kB 00:00
rhel-localrepo/primary_db | 26 MB 00:00
Updated Packages
glibc.x86_64 2.12-1.107.el6_4.5 rhel-localrepo
glibc-common.x86_64 2.12-1.107.el6_4.5 rhel-localrepo
glibc-devel.x86_64 2.12-1.107.el6_4.5 rhel-localrepo
glibc-headers.x86_64 2.12-1.107.el6_4.5 rhel-localrepo
java-1.6.0-openjdk.x86_64 1:1.6.0.0-1.65.1.11.13.el6_4 rhel-localrepo
kernel.x86_64 2.6.32-358.23.2.el6 rhel-localrepo
kernel-firmware.noarch 2.6.32-358.23.2.el6 rhel-localrepo
kernel-headers.x86_64 2.6.32-358.23.2.el6 rhel-localrepo
libtar.x86_64 1.2.11-17.el6_4.1 rhel-localrepo
nscd.x86_64 2.12-1.107.el6_4.5 rhel-localrepo
perf.x86_64 2.6.32-358.23.2.el6 rhel-localrepo
salt.noarch 0.17.1-1.el6 epel-testing
salt-master.noarch 0.17.1-1.el6 epel-testing
salt-minion.noarch 0.17.1-1.el6 epel-testing
setup.noarch 2.8.14-20.el6_4.1 rhel-localrepo
tzdata.noarch 2013g-1.el6 rhel-localrepo
tzdata-java.noarch 2013g-1.el6 rhel-localrepo
[root@salt-master ~]# yum update -y
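In hindsight, I could have limited this to just the Salt packages so the unrelated kernel/glibc updates wouldn't muddy the waters. Something like this should do it (assuming the epel-testing repo stays enabled):
[root@salt-master ~]# yum update -y salt salt-master salt-minion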
I restart the salt-master and salt-minion services on my master VM.
[root@salt-master ~]# service salt-master restart
Stopping salt-master daemon: [ OK ]
Starting salt-master daemon: [ OK ]
[root@salt-master ~]# service salt-minion restart
Stopping salt-minion daemon: [ OK ]
Starting salt-minion daemon: [ OK ]
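Before touching any minions, a quick sanity check that the upgraded master is actually up and listening again probably wouldn't hurt (this assumes the default publish/return ports, 4505 and 4506):
[root@salt-master ~]# salt-master --version
[root@salt-master ~]# netstat -tlnp | grep -E ':(4505|4506)'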
Try to upgrade some of my test minion VMs.
[root@salt-master ~]# salt 'salt-minion*' pkg.upgrade
[root@salt-master ~]# salt 'salt-minion*' pkg.list_upgrades
[root@salt-master ~]# salt -v 'salt-minion*' test.ping
Executing job with jid 20131021102016190263
-------------------------------------------
salt-minion-00:
Minion did not return
salt-minion-01:
Minion did not return
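My working theory is that pkg.upgrade replaced the salt-minion package underneath the running daemon, and nothing restarted the service afterward. If that's the case, a restart that's detached from the Salt job itself ought to survive the upgrade. Something along these lines is what I had in mind (untested, and it assumes at is installed and atd is running on the minions):
[root@salt-master ~]# salt 'salt-minion*' cmd.run 'echo "service salt-minion restart" | at now + 1 minute'
Running the restart through at means the minion process executing the job isn't the one doing the restarting, so the job can return cleanly before the service bounces.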
I log in to each minion VM and restart the salt-minion service.
[root@salt-minion-01 ~]# service salt-minion restart
Stopping salt-minion daemon: [FAILED]
Starting salt-minion daemon: [ OK ]
[root@salt-minion-01 ~]# chkconfig --list | grep salt-minion
salt-minion 0:off 1:off 2:on 3:on 4:on 5:on 6:off
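The [FAILED] on the stop suggests the old minion process was already dead by the time I logged in. The minion log would probably confirm when and why it died (assuming the default log location):
[root@salt-minion-01 ~]# tail -n 50 /var/log/salt/minion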
Now I can ping the VMs again.
[root@salt-master ~]# salt -v 'salt-minion*' test.ping
Executing job with jid 20131021102229314417
-------------------------------------------
salt-minion-01:
True
salt-minion-00:
True
Versions reports:
[root@salt-master ~]# salt --versions-report
Salt: 0.17.1
Python: 2.6.6 (r266:84292, May 27 2013, 05:35:12)
Jinja2: 2.2.1
M2Crypto: 0.20.2
msgpack-python: 0.1.13
msgpack-pure: Not Installed
pycrypto: 2.0.1
PyYAML: 3.10
PyZMQ: 2.2.0.1
ZMQ: 3.2.4
[root@salt-minion-00 ~]# salt-call --versions-report
Salt: 0.17.1
Python: 2.6.6 (r266:84292, May 27 2013, 05:35:12)
Jinja2: 2.2.1
M2Crypto: 0.20.2
msgpack-python: 0.1.13
msgpack-pure: Not Installed
pycrypto: 2.0.1
PyYAML: 3.10
PyZMQ: 2.2.0.1
ZMQ: 3.2.4
[root@salt-minion-01 ~]# salt-call --versions-report
Salt: 0.17.1
Python: 2.6.8 (unknown, Nov 7 2012, 14:47:45)
Jinja2: unknown
M2Crypto: 0.21.1
msgpack-python: 0.1.12
msgpack-pure: Not Installed
pycrypto: 2.3
PyYAML: 3.08
PyZMQ: 2.1.9
ZMQ: 2.2.0
You'll notice that the package upgrade itself proceeded correctly: both minions now report Salt 0.17.1, but the salt-minion services were not restarted as part of the upgrade process (on both minion VMs; one is RHEL 5 and the other is RHEL 6). Unfortunately, I didn't think to run the pkg.upgrade command in verbose mode at the time.
Do I need to find some external remote-execution method (mussh, omnitty, etc.) to restart all of the minions post-upgrade? This is probably not a bug, but it's still very frustrating. I'm unlikely to upgrade again until I can figure out how to do this properly.
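For what it's worth, if plain SSH ends up being the fallback, a simple loop over the accepted minion keys is roughly what I'd try; this assumes the minion IDs resolve as hostnames and that root SSH access (ideally key-based) is already set up:
for h in $(salt-key -l acc | tail -n +2); do
    ssh root@"$h" 'service salt-minion restart'
done
But that defeats the purpose of having Salt in the first place, so I'd rather find the supported way.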