Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrading Salt via Salt results in dying minions and broken dpkg #43340

Closed
syphernl opened this issue Sep 4, 2017 · 19 comments
Closed

Upgrading Salt via Salt results in dying minions and broken dpkg #43340

syphernl opened this issue Sep 4, 2017 · 19 comments

Comments

@syphernl
Copy link
Contributor

@syphernl syphernl commented Sep 4, 2017

Description of Issue/Question

We use pkg.upgrade dist_upgrade=True to upgrade our servers semi-unattended.

They were running 2017.7.0 prior to issuing the upgrade command and were supposed to be upgraded to 2017.7.1.

However, a majority of the machines had issues where the salt-minion would not be running anymore afterwards and shows the following issue when doing anything with apt afterwards on the machine itself:

E: dpkg was interrupted, you must manually run 'dpkg —configure -a' to correct the problem.

Doing so shows the following issue:

dpkg: error processing package salt-minion (--configure):
 package is in a very bad inconsistent state; you should
 reinstall it before attempting configuration

After running the commands above, this

ISTM that this issue may be caused by the fact that the systemd config has changed, since it seems to only affect our Ubuntu 16.04 machines and works fine on 14.04.

Versions Report

Salt Version:
           Salt: 2017.7.1

Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: 2.4.2
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.8
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: 1.0.3
   msgpack-pure: Not Installed
 msgpack-python: 0.4.6
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 2.7.12 (default, Nov 19 2016, 06:48:10)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 15.2.0
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.1.4

System Versions:
           dist: Ubuntu 16.04 xenial
         locale: ANSI_X3.4-1968
        machine: x86_64
        release: 4.4.59-1-pve
         system: Linux
        version: Ubuntu 16.04 xenial
@gtmanfred

This comment has been minimized.

Copy link
Contributor

@gtmanfred gtmanfred commented Sep 5, 2017

I have replicated this issue.

@terminalmage shouldn't this have been solved by using systemd-run --scope to move the apt-get and dpkg processes out of the same tree as the salt minion?

@syphernl a quick fix is to add KillMode=Process in a drop-in file for the salt-minion process. This will allow service to be restarted, but not kill off the minion process that is running the command.

Thanks,
Daniel

@gtmanfred gtmanfred added this to the Approved milestone Sep 5, 2017
@gtmanfred gtmanfred added the P1 label Sep 5, 2017
@terminalmage

This comment has been minimized.

Copy link
Collaborator

@terminalmage terminalmage commented Sep 5, 2017

@gtmanfred Using systemd-run --scope will place the apt-get install command in a different cgroup from the salt-minion process. The question here though is how dpkg manages restarting the service. As I understand it, it will perform a systemctl try-restart salt-minion.service, which should SIGTERM all of the PIDs in the same cgroup as salt-minion.service. Given that the apt-get install command is in a different cgroup, it should survive the service restart. This was actually the reason we made this change in the first place.

@syphernl

This comment has been minimized.

Copy link
Contributor Author

@syphernl syphernl commented Sep 6, 2017

@gtmanfred Using KillMode=process (which appears to be case sensitive) it seems to work fine now.
It properly returns back to the master and the fresh processes (with the new version) are up and running afterwards.

@babilen5

This comment has been minimized.

Copy link
Contributor

@babilen5 babilen5 commented Sep 15, 2017

It appears that there are multiple PRs and bugs that either add or remove KillMode settings from all unit files, while we want to keep it for salt-minion, but not for salt-master.

Please see #33792 for a discussion concerning the salt-master and #29295 for the one concerning salt-minion.

#36806 is the one that triggered this round of removal.

@jellisii

This comment has been minimized.

Copy link

@jellisii jellisii commented Oct 10, 2017

This particular bug has been haunting me ever since I started deploying xenial servers.

@syphernl

This comment has been minimized.

Copy link
Contributor Author

@syphernl syphernl commented Oct 11, 2017

The same thing is happening again with the upgrade of 2017.7.2:

root@web01:~# dpkg -l | grep salt
ii  salt-common                      2017.7.1+ds-1                              all          shared libraries that salt requires for all packages
iFR salt-minion                      2017.7.1+ds-1                              all          client package for salt, the distributed remote execution system
root@web01:~# apt -f install
E: dpkg was interrupted, you must manually run 'sudo dpkg --configure -a' to correct the problem. 
root@web01:~# sudo dpkg --configure -a

dpkg: error processing package salt-minion (--configure):
 package is in a very bad inconsistent state; you should
 reinstall it before attempting configuration

Errors were encountered while processing:
 salt-minion
@Ch3LL

This comment has been minimized.

Copy link
Contributor

@Ch3LL Ch3LL commented Oct 24, 2017

@terminalmage mind taking a look at this for 2017.7.3 release?

@rallytime

This comment has been minimized.

Copy link
Contributor

@rallytime rallytime commented Jan 5, 2018

This is fixed by #44427, which I have backported to 2017.7 in #45255. Closing. Thanks everyone!

@rallytime rallytime closed this Jan 5, 2018
@syphernl

This comment has been minimized.

Copy link
Contributor Author

@syphernl syphernl commented Feb 6, 2018

@rallytime Seems like this is still an issue with 2017.7.3. Our minions died after an upgrade and required manual intervention to come back up.

@Marx1st

This comment has been minimized.

Copy link

@Marx1st Marx1st commented Feb 6, 2018

I can confirm this. 2017.7.3 craps out on Ubuntu 16.04 and Debian 8+9, again... :(

@gtmanfred

This comment has been minimized.

Copy link
Contributor

@gtmanfred gtmanfred commented Feb 6, 2018

I am testing on debian 9, and I am unable to replicate this.

root@salt:~# salt-call --version
salt-call 2017.7.2 (Nitrogen)
root@salt:~# salt \* pkg.upgrade
salt:
    ----------
    salt-common:
        ----------
        new:
            2017.7.3+ds-1
        old:
            2017.7.2+ds-1
    salt-master:
        ----------
        new:
            2017.7.3+ds-1
        old:
            2017.7.2+ds-1
    salt-minion:
        ----------
        new:
            2017.7.3+ds-1
        old:
            2017.7.2+ds-1
    sensible-utils:
        ----------
        new:
            0.0.9+deb9u1
        old:
            0.0.9
root@salt:~# salt \* test.ping
salt:
    True

Can you provide the output you were seeing during the upgrade in the minion logs?

Thanks,
Daniel

@Marx1st

This comment has been minimized.

Copy link

@Marx1st Marx1st commented Feb 6, 2018

root@salt:~# salt-call --version
salt-call 2017.7.2 (Nitrogen)
root@salt:~# salt -C "suche2*" pkg.upgrade refresh=False, dist_upgrade=False
suche2.xxx.xxxxx.de:
    Minion did not return. [Not connected]

2018-02-06 16:04:08,472 [salt.minion      ][INFO    ][1568] User root Executing command pkg.upgrade with jid 20180206160408455420
2018-02-06 16:04:08,516 [salt.minion      ][INFO    ][12062] Starting a new job with PID 12062
2018-02-06 16:04:08,697 [salt.loader.salt.xxx.xxxx.de.int.module.cmdmod][INFO    ][12062] Executing command ['dpkg-query', '--showformat', '${Status} ${Package} ${Version} ${Architecture}', '-W'] in directory '/root'
2018-02-06 16:04:09,559 [salt.loader.salt.xxx.xxxx.de.int.module.cmdmod][INFO    ][12062] Executing command ['systemd-run', '--scope', 'apt-get', '-q', '-y', '-o', 'DPkg::Options::=--force-confold', '-o', 'DPkg::Options::=--force-confdef', 'upgrade'] in directory '/root'
2018-02-06 16:04:12,861 [salt.utils.parsers][WARNING ][1568] Minion received a SIGTERM. Exiting.
2018-02-06 16:04:12,865 [salt.cli.daemons ][INFO    ][1568] Shutting down the Salt Minion


root@salt:~# salt wiki* pkg.upgrade
[ERROR   ] saltutil returning errors on minion wiki.xxx.xxxxxxx.de

2018-02-06 16:10:32,006 [salt.minion      ][INFO    ][1354] User root Executing command pkg.upgrade with jid 20180206161031991788
2018-02-06 16:10:32,045 [salt.minion      ][INFO    ][7004] Starting a new job with PID 7004
2018-02-06 16:10:32,076 [salt.loader.salt.xxx.xxxxxx.de.int.module.cmdmod][INFO    ][7004] Executing command ['apt-get', '-q', 'update'] in directory '/root'
2018-02-06 16:10:37,136 [salt.minion      ][INFO    ][1354] User root Executing command saltutil.find_job with jid 20180206161037123566
2018-02-06 16:10:37,217 [salt.minion      ][INFO    ][7303] Starting a new job with PID 7303
2018-02-06 16:10:37,253 [salt.minion      ][INFO    ][7303] Returning information for job: 20180206161037123566
2018-02-06 16:10:41,166 [salt.loader.salt.xxx.xxxxx.de.int.module.cmdmod][INFO    ][7004] Executing command ['dpkg-query', '--showformat', '${Status} ${Package} ${Version} ${Architecture}', '-W'] in directory '/root'
2018-02-06 16:10:42,338 [salt.loader.salt.xxx.xxxxx.de.int.module.cmdmod][INFO    ][7004] Executing command ['systemd-run', '--scope', 'apt-get', '-q', '-y', '-o', 'DPkg::Options::=--force-confold', '-o', 'DPkg::Options::=--force-confdef', 'upgrade'] in directory '/root'
2018-02-06 16:10:47,226 [salt.minion      ][INFO    ][1354] User root Executing command saltutil.find_job with jid 20180206161047211797
2018-02-06 16:10:47,302 [salt.minion      ][INFO    ][7500] Starting a new job with PID 7500
2018-02-06 16:10:47,368 [salt.utils.parsers][WARNING ][1354] Minion received a SIGTERM. Exiting.
2018-02-06 16:10:47,370 [salt.cli.daemons ][INFO    ][1354] Shutting down the Salt Minion
2018-02-06 16:10:47,418 [salt.loader      ][ERROR   ][7500] Failed to import module saltutil as the module called exit()
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/salt/loader.py", line 1412, in _load_module
    mod = imp.load_module(mod_namespace, fn_, fpath, desc)
  File "/usr/lib/python2.7/dist-packages/salt/modules/saltutil.py", line 42, in <module>
    import salt.client.ssh.client
  File "/usr/lib/python2.7/dist-packages/salt/client/ssh/client.py", line 4, in <module>
    from __future__ import absolute_import
  File "/usr/lib/python2.7/dist-packages/salt/utils/process.py", line 713, in _handle_signals
    sys.exit(salt.defaults.exitcodes.EX_OK)
SystemExit: 0
2018-02-06 16:10:48,176 [salt.minion      ][INFO    ][7500] Returning information for job: 20180206161047211797
@gtmanfred

This comment has been minimized.

Copy link
Contributor

@gtmanfred gtmanfred commented Feb 6, 2018

Here is my whole trace log of it working https://gist.github.com/gtmanfred/0719ca5c4fac2b3a67cfcddf61bfb3a6 unfortunately i still cannot replicate this

If you have a machine that is still on 2017.7.2 on can you open /usr/lib/python2.7/dist-packages/salt/modules/aptpkg.py and add use_vt=True to the cmd.run_all on line 1065

    result = __salt__['cmd.run_all'](cmd,
                                     output_loglevel='trace',
                                     python_shell=False,
                                     use_vt=True,
                                     env=DPKG_ENV_VARS.copy())

And then restart the salt-minion and run the upgrade?

Thanks,
Daniel

@Marx1st

This comment has been minimized.

Copy link

@Marx1st Marx1st commented Feb 6, 2018

Hi,

strange behavior:

root@salt:~# salt indico* pkg.upgrade
indico.www-farm.uni-wuppertal.de:
    Minion did not return. [Not connected]

BUT, the update did finish successfully and the minion is running!

root@indico:~# dpkg -l salt*
|/ Fehler?=(kein)/R=Neuinstallation notwendig (Status, Fehler: GROSS=schlecht)
||/ Name                                            Version                      Architektur                  Beschreibung
+++-===============================================-============================-============================-===================================================================================================
ii  salt-common                                     2017.7.3+ds-1                all                          shared libraries that salt requires for all packages
ii  salt-minion                                     2017.7.3+ds-1                all                          client package for salt, the distributed remote execution system
@gtmanfred

This comment has been minimized.

Copy link
Contributor

@gtmanfred gtmanfred commented Feb 6, 2018

What is in the /var/log/salt/minion debug log when you have that use_vt enabled? it should show all the apt-get output in there

Daniel

@Marx1st

This comment has been minimized.

Copy link

@Marx1st Marx1st commented Feb 6, 2018

It looks like this (but that was the half way successful update):

Feb  6 16:52:55 indico salt-minion[23176]: [INFO    ] User root Executing command pkg.upgrade with jid 20180206165255853367
Feb  6 16:52:55 indico salt-minion[23176]: [INFO    ] Starting a new job with PID 23258
Feb  6 16:52:55 indico salt-minion[23176]: [INFO    ] Executing command ['apt-get', '-q', 'update'] in directory '/root'
Feb  6 16:53:00 indico salt-minion[23176]: [INFO    ] User root Executing command saltutil.find_job with jid 20180206165300982621
Feb  6 16:53:01 indico salt-minion[23176]: [INFO    ] Starting a new job with PID 23617
Feb  6 16:53:01 indico salt-minion[23176]: [INFO    ] Returning information for job: 20180206165300982621
Feb  6 16:53:02 indico salt-minion[23176]: [INFO    ] Executing command ['dpkg-query', '--showformat', '${Status} ${Package} ${Version} ${Architecture}', '-W'] in directory '/root'
Feb  6 16:53:02 indico salt-minion[23176]: [INFO    ] Executing command ['systemd-run', '--scope', 'apt-get', '-q', '-y', '-o', 'DPkg::Options::=--force-confold', '-o', 'DPkg::Options::=--force-confdef', 'upgrade'] in directory '/root'
Feb  6 16:53:03 indico salt-minion[23176]: Running scope as unit run-r2b3739df0402469785881f957e4db8c2.scope.
Feb  6 16:53:03 indico salt-minion[23176]: Reading package lists...
Feb  6 16:53:03 indico salt-minion[23176]: Building dependency tree...
Feb  6 16:53:03 indico salt-minion[23176]: Reading state information...
Feb  6 16:53:03 indico salt-minion[23176]: Calculating upgrade...
Feb  6 16:53:03 indico salt-minion[23176]: The following packages will be upgraded:
Feb  6 16:53:03 indico salt-minion[23176]:   grub-common grub-pc grub-pc-bin grub2-common salt-common salt-minion
Feb  6 16:53:03 indico salt-minion[23176]: 6 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Feb  6 16:53:03 indico salt-minion[23176]: Need to get 0 B/8269 kB of archives.
Feb  6 16:53:03 indico salt-minion[23176]: After this operation, 585 kB of additional disk space will be used.
Feb  6 16:53:03 indico salt-minion[23176]: Preconfiguring packages ...
Feb  6 16:53:04 indico salt-minion[23176]: (Reading database ... #015(Reading database ... 5%#015(Reading database ... 10%#015(Reading database ... 15%#015(Reading database ... 20%#015(Reading database ... 25%#015(Reading database ... 30%#015(Reading database ... 35%#015(Reading database ... 40%#015(Reading database ... 45%#015(Reading database ... 50%#015(Reading database ... 55%#015(Reading database ... 60%#015(Reading database ... 65%#015(Reading database ... 70%#015(Reading database ... 75%#015(Reading database ... 80%#015(Reading database ... 85%#015(Reading database ... 90%#015(Reading database ... 95%#015(Reading database ... 100%#015(Reading database ... 105273 files and directories currently installed.)
Feb  6 16:53:04 indico salt-minion[23176]: Preparing to unpack .../grub-pc_2.02~beta2-36ubuntu3.16_amd64.deb ...
Feb  6 16:53:04 indico salt-minion[23176]: Unpacking grub-pc (2.02~beta2-36ubuntu3.16) over (2.02~beta2-36ubuntu3.15) ...
Feb  6 16:53:04 indico salt-minion[23176]: Preparing to unpack .../grub-pc-bin_2.02~beta2-36ubuntu3.16_amd64.deb ...
Feb  6 16:53:04 indico salt-minion[23176]: Unpacking grub-pc-bin (2.02~beta2-36ubuntu3.16) over (2.02~beta2-36ubuntu3.15) ...
Feb  6 16:53:04 indico salt-minion[23176]: Preparing to unpack .../grub2-common_2.02~beta2-36ubuntu3.16_amd64.deb ...
Feb  6 16:53:04 indico salt-minion[23176]: Unpacking grub2-common (2.02~beta2-36ubuntu3.16) over (2.02~beta2-36ubuntu3.15) ...
Feb  6 16:53:05 indico salt-minion[23176]: Preparing to unpack .../grub-common_2.02~beta2-36ubuntu3.16_amd64.deb ...
Feb  6 16:53:05 indico salt-minion[23176]: Unpacking grub-common (2.02~beta2-36ubuntu3.16) over (2.02~beta2-36ubuntu3.15) ...
Feb  6 16:53:05 indico salt-minion[23176]: [WARNING ] Minion received a SIGTERM. Exiting.
Feb  6 16:53:05 indico salt-minion[23176]: [INFO    ] Shutting down the Salt Minion
Feb  6 16:53:05 indico salt-minion[23176]: The Salt Minion is shutdown. Minion received a SIGTERM. Exited.
Feb  6 16:53:19 indico salt-minion[25879]: [INFO    ] Setting up the Salt Minion "indico.www-farm.uni-wuppertal.de"
Feb  6 16:53:20 indico salt-minion[25879]: [INFO    ] Starting up the Salt Minion
Feb  6 16:53:20 indico salt-minion[25879]: [INFO    ] Starting pull socket on /var/run/salt/minion/minion_event_e43f7695da_pull.ipc
Feb  6 16:53:21 indico salt-minion[25879]: [INFO    ] Creating minion process manager
Feb  6 16:53:21 indico salt-minion[25879]: [INFO    ] Executing command ['date', '+%z'] in directory '/root'
Feb  6 16:53:21 indico salt-minion[25879]: [INFO    ] Updating job settings for scheduled job: __mine_interval
Feb  6 16:53:21 indico salt-minion[25879]: [INFO    ] Added mine.update to scheduler
Feb  6 16:53:21 indico salt-minion[25879]: [INFO    ] Minion is starting as user 'root'
Feb  6 16:53:21 indico salt-minion[25879]: [INFO    ] Minion is ready to receive requests!
Feb  6 16:57:40 indico salt-minion[25879]: [INFO    ] User root Executing command pkg.upgrade with jid 20180206165740126360
Feb  6 16:57:40 indico salt-minion[25879]: [INFO    ] Starting a new job with PID 26113
Feb  6 16:57:40 indico salt-minion[25879]: [INFO    ] Executing command ['apt-get', '-q', 'update'] in directory '/root'
Feb  6 16:57:45 indico salt-minion[25879]: [INFO    ] Executing command ['dpkg-query', '--showformat', '${Status} ${Package} ${Version} ${Architecture}', '-W'] in directory '/root'
Feb  6 16:57:45 indico salt-minion[25879]: [INFO    ] User root Executing command saltutil.find_job with jid 20180206165745221141
Feb  6 16:57:45 indico salt-minion[25879]: [INFO    ] Starting a new job with PID 26473
Feb  6 16:57:45 indico salt-minion[25879]: [INFO    ] Returning information for job: 20180206165745221141
Feb  6 16:57:46 indico salt-minion[25879]: [INFO    ] Executing command ['systemd-run', '--scope', 'apt-get', '-q', '-y', '-o', 'DPkg::Options::=--force-confold', '-o', 'DPkg::Options::=--force-confdef', 'upgrade'] in directory '/root'
Feb  6 16:57:47 indico salt-minion[25879]: [INFO    ] Executing command ['dpkg-query', '--showformat', '${Status} ${Package} ${Version} ${Architecture}', '-W'] in directory '/root'
Feb  6 16:57:47 indico salt-minion[25879]: [INFO    ] Returning information for job: 20180206165740126360
@gtmanfred

This comment has been minimized.

Copy link
Contributor

@gtmanfred gtmanfred commented Feb 6, 2018

Ok, i have been able to replicate that, the problem is the fact that the KillMode=process is not in place when the debian scripts do what they do, which is stop the services before laying down the files, then start it again after laying them down.

This should be fully fixed with the next upgrade, but I will make sure that it gets tested and is verified that it will work starting with 2017.7.4

Thanks,
Daniel

@gtmanfred

This comment has been minimized.

Copy link
Contributor

@gtmanfred gtmanfred commented Feb 23, 2018

I have confirmed that the upgrade to 2017.7.4 goes smoothly using our staging repository.

root@salt:~# salt \* pkg.upgrade refresh=true
salt.c.gtmanfred-1263.internal:
    ----------
    gcc-6-base:
        ----------
        new:
            6.3.0-18+deb9u1
        old:
            6.3.0-18
    google-cloud-sdk:
        ----------
        new:
            190.0.1-0
        old:
            187.0.0-0
    libgcc1:
        ----------
        new:
            1:6.3.0-18+deb9u1
        old:
            1:6.3.0-18
    libstdc++6:
        ----------
        new:
            6.3.0-18+deb9u1
        old:
            6.3.0-18
    libtasn1-6:
        ----------
        new:
            4.10-1.1+deb9u1
        old:
            4.10-1.1
    salt-common:
        ----------
        new:
            2017.7.4+ds-1
        old:
            2017.7.3+ds-1
    salt-master:
        ----------
        new:
            2017.7.4+ds-1
        old:
            2017.7.3+ds-1
    salt-minion:
        ----------
        new:
            2017.7.4+ds-1
        old:
            2017.7.3+ds-1
    tzdata:
        ----------
        new:
            2018c-0+deb9u1
        old:
            2017c-0+deb9u1
@babilen5

This comment has been minimized.

Copy link
Contributor

@babilen5 babilen5 commented Mar 9, 2018

I deployed the following before the upgrade, which allowed for a smooth one:

salt_minion_killmode_drop_in:
  file.managed:
    - name: /etc/systemd/system/salt-minion.service.d/killmode.conf
    - makedirs: True
    - dir_mode: 755
    - contents:
      - '[Service]'
      - 'KillMode=process'

reload_daemons:
  cmd.run:
    - name: 'systemctl daemon-reload'
    - onchanges_in:
      - file: salt_minion_killmode_drop_in
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
8 participants
You can’t perform that action at this time.