Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2019.2.1: Breaks Minion-Master Communication #54762

Closed
margau opened this issue Sep 26, 2019 · 10 comments

Comments

@margau
Copy link

commented Sep 26, 2019

Description of Issue

After our minions were upgraded from 2019.2.0 to 2019.2.1, the communication (IPv6 only) with the master (either 2019.2.0 or 2019.2.1) is broken.
Minion is stuck at:
SaltReqTimeoutError, retrying. (2/7)

Via tcpdump communication - but with length "0" - is observable.

Master says:
Sep 26 11:59:36 salt salt-master[9266]: [WARNING ] /usr/lib/python3/dist-packages/zmq/eventloop/ioloop.py:211: RuntimeWarning: IOLoop.current expected instance of <class 'zmq.eventloop.ioloop.ZMQIOLoop'>, got <tornado.platform.epoll.EPollIOLoop object at 0x7fd9e798feb8> Sep 26 11:59:36 salt salt-master[9266]: ioloop.IOLoop.instance() is IOLoop.instance(), "tornado IOLoop already initialized"

Setup

IPv6-only communication, no special configuration.

Steps to Reproduce Issue

After minion is started:
Sep 26 12:33:52 salt-minion[15491]: [DEBUG ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'minion', 'tcp://<master-ip>', 'clear') Sep 26 12:33:52 salt-minion[15491]: [DEBUG ] Connecting the Minion to the Master URI (for the return server): tcp://<master-ip> Sep 26 12:33:52 salt-minion[15491]: [DEBUG ] Trying to connect to: tcp://<master-ip> Sep 26 12:33:52 salt-minion[15491]: [DEBUG ] salt.crypt.get_rsa_pub_key: Loading public key Sep 26 12:33:57 salt-minion[15491]: [DEBUG ] SaltReqTimeoutError, retrying. (1/7) Sep 26 12:34:02 salt-minion[15491]: [DEBUG ] SaltReqTimeoutError, retrying. (2/7) Sep 26 12:34:07 salt-minion[15491]: [DEBUG ] SaltReqTimeoutError, retrying. (3/7) Sep 26 12:34:12 salt-minion[15491]: [DEBUG ] SaltReqTimeoutError, retrying. (4/7) Sep 26 12:34:17 salt-minion[15491]: [DEBUG ] SaltReqTimeoutError, retrying. (5/7) Sep 26 12:34:22 salt-minion[15491]: [DEBUG ] SaltReqTimeoutError, retrying. (6/7) Sep 26 12:34:27 salt-minion[15491]: [DEBUG ] SaltReqTimeoutError, retrying. (7/7) Sep 26 12:34:32 salt-minion[15491]: [DEBUG ] Re-init ZMQ socket: Message timed out Sep 26 12:34:32 salt-minion[15491]: [DEBUG ] Trying to connect to: tcp://<master-ip> Sep 26 12:34:32 salt-minion[15491]: [DEBUG ] Closing AsyncZeroMQReqChannel instance Sep 26 12:34:32 salt-minion[15491]: [ERROR ] Error while bringing up minion for multi-master. Is master at <master-hostname> responding?

Master:
Sep 26 11:59:36 salt salt-master[9266]: [WARNING ] /usr/lib/python3/dist-packages/zmq/eventloop/ioloop.py:211: RuntimeWarning: IOLoop.current expected instance of <class 'zmq.eventloop.ioloop.ZMQIOLoop'>, got <tornado.platform.epoll.EPollIOLoop object at 0x7fd9e798feb8> Sep 26 11:59:36 salt salt-master[9266]: ioloop.IOLoop.instance() is IOLoop.instance(), "tornado IOLoop already initialized"

With downgrade to 2019.2.0 of the minion everything works fine.

Versions Report

Master:

           Salt: 2019.2.1
 
Dependency Versions:
           cffi: 1.12.3
       cherrypy: Not Installed
       dateutil: 2.5.3
      docker-py: Not Installed
          gitdb: 2.0.3
      gitpython: 2.1.7
          ioflo: Not Installed
         Jinja2: 2.9.4
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.8
   mysql-python: Not Installed
      pycparser: 2.19
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 3.5.3 (default, Sep 27 2018, 17:25:39)
   python-gnupg: Not Installed
         PyYAML: 3.12
          PyZMQ: 16.0.2
           RAET: Not Installed
          smmap: 2.0.3
        timelib: Not Installed
        Tornado: 4.4.3
            ZMQ: 4.2.1
 
System Versions:
           dist: debian 9.11 
         locale: UTF-8
        machine: x86_64
        release: 4.19.0-0.bpo.5-amd64
         system: Linux
        version: debian 9.11

Minion:

Salt Version:
           Salt: 2019.2.1
 
Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: 2.5.3
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.9.4
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.8
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 3.5.3 (default, Sep 27 2018, 17:25:39)
   python-gnupg: Not Installed
         PyYAML: 3.12
          PyZMQ: 16.0.2
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.4.3
            ZMQ: 4.2.1
 
System Versions:
           dist: debian 9.11 
         locale: UTF-8
        machine: x86_64
        release: 4.19.0-0.bpo.5-amd64
         system: Linux
        version: debian 9.11 
@jbouter

This comment has been minimized.

Copy link
Contributor

commented Sep 26, 2019

This is affecting our systems as well. We have a hybrid ipv4 and ipv6 setup, which no longer works. I haven't been able to link it to any of the items posted in the changelog yet though.

Only our ipv4 minions are connected. Thankfully, we don't heavily depend on ipv6 yet, so removing the ipv6 record to the master resolves it for now.

@awlx

This comment has been minimized.

Copy link

commented Sep 26, 2019

All our minions broke because of this issue ... switching to v4 is not an option here as not all hosts even have a v4 address.

@awlx

This comment has been minimized.

Copy link

commented Sep 26, 2019

I suspect an issue in file:
https://github.com/saltstack/salt/blob/develop/salt/utils/zeromq.py#L81
line 81

Or the method is not called at all ...

Since the connect string on old minions looks like this:
[DEBUG ] Trying to connect to: tcp://[2001:608:a01:ffff::28]:4506

And on new minions like this:
[DEBUG ] Trying to connect to: tcp://2001:608:a01:ffff::28:4506

So the brackets are missing which leads to a none connectable address.

@awlx

This comment has been minimized.

Copy link

commented Sep 26, 2019

The issue could be caused by this commit/PR:
24bb94f
#49755

@cmcmarrow

This comment has been minimized.

Copy link
Contributor

commented Sep 26, 2019

IPV6 is broken in 2019.2.1

@dhiltonp

This comment has been minimized.

Copy link
Contributor

commented Sep 26, 2019

Thanks everyone for the clear reports - check out #54784 and let me know if you see any issues :)

@mweinelt

This comment has been minimized.

Copy link

commented Sep 28, 2019

Went to ansible for a quick fix. Inventory came from salt-key. For us we enabled IPv6 in /etc/salt/minion.d/ipv6.conf, so killing that file and restarting restored connection.

Adapt as needed.
YMMV.

- name: salt_disable_ipv6
  hosts: salt
  become: yes
  become_method: sudo
  handlers:
    - name: restart salt-minion
      service:
        name: salt-minion
        state: restarted
  tasks:
    - name: "Remove /etc/salt/minion.d/ipv6.conf"
      file:
        path: /etc/salt/minion.d/ipv6.conf
        state: absent
      notify: restart salt-minion
@wangguoqin1001

This comment has been minimized.

Copy link

commented Sep 29, 2019

With #54784 applied, either over the release version of 2019.2.1, or just using the head of branch 2019.2.1, errors still triggered as below.

2019-09-29 12:59:40,157 [salt.minion      :1049][CRITICAL][2819594] Unexpected error while connecting to HOSTNAME.DOMAIN
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/salt/minion.py", line 1026, in _connect_minion
    yield minion.connect_master(failed=failed)
  File "/usr/local/lib/python2.7/dist-packages/tornado/gen.py", line 1133, in run
    value = future.result()
  File "/usr/local/lib/python2.7/dist-packages/tornado/concurrent.py", line 261, in result
    raise_exc_info(self._exc_info)
  File "/usr/local/lib/python2.7/dist-packages/tornado/gen.py", line 1141, in run
    yielded = self.gen.throw(*exc_info)
  File "/usr/local/lib/python2.7/dist-packages/salt/minion.py", line 1224, in connect_master
    master, self.pub_channel = yield self.eval_master(self.opts, self.timeout, self.safe, failed)
  File "/usr/local/lib/python2.7/dist-packages/tornado/gen.py", line 1133, in run
    value = future.result()
  File "/usr/local/lib/python2.7/dist-packages/tornado/concurrent.py", line 261, in result
    raise_exc_info(self._exc_info)
  File "/usr/local/lib/python2.7/dist-packages/tornado/gen.py", line 1141, in run
    yielded = self.gen.throw(*exc_info)
  File "/usr/local/lib/python2.7/dist-packages/salt/minion.py", line 749, in eval_master
    yield pub_channel.connect()
  File "/usr/local/lib/python2.7/dist-packages/tornado/gen.py", line 1133, in run
    value = future.result()
  File "/usr/local/lib/python2.7/dist-packages/tornado/concurrent.py", line 261, in result
    raise_exc_info(self._exc_info)
  File "/usr/local/lib/python2.7/dist-packages/tornado/gen.py", line 326, in wrapper
    yielded = next(result)
  File "/usr/local/lib/python2.7/dist-packages/salt/transport/zeromq.py", line 509, in connect
    log.debug('Connecting the Minion to the Master publish port, using the URI: %s', self.master_pub)
  File "/usr/local/lib/python2.7/dist-packages/salt/transport/zeromq.py", line 520, in master_pub
    source_port=self.opts.get('source_publish_port'))
  File "/usr/local/lib/python2.7/dist-packages/salt/transport/zeromq.py", line 84, in _get_master_uri
    master_ip=ip_bracket(master_ip), master_port=master_port)
  File "/usr/local/lib/python2.7/dist-packages/salt/utils/zeromq.py", line 86, in ip_bracket
    addr = ipaddress.ip_address(addr)
  File "/usr/local/lib/python2.7/dist-packages/salt/ext/ipaddress.py", line 120, in ip_address
    address)
ValueError: u'[xxxx:xxxx::xxxx:xxxx]' does not appear to be an IPv4 or IPv6 address

@dhiltonp

This comment has been minimized.

Copy link
Contributor

commented Sep 29, 2019

Thanks for that stack trace!

modules/status.py:ping_master
minion.py:eval_master

The above functions have a side effect of updating opts['master_ip'] to the resolved ip address.
opts['master_ip'] is later passed to _get_master_uri, which assumes a bare ip.

connect_master and reconnect are two of several functions that call eval_master.

We'll need to do make sure we're consistent in doing the conversion, and then we can assume we're dealing with URI-formatted ips everywhere.

@cachedout

This comment has been minimized.

Copy link
Collaborator

commented Oct 21, 2019

Shouldn't this be closed now that #54823 is merged?

@dhiltonp dhiltonp closed this Oct 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
9 participants
You can’t perform that action at this time.