
[BUG] Minion loses the connection after updating from 3004 to 3005 onedir #63325

monofumado opened this issue Dec 15, 2022 · 8 comments

@monofumado

Description
I want to update my environment to 3005. For now I have updated my test master to 3005, and all new minions with a fresh 3005 installation seem to work correctly, but when I update existing minions from 3004 to 3005 they lose the connection.

Setup
Minion with the 3004 classic version, already in contact with the master (master running 3005 onedir). I then perform the update to 3005 onedir. Everything seems normal on the minion side: no errors, the service runs normally, but I'm not able to contact the minion from the master, getting the error:

    Minion did not return. [No response]
    The minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later, run the following command: 
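
For reference, this is the master-side check that fails (a minimal sketch; the minion ID minion01.corp.ad.net is taken from the minion logs below):

    salt 'minion01.corp.ad.net' test.ping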

Please be as specific as possible and give set-up details.

  • on-prem machine
  • VM (Virtualbox, KVM, etc. please specify)
  • VM running on a cloud service, please be explicit and add details
  • container (Kubernetes, Docker, containerd, etc. please specify)
  • or a combination, please be explicit
  • jails if it is FreeBSD
  • classic packaging
  • onedir packaging
  • used bootstrap to install

Steps to Reproduce the behavior

  • upgrade minion from 3004.2 classic to 3005.1 onedir
  • run a test.ping

Restarting the minion throws the following errors:
Dec 15 10:51:36 minion01 systemd[1]: Started The Salt Minion.
Dec 15 10:51:40 minion01 salt-minion[292047]: [WARNING ] Got events for closed stream <zmq.eventloop.zmqstream.ZMQStream object at 0x7f28c5a72d90>
Dec 15 10:52:54 minion01 systemd[1]: Stopping The Salt Minion...
Dec 15 10:52:54 minion01 salt-minion[292047]: [WARNING ] Minion received a SIGTERM. Exiting.
Dec 15 10:52:54 minion01 salt-minion[292047]: The Salt Minion is shutdown. Minion received a SIGTERM. Exited.
Dec 15 10:52:54 minion01 salt-minion[292047]: TRACE:salt.transport.ipc:IPCClient: Connecting to socket: /var/run/salt/minion/minion_event_ce6006f7de_pull.ipc
Dec 15 10:52:55 minion01 salt-minion[292047]: TRACE:salt.transport.ipc:IPCClient: Connecting to socket: /var/run/salt/minion/minion_event_ce6006f7de_pull.ipc
Dec 15 10:52:55 minion01 salt-minion[292047]: ERROR:salt.utils.event:Unable to connect pusher: Stream is closed
Dec 15 10:52:55 minion01 salt-minion[292047]: TRACE:salt.minion:ret_val = <salt.ext.tornado.concurrent.Future object at 0x7f28c47bd4c0>
Dec 15 10:52:55 minion01 salt-minion[292047]: TRACE:salt.transport.ipc:Subscriber disconnected from IPC /var/run/salt/minion/minion_event_ce6006f7de_pub.ipc
Dec 15 10:52:55 minion01 salt-minion[292047]: TRACE:salt.transport.ipc:Client disconnected from IPC /var/run/salt/minion/minion_event_ce6006f7de_pull.ipc
Dec 15 10:52:55 minion01 salt-minion[292047]: DEBUG:salt.crypt:salt.crypt.get_rsa_key: Loading private key 
Dec 15 10:52:55 minion01 salt-minion[292047]: DEBUG:salt.crypt:Loaded minion key: /etc/salt/pki/minion/minion.pem
Dec 15 10:52:55 minion01 salt-minion[292047]: DEBUG:salt.crypt:salt.crypt.verify_signature: Loading public key 
Dec 15 10:52:55 minion01 salt-minion[292047]: DEBUG:salt.crypt:salt.crypt.get_rsa_pub_key: Loading public key 
Dec 15 10:52:55 minion01 salt-minion[292047]: DEBUG:salt.crypt:salt.crypt.verify_signature: Verifying signature
Dec 15 10:52:55 minion01 salt-minion[292047]: DEBUG:salt.channel.client:Closing AsyncReqChannel instance
Dec 15 10:52:55 minion01 salt-minion[292047]: DEBUG:salt.minion:Refreshing matchers.
Dec 15 10:52:55 minion01 salt-minion[292047]: DEBUG:salt.transport.ipc:Closing IPCMessageClient instance
Dec 15 10:52:55 minion01 salt-minion[292047]: DEBUG:salt.minion:Refreshing beacons.
Dec 15 10:52:55 minion01 salt-minion[292047]: DEBUG:salt.utils.event:SaltEvent PUB socket URI: /var/run/salt/minion/minion_event_ce6006f7de_pub.ipc
Dec 15 10:52:55 minion01 salt-minion[292047]: DEBUG:salt.utils.event:SaltEvent PULL socket URI: /var/run/salt/minion/minion_event_ce6006f7de_pull.ipc
Dec 15 10:52:55 minion01 salt-minion[292047]: TRACE:salt.transport.ipc:IPCClient: Connecting to socket: /var/run/salt/minion/minion_event_ce6006f7de_pull.ipc
Dec 15 10:52:56 minion01 salt-minion[292047]: TRACE:salt.transport.ipc:IPCClient: Connecting to socket: /var/run/salt/minion/minion_event_ce6006f7de_pull.ipc
Dec 15 10:52:56 minion01 salt-minion[292047]: ERROR:salt.utils.event:Unable to connect pusher: Stream is closed
Dec 15 10:52:56 minion01 salt-minion[292047]: TRACE:salt.channel.client:Failed to send msg SaltReqTimeoutError('Message timed out')
Dec 15 10:52:56 minion01 salt-minion[292047]: TRACE:salt.channel.client:ReqChannel send crypt load={'id': 'minion01.corp.ad.net', 'cmd': '_minion_event', 'pretag': None, 'tok': b'3\x1f1\xd3\x18\xc7Hg\xabS<\x832\x1ct\xdf\xebD\xae:%4\xaf\xbeb\xadcm\xf3\x11\xad\xc2\xe3T\xd2B\xa6\';\xf2\xc3\xa2\x8f\xe5z\x00\xb9\xcd\xfcW>\x0f\xf7U\xf5<e\x97C]\x0f\xb1~\x19x\xd7\xc2\x8e5D\x1e\xc3\xae\xadp\x05`\xf4\xe6\x7f8\xda\xae*b\x88\xd2y\x8f\x0fG!\xd0\xc9:\xba2j)\x1c\xd1\xe47\xfbeo\x96\xffI\xf9\x89<\xc5V\x84J?\x8e\xd0\xf7E\xe4\\\xe3\xa9F\xd3-\xff%\xe4M\x1aET\xbd}\xa60\x88\xb9\x06\x12\x04ul\xfe\xa7\n\xe5\xea\xa0\xcf\x87?\xa48\xday/\x1d\xa7\x8b9\xcab\xdf\xc6N\xf6\xf1\xc8F\xec\xe4\xf9J\xd6\xb8\xa1\x9d "f\x8a0q\xfd\x1b\xc63\x1e\xbe\x87>\x15\x98\x85J\xb0\xb9\xf7\xb1/\x8c|\xaeY\xeb\x81\x91\xd8\x1d\xad\xe0\xc8~\x95\xafB6\x0f\x8b\x97\x86<Y!\x04a\x8d` \xc9\x82~\xc8\xd3\x04\xc7\xee[\xb0H{\xf6\xcc^^2\xcbRc\xb7\xfb\x86', 'data': 'Minion minion01.corp.ad.net started at Thu Dec 15 10:51:40 2022', 'tag': 'minion_start', '_stamp': '2022-12-15T09:51:40.733525', 'nonce': '2edea3c4551a4f889cdd815a3d3a34be'}
Dec 15 10:54:24 minion01 systemd[1]: salt-minion.service: State 'stop-sigterm' timed out. Killing.
Dec 15 10:54:24 minion01 systemd[1]: salt-minion.service: Killing process 292022 (/opt/saltstack/) with signal SIGKILL.
Dec 15 10:54:24 minion01 systemd[1]: salt-minion.service: Main process exited, code=killed, status=9/KILL
Dec 15 10:54:24 minion01 systemd[1]: salt-minion.service: Failed with result 'timeout'.
Dec 15 10:54:24 minion01 systemd[1]: salt-minion.service: Unit process 292047 (/opt/saltstack/) remains running after unit stopped.
Dec 15 10:54:24 minion01 systemd[1]: Stopped The Salt Minion.
Dec 15 10:54:24 minion01 systemd[1]: salt-minion.service: Consumed 12.902s CPU time.
Dec 15 10:54:26 minion01 salt-minion[292047]: ERROR:salt.scripts:Minion process encountered exception: [Errno 3] No such process 
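
The systemd messages above show the old minion process (292047) surviving the unit stop and only dying on SIGKILL. As a minimal sketch of what I would check after the upgrade (the pkill fallback is an assumed workaround, not a confirmed fix):

    # verify that no stale salt-minion process survived the package upgrade
    ps aux | grep '[s]alt-minion'
    # if a leftover process remains, stop the service, kill the stale
    # process, and start the service again
    sudo systemctl stop salt-minion
    sudo pkill -9 -f salt-minion
    sudo systemctl start salt-minion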

Expected behavior
Communication from the master to the minion remains working.

Versions Report

salt --versions-report (Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)
Salt Version:
          Salt: 3005.1
 
Dependency Versions:
          cffi: 1.14.6
      cherrypy: 18.6.1
      dateutil: 2.8.1
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.1.0
       libgit2: Not Installed
      M2Crypto: 0.38.0
          Mako: Not Installed
       msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: 2.21
      pycrypto: Not Installed
  pycryptodome: 3.9.8
        pygit2: Not Installed
        Python: 3.9.15 (main, Nov  8 2022, 03:42:58)
  python-gnupg: 0.4.8
        PyYAML: 5.4.1
         PyZMQ: 23.2.0
         smmap: Not Installed
       timelib: 0.2.4
       Tornado: 4.5.3
           ZMQ: 4.3.4
 
System Versions:
          dist: ubuntu 22.04 jammy
        locale: utf-8
       machine: x86_64
       release: 5.15.0-56-generic
        system: Linux
       version: Ubuntu 22.04 jammy


@OrangeDog

Possibly related to #62881

@rayddteam

Hey guys!
In my case only one minion started to lose its connection over ZMQ.
After some investigation I found that the minion disconnected because the status beacon was enabled:
    status: []
Most likely it is related to the amount of data, or some special characters, that the beacon collects on a FreeBSD workstation.
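If anyone wants to test this, the beacon is configured in the beacons section of the minion config; a minimal sketch of disabling it (assuming it was enabled in /etc/salt/minion or a minion.d drop-in) is to comment it out and restart the minion:

    beacons:
      # status: []   # disabled while investigating the ZMQ disconnect

Thanks!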

@BlackMetalz

Confirmed

@monofumado

Any update on this? It also happens with 3006.1.

@djmmatracki

djmmatracki commented Dec 20, 2023

I updated from 3004.2 to 3006.5 and am seeing the same issue. After some time the minions stop responding to test.ping. Updated to 3006.5 with onedir.

@djmmatracki

@monofumado Did you find a fix or workaround for this issue?

@monofumado

Unfortunately not. I updated the server but still have the minions on 3004 because of the problem; if I update the minions to 3005 or 3006 they keep disconnecting from the master. The "workaround" is to restart the minions, but that doesn't really work for us since we don't have access to all devices.

@cdalvaro

cdalvaro commented Jan 9, 2024

I had a similar issue with my minions running on macOS (#64153).

I found that running the highstate with return_job: true made the minions lose connection with the master after the first run, so setting return_job: false fixed my issue.
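
For reference, return_job is a per-entry option of the Salt scheduler; a minimal sketch of the kind of schedule entry I mean (the entry name and interval are illustrative) in the minion config:

    schedule:
      highstate:
        function: state.apply
        minutes: 60
        return_job: false   # avoid the post-run disconnect described above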
