Complex nodegroups result in false "Minion did not return" errors #65523

@matthewsht

Description
All,
We use nodegroups in our monthly patching cycle, building the final list of hosts to patch from several "not" groups ANDed together. These groups are in turn grain-based and list-based. As a result, 4 hosts consistently yield "Minion did not return" errors. Those minions should not have received any command at all, so this is a false error. [Sorry, this is hard to describe.]

Setup
All affected systems are on 3005.3.
All systems are directly connected to salt-master03.
Note: an upgrade to 3006 is scheduled, but we're a government site and can't just push the patch out.

nodegroups.conf contains these relevant lines (other comments and unrelated nodegroups elided):

nodegroups:
  patch-excluded: '' # systems that are not patched on an existing schedule, or are excluded this month
  patch-foundation-q: '( N@backup-servers or L@distro-master,salt-master03 ) and not N@patch-excluded'
  not-hpc-internal: 'G@hpc_internal:False'
  # has bug
  patch-normal: ' N@not-hpc-internal and not N@patch-excluded and not N@patch-foundation-q'
  # does not have bug
  #patch-normal: 'N@not-hpc-internal and not N@patch-excluded'
  # has bug
  #patch-normal: 'not N@patch-foundation and N@not-hpc-internal and not N@patch-excluded'
  backup-servers: 'L@backup-slave,backup-master'

I've tried moving the "backup-servers" nodegroup before patch-normal, but the problem is not order dependent.
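To make the nesting easier to reason about, here is a minimal sketch that flattens the `N@` references in the nodegroups above into a single compound expression. It is only an illustration of what the nested definition means, not Salt's own expansion code; the `expand` helper and the `NODEGROUPS` dict are hypothetical names introduced for this example.

```python
import re

# Nodegroup definitions copied from the config in this report.
NODEGROUPS = {
    "patch-excluded": "",
    "patch-foundation-q": "( N@backup-servers or L@distro-master,salt-master03 ) and not N@patch-excluded",
    "not-hpc-internal": "G@hpc_internal:False",
    "patch-normal": " N@not-hpc-internal and not N@patch-excluded and not N@patch-foundation-q",
    "backup-servers": "L@backup-slave,backup-master",
}


def expand(name, groups, seen=()):
    """Recursively replace each N@<group> reference with that group's expression.

    Hypothetical helper for illustration; not how Salt is known to do it.
    """
    if name in seen:
        raise ValueError(f"circular nodegroup reference: {name}")
    expr = groups[name].strip()

    def substitute(match):
        inner = expand(match.group(1), groups, seen + (name,))
        return f"( {inner} )"

    return re.sub(r"N@([\w-]+)", substitute, expr)


print(expand("patch-normal", NODEGROUPS))
```

In this sketch the empty `patch-excluded` group expands to an empty pair of parentheses. How Salt itself treats an empty nodegroup is not shown here.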

Please be as specific as possible and give set-up details.

  • on-prem machine
  • VM (Virtualbox, KVM, etc. please specify): some of these are VMs, some are physical hardware.
  • VM running on a cloud service, please be explicit and add details
  • container (Kubernetes, Docker, containerd, etc. please specify)
  • or a combination, please be explicit
  • jails if it is FreeBSD
  • classic packaging
  • onedir packaging
  • used bootstrap to install

Steps to Reproduce the behavior
This nodegroup setup should yield a list of ALL of our systems that are not hpc-internal (not-hpc-internal), EXCEPT the hosts specifically listed in N@backup-servers and N@patch-foundation-q.
That explicit list is distro-master, salt-master03, backup-slave, backup-master.
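The intended result can be modeled with plain set operations. The sketch below uses a hypothetical inventory (the `hpc-node1` host and the `hpc_internal` values are assumptions made for illustration, including the assumption that the four excluded hosts have `hpc_internal: False`) to show which hosts `patch-normal` is expected to match.

```python
# Hypothetical inventory: host name -> value of the hpc_internal grain.
# The grain values and the hpc-node1 host are assumptions for this example.
hosts = {
    "system1": False,
    "system2": False,
    "system3": False,
    "hpc-node1": True,
    "distro-master": False,
    "salt-master03": False,
    "backup-slave": False,
    "backup-master": False,
}

# G@hpc_internal:False
not_hpc_internal = {name for name, internal in hosts.items() if internal is False}

# patch-excluded is empty in the reported config.
patch_excluded = set()

# L@backup-slave,backup-master
backup_servers = {"backup-slave", "backup-master"}

# ( N@backup-servers or L@distro-master,salt-master03 ) and not N@patch-excluded
patch_foundation_q = (backup_servers | {"distro-master", "salt-master03"}) - patch_excluded

# N@not-hpc-internal and not N@patch-excluded and not N@patch-foundation-q
patch_normal = not_hpc_internal - patch_excluded - patch_foundation_q

print(sorted(patch_normal))
```

Under these assumptions the four foundation and backup hosts fall outside `patch_normal`, which is why no response from them should be expected.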

These 4 hosts yield the error referenced:

salt -N patch-normal test.ping
system1:
    True
system2:
    True
system3:
AND MANY OTHERS
salt-master03:
    Minion did not return. [No response]
    The minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later, run the following command:

    salt-run jobs.lookup_jid 20231108154033009594
backup-slave:
    Minion did not return. [No response]
    The minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later, run the following command:

    salt-run jobs.lookup_jid 20231108154033009594
backup-master:
    Minion did not return. [No response]
    The minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later, run the following command:

    salt-run jobs.lookup_jid 20231108154033009594
distro-master:
    Minion did not return. [No response]
    The minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later, run the following command:

    salt-run jobs.lookup_jid 20231108154033009594
ERROR: Minions returned with non-zero exit code

This bug report is specifically about why these 4 nodes report this error; everything else is working as intended.

Expected behavior
We expect the command to not generate errors for the 4 systems specifically excluded.
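One way to check this expectation against a captured run is to split the CLI output into minions that returned and minions reported as not returning, then compare the second set with the excluded hosts. This is a rough sketch that assumes the text layout shown in this report; `split_results`, `SAMPLE`, and `EXCLUDED` are hypothetical names, and `SAMPLE` is an abbreviated excerpt rather than the full output.

```python
# Hosts that the nodegroup definition is meant to exclude (from this report).
EXCLUDED = {"distro-master", "salt-master03", "backup-slave", "backup-master"}

# Abbreviated excerpt in the same layout as the output in this report.
SAMPLE = """\
system2:
    True
salt-master03:
    Minion did not return. [No response]
backup-slave:
    Minion did not return. [No response]
ERROR: Minions returned with non-zero exit code
"""


def split_results(output):
    """Return (returned, missing) minion name sets parsed from salt CLI text output."""
    returned, missing = set(), set()
    current = None
    for line in output.splitlines():
        stripped = line.strip()
        # A minion header is an unindented line ending in a colon.
        if line and not line[0].isspace() and stripped.endswith(":"):
            current = stripped[:-1]
        elif current and stripped == "True":
            returned.add(current)
        elif current and stripped.startswith("Minion did not return"):
            missing.add(current)
    return returned, missing


returned, missing = split_results(SAMPLE)
# Minions reported as missing even though they were meant to be excluded.
falsely_reported = missing & EXCLUDED
print(sorted(falsely_reported))
```

With the expected behavior, `falsely_reported` would be empty for a real run of `salt -N patch-normal test.ping`.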

Versions Report

salt --versions-report

```shell
Salt Version:
Salt: 3005.3

Dependency Versions:
cffi: 1.14.6
cherrypy: unknown
dateutil: 2.8.1
docker-py: Not Installed
gitdb: 4.0.10
gitpython: 3.1.37
Jinja2: 3.1.0
libgit2: Not Installed
M2Crypto: Not Installed
Mako: Not Installed
msgpack: 1.0.2
msgpack-pure: Not Installed
mysql-python: Not Installed
pycparser: 2.21
pycrypto: Not Installed
pycryptodome: 3.9.8
pygit2: Not Installed
Python: 3.9.18 (main, Nov 1 2022, 00:00:00)
python-gnupg: 0.4.8
PyYAML: 6.0.1
PyZMQ: 23.2.0
smmap: 5.0.1
timelib: 0.2.4
Tornado: 4.5.3
ZMQ: 4.3.4

System Versions:
dist: centos 9
locale: utf-8
machine: x86_64
release: 5.14.0-370.el9.x86_64
system: Linux
version: CentOS Stream 9
```


I can pretty easily add/modify these nodegroups for testing - please let me know.

Labels: Core, bug, needs-triage
