[Regression] Failhard, batch and retcodes #54521

Closed
Oloremo opened this issue Sep 18, 2019 · 1 comment

@Oloremo
Contributor

Oloremo commented Sep 18, 2019

Description of Issue

There seems to be a long-forgotten issue with failhard support in batch mode.

The ability to stop batch execution after the first encountered error appears to have been added in #22855.

Since then, it has been reported many times as broken or not working as expected:
#29643 - supposed to be fixed by #31164
#44256 - not resolved, closed as stale.
#24996 - supposed to be fixed by #33048

As of 2019.2.0, Salt orchestration with failhard: True does not stop after the first error but continues to execute on all hosts, batch by batch.

This behavior is very dangerous. It can apply a broken state to the whole fleet before you even notice, it can cause data loss, and it is a huge pain for any network-consensus type of bootstrapping such as MySQL Galera.

Setup

Steps to Reproduce Issue

Create an orchestration file:

issue:
  salt.state:
    - tgt: '*'
    - sls:
      - bug
    - batch: 1

Create a state file bug:

first:
  cmd.run:
    - name: echo "first"

will_fail:
  http.query:
    - name: 'http://127.0.0.1:10000'
    - status: 200

unreachable:
  cmd.run:
    - name: echo "unreachable"

Execute this orchestration with failhard: True set in the configuration.

Expected: Execution stops after the first node returns an error from the failing state will_fail.
Actual: The state fails on will_fail on every node, but all nodes still execute it, batch by batch.
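
To make the difference between the expected and the observed behavior concrete, here is a minimal Python sketch of batch execution with and without a working failhard. This is not Salt's code: `run_in_batches`, `apply_state`, `always_fails` and the node names are hypothetical, and the observed behavior is modeled simply as the failhard flag being ignored.

```python
from typing import Callable, Dict, Iterable, List


def chunk(minions: List[str], size: int) -> Iterable[List[str]]:
    """Yield consecutive batches of at most `size` minions."""
    for start in range(0, len(minions), size):
        yield minions[start:start + size]


def run_in_batches(
    minions: List[str],
    batch_size: int,
    apply_state: Callable[[str], bool],
    failhard: bool,
) -> Dict[str, bool]:
    """Apply a state batch by batch and return {minion: succeeded}.

    With failhard=True, no further batch is started once any minion in the
    current batch reports a failure. (Hypothetical model, not Salt internals.)
    """
    results: Dict[str, bool] = {}
    for batch in chunk(minions, batch_size):
        for minion in batch:
            results[minion] = apply_state(minion)
        if failhard and not all(results[m] for m in batch):
            break  # expected behavior: stop before touching the next batch
    return results


def always_fails(minion: str) -> bool:
    # Stand-in for the `bug` state, whose will_fail step fails on every node.
    return False


fleet = ["node1", "node2", "node3"]  # hypothetical minion IDs

# Expected: failhard is honored, so only the first batch runs.
expected = run_in_batches(fleet, 1, always_fails, failhard=True)

# Reported: failhard is effectively ignored, so every batch runs.
reported = run_in_batches(fleet, 1, always_fails, failhard=False)

print(list(expected))  # ['node1']
print(list(reported))  # ['node1', 'node2', 'node3']
```

With `batch: 1`, the expected run touches a single node, while the reported run applies the failing state to the whole fleet, which is exactly the risk described above.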

Versions Report

Salt Version:
           Salt: 2019.2.0

Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: Not Installed
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.10.1
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.6.1
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 3.6.6 (default, Aug 13 2018, 18:24:23)
   python-gnupg: 0.4.4
         PyYAML: 5.1
          PyZMQ: 18.0.1
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.5.3
            ZMQ: 4.3.1

System Versions:
           dist: centos 7.5.1804 Core
         locale: UTF-8
        machine: x86_64
        release: 3.10.0-862.3.2.el7.x86_64
         system: Linux
        version: CentOS Linux 7.5.1804 Core
@Oloremo changed the title from "Failhard, batch and retcodes" to "[Regression] Failhard, batch and retcodes" on Sep 18, 2019
mattp- added a commit to bloomberg/salt that referenced this issue Sep 20, 2019
also noticed a bug in cli.batch itself, probably from backport mistake.
fixed that as well with the unindent.
sbrennan4 pushed a commit to sbrennan4/salt that referenced this issue Sep 25, 2019
also noticed a bug in cli.batch itself, probably from backport mistake.
fixed that as well with the unindent.
@Oloremo
Contributor Author

Oloremo commented Nov 5, 2019

Fixed in 2019.2.2

@Oloremo Oloremo closed this as completed Nov 5, 2019