[Regression] Failhard, batch and retcodes #54521

Oloremo · 2019-09-18T20:20:48Z

Description of Issue

There seems to be a long-forgotten issue with failhard support in batch mode.

The ability to stop batch execution after the first encountered error appears to have been added in #22855.

Since then, it has been reported many times as broken or not working as expected:
#29643 - supposed to be fixed by #31164
#44256 - not resolved, closed as stale.
#24996 - supposed to be fixed by #33048

As of 2019.2.0, a Salt orchestration with failhard: True does not stop after the first error. Instead, it continues to execute on all hosts, batch by batch.

This behavior is very dangerous. It can apply a broken state to the whole fleet before you even notice, it can lead to data loss, and it is a huge pain for any consensus-based cluster bootstrapping, such as MySQL Galera.
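Below is a minimal Python sketch of the batch semantics this report expects. It is written purely for illustration and is not Salt's implementation. `run_in_batches`, the `run` callback, and the minion names are hypothetical. Once any minion in a batch fails and failhard is set, no further batch is dispatched, and the overall retcode is non-zero.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical per-minion runner: returns True if that minion's state run succeeded.
MinionRunner = Callable[[str], bool]


def run_in_batches(
    minions: List[str],
    batch_size: int,
    run: MinionRunner,
    failhard: bool,
) -> Tuple[Dict[str, bool], int]:
    """Run `run` on minions batch by batch and return (results, retcode).

    Illustrates the expected failhard behavior only; this is not Salt's code.
    """
    results: Dict[str, bool] = {}
    for start in range(0, len(minions), batch_size):
        batch = minions[start:start + batch_size]
        for minion in batch:
            results[minion] = run(minion)
        # With failhard, a failure anywhere in this batch must stop
        # any later batch from being dispatched.
        if failhard and not all(results[m] for m in batch):
            break
    # Non-zero retcode if any minion that actually ran reported a failure.
    retcode = 0 if all(results.values()) else 1
    return results, retcode


# Every minion fails (like will_fail); batch size 1, failhard on.
print(run_in_batches(["web1", "web2", "web3"], 1, lambda m: False, failhard=True))
# ({'web1': False}, 1)
```

The sketch checks for failures only after a whole batch has returned, on the assumption that minions within one batch run concurrently. Without failhard, the same call would run all three minions and still return retcode 1.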

Setup

Steps to Reproduce Issue

Create an orchestration file:

issue:
  salt.state:
    - tgt: '*'
    - sls:
      - bug
    - batch: 1

Create a state file named bug:

first:
  cmd.run:
    - name: echo "first"

will_fail:
  http.query:
    - name: 'http://127.0.0.1:10000'
    - status: 200

unreachable:
  cmd.run:
    - name: echo "unreachable"

Execute this orchestration with failhard: True set in the configuration.

Expected: execution stops after the first node returns an error from the failing will_fail state.
Currently: the state fails on will_fail, but every node still executes the state, batch by batch.
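For comparison, here is a small Python sketch of the per-minion behavior that failhard implies for the bug state file. It assumes the states run in the order they are declared. With failhard, the run stops at will_fail, so the state named unreachable is never executed. The functions stand in for the three states and are hypothetical; this is not how Salt's state compiler is implemented.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical stand-ins for the three states; each returns True on success.
def first() -> bool:
    return True  # cmd.run: echo "first"


def will_fail() -> bool:
    return False  # http.query against a port where nothing listens


def unreachable() -> bool:
    return True  # cmd.run: echo "unreachable"


STATES: List[Tuple[str, Callable[[], bool]]] = [
    ("first", first),
    ("will_fail", will_fail),
    ("unreachable", unreachable),
]


def run_state_file(states: List[Tuple[str, Callable[[], bool]]], failhard: bool) -> Dict[str, bool]:
    """Run states in declared order; with failhard, stop at the first failure."""
    results: Dict[str, bool] = {}
    for name, func in states:
        results[name] = func()
        if failhard and not results[name]:
            break
    return results


print(run_state_file(STATES, failhard=True))
# {'first': True, 'will_fail': False}
```

The batch-level equivalent is what this issue reports as broken: the failure of one minion's run should also prevent later batches from starting.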

Versions Report

Salt Version:
           Salt: 2019.2.0

Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: Not Installed
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.10.1
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.6.1
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 3.6.6 (default, Aug 13 2018, 18:24:23)
   python-gnupg: 0.4.4
         PyYAML: 5.1
          PyZMQ: 18.0.1
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.5.3
            ZMQ: 4.3.1

System Versions:
           dist: centos 7.5.1804 Core
         locale: UTF-8
        machine: x86_64
        release: 3.10.0-862.3.2.el7.x86_64
         system: Linux
        version: CentOS Linux 7.5.1804 Core

Oloremo · 2019-11-05T12:43:56Z

Fixed in 2019.2.2

Oloremo changed the title from "Failhard, batch and retcodes" to "[Regression] Failhard, batch and retcodes" Sep 18, 2019

mattp- added a commit to bloomberg/salt that referenced this issue Sep 20, 2019

fix saltstack#54521; failhard not being respected in batch/orch

178d2ab

also noticed a bug in cli.batch itself, probably from backport mistake. fixed that as well with the unindent.

Oloremo mentioned this issue Sep 20, 2019

failhard not being respected in batch/orch #54701

Closed

sbrennan4 pushed a commit to sbrennan4/salt that referenced this issue Sep 25, 2019

fix saltstack#54521; failhard not being respected in batch/orch

fd8bab9

also noticed a bug in cli.batch itself, probably from backport mistake. fixed that as well with the unindent.

Oloremo mentioned this issue Sep 29, 2019

[Regression] Batch with failhard fix #54806

Merged

garethgreenaway added the needs-triage label Oct 14, 2019

garethgreenaway assigned Akm0d Oct 14, 2019

Oloremo closed this as completed Nov 5, 2019
