
[BUG] Parallel salt.state with ssh fails if same minion is targeted #62612

Open · 808brinks opened this issue Sep 2, 2022 · 3 comments
Labels: Bug (broken, incorrect, or confusing behavior), needs-triage, Salt-SSH
Description
Parallel salt.state with ssh fails if the same minion is targeted. One of the runs succeeds; the other fails without a clear error.

Setup
Salt 3005

parallel-same-minion.sls:

sleep-one:
  salt.state:
    - parallel: True
    - tgt: 'app1'
    - tgt_type: pcre
    - ssh: True
    - sls:
        - sleep

sleep-two:
  salt.state:
    - parallel: True
    - tgt: 'app1'
    - tgt_type: pcre
    - ssh: True
    - sls:
        - sleep

sleep.sls:

sleep:
  cmd.run:
    - name: sleep 2
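
For comparison, a serialized variant (my own sketch, not part of the original report; the filename is hypothetical): dropping parallel: True makes the orchestration run the two states one after another, which should avoid any race between concurrent salt-ssh sessions on the same host.

parallel-same-minion-serial.sls:

sleep-one:
  salt.state:
    # parallel omitted (defaults to False): states run in order
    - tgt: 'app1'
    - tgt_type: pcre
    - ssh: True
    - sls:
        - sleep

sleep-two:
  salt.state:
    - tgt: 'app1'
    - tgt_type: pcre
    - ssh: True
    - sls:
        - sleep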

Steps to Reproduce the behavior
Add the two SLS files above and run: sudo salt-run state.orchestrate parallel-same-minion.

See the following output:

----------
          ID: sleep-one
    Function: salt.state
      Result: False
     Comment: Run failed on minions: app1
     Started: 16:01:55.617138
    Duration: 3500.82 ms
     Changes:   
              app1:
              
              Summary for app1
              -----------
              Succeeded: 0
              Failed:   0
              -----------
              Total states run:    0
              Total run time:  0.000 ms
----------
          ID: sleep-two
    Function: salt.state
      Result: True
     Comment: States ran successfully. Updating app1.
     Started: 16:01:55.621592
    Duration: 5599.432 ms
     Changes:   
              app1:
              ----------
                        ID: sleep
                  Function: cmd.run
                      Name: sleep 2
                    Result: True
                   Comment: Command "sleep 2" run
                   Started: 16:01:58.902050
                  Duration: 2009.44 ms
                   Changes:   
                            ----------
                            pid:
                                1089286
                            retcode:
                                0
                            stderr:
                            stdout:
              
              Summary for app1
              ------------
              Succeeded: 1 (changed=1)
              Failed:    0
              ------------
              Total states run:     1
              Total run time:   2.009 s

Expected behavior
Either both runs should succeed, or one of them should fail with a clear error.
Instead, one of the results just shows Total states run: 0.


Versions Report

salt --versions-report:
Salt Version:
          Salt: 3005
 
Dependency Versions:
          cffi: Not Installed
      cherrypy: Not Installed
      dateutil: 2.8.1
     docker-py: Not Installed
         gitdb: 4.0.5
     gitpython: 3.1.14
        Jinja2: 2.11.3
       libgit2: Not Installed
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.0
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: Not Installed
      pycrypto: Not Installed
  pycryptodome: 3.9.7
        pygit2: Not Installed
        Python: 3.9.2 (default, Feb 28 2021, 17:03:44)
  python-gnupg: Not Installed
        PyYAML: 5.3.1
         PyZMQ: 20.0.0
         smmap: 4.0.0
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.3.4
 
System Versions:
          dist: debian 11 bullseye
        locale: utf-8
       machine: x86_64
       release: 5.10.0-16-amd64
        system: Linux
       version: Debian GNU/Linux 11 bullseye


808brinks added the Bug and needs-triage labels on Sep 2, 2022

welcome bot commented Sep 2, 2022

Hi there! Welcome to the Salt Community! Thank you for making your first contribution. We have a lengthy process for issues and PRs. Someone from the Core Team will follow up as soon as possible. In the meantime, here’s some information that may help as you continue your Salt journey.
Please be sure to review our Code of Conduct, and check out some of our community resources.

There are lots of ways to get involved in our community. Every month, there are around a dozen opportunities to meet with other contributors and the Salt Core team and collaborate in real time. The best way to keep track is by subscribing to the Salt Community Events Calendar.
If you have additional questions, email us at saltproject@vmware.com. We’re glad you’ve joined our community and look forward to doing awesome things with you!


Rudd-O commented Sep 23, 2022

I can confirm the same issue happens to me. It's super easy to replicate: just run an orchestration SLS with parallel: True in three or more states targeting the same machine. Boom, most of them don't complete.

There seems to be a race condition when deploying the Salt thin directory and tarball. There are no logs on the target machine indicating problems, nor does the salt-run command offer any logs.

(To start with, the thin directory shouldn't need to be regenerated or redeployed on every execution. This suggests to me that there is a race condition in the way Salt generates the tarball. Alternatively or additionally, there must be a file on the target machine that is a shared resource and gets overwritten during execution, so one of the parallel state-application processes "wins" and the others die.)


Rudd-O commented Sep 23, 2022

Good stuff to report. With no thin dir options in my Saltfile, an orchestration run that targets the same machine across different states doesn't work (only one of the runs "wins" and actually executes something). Watching the temporary directory very clearly shows that Salt has attempted to deploy the thin dir multiple times.

With a fixed thin dir, same result.

Even with rand_thin_dir in the Saltfile, same result. At no point does Salt attempt to select a different thin dir for the different parallel salt-ssh runs (which I can see in my process list!) started by salt-run.
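
For reference, a minimal Saltfile sketch of the two configurations described above; the exact path and key placement are my assumption, not taken from the comment (thin_dir can also be set per host in the salt-ssh roster):

salt-ssh:
  # tried one at a time, per the comment above:
  thin_dir: /var/tmp/salt-thin   # a fixed thin dir (assumed path)
  rand_thin_dir: True            # randomize the thin dir per deployment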
