[BUG] [v3000] [vRC1 Sodium] [v3001] slsutil.renderer when called through salt-call not exiting after run #57574

Closed
whytewolf opened this issue Jun 5, 2020 · 27 comments · Fixed by #58364
Labels: Bug (broken, incorrect, or confusing behavior), fixed-pls-verify (fix is linked, bug author to confirm fix), Magnesium (Mg release after Na prior to Al), salt-call, severity-high (2nd top severity, seen by most users, causes major problems), ZMQ

Comments

@whytewolf
Contributor

Description
Running salt-call slsutil.renderer is causing a freeze after the output.

Setup
Install Salt and render a file with salt-call slsutil.renderer. The command hangs after printing its output, and you have to hit Ctrl+C to get a prompt back.

Steps to Reproduce the behavior
salt-call slsutil.renderer salt://test/init.sls (the file doesn't need to exist).

Expected behavior
salt-call exits after the module returns.

Versions Report

salt-call test.versions
local:
    Salt Version:
               Salt: 3001rc1

    Dependency Versions:
               cffi: Not Installed
           cherrypy: 8.9.1
           dateutil: 2.7.3
          docker-py: Not Installed
              gitdb: 2.0.6
          gitpython: 3.0.7
             Jinja2: 2.10.1
            libgit2: 0.28.3
           M2Crypto: Not Installed
               Mako: Not Installed
       msgpack-pure: Not Installed
     msgpack-python: 0.6.2
       mysql-python: Not Installed
          pycparser: Not Installed
           pycrypto: 2.6.1
       pycryptodome: 3.6.1
             pygit2: 1.0.3
             Python: 3.8.2 (default, Apr 27 2020, 15:53:34)
       python-gnupg: 0.4.5
             PyYAML: 5.3.1
              PyZMQ: 18.1.1
              smmap: 2.0.5
            timelib: Not Installed
            Tornado: 4.5.3
                ZMQ: 4.3.2

    System Versions:
               dist: ubuntu 20.04 focal
             locale: utf-8
            machine: x86_64
            release: 5.4.0-29-generic
             system: Linux
            version: Ubuntu 20.04 focal
@whytewolf whytewolf added Bug broken, incorrect, or confusing behavior severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around ZRelease-Sodium retired label labels Jun 5, 2020
@b-a-t

b-a-t commented Jun 5, 2020

We rely heavily on slsutil.renderer for syntax validation of SLS files in our Git hooks, so this looks pretty bad to me.
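
A hook that shells out to salt-call can at least avoid blocking forever by putting a timeout around the call. The following is a minimal sketch, not anything shipped with Salt: the helper names run_with_timeout and render_sls and the 60-second limit are assumptions chosen for illustration.

```python
import subprocess


def run_with_timeout(cmd, timeout):
    """Run cmd and return (finished, output).

    finished is False when the process had to be killed after `timeout` seconds.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            timeout=timeout,
        )
        return True, proc.stdout
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child before re-raising. Whatever was
        # printed before the hang is still available, possibly as bytes.
        out = exc.output or ""
        if isinstance(out, bytes):
            out = out.decode(errors="replace")
        return False, out


def render_sls(path, timeout=60):
    """Hypothetical hook helper: render one SLS file, tolerating the hang."""
    return run_with_timeout(["salt-call", "slsutil.renderer", path], timeout)
```

With this shape, a hook can still inspect the rendered output when finished is False, since the hang described here happens only after the output has been printed.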

@frogunder frogunder added this to the Approved milestone Jun 8, 2020
@frogunder
Contributor

@whytewolf Thanks for reporting this issue.

@OrangeDog
Collaborator

Sounds like the same thing as #57456

@sagetherage sagetherage added the v3001.1 vulnerable version label Jun 10, 2020
@sagetherage sagetherage added severity-high 2nd top severity, seen by most users, causes major problems and removed severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around labels Jun 10, 2020
@sagetherage
Contributor

Changing the severity to high. The definitions are subjective, but my gut tells me this is high rather than medium.

@sagetherage sagetherage removed the ZRelease-Sodium retired label label Jun 12, 2020
@piterpunk
Collaborator

I guess this bug was silently fixed by other changes.

I tried to reproduce it using the 2020-06-09 git checkout with a non-existent file:

root@marvin:~# salt-call slsutil.renderer salt://test/init.sls
[ERROR   ] Unable to fetch file salt://test/init.sls from saltenv base.
[ERROR   ] Template was specified incorrectly: False
local:
    ----------

And with an existing one:

root@marvin:~# salt-call slsutil.renderer salt://lvmtst/init.sls
local:
    ----------
    lv_opt:
        ----------
        lvm.lv_present:
            |_
              ----------
              name:
                  lala
            |_
              ----------
              vgname:
                  marvinvg01
            |_
              ----------
              size:
                  512

salt --versions-report

Salt Version:
           Salt: 3001
 
Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: Not Installed
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
         Jinja2: 2.11.2
        libgit2: Not Installed
       M2Crypto: 0.35.2
           Mako: 1.1.3
   msgpack-pure: Not Installed
 msgpack-python: 0.5.6
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: 3.9.7
         pygit2: Not Installed
         Python: 3.8.3 (default, May 15 2020, 05:51:00)
   python-gnupg: Not Installed
         PyYAML: 3.13
          PyZMQ: 18.1.1
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.5.3
            ZMQ: 4.3.2
 
System Versions:
           dist: slackware 14.2 current
         locale: utf-8
        machine: i686
        release: 5.4.45
         system: Linux
        version: Slackware 14.2 current

@OrangeDog
Collaborator

OrangeDog commented Jun 16, 2020

@piterpunk or it's not reproducible on Slackware. Do you see the problem using the RC instead?

@piterpunk
Collaborator

> @piterpunk or it's not reproducible on Slackware. Do you see the problem using the RC instead?

The reproduction issue doesn't seem to be Slackware-related. I just created a CentOS 8 machine on Linode and got the same results, with no hangs, using the code from git:

# salt-call slsutil.renderer salt://test/non-existent-file.sls
[ERROR   ] Unable to fetch file salt://test/non-existent-file.sls from saltenv base.
[ERROR   ] Template was specified incorrectly: False
local:
    ----------
# salt-call slsutil.renderer salt://othertst/init.sls
local:
    ----------
    sshd:
        service.running

salt --versions-report

Salt Version:
           Salt: 3001rc1-70-gb95213ec90
 
Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: Not Installed
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
         Jinja2: 2.11.2
        libgit2: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.6.2
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: 3.9.7
         pygit2: Not Installed
         Python: 3.8.0 (default, May  7 2020, 02:49:39)
   python-gnupg: Not Installed
         PyYAML: 5.3.1
          PyZMQ: 19.0.1
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.5.3
            ZMQ: 4.3.2
 
System Versions:
           dist: centos 8 Core
         locale: utf-8
        machine: x86_64
        release: 4.18.0-147.8.1.el8_1.x86_64
         system: Linux
        version: CentOS Linux 8 Core

@OrangeDog
Collaborator

@piterpunk is that the RC release or a later version?
The people who have reported it have all been using Ubuntu 20.04.

@piterpunk
Collaborator

> @piterpunk is that the RC release or a later version?
> The people who have reported it have all been using Ubuntu 20.04.

It's a later version. I usually check a bug against the latest version to see whether it's already fixed, so I don't start writing code for nothing.

I see now that I need to better understand the development dynamics here.

Should I try the current code on Ubuntu 20.04 to see if this issue is gone there too and, if it is, bisect to find the commit that fixed it?
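
For the bisect step, a small test script could be handed to git bisect run. This is a sketch under assumptions: the function name classify and the timeout are made up for illustration, and it only works if salt-call actually runs the code of the checked-out commit. Because the goal is to find the commit that fixed the hang, a hang counts as the "old" behavior (exit code 0) and a clean exit as the "new" one (exit code 1); 125 tells git bisect run to skip a commit that cannot be tested.

```python
import subprocess

# Exit codes understood by `git bisect run`: 0 = old/good, 1 = new/bad, 125 = skip.
OLD, NEW, SKIP = 0, 1, 125


def classify(cmd, timeout):
    """Map one run of cmd to a bisect exit code.

    A hang is the "old" (unfixed) behavior; exiting within `timeout`
    seconds is the "new" (fixed) behavior.
    """
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return OLD   # still hangs: the fix is not in this commit yet
    except OSError:
        return SKIP  # command could not be started: commit is untestable
    return NEW       # exited on its own: the fix is present
```

A wrapper script would end with something like sys.exit(classify(["salt-call", "slsutil.renderer", "salt://test/init.sls"], timeout=60)).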

@OrangeDog
Collaborator

> @piterpunk or it's not reproducible on Slackware. Do you see the problem using the RC instead?

You should check this. First try the RC code and confirm the issue is there. Then if the current code on the same system doesn't have the issue then it is fixed.

@piterpunk
Collaborator

I tested 3001rc1 on a Slackware machine and the issue was present there:

[TRACE   ] data = {'local': OrderedDict([('lv_opt', OrderedDict([('lvm.lv_present', [OrderedDict([('name', 'lala')]), OrderedDict([('vgname', 'marvinvg01')]), OrderedDict([('size', 512)])])])), ('pv_fail', OrderedDict([('lvm.pv_present', [OrderedDict([('name', '/dev/vdc')])])]))])}
local:
    ----------
    lv_opt:
        ----------
        lvm.lv_present:
            |_
              ----------
              name:
                  lala
            |_
              ----------
              vgname:
                  marvinvg01
            |_
              ----------
              size:
                  512
    pv_fail:
        ----------
        lvm.pv_present:
            |_
              ----------
              name:
                  /dev/vdc
[DEBUG   ] Closing AsyncZeroMQReqChannel instance

Execution waits forever at this last line, as described by the OP.

salt --versions-report

Salt Version:
           Salt: 3001
 
Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: Not Installed
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
         Jinja2: 2.11.2
        libgit2: Not Installed
       M2Crypto: 0.35.2
           Mako: 1.1.3
   msgpack-pure: Not Installed
 msgpack-python: 0.5.6
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: 3.9.7
         pygit2: Not Installed
         Python: 3.8.3 (default, May 15 2020, 02:05:39)
   python-gnupg: Not Installed
         PyYAML: 3.13
          PyZMQ: 19.0.1
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.5.3
            ZMQ: 4.3.2
 
System Versions:
           dist: slackware 14.2 current
         locale: utf-8
        machine: x86_64
        release: 5.4.43
         system: Linux
        version: Slackware 14.2 current

@sagetherage sagetherage changed the title [BUG] [RC1 Sodium] slsutil.renderer when called through salt-call not exiting after run [BUG] [v3000] [vRC1 Sodium] [v3001] slsutil.renderer when called through salt-call not exiting after run Jun 19, 2020
@sagetherage
Contributor

Trying something with the title on this issue: the versions the bug is reported in go in the brackets. We may all hate it, so I'm only trying it on this one issue ATM. This and the referenced issue point to more issues, and we have the Open Core Team looking at what we can fix in the 3001.1 point release. There is likely more to fix in Magnesium.

@OrangeDog
Collaborator

@sagetherage usually that's what the labels are for, and you use the milestone to show the planned fix release

@sagetherage
Contributor

Yes, we abuse labels, though.

@sagetherage
Contributor

And right now I can't give community members the ability to apply labels; I'm working on it.

@waynew waynew moved this from To do to Blocked in 3001.1 Bugfix release Jun 20, 2020
@CostelLupoaie

CostelLupoaie commented Jun 22, 2020

Also tested it on Ubuntu 20.04:

Salt Version:
           Salt: 3001

Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: 2.7.3
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
         Jinja2: 2.10.1
        libgit2: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.6.2
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: Not Installed
   pycryptodome: 3.6.1
         pygit2: Not Installed
         Python: 3.8.2 (default, Apr 27 2020, 15:53:34)
   python-gnupg: 0.4.5
         PyYAML: 5.3.1
          PyZMQ: 18.1.1
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.5.3
            ZMQ: 4.3.2

System Versions:
           dist: ubuntu 20.04 focal
         locale: utf-8
        machine: x86_64
        release: 5.4.0-1015-aws
         system: Linux
        version: Ubuntu 20.04 focal 

In our case it seems to fail when we have more than one file in the definitions. For instance, a simple SLS with a single file:

local:
----------
          ID: vim_rc_file
    Function: file.managed
        Name: /root/.vimrc
      Result: True
     Comment: File /root/.vimrc is in the correct state
     Started: 13:17:07.210639
    Duration: 21.916 ms
     Changes:

Summary for local
------------
Succeeded: 1
Failed:    0
------------
Total states run:     1
Total run time:  21.916 ms

was ok. However:

local:
----------
          ID: syslog
    Function: pkg.latest
        Name: rsyslog
      Result: True
     Comment: Package rsyslog is already up-to-date
     Started: 13:17:42.593908
    Duration: 1531.023 ms
     Changes:
----------
          ID: authlogs_to_logserver_config
    Function: file.managed
        Name: /etc/rsyslog.d/50-authlog.conf
      Result: True
     Comment: File /etc/rsyslog.d/50-authlog.conf is in the correct state
     Started: 13:17:44.129623
    Duration: 17.261 ms
     Changes:
----------
          ID: syslogs_to_logserver_config
    Function: file.managed
        Name: /etc/rsyslog.d/50-syslogs.conf
      Result: True
     Comment: File /etc/rsyslog.d/50-syslogs.conf is in the correct state
     Started: 13:17:44.147013
    Duration: 8.082 ms
     Changes:
----------
          ID: syslog
    Function: service.running
        Name: rsyslog
      Result: True
     Comment: The service rsyslog is already running
     Started: 13:17:44.155312
    Duration: 31.784 ms
     Changes:

Summary for local
------------
Succeeded: 4
Failed:    0
------------
Total states run:     4
Total run time:   1.588 s
^C

did not exit (thus the Ctrl+C).

In our tests it fails to exit for states having service.running, cmd.wait, archive.extracted, so maybe it has something to do with the cleanup.

@krionbsd
Contributor

Yeah, it seems every salt-call invocation on FreeBSD 13 and Salt 3001 has this problem

@piterpunk
Collaborator

@CostelLupoaie pointed out a hang when applying states with more than one file.
@krionbsd reported that all salt-call invocations hang on FreeBSD 13 with Salt 3001.

Is the guess that these are all related to the original slsutil.renderer bug?

@piterpunk
Collaborator

When I was working on #57669, I ran into the same never-ending salt-call. There it was related to modules/disks.py and the optional loading.

It was solved with commit 8018b2a; maybe something similar is happening here.

@xcorvis

xcorvis commented Jun 22, 2020

Can confirm this happens in some of my states in 3001 on Ubuntu 20.04. The same states running on an Ubuntu 16.04 system don't hang (at least, not as far as I've seen). This is in my bento test environment, so it's pretty vanilla. I can provide more details if you think it would be helpful.

@waynew
Contributor

waynew commented Jun 22, 2020

@xcorvis the more of an MCVE, the better. I haven't been able to repro personally - either on 20.04, or FreeBSD 12.1. So I'm sure that there is some essential difference between my setup and everyone else seeing this problem.

@xcorvis

xcorvis commented Jun 22, 2020

@waynew Sure. This state proved reproducible:

nginx:
  pkg.installed

On the master (salt '*' state.sls teststate), it executed normally. On minion-xenial (salt-call state.sls teststate), it worked fine with no issues. On minion-focal (same command), it executed but hung on the "Closing AsyncZeroMQReqChannel instance" line. It also hung with test=true, and whether or not nginx was already installed.

The only odd settings on my master server might be top_file_merging_strategy: same and default_top: base; otherwise it's a pretty vanilla setup. Minions had no special config. These were fresh VMs made from the most recent bento VirtualBox images. I used Vagrant with salt-bootstrap and installed ifupdown and the VirtualBox guest additions.

master salt -V:

Salt Version:
           Salt: 3001
 
Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: 2.7.3
      docker-py: Not Installed
          gitdb: 2.0.6
      gitpython: 3.0.7
         Jinja2: 2.10.1
        libgit2: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.6.2
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: Not Installed
   pycryptodome: 3.6.1
         pygit2: Not Installed
         Python: 3.8.2 (default, Apr 27 2020, 15:53:34)
   python-gnupg: 0.4.5
         PyYAML: 5.3.1
          PyZMQ: 18.1.1
          smmap: 2.0.5
        timelib: Not Installed
        Tornado: 4.5.3
            ZMQ: 4.3.2
 
System Versions:
           dist: ubuntu 20.04 focal
         locale: utf-8
        machine: x86_64
        release: 5.4.0-31-generic
         system: Linux
        version: Ubuntu 20.04 focal

minion-xenial salt-call -V:

Salt Version:
           Salt: 3001
 
Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: 2.4.2
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
         Jinja2: 2.8
        libgit2: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.6.2
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: Not Installed
   pycryptodome: 3.4.7
         pygit2: Not Installed
         Python: 3.5.2 (default, Apr 16 2020, 17:47:17)
   python-gnupg: 0.3.8
         PyYAML: 3.11
          PyZMQ: 17.1.2
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.5.3
            ZMQ: 4.1.4
 
System Versions:
           dist: ubuntu 16.04 Xenial Xerus
         locale: UTF-8
        machine: x86_64
        release: 4.4.0-179-generic
         system: Linux
        version: Ubuntu 16.04 Xenial Xerus 

minion-focal salt-call -V:

Salt Version:
           Salt: 3001
 
Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: 2.7.3
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
         Jinja2: 2.10.1
        libgit2: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.6.2
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: Not Installed
   pycryptodome: 3.6.1
         pygit2: Not Installed
         Python: 3.8.2 (default, Apr 27 2020, 15:53:34)
   python-gnupg: 0.4.5
         PyYAML: 5.3.1
          PyZMQ: 18.1.1
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.5.3
            ZMQ: 4.3.2
 
System Versions:
           dist: ubuntu 20.04 focal
         locale: utf-8
        machine: x86_64
        release: 5.4.0-31-generic
         system: Linux
        version: Ubuntu 20.04 focal

@rmarchei
Contributor

I'm experiencing the same issue with a pretty similar setup: the master is Ubuntu 18.04 and the minion is 20.04.

@waynew
Contributor

waynew commented Jul 8, 2020

Definitely something screwy going on here.

Thanks to @xcorvis I was finally able to repro this:

Running salt-call with strace -y, we get this:

poll([{fd=10<anon_inode:[eventfd]>, events=POLLIN}], 1, -1

That's where it hangs indefinitely. From the poll manpage we can see:

poll(struct pollfd fds[], nfds_t nfds, int timeout);
...
If the value of timeout is -1, the poll blocks indefinitely.

So whatever we're polling for here, we're doing it for-ev-errrr. I'll post updates as I find them.
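
The blocking behavior is easy to see in isolation. The sketch below is not Salt or ZMQ code: it uses a plain pipe as a stand-in for the eventfd in the strace output, and a finite timeout so that it returns. Python's select.poll takes the timeout in milliseconds, and a negative value blocks until an event arrives, just like the -1 above.

```python
import os
import select

# A pipe stands in for the eventfd seen in the strace output (assumption).
read_fd, write_fd = os.pipe()

poller = select.poll()
poller.register(read_fd, select.POLLIN)

# Nothing has been written yet. With a finite timeout poll gives up and
# returns an empty list; with -1 this call would never return.
idle = poller.poll(200)

# Once a byte is written, the descriptor is readable and poll returns it.
os.write(write_fd, b"x")
ready = poller.poll(200)

print("events while idle:", idle)
print("read end readable after write:",
      any(fd == read_fd and ev & select.POLLIN for fd, ev in ready))

os.close(read_fd)
os.close(write_fd)
```

In the hang above, nothing ever writes to the polled descriptor, so the equivalent of the first call with a timeout of -1 simply never wakes up.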

@sagetherage sagetherage removed this from Blocked in 3001.1 Bugfix release Jul 16, 2020
@sagetherage sagetherage removed the v3001.1 vulnerable version label Jul 16, 2020
@sagetherage sagetherage assigned cmcmarrow and unassigned waynew Jul 16, 2020
@sagetherage sagetherage added the Magnesium Mg release after Na prior to Al label Jul 16, 2020
@sagetherage
Contributor

@whytewolf we didn't get this into the point release as we don't have the fix yet, so moving to Magnesium.

@sagetherage sagetherage added this to Commit in Magnesium Jul 16, 2020
@max-arnold
Contributor

Does transport: tcp help? The issue seems to be related to ZMQ: #57456 (comment)

@sagetherage sagetherage modified the milestones: Approved, Magnesium Jul 29, 2020
@cmcmarrow cmcmarrow mentioned this issue Sep 2, 2020
3 tasks
@cmcmarrow
Contributor

@whytewolf I believe I have the fix for your hang. It works for my environment. I would appreciate it if you tested it in your environment to make sure it works for you so you don't need to wait for another release.

@sagetherage sagetherage linked a pull request Sep 25, 2020 that will close this issue
3 tasks
@sagetherage sagetherage moved this from Commit to In progress in Magnesium Sep 25, 2020
@sagetherage sagetherage added the fixed-pls-verify fix is linked, bug author to confirm fix label Sep 25, 2020
Magnesium automation moved this from In progress to Done Sep 30, 2020