Salt-master crippled after receiving too much output from salt command. #55180

Closed
H20-17 opened this issue Oct 31, 2019 · 11 comments · Fixed by #58283
Labels
cannot-reproduce (cannot be replicated with info/context provided), Pending-Discussion (the issue or pull request needs more discussion before it can be closed or merged), stale
Milestone

Comments

@H20-17

H20-17 commented Oct 31, 2019

Description of Issue

I have a salt-master (v2019.2.2 py3) and a salt-minion (v2019.2.2 py3). The minion is a Windows salt-minion; I have no idea if that makes any difference. If the salt-master receives too much output from the minion, the master is crippled and does not auto-recover. Instead it has to be restarted.

Setup

No special setup other than as described in the Description

Steps to Reproduce Issue

Run the following command

/usr/local/bin/salt -t 100 windows_minion cmd.powershell_all ps

That command will translate to

ps | convertto-json

on the minion, which produces thousands of lines of output (in my case approx 35000 lines of output).

I don't have any opinion on whether or not the master should be able to handle this volume of output. Either it should or it shouldn't, but the master shouldn't be taken down by it. If the master is receiving too much output, it should just cut the minion off and handle that gracefully.
What I observe is that the master just hangs or times out (I'm not sure which) and it doesn't print the job response (although jobs.lookup_jid does show you something).

That in itself isn't a big issue, but the thing is that the master never recovers. The salt command on the master does not work after this and you have to restart the master.

Versions Report

Salt Version:
           Salt: 2019.2.2

Dependency Versions:
           cffi: 1.11.5
       cherrypy: Not Installed
       dateutil: 2.8.0
      docker-py: Not Installed
          gitdb: 2.0.3
      gitpython: 2.1.11
          ioflo: Not Installed
         Jinja2: 2.10.1
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.6.1
   mysql-python: Not Installed
      pycparser: 2.14
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 3.7.4 (default, Jul  9 2019, 16:32:37)
   python-gnupg: Not Installed
         PyYAML: 3.13
          PyZMQ: 17.0.0
           RAET: Not Installed
          smmap: 2.0.3
        timelib: Not Installed
        Tornado: 4.5.3
            ZMQ: 4.3.2

System Versions:
           dist: fedora 30 Thirty
         locale: UTF-8
        machine: x86_64
        release: 5.3.7-200.fc30.x86_64
         system: Linux
        version: Fedora 30 Thirty

@frogunder
Contributor

@H20-17 Thank you for reporting this issue.

I am however not able to replicate it.

For me, running that command produces about 32,000 lines of output and takes about 15 seconds. The master is still alive and can continue to run commands.
I have a Fedora 30 py3 master and Win 2019 py3 minion.

@frogunder frogunder added this to the Blocked milestone Nov 1, 2019
@frogunder frogunder added the cannot-reproduce cannot be replicated with info/context provided label Nov 1, 2019
@H20-17
Author

H20-17 commented Nov 1, 2019

I tried this again today and got the same behaviour as when I reported this. I also see the following error in the salt-master log:

Nov 01 13:05:48 OURSERGORSALT.MYORG.ORG salt-master[21849]: Total run time:  84.985 s[ERROR   ] Exception occurred while handling stream: 1563469 exceeds max_bin_len(1048576)

It looks like some problem is occurring that isn't getting handled properly. I didn't succeed in getting it to compromise the functioning of the salt-master again today, but it happened yesterday.
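
For anyone who wants to reproduce the limit itself outside of Salt, here is a minimal sketch (an illustration, not code from this thread) assuming msgpack >= 0.6 and < 1.0, where the per-object defaults dropped to 1 MiB:

import msgpack

# Pack a 2 MiB bytes object; unpacking it with the 0.6.x defaults trips
# the 1 MiB max_bin_len limit, just like the error in the log above.
payload = msgpack.packb(b"x" * (2 * 1024 * 1024), use_bin_type=True)

try:
    msgpack.unpackb(payload, raw=False)
except ValueError as exc:
    # e.g. "2097152 exceeds max_bin_len(1048576)" on msgpack 0.6.x
    print(exc)

# Raising the limit explicitly avoids the exception on those versions:
msgpack.unpackb(payload, raw=False, max_bin_len=2 * 1024 * 1024)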

@H20-17
Author

H20-17 commented Nov 1, 2019

It still seems to trigger the condition where the salt-master is crippled. It's not definite, though, since I can't reproduce it 100% reliably. After this happens once, about one in five salt commands times out. Repeating it will probably increase the frequency.

@frogunder
Contributor

@saltstack/team-core @saltstack/team-windows

Can any of you take a look at this? Thanks.

@frogunder frogunder added the Pending-Discussion The issue or pull request needs more discussion before it can be closed or merged label Nov 1, 2019
@arizvisa
Contributor

arizvisa commented Nov 5, 2019

@H20-17, can you check whether the solutions/discussions in #54884 or #53230 solve your problem? Some versions of msgpack impose outrageously low default limits to prevent denial-of-service attacks, which results in that particular exception being thrown. Pinning msgpack to a different version will likely resolve the exception you're seeing.

The thing that really sucks is that msgpack literally switched the kwargs used for this between minor versions, so it's not too easy for saltstack to support every single version of msgpack out there.

Maybe warnings should be thrown when using weird/untested versions of a module or we should bump the msgpack version in requirements/base.txt?
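
As a rough sketch of the kind of version warning suggested above (hypothetical; the tested range below is made up, not taken from requirements/base.txt):

import warnings
import msgpack

# Hypothetical "known good" range; the real bounds would have to come from
# whatever Salt actually tests against.
TESTED_MIN = (0, 5, 6)
TESTED_MAX = (1, 0, 0)

if not (TESTED_MIN <= msgpack.version < TESTED_MAX):
    warnings.warn(
        "msgpack %s has not been validated against this Salt release; "
        "unpacking limits may differ between versions" % (msgpack.version,)
    )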

@H20-17
Author

H20-17 commented Nov 5, 2019

I'm sure it is msgpack, and I'm not blaming the Salt people for whatever is going on, but something like a DoS occurs right now and I don't understand why the salt-master software isn't dealing with it anyway. (However, I don't know the underlying details, so maybe I'm asking too much.) I'm OK with this being closed now since it seems to be the focus of several other open issues.

@xeacott
Contributor

xeacott commented Nov 5, 2019

I just set this environment up and my master is having no problem. I ran the command back-to-back and it seems fine. Is it possible this is a system issue or a network issue outside of Salt? Just thinking outside the box, but I piped the output to a file and it came back as 4.07 MB. That doesn't seem like anything the master would trip over.

@stale

stale bot commented Jan 7, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

@stale stale bot added the stale label Jan 7, 2020
@arizvisa
Contributor

arizvisa commented Jan 7, 2020

This should probably be closed as a msgpack issue... but @xeacott: to test these msgpack-related issues, keep in mind that the msgpack payload gets compressed, so to properly test these boundaries the state data needs to contain non-compressible data in order to exercise this particular code path.

The best way to generate non-compressible data is with white noise or output from a block cipher (which is nearly random as well). If you're in a POSIX environment, you can feed the /dev/random device through tr -cd '[:alpha:]' to filter it down to YAML-compatible characters.

For example, to generate a 1.5 MiB (mostly) non-compressible kwarg to put into a state so it gets packed and compressed by msgpack:

$ dd if=/dev/random | tr -cd '[:alpha:]' | dd bs=1024 count=1536
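
A quick way to check (an illustration added here, with zlib standing in for whatever compression sits in front of msgpack) that data built this way still crosses the 1 MiB default after packing and compressing:

import os
import zlib
import msgpack

# Stand-in for the dd/tr output above: ~3 MiB of incompressible bytes.
blob = os.urandom(3 * 1024 * 1024)

packed = msgpack.packb(blob, use_bin_type=True)
compressed = zlib.compress(packed)

# Random data barely compresses, so the payload stays well above the 1 MiB
# default limit that msgpack 0.6.x enforces when unpacking.
print(len(compressed) > 1024 * 1024)   # True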

@stale

stale bot commented Jan 7, 2020

Thank you for updating this issue. It is no longer marked as stale.

@stale stale bot removed the stale label Jan 7, 2020
@stale

stale bot commented Feb 6, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

@stale stale bot added the stale label Feb 6, 2020
oeuftete pushed a commit to oeuftete/salt that referenced this issue Oct 6, 2020
Salt 3000 introduced a dependency on Python msgpack >= 0.6. In msgpack 0.6, its developers changed the max length of a string that could be unpacked from 2 GiB to 1 MiB; they then changed it to 100 MiB in msgpack 1.0. So when using msgpack >= 0.6 and < 1.0, any message larger than 1 MiB throws an error when unpacked. For example, a file state which tries to send a file larger than 1 MiB will get the following error:

Unable to manage file: 1048688 exceeds max_str_len(1048576)

This error could send the master into a tailspin where subsequent file
operations take a long time or just time out.

This patch sets the keyword argument max_buffer_size for the Unpacker to 100 MiB, which is now the msgpack 1.0 default. It will allow people to use msgpack versions >= 0.6 and < 1.0 without this crazy low limit. It should be discussed whether we go back to the 2 GiB size from the pre-0.6 days (my vote), or just anything larger than 100 MiB. This code does a version check, and will only set the keyword on msgpack >= 0.6 and < 1.0.

The test checks that we have a buffer size of at least 100 MiB in case the Python msgpack devs decide to lower this value again and break things. We want to enforce a decent minimum.

fixes saltstack#53230, fixes saltstack#54884,
fixes saltstack#55180, fixes saltstack#55184,
fixes saltstack#57026, fixes saltstack#57921
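
The shape of that version-gated change might look something like the following (an illustrative sketch only; the actual patch in #58283 may differ in where and how the kwarg is applied):

import msgpack

def build_unpacker(**kwargs):
    # Only the 0.6.x series needs the workaround: 1.0 already defaults
    # max_buffer_size to 100 MiB, and pre-0.6 allowed up to 2 GiB.
    if (0, 6, 0) <= msgpack.version < (1, 0, 0):
        kwargs.setdefault("max_buffer_size", 100 * 1024 * 1024)
    return msgpack.Unpacker(raw=False, **kwargs)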
twangboy pushed a commit to psyer/salt that referenced this issue Oct 6, 2020
twangboy pushed a commit to psyer/salt that referenced this issue Oct 6, 2020
dwoz pushed a commit that referenced this issue Oct 8, 2020
dwoz pushed a commit that referenced this issue Oct 8, 2020