
Salt-master does not process batch calls correctly after receiving too much data #55184

Closed · keagan-thinkst opened this issue Nov 1, 2019 · 10 comments · Fixed by #58283
Labels: Duplicate (duplicate of another issue or PR - will be closed), Pending-Discussion (needs more discussion before it can be closed or merged), stale

@keagan-thinkst

Description of Issue

Whenever I run a salt module function against my minions that returns too much data, I get the following error in the salt master's logs:

[salt.transport.ipc:195 ][ERROR ][28934] Exception occurred while handling stream: 11844764 exceeds max_str_len(1048576)

Once this error has occurred, any call from the master to a batch of minions fails for a random subset of minions that appear not to reply back to the master. I test it with a simple test.ping with the -ldebug flag set.

Each time I run test.ping against *, a different set of minions fails.

From the master's side, the debug output shows that the master is not receiving any reply from the minion; it retries the query once, and when that fails again it times out, and the call on the master finishes with output stating that it failed against a random set of minions.

When I look at the Salt logs of any of the minions that failed, I see that the minion receives all the correct data from the master, processes the ping request correctly, and then connects to the master with its response. The master seemingly does not receive this communication, so it retries. Again, in the minion's logs I see it receive the retry attempt, process the request, and send the result to the master successfully, but the master does not receive or process that data correctly and therefore marks the call as having failed for that minion.
To sum up, the minions are performing as expected, but the master is not correctly processing the information sent to it by a random set of minions.

The master will continue to fail batch salt calls like this until I restart the salt-master service. It then runs correctly again until a module call returns enough data to trigger the error shown above.

Steps to Reproduce Issue

Run a module function that causes the minion to return too much data for the master, i.e. one that causes the following error in the master logs:
[salt.transport.ipc:195 ][ERROR ][28934] Exception occurred while handling stream: 11844764 exceeds max_str_len(1048576)

Then perform any salt call from the master against a large batch of minions, such as test.ping against *.
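
For reference, the limit being hit is msgpack's 1 MiB max_str_len default in msgpack-python >= 0.6 and < 1.0. A minimal standalone sketch of the same limit, outside of Salt entirely (assuming one of those msgpack versions is installed), looks like this:

    # Illustration only: this is not a Salt command, just the msgpack limit that
    # salt.transport trips over when a minion returns a large payload.
    # Assumes msgpack-python >= 0.6 and < 1.0, where max_str_len defaults to 1 MiB.
    import msgpack

    # Pack a ~2 MiB string the old "raw" way so the str limit (not the bin limit) applies.
    payload = msgpack.packb("x" * (2 * 1024 * 1024), use_bin_type=False)

    unpacker = msgpack.Unpacker()
    unpacker.feed(payload)
    try:
        for obj in unpacker:
            pass
    except ValueError as exc:
        print(exc)  # e.g. "2097152 exceeds max_str_len(1048576)"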

Versions Report

Master:

           Salt: 2019.2.0

Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: 2.8.0
      docker-py: Not Installed
          gitdb: 2.0.5
      gitpython: 2.1.14
          ioflo: Not Installed
         Jinja2: 2.10.1
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.6.1
   mysql-python: 1.3.10
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 2.7.15+ (default, Jul  9 2019, 16:51:35)
   python-gnupg: 0.4.1
         PyYAML: 5.1.2
          PyZMQ: 18.1.0
           RAET: Not Installed
          smmap: 2.0.5
        timelib: Not Installed
        Tornado: 5.1.1
            ZMQ: 4.3.2

System Versions:
           dist: Ubuntu 18.04 bionic
         locale: UTF-8
        machine: x86_64
        release: 4.15.0-1032-aws
         system: Linux
        version: Ubuntu 18.04 bionic

The minions report the same versions, except that the following are not installed:

dateutil
gitdb
gitpython
mysql-python
@frogunder (Contributor) commented Nov 1, 2019

@keagan-thinkst Thank you for reporting this issue.

This seems to be a duplicate of #55180.

frogunder added this to the Blocked milestone Nov 1, 2019
frogunder added the Duplicate and Pending-Discussion labels Nov 1, 2019
@arizvisa (Contributor) commented Nov 5, 2019

This bug is actually a duplicate of #54884 and is related to some outrageous defaults set by the version of msgpack being used. This bug is heavily discussed in #53230, so there's more info there too.

@keagan-thinkst, try fiddling around with the msgpack version as per these other issues. The msgpack module had some absurd constraints imposed in order to attempt to prevent denial-of-service attacks.

(Not trying to step on your toes or anything, frogunder.)

@keagan-thinkst (Author) commented

Hi @arizvisa

Thanks for your suggestion; however, this does not seem to fix my problem. I pinned the msgpack version on my Salt stack to 0.5.6 and still get the same behaviour, where the salt-master does not process batch calls correctly after I run my module function. The module function returns a large dictionary back from the minions.
My current workaround has been to change the module function to return only very small dictionaries instead. I am still very wary of the uncertainty here, though, as I can reproduce this failing master state whenever a module function returns a large amount of information.

@arizvisa (Contributor) commented Nov 7, 2019

OK then. Well, in that case, the only other option is to modify the calls to msgpack.Unpacker in salt.transport.* to add the max_buffer_size parameter, set to a large value; a minimal sketch follows.
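
A rough sketch of that kind of local patch, assuming msgpack-python >= 0.6 and < 1.0 (the actual Unpacker call sites in salt/transport may look a little different):

    import msgpack

    # Before (simplified): unpacker = msgpack.Unpacker()
    # After: pass max_buffer_size; on msgpack >= 0.6, < 1.0 the per-type limits
    # (max_str_len, max_bin_len, ...) fall back to max_buffer_size when it is set,
    # so this one keyword lifts the 1 MiB ceiling.
    unpacker = msgpack.Unpacker(max_buffer_size=100 * 1024 * 1024)  # 100 MiB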

@keagan-thinkst (Author) commented

Thank you very much @arizvisa !!!

I set max_buffer_size=2147483647 in all calls to msgpack.Unpacker in salt.transport.* and I no longer get this issue!

I also set my msgpack version back to 0.6.1, which is the suggested version for my Salt deployment, and the change you suggested still fixes the issue I was seeing.

Is this possibly going to become a configurable value in an upcoming version of Salt, so that fixing the problem is easier than hardcoding/patching Salt files?

@arizvisa (Contributor) commented Nov 8, 2019

It's up to the maintainers really. Maybe @frogunder would know better. I have a number of PRs that are waiting to get merged by their team, so I'm hesitant to create fixes for these other bugs until they get around to merging those.

stale bot commented Jan 7, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

stale bot added the stale label Jan 7, 2020
@arizvisa (Contributor) commented Jan 7, 2020

As mentioned previously, this issue is a duplicate of #54884 and should be closed as a result.

stale bot commented Jan 7, 2020

Thank you for updating this issue. It is no longer marked as stale.

stale bot removed the stale label Jan 7, 2020
stale bot commented Feb 6, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

stale bot added the stale label Feb 6, 2020
oeuftete pushed a commit to oeuftete/salt that referenced this issue Oct 6, 2020
Salt 3000 introduced a dependency on Python msgpack >= 0.6. In msgpack 0.6 its
developers changed the maximum length of a string that can be unpacked from 2 GiB
to 1 MiB. They then changed it to 100 MiB in msgpack 1.0. So when using
msgpack >= 0.6 and < 1.0, any unpacked message larger than 1 MiB will throw an
error. For example, a file state which tries to send a file larger than 1 MiB
will get the following error:

Unable to manage file: 1048688 exceeds max_str_len(1048576)

This error could send the master into a tailspin where subsequent file
operations take a long time or just time out.

This patch sets the keyword argument max_buffer_size for the Unpacker to 100 MiB,
which is now the msgpack 1.0 default. It will allow people to use msgpack
versions >= 0.6 and < 1.0 without this crazy low limit. It should be discussed
whether we go back to the 2 GiB size from the < 0.6 days (my vote), or just
anything larger than 100 MiB. This code does a version check, and will only set
the keyword on msgpack >= 0.6 and < 1.0.

The test checks that we have a buffer size of at least 100 MiB in case the Python
msgpack devs decide to lower this value again and break things. We want to
enforce a decent minimum.

fixes saltstack#53230, fixes saltstack#54884,
fixes saltstack#55180, fixes saltstack#55184,
fixes saltstack#57026, fixes saltstack#57921
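
A rough sketch of the version-gated approach that commit message describes (the helper name here is illustrative, not the actual Salt function):

    import msgpack

    HUNDRED_MIB = 100 * 1024 * 1024  # the max_buffer_size default in msgpack 1.0


    def unpacker_kwargs(**kwargs):
        # Only add max_buffer_size on msgpack >= 0.6 and < 1.0, where the 1 MiB
        # per-type limits apply; msgpack.version is the library's (major, minor, micro) tuple.
        if (0, 6, 0) <= msgpack.version < (1, 0, 0):
            kwargs.setdefault("max_buffer_size", HUNDRED_MIB)
        return kwargs


    unpacker = msgpack.Unpacker(**unpacker_kwargs())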

twangboy pushed a commit to psyer/salt that referenced this issue Oct 6, 2020 (same commit message as above)
twangboy pushed a commit to psyer/salt that referenced this issue Oct 6, 2020 (same commit message as above)
dwoz pushed a commit that referenced this issue Oct 8, 2020 (same commit message as above)
dwoz pushed a commit that referenced this issue Oct 8, 2020 (same commit message as above)