
Salt-master does not process batch calls correctly after receiving too much data #55184

Closed · keagan-thinkst opened this issue Nov 1, 2019 · 10 comments · Fixed by #58283
Labels: Duplicate (duplicate of another issue or PR - will be closed), Pending-Discussion (needs more discussion before it can be closed or merged), stale

@keagan-thinkst

Description of Issue

Whenever I run a salt module function against my minions that returns too much data, I get the following error in the salt master's logs:

[salt.transport.ipc:195 ][ERROR ][28934] Exception occurred while handling stream: 11844764 exceeds max_str_len(1048576)

Once this error has occurred, any call from the master to a batch of minions fails for a random subset of minions that appear not to reply back to the master. I test it with a simple test.ping with the -ldebug flag set.

Each time I run test.ping against *, a different set of minions fails.

From the master's side, the debug output shows that the master is not receiving any reply from the minion; it retries the query once, and when that fails again it times out, and the call on the master finishes with output stating that it failed against a random set of minions.

When I look at the Salt logs of any of the minions that failed, I see that the minion receives all the correct data from the master, processes the ping request correctly, and then connects to the master with its response. The master seemingly does not receive this communication, so it retries. Again, in the minion's logs I see it receive the retry attempt, process the request, and send the result to the master successfully, but the master does not receive or process that data correctly and therefore marks the call as having failed for that minion.
To sum up, the minions are performing as expected, but the master is not correctly processing the information sent to it by a random set of minions.

The master will continue to fail batch salt calls like this until I restart the salt-master service. It then runs correctly again until a module call returns enough data to trigger the error shown above.

Steps to Reproduce Issue

Run a module function that causes the minion to return too much data for the master, i.e. one that causes the following error in the master logs:
[salt.transport.ipc:195 ][ERROR ][28934] Exception occurred while handling stream: 11844764 exceeds max_str_len(1048576)

Then perform any salt call from the master against a large batch of minions, such as test.ping against *.
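
For reference, the limit being hit is msgpack's 1 MiB max_str_len default in msgpack-python >= 0.6 and < 1.0. A minimal standalone sketch of the same limit, outside of Salt entirely (assuming one of those msgpack versions is installed), looks like this:

    # Illustration only: this is not a Salt command, just the msgpack limit that
    # salt.transport trips over when a minion returns a large payload.
    # Assumes msgpack-python >= 0.6 and < 1.0, where max_str_len defaults to 1 MiB.
    import msgpack

    # Pack a ~2 MiB string the old "raw" way so the str limit (not the bin limit) applies.
    payload = msgpack.packb("x" * (2 * 1024 * 1024), use_bin_type=False)

    unpacker = msgpack.Unpacker()
    unpacker.feed(payload)
    try:
        for obj in unpacker:
            pass
    except ValueError as exc:
        print(exc)  # e.g. "2097152 exceeds max_str_len(1048576)"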

Versions Report

Master:

           Salt: 2019.2.0

Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: 2.8.0
      docker-py: Not Installed
          gitdb: 2.0.5
      gitpython: 2.1.14
          ioflo: Not Installed
         Jinja2: 2.10.1
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.6.1
   mysql-python: 1.3.10
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 2.7.15+ (default, Jul  9 2019, 16:51:35)
   python-gnupg: 0.4.1
         PyYAML: 5.1.2
          PyZMQ: 18.1.0
           RAET: Not Installed
          smmap: 2.0.5
        timelib: Not Installed
        Tornado: 5.1.1
            ZMQ: 4.3.2

System Versions:
           dist: Ubuntu 18.04 bionic
         locale: UTF-8
        machine: x86_64
        release: 4.15.0-1032-aws
         system: Linux
        version: Ubuntu 18.04 bionic

The minions report the same versions, except that the following are not installed:

dateutil
gitdb
gitpython
mysql-python
@frogunder (Contributor) commented Nov 1, 2019

@keagan-thinkst Thank you for reporting this issue.

This seems to be a duplicate of #55180.

frogunder added this to the Blocked milestone Nov 1, 2019
frogunder added the Duplicate and Pending-Discussion labels Nov 1, 2019
@arizvisa (Contributor) commented Nov 5, 2019

This bug is actually a duplicate of #54884 and is related to some outrageous defaults set by the version of msgpack being used. This bug is heavily discussed in #53230, so there's more info there too.

@keagan-thinkst, try fiddling around with the msgpack version as per these other issues. The msgpack module had some absurd constraints imposed in order to attempt to prevent denial-of-service attacks.

(Not trying to step on your toes or anything, frogunder.)

@keagan-thinkst (Author) commented

Hi @arizvisa

Thanks for your suggestion; however, this does not seem to fix my problem. I pinned the msgpack version on my Salt stack to 0.5.6 and still get the same behaviour, where the salt-master does not process batch calls correctly after I run my module function. The module function returns a large dictionary back from the minions.
My current workaround has been to change the module function to return only very small dictionaries instead. I am still very wary of the uncertainty here, though, as I can reproduce this failing master state whenever a module function returns a large amount of information.

@arizvisa (Contributor) commented Nov 7, 2019

OK then. Well, in that case, the only other option is to modify the calls to msgpack.Unpacker in salt.transport.* to add the max_buffer_size parameter, set to a large value; a minimal sketch follows.
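
A rough sketch of that kind of local patch, assuming msgpack-python >= 0.6 and < 1.0 (the actual Unpacker call sites in salt/transport may look a little different):

    import msgpack

    # Before (simplified): unpacker = msgpack.Unpacker()
    # After: pass max_buffer_size; on msgpack >= 0.6, < 1.0 the per-type limits
    # (max_str_len, max_bin_len, ...) fall back to max_buffer_size when it is set,
    # so this one keyword lifts the 1 MiB ceiling.
    unpacker = msgpack.Unpacker(max_buffer_size=100 * 1024 * 1024)  # 100 MiB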

@keagan-thinkst (Author) commented

Thank you very much @arizvisa !!!

I set max_buffer_size=2147483647 in all calls to msgpack.Unpacker in salt.transport.* and I no longer get this issue!

I also set my msgpack version back to 0.6.1, which is the suggested version for my Salt deployment, and the change you suggested still fixes the issue I was seeing.

Is this possibly going to become a configurable value in an upcoming version of Salt, so that fixing the problem is easier than hardcoding/patching Salt files?

@arizvisa (Contributor) commented Nov 8, 2019

It's up to the maintainers really. Maybe @frogunder would know better. I have a number of PRs that are waiting to get merged by their team, so I'm hesitant to create fixes for these other bugs until they get around to merging those.

stale bot commented Jan 7, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

stale bot added the stale label Jan 7, 2020
@arizvisa (Contributor) commented Jan 7, 2020

As mentioned previously, this issue is a duplicate of #54884 and should be closed as a result.

stale bot commented Jan 7, 2020

Thank you for updating this issue. It is no longer marked as stale.

stale bot removed the stale label Jan 7, 2020
stale bot commented Feb 6, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

stale bot added the stale label Feb 6, 2020
oeuftete pushed a commit to oeuftete/salt that referenced this issue Oct 6, 2020
Salt 3000 introduced a dependency on Python msgpack >= 0.6. In msgpack 0.6 its
developers changed the maximum length of a string that can be unpacked from 2 GiB
to 1 MiB. They then changed it to 100 MiB in msgpack 1.0. So when using
msgpack >= 0.6 and < 1.0, any unpacked message larger than 1 MiB will throw an
error. For example, a file state which tries to send a file larger than 1 MiB
will get the following error:

Unable to manage file: 1048688 exceeds max_str_len(1048576)

This error could send the master into a tailspin where subsequent file
operations take a long time or just time out.

This patch sets the keyword argument max_buffer_size for the Unpacker to 100 MiB,
which is now the msgpack 1.0 default. It will allow people to use msgpack
versions >= 0.6 and < 1.0 without this crazy low limit. It should be discussed
whether we go back to the 2 GiB size from the < 0.6 days (my vote), or just
anything larger than 100 MiB. This code does a version check, and will only set
the keyword on msgpack >= 0.6 and < 1.0.

The test checks that we have a buffer size of at least 100 MiB in case the Python
msgpack devs decide to lower this value again and break things. We want to
enforce a decent minimum.

fixes saltstack#53230, fixes saltstack#54884,
fixes saltstack#55180, fixes saltstack#55184,
fixes saltstack#57026, fixes saltstack#57921
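
A rough sketch of the version-gated approach that commit message describes (the helper name here is illustrative, not the actual Salt function):

    import msgpack

    HUNDRED_MIB = 100 * 1024 * 1024  # the max_buffer_size default in msgpack 1.0


    def unpacker_kwargs(**kwargs):
        # Only add max_buffer_size on msgpack >= 0.6 and < 1.0, where the 1 MiB
        # per-type limits apply; msgpack.version is the library's (major, minor, micro) tuple.
        if (0, 6, 0) <= msgpack.version < (1, 0, 0):
            kwargs.setdefault("max_buffer_size", HUNDRED_MIB)
        return kwargs


    unpacker = msgpack.Unpacker(**unpacker_kwargs())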

twangboy pushed a commit to psyer/salt that referenced this issue Oct 6, 2020 (same commit message as above)
twangboy pushed a commit to psyer/salt that referenced this issue Oct 6, 2020 (same commit message as above)
dwoz pushed a commit that referenced this issue Oct 8, 2020 (same commit message as above)
dwoz pushed a commit that referenced this issue Oct 8, 2020 (same commit message as above)