Memory usage issues with large Apache httpd configuration files #569
Comments
Thanks for the detailed description. I am having trouble reproducing this behavior. I followed your instructions, and with a file where the line is repeated 50k times, memory usage only goes up to ~120 MB. It also doesn't seem to make a difference in terms of memory usage whether I use that match or not. I tried this with augeas 1.8.0 (the version in stretch, AFAIK), augeas 1.10.0, and the latest from git HEAD, with roughly the same results. I did all my experiments on Fedora 28; I'll try to get my hands on Debian Stretch, too. Can you confirm that you see this crazy amount of memory used if you run …? BTW, the …
I don't. The memory usage balloons out while running the match. I'm running augeas version 1.8.0-1+deb9u1.
Thanks. After trying some more (and realizing where I made an early-morning mistake), I can reproduce this now. It looks like the amount of memory taken is related to the size of the regexp. I'll look into it.
I looked into this some more and have a good understanding of what's causing the issue. The basic problem is that the interpreter for path expressions does some very naive things, which are fine when you search over a few hundred nodes but really hurt when you are dealing with 50k nodes. In particular, the interpreter recompiles the regexp every time it needs to check a node (there's no lifting of constant expressions out of loops), and it simplifies its memory management by not releasing memory until it's done evaluating a path expression. Those two together lead to the memory blowup.

Addressing that will take a bit of time; I have some POC code for lifting constant expressions which brings memory usage down from >2 GB to ~180 MB. But since this is a fairly intrusive change, it needs more work and testing. One thing I realized is that you can change your expression slightly: instead of …

I saw the segfault with 80k entries, too. I haven't looked into it, but it looks like it's caused by an integer overflow in a part of the code that's unrelated to this problem. I'll look at that in more detail once I've got a handle on the memory usage.
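For readers unfamiliar with the term, the plain-C analogy below shows what "lifting a constant expression out of the loop" amounts to. This is only an illustration with POSIX regexes, not Augeas code: the naive variant recompiles the same pattern for every node it checks, while the lifted variant compiles it once before the loop, which is essentially the shape of the change the POC makes inside the path expression interpreter.

```c
#include <regex.h>
#include <stdio.h>

/* Naive version: compiles the same pattern once per candidate "node".
   With 50k nodes that is 50k compilations of an unchanging pattern; if
   none of that memory is released until the whole evaluation is done,
   memory use grows with the node count. */
static int count_matches_naive(const char **nodes, int n, const char *pattern) {
    int hits = 0;
    for (int i = 0; i < n; i++) {
        regex_t re;
        if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
            return -1;
        if (regexec(&re, nodes[i], 0, NULL, 0) == 0)
            hits++;
        regfree(&re);  /* even when freed per iteration, the repeated compiles are wasted work */
    }
    return hits;
}

/* Lifted version: the pattern is constant, so compile it once before the loop. */
static int count_matches_lifted(const char **nodes, int n, const char *pattern) {
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int hits = 0;
    for (int i = 0; i < n; i++)
        if (regexec(&re, nodes[i], 0, NULL, 0) == 0)
            hits++;
    regfree(&re);
    return hits;
}

int main(void) {
    const char *nodes[] = { "127.0.0.1", "192.168.0.1", "127.0.1.1" };
    printf("naive:  %d\n", count_matches_naive(nodes, 3, "^127\\."));
    printf("lifted: %d\n", count_matches_lifted(nodes, 3, "^127\\."));
    return 0;
}
```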
In path expressions, we generally need to evaluate functions against every node that we consider for the result set. For example, in the path expression /files/etc/hosts/*[ipaddr =~ regexp('127\\.')], the regexp function was evaluated against every entry in /etc/hosts. Evaluating that function requires the construction and compilation of a new regexp. Because of how memory is managed during evaluation of path expressions, the memory used by all these copies of the same regexp is only freed after we are done evaluating the path expression. This causes unacceptable memory usage in large files (see hercules-team#569).

To avoid these issues, we now distinguish between pure and impure functions in the path expression interpreter. When we encounter a pure function, we change the AST for the path expression so that the function invocation is replaced with the result of invoking the function. With the example above, that means we only construct and compile the regexp '127\\.' once, regardless of how many nodes it gets checked against. That leads to a dramatic reduction in the memory required to evaluate path expressions with such constructs against large files.

Fixes hercules-team#569
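For reference, this is roughly how the path expression from the commit message would be issued through the public C API. This is only a sketch with minimal error handling; the file and predicate are taken from the commit message, and the backslashes are doubled once more for the C string literal so that the expression sent to Augeas matches the one quoted above.

```c
#include <augeas.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* Initialize against the real root filesystem with default flags. */
    augeas *aug = aug_init("/", NULL, AUG_NONE);
    if (aug == NULL) {
        fprintf(stderr, "aug_init failed\n");
        return 1;
    }

    /* The predicate's regexp() call is a pure function: with the fix it is
       compiled once instead of once per /files/etc/hosts entry. */
    char **matches = NULL;
    int n = aug_match(aug,
                      "/files/etc/hosts/*[ipaddr =~ regexp('127\\\\.')]",
                      &matches);
    if (n < 0) {
        fprintf(stderr, "aug_match failed\n");
    } else {
        printf("%d matching host entries\n", n);
        for (int i = 0; i < n; i++) {
            printf("  %s\n", matches[i]);
            free(matches[i]);
        }
        free(matches);
    }

    aug_close(aug);
    return 0;
}
```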
At long last, PR #578 has a patch that greatly reduces memory consumption for this issue. In my testing, for a 50,000-line file … If you have a chance to give this a spin, I would very much appreciate confirmation of my testing.
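One possible way to confirm the improvement (this is just a suggestion, not part of the PR): run the same match through libaugeas and print the process's peak resident set size before and after, for example via getrusage. The file path below comes from the reproduction steps in this issue, but the match expression is a placeholder; substitute the problematic expression from the original report.

```c
#include <augeas.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>

/* Print the peak resident set size so far (Linux reports ru_maxrss in kilobytes). */
static void print_peak_rss(const char *label) {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) == 0)
        printf("%s: peak RSS %ld kB\n", label, ru.ru_maxrss);
}

int main(void) {
    augeas *aug = aug_init("/", NULL, AUG_NONE);
    if (aug == NULL)
        return 1;
    print_peak_rss("after load");

    /* Placeholder expression: replace with the match from the original report. */
    char **matches = NULL;
    int n = aug_match(aug,
                      "/files/etc/apache2/sites-available/memorybomb.conf/*",
                      &matches);
    printf("%d matches\n", n);
    print_peak_rss("after match");

    for (int i = 0; i < n; i++)
        free(matches[i]);
    free(matches);
    aug_close(aug);
    return 0;
}
```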
1.11.0 - 2018-08-24
  - General changes/additions
    * augmatch: add a --quiet option; make the exit status useful to tell whether there was a match or not
    * Drastically reduce the amount of memory needed to evaluate complex path expressions against large files (Issue #569)
    * Fix a segfault on OSX when 'augmatch' is run without any arguments (Issue #556)
  - API changes
    * aug_source did not in fact return the source, and always returned NULL for that. That has been fixed.
  - Lens changes/additions
    * Chrony: add new options supported in chrony 3.2 and 3.3 (Miroslav Lichvar)
    * Dhclient: fix parsing of append/prepend and similar directives (John Morrissey)
  (NEWS truncated at 15 lines)

Version 1.11.0 2018-08-22 David Lutterkort <lutter@watzmann.net>
  * Replace pure function invocations in path expressions with their result (the commit message quoted above; see hercules-team/augeas#569)
  (NEWS truncated at 15 lines)
A user reported a memory usage issue in Certbot, and when investigating the cause I found out that our (very suboptimal) regexes cause the Augeas `match` memory usage to balloon out of proportion. However, this only happens with really large configurations and complex regexes. The reporting user's configuration had roughly 300 configuration files with 1500 lines each; I was able to create a much simplified file with a similar effect. The effects start to be really visible after 40k lines, and at 80k lines `augtool` with the stock httpd lens segfaults on startup on Debian stretch.

Please note that the regex is really suboptimal; it is kept for backwards compatibility with environments whose Augeas is too old to support the case-insensitive regex flag `'i'`. Using a recent version and the case-insensitive flag reduces the memory footprint roughly by two thirds, but it is still roughly 400-fold compared to the actual configuration size.

How to reproduce:

1. Repeat the line `AddType huge/memory usage` 50k times.
2. Copy the modified template to `/etc/apache2/sites-available/memorybomb.conf`.
3. Start `augtool -I httpd.aug`.
4. Use the suboptimal regex (see the sketch below for an illustrative, case-insensitive stand-in): …
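The exact regex from the report is not reproduced above, so the following is only a stand-in: a minimal libaugeas program that queries the generated file and matches AddType directives case-insensitively with the `regexp(..., 'i')` flag discussed above, instead of a hand-expanded case-permutation pattern. The `directive` node label is an assumption about how the Httpd lens lays out its tree, and the sketch assumes the stock lens picks up the file automatically.

```c
#include <augeas.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* Default init; assumes the stock lenses (including Httpd) autoload
       /etc/apache2/sites-available/memorybomb.conf from the reproduction. */
    augeas *aug = aug_init("/", NULL, AUG_NONE);
    if (aug == NULL) {
        fprintf(stderr, "aug_init failed\n");
        return 1;
    }

    /* Stand-in for the reporter's regex: match AddType directives
       case-insensitively via the 'i' flag. 'directive' as the node label
       is an assumption about the Httpd lens tree layout. */
    char **matches = NULL;
    int n = aug_match(aug,
                      "/files/etc/apache2/sites-available/memorybomb.conf"
                      "/directive[. =~ regexp('addtype', 'i')]",
                      &matches);
    printf("%d matching directives\n", n);

    for (int i = 0; i < n; i++)
        free(matches[i]);
    free(matches);
    aug_close(aug);
    return 0;
}
```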