-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NginX binary with modsecurity consume all memory and crashes #1563
Comments
Hi intelbg, I've so far been unable to replicate the issue that you're experiencing with my setup, I'm compiling the v3/master branch of ModSecurity (with a few tweaks to work-around SELinux but nothing major) and thrown an arbitrary 1000 vhosts at nginx, reloaded & restarted fine albeit slow. The machine being used is physical, 32G RAM with 12 core cpu aswell as SSD for disk. For you to reach the load averages you mention are you performing any specific tasks or are the vhosts particularly active? |
Thank you for your reply! I didn't perform any specific tasks - only yum update of the nginx and service nginx configtest or restart. The vhosts are particularly active, but with nginx without modsec there is not an issue. Maybe for every request by some way there is an overhead even if modsecurity is not on for the ghosts? Some calculations in the nginx binary? |
Hi @intelbg , The domains I've added aren't in use so perhaps that is the reason for being unable to replicate, I will attempt to run some tests against these hosts and get back to you soon if I can replicate it or not. |
Thank you again. Meanwhile, is it possible the problem to come from different versions of gcc, g++, make, automate on the compile server and the production server to which transfer the binary later? Or in the parameters of the machine of the compilation server and the production servers (production servers are 10 times faster than this for the compilation)? But I compiled before that nginx in the same environment and without modsec no problems. |
Hi @intelbg - Interestingly, my server load has not massively increased but it has exhausted it's memory and I'm getting a -
I will look at this a bit more now and see if I can see any obvious culprits. |
Did you found anything unexpected? |
Hi @intelbg - I've not found the reason for this yet, but with further tests I have found an nginx out-of-memory event in the system messages. I will be trying to trace the cause of this, but from my preliminary checks It seems to be due to the large quantity of configs. |
But why this affects the system loads as these configs has not modsecurity enabled respectively they does not invoke modsecurity functions? |
I am having this issue currently. we have about 1500 vhosts. Works fine as long as I don't send any big traffic at it, but with a moderate 20requests per second or so this is what the log ends up looking like the following (I've fixed the PCRE limits exceeded issue, but still having it crash out
|
Hi @sethinsd - For the error you're seeing here:
You may need to configure the following settings:
To a value of your requirements, the reason this has a limit is to protect from DoS type attacks. |
ah yes I fixed that one already. But the other issue is persisting with it crashing under load |
@sethinsd how did you fixed it? Can you please share all of your experience? I am not sure why until now nobody has reported these problems as they are serious and now I hope as this number of people is increasing to find a solution. If I can provide another information that can be useful just let me know. |
I only fixed that one PCRE issue - the crashes and memory with fork problem persists and I cannot enable modsecurity in my production environment right now. I think it's likely related to both of us having so many host configuration files. I have about 1600 server {} declarations if not more. |
One thing we are noticing in our setup even with modsec disabled is the active connections climb to unreal levels over a few days until we restart nginx non gracefully. Like upwards of 60k when really we only have 100-200 active connections. So something is seriously wrong with the nginx compile. |
Any progress found? |
I've been looking in to the same or very similar issue with @met3or and one of the other guys in the office and we have some thoughts on the cause of the memory usage but could do with someone more familiar with the code base commenting on this as we could be way off the mark. We found here that every time a new vhost is parsed that this chunk of memory is malloc'd even if modsec isn't explicitly enabled for the domain. We've modified the source so that there's only a single shared instance of unicode_map_table. No idea yet if this is sensible but I'm trusting my colleague for now ;) Got to do a whole bunch more testing with it to see what explodes, but happy to provide a diff if others want to test in their dev environments. |
Very rough and not very well tested patch attached. |
What happened to the comment regarding :in /modsecurity/nginx/modsecurity/apr_bucket_nginx.c,at line 217 and 219 add cl->next = NULL; I did this and recompiled nginx and gave it a bunch of load last night to test and it seemed to hold up without the issue that was crashing it previously. This might be a true fix. |
Hi @sethinsd, maybe I should have created a new issue with my comment. I was trying to avoid creating unnecessary extra issues and this seemed relevant so apologies if I'm just confusing the matter. The patch I attached is aimed at fixing the memory usage with a large number of domains configured. Specifically in the example we're testing in a dev environment. With 1000 domains configured, even with 'modsecurity off' set, the resident memory usage after starting Nginx is in the region of 1.8GB and seems to increase roughly linearly with the number of configured domains/server{} blocks. This also impacts the restart/reload/configtest times of Nginx. Our goal is to see what happens when Nginx is configured with 10x-20x that number of domains. |
@Tiki-God I work in real production environment (shared hosting env), so can you please confirm to me that this is the right step I should follow to apply your patch to test it and will be this applied to some of the branches so I re-pull the branch?
If there is an mistake in my steps please provide me the right one to test and confirm it works. Thank you in advance! |
@intelbg I would not put this in a live environment yet as it hasn't been tested properly. We're still testing in our dev environment to see if we can trigger any odd behaviour. |
@Tiki-God what about the steps? Are they the right one? |
[root@XXXXX modsecurity]# patch rules.h < memory.patch |
@intelbg this is what I just did to check it's patching fine. After that I rebuild and install. git clone https://github.com/SpiderLabs/ModSecurity.git |
Just to report back again - I tried giving a server our full load last night, and the problem I was experiencing returned. Although, everything seemed to be working ok, I would get these alerts too frequently to feel comfortable. Probably some clients out there were just dropping connection and retrying. With ModSec off, this problem does not exist. And, this problem exists without any includes/rulesets turned on. |
@Tiki-God I receive the following errors when compiling libmodsecurity after applying your patch: rules.cc:35:13: error: ‘int* modsecurity::Rules::unicode_map_table’ is not a static member of ‘class modsecurity::Rules’ |
@intelbg It doesn't look like the patch has been applied to rules.h.
And the contents of the constructors (the stuff between the curly braces) on lines 50 and 58 should be deleted. |
Hi, Lets make sure that we are all talking about the same code base here. libModSecurity is under the v3/master branch and demands the ModSecurity-nginx connector. The nginx_refactoring branch is not version 3. At nginx_refactoring you can find some patches on top of v2.x but we don't recommend you to use it. Instead we suggest libModSecurity + ModSecurity-nginx project. For instance, the file apr_bucket_nginx.c is part of v2.x only while the file rules.h is v3.x. With that said, lets try to understand if the reported bug happens on v3, as we are suggesting the users to move forward it is likely that, if there is a bug, it won't be a fix for version 2.x The error message: @Tiki-God the memory you pointed out maybe an issue, but is unlikely to affect this specific bug because you found an issue in v3, while the bugs reported are v2-related [at least given the error messages]. So, let me ask you guys to make sure that you are running the version 3, as pointed out here:
once we are all in v3, we can continue debug the issue, if any. How that sounds? |
Ah that sounds great, apologies. I was compiling off 2.9.2. I'll pull down 3 and try soon and report back - my issue may not have been related to intelbg's because of these differences. |
Thanks @sethinsd, looking forward for your feedback ;) |
compiled with nginx (without patch provided above) and when trying to start nginx with modsecurity, it just failed to start. When I turned off the modsecurity on; directive, and tried to start nginx, system basically hung for quite a while. Eventually hit control c and enter, system didn't respond... after another min or two the control c finally came through, checked status and it was the same:
|
patch also failed similarly for me, so edited the rules.h manually removing the malloc stuff between curly braces in constructors, verified that it was declared static int, recompiled and installed, tried running without modsecurity on; and it's still stalling. previously I was running the old 2.9 configured with --enable-standalone-module which worked but core dumped occasionally. with v3 I can't even get it to start without it enabled, probably due to the 1300+ domain conf files I have. Eventually after running sudo nginx and waiting a few minutes it just says Killed |
Hi @sethinsd, How is your ModSecurity configuration? Do you have a global ModSecurity configuration for all hosts or each one loads its own ModSecurity configuration? |
I am with modsec V3 and I would like to confirm that until now there isn't a problem with memory leaks on the heavy servers with modsec disabled for the vhosts. I upgraded one more server with the patch and if there is no problem on it too I will active some vhosts to check the situation there. Thank you about the support! |
I found the following thing after implementing the patch only on the servers with the mod security and the patch. On nginx reload it dies, but the pid file exists. This is the history: activeshell@XXX1: Server 2 with nginx and modsec: activeshell@XXX2: And this is the result where the patch and the mod security at all does not exists: activeshell@working: |
@intelbg Could you attach your rules.h and rules.cc file? |
@zimmerle well, it was crashing even without modsecurity on; but I was attempting to put it in the http {} block, not any particular server or location. |
@Cloaked9000 I attach the two files: |
@intelbg Yeah, I can see why it's crashing. The changes have been made to rules.h but not rules.cc fully. The code in the rules.cc destructor should look like this: Rules::~Rules() {
} The patch statically defines On the topic of destructors, @sethinsd You might want to look at marking RulesProperties' destructor as virtual, because Rules inherits it. I've not looked at the code close enough to tell if this will make a practical difference, but I'd definitely make the change. Edit: Class Driver in parser/driver.h also inherits RulesProperties and defines a destructor. I'll attach my versions of rules.h and rules.cc, with the patch applied. |
@Cloaked9000 I removed the two lines in the block you provided and recompiled. Now it seems to work. Thank you really about the great support. If I find something other I will write here. Have a nice day! |
I can confirm that this patch fixes owasp-modsecurity/ModSecurity-nginx#67 ( Mod_sec v3) . Tested this on a server with around 6000 domains and nginx -t and nginx -s reload completes just fine. So without merging this v3 is definitely unusable on a server with more than 20-30 domains. |
@intelbg @AnoopAlias Just a heads up, neither me nor @Tiki-God are affiliated with Mod Security, we just had the same issue. If you're going to test the patch, do so with unicode URLs |
Hi, The initially investigation that pointed the unicode malloc as the guilt for the mass memory consumption was indeed true. The proposed solution, however, may not be the best approach. Every single location or { } block inside nginx uses a Rules instance, regardless having ModSecurity rules or not. That is normal and expected, however, allocating a huge block on that structure is doomed to failed. Keeping the Rules object light is the key for the success of that approach. Instead of removing the unicode support, the patch on the main tree suggest the utilization of a structure which is only occupies the memory when the configuration flag is used. In practice, for those whom are not using the unicode map, is the same as remove it. Changes are available here: 1ad9525 |
Just tested this using the official patch in v3/Master and things look ok. nginx reload and restart works fine |
Hello again and sorry for the inconvenience,
I am using libmodsecurity V3 (nginx_refactoring branch) + nginx connector with CRSv3 and when I put this binary of nginx in production with around 150 -200 vhosts, after I make even service nginx configtest (or restart) the server is on 150-200 load immediately (strace show that there are a lot of wait4 sys calls) and the process takes around 20-30 seconds. I tried to remove the CRS rules folder to see if the problem is in the number of lines/files or in the binary but no big difference. With binary without modsec everything is ok. Also, with the modsec nginx binary there is a memory allocation problems on restart and nginx does not want to start. Removed all limits on the operating systems including cgroups, ulimit, sysctl etc but no effect. If I back the binary without modsec everything is fine.
Is here anyone that faced such a problems and is there a chance to:
The machines are with 32GB of ram, 12 or 16 cpu cores and normal HDDs. On SSD it's a little bit better.
The text was updated successfully, but these errors were encountered: