auditd(8) doesn't reap children until startup is complete #35

asomers · 2018-06-28T15:13:11Z

On startup, auditd(8) expires any audit trails that satisfy the criteria for expiration. It logs each expiration not using syslog(3), but by executing the script /etc/security/audit_warn. However, it does not attempt to reap any of its children until it enters the main loop. This is a problem if there are thousands of audit trail files, as it can quickly hit kern.maxproc. That can easily lock up the system if the admin cannot kill auditd. I've seen it happen 3 times (I have a lot of audit trail files because I'm working on the audit(4) test suite).

A simple solution would be to use system(3) instead of fork(2)/execve(2) to execute /etc/security/audit_warn. However, this would cause a slight delay, which could conceivably result in audit records being dropped, especially if /etc/security/audit_warn does something slow, like pipe over the network. A subtler solution would be to call auditd_reap_children in the loop in auditd_expire_trails, if sigchlds gets too high. Or, have a "startup mode", which uses system(3) to fix the kern.maxproc problem, but fork/execve for lower latency after startup is complete.

Downstream bug: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229380

asomers · 2018-06-28T15:46:37Z

There are several solutions to this problem, but they all have drawbacks. Would the openbsm maintainers please indicate their preferences?

Register the SIGCHLD handler so it autoreaps child processes. This is the simplest solution and will remove all the reaping code. The downside is that auditd won't log a warning if /etc/security/audit_warn fails.
Run /etc/security/audit_warn synchronously with system(3). This is also quite simple. The downside is that a slow audit_warn script can delay auditd(8), potentially causing audit events to be dropped.
Check the value of sigchlds in auditd_expire_trails. Don't let it get too large without reaping children. The only downside to this approach is that it's the most complicated.
Run audit_warn synchronously during startup, but asynchronously after startup. This approach is also complicated. It could also block auditd if audit_warn hangs during startup.

cemeyer · 2018-06-28T18:22:28Z

I prefer to avoid system(3) for this. 3/4 look fine. Not sure if 1 is ok.

auditd(8) invokes /var/security/audit_warn every time it expires an audit trail. During startup, it can expire thousands. It doesn't reap any of these children until after startup is complete. That can reach the kern.maxproc process limit, DOSing the system. Fix this bug by setting SA_NOCLDWAIT for the SIGCHLD handler, so the system won't create any zombie processes in the first place. The downside is that syslog(3) will no longer log an error if the /etc/security/audit_warn script fails. Fixes openbsm#35

csjayp · 2018-07-25T02:12:42Z

OK so I've had the time to catch up on this thread. The way I see it, serializing the execution of audit_warn (i.e. just calling a waitpid() loop right after the fork/exec) could work but may not be desirable in some cases. When I look at the Darwin code, they reap the children from their version of the signal handler (which is a bit difference since I dont think it runs on the signal stack because signal delivery piggy backs on mach).

I am pretty sure though waitpid(2) is required to be async/signal safe by POSIX.1. Another option: I wonder if we should call audit_reap_children() directly from the signal handler on receiving SIGCHLD (similar in behaviour to how it's handled on Darwin). I don't think that we should commit the current patch as it removes the collection and reporting of the exit status. If nobody else wants to write I can try to patch this within the next couple of days if nobody else beats me to it.

Or possibly add:

 		if (sigchlds != sigchlds_handled) {
			sigchlds_handled = sigchlds;
			auditd_reap_children();
		}

To after the fork/exec?

I would also be OK with option 4, too.

Thoughts?

asomers · 2018-07-25T13:59:38Z

We can't do all of the reaping from the signal handler, because syslog(3) does lots of stuff and I doubt that it's all async-signal-safe. How about a compromise approach that won't allow the number of children to grow too large, but won't let slow children block auditd as long as the number of children is small? I'm thinking something like this after the fork/exec:

if (sigchlds - sigchlds_handled > 100) {
    sigchlds_handled = sigchlds;
    auditd_reap_children();
}

It's still simpler than option 3.

- Compile tested only for discussion

csjayp · 2018-07-26T18:10:05Z

I think thats probably fine though, I am not sure I like how the process number is hardcoded since on some systems it might be fine. For discussion purposes I have pushed #40 (do not merge, compile tested only!) But in this version if we can get the limits/current process counts and only trigger the rotation if we are using more than %10 of our available processes.

Totally might be overkill but I thought I would push this up to see what people think.

Thoughts?

In the event we have thousands of audit trails to expuire, we go through and expire each one, executing the audit_warn script. We do not reap these children until we enter the main event loop. This change adds a reap check right after the fork/exec operations, if we have over 100 child processes that are waiting to be reaped, do it. This hopefully fixes a denial of service condition when starting up with many audit trail files. In collaboration with: asomers Issue: #35

csjayp · 2018-07-27T21:56:11Z

Merged in #41

Thanks for reporting, collaborating and testing @asomers !

asomers mentioned this issue Jul 10, 2018

Don't overflow kern.maxproc during startup #37

Closed

csjayp added a commit that referenced this issue Jul 26, 2018

Check in proposed fix for issue #35

bc3b5da

- Compile tested only for discussion

csjayp mentioned this issue Jul 27, 2018

Reap children before entering main event loop #41

Merged

csjayp closed this as completed Jul 27, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

auditd(8) doesn't reap children until startup is complete #35

auditd(8) doesn't reap children until startup is complete #35

asomers commented Jun 28, 2018

asomers commented Jun 28, 2018

cemeyer commented Jun 28, 2018

csjayp commented Jul 25, 2018 •

edited

asomers commented Jul 25, 2018

csjayp commented Jul 26, 2018 •

edited

csjayp commented Jul 27, 2018

auditd(8) doesn't reap children until startup is complete #35

auditd(8) doesn't reap children until startup is complete #35

Comments

asomers commented Jun 28, 2018

asomers commented Jun 28, 2018

cemeyer commented Jun 28, 2018

csjayp commented Jul 25, 2018 • edited

asomers commented Jul 25, 2018

csjayp commented Jul 26, 2018 • edited

csjayp commented Jul 27, 2018

csjayp commented Jul 25, 2018 •

edited

csjayp commented Jul 26, 2018 •

edited