New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
auditd(8) doesn't reap children until startup is complete #35
Comments
There are several solutions to this problem, but they all have drawbacks. Would the openbsm maintainers please indicate their preferences?
|
I prefer to avoid |
auditd(8) invokes /var/security/audit_warn every time it expires an audit trail. During startup, it can expire thousands. It doesn't reap any of these children until after startup is complete. That can reach the kern.maxproc process limit, DOSing the system. Fix this bug by setting SA_NOCLDWAIT for the SIGCHLD handler, so the system won't create any zombie processes in the first place. The downside is that syslog(3) will no longer log an error if the /etc/security/audit_warn script fails. Fixes openbsm#35
OK so I've had the time to catch up on this thread. The way I see it, serializing the execution of I am pretty sure though Or possibly add: if (sigchlds != sigchlds_handled) {
sigchlds_handled = sigchlds;
auditd_reap_children();
} To after the fork/exec? I would also be OK with option 4, too. Thoughts? |
We can't do all of the reaping from the signal handler, because syslog(3) does lots of stuff and I doubt that it's all async-signal-safe. How about a compromise approach that won't allow the number of children to grow too large, but won't let slow children block auditd as long as the number of children is small? I'm thinking something like this after the fork/exec: if (sigchlds - sigchlds_handled > 100) {
sigchlds_handled = sigchlds;
auditd_reap_children();
} It's still simpler than option 3. |
I think thats probably fine though, I am not sure I like how the process number is hardcoded since on some systems it might be fine. For discussion purposes I have pushed #40 (do not merge, compile tested only!) But in this version if we can get the limits/current process counts and only trigger the rotation if we are using more than %10 of our available processes. Totally might be overkill but I thought I would push this up to see what people think. Thoughts? |
In the event we have thousands of audit trails to expuire, we go through and expire each one, executing the audit_warn script. We do not reap these children until we enter the main event loop. This change adds a reap check right after the fork/exec operations, if we have over 100 child processes that are waiting to be reaped, do it. This hopefully fixes a denial of service condition when starting up with many audit trail files. In collaboration with: asomers Issue: #35
On startup, auditd(8) expires any audit trails that satisfy the criteria for expiration. It logs each expiration not using syslog(3), but by executing the script /etc/security/audit_warn. However, it does not attempt to reap any of its children until it enters the main loop. This is a problem if there are thousands of audit trail files, as it can quickly hit kern.maxproc. That can easily lock up the system if the admin cannot kill auditd. I've seen it happen 3 times (I have a lot of audit trail files because I'm working on the audit(4) test suite).
A simple solution would be to use system(3) instead of fork(2)/execve(2) to execute /etc/security/audit_warn. However, this would cause a slight delay, which could conceivably result in audit records being dropped, especially if /etc/security/audit_warn does something slow, like pipe over the network. A subtler solution would be to call auditd_reap_children in the loop in auditd_expire_trails, if sigchlds gets too high. Or, have a "startup mode", which uses system(3) to fix the kern.maxproc problem, but fork/execve for lower latency after startup is complete.
Downstream bug: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229380
The text was updated successfully, but these errors were encountered: