"Too many open files" from clients running OS X #3984
Comments
OSX: 10.12.6. +1 on this error, seeing this here as well. This is really not good, because the TLS server shows these hosts as online but they're not actually logging any data to it.
|
It's interesting: if you Google around, a handful of projects have been having issues with FSEvents somewhat recently. Examples:
I just checked with my co-worker who is experiencing this issue, and our ulimit settings are somehow different.
His (broken osquery):
Mine (Working):
Not sure how that discrepancy happened since we're on the same OS version :-/ |
Full stack trace:
|
@clong Perhaps your coworker did an upgrade and you a clean install (to explain the different ulimit). Ulimit on my machine for FDs is 10240. |
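For anyone comparing machines as in the ulimit discussion above, here is a minimal sketch that reads the soft and hard descriptor limits and counts the descriptors currently open. It inspects the Python process running it, not osqueryd, so it only reflects the limits inherited from the environment it is launched in. The function names `fd_limits` and `open_fd_count` are made up for this example.

```python
import os
import resource


def fd_limits():
    """Return the (soft, hard) RLIMIT_NOFILE pair for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count():
    """Count descriptors open in this process by listing /dev/fd.

    The listing itself briefly opens one descriptor, so the result can be
    one higher than the steady-state count.
    """
    return len(os.listdir("/dev/fd"))


if __name__ == "__main__":
    soft, hard = fd_limits()
    print(f"soft={soft} hard={hard} open={open_fd_count()}")
```

A soft limit of 256 alongside a descriptor count close to that number would be consistent with the "Too many open files" errors reported in this thread.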
|
Can you also provide the output of the following command? |
|
Thanks @zbuc Are you able to see message "kqueue: Too many open files" with the same instance of osqueryd for which you executed "lsof" ? |
Yes
|
Figured out a workaround for this. Ultimately, we were seeing this happen on hosts with a low value for open files according to
The value of
|
Based on the evidence provided by @zbuc, I have come up with the following analysis. The osqueryd limit on files that can be opened, as provided by @zbuc:
300_spartans.c
At first I was getting the following message - I believe that in @zbuc's case the system was under stress, and that caused the problem. The system-wide limit needs to be increased as well. Further finding: the change of the above message from "kqueue:Too many open files in system" to "kqueue:Too many open files" got me curious, and I believe this is a bug. It led me to -
I believe these two if statements, 'if ( (error = fdalloc(p, 0, &nfd)) )' and 'if (nfiles >= maxfiles)', need to be swapped. |
Getting hit with this as well, let me know if there is anything I can provide. |
The change of message from "kqueue:Too many open files in system" to "kqueue:Too many open files" is a manifestation of a bug in the form of a resource leak (a slot in the process's file descriptor table is being leaked here). If the "(nfiles >= maxfiles)" check fails the call, the code does not un-reserve the slot it had already reserved. That is the resource leak. |
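The leak described above can be illustrated with a toy model. This is not XNU code: `ToyKernel`, `kqueue_leaky`, and `kqueue_fixed` are hypothetical names, and only the ordering of the two checks (per-process slot reservation, then the system-wide `nfiles >= maxfiles` test) is taken from the comments in this thread. The sketch assumes the system-wide table is already full.

```python
import errno


class ToyKernel:
    """Toy model of per-process and system-wide descriptor accounting."""

    def __init__(self, proc_slots, maxfiles, nfiles):
        self.free_slots = proc_slots  # free slots in the process fd table
        self.maxfiles = maxfiles      # system-wide limit on open files
        self.nfiles = nfiles          # files currently open system-wide

    def kqueue_leaky(self):
        # Step 1: reserve a per-process slot (stands in for fdalloc).
        if self.free_slots == 0:
            return errno.EMFILE
        self.free_slots -= 1
        # Step 2: system-wide check. On failure the slot is NOT released.
        if self.nfiles >= self.maxfiles:
            return errno.ENFILE
        self.nfiles += 1
        return 0

    def kqueue_fixed(self):
        # System-wide check first, so nothing is reserved on failure.
        if self.nfiles >= self.maxfiles:
            return errno.ENFILE
        if self.free_slots == 0:
            return errno.EMFILE
        self.free_slots -= 1
        self.nfiles += 1
        return 0


def run(method_name, attempts=5):
    """Call one variant repeatedly against a system that is already full."""
    kernel = ToyKernel(proc_slots=3, maxfiles=100, nfiles=100)
    results = [getattr(kernel, method_name)() for _ in range(attempts)]
    return results, kernel.free_slots


if __name__ == "__main__":
    for name in ("kqueue_leaky", "kqueue_fixed"):
        results, free_slots = run(name)
        codes = [errno.errorcode[r] for r in results]
        print(name, codes, "free slots left:", free_slots)
```

In the leaky ordering, each failed call consumes one process slot, so the error flips from ENFILE ("Too many open files in system") to EMFILE ("Too many open files") once the process table is exhausted. With the checks swapped, the error stays ENFILE and no slots are lost.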
@uptycs-nishant, if I understand correctly, then you should report that bug to Apple's Radar bug tracker. |
We have seen an error similar to this on at least one of our hosts. The environment: macOS 10.12.6, osquery version 2.10.0.
Output of ulimit -a:
And another mac, 10.12.6, osquery 2.10.0:
|
Opened a bug with Apple - |
I see a lot of folks with 256 as the soft limit for open descriptors. We can change this limit to be the max of about 10k when the program starts. This should handle the RocksDB cases. For those having issues I can provide some test binaries if you’d like. Just +1 this comment or DM me in Slack. There’s another potential issue, alongside the FSEvents bug, where we could potentially subscribe to 10k+ locations for FIM. We should add some protection and alarming logic around this. |
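As a rough illustration of raising the soft limit at program start, the sketch below uses Python's `resource` module. It is not the osquery change itself. The target of 10240 is an assumption standing in for "about 10k" (it matches the value reported earlier in this thread), and `raise_nofile_soft_limit` is a name invented for this example.

```python
import resource

# Assumption: "about 10k" is taken to mean 10240, the value seen earlier
# in this thread. The real target may differ.
TARGET_NOFILE = 10240


def raise_nofile_soft_limit(target=TARGET_NOFILE):
    """Raise the soft RLIMIT_NOFILE toward target, capped at the hard limit.

    Returns the soft limit in effect afterwards. The limit is never lowered.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # An unlimited hard limit places no cap on the target.
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if soft != resource.RLIM_INFINITY and soft < new_soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        except (ValueError, OSError):
            # The OS may reject the request; keep the existing limit.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print("soft limit now:", raise_nofile_soft_limit())
```

Raising only the soft limit needs no extra privileges as long as the new value stays at or below the hard limit, which is why the sketch caps the target rather than touching the hard value.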
On the RADAR now - http://www.openradar.me/36148377 |
2.11.2 completely resolved these issues for us.
I'm seeing this happening widely across our fleet of OS X machines with osquery deployed:
Invocation of osqueryd:
/usr/local/bin/osqueryd --database_path=/usr/local/zentral/osquery/db --tls_hostname=censored --enroll_tls_endpoint=/osquery/enroll --enroll_secret_path=/usr/local/zentral/osquery/enroll_secret.txt --config_plugin=tls --config_tls_endpoint=/osquery/config --config_tls_refresh 120 --logger_plugin=aws_kinesis,aws_firehose --logger_tls_endpoint=/osquery/log --logger_tls_period 60 --disable_distributed=false --distributed_plugin=tls --distributed_tls_read_endpoint=/osquery/distributed/read --distributed_tls_write_endpoint=/osquery/distributed/write --distributed_interval 60 --tls_server_certs=/usr/local/zentral/tls_server_certs.crt --aws_kinesis_stream=censored-endpoints_stream_alert_kinesis --aws_firehose_stream=censored-endpoints_stream_alert_firehose --aws_access_key_id=censored --aws_secret_access_key=censored --aws_region=censored
Versions:
OS X 10.13.1
Please let me know if there's any other information I can include to assist in debugging.