
On some shell script that start java cannot open shared object file: Too many open files #435

Closed
yodatak opened this issue Dec 6, 2018 · 10 comments

@yodatak commented Dec 6, 2018

Have you tried the latest master version from Git? Yes, and 0.2.3.
OS: CentOS 7.3 and RHEL 7.3
Python version: 2.7

When using a shell script that starts a Java process, I get errors like this:
cannot open shared object file: Too many open files

I tried running the tasks with:

```yaml
- name: Arret du node  # "Stop the node"
  vars:
    mitogen_task_isolation: fork
  become: true
  become_user: "{{ user }}"
  shell: XXXXXXX
```

But this doesn't fix the issue. Is there a solution for this?

@yodatak yodatak changed the title On some shell script cannot open shared object file: Too many open files On some shell script that start java cannot open shared object file: Too many open files Dec 6, 2018
dw added a commit that referenced this issue Dec 6, 2018
This is a temporary solution at best.
@dw (Member) commented Dec 6, 2018

This is due to the performance fix in https://github.com/dw/mitogen/issues/362. Clearly 512 file descriptors is far too low; I had not considered this.

The problem is that Red Hat ships with a default file descriptor limit of 1,048,576, and Python 2 iterates the entire possible file descriptor space when starting subprocesses. The result is that a single child process takes >500 ms to start, which becomes extremely expensive when called in a loop.

As an initial fix, simply increasing this to a higher number might be a good start. 4096 might be a good option: it is still 256x less work than the Red Hat default (i.e. 2.4 ms vs. 600 ms from issue #362), while still 8x higher than the previous default.
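A back-of-the-envelope sketch of that arithmetic (illustrative only, not Mitogen's actual code): the Python 2 brute-force cleanup loop attempts one close() per possible descriptor, so the work scales linearly with the soft fd limit.

```python
def close_attempts(maxfd):
    """Number of close() calls Python 2's brute-force subprocess
    cleanup loop would make, closing fds 3 .. maxfd-1."""
    return max(maxfd - 3, 0)

redhat_default = 1048576  # default RLIMIT_NOFILE on Red Hat
proposed_cap = 4096       # the new cap
previous_cap = 512        # the old cap from issue #362

print(close_attempts(redhat_default))   # 1048573 close() attempts per child
print(close_attempts(redhat_default) // close_attempts(proposed_cap))  # 256
print(proposed_cap // previous_cap)     # 8
```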

I have increased the limit on Git master to 4096 -- please let me know if this is still too low.

The real problem is that the child process is inheriting a non-default descriptor limit from Ansible/Mitogen, and that is a behavioural difference in Mitogen that must be fixed properly. There are at least two options available, but neither is very appealing. I will think about a better permanent solution.
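For what it's worth, the inheritance described above is easy to demonstrate: rlimits survive fork/exec, so a child started while the parent holds a lowered RLIMIT_NOFILE sees that lowered value. A small illustrative check (Python 3, Unix only):

```python
import resource
import subprocess
import sys

# The parent's soft RLIMIT_NOFILE...
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

# ...is inherited verbatim by any child it spawns.
child = subprocess.run(
    [sys.executable, '-c',
     'import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])'],
    capture_output=True, text=True, check=True)

print(int(child.stdout) == soft)  # True: the child sees the parent's limit
```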

Thanks for reporting this!

@jbeisser commented Dec 7, 2018

As a workaround, I've been invoking the client environment, and avoiding become where possible:

```
shell: sh -i -c "XXXXX"
```
@dw (Member) commented Dec 7, 2018

The subprocess module in Python >= 3.2 requires no help at all: https://bugs.python.org/issue21618

So it is Python <= 2.7 that requires help.

The approach I am mulling over is monkey-patching 'subprocess.Popen.close_fds' to work the same as in 3.x ( https://github.com/python/cpython/blob/master/Modules/_posixsubprocess.c#L317 ) -- by iterating /proc/self/fd.
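A minimal sketch of that idea (illustrative only, not the actual Mitogen patch): enumerate only the descriptors that really exist via /proc/self/fd, with a probe-based fallback for platforms without /proc.

```python
import os

def open_fds():
    """Return this process's open descriptors. On Linux, read
    /proc/self/fd so only live fds are visited; elsewhere, fall back
    to probing a fixed range with fstat()."""
    fd_dir = '/proc/self/fd'
    if os.path.isdir(fd_dir):
        # Note: the listing includes the transient fd that listdir
        # itself opened to read the directory.
        return sorted(int(name) for name in os.listdir(fd_dir))
    fds = []
    for fd in range(1024):
        try:
            os.fstat(fd)
            fds.append(fd)
        except OSError:
            pass
    return fds

def close_all_fds(keep=(0, 1, 2)):
    """Close every descriptor except those in `keep`, visiting only
    fds that exist rather than the whole RLIMIT_NOFILE range."""
    for fd in open_fds():
        if fd not in keep:
            try:
                os.close(fd)
            except OSError:
                pass  # already gone, e.g. listdir's own transient fd
```

The win is that the loop is bounded by the number of *open* descriptors, not by RLIMIT_NOFILE, which is exactly what makes the CPython 3.x version fast on Red Hat's 1,048,576-fd default.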

As monkey-patches go, it is pretty safe, but really it would be nice to avoid one entirely. So far the choice seems to be between a slow Ansible on Red Hat and an annoying, but nonetheless safe, monkey-patch.

@jbeisser commented Dec 7, 2018

I'm running Python 2.7.13 locally, with 2.6.6 on the remote CentOS 6 systems. If there's an easy way to increase the FDs, or to have the execution honor the system ulimit settings, I'm all ears. I'd rather not set mitogen_task_isolation per module, to keep the playbooks fairly generic.

@dw (Member) commented Dec 7, 2018

@jbeisser the relevant code block appears on line 94 of ansible_mitogen/target.py. You can literally just delete that block and things will work as you desire. But per the original problem, this creates huge latency on default Red Hat installs.
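For readers following along, the kind of block being discussed looks roughly like this (a hypothetical reconstruction; the real code lives in ansible_mitogen/target.py, and the names here are invented):

```python
import resource

CAP = 4096  # the descriptor cap discussed above

def capped_soft_limit(soft, cap=CAP):
    """The soft RLIMIT_NOFILE value after applying the cap."""
    return min(soft, cap)

def apply_nofile_cap():
    """Lower the soft fd limit to CAP if it is currently higher.
    Deleting a block like this restores the inherited system limit,
    at the cost of slow Popen() on Python 2 with huge limits."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = capped_soft_limit(soft)
    if new_soft != soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
```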

dw added a commit that referenced this issue Dec 8, 2018
This replaces the previous method of capping the fd limit to work around poor Popen() performance, instead monkey-patching the problem function entirely rather than simply working around it.
dw added a commit that referenced this issue Dec 8, 2018
@dw (Member) commented Dec 8, 2018

I've checked in a hopefully permanent fix on master. Would it be possible for you both to re-test?

Thanks again for reporting!

@jbeisser commented Dec 9, 2018

I'll pull down from master to run a test. Checking the user account I need to run against, the `ulimit -n` output says 64k FDs, which seems pretty reasonable.

@yodatak (Author) commented Dec 10, 2018

Thanks! For me, it fixes the problems I had!

@dw (Member) commented Dec 10, 2018

This is now on the master branch and will make it into the next release. To be notified when a new release is made, subscribe to https://networkgenomics.com/mail/mitogen-announce/

Thanks for the confirmation and thanks again for reporting this!

@dw dw closed this Dec 10, 2018
@jbeisser commented Dec 10, 2018

Yep, ran a quick test against master. Works!
