
Demultiplexer may cause `OSError: [Errno 24] Too many open files` on large sample runs #320

Open · jakereps opened this issue Jul 22, 2018 · 4 comments

@jakereps

commented Jul 22, 2018

On runs with a large number of samples (or smaller runs, if the system itself has many files open at the time of analysis), any pipeline using the Demultiplexer class may run into an operating system error due to having too many open file descriptors.

This was brought up by a user on the QIIME 2 forum and appears to be identical to an issue that the initial qiime2/q2-demux implementation faced, which was fixed here.

The way it was solved in QIIME 2's demultiplexer was by randomly closing X% of the open sample files to keep the count below the system limit. I would gladly lend a hand by adding a patch and opening a pull request against cutadapt if you would like.
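
For reference, here is a rough sketch of that approach (hypothetical names, not the actual q2-demux or cutadapt code): keep a pool of per-sample writers, close a random fraction of them whenever the pool grows too large, and re-open in append mode when a sample comes up again.

```python
import random

class WriterPool:
    """Sketch: cap the number of simultaneously open per-sample output files."""

    def __init__(self, max_open=500, close_fraction=0.25):
        self.max_open = max_open
        self.close_fraction = close_fraction
        self.open_writers = {}  # sample name -> open file handle

    def get(self, sample, path):
        writer = self.open_writers.get(sample)
        if writer is None:
            if len(self.open_writers) >= self.max_open:
                self._close_some()
            # 'a' so that a previously closed file is not truncated
            writer = open(path, "a")
            self.open_writers[sample] = writer
        return writer

    def _close_some(self):
        # Close a random fraction of the currently open writers
        n = max(1, int(len(self.open_writers) * self.close_fraction))
        for sample in random.sample(list(self.open_writers), n):
            self.open_writers.pop(sample).close()

    def close_all(self):
        for writer in self.open_writers.values():
            writer.close()
        self.open_writers.clear()
```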

This issue pertains to any version of cutadapt that uses these lines.

@marcelm

Owner

commented Jul 24, 2018

Good point, thanks for noticing. I’d of course be happy about a patch. One problem though is that the seqio.open function doesn’t support mode='a' at the moment, so that will also need to be implemented.
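
To illustrate why append mode is the missing piece, here is a plain-Python sketch (standard library only, not the seqio/dnaio API): re-opening an output with mode 'w' would truncate the records already written for that sample, while 'a' preserves them, even for gzip output.

```python
import gzip

path = "sample1.fastq.gz"  # hypothetical demultiplexed output file

# First time the file is opened for this sample
with gzip.open(path, "wt") as f:
    f.write("@read1\nACGT\n+\nIIII\n")

# Re-opened later in append mode; this adds a new gzip member
with gzip.open(path, "at") as f:
    f.write("@read2\nTTGA\n+\nIIII\n")

# Python's gzip reader handles the resulting multi-member file transparently
with gzip.open(path, "rt") as f:
    print(f.read())
```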

I’ve started to rewrite the FASTA/FASTQ parsers and writers (code that was previously in the seqio module) to make them more efficient and have moved them to a separate project (https://github.com/marcelm/dnaio/). Either you can submit a PR there as well to add an append option, or you can wait for me to do it when I’m back at work around the middle of August.

@marcelm added the bug label Aug 30, 2018

@marcelm

Owner

commented Sep 2, 2019

I’m wondering how to solve this. Closing some of the files and re-opening them when necessary sounds like it should work, but then I wonder how to actually do this when working with compressed output. For gzip files, we would get a multipart gzip in the end. It’s probably not easily possible to just close the underlying file descriptor and re-attach a new one later.

However, one thought just crossed my mind: If the problematic limit is a soft limit, we could just raise it ourselves with resource.setrlimit(). Then the problem would only arise when we hit the hard limit.
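
A minimal sketch of that idea using only the standard library (not cutadapt code): raise the soft limit on open file descriptors up to the hard limit, which does not require elevated privileges.

```python
import resource

# Raise the soft RLIMIT_NOFILE (max open file descriptors) to the hard limit.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
if soft < hard:
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```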

marcelm added a commit that referenced this issue Sep 4, 2019
Circumvent soft limit on the number of open files when demultiplexing
When opening a file during demultiplexing fails because of too many open
files, we simply raise the soft limit and re-try. This should reduce the
number of times the user sees a "Too many open files" error, but it does not
help of course if the hard limit is too low.

See #320
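
A sketch of the pattern described in the commit message (hypothetical helper, not the actual cutadapt code): catch the EMFILE error when opening an output file, bump the soft limit towards the hard limit, and retry once.

```python
import errno
import resource

def open_output(path):
    """Open a demultiplexing output file, raising the soft fd limit on EMFILE."""
    try:
        return open(path, "w")
    except OSError as e:
        if e.errno != errno.EMFILE:  # only handle "Too many open files"
            raise
        soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        resource.setrlimit(resource.RLIMIT_NOFILE, (min(soft * 2, hard), hard))
        return open(path, "w")  # may still fail if the hard limit is reached
```
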
@marcelm

Owner

commented Sep 4, 2019

I’ve now changed the demultiplexer such that it will raise the soft limit if it gets the "Too many open files" error.

@fasterius


commented Sep 4, 2019

Okay, cool, thanks for fixing it!
