Demultiplexer may cause `OSError: [Errno 24] Too many open files` on large sample runs #320
On runs with a large sample count (or possibly smaller runs, if the system itself already has a large number of files open at the time of analysis), any pipeline using the
This was brought up by a user on the QIIME 2 forum and appears to be identical to the issue that the initial
The way this was solved in QIIME 2's demultiplexer was by randomly closing X% of the open sample files to keep the count below the system limit. I would gladly lend a hand by adding a patch and opening a pull request into
This issue pertains to any version of
Good point, thanks for noticing. I’d of course welcome a patch. One problem, though, is that the
I’ve started to rewrite the FASTA/FASTQ parsers and writers (code that was previously in the seqio module) to make them more efficient, and have moved them to a separate project (https://github.com/marcelm/dnaio/). You could either submit a PR there as well to add an append option, or wait for me to do this when I’m back at work around the middle of August.
I’m wondering how to solve this. Closing some of the files and re-opening them when necessary sounds like it should work, but then I wonder how to actually do this when working with compressed output. For gzip files, we would get a multipart gzip in the end. It’s probably not easily possible to just close the underlying file descriptor and re-attach a new one later.
However, one thought just crossed my mind: If the problematic limit is a soft limit, we could just raise it ourselves with
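On Unix, the soft limit can be raised from within the process using the standard-library `resource` module; raising the soft limit up to the hard limit requires no special privileges. A hedged sketch (the function name and the `target` default are my own):

```python
import resource  # Unix-only standard-library module


def raise_open_file_limit(target=4096):
    """Raise the soft RLIMIT_NOFILE toward `target`, capped at the hard limit.

    A process may freely raise its soft limit up to the hard limit;
    only lowering the hard limit is irreversible without privileges.
    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft < target:
        if hard == resource.RLIM_INFINITY:
            new_soft = target
        else:
            new_soft = min(target, hard)
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

This only helps up to the hard limit, of course, so a file-handle pool (or append-mode reopening) would still be needed as a fallback for truly huge sample counts.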