io: Do not use multiple threads to write compressed files #731

Open
huddlej opened this issue Jun 1, 2021 · 0 comments
Labels
bug Something isn't working

Comments

huddlej (Contributor) commented Jun 1, 2021

Current behavior

The io.open_file function uses xopen as its backend to transparently support compressed inputs and outputs. By default, xopen writes some compressed file formats, such as gzip, through a separate process that uses multiple threads. When processing large files like the full GISAID SARS-CoV-2 database and writing them to a gzip file, xopen's subprocess (igzip) can easily use all available CPUs on a machine such as a laptop.
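To make "transparently support compressed inputs and outputs" concrete without depending on xopen, here is a minimal stdlib-only stand-in. The helper name open_by_suffix and its suffix table are hypothetical, and this is not how xopen works internally: Python's gzip, bz2 and lzma modules compress in the calling thread instead of handing the work to an external program such as igzip, which keeps compression on a single core, typically at the cost of throughput on large files.

```python
import bz2
import gzip
import lzma
from pathlib import Path

# Hypothetical suffix-to-opener table (illustration only, not part of augur
# or xopen). gzip, bz2 and lzma all compress inside the calling Python
# thread, so no external compressor process is started.
_OPENERS = {
    ".gz": gzip.open,
    ".bz2": bz2.open,
    ".xz": lzma.open,
}


def open_by_suffix(path, mode="rt"):
    """Open ``path``, picking a codec from its final suffix.

    Files without a recognized suffix are opened uncompressed with the
    built-in ``open``. Use text modes ("rt"/"wt") for str data and binary
    modes ("rb"/"wb") for bytes.
    """
    opener = _OPENERS.get(Path(path).suffix, open)
    return opener(path, mode)
```

For example, open_by_suffix("sequences.fasta.gz", "wt") returns a gzip-compressed text handle, while "sequences.fasta" is written as plain text.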

Expected behavior

Augur should use a single CPU per command unless the user requests more through an argument like --nthreads.
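One way to honor this convention is to expose the thread count as an opt-in command-line option whose default is 1. The sketch below uses argparse; build_parser and the --output option are hypothetical, and only the --nthreads name comes from the issue.

```python
import argparse


def build_parser():
    """Hypothetical parser for an Augur-style command that writes output.

    Only the --nthreads convention comes from the issue; the rest is
    invented for illustration.
    """
    parser = argparse.ArgumentParser(description="write compressed output")
    parser.add_argument(
        "--output",
        required=True,
        help="output path, e.g. sequences.fasta.gz",
    )
    parser.add_argument(
        "--nthreads",
        type=int,
        default=1,  # single CPU unless the user explicitly asks for more
        help="number of threads to use for compression (default: 1)",
    )
    return parser
```

The parsed args.nthreads value would then be passed to whatever opens the output file, for example as the threads keyword proposed under Possible solution.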

Possible solution

Add a threads keyword argument to io.open_file with a default value of 1 and pass this argument to the xopen function call.
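The proposed change might look like the following sketch. Only the threads keyword with a default of 1, forwarded to xopen, comes from the issue; the pass-through for already-open buffers, the lazy import, and the exact signature are assumptions for illustration and may differ from Augur's actual io.open_file. How xopen handles a particular threads value may vary between xopen releases, so check its documentation for the installed version.

```python
import os
from contextlib import contextmanager


@contextmanager
def open_file(path_or_buffer, mode="r", threads=1, **kwargs):
    """Open a path with xopen, or pass an already-open buffer through.

    Sketch of the proposed change: ``threads`` defaults to 1 so that
    compressed writes use a single CPU unless the caller asks for more.
    """
    if isinstance(path_or_buffer, (str, os.PathLike)):
        # Imported lazily so this sketch can be loaded without xopen installed.
        from xopen import xopen

        with xopen(os.fspath(path_or_buffer), mode, threads=threads, **kwargs) as handle:
            yield handle
    else:
        # Assumed: anything else is a file-like object managed by the caller.
        yield path_or_buffer
```

Callers that want parallel compression could then opt in explicitly, e.g. open_file(path, "w", threads=4), while every other call site keeps the single-threaded default.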

huddlej added the bug label Jun 1, 2021
huddlej changed the title from "Do not use multiple threads to write compressed files" to "io: Do not use multiple threads to write compressed files" Jun 4, 2021
huddlej added a commit to nextstrain/ncov that referenced this issue Jun 4, 2021
As described in a separate Augur issue [1], xopen will attempt to call a
subprocess with multiple threads to write compressed output, depending
on the compression format. To avoid this issue during the ncov workflow,
we explicitly request 1 thread while writing out sanitized sequences.

[1] nextstrain/augur#731