Q: Is there a way to control threads used by pigz? #290

Open
sklages opened this issue Feb 13, 2018 · 4 comments

sklages commented Feb 13, 2018

I'd like to have control over the number of threads used by pigz without modifying the source code.
It seems that all cores are used for (de)compression even when I specify --cores?

That's not a very "social" approach when several users are sharing a single server ... 😉

marcelm (Owner) commented Feb 13, 2018

Yes, the --cores option isn’t passed on to pigz. I thought about this, but I’m not certain whether I really want to do that. I cannot control how many cores are used anyway. Even now, --cores=x doesn’t give you that much control over how many CPU resources are used since that only sets the number of worker threads. There is the main thread and an additional thread that writes data; there’s a subprocess that decompresses gzipped input (or two for paired-end data); and then there can be two output pigz processes. Not all of the threads are busy 100% of the time, so ensuring that all provided cores are used is difficult.
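If you want to see this for yourself, something like the following lists all threads on the system and filters for the relevant ones (a sketch; depending on how cutadapt was started, the processes may show up under python instead):

ps -eLf | grep cutadapt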

I would probably just use nice to start the process.
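For example, something like this should lower the scheduling priority of cutadapt and of everything it spawns, including the pigz subprocesses (a sketch; niceness is inherited by child processes, and the cutadapt arguments are placeholders):

nice -n 19 cutadapt ...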

You can also restrict a process to run on a certain set of CPU cores with taskset. Example:

taskset -c 0-3 cutadapt ...

This would run cutadapt on the first four cores (cores 0 to 3). Note that if a second user does the same and enforces use of cores 0-3, then they would share these cores, so this is not quite the same as requiring that a process is limited to 400% CPU.

If you want to be thorough, you could also configure your server so that one user cannot take CPU resources from another. I have no idea how to do that, but I know it’s possible because our cluster is configured this way. It probably involves using cgroups.
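Something along these lines might work for a single command on a systemd-based system (an untested sketch; systemd-run places the command into its own cgroup, and CPUQuota=400% would cap it at the equivalent of four cores):

systemd-run --user --scope -p CPUQuota=400% cutadapt ...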

marcelm (Owner) commented Apr 30, 2018

No feedback, closing. Feel free to re-open if you have further thoughts.

marcelm closed this Apr 30, 2018

Tjitskedv commented Jul 30, 2018

I just started to use cutadapt with the --cores option and was a bit negatively surprised that the process took far more than the 8 cores I had set. I think it would be better to pass the maximum number of cores on to pigz, and in the case of paired-end data to divide the set number of cores by two and pass that to pigz (see the sketch below). Most often it isn't a problem if you use one additional thread, but using 20 instead of 8 might be a problem and is, in my opinion, not very social towards other users of the same server. I use cutadapt in combination with Snakemake, so taskset isn't a nice thing to build into a pipeline. I hope you will take this into consideration.
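For reference, pigz itself already accepts a thread limit via its -p option, so there is a flag the value could be passed on to (a sketch; the file name is a placeholder):

pigz -p 4 -c reads.fastq > reads.fastq.gz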

marcelm (Owner) commented Aug 16, 2018

Thanks for the feedback. I agree that cutadapt shouldn’t use a much greater number of cores than specified with -j. Are you sure the pigz subprocess was really the problem? How did you measure CPU usage?
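For example, the standalone GNU time (/usr/bin/time, not the shell builtin) reports the percentage of CPU that a process and its waited-for children received (a sketch; the cutadapt arguments are placeholders):

/usr/bin/time -v cutadapt -j 8 ...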

marcelm reopened this Aug 16, 2018
