GitHub - stevegrubb/tasker: A program that parallelizes tasks from a list on stdin

stevegrubb / tasker Public

Notifications You must be signed in to change notification settings
Fork 0
Star 1

A program that parallelizes tasks from a list on stdin

Notifications

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
.gitignore		.gitignore
AUTHORS		AUTHORS
ChangeLog		ChangeLog
LICENSE		LICENSE
Makefile.am		Makefile.am
NEWS		NEWS
README		README
autogen.sh		autogen.sh
configure.ac		configure.ac
tasker.1		tasker.1
tasker.c		tasker.c
tasker.spec		tasker.spec

Repository files navigation

TASKER

BUILDING
========
To build the project run:

./autogen.sh
./configure
make
make install


BACKGROUND
==========
Have you noticed that we have CPUs with 16 cores and nothing takes advantage
of it? Take a look at htop. What do you see? A few processes linearly doing
things. Tasker is here to help.

If you find yourself doing scripting like:

files=(ls /usr/lib64/*)
for f in files
do
	process $f
done

or

dirs=(find /usr/share -type d)
for d in dirs
do
	process $d
done

then tasker can make a huge difference. Rewrite the above to

ls /usr/lib64/* | tasker process @@

and

find /usr/share -type d | tasker process @@

and look at htop now. NOTE: tasker wants the full path to the command to run.

Tasker is kind of like xarg, but parallelizes the jobs. It determines how
many cores to use, sets affinity of individual tasks to specific hyperthreads,
and keeps all hyperthreads busy until the input pipeline is complete. It also
waits for all child processes to finish so that no extra barrier is needed
for synchronization.

The @@ is used to denote where to place the input from stdin into the
command to be executed. It can be a value to an argument, an insert to
an otherwise complete path name, or the only argument to the process to be
run:

cat list | tasker command -m @@
cat list | tasker command /opt/test-results/@@/output.json
cat list | tasker command @@

Here is a contrived example to show how powerful it can be. Suppose we have
some analysis program that takes about 1 second to run on each input and the
files are processed linearly:

time -p (ls /usr/lib/systemd/system/*.service | (while IFS='$\n' read -r f; do /usr/bin/sleep 1 ; done))
real	6m8.601s
user	0m0.192s
sys	0m0.296s

Now with tasker:

ls /usr/lib/systemd/system/*.service | time -p tasker /usr/bin/sleep 1
real 12.01
user 0.08
sys 0.07

How is this so? There are 368 service files on the test system and it has 32
hyperthreads. Therefore, 368 / 32 = 11. So, 12 is in the ballpark. Even if
you have a 4 core system, using tasker should get the contrived example down
to 46 seconds from 6 minutes.