A program that parallelizes tasks from a list on stdin

stevegrubb/tasker

TASKER

BUILDING
========
To build the project run:

./autogen.sh
./configure
make
make install


BACKGROUND
==========
Have you noticed that we have CPUs with 16 cores and nothing takes advantage
of them? Take a look at htop. What do you see? A few processes doing things
one after another. Tasker is here to help.

If you find yourself doing scripting like:

files=$(ls /usr/lib64/*)
for f in $files
do
	process "$f"
done

or

dirs=$(find /usr/share -type d)
for d in $dirs
do
	process "$d"
done

then tasker can make a huge difference. Rewrite the above to

ls /usr/lib64/* | tasker process @@

and

find /usr/share -type d | tasker process @@

and look at htop now. NOTE: tasker wants the full path to the command to run
(for example, /usr/bin/sleep rather than sleep).
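
Because the command has to be given by full path, it can help to look the path up rather than hard-code it. The sketch below uses the shell's standard `command -v` lookup. The helper name `resolve_cmd` is made up for this illustration and is not part of tasker.

```shell
# resolve_cmd NAME: print the absolute path of NAME as found on $PATH.
# Returns non-zero if NAME is missing or is a builtin/function (no path).
# Hypothetical helper for illustration; not part of tasker.
resolve_cmd() {
    resolved=$(command -v "$1") || return 1
    case $resolved in
        /*) printf '%s\n' "$resolved" ;;
        *)  return 1 ;;
    esac
}
```

With it, the second example above could be written as `find /usr/share -type d | tasker "$(resolve_cmd process)" @@`, where process is still the placeholder program name used in this README.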

Tasker is kind of like xargs, but parallelizes the jobs. It determines how
many cores to use, sets affinity of individual tasks to specific hyperthreads,
and keeps all hyperthreads busy until the input pipeline is complete. It also
waits for all child processes to finish so that no extra barrier is needed
for synchronization.
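
To make the scheduling idea concrete, here is a rough approximation in plain bash. It is only a sketch, not how tasker is implemented. It caps the number of running jobs and waits for all of them at the end, but it does not set CPU affinity. The function name `run_parallel` is invented for this example, and `wait -n` assumes bash 4.3 or newer.

```shell
#!/usr/bin/env bash
# run_parallel MAX CMD [ARGS...]
# Reads one item per line from stdin and runs "CMD ARGS ITEM" for each,
# keeping at most MAX jobs running at once. Illustration only: unlike
# tasker, it does not pin jobs to hyperthreads.
run_parallel() {
    local max=$1 running=0 item
    shift
    while IFS= read -r item; do
        if [ "$running" -ge "$max" ]; then
            wait -n                     # block until any one job exits
            running=$((running - 1))
        fi
        "$@" "$item" &                  # start the next job in the background
        running=$((running + 1))
    done
    wait                                # final barrier: reap all remaining jobs
}
```

A call such as `find /usr/share -type d | run_parallel "$(nproc)" process` would then use one slot per processing unit reported by nproc, with process again standing in for a real program.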

The @@ is used to denote where to place the input from stdin into the
command to be executed. It can be a value to an argument, an insert to
an otherwise complete path name, or the only argument to the process to be
run:

cat list | tasker command -m @@
cat list | tasker command /opt/test-results/@@/output.json
cat list | tasker command @@
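
The substitution can be pictured with a few lines of bash. This is only a sketch of the idea, and tasker's actual parsing may differ. The function name `expand_placeholder` is hypothetical.

```shell
#!/usr/bin/env bash
# expand_placeholder ITEM ARG...
# Prints each ARG on its own line with every "@@" replaced by ITEM.
# Illustration only; not tasker's real code.
expand_placeholder() {
    local item=$1 arg expanded
    shift
    for arg in "$@"; do
        expanded=${arg//@@/"$item"}     # replace all occurrences of @@
        printf '%s\n' "$expanded"
    done
}
```

For an input line of run42 (an example value), `expand_placeholder run42 command /opt/test-results/@@/output.json` prints command followed by /opt/test-results/run42/output.json, which corresponds to the second form above.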

Here is a contrived example to show how powerful it can be. Suppose we have
some analysis program that takes about 1 second to run on each input and the
files are processed linearly:

time -p (ls /usr/lib/systemd/system/*.service | (while IFS=$'\n' read -r f; do /usr/bin/sleep 1 ; done))
real	6m8.601s
user	0m0.192s
sys	0m0.296s

Now with tasker:

ls /usr/lib/systemd/system/*.service | time -p tasker /usr/bin/sleep 1
real 12.01
user 0.08
sys 0.07

How is this so? There are 368 service files on the test system and it has 32
hyperthreads. Each task takes about one second, so the run needs
368 / 32 = 11.5, rounded up to 12, one-second rounds, which matches the
measured 12 seconds. Even on a 4 core system (8 hyperthreads, assuming two per
core), tasker should get the contrived example down to 368 / 8 = 46 seconds
from 6 minutes.
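
The same back-of-the-envelope estimate can be written as a small shell function. It assumes every task takes the same time and ignores startup overhead, so treat the result as a lower bound. The name `estimate_seconds` is made up for this sketch.

```shell
# estimate_seconds TASKS THREADS [SECS_PER_TASK]
# Prints ceil(TASKS / THREADS) * SECS_PER_TASK (default 1 second per task).
# Idealized estimate: equal-length tasks, no scheduling overhead.
estimate_seconds() {
    echo $(( ($1 + $2 - 1) / $2 * ${3:-1} ))
}

# estimate_seconds 368 32   prints 12
# estimate_seconds 368 8    prints 46
# estimate_seconds 368 1    prints 368
```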