fjd makes it easy to run computational jobs on many CPUs.
There are several powerful tools for dispatching dynamic lists of computational jobs to multiple, possibly distributed CPUs. However, for simple use cases, the effort of installation and setup is often too high.
fjd, the hurdle to get started is very low. Installation is easy. Pushing jobs into the queue only requires to put an executable script in a directory. Per default, all CPUs on your computer are used. Other computers can be used very easily, as well. Plus, your jobs can be written in any language.
You can call
$ fjd --exe "mktemp XXX.tmp" --repeat 5
This simple example creates five temporary files with random names. Each of the five jobs will be done on a different CPU (if you have that many).
You can also supply a list with parameters:
$ fjd --exe 'touch bla$1.txt' --parameters 1,2,3,4
This will create four files:
bla1.txt, bla2.txt, bla3.txt, bla4.txt.
fjd will select by itself how many CPUs on your machine it should use.
Note that if you use the placeholder
$1, use single quotes around the
To use several computers, you can configure a number of hosts in your network and how many CPUs should be running on each (see an example of this below).
For illustration, here is the session output from the first example:
$ fjd --exe "mktemp XXX.tmp" --repeat 5 [fjd-recruiter] Hired 5 workers in project "default". [fjd-dispatcher] Started on project "default". [fjd-dispatcher] Job queue is empty and all jobs have finished. [fjd-recruiter] Fired 5 worker(s) in project "default".
fjd can also be controlled in more detail.
Most importantly, you can write an executable bash script for each job and put it in the queue. You have total freedom what each job is doing and when you put it in the queue. In the usual case your queue directory is
default could be changed to a specific project name). Another option is to describe a job as a configuration file. We'll have an example below.
Then, you can start one or more
fjd-worker threads, like this:
$ fjd-recruiter hire [<number of workers>]
Per default, this starts n-1 worker threads, where n is the number of CPUs on your machine.
Finally, start a dispatcher:
fjd-dispatcher assigns jobs to
fjd-worker threads who are currently not busy, until the job queue is empty.
You can cancel the
fjd-dispatcher process at any time (i.e. hit CTRL-C) and it will ask you if workers should be fired or not.
By the way, if screen sessions are running and you want them to stop,
then you can always fire workers by hand:
$ fjd-recruiter fire
or, to fire only the ones running one specific project (more on that option below):
$ fjd-recruiter --project <my-project> fire
If you start a new dispatcher, it will first clean up ("fire") old screen sessions.
First, you need to have python 2.7, which the default python on almost all systems these days (note: python 3.x support is not there yet, but close; see issue #10 on github). Then:
$ pip install fjd
If you do not have enough privileges (look for something like "Permission denied" in the output), install locally (for your user account only):
$ pip install fjd --user
If you do not have
pip installed (I can't wait for everyone running Python 3.4), I made a small script, which should help to install all needed things. Download it and make it executable:
$ wget https://raw.github.com/nhoening/fjd/master/fjd/scripts/INSTALL $ chmod +x INSTALL
Now you can install system-wide:
or, if you do not have root privileges, you can also install locally:
$ source INSTALL --user
Note - If you installed locally, this should be added to your
Note - Installing locally could be the better choice, actually, because it might save you
fjd on each machine you want to use.
If they all share the home directory, they will all know about
fjd once you are logged in.
Using several machines in your network
We can tell
fjd about other machines in the network and how many workers we'd like
to employ on them. To do that, we place a file called
remote.conf in the project's
directory. Here is an example:
[host1] name: localhost workers: 3 [host2] name: hyuga.sen.cwi.nl workers: 5
There is an advanced example
in the github repo which you can run and inspect. Use the scripts
run-remote-example.sh to execute.
fjd works under the assumption that all CPUs are in a local network and can access a shared home directory.
Note - If you normally have to type in a password to login to a remote machine via SSH,
you'll have to do this here, as well. You can configure passwordless login by
putting a public key in ~/.ssh/authorized_keys. For the shared-home directory
setting we use
fjd for, this makes a lot of sense, as you stay within your LAN anyway.
In general, some SSH configuration can go a long way to ease your life,
e.g. by connection sharing through the ControlAuto option. Search the web or ask your local IT guy.
Inspecting the workers
Workers are Unix screen sessions, you can see them by typing:
$ screen -ls
and inspect them if you want. As attaching to screen sessions is cumbersome
fjd can also close them before you have a chance to see what went wrong
(this is an option you can set, see next example below),
fjd logs screen output to
~/.fjd/<project>/screenlogs (each worker has
its own log file).
Here is an example log from a screen session of a worker:
$ fjd-worker --project default [fjd-worker] Started with ID nics-macbook.fritz.box_1382522062.31. [fjd-worker] Worker nics-macbook.fritz.box_1382522062.31: I found a job. [fjd-worker] Worker nics-macbook.fritz.box_1382522062.31: Finished my job. [fjd-worker] Worker nics-macbook.fritz.box_1382522062.31: I found a job. [fjd-worker] Worker nics-macbook.fritz.box_1382522062.31: Finished my job.
Running several projects
Normally, the project directory is
~/.fjd/default. But you can tell
to use a different project identifier (this way, you could have several projects
running without them getting into each other's way, i.e. stopping one project
wouldn't stop the workers of the other and you wouldn't override the first project
if you start another).
For instance, you'd recruite workers and dispatch jobs with the
$ fjd-recruiter --project remote-example hire $ fjd-dispatcher --project remote-example
Or, if you call
fjd from code:
recruiter = Recruiter(project=project) recruiter.hire() Dispatcher(project=project)
Jobs as configuration files
When a job is heavily parameterised, a bash script might not be convenient. I sometimes prefer a configuration file in such cases.
In this case, a job file should adhere to the general INI-file standard.
You can write in there what you want -
fjd only has one requirement. Add a
fjd section, and specify which
command to execute. Here is an example:
[fjd] executable: python example/ajob.py [params] logfile: logfiles/job0.dat my_param: value0
Your executable script gets this configuration file passed as a command line argument, so this would be called on the shell:
python example/ajob.py <absolute path to the job file>
In addition, you can put other job-specific configuration in there for the executable
to see, as I did here in the
[params]-section (I repeat: only the
is required by
Take care to get relative paths correct (or simply make them absolute):
If paths are relative, they should be relative to the directory in which you
The advanced example also shows how this works.
You can use the script
run-config-example.sh to run it.
- I know an existing simple tool with comparable features: Gnu Parallel. How do you position
fjdwith respect to that?
- First off, Gnu Parallel is an awesome tool and very powerful. I found out about it just recently. It fits into the Unix workflow perfectly and you can do almost anything with it, if you're knowledgeable on the Unix command line (for instance if you use
xargsa lot, you can become productive in Gnu parallel quickly).
fjddelivers several functionalities that Gnu parallel delivers, but not all of them. I'd say there are three small aspects which are different in
fjdis more explicit about worker processes (they are Unix screen sessions, which you can join live, but they also have their own log files each). Second,
fjdlives in the Python ecosystem (which means you can edit it easier if you prefer Python over Perl and depend on it in Python programs you write). Third, jobs with many parameters can parametrised in config files, which I feel is quite convenient sometimes. In addition,
fjdhas one interesting feature to offer: the queue of jobs can be sorted on the fly, which can be useful for some use cases where it is important to always perform the job first which is most likely to lead to positive outcomes (e.g. in an optimisation context).
- Can I pass more than one parameter per job with the
- Yes. Separate items in lists per job with
--exe 'cp $1 $2' --parameters "a.txt#bckp/a.txt,b.txt#bckp/b.txt". If your
--exeparameter contains the $-index (e.g.
--exe 'echo $1' --parameters 'Hello!,Bye!'), then the parameter will replace it (i.e.
Hello!for one job and
Bye!for the second.
- How would I use
fjdon a computation cluster?
- I use
fjdto great effect on a PBS cluster (a scheduling system many computation clusters are running). The computation nodes on this system all have access to a shared home directory, so
fjdcan work well there. All I do is fill the job queue in my local directory and for each computation node I order (via PBS), I issue a
fjd-recruiter hire Xcommand, where
Xis the number of cores that the requested node has. For illustration, I have an example script, which I use to run >600K small jobs (In order to run a brute-force benchmark).
- How does
fjdwork, in a nutshell?
Small files in your home directory are used to indicate which jobs have to be done (these are created by you) and which workers are available (these are created automatically). Files are also used by
fjdto assign workers to jobs.
This simple file-based approach makes
fjdvery easy to use.
For CPUs from several machines to work on your job queue, we make one necessary assumption: We assume that there is a shared home directory for logged-in users, which all machines can access. This setting is very common now in universities and companies.
A little bit more detail about the
fjd-recruitercreates worker threads on one or more machines (a worker thread is a Unix screen session, which remains even if you log out). The
fjd-workerprocesses announce themselves in the
fjd-dispatcherfinds your jobs in the
jobqueuedirectory and pairs a job with an available worker. It then removes those entries from the
workerqueuedirectories and creates a new entry in
jobpods, where workers will pick up their assignments.
Then, the dispatcher calls your executable script and passes the file that describes the job to it as parameter on the shell. Your script simply has to read the job file and act accordingly.
All of these directories mentioned above exist in
~/.fjdand will of course be created if they do not yet exist.