Because we just love British comedies.
Delete the following directory: rm $HOME/.bigjob/python/lib/python2./site-packages/BigJob--py2.X.egg/
To update a bigjob package execute:
easy_install -U bigjob
Currently, SAGA Context is supported. If SSH is used, different credentials can be configured in the ~/.ssh/config file (see man ssh_config). SAGA-Python (Bliss) currently does not support Globus.
Redis is the most stable and fastest backend (requires Python >2.5) and is the recommended way of using BigJob. Redis can easily be run in user space. It can be downloaded at: http://redis.io/download (just ~500 KB). Once you have downloaded and compiled Redis, start a Redis server on the machine of your choice:
$ redis-server
13 Sep 10:11:28 # Warning: no config file specified, using the default config. In order to specify a config file use 'redis-server /path/to/redis.conf'
13 Sep 10:11:28 * Server started, Redis version 2.2.12
13 Sep 10:11:28 * The server is now ready to accept connections on port 6379
13 Sep 10:11:28 - 0 clients connected (0 slaves), 922160 bytes in use
Then set the COORDINATION_URL parameter (at the top of most examples) to the Redis endpoint of your Redis installation, e.g.
The coordination url is passed to the constructor of the
PilotComputeService respectively to the
pilot_compute_service = PilotComputeService(coordination_url="redis://<hostname>:6379")
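Before handing a coordination URL to BigJob, it can help to confirm that a Redis server actually answers at that address. The following is a minimal standalone sketch, not part of BigJob: the helper names parse_coordination_url and redis_is_reachable are hypothetical, and the sketch assumes a plain redis://<hostname>:<port> URL without credentials. It relies only on the fact that Redis answers an inline PING with a line starting with "+" (or with an error line starting with "-", for example when a password is required).

```python
import socket

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2

DEFAULT_REDIS_PORT = 6379  # port shown in the server log above


def parse_coordination_url(coordination_url):
    """Return (host, port) for a redis:// coordination URL (hypothetical helper)."""
    parts = urlparse(coordination_url)
    if parts.scheme != "redis":
        raise ValueError("expected a redis:// URL, got %r" % coordination_url)
    if not parts.hostname:
        raise ValueError("no hostname in %r" % coordination_url)
    return parts.hostname, parts.port or DEFAULT_REDIS_PORT


def redis_is_reachable(coordination_url, timeout=3.0):
    """Send an inline PING and report whether a Redis-like server replied."""
    host, port = parse_coordination_url(coordination_url)
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        return False
    try:
        conn.sendall(b"PING\r\n")
        reply = conn.recv(64)
    except (socket.error, socket.timeout):
        return False
    finally:
        conn.close()
    # "+PONG" means success; a "-..." error line (e.g. password required)
    # still shows that a Redis server is listening at this endpoint.
    return reply.startswith(b"+") or reply.startswith(b"-")
```

A call such as redis_is_reachable("redis://localhost:6379") returning False usually means the server is not running or the port is blocked.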
It is recommended to set up a password for your Redis server. Otherwise, other users will be able to access and manipulate the data stored in your Redis server.
The screen tool can (and should) be used to reconnect to a running BigJob session on a remote machine. For documentation on screen, please see the Screen Manpage.
You should not just submit a BigJob from your local machine to a remote host and then close the terminal without the use of screen.
It is recommended to have SAGA-Python (Bliss) installed on the resources running the pilot. BJ will work without SAGA-Python on the resource, but will then not support file staging.
Please make sure that the resource has a suitable Python version installed. The following command should return a valid Python version (Python 2.7 in the optimal case):
$ ssh localhost "python -V"
Python 2.7.2
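The same check can be scripted when several resources need to be verified. The sketch below is an illustration only: the function names parse_python_version and remote_python_version are hypothetical, and it assumes password-less SSH access to the host. Note that Python 2 prints the output of -V on stderr, so both streams are captured together.

```python
import re
import subprocess


def parse_python_version(text):
    """Extract (major, minor, micro) from output such as 'Python 2.7.2'."""
    match = re.search(r"Python (\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("no Python version found in %r" % text)
    major, minor, micro = match.groups()
    return int(major), int(minor), int(micro or 0)


def remote_python_version(host, python="python"):
    """Run '<python> -V' on host over SSH and return the parsed version."""
    proc = subprocess.Popen(
        ["ssh", host, "%s -V" % python],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # Python 2 writes the version to stderr
    )
    output, _ = proc.communicate()
    return parse_python_version(output.decode("utf-8", "replace"))
```

For example, remote_python_version("localhost") can be compared against (2, 7, 0) before submitting a pilot to that resource.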
Yes, there is SSH-based support for file stage-in.
The BigJob manager expects a URL to a SAGA Job Service as a parameter (lrms_url). The respective SAGA adaptor needs to be installed and working (please test the adaptor properly with SAGA before using BJ). Currently, BigJob works with the following SAGA Job adaptors:
SAGA/PBS: lrms_url = "pbs://localhost"
SAGA/SSH: lrms_url = "ssh://oliver2.loni.org"
SAGA/PBS+SSH: lrms_url = "pbs+ssh://oliver2.loni.org"
SAGA/Globus: lrms_url = "gram://oliver1.loni.org/jobmanager-pbs" (only SAGA C++, deprecated)
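A typo in the URL scheme is an easy mistake to make, so an application may want to check the lrms_url before passing it on. The following sketch is not BigJob code: the table and the function adaptor_for are hypothetical and simply mirror the adaptor list above.

```python
# Hypothetical lookup table built from the adaptor list above.
ADAPTORS = {
    "pbs": "SAGA/PBS",
    "ssh": "SAGA/SSH",
    "pbs+ssh": "SAGA/PBS+SSH",
    "gram": "SAGA/Globus (only SAGA C++, deprecated)",
}


def adaptor_for(lrms_url):
    """Return the adaptor name for an lrms_url, or raise ValueError."""
    scheme, separator, _rest = lrms_url.partition("://")
    if not separator:
        raise ValueError("not a URL: %r" % lrms_url)
    try:
        return ADAPTORS[scheme.lower()]
    except KeyError:
        raise ValueError("no known SAGA job adaptor for scheme %r" % scheme)
```

Calling adaptor_for(lrms_url) early turns a misspelled scheme into an immediate, readable error instead of a failure at submission time.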
Bliss (>0.2.3) is the best-supported SAGA version for BigJob. It is the default version!
Yes, but it is deprecated.
BigJob utilizes SSH for the execution of sub-jobs. Please ensure that your local SSH daemon is up and running and that you can log in without a password.
BigJob attempts to install itself if it can't find a valid BJ installation on a resource (i.e. if import bigjob fails). By default, BigJob searches $HOME/.bigjob/python for a working BJ installation. Please make sure that the correct Python is found in your default paths. If BJ attempts to install itself despite already being installed on a resource, this can be a sign that the wrong Python is found.
BigJob utilizes a configuration file named
bigjob.conf located in the root of the BigJob installation (e.g.
# Logging config
# logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL
logging.level=logging.INFO
Alternatively you can set the logging level in the code:
import logging
from bigjob import logger
logger.setLevel(logging.FATAL)
or via the environment variable
BIGJOB_VERBOSE. For example, for full debug log output use:
The BigJob logger can be obtained, and further handlers can be added to the logger object. For example, with the instructions below, the application and BigJob debug messages are written to the file namd_bigwork.log:
import logging

logger = logging.getLogger('bigjob')
# Attach a file handler that records messages down to DEBUG level
fh = logging.FileHandler('namd_bigwork.log', mode='w')
fh.setLevel(logging.DEBUG)
logger.addHandler(fh)
logger.debug("Logging to namd_bigwork.log at DEBUG level")
BigJob expands the tokens
~ in the Compute Unit Description in the attribute
working_directory with the home directory of the respective resource.
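The important point is that ~ refers to the home directory on the resource, not on the machine that submits the job (which is what os.path.expanduser would use). The sketch below only illustrates that behaviour and is not BigJob's implementation; the function expand_working_directory and the example home path are hypothetical.

```python
import posixpath


def expand_working_directory(working_directory, remote_home):
    """Replace a leading '~' with the home directory of the remote resource."""
    if working_directory == "~":
        return remote_home
    if working_directory.startswith("~/"):
        return posixpath.join(remote_home, working_directory[2:])
    # Anything else (e.g. an absolute path) is left unchanged.
    return working_directory
```

With a remote home of /home/alice, a working_directory of ~/run1 would resolve to /home/alice/run1 on that resource.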
Yes, if your BigJob manager (or application) terminates before all ComputeUnits terminate, you can reconnect to a running pilot by passing the pilot_url to the PilotCompute constructor. For example:
pilot = PilotCompute(pilot_url="redis://localhost:6379/bigjob:bj-a7bfae68-25a0-11e2-bd6c-705681b3df0f:localhost")
By default, BJ creates a directory structure relative to the BJ working directory specified in
A separate directory is created for each sub-job. Sub-jobs can be executed in any directory by setting the working directory to the desired directory in the sub-job description:
jd.working_directory = "<your directory of choice>"
Yes, it works. However, there are limitations: Kraken requires the user to use aprun to launch jobs, and aprun can only be called once per batch job. The BJ compute unit launch mechanism, which spawns one process per compute unit, is therefore not compatible with aprun.
You can however execute a single compute unit concurrently by setting the
NUMBER_SUBJOBS variable to:
Very likely, the SAGA C++ adaptor is not correctly configured. If the PBSPro adaptor is used, export the PBS_HOME environment variable; if the Torque adaptor is used, export TORQUE_HOME. Either variable should point to the installation location of the corresponding scheduler. For example:
$ which qsub
/usr/local/bin/qsub
$ export PBS_HOME=/usr/local
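The same lookup can be done from a Python launcher script. This is a sketch under assumptions: it requires Python 3.3 or later for shutil.which, the function names scheduler_home and export_scheduler_home are hypothetical, and it assumes (as in the example above) that qsub lives in a bin directory directly below the scheduler installation prefix.

```python
import os
import shutil


def scheduler_home(qsub_path):
    """Map '<prefix>/bin/qsub' to '<prefix>' (assumes that layout)."""
    return os.path.dirname(os.path.dirname(qsub_path))


def export_scheduler_home(variable="PBS_HOME"):
    """Locate qsub on PATH and set the given environment variable."""
    qsub_path = shutil.which("qsub")
    if qsub_path is None:
        raise RuntimeError("qsub not found on PATH")
    os.environ[variable] = scheduler_home(qsub_path)
    return os.environ[variable]
```

The variable set this way is visible only to the current Python process and the processes it starts; use variable="TORQUE_HOME" for the Torque adaptor.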
Yes, it is possible. The prerequisites are:
1. SAGA
2. Globus client tools
3. The cluster (LONI) CA added to the list of trusted certificate authorities
The three commands below set up the LONI certificates; for more information, please see https://docs.loni.org/wiki/LONI_Certificates.
cd $HOME/.globus/certificates
wget https://docs.loni.org/mediawiki-1.13.3-docsloni/images/9/9c/a3bf9f3c.0 --no-check-certificate
wget https://docs.loni.org/mediawiki-1.13.3-docsloni/images/d/d9/a3bf9f3c.signing_policy --no-check-certificate
Last edited by melrom,