PCJ-blast is a small piece of software that runs sequence alignment in parallel in a highly scalable manner. It reads input sequences and compares them with a reference database using NCBI-BLAST. Thanks to dynamic load balancing, PCJ-blast is a couple of times faster than solutions based on static partitioning of the input data and the reference database. Moreover, PCJ-blast can be run efficiently without partitioning the reference database, which significantly simplifies installation and usage. PCJ-blast can run analyses on different hardware, from workstations, through Hadoop clusters, up to large supercomputers with thousands of cores. The observed speedup is almost linear, which can reduce analysis time from weeks to a matter of hours.
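The idea of dynamic load balancing mentioned above can be illustrated with a minimal single-JVM sketch. This is not PCJ-blast's actual code and does not use the PCJ library: the class `DynamicBalancingSketch`, the method names, and the `align` stand-in are all hypothetical. Worker threads pull the next block from a shared queue as soon as they are free, so a slow block does not hold up blocks assigned in advance to the same worker, as it would with static partitioning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration only; not PCJ-blast's code and not the PCJ API.
final class DynamicBalancingSketch {

    /**
     * Lets `workers` threads pull blocks from a shared queue until it is empty.
     * Returns how many blocks each worker ended up processing.
     */
    static int[] run(List<String> blocks, int workers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(blocks);
        int[] processed = new int[workers];
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            final int id = w;
            Thread t = new Thread(() -> {
                String block;
                // poll() returns null once no blocks remain, so the worker stops.
                while ((block = queue.poll()) != null) {
                    align(block);
                    processed[id]++; // each worker writes only its own slot
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // after join, the worker's counter is visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return processed;
    }

    // Stand-in for running BLAST on one block; the cost varies with block length.
    static void align(String block) {
        try {
            Thread.sleep(block.length());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        List<String> blocks = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            blocks.add("block-" + i);
        }
        int[] counts = run(blocks, 4);
        for (int w = 0; w < counts.length; w++) {
            System.out.println("worker " + w + ": " + counts[w] + " blocks");
        }
    }
}
```

The per-worker counts differ from run to run, because each worker takes a new block only when it has finished the previous one. In PCJ-blast the same role is played by a dispatcher that hands out blocks of sequences to processors running on separate nodes.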
java <JVM_PARAMS> -jar PCJ-blast.jar <BLAST_PARAMS>
The following PCJ-blast parameters can be passed as JVM parameters:
nodes=<path> - path to the nodes file describing the nodes (and threads) to use. The file must contain at least 2 lines (the first for the dispatcher, the following ones for processors). Default: nodes.txt
input=<path> - path to the FASTA input file. Default: blast-test.fasta
output=<path> - path to the output directory. Default: .
blast=<path> - path to the BLAST executable file. Default: blastn
blastDb=<path> - path to the BLAST database file. Default: nt. Can be overridden by the BLAST -db parameter
hdfsConf=<path>[:<path>...] - paths to HDFS configuration files (separated by the path separator character, e.g. colon (:) on Linux). Default: none
sequenceCount=<int> - number of sequences in one block submitted to processors. Default: 1
blastThreads=<int> - number of BLAST threads. Default: 1. Can be overridden by the BLAST -num_threads parameter
If the -outfmt parameter is not set, PCJ-blast processes the BLAST output using its own output processor.
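As a rough illustration of the parameter list above, the sketch below reads the same names with the same defaults. It assumes that the parameters arrive as Java system properties (the `-D<name>=<value>` form), which the text above does not state explicitly. The class `PcjBlastSettingsSketch` is hypothetical and is not part of PCJ-blast.

```java
import java.io.File;
import java.util.Properties;
import java.util.regex.Pattern;

// Hypothetical illustration only; assumes parameters are passed as -D<name>=<value>.
final class PcjBlastSettingsSketch {
    final String nodes;
    final String input;
    final String output;
    final String blast;
    final String blastDb;
    final String[] hdfsConf;
    final int sequenceCount;
    final int blastThreads;

    PcjBlastSettingsSketch(Properties p) {
        // Defaults mirror the ones listed in the parameter description.
        nodes = p.getProperty("nodes", "nodes.txt");
        input = p.getProperty("input", "blast-test.fasta");
        output = p.getProperty("output", ".");
        blast = p.getProperty("blast", "blastn");
        blastDb = p.getProperty("blastDb", "nt");
        // hdfsConf has no default; several paths are joined with the platform path separator.
        String rawHdfs = p.getProperty("hdfsConf", "");
        hdfsConf = rawHdfs.isEmpty()
                ? new String[0]
                : rawHdfs.split(Pattern.quote(File.pathSeparator));
        sequenceCount = Integer.parseInt(p.getProperty("sequenceCount", "1"));
        blastThreads = Integer.parseInt(p.getProperty("blastThreads", "1"));
    }

    public static void main(String[] args) {
        PcjBlastSettingsSketch s = new PcjBlastSettingsSketch(System.getProperties());
        System.out.println("nodes=" + s.nodes + ", input=" + s.input
                + ", output=" + s.output + ", blast=" + s.blast
                + ", blastDb=" + s.blastDb + ", hdfsConf paths=" + s.hdfsConf.length
                + ", sequenceCount=" + s.sequenceCount
                + ", blastThreads=" + s.blastThreads);
    }
}
```

Under the same assumption, an invocation might look like `java -Dinput=reads.fasta -DsequenceCount=10 -jar PCJ-blast.jar -db nt`, where `reads.fasta` is a made-up file name.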
Use of PCJ-blast should be acknowledged by citing the following papers:
- Marek Nowicki, Davit Bzhalava, and Piotr Bała. "Massively Parallel Implementation of Sequence Alignment with Basic Local Alignment Search Tool Using Parallel Computing in Java Library." Journal of Computational Biology (2018).
- Marek Nowicki, Davit Bzhalava, and Piotr Bała. "Massively Parallel Sequence Alignment with BLAST Through Work Distribution Implemented using PCJ Library." International Conference on Algorithms and Architectures for Parallel Processing. Springer, Cham, 2017, p. 503-512.