PCJ-blast

PCJ-blast is a small piece of software that runs sequence alignment in parallel in a highly scalable manner. It reads input sequences and compares them with a reference database using NCBI-BLAST. Thanks to dynamic load balancing, PCJ-blast is several times faster than solutions based on static partitioning of the input data and the reference database. Moreover, PCJ-blast runs efficiently without partitioning the reference database, which significantly simplifies installation and usage. It can run on a range of hardware, from a single workstation through Hadoop clusters up to large supercomputers with thousands of cores. The observed speedup is almost linear, which can reduce analysis time from weeks to hours.
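
The idea behind dynamic load balancing can be sketched in plain Java. This is only an illustration under stated assumptions, not PCJ-blast's actual implementation (which is built on the PCJ library): the class `DynamicDispatchSketch`, the `align` placeholder, and the use of threads in place of PCJ nodes are all hypothetical. Each worker pulls the next block from a shared queue as soon as it is free, so one expensive block does not hold back the remaining work the way a fixed, up-front split would.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class DynamicDispatchSketch {

    // Starts `workers` threads that each take the next block from a shared
    // queue whenever they are free. Returns how many blocks each worker handled.
    static Map<Integer, Integer> run(List<String> blocks, int workers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(blocks);
        Map<Integer, Integer> processed = new ConcurrentHashMap<>();
        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            final int id = w;
            threads[w] = new Thread(() -> {
                String block;
                // poll() returns null once the queue is empty, which ends the worker.
                while ((block = queue.poll()) != null) {
                    align(block);
                    processed.merge(id, 1, Integer::sum);
                }
            });
            threads[w].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return processed;
    }

    // Hypothetical stand-in for one BLAST run: the cost grows with the block length.
    static void align(String block) {
        try {
            Thread.sleep(block.length());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // One long block followed by several short ones.
        List<String> blocks = List.of("ACGT".repeat(50), "AC", "GT", "TT", "GA", "CC");
        // Prints the number of blocks handled by each worker id.
        System.out.println(run(blocks, 2));
    }
}
```

With a static split, the worker that received the long block would also have to finish its share of the short ones; here the other worker is free to pick them up while the long block is still running.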

PCJ-blast requires an NCBI-BLAST installation and the PCJ library. To obtain the library, visit the PCJ Homepage or its GitHub repository. NCBI-BLAST can be obtained from the NCBI repository.

Usage

java <JVM_PARAMS> -jar PCJ-blast.jar <BLAST_PARAMS>

PCJ-blast accepts the following parameters, passed as JVM parameters (-D<parameter>=<value>):

  • nodes=<path> - path to the nodes file describing the nodes (and threads) to use. The file must contain at least 2 lines (the first for the dispatcher, the following ones for processors). Default: nodes.txt
  • input=<path> - path to FASTA input file. Default: blast-test.fasta
  • output=<path> - path to output directory. Default: .
  • blast=<path> - path to BLAST executable file. Default: blastn
  • blastDb=<path> - path to BLAST database file. Default: nt. Can be overridden by the BLAST -db parameter
  • hdfsConf=<path>[:<path>...] - paths to HDFS configurations (separated by the path separator character, e.g. a colon (:) on Linux). Default: none
  • sequenceCount=<int> - number of sequences in one block to submit to processors. Default: 1
  • blastThreads=<int> - number of BLAST threads. Default: 1. Can be overridden by the BLAST -num_threads parameter
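
To make the `-D` mechanism and the `sequenceCount` parameter concrete, here is a minimal sketch, assuming that a block is simply a run of consecutive FASTA records. The class `BlockSketch` and the method `toBlocks` are hypothetical and are not taken from the PCJ-blast sources; only the property name `sequenceCount` and its default of 1 come from the list above.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockSketch {

    // Groups FASTA records (each starting with a '>' header line) into blocks
    // holding at most `sequenceCount` records.
    static List<String> toBlocks(List<String> fastaLines, int sequenceCount) {
        if (sequenceCount < 1) {
            throw new IllegalArgumentException("sequenceCount must be at least 1");
        }
        List<String> blocks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int inBlock = 0;
        for (String line : fastaLines) {
            if (line.startsWith(">")) {
                // A new record begins: close the current block if it is already full.
                if (inBlock == sequenceCount) {
                    blocks.add(current.toString());
                    current.setLength(0);
                    inBlock = 0;
                }
                inBlock++;
            }
            current.append(line).append('\n');
        }
        if (current.length() > 0) {
            blocks.add(current.toString());
        }
        return blocks;
    }

    public static void main(String[] args) {
        // A value given as -DsequenceCount=<int> on the java command line is
        // visible as a system property; fall back to the documented default of 1.
        int sequenceCount = Integer.getInteger("sequenceCount", 1);
        List<String> sample = List.of(">seq1", "ACGT", ">seq2", "GGCC", ">seq3", "TTAA");
        List<String> blocks = toBlocks(sample, sequenceCount);
        System.out.println(blocks.size() + " block(s) for sequenceCount=" + sequenceCount);
    }
}
```

On Java 11 or later the sketch can be run directly from source, for example with `java -DsequenceCount=2 BlockSketch.java`. Larger blocks mean fewer hand-offs between dispatcher and processors, while smaller blocks give the load balancing finer granularity.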

If the BLAST -outfmt parameter is not set, PCJ-blast processes the BLAST output using its own output processor.

Reference

Use of PCJ-blast should be acknowledged by citing the following papers: