matt-shenton/blasr: This is an unsupported fork of the PacBio blasr aligner. It contains my (very beta) optimizations and new functionality. It may disappear at any time.

##Installation##

###1. Requirements###

To build BLASR, you must have HDF5 1.8.0 or above installed and configured with C++ support (you should have the library libhdf5_cpp.a). If you are installing the entire PacBio secondary analysis software suite, appropriate HDF5 libraries are already distributed and no configuration is necessary. Otherwise, point two environment variables, HDF5INCLUDEDIR and HDF5LIBDIR, to the locations of the HDF5 headers and libraries, respectively.

For example:

```
export HDF5INCLUDEDIR=/usr/include/hdf
export HDF5LIBDIR=/usr/lib/hdf
```
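
Before running make, it can help to confirm that both variables point at usable directories. The following is a minimal sketch, not part of BLASR: the function name `check_hdf5_env` is hypothetical, and it assumes the HDF5 C++ header is named `H5Cpp.h` and that the static library is the `libhdf5_cpp.a` mentioned above.

```python
import os
from pathlib import Path


def check_hdf5_env(env=os.environ):
    """Return a list of problems; an empty list means both paths look usable."""
    problems = []
    # (environment variable, file expected inside that directory)
    expected = [
        ("HDF5INCLUDEDIR", "H5Cpp.h"),    # HDF5 C++ API header (assumed name)
        ("HDF5LIBDIR", "libhdf5_cpp.a"),  # static C++ library named in the requirements
    ]
    for var, filename in expected:
        value = env.get(var)
        if not value:
            problems.append(f"{var} is not set")
        elif not (Path(value) / filename).is_file():
            problems.append(f"{var}={value} does not contain {filename}")
    return problems


if __name__ == "__main__":
    for problem in check_hdf5_env():
        print(problem)
```

If your system only ships the shared library (for example `libhdf5_cpp.so`), adjust the expected file name accordingly.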

###2. Build the source tree###

```
make
```

###3. The executable will be in alignment/bin/blasr###

```
cd alignment/bin
```

##Running BLASR##

Typing blasr -h or blasr -help on the command line will give you a list of options. At the least, provide a fasta, fastq, or bas.h5 file, and a genome.

Some typical use cases:

Align reads from reads.bas.h5 to ecoli_K12 genome, and output in SAM format.
```
blasr reads.bas.h5  ecoli_K12.fasta -sam
```

Same as above, but with soft clipping

```
blasr reads.bas.h5  ecoli_K12.fasta -sam -clipping soft
```

Create SAM output that may be used to resolve structural variation using local assembly
```
blasr reads.bas.h5  ecoli_K12.fasta -sam -clipping subread -bestn 2 
```

Use multiple threads

```
blasr reads.bas.h5  ecoli_K12.fasta -sam -clipping soft -out alignments.sam -nproc 16
```

Include a larger minimal match, for faster but less sensitive alignments
```
blasr reads.bas.h5  ecoli_K12.fasta -sam -clipping soft -minMatch 15
```

Produce alignments in a pairwise, human-readable format
```
blasr reads.bas.h5  ecoli_K12.fasta -m 0
```

Use a precomputed suffix array for faster startup

```
sawriter hg19.fasta.sa hg19.fasta # First precompute the suffix array
blasr reads.bas.h5 hg19.fasta -sa hg19.fasta.sa
```

Map assembled contigs (multiple megabases) to a reference

```
blasr human.ctg.fasta  hg19.fasta -alignContigs -sam -out alignments.sam
```

Use a precomputed BWT-FM index for a smaller runtime memory footprint, at the cost of slower alignments

```
sa2bwt hg19.fasta hg19.fasta.sa hg19.fasta.bwt
blasr reads.bas.h5 hg19.fasta -bwt hg19.fasta.bwt
```

##Output formats##

The most universally compatible output is the SAM format, specified by `-sam`. The other formats, selected with the `-m` option, are tailored to different applications, so the meanings of columns are not consistent between formats. Alignments reported on the reverse strand may be converted to the forward strand using forward_start = length - reverse_end and forward_end = length - reverse_start. All output formats except SAM use zero-based, half-open coordinates.
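
The strand conversion can be sketched in a few lines. This is an illustrative helper, not part of BLASR; the name `reverse_to_forward` is hypothetical. It assumes zero-based, half-open intervals, so flipping strands swaps the roles of start and end.

```python
def reverse_to_forward(start, end, length):
    """Map a zero-based, half-open interval [start, end) given on the reverse
    strand of a sequence of `length` bases onto the forward strand."""
    if not 0 <= start <= end <= length:
        raise ValueError("expected 0 <= start <= end <= length")
    # The reverse-strand end becomes the forward-strand start, and vice versa.
    return length - end, length - start


# A 100-base sequence with the reverse-strand interval [10, 30)
print(reverse_to_forward(10, 30, 100))  # (70, 90)
```

The same function converts forward-strand coordinates back to the reverse strand, since applying it twice returns the original interval.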

```
   -m 0
```

Human-readable pairwise alignments.

```
   -m 1
```

1.	query name
2.	ref contig name
3.	query strand
4.	ref strand
5.	align score
6.	alignment percent identity
7.	ref align start
8.	ref align end
9.	ref length
10.	query align start
11.	query align end
12.	query length
13.	alignment space usage

Reverse strand alignments are reported starting at the 3' end of the reverse strand.
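
A minimal sketch of reading one `-m 1` record into named fields follows. It assumes the fields are whitespace-delimited and appear in exactly the order listed above; the field names, the `parse_m1_line` helper, and the sample line are made up for illustration, so check them against your own output.

```python
from collections import namedtuple

# Column order copied from the -m 1 list above; the names are illustrative.
M1_COLUMNS = [
    "query_name", "ref_name", "query_strand", "ref_strand", "score",
    "percent_identity", "ref_start", "ref_end", "ref_length",
    "query_start", "query_end", "query_length", "space_usage",
]
M1Record = namedtuple("M1Record", M1_COLUMNS)

# Coordinates and lengths are converted to int; other fields stay as strings.
INT_FIELDS = {"ref_start", "ref_end", "ref_length",
              "query_start", "query_end", "query_length"}


def parse_m1_line(line):
    """Split one whitespace-delimited -m 1 line into an M1Record."""
    fields = line.split()
    if len(fields) != len(M1_COLUMNS):
        raise ValueError(f"expected {len(M1_COLUMNS)} fields, got {len(fields)}")
    values = []
    for name, raw in zip(M1_COLUMNS, fields):
        if name in INT_FIELDS:
            values.append(int(raw))
        elif name == "percent_identity":
            values.append(float(raw))
        else:
            values.append(raw)
    return M1Record(*values)


# Made-up record, for illustration only.
sample = "read1 chr1 0 1 -500 87.5 100 600 4600000 0 510 520 1200"
record = parse_m1_line(sample)
print(record.ref_name, record.ref_start, record.ref_end)
```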

```
   -m 2
```

XML based output. Reverse strand alignments are reported starting at the 3' end of the reverse strand.

```
   -m 3
```

VULGAR alignment format from EXONERATE (deprecated)

```
   -m 4
```

1.	query name
2.	ref contig name
3.	align score
4.	alignment percent identity
5.	query strand
6.	query align start
7.	query align end
8.	query length
9.	ref strand
10.	ref align start
11.	ref align end
12.	ref length
13.	alignment space usage
14.	mapping quality

Reverse strand alignments are reported starting at the 3' end of the reverse strand.
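
Because `-m 4` includes a mapping quality column, a common follow-up step is to drop low-confidence alignments. The sketch below assumes whitespace-delimited fields in the order listed above; the column names, the `read_m4` helper, and the sample records are hypothetical, and the field count is checked so that a mismatch with your actual output fails loudly.

```python
import io

# Column order copied from the -m 4 list above; the names are illustrative.
M4_COLUMNS = [
    "query_name", "ref_name", "score", "percent_identity",
    "query_strand", "query_start", "query_end", "query_length",
    "ref_strand", "ref_start", "ref_end", "ref_length",
    "space_usage", "mapq",
]


def read_m4(handle, min_mapq=0):
    """Yield one dict per -m 4 line whose mapping quality is >= min_mapq."""
    for line_no, line in enumerate(handle, 1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(M4_COLUMNS):
            raise ValueError(
                f"line {line_no}: expected {len(M4_COLUMNS)} fields, got {len(fields)}"
            )
        record = dict(zip(M4_COLUMNS, fields))
        record["mapq"] = int(record["mapq"])
        if record["mapq"] >= min_mapq:
            yield record


# Made-up records, for illustration only.
sample = (
    "read1 chr1 -500 87.5 0 0 510 520 0 100 600 4600000 1200 254\n"
    "read2 chr1 -120 80.1 0 5 150 160 1 900 1050 4600000 400 3\n"
)
for rec in read_m4(io.StringIO(sample), min_mapq=20):
    print(rec["query_name"], rec["mapq"])
```

In practice, pass an open file handle for the file written by `-out` instead of the `io.StringIO` wrapper.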

```
   -m 5
```

This alignment format contains the full representation of the pairwise alignment of the two sequences in a verbose (easily parsed) stick format.

1.	query name
2.	query length
3.	query align start
4.	query align end
5.	query strand
6.	ref name
7.	ref length
8.	ref align start
9.	ref align end
10.	ref strand
11.	score
12.	nMatch
13.	nMismatch
14.	nIns
15.	nDel
16.	mapping quality
17.	query align string
18.	stick string
19.	ref align string

For reverse strand alignments, the coordinates are reported starting at the 5' end of the forward strand.
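
The count columns in `-m 5` make it easy to derive summary statistics without touching the alignment strings. The following sketch computes one common definition of identity, matches divided by total alignment columns. It is an assumption rather than BLASR's own percent-identity calculation, and it assumes whitespace-delimited fields in the order listed above. Only columns 12 to 15 are read; the `m5_identity` name and the sample line are hypothetical.

```python
def m5_identity(line):
    """Return nMatch / (nMatch + nMismatch + nIns + nDel) for one -m 5 line."""
    fields = line.split()
    if len(fields) < 16:
        raise ValueError(f"expected at least 16 fields, got {len(fields)}")
    # Columns 12-15 (1-based) are nMatch, nMismatch, nIns, nDel.
    n_match, n_mismatch, n_ins, n_del = (int(x) for x in fields[11:15])
    columns = n_match + n_mismatch + n_ins + n_del
    return n_match / columns if columns else 0.0


# Made-up record with the trailing alignment strings left out, for illustration only.
sample = "read1 520 0 510 + chr1 4600000 100 600 + -500 450 20 25 15 254"
print(round(m5_identity(sample), 3))  # 0.882
```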
