Skip to content
SAPLING: Suffix Array Piecewise Linear INdex for Genomics
C++ C Python Java Shell Makefile
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
Complete-Striped-Smith-Waterman-Library @ cabdab1
NN
eval
src
suffixarray
.gitignore
.gitmodules
LICENSE
README.md

README.md

SAPLING: Suffix Array Piecewise Linear INdex for Genomics

A method for achieving faster suffix array queries compared to binary search.

Building:

cd src
make

Pre-building the Suffix Array:

suffixarray/refToSuffixArray.sh <genome file (Fasta format)>

Running SAPLING test:

src/sapling_example <genome file (Fasta format)> 
  [saFn=<suffix array file (format described in manual, default <genome file>.sa)>] 
  [sapFn=<sapling file (format described in manual, default <genome file>.sap>] 
  [nb=<log number of buckets - overrides maxMem below, default None>] 
  [maxMem=<max number of buckets will be (genome size)/val, default 10>] 
  [k=<k, default 21>] 
  [nq=<number of queries, default 5000000>] 
  [errFn=<errors file if outputting them, default None>]

Running aligner:

src/align <query (Fastq format)> <ref (Fasta format)> <outfile (Sam format)> 
  [num_seeds=<number of seeds to use for exact matching, default 7>]
  [sapling_k=<size of k-mers to use when building Sapling, default 16>]
  [flanking_sequence=<amount of padding to include when aligning region around seeds, default 2>]
  [max_hits=<maximum number of matches to try to extend per seed, default 32>]

Running binary search:

src/binarysearch <genome> <suffix array file>
    genome is a fasta file containing the genome to query
    suffix array file is where the suffix array will be written to

Sapling Manual

Information about the files in this repository can be found in the manual.

You can’t perform that action at this time.