Command line tool for DNA short read alignment using the Burrows-Wheeler transform.

k-time/bwt-aligner

bwt-aligner

Sequence alignment: https://en.wikibooks.org/wiki/Next_Generation_Sequencing_(NGS)/Alignment
BWT for sequence alignment: http://bioinformatics.oxfordjournals.org/content/25/14/1754.long

Files

We have included reference genome files and read files containing a random sample of 100 aligned reads for 3 different viruses. These genomes and reads are real data taken from the NCBI’s Sequence Read Archive.

The program files are described as follows:

  • bwt.py: Implementation of BWT and exact search.

  • align_reads.py: Takes a reference genome file and aligned reads file (and optional threshold level) as input. Uses our aligner to predict the positions of reads, and then compares them to the actual positions.
    Usage: python align_reads.py <genome file name> <read file name> [-t <threshold level>]

  • search_bwt.py: Implementation of inexact search algorithm. Can map a single read.
    Usage: python search_bwt.py [--no-indels] <reference_file> <read_file>
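
To illustrate the idea behind bwt.py, here is a minimal sketch of a Burrows-Wheeler index with exact backward search. It is not the code from this repository: the function names `bwt_index` and `exact_search` are hypothetical, the suffix array is built by naive sorting (fine for small viral genomes, too slow for large ones), and reads are assumed to contain only A, C, G, and T.

```python
def bwt_index(text):
    """Build the suffix array, BWT string, C table, and Occ table for text."""
    text += "$"  # sentinel; assumed absent from text and sorting before A/C/G/T
    # Naive suffix array: sort suffix start positions by the suffix itself.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to "$" for i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c]: number of characters in text that are strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, bwt, C, occ


def exact_search(read, sa, C, occ):
    """Return sorted 0-based positions where read occurs exactly in the text."""
    lo, hi = 0, len(sa)  # half-open range of suffix array rows
    # Backward search: extend the match one character at a time, right to left.
    for ch in reversed(read):
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


if __name__ == "__main__":
    sa, bwt, C, occ = bwt_index("ACGTACGTTACG")
    print(exact_search("ACG", sa, C, occ))
```

Each step of the backward search narrows the range of suffix array rows whose suffixes start with the portion of the read matched so far, so lookup time depends on the read length rather than the genome length. An inexact search can build on the same range updates by also trying substitutions (and optionally insertions and deletions) at each step, up to some allowed number of differences.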

Recommended Usages

We recommend the following commands for analyzing the included data (without the -t flag, a default threshold of 3 is used):

  • python align_reads.py data/ebola_genome.fasta data/ebola_reads.fasta
  • python align_reads.py data/coronavirus_genome.fasta data/coronavirus_reads.fasta
  • python align_reads.py data/rsv_genome.fasta data/rsv_reads.fasta -t 7
  • python search_bwt.py test (test example of read mapping)

Other Details

Complete read files tend to be large, so only samples are included here. To obtain the full files, go to http://www.ncbi.nlm.nih.gov/sra, search for an organism or virus, filter by “DNA” and “aligned data”, open a result, and download the aligned reads from the run. You can also install the SRA Toolkit to download runs from the command line.

Implemented using Python 2.7. We ran our program on a MacBook; a more powerful machine should reduce the running time.
