A Ruby binding to the Burrows-Wheeler Aligner (BWA) built using Ruby FFI.
Documentation can be found here http://fstrozzi.github.com/bioruby-bwa/
For more information on BWA check http://bio-bwa.sourceforge.net/
For more information on Ruby FFI check https://github.com/ffi/ffi
Burrows-Wheeler Aligner (BWA) is an efficient program that aligns relatively short nucleotide sequences against a long reference sequence such as the human genome. It implements two algorithms, bwa-short and BWA-SW. The former works for query sequences shorter than 200bp and the latter for longer sequences up to around 100kbp. Both algorithms do gapped alignment. They are usually more accurate and faster on queries with low error rates. (from http://bio-bwa.sourceforge.net/)
This package allows using BWA functions directly from Ruby. BWA source code (v. 0.5.9) is compiled into a shared library that is accessed using the Ruby Foreign Function Interface (FFI).
The package has been tested and should work properly with Ruby 1.8.7, 1.9.1, 1.9.2 and JRuby 1.6.0.
Notes on BWA function parameters
The Ruby methods are bound directly to the BWA functions and take their parameters as a simple Hash. They accept all the standard parameters of the corresponding BWA functions. For example, if a BWA function runs with threads and uses
-t 4
to set the number of threads, the corresponding Ruby method will accept
:t => 4
in the parameters list.
Parameters that take no value in BWA and work on presence/absence logic MUST be set this way:
:b => true
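The convention described above can be sketched in a few lines of Ruby. The helper `hash_to_bwa_args` is hypothetical and is not part of the binding; it only illustrates the mapping, assuming that valued parameters become `-key value` and that `true` marks a presence/absence flag.

```ruby
# Hypothetical helper (not part of Bio::BWA): turns a parameter Hash into
# a BWA-style argument list, following the convention described above.
def hash_to_bwa_args(params)
  args = []
  params.each do |key, value|
    case value
    when true
      args << "-#{key}"                 # presence/absence flag, e.g. :b => true
    when false, nil
      next                              # flag not set: emit nothing
    else
      args << "-#{key}" << value.to_s   # valued option, e.g. :t => 4
    end
  end
  args
end

puts hash_to_bwa_args(:t => 4, :b => true).join(" ")  # prints: -t 4 -b
```

Keys are emitted in the order they appear in the Hash (insertion order is preserved from Ruby 1.9 onwards).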
Only a few parameter names differ between BWA and the Ruby binding. In the "bwa aln" function (Bio::BWA.short_read_alignment in Ruby) the following parameter names have been changed:
-0 use single-end reads only (effective with -b) => in Ruby is :single
-1 use the 1st read in a pair (effective with -b) => in Ruby is :first
-2 use the 2nd read in a pair (effective with -b) => in Ruby is :second
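The three renamed options can be summarised as a small lookup table. This is an illustrative sketch, not the binding's internal code; the names `ALN_RENAMED_OPTIONS` and `aln_flag_for` are made up for the example.

```ruby
# Illustrative lookup table (not the binding's internal code): the Ruby
# keywords that replace the numeric "bwa aln" flags listed above.
ALN_RENAMED_OPTIONS = {
  :single => "-0",  # use single-end reads only
  :first  => "-1",  # use the 1st read in a pair
  :second => "-2"   # use the 2nd read in a pair
}

# Returns the BWA flag for a Ruby keyword; any other keyword keeps its name.
def aln_flag_for(keyword)
  ALN_RENAMED_OPTIONS.fetch(keyword) { "-#{keyword}" }
end

puts aln_flag_for(:first)  # prints: -1
puts aln_flag_for(:t)      # prints: -t
```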
Also, to specify parameters like the database prefix, input and output files Ruby methods use keywords like
:prefix :file_in :file_out :fastq :sai
depending on the method used.
These few changes were made to improve the readability of the Ruby code. All the other BWA parameter names are exactly the same in the Ruby binding, so the Ruby Bio::BWA methods can be called with the parameters BWA users are already familiar with. For a better idea of the Ruby Bio::BWA methods and their parameters, see the examples below.
For the full list of BWA function parameters, please check http://bio-bwa.sourceforge.net/bwa.shtml
Indexing a sequence database
bwa index -a bwtsw database.fasta
Bio::BWA.make_index(:file_in => "database.fasta", :a => "bwtsw")
Indexing a sequence database in colorspace
bwa index -p colorspace_db -c -a bwtsw database.fasta
Bio::BWA.make_index(:file_in => "database.fasta", :prefix => "colorspace_db", :a => "bwtsw", :c => true)
Running an alignment with short query sequences
bwa aln database.fasta short_read.fastq > aln_sa.sai
Bio::BWA.short_read_alignment(:prefix => "database.fasta", :file_in => "short_read.fastq", :file_out => "aln_sa.sai")
Running an alignment with long query sequences
bwa bwasw database.fasta long_read.fastq > aln.sam
Bio::BWA.long_read_alignment(:prefix => "database.fasta", :file_in => "long_read.fastq", :file_out => "aln.sam")
Running an alignment using threads and input in the Illumina 1.3+ FASTQ-like format
bwa aln -t 10 -I database.fasta short_read.fastq > aln_sa.sai
Bio::BWA.short_read_alignment(:prefix => "database.fasta", :file_in => "short_read.fastq", :file_out => "aln_sa.sai", :t => 10, :I => true)
Convert alignment output to SAM format (single end)
bwa samse database.fasta aln_sa.sai short_read.fastq > aln.sam
Bio::BWA.sai_to_sam_single(:prefix => "database.fasta", :sai => "aln_sa.sai", :fastq => "short_read.fastq", :file_out => "aln.sam")
Convert alignment output to SAM format (paired ends)
bwa sampe database.fasta aln_sa1.sai aln_sa2.sai read1.fq read2.fq > aln.sam
Bio::BWA.sai_to_sam_paired(:prefix => "database.fasta", :sai => ["aln_sa1.sai","aln_sa2.sai"], :fastq => ["read1.fq","read2.fq"], :file_out => "aln.sam")
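The individual calls above can be chained into a complete paired-end run. The following is a minimal sketch rather than a reference workflow: the `require 'bio-bwa'` line assumes that is the gem's library name, the file names are the placeholders used in the examples, and the constants `REFERENCE` and `PAIRED_END_STEPS` exist only for this illustration. Where the binding is not installed, the steps are simply printed.

```ruby
# Assumption: the binding is loaded with require 'bio-bwa'. If it is not
# installed, the sketch only lists the steps instead of running them.
begin
  require 'bio-bwa'
rescue LoadError
  # binding not available; fall through and print the steps
end

REFERENCE = "database.fasta"

# Each step is a [method name, parameter Hash] pair, in execution order.
PAIRED_END_STEPS = [
  [:make_index,           { :file_in => REFERENCE, :a => "bwtsw" }],
  [:short_read_alignment, { :prefix => REFERENCE, :file_in => "read1.fq", :file_out => "aln_sa1.sai" }],
  [:short_read_alignment, { :prefix => REFERENCE, :file_in => "read2.fq", :file_out => "aln_sa2.sai" }],
  [:sai_to_sam_paired,    { :prefix => REFERENCE,
                            :sai => ["aln_sa1.sai", "aln_sa2.sai"],
                            :fastq => ["read1.fq", "read2.fq"],
                            :file_out => "aln.sam" }]
]

PAIRED_END_STEPS.each do |method_name, params|
  if defined?(Bio::BWA)
    Bio::BWA.send(method_name, params)       # run the real binding
  else
    puts "#{method_name} #{params.inspect}"  # dry run: show the call
  end
end
```

Each read file is aligned separately, and the two .sai files are then passed to sai_to_sam_paired in the same order as the FASTQ files.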
Real test run with
Illumina dataset of 2 million reads from a human RNA-seq experiment, downloaded from the ArrayExpress database at EBI.
Human genome sequence downloaded from ftp.1000genomes.ebi.ac.uk
Tests were performed on a Linux server with an Intel(R) Xeon(R) CPU E5420 @ 2.50GHz with 8 cores and 32 GB of RAM.
bwa aln -t 3 -f aln.sai human_v37.gz sample.fastq
real 3m45.392s
user 10m59.970s
sys 0m2.990s
Bio::BWA.short_read_alignment(:prefix => "human_v37.gz", :file_in => "sample.fastq", :file_out => "aln-ruby.sai", :t => 3)
real 3m45.344s
user 10m59.820s
sys 0m3.180s
As expected, the alignment output is exactly the same for BWA and the Ruby binding:
323faa19c6e3aa4ff77257d8ec346f58 aln.sai
323faa19c6e3aa4ff77257d8ec346f58 aln-ruby.sai
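The same comparison can be done from Ruby with the standard library. This is a small sketch, not part of the binding: `same_md5?` is a hypothetical helper, and the file names are the ones from the test run above. The check is skipped if the files are not present.

```ruby
require 'digest/md5'

# Returns true when both files have the same MD5 checksum.
def same_md5?(path_a, path_b)
  Digest::MD5.file(path_a).hexdigest == Digest::MD5.file(path_b).hexdigest
end

# Output files from the test run above; skipped when they do not exist.
files = ["aln.sai", "aln-ruby.sai"]
if files.all? { |f| File.exist?(f) }
  puts same_md5?(*files) ? "outputs are identical" : "outputs differ"
end
```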
Ruby binding works with threads
The Ruby binding works nicely with threads, since they are implemented directly in the BWA functions.
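One way to read the timings reported above is to divide the user (CPU) time by the real (wall-clock) time, which gives a rough idea of how many cores were kept busy. The snippet below is only an illustration of that arithmetic; `time_to_seconds` is a helper written for this example.

```ruby
# Converts a `time`-style value such as "3m45.392s" to seconds.
def time_to_seconds(text)
  minutes, seconds = text.match(/\A(\d+)m([\d.]+)s\z/).captures
  minutes.to_i * 60 + seconds.to_f
end

# Timings reported above for the run with 3 threads.
runs = {
  "bwa aln"  => { :real => "3m45.392s", :user => "10m59.970s" },
  "Bio::BWA" => { :real => "3m45.344s", :user => "10m59.820s" }
}

runs.each do |label, t|
  ratio = time_to_seconds(t[:user]) / time_to_seconds(t[:real])
  puts format("%-8s user/real = %.2f", label, ratio)
end
```

Both ratios come out at about 2.93, close to the 3 threads requested, for the command-line tool and for the Ruby binding alike.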
In this screenshot you can see the benchmark scaling on 3 threads