
fstrozzi edited this page Mar 22, 2011 · 29 revisions

Bio::BWA


A Ruby binding to the Burrows-Wheeler Aligner (BWA) built using Ruby FFI.

Documentation is available at http://fstrozzi.github.com/bioruby-bwa/

For more information on BWA, see http://bio-bwa.sourceforge.net/

For more information on Ruby FFI, see https://github.com/ffi/ffi

Introduction


Burrows-Wheeler Aligner (BWA) is an efficient program that aligns relatively short nucleotide sequences against a long reference sequence such as the human genome. It implements two algorithms, bwa-short and BWA-SW. The former works for query sequences shorter than 200bp and the latter for longer sequences up to around 100kbp. Both algorithms do gapped alignment. They are usually more accurate and faster on queries with low error rates. (from http://bio-bwa.sourceforge.net/)

This package lets you call BWA functions directly from Ruby. The BWA source code (v. 0.5.9) is compiled into a shared library, which is accessed through the Ruby Foreign Function Interface (FFI).

The package has been tested and should work with Ruby 1.8.7, 1.9.1, 1.9.2 and JRuby 1.6.0.

Notes on BWA function parameters


The Ruby methods are bound directly to the BWA functions and take their parameters as a simple Hash. All the standard parameters of the BWA functions are supported. So, for example, if a BWA function runs with threads and uses

-t 4

to set the number of threads, the corresponding Ruby methods will accept

:t => 4

in the parameters list.

Parameters that take no value in BWA and work only by presence or absence MUST be set to true, like this:

:b => true
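
The correspondence described above can be sketched in plain Ruby. The helper flags_from_hash below is hypothetical and is not part of Bio::BWA; it only illustrates how a parameter Hash lines up with BWA command-line flags, assuming that a parameter with a value becomes "-x value" and a parameter set to true becomes a bare "-x".

```ruby
# Hypothetical helper (not part of Bio::BWA): turn a parameter Hash
# into the equivalent list of BWA command-line flags.
def flags_from_hash(params)
  params.flat_map do |name, value|
    case value
    when true       then ["-#{name}"]             # presence/absence flag
    when false, nil then []                        # flag not set
    else                 ["-#{name}", value.to_s]  # flag with a value
    end
  end
end

puts flags_from_hash(:t => 4, :b => true).join(" ")
```

The sketch relies on Hash insertion order, which Ruby preserves from version 1.9 onwards.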

Only a few parameter names differ between BWA and the Ruby binding. In the "bwa aln" function (Bio::BWA.short_read_alignment in Ruby) the following parameter names have been changed:

          -0        use single-end reads only (effective with -b)    => in Ruby is :single

          -1        use the 1st read in a pair (effective with -b)   => in Ruby is :first

          -2        use the 2nd read in a pair (effective with -b)   => in Ruby is :second
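
The three renamed parameters can be summarised as a lookup table. This is an illustrative sketch only: the constant ALN_RENAMED and the method aln_flag are hypothetical names, not taken from the Bio::BWA source.

```ruby
# Hypothetical lookup table (not from the Bio::BWA source):
# Ruby keyword => "bwa aln" flag, as listed above.
ALN_RENAMED = { :single => "-0", :first => "-1", :second => "-2" }.freeze

# Return the "bwa aln" flag for a Ruby parameter name.
# Names that were not renamed map straight to "-<name>".
def aln_flag(name)
  ALN_RENAMED.fetch(name) { "-#{name}" }
end

puts aln_flag(:first)  # a renamed parameter
puts aln_flag(:t)      # an unchanged parameter
```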

In addition, the database prefix and the input and output files are specified with keywords such as

:prefix 
:file_in 
:file_out 
:fastq
:sai 

depending on the method used.

These few changes were made to improve the readability of the Ruby code. All the other BWA parameter names are exactly the same in the Ruby binding, so the Ruby Bio::BWA methods can be called with the parameters BWA users are already familiar with. For a better idea of the Ruby Bio::BWA methods and their parameters, see the examples below.

For the full list of BWA function parameters, see http://bio-bwa.sourceforge.net/bwa.shtml

Examples


Indexing a sequence database

With BWA

bwa index -a bwtsw database.fasta

With Bio::BWA

Bio::BWA.make_index(:file_in => "database.fasta", :a => "bwtsw")

Indexing a sequence database in colorspace

With BWA

bwa index -p colorspace_db -c -a bwtsw database.fasta

With Bio::BWA

Bio::BWA.make_index(:file_in => "database.fasta",
                    :prefix => "colorspace_db",
                    :a => "bwtsw", :c => true)

Running an alignment with short query sequences

With BWA

bwa aln database.fasta short_read.fastq > aln_sa.sai

With Bio::BWA

Bio::BWA.short_read_alignment(:prefix => "database.fasta", 
                              :file_in => "short_read.fastq", 
                              :file_out => "aln_sa.sai")

Running an alignment with long query sequences

With BWA

bwa bwasw database.fasta long_read.fastq > aln.sam

With Bio::BWA

Bio::BWA.long_read_alignment(:prefix => "database.fasta", 
                             :file_in => "long_read.fastq", 
                             :file_out => "aln.sam")

Running an alignment using threads and input in the Illumina 1.3+ FASTQ-like format

With BWA

bwa aln -t 10 -I database.fasta short_read.fastq > aln_sa.sai

With Bio::BWA

Bio::BWA.short_read_alignment(:prefix => "database.fasta", 
                              :file_in => "short_read.fastq", 
                              :file_out => "aln_sa.sai", 
                              :t => 10, :I => true)

Converting alignment output to SAM format (single end)

With BWA

bwa samse database.fasta aln_sa.sai short_read.fastq > aln.sam

With Bio::BWA

Bio::BWA.sai_to_sam_single(:prefix => "database.fasta", :sai => "aln_sa.sai", 
                           :fastq => "short_read.fastq", :file_out => "aln.sam")
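
The single-end examples above can be chained into one script: index, align, then convert to SAM. The sketch below makes two assumptions: that the gem is loaded with require 'bio-bwa', and that the Bio::BWA methods accept a Hash passed positionally. The helper single_end_steps is hypothetical. When the gem or the input files are missing, the calls are only printed.

```ruby
# Sketch only. The require name 'bio-bwa' is an assumption about how the
# gem is loaded; the file names are the ones used in the examples above.
begin
  require 'bio-bwa'
rescue LoadError
  # Gem not installed: the steps are printed instead of executed.
end

# Hypothetical helper: build the three calls as
# [method name, parameter Hash] pairs.
def single_end_steps(db, reads, sai, sam)
  [
    [:make_index,           { :file_in => db, :a => "bwtsw" }],
    [:short_read_alignment, { :prefix => db, :file_in => reads, :file_out => sai }],
    [:sai_to_sam_single,    { :prefix => db, :sai => sai, :fastq => reads, :file_out => sam }]
  ]
end

db, reads = "database.fasta", "short_read.fastq"
runnable = defined?(Bio::BWA) && File.exist?(db) && File.exist?(reads)

single_end_steps(db, reads, "aln_sa.sai", "aln.sam").each do |name, params|
  if runnable
    Bio::BWA.public_send(name, params)           # run the real binding
  else
    puts "Bio::BWA.#{name}(#{params.inspect})"   # dry run
  end
end
```

Note how the :file_out of the alignment step becomes the :sai of the conversion step, and how the indexed FASTA file name is reused as :prefix.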

Converting alignment output to SAM format (paired ends)

With BWA

bwa sampe database.fasta aln_sa1.sai aln_sa2.sai read1.fq read2.fq > aln.sam

With Bio::BWA

Bio::BWA.sai_to_sam_paired(:prefix => "database.fasta", 
                           :sai => ["aln_sa1.sai","aln_sa2.sai"], 
                           :fastq => ["read1.fq","read2.fq"], 
                           :file_out => "aln.sam")
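
In the paired-end call, :sai and :fastq each take an Array of two file names, matching the two .sai files and the two read files in the "bwa sampe" command line. A small pre-flight check can catch a malformed Hash before the native code is invoked. The method check_paired_params is a hypothetical helper, not part of Bio::BWA.

```ruby
# Hypothetical pre-flight check (not part of Bio::BWA): paired-end
# conversion needs exactly two .sai files and two FASTQ files.
def check_paired_params(params)
  [:sai, :fastq].each do |key|
    value = params[key]
    unless value.is_a?(Array) && value.size == 2
      raise ArgumentError, "#{key.inspect} must be an Array of two file names"
    end
  end
  params
end

params = { :prefix   => "database.fasta",
           :sai      => ["aln_sa1.sai", "aln_sa2.sai"],
           :fastq    => ["read1.fq", "read2.fq"],
           :file_out => "aln.sam" }

check_paired_params(params)
puts "paired-end parameters look consistent"
```

The checked Hash could then be handed to Bio::BWA.sai_to_sam_paired as in the example above.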

Benchmarks


The test was run on real data:

  • An Illumina dataset of 2 million reads from a human RNA-seq experiment, downloaded from the ArrayExpress database at EBI.

  • The human genome sequence, downloaded from ftp.1000genomes.ebi.ac.uk

Tests were performed on a Linux server with an Intel(R) Xeon(R) CPU E5420 @ 2.50GHz with 8 cores and 32 GB of RAM.

BWA

bwa aln -t 3 -f aln.sai human_v37.gz sample.fastq

Time

real    3m45.392s
user    10m59.970s
sys     0m2.990s

Bio::BWA

Bio::BWA.short_read_alignment(:prefix => "human_v37.gz",
                              :file_in => "sample.fastq",
                              :file_out => "aln-ruby.sai", 
                              :t => 3)

Time

real    3m45.344s
user    10m59.820s
sys     0m3.180s
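
The real/user/sys figures above have the layout printed by the shell's time command (the page does not say how they were collected, so this is an assumption). A similar measurement can be taken from inside Ruby with the standard Benchmark module. The helpers format_time and report_time below are hypothetical names, and the block at the end is a placeholder workload.

```ruby
require 'benchmark'

# Format seconds the way the shell's `time` prints them (minutes, then
# seconds with three decimals).
def format_time(seconds)
  minutes, rest = seconds.divmod(60)
  format("%dm%.3fs", minutes, rest)
end

# Time the given block and print real/user/sys lines.
def report_time
  t = Benchmark.measure { yield }
  puts "real    #{format_time(t.real)}"
  puts "user    #{format_time(t.utime)}"  # CPU time spent in user mode
  puts "sys     #{format_time(t.stime)}"  # CPU time spent in kernel mode
  t
end

# Placeholder workload; replace the block body with the
# Bio::BWA.short_read_alignment call shown above.
report_time { 100_000.times { |i| Math.sqrt(i) } }
```

With several threads at work, the user time can exceed the real time, as it does in the figures above.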

Results comparison

The alignment output is exactly the same for BWA and the Ruby binding, as expected:

323faa19c6e3aa4ff77257d8ec346f58  aln.sai
323faa19c6e3aa4ff77257d8ec346f58  aln-ruby.sai
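
The same comparison can be made from Ruby with the standard Digest library. This is a minimal sketch: same_md5? is a hypothetical helper name, and the comparison only runs if both output files from the benchmark are present in the current directory.

```ruby
require 'digest/md5'

# True when both files have the same MD5 checksum.
def same_md5?(path_a, path_b)
  Digest::MD5.file(path_a).hexdigest == Digest::MD5.file(path_b).hexdigest
end

files = ["aln.sai", "aln-ruby.sai"]
if files.all? { |f| File.exist?(f) }
  puts same_md5?(*files) ? "outputs are identical" : "outputs differ"
else
  puts "benchmark output files not found, nothing to compare"
end
```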

Ruby binding works with threads

The Ruby binding works well with threads, since threading is implemented directly in the BWA functions.

[Screenshot "ruby bwa": the benchmark scaling on 3 threads]