Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
Big data bioinformatics repository
Ruby
tree: 2af026adf3

Fetching latest commit…

Cannot retrieve the latest commit at this time

Failed to load latest commit information.
bin
doc
lib
spec
test
.gitignore
.travis.yml
Gemfile
Gemfile.lock
LICENSE
README.md
Rakefile
VERSION
bio-bigbio.gemspec

README.md

BIGBIO

BigBio = BIG DATA for Ruby

BigBio is an initiative to a create high performance libraries for big data computing in biology.

BigBio may use BioLib C/C++/D functions for increasing performance and reducing memory consumption.

This is an experimental project. If you wish to contribute subscribe to the BioRuby and/or BioLib mailing lists.

Overview

  • BigBio can translate nucleotide sequences to amino acid sequences using an EMBOSS C function, or BioRuby's translator.
  • BigBio has an ORF emitter which parses DNA/RNA sequences and emits ORFs between START_STOP or STOP_STOP codons.
  • BigBio has a FASTA file emitter, with iterates FASTA files and iterates sequences without loading everything in memory.

Warning: this software is experimental. Chech the issue list first.

Examples

Iterate through a FASTA file

Read a file without loading the whole thing in memory

fasta = FastaReader.new(fn)
fasta.each do | rec |
  print rec.descr,rec.seq
end

Write a FASTA file. The simple way

fasta = FastaWriter.new(fn)
fasta.write('Test',"agtcta")

Any object can be passed in, however, as long as it responds to 'descr' and 'seq.to_s', or 'id' and 'seq.to_s'. E.g.

class StorageObject
  attr_accessor :descr, :seq
end

mysequence = StorageObject.new
mysequence.descr = 'Test'
mysequence.seq = "agtcta"

write the FASTA file

fasta = FastaWriter.new(fn)
fasta.write(mysequence)

Fetch ORFs from a sequence

BigBio can parse a sequence for ORFs. Together with the FastaReader little memory gets used

predictorf = PredictORF.new(id,descr,"ATCATTAGCAACACCAGCTTCCTCTCTCTCGCTTCAAAGTTCACTACTCGTGGATCTCGT")
# get all ORFs between start and stop codons, longer than 30 bps
orfs = predictorf.startstop(30)
# get all sequences between stop codons
seqs = predictorf.stopstop(0)

Rapid DNA/RNA to amino acid translation

Translate with EMBOSS C library, if linked, otherwise use BioRuby

trn_table = Bio::Big::TranslationAdapter.translation_table(1)
translate = Nucleotide::Translate.new(trn_table)
aa_frames = translate.aa_6_frames("ATCATTAGCAACACCAGCTTCCTCTCTCTCGCTTCAAAGTTCACTACTCGTGGATCTCGT")

Command line tools

Some functionality comes also as executable command line tools (see the ./bin directory). Use the -h switch to get information. Current tools are

  1. getorf: fetch all areas between start-stop and stop-stop codons in six frames (using EMBOSS when biolib is available)
  2. nt2aa.rb: translate in six frames (using EMBOSS when biolib is available)

Install

The easy way

gem install bio-bigbio

in your code

require 'bigbio'

Copyright

Copyright (c) 2011-2012 Pjotr Prins. See LICENSE for further details.

Something went wrong with that request. Please try again.