GFF3 plugin for BioRuby, aimed at parsing big data
# Take GFF (genome browser) information and assemble mRNA and CDS sequences
# Options for low memory use and caching of records
# Support for external FASTA files
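Assembling a CDS from GFF3 generally means splicing the CDS segments that share a Parent, in genomic order, and reverse-complementing the result when the feature lies on the minus strand. The pure-Ruby sketch below illustrates that general technique on a toy genome. It does not use BioRuby or this plugin's API: the names Feature, parse_gff3_line and assemble_cds are hypothetical, and it assumes a single Parent ID per feature.

```ruby
# Minimal sketch, not the plugin's implementation: Feature, parse_gff3_line
# and assemble_cds are hypothetical names.

# A GFF3 feature line has nine tab-separated columns.
Feature = Struct.new(:seqid, :type, :start, :stop, :strand, :attrs)

def parse_gff3_line(line)
  return nil if line.start_with?('#')          # directives and comments
  cols = line.chomp.split("\t")
  return nil unless cols.size == 9
  # Column 9 holds key=value pairs separated by ';'.
  attrs = cols[8].split(';').map { |kv| k, v = kv.split('=', 2); [k, v] }.to_h
  Feature.new(cols[0], cols[2], cols[3].to_i, cols[4].to_i, cols[6], attrs)
end

COMPLEMENT = { 'A' => 'T', 'T' => 'A', 'G' => 'C', 'C' => 'G' }.freeze

def reverse_complement(seq)
  # Symbols other than A, C, G and T become N.
  seq.upcase.reverse.chars.map { |c| COMPLEMENT.fetch(c, 'N') }.join
end

# Splice the CDS segments of one parent (assumed: a single Parent ID, no
# comma-separated lists) in genomic order. GFF3 coordinates are 1-based and
# inclusive. Minus-strand results are reverse-complemented.
def assemble_cds(features, parent_id, genome)
  parts = features.select { |f| f.type == 'CDS' && f.attrs['Parent'] == parent_id }
                  .sort_by(&:start)
  return nil if parts.empty?
  seq = parts.map { |f| genome.fetch(f.seqid)[(f.start - 1)..(f.stop - 1)] }.join
  parts.first.strand == '-' ? reverse_complement(seq) : seq
end

genome = { 'chr1' => 'ATGAAACCCGGGTTTTAG' }
features = [
  "chr1\tex\tCDS\t1\t6\t.\t+\t0\tID=cds1;Parent=mrna1",
  "chr1\tex\tCDS\t13\t18\t.\t+\t0\tID=cds2;Parent=mrna1"
].map { |l| parse_gff3_line(l) }.compact
puts assemble_cds(features, 'mrna1', genome)   # prints ATGAAATTTTAG
```

mRNA assembly follows the same pattern with exon segments instead of CDS segments.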
You can use this plugin in two ways: as a standalone program, or as a plugin library for BioRuby.
For example, fetch mRNA and CDS information from a GFF3 file and output it in FASTA format:

  ./bin/gff3-fetch mrna test/data/gff/test.gff3
  ./bin/gff3-fetch cds test/data/gff/test.gff3
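A FASTA record is a '>' header line followed by the sequence. The sketch below shows a minimal writer for such records; to_fasta is a hypothetical helper, and the 60-column line width is an assumption (gff3-fetch may wrap differently or not at all).

```ruby
# Minimal sketch: format one sequence as a FASTA record.
# to_fasta is hypothetical; the default width of 60 is an assumption.
def to_fasta(id, seq, width = 60)
  lines = seq.scan(/.{1,#{width}}/)   # split into chunks of at most +width+
  ">#{id}\n" + lines.join("\n") + "\n"
end

print to_fasta('mrna1', 'ATGAAATTTTAG', 5)
# >mrna1
# ATGAA
# ATTTT
# AG
```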
Alternatively, clone this repository and add its 'lib' directory to the Ruby search path (for example with ruby -I lib).
You can also run the RSpec tests with something like:

  rspec -I ../bioruby/lib/ spec/*.rb
This implementation builds on BioRuby's basic GFF3 parser, with the possible advantage that the plugin is faster and need not load all records into memory. The GFF3 specs are based on the output of the WormBase genome browser.
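A common way to avoid holding every record in memory is to keep only file offsets and re-read lines on demand, trading speed for RAM in the way the --no-cache option describes ("do not load everything in memory (slower)"). The sketch below shows that trade-off; OffsetIndex is a hypothetical class, and this is not necessarily how the plugin implements --no-cache.

```ruby
require 'stringio'

# Minimal sketch of the idea behind --no-cache, not the plugin's code:
# OffsetIndex (hypothetical) keeps only byte offsets in memory and re-reads
# feature lines from the IO when they are requested.
class OffsetIndex
  def initialize(io)
    @io = io
    @offsets = Hash.new { |h, k| h[k] = [] }   # Parent ID => byte offsets
    build
  end

  # Re-read the lines of all children of +parent+ from the underlying IO.
  def lines_for(parent)
    @offsets.fetch(parent, []).map do |pos|
      @io.seek(pos)
      @io.gets.chomp
    end
  end

  private

  # One pass over the input, remembering where each child feature starts.
  def build
    @io.rewind
    loop do
      pos = @io.pos
      line = @io.gets
      break if line.nil? || line.start_with?('##FASTA')   # sequences follow
      next if line.start_with?('#')
      cols = line.chomp.split("\t")
      next unless cols.size == 9
      parent = cols[8][/(?:\A|;)Parent=([^;]+)/, 1]
      @offsets[parent] << pos if parent
    end
  end
end

gff = "##gff-version 3\n" \
      "chr1\tex\tmRNA\t1\t18\t.\t+\t.\tID=mrna1\n" \
      "chr1\tex\tCDS\t1\t6\t.\t+\t0\tID=cds1;Parent=mrna1\n" \
      "chr1\tex\tCDS\t13\t18\t.\t+\t0\tID=cds2;Parent=mrna1\n"
index = OffsetIndex.new(StringIO.new(gff))
puts index.lines_for('mrna1')   # the two CDS lines
```

With a real file, pass File.open('test/data/gff/test.gff3') instead of the StringIO. Only integers stay resident, at the cost of a seek and a read for every lookup.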
For a write-up see thebird.nl/bioruby/BioRuby_GFF3.html
Fetch and assemble mRNA or CDS sequences, and print them in FASTA format.

  gff3-fetch [--no-cache] mRNA|CDS [filename.fa] filename.gff

  Where:

    --no-cache  : do not load everything in memory (slower)
    mRNA        : assemble mRNA
    CDS         : assemble CDS

  Multiple GFF3 files can be used. When external FASTA files are given, the
  last one listed before the GFF file is used.

  Examples:

  Find mRNA and CDS information from test.gff3 (which includes sequence information)

    gff3-fetch mRNA test/data/gff/test.gff3
    gff3-fetch CDS test/data/gff/test.gff3

  Find CDS from an external FASTA file

    gff3-fetch CDS test/data/gff/MhA1_Contig1133.fa test/data/gff/MhA1_Contig1133.gff3

  Find mRNA from an external FASTA file, without loading everything in RAM

    gff3-fetch --no-cache mRNA test/data/gff/test-ext-fasta.fa test/data/gff/test-ext-fasta.gff3

If you use this software, please cite http://dx.doi.org/10.1093/bioinformatics/btq475
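The argument rule in the usage text can be read as: each GFF3 file uses the last FASTA file that appears before it on the command line. The sketch below implements that reading. pair_inputs is hypothetical, and recognising FASTA files by a .fa or .fasta extension is an assumption, not documented gff3-fetch behaviour.

```ruby
# Minimal sketch, not gff3-fetch's actual code: pair_inputs (hypothetical)
# pairs each GFF3 file with the last FASTA file listed before it.
# Detecting FASTA files by extension is an assumption.
def pair_inputs(argv)
  args = argv.dup
  opts = { cache: true, jobs: [] }
  if args.first == '--no-cache'
    opts[:cache] = false
    args.shift
  end
  opts[:type] = args.shift            # mRNA or CDS
  fasta = nil
  args.each do |fn|
    if fn.match?(/\.(fa|fasta)\z/i)
      fasta = fn                      # applies to the GFF3 files that follow
    else
      opts[:jobs] << [fn, fasta]      # nil: sequence must be inside the GFF3
    end
  end
  opts
end

opts = pair_inputs(%w[--no-cache mRNA test/data/gff/test-ext-fasta.fa
                      test/data/gff/test-ext-fasta.gff3])
opts[:jobs].each { |gff, fa| puts "#{gff} <- #{fa || '(embedded FASTA)'}" }
```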
Copyright (C) 2010,2011 Pjotr Prins <email@example.com>