So you want to parse a fasta file...
Add this line to your application's Gemfile:
gem 'parse_fasta'
And then execute:
$ bundle
Or install it yourself as:
$ gem install parse_fasta
ParseFasta doesn't work with JRuby for now.
Provides nice, programmatic access to fasta and fastq files. It's faster and more lightweight than BioRuby. And more fun!
It takes care of a lot of wacky edge cases, like parsing multi-blob gzipped files, and it is strict about formatting by default.
Check out the parse_fasta docs for the full API documentation.
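To make the "multi-blob gzipped files" edge case concrete, here is a minimal sketch using only Ruby's standard library. It is not ParseFasta's own code, and `read_all_gzip_members` is a hypothetical helper name. A gzip file can hold several independently compressed members back to back, and a single `Zlib::GzipReader` stops after the first one, so the sketch keeps opening new readers until the input is exhausted.

```ruby
require "zlib"
require "stringio"

# Hypothetical helper (not part of ParseFasta): read every gzip member
# ("blob") from a seekable IO, not just the first one.
def read_all_gzip_members(io)
  contents = +""
  loop do
    gz = Zlib::GzipReader.new(io)
    contents << gz.read
    unused = gz.unused # bytes read past the end of this member, or nil
    gz.finish          # release the reader without closing the underlying IO
    io.pos -= unused.length if unused # rewind to the start of the next member
    break if io.eof?
  end
  contents
end

# Two independently gzipped FASTA records concatenated into one stream.
blob = Zlib.gzip(">seq1\nACTG\n") + Zlib.gzip(">seq2\nGGCC\n")
fasta_text = read_all_gzip_members(StringIO.new(blob))
puts fasta_text
```

With ParseFasta you should not need any of this, since the README says the gem handles such files for you.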
Here are some examples of using ParseFasta. Don't forget to require "parse_fasta" at the top of your program!
Print header and length of each record.
# ARGV[0] is the path to the file given on the command line
ParseFasta::SeqFile.open(ARGV[0]).each_record do |rec|
  puts [rec.header, rec.seq.length].join "\t"
end
You can parse fastQ files in exactly the same way.
ParseFasta::SeqFile.open(ARGV[0]).each_record do |rec|
  printf "Header: %s, Sequence: %s, Description: %s, Quality: %s\n",
         rec.header, rec.seq, rec.desc, rec.qual
end
Record#qual will be nil if the file you are parsing is a fastA file.
ParseFasta::SeqFile.open(ARGV[0]).each_record do |rec|
  if rec.qual
    # it's a fastQ record
  else
    # it's a fastA record
  end
end
You can also check this with Record#fastq?
ParseFasta::SeqFile.open(ARGV[0]).each_record do |rec|
  if rec.fastq?
    # it's a fastQ record
  else
    # it's a fastA record
  end
end
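For background on what separates the two record types, the formats themselves differ at the first character: a fastA header line starts with `>`, while a fastQ record starts with `@` and carries a quality string after a `+` line. The following is a rough standard-library sketch of that distinction, not how ParseFasta decides internally; `record_kind` is a hypothetical helper.

```ruby
# Hypothetical helper (not part of ParseFasta): guess the format of a record
# from the first character of its header line.
def record_kind(first_line)
  case first_line[0]
  when ">" then :fasta
  when "@" then :fastq
  else :unknown
  end
end

fasta_record = ">seq1 an example\nACTG\n"
fastq_record = "@seq1 an example\nACTG\n+\nIIII\n"

puts record_kind(fasta_record.lines.first)
puts record_kind(fastq_record.lines.first)
```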
And there is a nice #to_s method that does what it should whether the record is fastA- or fastQ-like. Check out the docs for info on the fancy
ParseFasta::SeqFile.open(ARGV[0]).each_record do |rec|
  puts rec.to_s
end
But of course, since it is a #to_s override, you don't even have to call it directly!
ParseFasta::SeqFile.open(ARGV[0]).each_record do |rec|
  puts rec
end
Sometimes your fasta file might have record separators (>) within the "sequence". For example, CD-HIT's .clstr files have headers within what would be the sequence part of the record.
ParseFasta is really strict about formatting and will raise an error when trying to read these types of files. If you would like to parse them, use the check_fasta_seq: false flag like so:
ParseFasta::SeqFile.open(ARGV[0], check_fasta_seq: false).each_record do |rec|
  puts rec
end
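To show the kind of input that trips the strict check, here is a small standard-library sketch with made-up .clstr-style text (the sequence names and numbers are invented for illustration). It splits on lines that start with `>` and counts the body lines that still contain a `>`. This only illustrates the problem; it is an assumption-laden stand-in and not ParseFasta's actual validation code.

```ruby
# Illustrative .clstr-style text; the sequence names are made up.
clstr_text = <<~CLSTR
  >Cluster 0
  0\t100aa, >seq_a... *
  1\t98aa, >seq_b... at 97.00%
  >Cluster 1
  0\t120aa, >seq_c... *
CLSTR

# Split into records on lines that begin with ">", keeping the header
# (first line) apart from the body (what a fasta parser treats as sequence).
records = clstr_text.split(/^>/).reject(&:empty?).map do |chunk|
  header, *body = chunk.lines.map(&:chomp)
  { header: header, body: body }
end

# Count body lines that contain a ">" where sequence data would normally be.
records.each do |rec|
  embedded = rec[:body].count { |line| line.include?(">") }
  puts "#{rec[:header]}: #{embedded} body line(s) contain '>'"
end
```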