
Fasta

1 parent 2af026a commit 5926f383fda1dce9508f469492510c9db8bca9ab @pjotrp committed Nov 6, 2012
Showing with 44 additions and 0 deletions.
  1. +20 −0 README.md
  2. +24 −0 bin/fasta_sort.rb
20 README.md
@@ -29,12 +29,32 @@ Warning: this software is experimental. Check the issue list first.
Read a file without loading the whole thing in memory
```ruby
+require 'bigbio'
+
fasta = FastaReader.new(fn)
fasta.each do | rec |
print rec.descr,rec.seq
end
```
+Since FastaReader parses the ID, you can write a tab-delimited file with id and sequence:
+
+```ruby
+i = 1
+print "num\tid\tseq\n"
+FastaReader.new(fn).each do | rec |
+ if rec.id =~ /(AT\w+)/
+ print i,"\t",$1,"\t",rec.seq,"\n"
+ i += 1
+ end
+end
+```
+
+which, for example, can be turned into RDF with the
+[bio-table](https://github.com/pjotrp/bioruby-table) biogem.
+
+## Write a FASTA file
+
Write a FASTA file. The simple way
```ruby
24 bin/fasta_sort.rb
@@ -0,0 +1,24 @@
+#!/usr/bin/env ruby
+#
+# fasta_sort: Sorts a FASTA file and outputs sorted unique records as FASTA again
+#
+# Usage:
+#
+# fasta_sort inputfile(s)
+
+require 'bio'
+
+include Bio
+
+table = Hash.new
+ARGV.each do | fn |
+ Bio::FlatFile.auto(fn).each do | seq |
+ table[seq.definition] ||= seq.data
+ end
+end
+
+table.sort.each do | definition, data |
+ rec = Bio::FastaFormat.new('> '+definition.strip+"\n"+data)
+ print rec
+end
+
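
For readers without BioRuby installed, the sort-and-deduplicate logic of `bin/fasta_sort.rb` can be sketched in plain Ruby. This is only an illustration, not part of the commit: the helper names `parse_fasta`, `sort_unique` and `format_fasta` are hypothetical, and the sketch assumes a simple FASTA layout (a `>` header line followed by sequence lines). As in the script, the first record seen for a given definition line wins, and the output is ordered by definition.

```ruby
# Hypothetical helper (not in the commit): parse FASTA text into
# [definition, sequence] pairs, joining multi-line sequences.
def parse_fasta(text)
  records = []
  text.each_line do |line|
    line = line.chomp
    if line.start_with?('>')
      # New record: header without the leading '>'
      records << [line[1..-1].strip, ''.dup]
    elsif !records.empty?
      # Sequence line: append to the current record
      records.last[1] << line.strip
    end
  end
  records
end

# Keep the first sequence seen for each definition, then sort by definition.
# Mirrors `table[seq.definition] ||= seq.data` followed by `table.sort`.
def sort_unique(records)
  table = {}
  records.each { |definition, seq| table[definition] ||= seq }
  table.sort
end

# Render [definition, sequence] pairs back into FASTA text.
def format_fasta(records)
  records.map { |definition, seq| ">#{definition}\n#{seq}\n" }.join
end
```

A possible use is `print format_fasta(sort_unique(parse_fasta(File.read(fn))))`, where `fn` is an input file name. Unlike the BioRuby version, this sketch writes each sequence on a single line and does not preserve the original line wrapping.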
