FASTA: allow multiple columns from description

1 parent b6af0e7 commit 665d40cb4991e8a8530511e54b60a158245d7534 @pjotrp committed Oct 17, 2013
Showing with 11 additions and 7 deletions.
  1. +6 −3 README.md
  2. +5 −4 lib/bio-table/parsers/fastareader.rb
@@ -211,7 +211,7 @@ gem
gem install statsample
```
-(statsample is not loaded by default, as it has a host of
+(statsample is not loaded by default because it has a host of
dependencies)
Thereafter, to calculate the stats for columns 1 and 2 (rowname is column 0)
@@ -325,9 +325,12 @@ a flexible regular expression to fetch the IDs
bio-table --fasta '^(\S+)' test/data/input/aa.fa
```
-notice the parentheses.
+notice the parentheses: these capture the ID and create the first
+column. If two captures are defined, another column is added. Try:
-(more soon)
+```sh
+ bio-table --fasta '^(\S+).*?(\d+) aa' test/data/input/aa.fa
+```
### Using STDIN
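To show what the two-capture example does, here is a minimal Ruby sketch of the matching step alone. It does not call bio-table. The header string is hypothetical because the actual contents of `test/data/input/aa.fa` are not shown in this commit, and the tab join follows the `matches.join("\t")` line in `fastareader.rb`.

```ruby
# Hypothetical FASTA header; the real records in test/data/input/aa.fa may differ.
header = ">seq1 example protein 429 aa"
regex  = '^(\S+).*?(\d+) aa'   # the two-capture regex from the README example

descr   = header.sub(/^>/, '').strip   # drop the leading '>' as digest_tag does
m       = /#{regex}/.match(descr)
columns = m ? m.captures : []          # one entry per capture group
puts columns.join("\t")                # the ID, then the length, separated by a tab
```

The lazy `.*?` lets the second group grab the first run of digits that is directly followed by ` aa`, so digits elsewhere in the description are skipped.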
@@ -22,6 +22,7 @@ def initialize fn, regex = nil
@fread_once = false
@regex = regex
@regex = '^(\S+)' if @regex == nil
+ @regex = '('+regex+')' if regex !~ /\(/
@logger.info "Parsing FASTA with ID regex '"+@regex+"'"
end
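The two regex lines in `initialize` amount to a small normalization rule. When no regex is given, use `'^(\S+)'`. When a regex contains no `(`, wrap it in parentheses so that its whole match becomes the single ID column. The sketch below expresses that rule as a stand-alone function. The name `normalize_id_regex` is hypothetical. The committed line tests the raw `regex` argument. The sketch instead tests the already-defaulted value, so a nil argument yields the default rather than attempting `'(' + nil`.

```ruby
# Hypothetical helper mirroring the regex handling in FastaReader#initialize.
DEFAULT_ID_REGEX = '^(\S+)'

def normalize_id_regex(regex)
  regex = DEFAULT_ID_REGEX if regex.nil?      # fall back to "first word of the header"
  regex = '(' + regex + ')' if regex !~ /\(/  # no group at all: capture the whole match
  regex
end

puts normalize_id_regex(nil)                  # ^(\S+)
puts normalize_id_regex('^\S+')               # (^\S+)
puts normalize_id_regex('^(\S+).*?(\d+) aa')  # already has groups, left unchanged
```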
@@ -105,10 +106,10 @@ def get_by_index idx
def digest_tag tag
if tag =~ /^>/
descr = $'.strip
- if descr =~ /#{@regex}/
- id = $1
- # p [descr,id]
- return id, descr
+ matches = /#{@regex}/.match(descr).captures
+ if matches.size > 0
+ # p matches
+ return matches.join("\t"), descr
end
p descr # do not remove these
p @regex
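`digest_tag` now returns every capture, joined by tabs, as the row ID. This is what turns extra captures into extra columns. The sketch below implements that step as a stand-alone method. The method name `digest_header` and the example headers are hypothetical. There is one deliberate difference from the commit. `Regexp#match` returns nil when the description does not match, and calling `.captures` directly on that nil raises `NoMethodError`. The sketch therefore checks for a nil match first and returns nil, which is an assumed convention.

```ruby
# Hypothetical stand-alone version of the capture-joining step in FastaReader#digest_tag.
def digest_header(tag, regex)
  return nil unless tag =~ /^>/
  descr = $'.strip                            # header text after the leading '>'
  m = /#{regex}/.match(descr)
  return nil if m.nil? || m.captures.empty?   # assumption: nil when nothing is captured
  [m.captures.join("\t"), descr]              # each capture becomes one tab-separated column
end

id, descr = digest_header(">seq1 example protein 429 aa", '^(\S+).*?(\d+) aa')
p id     # "seq1\t429"
p descr  # "seq1 example protein 429 aa"
p digest_header(">seq1 no length here", '^(\S+).*?(\d+) aa')  # nil: no "<digits> aa"
```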
