README

1 parent 7eb94a1 commit 5ca172a3d5952fc9b8fced4ccbb8d2c1138d28c2 @pjotrp committed May 3, 2013
Showing with 85 additions and 51 deletions.
  1. +77 −50 README.md
  2. +8 −1 bin/fasta_filter.rb
127 README.md
@@ -25,7 +25,70 @@ instead.
ORFs between START_STOP or STOP_STOP codons.
* BigBio has a Phylip (PAML style) emitter and writer
-# Examples
+# Installation
+
+The easy way
+
+```sh
+gem install bio-bigbio
+```
+
+in your code
+
+```ruby
+require 'bigbio'
+```
+
+# Command line tools
+
+Some functionality comes also as executable command line tools (see the
+./bin directory). Use the -h switch to get information. Current tools
+are
+
+1. getorf: fetch all areas between start-stop and stop-stop codons in six frames (using EMBOSS when biolib is available)
+2. nt2aa.rb: translate in six frames (using EMBOSS when biolib is available)
+3. fasta_filter.rb
+
+## Command line Fasta Filter
+
+The CLI filter accepts standard Ruby commands.
+
+Filter sequences that contain more than 25% C's
+
+```sh
+fasta_filter.rb --filter "rec.seq.count('C') > rec.seq.size*0.25" test/data/fasta/nt.fa
+```
+
+Look for IDs containing -126 and sequences ending on CCC
+
+```sh
+fasta_filter.rb --filter "rec.id =~ /-126/ or rec.seq =~ /CCC$/" test/data/fasta/nt.fa
+```
+
+Filter out all masked sequences that contain more than 10% masked
+nucleotides
+
+```sh
+fasta_filter.rb --filter "rec.seq.count('N')<rec.seq.size*0.10"
+```
+
+Next to rec.id and rec.seq, you have rec.descr and 'num' as variables,
+so to skip every other record
+
+```sh
+fasta_filter.rb --filter "num % 2 == 0"
+```
+
+To rewrite all sequences to lower case, use the rewrite
+option
+
+```sh
+fasta_filter.rb --rewrite 'rec.seq = rec.seq.downcase'
+```
+
+Filters and rewrites can be combined. The rest is up to your imagination!
+
+# API Examples
## Iterate through a FASTA file
@@ -146,63 +209,27 @@ translate = Nucleotide::Translate.new(trn_table)
aa_frames = translate.aa_6_frames("ATCATTAGCAACACCAGCTTCCTCTCTCTCGCTTCAAAGTTCACTACTCGTGGATCTCGT")
```
-# Command line tools
-
-Some functionality comes also as executable command line tools (see the
-./bin directory). Use the -h switch to get information. Current tools
-are
-
-1. getorf: fetch all areas between start-stop and stop-stop codons in six frames (using EMBOSS when biolib is available)
-2. nt2aa.rb: translate in six frames (using EMBOSS when biolib is available)
-3. fasta_filter.rb
-
-## Command line Fasta Filter
-
-The CLI filter accepts standard Ruby commands.
-
-Filter sequences that contain more than 25% C's
-
-```sh
-fasta_filter.rb --filter "rec.seq.count('C') > rec.seq.size*0.25" test/data/fasta/nt.fa
-```
-
-Look for IDs containing -126 and sequences ending on CCC
-
-```sh
-fasta_filter.rb --filter "rec.id =~ /-126/ or rec.seq =~ /CCC$/" test/data/fasta/nt.fa
-```
-
-Filter out all masked sequences that contain more than 10% masked
-nucleotides
-
-```sh
-fasta_filter.rb --filter "rec.seq.count('N')<rec.seq.size*0.10"
-```
-
-Next to rec.id and rec.seq, you have rec.descr and 'num' as variables,
-so to skip every other record
+# Project home page
-```sh
-fasta_filter.rb --filter "num % 2 == 0"
-```
+For information on the source tree, documentation, examples, issues and
+how to contribute, see
-The rest is up to your imagination!
+ http://github.com/pjotrp/bigbio
-# Install
+The BioRuby community is on IRC server: irc.freenode.org, channel: #bioruby.
-The easy way
+# Cite
-```sh
-gem install bio-bigbio
-```
+If you use this software, please cite one of
+
+* [BioRuby: bioinformatics software for the Ruby programming language](http://dx.doi.org/10.1093/bioinformatics/btq475)
+* [Biogem: an effective tool-based approach for scaling up open source software development in bioinformatics](http://dx.doi.org/10.1093/bioinformatics/bts080)
-in your code
+# Biogems.info
-```ruby
-require 'bigbio'
-```
+This Biogem is published at [#bio-table](http://biogems.info/index.html)
# Copyright
-Copyright (c) 2011-2012 Pjotr Prins. See LICENSE for further details.
+Copyright (c) 2011-2013 Pjotr Prins. See LICENSE for further details.
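
The `--filter` expressions documented in the README hunk above are ordinary Ruby predicates over a record. A minimal sketch of what they select, assuming a hypothetical `Rec` struct as a stand-in for the record object that fasta_filter.rb exposes as `rec` (the real class is not shown in this commit):

```ruby
# Hypothetical stand-in for the FASTA record that the CLI exposes as `rec`.
# Only the fields named in the README (id, descr, seq) are modelled.
Rec = Struct.new(:id, :descr, :seq)

records = [
  Rec.new("seq-126", "seq-126 example", "ACCCNGTA"),
  Rec.new("seq-2",   "seq-2 example",   "ATNNNGTA"),
  Rec.new("seq-3",   "seq-3 example",   "ATGTACCC")
]

# Same expressions as in the README, written as block predicates.
# Keep sequences with more than 25% C's.
c_rich = records.select { |rec| rec.seq.count('C') > rec.seq.size * 0.25 }

# Keep records whose ID contains -126 or whose sequence ends in CCC.
id_or_tail = records.select { |rec| rec.id =~ /-126/ or rec.seq =~ /CCC$/ }

# Keep sequences with less than 10% masked (N) nucleotides.
low_mask = records.select { |rec| rec.seq.count('N') < rec.seq.size * 0.10 }

puts c_rich.map(&:id).inspect
puts id_or_tail.map(&:id).inspect
puts low_mask.map(&:id).inspect
```

Trying an expression on a few hand-made records like this is a quick way to check it before passing it to `--filter` on a large file.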
9 bin/fasta_filter.rb
@@ -30,6 +30,10 @@ def self.parse(args)
options.filter = expr
end
+ opts.on("--rewrite expression","Rewrite expression") do |expr|
+ options.rewrite = expr
+ end
+
opts.on("--codonize",
"Trim sequence to be at multiple of 3 nucleotides") do |b|
options.codonize = b
@@ -55,6 +59,7 @@ def self.parse(args)
opts.separator " fasta_filter.rb --filter \"rec.seq.count('C') > rec.seq.size*0.25\" test/data/fasta/nt.fa"
opts.separator " fasta_filter.rb --filter \"rec.descr =~ /C. elegans/\" test/data/fasta/nt.fa"
opts.separator " fasta_filter.rb --filter \"num % 2 == 0\" test/data/fasta/nt.fa"
+ opts.separator " fasta_filter.rb test/data/fasta/nt.fa --rewrite 'rec.seq.downcase!'"
opts.separator ""
opts.separator "Other options:"
opts.separator ""
@@ -87,7 +92,9 @@ def self.parse(args)
next if options.min and rec.seq.size < options.min
# --- Truncate description to ID
rec.descr = rec.id if options.id
-
+
+ # --- rewrite
+ eval(options.rewrite) if options.rewrite
print rec.to_fasta
}
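
The last hunk evaluates the `--rewrite` string with `eval` just before each record is printed, so the expression can see and modify `rec`. A minimal sketch of that pattern, assuming a hypothetical `Rec` struct, a hypothetical `process` helper, and a zero-based `num` counter (the real record class, the filter step, and how `num` is counted are outside the lines shown in this diff):

```ruby
# Hypothetical stand-in for the FASTA record; the real class is not in this diff.
Rec = Struct.new(:id, :descr, :seq) do
  def to_fasta
    ">#{descr}\n#{seq}\n"
  end
end

# Hypothetical helper mirroring the loop body in the hunk above:
# apply the optional filter, then the optional rewrite, then emit the record.
def process(records, filter: nil, rewrite: nil)
  out = []
  records.each_with_index do |rec, num|
    # Kernel#eval runs in the current binding, so the expression sees rec and num.
    next if filter && !eval(filter)
    eval(rewrite) if rewrite
    out << rec.to_fasta
  end
  out
end

records = [
  Rec.new("a", "a first",  "ACGT"),
  Rec.new("b", "b second", "GGCC")
]

# Combine a filter with a rewrite, as the README says is possible.
result = process(records,
                 filter:  "num % 2 == 0",
                 rewrite: "rec.seq = rec.seq.downcase")
print result.join
```

Both forms that appear in this commit, `rec.seq = rec.seq.downcase` in the README and `rec.seq.downcase!` in the help text, change the sequence before it is printed. Because the string is passed to `eval`, any Ruby code in it is executed, so expressions should only come from a trusted source.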
