bio-cd-hit-report
Clustering sequences with CD-HIT produces a cluster file (.clstr) that lists each cluster together with the names of the sequences assigned to it. This plugin provides methods for parsing that file.
Note: this plugin is under active development!
gem install bio-cd-hit-report
require 'bio-cd-hit-report'
cluster_file = "cluster95.clstr"
report = Bio::CdHitReport.new(cluster_file)
# print the total number of clusters in the report
puts report.total_clusters
# print the members of the cluster with id 1
puts report.get_cluster(1)
# print information for each cluster
report.each_cluster do |c|
  puts c.name       # print the full cluster name
  puts c.members    # print the names of the sequences in the cluster
  puts c.cluster_id # print the cluster id only
  puts c.size       # print the total number of entries in the cluster
  puts c.rep_seq    # print the name of the representative sequence in this cluster
end
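For readers unfamiliar with the .clstr layout, the following is a minimal sketch of how such a file can be read in plain Ruby. It is not this plugin's implementation: `parse_clstr` is a hypothetical helper, the sample data is made up, and the sketch assumes the usual CD-HIT layout in which each cluster starts with a `>Cluster N` line, each member line carries the sequence name between `>` and `...`, and the representative sequence is marked with a trailing `*`.

```ruby
# Hypothetical helper for illustration only; not part of bio-cd-hit-report.
# Assumes the common CD-HIT .clstr layout described above.
def parse_clstr(text)
  clusters = []
  text.each_line do |line|
    line = line.strip
    next if line.empty?

    if line.start_with?('>Cluster')
      # Header line, e.g. ">Cluster 0": start a new cluster record.
      clusters << { id: line[/\d+/].to_i, members: [], rep_seq: nil }
    elsif clusters.any? && (m = line.match(/>(.+?)\.\.\.\s*(\*|at\s.+)$/))
      # Member line, e.g. "0\t120aa, >seqA... *" or "1\t118aa, >seqB... at 97.46%".
      name = m[1]
      clusters.last[:members] << name
      # A trailing "*" marks the representative sequence of the cluster.
      clusters.last[:rep_seq] = name if m[2] == '*'
    end
  end
  clusters
end

# Made-up sample data in .clstr form.
sample = <<~CLSTR
  >Cluster 0
  0\t120aa, >seqA... *
  1\t118aa, >seqB... at 97.46%
  >Cluster 1
  0\t95aa, >seqC... *
CLSTR

clusters = parse_clstr(sample)
clusters.each do |c|
  puts "Cluster #{c[:id]}: #{c[:members].join(', ')} (representative: #{c[:rep_seq]})"
end
```

The hash keys used here (`:id`, `:members`, `:rep_seq`) only loosely mirror the `cluster_id`, `members` and `rep_seq` accessors shown above; the plugin's own objects may behave differently.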
For information on the source tree, documentation, examples and issues, and on how to contribute, see
http://github.com/georgeG/bioruby-cd-hit-report
The BioRuby community is on IRC server: irc.freenode.org, channel: #bioruby.
If you use this software, please cite one of
- BioRuby: bioinformatics software for the Ruby programming language
- Biogem: an effective tool-based approach for scaling up open source software development in bioinformatics
This Biogem is published at #bio-cd-hit-report
Copyright (c) 2013 George Githinji. See LICENSE.txt for further details.