Clustering sequences with CD-HIT produces a cluster file (.clstr) that lists sequence names and the clusters they belong to. This plugin provides methods for parsing that file.
Note: this plugin is under active development!
gem install bio-cd-hit-report
    require 'bio-cd-hit-report'

    cluster_file = "cluster95.clstr"
    report = Bio::CdHitReport.new(cluster_file)

    # print total number of clusters in the report
    puts report.total_clusters

    # print the cluster members for cluster with id 1
    puts report.get_cluster(1)

    # information for each cluster
    report.each_cluster do |c|
      puts c.name        # print the full cluster name
      puts c.members     # print respective sequence names in the cluster
      puts c.cluster_id  # print the cluster id only
      puts c.size        # print the total number of entries in the cluster
      puts c.rep_seq     # print the name of the representative sequence in this cluster
    end
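To make the fields above concrete, here is a minimal stand-alone sketch of how a .clstr file could be parsed in plain Ruby. It is not this gem's implementation: `Cluster` and `parse_clstr` are hypothetical names introduced for illustration, and the sample data is made up. It assumes the usual CD-HIT layout, in which each cluster starts with a `>Cluster N` header, each member line carries the sequence name between `>` and `...`, and the representative sequence line ends with `*`.

```ruby
# Hypothetical stand-alone parser for illustration; not the gem's own code.
Cluster = Struct.new(:name, :members, :rep_seq) do
  # Numeric id taken from the header text, e.g. "Cluster 1" -> 1
  def cluster_id
    name[/\d+/].to_i
  end

  # Number of sequences in the cluster
  def size
    members.size
  end
end

def parse_clstr(text)
  clusters = []
  text.each_line do |line|
    line = line.strip
    next if line.empty?

    if line.start_with?(">")
      # Header line, e.g. ">Cluster 0"
      clusters << Cluster.new(line[1..-1], [], nil)
    elsif (m = line.match(/>(.+?)\.\.\./))
      # Member line, e.g. "0\t120aa, >seq1... *"
      next if clusters.empty?
      current = clusters.last
      current.members << m[1]
      # Assumption: the representative sequence line ends with "*"
      current.rep_seq = m[1] if line.end_with?("*")
    end
  end
  clusters
end

# Made-up sample in the assumed .clstr layout
sample = <<~CLSTR
  >Cluster 0
  0\t120aa, >seq1... *
  1\t118aa, >seq2... at 96.61%
  >Cluster 1
  0\t95aa, >seq3... *
CLSTR

clusters = parse_clstr(sample)
clusters.each do |c|
  puts "#{c.name}: #{c.size} member(s), representative #{c.rep_seq}"
end
```

The accessor names mirror those in the usage example (`name`, `members`, `cluster_id`, `size`, `rep_seq`) only so the two can be compared side by side; the gem's actual return values may differ.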
Project home page
Information on the source tree, documentation, examples, issues and how to contribute, see
The BioRuby community is on IRC server: irc.freenode.org, channel: #bioruby.
If you use this software, please cite one of:
- BioRuby: bioinformatics software for the Ruby programming language
- Biogem: an effective tool-based approach for scaling up open source software development in bioinformatics
This Biogem is published at #bio-cd-hit-report
Copyright (c) 2013 George Githinji. See LICENSE.txt for further details.