Ankusa is a Naive Bayes classifier written in Ruby that uses Hadoop's HBase for storage. Because HBase is the backend, the training corpus can grow to many terabytes.


First, install hbaserb:

git clone git://
cd hbaserb
gem build hbaserb.gemspec && gem install hbaserb

Then, install ankusa:

git clone git://
cd ankusa
gem build ankusa.gemspec && gem install ankusa

Basic Usage

require 'rubygems'
require 'ankusa'
require 'hbaserb'

# connect to HBase
client = HBaseRb::Client.new 'localhost'

# build a classifier backed by that connection
c = Ankusa::Classifier.new client
c.train :spam, "This is some spammy text"
c.train :good, "This is not the bad stuff"

# This will return the most likely class (as a symbol)
puts c.classify "This is some spammy text"

# This will return a Hash with classes as keys and
# membership probabilities as values
puts c.classifications "This is some spammy text"

# get a list of all classes
puts c.classes
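
To make the meaning of classify and classifications more concrete, here is a minimal in-memory sketch of a Naive Bayes text classifier with the same method names. It is not Ankusa's implementation and does not touch HBase: the class name TinyNaiveBayes, the tokenizer, and the add-one smoothing are all assumptions chosen for illustration only.

```ruby
# Hypothetical illustration only; not Ankusa's code and not backed by HBase.
class TinyNaiveBayes
  def initialize
    # @word_counts[klass][word] => times word was seen in training text for klass
    @word_counts = Hash.new { |h, k| h[k] = Hash.new(0) }
    # @doc_counts[klass] => number of training texts for klass
    @doc_counts = Hash.new(0)
    # set of every distinct word seen during training
    @vocabulary = {}
  end

  def train(klass, text)
    tokenize(text).each do |word|
      @word_counts[klass][word] += 1
      @vocabulary[word] = true
    end
    @doc_counts[klass] += 1
  end

  # List of all classes seen so far, in training order.
  def classes
    @doc_counts.keys
  end

  # Hash of class => membership probability (values sum to 1).
  def classifications(text)
    return {} if classes.empty?
    words = tokenize(text)
    total_docs = @doc_counts.values.sum.to_f
    log_scores = {}
    classes.each do |klass|
      total_words = @word_counts[klass].values.sum
      # log prior: share of training texts belonging to this class
      score = Math.log(@doc_counts[klass] / total_docs)
      words.each do |word|
        # add-one (Laplace) smoothing so unseen words do not zero the score
        count = @word_counts[klass][word]
        score += Math.log((count + 1.0) / (total_words + @vocabulary.size))
      end
      log_scores[klass] = score
    end
    # shift by the max log score before exponentiating to avoid underflow
    max = log_scores.values.max
    weights = log_scores.transform_values { |s| Math.exp(s - max) }
    sum = weights.values.sum
    weights.transform_values { |w| w / sum }
  end

  # Most likely class as a symbol, or nil if nothing has been trained.
  def classify(text)
    best = classifications(text).max_by { |_, prob| prob }
    best && best.first
  end

  private

  # Assumed tokenizer: lowercase, then split into runs of letters.
  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end
end

nb = TinyNaiveBayes.new
nb.train :spam, "This is some spammy text"
nb.train :good, "This is not the bad stuff"
puts nb.classify("This is some spammy text")
```

The sketch keeps all counts in Ruby hashes, which is exactly what stops scaling once the corpus is large; storing those counts in HBase is what lets Ankusa train on far more data.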
