Commit

added initial structure / API
Brian Muller committed Nov 29, 2010
1 parent 4e7d2c4 commit 24ba779
Showing 5 changed files with 77 additions and 0 deletions.
Empty file removed: README
33 changes: 33 additions & 0 deletions README.rdoc
@@ -0,0 +1,33 @@
= ankusa

Ankusa is a Naive Bayes classifier in Ruby that uses Hadoop's HBase for storage. Because it uses HBase as a backend, the training corpus can be many terabytes in size.

== Installation
First, install hbaserb:
  git clone git://github.com/bmuller/hbaserb.git
  cd hbaserb
  gem build hbaserb.gemspec && gem install hbaserb

Then, install ankusa:
  git clone git://github.com/livingsocial/ankusa.git
  cd ankusa
  gem build ankusa.gemspec && gem install ankusa

== Basic Usage
  require 'rubygems'
  require 'ankusa'
  require 'hbaserb'

  # connect to HBase
  client = HBaseRb::Client.new 'localhost'

  # Classifier lives in the Ankusa module
  c = Ankusa::Classifier.new client
  c.train :spam, "This is some spammy text"
  c.train :good, "This is not the bad stuff"

  # This will return the most likely class (as a symbol)
  puts c.classify "This is some spammy text"

  # This will return a Hash with classes as keys and
  # membership probabilities as values
  puts c.classes "This is some spammy text"
18 changes: 18 additions & 0 deletions ankusa.gemspec
@@ -0,0 +1,18 @@
Gem::Specification.new do |s|
  s.name = "ankusa"
  s.version = "0.0.1"
  s.authors = ["Brian Muller"]
  s.date = %q{2010-11-29}
  s.description = "Naive Bayes classifier with HBase storage"
  s.summary = "Naive Bayes classifier in Ruby that uses Hadoop's HBase for storage"
  s.email = "brian.muller@livingsocial.com"
  s.files = [
    "lib/ankusa.rb",
    "lib/ankusa/classifier.rb",
  ]
  s.homepage = "https://github.com/livingsocial/ankusa"
  s.require_paths = ["lib"]
  s.rubygems_version = "1.3.5"
  s.add_dependency('hbaserb', '>= 0.0.1')
  s.add_dependency('stemmer', '>= 1.0.1')
end
2 changes: 2 additions & 0 deletions lib/ankusa.rb
@@ -0,0 +1,2 @@
$:.unshift File.dirname(__FILE__)
require 'ankusa/classifier'
24 changes: 24 additions & 0 deletions lib/ankusa/classifier.rb
@@ -0,0 +1,24 @@
require 'stemmer'

module Ankusa

  class Classifier
    def initialize(hbase_client)
      @hbase = hbase_client
    end

    def train(klass, text)
      # word.stem
    end

    def untrain(klass, text)
    end

    def classify(text)
    end

    def classes(text)
    end
  end

end
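
The method bodies above are still empty in this commit. As a rough illustration of how the four methods could fit together, here is a minimal in-memory sketch. It is not the author's implementation: the class name MemoryClassifier is hypothetical, a plain Hash stands in for HBase, tokenizing is a simple downcase-and-split with no stemming, and the Laplace (add-one) smoothing is an assumption of this sketch.

```ruby
# Hypothetical in-memory stand-in for the Ankusa::Classifier API.
# Assumptions: a Hash replaces HBase, tokens are lowercase words
# (no stemming), and add-one smoothing handles unseen words.
class MemoryClassifier
  def initialize
    # class => { word => count }
    @counts = Hash.new { |h, k| h[k] = Hash.new(0) }
    # class => number of training documents
    @doc_counts = Hash.new(0)
  end

  # Record every word of text as evidence for klass.
  def train(klass, text)
    tokenize(text).each { |w| @counts[klass][w] += 1 }
    @doc_counts[klass] += 1
  end

  # Reverse a previous train call; counts never drop below zero.
  def untrain(klass, text)
    tokenize(text).each do |w|
      @counts[klass][w] -= 1 if @counts[klass][w] > 0
    end
    @doc_counts[klass] -= 1 if @doc_counts[klass] > 0
  end

  # Return the most likely class, or nil if nothing was trained.
  def classify(text)
    best = classes(text).max_by { |_, prob| prob }
    best && best.first
  end

  # Return a Hash of class => membership probability (sums to 1).
  def classes(text)
    known = @doc_counts.select { |_, n| n > 0 }.keys
    return {} if known.empty?

    total_docs = known.inject(0) { |sum, k| sum + @doc_counts[k] }
    vocab = known.map { |k| @counts[k].select { |_, n| n > 0 }.keys }
    vocab_size = [vocab.flatten.uniq.size, 1].max
    words = tokenize(text)

    # Work in log space so long texts do not underflow.
    log_scores = {}
    known.each do |k|
      total_words = @counts[k].values.inject(0) { |sum, n| sum + n }
      score = Math.log(@doc_counts[k].to_f / total_docs)
      words.each do |w|
        score += Math.log((@counts[k][w] + 1.0) / (total_words + vocab_size))
      end
      log_scores[k] = score
    end

    # Shift by the maximum before exponentiating, then normalize.
    max = log_scores.values.max
    scaled = {}
    log_scores.each { |k, s| scaled[k] = Math.exp(s - max) }
    sum = scaled.values.inject(0.0) { |acc, v| acc + v }
    result = {}
    scaled.each { |k, v| result[k] = v / sum }
    result
  end

  private

  # Split text into lowercase word tokens.
  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end
end

c = MemoryClassifier.new
c.train :spam, "This is some spammy text"
c.train :good, "This is not the bad stuff"
puts c.classify("This is some spammy text")  # prints "spam"
```

Keeping per-class word counts and per-class document counts as separate tables mirrors what a column-oriented store would need, so each training call reduces to a handful of counter increments. How the real classifier lays this out in HBase is not shown in this commit.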
