added initial structure / API

1 parent 4e7d2c4, commit 24ba7795ed225b40c5e7f967fac5ba9d4124fc44, bmuller committed Nov 29, 2010
Showing with 77 additions and 0 deletions.
  1. 0 README
  2. +33 −0 README.rdoc
  3. +18 −0 ankusa.gemspec
  4. +2 −0 lib/ankusa.rb
  5. +24 −0 lib/ankusa/classifier.rb
0 README
No changes.
33 README.rdoc
@@ -0,0 +1,33 @@
+= ankusa
+
+Ankusa is a Naive Bayes classifier in Ruby that uses Hadoop's HBase for storage. Because it uses HBase as a backend, the training corpus can be many terabytes in size.
+
+== Installation
+First, install hbaserb:
+ git clone git://github.com/bmuller/hbaserb.git
+ cd hbaserb
+ gem build hbaserb.gemspec && gem install hbaserb
+
+Then, install ankusa:
+ git clone git://github.com/livingsocial/ankusa.git
+ cd ankusa
+ gem build ankusa.gemspec && gem install ankusa
+
+== Basic Usage
+ require 'rubygems'
+ require 'ankusa'
+ require 'hbaserb'
+
+ # connect to HBase
+ client = HBaseRb::Client.new 'localhost'
+
+ c = Ankusa::Classifier.new client
+ c.train :spam, "This is some spammy text"
+ c.train :good, "This is not the bad stuff"
+
+ # This will return the most likely class (as a symbol)
+ puts c.classify "This is some spammy text"
+
+ # This will return a Hash with classes as keys and
+ # membership probabilities as values
+ puts c.classes "This is some spammy text"
18 ankusa.gemspec
@@ -0,0 +1,18 @@
+Gem::Specification.new do |s|
+ s.name = "ankusa"
+ s.version = "0.0.1"
+ s.authors = ["Brian Muller"]
+ s.date = %q{2010-11-29}
+ s.description = "Naive Bayes classifier with HBase storage"
+ s.summary = "Naive Bayes classifier in Ruby that uses Hadoop's HBase for storage"
+ s.email = "brian.muller@livingsocial.com"
+ s.files = [
+ "lib/ankusa.rb",
+ "lib/ankusa/classifier.rb",
+ ]
+ s.homepage = "https://github.com/livingsocial/ankusa"
+ s.require_paths = ["lib"]
+ s.rubygems_version = "1.3.5"
+ s.add_dependency('hbaserb', '>= 0.0.1')
+ s.add_dependency('stemmer', '>= 1.0.1')
+end
2 lib/ankusa.rb
@@ -0,0 +1,2 @@
+$:.unshift File.dirname(__FILE__)
+require 'ankusa/classifier'
24 lib/ankusa/classifier.rb
@@ -0,0 +1,24 @@
+require 'stemmer'
+
+module Ankusa
+
+ class Classifier
+ def initialize(hbase_client)
+ @hbase = hbase_client
+ end
+
+ def train(klass, text)
+ # TODO: stem each word with word.stem (from the stemmer gem)
+ end
+
+ def untrain(klass, text)
+ end
+
+ def classify(text)
+ end
+
+ def classes(text)
+ end
+ end
+
+end
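
The classifier methods in this commit are empty stubs, so the following is only a rough sketch of what the README's train / untrain / classify / classes API could compute. It is not the author's implementation. It assumes a multinomial Naive Bayes model with add-one (Laplace) smoothing, keeps counts in plain Ruby Hashes instead of HBase, and skips stemming. The class name MemoryClassifier and the tokenize helper are hypothetical names introduced for illustration.

```ruby
# Hypothetical in-memory stand-in for the stubbed Ankusa::Classifier.
# Assumptions: multinomial Naive Bayes, add-one smoothing, no HBase, no stemming.
class MemoryClassifier
  def initialize
    # @word_counts[klass][word] => occurrences of word in klass's training text
    @word_counts = Hash.new { |h, k| h[k] = Hash.new(0) }
    # @doc_counts[klass] => number of training texts seen for klass
    @doc_counts = Hash.new(0)
  end

  def train(klass, text)
    tokenize(text).each { |w| @word_counts[klass][w] += 1 }
    @doc_counts[klass] += 1
  end

  # Reverses a previous train call; counts never drop below zero.
  def untrain(klass, text)
    tokenize(text).each do |w|
      @word_counts[klass][w] -= 1 if @word_counts[klass][w] > 0
    end
    @doc_counts[klass] -= 1 if @doc_counts[klass] > 0
  end

  # Returns a Hash of class => membership probability (values sum to 1).
  def classes(text)
    words = tokenize(text)
    vocab = @word_counts.values.flat_map(&:keys).uniq.size
    vocab = 1 if vocab.zero? # avoid dividing by zero on an empty model
    total_docs = @doc_counts.values.sum.to_f

    log_scores = {}
    @doc_counts.each do |klass, docs|
      next if docs.zero?
      total_words = @word_counts[klass].values.sum
      score = Math.log(docs / total_docs) # log prior
      words.each do |w|
        # add-one smoothed log likelihood of w given klass
        score += Math.log((@word_counts[klass][w] + 1.0) / (total_words + vocab))
      end
      log_scores[klass] = score
    end
    return {} if log_scores.empty?

    # Normalize in a numerically stable way by subtracting the max log score.
    max = log_scores.values.max
    weights = log_scores.transform_values { |s| Math.exp(s - max) }
    norm = weights.values.sum
    weights.transform_values { |w| w / norm }
  end

  # Returns the most likely class as a symbol, or nil if nothing is trained.
  def classify(text)
    klass, _prob = classes(text).max_by { |_, p| p }
    klass
  end

  private

  # Lowercases the text and splits it into alphabetic tokens.
  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end
end
```

With the two training sentences from the README, c.classify "This is some spammy text" picks :spam in this sketch, and c.classes returns a Hash with :spam and :good as keys. Working in log space avoids floating-point underflow on long texts, which matters more as the corpus grows.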
