Commit
Brian Muller committed Nov 29, 2010
1 parent 4e7d2c4, commit 24ba779
Showing 5 changed files with 77 additions and 0 deletions.
@@ -0,0 +1,33 @@
= ankusa

Ankusa is a Naive Bayes classifier in Ruby that uses Hadoop's HBase for storage. Because it uses HBase as a backend, the training corpus can be many terabytes in size.

== Installation
First, install hbaserb:
  git clone git://github.com/bmuller/hbaserb.git
  cd hbaserb
  gem build hbaserb.gemspec && gem install hbaserb

Then, install ankusa:
  git clone git://github.com/livingsocial/ankusa.git
  cd ankusa
  gem build ankusa.gemspec && gem install ankusa

== Basic Usage
  require 'rubygems'
  require 'ankusa'
  require 'hbaserb'

  # connect to HBase
  client = HBaseRb::Client.new 'localhost'

  c = Ankusa::Classifier.new client
  c.train :spam, "This is some spammy text"
  c.train :good, "This is not the bad stuff"

  # This will return the most likely class (as a symbol)
  puts c.classify "This is some spammy text"

  # This will return a Hash with classes as keys and
  # membership probability as values
  puts c.classes "This is some spammy text"
@@ -0,0 +1,18 @@
Gem::Specification.new do |s|
  s.name = "ankusa"
  s.version = "0.0.1"
  s.authors = ["Brian Muller"]
  s.date = %q{2010-11-29}
  s.description = "Naive Bayes classifier with HBase storage"
  s.summary = "Naive Bayes classifier in Ruby that uses Hadoop's HBase for storage"
  s.email = "brian.muller@livingsocial.com"
  s.files = [
    "lib/ankusa.rb",
    "lib/ankusa/classifier.rb",
  ]
  s.homepage = "https://github.com/livingsocial/ankusa"
  s.require_paths = ["lib"]
  s.rubygems_version = "1.3.5"
  s.add_dependency('hbaserb', '>= 0.0.1')
  s.add_dependency('stemmer', '>= 1.0.1')
end
@@ -0,0 +1,2 @@
$:.unshift File.dirname(__FILE__)
require 'ankusa/classifier'
@@ -0,0 +1,24 @@
require 'stemmer'

module Ankusa

  class Classifier
    def initialize(hbase_client)
      @hbase = hbase_client
    end

    def train(klass, text)
      # word.stem
    end

    def untrain(klass, text)
    end

    def classify(text)
    end

    def classes(text)
    end
  end

end