
Commit

updated docs to add references to mongo availability
Brian Muller committed May 23, 2012
1 parent f4aa44b commit 19108a5
Showing 2 changed files with 7 additions and 5 deletions.
README.rdoc: 8 changes (5 additions, 3 deletions)
@@ -1,14 +1,16 @@
= ankusa

-Ankusa is a text classifier in Ruby that can use either Hadoop's HBase or Cassandra for storage. Because it uses HBase or Cassandra as a backend, the training corpus can be many terabytes in size (though additional memory and single file storage abilities also exist for smaller corpora).
+Ankusa is a text classifier in Ruby that can use either Hadoop's HBase, Mongo, or Cassandra for storage. Because it uses HBase/Mongo/Cassandra as a backend, the training corpus can be many terabytes in size (though additional memory and single file storage abilities also exist for smaller corpora).

Ankusa currently provides both a Naive Bayes and Kullback-Leibler divergence classifier. It ignores common words (a.k.a, stop words) and stems all others. Additionally, it uses Laplacian smoothing in both classification methods.

== Installation
-First, install HBase/Hadoop or Cassandra (>= 0.7.0-rc2). Then, install the appropriate gem:
+First, install HBase/Hadoop, Mongo, or Cassandra (>= 0.7.0-rc2). Then, install the appropriate gem:
gem install hbaserb
# or
gem install cassandra
+# or
+gem install mongo

If you're using HBase, make sure the HBase Thrift interface has been started as well. Then:
gem install ankusa
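The README says Ankusa offers Naive Bayes and Kullback-Leibler divergence classifiers, drops stop words, and applies Laplacian smoothing. As a rough illustration of those ideas only, here is a minimal, self-contained Ruby sketch. It is not Ankusa's code: the class name `ToyTextClassifier`, its methods, and the tiny `STOP_WORDS` list are hypothetical, and stemming is left out entirely.

```ruby
# Toy multinomial Naive Bayes and KL-divergence scoring with Laplace (add-one)
# smoothing. Illustrative sketch only: this is NOT Ankusa's implementation,
# stemming is omitted, and STOP_WORDS is a tiny hypothetical list.
STOP_WORDS = %w[a an and the of to is in it].freeze

class ToyTextClassifier
  def initialize
    @word_counts = Hash.new { |h, k| h[k] = Hash.new(0) } # class => { word => count }
    @total_words = Hash.new(0)                              # class => words seen in class
    @doc_counts  = Hash.new(0)                              # class => documents trained
    @vocab = {}                                             # distinct training words
  end

  # Lowercase, split into words, and drop stop words.
  def tokenize(text)
    text.downcase.scan(/[a-z']+/).reject { |w| STOP_WORDS.include?(w) }
  end

  def train(klass, text)
    @doc_counts[klass] += 1
    tokenize(text).each do |w|
      @word_counts[klass][w] += 1
      @total_words[klass] += 1
      @vocab[w] = true
    end
  end

  # Laplace-smoothed P(word | class): add 1 to the count and |V| to the total.
  def smoothed_prob(klass, word)
    (@word_counts[klass][word] + 1.0) / (@total_words[klass] + @vocab.size)
  end

  # Naive Bayes: log P(class) + sum of log P(word | class); highest wins.
  def classify(text)
    words = tokenize(text)
    total_docs = @doc_counts.values.sum.to_f
    scores = @doc_counts.keys.map do |klass|
      prior = Math.log(@doc_counts[klass] / total_docs)
      [klass, prior + words.sum { |w| Math.log(smoothed_prob(klass, w)) }]
    end
    scores.max_by { |_, s| s }.first
  end

  # KL divergence D(q || p_class) from the query's word distribution q to each
  # class's smoothed distribution; smaller means a closer match.
  def kl_distances(text)
    words = tokenize(text)
    return {} if words.empty?
    q = words.tally.transform_values { |c| c.to_f / words.size }
    @doc_counts.keys.to_h do |klass|
      [klass, q.sum { |w, qw| qw * Math.log(qw / smoothed_prob(klass, w)) }]
    end
  end
end

clf = ToyTextClassifier.new
clf.train(:spam, "buy cheap pills now")
clf.train(:spam, "cheap pills cheap offer")
clf.train(:ham, "meeting notes for the project")
clf.train(:ham, "project meeting moved to monday")
puts clf.classify("cheap pills")      # prints "spam"
p clf.kl_distances("project meeting") # :ham has the smaller distance
```

Smoothing matters in both scorers: without the add-one term, a single word never seen in a class would drive that class's Naive Bayes score to log(0) and its KL distance to infinity.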
@@ -81,7 +83,7 @@ The API is the same as the NaiveBayesClassifier, except rather than calling "cla
storage.close

== Storage Methods
-Ankusa has a generalized storage interface that has been implemented for HBase, Cassandra, single file, and in-memory storage.
+Ankusa has a generalized storage interface that has been implemented for HBase, Cassandra, Mongo, single file, and in-memory storage.

Memory storage can be used when you have a very small corpora
require 'ankusa/memory_storage'
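The README describes a generalized storage interface with HBase, Cassandra, Mongo, single-file, and in-memory implementations. The sketch below shows the general pattern in plain Ruby: classifier-side code talks to an abstract interface, so any backend that implements the same methods can be swapped in. The names `AbstractStorage`, `HashStorage`, `incr_word_count`, `word_count`, and `train_words` are hypothetical and are not Ankusa's actual API; only `close` mirrors the `storage.close` call shown in the README.

```ruby
# Hypothetical sketch of a pluggable storage interface. The class and method
# names here are illustrative only; they are not Ankusa's actual API.
class AbstractStorage
  # Increment the count of `word` under `klass`.
  def incr_word_count(klass, word, by = 1)
    raise NotImplementedError, "#{self.class} must implement incr_word_count"
  end

  # Return the count of `word` under `klass` (0 if never seen).
  def word_count(klass, word)
    raise NotImplementedError, "#{self.class} must implement word_count"
  end

  # Release connections or file handles; a no-op by default.
  def close; end
end

# In-memory backend: fine for small corpora, but lost when the process exits.
class HashStorage < AbstractStorage
  def initialize
    @counts = Hash.new(0) # [klass, word] => count
  end

  def incr_word_count(klass, word, by = 1)
    @counts[[klass, word]] += by
  end

  def word_count(klass, word)
    @counts[[klass, word]]
  end
end

# Code written against the interface works with any backend implementing it,
# e.g. a database-backed class exposing the same two methods.
def train_words(storage, klass, words)
  words.each { |w| storage.incr_word_count(klass, w) }
end

storage = HashStorage.new
train_words(storage, :spam, %w[cheap pills cheap])
puts storage.word_count(:spam, "cheap") # prints 2
storage.close
```

The design choice is plain duck typing: nothing in `train_words` knows which backend it is using, which is what lets a small in-memory store and a large distributed one share the same training and classification code.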
ankusa.gemspec: 4 changes (2 additions, 2 deletions)
@@ -8,8 +8,8 @@ Gem::Specification.new do |s|
s.version = Ankusa::VERSION
s.authors = ["Brian Muller"]
s.date = Date.today.to_s
-s.description = "Text classifier with HBase or Cassandra storage"
-s.summary = "Text classifier in Ruby that uses Hadoop's HBase or Cassandra for storage"
+s.description = "Text classifier with HBase, Cassandra, or Mongo storage"
+s.summary = "Text classifier in Ruby that uses Hadoop's HBase, Cassandra, or Mongo for storage"
s.email = "brian.muller@livingsocial.com"
s.files = FileList["lib/**/*", "[A-Z]*", "Rakefile", "docs/**/*"]
s.homepage = "https://github.com/livingsocial/ankusa"
