Modify Bayes initializer to take the corpus directly #2

Open
wants to merge 3 commits into from

2 participants

@bpot
Owner

This is so we can use hammerspace objects instead of ruby hashes.
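
A minimal sketch of the idea, assuming the classifier only needs each category to respond to `[]`. `HashLikeCounts` and `TinyBayes` are hypothetical names made up for this illustration; they are neither the hammerspace API nor the real `Classifier::Bayes`, and the stand-in store keeps everything in memory.

```ruby
# Hypothetical stand-in for a Hash-like store (NOT the hammerspace API).
class HashLikeCounts
  def initialize(pairs = {})
    @data = pairs.dup
  end

  # Only the lookup used by the sketch below is implemented.
  def [](key)
    @data[key]
  end
end

# Hypothetical, stripped-down classifier that takes the corpus directly
# instead of building its own Hash for each category.
class TinyBayes
  attr_reader :categories

  def initialize(categories = {})
    @categories = categories
  end

  # Returns the stored count for a word, or 0 when the word is unknown.
  def word_count(category, word)
    @categories[category][word] || 0
  end
end

corpus = {
  'Member'     => HashLikeCounts.new('cow' => 4, 'cat' => 3),
  'Not Member' => { 'yeti' => 6 } # a plain Hash still works
}

bayes = TinyBayes.new(corpus)
puts bayes.word_count('Member', 'cow')     # count from the Hash-like store
puts bayes.word_count('Not Member', 'cow') # unknown word falls back to 0
```

Because the initializer stores whatever it is given, the caller decides where the counts live, and the classifier never has to copy them into a Ruby Hash.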

Commits on Mar 6, 2014
  1. @bpot
Commits on Mar 18, 2014
  1. @bpot

    Dont intern strings

    bpot authored
Commits on Mar 22, 2014
  1. @bpot

    Use sum_values

    bpot authored
Showing with 31 additions and 12 deletions.
  1. +30 −11 lib/classifier/bayes.rb
  2. +1 −1  lib/classifier/extensions/word_hash.rb
41 lib/classifier/bayes.rb
@@ -5,13 +5,32 @@
module Classifier
class Bayes
- # The class can be created with one or more categories, each of which will be
- # initialized and given a training method. E.g.,
- # b = Classifier::Bayes.new 'Interesting', 'Uninteresting', 'Spam'
- def initialize(*categories)
- @categories = Hash.new
- categories.each { |category| @categories[category.prepare_category_name] = Hash.new }
- @total_words = 0
+ attr_reader :categories
+
+ # Create classifier based on classifier data.
+ #
+ # This expects a hash which maps category names to Hash-like objects
+ # that map strings to word counts (Fixnum).
+ #
+ # Ex.
+ #
+ #   Bayes.new({
+ #     'Member' => {
+ #       'cow' => 4,
+ #       'cat' => 3,
+ #       'dog' => 8
+ #     },
+ #     'Not Member' => {
+ #       'yeti' => 6,
+ #       'sasquatch' => 2,
+ #       'chupacabra' => 4
+ #     }
+ #   })
+ #
+ # @param categories [<String,#[]>] Map of categories to word hashes.
+ def initialize(categories = {})
+ @categories = categories
+ @total_words = total_member_count_correct + total_nonmember_count_correct
end
#
@@ -182,19 +201,19 @@ def reset_correct_counts!
end
def total_member_count_correct
- @total_member_count_correct ||= @categories[:Member].values.inject(0) {|sum, element| sum+element}
+ @total_member_count_correct ||= @categories[:Member].sum_values
end
def total_nonmember_count_correct
- @total_nonmember_count_correct ||= @categories[:"Not member"].values.inject(0) {|sum, element| sum+element}
+ @total_nonmember_count_correct ||= @categories[:"Not member"].sum_values
end
def total_member_count
- @total_member_count ||= @categories[:Member].values.inject(0) {|sum, element| sum+element}
+ @total_member_count ||= @categories[:Member].sum_values
end
def total_nonmember_count
- @total_nonmember_count ||= @categories[:"Not member"].values.inject(0) {|sum, element| sum+element}
+ @total_nonmember_count ||= @categories[:"Not member"].sum_values
end
end
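
`sum_values` is not a method on Ruby's core `Hash`, so the objects passed in presumably provide it themselves. The sketch below shows one way a plain Hash could be given the same method for testing; the `SumValues` module is an assumption made for this example, not code from this pull request. Note also that the diff looks categories up by the symbols `:Member` and `:"Not member"`, while the doc comment example uses the strings `'Member'` and `'Not Member'`, so the sketch uses the symbol keys the lookups expect.

```ruby
# Assumed helper (not part of this PR): gives a plain Hash the sum_values
# method that the count methods in the diff call on each category.
module SumValues
  def sum_values
    values.inject(0) { |sum, count| sum + count }
  end
end

# Keys match the lookups in the diff: :Member and :"Not member".
categories = {
  :Member       => { 'cow' => 4, 'cat' => 3, 'dog' => 8 }.extend(SumValues),
  :"Not member" => { 'yeti' => 6, 'sasquatch' => 2, 'chupacabra' => 4 }.extend(SumValues)
}

member_total    = categories[:Member].sum_values
nonmember_total = categories[:"Not member"].sum_values
total_words     = member_total + nonmember_total

puts member_total    # 4 + 3 + 8
puts nonmember_total # 6 + 2 + 4
puts total_words
```

Extending individual hashes keeps the helper out of `Hash` globally, which may be preferable in tests that should not change behavior elsewhere.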
2  lib/classifier/extensions/word_hash.rb
@@ -35,7 +35,7 @@ def word_hash_for_words(words)
#key = word.stem.intern
# Ignore words if they have no word chars, are in the skip list, all numbers or length <= 2
if word =~ /\w/ && word !~ /\d+/ && word.length > 2 && !CORPUS_SKIP_WORDS.include?(word)
- key = word.intern
+ key = word
d[key] ||= 0
d[key] += 1
end
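
With `intern` removed, the word-hash keys are Strings rather than Symbols, so any lookup has to use strings too. A rough sketch of the counting loop under that change follows. `SKIP_WORDS` and `count_words` are hypothetical stand-ins for `CORPUS_SKIP_WORDS` and `word_hash_for_words`, and the pull request does not state why interning was dropped.

```ruby
# Hypothetical stand-in for CORPUS_SKIP_WORDS.
SKIP_WORDS = %w[the and].freeze

# Counts words using String keys, applying the same filter as the diff:
# the word needs a word character, no digits, more than 2 characters,
# and must not be in the skip list.
def count_words(words)
  counts = {}
  words.each do |word|
    next unless word =~ /\w/ && word !~ /\d+/ && word.length > 2 && !SKIP_WORDS.include?(word)

    counts[word] ||= 0
    counts[word] += 1
  end
  counts
end

counts = count_words(%w[cow cow the a 42 cat2 yeti])

p counts        # only 'cow' and 'yeti' pass the filter
p counts['cow'] # String key finds the count
p counts[:cow]  # Symbol key no longer matches anything
```

As written, `word !~ /\d+/` rejects any word containing a digit (such as `cat2` above), not only words made up entirely of numbers.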