Fix text example (closes #6)

commit 52994ecf06dea78c2b0fe54e351632c6dc2a778b 1 parent 2b9e0f8
@febeling authored
Showing with 13 additions and 11 deletions.
  1. +13 −11 examples/text.rb
24 examples/text.rb
@@ -11,22 +11,23 @@
# Let take our documents and create word vectors out of them.
# I've included labels for these already. 1 signifies that
-# the document was funny 0 means that it wasn't
+# the document was funny, 0 means that it wasn't.
#
documents = [[1, "Why did the chicken cross the road? Because a car was coming"],
[0, "You're an elevator tech? I bet that job has its ups and downs"]]
# Lets create a dictionary of unique words and then we can
-# create our vectors. This is a very simple example. If
-# you were doing this in a production system you'd do things
-# like stemming and removing all punctuation.
+# create our vectors. This is a very simple example. If you
+# were doing this in a production system you'd do things like
+# stemming and removing all punctuation (in a less casual way).
#
-dictionary = documents.map(&:last).flatten.uniq
+dictionary = documents.map(&:last).map(&:split).flatten.uniq
dictionary = dictionary.map { |x| x.gsub(/\?|,|\.|\-/,'') }
training_set = []
-docments.each do |doc|
- training_set << [doc.first, Libsvm::Node.features(dictionary.map { |x| doc.include?(x) ? 1 : 0 })]
+documents.each do |doc|
+ features_array = dictionary.map { |x| doc.last.include?(x) ? 1 : 0 }
+ training_set << [doc.first, Libsvm::Node.features(features_array)]
end
# Lets set up libsvm so that we can test our prediction
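A note on the patched training loop above: `doc.last.include?(x)` calls `String#include?` on the raw document text, so it tests for substrings rather than whole words. For example, the dictionary word "a" also matches the second document through "an", "elevator" and "that". The test path, by contrast, checks membership in an array of cleaned words. Below is a minimal pure-Ruby sketch, assuming the same punctuation-stripping regex as the example, that tokenizes training and test text identically. The helper names `tokenize` and `feature_vector` are hypothetical and are not part of the example or of libsvm.

```ruby
# Hypothetical helper (not part of the example): split on whitespace and strip
# the same punctuation the example removes (?, comma, period, hyphen).
def tokenize(text)
  text.split.map { |w| w.gsub(/\?|,|\.|\-/, '') }.reject(&:empty?)
end

# Hypothetical helper: binary bag-of-words vector, 1 if the dictionary word
# occurs as a whole token of the text, 0 otherwise.
def feature_vector(dictionary, text)
  words = tokenize(text)
  dictionary.map { |w| words.include?(w) ? 1 : 0 }
end

documents = [[1, "Why did the chicken cross the road? Because a car was coming"],
             [0, "You're an elevator tech? I bet that job has its ups and downs"]]

# Clean tokens before calling uniq, so variants such as "road?" and "road"
# map to the same dictionary entry.
dictionary = documents.flat_map { |_label, text| tokenize(text) }.uniq

# Training and test text go through the same tokenizer.
training_set = documents.map { |label, text| [label, feature_vector(dictionary, text)] }
test_vector  = feature_vector(dictionary, "Why did the chicken cross the road? To get the worm")

# In the example these plain arrays would then be wrapped with
# Libsvm::Node.features before training or prediction.
p test_vector.sum # prints 6
```

With whole-token matching, the training vectors are built the same way as the test vector, so a short word like "a" is only counted where it actually occurs as a word.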
@@ -39,7 +40,7 @@
parameter.eps = 0.001
parameter.c = 10
-# train classifier using training set
+# Train classifier using training set
#
problem.set_examples(training_set.map(&:first), training_set.map(&:last))
model = Libsvm::Model.train(problem, parameter)
@@ -47,7 +48,8 @@
# Now lets test our classifier using the test set
#
test_set = [1, "Why did the chicken cross the road? To get the worm"]
-test_document = test_set.last.split(' ').map{ |x| x.gsub(/\?|,|\.|\-/,'') }
+test_document = test_set.last.split.map{ |x| x.gsub(/\?|,|\.|\-/,'') }
-pred = model.predict(Libsvm::Node.features(dictionary.map{|x| test_document.include?(x) }))
-puts "Predicted #{pred}"
+doc_features = dictionary.map{|x| test_document.include?(x) ? 1 : 0 }
+pred = model.predict(Libsvm::Node.features(doc_features))
+puts "Predicted #{pred==1 ? 'funny' : 'not funny'}"
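The new last line compares `pred==1`. libsvm's C function `svm_predict` returns a `double`, so the Ruby binding plausibly hands back a Float such as `1.0` (an assumption, not confirmed by this diff). The comparison works either way, because `1.0 == 1` is true in Ruby. A Hash lookup would not: `{ 1 => 'funny' }[1.0]` is `nil`, since Hash keys are matched with `eql?`. A small sketch that rounds before looking up the label; `LABEL_NAMES` and `label_name` are hypothetical names used only for illustration.

```ruby
# Hypothetical mapping from the numeric labels used in the example
# (1 = funny, 0 = not funny) to display strings.
LABEL_NAMES = { 1 => 'funny', 0 => 'not funny' }.freeze

# Round first: a Float prediction like 1.0 would not match the Integer key 1
# in a Hash lookup, because Hash uses eql? rather than ==.
def label_name(pred)
  LABEL_NAMES.fetch(pred.round, "unknown (#{pred})")
end

puts "Predicted #{label_name(1.0)}" # prints "Predicted funny"
puts "Predicted #{label_name(0.0)}" # prints "Predicted not funny"
```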