Ruby library for k-fold cross-validation of machine learning classifiers. It also provides a confusion matrix for interpreting classifier results. NOTE: API is unstable and subject to change before v1.0.

CrossValidation

This gem provides a k-fold cross-validation routine and confusion matrix for evaluating machine learning classifiers. See below for usage or jump to the documentation.

Installation

Add this line to your application's Gemfile:

gem 'cross_validation'

And then execute:

$ bundle install --binstubs .bin

Or install it yourself as:

$ gem install cross_validation

Usage

To cross-validate your classifier, you need to configure a run as follows:

require 'cross_validation'

runner = CrossValidation::Runner.create do |r|
  r.documents = my_array_of_documents
  r.folds = 10
  # or if you'd rather test on 10%
  # r.percentage = 0.1
  r.classifier = lambda { SpamClassifier.new }
  r.fetch_sample_class = lambda { |sample| sample.klass }
  r.fetch_sample_value = lambda { |sample| sample.value }
  r.matrix = CrossValidation::ConfusionMatrix.new(method(:keys_for))
  r.training = lambda { |classifier, doc|
    classifier.train doc.klass, doc.value
  }
  r.classifying = lambda { |classifier, doc|
    classifier.classify doc
  }
end
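The configuration above refers to several names that the application must supply: my_array_of_documents, SpamClassifier, and samples that respond to klass and value. As a minimal sketch, assuming nothing about the gem beyond what the block shows, they might look like the following. Document and this toy SpamClassifier are hypothetical stand-ins, not part of cross_validation, and classify here takes raw text; if the runner hands your classifying lambda the whole sample, call classifier.classify doc.value instead.

```ruby
# Hypothetical sample type: klass is the label, value is the text.
Document = Struct.new(:klass, :value)

# Hypothetical stand-in classifier: counts the words seen under each class
# and picks the class whose training words overlap the input the most.
class SpamClassifier
  def initialize
    @word_counts = Hash.new { |hash, klass| hash[klass] = Hash.new(0) }
  end

  # Record every word of +text+ under the given class.
  def train(klass, text)
    text.downcase.split.each { |word| @word_counts[klass][word] += 1 }
  end

  # Return the class with the highest word-overlap score, or nil if untrained.
  def classify(text)
    words = text.downcase.split
    scores = @word_counts.map do |klass, counts|
      [klass, words.sum { |word| counts[word] }]
    end
    best = scores.max_by { |_klass, score| score }
    best && best.first
  end
end

my_array_of_documents = [
  Document.new(:spam, "win free money now"),
  Document.new(:ham,  "lunch meeting at noon"),
]
```

Any object with train and classify methods of this shape would do; the lambdas in the configuration are what adapt your classifier to the runner.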

With the run configured, just invoke #run to return a confusion matrix:

mat = runner.run
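
For intuition about what a k-fold run does, the sketch below partitions samples into k folds and yields each (training, test) pair in turn, so every sample is tested exactly once. It illustrates the general technique only; each_fold is a hypothetical helper, not the gem's implementation, and the round-robin assignment is an assumption.

```ruby
# Partition samples into k folds of near-equal size, then yield each
# (training, test) pair in turn. Illustrative only; not the gem's code.
def each_fold(samples, k)
  return enum_for(:each_fold, samples, k) unless block_given?

  # Deal samples round-robin so fold sizes differ by at most one.
  folds = Array.new(k) { [] }
  samples.each_with_index { |sample, i| folds[i % k] << sample }

  folds.each_with_index do |test_set, i|
    # Train on every fold except the one held out for testing.
    training_set = folds.each_with_index.reject { |_, j| j == i }.flat_map(&:first)
    yield training_set, test_set
  end
end

each_fold((1..10).to_a, 5) do |training, test|
  puts "train on #{training.size}, test on #{test.size}"
end
```

With k folds, each round holds out roughly 1/k of the samples, which is why 10 folds and a test percentage of 0.1 describe the same split size.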

With a confusion matrix in hand, you can compute many statistics about your classifier:

  • mat.accuracy
  • mat.f1
  • mat.fscore(beta)
  • mat.precision
  • mat.recall

Please see the respective documentation for each method for more details.
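
As a reference for what these statistics mean, the sketch below computes them from the four confusion-matrix counts using the standard textbook definitions. Counts is a hypothetical struct and the numbers are made up for illustration; the gem's own methods may differ in details such as how empty denominators are handled.

```ruby
# Hypothetical container for the four outcome counts.
Counts = Struct.new(:tp, :tn, :fp, :fn) do
  # Fraction of all predictions that were correct.
  def accuracy
    (tp + tn).to_f / (tp + tn + fp + fn)
  end

  # Of everything predicted positive, the fraction that was truly positive.
  def precision
    tp.to_f / (tp + fp)
  end

  # Of everything truly positive, the fraction that was found.
  def recall
    tp.to_f / (tp + fn)
  end

  # Weighted harmonic mean of precision and recall; beta > 1 favors recall.
  def fscore(beta)
    b2 = beta**2
    (1 + b2) * precision * recall / (b2 * precision + recall)
  end

  # F-score with precision and recall weighted equally.
  def f1
    fscore(1)
  end
end

counts = Counts.new(40, 45, 5, 10) # made-up counts
puts counts.accuracy  # 85 correct out of 100
puts counts.recall    # 40 of the 50 actual positives found
puts counts.f1
```

This sketch does not guard against zero denominators, for example a classifier that never predicts the positive class.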

Defining keys_for

The ConfusionMatrix class requires a keys_for Proc that takes the expected and actual classes and returns a symbol. In this method, you specify what constitutes a true positive (:tp), true negative (:tn), false positive (:fp), and false negative (:fn). For example, in spam classification with spam as the positive class, you can construct the following table to write the keys_for method:

                        actual
          +---------------------------------
 expected | correct        | not correct
----------+----------------+----------------
 spam     | true positive  | false negative
 ham      | true negative  | false positive

You can then implement this table with nested hashes or just a few conditionals:

def keys_for(expected, actual)
  if expected == :spam
    # Spam classified as ham is a missed positive.
    actual == :spam ? :tp : :fn
  elsif expected == :ham
    # Ham classified as spam is a false alarm.
    actual == :ham ? :tn : :fp
  end
end

Once you have implemented keys_for, pass it to ConfusionMatrix.new with method(:keys_for), or, if it's a class method, with MyClass.method(:keys_for). (You can also implement it as a lambda.)
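
As a sketch of the nested-hash and lambda alternatives, the lookup can be written as data. The OUTCOMES constant is a hypothetical name, and the mapping assumes the usual convention that spam is the positive class, expected is the sample's labelled class, and actual is the classifier's prediction, so spam classified as ham counts as a false negative.

```ruby
# Outer key: expected (labelled) class. Inner key: actual (predicted) class.
# Convention assumed here: spam is the positive class.
OUTCOMES = {
  spam: { spam: :tp, ham: :fn },
  ham:  { spam: :fp, ham: :tn },
}.freeze

# fetch raises KeyError on an unknown class instead of silently returning nil.
keys_for = lambda { |expected, actual| OUTCOMES.fetch(expected).fetch(actual) }

keys_for.call(:spam, :spam) # => :tp
keys_for.call(:ham, :spam)  # => :fp
```

Since a lambda already responds to call, it can presumably be handed to ConfusionMatrix.new directly, without method(...).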

Roadmap

For v1.0:

  • Implement configurable, parallel cross-validation
  • Include more complete examples

Author

Jon-Michael Deldin, dev@jmdeldin.com