Ruby library for k-fold cross-validation of machine learning classifiers. It also provides a confusion matrix for interpreting classifier results. NOTE: API is unstable and subject to change before v1.0.



This gem provides a k-fold cross-validation routine and confusion matrix for evaluating machine learning classifiers. See below for usage or jump to the documentation.


Add this line to your application's Gemfile:

gem 'cross_validation'

And then execute:

$ bundle install --binstubs .bin

Or install it yourself as:

$ gem install cross_validation


To cross-validate your classifier, you need to configure a run as follows:

require 'cross_validation'

runner = CrossValidation::Runner.create do |r|
  r.documents = my_array_of_documents
  r.folds = 10
  # or if you'd rather test on 10%
  # r.percentage = 0.1
  # MyClassifier is a placeholder; return a new instance of your own classifier
  r.classifier = lambda { MyClassifier.new }
  r.fetch_sample_class = lambda { |sample| sample.klass }
  r.fetch_sample_value = lambda { |sample| sample.value }
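  # keys_for is described under "Defining keys_for" below
  r.matrix = CrossValidation::ConfusionMatrix.new(method(:keys_for))
  # setter name assumed to be `training`, by analogy with `classifying`
  r.training = lambda { |classifier, doc|
    classifier.train doc.klass, doc.value
  }
  r.classifying = lambda { |classifier, doc|
    classifier.classify doc
  }
end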
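
The configuration above assumes documents that respond to klass and value, and a classifier that responds to train and classify. As a rough illustration, objects of that shape might look like the sketch below. Sample and MajorityClassifier are hypothetical names used only for this example; they are not part of the gem.

```ruby
# Hypothetical document type: `klass` is the label, `value` is the raw content.
Sample = Struct.new(:klass, :value)

# Hypothetical toy classifier: always predicts the most frequent training label.
class MajorityClassifier
  def initialize
    @counts = Hash.new(0)
  end

  # Record one labelled example; this toy model ignores the value.
  def train(klass, _value)
    @counts[klass] += 1
  end

  # Return the label seen most often during training (nil if untrained).
  def classify(_doc)
    best = @counts.max_by { |_klass, count| count }
    best && best.first
  end
end

my_array_of_documents = [
  Sample.new(:spam, 'win money now'),
  Sample.new(:ham,  'lunch at noon?'),
  Sample.new(:spam, 'cheap pills'),
]

classifier = MajorityClassifier.new
my_array_of_documents.each { |doc| classifier.train(doc.klass, doc.value) }
prediction = classifier.classify(Sample.new(nil, 'hello'))
```

A real classifier would use the document's value; the point here is only the interface the lambdas rely on.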

With the run configured, just invoke #run to return a confusion matrix:

mat = runner.run
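
To make concrete what a k-fold run involves, here is a minimal plain-Ruby sketch of the partitioning step. It is not the gem's implementation: each_fold is a hypothetical helper, and it assigns items to folds round-robin without shuffling.

```ruby
# Split `items` into `k` folds of near-equal size (round-robin by index),
# then yield each fold once as the test set with the remaining folds
# concatenated as the training set.
def each_fold(items, k)
  raise ArgumentError, 'k must be between 2 and items.size' unless k.between?(2, items.size)

  folds = Array.new(k) { [] }
  items.each_with_index { |item, i| folds[i % k] << item }

  folds.each_with_index do |test_set, i|
    training_set = (folds[0...i] + folds[(i + 1)..-1]).flatten(1)
    yield training_set, test_set
  end
end

splits = []
each_fold((1..10).to_a, 5) { |train, test| splits << [train, test] }
```

Every item lands in exactly one test set, so each document is classified once by a model that never saw it during training.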

With a confusion matrix in hand, you can compute many statistics about your classifier:

  • mat.accuracy
  • mat.f1
  • mat.fscore(beta)
  • mat.precision
  • mat.recall

Please see the respective documentation for each method for more details.
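
For reference, the statistics listed above have standard textbook definitions in terms of the four confusion-matrix counts. The sketch below computes them in plain Ruby; Counts is a hypothetical struct for illustration, and the gem's own methods may differ in details such as how they handle empty denominators.

```ruby
# Standard definitions computed from raw counts of true/false positives/negatives.
Counts = Struct.new(:tp, :tn, :fp, :fn) do
  def total
    tp + tn + fp + fn
  end

  # Fraction of all samples classified correctly.
  def accuracy
    (tp + tn).to_f / total
  end

  # Of the samples predicted positive, the fraction that really were positive.
  def precision
    tp.to_f / (tp + fp)
  end

  # Of the truly positive samples, the fraction that were found.
  def recall
    tp.to_f / (tp + fn)
  end

  # F-beta: beta > 1 weights recall more heavily, beta < 1 favours precision.
  def fscore(beta)
    b2 = beta**2
    (1 + b2) * precision * recall / (b2 * precision + recall)
  end

  # F1 is the special case beta = 1 (harmonic mean of precision and recall).
  def f1
    fscore(1)
  end
end

counts = Counts.new(40, 45, 5, 10)
```

With these made-up counts, accuracy is 85/100 and recall is 40/50. Note that a zero denominator yields NaN in this sketch rather than an error.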

Defining keys_for

The ConfusionMatrix class requires a keys_for Proc that returns a symbol. In this method, you specify what constitutes a true positive (:tp), true negative (:tn), false positive (:fp), and false negative (:fn). For example, in spam classification, you can construct the following table to write the keys_for method:

 expected | correct        | not correct
 spam     | true positive  | false positive
 ham      | true negative  | false negative

You can then implement this table with nested hashes or just a few conditionals:

def keys_for(expected, actual)
  if expected == :spam
    actual == :spam ? :tp : :fp
  elsif expected == :ham
    actual == :ham ? :tn : :fn
  end
end

Once you have your keys_for method implemented, pass it into the ConfusionMatrix with method(:keys_for), or, if it's a class method, with MyClass.method(:keys_for). (You can also implement the method as a lambda.)
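
The nested-hash and lambda variants mentioned above could look roughly like this. The mapping mirrors the spam/ham table in this section; KEYS, pairs, and tally are illustrative names, and the tallying loop only demonstrates the lambda in isolation rather than how the gem stores results.

```ruby
# Lookup table mirroring the spam/ham table: KEYS[expected][actual].
KEYS = {
  spam: { spam: :tp, ham: :fp },
  ham:  { ham: :tn, spam: :fn },
}.freeze

# Lambda form of keys_for; fetch raises KeyError on an unknown label.
keys_for = lambda { |expected, actual| KEYS.fetch(expected).fetch(actual) }

# Tally outcomes for some made-up (expected, actual) pairs.
pairs = [[:spam, :spam], [:spam, :ham], [:ham, :ham], [:ham, :spam], [:ham, :ham]]
tally = Hash.new(0)
pairs.each { |expected, actual| tally[keys_for.call(expected, actual)] += 1 }
```

Using fetch instead of [] makes an unexpected label fail loudly instead of silently returning nil.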


For v1.0:

  • Implement configurable, parallel cross-validation
  • Include more complete examples


Jon-Michael Deldin
