Ruby library for k-fold cross-validation of machine learning classifiers. It also provides a confusion matrix for interpreting classifier results. NOTE: API is unstable and subject to change before v1.0.



This gem provides a k-fold cross-validation routine and confusion matrix for evaluating machine learning classifiers. See below for usage or jump to the documentation.


Add this line to your application's Gemfile:

gem 'cross_validation'

And then execute:

$ bundle install --binstubs .bin

Or install it yourself as:

$ gem install cross_validation


To cross-validate your classifier, you need to configure a run as follows:

require 'cross_validation'

runner = CrossValidation::Runner.create do |r|
  r.documents = my_array_of_documents
  r.folds = 10
  # or if you'd rather test on 10%
  # r.percentage = 0.1
  # MyClassifier is a placeholder: return a fresh instance of your own classifier
  r.classifier = lambda { MyClassifier.new }
  r.fetch_sample_class = lambda { |sample| sample.klass }
  r.fetch_sample_value = lambda { |sample| sample.value }
  # keys_for is described under "Defining keys_for" below
  r.matrix = CrossValidation::ConfusionMatrix.new(method(:keys_for))
  r.training = lambda { |classifier, doc|
    classifier.train doc.klass, doc.value
  }
  r.classifying = lambda { |classifier, doc|
    classifier.classify doc
  }
end

With the run configured, just invoke #run to return a confusion matrix:

mat = runner.run
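
For intuition about what a k-fold run does with the documents, here is a minimal plain-Ruby sketch. It is not the gem's implementation: `k_fold_splits` is a hypothetical helper, and the round-robin fold assignment is an assumption chosen for simplicity.

```ruby
# Hypothetical helper, not part of cross_validation: partitions documents
# into k folds and returns one [training, test] pair per fold.
def k_fold_splits(documents, k)
  unless k.between?(2, documents.size)
    raise ArgumentError, 'k must be between 2 and the number of documents'
  end

  # Round-robin assignment keeps fold sizes within one element of each other.
  folds = Array.new(k) { [] }
  documents.each_with_index { |doc, i| folds[i % k] << doc }

  folds.each_index.map do |i|
    test = folds[i]
    # Every fold except the i-th one is used for training.
    training = folds.each_with_index.reject { |_, j| j == i }.flat_map(&:first)
    [training, test]
  end
end

splits = k_fold_splits((1..10).to_a, 5)
```

Each document lands in exactly one test fold, so every document is classified once by a classifier that never saw it during training.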

With a confusion matrix in hand, you can compute many statistics about your classifier:

  • mat.accuracy
  • mat.f1
  • mat.fscore(beta)
  • mat.precision
  • mat.recall

Please see the respective documentation for each method for more details.
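
As a rough guide to what these statistics mean, the sketch below computes them from raw :tp, :tn, :fp, and :fn counts using the standard textbook formulas. The helper methods are hypothetical and stand apart from the gem; the ConfusionMatrix methods may differ in details such as how zero denominators are handled.

```ruby
# Hypothetical helpers, not the gem's API: standard definitions from counts.
def accuracy(tp, tn, fp, fn)
  (tp + tn).to_f / (tp + tn + fp + fn)
end

def precision(tp, fp)
  tp.to_f / (tp + fp)
end

def recall(tp, fn)
  tp.to_f / (tp + fn)
end

# F-beta score: beta > 1 weights recall more heavily, beta < 1 favors precision.
def fscore(precision, recall, beta = 1.0)
  b2 = beta**2
  (1 + b2) * precision * recall / (b2 * precision + recall)
end

# Example counts, made up for illustration.
prec = precision(40, 5)
rec  = recall(40, 10)
acc  = accuracy(40, 45, 5, 10)
f1   = fscore(prec, rec)
f2   = fscore(prec, rec, 2)
```

This sketch does not guard against zero denominators, so an empty matrix would produce NaN.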

Defining keys_for

The ConfusionMatrix class requires a keys_for Proc that returns a symbol. In this method, you specify what constitutes a true positive (:tp), true negative (:tn), false positive (:fp), and false negative (:fn). For example, in spam classification, you can construct the following table to write the keys_for method:

 expected | correct        | not correct
 spam     | true positive  | false positive
 ham      | true negative  | false negative

You can then implement this table with nested hashes or just a few conditionals:

def keys_for(expected, actual)
  if expected == :spam
    actual == :spam ? :tp : :fp
  elsif expected == :ham
    actual == :ham ? :tn : :fn
  end
end

Once you have implemented keys_for, pass it into the ConfusionMatrix with method(:keys_for), or, if it's a class method, with MyClass.method(:keys_for). (You can also implement it as a lambda.)
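
The nested-hash and lambda variants mentioned above could look like the following sketch. It mirrors the spam/ham table in this section; the variable name `table` is an assumption for illustration, and it relies only on the lambda responding to call, as any Proc does.

```ruby
# Outer key: expected class. Inner key: the classifier's output.
# The values mirror the spam/ham table above.
table = {
  spam: { spam: :tp, ham: :fp },
  ham:  { ham: :tn, spam: :fn }
}.freeze

# fetch raises KeyError on an unknown class instead of silently returning nil.
keys_for = lambda { |expected, actual| table.fetch(expected).fetch(actual) }
```

Using fetch rather than [] means a mistyped or unexpected class label fails loudly instead of storing a nil key in the matrix.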


For v1.0:

  • Implement configurable, parallel cross-validation
  • Include more complete examples


Jon-Michael Deldin
