
Thoughts about validation #71

Closed
marciovicente opened this issue May 30, 2016 · 18 comments

Comments

@marciovicente

I'm using this gem and it seems awesome, but I have some questions about validation.

What are you using to check the accuracy of a specific model? I've looked through the documentation and couldn't find anything about validation, so I think it would be a great feature to implement a module that measures the precision of a specific model.

In my academic work I created a simple model validator to check accuracy, and it works fine for me. I had thought of implementing an initial validate method:

classifier.validate(testing_samples)

that returns

accuracy | positive | negative
   0.89      802          98   | positive
   0.81      128         572   | negative

Mean accuracy: 0.8587  

I also think it's possible to create a "fake" cross-validation method (or maybe the real thing), optionally passing a number of folds, which would be the number of iterations used to shuffle the testing dataset and re-classify it. The output would be something like:

classifier.validate(testing_samples, 5)
Fold 1: 0.891
Fold 2: 0.887
Fold 3: 0.821
Fold 4: 0.798
Fold 5: 0.803
Mean accuracy for 5 folds: 0.84
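
For illustration, here's a minimal sketch of what such a validate method might look like, assuming the classifier responds to classify() and that testing_samples is an array of [category, text] pairs (the method name, arguments, and case-insensitive comparison are assumptions, not the gem's current API):

def validate(classifier, testing_samples, folds = 1)
  # Shuffle once, then score each of the `folds` equal-sized slices separately.
  slices = testing_samples.shuffle.each_slice((testing_samples.length.to_f / folds).ceil).to_a
  accuracies = slices.map do |slice|
    correct = slice.count { |category, text| classifier.classify(text).to_s.downcase == category.to_s.downcase }
    correct.to_f / slice.length
  end
  accuracies.each_with_index { |acc, i| puts "Fold #{i + 1}: #{acc.round(3)}" }
  puts "Mean accuracy for #{folds} folds: #{(accuracies.sum / accuracies.length).round(3)}"
end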
@Ch4s3
Member

Ch4s3 commented May 30, 2016

That seems interesting. I'm not sure I would know how to do that correctly myself, but if you're willing to take a stab at it, I'm happy to answer questions about the code base.

@Ch4s3
Member

Ch4s3 commented Jul 25, 2016

@marciovicente are you still interested in doing this?

@marciovicente
Author

marciovicente commented Jul 25, 2016

@Ch4s3 I'm sorry, I was very busy over the last month. I'm still interested in contributing. I already have some code for this in a personal project, and I intend to move it over and improve its quality, which is currently quite poor. I'll submit a pull request addressing this issue as soon as possible.

@Ch4s3
Member

Ch4s3 commented Jul 25, 2016

@marciovicente Awesome, there's no huge rush to get it in. I don't have any looming release deadlines so I'll look it over as it comes in.

@marciovicente
Author

marciovicente commented Sep 13, 2016

@Ch4s3 Just to keep you updated: I've created the confusion matrix for binary samples, produced when the user calls the validate method (bayes.validate(validate_sample)).

[screenshot: sample confusion matrix output]

I'm now working on precision, recall, and F-measure metrics, and I hope to submit a pull request soon.
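
For reference, a rough sketch of how those metrics fall out of a binary confusion matrix, using the counts from the earlier example output (802 true positives, 128 false positives, 98 false negatives for the positive class); the method name is purely illustrative:

# Precision, recall, and F-measure for the positive class.
def precision_recall_f(tp, fp, fn)
  precision = tp.to_f / (tp + fp)
  recall    = tp.to_f / (tp + fn)
  f_measure = 2 * precision * recall / (precision + recall)
  { precision: precision, recall: recall, f_measure: f_measure }
end

precision_recall_f(802, 128, 98)
# => precision ~0.862, recall ~0.891, f_measure ~0.877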

@Ch4s3
Member

Ch4s3 commented Dec 30, 2016

@marciovicente ping

@marciovicente
Author

@Ch4s3 As I reported in another issue, #76, there seems to be a problem with the Bayes classification. Unfortunately, I can't share my data right now because it's private. :/

@Ch4s3
Member

Ch4s3 commented Jan 3, 2017

Ok, keep me updated and I'll try to get to the bottom of it.

@Ch4s3
Member

Ch4s3 commented Jan 6, 2017

@marciovicente Can you take a look at #92? We added a big test case using real spam filtering data, and we're able to match the expected results. It would be great if we could find cases where we don't perform as well.

@marciovicente
Author

@Ch4s3 I created a gist with some samples of my data. Look at the last line; there's one record with the class :m. Except for that one, all the lines are either :f or :t.

https://gist.github.com/marciovicente/f836c302697a786c12bb7721fdd5dd2c

@ibnesayeed
Contributor

ibnesayeed commented Jan 16, 2017

I am glad I saw this ticket. I had been thinking about validation for the last few weeks. In fact, I wrote some code to get the confusion matrix when I wrote the integration test with a real dataset in #92, but I wasn't sure where that code should live or what exactly one would want to validate. For example:

  1. A user might want to compare more than one classifier to see how each of them performs in terms of accuracy and its tendency towards false negatives or false positives (depending on the application, one may be less desirable than the other), and how much each costs in terms of training and classification time and memory.
  2. Another intent could be to inspect the statistics of a classifier in action, to decide whether enough training has been done or more training is required to reach the desired accuracy.

Based on the intent, the API might change. Here are a few options that I considered.

  1. Make a Rake task that reports statistics for a classifier against a given dataset. We can use the included SMS dataset, allow the user to pass a data file formatted the same way, or load multi-line data from individual text or HTML files organized in sub-folders where the folder name is the class. This mechanism can only satisfy the first intent.
  2. Write a test/benchmark that reports statistics for a classifier against some included datasets. This is very similar to the Rake task, except it is more limited because the user can't pass a custom dataset. It can be helpful for measuring the cost associated with each algorithm.
  3. Add an evaluation method (say evaluate()) to each classifier class, called on the trained object with test data supplied as an array of arrays [[category, record]...]. This would allow a program to interact with the method while the classifier instance (model) is being trained. It would also separate out the concern of how the supplied data is originally stored (spreadsheet, CSV/TSV, individual files, URLs, etc.), since the required formatting would be done before calling evaluate(). This serves the second intent described above very well, but it has a few limitations: the evaluation logic would be duplicated in each classifier class we implement, and k-fold style cross-validation wouldn't be possible (or at least not in a sensible way).
  4. Add a separate module for validation. This module would expose methods as illustrated below:
module ClassifierReborn
  module ClassifierValidator

    module_function

    # Build a confusion matrix for an already trained classifier.
    # test_data is an array of [category, record] pairs.
    def evaluate(classifier, test_data)
      conf_mat = {}
      categories = classifier.categories
      categories.each do |actual|
        conf_mat[actual] = {}
        categories.each do |predicted|
          conf_mat[actual][predicted] = 0
        end
      end
      test_data.each do |rec|
        conf_mat[rec.first][classifier.classify(rec.last)] += 1
      end
      conf_mat
    end

    # Reset the classifier, train it on training_data, then evaluate it on test_data.
    def validate(classifier, training_data, test_data)
      classifier.reset()
      training_data.each do |rec|
        classifier.train(rec.first, rec.last)
      end
      evaluate(classifier, test_data)
    end

    # k-fold cross-validation: each of the `fold` partitions is used as test data
    # once, while the remaining partitions are used for training.
    def cross_validate(classifier, sample_data, fold=10, *options)
      classifier = ClassifierReborn::const_get(classifier).new(*options) if classifier.is_a?(String)
      sample_data.shuffle!
      partition_size = sample_data.length / fold
      partitioned_data = sample_data.each_slice(partition_size).to_a
      conf_mats = []
      fold.times do |i|
        training_data = partitioned_data.take(fold)
        test_data = training_data.slice!(i)
        conf_mats << validate(classifier, training_data.flatten(1), test_data)
      end
      classifier.reset()
      generate_stats(*conf_mats)
      # Optionally, generate time and memory profiles for individual and accumulated iterations
    end

    def generate_stats(*conf_mats)
      # Derive various statistics for one or more supplied confusion matrices
      # Report summary based on individual and accumulated confusion matrices
    end
  end
end

In my opinion, this is the best way to go as it covers all the intents described earlier. Additionally, Rake tasks, tests, and benchmarks can utilize it if needed.

  1. To measure the performance of a populated classifier model, call evaluate() with the classifier instance and test data (for instance, to know when to stop training or to check whether more training is needed for a desired accuracy).
  2. To validate based on a manually chosen split of training_data and test_data, or to implement one of the many known validation methods, use validate() with an initialized classifier.
  3. To validate using the most popular method, k-fold cross-validation, use cross_validate() with either an initialized classifier instance or, optionally, the name of the classifier (such as Bayes or LSI, to let the method create the instance) and the sample_data. Partitioning the sample_data and calling validate() on the partitions k times is done internally.

One or more confusion matrices generated by manually calling evaluate() or validate() can be supplied to generate_stats() to get a nice statistical summary; this method is called automatically when cross_validate() is used.
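
A hypothetical usage sketch of the module above (the data variables are placeholders for arrays of [category, text] pairs, and the category names are made up):

# Validate an explicit training/test split and summarize the result.
classifier = ClassifierReborn::Bayes.new('Interesting', 'Uninteresting')
conf_mat = ClassifierReborn::ClassifierValidator.validate(classifier, training_data, test_data)
ClassifierReborn::ClassifierValidator.generate_stats(conf_mat)

# Or pass the classifier by name and let the validator run 10-fold cross-validation,
# constructing Bayes.new('Interesting', 'Uninteresting') internally.
ClassifierReborn::ClassifierValidator.cross_validate("Bayes", sample_data, 10, 'Interesting', 'Uninteresting')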

This implementation assumes that a classifier is validatable if it responds to the categories(), train(), classify(), and reset() methods. The latter is important because we now have persistent storage backend support, which needs to be cleared before any training is done for validation; this can be made optional so that reset() is only called if the classifier responds to it. Unfortunately, LSI does not implement a train() method; instead it has a corresponding add_item() method with a different parameter fingerprint. However, to make the API uniform, we could add a method with the train([categories...], text) signature that internally calls add_item().
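
For instance, a possible adapter along those lines, assuming LSI's current add_item(item, *categories) signature (a sketch only, not a settled API):

module ClassifierReborn
  class LSI
    # Uniform training interface: accepts a category (or an array of categories)
    # plus the text, and delegates to the existing add_item method.
    def train(categories, text)
      add_item(text, *Array(categories))
    end
  end
end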

@Ch4s3
Member

Ch4s3 commented Jan 17, 2017

I think the last approach seems to be the best.

@ibnesayeed
Contributor

@Ch4s3: I think the last approach seems to be the best.

I will send a PR later this week then.

@Ch4s3
Member

Ch4s3 commented Jan 17, 2017

awesome thanks @ibnesayeed!

@marciovicente
Author

Ow! Nice job! @ibnesayeed 👏

@ibnesayeed
Contributor

Thanks @marciovicente, I would appreciate it if you could have a look at #142 and provide feedback on what else could possibly be done beyond what is implemented and planned so far.

@ibnesayeed
Contributor

@marciovicente would you like to have a look at the validation documentation and provide any feedback?

@Ch4s3 if the documentation is satisfactory then this issue can be closed.

@Ch4s3
Member

Ch4s3 commented Feb 22, 2017

thanks @ibnesayeed

@Ch4s3 Ch4s3 closed this as completed Feb 22, 2017