
Thoughts about validation #71

Closed
marciovicente opened this issue May 30, 2016 · 18 comments

Comments

@marciovicente

I'm using this gem and it seems awesome, but I have some questions about validation.

What are you using to check the accuracy of a specific model? I've looked through the documentation and couldn't find anything about validation, so I think it would be a great feature to implement a module that measures the precision of a specific model.

In my academic work I created a simple model validator to check accuracy, and it works fine for me. I had thought of implementing an initial validate method:

classifier.validate(testing_samples)

that returns

accuracy | positive | negative
   0.89      802          98   | positive
   0.81      128         572   | negative

Mean accuracy: 0.8587  

I also think it's possible to create a "fake" cross-validation method (or maybe the real thing), optionally passing a number of folds, which would be the number of iterations used to shuffle the testing dataset and re-classify it. The output would be something like:

classifier.validate(testing_samples, 5)
Fold 1: 0.891
Fold 2: 0.887
Fold 3: 0.821
Fold 4: 0.798
Fold 5: 0.803
Mean accuracy for 5 folds: 0.84
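
For illustration, here's a minimal sketch of what such a validate method might look like, assuming the classifier responds to classify() and that testing_samples is an array of [category, text] pairs (the method name, arguments, and case-insensitive comparison are assumptions, not the gem's current API):

def validate(classifier, testing_samples, folds = 1)
  # Shuffle once, then score each of the `folds` equal-sized slices separately.
  slices = testing_samples.shuffle.each_slice((testing_samples.length.to_f / folds).ceil).to_a
  accuracies = slices.map do |slice|
    correct = slice.count { |category, text| classifier.classify(text).to_s.downcase == category.to_s.downcase }
    correct.to_f / slice.length
  end
  accuracies.each_with_index { |acc, i| puts "Fold #{i + 1}: #{acc.round(3)}" }
  puts "Mean accuracy for #{folds} folds: #{(accuracies.sum / accuracies.length).round(3)}"
end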
@Ch4s3
Member

Ch4s3 commented May 30, 2016

That seems interesting. I'm not sure I would know how to do that correctly myself, but if you're willing to take a stab at it, I'm happy to answer questions about the code base.

@Ch4s3
Member

Ch4s3 commented Jul 25, 2016

@marciovicente are you still interested in doing this?

@marciovicente
Author

marciovicente commented Jul 25, 2016

@Ch4s3 I'm sorry, I was very busy over the last month. I'm still interested in contributing. I already have some code for this in a personal project, and I intend to move it over and improve its quality, which is currently quite poor. I'll submit a pull request addressing this issue as soon as possible.

@Ch4s3
Member

Ch4s3 commented Jul 25, 2016

@marciovicente Awesome, there's no huge rush to get it in. I don't have any looming release deadlines so I'll look it over as it comes in.

@marciovicente
Author

marciovicente commented Sep 13, 2016

@Ch4s3 Just to keep you updated: I've created the confusion matrix for binary samples, produced when the user calls the validate method (bayes.validate(validate_sample)).

[screenshot: sample confusion matrix output]

I'm now working on precision, recall, and F-measure metrics, and I hope to submit a pull request soon.
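
For reference, a rough sketch of how those metrics fall out of a binary confusion matrix, using the counts from the earlier example output (802 true positives, 128 false positives, 98 false negatives for the positive class); the method name is purely illustrative:

# Precision, recall, and F-measure for the positive class.
def precision_recall_f(tp, fp, fn)
  precision = tp.to_f / (tp + fp)
  recall    = tp.to_f / (tp + fn)
  f_measure = 2 * precision * recall / (precision + recall)
  { precision: precision, recall: recall, f_measure: f_measure }
end

precision_recall_f(802, 128, 98)
# => precision ~0.862, recall ~0.891, f_measure ~0.877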

@Ch4s3
Member

Ch4s3 commented Dec 30, 2016

@marciovicente ping

@marciovicente
Author

@Ch4s3 As I reported in another issue, #76, there seems to be a problem with the Bayes classification. Unfortunately, I can't share my data right now because it's private. :/

@Ch4s3
Member

Ch4s3 commented Jan 3, 2017

Ok, keep me updated and I'll try to get to the bottom of it.

@Ch4s3
Member

Ch4s3 commented Jan 6, 2017

@marciovicente Can you take a look at #92? We added a big test case using real spam filtering data, and we're able to match the expected results. It would be great if we could find cases where we don't perform as well.

@marciovicente
Author

@Ch4s3 I created a gist with some samples of my data. Look at the last line; there's one record with the class :m. Except for that one, all the lines are either :f or :t.

https://gist.github.com/marciovicente/f836c302697a786c12bb7721fdd5dd2c

@ibnesayeed
Contributor

ibnesayeed commented Jan 16, 2017

I am glad I saw this ticket. I had been thinking about validation for the last few weeks. In fact, I wrote some code to get the confusion matrix when I wrote the integration test with a real dataset in #92, but I wasn't sure where that code should live or what exactly one would want to validate. For example:

  1. A user might want to compare more than one classifier to see how each of them performs in terms of accuracy and its tendency towards false negatives or false positives (depending on the application, one may be less desirable than the other), and how much each costs in terms of training and classification time and memory.
  2. Another intent could be to inspect the statistics of a classifier in action, to decide whether enough training has been done or more training is required to reach the desired accuracy.

Based on the intent, the API might change. Here are a few options that I considered.

  1. Make a Rake task that reports statistics for a classifier against a given dataset. We can use the included SMS dataset, allow the user to pass a data file formatted the same way, or load multi-line data from individual text or HTML files organized in sub-folders where the folder name is the class. This mechanism can only satisfy the first intent.
  2. Write a test/benchmark that reports statistics for a classifier against some included datasets. This is very similar to the Rake task, except it is more limited because the user can't pass a custom dataset. It can be helpful for measuring the cost associated with each algorithm.
  3. Add an evaluation method (say evaluate()) to each classifier class, called on the trained object with test data supplied as an array of arrays [[category, record]...]. This would allow a program to interact with the method while the classifier instance (model) is being trained. It would also separate out the concern of how the supplied data is originally stored (spreadsheet, CSV/TSV, individual files, URLs, etc.), since the required formatting would be done before calling evaluate(). This serves the second intent described above very well, but it has a few limitations: the evaluation logic would be duplicated in each classifier class we implement, and k-fold style cross-validation wouldn't be possible (or at least not in a sensible way).
  4. Add a separate module for validation. This module would expose methods as illustrated below:
module ClassifierReborn
  module ClassifierValidator

    module_function

    # Build a confusion matrix for an already trained classifier.
    # test_data is an array of [category, record] pairs.
    def evaluate(classifier, test_data)
      conf_mat = {}
      categories = classifier.categories
      categories.each do |actual|
        conf_mat[actual] = {}
        categories.each do |predicted|
          conf_mat[actual][predicted] = 0
        end
      end
      test_data.each do |rec|
        conf_mat[rec.first][classifier.classify(rec.last)] += 1
      end
      conf_mat
    end

    # Reset the classifier, train it on training_data, then evaluate it on test_data.
    def validate(classifier, training_data, test_data)
      classifier.reset()
      training_data.each do |rec|
        classifier.train(rec.first, rec.last)
      end
      evaluate(classifier, test_data)
    end

    # k-fold cross-validation: each of the `fold` partitions is used as test data
    # once, while the remaining partitions are used for training.
    def cross_validate(classifier, sample_data, fold=10, *options)
      classifier = ClassifierReborn::const_get(classifier).new(*options) if classifier.is_a?(String)
      sample_data.shuffle!
      partition_size = sample_data.length / fold
      partitioned_data = sample_data.each_slice(partition_size).to_a
      conf_mats = []
      fold.times do |i|
        training_data = partitioned_data.take(fold)
        test_data = training_data.slice!(i)
        conf_mats << validate(classifier, training_data.flatten(1), test_data)
      end
      classifier.reset()
      generate_stats(*conf_mats)
      # Optionally, generate time and memory profiles for individual and accumulated iterations
    end

    def generate_stats(*conf_mats)
      # Derive various statistics for one or more supplied confusion matrices
      # Report summary based on individual and accumulated confusion matrices
    end
  end
end

In my opinion, this is the best way to go as it covers all the intents described earlier. Additionally, Rake tasks, tests, and benchmarks can utilize it if needed.

  1. To measure the performance of a populated classifier model, call evaluate() with the classifier instance and test data (for instance, to know when to stop training or to check whether more training is needed for a desired accuracy).
  2. To validate based on a manually chosen split of training_data and test_data, or to implement one of the many known validation methods, use validate() with an initialized classifier.
  3. To validate using the most popular method, k-fold cross-validation, use cross_validate() with either an initialized classifier instance or, optionally, the name of the classifier (such as Bayes or LSI, to let the method create the instance) and the sample_data. Partitioning the sample_data and calling validate() on the partitions k times is done internally.

One or more confusion matrices generated by manually calling evaluate() or validate() can be supplied to generate_stats() to get a nice statistical summary; this method is called automatically when cross_validate() is used.
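
A hypothetical usage sketch of the module above (the data variables are placeholders for arrays of [category, text] pairs, and the category names are made up):

# Validate an explicit training/test split and summarize the result.
classifier = ClassifierReborn::Bayes.new('Interesting', 'Uninteresting')
conf_mat = ClassifierReborn::ClassifierValidator.validate(classifier, training_data, test_data)
ClassifierReborn::ClassifierValidator.generate_stats(conf_mat)

# Or pass the classifier by name and let the validator run 10-fold cross-validation,
# constructing Bayes.new('Interesting', 'Uninteresting') internally.
ClassifierReborn::ClassifierValidator.cross_validate("Bayes", sample_data, 10, 'Interesting', 'Uninteresting')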

This implementation assumes that a classifier is validatable if it responds to the categories(), train(), classify(), and reset() methods. The latter is important because we now have persistent storage backend support, which needs to be cleared before any training is done for validation; this can be made optional so that reset() is only called if the classifier responds to it. Unfortunately, LSI does not implement a train() method; instead it has a corresponding add_item() method with a different parameter fingerprint. However, to make the API uniform, we could add a method with the train([categories...], text) signature that internally calls add_item().
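
For instance, a possible adapter along those lines, assuming LSI's current add_item(item, *categories) signature (a sketch only, not a settled API):

module ClassifierReborn
  class LSI
    # Uniform training interface: accepts a category (or an array of categories)
    # plus the text, and delegates to the existing add_item method.
    def train(categories, text)
      add_item(text, *Array(categories))
    end
  end
end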

@Ch4s3
Member

Ch4s3 commented Jan 17, 2017

I think the last approach seems to be the best.

@ibnesayeed
Contributor

@Ch4s3: I think the last approach seems to be the best.

I will send a PR later this week then.

@Ch4s3
Member

Ch4s3 commented Jan 17, 2017

awesome thanks @ibnesayeed!

@marciovicente
Author

Ow! Nice job! @ibnesayeed 👏

@ibnesayeed
Contributor

Thanks @marciovicente, I would appreciate it if you could have a look at #142 and provide feedback on what else could possibly be done beyond what is implemented and planned so far.

@ibnesayeed
Contributor

@marciovicente would you like to have a look at the validation documentation and provide any feedback?

@Ch4s3 if the documentation is satisfactory then this issue can be closed.

@Ch4s3
Member

Ch4s3 commented Feb 22, 2017

thanks @ibnesayeed

@Ch4s3 Ch4s3 closed this as completed Feb 22, 2017