Thoughts about validation #71
Comments
That seems interesting. I'm not sure I would know how to do that correctly myself, but if you're willing to take a stab at it, I'm happy to answer questions about the code base.
@marciovicente are you still interested in doing this?
@Ch4s3 I'm sorry, I was very busy over the last month. I am still interested in contributing. I already have some code for this in a personal project, and I intend to move it over and improve its quality, which is currently quite poor. I'll submit a pull request solving this issue as soon as possible.
@marciovicente Awesome, there's no huge rush to get it in. I don't have any looming release deadlines, so I'll look it over as it comes in.
@Ch4s3 Just to keep you updated, I've created the confusion matrix for binary samples, built when the user calls the validate method. I'm now working on the precision, recall, and F-measure metrics, and I hope to submit a pull request soon.
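The metrics mentioned here follow directly from a binary confusion matrix. A minimal sketch, assuming the matrix is laid out as `conf_mat[actual][predicted] = count` (the `:spam`/`:ham` labels and counts below are made up for illustration):

```ruby
# Derive precision, recall, and F-measure for one positive class from a
# confusion matrix shaped as conf_mat[actual][predicted] = count.
def binary_metrics(conf_mat, positive)
  tp = conf_mat[positive][positive]
  # False positives: records of other classes predicted as the positive class.
  fp = conf_mat.sum { |actual, row| actual == positive ? 0 : row[positive] }
  # False negatives: positive records predicted as some other class.
  fn = conf_mat[positive].sum { |predicted, count| predicted == positive ? 0 : count }
  precision = tp.to_f / (tp + fp)
  recall    = tp.to_f / (tp + fn)
  f1        = 2 * precision * recall / (precision + recall)
  { precision: precision, recall: recall, f1: f1 }
end

# Hypothetical counts for a spam/ham classifier.
conf_mat = {
  spam: { spam: 40, ham: 10 },  # 40 true positives, 10 false negatives
  ham:  { spam: 5,  ham: 45 }   # 5 false positives, 45 true negatives
}
metrics = binary_metrics(conf_mat, :spam)
```

With these counts, recall is 40/50 = 0.8 and precision is 40/45 ≈ 0.889.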
@marciovicente ping
Ok, keep me updated and I'll try to get to the bottom of it.
@marciovicente Can you take a look at #92? We added a big test case using real spam filtering data, and we're able to match the expected results. It would be great if we could find cases where we don't perform as well.
@Ch4s3 I created a gist with some samples of my data. Look at the last line; there's one record with the class: https://gist.github.com/marciovicente/f836c302697a786c12bb7721fdd5dd2c
I am glad that I saw this ticket. I was thinking about validation for the last few weeks. In fact, I wrote some code to get the
Based on the intent, the API might change. Here are a few options that I considered.
```ruby
module ClassifierReborn
  module ClassifierValidator
    module_function

    def evaluate(classifier, test_data)
      conf_mat = {}
      categories = classifier.categories
      categories.each do |actual|
        conf_mat[actual] = {}  # initialize the inner hash before writing into it
        categories.each do |predicted|
          conf_mat[actual][predicted] = 0
        end
      end
      test_data.each do |rec|
        conf_mat[rec.first][classifier.classify(rec.last)] += 1
      end
      conf_mat
    end

    def validate(classifier, training_data, test_data)
      classifier.reset()
      training_data.each do |rec|
        classifier.train(rec.first, rec.last)
      end
      evaluate(classifier, test_data)
    end

    def cross_validate(classifier, sample_data, fold = 10, *options)
      # fold = 10 is a concrete default in place of the undefined k in the draft
      classifier = ClassifierReborn.const_get(classifier).new(*options) if classifier.is_a?(String)
      sample_data.shuffle!
      partition_size = sample_data.length / fold
      partitioned_data = sample_data.each_slice(partition_size)
      conf_mats = []
      fold.times do |i|
        training_data = partitioned_data.take(fold)
        test_data = training_data.slice!(i)
        conf_mats << validate(classifier, training_data.flatten!(1), test_data)
      end
      classifier.reset()
      generate_stats(conf_mats)
      # Optionally, generate time and memory profiles for individual and accumulated iterations
    end

    def generate_stats(*conf_mats)
      # Derive various statistics for one or more supplied confusion matrices
      # Report summary based on individual and accumulated confusion matrices
    end
  end
end
```

In my opinion, this is the best way to go, as it covers all the intents described earlier. Additionally, Rake tasks, tests, and benchmarks can utilize it if needed.
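The `generate_stats` method above is left as a stub. One plausible shape for it, sketched as a standalone function: accumulate the per-fold matrices and report overall accuracy. The `conf_mat[actual][predicted] = count` layout matches the code above; everything else (names, the reported fields, the sample matrices) is an assumption, not the library's API.

```ruby
# Sketch: accumulate several confusion matrices (one per fold) and derive
# overall accuracy. Layout assumed: conf_mat[actual][predicted] = count.
def generate_stats(conf_mats)
  total = Hash.new { |h, k| h[k] = Hash.new(0) }
  conf_mats.each do |cm|
    cm.each { |actual, row| row.each { |predicted, n| total[actual][predicted] += n } }
  end
  correct = total.sum { |actual, row| row[actual] }   # diagonal entries
  all     = total.sum { |_, row| row.values.sum }     # every prediction made
  { accumulated: total, accuracy: correct.to_f / all }
end

# Two hypothetical folds of a two-class problem.
stats = generate_stats([
  { a: { a: 8, b: 2 }, b: { a: 1, b: 9 } },
  { a: { a: 7, b: 3 }, b: { a: 2, b: 8 } }
])
```

Per-class precision and recall could be layered on the same accumulated hash.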
One or more confusion matrices generated by manually calling `evaluate` or `validate` can be passed to `generate_stats`. This implementation assumes that a classifier is validatable if it responds to `categories`, `classify`, `train`, and `reset`.
I think the last approach seems to be the best.
I will send a PR later this week then.
awesome thanks @ibnesayeed!
Ow! Nice job! @ibnesayeed 👏
Thanks @marciovicente, I would appreciate it if you could have a look at #142 and provide feedback on what else can possibly be done beyond what is implemented and planned so far.
@marciovicente would you like to have a look at the validation documentation and provide any feedback? @Ch4s3 if the documentation is satisfactory, then this issue can be closed.
thanks @ibnesayeed
I'm using this gem and it seems awesome. But I have some questions about validation.
What are you using to check the accuracy of a specific model? I've looked through the documentation and couldn't find anything about validation. So I've been thinking it would be a great feature to implement a module that measures the precision of a specific model.
In my academic work, I created a simple model validator to check the accuracy, and it works fine for me. I had thought of implementing an initial `validate` method that returns:
I think it's also possible to create a "fake" cross-validation method (or maybe the real one), optionally passing a number of folds, which would be the number of iterations to shuffle the testing dataset and re-classify it. So the output would be something like:
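The fold-based splitting described here can be sketched independently of any classifier. A minimal example, with made-up toy records and a fixed seed purely for repeatability, showing how each iteration holds out one fold as test data and trains on the rest:

```ruby
# Sketch: shuffle labeled records and partition them into k folds, then
# build k train/test splits, each holding out a different fold.
# The data and the fixed seed are hypothetical, for illustration only.
data = (1..10).map { |i| [i.odd? ? :odd : :even, "record #{i}"] }
k = 5

shuffled = data.shuffle(random: Random.new(42))
folds = shuffled.each_slice(shuffled.length / k).to_a

splits = k.times.map do |i|
  test  = folds[i]
  train = (folds[0...i] + folds[(i + 1)..]).flatten(1)
  [train, test]
end
```

Each split would then be fed to something like the `validate(classifier, train, test)` call discussed above, and the resulting per-fold metrics averaged.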