Model Mining class - Keras blog comparison #99
Comments
Ah yeah, that might be fair to explore further. The general point I'm trying to make is that the rule-based model certainly seems competitive, but if the validation split size has a large effect, I should investigate that further.
I just ran the numbers locally and I can confirm your finding. I'll add a comment to the post to take everything with a grain of salt; everything you describe here is fair to add. But it strikes me that the conclusion still remains the same. The Keras blog lists a 0.24 precision score, which is terrible. Even at p=0.75/r=0.63, I would argue that the rule-based system has a lot going for it compared to the Keras numbers.
I've just pushed a PR with the comment (it should be live within a minute). The comment will link to the conversation here as well. I'm closing the issue, but if folks want to keep discussing what might be a fair comparison here: feel free. To be perfectly honest, as much as I like the technique, the methodology around it has some rough edges. Since there is a human in the loop, it can be incredibly tempting to update the rules to optimize for test set performance, which is exactly what we don't want during benchmarking.
Hi Vincent,
As mentioned many times in private messages, I am a big fan of Calm Code; it's an endless source of inspiration for what to try, really.
Looking at the Model Mining class today and the comparison with the performance claimed in the Keras blog, I could not resist making a note on a fairer comparison.
Problem
Looking at the blog post, the Keras validation set contains the last 20% of the data, while yours is a 50/50 split with shuffle=True. I'm not arguing about which way is better, but to compare the results more fairly one would aim to use the same test set (a sketch contrasting the two splits follows below). Assuming the fetch_openml method and the corresponding loading code from the Keras post yield the same data (not a different ordering), the performance of the human-learn based classifier would drop from the originally reported scores to noticeably lower ones.
Please consider this just an observation; I don't mean it to reflect badly on anything.
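To make the difference concrete, here is a minimal sketch of the two splitting strategies; the arrays are placeholders standing in for the credit card data, and the random_state is chosen arbitrarily:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the credit card features/labels.
X = np.arange(100).reshape(50, 2)
y = np.array([0] * 45 + [1] * 5)

# Calm Code style: a shuffled 50/50 split.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.5, shuffle=True, random_state=42
)

# Keras blog style: the final 20% of rows, kept in their original order.
num_val = len(X) // 5
X_train_k, X_val_k = X[:-num_val], X[-num_val:]
y_train_k, y_val_k = y[:-num_val], y[-num_val:]
```

Because the unshuffled split evaluates on a different slice of the data than the shuffled one, the two setups are not measuring the same thing.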
Proposed change
Full code used
Update of Calm Code Model Mining to reflect the same validation set as the Keras blog.
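The attached script is not reproduced above, but a minimal sketch of such a comparison might look like the following. The OpenML dataset name "creditcard", the use of hulearn's FunctionClassifier, and the fraud_rule function (with its V14 threshold) are all assumptions standing in for the actual mined rules:

```python
from sklearn.datasets import fetch_openml
from sklearn.metrics import classification_report
from hulearn.classification import FunctionClassifier

# Load the credit card fraud data; "creditcard" on OpenML is assumed
# here to match the CSV used in the Keras blog post.
X, y = fetch_openml("creditcard", version=1, as_frame=True, return_X_y=True)
y = y.astype(int)

# Keras blog style validation set: the last 20% of rows, no shuffling.
num_val = len(X) // 5
X_train, X_val = X.iloc[:-num_val], X.iloc[-num_val:]
y_train, y_val = y.iloc[:-num_val], y.iloc[-num_val:]

def fraud_rule(df, threshold=-4.0):
    # Hypothetical stand-in for the rules mined in the Model Mining class:
    # flag a transaction as fraud when feature V14 is strongly negative.
    return (df["V14"] < threshold).astype(int)

clf = FunctionClassifier(fraud_rule)
clf.fit(X_train, y_train)
print(classification_report(y_val, clf.predict(X_val)))
```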
Sources
https://calmcode.io/model-mining/benchmark.html
https://keras.io/examples/structured_data/imbalanced_classification/