Many tests don't make assertions #77

Open

amitkgupta opened this issue Aug 22, 2014 · 3 comments
@amitkgupta
Contributor

A large number of tests don't make any assertions; they just print things out, e.g. a confusion matrix summary. That seems to defeat the purpose of having automated tests. There is some value to these tests: if they pass, they tell you the algorithm runs end to end without blowing up. But I've noticed when running in verbose mode that some of the algorithms produce 0% accuracy, and when I tweaked parameters to improve the accuracy, the tests crashed with a memory panic.

It seems like having actual assertions (in addition to being valuable in their own right) would have helped catch that kind of thing earlier. I'm willing to volunteer to work through the tests and make sure they're all making valuable assertions; someone just needs to assign the issue to me.

While I'm at it, any ideas for what sorts of things would be valuable to assert on? At a minimum, for tests that were printing out confusion matrices, asserting on the overall accuracy is a start.
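Roughly what I have in mind, as a sketch (the dataset path, the 0.70 threshold, and the exact signatures of the `evaluation.GetConfusionMatrix`/`GetAccuracy` helpers are my assumptions here and may differ between versions):

```go
package trees_test

import (
	"testing"

	"github.com/sjwhitworth/golearn/base"
	"github.com/sjwhitworth/golearn/evaluation"
	"github.com/sjwhitworth/golearn/trees"
)

// TestID3AccuracyOnIris asserts on overall accuracy instead of just printing
// the confusion matrix summary. Path, threshold and signatures are assumptions
// for illustration only.
func TestID3AccuracyOnIris(t *testing.T) {
	iris, err := base.ParseCSVToInstances("../examples/datasets/iris_headers.csv", true)
	if err != nil {
		t.Fatalf("could not load dataset: %v", err)
	}
	trainData, testData := base.InstancesTrainTestSplit(iris, 0.60)

	// (The discretisation step used by the existing tests is elided here.)
	tree := trees.NewID3DecisionTree(0.6)
	tree.Fit(trainData)

	predictions, err := tree.Predict(testData)
	if err != nil {
		t.Fatalf("prediction failed: %v", err)
	}

	cm, err := evaluation.GetConfusionMatrix(testData, predictions)
	if err != nil {
		t.Fatalf("could not build confusion matrix: %v", err)
	}

	// The actual assertion: overall accuracy must clear a sane floor.
	if acc := evaluation.GetAccuracy(cm); acc < 0.70 {
		t.Errorf("overall accuracy too low: got %.2f, want >= 0.70", acc)
	}
}
```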

@Sentimentron
Collaborator

0% accuracy? That sounds pretty unusual.

I think you need to be a collaborator to get assigned to this issue. You've had a pull request merged, so I'm sure @sjwhitworth can sort that out.

@amitkgupta
Contributor Author

Yup, 0% for one of the three iris labels, not 0% overall. It didn't happen every time, only in about 1 in 4 runs of the test. Increasing the size of the random forest helped, but then I started seeing the memory panics, which you know about and have since fixed.
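That's also why an overall-accuracy assertion alone wouldn't have caught it; a per-class check would. Here's a sketch of the kind of helper I mean (it assumes the confusion matrix is the usual nested map of reference class → predicted class counts, and the 0.05 floor is just an illustrative value):

```go
package trees_test

import "testing"

// assertPerClassRecall fails the test if any class present in the reference
// data falls below a minimum recall - the check that would have flagged one
// iris label scoring 0%. The nested-map confusion matrix layout and the
// threshold are assumptions for illustration.
func assertPerClassRecall(t *testing.T, cm map[string]map[string]int, minRecall float64) {
	for refClass, row := range cm {
		total := 0
		for _, count := range row {
			total += count
		}
		if total == 0 {
			continue // class not present in the test split
		}
		recall := float64(row[refClass]) / float64(total)
		if recall < minRecall {
			t.Errorf("recall for class %q too low: got %.2f, want >= %.2f",
				refClass, recall, minRecall)
		}
	}
}
```

It would be called right after building the confusion matrix, e.g. `assertPerClassRecall(t, cm, 0.05)`.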

@Sentimentron
Collaborator

One way I found of correcting the problem is to reduce the significance level for discretisation to 0.6 and also to reduce the amount of data used for pruning. This helps the ID3 algorithm build a better tree and seems to reduce the variability of the results (at least on my machine).
Relevant branch.
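For reference, the two knobs in the test setup look roughly like this (a sketch following golearn's `filters`/`trees` packages; the 0.3 prune split is an assumed value, and the exact numbers are whatever is on the branch):

```go
package trees_test

import (
	"github.com/sjwhitworth/golearn/base"
	"github.com/sjwhitworth/golearn/filters"
	"github.com/sjwhitworth/golearn/trees"
)

// buildID3 sketches the tweak described above: a lower ChiMerge significance
// level for discretisation and a smaller pruning split, so ID3 grows its tree
// from more of the training data. Values are illustrative assumptions.
func buildID3(trainData base.FixedDataGrid) (*trees.ID3DecisionTree, base.FixedDataGrid) {
	// Discretise the continuous attributes with the significance lowered to 0.6.
	filt := filters.NewChiMergeFilter(trainData, 0.6)
	for _, attr := range base.NonClassFloatAttributes(trainData) {
		filt.AddAttribute(attr)
	}
	filt.Train()
	discretised := base.NewLazilyFilteredInstances(trainData, filt)

	// Hand ID3 a smaller prune split (0.3 is an assumed value).
	tree := trees.NewID3DecisionTree(0.3)
	tree.Fit(discretised)
	return tree, discretised
}
```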
