Proof Ckmeans #124
Here's a much smaller test case:
It's clear that [(0), (3,4)] has a much smaller cluster distance than [(0,3), (4)], so something is wrong with our algorithm. To find that test case, I wrote some property-based tests with @DRMacIver's excellent hypothesis library. Although the testing code still needs work[1], this looks like a good counterexample: it fails on both of our algorithms in the same way.

[1]: Right now, if you run it and it finds a counterexample, the next run may complain about flaky tests because of how I'm generating the data. I'm working on fixing that now.
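To make the comparison concrete, here is a minimal sketch. It assumes that "cluster distance" means the total within-cluster sum of squared deviations (the quantity ckmeans minimizes) and uses the values implied by the partitions above, 0, 3 and 4. The helpers `sumSquaredDeviations` and `bestTwoClusterSplit` are hypothetical and not part of the library; the brute force just tries every contiguous split, which is plenty for tiny counterexamples like this one.

```javascript
// Hypothetical helper: sum of squared deviations of the values from their mean.
function sumSquaredDeviations(values) {
  if (values.length === 0) return 0;
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  return values.reduce((acc, v) => acc + (v - mean) ** 2, 0);
}

// Hypothetical brute-force oracle: try every split of already-sorted data into
// two contiguous clusters and keep the one with the smallest total cost.
function bestTwoClusterSplit(sorted) {
  let best = null;
  for (let i = 1; i < sorted.length; i++) {
    const left = sorted.slice(0, i);
    const right = sorted.slice(i);
    const cost = sumSquaredDeviations(left) + sumSquaredDeviations(right);
    if (best === null || cost < best.cost) {
      best = { clusters: [left, right], cost };
    }
  }
  return best;
}

const data = [0, 3, 4];
console.log(sumSquaredDeviations([0]) + sumSquaredDeviations([3, 4])); // 0.5
console.log(sumSquaredDeviations([0, 3]) + sumSquaredDeviations([4])); // 4.5
console.log(bestTwoClusterSplit(data).clusters); // [ [ 0 ], [ 3, 4 ] ]
```

An exhaustive check like this only scales to very small inputs, but that is exactly the size of counterexample a property-based test shrinks down to.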
OK, I improved example generation and updated the gist. Another small example is:
I also just noticed that the tests in the test_ckmeans.js file are indeed wrong, as the comment in there suggests.
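One way to cross-check expected values in such tests is an independent, deliberately simple oracle. The sketch below is a plain O(k·n²) dynamic program over sorted input, not the project's ckmeans code. `optimalClusterCost` is a hypothetical name, and it returns only the minimum total within-cluster sum of squared deviations, not the cluster assignments.

```javascript
// Hypothetical reference oracle (not the library's implementation): optimal
// 1-D k-means cost by plain O(k * n^2) dynamic programming.
// Assumes `data` is sorted ascending and 1 <= k <= data.length.
function optimalClusterCost(data, k) {
  const n = data.length;
  // Prefix sums of x and x^2 let us compute any segment's cost in O(1).
  const s1 = new Array(n + 1).fill(0);
  const s2 = new Array(n + 1).fill(0);
  for (let i = 0; i < n; i++) {
    s1[i + 1] = s1[i] + data[i];
    s2[i + 1] = s2[i] + data[i] * data[i];
  }
  // Sum of squared deviations of data[j..i] (inclusive, 0-based).
  // Math.max guards against tiny negative values from rounding.
  const segmentCost = (j, i) => {
    const count = i - j + 1;
    const sum = s1[i + 1] - s1[j];
    return Math.max(0, s2[i + 1] - s2[j] - (sum * sum) / count);
  };
  // cost[m][i]: best cost of splitting data[0..i] into m + 1 clusters.
  // Infinity marks states that are not yet (or never) reachable.
  const cost = Array.from({ length: k }, () => new Array(n).fill(Infinity));
  // One cluster: everything from index 0 up to i, so the first element is included.
  for (let i = 0; i < n; i++) cost[0][i] = segmentCost(0, i);
  for (let m = 1; m < k; m++) {
    for (let i = m; i < n; i++) {
      // The last cluster is data[j..i]; j >= m leaves at least one element
      // for each of the m earlier clusters.
      for (let j = m; j <= i; j++) {
        const candidate = cost[m - 1][j - 1] + segmentCost(j, i);
        if (candidate < cost[m][i]) cost[m][i] = candidate;
      }
    }
  }
  return cost[k - 1][n - 1];
}

console.log(optimalClusterCost([0, 3, 4], 2)); // 0.5
```

The prefix-sum formula can lose precision on large values, so comparing an implementation against it is safer with a small tolerance than with exact equality.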
I think we're failing to consider the first element of the array, because this code:
never considers
(edit: no, not plausible)
It may be useful to note that you can ask hypothesis to give you a Random instance, which will remove any flakiness you get from your use of random.sample.
I think I fixed it! After this commit hypothesis still finds some errors, but they appear to be related to MAX_INT and/or floating point checking. The two errors fixed in that commit are:
Okay: ckmeans is polished up! Rolling out a release.