Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kmeans: small fix with inf variance #481

Merged
merged 1 commit into from Nov 23, 2015
Merged

kmeans: small fix with inf variance #481

merged 1 commit into from Nov 23, 2015

Conversation

arnike
Copy link

@arnike arnike commented Nov 22, 2015

Hi there!

There seems to be no check on the cluster size when it's decreased, yet it's used afterwards to update the variance. So, if the cluster size reaches zero, the variance is infinity. As a result, the cluster gets chosen in the next iteration and causes access to furthestPoint == data.n_cols, which is rather sad.
I simply added the same condition as in Precalculate.

rcurtin added a commit that referenced this pull request Nov 23, 2015
kmeans: small fix with inf variance
@rcurtin rcurtin merged commit 1bb8434 into mlpack:master Nov 23, 2015
@rcurtin
Copy link
Member

rcurtin commented Nov 23, 2015

Hi there,

Thanks for the contribution! I've merged it in, which will help reduce general sadness (to me, this seems like a good thing).

I also made a slight change: the way MaxVarianceNewCluster works is that it will find the cluster with maximum variance and pull points from it to use as the centroids of empty clusters. But it does this by precalculating (once each iteration) which cluster has maximum variance. However, the situation you've stumbled upon can only happen if EmptyCluster() is called enough times that every point is taken out of the maximum cluster. Therefore, I've modified things slightly in ec8a151 so that a new cluster with maximum variance is found if the maximum variance cluster has only one point in it.

@rcurtin
Copy link
Member

rcurtin commented Nov 23, 2015

One more thought -- would you like me to add your name/email to the contributors list? If so, let me know what name/email I should use. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants