Zero assigned clusters leading to zero means #5

Open
queenp opened this issue Oct 7, 2017 · 1 comment
queenp commented Oct 7, 2017

The current design chooses the first k points as starting values.

If any of these k points are identical, the duplicated centres tie on every distance, so the first of them is assigned all the points and the second is assigned none. The mean of the second cluster is then computed over 0 members and comes out as NaN, which derails the whole clustering algorithm.
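To make the failure concrete, here is a minimal sketch in plain Python. It is not this project's code: the function names `assign` and `cluster_means` are hypothetical, and it assumes ties between equally distant centres go to the lowest index. Python raises on a float division by zero, so the NaN that an IEEE 0/0 division would produce is written out explicitly.

```python
import math


def assign(points, centres):
    """Label each point with the index of its nearest centre.

    Assumption: ties are broken in favour of the lowest centre index.
    """
    labels = []
    for p in points:
        dists = [math.dist(p, c) for c in centres]
        labels.append(dists.index(min(dists)))
    return labels


def cluster_means(points, labels, k):
    """Return the mean of each cluster.

    An empty cluster gets NaN coordinates, standing in for the 0/0
    division a naive mean over 0 members would perform.
    """
    dim = len(points[0])
    means = []
    for j in range(k):
        members = [p for p, label in zip(points, labels) if label == j]
        if members:
            means.append(
                tuple(sum(p[d] for p in members) / len(members) for d in range(dim))
            )
        else:
            means.append((float("nan"),) * dim)
    return means


# The first two points are identical, so taking the first k = 2 points
# as centres gives two centres at the same location.
points = [(1.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
centres = points[:2]
labels = assign(points, centres)            # every point lands in cluster 0
means = cluster_means(points, labels, k=2)  # cluster 1 is empty, its mean is NaN
```

Once one centre is NaN, every distance to it is NaN as well, so later iterations cannot recover on their own.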

There are 2 solutions I can think of to avoid this condition:

  • Select the first k distinct points for centres.
  • Move any centre which ends up with a cluster of size 0 to a random other point.

The first option seems simpler and more predictably performant, so it looks like the better one to start from.
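Both options can be sketched roughly as follows, again in plain Python rather than this project's code. The names `first_k_distinct` and `reseed_empty` are hypothetical, points are assumed to be tuples compared by exact equality, and "a random other point" is interpreted here as a point that is not already used as a centre.

```python
import random


def first_k_distinct(points, k):
    """Option 1: take the first k distinct points as starting centres."""
    centres = []
    for p in points:
        if p not in centres:  # exact equality; O(k) check per point
            centres.append(p)
            if len(centres) == k:
                return centres
    raise ValueError("fewer than k distinct points in the data")


def reseed_empty(centres, labels, points, rng=random):
    """Option 2: move every centre whose cluster is empty to a random point.

    Assumption: the replacement is drawn from points that are not
    currently centres, falling back to all points if none remain.
    """
    counts = [0] * len(centres)
    for label in labels:
        counts[label] += 1
    new_centres = list(centres)
    for j, n in enumerate(counts):
        if n == 0:
            candidates = [p for p in points if p not in new_centres] or points
            new_centres[j] = rng.choice(candidates)
    return new_centres
```

Option 1 runs once before the first iteration and is deterministic. Option 2 has to run after every assignment step, but it also covers clusters that become empty later in the run.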

@Stunkymonkey

The same problem appeared for me today. I have not tested #6 yet, but @huonw, please look into it.
