The current design chooses the first k points as starting values.
If any of these data points are identical, the first of the duplicate centres is assigned all the points and the second is assigned none. The second cluster's mean is then computed over 0 members, which yields NaN and derails the whole clustering algorithm.
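A minimal sketch of the failure, not taken from the project's code: the `assign` helper and the sample `points` are hypothetical, and it assumes that ties in distance go to the lowest-indexed centre. With two identical starting centres, the second one never wins a point.

```python
def assign(points, centres):
    """Return, for each point, the index of its nearest centre.

    Assumption: ties are broken in favour of the lowest centre index.
    """
    labels = []
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centres]
        labels.append(dists.index(min(dists)))  # index() returns the first minimum
    return labels


# Hypothetical data: the first two points are identical.
points = [(1.0, 1.0), (1.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
k = 2

centres = points[:k]  # first-k initialisation: both centres are (1.0, 1.0)
labels = assign(points, centres)
sizes = [labels.count(j) for j in range(k)]  # number of members per cluster
```

Here the second cluster ends up with no members, so its mean is a sum of nothing divided by 0. In IEEE floating-point arithmetic that is NaN, and every later distance to that centre is NaN as well.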
There are 2 solutions I can think of to avoid this condition:
1. Select the first k distinct points for centres.
2. Move any centre which ends up with a cluster of size 0 to a random other point.
The first one seems simpler and more predictable in performance, so it looks like the better place to start.
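The two options could look roughly like this. This is a sketch under assumptions, not the project's implementation: `first_k_distinct` and `reseed_empty` are hypothetical names, points are assumed to be sequences of numbers, and "distinct" is taken to mean exact equality.

```python
import random


def first_k_distinct(points, k):
    """Option 1: use the first k distinct points, in input order, as centres."""
    centres = []
    seen = set()
    for p in points:
        key = tuple(p)
        if key not in seen:
            seen.add(key)
            centres.append(key)
            if len(centres) == k:
                return centres
    raise ValueError(f"need {k} distinct points, found only {len(centres)}")


def reseed_empty(centres, labels, points, rng=random):
    """Option 2: move every centre with an empty cluster onto a random data point."""
    counts = [0] * len(centres)
    for j in labels:
        counts[j] += 1
    return [
        c if counts[j] > 0 else tuple(rng.choice(points))
        for j, c in enumerate(centres)
    ]


# Hypothetical data with a duplicated first point.
points = [(1.0, 1.0), (1.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
initial = first_k_distinct(points, 2)
```

Option 1 is deterministic and costs one pass over the data, but it has to handle inputs with fewer than k distinct points (the sketch raises an error). Option 2 as written may pick a point that coincides with another centre, so it would need to run after every assignment step, not only once.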