Initial kmeans centroids without replacement (Trac #1078) #1605

scipy-gitbot opened this Issue Apr 25, 2013 · 1 comment


None yet

1 participant


Original ticket on 2009-12-28 by @kwgoodman, assigned to unknown.

The kmeans function has two modes. In one of the modes the initial
guesses for the centroids are randomly selected from the input data.
The selection is currently done with replacement:

guess = take(obs, randint(0, No, k), 0)

That means some of the centroids in the intial guess might be the
same. I think it would be better to select without replacement:

guess = take(obs, rand(No).argsort()[:k], 0)

Here's an extreme example of what can go wrong if the selection is
done with replacement:

>> obs

array([[ 1,  1],
      [-1, -1],
      [-1,  1],
      [ 1, -1]])
>> vq.kmeans(obs, k_or_guess=4)

(array([[-1, -1],
      [-1,  1],
      [ 1, -1],
      [ 1,  1]]), 0.0) # <--- good
>> k_or_guess = obs[[1,1,1,1],:]
>> k_or_guess

array([[-1, -1],
      [-1, -1],
      [-1, -1],
      [-1, -1]])
>> vq.kmeans(obs, k_or_guess)
  (array([[0, 0]]), 1.4142135623730951) # <--- not as good

In most cases it won't make any difference. But the cost of the code
change is small.


Attachment added by @kwgoodman on 2009-12-28: patch.diff

@tonysyu tonysyu pushed a commit to tonysyu/scipy that referenced this issue May 9, 2013
@pv pv BUG: spatial/qhull: discard degenerate barycentric transforms (fixes #…

For some point sets, the Delaunay triangulation produces nearly
degenerate simplices. These have singular barycentric transforms.

Previously, we discarded exactly singular transforms.  However,
computing the transform may succeed, but the result may still be invalid
and cause problems later on e.g. in _find_simplex_directed.

This commit adds a better check for validity of the transform: we
explicitly check the condition number, and drop those for which it is
large compared to machine precision.
@tonysyu tonysyu pushed a commit to tonysyu/scipy that referenced this issue May 9, 2013
@pv pv TST: spatial/qhull: add tests for finding simplices in nearly degener…
…ate cases

Related to #1605
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment