replace kmeans() with kmeans2()
This might be a solution for issue #10
slowkow committed Feb 2, 2022
1 parent 0e5b69d commit 2fd234e
Showing 3 changed files with 12 additions and 4 deletions.
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,10 @@
+# 0.0.6 - 2022-02-02
+
+- Replace `scipy.cluster.vq.kmeans` with `scipy.cluster.vq.kmeans2` to address
+issue #10 where we learned that kmeans does not always return k centroids,
+but kmeans2 does return k centroids. Thanks to @onionpork and @DennisPost10
+for reporting this.
+
# 0.0.5 - 2020-08-11

- Expose `max_iter_harmony` as a new top-level argument, in addition to the
7 changes: 4 additions & 3 deletions harmonypy/harmony.py
@@ -17,7 +17,8 @@

import pandas as pd
import numpy as np
-from scipy.cluster.vq import kmeans
+# kmeans does not always return k centroids, but kmeans2 does
+from scipy.cluster.vq import kmeans2
import logging

# create logger
@@ -185,8 +186,8 @@ def allocate_buffers(self):

     def init_cluster(self):
         # Start with cluster centroids
-        km = kmeans(self.Z_cos.T, self.K, iter=10)
-        self.Y = km[0].T
+        km_centroids, km_labels = kmeans2(self.Z_cos.T, self.K, minit='++')
+        self.Y = km_centroids.T
         # (1) Normalize
         self.Y = self.Y / np.linalg.norm(self.Y, ord=2, axis=0)
         # (2) Assign cluster probabilities
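
For readers following the hunk above, here is a minimal standalone sketch of the new initialization step. The function name `init_centroids`, the toy data, and the assumption that `Z_cos` has shape (features, cells) with unit-norm columns are illustrative and not taken from harmonypy itself; only the `kmeans2` call and the normalization mirror the diff.

```python
import numpy as np
from scipy.cluster.vq import kmeans2


def init_centroids(Z_cos, K):
    """Hypothetical helper mirroring the centroid setup in init_cluster."""
    # kmeans2 expects observations in rows, so transpose (d, N) -> (N, d).
    centroids, _labels = kmeans2(Z_cos.T, K, minit="++")  # shape (K, d)
    Y = centroids.T  # back to (d, K), one centroid per column
    # Scale every centroid (column) to unit L2 norm.
    return Y / np.linalg.norm(Y, ord=2, axis=0)


# Toy input (assumed for illustration): 4 features, 100 cells.
rng = np.random.default_rng(0)
Z = rng.normal(size=(4, 100))
Z_cos = Z / np.linalg.norm(Z, ord=2, axis=0)

Y = init_centroids(Z_cos, K=3)
print(Y.shape)
```

Because `kmeans2` returns one row per requested cluster, `Y` keeps exactly `K` columns, which is what downstream code sized by `self.K` would presumably rely on.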
2 changes: 1 addition & 1 deletion setup.py
@@ -5,7 +5,7 @@

 setuptools.setup(
     name = "harmonypy",
-    version = "0.0.5",
+    version = "0.0.6",
     author = "Kamil Slowikowski",
     author_email = "kslowikowski@gmail.com",
     description = "A data integration algorithm.",

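The behavior this commit relies on can be reproduced with a small sketch. The toy data below (only three distinct points, with `k = 5`) is an assumption chosen to force empty clusters; it is not the data from issue #10. The first `kmeans2` call uses `minit="points"` to keep the degenerate case simple, and the second uses `minit="++"` as in the commit, on ordinary random data.

```python
import warnings

import numpy as np
from scipy.cluster.vq import kmeans, kmeans2

k = 5

# Toy data (assumed for illustration): 3 distinct points, 10 copies each,
# so at most 3 clusters can ever be non-empty.
distinct = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
degenerate = np.repeat(distinct, 10, axis=0)

# kmeans() drops centroids whose clusters end up empty,
# so the codebook can have fewer than k rows.
codebook, _distortion = kmeans(degenerate, k)
n_kmeans = codebook.shape[0]

# kmeans2() keeps all k rows; an empty cluster only triggers a warning.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    centroids_deg, _ = kmeans2(degenerate, k, minit="points")

# Same call shape as the commit, on well-behaved random data.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 2))
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    centroids, labels = kmeans2(data, k, minit="++")

print(n_kmeans, centroids_deg.shape, centroids.shape)
```

On the degenerate data, `kmeans` returns at most three centroids, while `kmeans2` returns an array with five rows in both calls. Note that some of those rows may correspond to empty clusters, so a fixed shape does not by itself mean every centroid is informative.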