I have a precomputed distance matrix and want to find the medoids of the resulting clusters. According to the scikit-learn docs, you set a parameter and then read an attribute to retrieve these medoids. When I set the parameter store_centers="medoid" and access the attribute .medoids_, I receive this error:
Traceback (most recent call last):
  File "C:\Users\Desktop\Clustering\Model.py", line 163, in <module>
    cluster(df, 'test.txt')
  File "C:\Users\Desktop\Clustering\Model.py", line 139, in cluster
    clustering = hdb.fit(distance_matrix.tocsr())
  File "C:\Users\Desktop\Clustering\venv\lib\site-packages\sklearn\cluster\_hdbscan\hdbscan.py", line 854, in fit
    self._weighted_cluster_center(X)
  in _weighted_cluster_center
    dist_mat = pairwise_distances(
  File "C:\Users\Desktop\Clustering\venv\lib\site-packages\sklearn\metrics\pairwise.py", line 2157, in pairwise_distances
    X, _ = check_pairwise_arrays(
  File "C:\Users\Desktop\Clustering\venv\lib\site-packages\sklearn\metrics\pairwise.py", line 184, in check_pairwise_arrays
    raise ValueError(
ValueError: Precomputed metric requires shape (n_queries, n_indexed). Got (9, 2292) for 9 indexed.
I'm unsure how my square precomputed matrix ends up as a 9x2292 array. Otherwise the model works fine, and I have no trouble retrieving the medoids manually through an MSE operation. I want to obtain the medoids this way in the hope of finding the variable eps for each cluster, so that I can fit more data to the clusters.
Reproducible example:
```python
from fuzzywuzzy import fuzz
from sklearn.cluster import HDBSCAN
from scipy.sparse import lil_matrix
import itertools


def dis_matrix(word_list):
    count = 0
    kw_index = {}
    index_kw = {}
    n = len(word_list)
    distance_matrix = lil_matrix((n, n))
    for kw in word_list:
        kw_index[kw] = count
        index_kw[count] = kw
        count += 1
    for x, y in itertools.product(word_list, word_list):
        d = fuzz.ratio(x, y) / 100
        distance = 1 - d if d <= 1 else 0.00000000000001
        index1 = kw_index[x]
        index2 = kw_index[y]
        distance_matrix[index1, index2] = distance
        distance_matrix[index2, index1] = distance
    return distance_matrix, index_kw


CLUSTERING_MIN_SAMPLES = 2
x = ['apple', 'app', 'banana', 'bannana', 'applesauce', 'peaches', 'peach', "appban"]
distance_matrix, index_kw = dis_matrix(x)
hdb = HDBSCAN(cluster_selection_epsilon=.1, metric='precomputed', n_jobs=8,
              min_samples=CLUSTERING_MIN_SAMPLES, store_centers='medoid')
clustering = hdb.fit(distance_matrix.tocsr())
print(clustering.medoids_)
```
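For reference, the manual medoid retrieval mentioned above could look roughly like the sketch below. It assumes a dense, square NumPy distance matrix and the usual convention that noise points carry the label -1; the helper name `medoid_indices` and the toy data are made up for illustration. It also shows that selecting only the rows of a cluster yields a non-square (members x all points) slice, which has the same form as the (9, 2292) shape in the error message. Whether that is what happens inside scikit-learn is a guess from the traceback, not something confirmed here.

```python
import numpy as np


def medoid_indices(dist, labels):
    """Map each cluster label to the row index of its medoid.

    `dist` is assumed to be a dense, square distance matrix and
    `labels` one cluster label per row, with -1 meaning noise.
    """
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    medoids = {}
    for label in np.unique(labels):
        if label == -1:  # skip noise points
            continue
        members = np.flatnonzero(labels == label)
        # Restrict BOTH rows and columns to the cluster members,
        # so the sub-matrix stays square.
        sub = dist[np.ix_(members, members)]
        # The medoid is the member with the smallest total distance
        # to the other members of its cluster.
        medoids[int(label)] = int(members[np.argmin(sub.sum(axis=1))])
    return medoids


# Hypothetical toy data: two groups on a line plus one noise point.
points = np.array([0.0, 0.1, 0.3, 5.0, 5.2, 5.3, 9.0])
toy_dist = np.abs(points[:, None] - points[None, :])
toy_labels = np.array([0, 0, 0, 1, 1, 1, -1])

# Selecting rows only gives a (members, all points) slice, not a square matrix.
row_slice = toy_dist[toy_labels == 0]
print(row_slice.shape)

toy_medoids = medoid_indices(toy_dist, toy_labels)
print(toy_medoids)
```

With the reproducible example above, something like `medoid_indices(distance_matrix.toarray(), clustering.labels_)` should work once the model is fitted without `store_centers`, since the sketch expects a dense array rather than a sparse matrix.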