
Commit 3eacf94

sjtrny authored and NicolasHug committed
Set diagonal of precomputed matrix to zero in silhouette_samples (#12258)
1 parent 01ba635 commit 3eacf94


3 files changed: 35 additions, 2 deletions


doc/whats_new/v0.22.rst

Lines changed: 8 additions & 1 deletion
@@ -224,8 +224,15 @@ Changelog
   to return root mean squared error.
   :pr:`13467` by :user:`Urvang Patel <urvang96>`.

+:mod:`sklearn.metrics`
+......................
+
+- |Fix| Raise a ValueError in :func:`metrics.silhouette_score` when a
+  precomputed distance matrix contains non-zero diagonal entries.
+  :pr:`12258` by :user:`Stephen Tierney <sjtrny>`.
+
 :mod:`sklearn.model_selection`
-...............................
+..............................

 - |Enhancement| :class:`model_selection.learning_curve` now accepts parameter
   ``return_times`` which can be used to retrieve computation times in order to

sklearn/metrics/cluster/tests/test_unsupervised.py

Lines changed: 16 additions & 0 deletions
@@ -168,6 +168,22 @@ def test_non_numpy_labels():
         silhouette_score(list(X), list(y)) == silhouette_score(X, y))


+def test_silhouette_nonzero_diag():
+    # Construct a zero-diagonal matrix
+    dists = pairwise_distances(
+        np.array([[0.2, 0.1, 0.12, 1.34, 1.11, 1.6]]).transpose())
+
+    # Construct a nonzero-diagonal distance matrix
+    diag_dists = dists.copy()
+    np.fill_diagonal(diag_dists, 1)
+
+    labels = [0, 0, 0, 1, 1, 1]
+
+    assert_raise_message(ValueError, "distance matrix contains non-zero",
+                         silhouette_samples,
+                         diag_dists, labels, metric='precomputed')
+
+
 def assert_raises_on_only_one_label(func):
     """Assert message when there is only one label"""
     rng = np.random.RandomState(seed=0)

sklearn/metrics/cluster/unsupervised.py

Lines changed: 11 additions & 1 deletion
@@ -185,7 +185,8 @@ def silhouette_samples(X, labels, metric='euclidean', **kwds):
         The metric to use when calculating distance between instances in a
         feature array. If metric is a string, it must be one of the options
         allowed by :func:`sklearn.metrics.pairwise.pairwise_distances`. If X is
-        the distance array itself, use "precomputed" as the metric.
+        the distance array itself, use "precomputed" as the metric. Precomputed
+        distance matrices must have 0 along the diagonal.

     `**kwds` : optional keyword parameters
         Any further parameters are passed directly to the distance function.
@@ -210,6 +211,15 @@ def silhouette_samples(X, labels, metric='euclidean', **kwds):

     """
     X, labels = check_X_y(X, labels, accept_sparse=['csc', 'csr'])
+
+    # Check for diagonal entries in precomputed distance matrix
+    if metric == 'precomputed':
+        if np.any(np.diagonal(X)):
+            raise ValueError(
+                'The precomputed distance matrix contains non-zero '
+                'elements on the diagonal. Use np.fill_diagonal(X, 0).'
+            )
+
     le = LabelEncoder()
     labels = le.fit_transform(labels)
     n_samples = len(labels)
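
For illustration only, here is a minimal NumPy-only sketch of the check this commit adds and of the remedy named in its error message. The helper name `check_zero_diagonal` is hypothetical and not part of scikit-learn. The sample points are the ones used in the new test, but the distance matrix is built here with plain broadcasting instead of `pairwise_distances`, so the sketch does not depend on scikit-learn.

```python
import numpy as np


def check_zero_diagonal(dist):
    """Hypothetical helper mirroring the diagonal check added in this commit."""
    if np.any(np.diagonal(dist)):
        raise ValueError(
            'The precomputed distance matrix contains non-zero '
            'elements on the diagonal. Use np.fill_diagonal(X, 0).'
        )


# Pairwise absolute differences between six 1-D points; the diagonal is zero.
points = np.array([0.2, 0.1, 0.12, 1.34, 1.11, 1.6])
dists = np.abs(points[:, None] - points[None, :])
check_zero_diagonal(dists)  # passes silently

# Corrupt the diagonal, as the new test does.
bad = dists.copy()
np.fill_diagonal(bad, 1)
try:
    check_zero_diagonal(bad)
    raised = False
except ValueError:
    raised = True

# Apply the remedy suggested by the error message, then re-check.
np.fill_diagonal(bad, 0)
check_zero_diagonal(bad)
```

Note that `np.fill_diagonal` modifies its argument in place, so a caller who needs to keep the original matrix should copy it first, as the test does with `dists.copy()`.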
