Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Add error message when X passed to silhouette score is 1d (fixes #6190) #6278

Conversation

yenchenlin
Copy link
Contributor

I've added an error message to tell user that first argument passed into silhouette_score cannot be 1d.

Solve #6190
Can @amueller please review this?

@yenchenlin yenchenlin changed the title Add error message when X passed to silhouette score is 1d [MRG] Add error message when X passed to silhouette score is 1d Feb 5, 2016
@yenchenlin
Copy link
Contributor Author

I've tested this implementation with the following script:

import timeit
import numpy as np
from sklearn.metrics.cluster.unsupervised import silhouette_score

def test_silhouette_score():
    random_state = np.random.RandomState(seed=0)
    random_clusters = random_state.randint(low=0, high=2, size=(10 ** 3, 100))
    random_clusters2 = random_state.randint(low=0, high=2, size=10 ** 3)

    silhouette_score(random_clusters, random_clusters2)

time = timeit.Timer(test_silhouette_score)
print min(time.repeat(number=10 ** 3))

This script calls silhouette_score 1000 times with input data which consists of 1000 samples and 100 features.

original version prints 51.3936669827 s,
while the modified version prints 52.0506219864 s,
which means it barely affects the performance.

@yenchenlin yenchenlin force-pushed the add-silhouette-score-error-message branch 3 times, most recently from e280a1b to 3ff283e Compare February 7, 2016 04:46
@yenchenlin yenchenlin changed the title [MRG] Add error message when X passed to silhouette score is 1d [MRG] Add error message when X passed to silhouette score is 1d (fixes #6190) Feb 8, 2016
@@ -81,6 +82,8 @@ def silhouette_score(X, labels, metric='euclidean', sample_size=None,
<http://en.wikipedia.org/wiki/Silhouette_(clustering)>`_

"""
if np.array(X).ndim == 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should do X = np.asarray(X) here and check to avoid another array conversion inside check_X_y if X is a list etc

@MechCoder
Copy link
Member

LGTM except 2 nitpicks

@yenchenlin yenchenlin force-pushed the add-silhouette-score-error-message branch from 3ff283e to 15f32c7 Compare February 9, 2016 15:00
@yenchenlin yenchenlin force-pushed the add-silhouette-score-error-message branch from 15f32c7 to b9566ca Compare February 9, 2016 15:05
@yenchenlin
Copy link
Contributor Author

Hello @MechCoder ,
Thanks for your opinion.

I've modified the code, can you please have a look?

@@ -81,6 +82,9 @@ def silhouette_score(X, labels, metric='euclidean', sample_size=None,
<http://en.wikipedia.org/wiki/Silhouette_(clustering)>`_

"""
X = np.asarray(X)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do it after check_X_y and you don't have to call asarray. I was hoping there was an option to check_X_y to enforce this, but it doesn't look like this.

@yenchenlin
Copy link
Contributor Author

Close since it is fixed.

@yenchenlin yenchenlin closed this Aug 10, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants