Example document_clustering.py has broken #6317

Closed
yenchenlin opened this issue Feb 9, 2016 · 5 comments · Fixed by #7170

@yenchenlin
Contributor

It seems that examples/text/document_clustering.py is broken because silhouette_score doesn't accept sparse matrices.

The error message is as follows:
Traceback (most recent call last):
  File "examples/text/document_clustering.py", line 202, in <module>
    % metrics.silhouette_score(X, km.labels_, sample_size=1000))
  File "/Users/YenChen/Desktop/Python/scikit-learn/sklearn/metrics/cluster/unsupervised.py", line 84, in silhouette_score
    X, labels = check_X_y(X, labels)
  File "/Users/YenChen/Desktop/Python/scikit-learn/sklearn/utils/validation.py", line 516, in check_X_y
    ensure_min_features, warn_on_dtype, estimator)
  File "/Users/YenChen/Desktop/Python/scikit-learn/sklearn/utils/validation.py", line 375, in check_array
    force_all_finite)
  File "/Users/YenChen/Desktop/Python/scikit-learn/sklearn/utils/validation.py", line 242, in _ensure_sparse_format
    raise TypeError('A sparse matrix was passed, but dense '
TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.
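
Until silhouette_score handles sparse input, the workaround the error message points to is to densify the matrix at the call site. The following is only a minimal sketch: the toy data stands in for the TF-IDF matrix that the example builds, and it assumes the densified matrix fits in memory.

```python
import numpy as np
from scipy import sparse
from sklearn import metrics
from sklearn.cluster import KMeans

# Hypothetical toy data: three well-separated groups of 20 rows each,
# standing in for the sparse TF-IDF matrix used in the example script.
rng = np.random.RandomState(0)
X_dense = np.zeros((60, 6))
for group in range(3):
    rows = slice(20 * group, 20 * (group + 1))
    cols = slice(2 * group, 2 * (group + 1))
    X_dense[rows, cols] = 1.0 + 0.1 * rng.rand(20, 2)
X = sparse.csr_matrix(X_dense)

# KMeans itself accepts sparse input.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Convert to a dense array before scoring, as the TypeError suggests.
score = metrics.silhouette_score(
    X.toarray(), km.labels_, sample_size=50, random_state=0
)
print("Silhouette Coefficient: %0.3f" % score)
```

For a large corpus the dense copy may be too big, which is one reason to prefer fixing silhouette_score itself.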

@amueller
Member

amueller commented Feb 9, 2016

this is weird. ping @MechCoder

@amueller amueller added the Bug label Feb 9, 2016
@yenchenlin
Contributor Author

Hi @amueller, this error occurs because examples/text/document_clustering.py passes a sparse matrix to silhouette_score.
Here:
https://github.com/scikit-learn/scikit-learn/blob/master/examples/text/document_clustering.py#L202

However, silhouette_score calls check_X_y, whose accept_sparse argument defaults to None, so the sparse input is rejected with the error I mentioned above.
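
To illustrate the validation behaviour described here, the sketch below calls check_X_y directly on a small sparse matrix. The data is made up for illustration, and the exact default value of accept_sparse and the wording of the error may differ between scikit-learn versions; the point is only that sparse input is rejected unless a sparse format is explicitly allowed.

```python
import numpy as np
from scipy import sparse
from sklearn.utils import check_X_y

# Made-up input: a 4x4 sparse matrix and matching labels.
X = sparse.csr_matrix(np.eye(4))
y = np.array([0, 0, 1, 1])

# With the default accept_sparse setting, sparse input raises TypeError.
try:
    check_X_y(X, y)
    rejected = False
except TypeError as exc:
    rejected = True
    print("rejected:", exc)

# Explicitly allowing the CSR format lets the matrix through unchanged in kind.
X_checked, y_checked = check_X_y(X, y, accept_sparse=["csr"])
print("still sparse:", sparse.issparse(X_checked))
```

A fix along these lines would pass an accept_sparse value from silhouette_score to check_X_y, though the thread does not say how the eventual patch did it.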

@ssaeger

ssaeger commented Feb 20, 2016

Hi, I was able to reproduce this, and it does seem that accept_sparse=None causes the error here.
If nobody is working on this, I would like to fix the bug and add a test to ensure silhouette_score works with sparse matrices.
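
A test of the kind proposed here might look roughly like the sketch below. It is not the test that was eventually merged: the function name and data are hypothetical, and it only passes on a scikit-learn version in which silhouette_score accepts sparse input (that is, after the fix referenced in this issue).

```python
import numpy as np
from scipy import sparse
from sklearn.metrics import silhouette_score


def test_silhouette_score_sparse_matches_dense():
    # Hypothetical data: 30 random samples assigned to 3 labels in turn.
    rng = np.random.RandomState(0)
    X_dense = rng.rand(30, 5)
    labels = np.arange(30) % 3
    X_sparse = sparse.csr_matrix(X_dense)

    dense_score = silhouette_score(X_dense, labels)
    sparse_score = silhouette_score(X_sparse, labels)

    # Sparse and dense inputs holding the same values should score the same.
    np.testing.assert_allclose(sparse_score, dense_score, rtol=1e-6)


test_silhouette_score_sparse_matches_dense()
```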

@yenchenlin
Contributor Author

Hello @ssaeger,
I'm still working on this.

Does @amueller think silhouette_score should accept sparse matrices?
If so, I'm happy to work on this extension.

@ssaeger

ssaeger commented Feb 21, 2016

Hi @yenchenlin1994,
great, then good luck, and if you stop working on it, just let me know. ;-)
