Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG +1] Support metric='precomputed' in nearest neighbors #4090

Closed
wants to merge 9 commits into from

Conversation

jnothman
Copy link
Member

This fixes up #2532 (sorry to not offer the changes back there, @robertlayton: the rebase was too messy with my changes to validation in sklearn.metrics.pairwise) to handle queries that differ from the index data. This case is a little awkward in that the index data doesn't matter, but really should be supported for KNNClassifier et al.)

This also provides stronger validation to precomputed metrics in `pairwise_{distances,kernels}

@amueller
Copy link
Member

I think the KNeighborsClassifier then needs a _pairwise property to work with GridSearchCV and cross_val_score, like here

@jnothman
Copy link
Member Author

Good point.

@amueller
Copy link
Member

(Also the regressor and possible other classes I didn't look at the details yet)

@jnothman
Copy link
Member Author

(Also the regressor and possible other classes I didn't look at the details yet)

Of course; anything that can fit over a precomputed metric.

@jnothman
Copy link
Member Author

@amueller I've added _pairwise and a test

@jnothman jnothman force-pushed the pr2532 branch 2 times, most recently from c85c0f3 to 3e22ac1 Compare January 18, 2015 12:04
@coveralls
Copy link

Coverage Status

Coverage increased (+0.01%) when pulling 3e22ac1 on jnothman:pr2532 into d9ecf46 on scikit-learn:master.

@@ -919,7 +931,9 @@ def chi2_kernel(X, Y=None, gamma=1.):
'euclidean': euclidean_distances,
'l2': euclidean_distances,
'l1': manhattan_distances,
'manhattan': manhattan_distances, }
'manhattan': manhattan_distances,
'precomputed': None, # HACK: precomputed is always allowed, never called
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this become necessary here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, got it. For some reason github thinks this is part of chi2_kernel.

@amueller
Copy link
Member

I wonder if we can have a good test that ensures whether we forgot to add _pairwise.
We could add a pipeline test on estimator basis, that would at least ensure that it is not broken in the future.

@@ -1092,6 +1106,7 @@ def pairwise_distances(X, Y=None, metric="euclidean", n_jobs=1, **kwds):
"callable" % (metric, _VALID_METRICS))

if metric == "precomputed":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

metric= "precomputed" is not documented in the function, right?

@amueller amueller changed the title [MRG] Support metric='precomputed' in nearest neighbors [MRG +1] Support metric='precomputed' in nearest neighbors Jan 20, 2015
@amueller
Copy link
Member

apart from my minor comments LGTM.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.01%) when pulling 178ccb6 on jnothman:pr2532 into d9ecf46 on scikit-learn:master.

@jnothman
Copy link
Member Author

rebased and awaiting final review.

@amueller
Copy link
Member

travis is unhappy. Btw in the initial comment you said something about X=None which we now support in master...

@jnothman
Copy link
Member Author

jnothman commented Jun 3, 2015

Thanks for both those pointers (a while ago), @amueller. I hope this is fixed now.

@jnothman
Copy link
Member Author

jnothman commented Jun 3, 2015

So again, someone else's review is welcome.

@amueller
Copy link
Member

amueller commented Jun 3, 2015

travis is still failing.

@jnothman
Copy link
Member Author

jnothman commented Jun 3, 2015

boo!

@@ -100,6 +102,75 @@ def test_unsupervised_inputs():
assert_array_almost_equal(ind1, ind2)


def test_precomputed():
"""Tests unsupervised NearestNeighbors with a distance matrix."""
# Note: smaller samples may result in spurious test success
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to use a random number generator created in this test, to avoid having side effects across tests.

GaelVaroquaux added a commit that referenced this pull request Aug 30, 2015
Support metric='precomputed' in nearest neighbors [rebased version of #4090]
@GaelVaroquaux
Copy link
Member

Merged via #5188

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants