Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+2] algorithm='auto' should always work for nearest neighbors (continuation) #9145

Merged
merged 9 commits into from Jun 19, 2017

Conversation

@herilalaina
Copy link
Contributor

@herilalaina herilalaina commented Jun 17, 2017

Reference Issue

Fixes #4931, continuation of #5596

What does this implement/fix? Explain your changes.

Implement test for metric in ['mahalanobis', 'wminkowski', 'seuclidean']

Any other comments?

Should we warn the user when the algorithm is set into brute? (instead of auto)

if metric not in _metrics:
nn = neighbors.NearestNeighbors(n_neighbors=3, algorithm='auto',
metric=metric).fit(Xcsr)
if metric != "precomputed":
Copy link
Contributor Author

@herilalaina herilalaina Jun 17, 2017

Is that normal that I can't call kneighbors method on sparse matrix when metric='precomputed'

Copy link
Member

@jnothman jnothman Jun 17, 2017

That case is not implemented: it would need to be treated a bit differently to the precomputed dense array case; and it doesn't really make sense. An array containing distances which is mostly 0s is pretty useless for nearest neighbors. (Unless we're admitting negative distances, or having a different interpretation of 0s, both of which may be useful.)

Copy link
Member

@jnothman jnothman Jun 17, 2017

But I don't understand why precomputed should be in VALID_METRICS_SPARSE and hence why it's relevant tot his code path.

@herilalaina herilalaina changed the title [WIP] algorithm='auto' should always work for nearest neighbors (continuation) [MRG] algorithm='auto' should always work for nearest neighbors (continuation) Jun 17, 2017
Copy link
Member

@jnothman jnothman left a comment

otherwise LGTM

if metric not in _metrics:
nn = neighbors.NearestNeighbors(n_neighbors=3, algorithm='auto',
metric=metric).fit(Xcsr)
if metric != "precomputed":
Copy link
Member

@jnothman jnothman Jun 17, 2017

That case is not implemented: it would need to be treated a bit differently to the precomputed dense array case; and it doesn't really make sense. An array containing distances which is mostly 0s is pretty useless for nearest neighbors. (Unless we're admitting negative distances, or having a different interpretation of 0s, both of which may be useful.)

if metric not in _metrics:
nn = neighbors.NearestNeighbors(n_neighbors=3, algorithm='auto',
metric=metric).fit(Xcsr)
if metric != "precomputed":
Copy link
Member

@jnothman jnothman Jun 17, 2017

But I don't understand why precomputed should be in VALID_METRICS_SPARSE and hence why it's relevant tot his code path.

@@ -988,6 +990,38 @@ def custom_metric(x1, x2):
assert_array_almost_equal(dist1, dist2)


def test_algo_auto_metrics():
Copy link
Member

@jnothman jnothman Jun 17, 2017

a better name or comment would be appreciated

``'effective_metric_'`` is in the ``'VALID_METRICS'`` list of
``'ball_tree'``. It selects ``'brute'`` if :math:`k < N/2` and the
``'effective_metric_'`` is not in the ``'VALID_METRICS'`` list of
``'kd_tree'`` or ``'ball_tree'``. It selects ``'brute'`` if :math:`k >= N/2`. This choice is based on the assumption that the number of query points is at least the
Copy link
Member

@jnothman jnothman Jun 17, 2017

could you please keep this under 80 chars

@herilalaina
Copy link
Contributor Author

@herilalaina herilalaina commented Jun 17, 2017

Thanks for the review. The change has been done!

@jnothman jnothman changed the title [MRG] algorithm='auto' should always work for nearest neighbors (continuation) [MRG+1] algorithm='auto' should always work for nearest neighbors (continuation) Jun 17, 2017
@@ -988,6 +990,46 @@ def custom_metric(x1, x2):
assert_array_almost_equal(dist1, dist2)


def test_unsupported_metric_for_auto():
Copy link
Member

@jnothman jnothman Jun 17, 2017

I don't know what you mean by unsupported. It's clearly not unsupported if you're testing it works.

@jnothman jnothman added this to the 0.19 milestone Jun 18, 2017
X = rng.rand(12, 12)
Xcsr = csr_matrix(X)

# Metric which don't required any additional parameter
Copy link
Member

@amueller amueller Jun 19, 2017

I find the comment a bit misleading. I would rename the variable and say only test metrics that don't require additional arguments.
And maybe assert that some strange metric is in VALID_METRICS['brute'].

Checking that the test is non-empty, and more didactic variable name
@amueller amueller changed the title [MRG+1] algorithm='auto' should always work for nearest neighbors (continuation) [MRG+2] algorithm='auto' should always work for nearest neighbors (continuation) Jun 19, 2017
@amueller
Copy link
Member

@amueller amueller commented Jun 19, 2017

LGTM, I made minor minor changes to fix my nitpicks, merge on green.

@herilalaina
Copy link
Contributor Author

@herilalaina herilalaina commented Jun 19, 2017

Thank you!

@amueller amueller merged commit 60deaea into scikit-learn:master Jun 19, 2017
2 of 3 checks passed
@herilalaina herilalaina deleted the iss4931 branch Jun 19, 2017
dmohns added a commit to dmohns/scikit-learn that referenced this issue Aug 7, 2017
…ntinuation) (scikit-learn#9145)

* Merge neighbors.rst

* issue scikit-learn#4931

* Improve test implementation

* Update base.py

* Remove unused import

* Customize test for precomputed metric

* Change test function name

* rename _metrics to require_params, add set assert

Checking that the test is non-empty, and more didactic variable name

* Remove blank line
dmohns added a commit to dmohns/scikit-learn that referenced this issue Aug 7, 2017
…ntinuation) (scikit-learn#9145)

* Merge neighbors.rst

* issue scikit-learn#4931

* Improve test implementation

* Update base.py

* Remove unused import

* Customize test for precomputed metric

* Change test function name

* rename _metrics to require_params, add set assert

Checking that the test is non-empty, and more didactic variable name

* Remove blank line
NelleV added a commit to NelleV/scikit-learn that referenced this issue Aug 11, 2017
…ntinuation) (scikit-learn#9145)

* Merge neighbors.rst

* issue scikit-learn#4931

* Improve test implementation

* Update base.py

* Remove unused import

* Customize test for precomputed metric

* Change test function name

* rename _metrics to require_params, add set assert

Checking that the test is non-empty, and more didactic variable name

* Remove blank line
paulha added a commit to paulha/scikit-learn that referenced this issue Aug 19, 2017
…ntinuation) (scikit-learn#9145)

* Merge neighbors.rst

* issue scikit-learn#4931

* Improve test implementation

* Update base.py

* Remove unused import

* Customize test for precomputed metric

* Change test function name

* rename _metrics to require_params, add set assert

Checking that the test is non-empty, and more didactic variable name

* Remove blank line
AishwaryaRK added a commit to AishwaryaRK/scikit-learn that referenced this issue Aug 29, 2017
…ntinuation) (scikit-learn#9145)

* Merge neighbors.rst

* issue scikit-learn#4931

* Improve test implementation

* Update base.py

* Remove unused import

* Customize test for precomputed metric

* Change test function name

* rename _metrics to require_params, add set assert

Checking that the test is non-empty, and more didactic variable name

* Remove blank line
maskani-moh added a commit to maskani-moh/scikit-learn that referenced this issue Nov 15, 2017
…ntinuation) (scikit-learn#9145)

* Merge neighbors.rst

* issue scikit-learn#4931

* Improve test implementation

* Update base.py

* Remove unused import

* Customize test for precomputed metric

* Change test function name

* rename _metrics to require_params, add set assert

Checking that the test is non-empty, and more didactic variable name

* Remove blank line
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this issue Dec 18, 2017
…ntinuation) (scikit-learn#9145)

* Merge neighbors.rst

* issue scikit-learn#4931

* Improve test implementation

* Update base.py

* Remove unused import

* Customize test for precomputed metric

* Change test function name

* rename _metrics to require_params, add set assert

Checking that the test is non-empty, and more didactic variable name

* Remove blank line
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

4 participants