Skip to content

[MRG+1] Issue #8173 - Passing n_neighbors to compute MI #8181

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 19, 2017

Conversation

glemaitre
Copy link
Member

Reference Issue

Fixes #8173

What does this implement/fix? Explain your changes.

The parameters n_neighbors is passed to the function compute_mi in _estimate_mi.
Previously this parameter was not given to compute_mi. The user could not set this
parameter as indicated in the documentation.

Any other comments?

A single test has been added in a classification case.
The computation of the mutual information was in fact correct since it was already tested
for different neighbours

@glemaitre glemaitre changed the title pass n_neighbors in _estimate_mi [MRG] Issue #8173 - Passing n_neighbors to compute MI Jan 10, 2017
@glemaitre
Copy link
Member Author

@jnothman Do you see additional unit test(s) or this is enough.

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be sure, can you please remove the default n_neighbors from _compute_mi if this harms nothing?

@@ -5,7 +5,8 @@
from scipy.sparse import csr_matrix

from sklearn.utils.testing import (assert_array_equal, assert_almost_equal,
assert_false, assert_raises, assert_equal)
assert_false, assert_raises, assert_equal,
assert_array_almost_equal)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the preference is for assert_allclose

@glemaitre
Copy link
Member Author

Just to be sure, can you please remove the default n_neighbors from _compute_mi if this harms nothing?

I did it and then I had a second thought. _compute_mi is a wrapper to choose either the discrete or continuous version of the MI. There is no need for n_neighbors for the discrete. So it seems odd to required absolutely n_neighbors when calling the function for discrete case. I would assume that this is the reason of having a default value.

assert_allclose(mi, [0.06987399, 0.03197151, 0.21946924], rtol=1e-6)
mi_7 = mutual_info_classif(X, y, discrete_features=[2], n_neighbors=7,
random_state=0)
assert_allclose(mi_7, [0.0735522, 0.0343685, 0.2194692], rtol=1e-5)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like tests that hardcode numerical values that are true only for randomly generated but with fixed see data. Is there a better way to test the impact of n_neighbors without hardcoding the values?

@jnothman
Copy link
Member

jnothman commented Jan 17, 2017 via email

for n_neighbors in [5, 7, 9]:
mi_nn = mutual_info_classif(X, y, discrete_features=[2],
n_neighbors=n_neighbors, random_state=0)
# Check that the continuous values have an higher MI
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would help to say "with greater n_neighbors"

@glemaitre glemaitre force-pushed the is/8173 branch 2 times, most recently from ce35545 to 05d0585 Compare January 18, 2017 18:17
@jnothman
Copy link
Member

LGTM

@jnothman jnothman changed the title [MRG] Issue #8173 - Passing n_neighbors to compute MI [MRG+1] Issue #8173 - Passing n_neighbors to compute MI Jan 18, 2017
@glemaitre
Copy link
Member Author

@ogrisel for a second review

@raghavrv
Copy link
Member

LGTM thanks @glemaitre

@raghavrv raghavrv merged commit aaebee1 into scikit-learn:master Jan 19, 2017
sergeyf pushed a commit to sergeyf/scikit-learn that referenced this pull request Feb 28, 2017
@Przemo10 Przemo10 mentioned this pull request Mar 17, 2017
Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
NelleV pushed a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

n_neighbors param in mutual_info_regression doesn't do anything
4 participants