[MRG+1] Issue #8173 - Passing n_neighbors to compute MI #8181
Conversation
@jnothman Do you see a need for additional unit test(s), or is this enough?
Just to be sure, can you please remove the default n_neighbors from _compute_mi if this harms nothing?
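For context, a minimal sketch of what that suggestion might look like, assuming the dispatch structure and helper names (`_compute_mi_cc`, `_compute_mi_cd`) used in sklearn/feature_selection/mutual_info_.py at the time; this is an illustration, not the merged diff:

```python
# Sketch of the suggested change: drop the default so every caller must pass
# n_neighbors explicitly, and a forgotten argument fails loudly instead of
# silently falling back to 3. Helper import paths assume the module layout
# at the time of this PR (they are private API).
from sklearn.metrics import mutual_info_score
from sklearn.feature_selection.mutual_info_ import _compute_mi_cc, _compute_mi_cd


def _compute_mi(x, y, x_discrete, y_discrete, n_neighbors):
    if x_discrete and y_discrete:
        return mutual_info_score(x, y)  # contingency-based; k is irrelevant
    elif x_discrete:
        return _compute_mi_cd(y, x, n_neighbors)
    elif y_discrete:
        return _compute_mi_cd(x, y, n_neighbors)
    else:
        return _compute_mi_cc(x, y, n_neighbors)
```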
```diff
@@ -5,7 +5,8 @@
 from scipy.sparse import csr_matrix

 from sklearn.utils.testing import (assert_array_equal, assert_almost_equal,
-                                   assert_false, assert_raises, assert_equal)
+                                   assert_false, assert_raises, assert_equal,
+                                   assert_array_almost_equal)
```
I think the preference is for assert_allclose
I did it, and then I had a second thought.
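For illustration, the practical difference between the two helpers (both come from numpy.testing): assert_array_almost_equal checks a fixed number of decimal places, while assert_allclose checks a relative tolerance that scales with the magnitude of the values. The array below reuses the expected values from the hunk underneath.

```python
import numpy as np
from numpy.testing import assert_allclose, assert_array_almost_equal

mi = np.array([0.06987399, 0.03197151, 0.21946924])

# Absolute criterion: |desired - actual| < 1.5 * 10**-decimal for each entry.
assert_array_almost_equal(mi, [0.0698740, 0.0319715, 0.2194692], decimal=6)

# Relative criterion: |desired - actual| <= rtol * |desired|, so small and
# large entries are held to proportionally comparable precision.
assert_allclose(mi, [0.06987399, 0.03197151, 0.21946924], rtol=1e-6)
```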
```python
assert_allclose(mi, [0.06987399, 0.03197151, 0.21946924], rtol=1e-6)
mi_7 = mutual_info_classif(X, y, discrete_features=[2], n_neighbors=7,
                           random_state=0)
assert_allclose(mi_7, [0.0735522, 0.0343685, 0.2194692], rtol=1e-5)
```
I don't really like tests that hardcode numerical values that hold only for randomly generated data with a fixed seed. Is there a better way to test the impact of n_neighbors without hardcoding the values?
I think checking that the MI changes with n_neighbors would be fine?
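A sketch of that idea, with synthetic data standing in for the test's fixture (the real test generates X and y differently); the point is that only qualitative properties are asserted, never exact values:

```python
import numpy as np
from numpy.testing import assert_allclose
from sklearn.feature_selection import mutual_info_classif

# Stand-in data: two continuous features and one discrete feature (column 2).
rng = np.random.RandomState(0)
X = rng.rand(100, 3)
X[:, 2] = (X[:, 2] > 0.5).astype(float)
y = (X[:, 0] + X[:, 2] > 1).astype(int)

mi_3 = mutual_info_classif(X, y, discrete_features=[2], n_neighbors=3,
                           random_state=0)
mi_7 = mutual_info_classif(X, y, discrete_features=[2], n_neighbors=7,
                           random_state=0)

# The k-NN based estimates for the continuous columns should respond to
# n_neighbors, while the discrete column uses a contingency-based estimate
# that does not depend on it.
assert not np.allclose(mi_3[:2], mi_7[:2])
assert_allclose(mi_3[2], mi_7[2])
```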
```python
for n_neighbors in [5, 7, 9]:
    mi_nn = mutual_info_classif(X, y, discrete_features=[2],
                                n_neighbors=n_neighbors, random_state=0)
    # Check that the continuous values have a higher MI
```
I think it would help to say "with greater n_neighbors"
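The hunk above is cut off before its assertions. A sketch of how the loop plausibly continues, with the comment reworded as the reviewer suggests; the assertions are illustrative rather than the merged code, and mi is assumed to hold the baseline result computed with the default n_neighbors. The direction of the inequalities matches the hardcoded values earlier in the thread, where mi_7 exceeds mi on the continuous features and matches it on the discrete one.

```python
# assert_greater would also need importing from sklearn.utils.testing.
for n_neighbors in [5, 7, 9]:
    mi_nn = mutual_info_classif(X, y, discrete_features=[2],
                                n_neighbors=n_neighbors, random_state=0)
    # Check that the continuous values have a higher MI with greater
    # n_neighbors, relative to the baseline mi computed with the default.
    assert_greater(mi_nn[0], mi[0])
    assert_greater(mi_nn[1], mi[1])
    # n_neighbors has no effect on the discrete feature's estimate.
    assert_equal(mi_nn[2], mi[2])
```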
LGTM
@ogrisel for a second review
LGTM, thanks @glemaitre
Reference Issue
Fixes #8173
What does this implement/fix? Explain your changes.
The parameter n_neighbors is now passed to the function _compute_mi in _estimate_mi. Previously this parameter was not given to _compute_mi, so the user could not set it as indicated in the documentation.
Any other comments?
A single test has been added for the classification case. The computation of the mutual information itself was in fact already correct, since it was already tested for different numbers of neighbours.
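With the fix in place, the documented behaviour can be observed directly: changing n_neighbors changes the estimate for continuous features. A quick check on synthetic data (illustrative only):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.RandomState(0)
X = rng.rand(200, 2)             # two continuous features
y = (X[:, 0] > 0.5).astype(int)  # target depends on the first feature

for k in (3, 7):
    # Before this fix, both calls returned identical values because
    # n_neighbors never reached the estimator.
    print(k, mutual_info_classif(X, y, n_neighbors=k, random_state=0))
```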