[MRG+2] Fix trustworthiness custom metric #9775
Thanks for the PR.
sklearn/manifold/t_sne.py
    Returns
    -------
    trustworthiness : float
        Trustworthiness of the low-dimensional embedding.
    """
    if precomputed:
        warnings.warn("The flag 'precomputed' has been deprecated in version "
Check the scikit-learn contributing guide to see how to deprecate the precomputed attribute using a decorator.
And precomputed is not an attribute but a parameter (my bad, sorry).
Can you add a test?
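A minimal sketch of the parameter-deprecation pattern under discussion, together with a test for it. `resolve_metric` is a hypothetical helper that isolates only the argument handling (it is not scikit-learn's code): the old flag still works, emits a `DeprecationWarning`, and overrides `metric`, which is the behavior described in this PR's test comments. The warning text is taken from the PR's diff.

```python
import warnings


def resolve_metric(precomputed=False, metric='euclidean'):
    """Hypothetical helper: map the deprecated flag onto ``metric``."""
    if precomputed:
        warnings.warn("The flag 'precomputed' has been deprecated in version "
                      "0.20 and will be removed in 0.22. See 'metric' "
                      "parameter instead.", DeprecationWarning)
        # The old flag wins over whatever ``metric`` was passed.
        metric = 'precomputed'
    return metric


# A test can check both the warning and the resulting metric.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    used = resolve_metric(precomputed=True, metric='manhattan')

assert used == 'precomputed'
assert any(issubclass(w.category, DeprecationWarning) for w in caught)
```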
- Correct typos in output messages
Also, please add an entry to doc/whats_new
sklearn/manifold/t_sne.py
    metric : string, or callable, optional, default 'euclidean'
        Which metric to use for computing pairwise distances between samples
        from the original input space. If metric is 'precomputed', X must be a
        matrix of pairwise distances. Otherwise, see the documentation of
Can you say "pairwise distances or squared distances" (either will work as this function only uses their rank, not their value).
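The claim that distances and squared distances are interchangeable here can be checked directly: the square root (like squaring) is monotonic on non-negative values, so each sample's neighbor ordering, which is all the score depends on, stays the same. A small NumPy sketch on arbitrary random data:

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(6, 3)

# All pairwise squared Euclidean distances, and their square roots.
diff = X[:, None, :] - X[None, :, :]
sq_dist = (diff ** 2).sum(axis=-1)
dist = np.sqrt(sq_dist)

# Stable sorts so that any ties would be broken identically.
order_sq = np.argsort(sq_dist, axis=1, kind='mergesort')
order = np.argsort(dist, axis=1, kind='mergesort')

# The square root is monotonic, so every row's neighbor order is the same.
print(np.array_equal(order, order_sq))  # True
```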
sklearn/manifold/tests/test_t_sne.py
    # If a 'metric' different from 'precomputed' is specified, but the flag
    # 'precomputed' is set to True, the flag overwrites the parameter 'metric'
    # and so the 'precomputed' metric will be used. Indeed, in
    # http://scikit-learn.org/stable/developers/contributing.html#deprecation,
I don't think we need this reminder... But yes, this is what we tend to do.
sklearn/manifold/tests/test_t_sne.py
    # and so the 'precomputed' metric will be used. Indeed, in
    # http://scikit-learn.org/stable/developers/contributing.html#deprecation,
    # the old parameter overwrites the new parameter.
    assert_raises(Exception, assert_warns, DeprecationWarning,
Usually we would not test for raising something as broad as Exception. Indeed, usually we'd check for a particular error message.
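A sketch of testing for a specific error instead of the broad `Exception`, using the standard library's `unittest` assertion helpers. `check_precomputed_distances` and its message are hypothetical stand-ins for whatever validation the function actually performs; the point is that the test pins both the exception type and part of the message, so an unrelated failure (for example an `IndexError` from bad indexing) would not make it pass.

```python
import unittest

import numpy as np


def check_precomputed_distances(D):
    """Hypothetical validation for metric='precomputed' inputs."""
    D = np.asarray(D)
    if D.ndim != 2 or D.shape[0] != D.shape[1]:
        raise ValueError("With metric='precomputed', X must be a square "
                         "matrix of pairwise distances, got shape %r"
                         % (D.shape,))
    return D


case = unittest.TestCase()  # used only for its assertion helpers

# Feature vectors instead of a distance matrix: expect this exact error.
X = np.arange(100).reshape(50, 2)
with case.assertRaisesRegex(ValueError, "must be a square matrix"):
    check_precomputed_distances(X)
```

Under pytest, `pytest.raises(ValueError, match=...)` plays the same role.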
sklearn/manifold/tests/test_t_sne.py
    # Other metrics than 'euclidean' and 'precomputed' are unusual and must
    # raise a warning
    X = np.arange(100).reshape(50, 2)
    assert_warns(RuntimeWarning, trustworthiness, X, X, metric='manhattan')
I actually strongly believe we should not raise a warning here or make any statement that this is abnormal. Even if deriving the manifold from an unusual metric is weird, checking that neighborhoods are maintained in the embedded space is precisely how we determine whether the learnt manifold was appropriate.
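To make this argument concrete, here is a NumPy sketch of the rank-based trustworthiness score computed from two distance matrices, so any metric can be plugged in for the input space. `pairwise` and `trustworthiness_from_distances` are hypothetical names, not scikit-learn's implementation, and the sketch assumes there are no duplicate samples (so each sample is its own nearest point).

```python
import numpy as np


def pairwise(X, metric):
    """Dense distance matrix for any callable ``metric(u, v)``."""
    X = np.asarray(X, dtype=float)
    return np.array([[metric(a, b) for b in X] for a in X])


def trustworthiness_from_distances(dist_X, dist_emb, k):
    """T(k) = 1 - 2 / (n k (2n - 3k - 1)) * sum of (rank - k) over
    embedded neighbors that are not input-space neighbors."""
    n = dist_X.shape[0]
    rows = np.arange(n)[:, None]
    # ranks[i, j]: position of j in i's input-space neighbor order
    # (i itself at 0, its nearest neighbor at 1, and so on).
    order_X = np.argsort(dist_X, axis=1, kind='mergesort')
    ranks = np.empty_like(order_X)
    ranks[rows, order_X] = np.arange(n)
    # k nearest neighbors in the embedding, dropping i itself.
    nn_emb = np.argsort(dist_emb, axis=1, kind='mergesort')[:, 1:k + 1]
    excess = ranks[rows, nn_emb] - k
    penalty = excess[excess > 0].sum()
    return 1.0 - 2.0 * penalty / (n * k * (2.0 * n - 3.0 * k - 1.0))


def manhattan(u, v):
    return np.abs(u - v).sum()


rng = np.random.RandomState(0)
X = rng.rand(20, 3)
D = pairwise(X, manhattan)
E = pairwise(X[:, :1], manhattan)  # a crude 1-D "embedding"

print(trustworthiness_from_distances(D, D, k=5))  # 1.0 (identical spaces)
print(trustworthiness_from_distances(D, E, k=5))  # a value in [0, 1]
```

Nothing here treats Manhattan distances as special: the same neighbor-preservation check applies to whatever metric defined the input space.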
- indicate that one can use pairwise squared distances instead of distances
- remove useless reminder
- remove warning if metric used is unusual
- update doc/whats_new
doc/whats_new/v0.20.rst
  ``metric`` should be used with any compatible metric including
  'precomputed', in which case the input matrix ``X`` should be a matrix of
  pairwise distances or squared distances. :issue:`9736` by
  :user:`Joel Nothman <jnothman>`.
This should be your name and the issue number should be 9775
sklearn/manifold/tests/test_t_sne.py
    # 'precomputed'
    X = np.array([[1, 1], [1, 0], [2, 2]])
    assert_almost_equal(trustworthiness(X, X, n_neighbors=1, metric='cosine'),
                        0.66, decimal=1)
.66 appears drawn from thin air. Instead, benchmark against the corresponding precomputed distances.
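Following the suggestion to benchmark against precomputed distances rather than a hard-coded value, a test can compute the expected score from an explicitly built distance matrix and compare it with the metric-based call. The `trustworthiness` below is a simplified, hypothetical stand-in (callable or 'precomputed' metric only, Euclidean distances in the embedding), not scikit-learn's function. Random data replaces the three-point example because there (1, 1) and (2, 2) point in the same direction, so their cosine distance is (numerically) zero and the neighbor ranks depend on tie-breaking.

```python
import numpy as np


def cosine_distance(u, v):
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))


def trustworthiness(X, X_embedded, n_neighbors=5, metric='euclidean'):
    """Hypothetical simplified stand-in; only callables and 'precomputed'."""
    X = np.asarray(X, dtype=float)
    if metric == 'precomputed':
        dist_X = X
    else:
        dist_X = np.array([[metric(a, b) for b in X] for a in X])
    E = np.asarray(X_embedded, dtype=float)
    dist_E = np.sqrt(((E[:, None, :] - E[None, :, :]) ** 2).sum(axis=-1))
    n, k = dist_X.shape[0], n_neighbors
    rows = np.arange(n)[:, None]
    # Input-space rank of every sample j relative to sample i.
    ranks = np.empty((n, n), dtype=int)
    ranks[rows, np.argsort(dist_X, axis=1, kind='mergesort')] = np.arange(n)
    # Embedded k-NN (skipping the sample itself) and their excess ranks.
    nn_E = np.argsort(dist_E, axis=1, kind='mergesort')[:, 1:k + 1]
    excess = ranks[rows, nn_E] - k
    penalty = excess[excess > 0].sum()
    return 1.0 - 2.0 * penalty / (n * k * (2.0 * n - 3.0 * k - 1.0))


rng = np.random.RandomState(42)
X = rng.rand(10, 3)
X_embedded = X[:, :2]

# Expected value from the precomputed route, not a constant typed in by hand.
D_cos = np.array([[cosine_distance(a, b) for b in X] for a in X])
expected = trustworthiness(D_cos, X_embedded, n_neighbors=2,
                           metric='precomputed')
actual = trustworthiness(X, X_embedded, n_neighbors=2, metric=cosine_distance)
assert np.isclose(actual, expected)
```

With scikit-learn itself, the analogous check would presumably compare `metric='cosine'` against `metric='precomputed'` applied to `pairwise_distances(X, metric='cosine')`.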
Usually the PR. Feel free to contribute that to the contributing guide.
Having the PR there makes it easier for us to see how well the changelog covers the set of commits in a release.
- change user name and issue number in whats_new
- improve test of metric different than euclidean and precomputed
Otherwise LGTM
sklearn/manifold/t_sne.py
                      "0.20 and will be removed in 0.22. See 'metric' "
                      "parameter instead.", DeprecationWarning)
        metric = 'precomputed'
    if metric == 'precomputed':
I think pairwise_distances handles precomputed, so we don't need a special case here.
Thanks, I will change this
sklearn/manifold/t_sne.py
        dist_X = X
    elif metric == 'euclidean':
        dist_X = pairwise_distances(X, metric='euclidean', squared=True)
I similarly don't think it's essential to specially handle Euclidean, but at least it saves some computation
I will change this too
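The two review points above (let the distance helper handle 'precomputed', and keep squared Euclidean distances because they save computation) can be sketched as a single dispatch function. The review suggests delegating to scikit-learn's `pairwise_distances`; this self-contained NumPy version, with the hypothetical name `input_distances`, only illustrates why skipping the square root is safe: neighbor ranks are all that trustworthiness uses.

```python
import numpy as np


def input_distances(X, metric='euclidean'):
    """Hypothetical helper mirroring the branches discussed in review."""
    X = np.asarray(X, dtype=float)
    if metric == 'precomputed':
        # X is already a distance (or squared-distance) matrix.
        return X
    if metric == 'euclidean':
        # Squared distances via ||a||^2 + ||b||^2 - 2 a.b; no square root,
        # since it would not change any neighbor ranking.
        sq_norms = (X ** 2).sum(axis=1)
        D = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
        np.maximum(D, 0.0, out=D)  # clip tiny negatives from round-off
        return D
    # Any other callable metric: brute-force evaluation.
    return np.array([[metric(a, b) for b in X] for a in X])


X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
D = input_distances(X)                             # squared Euclidean
D_pre = input_distances(D, metric='precomputed')   # passed through as-is
```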
- Also, passing an array of vectors instead of pairwise distances with 'precomputed' now raises a different error, so I changed the IndexError into a ValueError in test_trustworthiness_precomputed_deprecation
LGTM. +1 for merge. Will merge once the merge conflicts are resolved!
# Conflicts:
#   doc/whats_new/v0.20.rst
#   sklearn/manifold/t_sne.py
Thanks! I just resolved the conflicts.
Decomposition, manifold learning and clustering

- Deprecate ``precomputed`` parameter in function
  :func:`manifold.t_sne.trustworthiness`. Instead, the new parameter
If we have a separate entry for enhancement it should be written as such. For example, trustworthiness now accepts a metric other than Euclidean.
@wdevazelhes I made the what's new changes and changed a NOTE to a FIXME. I think we have a better chance of finding it during maintenance. Merging now. Thanks!!!
Thanks!
Reference Issue
Fixes #9736
What does this implement/fix? Explain your changes.