Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+2] Fix trustworthiness custom metric #9775

Merged

Conversation

wdevazelhes
Copy link
Contributor

@wdevazelhes wdevazelhes commented Sep 15, 2017

Reference Issue

Fixes #9736

What does this implement/fix? Explain your changes.

  • It changes the way to specify a precomputed metric into a more standard way: metric='precomputed' instead of a flag precomputed=False/True.
  • In the meantime it allows to use a custom metric for the input space
  • However, it warns the user if they want to use another metric than euclidean and precomputed, since this is not standard.
  • If the specified metric is 'euclidean', it keeps the 'squared' parameter in pairwise_distances from the original code.
  • It adds a deprecation warning for the users who will still use the previous flag 'precomputed'

Copy link
Contributor

@massich massich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR.

Returns
-------
trustworthiness : float
Trustworthiness of the low-dimensional embedding.
"""
if precomputed:
warnings.warn("The flag 'precomputed' has been deprecated in version"
Copy link
Contributor

@massich massich Sep 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check scikit-learn contributing guide to see howto deprecate the precomputed attribute using a decorator.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and precomputed is not an attribute but a parameter. (my bad. sorry)

@massich
Copy link
Contributor

massich commented Sep 15, 2017

Can you add a test ?
You can use test_grid_search_fit_params_deprecation() as example (see here)

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please add an entry to doc/whats_new

metric : string, or callable, optional, default 'euclidean'
Which metric to use for computing pairwise distances between samples
from the original input space. If metric is 'precomputed', X must be a
matrix of pairwise distances. Otherwise, see the documentation of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you say "pairwise distances or squared distances" (either will work as this function only uses their rank, not their value).

# If a 'metric' different from 'precomputed' is specified, but the flag
# 'precomputed' is set to True, the flag overwrites the parameter 'metric'
# and so the 'precomputed' metric will be used. Indeed, in
# http://scikit-learn.org/stable/developers/contributing.html#deprecation,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this reminder... But yes, this is the what we tend to do.

# and so the 'precomputed' metric will be used. Indeed, in
# http://scikit-learn.org/stable/developers/contributing.html#deprecation,
# the old parameter overwrites the new parameter.
assert_raises(Exception, assert_warns, DeprecationWarning,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually we would not test for raising something as broad as Exception. Indeed, usually we'd check for a particular error message.

# Other metrics than 'euclidean' and 'precomputed' are unusual and must
# raise a warning
X = np.arange(100).reshape(50, 2)
assert_warns(RuntimeWarning, trustworthiness, X, X, metric='manhattan')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually strongly believe we should not raise a warning here or make any statement that this is abnormal. Even if deriving the manifold from an unusual metric is weird, checking that neighborhoods are maintained in the embedded space is precisely how we determine whether the learnt manifold was appropriate.

- indicate that one can use pairwise squared distances instead of distances
- remove useless reminder
- remove warning if metric used is unusual
- update doc/whats_new
``metric`` should be used with any compatible metric including
'precomputed', in which case the input matrix ``X`` should be a matrix of
pairwise distances or squared distances. :issue:`9736` by
:user:`Joel Nothman <jnothman>`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be your name and the issue number should be 9775

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jnothman 9775 is the id of the PR not the issue. Do we refer to the issue or to the PR? here we are referring to the issue. Is there a defined criteria? If so, I think we should add it in the contributing guide here and be part of the reviewing checklist discussed in #9653

# 'precomputed'
X = np.array([[1, 1], [1, 0], [2, 2]])
assert_almost_equal(trustworthiness(X, X, n_neighbors=1, metric='cosine'),
0.66, decimal=1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.66 appears drawn from thin air. Instead benchmark against the corresponding precomputed distances

@jnothman
Copy link
Member

jnothman commented Sep 19, 2017 via email

@jnothman
Copy link
Member

jnothman commented Sep 19, 2017 via email

- change user name and issue number in whats_new
- improve test of metric different than euclidean and precomputed
Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM

"0.20 and will be removed in 0.22. See 'metric' "
"parameter instead.", DeprecationWarning)
metric = 'precomputed'
if metric == 'precomputed':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think pairwise_distances handled precomputed, so we don't need a special case here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I will change this

dist_X = X
elif metric == 'euclidean':
dist_X = pairwise_distances(X, metric='euclidean', squared=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I similarly don't think it's essential to specially handle Euclidean, but at least it saves some computation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will change this too

@jnothman jnothman changed the title [MRG] Fix trustworthiness custom metric [MRG+1] Fix trustworthiness custom metric Sep 19, 2017
- Also, the error thrown when passing an array of vectors instead of pairwise distances
when 'precomputed' is used is not the same, so I changed the IndexError into ValueError
in test_trustworthiness_precomputed_deprecation
@GaelVaroquaux GaelVaroquaux changed the title [MRG+1] Fix trustworthiness custom metric [MRG+2] Fix trustworthiness custom metric Apr 18, 2018
@GaelVaroquaux
Copy link
Member

LGTM. +1 for merge.

Will merge once the merge conflicts are resolved!

William de Vazelhes added 2 commits April 25, 2018 14:55
# Conflicts:
#	doc/whats_new/v0.20.rst
#	sklearn/manifold/t_sne.py
@wdevazelhes
Copy link
Contributor Author

wdevazelhes commented Apr 25, 2018

Thanks ! I just resolved the conflicts

Decomposition, manifold learning and clustering

- Deprecate ``precomputed`` parameter in function
:func:`manifold.t_sne.trustworthiness`. Instead, the new parameter
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have a separate entry for enhancement it should be written as such. For example, trustworthiness now accepts a metric other than Euclidean.

@glemaitre
Copy link
Member

@wdevazelhes I made the what's new changes and a change NOTE to FIXME. I think that we have more chance to find it during maintenance.

Merging now. Thanks!!!

@glemaitre glemaitre merged commit 4aaf45b into scikit-learn:master Apr 26, 2018
@wdevazelhes
Copy link
Contributor Author

Thanks !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sklearn.manifold.t_sne.trustworthiness should allow custom metric
5 participants