Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+2] Fix trustworthiness custom metric #9775

Merged

Conversation

Projects
None yet
5 participants
@wdevazelhes
Copy link
Contributor

wdevazelhes commented Sep 15, 2017

Reference Issue

Fixes #9736

What does this implement/fix? Explain your changes.

  • It changes the way to specify a precomputed metric into a more standard way: metric='precomputed' instead of a flag precomputed=False/True.
  • In the meantime it allows to use a custom metric for the input space
  • However, it warns the user if they want to use another metric than euclidean and precomputed, since this is not standard.
  • If the specified metric is 'euclidean', it keeps the 'squared' parameter in pairwise_distances from the original code.
  • It adds a deprecation warning for the users who will still use the previous flag 'precomputed'
@wdevazelhes

This comment has been minimized.

Copy link
Owner Author

wdevazelhes commented on c632f39 Sep 14, 2017

@bellet @nvauquie what do you think ? I tried to:

  • allow to use custom metric for the X (including precomputed), while keeping the square in case of euclidean metric
  • warn the user if they use another metric than euclidean and precomputed
  • warn the deprecation of the flag 'precomputed'
@massich
Copy link
Contributor

massich left a comment

Thanks for the PR.

Returns
-------
trustworthiness : float
Trustworthiness of the low-dimensional embedding.
"""
if precomputed:
warnings.warn("The flag 'precomputed' has been deprecated in version"

This comment has been minimized.

Copy link
@massich

massich Sep 15, 2017

Contributor

Check scikit-learn contributing guide to see howto deprecate the precomputed attribute using a decorator.

This comment has been minimized.

Copy link
@massich

massich Sep 15, 2017

Contributor

and precomputed is not an attribute but a parameter. (my bad. sorry)

@massich

This comment has been minimized.

Copy link
Contributor

massich commented Sep 15, 2017

Can you add a test ?
You can use test_grid_search_fit_params_deprecation() as example (see here)

wdevazelhes added some commits Sep 15, 2017

- Adds tests for deprecation and warnings
- Correct typos in output messages
@jnothman
Copy link
Member

jnothman left a comment

Also, please add an entry to doc/whats_new

metric : string, or callable, optional, default 'euclidean'
Which metric to use for computing pairwise distances between samples
from the original input space. If metric is 'precomputed', X must be a
matrix of pairwise distances. Otherwise, see the documentation of

This comment has been minimized.

Copy link
@jnothman

jnothman Sep 18, 2017

Member

Can you say "pairwise distances or squared distances" (either will work as this function only uses their rank, not their value).

# If a 'metric' different from 'precomputed' is specified, but the flag
# 'precomputed' is set to True, the flag overwrites the parameter 'metric'
# and so the 'precomputed' metric will be used. Indeed, in
# http://scikit-learn.org/stable/developers/contributing.html#deprecation,

This comment has been minimized.

Copy link
@jnothman

jnothman Sep 18, 2017

Member

I don't think we need this reminder... But yes, this is the what we tend to do.

# and so the 'precomputed' metric will be used. Indeed, in
# http://scikit-learn.org/stable/developers/contributing.html#deprecation,
# the old parameter overwrites the new parameter.
assert_raises(Exception, assert_warns, DeprecationWarning,

This comment has been minimized.

Copy link
@jnothman

jnothman Sep 18, 2017

Member

Usually we would not test for raising something as broad as Exception. Indeed, usually we'd check for a particular error message.

# Other metrics than 'euclidean' and 'precomputed' are unusual and must
# raise a warning
X = np.arange(100).reshape(50, 2)
assert_warns(RuntimeWarning, trustworthiness, X, X, metric='manhattan')

This comment has been minimized.

Copy link
@jnothman

jnothman Sep 18, 2017

Member

I actually strongly believe we should not raise a warning here or make any statement that this is abnormal. Even if deriving the manifold from an unusual metric is weird, checking that neighborhoods are maintained in the embedded space is precisely how we determine whether the learnt manifold was appropriate.

Update commit after comments:
- indicate that one can use pairwise squared distances instead of distances
- remove useless reminder
- remove warning if metric used is unusual
- update doc/whats_new
``metric`` should be used with any compatible metric including
'precomputed', in which case the input matrix ``X`` should be a matrix of
pairwise distances or squared distances. :issue:`9736` by
:user:`Joel Nothman <jnothman>`.

This comment has been minimized.

Copy link
@jnothman

jnothman Sep 19, 2017

Member

This should be your name and the issue number should be 9775

This comment has been minimized.

Copy link
@massich

massich Sep 19, 2017

Contributor

@jnothman 9775 is the id of the PR not the issue. Do we refer to the issue or to the PR? here we are referring to the issue. Is there a defined criteria? If so, I think we should add it in the contributing guide here and be part of the reviewing checklist discussed in #9653

# 'precomputed'
X = np.array([[1, 1], [1, 0], [2, 2]])
assert_almost_equal(trustworthiness(X, X, n_neighbors=1, metric='cosine'),
0.66, decimal=1)

This comment has been minimized.

Copy link
@jnothman

jnothman Sep 19, 2017

Member

.66 appears drawn from thin air. Instead benchmark against the corresponding precomputed distances

@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented Sep 19, 2017

@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented Sep 19, 2017

Modifications after comments:
- change user name and issue number in whats_new
- improve test of metric different than euclidean and precomputed
@jnothman
Copy link
Member

jnothman left a comment

Otherwise LGTM

"0.20 and will be removed in 0.22. See 'metric' "
"parameter instead.", DeprecationWarning)
metric = 'precomputed'
if metric == 'precomputed':

This comment has been minimized.

Copy link
@jnothman

jnothman Sep 19, 2017

Member

I think pairwise_distances handled precomputed, so we don't need a special case here

This comment has been minimized.

Copy link
@wdevazelhes

wdevazelhes Sep 20, 2017

Author Contributor

Thanks, I will change this

dist_X = X
elif metric == 'euclidean':
dist_X = pairwise_distances(X, metric='euclidean', squared=True)

This comment has been minimized.

Copy link
@jnothman

jnothman Sep 19, 2017

Member

I similarly don't think it's essential to specially handle Euclidean, but at least it saves some computation

This comment has been minimized.

Copy link
@wdevazelhes

wdevazelhes Sep 20, 2017

Author Contributor

I will change this too

@jnothman jnothman changed the title [MRG] Fix trustworthiness custom metric [MRG+1] Fix trustworthiness custom metric Sep 19, 2017

Simplify calling pairwise_distances function
- Also, the error thrown when passing an array of vectors instead of pairwise distances
when 'precomputed' is used is not the same, so I changed the IndexError into ValueError
in test_trustworthiness_precomputed_deprecation

@GaelVaroquaux GaelVaroquaux changed the title [MRG+1] Fix trustworthiness custom metric [MRG+2] Fix trustworthiness custom metric Apr 18, 2018

@GaelVaroquaux

This comment has been minimized.

Copy link
Member

GaelVaroquaux commented Apr 18, 2018

LGTM. +1 for merge.

Will merge once the merge conflicts are resolved!

wdevazelhes added some commits Apr 25, 2018

Merge branch 'master' into 9736-trustworthiness-custom-metric
# Conflicts:
#	doc/whats_new/v0.20.rst
#	sklearn/manifold/t_sne.py
@wdevazelhes

This comment has been minimized.

Copy link
Contributor Author

wdevazelhes commented Apr 25, 2018

Thanks ! I just resolved the conflicts

Decomposition, manifold learning and clustering

- Deprecate ``precomputed`` parameter in function
:func:`manifold.t_sne.trustworthiness`. Instead, the new parameter

This comment has been minimized.

Copy link
@jnothman

jnothman Apr 25, 2018

Member

If we have a separate entry for enhancement it should be written as such. For example, trustworthiness now accepts a metric other than Euclidean.

glemaitre added some commits Apr 26, 2018

@glemaitre

This comment has been minimized.

Copy link
Contributor

glemaitre commented Apr 26, 2018

@wdevazelhes I made the what's new changes and a change NOTE to FIXME. I think that we have more chance to find it during maintenance.

Merging now. Thanks!!!

@glemaitre glemaitre merged commit 4aaf45b into scikit-learn:master Apr 26, 2018

0 of 5 checks passed

ci/circleci: python2 CircleCI is running your tests
Details
ci/circleci: python3 CircleCI is running your tests
Details
continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
lgtm analysis: Python Running analyses for revisions
Details
@wdevazelhes

This comment has been minimized.

Copy link
Contributor Author

wdevazelhes commented Apr 26, 2018

Thanks !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.