[MRG+2] Fix trustworthiness custom metric #9775

wdevazelhes · 2017-09-15T08:31:41Z

Reference Issue

Fixes #9736

What does this implement/fix? Explain your changes.

It changes how a precomputed metric is specified to the more standard form metric='precomputed', instead of a flag precomputed=False/True.
At the same time, it allows a custom metric to be used for the input space.
However, it warns the user if they use a metric other than 'euclidean' or 'precomputed', since this is not standard.
If the specified metric is 'euclidean', it keeps computing squared distances (squared=True in pairwise_distances), as in the original code.
It adds a deprecation warning for users who still use the old 'precomputed' flag.
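A minimal sketch of the flag-to-parameter mapping described above, assuming the behaviour discussed in review: the old `precomputed` flag, when set, takes precedence over `metric` and emits a `DeprecationWarning`. The helper name `resolve_metric` is hypothetical, and the warning text is assembled from the snippets quoted in review rather than copied from the merged code.

```python
import warnings


def resolve_metric(metric="euclidean", precomputed=False):
    """Hypothetical helper: map the deprecated `precomputed` flag onto `metric`.

    When the old flag is set it wins over `metric` (the old parameter
    overrides the new one) and a DeprecationWarning is emitted.
    """
    if precomputed:
        warnings.warn("The flag 'precomputed' has been deprecated in version "
                      "0.20 and will be removed in 0.22. See 'metric' "
                      "parameter instead.", DeprecationWarning)
        metric = "precomputed"
    return metric


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is recorded
    assert resolve_metric(metric="manhattan", precomputed=True) == "precomputed"
    assert resolve_metric(metric="manhattan") == "manhattan"

# Only the call that used the old flag should have warned.
assert len(caught) == 1
assert issubclass(caught[0].category, DeprecationWarning)
```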

massich

Thanks for the PR.

massich · 2017-09-15T08:57:29Z

sklearn/manifold/t_sne.py

    Returns
    -------
    trustworthiness : float
        Trustworthiness of the low-dimensional embedding.
    """
    if precomputed:
+        warnings.warn("The flag 'precomputed' has been deprecated in version"


Check the scikit-learn contributing guide to see how to deprecate ~~the precomputed attribute using a decorator.~~

And precomputed is not an attribute but a parameter (my bad, sorry).

massich · 2017-09-15T09:12:14Z

Can you add a test?
You can use test_grid_search_fit_params_deprecation() as an example (see here).

jnothman

Also, please add an entry to doc/whats_new

jnothman · 2017-09-18T08:44:35Z

sklearn/manifold/t_sne.py

+    metric : string, or callable, optional, default 'euclidean'
+        Which metric to use for computing pairwise distances between samples
+        from the original input space. If metric is 'precomputed', X must be a
+        matrix of pairwise distances. Otherwise, see the documentation of


Can you say "pairwise distances or squared distances" (either will work, as this function only uses their rank, not their value).
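The claim that either form works can be checked directly: squaring is monotonic on non-negative values, so sorting neighbours by distance or by squared distance gives the same order. A small pure-Python sketch (the helper names are illustrative, not scikit-learn API), using integer coordinates so that the squared distances are exact:

```python
import math
import random


def neighbour_order(dist_row, i):
    """Indices of all points except i, nearest first; ties broken by index."""
    others = [j for j in range(len(dist_row)) if j != i]
    return sorted(others, key=lambda j: (dist_row[j], j))


def orders_agree(X):
    """True if distances and squared distances give identical neighbour orders."""
    sq = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in X] for p in X]
    dist = [[math.sqrt(v) for v in row] for row in sq]
    return all(
        neighbour_order(dist[i], i) == neighbour_order(sq[i], i)
        for i in range(len(X))
    )


rng = random.Random(0)
X = [[rng.randint(0, 20), rng.randint(0, 20)] for _ in range(30)]
print(orders_agree(X))  # True
```

Since only these orders enter the trustworthiness score, passing squared distances saves a square root without changing the result.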

jnothman · 2017-09-18T08:48:23Z

sklearn/manifold/tests/test_t_sne.py

+    # If a 'metric' different from 'precomputed' is specified, but the flag
+    # 'precomputed' is set to True, the flag overwrites the parameter 'metric'
+    # and so the 'precomputed' metric will be used. Indeed, in
+    # http://scikit-learn.org/stable/developers/contributing.html#deprecation,


I don't think we need this reminder... But yes, this is what we tend to do.

jnothman · 2017-09-18T08:48:43Z

sklearn/manifold/tests/test_t_sne.py

+    # and so the 'precomputed' metric will be used. Indeed, in
+    # http://scikit-learn.org/stable/developers/contributing.html#deprecation,
+    # the old parameter overwrites the new parameter.
+    assert_raises(Exception, assert_warns, DeprecationWarning,


Usually we would not test for raising something as broad as Exception. Indeed, usually we'd check for a particular error message.
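A sketch of the narrower style of test suggested here, checking both the exception type and a fragment of its message. `check_precomputed` and its error text are hypothetical stand-ins, not scikit-learn's actual validation message; in a pytest-based suite the same check could be written with `pytest.raises(ValueError, match=...)`.

```python
import unittest


def check_precomputed(D):
    """Hypothetical validator: precomputed input must be a square matrix."""
    n_rows = len(D)
    if any(len(row) != n_rows for row in D):
        raise ValueError("X should be a square distance matrix when "
                         "metric='precomputed'")
    return D


class TestPrecomputedValidation(unittest.TestCase):
    def test_rejects_feature_matrix(self):
        # 50 x 2 feature vectors, like np.arange(100).reshape(50, 2)
        X = [[i, i + 1] for i in range(0, 100, 2)]
        # Assert the specific type *and* message, not a bare Exception.
        with self.assertRaisesRegex(ValueError, "square distance matrix"):
            check_precomputed(X)

    def test_accepts_square_matrix(self):
        D = [[0.0, 1.0], [1.0, 0.0]]
        self.assertIs(check_precomputed(D), D)
```

Run it with `python -m unittest` as usual; matching on the message catches the case where some unrelated error (such as an `IndexError` from bad indexing) happens to be raised instead.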

jnothman · 2017-09-18T08:50:42Z

sklearn/manifold/tests/test_t_sne.py

+    # Other metrics than 'euclidean' and 'precomputed' are unusual and must
+    # raise a warning
+    X = np.arange(100).reshape(50, 2)
+    assert_warns(RuntimeWarning, trustworthiness, X, X, metric='manhattan')


I actually strongly believe we should not raise a warning here or make any statement that this is abnormal. Even if deriving the manifold from an unusual metric is weird, checking that neighborhoods are maintained in the embedded space is precisely how we determine whether the learnt manifold was appropriate.

jnothman · 2017-09-19T07:58:15Z

doc/whats_new/v0.20.rst

+  ``metric`` should be used with any compatible metric including
+  'precomputed', in which case the input matrix ``X`` should be a matrix of
+  pairwise distances or squared distances. :issue:`9736` by
+  :user:`Joel Nothman <jnothman>`.


This should be your name, and the issue number should be 9775.

massich · 2017-09-19

@jnothman 9775 is the ID of the PR, not the issue. Do we refer to the issue or to the PR? Here we are referring to the issue. Is there a defined criterion? If so, I think we should add it to the contributing guide here and make it part of the reviewing checklist discussed in #9653.

jnothman · 2017-09-19T07:59:16Z

sklearn/manifold/tests/test_t_sne.py

+    # 'precomputed'
+    X = np.array([[1, 1], [1, 0], [2, 2]])
+    assert_almost_equal(trustworthiness(X, X, n_neighbors=1, metric='cosine'),
+                        0.66, decimal=1)


0.66 appears to be drawn from thin air. Instead, benchmark against the corresponding precomputed distances.
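One way to benchmark without a magic constant is to compare the callable-metric path with the same distances passed as `metric='precomputed'`. The sketch below is a pure-Python re-implementation of the trustworthiness formula for illustration only (not the scikit-learn code); `cosine_distance` and `pairwise` are hypothetical helpers, and it needs Python 3.8+ for `math.dist`. Random, continuous coordinates are used so that no two neighbours are equidistant.

```python
import math
import random


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


def pairwise(X, metric):
    """Full n x n matrix of metric(X[i], X[j])."""
    return [[metric(p, q) for q in X] for p in X]


def trustworthiness(X, X_embedded, n_neighbors=5, metric="euclidean"):
    """Illustrative version; metric is 'precomputed', 'euclidean' or a callable."""
    if metric == "precomputed":
        dist_X = X  # X is already an n x n distance matrix
    elif metric == "euclidean":
        dist_X = pairwise(X, math.dist)
    else:
        dist_X = pairwise(X, metric)
    dist_E = pairwise(X_embedded, math.dist)  # embedded space: Euclidean
    n, k = len(dist_X), n_neighbors
    penalty = 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Rank of every other point in the input space (nearest = 1).
        order_X = sorted(others, key=lambda j: (dist_X[i][j], j))
        rank_X = {j: r for r, j in enumerate(order_X, start=1)}
        # k nearest neighbours in the embedding.
        nearest_E = sorted(others, key=lambda j: (dist_E[i][j], j))[:k]
        # Penalise embedded neighbours that are not input-space neighbours.
        penalty += sum(rank_X[j] - k for j in nearest_E if rank_X[j] > k)
    return 1.0 - penalty * 2.0 / (n * k * (2.0 * n - 3.0 * k - 1.0))


rng = random.Random(42)
X = [[rng.uniform(0.1, 1.0), rng.uniform(0.1, 1.0)] for _ in range(20)]
X_embedded = [[p[0]] for p in X]  # a crude 1-D projection

D = pairwise(X, cosine_distance)
score_callable = trustworthiness(X, X_embedded, n_neighbors=3,
                                 metric=cosine_distance)
score_precomputed = trustworthiness(D, X_embedded, n_neighbors=3,
                                    metric="precomputed")
print(score_callable == score_precomputed)  # True
```

The expected value is then derived from the precomputed path rather than hard-coded, so the test stays meaningful if the data changes.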

jnothman · 2017-09-19T08:38:18Z

Usually the PR. Feel free to contribute it to the contributing guide.

jnothman · 2017-09-19T08:39:19Z

Having the PR there makes it easier for us to see how well the changelog covers the set of commits in a release.

jnothman

Otherwise LGTM

jnothman · 2017-09-19T23:02:57Z

sklearn/manifold/t_sne.py

+                      "0.20 and will be removed in 0.22. See 'metric' "
+                      "parameter instead.", DeprecationWarning)
+        metric = 'precomputed'
+    if metric == 'precomputed':


I think pairwise_distances handles 'precomputed', so we don't need a special case here.

wdevazelhes · 2017-09-20

Thanks, I will change this.

jnothman · 2017-09-19T23:03:44Z

sklearn/manifold/t_sne.py

        dist_X = X
+    elif metric == 'euclidean':
+        dist_X = pairwise_distances(X, metric='euclidean', squared=True)


I similarly don't think it's essential to handle Euclidean specially, but at least it saves some computation.

wdevazelhes · 2017-09-20

I will change this too.
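The simplification agreed here can be sketched with a single distance entry point that passes `'precomputed'` input through and only special-cases Euclidean to skip the square root, which is safe because trustworthiness only uses neighbour ranks. `pairwise_distances_sketch` and `input_distances` are hypothetical pure-Python stand-ins for the behaviour the review attributes to scikit-learn's `pairwise_distances`, not its implementation.

```python
import math


def pairwise_distances_sketch(X, metric="euclidean", squared=False):
    """Hypothetical single entry point for input-space distances.

    'precomputed' returns X unchanged after a shape check, so callers need
    no special case; 'euclidean' honours squared=True; any other metric is
    assumed to be a callable taking two points.
    """
    if metric == "precomputed":
        if any(len(row) != len(X) for row in X):
            raise ValueError("precomputed distances must form a square matrix")
        return X
    if metric == "euclidean":
        def func(p, q):
            s = sum((a - b) ** 2 for a, b in zip(p, q))
            return s if squared else math.sqrt(s)
    else:
        func = metric
    return [[func(p, q) for q in X] for p in X]


def input_distances(X, metric="euclidean"):
    # Squared Euclidean preserves neighbour ranks and skips the square root.
    if metric == "euclidean":
        return pairwise_distances_sketch(X, metric="euclidean", squared=True)
    return pairwise_distances_sketch(X, metric=metric)


points = [[0, 0], [3, 4]]
print(input_distances(points))  # [[0, 25], [25, 0]]
```

Keeping the Euclidean branch is purely an optimisation; dropping it would give different distance values but the same neighbour ranking.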

GaelVaroquaux · 2018-04-18T08:56:12Z

LGTM. +1 for merge.

Will merge once the merge conflicts are resolved!

wdevazelhes · 2018-04-25T14:10:31Z

Thanks ! I just resolved the conflicts

jnothman · 2018-04-25T22:03:09Z

doc/whats_new/v0.20.rst

+Decomposition, manifold learning and clustering
+
+- Deprecate ``precomputed`` parameter in function
+  :func:`manifold.t_sne.trustworthiness`. Instead, the new parameter


If we have a separate entry for enhancement it should be written as such. For example, trustworthiness now accepts a metric other than Euclidean.

glemaitre · 2018-04-26T08:30:29Z

@wdevazelhes I made the what's new changes and changed a NOTE to a FIXME. I think we have a better chance of finding it during maintenance.

Merging now. Thanks!!!

wdevazelhes · 2018-04-26T11:02:46Z

Thanks !

William de Vazelhes added 2 commits September 14, 2017 14:06

Allow custom metric for sklearn.manifold.t_sne.trustworthiness scikit-learn#9736

c632f39

Fix the position of the deprecation in docstring.

87036af

massich reviewed Sep 15, 2017

William de Vazelhes added 2 commits September 15, 2017 17:49

- Adds tests for deprecation and warnings

0c26be6

- Correct typos in output messages

Fix almost equal error to be compatible with numpy 1.8.0

96d3221

jnothman reviewed Sep 18, 2017

Update commit after comments:

a00a28d

- indicate that one can use pairwise squared distances instead of distances - remove useless reminder - remove warning if metric used is unusual - update doc/whats_new

jnothman reviewed Sep 19, 2017

Modifications after comments:

1e34405

- change user name and issue number in whats_new - improve test of metric different than euclidean and precomputed

jnothman reviewed Sep 19, 2017

jnothman changed the title ~~[MRG] Fix trustworthiness custom metric~~ [MRG+1] Fix trustworthiness custom metric Sep 19, 2017

jnothman mentioned this pull request Sep 19, 2017

sklearn.manifold.t_sne.trustworthiness should allow custom metric #9736

Closed

Simplify calling pairwise_distances function

a080d18

- Also, the error thrown when passing an array of vectors instead of pairwise distances when 'precomputed' is used is not the same, so I changed the IndexError into ValueError in test_trustworthiness_precomputed_deprecation

GaelVaroquaux changed the title ~~[MRG+1] Fix trustworthiness custom metric~~ [MRG+2] Fix trustworthiness custom metric Apr 18, 2018

William de Vazelhes added 2 commits April 25, 2018 14:55

Merge branch 'master' into 9736-trustworthiness-custom-metric

3bbcbaf

# Conflicts: # doc/whats_new/v0.20.rst # sklearn/manifold/t_sne.py

Make random points in trustworthiness to avoid problems with equidistant KNeighbors

609ebfe

jnothman reviewed Apr 25, 2018

glemaitre added 2 commits April 26, 2018 10:27

Update v0.20.rst

c2fb22e

nitpicks

351a9ae

glemaitre merged commit 4aaf45b into scikit-learn:master Apr 26, 2018
