Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+2] Fix trustworthiness custom metric #9775

Merged
Copy path View file
@@ -65,6 +65,15 @@ Linear, kernelized and related models
underlying implementation is not random.
:issue:`9497` by :user:`Albert Thomas <albertcthomas>`.

Decomposition, manifold learning and clustering

- Deprecate ``precomputed`` parameter in function
:func:`manifold.t_sne.trustworthiness`. Instead, the new parameter

This comment has been minimized.

Copy link
@jnothman

jnothman Apr 25, 2018

Member

If we have a separate entry for enhancement it should be written as such. For example, trustworthiness now accepts a metric other than Euclidean.

``metric`` should be used with any compatible metric including
'precomputed', in which case the input matrix ``X`` should be a matrix of
pairwise distances or squared distances. :issue:`9736` by
:user:`Joel Nothman <jnothman>`.

Bug fixes
.........

@@ -100,3 +109,12 @@ Linear, kernelized and related models
- Deprecate ``random_state`` parameter in :class:`svm.OneClassSVM` as the
underlying implementation is not random.
:issue:`9497` by :user:`Albert Thomas <albertcthomas>`.

Decomposition, manifold learning and clustering

- Deprecate ``precomputed`` parameter in function
:func:`manifold.t_sne.trustworthiness`. Instead, the new parameter
``metric`` should be used with any compatible metric including
'precomputed', in which case the input matrix ``X`` should be a matrix of
pairwise distances or squared distances. :issue:`9736` by
:user:`Joel Nothman <jnothman>`.

This comment has been minimized.

Copy link
@jnothman

jnothman Sep 19, 2017

Member

This should be your name and the issue number should be 9775

This comment has been minimized.

Copy link
@massich

massich Sep 19, 2017

Contributor

@jnothman 9775 is the id of the PR not the issue. Do we refer to the issue or to the PR? here we are referring to the issue. Is there a defined criteria? If so, I think we should add it in the contributing guide here and be part of the reviewing checklist discussed in #9653

Copy path View file
@@ -8,6 +8,7 @@
# * Fast Optimization for t-SNE:
# http://cseweb.ucsd.edu/~lvdmaaten/workshops/nips2010/papers/vandermaaten.pdf

import warnings
from time import time
import numpy as np
from scipy import linalg
@@ -377,7 +378,8 @@ def _gradient_descent(objective, p0, it, n_iter,
return p, error, i


def trustworthiness(X, X_embedded, n_neighbors=5, precomputed=False):
def trustworthiness(X, X_embedded, n_neighbors=5,
precomputed=False, metric='euclidean'):
"""Expresses to what extent the local structure is retained.
The trustworthiness is within [0, 1]. It is defined as
@@ -413,15 +415,33 @@ def trustworthiness(X, X_embedded, n_neighbors=5, precomputed=False):
precomputed : bool, optional (default: False)
Set this flag if X is a precomputed square distance matrix.
..deprecated:: 0.20
``precomputed`` has been deprecated in version 0.20 and will be
removed in version 0.22. Use ``metric`` instead.
metric : string, or callable, optional, default 'euclidean'
Which metric to use for computing pairwise distances between samples
from the original input space. If metric is 'precomputed', X must be a
matrix of pairwise distances or squared distances. Otherwise, see the
documentation of argument metric in sklearn.pairwise.pairwise_distances
for a list of available metrics.
Returns
-------
trustworthiness : float
Trustworthiness of the low-dimensional embedding.
"""
if precomputed:
warnings.warn("The flag 'precomputed' has been deprecated in version "
"0.20 and will be removed in 0.22. See 'metric' "
"parameter instead.", DeprecationWarning)
metric = 'precomputed'
if metric == 'precomputed':

This comment has been minimized.

Copy link
@jnothman

jnothman Sep 19, 2017

Member

I think pairwise_distances handled precomputed, so we don't need a special case here

This comment has been minimized.

Copy link
@wdevazelhes

wdevazelhes Sep 20, 2017

Author Contributor

Thanks, I will change this

dist_X = X
elif metric == 'euclidean':
dist_X = pairwise_distances(X, metric='euclidean', squared=True)

This comment has been minimized.

Copy link
@jnothman

jnothman Sep 19, 2017

Member

I similarly don't think it's essential to specially handle Euclidean, but at least it saves some computation

This comment has been minimized.

Copy link
@wdevazelhes

wdevazelhes Sep 20, 2017

Author Contributor

I will change this too

else:
dist_X = pairwise_distances(X, squared=True)
dist_X = pairwise_distances(X, metric=metric)
dist_X_embedded = pairwise_distances(X_embedded, squared=True)
ind_X = np.argsort(dist_X, axis=1)
ind_X_embedded = np.argsort(dist_X_embedded, axis=1)[:, 1:n_neighbors + 1]
@@ -14,6 +14,8 @@
from sklearn.utils.testing import assert_greater
from sklearn.utils.testing import assert_raises_regexp
from sklearn.utils.testing import assert_in
from sklearn.utils.testing import assert_warns
from sklearn.utils.testing import assert_raises
from sklearn.utils.testing import skip_if_32bit
from sklearn.utils import check_random_state
from sklearn.manifold.t_sne import _joint_probabilities
@@ -286,11 +288,36 @@ def test_preserve_trustworthiness_approximately_with_precomputed_distances():
early_exaggeration=2.0, metric="precomputed",
random_state=i, verbose=0)
X_embedded = tsne.fit_transform(D)
t = trustworthiness(D, X_embedded, n_neighbors=1,
precomputed=True)
t = trustworthiness(D, X_embedded, n_neighbors=1, metric="precomputed")
assert t > .95


def test_trustworthiness_precomputed_deprecation():
# NOTE: Remove this test in v0.23

# Use of the flag `precomputed` in trustworthiness parameters has been
# deprecated, but will still work until v0.23.
X = np.arange(100).reshape(50, 2)
assert_equal(assert_warns(DeprecationWarning, trustworthiness,
pairwise_distances(X), X, precomputed=True), 1.)
assert_equal(assert_warns(DeprecationWarning, trustworthiness,
pairwise_distances(X), X, metric='precomputed',
precomputed=True), 1.)
assert_raises(IndexError, assert_warns, DeprecationWarning,
trustworthiness, X, X, metric='euclidean', precomputed=True)
assert_equal(assert_warns(DeprecationWarning, trustworthiness,
pairwise_distances(X), X, metric='euclidean',
precomputed=True), 1.)


def test_trustworthiness_not_euclidean_metric():
# Test trustworthiness with a metric different from 'euclidean' and
# 'precomputed'
X = np.array([[1, 1], [1, 0], [2, 2]])
assert_almost_equal(trustworthiness(X, X, n_neighbors=1, metric='cosine'),
0.66, decimal=1)

This comment has been minimized.

Copy link
@jnothman

jnothman Sep 19, 2017

Member

.66 appears drawn from thin air. Instead benchmark against the corresponding precomputed distances



def test_early_exaggeration_too_small():
# Early exaggeration factor must be >= 1.
tsne = TSNE(early_exaggeration=0.99)
ProTip! Use n and p to navigate between commits in a pull request.
You can’t perform that action at this time.