[MRG+2] Fix trustworthiness custom metric #9775

Merged
@@ -224,6 +224,15 @@ Linear, kernelized and related models
   underlying implementation is not random.
   :issue:`9497` by :user:`Albert Thomas <albertcthomas>`.

+Decomposition, manifold learning and clustering
+
+- Deprecate ``precomputed`` parameter in function
+  :func:`manifold.t_sne.trustworthiness`. Instead, the new parameter
+  ``metric`` should be used with any compatible metric including
+  'precomputed', in which case the input matrix ``X`` should be a matrix of
+  pairwise distances or squared distances. :issue:`9775` by
+  :user:`William de Vazelhes <wdevazelhes>`.
+
 Utils

 - Avoid copying the data in :func:`utils.check_array` when the input data is a

Review comment on the new entry by @jnothman (Member), Apr 25, 2018:

If we have a separate entry for enhancement it should be written as such. For example, trustworthiness now accepts a metric other than Euclidean.
@@ -466,6 +475,15 @@ Linear, kernelized and related models
   :class:`linear_model.logistic.LogisticRegression` when ``verbose`` is set to 0.
   :issue:`10881` by :user:`Alexandre Sevin <AlexandreSev>`.

+Decomposition, manifold learning and clustering
+
+- Deprecate ``precomputed`` parameter in function
+  :func:`manifold.t_sne.trustworthiness`. Instead, the new parameter
+  ``metric`` should be used with any compatible metric including
+  'precomputed', in which case the input matrix ``X`` should be a matrix of
+  pairwise distances or squared distances. :issue:`9775` by
+  :user:`William de Vazelhes <wdevazelhes>`.
+
 Metrics

 - Deprecate ``reorder`` parameter in :func:`metrics.auc` as it's no longer required
@@ -9,6 +9,7 @@
 # http://cseweb.ucsd.edu/~lvdmaaten/workshops/nips2010/papers/vandermaaten.pdf
 from __future__ import division

+import warnings
 from time import time
 import numpy as np
 from scipy import linalg
@@ -394,7 +395,8 @@ def _gradient_descent(objective, p0, it, n_iter,
     return p, error, i


-def trustworthiness(X, X_embedded, n_neighbors=5, precomputed=False):
+def trustworthiness(X, X_embedded, n_neighbors=5,
+                    precomputed=False, metric='euclidean'):
     r"""Expresses to what extent the local structure is retained.

     The trustworthiness is within [0, 1]. It is defined as
@@ -431,15 +433,28 @@ def trustworthiness(X, X_embedded, n_neighbors=5, precomputed=False):
     precomputed : bool, optional (default: False)
         Set this flag if X is a precomputed square distance matrix.

+        .. deprecated:: 0.20
+            ``precomputed`` has been deprecated in version 0.20 and will be
+            removed in version 0.22. Use ``metric`` instead.
+
+    metric : string, or callable, optional, default 'euclidean'
+        Which metric to use for computing pairwise distances between samples
+        from the original input space. If metric is 'precomputed', X must be a
+        matrix of pairwise distances or squared distances. Otherwise, see the
+        documentation of argument metric in sklearn.metrics.pairwise_distances
+        for a list of available metrics.
+
     Returns
     -------
     trustworthiness : float
         Trustworthiness of the low-dimensional embedding.
     """
     if precomputed:
-        dist_X = X
-    else:
-        dist_X = pairwise_distances(X, squared=True)
+        warnings.warn("The flag 'precomputed' has been deprecated in version "
+                      "0.20 and will be removed in 0.22. See 'metric' "
+                      "parameter instead.", DeprecationWarning)
+        metric = 'precomputed'
+    dist_X = pairwise_distances(X, metric=metric)
     ind_X = np.argsort(dist_X, axis=1)
     ind_X_embedded = NearestNeighbors(n_neighbors).fit(X_embedded).kneighbors(
         return_distance=False)
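The new ``if precomputed:`` block follows a common deprecation pattern: the old boolean is translated into the new parameter value before any work is done, so the rest of the function only ever looks at ``metric``. A minimal, standard-library-only sketch of just that translation step is below; ``resolve_metric`` is a hypothetical name, not part of scikit-learn, and the warning text is copied from the diff.

```python
import warnings


def resolve_metric(metric="euclidean", precomputed=False):
    """Hypothetical helper mirroring the ``if precomputed:`` block in the diff.

    The deprecated boolean flag keeps working until it is removed in 0.22,
    but it emits a DeprecationWarning and forces ``metric='precomputed'``,
    overriding any ``metric`` passed alongside it.
    """
    if precomputed:
        warnings.warn("The flag 'precomputed' has been deprecated in version "
                      "0.20 and will be removed in 0.22. See 'metric' "
                      "parameter instead.", DeprecationWarning)
        metric = "precomputed"
    return metric


# Record warnings instead of printing them, so the behavior can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(resolve_metric("euclidean", precomputed=True))  # precomputed
    print(resolve_metric("cosine"))                       # cosine
print([w.category.__name__ for w in caught])  # ['DeprecationWarning']
```

Because the flag overrides ``metric``, the test that passes ``metric='euclidean'`` together with ``precomputed=True`` still treats ``X`` as a distance matrix: it returns 1.0 when given ``pairwise_distances(X)`` and raises ``ValueError`` when given the raw (100, 2) data, which is not a valid precomputed matrix.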
@@ -14,6 +14,8 @@
 from sklearn.utils.testing import assert_greater
 from sklearn.utils.testing import assert_raises_regexp
 from sklearn.utils.testing import assert_in
+from sklearn.utils.testing import assert_warns
+from sklearn.utils.testing import assert_raises
 from sklearn.utils.testing import skip_if_32bit
 from sklearn.utils import check_random_state
 from sklearn.manifold.t_sne import _joint_probabilities
@@ -288,11 +290,39 @@ def test_preserve_trustworthiness_approximately_with_precomputed_distances():
                     early_exaggeration=2.0, metric="precomputed",
                     random_state=i, verbose=0)
         X_embedded = tsne.fit_transform(D)
-        t = trustworthiness(D, X_embedded, n_neighbors=1,
-                            precomputed=True)
+        t = trustworthiness(D, X_embedded, n_neighbors=1, metric="precomputed")
         assert t > .95


+def test_trustworthiness_precomputed_deprecation():
+    # NOTE: Remove this test in v0.22
+
+    # Use of the flag `precomputed` in trustworthiness parameters has been
+    # deprecated, but will still work until v0.22.
+    random_state = check_random_state(0)
+    X = random_state.randn(100, 2)
+    assert_equal(assert_warns(DeprecationWarning, trustworthiness,
+                              pairwise_distances(X), X, precomputed=True), 1.)
+    assert_equal(assert_warns(DeprecationWarning, trustworthiness,
+                              pairwise_distances(X), X, metric='precomputed',
+                              precomputed=True), 1.)
+    assert_raises(ValueError, assert_warns, DeprecationWarning,
+                  trustworthiness, X, X, metric='euclidean', precomputed=True)
+    assert_equal(assert_warns(DeprecationWarning, trustworthiness,
+                              pairwise_distances(X), X, metric='euclidean',
+                              precomputed=True), 1.)
+
+
+def test_trustworthiness_not_euclidean_metric():
+    # Test trustworthiness with a metric different from 'euclidean' and
+    # 'precomputed'
+    random_state = check_random_state(0)
+    X = random_state.randn(100, 2)
+    assert_equal(trustworthiness(X, X, metric='cosine'),
+                 trustworthiness(pairwise_distances(X, metric='cosine'), X,
+                                 metric='precomputed'))
+
+
 def test_early_exaggeration_too_small():
     # Early exaggeration factor must be >= 1.
     tsne = TSNE(early_exaggeration=0.99)