[MRG] DOC Fix error in documentation of trustworthiness #9800
@tomMoral any opinion on this since you have worked on t-SNE recently?
I think the key to get across is that any unexpected nearest neighbors in the embedded space are penalised in proportion to their rank in the original space.
sklearn/manifold/t_sne.py
Outdated
@@ -387,10 +387,10 @@ def trustworthiness(X, X_embedded, n_neighbors=5, precomputed=False):
T(k) = 1 - \frac{2}{nk (2n - 3k - 1)} \sum^n_{i=1}
\sum_{j \in U^{(k)}_i} (r(i, j) - k)

where :math:`r(i, j)` is the rank of the embedded datapoint j
I think there's some confusion in "according to", and r is not described as a function of i. Can you write it in your own words? Use multiple sentences. Perhaps describe U before you describe r. You can also say "for each sample i" to make it simpler
I agree that U should be moved before r, and the "for each" should help define things.
Perhaps you can use "input space" instead of "original space" and insist on the fact that the data sample j is the r(i, j)-th neighbor of the data sample i.
Thanks for the comments. I tried to include them in the new commit. I also added @jnothman's summary of the function to the docstring; tell me if it seems good to you.
LGTM. It is way clearer with this phrasing.
sklearn/manifold/t_sne.py
Outdated
is the set of points that are in the k nearest neighbors in the embedded
space but not in the original space.
where for each sample i, :math:`U^{(k)}_i` are all samples j that are in
the k-nearest neighbor of i in the embedded space but are the :math:`r(i,
Maybe using "output space" would be clearer? Input/output seems clearer when talking about mappings.
Definitely, I will do so.
sklearn/manifold/t_sne.py
Outdated
the k-nearest neighbor of i in the embedded space but are the :math:`r(i,
j)`-th nearest neighbor of i in the input space with r(i, j) > k. In other
words, any unexpected nearest neighbors in the embedded space are penalised
in proportion to their rank in the input space.
I really like this summary! 👍
I agree 👍
sklearn/manifold/t_sne.py
Outdated
:math:`U^{(k)}_i` is the set of points that are in the k nearest
neighbors in the embedded space but not in the original space.
where for each sample i, :math:`U^{(k)}_i` are all samples j that are in
the k-nearest neighbor of i in the output space but are the :math:`r(i,
I still find this difficult. Can we just get rid of U from above and just have max(0, r(i, j) - k) or max(k, r(i, j)) - k?
Sorry to be annoying. I'm now thinking that the problem with U was that it expressed two things: being in i's embedded neighborhood and being outside of i's original neighborhood. It's now a bit weird that we don't have a function for "the k-neighborhood of i in embedded space".
Otherwise, I think this is a vast improvement. Thank you.
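For reference, the max-based form suggested above could be written as follows. This is only a sketch: the symbol :math:`\mathcal{N}^{(k)}_i` for the k nearest neighbors of i in the embedded space is an assumption introduced here for illustration, since the docstring under review does not define one.

```latex
% Sketch only: \mathcal{N}^{(k)}_i is a hypothetical symbol for the
% k nearest neighbors of sample i in the embedded space.
T(k) = 1 - \frac{2}{nk (2n - 3k - 1)} \sum^n_{i=1}
    \sum_{j \in \mathcal{N}^{(k)}_i} \max(0, r(i, j) - k)
```

Here r(i, j) is the rank of j among the neighbors of i in the input space, so a neighbor with r(i, j) <= k contributes nothing and the sum matches the one over :math:`U^{(k)}_i`.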
sklearn/manifold/t_sne.py
Outdated
according to the pairwise distances between the embedded datapoints,
:math:`U^{(k)}_i` is the set of points that are in the k nearest
neighbors in the embedded space but not in the original space.
where for each sample i, j is among its k nearest neighbors in the output
output -> embedded?
I made this change according to @tomMoral's comment:
Maybe using "output space" would be clearer? Input/output seems clearer when talking about mappings.
Both ways seem clear to me: "embedded" is more precise, but maybe "output" is clear enough in this case, and maybe simpler?
I'm not too fussed either way
No problem, I agree; I will fix that.
wonderful
Reference Issue
Fixes #9799
What does this implement/fix? Explain your changes.
It fixes an error in the docstring of function manifold.t_sne.trustworthiness: the rank in the formula should be the rank in the original input space.
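To make the corrected definition concrete, here is a minimal NumPy sketch of the score with the rank taken in the input space. It is an illustration, not the code in sklearn/manifold/t_sne.py: the name `trustworthiness_sketch` is hypothetical, Euclidean distances are assumed, and ties are broken by whatever order `np.argsort` returns.

```python
import numpy as np


def trustworthiness_sketch(X, X_embedded, n_neighbors=5):
    """Illustrative trustworthiness score (hypothetical helper, not sklearn's)."""
    n = X.shape[0]
    k = n_neighbors

    def pairwise_sq_dists(A):
        # Squared Euclidean distance between every pair of rows.
        diff = A[:, None, :] - A[None, :, :]
        return np.sum(diff ** 2, axis=-1)

    d_in = pairwise_sq_dists(X)
    d_out = pairwise_sq_dists(X_embedded)
    # Push each sample to the end of its own neighbor list.
    np.fill_diagonal(d_in, np.inf)
    np.fill_diagonal(d_out, np.inf)

    # rank_in[i, j] is r(i, j): the 1-based rank of j among the
    # neighbors of i in the INPUT space.
    order_in = np.argsort(d_in, axis=1)
    rows = np.arange(n)[:, None]
    rank_in = np.empty((n, n), dtype=int)
    rank_in[rows, order_in] = np.arange(1, n + 1)[None, :]

    # The k nearest neighbors of each sample in the embedded space.
    nn_out = np.argsort(d_out, axis=1)[:, :k]

    # Penalise only neighbors whose input-space rank exceeds k.
    penalties = np.maximum(0, rank_in[rows, nn_out] - k)
    return 1.0 - 2.0 / (n * k * (2.0 * n - 3.0 * k - 1.0)) * penalties.sum()
```

Passing the same array as both `X` and `X_embedded` leaves every embedded neighbor with an input-space rank of at most k, so no penalty is applied and the score is 1.0.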