Update docs for the weighted τ #13224

merged 3 commits into from Jan 22, 2021
24 changes: 11 additions & 13 deletions scipy/stats/stats.py
@@ -4620,25 +4620,23 @@ def weightedtau(x, y, rank=True, weigher=None, additive=True):
unimportant elements [1]_.

The weighting is defined by means of a rank array, which assigns a
- nonnegative rank to each element, and a weigher function, which
- assigns a weight based from the rank to each element. The weight of an
- exchange is then the sum or the product of the weights of the ranks of
- the exchanged elements. The default parameters compute
- :math:`\tau_\mathrm h`: an exchange between elements with rank
- :math:`r` and :math:`s` (starting from zero) has weight
- :math:`1/(r+1) + 1/(s+1)`.
+ nonnegative rank to each element (higher importance ranks being
+ associated with smaller values, e.g., 0 is the highest possible rank),
+ and a weigher function, which assigns a weight based on the rank to
+ each element. The weight of an exchange is then the sum or the product
+ of the weights of the ranks of the exchanged elements. The default
+ parameters compute :math:`\tau_\mathrm h`: an exchange between
+ elements with rank :math:`r` and :math:`s` (starting from zero) has
+ weight :math:`1/(r+1) + 1/(s+1)`.

Specifying a rank array is meaningful only if you have in mind an
external criterion of importance. If, as it usually happens, you do
not have in mind a specific rank, the weighted :math:`\tau` is
defined by averaging the values obtained using the decreasing
lexicographical rank by (`x`, `y`) and by (`y`, `x`). This is the
- behavior with default parameters.
-
- Note that if you are computing the weighted :math:`\tau` on arrays of
- ranks, rather than of scores (i.e., a larger value implies a lower
- rank) you must negate the ranks, so that elements of higher rank are
- associated with a larger value.
Comment on lines -4638 to -4641
Contributor

Just to check my understanding - this comment was intended for users who were not passing in a rank argument directly, right? If the users were passing in their own rank, weightedtau would produce the same result whether the x and y were "scores" or "ranks", right?
Does this test that correctly?

```python
import numpy as np
from scipy.stats import weightedtau, kendalltau, rankdata

np.random.seed(0)

# "scores"
x = np.random.rand(10)
y = np.random.rand(10)

# "ranks", SciPy convention
x2 = rankdata(x)
y2 = rankdata(y)

# "ranks", opposite convention
x3 = 11 - x2
y3 = 11 - y2

rank = np.arange(10)

print(weightedtau(x, y, rank))
print(weightedtau(x2, y2, rank))
print(weightedtau(x3, y3, rank))
```

The outputs are identical.

So the sentiment of this comment is really "especially if you're not passing in your own ranks, pay attention to how this function calculates weights to make sure it makes sense for your data", right?

Contributor

@mdhaber mdhaber Dec 10, 2020

Another way to put it: neither kendalltau nor weightedtau inherently cares whether the data are specified as "scores" or "ranks" (or which convention for ranks is used), but the choice may affect the results of weightedtau because of the way it assigns weights by default.
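
A minimal sketch of this point, using made-up three-element arrays chosen only for illustration: kendalltau gives the same value for scores, rankdata-style ranks, and negated scores, while weightedtau with default parameters changes once the order of importance is flipped. Results are read with `[0]` rather than by attribute name, on the assumption that indexing works across SciPy versions.

```python
import numpy as np
from scipy.stats import kendalltau, rankdata, weightedtau

# Illustrative scores (hypothetical data, no ties): larger means more important.
x = np.array([0.9, 0.5, 0.1])
y = np.array([0.8, 0.2, 0.4])

# rankdata assigns 1 to the smallest score, so these ranks order the
# elements exactly as the scores do.
x_ranks, y_ranks = rankdata(x), rankdata(y)

# Unweighted tau: every out-of-order pair costs the same.
tau_scores = kendalltau(x, y)[0]
tau_ranks = kendalltau(x_ranks, y_ranks)[0]
tau_negated = kendalltau(-x, -y)[0]

# Weighted tau with default parameters: importance is induced by sorting
# the inputs, so negating both inputs reverses which elements weigh most.
wtau_scores = weightedtau(x, y)[0]
wtau_ranks = weightedtau(x_ranks, y_ranks)[0]
wtau_negated = weightedtau(-x, -y)[0]

print("kendalltau :", tau_scores, tau_ranks, tau_negated)
print("weightedtau:", wtau_scores, wtau_ranks, wtau_negated)
```

In this sketch the three kendalltau values agree, and weightedtau agrees between scores and rankdata-style ranks; only the negated inputs give a different weighted value.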

Contributor Author

Yes. If you pass a rank array as an external source of rank, you must follow the conventions of weightedtau(). Alternatively, you can compute the weighted τ between two rank arrays. In that case, you can pass the rank arrays as they are if they follow the "descending" convention, but you must negate them if they follow the "ascending" convention. It is a subtle point that the need for negation arises only because I'm assuming ascending ranks. Under the SciPy convention, however, you can pass arrays of ranks and they will work as scores. That's why I removed that part. I did add some clarification that an external rank source follows a different convention. BTW, I don't expect anybody to ever supply such a source in real-world applications.

If you change the sign of two score vectors, τ will not change, because all out-of-order pairs have the same cost. This is not true for a weighted τ if you do not provide an external source of rank, because exchanges between more important elements cost more and, lacking a source of rank, importance is induced by sorting the scores. If you pass an external rank array, you can change the sign of the score vectors and the weighted τ will not change, because only the relative order of the vectors is relevant; the importance of the elements is given by the rank array.
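
The invariance described above can be sketched as follows. The score arrays and the external rank array `importance` are hypothetical, chosen only for illustration; `importance` follows the weightedtau convention (0 marks the most important element) and is given dtype `np.intp` as a precaution, on the assumption that the underlying routine expects that integer type.

```python
import numpy as np
from scipy.stats import weightedtau

# Hypothetical scores with no ties.
x = np.array([0.9, 0.5, 0.1])
y = np.array([0.8, 0.2, 0.4])

# Hypothetical external source of rank: element 1 is the most important
# (rank 0), then element 2 (rank 1), then element 0 (rank 2).
importance = np.array([2, 0, 1], dtype=np.intp)

# With an external rank array, the weights are fixed by `importance`, so
# only the relative order of x and y matters.
with_rank = weightedtau(x, y, rank=importance)[0]
with_rank_negated = weightedtau(-x, -y, rank=importance)[0]

# Without one, importance comes from sorting the scores, so a sign change
# moves the heavy weights onto different elements.
default = weightedtau(x, y)[0]
default_negated = weightedtau(-x, -y)[0]

print("external rank:", with_rank, with_rank_negated)
print("default      :", default, default_negated)
```

The first pair of values should coincide, while the second pair should not.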

+ behavior with default parameters. Note that the convention used
+ here for ranking (lower values imply higher importance) is opposite
+ to that used by other SciPy statistical functions.

Parameters
----------