New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: implemented kendall tau distance #7205

Open
wants to merge 4 commits into
base: master
from

Conversation

Projects
None yet
6 participants
@evgenyzhurko
Contributor

evgenyzhurko commented Mar 20, 2017

#7089
I put the code of function in scipy.stats, is it right?
What do you think about long name of the method, make it shorter or leave as is?
And is it required nan_policy(raise, propagate, omit) option?

@rgommers

This comment has been minimized.

Member

rgommers commented Mar 26, 2017

Thanks @evgenyzhurko. scipy.stats looks like the right place.

@josef-pkt @ev-br on gh-7089 there's was no discussion about whether Kendall tau rank distance is worth adding. I'm +0.5, seems common enough.

Could make sense to call it rank_distance and allow other metrics within that same function (e.g. https://arxiv.org/abs/1207.2541)

@josef-pkt

This comment has been minimized.

Member

josef-pkt commented Mar 26, 2017

I'm also +0.5 but haven't looked at the details (I'm in different neighborhoods at the moment.)

Reasonably good speed is important for these kind of functions. (I don't have problems writing computationally inefficient functions as a user, and having access to good library functions are very useful.)

In general I think distance measures are fine in stats, but the names should not be confusing
http://jpktd.blogspot.ca/2012/06/non-linear-dependence-measures-distance.html
(once upon a time I was writing blog posts)

@ev-br

This comment has been minimized.

Member

ev-br commented Mar 27, 2017

@rgommers

This comment has been minimized.

Member

rgommers commented Mar 28, 2017

Nice blog post Josef, and I fully agree with "names should not be confusing". I can't make out though which (if any) of {kendalltau_distance, rank_distance} you find confusing. Can you clarify?

@evgenyzhurko

This comment has been minimized.

Contributor

evgenyzhurko commented Apr 10, 2017

@rgommers Is it will be good if i'll create function rank_distance with optoinal param method='kendall' or 'spearman' and put this fucntion in stats module? And how i can be with two type of kendall distance:

  1. Kendall tau rank distance between two list(total number of discordant pairs) Example Wikipedia
  2. Kendall tau rank distance between two rankings. "The Kendall tau distance between two rankings is the number of pairs that are in different order in the two rankings. For example, the Kendall tau distance between 0 3 1 6 2 5 4 and 1 0 3 6 4 2 5 is four because the pairs 0-1, 3-1, 2-4, 5-4 are in different order in the two rankings, but all other pairs are in the same order." (from Wikipedia).

Link https://arxiv.org/abs/1207.2541 say about second type of Kendall method, so i'll plan to put second type into rank_distance. And where i can put first method, maybe scipy.spatial.distance?

@rgommers rgommers added this to the 1.1.0 milestone Sep 17, 2017

@pv pv modified the milestones: 1.1.0, 1.2.0 Apr 12, 2018

assert_raises(ValueError, stats.rank_distance, [1, 2, 3], [1, 2])
assert_raises(ValueError, stats.rank_distance, [1, 1, 1], [1, 2])
assert_raises(ValueError, stats.rank_distance, [1, 2, 3], [1, 2])
assert_raises(ValueError, stats.rank_distance, [1, 2], [1, 2],

This comment has been minimized.

@tylerjereddy

tylerjereddy Oct 25, 2018

Contributor

5 repeated assertions with similar signatures may justify the use of a context manager with parametrization

x, y: array-like
1-D permutations of [1..N] vector
weights: array-like, optional
1-D array of weights. Default None equals to unit weights.

This comment has been minimized.

@tylerjereddy

tylerjereddy Oct 25, 2018

Contributor
Suggested change Beta
1-D array of weights. Default None equals to unit weights.
1-D array of weights. Default `None` is equivalent to unit weights.
------
distance: float
Distance between x and y.

This comment has been minimized.

@tylerjereddy

tylerjereddy Oct 25, 2018

Contributor

Docstring may benefit from i.e., a Notes section briefly describing the source of the algorithm(s). I see you cited a paper in the discussion.

weights = np.ones(x.size - 1)
else:
weights = np.asarray(weights)
if weights.size < (x.size - 1):

This comment has been minimized.

@tylerjereddy

tylerjereddy Oct 25, 2018

Contributor

Two questions here:

  1. What if weights.size is larger than x.size?
  2. Should it be obvious why x.size - 1 instead of just x.size is used? Maybe something simple that I'm missing while reviewing late at night
@tylerjereddy

This comment has been minimized.

Contributor

tylerjereddy commented Nov 6, 2018

Lots of things need to be fleshed out here and not much activity since 2017 so I'm bumping the milestone again.

Also, look like performance is an important concern here and Cythonization looks like it could be helpful for some of those loops I see added.

@tylerjereddy tylerjereddy modified the milestones: 1.2.0, 1.3.0 Nov 6, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment