# Kendall tau distance #7089

Open
opened this Issue Feb 24, 2017 · 6 comments

Projects
None yet
6 participants

### erelsgl commented Feb 24, 2017 • edited

 I do not understand the output of scipy.stats.kendalltau. ``````>>> from scipy import stats >>> stats.kendalltau([1,3,2], [3,1,2]).correlation -1.0000000000000002 `````` Ignoring the rounding error, the correlation is -1, which means that the ranks are totally reversed. But, [1,3,2] and [3,1,2] are only one swap apart. So, shouldn't the correlation be 0.3333333, like here?: `````` >>> stats.kendalltau([1,3,2], [1,2,3]).correlation 0.33333333333333337 ``````
Member

### josef-pkt commented Feb 24, 2017 • edited

 Even if it is only one swap apart in the first example, the ranking is exactly reversed ``````>>> x = np.column_stack([[1,3,2], [3,1,2]]) >>> x[np.argsort(x[:,0])] array([[1, 3], [2, 2], [3, 1]]) `````` e.g. using pearson correlation ``````>>> np.corrcoef(x[np.argsort(x[:,0])].T) array([[ 1., -1.], [-1., 1.]]) ``````
Member

### josef-pkt commented Feb 24, 2017 • edited

 maybe that doesn't answer the issue, I'm thinking of spearman and don't have much intuition for kendall (edit: I guess there should be a theorem that if pearson correlation = 1 or -1, then it's a line and any other correlation should be the same 1 or -1)
Contributor

### mirca commented Feb 25, 2017

 Hi @erelsgl, the results looks correct. In your first example you have `x = [1, 3, 2]` and `y = [3, 1, 2]` which gives `#{discordant pairs} = 3`, `#{concordant pairs} = 0`, resulting in `tau = (0 - 3) / 3 = -1`. In the second example, `x = [1, 3, 2]` and `y = [1, 2, 3]`, so `#{discordant pairs} = 1 ` and `#{concordant pairs} = 2 `, hence `tau = (2 - 1) / 3 = 1 / 3`. Take a look at https://en.wikipedia.org/wiki/Kendall_rank_correlation_coefficient

### erelsgl commented Feb 25, 2017

 Hmmm... the Wikipedia page on https://en.wikipedia.org/wiki/Kendall_tau_distance says that "Kendall tau distance is also called bubble-sort distance since it is equivalent to the number of swaps that the bubble sort algorithm would make to place one list in the same order as the other list". The number of swaps in my example is obviously small (1), while the number of discordant pairs is large (3). They are not the same. Is this a mistake in the Wikipedia page?
Contributor

### evgenyzhurko commented Feb 27, 2017 • edited

 Kendall rank correlation coefficient and Kendall tau distance are the different measurement. @erelsgl say true information about Kendall tau distance. `kendall_tau_distance = m / (n * (n - 1) / 2)`, where `m` - pairs whose values are in opposite order and `n` - size of list. As a result, Kendall tau distance therefore lies in the interval `[0,1]`, because `m` in never less than `0`. @mirca also say true information, but about Kendall rank correlation coefficient. `kendall_tau_correlation = (p - q) / (n * (n - 1) / 2)`, where `p` - number of concordant pairs, `q` - number of discordant pairs, `n` - size of list. `kendall_tau_correlation` lies in the interval `[-1;1]`. `scipy.stats.kendalltau` is a function to find Kendall rank correlation coefficient but not for Kendall tau distance. I didn't find the implementation of Kendall tau distance in SciPy. If the implementation is not exist in SciPy actually, I can implement it.

Open

### masdeseiscaracteres commented Apr 20, 2018

 Hi all! Interesting discussion. For those interested (and answering @erelsgl question) I wrote a post trying to shed some light on what is going on with the Kendall Tau distance and the Wikipedia page. Basically the confusion roots in two distinct usages of the word "ranking". Sometimes it refers to "ranking lists", and, some others, to "ranked lists of items". We get weird results when the type of "ranking" we have at hand does not match the one used in the definition we are applying.