# Competition Metric - Kendall Tau Correlation

The competition is scored with this metric. The implementation is taken from:

https://www.kaggle.com/code/ryanholbrook/competition-metric-kendall-tau-correlation/notebook

This notebook checks that the implementation works on a few simple test cases.



In [1]:
import json
import numpy as np
import pandas as pd
import nltk

First, we need to count the inversions in a predicted ordering, i.e. the number of pairs of items that appear in the wrong relative order.

In [2]:
from bisect import bisect


# O(N^2) in the worst case because list.insert is O(N), but fast in practice for our data
def count_inversions(a):
    inversions = 0
    sorted_so_far = []
    for i, u in enumerate(a):  # O(N)
        j = bisect(sorted_so_far, u)  # O(log N)
        inversions += i - j
        sorted_so_far.insert(j, u)  # O(N)
    return inversions

In [3]:
## testing
count_inversions([1,2,3,4,5])

0

In [4]:
## testing
count_inversions([1,2,4,3,5])

1

In [5]:
## testing
# 10 = 4 + 3 + 2 + 1
# the maximum number of inversions is n(n-1)/2 = 10, which gives a correlation of -1
count_inversions([5,4,3,2,1])

10
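
The three checks above can also be cross-checked against the definition of an inversion directly. The sketch below is not part of the original metric code: `count_inversions_bruteforce` is a hypothetical helper that simply tests every pair, so it is slower but easy to verify by eye.

```python
from itertools import combinations


def count_inversions_bruteforce(a):
    """Count pairs (i, j) with i < j and a[i] > a[j] by checking every pair."""
    # combinations() yields pairs in their original left-to-right order,
    # so x always comes before y in the list.
    return sum(1 for x, y in combinations(a, 2) if x > y)


print(count_inversions_bruteforce([1, 2, 3, 4, 5]))  # 0
print(count_inversions_bruteforce([1, 2, 4, 3, 5]))  # 1
print(count_inversions_bruteforce([5, 4, 3, 2, 1]))  # 10
```

If both functions agree on random permutations as well, that is reasonable evidence the bisect-based version is counting the same thing.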

Here is the formula, where $S_i$ is the number of inversions in the predicted ranks and $n_i$ is the number of cells for notebook $i$:

$K = 1 - 4 \frac{\sum_i S_i}{\sum_i n_i (n_i - 1)}$
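
As a worked check, consider a single notebook with $n = 5$ cells, so that $n(n-1) = 20$. Plugging in the inversion counts from the three test cases above (0, 1 and 10) gives:

$K = 1 - 4 \cdot \frac{0}{20} = 1$

$K = 1 - 4 \cdot \frac{1}{20} = 0.8$

$K = 1 - 4 \cdot \frac{10}{20} = -1$

So a perfect ordering scores 1, a fully reversed ordering scores -1, and a single swapped adjacent pair costs 0.2 when $n = 5$.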

In [6]:
def kendall_tau(ground_truth, predictions):
    total_inversions = 0  # total inversions in predicted ranks across all instances
    total_2max = 0  # twice the maximum possible inversions, i.e. the sum of n * (n - 1) over all instances
    for gt, pred in zip(ground_truth, predictions):
        ranks = [gt.index(x) for x in pred]  # rank predicted order in terms of ground truth
        total_inversions += count_inversions(ranks)
        n = len(gt)
        total_2max += n * (n - 1)
    return 1 - 4 * total_inversions / total_2max
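
One property worth noting is that the sums are pooled across instances before dividing, so longer instances carry more weight than they would in a per-instance average. The sketch below illustrates this under some assumptions: `pooled_tau` mirrors `kendall_tau` but uses a brute-force inversion count as a stand-in for `count_inversions`, and the two toy orderings are made up for illustration.

```python
from itertools import combinations


def inversions(a):
    # Brute-force stand-in for count_inversions: check every pair.
    return sum(1 for x, y in combinations(a, 2) if x > y)


def pooled_tau(ground_truth, predictions):
    total_inversions = 0
    total_2max = 0  # sum of n * (n - 1), twice the maximum inversion count
    for gt, pred in zip(ground_truth, predictions):
        ranks = [gt.index(x) for x in pred]
        total_inversions += inversions(ranks)
        total_2max += len(gt) * (len(gt) - 1)
    return 1 - 4 * total_inversions / total_2max


gt = [["a", "b", "c"], ["p", "q", "r", "s", "t"]]
same = [list(order) for order in gt]
flipped = [order[::-1] for order in gt]
mixed = [gt[0][::-1], list(gt[1])]  # short instance reversed, long one perfect

print(pooled_tau(gt, same))     # 1.0
print(pooled_tau(gt, flipped))  # -1.0
print(pooled_tau(gt, mixed))    # about 0.54, not 0
```

In the `mixed` case a plain average of the two per-instance scores (-1 and 1) would be 0, whereas the pooled score is 1 - 12/26, because the five-item instance contributes 20 of the 26 units in the denominator.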

In [7]:
import random
testlist = [["ukraine", "UK", "spain", "sweden", "serbia", "italy"]]  # eurovision 2022
shuffled_list = []
for order in testlist:
    shuffled_list.append(order.copy())
    random.shuffle(shuffled_list[-1])

In [8]:
shuffled_list

[['UK', 'italy', 'serbia', 'spain', 'sweden', 'ukraine']]

In [9]:
kendall_tau(testlist, shuffled_list)

-0.33333333333333326
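
The value above can be traced by hand. The sketch below repeats the calculation step by step for the shuffled order shown in the output, using a plain double loop instead of `count_inversions` so that it runs on its own. The variable names are chosen here for illustration.

```python
ground_truth = ["ukraine", "UK", "spain", "sweden", "serbia", "italy"]
predicted = ["UK", "italy", "serbia", "spain", "sweden", "ukraine"]

# Position of each predicted item in the ground-truth order
ranks = [ground_truth.index(x) for x in predicted]
print(ranks)  # [1, 5, 4, 2, 3, 0]

# Count pairs that are out of order
inversions = sum(
    1
    for i in range(len(ranks))
    for j in range(i + 1, len(ranks))
    if ranks[i] > ranks[j]
)
print(inversions)  # 10

n = len(ground_truth)
print(1 - 4 * inversions / (n * (n - 1)))  # -0.33333333333333326
```

With $n = 6$ the maximum is 15 inversions, so 10 inversions puts this ordering closer to fully reversed than to correct, which is why the score is negative.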