# Bradley-Terry
## Purpose
I created this notebook to gain a better understanding of the [Bradley-Terry Model](https://en.wikipedia.org/wiki/Bradley%E2%80%93Terry_model), which has been [used to allocate scores](https://www.kaggle.com/c/commonlitreadabilityprize/discussion/240423) for the [Common Lit Readability Prize](https://www.kaggle.com/c/commonlitreadabilityprize).
In particular I want to know:
1. Can we reconstruct the target scores from the results of "contests" between documents?
1. Is the reconstruction consistent? How much does it depend on the way the contestants have been sampled to build contests?
1. Can I explain the standard error?


## Bradley-Terry

[1] and [4] give two forms of the equation.
$$\begin{align}P(\text{i beats j}) =& \frac{e^{\beta_i}}{e^{\beta_i}+e^{\beta_j}}\\
logit(P(\text{i beats j})) =& \beta_i - \beta_j = \log\lambda_i - \log\lambda_j \text{, where }\\
\lambda_i =& e^{\beta_i} \text{ and }\\
logit(p) =& \log\big(\frac{p}{1-p}\big)\end{align}$$
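
As a quick check of the two forms, the sketch below computes the win probability from a pair of $\beta$ values and confirms that its log-odds equal the difference of the two scores. The function names `p_beats` and `logit` and the sample scores are illustrative assumptions, not part of the notebook.

```python
from math import exp, log

def p_beats(beta_i, beta_j):
    """P(i beats j) for Bradley-Terry scores given on the log scale."""
    return exp(beta_i) / (exp(beta_i) + exp(beta_j))

def logit(p):
    """Log-odds of a probability p with 0 < p < 1."""
    return log(p / (1 - p))

# Hypothetical scores, chosen only for illustration
beta_i, beta_j = 0.5, -1.0
p = p_beats(beta_i, beta_j)
print(p)         # approximately 0.8176
print(logit(p))  # approximately 1.5, i.e. beta_i - beta_j
```

Note that only the difference $\beta_i - \beta_j$ matters, so negative scores pose no problem.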

|Question|Outcome|
|---------------------------------------|-----------------------------------------------------------|
|Since some of the target scores in the competition are negative, I conjecture that they correspond to the $\beta_i$.||
|Target scores in the competition are approximately Gaussian. What does this say about the probabilities of contests?||

## Methods

The code in this notebook sets up contests between pairs of texts in the training dataset. The opponents for each text are selected at random, and the outcome of each contest is chosen at random, with the probability of extract $i$ being deemed easier to read than extract $j$ given by $\frac{e^{\beta_i}}{e^{\beta_i}+e^{\beta_j}}$.
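
The random assignment of winners can be sketched for a single pair of extracts as follows. This is a minimal illustration under stated assumptions: `simulate_contests`, the seed and the two scores are hypothetical, and the notebook's own simulation works on the whole sample rather than a single pair.

```python
from math import exp
from random import Random

def simulate_contests(beta_i, beta_j, n_contests, rng):
    """Play n_contests between i and j; return (wins for i, wins for j)."""
    p_i = exp(beta_i) / (exp(beta_i) + exp(beta_j))
    # Each contest is an independent Bernoulli draw with success probability p_i
    wins_i = sum(1 for _ in range(n_contests) if rng.random() < p_i)
    return wins_i, n_contests - wins_i

rng = Random(42)  # fixed seed so that the run is repeatable
wins_i, wins_j = simulate_contests(0.5, -1.0, 10_000, rng)
# The observed frequency should be close to the model probability (about 0.82)
print(wins_i / (wins_i + wins_j))
```

With only a handful of contests per pair the observed frequency is much noisier, which is one source of the run-to-run variability in the reconstructed scores.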

## Results

The plot shows the spread of computed $\beta_i$ versus the supplied values. The solid lines represent actual data or statistics calculated by this notebook; dotted lines represent selected runs of the analysis. ![](./bt-iterations.png)

It shows wide variability between runs, but the mean of the computed values does match the supplied values. Moreover, the standard deviation ($\sigma$) of the computed values across trials appears comparable to the supplied standard error.

I have limited the number of samples from the data because the calculation is a bit slow. I plan to test overnight with the full dataset.

## Conclusions

1. Reconstruction is possible, but it may be too slow to be practicable.

## Further Work

1. Quantify variability to see whether it explains the standard error.
1. Does the variability constrain the accuracy that can be obtained in the competition?

## References
||Title|Author|
|---|-----------------------------------------------|-----------------------------|
|[1]|[Bradley-Terry Model](https://en.wikipedia.org/wiki/Bradley%E2%80%93Terry_model)|Wikipedia Editors|
|[2]|[Log5, the Logit Link and Bradley-Terry](https://www.kaggle.com/jaredcross/log5-logistic-regression-and-bradley-te/comments)|Jared Cross|
|[3]|[Target scores](https://www.kaggle.com/c/commonlitreadabilityprize/discussion/240423)|Scott Crossley|
|[4]|[Bradley-Terry Models in R](https://www.jstatsoft.org/article/view/v012i01/v12i01.pdf)|David Firth|
|[5]|[Common Lit Readability Prize](https://www.kaggle.com/c/commonlitreadabilityprize)|Scott Crossley|
|[6]|[Efficient Bayesian Inference for Generalized Bradley-Terry Models](https://arxiv.org/abs/1011.1761)|Francois Caron and Arnaud Doucet|


In [None]:
from matplotlib.pyplot import figure,bar,xlabel,ylabel,legend,rc, plot,savefig, title, scatter, show
from numpy             import argsort, exp, zeros, int32, sum, sqrt, log, argmin, mean, std
from numpy.random      import rand
from os                import walk
from os.path           import join
from pandas            import read_csv
from random            import random, randrange, gauss,sample
from time              import time


# Hyper parameters

In [None]:
N_TRIALS       = 25
N_CONTESTS     = 20 # Minimum number of contests for each contestant
N_MAX          = 150 # Limit number of contestants sampled - set to None for no limit
MAX_ITERATIONS = 1000 # Used to limit iterations while computing BT
FREQUENCY      = 5  # For work in progress plots
PLOT_FILE      = 'bt-iterations'
EPSILON        = 1e-6 # Controls iterations



# Load Data

## Data Dictionary

|Train|Public Test|Hidden Test|Description|
|--------------|--------------|----------|----------------------------------------------------|
|id|id|id|Unique ID for excerpt|
|url_legal|url_legal|- |URL of source (Omitted from some records in the test set--see [note](https://www.kaggle.com/c/commonlitreadabilityprize/discussion/238670#1306025))|
|license|license |-|License of source material (Omitted from some records in the test set--see [note](https://www.kaggle.com/c/commonlitreadabilityprize/discussion/238670#1306025))|
|excerpt|excerpt|excerpt|Text for predicting readability|
|target|-|-|Readability|
|standard_error|-|-|Measure of spread of scores among multiple raters for each excerpt|

In [None]:
train_data    = None
df_colours    = None
xkcd_colours  = None
for dirname, _, filenames in walk('/kaggle/input'):
    for filename in filenames:
        path_name = join(dirname, filename)
        if filename.startswith('train'):
            train_data = read_csv(path_name)
        if filename.startswith('colors'):
            df_colours = read_csv(path_name)
            xkcd_colours = df_colours.XKCD_COLORS.dropna()
            

                       

# Estimate Parameters

This is the algorithm from [Wikipedia](https://en.wikipedia.org/wiki/Bradley%E2%80%93Terry_model#Estimating_the_parameters). We will plot progress to convergence, and
compare result with initial values.

## First we need to compute the number of wins for i competing with k

1. We stated above that the probability of _i_ beating _k_ is given by the equation $P(\text{i beats k}) = \frac{e^{\beta_i}}{e^{\beta_i}+e^{\beta_k}}$.
2. For each _i_ we draw a number of opponents, and we randomly assign a winner to each contest using the appropriate probability.


In [None]:
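def count_wins(Lambdas):
    '''Simulate contests: Wins[i,k] counts the number of times i beat k.'''
    Wins = zeros((N_MAX,N_MAX))
    for i in range(N_MAX):
        for j in range(N_CONTESTS):
            # Choose an opponent k uniformly at random, excluding i itself
            k = (i+randrange(1,N_MAX)) % N_MAX
            # i wins with probability Lambdas[i]/(Lambdas[i] + Lambdas[k])
            if random() < Lambdas[i]/(Lambdas[i] + Lambdas[k]):
                Wins[i,k] += 1
            else:
                Wins[k,i] += 1
    return Wins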

## Update probabilities

The algorithm in [1] involves iterating the following equations until convergence has been achieved:

$$\begin{align} p^\prime_i =& \frac{W_i}{\sum_{j \ne i} \frac{W_{ij}+W_{ji}}{p_i+p_j}} \text{ and}\\
p_i =& \frac{p^\prime_i}{\sum_j p^\prime_j}  \end{align} $$ 

The $p_i$ correspond to $\lambda_i$.
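
To see the iteration at work on something small enough to check by hand, here is a dependency-free sketch for three contestants. The win counts are invented for illustration and `bt_update` is a hypothetical name; the notebook's own implementation operates on NumPy arrays.

```python
def bt_update(p, wins):
    """One iteration of the update; wins[i][j] = number of times i beat j."""
    n = len(p)
    p_new = []
    for i in range(n):
        total_wins = sum(wins[i])  # W_i
        divisor = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                      for j in range(n) if j != i)
        p_new.append(total_wins / divisor)
    total = sum(p_new)
    return [value / total for value in p_new]  # normalize to sum to 1

# Invented win counts: contestant 0 beat 1 three times, lost once, and so on
wins = [[0, 3, 4],
        [1, 0, 3],
        [1, 2, 0]]
p = [1 / 3] * 3
for _ in range(500):
    p = bt_update(p, wins)

# At a fixed point the expected number of wins equals the observed number
expected = [sum((wins[i][j] + wins[j][i]) * p[i] / (p[i] + p[j])
                for j in range(3) if j != i)
            for i in range(3)]
print(p)
print(expected)  # should be close to the observed totals 7, 4 and 3
```

Comparing expected with observed wins is a convenient convergence check, because it does not depend on how the $p_i$ are normalized.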

In [None]:
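def update(p,w_symmetric,W,N):
    '''Perform one iteration: compute p' from p, then normalize so it sums to 1.'''
    p1 = zeros(N)
    for i in range(N):
        Divisor = 0
        for j in range(N):
            if i!=j and p[i]+p[j]>0:
                # w_symmetric[i,j] is the number of contests between i and j
                Divisor += w_symmetric[i,j]/(p[i]+p[j])
        p1[i] = W[i]/Divisor
    return p1/sum(p1)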


def normalize(p):
    return p / sum(p)

# Main calculation

1. Plot the $\beta_i$, which represent the ground truth.
1. Produce a number of graphs, each estimating the $\beta_i$ from a set of contests.
1. Since $\beta_i$ is only defined up to an additive constant, shift each plot so its zero matches the zero of $\beta_i$.
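
The shift in the last step can be illustrated with a tiny example. The sketch below assumes made-up reference and estimated scores, and `align` is a hypothetical helper; it shows that subtracting a constant offset pins the estimate to the reference at one index without changing any win probability.

```python
from math import exp

def p_beats(beta_i, beta_j):
    """P(i beats j) for Bradley-Terry scores given on the log scale."""
    return exp(beta_i) / (exp(beta_i) + exp(beta_j))

def align(estimated, reference, index):
    """Shift estimated scores so that they agree with reference at index."""
    offset = estimated[index] - reference[index]
    return [b - offset for b in estimated]

reference = [-1.2, 0.1, 0.9]     # made-up ground truth
estimated = [-3.0, -1.8, -0.9]   # made-up fit, e.g. the log of normalized p
# Use the reference score closest to zero as the anchor point
index = min(range(len(reference)), key=lambda i: abs(reference[i]))
aligned = align(estimated, reference, index)
print(aligned)  # approximately [-1.1, 0.1, 1.0]
# The shift leaves every pairwise probability unchanged
print(p_beats(estimated[0], estimated[2]), p_beats(aligned[0], aligned[2]))
```

Anchoring at a single index is a simple choice; matching the means of the two curves would be an alternative that is less sensitive to noise in one estimate.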

In [None]:
Targets        = train_data.target.to_numpy()
SEs            = train_data.standard_error.to_numpy()
if N_MAX is None or N_MAX >= len(Targets):
    N_MAX = len(Targets)
else:
    # Sample N_MAX contestants without replacement
    Indices = sample(list(range(len(Targets))),N_MAX)
    Targets = Targets[Indices]
    SEs     = SEs[Indices]

# Sort by target, keeping each standard error aligned with its target
Order          = argsort(Targets)
Betas          = Targets[Order]
SEs            = SEs[Order]

# Find an index such that Betas[index] is as close to zero as possible.
# We will choose that as "the" zero of Beta
index_min_beta = argmin([abs(b) for b in Betas])

Lambdas        = exp(Betas)
N,_            = train_data.shape

fig            = figure(figsize=(10,10))

plot(range(N_MAX),[b - Betas[index_min_beta] for b in Betas],label     = r'$\beta$')
plot(range(N_MAX), SEs, label='Standard Error')
start  = time()
Scores = zeros((N_MAX,N_TRIALS))
for trial in range(N_TRIALS):
    w              = count_wins( Lambdas)
    w_symmetric    = w + w.transpose() # Symmetrize w
    W              = sum(w,axis=1)  # Number won by i
    Ps             = normalize(rand(N_MAX))

    for k in range(MAX_ITERATIONS):
        p1 = update(Ps,w_symmetric,W,N_MAX)
        if (abs(Ps-p1)<EPSILON*p1).all():
            Ps = p1
            break
        Ps  = p1
        
    Beta_Calculated = log(Ps)
    # Work out offset to make this curve match original Betas at Beta==0
    offset = Beta_Calculated[index_min_beta]-Betas[index_min_beta]
    # Shift curve to make it match original Betas at Beta==0
    Beta_Shifted = [l - offset for l in Beta_Calculated]
    Scores[:,trial] = Beta_Shifted
    if trial%FREQUENCY==0:
        # Use positional indexing: dropna() may leave gaps in the Series index
        plot(range(N_MAX),Beta_Shifted,  linestyle = ':',color=xkcd_colours.iloc[trial%len(xkcd_colours)])
    if trial%10==0:
        print (f'Trial {trial}')

mu    = mean(Scores,axis=1)
sigma = std(Scores,axis=1)
plot (mu,    label = r'mean $\forall$ trials')
plot (sigma, label = r'$\sigma$')
legend()
xlabel('index')
ylabel(r'$\beta$')
elapsed = int(time() - start)
title(f'{N_MAX} Contestants, {N_CONTESTS} contests. Time = {elapsed} seconds, eps={EPSILON}, k={k}')


fig.savefig(f'{PLOT_FILE}')

show()