## Kullback-Leibler Divergence of Empirical and Theoretical Probabilities of Rankings 

The Kullback-Leibler (KL) divergence measures how much one probability distribution differs from another. It is not symmetric, so the order of the two distributions matters. For an empirical distribution P and a theoretical distribution Q, the KL divergence is

$$ D_{KL}(Q \parallel P) = \sum_{i} Q(i)\log\frac{Q(i)}{P(i)} $$

where the sum runs over every outcome $i$ on which the distributions are defined; here each outcome is a ranking. To find the KL divergence between the empirical and theoretical probability distributions for the 2002 data (`ED-debian-2002.soi`), we first load the data along with the best-fit parameters we found for the Mallows and Plackett-Luce models:

In [19]:
import readPreflib
import numpy as np

_, lengths, votes = readPreflib.soiInputwithWeights('data_input/ED-debian-2002.soi')
num_votes = 1.0 * sum(lengths.values())

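import pickle

# Load the best-fit parameters found earlier for each model
with open('pickle/mallows2002_100k.p', 'rb') as f:
    mallows_params = pickle.load(f)
sigma, phi = mallows_params  # reference ranking and dispersion

with open('pickle/plackett2002_100k.p', 'rb') as f:
    plackett_params = pickle.load(f)
pl_weights = plackett_params  # one weight per candidate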

mallows_params, plackett_params, num_votes

([array([3, 4, 2, 1]), 0.02973511934833109],
 array([0.39292042, 0.08434978, 0.40365703, 0.11907276]),
 475.0)

We also need the probability functions for the Mallows and Plackett-Luce models, which we import from their respective notebooks:

In [23]:
import import_ipynb
from Mallows_Notebook import mallowsProb
from PL_Notebook import probPlackett
import math
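
The implementations of `mallowsProb` and `probPlackett` live in the imported notebooks and are not shown here. As a rough guide to what such functions compute, the following is a minimal sketch based on the standard textbook definitions of the two models. The names `kendall_tau_distance`, `mallows_prob_sketch`, and `plackett_luce_prob_sketch` are hypothetical, and the sketch assumes complete rankings over items labelled 1 to n; the notebook functions may differ, for example in how they treat incomplete orders.

```python
import math
from itertools import combinations, permutations

def kendall_tau_distance(ranking, reference):
    """Count the item pairs that the two rankings order differently."""
    position = {item: idx for idx, item in enumerate(reference)}
    return sum(
        1
        for a, b in combinations(ranking, 2)  # a comes before b in `ranking`
        if position[a] > position[b]          # but after b in `reference`
    )

def mallows_prob_sketch(ranking, sigma, phi):
    """Mallows probability phi**d(ranking, sigma) / Z(phi) (hypothetical helper)."""
    n = len(sigma)
    # Normalizing constant: prod_{i=1}^{n} (1 + phi + ... + phi**(i-1))
    z = math.prod(sum(phi ** k for k in range(i)) for i in range(1, n + 1))
    return phi ** kendall_tau_distance(ranking, sigma) / z

def plackett_luce_prob_sketch(ranking, weights):
    """Plackett-Luce probability (hypothetical helper).

    Assumes weights[item - 1] is the weight of `item`.
    """
    remaining = sum(weights)  # total weight of items not yet placed
    prob = 1.0
    for item in ranking:
        prob *= weights[item - 1] / remaining
        remaining -= weights[item - 1]
    return prob

# Usage with made-up parameters (not the fitted values from this notebook).
# Each sum should equal 1 up to floating-point rounding.
rankings = list(permutations((1, 2, 3)))
print(sum(mallows_prob_sketch(r, (2, 1, 3), 0.5) for r in rankings))
print(sum(plackett_luce_prob_sketch(r, [0.5, 0.3, 0.2]) for r in rankings))
```

In the Mallows model a ranking becomes less likely by a factor of phi for each pairwise disagreement with the reference ranking sigma, while in the Plackett-Luce model a ranking is built by repeatedly drawing the next item with probability proportional to its weight among the items still available.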

Now we can evaluate the KL sum term by term over the rankings in `votes`, once for each model.

In [22]:
def insideSum(Qi, Pi):
    # One term of the KL sum: Q(i) * log(Q(i) / P(i))
    return Qi * math.log(Qi / Pi)

divergence_mallows = 0
divergence_plackett = 0
for entry in votes:
    num_occurrences, vote = entry
    empirical = num_occurrences / num_votes    # empirical probability P(i)
    mallows = mallowsProb(vote, sigma, phi)    # theoretical Q(i), Mallows
    plackett = probPlackett(vote, pl_weights)  # theoretical Q(i), Plackett-Luce
    divergence_mallows += insideSum(mallows, empirical)
    divergence_plackett += insideSum(plackett, empirical)

divergence_mallows, divergence_plackett

(351.02543014684534, 44.84313451440723)
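
Note that the loop above visits only the rankings present in `votes`. If those are just the rankings that were actually observed, any ranking that never occurred contributes nothing to the sum. In the full definition, a ranking with positive theoretical probability but zero empirical probability would make the divergence infinite. The sketch below illustrates this, and the asymmetry of the KL divergence, on made-up toy distributions; `kl_divergence` is a hypothetical helper, not part of the notebook.

```python
import math

def kl_divergence(q, p):
    """Return sum_i q[i] * log(q[i] / p[i]) (hypothetical helper).

    q and p are dicts mapping outcomes to probabilities. Returns math.inf
    when q puts positive mass on an outcome where p has none.
    """
    total = 0.0
    for outcome, qi in q.items():
        if qi == 0:
            continue  # 0 * log(0 / p) is taken to be 0 by convention
        pi = p.get(outcome, 0.0)
        if pi == 0:
            return math.inf
        total += qi * math.log(qi / pi)
    return total

# Toy distributions over the rankings of three items (made-up numbers).
theoretical = {
    (1, 2, 3): 0.4, (1, 3, 2): 0.2, (2, 1, 3): 0.2,
    (2, 3, 1): 0.1, (3, 1, 2): 0.05, (3, 2, 1): 0.05,
}
empirical = {(1, 2, 3): 0.5, (1, 3, 2): 0.25, (2, 1, 3): 0.25}

print(kl_divergence(empirical, theoretical))    # finite: empirical support is covered
print(kl_divergence(theoretical, empirical))    # inf: three rankings were never observed
print(kl_divergence(theoretical, theoretical))  # 0.0: identical distributions
```

Swapping the arguments changes the result, which is why it matters which distribution plays the role of Q in the formula.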