## Plackett-Luce params for Debian 2002

The probability of the ranking (0, ..., N-1) over N alternatives, given weight vector W, is

$$ \frac{W_0}{W_0+W_1+\ldots+W_{N-1}}\times\frac{W_1}{W_1+W_2+\ldots+W_{N-1}}\times\ldots\times\frac{W_{N-2}}{W_{N-2}+W_{N-1}}\times \frac{W_{N-1}}{W_{N-1}}$$ 

In [1]:
import numpy as np
from tqdm import tqdm_notebook
import math
import random
import itertools
import readPreflib
import metropolis

In [42]:
def probPlackett(r, weights):
    product = 1
    for i in range(0,len(r)):
        numer = getWeight(r[i],weights)
        denom = 0
        for j in range(i,len(r)):
            denom += getWeight(r[j],weights)
        if denom == 0:
            product *= numer
        else:
            product *= (1.0 * numer) / denom
    return product
        
def probPlackett2(r, weights):
    print(r, weights)
    return probPlackett(r, weights)
    
# alternatives are 1-indexed in the preflib data
# kept forgetting this, so made it a separate method
def getWeight(num, weights):
    return weights[num-1]

print(probPlackett(np.asarray([1, 2, 3, 4]),np.asarray([0.5,0.25,0.125,0.125])))
# This should be 1/8

0.125
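
As a check on the expected value, the product for the ranking (1, 2, 3, 4) with weights (0.5, 0.25, 0.125, 0.125) can be written out term by term. At each step the denominator is the total weight of the alternatives not yet placed:

$$ \frac{0.5}{0.5+0.25+0.125+0.125}\times\frac{0.25}{0.25+0.125+0.125}\times\frac{0.125}{0.125+0.125}\times\frac{0.125}{0.125} = \frac{1}{2}\times\frac{1}{2}\times\frac{1}{2}\times 1 = \frac{1}{8} $$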


Read in the data:

In [16]:
candidates, length_counts, votes = readPreflib.soiInputwithWeights('data_input/ED-debian-2002.soi')
candidates

{1: 'Branden Robinson',
 2: 'Raphael Hertzog',
 3: 'Bdale Garbee',
 4: 'None Of The Above'}

Some votes in the data are incomplete, meaning they rank only some of the candidates. We store a vector with the empirical probability of each vote length:

In [11]:
def getLengthProbs(length_counts):
    length_probs = []
    total_votes = 1.0 * sum(length_counts.values())
    for i in range(1,len(length_counts.values())+1):
        length_probs.append(length_counts[i] / total_votes)
    return length_probs
    
def probLength(lengths, n):
    return lengths[n-1]
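
To make the length vector concrete, here is a small sketch on made-up counts. The dictionary `toy_counts` and the name `get_length_probs` are illustrative only; the real `length_counts` comes from `readPreflib`, and this sketch assumes it has one key for every length from 1 up to the maximum.

```python
def get_length_probs(length_counts):
    # Same computation as getLengthProbs above, written as a comprehension.
    total = float(sum(length_counts.values()))
    return [length_counts[i] / total for i in range(1, len(length_counts) + 1)]

# Hypothetical counts: 10 votes rank one candidate, 20 rank two, and so on.
toy_counts = {1: 10, 2: 20, 3: 30, 4: 40}
toy_probs = get_length_probs(toy_counts)

# Lengths are 1-indexed, so the probability of a length-2 vote is at index 1.
prob_of_length_2 = toy_probs[2 - 1]
print(toy_probs, prob_of_length_2)
```

If some length never occurs and its key is missing from the dictionary, the lookup `length_counts[i]` raises a `KeyError`; using `length_counts.get(i, 0)` would be one way to guard against that.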


The votes come in as tuples that look like

* (5, [1,2,3,4,5])
* (2, [4,2,1,3])

The second element of the tuple is a vote (a ranking), and the first is the number of times that vote occurs. The objective for a dataset under a Plackett-Luce model is therefore the sum, over distinct votes, of each vote's probability weighted by its count and by the probability of its length:

In [33]:
def plackettCost(params, dataset, lengths):
    weights = params
    cost = 0
    for tup in dataset:
        num_occurances, r = tup
        cost += probLength(lengths, len(r)) * num_occurances * probPlackett(r, weights)
    return cost
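
The objective can be checked by hand on a tiny dataset. The sketch below is self-contained, so it re-implements the ranking probability under the name `prob_plackett`; the votes, weights, and length probabilities are made up for illustration and are not taken from the Debian data.

```python
import numpy as np

def prob_plackett(r, weights):
    # Plackett-Luce probability of ranking r (alternatives are 1-indexed).
    p = 1.0
    for i in range(len(r)):
        remaining = sum(weights[a - 1] for a in r[i:])
        p *= weights[r[i] - 1] / remaining
    return p

weights = np.asarray([0.5, 0.25, 0.125, 0.125])
length_probs = [0.0, 0.0, 0.0, 1.0]   # assume every vote ranks all 4 candidates
toy_votes = [(3, [1, 2, 3, 4]),       # this ranking occurs 3 times
             (1, [2, 1, 3, 4])]       # this ranking occurs once

toy_cost = sum(length_probs[len(r) - 1] * count * prob_plackett(r, weights)
               for count, r in toy_votes)
print(toy_cost)
```

The first ranking has probability 1/8 and the second 1/4 × 2/3 × 1/2 = 1/12, so the total is 3 × 1/8 + 1/12.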

We need an initial set of weights for the Metropolis algorithm. We can draw these randomly and then normalize them so they sum to 1:

In [34]:
def randomWeights(N):
    weights = np.zeros(N)
    for i in range(N):
        weights[i] = np.random.uniform()
    # normalize once, after all the weights have been drawn
    s = np.sum(weights)
    for i in range(N):
        weights[i] = weights[i] / s
    return weights

def uniformWeights(N):
    weights = np.zeros(N)
    for i in range(N):
        weights[i] = 1.0 / N
    return weights

# random_weights = randomWeights(4)
# print(random_weights)

We also need a way to move mass within the weights, which is how we generate a new candidate for the Metropolis-Hastings algorithm. Here we transfer some mass from one alternative, i, to another, j. The limit on the mass transferred is Δ'(Wᵢ→Wⱼ) = min(Wᵢ, 1-Wⱼ), so that Wᵢ cannot drop below 0 and Wⱼ cannot rise above 1. The mass actually transferred is Δ ~ U(0, αΔ'), where α is a parameter controlling the aggressiveness of the transfer (the `aggresiveness` argument in the code).

I think we could very easily decrease the aggressiveness over time, similar to how the 'temperature' in simulated annealing works.
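
One possible shape for such a schedule is sketched below. This is only an illustration of the idea: the function `decayed_aggressiveness`, the geometric decay, and the start and end values are all assumptions and are not used anywhere else in this notebook.

```python
def decayed_aggressiveness(step, n_steps, start=0.05, end=0.005):
    # Geometric interpolation from `start` (step 0) to `end` (last step).
    if n_steps <= 1:
        return start
    frac = step / float(n_steps - 1)
    return start * (end / start) ** frac

# Five evenly spaced steps, going from 0.05 down to 0.005.
schedule = [decayed_aggressiveness(s, 5) for s in range(5)]
print(schedule)
```

The value for the current step could then be passed as the aggressiveness argument of the proposal function, provided the sampler exposes the step counter to it.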

In [35]:
def transferMass(weights, aggresiveness = 0.05):
    w = list(weights)
    N = len(w)
    index1 = random.randint(0,N-1)
    index2 = random.randint(0,N-1)
    while (index2 == index1):
        index2 = random.randint(0,N-1)

    initial1 = w[index1]
    initial2 = w[index2]
    limit  = min(initial1, 1.0 - initial2)
    delta = np.random.uniform(0.0, limit * aggresiveness)
    w[index1] = initial1 - delta
    w[index2] = initial2 + delta
    return np.asarray(w)

# print(transferMass(random_weights))
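
A quick sanity check for this kind of proposal is that repeated transfers keep the weights a valid probability vector. The sketch below re-implements the move under the hypothetical name `transfer_mass` so that it runs on its own; it is meant to mirror `transferMass`, not replace it.

```python
import random
import numpy as np

def transfer_mass(weights, aggressiveness=0.05):
    w = np.array(weights, dtype=float)
    # Pick two distinct alternatives: a source and a destination.
    src, dst = random.sample(range(len(w)), 2)
    # The source cannot go below 0 and the destination cannot exceed 1.
    limit = min(w[src], 1.0 - w[dst])
    delta = np.random.uniform(0.0, limit * aggressiveness)
    w[src] -= delta
    w[dst] += delta
    return w

random.seed(0)
np.random.seed(0)
w = np.full(4, 0.25)
for _ in range(1000):
    w = transfer_mass(w)
print(w, w.sum())
```

Because each move subtracts and adds the same `delta`, the total stays at 1 up to floating-point rounding.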

We now have everything we need to find parameters using the Metropolis-Hastings algorithm:

In [41]:
def runPL(rankings, n_runs, lengths_vector):
    lengths = getLengthProbs(lengths_vector)
    initial_weights = uniformWeights(len(lengths))
    params, cost = metropolis.maximize(plackettCost, initial_weights, lengths, transferMass, rankings, n_runs)
    return params
    
#runPL(votes, 100, length_counts)
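
The `metropolis` module itself is not shown here. For readers who want a picture of what such a search loop might look like, the following is a minimal sketch under stated assumptions: the function name `metropolis_maximize_sketch`, its simplified signature, and the acceptance rule (always accept an improvement, otherwise accept with probability equal to the cost ratio) are mine and may differ from what `metropolis.maximize` actually does. It also assumes the cost is non-negative.

```python
import math
import random

def metropolis_maximize_sketch(cost, initial, propose, n_runs):
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    for _ in range(n_runs):
        candidate = propose(current)
        candidate_cost = cost(candidate)
        # Always accept improvements; otherwise accept with probability
        # candidate_cost / current_cost (short-circuit avoids dividing by 0).
        if (candidate_cost >= current_cost
                or random.random() < candidate_cost / current_cost):
            current, current_cost = candidate, candidate_cost
            if current_cost > best_cost:
                best, best_cost = current, current_cost
    return best, best_cost

# Toy problem: maximize a bump centred at x = 3 using a random-walk proposal.
random.seed(1)
best_x, best_val = metropolis_maximize_sketch(
    cost=lambda x: math.exp(-(x - 3.0) ** 2),
    initial=0.0,
    propose=lambda x: x + random.uniform(-0.5, 0.5),
    n_runs=2000,
)
print(best_x, best_val)
```

In the notebook's setting, `cost` would play the role of `plackettCost` with the dataset and length vector bound in, and `propose` the role of `transferMass`.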

In [9]:
# print(params, cost)

The following cell saves the model to disk as a Python pickle:

In [10]:
import pickle

# pickle.dump(params, open('pickle/plackett2002_3mil_2.p','wb'))
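
A round trip through `pickle` can be sketched as follows. The weights are placeholders rather than fitted values, and the temporary directory and file name are chosen only so that the example runs anywhere; the `with` blocks make sure the file handles are closed.

```python
import os
import pickle
import tempfile

import numpy as np

# Placeholder weights standing in for a fitted parameter vector.
params = np.asarray([0.4, 0.3, 0.2, 0.1])

# Hypothetical output location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'plackett_params.p')

with open(path, 'wb') as f:
    pickle.dump(params, f)

with open(path, 'rb') as f:
    restored = pickle.load(f)

print(restored)
```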

As a more general way to run this externally, we use this function: