Experimental part

1) Step 1: Simulate Rankings of Relevance for E and P (5 points)

In the next section, we first create a list of all possible relevance rankings. We use itertools.product with repeat=5, which returns the Cartesian product of the label set with itself, i.e. every length-5 sequence of N/R/HR (3^5 = 243 rankings). We then use itertools.permutations with length 2 to build every ordered pair of distinct rankings, which we treat as (experiment, production) pairs.

In [17]:
import itertools
values = ['N','R','HR'] #possible values of a prediction

relevances = [] #relevances contains all combinations of N/R/HR with length 5
for r in itertools.product(values, repeat=5):
    relevances.append(list(r))

In [18]:
combinations = [] #combinations contains all pairs of relevances
for p in itertools.permutations(relevances, 2):
    combinations.append(list(p)) #we use this to get rid of the permutations object

In [19]:
print(combinations[:10]) #show the first 10 combinations

[[['N', 'N', 'N', 'N', 'N'], ['N', 'N', 'N', 'N', 'R']], [['N', 'N', 'N', 'N', 'N'], ['N', 'N', 'N', 'N', 'HR']], [['N', 'N', 'N', 'N', 'N'], ['N', 'N', 'N', 'R', 'N']], [['N', 'N', 'N', 'N', 'N'], ['N', 'N', 'N', 'R', 'R']], [['N', 'N', 'N', 'N', 'N'], ['N', 'N', 'N', 'R', 'HR']], [['N', 'N', 'N', 'N', 'N'], ['N', 'N', 'N', 'HR', 'N']], [['N', 'N', 'N', 'N', 'N'], ['N', 'N', 'N', 'HR', 'R']], [['N', 'N', 'N', 'N', 'N'], ['N', 'N', 'N', 'HR', 'HR']], [['N', 'N', 'N', 'N', 'N'], ['N', 'N', 'R', 'N', 'N']], [['N', 'N', 'N', 'N', 'N'], ['N', 'N', 'R', 'N', 'R']]]
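
As a sanity check on the sizes involved, the following sketch recomputes the number of rankings and ordered pairs with the same itertools calls. The names labels, rankings and n_pairs are only for this illustration.

```python
import itertools

labels = ['N', 'R', 'HR']

# Every length-5 sequence of labels: 3**5 rankings.
rankings = [list(r) for r in itertools.product(labels, repeat=5)]

# Ordered pairs of two distinct rankings: n * (n - 1).
n_pairs = sum(1 for _ in itertools.permutations(rankings, 2))

print(len(rankings))  # 243
print(n_pairs)        # 58806
```

Because permutations never pairs a ranking with itself, there are 243 * 242 pairs rather than 243 * 243.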


2) Step 2: Implement Evaluation Measures (10 points)

In the next section we make two assumptions:

1) The relevance labels map to the numeric values N=0, R=1, HR=2.

2) The total number of relevant documents is taken to be the number of relevant (R or HR) documents in both rankings of a pair combined. That is, we assume the two rankings share no documents.

In [20]:
#the binary evaluation measure: average precision
numeric_map = {'N':0, 'R':1, 'HR':2} #we use this numeric map to map N/R/HR to a numeric value.
prediction = ['R','HR','N','R','N'] #this is a sample prediction to test functions

def count_rel(prediction1,prediction2):
    return sum(1 for i in prediction1 if i != 'N') + sum(1 for i in prediction2 if i != 'N')

def average_precision(prediction, r):
    ap = 0
    relevant_preds = 0
    for i in range(0,len(prediction)):
        if prediction[i] != 'N':
            relevant_preds += 1
            ap += relevant_preds/(i+1)
    return ap/r

ap = average_precision(prediction, count_rel(prediction, prediction))
print(ap)

0.4583333333333333
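
The printed value can be checked by hand. In the sample ranking (R, HR, N, R, N) the relevant documents sit at ranks 1, 2 and 4, and count_rel(prediction, prediction) counts the ranking twice, so r = 6. Writing P@i for the precision at rank i:

```latex
\mathrm{AP} = \frac{1}{r} \sum_{i \,:\, \mathrm{rel}_i \neq N} P@i
            = \frac{1}{6}\left(\frac{1}{1} + \frac{2}{2} + \frac{3}{4}\right)
            = \frac{2.75}{6} \approx 0.4583
```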


Now we will implement both multi-graded evaluation measures.

The first is nDCG@k, which requires an optimal ranking to normalize against. Here we use the total number of HR and R documents across both rankings to construct that optimal ranking. Again we assume there is no overlap between the two rankings.

The second is ERR, which does not need any additional assumptions.

In [21]:
#nDCG@K
import numpy as np #Numpy is amazing right?

def generate_opt(prediction1, prediction2): #generate optimal sequences from two predictions
    opt_pred = []
    num_hr = sum(1 for i in prediction1 if i == 'HR') + sum(1 for i in prediction2 if i == 'HR')
    num_r  = sum(1 for i in prediction1 if i == 'R') + sum(1 for i in prediction2 if i == 'R')
    for i in range(min(num_hr,5)): #check if num_hr exceeds 5, fill with HR's
        opt_pred.append('HR')
    for i in range(min(5-num_hr,num_r)): #check if num_r exceeds the space left, fill with R's
        opt_pred.append('R')
    for i in range(5-len(opt_pred)): #fill the rest with N
        opt_pred.append('N')
    return opt_pred

def dcg_k(numeric_map, prediction, opt_pred, k):
    dcg_opt = 0
    dcg = 0
    for i in range(0,k): #for the range until K, we sum both the optimum and prediction dcg
        dcg_opt += (2**numeric_map[opt_pred[i]]-1)/np.log2(1+i+1)
        dcg +=(2**numeric_map[prediction[i]]-1)/np.log2(1+i+1)
    return dcg/dcg_opt #dcg is normalized compared to the optimum
ndcg = dcg_k(numeric_map, prediction, generate_opt(prediction,prediction), 3)
print(ndcg)

0.53641800576
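
The printed value can be reproduced by computing DCG separately for the ranking and for the ideal ranking and then dividing. This is a minimal sketch, assuming the same gain mapping and the (2^grade - 1) / log2(rank + 1) form used above; GAIN and dcg_at_k are names introduced only for this illustration.

```python
import math

GAIN = {'N': 0, 'R': 1, 'HR': 2}  # same label-to-grade mapping as above

def dcg_at_k(ranking, k):
    """Discounted cumulative gain over the first k positions."""
    total = 0.0
    for rank, label in enumerate(ranking[:k], start=1):
        total += (2 ** GAIN[label] - 1) / math.log2(rank + 1)
    return total

ranking = ['R', 'HR', 'N', 'R', 'N']
# Ideal ranking when the sample ranking is paired with itself:
# 2 HR and 4 R documents in total, truncated to 5 positions.
ideal = ['HR', 'HR', 'R', 'R', 'R']

ndcg_3 = dcg_at_k(ranking, 3) / dcg_at_k(ideal, 3)
print(round(ndcg_3, 4))  # 0.5364
```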


In [22]:
#ERR
def ERR(numeric_map, prediction):
    err = 0
    max_val = 2**max(list(numeric_map.values()))
    thetas = [(2**numeric_map[p]-1)/max_val for p in prediction]
    for i in range(0,len(prediction)):
        prod_val = 1
        for j in range(0,i):
            prod_val *= (1-thetas[j])*thetas[i]
        prod_val *= 1/(i+1)
        err += prod_val
    return err
err = ERR(numeric_map, prediction)
print(err)

1.281982421875
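
For reference, the commonly cited definition of ERR is the sum over ranks r of (1/r) * theta_r * prod_{j<r} (1 - theta_j), where theta is the probability that the user is satisfied at a rank. Under that definition the stop probability is applied once per rank and the score stays in [0, 1]. The following is a minimal sketch of that definition, assuming the same gain mapping as above; err_standard and GAIN are hypothetical names and are not used elsewhere in this notebook.

```python
GAIN = {'N': 0, 'R': 1, 'HR': 2}  # same label-to-grade mapping as above

def err_standard(ranking, gain=GAIN):
    """Expected reciprocal rank under the cascade user model."""
    max_gain = 2 ** max(gain.values())
    err = 0.0
    p_reach = 1.0  # probability that the user reaches the current rank
    for rank, label in enumerate(ranking, start=1):
        theta = (2 ** gain[label] - 1) / max_gain  # satisfaction probability
        err += p_reach * theta / rank
        p_reach *= 1 - theta  # user continues only if not satisfied
    return err

print(err_standard(['R', 'HR', 'N', 'R', 'N']))  # 0.54296875
```

Keeping a running p_reach avoids the nested loop over earlier ranks.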


3) Step 3: Calculate the 𝛥measure (0 points)

In [23]:
k = 5

def check_performance(s):
    prediction_e = s[0]
    prediction_p = s[1]
    print(prediction_e, prediction_p)
    r = count_rel(prediction_e, prediction_p)
    ap_e, ap_p = average_precision(prediction_e, r), average_precision(prediction_p, r)
    print('The average prec. scores are ',ap_e,ap_p,' for experiment and production respectively!')
    ERR_e, ERR_p = ERR(numeric_map, prediction_e), ERR(numeric_map, prediction_p)
    print('The ERR scores are ',ERR_e,ERR_p,' for experiment and production respectively!')
    opt_prediction = generate_opt(prediction_e,prediction_p)
    ndcg_e = dcg_k(numeric_map, prediction_e, opt_prediction, k)
    ndcg_p = dcg_k(numeric_map, prediction_p, opt_prediction, k)
    print('The NDCG scores @ k=',k,' are: ',ndcg_e, ndcg_p,' for experiment and production respectively!!!')

check_performance(combinations[20005])

['R', 'N', 'N', 'N', 'R'] ['HR', 'N', 'N', 'N', 'N']
The average prec. scores are  0.4666666666666666 0.3333333333333333  for experiment and production respectively!
The ERR scores are  1.0005859375 1.0  for experiment and production respectively!
The NDCG scores @ k= 5  are:  0.33572413233 0.726228761795  for experiment and production respectively!!!
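
The cell above prints both scores per measure. A minimal sketch of the Δ itself follows, assuming Δ is defined as the experiment score minus the production score. Precision at k is used here only as a simple stand-in measure; precision_at_k and delta_measure are hypothetical helper names.

```python
def precision_at_k(ranking, k=5):
    """Fraction of the top-k documents that are relevant (R or HR)."""
    return sum(1 for label in ranking[:k] if label != 'N') / k

def delta_measure(measure, ranking_e, ranking_p):
    """Difference in a measure between experiment (E) and production (P)."""
    return measure(ranking_e) - measure(ranking_p)

ranking_e = ['R', 'N', 'N', 'N', 'R']
ranking_p = ['HR', 'N', 'N', 'N', 'N']

delta = delta_measure(precision_at_k, ranking_e, ranking_p)
print(delta > 0)  # True: E places more relevant documents in the top 5
```

A positive Δ means the measure prefers E. Any function that maps a ranking to a single score can be passed in as measure.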


4) Step 4: Implement Interleaving (15 points)

In [79]:
def get_A_first(): #This function determines whether ranking A goes first or not
    A = np.random.uniform() # Draw a uniform random number between 0 and 1
    return A > 0.5
    
def balanced_interleaving(s):
    
    ranking_A = s[0]
    ranking_B = s[1]
    
    #print ("ranking A is",ranking_A)
    #print ("ranking B is",ranking_B)
    
    # Initialize
    I = []
    k_a, k_b = 0,0
        
    A_first = get_A_first() #Find out if A or B goes first
    
    # We assume that rankA and rankB contain 10 unique documents
    # That is why we can cast rankA and rankB to a dict
    # This makes it easier to return a list of length 9, while adhering to the pseudo code from the slides
    
    rankA = {}
    rankB = {}
    
    A = [i for i in range(0,5)]
    B = [i for i in range(5,10)]
    
    for i in A:
        rankA[i] = ranking_A[i]
        
    for j in B:
        rankB[j] = ranking_B[j-5] 
        
    # This code just follows the pseudo code from the slides
    while k_a+1 <= len(ranking_A) and k_b+1 <= len(ranking_B):
        if (k_a < k_b) or ((k_a == k_b) and A_first):
            if A[k_a] not in I:
                I.append(A[k_a])
            k_a += 1
            
        else:
            if B[k_b] not in I:
                I.append(B[k_b])
            k_b += 1
             
    # I is now filled with unique indices, we now have to convert these back to labels
    
    I_ids = list(I) # copy, so the ids survive when I is overwritten with labels
    
    for i in range(0,len(I)):
        if I[i] in rankA:
            I[i] = rankA[I[i]]
        else:
            I[i] = rankB[I[i]]
                    
    return I, I_ids, ranking_A, ranking_B

def define_winner(clicks,ranking_A,ranking_B):
            
    clicks_A = 0 # Number of clicks on documents from ranking A
    clicks_B = 0 # Number of clicks on documents from ranking B
    
    A = [i for i in range(0,5)]
    B = [i for i in range(5,10)]        
    
    for click in clicks: # Loop over the clicks
        if click in A:
            clicks_A += 1
        elif click in B:
            clicks_B += 1
            
    if clicks_A > clicks_B:
        return "A"
    elif clicks_B > clicks_A:
        return "B"
    else: 
        return "Tie"
            
test_set = combinations[12347]
clicks = [1,6,7]

I, I_ids, rank_A, rank_B = balanced_interleaving(test_set)
#define_winner(clicks,rank_A,rank_B)
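
To make the behaviour of the interleaving loop concrete, here is a minimal standalone sketch that works on document ids only and takes the coin flip as an argument so the result is deterministic. It assumes, as above, that A contributes ids 0-4 and B contributes ids 5-9; interleave_balanced is a hypothetical name.

```python
def interleave_balanced(ids_a, ids_b, a_first):
    """Balanced interleaving of two id lists; stops when one list is exhausted."""
    interleaved = []
    k_a = k_b = 0
    while k_a < len(ids_a) and k_b < len(ids_b):
        if k_a < k_b or (k_a == k_b and a_first):
            if ids_a[k_a] not in interleaved:
                interleaved.append(ids_a[k_a])
            k_a += 1
        else:
            if ids_b[k_b] not in interleaved:
                interleaved.append(ids_b[k_b])
            k_b += 1
    return interleaved

ids_a = [0, 1, 2, 3, 4]
ids_b = [5, 6, 7, 8, 9]

print(interleave_balanced(ids_a, ids_b, a_first=True))   # [0, 5, 1, 6, 2, 7, 3, 8, 4]
print(interleave_balanced(ids_a, ids_b, a_first=False))  # [5, 0, 6, 1, 7, 2, 8, 3, 9]
```

With disjoint ids the result has 9 entries: the ranking that goes first contributes 5 documents and the other contributes 4.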

5) Step 5: Implement User Clicks Simulation (15 points)

In [80]:
#read search query data
import csv
import re

def read_data():

    answers = []
    query_ids = []
    clicks = []
    click = []
    last_type = 'C'

    with open('YandexRelPredChallenge.txt') as f:
        for line in f:
            vals = re.split(r'\t+', line.rstrip())
            line_type = vals[2] #we look at the type of data line
            if line_type == 'Q': #if type is query, we append the query.
                if len(click) > 0: #we append clicks of last query before we go further
                    clicks.append(click)
                    click = []
                answers.append(list(map(int, vals[5:])))
                query_ids.append(int(vals[3]))
            if last_type == 'Q' and line_type == 'Q': #If last type also was query there are no clicks  
                clicks.append([])
            elif line_type == 'C':
                click.append(int(vals[3]))
            last_type = vals[2]
        clicks.append(click) #we shall not forget the last click sequence...
    
    return answers,query_ids,clicks

answers,query_ids,clicks = read_data()
    
'''
print('Some sample answers and clicks:')
print(query_ids[:5])
print(answers[:5])
print(clicks[:5])
print('We have ',len(clicks),' answers/click sequences in total!')
'''

doubles_list = []
for i in range(0,len(query_ids)):
    if query_ids[i] == 0:
        if answers[i] not in doubles_list:
            doubles_list.append(answers[i])

print(doubles_list)
print (len(doubles_list))


[[59, 89, 29, 61, 25, 2, 63, 42, 94, 71], [2, 394, 59, 89, 29, 94, 867, 876, 42, 377], [29, 61, 94, 13, 53884, 3501, 850, 53882, 53885, 53883], [59, 56888, 56872, 293, 56876, 56889, 56875, 56877, 7697, 16552]]
4


In [81]:
from collections import Counter

#Function to determine rho parameter of random click model given set of documents and clicks
def rcm(documents,clicks):
    unique_docs = []
    unique_clicks = []
    
    assert(len(documents) == len(clicks))
    
    #First we find all unique documents to get the count
    for d in documents:
        for e in d:
            unique_docs.append(e)

    number_of_docs = len(Counter(unique_docs).keys())
    
    #Now we collect all clicks to count the unique clicked documents
    for c in clicks:
        for e in c:
            unique_clicks.append(e)
        
    number_of_clicks = len(Counter(unique_clicks).keys())
    
    rho = number_of_clicks/number_of_docs
    
    return rho
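
For comparison, the estimate usually given for the random click model is the fraction of shown documents that were clicked, counted over all sessions rather than over unique ids. The following is a minimal sketch of that estimate on toy data; estimate_rho and toy_sessions are hypothetical and do not read the Yandex log.

```python
def estimate_rho(sessions):
    """Click probability as (clicked documents) / (shown documents)."""
    shown = sum(len(docs) for docs, _ in sessions)
    # A document clicked twice in one session still counts as one clicked document.
    clicked = sum(len(set(clicks)) for _, clicks in sessions)
    return clicked / shown

# Each session is (documents shown, ids of clicked documents).
toy_sessions = [
    ([59, 89, 29, 61, 25], [59, 29]),
    ([59, 89, 29, 61, 25], []),
    ([15, 32, 0, 9, 80], [9]),
]

rho_toy = estimate_rho(toy_sessions)
print(rho_toy)  # 0.2, i.e. 3 clicked out of 15 shown
```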

In [82]:
#Dynamic Bayesian network model
#first we will look at sigma, as we can derive this by MLE directly
from operator import itemgetter
query_answers = {}
query_clicks = {}
query_last_clicks = {}
query_sigma = {}
for q in np.unique(query_ids)[:10]:
    indices = [i for i, j in enumerate(query_ids) if j == q]
    query_answers[q] = answers[indices[0]]
    query_clicks[q] = list(itemgetter(*indices)(clicks))
    last_clicks = []
    #print(q, query_clicks[q])
    for i in query_clicks[q]:
        if isinstance(i, int):
            last_clicks.append(query_clicks[q][-1])
            break
        elif len(i) > 0:
            last_clicks.append(i[-1])
        else:
            last_clicks.append(0)
    query_last_clicks[q] = last_clicks
    
print (query_answers)
    

{0: [59, 89, 29, 61, 25, 2, 63, 42, 94, 71], 1: [15, 32, 0, 9, 80, 30, 88, 82, 97, 65], 2: [72, 11, 102, 83, 96, 79, 22, 38, 68, 91], 3: [19, 44, 5, 53, 41, 56, 58, 40, 3, 31], 4: [45, 20, 24, 23, 34, 33, 4, 90, 35, 60], 5: [99, 16, 87, 39, 100, 6, 36, 52, 81, 84], 6: [46, 49, 28, 78, 106, 14, 37, 48, 17, 10], 7: [77, 93, 55, 86, 64, 67, 76, 98, 18, 54], 8: [7, 103, 51, 92, 43, 12, 73, 69, 27, 105], 9: [13, 70, 66, 94, 50, 104, 29, 21, 89, 85]}
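
The cell above collects the last click of each session, which is the statistic the simplified DBN uses to estimate satisfaction. The following is a minimal sketch of that estimate, assuming the simplified variant (continuation probability fixed to 1) and the sessions of a single query: sigma for a document is the number of sessions in which it was the last click divided by the number of sessions in which it was clicked. estimate_satisfaction and toy_clicks are hypothetical.

```python
from collections import defaultdict

def estimate_satisfaction(click_sequences):
    """Per-document satisfaction: (# times last click) / (# sessions clicked)."""
    clicked = defaultdict(int)
    last_clicked = defaultdict(int)
    for clicks in click_sequences:
        for doc_id in set(clicks):
            clicked[doc_id] += 1
        if clicks:  # sessions without clicks contribute nothing
            last_clicked[clicks[-1]] += 1
    return {doc_id: last_clicked[doc_id] / n for doc_id, n in clicked.items()}

# Click sequences of four sessions for one query (toy data).
toy_clicks = [[59, 29], [29], [59], []]

sigma_toy = estimate_satisfaction(toy_clicks)
print(sigma_toy[59], sigma_toy[29])  # 0.5 1.0
```

Documents that were never clicked get no estimate here; in practice a prior or smoothing term is often added.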


In [95]:
def main():
    
    # Initialize random click model first
    documents,_,clicks = read_data()
    rho = rcm(documents,clicks)
    
    A_winner = 0
    B_winner = 0
    Tie = 0
        
    for combination in combinations:
        I,I_ids,rank_A,rank_B = balanced_interleaving(combination)
                
        # We ignore order since it is a stochastic process anyway
        
        clicks = []
        
        for i in range(0,len(I)):
            random_variable = np.random.uniform()
            
            if random_variable < rho:
                clicks.append(i)    
                
        winner = define_winner(clicks,rank_A,rank_B)
        
        if winner == "A":
            A_winner += 1
        elif winner == "B":
            B_winner += 1
        else:
            Tie += 1
        
    total = A_winner + B_winner + Tie    
    print ("A wins",100*A_winner/total, "percent of the time")
    print ("B wins",100*B_winner/total, "percent of the time")
    print ("It is a tie",100*Tie/total, "percent of the time")

In [101]:
main()

A wins 36.20208822229024 percent of the time
B wins 26.102778628031153 percent of the time
It is a tie 37.695133149678604 percent of the time
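
Note that main() records the clicked positions in the interleaved list. An alternative is to record the document ids at those positions, so that each click is credited to the ranking that contributed the document. The following is a minimal sketch of that variant, assuming ids 0-4 come from A and ids 5-9 from B; simulate_random_clicks and count_clicks_per_ranker are hypothetical names.

```python
import random

def simulate_random_clicks(interleaved_ids, rho, rng):
    """Random click model: each shown document is clicked with probability rho."""
    return [doc_id for doc_id in interleaved_ids if rng.random() < rho]

def count_clicks_per_ranker(clicked_ids, ids_a, ids_b):
    """Credit each click to the ranking that contributed the clicked document."""
    clicks_a = sum(1 for doc_id in clicked_ids if doc_id in ids_a)
    clicks_b = sum(1 for doc_id in clicked_ids if doc_id in ids_b)
    return clicks_a, clicks_b

ids_a = [0, 1, 2, 3, 4]
ids_b = [5, 6, 7, 8, 9]
interleaved_ids = [0, 5, 1, 6, 2, 7, 3, 8, 4]  # A went first

rng = random.Random(0)  # seeded only to make the sketch repeatable
clicked = simulate_random_clicks(interleaved_ids, rho=0.3, rng=rng)
print(count_clicks_per_ranker(clicked, ids_a, ids_b))
```

Since the clicks ignore relevance, any remaining imbalance between A and B comes from which ranking contributes the extra ninth document.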
